L. F. M. ten Bosch
Inst. for Perception Res./IPO, P.B. 513, Eindhoven, The Netherlands
The design of an algorithm that classifies pitch movements is discussed. The algorithm operates in two phases: (a) a training phase, which is based on a labeled training corpus of 249 grammatical Dutch sentences, and (b) a recognition phase. The IPO labeling system [J. 't Hart et al., A Perceptual Study of Intonation: An Experimental-Phonetic Approach to Speech Melody (Cambridge U.P., Cambridge, 1990)] is used. The algorithm comprises (A) the extraction of two time-varying features (pitch and a feature called vowel strength), followed by (B) a search for characteristic movements over time in the resulting feature set, (C) a decision procedure similar to a linear discriminant analysis, and (D) the use of an intonation grammar. Without step (D), a classification rate of 81% is attained on the database of 249 semispontaneous sentences, spoken by more than 40 male and female speakers. Invoking the grammar [step (D)], at least 6% of the remaining 19% errors can be resolved by disambiguating phonetic candidates. This approach to the automatic recognition of pitch movements differs essentially from the approach followed in speech segment recognition systems based on hidden Markov modeling. This difference will be discussed. Although the algorithm was developed for Dutch speech material and for the labels defined in the IPO intonation labeling system, the method is generally applicable to other languages and to other (consistent) intonation labeling systems.
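As an illustration of step (C) only, the following is a minimal sketch of a standard linear discriminant classifier, not the decision procedure of the abstract, which is described only as "similar to" one. The two-dimensional feature vectors (a pitch slope and a vowel-strength value), the labels "rise" and "fall", and the synthetic data are all assumptions made for the example; they are not the IPO labels or the corpus described above.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class means, a pooled covariance inverse, and log priors."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    # Pooled within-class covariance, shared by all classes (the LDA assumption).
    centered = X - means[idx]
    cov = centered.T @ centered / (len(X) - len(classes))
    cov_inv = np.linalg.pinv(cov)
    log_priors = np.log(np.bincount(idx) / len(y))
    return classes, means, cov_inv, log_priors

def predict_lda(model, X):
    """Assign each row of X to the class with the highest linear score."""
    classes, means, cov_inv, log_priors = model
    w = means @ cov_inv                                 # one weight vector per class
    b = -0.5 * np.sum(w * means, axis=1) + log_priors   # per-class offset
    scores = X @ w.T + b
    return classes[np.argmax(scores, axis=1)]

# Synthetic, hypothetical training data: [pitch slope, vowel strength].
rng = np.random.default_rng(0)
rise = np.column_stack([rng.normal(20.0, 3.0, 50), rng.normal(0.8, 0.1, 50)])
fall = np.column_stack([rng.normal(-20.0, 3.0, 50), rng.normal(0.8, 0.1, 50)])
X_train = np.vstack([rise, fall])
y_train = np.array(["rise"] * 50 + ["fall"] * 50)

model = fit_lda(X_train, y_train)
predicted = predict_lda(model, np.array([[20.0, 0.7], [-20.0, 0.7]]))
```

In this sketch the training phase corresponds to fit_lda and the recognition phase to predict_lda. A grammar such as the one in step (D) would act afterwards on the sequence of predicted labels, which is not modeled here.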