Christopher Long, Phil Bangayan, and Abeer Alwan
Dept. of Elec. Eng., 66-147E Engr. IV, UCLA, 405 Hilgard Ave., Los Angeles, CA 90024-1594
An analysis-by-synthesis approach was adopted to characterize the acoustic and perceptual features of three pathological voice qualities: breathy, strained, and rough. One hundred and sixty waveforms of the vowel /(open aye)/, spoken by female and male subjects with pathological voice qualities, were obtained from the VA Hospital in West LA. The temporal and spectral features of the waveforms were studied, and the results were used to synthesize the utterances with the Klatt formant synthesizer. Preliminary results on 30 breathy and strained voices indicate that the perception of "pathological" breathiness is mainly related to (1) a large open quotient (OQ) of the glottal waveform and (2) the amplitude of aspiration noise (AH) relative to that of voicing (AV), with female voices exhibiting a larger AH − AV difference than male voices. For some voices, it was also necessary to add extra poles to the vocal-tract transfer function to achieve a better spectral match. Synthesis of strained voices required a lower OQ than that needed for normal voices, and in some cases amplitude and/or frequency modulation was introduced to achieve a better match in the time domain. Clinicians judged the synthetic voices perceptually to be of high quality. The results will be discussed in terms of the effects of different vibratory patterns of the vocal folds on the acoustic speech waveform.
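The two source parameters named above, OQ and the level of AH relative to AV, can be illustrated with a minimal sketch. It is not the Klatt synthesizer's glottal source model: it assumes a simple raised-cosine flow pulse, unfiltered white noise for aspiration, and RMS as the level measure. The function names, the 10-kHz sampling rate, the 200-Hz fundamental, the OQ values of 0.8 and 0.4, and the -6 dB noise level are all hypothetical choices for illustration, not values from the study.

```python
import math
import random


def glottal_flow(f0_hz, oq, fs_hz=10000, n_periods=5):
    """Raised-cosine flow pulses (an assumed shape, not the Klatt model).

    oq is the open quotient: the fraction of each pitch period during
    which the glottis is open.
    """
    period = int(round(fs_hz / f0_hz))   # samples per pitch period
    n_open = int(round(oq * period))     # samples in the open phase
    one_period = [
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_open)) if n < n_open else 0.0
        for n in range(period)
    ]
    return one_period * n_periods


def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def aspiration_noise(voicing, ah_minus_av_db, seed=0):
    """White noise scaled so its RMS sits ah_minus_av_db dB from the voicing RMS."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in voicing]
    gain = rms(voicing) * 10.0 ** (ah_minus_av_db / 20.0) / rms(noise)
    return [gain * n for n in noise]


def measured_oq(flow, eps=1e-12):
    """Fraction of samples with nonzero flow, a crude open-quotient estimate."""
    return sum(1 for v in flow if v > eps) / len(flow)


# Illustrative settings only: a large OQ with added noise (breathy-like)
# versus a small OQ without noise (strained-like).
breathy_voicing = glottal_flow(200.0, oq=0.8)
breathy_source = [v + n for v, n in
                  zip(breathy_voicing, aspiration_noise(breathy_voicing, -6.0))]
strained_source = glottal_flow(200.0, oq=0.4)
```

In this sketch, raising `oq` lengthens the open phase of each period, and raising `ah_minus_av_db` increases the noise level relative to voicing. These are the two directions the abstract associates with perceived breathiness, while the lower `oq` corresponds to the setting reported for strained voices.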