ASA 128th Meeting - Austin, Texas - 1994 Nov 28 .. Dec 02

3aSP13. Modeling listeners' categorization of a large F1-F2-F3 continuum.

Terrance M. Nearey

Michael Kiefte

Dept. of Linguist., Univ. of Alberta, Edmonton, AB T6G 2E7, Canada

A number of alternative spectral representations have been suggested for vowel spectra [see H. Hermansky, J. Acoust. Soc. Am. 87, 1738--1752 (1990)]. To better evaluate the perceptual relevance of some of these, 972 vowels were synthesized. Each stimulus was 115 ms in duration with a falling F0 contour (125--100 Hz). F1 ranged (in 0.5-Bark steps) from 250 to 760 Hz, F2 from 750 to 2260 Hz, and F3 from 1360 to 3080 Hz. F4 and F5 were fixed at 3500 and 4500 Hz, respectively. (Constraints were placed on formant separations to ensure relatively natural stimuli.) Fifteen speakers of Western Canadian English categorized the stimuli as the vowels /i, (small capital eye), e, (cursive beta), (ae ligature), (inverted vee), (inverted open aye), o, (small capital you), u, (hooked backward eh)/. Preliminary results indicate that while nominal synthesis formant frequencies can provide a relatively good fit to the data, alternative representations such as cepstral coefficients based on Hermansky's PLP analysis may provide moderate improvements in fit. However, linear transformations of the PLP cepstra show strong correlations with formant frequencies [similar to those noted by D. Broad and F. Clermont, J. Acoust. Soc. Am. 86, 2013--2017 (1989)]. [Work supported by SSHRC.]
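
The 0.5 Bark spacing of the formant continuum can be illustrated with a minimal sketch. The abstract does not say which Hz-to-Bark conversion was used, so the Traunmüller (1990) approximation below is an assumption, as are the function names and the choice to anchor each series at its lower endpoint. The resulting frequencies therefore only approximate the stated upper endpoints, and the formant-separation constraints that yield the 972 stimuli are not modeled.

```python
def hz_to_bark(f_hz):
    """Traunmueller (1990) approximation (an assumption; the abstract names no formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def bark_steps(f_lo_hz, f_hi_hz, step_bark=0.5):
    """Frequencies (Hz) spaced step_bark apart on the Bark scale.

    The series starts at f_lo_hz; the number of steps is the span in
    Bark divided by step_bark, rounded to the nearest integer, so the
    last value may fall slightly above or below f_hi_hz.
    """
    z_lo = hz_to_bark(f_lo_hz)
    z_hi = hz_to_bark(f_hi_hz)
    n_steps = round((z_hi - z_lo) / step_bark)
    return [bark_to_hz(z_lo + k * step_bark) for k in range(n_steps + 1)]


# Ranges as stated in the abstract (Hz).
f1_values = bark_steps(250.0, 760.0)
f2_values = bark_steps(750.0, 2260.0)
f3_values = bark_steps(1360.0, 3080.0)
```

Under these assumptions, the full crossing of `f1_values`, `f2_values`, and `f3_values` gives the unconstrained grid; the subset actually synthesized would depend on the separation constraints, which the abstract does not specify.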