Stephen A. Zahorian
Zaki B. Nossair
Dept. of Elec. and Comput. Eng., Old Dominion Univ., Norfolk, VA 23529
In previous experiments in which multiple-tone stimuli were synthesized so that either the formants or the global spectral shape matched those of naturally spoken vowel tokens, vowel identity and quality were not well preserved in either case [S. A. Zahorian and Z.-J. Zhong, J. Acoust. Soc. Am. 92, 2414--2415 (1992)]. In the present study, several additional criteria were tested for selecting the amplitudes and frequencies of sinusoids, with the objective that stimuli synthesized from these sinusoids would be perceived as most similar to the original ``target'' vowel tokens. Of the methods investigated, vowel quality in stimuli synthesized from N sinusoids was best preserved when the sinusoids matched the N largest peaks in the magnitude spectrum of the original vowel. Depending on the vowel, between 5 and 10 sinusoids are required for the synthesized token to be perceived as sounding nearly identical to the original token. A new metric for spectral shape has been developed that yields acoustically invariant cues to vowel perception in a manner consistent with the results of these experiments, and it will be presented. This new metric also predicts the number of sinusoids required to synthesize each vowel.
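The peak-matching criterion described above (N sinusoids matched to the N largest peaks of the magnitude spectrum) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Hann window, the single-frame analysis, the definition of a peak as a local maximum over DFT bins, the zero-phase resynthesis, and the function names n_largest_peaks and synthesize are all assumptions made for illustration.

```python
import numpy as np


def n_largest_peaks(signal, sample_rate, n_peaks):
    """Return (frequencies_hz, amplitudes) of the n_peaks largest local
    maxima in the magnitude spectrum of one Hann-windowed frame.

    Assumption: a "peak" is a DFT bin whose magnitude exceeds its left
    neighbour and is at least as large as its right neighbour.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))
    mag = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # Interior bins that are local maxima of the magnitude spectrum.
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    peak_bins = np.nonzero(is_peak)[0] + 1

    # Keep the n_peaks largest peaks, then restore ascending frequency order.
    largest = np.argsort(mag[peak_bins])[::-1][:n_peaks]
    chosen = np.sort(peak_bins[largest])

    # Undo the window gain so a bin-centred sinusoid of amplitude A maps to A.
    amps = 2.0 * mag[chosen] / window.sum()
    return freqs[chosen], amps


def synthesize(freqs_hz, amps, sample_rate, n_samples):
    """Sum zero-phase sinusoids with the given frequencies and amplitudes.

    Assumption: original phases are discarded, which is a simplification.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    amps = np.asarray(amps, dtype=float)
    t = np.arange(n_samples) / sample_rate
    partials = amps[:, None] * np.sin(2.0 * np.pi * freqs_hz[:, None] * t)
    return partials.sum(axis=0)
```

Under these assumptions, calling n_largest_peaks on a vowel frame with n_peaks between 5 and 10 and passing the result to synthesize produces a multiple-tone stimulus of the kind the abstract describes. Whether such a stimulus is perceived as similar to the original is a perceptual question that the sketch does not address.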