Zaki B. Nossair
Stephen A. Zahorian
Dept. of Elec. and Comput. Eng., Old Dominion Univ., Norfolk, VA 23529
In previous synthesis experiments with multi-tone vowels, vowel perception was found to correlate more closely with acoustic cues derived from the envelope of the magnitude spectrum than with cues derived from the overall magnitude spectrum [Zahorian et al., J. Acoust. Soc. Am. 93, 2298–2299 (1993)]. In the present study, automatic vowel classification experiments were used to compare spectral magnitude features with spectral envelope features. The first feature set consisted of cepstral coefficients similar to those typically used in automatic speech recognition. The second consisted of cepstral coefficients computed from the envelope spectrum. The two feature sets were compared under four conditions: a single frame of clean speech, a single frame of noisy speech at varying signal-to-noise ratios, multiple frames of clean speech, and multiple frames of noisy speech. For clean speech, the automatic classification results were nearly identical for the two feature sets. For both noisy speech conditions, however, the spectral envelope features yielded a small but consistent improvement in vowel classification over the spectral magnitude features. These results corroborate the view that, under noise-free conditions, conventional cepstral coefficients encode the envelope spectrum indirectly, but that a more direct envelope representation is preferable for improving noise immunity.
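The contrast between the two feature sets can be illustrated with a minimal Python sketch. It rests on several assumptions. The vowel frame is a hypothetical synthetic one: 8 kHz sampling, 256 samples, and harmonics of 100 Hz shaped by three resonance peaks. Cepstral coefficients are computed as a DCT-II of the log magnitude spectrum. A simple running-maximum envelope stands in for the envelope estimator used in the study, which the abstract does not specify. All function names and parameters are illustrative. The script only reports how far each feature vector moves from its clean-speech value as white noise is added. It is not a classification experiment and is not meant to reproduce the reported results.

```python
import math
import random

FS = 8000.0  # assumed sampling rate (Hz)
N = 256      # assumed frame length (samples)

def vowel_signal(f0=100.0, formants=(700.0, 1200.0, 2600.0), n=N, fs=FS):
    """Hypothetical vowel-like signal: harmonics of f0 whose amplitudes peak near the formants."""
    harmonics = [h * f0 for h in range(1, int((fs / 2) // f0))]
    amps = [sum(1.0 / (1.0 + ((f - fm) / 100.0) ** 2) for fm in formants) for f in harmonics]
    return [sum(a * math.cos(2 * math.pi * f * t / fs) for f, a in zip(harmonics, amps))
            for t in range(n)]

def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio (dB)."""
    rng = random.Random(seed)
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, sigma) for x in signal]

def log_magnitude_spectrum(signal):
    """Hamming-window the frame, then return log|DFT| at bins 0..N/2 (direct O(N^2) DFT)."""
    n = len(signal)
    frame = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(signal)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(math.log(math.hypot(re, im) + 1e-10))  # small floor avoids log(0)
    return spec

def envelope(log_spec, half_width=2):
    """ASSUMED envelope: running maximum over +/- half_width bins, bridging the valleys
    between harmonic peaks. A stand-in, not the envelope method used in the study."""
    return [max(log_spec[max(0, i - half_width):i + half_width + 1]) for i in range(len(log_spec))]

def cepstra(log_spec, num_coeffs=12):
    """Cepstral coefficients c_1..c_num_coeffs as a DCT-II of a log spectrum (c_0 dropped)."""
    m = len(log_spec)
    return [sum(v * math.cos(math.pi * q * (i + 0.5) / m) for i, v in enumerate(log_spec))
            for q in range(1, num_coeffs + 1)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

clean = vowel_signal()
clean_spec = log_magnitude_spectrum(clean)
ref_mag, ref_env = cepstra(clean_spec), cepstra(envelope(clean_spec))
for snr in (20, 10, 0):
    noisy_spec = log_magnitude_spectrum(add_noise(clean, snr))
    d_mag = distance(ref_mag, cepstra(noisy_spec))
    d_env = distance(ref_env, cepstra(envelope(noisy_spec)))
    print(f"SNR {snr:>2} dB: magnitude-cepstrum shift {d_mag:.2f}, "
          f"envelope-cepstrum shift {d_env:.2f}")
```

The running maximum was chosen because it follows the harmonic peaks, which presumably stay above the noise floor longer than the valleys between them. A smoothed or interpolated peak contour would be an equally plausible stand-in.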