ASA 126th Meeting Denver 1993 October 4-8

4pSP4. A new masked spectrum representation applied to English /r/--/l/ dissimilarity measurement.

Kiyoaki Aikawa and Reiko A. Yamada

ATR Human Information Process. Res. Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-02 Japan

This paper shows that a dynamic cepstrum is effective not only in automatic speech recognition but also in explaining the speech perception mechanism. Talker dependency in the perception of American English /r/--/l/ by Japanese listeners has been reported [J. S. Logan et al., J. Acoust. Soc. Am. 89, 874 (1991)]. To interpret this phenomenon in terms of the acoustical properties of the stimuli, the talker dependency is assumed to arise from the acoustical dissimilarity of the stimuli. This paper applies a dynamic cepstrum to measuring that dissimilarity. The dynamic cepstrum is a new spectral representation for automatic speech recognition that incorporates the time-frequency characteristics of forward masking [K. Aikawa et al., J. Acoust. Soc. Am. 92, 2476(A) (1992)]. This parameter enhances formant shifts and suppresses stationary spectral features. In a perception experiment, identification tests were conducted for English /r/--/l/ minimal pairs uttered by five talkers. The dynamic cepstrum and a conventional cepstrum were then compared for their performance in measuring the acoustical dissimilarity of the minimal pairs. The /r/--/l/ dissimilarity measured by the dynamic cepstrum shows a talker dependency that is highly correlated with the talker dependency observed in the perception experiment. [Speech data were provided by Dr. D. B. Pisoni, Indiana University.]
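The idea of a masked representation that suppresses stationary features and keeps spectral change can be illustrated with a minimal sketch. It is not the authors' dynamic cepstrum: the masking rule (subtracting a geometrically decaying weighted sum of preceding frames), the function names `masked_cepstrum` and `cepstral_dissimilarity`, and the parameters `n_lags`, `alpha`, and `beta` are all assumptions made for illustration, and the input is synthetic.

```python
import numpy as np

def masked_cepstrum(ceps, n_lags=3, alpha=0.3, beta=0.5):
    """Toy forward-masking model (an assumption, not the published method).

    ceps: array of shape (frames, coefficients), with frames > n_lags.
    Each frame has a decaying weighted sum of its preceding frames
    subtracted, so steady spectra shrink while onsets are kept.
    """
    ceps = np.asarray(ceps, dtype=float)
    masked = ceps.copy()
    for k in range(1, n_lags + 1):
        weight = alpha * beta ** (k - 1)   # hypothetical decay with time lag
        masked[k:] -= weight * ceps[:-k]   # frame t is masked by frame t-k
    return masked

def cepstral_dissimilarity(a, b):
    """Mean frame-wise Euclidean distance between two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

# Synthetic step input: five silent frames followed by five steady frames.
step = np.vstack([np.zeros((5, 4)), np.ones((5, 4))])
masked_step = masked_cepstrum(step)
```

With these illustrative parameters, the onset frame `masked_step[5]` keeps its full value of 1.0, while the later steady frame `masked_step[9]` is reduced to about 0.475. A dissimilarity for an /r/--/l/ pair could then be taken as `cepstral_dissimilarity` applied to the two masked sequences, assuming they have already been time-aligned to equal length.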