ASA 129th Meeting - Washington, DC - 1995 May 30 .. Jun 06

1pSC6. On the perceptual distance between speech segments.

Oded Ghitza

M. Mohan Sondhi

Acoust. Res. Dept., AT&T Bell Labs., Rm. 2D-536, Murray Hill, NJ 07974

For many tasks in speech signal processing it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (Here, speech segments are pieces of a speech signal of duration 50--150 ms.) Such a distance metric would be useful for speech coders at low bit rates, because the perturbations introduced by such coders typically last for several tens of milliseconds. It would also be useful for automatic speech recognition in adverse conditions. Since human listeners perform well in spite of gross distortions of the signal (e.g., due to reverberation or noisy environments), it is reasonable to assume that mimicking human behavior will improve recognition performance. In this talk, attempts at defining such a metric will be described. The problem is approached in the framework of the Diagnostic Rhyme Test [DRT]. First, the errors made by human subjects were measured when judiciously chosen time-frequency ``tiles'' were interchanged between the words in each pair of the DRT [Ghitza, 2507--2515 (1993)]. Next, the same task is performed with an array of automatic speech recognizers [Ghitza and Sondhi, Comp. Speech Lang. 7(2), 101--120 (1993)], using a parametrized distance metric. Finally, the parameters of the distance metric are optimized so as to minimize the difference between the error patterns of the human listeners and those of the automatic speech recognizers.
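
The final optimization step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the work itself: the two-band feature vectors, the weighted squared Euclidean form of the distance, the nearest-template decision rule standing in for a recognizer, the grid search, and the human error rates, which are made-up toy numbers. It only shows the general idea of choosing metric parameters so that machine error patterns match human ones.

```python
from itertools import product

def weighted_distance(x, y, weights):
    """Hypothetical parametrized metric: weighted squared Euclidean distance."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))

def machine_error_rate(trials, weights):
    """Fraction of trials in which a nearest-template rule picks the rival word.

    Each trial is a tuple (stimulus, correct_template, rival_template).
    """
    errors = 0
    for stimulus, correct, rival in trials:
        if weighted_distance(stimulus, rival, weights) < weighted_distance(stimulus, correct, weights):
            errors += 1
    return errors / len(trials)

def pattern_mismatch(conditions, human_rates, weights):
    """Sum of squared differences between machine and human error rates."""
    return sum(
        (machine_error_rate(trials, weights) - human) ** 2
        for trials, human in zip(conditions, human_rates)
    )

def fit_weights(conditions, human_rates, grid):
    """Exhaustive grid search over weight vectors (a stand-in optimizer)."""
    n_features = len(conditions[0][0][0])
    return min(
        product(grid, repeat=n_features),
        key=lambda w: pattern_mismatch(conditions, human_rates, w),
    )

# Toy word pair with two feature bands (low, high); values are invented.
correct, rival = (0.0, 0.0), (1.0, 1.0)

# Condition 0: the low-band tile is taken from the rival word.
# Condition 1: the high-band tile is taken from the rival word.
conditions = [
    [((1.0, 0.0), correct, rival)],
    [((0.0, 1.0), correct, rival)],
]

# Invented human error rates: listeners are fooled only by the low-band swap.
human_rates = [1.0, 0.0]

best = fit_weights(conditions, human_rates, grid=[0.0, 0.5, 1.0])
print(best, pattern_mismatch(conditions, human_rates, best))
```

In this toy setting the search settles on a weight vector that emphasizes the low band, because only then does the nearest-template rule reproduce the invented human error pattern. A realistic setup would use many trials per condition and a continuous optimizer in place of the grid.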