ASA 127th Meeting M.I.T. 1994 June 6-10

2pSP46. Temporal masking in automatic speech recognition.

M. Pavel

Hynek Hermansky

Oregon Graduate Inst., P. O. Box 91000, Portland, OR 97291-1000

It is reasonable to expect that the features relevant for speech recognition are those that are well heard. Successful extraction of such features must therefore respect the properties of the auditory periphery. To examine its temporal properties, full masking (detection) and partial masking (loudness matching) experimental paradigms were used. The results of these masking experiments suggest a model consisting of a combination of linear and nonlinear components. The recently introduced RelAtive SpecTrAl (RASTA) engineering technique for speech feature extraction [Hermansky et al., Proc. Int. Conf. Acoustics, Speech, and Signal Process. (1993)] has a comparable structure: it applies linear temporal filtering to critical-band spectral energies between two static nonlinearities, and it has been successful as a feature extraction technique in automatic speech recognition. A correspondence between the results of the experiments and the behavior of the RASTA model has been found. Possible modifications of the RASTA technique to incorporate additional details of the experimental results will be discussed.
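
The RASTA structure described in the abstract (a linear temporal filter placed between two static nonlinearities) can be illustrated with a minimal Python sketch. The abstract does not give the filter or the nonlinearities, so everything specific here is an assumption: the logarithm and its inverse stand in for the two static nonlinearities, the band-pass coefficients and the pole value of 0.98 follow the form commonly quoted in published RASTA descriptions, and the function names `rasta_filter` and `rasta` are hypothetical.

```python
import math


def rasta_filter(track, pole=0.98):
    """Band-pass filter the time trajectory of one critical band.

    Assumed transfer function (not stated in the abstract):
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    """
    # Numerator taps: a 5-point regression (slope) filter that sums to zero,
    # so a constant input produces no steady-state output.
    b = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    prev = 0.0
    for n in range(len(track)):
        fir = sum(b[k] * track[n - k] for k in range(len(b)) if n - k >= 0)
        prev = fir + pole * prev  # single-pole leaky integrator
        out.append(prev)
    return out


def rasta(energies, pole=0.98, floor=1e-10):
    """Apply log -> temporal band-pass -> exp to each critical band.

    energies: list of frames, each a list of critical-band energies.
    The log/exp pair is an assumed choice of the two static nonlinearities.
    """
    n_bands = len(energies[0])
    result = [[0.0] * n_bands for _ in energies]
    for band in range(n_bands):
        # First static nonlinearity: compress each band energy.
        log_track = [math.log(max(frame[band], floor)) for frame in energies]
        # Linear temporal filtering of the compressed trajectory.
        filtered = rasta_filter(log_track, pole)
        # Second static nonlinearity: expand back.
        for t, value in enumerate(filtered):
            result[t][band] = math.exp(value)
    return result
```

Under these assumptions, a fixed gain applied to a band becomes an additive constant after the logarithm, and the band-pass filter removes it once the initial transient has decayed. This suppression of slowly varying components is the property usually cited as the motivation for filtering between the two nonlinearities.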