ASA 125th Meeting Ottawa 1993 May

2pSP7. Multi-microphone cross-correlation based processing for robust speech recognition.

Thomas M. Sullivan

Richard M. Stern

Dept. of Elec. and Comput. Eng. and School of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA 15213

A new signal-processing algorithm for robust speech recognition using multiple microphones is described. The algorithm, loosely based on human binaural perception, imposes time-aligning delays on the speech signals from each microphone and passes the delayed speech through a bank of bandpass filters and nonlinear rectifiers. The outputs of the nonlinear rectifiers within each frequency band are cross-correlated, providing an estimate of the spectral profile of short-term energy in the speech signal that is resilient to the presence of off-axis noise sources. A cepstral representation of these energy estimates is used as the feature set for automatic speech recognition using the CMU SPHINX system. The multichannel cross-correlation-based algorithm was found to preserve the shape of vowel spectra in additive noise, and it provided better recognition accuracy than equivalent single-channel processing with non-close-talking microphones. The performance of this system was compared to that obtained using delay-and-sum beamforming and conventional adaptive filtering approaches. Finally, some implications of these results for human binaural hearing will be discussed. [Work supported by Motorola and DARPA.]
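
The processing chain described above (time alignment, bandpass filtering, rectification, cross-correlation within each band, cepstral transform) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the filter shapes, the rectifier, the correlation lag, or how microphone pairs are combined. The sketch assumes integer sample delays applied as circular shifts, brick-wall FFT band selection in place of a real filter bank, half-wave rectification, zero-lag correlation averaged over all microphone pairs, and a DCT-II of the log band energies. All function names and parameters are hypothetical.

```python
import numpy as np

def align(signals, delays):
    """Advance each channel by its steering delay in samples.
    Assumption: integer delays and a circular shift (np.roll) for simplicity."""
    return np.stack([np.roll(x, -d) for x, d in zip(signals, delays)])

def bandpass_fft(x, fs, lo, hi):
    """Keep only frequencies in [lo, hi) by zeroing FFT bins.
    Assumption: a crude brick-wall stand-in for a bank of bandpass filters."""
    spec = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spec[..., (freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)

def band_energies(signals, delays, fs, edges):
    """Per-band energy estimate from cross-correlated rectifier outputs."""
    aligned = align(signals, delays)
    n_mics = aligned.shape[0]
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-wave rectifier (assumed form of the nonlinear rectifier).
        rect = np.maximum(bandpass_fft(aligned, fs, lo, hi), 0.0)
        # Zero-lag cross-correlation, averaged over all microphone pairs
        # (assumed way of combining more than two channels).
        pairs = [np.mean(rect[i] * rect[j])
                 for i in range(n_mics) for j in range(i + 1, n_mics)]
        energies.append(np.mean(pairs))
    return np.array(energies)

def cepstrum(energies, n_coeffs):
    """Unnormalized DCT-II of the log band energies."""
    log_e = np.log(energies + 1e-12)  # small floor avoids log(0)
    n = len(log_e)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ log_e
```

In this sketch, a source that is correctly time-aligned yields rectifier outputs that agree across channels and so contributes strongly to the band product, while a source arriving with mismatched delays contributes less. The resulting cepstral vector would then serve as one frame of recognizer input; framing, windowing, and the recognizer itself are left out.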