Chi Wei Che
CAIP Ctr., Rutgers Univ., Piscataway, NJ 08855-1390
As speech recognition technology moves from the laboratory to real-world applications, the need for robustness increases. This paper describes a system of microphone arrays and neural networks (MANN) for robust hands-free speech recognition. MANN has the advantage that existing speech recognition systems can be deployed directly in adverse practical environments where distant-talking sound pickup is required. Neither retraining nor modification of the recognizers is necessary. MANN consists of two synergistic components: (1) signal enhancement by microphone arrays and (2) feature adaptation by neural network computing. High-quality sound capture by the microphone array enables successful feature adaptation by the neural network to mitigate environmental interference. Through neural network computation, a matched training and testing condition is approximated, which typically improves speech recognition performance. Both computer-simulated and real-room speech input are used to evaluate the capability of MANN. Measurements of isolated-word recognition in noisy, reverberant, and distant-talking conditions show that MANN achieves a word recognition accuracy within 4% to 6% of that obtained under a close-talking condition in quiet.
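The two-stage idea can be illustrated with a minimal sketch. It is not the MANN implementation: the abstract does not specify the array processing or the network architecture, so the sketch assumes a plain delay-and-sum beamformer for stage (1) and substitutes a least-squares affine map for the neural network in stage (2). All function names and parameters here are hypothetical.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Stage 1 (assumed technique): delay-and-sum beamforming.

    channels: array of shape (n_mics, n_samples), one row per microphone.
    delays:   non-negative integer arrival delay, in samples, per microphone.
    Returns the average of the time-aligned channels.
    """
    channels = np.asarray(channels, dtype=float)
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for ch, d in zip(channels, delays):
        d = int(d)
        # Advance the channel by d samples so the source lines up across mics;
        # the last d samples have no data and stay zero.
        aligned = np.zeros(n_samples)
        aligned[: n_samples - d] = ch[d:]
        out += aligned
    return out / n_mics


def fit_feature_mapping(distant_feats, close_feats):
    """Stage 2 (stand-in for the neural network): fit an affine map that
    takes distant-talking features toward close-talking features.

    Both inputs have shape (n_frames, n_dims) and are frame-aligned.
    Returns a weight matrix of shape (n_dims + 1, n_dims); the last row
    is the bias.
    """
    distant_feats = np.asarray(distant_feats, dtype=float)
    close_feats = np.asarray(close_feats, dtype=float)
    ones = np.ones((distant_feats.shape[0], 1))
    X = np.hstack([distant_feats, ones])
    W, *_ = np.linalg.lstsq(X, close_feats, rcond=None)
    return W


def adapt_features(W, feats):
    """Apply the fitted affine map to new distant-talking features."""
    feats = np.asarray(feats, dtype=float)
    ones = np.ones((feats.shape[0], 1))
    return np.hstack([feats, ones]) @ W
```

In this sketch the recognizer itself is untouched: the beamformed signal is converted to features by whatever front end the recognizer already uses, and the adapted features are passed to it unchanged, which mirrors the claim that no retraining or modification of the recognizer is needed.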