Jont B. Allen
Acoustics Res. Dept., Rm. 2D553, AT&T Bell Labs, Murray Hill, NJ 07974
Until the performance of automatic speech recognition hardware surpasses human performance in accuracy and robustness, one stands to gain by understanding the basic principles behind how humans recognize speech. This problem was studied extensively at Bell Labs between 1920 and 1950 by Harvey Fletcher and his colleagues. The motivation for these studies was to quantify speech sounds in the telephone plant and to improve speech articulation and quality. To do this they studied the effects of filtering and noise on the average articulation errors for nonsense CVC syllables, words, and sentences. During WWII these studies were continued at Harvard, but under much more adverse conditions of filtering, noise, and distortion. They have recently been extended to include the effects of reverberation degradation due to changes in the modulation transfer function. Fletcher and his colleagues derived a linear speech information density function D(ω) for the CVCs and found a formula that accurately predicts the average CVC errors. The area under D is called the articulation index. Fletcher's derivation, though not widely known, is very insightful. He then went on to find the relations between the errors for these nonsense speech sounds and the errors for words and sentences. This work has recently been reviewed and extended by Boothroyd. Taken as a whole, these studies can tell us a great deal about how humans process and recognize speech sounds.
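As a sketch of the relations summarized above (the symbols e_k, K, and e_min are not defined in the abstract and are introduced here for illustration): writing e_k for the average CVC articulation error when only the k-th of K independent frequency bands is audible, Fletcher's band-independence rule and the articulation index take the form

\[
e \;=\; \prod_{k=1}^{K} e_k, \qquad
\mathrm{AI} \;=\; \int D(\omega)\, d\omega, \qquad
e \;=\; e_{\min}^{\,\mathrm{AI}},
\]

where e_min denotes the residual error at AI = 1 (the full band presented in quiet). The multiplicative form is what makes D(ω) additive across bands, so that the area under D predicts the total error.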