5aSC36. The role of temporal and spectral cues in speech recognition.

Session: Friday Morning, December 6


Author: Philipos C. Loizou
Location: Dept. of Appl. Sci., Univ. of Arkansas, Little Rock, AR 72204-1099
Author: Michael F. Dorman
Location: Arizona State Univ., Tempe, AZ 85287-0102


It has been proposed [Shannon et al., Science 270, 303-304 (1995)] that the high levels of speech understanding achieved with a small number (three to four) of channels reflect the information content of "primarily temporal cues" for speech recognition. Here, this outcome is interpreted differently: it suggests that only a small number of channels is needed to allow reasonable resolution in the frequency domain. This interpretation is derived from a set of experiments with two conditions. In one, the speech materials were processed normally through a four-channel processor. In the other, the channel assignments were reversed; i.e., the envelope in channel 1 was directed to channel 4, and so on. These stimuli had the same temporal information as the original stimuli but abnormal information in the frequency domain. Identification of these stimuli was significantly worse than identification of the normally processed stimuli. These results suggest that, for a four-channel processor, speech recognition is determined principally by information in the frequency domain, and that temporal information plays only a supplemental role. Supporting evidence for this position comes from additional experiments in which speech was processed through a processor that outputs sine waves rather than bands of noise.
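The channel-reversal manipulation can be illustrated with a minimal Python sketch. It assumes the per-channel envelopes have already been extracted (envelope extraction is not shown), and it uses sine-wave carriers as in the supporting experiments rather than noise bands. The function names `assign_channels` and `synthesize_sine` and the carrier frequencies in the usage example are hypothetical, chosen for illustration only; they are not the processor used in the study.

```python
import math
from typing import List, Sequence


def assign_channels(envelopes: Sequence[Sequence[float]],
                    reverse: bool = False) -> List[Sequence[float]]:
    """Route analysis-channel envelopes to synthesis channels.

    Normal: the envelope of analysis channel k drives output channel k.
    Reversed: the envelope of channel k drives output channel N + 1 - k
    (1-based), so in a four-channel processor channel 1 -> channel 4,
    channel 2 -> channel 3, and so on.
    """
    envs = list(envelopes)
    return envs[::-1] if reverse else envs


def synthesize_sine(envelopes: Sequence[Sequence[float]],
                    carrier_hz: Sequence[float],
                    fs: float,
                    reverse: bool = False) -> List[float]:
    """Sum of amplitude-modulated sinusoids, one per output channel.

    Hypothetical sketch: each routed envelope multiplies a sine carrier
    at the output channel's frequency; the products are summed.
    """
    routed = assign_channels(envelopes, reverse)
    if len(routed) != len(carrier_hz):
        raise ValueError("need one carrier frequency per channel")
    n = len(routed[0])
    out = [0.0] * n
    for env, f in zip(routed, carrier_hz):
        for i in range(n):
            out[i] += env[i] * math.sin(2 * math.pi * f * i / fs)
    return out


if __name__ == "__main__":
    # Hypothetical constant envelopes and carrier frequencies (Hz).
    envs = [[1.0] * 8, [2.0] * 8, [3.0] * 8, [4.0] * 8]
    carriers = [500.0, 1000.0, 2000.0, 4000.0]
    normal = synthesize_sine(envs, carriers, fs=16000)
    reversed_ = synthesize_sine(envs, carriers, fs=16000, reverse=True)
    print(normal[:3])
    print(reversed_[:3])
```

Both conditions use exactly the same set of envelopes, so the temporal information is unchanged; only the frequency region in which each envelope is presented differs. This is the contrast the two experimental conditions are designed to isolate.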

ASA 132nd meeting - Hawaii, December 1996