Dept. of Psychol., Univ. of Hong Kong, Hong Kong
The amplitude envelopes of rectified, bandpass-filtered speech have been found to provide useful cues for speech perception [K. W. Grant, L. D. Braida, and R. J. Renn, J. Acoust. Soc. Am. 95, 1065–1073 (1994)]. An analog terminal was built to yield 25 such envelopes from filters with center (carrier) frequencies from 150 to 4850 Hz. Each envelope was then subjected to another round of bandpass filtering and rectification to yield a modulation spectrum of up to nine channels with center (modulation) frequencies from half the carrier frequency to 700 Hz. The spectra were examined for cues for the identification of voicing, fundamental frequency, and consonants. Voicing was generally characterized by the concentration of formant energy at a single carrier and modulation frequency, corresponding to the formant and fundamental frequencies, respectively. The second formant of the front vowel /i/ and nasal release sometimes exhibited bimodal modulation spectra, suggesting multiple sources of modulation. Stop consonants and fricatives were characterized by elements scattered at high carrier and modulation frequencies whose occurrences might not coincide. Some consonants could be identified with elements at specific modulation frequencies: e.g., /g/ and /j/ suggested a 700-Hz source modulating carriers whose frequencies depended on the following vowel.
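The two-stage analysis described above (bandpass filter, rectify, bandpass filter again, rectify again) can be illustrated with a minimal digital sketch. This is not the analog terminal used in the study: the second-order filter design, the Q values, the 16-kHz sampling rate, the three modulation channels, and all function names are assumptions chosen only for illustration. The test signal is a synthetic 1000-Hz carrier amplitude-modulated at 150 Hz, standing in for a formant excited at a 150-Hz fundamental.

```python
import math

def bandpass(signal, center_hz, q, fs):
    """Second-order (biquad) bandpass filter with unity gain at center_hz.
    Assumed filter shape; the original analog filters are not specified."""
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rectify(signal):
    """Full-wave rectification."""
    return [abs(v) for v in signal]

def modulation_spectrum(signal, carrier_hz, modulation_hzs, fs,
                        carrier_q=2.0, modulation_q=4.0):
    """Mean rectified output of each modulation channel for one carrier band."""
    # Stage 1: carrier-band filtering and rectification.
    envelope = rectify(bandpass(signal, carrier_hz, carrier_q, fs))
    spectrum = {}
    for mod_hz in modulation_hzs:
        # Stage 2: filter the rectified band at a modulation frequency, rectify again.
        channel = rectify(bandpass(envelope, mod_hz, modulation_q, fs))
        spectrum[mod_hz] = sum(channel) / len(channel)
    return spectrum

def am_tone(carrier_hz, modulation_hz, fs, duration_s=0.5, depth=0.8):
    """Synthetic amplitude-modulated tone (hypothetical test signal)."""
    n = int(fs * duration_s)
    return [(1.0 + depth * math.cos(2.0 * math.pi * modulation_hz * i / fs))
            * math.sin(2.0 * math.pi * carrier_hz * i / fs) for i in range(n)]

FS = 16000  # assumed sampling rate in Hz
tone = am_tone(1000.0, 150.0, FS)
spectrum = modulation_spectrum(tone, 1000.0, [150.0, 300.0, 600.0], FS)
strongest = max(spectrum, key=spectrum.get)
print(strongest, spectrum)
```

In this sketch the energy concentrates in the modulation channel nearest the 150-Hz modulator, which mirrors the pattern the abstract reports for voicing: a single carrier and modulation frequency corresponding to the formant and the fundamental.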