Date:    Mon, 21 Sep 1992 11:38:00 EST

Dear Al:

I am not so sure Al Liberman's "pre-emptiveness" hypothesis should be taken seriously in the first place. If the speech module takes all the auditory information it needs, how come I hear ANYTHING when I listen to speech? You might answer: What I "hear" are the phonemes, not the sounds they are made of. This may be so in the case of stop consonants, whose transitory auditory correlates are difficult to hear (and describe) as sound objects, but it is patently false in the case of fricatives and other continuants as well as vowels, whose auditory correlates (noises or resonances of certain timbres or brightnesses) can be focused on relatively easily, without losing the speech percept. Even in the case of the stop consonants, it may be argued that we do hear the auditory substrate but simply cannot describe what we hear.

I would be more interested in the related question of whether speech sound formation provides an auditory grouping principle that competes with principles of primitive stream segregation. For example, if the three components of a three-tone sine-wave speech analog were accompanied by a fourth tone that moves in parallel with the second tone but an octave higher, would listeners be able to perceive the speech? And if they perceived the speech, would they hear it as being accompanied by a high whistle? Alternatively, the second and fourth tones may group to form a single complex tone, which may either impair speech perception or perhaps give the "voice" a different quality.

Another reservation about your proposed experiment: The information in sine-wave speech that affords speech perception is fairly abstract to begin with. Listeners hear the phonetic message, but they also hear the strange whistling quality of the speech. So what, if anything, has been "removed" by the speech processor? The processor does not expect pure tones; it expects formants. Therefore, any experiment using pure tones may be dismissed as irrelevant to the issue. Moreover, to the extent that the relevant information is the RELATIONSHIP between several frequency-modulated resonances or tones, pre-emption of that relationship would leave the components of the auditory signal unaltered.

Best regards,
Bruno Repp
Haskins Laboratories

This message came from the mail archive
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University