(Yoshitaka Nakajima)

From:    "(Yoshitaka Nakajima)"  <nakajima(at)KYUSHU-ID.AC.JP>
Date:    Tue, 22 Sep 1992 13:41:09 JST

Dear Al,

Thanks for your quick response. My demonstration may not be an appropriate one for your purpose, but I just thought it showed something related to your stuff. Your explanation (derived from Whalen & Liberman's idea) seems basically possible. But I would like to return the following questions to you, for all of us to think over:

1. Why did my listeners hear several human voices instead of several pure tones or non-speech sounds? It is strange that the components left over for non-speech percepts produce the perceptual impression of human voices uttering the same vowel.

2. Why doesn't the same phenomenon take place when we listen to natural speech? If we need just a small amount of energy for phoneme perception and the rest is used to perceive other voices or sounds, we might hear more than one voice when listening to someone's speech.

My tentative explanation of my demonstration presumes that the perception of the vowel and the perception of the inharmonic components are performed in parallel.

1) The vowel /a/, /i/ or /u/ is recognized from the fixed spectral envelope.
2) We perceive several voices (or tones) because we cannot fuse the inharmonic components into a single voice.
3) The vowel recognized in the first stage is allocated to the voices perceived in the second stage.

Note that there is no reason to suppose that the first stage, which is schematic, should be finished before the more primitive second stage.

Taking into account the questions I put above, I think my present explanation is the most plausible one for the time being. But I'm open to any criticism or "debugs".

Yoshitaka Nakajima
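For readers who want to hear something in the spirit of the demonstration, the kind of stimulus discussed above (inharmonic components under a fixed, vowel-like spectral envelope) might be sketched roughly as follows. This is only an illustrative guess, not the actual stimulus: the formant centres, the Gaussian envelope shape, the "stretched" partial spacing, and every function and parameter name are assumptions that do not come from the message.

```python
import math

# Hypothetical, rough formant centres (Hz) and relative peak gains.
# Illustrative values only; they are not taken from the demonstration.
FORMANTS = {
    "a": [(700.0, 1.0), (1200.0, 0.7), (2600.0, 0.4)],
    "i": [(300.0, 1.0), (2300.0, 0.6), (3000.0, 0.4)],
    "u": [(300.0, 1.0), (900.0, 0.6), (2400.0, 0.3)],
}


def envelope_gain(freq_hz, vowel, bandwidth_hz=150.0):
    """Gain of a fixed spectral envelope, modelled (as an assumption)
    by a sum of Gaussian bumps centred on the formants."""
    return sum(
        peak * math.exp(-0.5 * ((freq_hz - centre) / bandwidth_hz) ** 2)
        for centre, peak in FORMANTS[vowel]
    )


def inharmonic_partials(base_hz=110.0, count=30, stretch=1.08):
    """Partial frequencies base_hz * k**stretch.
    stretch=1.0 would give an ordinary harmonic series."""
    return [base_hz * (k ** stretch) for k in range(1, count + 1)]


def synthesize(vowel, duration_s=0.5, sample_rate=16000, **partial_kwargs):
    """Sum sinusoids at the inharmonic partials, each weighted by the
    vowel envelope, and normalise the result to a peak of 1.0."""
    nyquist = sample_rate / 2
    partials = [f for f in inharmonic_partials(**partial_kwargs) if f < nyquist]
    gains = [envelope_gain(f, vowel) for f in partials]
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        samples.append(
            sum(g * math.sin(2 * math.pi * f * t) for f, g in zip(partials, gains))
        )
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]
```

Calling synthesize("a") returns a list of floats in the range -1.0 to 1.0 that could be written out with the standard wave module. Setting stretch=1.0 gives a harmonic version with the same envelope for comparison, which is one way to check whether the "several voices" impression depends on the inharmonicity.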

This message came from the mail archive
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University