Cognitive and Neural Systems Dept., Boston Univ., 111 Cummington St., Rm. 244, Boston, MA 02215
Krishna K. Govindarajan
Res. Lab of Electron., MIT
Lonce L. Wyse
Inst. for Systems Sci., Natl. Univ. of Singapore
Michael A. Cohen
Cognitive and Neural Systems Dept., Boston Univ.
In environments with multiple sound sources, the auditory system can parse the jumbled impinging signal into distinct mental objects, or streams. A neural model of auditory scene analysis is presented that groups different frequency components based on pitch and spatial location cues, and selectively allocates the components to different streams. Grouping is accomplished through a resonance that develops between a given object's pitch, its harmonic spectral components, and (to a lesser extent) its spatial location. Spectral components that are not reinforced by matching the top-down prototype read out by the selected object's pitch representation are suppressed, thereby allowing another stream to capture them. The model simulates data from psychophysical grouping experiments, such as how an upward-sweeping tone creates a bounce percept by grouping with a downward-sweeping tone due to frequency proximity, even when noise replaces the tones at their intersection point. The model also simulates illusory auditory percepts, such as the auditory continuity illusion and the Deutsch scale illusion, in which downward and upward scales presented alternately to the two ears regroup based on frequency proximity. Stream resonances provide the coherence whereby a single voice or instrument can be tracked through a multiple-source environment.
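The pitch-based grouping step described above can be illustrated with a minimal sketch. This is not the model's neural dynamics or equations, only an assumed toy approximation: a pitch hypothesis reads out a harmonic template, components close enough to a harmonic of that pitch are captured by the stream, and the unmatched residual remains available for another stream. The function name, the frequency values, and the `tolerance` parameter are all hypothetical choices for illustration.

```python
import numpy as np

def group_by_pitch(component_freqs, pitch_hz, tolerance=0.03):
    """Toy sketch of top-down harmonic matching (not the paper's model).

    Components within a relative `tolerance` of some harmonic of
    `pitch_hz` are captured by the stream; the rest are left as a
    residual that another stream could capture.
    """
    freqs = np.asarray(component_freqs, dtype=float)
    # Nearest harmonic number of the pitch for each component (at least 1).
    n = np.maximum(np.rint(freqs / pitch_hz), 1.0)
    # Relative mismatch between each component and its ideal harmonic.
    mismatch = np.abs(freqs - n * pitch_hz) / (n * pitch_hz)
    matched = mismatch <= tolerance
    return freqs[matched], freqs[~matched]

# A 100 Hz pitch hypothesis captures near-harmonic components;
# the 523 Hz component is too far from any harmonic and is released.
stream, residual = group_by_pitch([100, 200, 305, 400, 523], 100.0)
```

In this toy version, suppression of unmatched components is simply their exclusion from the returned stream; in the model itself, suppression emerges from the resonance dynamics rather than a hard threshold.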