Daniel P. W. Ellis
MIT Media Lab, Perceptual Computing Group, E15-368C, Cambridge, MA 02139
Many of the rules employed by the human auditory system to fuse and segregate acoustic energy into separately perceived sources are described by Bregman [A. S. Bregman, Auditory Scene Analysis (MIT, Cambridge, MA, 1990)]. However, the precise application of these rules to all but the simplest experimental stimuli is less well understood. This idealized model of low-level auditory processing [Ellis and Vercoe, J. Acoust. Soc. Am. 91, 2334 (A) (1992); Ellis, J. Acoust. Soc. Am. 92, 2376 (A) (1992)] is specifically designed to facilitate automatic grouping for real stimuli. This is achieved by representing sounds as sets of sinusoid tracks, the discrete time-frequency elements required by such rules. The system for grouping tracks is discussed, and results are presented from applying simple rules for harmonicity, common fate, and proximity to some real sounds. The means by which ambiguity between different types of rule is resolved, along with other issues arising from the model, will also be described.
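To illustrate the kind of grouping the abstract describes, the following is a minimal sketch (not from the paper) of a harmonicity rule applied to sinusoid tracks: each track, summarized here by its frequency, is assigned to the candidate fundamental whose harmonic series it best matches. The track frequencies, candidate fundamentals, and mistuning tolerance are hypothetical values chosen for the example.

```python
# Illustrative sketch of a harmonicity grouping rule (not the paper's
# actual implementation): assign each sinusoid track to the candidate
# fundamental whose harmonic series best fits its frequency.

def group_by_harmonicity(track_freqs, candidate_f0s, tol=0.03):
    """Group track frequencies (Hz) under candidate fundamentals (Hz).

    A track joins the fundamental for which its relative mistuning from
    the nearest harmonic is smallest, provided it is within `tol`.
    Unmatched tracks are left ungrouped.
    """
    groups = {f0: [] for f0 in candidate_f0s}
    for f in track_freqs:
        best_f0, best_err = None, tol
        for f0 in candidate_f0s:
            n = max(1, round(f / f0))          # nearest harmonic number
            err = abs(f - n * f0) / (n * f0)   # relative mistuning
            if err < best_err:
                best_f0, best_err = f0, err
        if best_f0 is not None:
            groups[best_f0].append(f)
    return groups

# Two interleaved harmonic series, one on 100 Hz and one on 130 Hz;
# 401 Hz is a slightly mistuned fourth harmonic of 100 Hz.
tracks = [100.0, 130.0, 200.0, 260.0, 300.0, 390.0, 401.0]
print(group_by_harmonicity(tracks, [100.0, 130.0]))
```

A fuller model along the lines the abstract sketches would combine this with common-fate and proximity evidence and arbitrate among the rules when they disagree.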