Daniel P. W. Ellis
MIT Media Lab Perceptual Comput. Group, E15-368C, Cambridge, MA 02139
When extracting and identifying vowels from noisy backgrounds, the human auditory system must identify which frequency regions "belong" to the vowel (as formants), and which are extraneous. This is probably based, at least in part, on the common pitch-period amplitude modulation of all these regions. Previous computer models of this process have used autocorrelation to identify this cue within each channel of a bandpass array [R. Meddis and M. J. Hewitt, J. Acoust. Soc. Am. 91, 233--245 (1992)]. Autocorrelation, however, is less effective for signals exhibiting rapid frequency modulation or jitter, even though these are strong cues to fusion. A model is proposed that detects comodulation synchrony between frequency bands through a simplified nonlinear approximation to the cross-correlation function. By grouping bands that show a steady modulation skew, pitch-period fluctuation can be used to reinforce grouping rather than weaken it. The physiological plausibility of such a process is also argued.
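The idea of grouping bands by shared pitch-period envelope modulation can be illustrated with a minimal Python sketch. This is not the paper's model: the filterbank stage is replaced by synthetic bands, the envelope extractor is a crude half-wave rectifier with a moving-average smoother, and the "simplified nonlinear approximation to the cross-correlation function" is reduced to a single zero-lag normalized correlation of rectified envelopes. All function names and parameters are illustrative.

```python
import numpy as np

def envelope(x):
    """Half-wave rectify and smooth: a crude stand-in for the
    hair-cell/envelope stage of one auditory filterbank channel."""
    rect = np.maximum(x, 0.0)
    kernel = np.ones(32) / 32.0          # ~4 ms moving average at 8 kHz
    return np.convolve(rect, kernel, mode="same")

def comodulation(x, y):
    """Zero-lag normalized correlation of two band envelopes --
    a single-lag, rectified stand-in for the full cross-correlation."""
    ex, ey = envelope(x), envelope(y)
    ex = ex - ex.mean()
    ey = ey - ey.mean()
    return float(ex @ ey / (np.linalg.norm(ex) * np.linalg.norm(ey) + 1e-12))

fs = 8000
t = np.arange(fs) / fs
# Common 120-Hz pitch-period gating shared by the "vowel" bands:
pitch_env = 0.5 * (1.0 + np.sign(np.sin(2 * np.pi * 120 * t)))

band_a = pitch_env * np.sin(2 * np.pi * 700 * t)    # formant-like, comodulated
band_b = pitch_env * np.sin(2 * np.pi * 1100 * t)   # second comodulated band
band_c = np.sin(2 * np.pi * 2500 * t)               # extraneous, unmodulated

print(comodulation(band_a, band_b))   # high: shared pitch-period modulation
print(comodulation(band_a, band_c))   # near zero: no common modulation
```

Bands whose envelopes score highly against one another would be fused into a single vowel percept; a band with no shared modulation, like `band_c` here, would be left out of the group.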