Max S. Cynader
Dept. of Ophthalmol., Univ. of British Columbia, 2550 Willow St., Vancouver, BC V5Z 3N9, Canada
It is well known that an array of N sensors can discriminate between at most N-1 continuous sources. However, if the time series produced by the sources contain transients, this limit applies only to samples that are shorter than the duration of the transients. This principle has been applied to the problem of localizing two speakers by using only two microphones. Human speech contains a large number of more or less sharp transients. The time of arrival of each transient at each microphone is estimated by finding the local maxima in the envelope of the time series. The time series at each microphone is then replaced by a sparse series that takes the value of the envelope at the locations of the local maxima and is zero everywhere else. The cross correlation of the two resulting time series shows peaks at the time delays corresponding to the correct locations of the speakers, whereas the cross correlation of the original time series has a single peak at the wrong time delay. It is shown that the difference is due to the nonlinearity of the operations performed on the time series.
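The processing chain described above (envelope, local maxima, sparse series, cross correlation) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the envelope is computed, so a rectified moving average stands in for it here; the function names (`envelope`, `sparsify`, `cross_correlation`, `make_mic`) are hypothetical; and the input is a synthetic pair of click trains rather than recorded speech.

```python
def envelope(x, win=3):
    """Crude envelope (assumption): moving average of |x| over a centered,
    zero-padded window. The abstract does not specify the envelope method."""
    half = win // 2
    n = len(x)
    return [
        sum(abs(x[j]) for j in range(max(0, i - half), min(n, i + half + 1))) / win
        for i in range(n)
    ]


def sparsify(x, win=3):
    """Keep the envelope value at each local maximum of the envelope,
    and set every other sample to zero."""
    env = envelope(x, win)
    out = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        if env[i] > env[i - 1] and env[i] >= env[i + 1]:
            out[i] = env[i]
    return out


def cross_correlation(a, b, max_lag):
    """Return {lag: sum_n a[n] * b[n + lag]} for |lag| <= max_lag.
    Assumes a and b have the same length."""
    n = len(a)
    return {
        lag: sum(a[i] * b[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        for lag in range(-max_lag, max_lag + 1)
    }


def make_mic(n, arrivals, pulse=(1.0, 3.0, 1.0)):
    """Synthetic microphone signal: one short pulse centered on each arrival time."""
    x = [0.0] * n
    for t in arrivals:
        for k, v in enumerate(pulse):
            x[t - 1 + k] += v
    return x


# Hypothetical scene: source 1 reaches microphone B 3 samples after microphone A,
# source 2 reaches microphone B 4 samples before microphone A.
mic_a = make_mic(110, [10, 40, 75] + [25, 60, 95])
mic_b = make_mic(110, [13, 43, 78] + [21, 56, 91])

cc = cross_correlation(sparsify(mic_a), sparsify(mic_b), max_lag=8)
top_two = sorted(sorted(cc, key=cc.get, reverse=True)[:2])  # two strongest lags
```

In this toy setting the two strongest lags in `top_two` are -4 and 3, one per source. The example only illustrates the bookkeeping of the sparse cross correlation; it makes no attempt to reproduce the single misplaced peak that the abstract reports for the unprocessed speech signals.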