Philip N. Denbigh and Hongyue Luo
School of Eng., Univ. of Sussex, Falmer, Brighton BN1 9QT, UK
An algorithm has been developed that segregates overlapping voices by exploiting harmonicity. It is primarily a monaural method that relies on tracking pitch during the voiced parts of speech, and it does so successfully despite the spectral modifications imposed by room reverberation. The incoming waveform is divided into frames that undergo spectral analysis by means of the FFT. Pitch determination is based on the subharmonic summation method, with two tracking constraints imposed to improve performance. First, the continuity of pitch is exploited by limiting the search for harmonics to the regions around the harmonics of the previous pitch measurement. Second, the amplitude of each harmonic is not allowed to change significantly between frames. Together, these constraints form a "two-dimensional harmonic sieve" that segregates voiced sounds. A remaining problem is that of associating a separated voice with the correct speaker when tracking is lost, for example, after a gap in voicing or after the pitch frequencies cross. For this, a binaural cue is invoked: the average difference in arrival time, between two microphones, of the relevant harmonics of one voice. This cue also proves to be robust to reverberation.
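The pitch-determination step can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a simplified, linear-frequency form of subharmonic summation on a single FFT frame, and it models the pitch-continuity constraint by restricting the candidate range to a band around the previous estimate. The function names, the decay factor of 0.84, the 10% tolerance, and the 60 to 400 Hz search range are all assumptions chosen for illustration. The amplitude-continuity constraint and the binaural cue are not shown.

```python
import numpy as np

def shs_scores(frame, fs, f0_candidates, n_harmonics=8, decay=0.84):
    """Score each candidate f0 by a decaying sum of spectral magnitudes
    sampled at its harmonics (simplified subharmonic summation)."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    scores = np.zeros(len(f0_candidates))
    for k in range(1, n_harmonics + 1):
        harmonic_freqs = k * f0_candidates
        # Linearly interpolate the magnitude spectrum at each harmonic.
        amp = np.interp(harmonic_freqs, freqs, spectrum)
        # Ignore harmonics above the Nyquist frequency.
        in_band = harmonic_freqs < fs / 2
        scores += np.where(in_band, decay ** (k - 1) * amp, 0.0)
    return scores

def track_pitch(frame, fs, prev_f0=None, tolerance=0.1,
                f0_min=60.0, f0_max=400.0, step=1.0):
    """Estimate f0 for one frame. If a previous estimate is given,
    search only within +/- tolerance of it (assumed continuity rule)."""
    if prev_f0 is None:
        lo, hi = f0_min, f0_max
    else:
        lo = max(f0_min, prev_f0 * (1.0 - tolerance))
        hi = min(f0_max, prev_f0 * (1.0 + tolerance))
    candidates = np.arange(lo, hi + step, step)
    scores = shs_scores(frame, fs, candidates)
    return float(candidates[np.argmax(scores)])

# Usage on a synthetic voiced frame: five harmonics of 150 Hz.
fs = 8000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 6))
f0_first = track_pitch(frame, fs)                   # unconstrained search
f0_next = track_pitch(frame, fs, prev_f0=f0_first)  # constrained search
```

Narrowing the candidate range in this way makes it less likely that the estimate jumps to a competing voice whose pitch lies outside the band, which is the motivation the abstract gives for the continuity constraint.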