Xavier Rodet
IRCAM
Univ. Paris-6, Paris, France
Boris Doval
Univ. Paris-6, Paris, France
This presentation deals with the estimation of the fundamental frequency (f0) of pseudoperiodic sound signals, with important results for polyphonic frequency tracking and voice separation. Given a set of candidate partials in the signal, the estimation of f0 is taken in the sense of finding the optimal period duration(s) according to a criterion of maximum-likelihood harmonic matching. Excellent results have been obtained on large databases of speech (40 min) and music [B. Doval and X. Rodet, Proc. IEEE-ICASSP, Toronto, May (1991)]. The algorithm has been implemented at IRCAM to run in real time for frequency tracking in live performance. Developments are under way in several directions. A combined estimation of f0 and of a spectral envelope improves both estimates. Most important is the estimation, on a learning set, of the a priori distributions of the different random variables. Finally, a hidden Markov model tracks f0 trajectories between adjacent frames. The first experiments in polyphonic frequency tracking and voice separation are very promising. The model can be transposed directly to the maximum-likelihood estimation of several harmonic sounds, since it already considers more than one f0 value.

a) Presently on sabbatical at Ctr. for New Music and Audio Technol. (CNMAT), Univ. of California at Berkeley, 1750 Arch St., Berkeley, CA 94709.
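As a rough illustration of harmonic matching on a set of candidate partials, the following minimal sketch scores each candidate f0 by how well the partials fall on its harmonics. It is a simplified stand-in, not the authors' probabilistic model: the Gaussian deviation model, the function names, and the parameters sigma_hz, log_p_spurious, and log_p_missing are all assumptions chosen for illustration, and amplitudes, the spectral envelope, learned priors, and the hidden Markov tracking are left out.

```python
import math

def harmonic_match_score(f0, partials, sigma_hz=3.0,
                         log_p_spurious=-8.0, log_p_missing=-4.0):
    """Log-likelihood-style score of a candidate f0 (Hz) for a list of
    partial frequencies (Hz). All parameter values are illustrative."""
    log_norm = -math.log(sigma_hz * math.sqrt(2.0 * math.pi))
    score = 0.0
    matched = set()
    for f in partials:
        h = max(1, round(f / f0))            # nearest harmonic number
        dev = f - h * f0                     # deviation from that harmonic
        log_gauss = -0.5 * (dev / sigma_hz) ** 2 + log_norm
        if log_gauss > log_p_spurious:
            score += log_gauss               # partial explained by harmonic h
            matched.add(h)
        else:
            score += log_p_spurious          # partial treated as spurious
    # Penalize harmonics that f0 predicts but that have no matching partial;
    # this is what discourages subharmonic (f0/2, f0/3, ...) candidates.
    n_harmonics = max(1, round(max(partials) / f0))
    missing = sum(1 for h in range(1, n_harmonics + 1) if h not in matched)
    return score + missing * log_p_missing

def estimate_f0(partials, candidates):
    """Return the candidate f0 with the highest harmonic-matching score."""
    return max(candidates, key=lambda f0: harmonic_match_score(f0, partials))

# Hypothetical partials of a slightly inharmonic tone near 220 Hz.
partials = [220.4, 439.1, 661.0, 879.5]
best = estimate_f0(partials, range(50, 501))  # 1 Hz candidate grid
```

The missing-harmonic penalty is the design choice that matters most here: without it, a candidate at half the true f0 explains every partial equally well and cannot be distinguished from the true value.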