Re: Robust method of fundamental frequency estimation.
> That may be true, but there are other good time-domain
> correlation-based pitch models that can NOT be expressed in terms of the
> spectrum. For example, the Meddis & Hewitt or Meddis & O'Mard models, or
> the Slaney & Lyon models, derived from Licklider's duplex theory, which
> do the ACF after what the cochlea model does, which is a separation into
> filter channels and a half-wave rectification.
I do not agree. If you know the frequency response of the cochlea, you can
predict the spectrum of its output from the spectrum of its input. The
effects of half-wave-rectification and compression are more difficult to
analyze, but not impossible. I remember reading a little bit about it in
Anssi Klapuri's PhD thesis.
> Did you consider any such models?
I have used these models in the past, but I stopped using them. If I am
not mistaken, what Slaney & Lyon's model does is apply a summary
autocorrelation to the output of a gammatone filterbank (it does some
extra steps, but that is the main idea). Since this can be shown to be
equivalent to applying autocorrelation to the original signal (use the
Wiener-Khinchin theorem and the linearity property of the Fourier
transform), I have not used it any more.
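
The core of that equivalence argument can be checked numerically. A
minimal numpy sketch, using the circular (DFT-based) form of the
Wiener-Khinchin theorem: the autocorrelation of a signal is the inverse
Fourier transform of its power spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
N = len(x)

# Direct circular autocorrelation: r[k] = sum_n x[n] * x[(n+k) mod N]
r_direct = np.array([np.dot(x, np.roll(x, -k)) for k in range(N)])

# Wiener-Khinchin: the ACF is the inverse DFT of the power spectrum
power = np.abs(np.fft.fft(x)) ** 2
r_wk = np.fft.ifft(power).real

assert np.allclose(r_direct, r_wk)
```

Note that this is the circular, finite-length form; the caveats about
short-term versus infinite-time transforms discussed below still apply.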
For those unfamiliar, the Wiener-Khinchin
theorem relates the power spectrum of a signal to
its autocorrelation function: one is the Fourier
transform of the other. The theorem is often
invoked to claim 'equivalence' between spectral
and temporal (or at least autocorrelation-based)
models.
Note that the theorem gives a relation between
waveforms defined over infinite time, and spectra
defined over infinite frequency. The
'equivalence' does not apply strictly to
short-term transforms applicable to practical
signals (band-limited and considered over a short
time interval), although the theorem is useful for
giving insight into asymptotic behavior.
I believe the theorem can be extended to apply
rigorously to 'short-term ACF' and 'short-term
power spectra' (Jont Allen might have more to
say). However the 'running ACF' used in the
Licklider/Lyon/Slaney/Meddis&Hewitt model differs
from the 'short-term ACF', so again the
equivalence doesn't apply strictly.
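
To make the distinction concrete, here is a toy sketch using simplified
definitions of my own (not Lyon's exact formulation): the 'short-term
ACF' windows a frame and then correlates the windowed frame, while the
'running ACF' forms the lag products first and then smooths them with a
single window. The two give different results for the same frame and
window.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(256)
w = np.hanning(128)           # analysis window
seg = x[64:192]               # one 128-sample frame
lags = np.arange(32)

# 'Short-term ACF': window the frame, then correlate the windowed frame;
# each lag product picks up two window values, w[n] * w[n+k]
xw = seg * w
r_short = np.array([np.dot(xw[:128 - k], xw[k:]) for k in lags])

# 'Running ACF' (Licklider-style, simplified): form the lag products
# x[n] * x[n-k] first, then smooth them; each product gets ONE window value
r_run = np.array([np.dot(w[k:], seg[k:] * seg[:128 - k]) for k in lags])

# The two are different operations, so the 'equivalence' argument built on
# the short-term ACF does not carry over directly to the running ACF
assert not np.allclose(r_short, r_run)
```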
Note also, when going from waveform to ACF via
the power spectrum, that the initial and final
Fourier transforms (both linear) are separated by
the power calculation (non-linear). Swapping ACF
and filterbank thus does not follow from
linearity. It might nevertheless be allowed by
orthogonality between basis functions of the
Fourier transform. However the property is again
lost if a rectifier or hair-cell model is
inserted at each filter output.
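
Both halves of that argument can be illustrated with an idealized
brick-wall filterbank (standing in for a gammatone filterbank, which
would behave similarly only approximately since its channels overlap).
With disjoint bands the per-band power spectra add up to the full power
spectrum, so the summed per-band ACFs equal the ACF of the original
signal; insert a half-wave rectifier in each channel and the equality is
lost.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)
N = len(x)

def acf(sig):
    # circular ACF via the power spectrum (Wiener-Khinchin)
    return np.fft.ifft(np.abs(np.fft.fft(sig)) ** 2).real

# Brick-wall "filterbank": partition the DFT bins into 4 disjoint bands
X = np.fft.fft(x)
edges = [0, N // 8, N // 4, 3 * N // 8, N // 2 + 1]
band_sigs = []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = np.zeros(N, dtype=bool)
    mask[lo:hi] = True
    mask[N - hi + 1:N - lo + 1] = True   # mirrored negative-frequency bins
    band_sigs.append(np.fft.ifft(np.where(mask, X, 0)).real)

# Linear case: band power spectra add up to the full power spectrum,
# so the summed per-band ACFs equal the ACF of the original signal
summary_linear = sum(acf(y) for y in band_sigs)
assert np.allclose(summary_linear, acf(x))

# Insert a half-wave rectifier (a crude hair-cell stage) in each channel:
# the summary ACF is no longer the ACF of the original signal
summary_hwr = sum(acf(np.maximum(y, 0.0)) for y in band_sigs)
assert not np.allclose(summary_hwr, acf(x))
```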
There are at least two reasons why filtering into
bands before the ACF might be useful.
One I think you mention in another email: by
weighting channels inversely to their amplitude
one can counter the expansive property of the
power (squared magnitude), that otherwise causes
the ACF to be dominated by high-amplitude parts
of the spectrum (e.g. formants of speech). The
'cepstrum' is another way to achieve a similar
effect.
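
A small numerical sketch of that first reason (the frequencies, band
split, and inverse-energy weighting below are illustrative choices of
mine): a strong high-frequency component dominates the plain ACF, so the
peak at its own period rivals the peak at the true fundamental period;
weighting each channel's ACF by the inverse of its energy (its lag-0
value) restores the prominence of the F0 peak.

```python
import numpy as np

fs, N = 8000, 800
t = np.arange(N) / fs
# Strong "formant-like" component at 400 Hz, weak component at 100 Hz;
# both are harmonics of a 100 Hz fundamental (period = 80 samples)
x = 10.0 * np.sin(2 * np.pi * 400 * t) + np.sin(2 * np.pi * 100 * t)

def acf(sig):
    # circular ACF via the power spectrum
    return np.fft.ifft(np.abs(np.fft.fft(sig)) ** 2).real

# Split into two brick-wall channels at 250 Hz
X = np.fft.fft(x)
cut = int(250 * N / fs)
low_mask = np.zeros(N, dtype=bool)
low_mask[:cut] = True
low_mask[-cut + 1:] = True
low = np.fft.ifft(np.where(low_mask, X, 0)).real
high = np.fft.ifft(np.where(low_mask, 0, X)).real

r_plain = acf(x)
# Weight each channel inversely to its energy (its lag-0 ACF value)
r_norm = acf(low) / acf(low)[0] + acf(high) / acf(high)[0]

# Plain ACF: the peak at the strong component's period (lag 20) is nearly
# as high as the one at the pitch period (lag 80); after per-channel
# normalization the F0 peak clearly stands out
assert r_plain[20] / r_plain[80] > 0.9
assert r_norm[20] / r_norm[80] < 0.7
```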
Another reason for using a filterbank is that,
again by appropriate weighting of channels, you
can discount spectral regions dominated by noise
(or by another voice). Wu, Wang and Brown have
recently made use of this property in the context
of multiple-speaker F0 estimation, but I think
the real pioneer was Dick Lyon in an ICASSP paper
in 1983. He applied the idea to binaural
localization and separation, and Mitch Weintraub
(his student?) applied it a little later to
speech separation on the basis of F0 cues.
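
The second reason can be illustrated the same way (the signal, the 1 kHz
split, and the use of outright discarding as the extreme of
down-weighting are my illustrative choices, not any of the cited
systems): when strong interference is confined to one channel, removing
that channel from the summary makes the periodicity peak at the F0 lag
far more prominent.

```python
import numpy as np

rng = np.random.default_rng(2)
fs, N = 8000, 4096
t = np.arange(N) / fs
f0 = 100.0                              # pitch period = 80 samples

# Periodic "voice": harmonics of 100 Hz, all below 1 kHz
voice = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 10))

# Strong interference confined to the region above 1 kHz
noise = rng.standard_normal(N) * 5.0
Nf = np.fft.fft(noise)
cut = int(1000 * N / fs)
Nf[:cut] = 0
Nf[-cut + 1:] = 0
x = voice + np.fft.ifft(Nf).real

def acf(sig):
    # circular ACF via the power spectrum
    return np.fft.ifft(np.abs(np.fft.fft(sig)) ** 2).real

# Two brick-wall channels split at 1 kHz; keep only the clean one
X = np.fft.fft(x)
low_mask = np.zeros(N, dtype=bool)
low_mask[:cut] = True
low_mask[-cut + 1:] = True
low = np.fft.ifft(np.where(low_mask, X, 0)).real

lag = int(fs / f0)                      # 80 samples
r_full, r_low = acf(x), acf(low)

# Discarding the noise-dominated channel sharpens the F0-lag peak
assert r_low[lag] / r_low[0] > r_full[lag] / r_full[0]
```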