Re: Robust method of fundamental frequency estimation.
I do not see why you say I am wrong. I think the argument you use to say
that is the one in which you apply the cosine transform three times.
However, that scenario does not correspond to the one I described. In the
scenario I described, autocorrelation is applied to TIME-domain signals
(i.e., the outputs of the filterbank), not to spectra.
Let me describe my reasoning again in more detail. To simplify the
explanation, let us assume we have infinite-length signals and infinitely
narrow filters. Applying the filterbank to the signal leaves us with a
decomposition of the signal into its sinusoidal components. Since there is
only one sinusoid per channel, the spectrum at each channel consists of a
single pulse (possibly of zero magnitude) at the central frequency of the
channel. Computing the autocorrelation at each channel corresponds to
squaring the magnitude of the spectrum of the signal (a single pulse) and
synthesizing a cosine at that frequency (by the Wiener–Khinchin theorem).
The summary autocorrelation just adds those cosines over channels.
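The single-pulse picture can be illustrated numerically. Below is a minimal
NumPy sketch of my own (the sample rate and frequency are arbitrary choices,
not from the original discussion) showing that the circular autocorrelation of
a pure sinusoid, computed via the Wiener–Khinchin theorem, is a cosine at the
same frequency:

```python
import numpy as np

fs = 1000                      # sample rate in Hz (arbitrary choice)
t = np.arange(0, 1, 1 / fs)    # one second of samples
f0 = 50                        # the channel's sinusoidal component, Hz
x = np.sin(2 * np.pi * f0 * t)

# Wiener-Khinchin: the circular ACF is the inverse FFT of the squared
# magnitude spectrum (a single pair of pulses for a pure sinusoid).
acf = np.fft.ifft(np.abs(np.fft.fft(x)) ** 2).real / len(x)

# The result is a cosine at f0, scaled by the sinusoid's power (1/2).
expected = 0.5 * np.cos(2 * np.pi * f0 * t)
assert np.allclose(acf, expected, atol=1e-9)
```

Summing such per-channel cosines over all channels then gives the summary
autocorrelation.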
Since the linearity of the cosine transform allows us to change the order
of synthesis and addition, we can first add the spectra, which leaves us
with the squared magnitude of the original spectrum, and then perform the
cosine transform; but this is just the autocorrelation of the original
signal (again by the Wiener–Khinchin theorem).
This argument extends easily to wider non-overlapping rectangular filters.
In the case of non-rectangular gammatone ERB filters things may change a
little, but I do not see how this change could help to improve the
estimation of pitch.
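The interchange of addition and synthesis can also be checked numerically for
the rectangular case. Below is a minimal NumPy sketch of my own (band edges
chosen arbitrarily), with ideal non-overlapping rectangular filters
implemented by zeroing FFT bins, showing that the summary autocorrelation
over the bands equals the autocorrelation of the original signal:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2048
x = rng.standard_normal(N)
X = np.fft.rfft(x)

edges = [0, 40, 120, 300, 700, len(X)]   # arbitrary band edges (FFT bins)
channels = []
for lo, hi in zip(edges[:-1], edges[1:]):
    Xc = np.zeros_like(X)
    Xc[lo:hi] = X[lo:hi]                  # ideal rectangular filter
    channels.append(np.fft.irfft(Xc, N))  # channel signal, time domain

def circ_acf(s):
    # Circular ACF via Wiener-Khinchin: inverse FFT of the power spectrum.
    return np.fft.irfft(np.abs(np.fft.rfft(s)) ** 2, len(s)) / len(s)

# Summary autocorrelation: per-channel ACFs summed over channels.
summary = sum(circ_acf(c) for c in channels)

# Because the bands are disjoint, the power spectra add up to the power
# spectrum of x, so the summary ACF equals the ACF of the original signal.
assert np.allclose(summary, circ_acf(x))
```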
> Arturo Camacho <acamacho@xxxxxxxxxxxx> wrote:
>>> autocorrelation-based pitch models that can NOT be expressed in terms
>>> of the spectrum. For example, the Meddis & Hewitt or Meddis & O'Mard
>>> models, or Slaney & Lyon models,
>>> derived from Licklider's duplex theory, which do the ACF after what
>>> the cochlea model does, which is a separation into filter channels and
>> If I am
>> not wrong, what Slaney & Lyon's model does is to apply a summary
>> autocorrelation to the output of a gammatone filterbank (it does some
>> extra steps, but the main idea is that one). Since this can be shown to
>> be equivalent to applying autocorrelation to the original signal (use
>> Wiener–Khinchin theorem and the linearity property of the Fourier transform),
> You are wrong in your guess that applying a summary autocorrelation to
> the output of a filterbank is equivalent to applying autocorrelation to
> the original signal. According to the theorem you mentioned but perhaps
> have not understood, autocorrelation corresponds to performing the cosine
> transform twice, i.e., back and forth: a first cosine transform of a
> signal f_0(t) in the time domain yields F_0(omega) in the frequency
> domain. A subsequent second cosine transform of F_0(omega) yields f_1(tau)
> in the time domain again. These two steps together correspond to the
> autocorrelation function (ACF) of the o r i g i n a l signal:
> f_0(t) --> f_1(tau). Remember: the ACF corresponds to two cosine
> transforms, a first one and an inverting second one.
> Bogert and Tukey called that inverted spec_trum a ceps_trum, inverting
> the order of letters in the syllable spec into ceps.
> This f_1(tau) is perhaps what comes close to a major part of auditory
> function, even if it is hard to abandon what we learned, that we are
> hearing frequencies, and to admit that autocorrelation lag is largely
> equivalent to ...
> The ACF of the spectrum F_0(omega) would correspond not to just two but
> to three cosine transforms in series, and would eventually result in a
> function F_1 of omega: f_0(t) --> F_0(omega) --> f_1(tau) --> F_1(omega).
> The brain cannot directly process functions of omega. In the cat, there
> are T-multipolar chopper neurons in the ventral cochlear nucleus (VCN);
> the T means they project immediately to the IC via the trapezoid body
> (TB). They might translate place code into downsampled frequencies while
> preserving tonotopy at the same time. At least they show very regular
> responses, with a highly reproducible pattern of spike trains in which the
> interspike intervals are all of about the same length. The frequencies of
> chopper neurons are on average about three times lower than the average
> firing frequencies within single auditory nerve fibers, which themselves
> already tend to be considerably lower than the corresponding
> characteristic frequency (CF), for CFs in excess of 500 Hz.
> Eckard Blumschein
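For what it is worth, the two-transform description quoted above can be
verified numerically. The sketch below (my own, not code from either
correspondent) makes the magnitude-squaring step between the two transforms
explicit, since without it the second transform would merely return f_0:

```python
import numpy as np

rng = np.random.default_rng(1)
f_0 = rng.standard_normal(256)

# f_0(t) --transform--> F_0(omega) --|.|^2, inverse--> f_1(tau).
# Zero-padding to twice the length yields a linear (not circular) ACF.
F_0 = np.fft.fft(f_0, 2 * len(f_0))
P_0 = np.abs(F_0) ** 2
f_1 = np.fft.ifft(P_0).real[:len(f_0)]

# Direct time-domain autocorrelation for comparison.
direct = np.array([np.dot(f_0[:len(f_0) - k], f_0[k:])
                   for k in range(len(f_0))])
assert np.allclose(f_1, direct)
```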
Computer and Information Science and Engineering
University of Florida
Web page: www.cise.ufl.edu/~acamacho