
Re: MFCC method


You are right that the normal cepstrum is partly motivated by the periodic ripples in the frequency spectrum that come from the pitch harmonics. There's not a lot of logic to using a Fourier or cosine transform on a warped frequency scale if you're looking for those pitch ripples. The real "logic", in retrospect at least, is the observation of Pols that the principal components capture most of the variance using a few smooth basis functions, smoothing away the pitch ripples; and the observations of others that the principal components of a collection of vowel spectra on a warped frequency scale aren't so far from the cosine basis functions.

Far be it from me to defend MFCC as a sound representation. But if what you care about is a smoothed short-time power spectrum without much pitch effect, it's not bad.
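
To make the smoothing argument concrete, here is a rough sketch, not anyone's actual MFCC code. It builds a synthetic log spectrum (a smooth envelope plus a fast "pitch" ripple, both invented for illustration), projects it onto the first few cosine basis functions, and reconstructs. The helper name dct_basis, the choice of 13 coefficients, and the envelope and ripple shapes are all assumptions; the frequency axis is just a unit interval standing in for whatever (possibly warped) scale you use.

```python
import numpy as np

def dct_basis(n_bins, n_coeffs):
    """First n_coeffs orthonormal DCT-II basis functions over n_bins points."""
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_bins)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins))
    basis *= np.sqrt(2.0 / n_bins)
    basis[0] /= np.sqrt(2.0)  # the constant term needs a different scale
    return basis

# Synthetic log spectrum on a normalized frequency axis (assumed shapes).
n_bins = 256
x = np.linspace(0.0, 1.0, n_bins)
envelope = -3.0 * x + 2.0 * np.exp(-((x - 0.3) / 0.1) ** 2)  # tilt + one broad peak
ripple = 0.5 * np.cos(2 * np.pi * 40 * x)                    # fast "pitch" ripple
log_spectrum = envelope + ripple

# Keep only the first few cosine coefficients, then reconstruct.
basis = dct_basis(n_bins, 13)
coeffs = basis @ log_spectrum
smoothed = basis.T @ coeffs

# RMS distance from the smooth envelope, before and after truncation.
err_raw = np.sqrt(np.mean((log_spectrum - envelope) ** 2))
err_smooth = np.sqrt(np.mean((smoothed - envelope) ** 2))
print(f"RMS error vs envelope: raw {err_raw:.3f}, truncated {err_smooth:.3f}")
```

The truncated reconstruction sits much closer to the envelope than the raw spectrum does, because the ripple lives in high-order coefficients that were thrown away. Nothing in this depends on the ripple being periodic on the chosen axis; it only needs to vary faster than the retained basis functions.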


At 8:18 PM -0800 1/9/09, Arturo Camacho wrote:
Actually, I do not find much logic behind taking the Fourier transform
(FT) of a log-amplitude spectrum transformed to a (quasi) logarithmic
scale, as done in MFCC. It is reasonable to take the FT of a
log-amplitude spectrum in the linear frequency scale (standard
cepstrum analysis) because this spectrum is often almost periodic (at
least for most naturally-occurring periodic signals). However, after a
(quasi-) logarithmic frequency scale transformation, I would rarely
expect the spectrum to be periodic (it will stretch as the frequency
increases), and therefore I do not find the logic behind trying to
represent it as a linear combination of sinusoids, as done implicitly
when taking a FT.
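
The stretching described above can be checked numerically. The following is a small sketch under assumptions: the 100 Hz fundamental is arbitrary, and the mel formula used (2595 * log10(1 + f / 700)) is one common convention among several, not necessarily the one any given MFCC implementation uses. The helper name hz_to_mel is made up for this example.

```python
import numpy as np

def hz_to_mel(f_hz):
    """One common mel-scale formula (an assumption; other variants exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

f0 = 100.0                            # arbitrary fundamental, in Hz
harmonics_hz = f0 * np.arange(1, 41)  # first 40 harmonics
harmonics_mel = hz_to_mel(harmonics_hz)

# Gap between neighbouring harmonics on each scale.
spacing_hz = np.diff(harmonics_hz)
spacing_mel = np.diff(harmonics_mel)

print("Hz spacing, first and last: ", spacing_hz[0], spacing_hz[-1])
print("mel spacing, first and last:", spacing_mel[0], spacing_mel[-1])
```

On the linear scale the gaps are all equal to f0, so the harmonic ripple is periodic. On the mel scale each gap is smaller than the one before it, so the ripple no longer has a single period.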