
Re: mfcc filters gain

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: mfcc filters gain
From: Jean-Julien Aucouturier <jj@xxxxxxxxxxx>
Date: Thu, 4 Nov 2004 11:07:50 +0100
Delivery-date: Thu Nov 4 05:28:11 2004
In-reply-to: <200411040457.iA44uiHr009686@mailscan4.cc.mcgill.ca>
Organization: SONY CSL
References: <200411040457.iA44uiHr009686@mailscan4.cc.mcgill.ca>
Reply-to: Jean-Julien Aucouturier <jj@xxxxxxxxxxx>
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040803

I am also wondering if some work has already been done to improve mfcc-like processing. As suggested in [1], Moore's ERB scale or
Bark scale seems to be more appropriate than the mel scale, and a gammatone filterbank should be much more accurate (even if probably more
computationally expensive) than a triangular filterbank?


One specific "variant" that I know of:

In [1], the authors propose a simple extension of the MFCC algorithm
to better account for music signals. Their observation is that the MFCC
computation averages (or sums, depending on whether you normalize or
not) the spectrum in each sub-band, and thus reflects the average
spectral characteristics. However, very different spectra can have the
same average spectral characteristics. Notably, they argue that it is
important to also keep track of the relative spectral distribution of
peaks (related to harmonic components) and valleys (related to noise).
Therefore, they extend the MFCC algorithm to not only compute the
average spectrum in each band (or rather the spectral peak), but also a
correlate of the variance, the Spectral Contrast (namely the amplitude
difference between the spectral peaks and valleys in each subband). This
modifies
the algorithm to output 2 coefficients (instead of one) for each Mel
subband. Additionally, in the algorithm published in [1], the authors
replace the Mel filterbank traditionally used in MFCC analysis with an
octave-scale filterbank (C_0 - C_1, C_1 - C_2, etc.), presumably more
suitable for music. They also decorrelate the spectral contrast
coefficients using the optimal Karhunen-Loeve transform.
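
As a rough sketch of the per-band computation described above (this is
not the authors' code from [1]): for each subband, sort the magnitude
spectrum, take the log of the mean of the strongest bins as the peak and
of the weakest bins as the valley, and output the peak together with the
peak-minus-valley contrast. The function name, the fraction `alpha`, the
Hann window, the band edges in Hz and the test signals are all
assumptions chosen for illustration, and the decorrelation step is left
out.

```python
import numpy as np

def spectral_contrast(frame, sr, band_edges_hz, alpha=0.2, eps=1e-10):
    """Per-band log peak and peak-valley contrast for one frame.

    alpha is the (assumed) fraction of bins in each band that is
    averaged to estimate the peak and the valley.
    """
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

    peaks, contrasts = [], []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        # Magnitudes of the bins in [lo, hi), sorted in ascending order.
        band = np.sort(mag[(freqs >= lo) & (freqs < hi)])
        if band.size == 0:
            raise ValueError(f"no FFT bins fall in band {lo}-{hi} Hz")
        k = max(1, int(round(alpha * band.size)))
        valley = np.log(np.mean(band[:k]) + eps)   # weakest k bins
        peak = np.log(np.mean(band[-k:]) + eps)    # strongest k bins
        peaks.append(peak)
        contrasts.append(peak - valley)
    # Two coefficients per band: the peak and the contrast.
    return np.array(peaks), np.array(contrasts)

# Illustrative signals: a 440 Hz sine versus white noise.
sr, n = 16000, 1024
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
noise = np.random.default_rng(0).standard_normal(n)

# Hypothetical octave-like band edges in Hz (not taken from [1]).
edges = [0, 200, 400, 800, 1600, 3200, 8000]

peaks_tone, contrast_tone = spectral_contrast(tone, sr, edges)
peaks_noise, contrast_noise = spectral_contrast(noise, sr, edges)

# The 400-800 Hz band (index 2) contains the 440 Hz partial, so the
# tone should show a much larger contrast there than the noise does.
print(contrast_tone[2], contrast_noise[2])
```

Averaging a fraction of the bins rather than taking the single maximum
and minimum is a design choice that makes the estimate less sensitive to
one outlying bin; setting `alpha` small enough that k = 1 recovers the
plain max/min version.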

This algorithm was successful at improving the classification rate for a
musical genre classification task, as reported in [1]. I have compared
several implementations of this variant (notably using the regular Mel
filterbank, or a DCT approximation to the K-L transform) to regular
MFCCs on a music similarity task, and found no noticeable improvement
in precision/recall (+/- 1%). I have to admit I'm a bit puzzled by the
idea of considering statistical variations "inside critical bands". From
what I understand, as critical bands integrate the energy in their
range, the authors' approach amounts to looking at finer details than
what the auditory system does.
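
To make the point about integration concrete, here is a toy example
with made-up numbers (not data from [1] or from my experiments): two
magnitude spectra inside a single band that sum to the same value, so a
band-integrating filter cannot tell them apart, while a peak/valley
contrast measure can.

```python
import numpy as np

# Two hypothetical 8-bin magnitude spectra within one band.
flat = np.full(8, 2.0)                                       # noise-like
peaky = np.array([0.5, 0.5, 0.5, 6.5, 6.5, 0.5, 0.5, 0.5])   # harmonic-like

# What a band-integrating (MFCC-style) filter sees: the summed magnitude.
sum_flat = flat.sum()
sum_peaky = peaky.sum()

# A simple contrast measure: log of strongest bin minus log of weakest bin.
contrast_flat = np.log(flat.max()) - np.log(flat.min())
contrast_peaky = np.log(peaky.max()) - np.log(peaky.min())

print(sum_flat, sum_peaky)            # identical band sums
print(contrast_flat, contrast_peaky)  # zero versus clearly positive
```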

Best,
JJ


[1] D.-N. Jiang, L. Lu, H.-J. Zhang, J.-H. Tao, and L.-H. Cai. Music
type classification by spectral contrast feature. In Proceedings of The
IEEE International Conference on Multimedia and Expo,
Lausanne (Switzerland), August 2002.

--
Jean-Julien Aucouturier, Assistant Researcher
http://www.csl.sony.fr/~jj
SONY CSL Paris        Tel: (33) 1 44 08 05 11
6, rue Amyot          Fax: (33) 1 45 87 87 50
75005 PARIS
