
Re: MFCC method

I agree that the Greenwood function would be more logical, but as Lazlo Toth points out, there's not really enough difference to matter in cases that have been looked at.

Don Greenwood gave me a collection of his papers, which I'm supposed to put online somewhere and summarize better in Wikipedia, too. Thanks for pointing out that article; I might link some paper copies there, and refer to it from the mel article.

And I want to thank Malcolm again, since I might not have made it clear that the info appears in Wikipedia because he sent it here and sent me copies of the papers. He had the sense to convert his hearing library to digital form some years ago, so he can find things that I never can.


At 11:28 PM -0800 1/8/09, Arturo Camacho wrote:
Dear Dick,

The Wikipedia page that you mention says that the Mel scale
"approximates the human auditory system's response more closely than
the linearly-spaced frequency bands used in the normal cepstrum." If
that means that the Mel scale approximates better the tonotopic
response of the cochlea than the linear scale, I wonder if it would
not be an even better idea to use the Greenwood function (see entry in
Wikipedia), which was explicitly created with that purpose. (Recall
that the Mel scale was designed to represent equidistant steps in
pitch, but that does not necessarily correspond with equidistant
tonotopic steps.)
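
A minimal sketch of the comparison being discussed, assuming the commonly used mel formula m = 2595 log10(1 + f/700) and Greenwood's human cochlear map f = A(10^(a x) - k) with A = 165.4, a = 2.1, k = 0.88, where x is the fractional distance from the apex. Neither formula nor its constants are given in this thread, and the function names and the 100-8000 Hz range are hypothetical choices for illustration. Both scales are rescaled to run from 0 to 1 over the same range so their warping can be read side by side.

```python
import math

def hz_to_mel(f_hz):
    # Commonly used mel formula (assumed here, not quoted from the thread).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_greenwood_pos(f_hz, A=165.4, a=2.1, k=0.88):
    # Inverse of Greenwood's map f = A * (10**(a*x) - k); the constants are
    # the usual human values (assumed). x is the fractional distance from
    # the cochlear apex (0 = apex, 1 = base).
    return math.log10(f_hz / A + k) / a

def normalized(scale, f_hz, f_lo=100.0, f_hi=8000.0):
    # Rescale a warping function so that f_lo maps to 0 and f_hi maps to 1.
    lo, hi = scale(f_lo), scale(f_hi)
    return (scale(f_hz) - lo) / (hi - lo)

if __name__ == "__main__":
    # Print frequency, normalized mel value, normalized Greenwood position.
    for f in (100, 250, 500, 1000, 2000, 4000, 8000):
        mel = normalized(hz_to_mel, f)
        pos = normalized(hz_to_greenwood_pos, f)
        print(f"{f:5d} Hz  mel={mel:.3f}  greenwood={pos:.3f}")
```

Spacing filterbank center frequencies uniformly in either normalized coordinate would give the two candidate band layouts; how much they differ depends on the frequency range chosen.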



On Thu, Jan 8, 2009 at 8:46 PM, Richard F. Lyon <DickLyon@xxxxxxx> wrote:
 Thanks Malcolm; now that you've told us, it's in wikipedia:
 Including the connection to earlier work by Pols; I can share
 a copy of Plomp, Pols, and van de Geer (1967) on request.


 At 2:07 PM -0800 1/7/09, Malcolm Slaney wrote:

 On Jan 7, 2009, at 12:40 PM, James W. Beauchamp wrote:

 I'm looking for a (the?) seminal article on the MFCC method of
 coding spectral envelopes. It could be a journal paper or a chapter
 in a book. Also, who was the first to publish on this idea?

 These are the usual references, especially the 1980 paper.

 P. Mermelstein, Distance measures for speech recognition, psychological
 and instrumental, in Pattern Recognition and Artificial Intelligence, C. H.
 Chen, Ed., pp. 374-388. Academic, New York, 1976.

 S.B. Davis, and P. Mermelstein, Comparison of Parametric Representations
 for Monosyllabic Word Recognition in Continuously Spoken Sentences, in IEEE
 Transactions on Acoustics, Speech, and Signal Processing, vol. 28(4), 1980,
 pp. 357-366.

 But Mermelstein usually credits John Bridle's work for the idea
        JSRU Report No. 1003
        J. S. Bridle and M. D. Brown

 I have copies of the two earliest ones if you need them.

 - Malcolm


Arturo Camacho, PhD
Computer and Information Science and Engineering
University of Florida

E-mail: acamacho@xxxxxxxxxxxx
Web page: www.cise.ufl.edu/~acamacho