Re: Origin of the Mel frequency scale equation? (Dan Ellis)

Subject: Re: Origin of the Mel frequency scale equation?
From:    Dan Ellis  <dpwe@xxxxxxxx>
Date:    Mon, 10 Mar 2008 11:54:56 -0400

Davis & Mermelstein (1980) say in their footnote:

| Fant [8] compares Beranek's mel-frequency scale, Koenig's scale,
| and Fant's approximation to the mel-frequency scale. Since the
| differences between these scales are not significant here, the
| mel-frequency scale should be understood as a linear frequency
| spacing below 1000 Hz and a logarithmic spacing above 1000 Hz.

where [8] is:

  C. G. M. Fant, "Acoustic description and classification of phonetic
  units," Ericsson Technics, vol. 1, 1959; also G. Fant, Speech Sounds
  and Features. Cambridge, MA: MIT Press, 1973, pp. 32-83.

Searching on Beranek+mel, I find references to:

  L. L. Beranek, Acoustic Measurements (Wiley, New York, 1949), p. 329.

as the source for

  mel(f) = 1127 ln(1 + f/700)

This is the equation used in HTK's HSigP.c (presumably as written by Steve Young in 1989), which is probably the most widely used mel calculation in the world, measured by the amount of data processed.

In a detailed study published at ICASSP'99, Umesh, Cohen and Nelson cite O'Shaughnessy's 1987 book as the source for

  mel(f) = 2595 log_10(1 + f/700)

which is the same function written in base 10, since 2595/ln(10) is approximately 1127.

  S. Umesh, L. Cohen, D. Nelson, "Fitting the Mel Scale," ICASSP 1999 (Phoenix), I-217-220.

They also note that Koenig first proposed a "split" approximation that is linear below 1000 Hz and logarithmic above, adjusted to have continuous slope at 1000 Hz.

  W. Koenig, "A new frequency scale for acoustic measurements," Bell Telephone Laboratory Record, vol. 27, pp. 299-301, 1949.

This is the form used in Slaney's Matlab Auditory Toolbox (1993), probably the second most widely used version of the calculation.

A couple of years ago I tried to implement a "universal" MFCC calculation routine that could mimic the various other implementations I knew of. It isn't perfect, but at least it makes explicit some of the axes of variation.

  DAn.

p.s. Here is a version of Slaney's equation, which maps the frequency range 133 Hz to 6400 Hz onto the range 0.0 to 40.0, with 1000 Hz mapping to 13.0. It is linear below that point and logarithmic above it.

  mel(f) = { (f - f_0)/f_step          for f <= f_b
           { m_b + ln(f/f_b)/m_step    for f >  f_b

where f_0 = 133.33, f_step = 66.67, f_b = 1000, m_b = (f_b - f_0)/f_step = 13.0 (by construction), and m_step = ln(6.4)/27, so the range from 1000 Hz to 6400 Hz accounts for the remaining 27.0 needed to take the scale up to 40.
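
For anyone who wants to compare the variants numerically, here is a minimal Python sketch of the three formulas quoted above. It is not taken from HTK or from the Auditory Toolbox; the function and constant names are my own, and I have assumed that the rounded values 133.33 and 66.67 stand for 400/3 and 200/3, which is what makes m_b come out at 13.0.

```python
import math


def mel_htk(f_hz):
    """Natural-log form: mel(f) = 1127 ln(1 + f/700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_oshaughnessy(f_hz):
    """Base-10 form: mel(f) = 2595 log_10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


# Constants for the split (linear/log) form from the p.s.
# Assumption: 133.33 and 66.67 are rounded versions of 400/3 and 200/3.
F_0 = 400.0 / 3.0               # frequency that maps to 0.0
F_STEP = 200.0 / 3.0            # Hz per unit in the linear region
F_B = 1000.0                    # break frequency between the two regions
M_B = (F_B - F_0) / F_STEP      # scale value at the break (13.0)
M_STEP = math.log(6.4) / 27.0   # log-frequency step per unit above the break


def mel_slaney(f_hz):
    """Split form: linear up to F_B, logarithmic above it."""
    if f_hz <= F_B:
        return (f_hz - F_0) / F_STEP
    return M_B + math.log(f_hz / F_B) / M_STEP


if __name__ == "__main__":
    for f in (133.33, 500.0, 1000.0, 4000.0, 6400.0):
        print(f, mel_htk(f), mel_oshaughnessy(f), mel_slaney(f))
```

Up to floating-point rounding, mel_slaney(1000.0) gives 13.0 and mel_slaney(6400.0) gives 40.0, and the two single-formula versions agree with each other to within a small fraction of a mel over this range. Note that the split form uses different units (0 to 40) from the other two, so it can only be compared with them in shape, not in absolute value.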

This message came from the mail archive
maintained by:
DAn Ellis
Electrical Engineering Dept., Columbia University