
Re: Origin of the Mel frequency scale equation?

Davis & Mermelstein (1980) say in their footnote:

| Fant [8] compares Beranek's mel-frequency scale, Koenig's scale,
| and Fant's approximation to the mel-frequency scale. Since the
| differences between these scales are not significant here, the
| mel-frequency scale should be understood as a linear frequency
| spacing below 1000 Hz and a logarithmic spacing above 1000 Hz.

where [8] is:

C. G. M. Fant, "Acoustic description and classification of
phonetic units," Ericsson Technics, vol. 1, 1959; also G. Fant,
Speech Sounds and Features. Cambridge, MA: MIT Press, 1973,
pp. 32-83.

Searching on Beranek+mel, I find references to:

  L. L. Beranek, Acoustic Measurements (Wiley, New York, 1949), p. 329.

as the source for mel(f) = 1127 ln(1 + f/700)

This is the equation used in HTK's HSigP.c
(presumably as written by Steve Young in 1989), which
is probably the most widely-used mel calculation in the
world by data processed.

In a detailed study by Umesh, Cohen and Nelson published
at ICASSP'99, they cite O'Shaughnessy's 1987 book as the source
for mel(f) = 2595 log_10(1+f/700), which is the same in base 10.

  Fitting the Mel Scale, S. Umesh, L. Cohen, D. Nelson,
  ICASSP 1999 (Phoenix) , I-217-220.
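
As a quick check that the natural-log and base-10 forms really are the
same curve, here is a minimal Python sketch. The function names
mel_ln and mel_log10 are just my labels for illustration, not anything
from HTK or the cited papers. The two constants are related by
2595/ln(10), which is about 1127, so the results agree up to rounding
of the constants.

```python
import math

def mel_ln(f_hz):
    """Natural-log form: 1127 ln(1 + f/700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

def mel_log10(f_hz):
    """Base-10 form: 2595 log_10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

if __name__ == "__main__":
    # Print both forms side by side for a few frequencies in Hz.
    for f in (100.0, 1000.0, 4000.0, 8000.0):
        print(f, mel_ln(f), mel_log10(f))
```

With either form, 1000 Hz comes out very close to 1000 mel.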

They also credit Koenig with first proposing a "split" approximation
that is linear below 1000 Hz and logarithmic above, adjusted
to have continuous slope at 1000 Hz.

  W Koenig, "A new frequency scale for acoustic
  measurements" Bell Telephone Laboratory Record,
  vol. 27, pp. 299-301, 1949

This is the form used in Slaney's Matlab Auditory Toolbox (1993),
probably the second-most widely used version of the calculation.

A couple of years ago I tried to implement a "universal" MFCC
calculation routine that could mimic the various other implementations
I knew of.  It isn't perfect, but at least it makes explicit some of the
axes of variation.


p.s. here is a version of Slaney's equation, which maps the frequency
range 133 Hz to 6400 Hz to the range 0.0 to 40.0, with 1000 Hz mapping
to 13.0, and being linear below and logarithmic above that point.

mel(f) = {  (f - f_0)/f_step   for f <= f_b
         { m_b + ln(f/f_b)/m_step   for f > f_b

where f_0 = 133.33, f_step = 66.67, f_b = 1000,
m_b = (f_b - f_0)/f_step = 13.0 (by construction)
and m_step = ln(6.4)/27 - so the range from
1000 Hz to 6400 Hz accounts for the remaining
27.0 to take the scale up to 40.
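
For anyone who wants to try the numbers, here is a rough Python sketch
of the piecewise equation above. It is not taken from the Auditory
Toolbox source; the name mel_slaney and the constant names are mine.
I am also assuming that 133.33 and 66.67 stand for the exact values
400/3 and 200/3, which makes m_b come out at exactly 13.

```python
import math

# Constants from the equation above. 400/3 and 200/3 are an assumption
# standing in for the rounded 133.33 and 66.67 quoted in the text.
F_0 = 400.0 / 3.0                # Hz, maps to 0.0
F_STEP = 200.0 / 3.0             # Hz per unit in the linear region
F_B = 1000.0                     # Hz, break between linear and log regions
M_B = (F_B - F_0) / F_STEP       # scale value at the break, 13.0
M_STEP = math.log(6.4) / 27.0    # log-frequency step per unit above the break

def mel_slaney(f_hz):
    """Piecewise scale: linear up to F_B, logarithmic above it."""
    if f_hz <= F_B:
        return (f_hz - F_0) / F_STEP
    return M_B + math.log(f_hz / F_B) / M_STEP

if __name__ == "__main__":
    # Print the scale value at the bottom, the break, and the top.
    for f in (F_0, 1000.0, 6400.0):
        print(f, mel_slaney(f))
```

The three anchor points (133.33 Hz, 1000 Hz, 6400 Hz) should land on
0, 13 and 40 respectively, up to floating-point rounding.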