
Re: MFCC method

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: MFCC method
From: Donald D Greenwood <ddg@xxxxxxxxxxxxx>
Date: Fri, 9 Jan 2009 12:59:12 -0800
Approved-by: ddg@xxxxxxxxxxxxx
Comments: To: Arturo Camacho <acamacho@xxxxxxxxxxxx>
Delivery-date: Fri Jan 9 16:21:32 2009
In-reply-to: <20090109080404.51A556502@xxxxxxxxxxxxxxxxxxxxxxx>
References: <20090107214829.5E1664784@xxxxxxxxxxxxxxxxxxxxxxx> <20090107221249.5404461AF@xxxxxxxxxxxxxxxxxxxxxxx> <20090109054755.C04002A73@xxxxxxxxxxxxxxxxxxxxxxx> <20090109080404.51A556502@xxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: Donald D Greenwood <ddg@xxxxxxxxxxxxx>
Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Dear Arturo,

In your response to Dick Lyon you refer to the observation that the Mel Scale "approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal cepstrum" and you make a reference to my frequency-position function of 1961, 1990, and 1991 as a potential substitute.

[Greenwood, D.D. (1961) Critical bandwidth and the frequency coordinates of the basilar membrane. J. Acoust. Soc. Am. 33, 1344-1356.

Greenwood, D.D. (1990) A cochlear frequency-position function for several species - 29 years later. J. Acoust. Soc. Am. 87, 2592-2605.

Greenwood, D.D. (1991) Critical bandwidth and consonance in relation to cochlear frequency-position coordinates, Hearing Res. 54, 165-208.]

As for abandoning the Mel Scale, I would agree - whatever replaces it. It has seemed to me since 1956 (for reasons appearing below) that there are good reasons not to use the Mel Scale at all any more. In 2006 I replied to that effect privately in a message responding to a post of Jim Beauchamp, at which time he suggested that my reply would make a good post to the list. Perhaps I should have done so then.

In any case, I do so now as an expedient alternative to composing another. Here (between the horizontal dashed lines) is Jim's message and my reply to him as of 2006. His message is indicated by marginal bars. My reply appears between and after them.

----------------------------------------

Date:         Wed, 17 May 2006 15:07:02 -0500
From: beaucham <beaucham@xxxxxxxxxxxxxxxxxxxxxx>
Subject: critical band vocoder
To: AUDITORY@xxxxxxxxxxxxxxx

| For many years vocoders were used for data reduction of speech signals.

| A vocoder separates the input signal into consecutive bands, .

| Recently, mel-frequency cepstral coefficients have been popular for
| speech recognition. Mel frequency spacing is approximately proportional
| to critical-band frequency spacing.

Only approximately - and not closely proportional. Ironically, although Stevens stressed the purported proportionality, the more closely equal numbers of Mels had been proportional to either equal distances on Bekesy's map or to CB (the actual data is meant here - not Zwicker's CB curve, which differs in major respects from the CB data), the less justified Stevens' other conclusion would have been, namely that equal pitch differences did NOT correspond to equal frequency ratios (which much annoyed musicians and John Pierce). An almost logarithmic map, as Bekesy's map obviously was, implies that equal distances correspond closely to equal frequency ratios except where the "almost" becomes relevant, i.e. below 400 to 500 Hz. But perhaps Stevens' "other conclusion" was mainly an unexamined hangover from the discarded and very different 1937 mel scale, obtained by a different method.

Furthermore, the mel scale's popularity has had little justification. There have been good reasons for not using the mel scale for many years. A major one was that it (the 1940 mel scale) was not replicated by Lewis (1942) nor checked by anyone else (so far as I know) until 1956 (at Stevens' behest). The full results of that 1956 check (not actually intended to be a check of the mel scale itself - though the results turned out that way) were published in Hearing Research in 1997:

Greenwood, D.D. (1997) The Mel Scale's disqualifying bias and a consistency of pitch-difference equisections in 1956 with equal cochlear distances and equal frequency ratios, Hearing Res. 103, 199-224.

A shorter paper of similar content was presented (and 'published') at the 1997 Fechner Society meeting in Poznan, Poland under the title: "THE MEL SCALE'S BIAS AND EQUAL PITCH-DIFFERENCES: IMPLICATIONS OF AN ALMOST LOGARITHMIC COCHLEA AND POSSIBLY SUBJECT-DEPENDENT CRITERIA".

[Stevens' reference to the 1956 methodological check (in a 1957 paper of his [Stevens, S.S. (1957) On the psychophysical law, Psychol. Rev., 64, 153-181]) was in relation to the distinction he conceived between different types of scales (metathetic vs prothetic) rather than to the further implication of the results in respect to the mel scale itself - which he may never have considered.]

| My question is: Has anyone designed
| and tested a vocoder using critical-band spacing of the filters?

Mine also. I suggested this for the sound spectrogram, and later vocoder, numerous times (starting in the 60s) to colleagues in speech, and to miscellaneous others, to no observed effect.


Cheers (and greetings to Jont if you see him),

DDG
-----------------------------------

[This greeting to Jont still applies.]

A part of the Abstract of my 1997 Mel paper should provide the reasons Stevens wanted the experiment done and a brief statement of the outcomes.


"Abstract of first 1997 paper cited above

In 1956, Stevens "commissioned" an experiment to equisect a pitch difference between two tones. Results appear to reveal a methodological flaw that would invalidate the Mel Scale (Stevens and Volkmann, 1940). Stevens sought to distinguish sensory continua, e.g. loudness and pitch, on various criteria. He expected that the pitch continuum would not exhibit "hysteresis"; i.e., that subjects dividing a pitch difference (Δf) into equal-appearing parts would not set dividing frequencies higher when listening to notes in ascending order than in descending order. Seven subjects equisected a pitch difference, between tones of 400 and 7000 Hz, into equal-seeming parts by adjusting the frequencies of three intermediate tones. All seven exhibited hysteresis, contrary to expectation. This outcome bears on other issues: Years prior, Stevens suggested that equal pitch differences might correspond to equal cochlear distances, but not to equal frequency ratios nor to equal musical intervals (Stevens and Davis, 1938; Stevens and Volkmann, 1940). In 1960 (reported now), both the 1940 Mel scale and the equal-pitch differences of 1956 were compared to equal cochlear distances, using a frequency-position function that fitted Békésy's cochlear map (Greenwood, 1961; 1990). When ascending and descending settings were combined to contra-pose biases, equal pitch differences did coincide with equal distances - which the Mel Scale did not. Further, the biased ascending-order data coincided with the Mel scale, suggesting the Mel scale was similarly biased. Thus, the combined-order equal-pitch differences of 1956 - but not the Mel scale - are consistent with equal cochlear distances. But, since the map between 400 and 7000 Hz is nearly logarithmic, equal frequency ratios also approximate equal distances. Ironically, above 400 Hz, Békésy's map and Stevens' equal-distance hypothesis jointly imply that musical intervals will nearly agree with equal pitch differences, which Stevens thought he had disconfirmed. But, given Békésy's map, only near the cochlear apex will equal distances not approximate equal frequency ratios; . . . "
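
The abstract's point that a nearly logarithmic map makes equal cochlear distances and equal frequency ratios roughly interchangeable between 400 and 7000 Hz can be illustrated numerically. The following is only a sketch and is not taken from the 1997 paper: it assumes the human constants of the frequency-position function as given in Greenwood (1990), F = A(10^(ax) - k) with A = 165.4, a = 2.1, k = 0.88 and x the proportion of cochlear length from the apex, and the function names are hypothetical. It divides the range into four parts (three intermediate tones, as in the 1956 experiment) once by equal distance and once by equal frequency ratio.

```python
import math

# Assumed human constants from Greenwood (1990); x is the proportion of
# basilar-membrane length measured from the apex (0 = apex, 1 = base).
A, a, k = 165.4, 2.1, 0.88

def greenwood_hz(x):
    """Frequency (Hz) at relative cochlear position x."""
    return A * (10 ** (a * x) - k)

def greenwood_pos(f):
    """Relative cochlear position for frequency f (Hz); inverse of greenwood_hz."""
    return math.log10(f / A + k) / a

def equisect_by_distance(f_lo, f_hi, parts):
    """Dividing frequencies at equal steps of cochlear distance."""
    x_lo, x_hi = greenwood_pos(f_lo), greenwood_pos(f_hi)
    return [greenwood_hz(x_lo + (x_hi - x_lo) * i / parts)
            for i in range(parts + 1)]

def equisect_by_ratio(f_lo, f_hi, parts):
    """Dividing frequencies at equal frequency ratios (equal log steps)."""
    return [f_lo * (f_hi / f_lo) ** (i / parts) for i in range(parts + 1)]

by_distance = equisect_by_distance(400.0, 7000.0, 4)
by_ratio = equisect_by_ratio(400.0, 7000.0, 4)
for d, r in zip(by_distance, by_ratio):
    print(f"{d:8.1f} Hz (equal distance)   {r:8.1f} Hz (equal ratio)")
```

Lowering the bottom of the range well below 400 Hz in the same sketch shows the two sets of dividing frequencies drifting apart, which is where the "almost" in "almost logarithmic" matters.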

I hope this belated 2006 "post" - and the 1997 paper - may be of interest.



- Donald


On 8 Jan, 2009, at 11:28 PM, Arturo Camacho wrote:

Dear Dick,

The Wikipedia page that you mention says that the Mel scale
"approximates the human auditory system's response more closely than
the linearly-spaced frequency bands used in the normal cepstrum." If
that means that the Mel scale approximates better the tonotopic
response of the cochlea than the linear scale, I wonder if it would
not be an even better idea to use the Greenwood function (see entry in
Wikipedia), which was explicitly created with that purpose. (Recall
that the Mel scale was designed to represent equidistant steps in
pitch, but that does not necessarily correspond with equidistant
tonotopic steps.)

Regards,

Arturo
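
Arturo's suggestion amounts to spacing analysis bands evenly along a cochlear map rather than along the mel scale. A minimal sketch of the two spacings follows. It rests on assumptions not stated in this thread: the mel conversion is the closed-form approximation commonly used in MFCC code (2595 log10(1 + f/700)), not the 1940 Stevens and Volkmann data; the cochlear map uses the human constants from Greenwood (1990); the band count, frequency range, and all function names are hypothetical.

```python
import math

def hz_to_mel(f):
    # Closed-form approximation commonly used in MFCC code (an assumption
    # here, not the 1940 Stevens and Volkmann scale itself).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def hz_to_place(f):
    # Greenwood (1990) human map, solved for relative position from the apex.
    return math.log10(f / 165.4 + 0.88) / 2.1

def place_to_hz(x):
    return 165.4 * (10 ** (2.1 * x) - 0.88)

def spaced_centers(f_lo, f_hi, n, forward, inverse):
    """n band centers equally spaced on the warped axis between f_lo and f_hi."""
    lo, hi = forward(f_lo), forward(f_hi)
    step = (hi - lo) / (n + 1)
    return [inverse(lo + step * (i + 1)) for i in range(n)]

# Hypothetical analysis setup: 20 bands between 100 Hz and 8 kHz.
mel_centers = spaced_centers(100.0, 8000.0, 20, hz_to_mel, mel_to_hz)
place_centers = spaced_centers(100.0, 8000.0, 20, hz_to_place, place_to_hz)
for m, p in zip(mel_centers, place_centers):
    print(f"mel-spaced: {m:8.1f} Hz   cochlear-spaced: {p:8.1f} Hz")
```

Only the warping function differs between the two calls, so swapping one scale for the other in a filterbank front end would be a small change.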

On Thu, Jan 8, 2009 at 8:46 PM, Richard F. Lyon <DickLyon@xxxxxxx> wrote:

Thanks Malcolm; now that you've told us, it's in wikipedia:
http://en.wikipedia.org/wiki/Mel-frequency_cepstrum#History
Including the connection to earlier work by Pols; I can share
a copy of Plomp, Pols, and van de Geer (1967) on request.

Dick

At 2:07 PM -0800 1/7/09, Malcolm Slaney wrote:

On Jan 7, 2009, at 12:40 PM, James W. Beauchamp wrote:

I'm looking for a (the?) seminal article on the MFCC method of
coding spectral envelopes. It could be a journal paper or a chapter
in a book. Also, who was the first to publish on this idea?

These are the usual references, especially the 1980 paper.
P. Mermelstein, Distance measures for speech recognition, psychological and instrumental, in Pattern Recognition and Artificial Intelligence, C. H. Chen, Ed., pp. 374-388. Academic, New York, 1976.
S.B. Davis and P. Mermelstein, Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28(4), 1980, pp. 357-366.


But Mermelstein usually credits John Bridle's work for the idea
      JSRU Report No. 1003
      AN EXPERIMENTAL AUTOMATIC WORD-RECOGNITION SYSTEM:
      INTERIM REPORT
      J. S. Bridle and M. D. Brown


I have copies of the early two if you need them.

- Malcolm




--
__________________________________________________

Arturo Camacho, PhD
Alumni
Computer and Information Science and Engineering
University of Florida

E-mail: acamacho@xxxxxxxxxxxx
Web page: www.cise.ufl.edu/~acamacho
__________________________________________________
