
Re: Frequency to Mel Formula


Your comments and questions are well taken. Stevens discarded the "half-pitch" mel scale of 1937 completely, as I commented. Siegel (1964 or 65) made the only published effort I know of to replicate Stevens's 1937 experiment, as mentioned by Richard Warren and as also described in Footnote 4 of my 1997 paper, where I present arguments against half-pitch judgments like those that you and others have raised. Stevens replaced the 1937 scale with the 1940 mel scale, which was based on experiments intended to divide frequency intervals into four (4, not 2) equal perceptual intervals. That scale appears to have been methodologically biased, as I commented. It is also the one that was approximated by Fant and used thereafter, as Pierre says, by the "speech science and technology community". I agree with Pierre that they have been ill-advised to use it, but no doubling or halving was involved in arriving at the 1940 scale.

In 1940 Stevens, as you have noted, did include a separate experiment (reported in the same paper as the equisection data) asking subjects to make what he called half-pitch judgments, but he gave the subjects 40 Hz signals to serve as approximations to "zero" pitch. That seems to have converted those judgments into bisection experiments, i.e. 40 Hz on one end, the standard tone on the other, with the subject setting the "middle" tone. In any case, when Stevens provided the "zero point" you call for, the resulting pseudo half-pitch data fit in with his 4-part equisection results, but (as he says) they were not used to make the 1940 scale, which used 4-part equisection data only.
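
To make the distinction between bisection and 4-part equisection concrete, here is a minimal sketch. It assumes, purely for illustration, the commonly quoted form of Fant's approximation, mel = (1000 / log 2) * log(1 + f / 1000); it is not Stevens's data or his fitted curve, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Assumed Fant-style approximation: 1000 mel at 1000 Hz."""
    return 1000.0 / math.log(2.0) * math.log(1.0 + f_hz / 1000.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel under the same assumed formula."""
    return 1000.0 * (2.0 ** (m / 1000.0) - 1.0)


def equisect(f_low_hz, f_high_hz, parts=4):
    """Frequencies dividing [f_low, f_high] into `parts` equal steps
    on the assumed pitch scale (endpoints included)."""
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    return [mel_to_hz(lo + (hi - lo) * k / parts) for k in range(parts + 1)]


def bisect_pitch(f_low_hz, f_high_hz):
    """The 'middle' tone between two anchors: equisection with 2 parts."""
    return equisect(f_low_hz, f_high_hz, parts=2)[1]


if __name__ == "__main__":
    # 40 Hz stands in for "zero" pitch; 1000 Hz is the standard tone.
    print("bisection of 40..1000 Hz:", round(bisect_pitch(40.0, 1000.0), 1))
    print("4-part equisection:", [round(f, 1) for f in equisect(40.0, 1000.0)])
```

Under this assumed formula the "middle" tone between 40 Hz and a standard is simply the 2-part case of the same equisection procedure, which is why data gathered with a 40 Hz anchor could fit in with 4-part equisection results.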

Using narrow noise-band signals for equisection experiments sounds like something that could be tried, whether or not the outcome would be useful.

Donald Greenwood
On 29 Jul, 2009, at 8:15 PM, Richard F. Lyon wrote:


Certainly the circular or helical aspect of pitch is crucial, in many aspects of pitch perception. But there's also this one-dimensional scale that can be valid in some contexts. I hadn't said or known anything about this "half-pitch" concept, which would certainly bring in the whole octave equivalence complication. But is that what was used for the mel-scale tests and such? I didn't think so. Rather, the idea was to subdivide intervals into perceptually equal intervals ("equisection"). Of course, if the intervals are like 2 octaves or such, or the subject is musically savvy, that's going to bias the judgements based on the pitch circularity. But if the signals are something like narrow noise bands, maybe it would be possible to do the task while avoiding those cues of "consonance" and such?

The "half pitch" idea presumes a well-defined, or well-perceived at least, zero point, as well as a nonlinear mapping to try to get at. Plus it puts the likely result right where the octave is, at least for low frequencies. Did anyone actually use that approach? Richard Warren and Snorre Farner say several studies did so; I'm surprised; it seems like a bad idea. Wouldn't you almost always get a result of half pitch equal to half frequency? Is that the explanation for why the linear-to-log breakpoint ended up so high? Or did they really do equisection of intervals defined by two nonzero tone frequencies?

Stevens says they did both, but the curve he plots shows only the equisection results: