Your formulation suggests that you identify pitch with chroma. My position is that pitch is a two-dimensional entity, in which both chroma and tone height are involved. In a musical context the chroma aspect plays a dominant role. In a speech context I have never noticed the appearance of the chroma aspect, probably because we never talk with a limited set of fixed pitches, or because our "vowel processor" is occupied with the meaning of the words and not with the names of the notes. The names of the notes certainly do figure in the perception of possessors of absolute pitch, who are very chroma-oriented.


I think the resolution is in what Leon said, that the mel scale is really more about "tone height" or "frequency" than about pitch or melody. So it's misnamed, at the least. It's also not accurate, as Don points out, and maybe a cochlear map is really the better concept.

But as you also know, it's used in speech primarily because it seems to work well (at least a local optimum), which is mostly about not resolving pitch harmonics but adequately resolving formants. I think you also agree with me in the feeling that it works well largely because speech systems don't usually have a good model for what to do with pitch information, so they're better off not resolving it; and that this is a problem and an opportunity to find a better way...
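To make the "not resolving pitch harmonics" point a bit more concrete, here is a rough sketch that compares the width of each triangular mel filter with the spacing of the harmonics of a voice. Everything in it is an assumption for illustration only: the 2595 * log10(1 + f/700) formula is just one common variant of the mel mapping, and the filter count, frequency range, and f0 are made-up example values, not anything taken from a particular speech front end.

```python
import math


def hz_to_mel(f_hz):
    """One common mel mapping (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel above."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_min_hz, f_max_hz):
    """Return n_filters + 2 edge frequencies in Hz, equally spaced in mel."""
    m_lo = hz_to_mel(f_min_hz)
    m_hi = hz_to_mel(f_max_hz)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]


def filter_widths(edges):
    """(center_hz, base_width_hz) for each triangular filter.

    Filter i is centered on edges[i] and spans edges[i-1] .. edges[i+1].
    """
    return [(edges[i], edges[i + 1] - edges[i - 1])
            for i in range(1, len(edges) - 1)]


def harmonics_per_filter(n_filters, f_min_hz, f_max_hz, f0_hz):
    """(center_hz, filter width in units of f0) for each filter.

    A value well below 1 means a filter is narrow enough to pick out
    individual harmonics; a value well above 1 means it averages over
    several of them.
    """
    edges = mel_filter_edges(n_filters, f_min_hz, f_max_hz)
    return [(center, width / f0_hz) for center, width in filter_widths(edges)]


if __name__ == "__main__":
    # Hypothetical example values: 26 filters, 0-8000 Hz, f0 = 100 Hz.
    for center, ratio in harmonics_per_filter(26, 0.0, 8000.0, 100.0):
        print(f"center {center:7.1f} Hz  width = {ratio:5.2f} x f0")
```

Running it with different filter counts and f0 values shows where in the spectrum the filters become wider than the harmonic spacing. Note that this only looks at the filterbank; any smoothing that comes from truncating the cepstrum afterwards is not modeled here.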


For the musician-me, the Mel scale is an oxymoron -- I know quite well what the half or the double of an interval is, regardless of its chroma and regardless of whether the interval is sanctioned by the Western system.

For the psychoacoustician-me, the concept of the Mel scale is invalid. When experience (i.e., musicianship) is detrimental to determining a scale, any scientifically thinking individual should just scratch his/her head and close the book on the topic.

Because of this negative conclusion, I have been wondering for a long time why the speech science and technology community insists on basing their work on MFCC, a measure derived from an at best dubious and at worst invalid frequency scale.

