Re: FW: FW: mfcc filters gain

At 9:46 AM -0500 11/4/04, J. Scott Merritt wrote:

Thank you for that posting.  It seems related to a recent discussion I had
with a colleague regarding the need to eliminate the natural spectral tilt
of human speech before taking the DCT.  By Dr. Skowronski's reasoning, it
appears clear that spectral tilt compensation is not required before taking
the DCT.

Best regards, Scott.

It seems to me that anyone who wants to use the mfcc technique, or
any other cepstral or homomorphic technique, really should start by
understanding the math and the sensitivities well enough to know what
they're getting into.  Dr. Skowronski's observation that in a
cepstrum any set of channel gains in the analyzer is just an offset
in cepstral space is elementary and well known.
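
For anyone who wants to see this numerically, here is a minimal sketch.
The filterbank energies and channel gains are random made-up numbers,
the names dct_ii and log_cepstrum are mine, and the DCT is a plain
unnormalized DCT-II rather than any particular mfcc package's version.
Since log(g*E) = log(g) + log(E) and the DCT is linear, the gains show
up as the same additive vector in every frame.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II along the last axis (illustrative helper)."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]          # cepstral index
    m = np.arange(n)[None, :]          # channel index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def log_cepstrum(energies, gains):
    """Log-compress gain-weighted channel energies, then take the DCT."""
    return dct_ii(np.log(gains * energies))

rng = np.random.default_rng(0)
energies = rng.uniform(0.1, 10.0, size=(5, 20))   # 5 frames, 20 channels (made up)
gains = rng.uniform(0.5, 2.0, size=20)            # arbitrary fixed channel gains

with_gains = log_cepstrum(energies, gains)
unit_gains = log_cepstrum(energies, np.ones(20))

# The difference is identical in every frame: a constant cepstral offset.
offset = with_gains - unit_gains
expected_offset = dct_ii(np.log(gains))
```

Every row of offset equals expected_offset, which is why a fixed set of
gains cannot change distances between log cepstra.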

However, when you move to make the whole process more robust, e.g. by
incorporating Steve Beet's Nth root instead of log (which I totally
endorse), or any other kind of stabilization of the log function's
singularity at zero, this property of the log no longer saves you,
and it becomes important to get channel weightings that are at least
in the right ballpark (they get Nth-rooted, too, so they're not
critical, but they don't totally drop out as they did with the log).
Time-adaptive channel gains can also work very well, and help by
removing some of the cepstral space offsets due to the channel
(room, mic, speaker, etc.).
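
A rough numerical sketch of both points, again on made-up data.  The
cube root (n_root = 3) is an arbitrary choice for illustration, and the
per-channel mean normalization over a block of frames is only a crude
stand-in for truly time-adaptive gains; none of the names below come
from any existing package.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II along the last axis (illustrative helper)."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def root_cepstrum(energies, n_root=3.0):
    """Nth-root compression in place of the log, then the DCT."""
    return dct_ii(energies ** (1.0 / n_root))

def normalize_channels(energies):
    """Divide each channel by its mean over the block (assumed, simplistic
    substitute for a running time-adaptive gain)."""
    return energies / energies.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
energies = rng.uniform(0.1, 10.0, size=(50, 20))  # 50 frames, 20 channels (made up)
gains = rng.uniform(0.5, 2.0, size=20)            # arbitrary fixed channel gains

# With the root, fixed gains scale each channel by g**(1/N), so the change
# in cepstral space depends on the frame instead of being a constant offset.
shift = root_cepstrum(gains * energies) - root_cepstrum(energies)
offset_is_constant = np.allclose(shift, shift[0])

# Normalizing each channel by its own mean cancels the fixed gains again.
normalized_match = np.allclose(
    root_cepstrum(normalize_channels(gains * energies)),
    root_cepstrum(normalize_channels(energies)),
)
```

Here offset_is_constant comes out False and normalized_match comes out
True: the gains matter under the root, and adapting the channel gains
to the signal takes them back out.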

If you allow one tilt parameter to optimize over, you'll probably
benefit from it.

The triangular filter, as was mentioned by someone already, is
probably a poor choice compared to any more auditory-motivated filter
such as gammatone or all-pole gammatone.  As Slaney and I have shown,
these can be very efficiently implemented.