Re: mfcc filters gain
On Wed, 3 Nov 2004, Guillaume Lemaitre wrote:
> In Malcolm Slaney's Matlab implementation of mel frequency cepstral
> coefficients, triangular filters are normalized "so that each filter has
> unit weight". I am wondering what this normalization corresponds to.
A very good question (meaning that I have wondered about it too and
could not find the answer...). If you normalize, the outputs of the
different filters remain comparable. If you don't, the wider filters return
larger values on average. So from a practical point of view,
normalization may turn out to be useful in certain cases. It is probably
good for the subsequent cosine transform in the mfcc computation. But from
a theoretical point of view, I think it is hard to explain (???).
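
The practical effect described above can be checked numerically. The
following is a minimal sketch, not Slaney's code: it builds triangular
filters with edges equally spaced on a mel scale and applies them to a flat
spectrum. The function names, the 2595*log10(1 + f/700) mel formula, the
16 kHz sampling rate, and the reading of "unit weight" as weights that sum
to one are all assumptions made for illustration.

```python
import math


def hz_to_mel(f_hz):
    # One common mel formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangular_filters(n_filters, n_bins, sample_rate, normalize):
    """Return n_filters weight lists over n_bins linearly spaced frequency bins."""
    nyquist = sample_rate / 2.0
    top_mel = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced on the mel axis.
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]

    filters = []
    for j in range(n_filters):
        lo, mid, hi = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        weights = []
        for f in bins_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        if normalize:
            total = sum(weights)
            if total > 0.0:
                # Assumed meaning of "unit weight": weights sum to one.
                weights = [w / total for w in weights]
        filters.append(weights)
    return filters


def filter_outputs(spectrum, filters):
    """Weighted sum of the spectrum under each filter."""
    return [sum(w * s for w, s in zip(weights, spectrum)) for weights in filters]


if __name__ == "__main__":
    flat = [1.0] * 257  # flat power spectrum: every bin carries the same energy
    raw = filter_outputs(flat, triangular_filters(20, 257, 16000.0, False))
    unit = filter_outputs(flat, triangular_filters(20, 257, 16000.0, True))
    for j, (r, u) in enumerate(zip(raw, unit)):
        print(j, round(r, 3), round(u, 3))
```

With a flat input, the unnormalized outputs grow with filter width, while
the normalized outputs are the same for every filter.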
> I am also wondering if some work has already been done to improve
> mfcc-like processing. As it is suggested in , Moore's ERB scale or
> Bark scale seems to be more appropriate than the mel scale, and a
> gammatone filterbank should be much more accurate (even if probably more
> computationally expensive) than a triangular filterbank?
You will find quite a few different scales in the literature, and sometimes
even several different formulas for the same scale. I have tried a couple
of them and never found a significant difference in the recognition
results. In my sceptical opinion, there are much bigger inaccuracies in
current speech recognition technology, so these little differences don't
really matter. Anyway, probably the most interesting idea in this field
was when several authors tried to optimize the filters directly in order
to achieve the best possible recognition. I have seen a couple of papers
on this, but unfortunately don't have any references at hand...
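
To illustrate the remark that one scale can come with several formulas,
here is a small sketch comparing two published mel approximations. The
function names are hypothetical, and the attributions in the comments are
the ones commonly given in the literature rather than anything stated in
this thread.

```python
import math


def mel_log10_700(f_hz):
    # Commonly attributed to O'Shaughnessy; widely used in MFCC front ends.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_log2_1000(f_hz):
    # Commonly attributed to Fant; maps 1000 Hz to exactly 1000 mel.
    return 1000.0 * math.log2(1.0 + f_hz / 1000.0)


if __name__ == "__main__":
    for f in (250.0, 1000.0, 4000.0, 8000.0):
        print(f, round(mel_log10_700(f), 1), round(mel_log2_1000(f), 1))
```

Both variants roughly agree around 1000 Hz and drift apart toward higher
frequencies, which shifts the filter edges somewhat but keeps the overall
warping similar.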
Hungarian Academy of Sciences *
Research Group on Artificial Intelligence * "Failure only begins
e-mail: email@example.com * when you stop trying"