Re: mfcc filters gain ("J. Scott Merritt" )


Subject: Re: mfcc filters gain
From:    "J. Scott Merritt"  <merrij3(at)RPI.EDU>
Date:    Fri, 12 Nov 2004 10:38:45 -0500

Steve,

When you use the Nth root instead of the log, some channel effects may
no longer cancel out completely, at least from a theoretical
perspective. Is there a special/different method of Cepstral
normalization that you recommend?

I am particularly interested in cancelling the pronounced difference I
am seeing in the first Cepstral component caused by significant
differences in the frequency response of various microphones. (Some
have a pronounced boost in the higher frequencies, which generates a
spectral tilt that is reflected in C1.)

Thanks, Scott.

On Thu, 4 Nov 2004 10:55:40 -0000 Steve Beet <steve.beet(at)IEEE.ORG> wrote:

> Just to add to the confusion: as well as changing the shape of the filters,
> it's worth looking at their bandwidth and the non-linearity (traditionally a
> log operation, as specified by the theory of homomorphic filtering).
>
> The problem with triangular filters is that any small change in frequency
> causes a large change in MFCC values just because the peak is too sharp.
> Almost any shape with a flatter top will work better, provided you get the
> width right.
>
> The problem with logarithms is that they over-emphasise any very small
> signals (which are most likely background noise). Traditionally the solution
> is to put a lower floor on the values being logged, but a "nicer" solution
> to my mind, is to use an Nth root operation instead. As N increases, the Nth
> root gets closer and closer to a scaled and shifted log operation, while as
> N decreases, the effects of low levels of noise become less and less. You
> need to experiment with the value of N to suit the noise characteristics in
> your data.
>
> By improving the shape and width of the filters, and optimising N in the Nth
> root operation, you can get somewhere between 20 and 40% reduction in word
> error rate, so it's worth looking into. These figures are based on my
> experiments with telephone speech from the UK SpeechDat database.
>
> My own work in this area is largely unpublished, but there was at least one
> paper in the "Aurora" sessions of Eurospeech a few years ago which looked at
> these issues and came to similar conclusions. Unfortunately I too can't find
> any specific references at the moment.
>
> Regards,
>
> Steve Beet
>
> ________________________________________________
> Dr S W Beet, Principal R & D Engineer,
> Aculab plc, Lakeside, Bramley Road, Mount Farm,
> Milton Keynes, Bucks., MK1 1PT, UK
> Tel: (+44) 1908 273963 ; Fax: (+44) 1908 273801
> ________________________________________________
>
> ----- Original Message -----
> From: "Toth Laszlo" <tothl(at)INF.U-SZEGED.HU>
> To: <AUDITORY(at)LISTS.MCGILL.CA>
> Sent: Wednesday, November 03, 2004 6:32 PM
> Subject: Re: mfcc filters gain
>
> You will find quite a few different scales in the literature, and sometimes
> even several different formulas for the same scale. I have tried a couple
> of them, and never found a significant difference in the recognition
> results. In my sceptical opinion, there are much bigger inaccuracies in
> current speech recognition technology, so these little differences don't
> really matter. Anyway, probably the most interesting idea in this field
> was when several authors tried to directly optimize the filters in order
> to achieve the best possible recognition....
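
As a rough numerical illustration of the remark quoted above that the Nth
root approaches a scaled and shifted log as N increases: the identity
N * (x^(1/N) - 1) -> ln(x) can be checked directly. The function names and
the energy values below are made up for illustration and are not taken
from anyone's front end in this thread.

```python
import math

def root_compress(energy, n):
    """Nth-root compression of a non-negative filter-bank energy."""
    return energy ** (1.0 / n)

def scaled_shifted_root(energy, n):
    """N * (x**(1/N) - 1): a scaled and shifted Nth root that tends to ln(x)."""
    return n * (root_compress(energy, n) - 1.0)

# Made-up energies, from very small (noise-like) to large.
for energy in (1e-8, 1e-2, 1.0, 1e2):
    approximations = [round(scaled_shifted_root(energy, n), 3) for n in (3, 10, 1000)]
    print(energy, round(math.log(energy), 3), approximations)
```

For small N the very low energies are mapped to values close to zero by
the plain root instead of being stretched towards large negative numbers,
which is the noise behaviour described in the quoted message. The root is
also defined at exactly zero energy, where the log would need a floor.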

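The channel question raised at the top of this message can be seen in a
small sketch. It assumes a fixed, time-invariant gain in one filter-bank
channel and no additive noise; the energies, the gain and N = 3 are
invented for illustration. After a log the gain is an additive offset, so
subtracting the mean over time removes it. After an Nth root the gain is a
multiplicative factor, so mean subtraction leaves a residual, whereas
dividing by the mean over time (in the filter-bank domain, before any
cosine transform) cancels it. This is only one possible normalization
under those assumptions, not a method recommended by anyone in the thread.

```python
import math

def mean_subtract(values):
    """Subtract the mean over time (as in cepstral mean normalization)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def mean_divide(values):
    """Divide by the mean over time (a ratio-style normalization)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def max_abs_diff(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

energies = [0.5, 2.0, 8.0, 1.0, 4.0]  # made-up per-frame energies, one channel
gain = 3.0                            # made-up fixed microphone gain in that channel
n = 3                                 # root order, chosen arbitrarily

clean_log = [math.log(e) for e in energies]
mic_log = [math.log(gain * e) for e in energies]
clean_root = [e ** (1.0 / n) for e in energies]
mic_root = [(gain * e) ** (1.0 / n) for e in energies]

# Residual mismatch between the two "microphones" after each normalization.
log_residual = max_abs_diff(mean_subtract(clean_log), mean_subtract(mic_log))
root_sub_residual = max_abs_diff(mean_subtract(clean_root), mean_subtract(mic_root))
root_div_residual = max_abs_diff(mean_divide(clean_root), mean_divide(mic_root))

print("log + mean subtraction :", log_residual)
print("root + mean subtraction:", root_sub_residual)
print("root + mean division   :", root_div_residual)
```

With real microphones the gain differs per channel and noise is additive,
so the cancellation would only be approximate in either domain.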

This message came from the mail archive
http://www.auditory.org/postings/2004/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University