Re: mfcc filters gain (Steve Beet)


Subject: Re: mfcc filters gain
From:    Steve Beet  <steve.beet(at)IEEE.ORG>
Date:    Thu, 4 Nov 2004 10:55:40 -0000

Just to add to the confusion: as well as changing the shape of the filters, it's worth looking at their bandwidth and at the non-linearity (traditionally a log operation, as specified by the theory of homomorphic filtering).

The problem with triangular filters is that any small change in frequency causes a large change in MFCC values, simply because the peak is too sharp. Almost any shape with a flatter top will work better, provided you get the width right.

The problem with logarithms is that they over-emphasise any very small signals (which are most likely background noise). Traditionally the solution is to put a lower floor on the values being logged, but a "nicer" solution, to my mind, is to use an Nth root operation instead. As N increases, the Nth root gets closer and closer to a scaled and shifted log operation, while as N decreases, the effects of low levels of noise become less and less. You need to experiment with the value of N to suit the noise characteristics in your data.

By improving the shape and width of the filters, and optimising N in the Nth root operation, you can get somewhere between 20 and 40% reduction in word error rate, so it's worth looking into. These figures are based on my experiments with telephone speech from the UK SpeechDat database.

My own work in this area is largely unpublished, but there was at least one paper in the "Aurora" sessions of Eurospeech a few years ago which looked at these issues and came to similar conclusions. Unfortunately I too can't find any specific references at the moment.
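For anyone who wants to try the two non-linearities side by side, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the front end used in the experiments above: the function names, the floor value, the default N and the sample energies are all made up, and the root is rescaled as N * (x**(1/N) - 1) purely so that its approach to the natural log for large N is easy to see.

```python
import math


def floored_log(energy, floor=1e-3):
    """Natural log with a lower floor on the input (floor value is an assumption)."""
    return math.log(max(energy, floor))


def root_compress(energy, n=3.0):
    """Nth-root compression, rescaled as N * (x**(1/N) - 1).

    The rescaling is an illustrative choice: it makes the output tend to
    ln(x) as N grows, and it is bounded below by -N as x goes to zero,
    so very small (noise-level) inputs cannot dominate.
    """
    return n * (energy ** (1.0 / n) - 1.0)


# Hypothetical filterbank energies: two noise-level bins, two speech-level bins.
energies = [1e-6, 1e-4, 1.0, 100.0]

for n in (2.0, 5.0, 1000.0):
    print(f"N={n:g}:", [round(root_compress(e, n), 3) for e in energies])
print("log :", [round(floored_log(e, floor=1e-12), 3) for e in energies])
```

With a small N the two noise-level bins come out almost identical, whereas the log spreads them far apart; with a very large N the root output follows the log closely. In practice N would be tuned on held-out data, as described above.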
Regards,

Steve Beet

________________________________________________
Dr S W Beet, Principal R & D Engineer,
Aculab plc, Lakeside, Bramley Road, Mount Farm,
Milton Keynes, Bucks., MK1 1PT, UK
Tel: (+44) 1908 273963 ; Fax: (+44) 1908 273801
________________________________________________

----- Original Message -----
From: "Toth Laszlo" <tothl(at)INF.U-SZEGED.HU>
To: <AUDITORY(at)LISTS.MCGILL.CA>
Sent: Wednesday, November 03, 2004 6:32 PM
Subject: Re: mfcc filters gain

You will find quite a few different scales in the literature, and sometimes even several different formulas for the same scale. I have tried a couple of them, and never found a significant difference in the recognition results. In my sceptical opinion, there are much bigger inaccuracies in current speech recognition technology, so these little differences don't really matter. Anyway, probably the most interesting idea in this field was when several authors tried to directly optimize the filters in order to achieve the best possible recognition....


This message came from the mail archive
http://www.auditory.org/postings/2004/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University