
Re: About importance of "phase" in sound recognition



On Tue, 5 Oct 2010, emad burke wrote:

> I'm talking exactly about what is called "insensitivity to phase". I'm
> talking about the phase information that is
> discarded in the process of MFCC feature extraction, even though MFCC has
> proven to be a successful feature set for speech recognition.

As I said, the problem is that "phase" quite probably means different
things in the classic "insensitivity to phase" claim and in the speech
technology interpretation. In speech recognition, the FFT is applied to
decompose the signal into frequency components. The FFT decomposes the
signal into sine waves with GIVEN frequencies. For example, if the signal
consists of only one sine wave, but its frequency is not among the FFT
frequencies, then your sine wave gets decomposed into a lot of sines, and
it is the phase information that guarantees that, after summing these
components, you get back just one sine wave. If you change the phases, the
whole thing falls apart.
I don't know exactly what the old sentence "the ear is insensitive to
phase" meant by phase, but surely not the phases of the components obtained
by FFT. So I think this is the main source of confusion.
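
The single-sine example can be checked numerically. The following is only
a rough sketch in Python/NumPy, under assumptions of my own: a 256-sample
frame, a sine with 10.5 cycles per frame (so it falls halfway between two
FFT frequencies), and random phases standing in for "changing the phases".
It compares waveforms only and says nothing about how the result sounds.

```python
import numpy as np

n = 256      # frame length in samples (assumed value)
k = 10.5     # cycles per frame, halfway between FFT bins 10 and 11 (assumed)
t = np.arange(n)
x = np.sin(2 * np.pi * k * t / n)

spectrum = np.fft.rfft(x)
magnitude = np.abs(spectrum)

# The single sine is spread over many FFT components, not just one.
significant_bins = int(np.sum(magnitude > 0.01 * magnitude.max()))

# With the original phases, summing the components restores the sine.
x_back = np.fft.irfft(spectrum, n=n)
err_original = float(np.max(np.abs(x - x_back)))

# Same magnitudes with random phases: the sum is no longer the sine.
rng = np.random.default_rng(0)
random_phase = rng.uniform(-np.pi, np.pi, size=magnitude.shape)
scrambled = magnitude * np.exp(1j * random_phase)
x_scrambled = np.fft.irfft(scrambled, n=n)
err_scrambled = float(np.max(np.abs(x - x_scrambled)))

print("bins above 1% of the peak magnitude:", significant_bins)
print("max error with original phases:     ", err_original)
print("max error with random phases:       ", err_scrambled)
```

If k is set to an integer such as 10.0, the sine coincides with an FFT
frequency and essentially all of the energy should land in a single bin.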

> On the other hand, a couple of years ago there was a
> publication by a mathematician (pete-cassaza) that kind of reinforced the
> argument of phase insensitivity of speech recognition, but this time
> mathematically; very briefly stating that if you have a redundant set of
> magnitude coefficients, then phase doesn't matter at all,

This is stated too briefly. What is the mathematical meaning of "doesn't
matter"? Do you have a reference?

               Laszlo Toth
        Hungarian Academy of Sciences         *
  Research Group on Artificial Intelligence   *   "Failure only begins
     e-mail: tothl@xxxxxxxxxxxxxxx            *    when you stop trying"
     http://www.inf.u-szeged.hu/~tothl        *