
Re: About importance of "phase" in sound recognition

Hi Kevin,

Thanks for the reply. The phase I'm talking about is closest to your third definition: what is usually called "insensitivity to phase". I mean the phase information that is discarded during MFCC feature extraction, even though MFCCs have proven to be a successful feature set for speech recognition. "Insensitivity to phase" implicitly implies that if you change the order (precedence) of the travelling waves in the cochlear channels relative to each other, perception is not affected, so you could add random phases to different channels without affecting perception(?).

That is one side. On the other side, a couple of years ago there was a publication by a mathematician (Pete Casazza) that reinforced the argument for phase insensitivity in speech recognition, this time mathematically. Very briefly, it states that if you have a redundant set of magnitude coefficients, then phase doesn't matter at all, and, as they say in the paper, this mathematically confirms the belief held over the years in the speech recognition community about phase insensitivity, ...

There are also some papers on the opposing side. This is basically the source of my confusion.


On Tue, Oct 5, 2010 at 6:30 PM, Kevin Austin <kevin.austin@xxxxxxxxxxxx> wrote:

In my experience, this word is used to refer to three (or more) different things: one is acoustical, and another relates to HRTF and time delays between the ears. In my classes this has been the area of greatest confusion. A third is about the phase of the partials in a signal.

[1] A source sends a signal. It has one reflection that delays it by 5 ms, putting it out of phase with the original. The (combined) signal arrives at my ear with a 5 ms delay mixed with the original signal.
But maybe time delay and phase shift are not always considered to be the same thing.
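One way to see why a time delay and a phase shift are not the same thing: a fixed delay imposes a phase shift that grows with frequency (360° × f × delay). The sketch below is only an illustration under assumed conditions: a single reflection at equal level to the direct sound, pure sinusoids, and hypothetical helper names. With the 5 ms delay from the example, 100 Hz is delayed by half a period and 200 Hz by a full period, so the same reflection cancels one and reinforces the other.

```python
import math

DELAY_S = 0.005  # the 5 ms reflection delay from the example above


def phase_shift_degrees(freq_hz, delay_s):
    """Phase shift (wrapped to [0, 360) degrees) that a pure delay gives a sinusoid."""
    return (360.0 * freq_hz * delay_s) % 360.0


def comb_gain(freq_hz, delay_s, reflection_gain=1.0):
    """Magnitude of direct sound plus one delayed copy: |1 + g * exp(-j*2*pi*f*tau)|.

    reflection_gain=1.0 (reflection as loud as the direct sound) is an assumption.
    """
    w = 2.0 * math.pi * freq_hz * delay_s
    return abs(complex(1.0 + reflection_gain * math.cos(w),
                       -reflection_gain * math.sin(w)))


for f in (50.0, 100.0, 150.0, 200.0):
    print(f"{f:6.1f} Hz  phase shift {phase_shift_degrees(f, DELAY_S):7.2f} deg"
          f"  combined gain {comb_gain(f, DELAY_S):.3f}")
```

The delay is one number, but the phase shift it produces is different at every frequency, which is why the combined signal is comb filtered rather than simply "out of phase".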

[2] A 1 kHz signal appears at 45° to my left. It arrives at my two ears at different times. The phase of the signal is different, but only in my head. [interaural time difference]

[3] Build a square wave from its partials. Rebuild it shifting the phase of each successive partial by 35°. My reading said that these two signals will sound the same as the ear is "insensitive to phase". They don't sound the same to me.
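The experiment in [3] can be sketched numerically. This is a rough illustration, not a listening test: it assumes ten odd partials with 1/k amplitudes, 512 samples per period, and reads "each successive partial shifted by 35°" as the i-th partial receiving i × 35° of extra phase. All names and parameter values are hypothetical. It only shows that the two signals share the same magnitude spectrum while their waveforms differ; whether they sound the same is exactly the open question.

```python
import cmath
import math

N = 512                            # samples in one period (assumed)
PARTIALS = list(range(1, 20, 2))   # odd harmonics 1, 3, ..., 19 (assumed count)


def synth(phase_step_deg):
    """One period of a band-limited square wave; partial i gets i * phase_step_deg."""
    step = math.radians(phase_step_deg)
    return [
        sum(math.sin(2.0 * math.pi * k * n / N + i * step) / k
            for i, k in enumerate(PARTIALS))
        for n in range(N)
    ]


def harmonic_magnitude(x, k):
    """Magnitude of DFT bin k, scaled so a unit-amplitude sinusoid gives 1."""
    acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / len(x))
              for n in range(len(x)))
    return 2.0 * abs(acc) / len(x)


square = synth(0.0)     # partials in their original phase
shifted = synth(35.0)   # each successive partial shifted a further 35 degrees

mags_square = [harmonic_magnitude(square, k) for k in PARTIALS]
mags_shifted = [harmonic_magnitude(shifted, k) for k in PARTIALS]

max_mag_diff = max(abs(a - b) for a, b in zip(mags_square, mags_shifted))
max_wave_diff = max(abs(a - b) for a, b in zip(square, shifted))

print("largest difference between magnitude spectra:", max_mag_diff)
print("largest difference between waveforms:       ", max_wave_diff)
```

A magnitude-only feature set (such as the spectrum used on the way to MFCCs) cannot tell these two signals apart, even though the time-domain waveforms are clearly different.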


Which aspect of phase are you asking about?


On 2010, Oct 5, at 11:23 AM, emad burke wrote:

> Dear List,
> I've been confused about the role of the "phase" information of a sound (e.g. speech) signal in speech recognition and, more generally, in human perception of audio signals. I've been reading conflicting arguments and publications regarding how important phase information is. If there is a boundary between short-term and long-term phase information that clarifies this, can anybody please point me to a convincing reference? In summary, I just want to know what the consensus in the community is about the role of phase in speech recognition, if there is any at all.
> Best
> Emad