
Re: About importance of "phase" in sound recognition



Hello Emad,

I agree with earlier correspondents that much of the confusion comes from thinking of signals in Fourier terms. The ear performs a spectral analysis but the analysis is not properly represented by a windowed FFT, or spectrogram.

The cochlea performs a wavelet transform which is better simulated with an auditory filterbank (e.g. Unoki et al., 2006). The output of each filter is encoded by auditory nerves that phase lock at speech frequencies. So there is no question that phase information gets into the auditory system. A summary of the early literature is presented in Patterson (1987). The paper summarizes our understanding of monaural phase perception and provides new data supporting earlier theories, which basically say:

The auditory system preserves phase changes that alter the envelope of the wave coming out of an individual auditory filter (within-channel changes). Reverberation can produce this kind of change in a speech signal.

The auditory system loses most of the phase information that defines time delays between channels. These global phase shifts are encountered in signal transmission.

So one answer is to assess the phase changes you are concerned about by passing the affected signals through an auditory filterbank and checking whether there are within-channel differences that MFCCs do not preserve.
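As a rough illustration of that check, here is a minimal Python sketch. It is not the filterbank of Unoki et al. (2006): it assumes a plain fourth-order gammatone filter with a widely used ERB approximation, and the sample rate, fundamental, channel frequencies and function names are all my own choices for the example. It compares two harmonic complexes with identical magnitude spectra, one in cosine phase and one in alternating phase, and reports how peaky the envelope is in a low and a high channel.

```python
import numpy as np

FS = 16000   # sample rate in Hz (assumed for this sketch)
F0 = 100.0   # fundamental in Hz (assumed)
N_HARM = 40  # harmonics 1..40, i.e. up to 4 kHz

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (common approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def channel_envelope(x, fc, dur=0.05, order=4):
    """Envelope at the output of one complex (analytic) gammatone filter."""
    t = np.arange(int(dur * FS)) / FS
    b = 1.019 * erb(fc)
    h = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.exp(2j * np.pi * fc * t)
    # The magnitude of the complex filter output approximates the envelope.
    return np.abs(np.convolve(x, h))[: len(x)]

def harmonic_complex(phases, dur=0.2):
    """Equal-amplitude harmonics of F0 with the given starting phases."""
    t = np.arange(int(dur * FS)) / FS
    return sum(np.cos(2 * np.pi * k * F0 * t + ph)
               for k, ph in enumerate(phases, start=1))

def crest(env):
    """Peak-to-RMS ratio of the envelope, skipping the onset transient."""
    steady = env[len(env) // 2:]
    return steady.max() / np.sqrt(np.mean(steady ** 2))

# Cosine phase: every harmonic starts at 0.
cos_phase = np.zeros(N_HARM)
# Alternating phase: odd harmonics in cosine phase, even ones in sine phase.
alt_phase = np.array([0.0 if k % 2 else -np.pi / 2
                      for k in range(1, N_HARM + 1)])

def compare(fc):
    """Return (crest for cosine phase, crest for alternating phase) at fc."""
    c = crest(channel_envelope(harmonic_complex(cos_phase), fc))
    a = crest(channel_envelope(harmonic_complex(alt_phase), fc))
    return c, a

for fc in (200.0, 2000.0):
    c, a = compare(fc)
    print(f"{fc:6.0f} Hz channel: envelope crest cos={c:.2f}, alt={a:.2f}")
```

Under these assumptions the low channel, where harmonics are resolved, should show almost the same envelope for both signals, while the high channel, where several harmonics interact within one filter, should show a clearly peakier envelope for cosine phase. That is the kind of within-channel difference a magnitude-only representation would be expected to miss.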

Subsequent experiments, like that of Gockel et al. (2002), suggest, as Laszlo Toth intuited, that phase changes that disrupt glottal pulse integrity reduce detectability in noise, and that the effect is greater when the glottal pulse rate is lower.

I can provide pdfs of the references below on request.

Regards Roy P

Patterson, R.D. (1987b). A pulse ribbon model of monaural phase perception. J. Acoust. Soc. Am. 82, 1560-1586.

Unoki, M., Irino, T., Glasberg, B., Moore, B.C.J. and Patterson, R.D. (2006). Comparison of the roex and gammachirp filters as representations of the auditory filter. J. Acoust. Soc. Am. 120(3), 1474-1492.

Gockel, H., Moore, B.C.J. and Patterson, R.D. (2002). Asymmetry of masking between complex tones and noise: The role of temporal structure and peripheral compression. J. Acoust. Soc. Am. 111, 2759-2770.


On 05/10/2010 16:23, emad burke wrote:
Dear List,

I've been confused about the role of "phase" information in the sound (e.g. speech) signal in speech recognition and, more generally, in human perception of audio signals. I've been reading conflicting arguments and publications regarding how important phase information is. If there is a boundary between short-term and long-term phase information that clarifies its importance, can anybody please point me to a convincing reference on that? In summary, I just want to know what the consensus in the community is about the role of phase in speech recognition, if there is any at all.

Best
Emad


-- 
Roy Patterson
Centre for the Neural Basis of Hearing
Department of Physiology, Development and Neuroscience
University of Cambridge, Downing Street, Cambridge, CB2 3EG
    phone +44 (1223) 333819   fax +44 (1223) 333840
    email: rdp1@xxxxxxxxx  
  http://www.pdn.cam.ac.uk/groups/cnbh/
  http://www.AcousticScale.org