Re: About importance of "phase" in sound recognition (Steve Beet)


Subject: Re: About importance of "phase" in sound recognition
From:    Steve Beet  <steve.beet@xxxxxxxx>
Date:    Fri, 8 Oct 2010 20:56:58 +0100
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

I'm surprised that no one so far in this thread has mentioned the "group delay" of signals, either in the auditory system (as in my own rather specialised auditory modelling work in the late 1980s) or in a more mathematically oriented, traditional DSP form in the work of Yegnanarayana et al. The group delay (Yegnanarayana's "Modified Group Delay Function", or the phase parameter in my "Reduced Auditory Representation") provides information derived from the phase components of signals, but in a form which is visually very similar to conventional PSD estimates. This form of data has many advantages over conventional PSD representations (amplitude independence, clear and relatively noise- and channel-immune representation of formants, etc.), but it also has a downside - e.g. when trying to differentiate fricatives from background noise, where amplitude is a key factor and phase alone is not enough. (A minimal numerical sketch of the group delay computation follows at the end of this message.)

If you know how to use the phase, you can get as much information out of it as you can from the amplitude. To a first approximation, one can model human perception solely in terms of amplitude, but there are some effects which can only be explained if you include phase information as well. I've never seen a successful attempt to model human perception solely from phase information, but I suspect it may be possible. This is hardly surprising when you consider the peripheral auditory system, which provides phase-locked neuron firings at low frequencies, but seems only to provide amplitude information once you get above a few kHz. It would be very strange if the higher levels of the auditory system did not take account of synchronisation when it occurs - just as it would be strange if they were able to extract phase information from signals where the peripheral auditory system had not first extracted it.

However, if you want to decide exactly which sounds "real" people can differentiate between, I think you're fighting a losing battle. Quite apart from the variability between individuals, and within one individual on different occasions (before and after musical education, when healthy and when suffering from an ear infection, etc., etc.), I don't believe you can do so, whether in terms of phase, amplitude, or both together. Conventional phase and amplitude are based on a mathematical model which only really makes sense for stationary signals, and as such are only applicable to highly artificial environments. This is especially true for FFT-based analysis.

To model perception accurately, you would need to create a complete model of the whole auditory system, right the way from the cochlea up to the cerebellum. For a more pragmatic approach, modern audio codecs can provide a good indication of the perceptual importance of different components of a signal, but they are mostly based on very simplistic models of perception.

Steve Beet
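[Sketch referred to above. Group delay is the negative derivative of the Fourier phase with respect to frequency, tau(w) = -d(arg X(w))/dw. The Python below is a minimal illustration of the plain (unmodified) group delay, computed with the standard n*x[n] identity so that no phase unwrapping is needed; Yegnanarayana's Modified Group Delay Function additionally divides by a cepstrally smoothed spectrum and compresses the result, which is noted but not implemented here. The frame length, FFT size, and test signal are illustrative assumptions, not values from the post.]

    import numpy as np

    def group_delay_spectrum(frame, n_fft=512):
        # tau(w) = Re( Y(w) / X(w) ), where X = FFT(x[n]) and
        # Y = FFT(n * x[n]); this follows from dX/dw = -j*Y and
        # tau = -d(arg X)/dw, so no phase unwrapping is required.
        n = np.arange(len(frame))
        X = np.fft.rfft(frame, n_fft)
        Y = np.fft.rfft(n * frame, n_fft)
        power = np.abs(X) ** 2
        # Guard against near-zero spectral bins, which make the raw
        # group delay spiky; the Modified Group Delay Function instead
        # divides by a cepstrally smoothed spectrum (not shown).
        power = np.maximum(power, 1e-8 * power.max())
        return (X.real * Y.real + X.imag * Y.imag) / power

    # Illustrative use: two damped sinusoids as stand-ins for formants.
    fs = 8000
    t = np.arange(int(0.025 * fs)) / fs          # one 25 ms frame
    x = np.exp(-60 * t) * (np.sin(2 * np.pi * 700 * t)
                           + 0.5 * np.sin(2 * np.pi * 1800 * t))
    tau = group_delay_spectrum(x * np.hamming(len(x)))
    freqs = np.fft.rfftfreq(512, d=1.0 / fs)
    print(freqs[np.argmax(tau)])                 # peaks near the resonances

Note that scaling the frame by any constant scales X and Y by the same factor, so the ratio is unchanged - the amplitude independence mentioned above - and the peaks of tau(w) sit at the resonance (formant) frequencies.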

