
Re: About importance of "phase" in sound recognition

Hi Emad,
  Like Kevin, I have to ask: which phase do you mean?  I have been
working for a while now (just finishing my Ph.D.) on monaural phase
and phase perception.

If we're talking about the phase that you get from a Short-Time
Fourier Transform (STFT), it has IMHO very little perceptual meaning
(lots of information, but not easily translated into 'meaning').  You
are basically converting a short section of sound by windowing it,
multiplying it with a perfect monochromatic complex sinusoid, and
comparing the rotation of that sinusoid to the sound - this has no
easy perceptual equivalent.  It is a mathematical decomposition of
the signal, very useful as such, and it can with some effort be
analysed from a perceptual viewpoint - but mostly only by returning
it to the time domain.
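To make that concrete, here is a minimal numpy sketch (my own
illustration, not something from the thread) of what the STFT does to
a single frame: window it, correlate it against complex sinusoids,
split the result into magnitude and phase, and return to the time
domain.  The point is that the phase carries real information - the
magnitude alone cannot rebuild the frame, magnitude plus phase can,
exactly:

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(512)          # one short section of "sound"

w = np.hanning(512)                   # window the frame
frame = x * w
X = np.fft.rfft(frame)                # correlate against complex sinusoids
mag = np.abs(X)                       # how strongly each sinusoid matches
phase = np.angle(X)                   # the "rotation" relative to the frame

# Magnitude plus phase returns the windowed frame exactly.
recon = np.fft.irfft(mag * np.exp(1j * phase), n=512)
assert np.allclose(recon, frame)
```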

The STFT is a block-based transform; our ears don't work on
short-time blocks.  There are various time constants in perceptual
effects, but no fixed block boundaries.

I have been using a gammatone magnitude/phase decomposition.  I use a
gammatone filterbank (FB) at some spacing based on the ERB/Bark
scale, with a view towards reconstruction (see for example Strahl in
JASA Nov. 2009).  The envelope of the signal in each channel of this
FB can be regarded as the strength of excitation of a hair cell (HC)
ensemble at a point on the Basilar Membrane (BM).  Normalising with
respect to the envelope, you are left with a nearly sinusoidal
"carrier" signal whose frequency is centered around the channel's
center frequency, which is the critical frequency of the hair cell
ensemble associated with the filter.  This carrier can be regarded as
the instantaneous phase of the original signal in an auditory
channel; we can now ask how much "phase distortion" is audible in
each channel - this phase signal is synchronized with the IHC
response for lower-frequency auditory channels.
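A rough numpy sketch of that envelope/carrier split (my own
illustration, under a simplifying assumption: the channel output is
faked as a single amplitude-modulated tone rather than the output of
an actual gammatone filter):

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (same idea as scipy.signal.hilbert)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

fs = 16000
t = np.arange(1024) / fs
# Stand-in for one filterbank channel: a 1 kHz "carrier" with a slow
# amplitude modulation (46.875 Hz so the frame is exactly periodic).
x = (1.0 + 0.5 * np.cos(2 * np.pi * 46.875 * t)) * np.cos(2 * np.pi * 1000 * t)

a = analytic_signal(x)
env = np.abs(a)        # envelope: excitation strength of the HC ensemble
phase = np.angle(a)    # instantaneous phase: the normalised "carrier"

# Envelope times carrier gives the channel signal back exactly.
assert np.allclose(env * np.cos(phase), x)
```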

The problem is that auditory channels overlap a lot, so you can't
modify the carrier/phase of one channel independently of those in
adjacent channels.

I haven't looked at this carrier/phase with regard to binaural
hearing yet, but since this is an essentially time-domain
phase/magnitude decomposition, the phase signal should be amenable
to examination for inter-aural time differences, and the magnitude
signal for inter-aural level differences.
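For instance, a narrowband inter-aural phase comparison could be
sketched like this (purely illustrative and my own assumption, not
something I have actually run on binaural data: the 500 Hz "channel"
is again a plain tone rather than a gammatone output, and a
phase-based estimate is only unambiguous when the true ITD is within
half a carrier cycle):

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (zero the negative frequencies)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[N // 2] = 1.0
    h[1:N // 2] = 2.0
    return np.fft.ifft(X * h)

fs = 48000
fc = 500.0            # low-frequency channel, where phase locking is strong
itd_true = 500e-6     # 500 us inter-aural delay, well under half a cycle

t = np.arange(4800) / fs
left = np.cos(2 * np.pi * fc * t)                # stand-in channel signal
right = np.cos(2 * np.pi * fc * (t - itd_true))  # same, delayed at the far ear

# Mean inter-aural phase difference of the carriers, turned into a delay.
dphi = np.angle(np.mean(analytic_signal(left) *
                        np.conj(analytic_signal(right))))
itd_est = dphi / (2 * np.pi * fc)
assert abs(itd_est - itd_true) < 1e-6
```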


On Tue, Oct 5, 2010 at 11:23, emad burke <emad.burke@xxxxxxxxx> wrote:
> Dear List,
> I've been confused about the role of "phase" information of the sound (eg
> speech) signal in speech recognition and more generally human's perception
> of audio signals. I've been reading conflicting arguments and publications
> regarding the extent of importance of phase information. if there is a
> border between short and long-term phase information that clarifies this
> extent of importance, can anybody please introduce me any convincing
> reference in that respect. In summary I just want to know what is the
> consensus in the community about phase role in speech recognition, of course
> if there is any at all.
> Best
> Emad

Joachim Thiemann :: http://www.tsp.ece.mcgill.ca/~jthiem