
responses to relative phase in audition and vision posting

Thank you to everyone who responded to my email concerning relative phase
in audition and vision. Here are the responses:

My original post:
I've been working on some studies examining possible parallels between the
processing of spatial frequency and auditory frequency (e.g., Ivry and
Robertson 1998 [Two Sides of Perception]), and the issue of phase
information was raised to me recently; a colleague pointed out that while
relative phase information
seems to be unimportant when processing multiple auditory frequencies
(e.g., computing pitch), it is very important in vision. If you look at an
image which contains all of the same spatial frequencies at the same
positions, but without being in-phase, the image is not coherent.
My question is whether anyone has ideas about why this is the case. I'm
starting to wonder if this results from the kind of information that phase
might provide about the environment; i.e., perhaps whereas in vision phase
information is important to interpreting the spatial scene correctly, this
may not be as critical in audition? Does the indifference of pitch
computation mechanisms to phase reflect a lack of informativity of this
information about objects in the environment?
Also, are other elements of audition besides pitch perception (e.g.,
timbre) more dependent on relative phase?

Bob Carlyon <bob.carlyon@mrc-cbu.cam.ac.uk>
funny you should mention that....
The auditory system IS sensitive to phase between harmonics that interact
within a single auditory filter (because phase alters the shape of the
waveform at the output of that filter).
It is also sensitive to phase differences between envelopes: e.g., if you
apply AM at the same rate to two different carriers (well separated in
frequency), and delay the phase of the modulator applied to one of them,
subjects can detect this.
It is NOT sensitive to (carrier) phase differences between individual
frequency components that are well separated so that they do not interact
in a single filter output. Shihab Shamma and I have submitted a paper to
JASA which states that there are three reasons for this. (i) Resolved
partials produce peaks in the travelling wave, around which there are
dramatic phase transitions. So, the different neurons responding to the
same partial do so at lots of different phases. To compare the responses to
two partials, you'd need to know which neurons to compare with which. (ii)
Two different partials have, by definition, different frequencies. Even
when harmonically related (say 1st and 4th harmonics of a given F0) the
peaks in the filter output to the lower component will coincide (when in
phase) with only a proportion of those to the higher component (1/4 in
the above example). (iii) When the resolved harmonics have a moderately
high frequency, the response to them will be temporally smoothed by the
hair cell and later stages of the auditory system.

[Dr. Carlyon was also kind enough to pass on a preprint to me on this topic;
interested parties may request one from him]
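
[Point (ii) above can be checked with a little arithmetic. The sketch below
is only an illustration, not taken from the preprint: it assumes idealized
cosine components that start in phase, an arbitrary F0 of 100 Hz, and
hypothetical helper names (peak_times, coincident_fraction).]

```python
from fractions import Fraction


def peak_times(harmonic, f0, duration):
    """Exact times (s) of the positive peaks of cos(2*pi*harmonic*f0*t) in [0, duration)."""
    period = Fraction(1, harmonic * f0)
    times = []
    t = Fraction(0)
    while t < duration:
        times.append(t)
        t += period
    return times


def coincident_fraction(low, high, f0=100, duration=Fraction(1, 10)):
    """Fraction of the higher harmonic's peaks that line up with a peak of the lower one."""
    low_peaks = set(peak_times(low, f0, duration))
    high_peaks = peak_times(high, f0, duration)
    shared = sum(1 for t in high_peaks if t in low_peaks)
    return Fraction(shared, len(high_peaks))


# 1st vs. 4th harmonic (F0 = 100 Hz is an assumed, arbitrary value)
print(coincident_fraction(1, 4))  # 1/4
```

[Exact fractions are used instead of floats so that "coincide" means exactly
equal peak times rather than equal within some tolerance.]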

Eckard Blumschein <Eckard.Blumschein@e-technik.uni-magdeburg.de>
While the input signal of vision is already parallel structured,
audition has to cope with the need for transforming the structure
from serial to parallel. Ohm's law of acoustics reflects this function
even if Seebeck's objection was justified and Fourier transform is
merely a poor approximation of physiology.
Incidentally, physicists and mathematicians publicly discussed with me the
Fourier transform vs. the Fourier cosine transform with respect to causality.
I was correct. Only the latter is causal. There is no justification for
the ambivalence and redundancy of FT. There is no reason for negative
values of elapsed time, frequency, radius and wavenumber.
On one hand it is true: Theoretically, amplitude and phase of a complex
quantity are equally important as also are its real and imaginary values.
On the other hand, one can look at the amplitude spectrum separately,
and this might even be useful for measuring of a time span. Radar systems
do so when applying the standard IFFT approach on stepped frequency chirps.
Animal hearing is mainly designed for estimation of temporal distances
and recognition of a coherent source. Cochlear tonotopy is perhaps the
simplest solution for this, in that it simply omits phase.
However, since place on partition is just an additional dimension,
a diversified field of temporal structures is still remaining.
At first it allows for some archaic functions (being unique to hearing)
of auditory midbrain which are obviously not subject to the phase deafness.
It is also the basis for cortical auditory analysis that seems to be
pretty similar to corresponding analysis in vision.
Simplifying corollary: Cochlea acts like a serial to parallel interface
that provides an additional amplitude spectrum before cortical analysis.
Measurement with tones must not be generalized to hearing as a whole.
Paradoxical effects of phase and also of frequency components outside the
audible range reveal that the idea of hearing as a frequency analyzer
neglects fast interaural comparison as well as the sluggish cortical
counterpart of visual perception.

Amanda Lauer <alauer@psyc.umd.edu>
Masking of sounds can be affected by relative phase. For instance,
thresholds for tones embedded in harmonic complex maskers that have
identical long-term frequency spectra but different phase spectra are
strongly affected by the starting phases of the masker components.
A few recent papers:
Lentz & Leek (2001). Psychophysical estimates of cochlear phase
response: Masking by harmonic complexes. JARO, 2, 408-422.
Oxenham & Dau (2001). Towards a measure of auditory-filter phase
response. JASA, 110, 3169-3178.
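
[To make "identical long-term frequency spectra, different phase spectra"
concrete, here is a minimal sketch. It builds two equal-amplitude harmonic
complexes that differ only in the starting phases of their components and
compares their crest factors (peak divided by RMS). The number of harmonics,
the sample count, and the quadratic Schroeder-type phase rule are all
assumptions chosen for illustration; they are not necessarily the stimuli
used in the papers above.]

```python
import math


def harmonic_complex(phases, n_samples=1024):
    """One period of an equal-amplitude harmonic complex.

    phases[k-1] is the starting phase (radians) of harmonic k.
    """
    return [
        sum(math.cos(2 * math.pi * k * n / n_samples + ph)
            for k, ph in enumerate(phases, start=1))
        for n in range(n_samples)
    ]


def harmonic_magnitude(signal, k):
    """Amplitude of harmonic k, from a direct DFT projection."""
    n_samples = len(signal)
    re = sum(x * math.cos(2 * math.pi * k * n / n_samples)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * k * n / n_samples)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n_samples


def crest_factor(signal):
    """Peak absolute value divided by RMS."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return max(abs(x) for x in signal) / rms


N_HARMONICS = 20  # assumed value
cosine_phases = [0.0] * N_HARMONICS
# Assumed quadratic (Schroeder-type) phase rule, known for flat envelopes.
schroeder_phases = [-math.pi * k * (k + 1) / N_HARMONICS
                    for k in range(1, N_HARMONICS + 1)]

peaky = harmonic_complex(cosine_phases)
flat = harmonic_complex(schroeder_phases)

for name, sig in (("cosine phase", peaky), ("Schroeder-type phase", flat)):
    mags = [harmonic_magnitude(sig, k) for k in range(1, N_HARMONICS + 1)]
    print(name,
          "| harmonic amplitudes:", round(min(mags), 6), "to", round(max(mags), 6),
          "| crest factor:", round(crest_factor(sig), 2))
```

[Both signals have the same amplitude at every harmonic, yet the cosine-phase
version has one sharp peak per period while the other has a much flatter
envelope. Only the phase spectrum distinguishes them.]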

Brad Libbey <gt1556a@mail.gatech.edu>
I've had similar questions myself. For my thesis I created
reverberation-like noise by randomizing the phase of reverberation. I did
this by windowing 93 ms segments of reverberant speech, taking a fast
Fourier transform, randomizing the phase, converting back to time domain,
overlapping, and adding the segments. The anechoic speech signal is then
added to the reverberation-like noise.
Subjects in reverberation identified 76% of the words correctly and
subjects in reverberation-like noise identified about 66% correctly.
This is when the speech to reverberation ratio matches the speech to
reverberation-like noise ratio. Another way of thinking of this data is
that the speech to noise ratio has to be roughly 5 dB greater for the
reverberation-like noise to match intelligibility scores.
One possible reason for this is the temporal smearing that occurs within
the time window due to the 93 ms window length. The other possible reason
is related to your question. Does the auditory system have trouble
dealing with the random phase? I asked around at a conference recently
and the following was suggested. Relative phase is significant within a
critical band but not across critical bands. This is partially backed up
by research done by Traunmuller, H. (1987) The Psychophysics of Speech
Perception, Chapter "Phase Vowels" Martinus Nijhoff, Hingham, MA, USA p
Also might try Patterson, R. (1987) "A pulse ribbon model of monaural
phase perception" JASA 82 p 1560-1586.
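
[The phase-randomization procedure described above (window, FFT, randomize
phase, inverse FFT, overlap, add) might look roughly like the sketch below.
This is not Brad Libbey's code. The Hann window, the 50% overlap, the
4096-sample frame (about 93 ms only if the sample rate is 44.1 kHz, which the
post does not state), and all function names are assumptions. A small
pure-Python FFT is included to keep the example self-contained.]

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    conj = fft([z.conjugate() for z in spectrum])
    return [z.conjugate() / n for z in conj]


def randomize_phase(segment, rng):
    """Keep each bin's magnitude, draw a new uniform random phase.

    Conjugate symmetry is preserved so the result is real; the DC and
    Nyquist bins are left unchanged.
    """
    n = len(segment)
    spec = fft(segment)
    new = [0j] * n
    new[0] = spec[0]
    new[n // 2] = spec[n // 2]
    for k in range(1, n // 2):
        z = cmath.rect(abs(spec[k]), rng.uniform(-math.pi, math.pi))
        new[k] = z
        new[n - k] = z.conjugate()
    return [z.real for z in ifft(new)]


def phase_randomized_noise(signal, frame_len=4096, seed=0):
    """Overlap-add of phase-randomized, Hann-windowed frames (50% overlap)."""
    rng = random.Random(seed)
    hop = frame_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        for i, value in enumerate(randomize_phase(frame, rng)):
            out[start + i] += value
    return out
```

[Here `signal` stands for the reverberant speech as a list of samples. The
final step in the post, adding the anechoic speech to the resulting noise at
a chosen ratio, is not shown. For real recordings a library FFT would be a
more practical choice than this pure-Python one.]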

Houtsma, A.J.M. <A.J.M.Houtsma@tue.nl>
Part of your problem is that your statement about the unimportance of
relative phase information in auditory processing is much too broad
and actually incorrect. See, for instance, Julius Goldstein's 1967
study in JASA on this topic. In a nutshell, it boils down to the
fact that the auditory system is insensitive to relative phase as
long as tone components are in different critical bands (i.e. are
about 15% apart in frequency). However, when frequencies are closer
together and tones fall in the same critical band, your ear can easily
detect relative phase changes. With respect to pitch perception, there
appear to be two mechanisms, a strong one based on resolved components
(this one is phase-insensitive) and another much weaker one based on
unresolved components (this one IS sensitive to phase); see Houtsma
and Smurzynski (JASA, 1990).
One reason for the difference in phase sensitivity between the visual
and auditory system may be that the eye does not have a clear analogy
to the ear's critical band.
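
[As a rough worked example of the 15% rule of thumb above: adjacent harmonics
n and n+1 of a common F0 differ by 1/n in relative frequency, so under this
rule only the lowest few harmonics fall in separate critical bands. The
sketch assumes the separation is measured relative to the lower component
and treats 15% as a fixed threshold; real critical bandwidths vary with
frequency, and the function names are hypothetical.]

```python
def in_different_critical_bands(f1, f2, band_fraction=0.15):
    """True if two components are at least band_fraction apart,
    measured relative to the lower frequency (an assumed convention)."""
    low, high = sorted((f1, f2))
    return (high - low) / low >= band_fraction


def resolved_harmonics(f0, n_harmonics, band_fraction=0.15):
    """Harmonic numbers n whose upper neighbour n+1 lies in a different band."""
    return [n for n in range(1, n_harmonics + 1)
            if in_different_critical_bands(n * f0, (n + 1) * f0, band_fraction)]


# F0 = 200 Hz is an arbitrary choice; the result depends only on harmonic number.
print(resolved_harmonics(200, 12))  # [1, 2, 3, 4, 5, 6]
```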

John Hershey <jhershey@cogsci.ucsd.edu>
In vision we talk about phases of an image - a whole array of signals -
in the spatial domain, whereas the phase insensitivities of audition
refer to the phases of an acoustic signal in the time domain. So in
some sense it's apples and oranges, but you could compare phase
sensitivities of vision and audition within either the spatial or
temporal domain. You could also ask why it's apples and oranges, given
that both phenomena (light and sound) behave like waves. The difference
has something to do with wavelength.
Regarding spatial phase: the relative phases of the spatial frequency
components of a focused image will be important no matter what the
modality because we are interested in locating objects in the world.
If sound had a shorter wavelength -- and we lived in an acoustic world
where things behaved in a light-like way -- we could perhaps have some
sort of an acoustic lens and a high-resolution spatial sound retina of
On the other hand, even with long wavelengths, reverberation, and only
two sensors we still manage to use the time differences between the two
ears (among other things) to form a spatial image in the brain, albeit
at lower resolution than in vision. So if you were to look at the
spatial frequency components of this image you would find that the
relative phases of these spatial frequency components are important for
locating sounds, even if the relative phases in the time domain are not.
Regarding temporal phase, the usual explanation of why relative phase of
resolved components in the audio time domain is relatively unimportant
for many tasks is that reverberation/refraction scrambles the relative
phases between components that are far apart in frequency, due to the
long wavelength of sound among other things. Relative phases are still
important for transients and for components that are nearby in frequency.
As for vision, the visual system has such a different response to
temporal signals that it would be difficult to compare to audition --
for instance we pick up different frequencies as color. At the level of
the electromagnetic waves/photons vision is probably insensitive to
phase relationships between different frequencies.
However if we think of the brightness (amplitude envelope) over time as
the signal of interest, then relative phase of the frequency components
of this signal is likely important in the visual time domain. The
spatio-temporal signature of a moving edge tends to be coherent -- if
it weren't we would see the edges blur in a strange way. That said, if
we think about the loudness envelope of an acoustic signal instead of
the signal itself -- then the phase relationships are again important
even across different temporal frequency bands.