audition vs. vision: responses

Dear List,

Several people requested that I post these responses to my query about
audition vs. vision.

Many thanks to all who replied.

Ani Patel

Can anyone recommend a reference which compares the processing of
temporal information in audition vs. vision from a neural standpoint?

It is known that the auditory system has neuroanatomical
specializations which help preserve the precise timing of afferent input
(e.g. endbulbs of Held on bushy cells in the cochlear nucleus), but how
does this compare with the visual system?

REPLY #1 (Jont Allen)

I made a lame shot at this, which is published:

,author={Allen, J.B.}
,title={The intensity JND comes from Poisson neural noise: Implications
for image coding}
,booktitle={Human Vision and Electronic Imaging V}
,editor={Rogowitz, B.E. and Pappas, T.N.}
,publisher={Proc. of SPIE}
,address={PO Box 10, Bellingham, Washington 98227-0010}
,note_={www.spie.org} }

It is at http://auditorymodels.org/tmp/
and is called Vision.pdf
I am not sure it is what you are looking for, but it is in the ball park.

REPLY #2 (Israel Nelken)

There are at least two time scales at which you can compare vision and
audition. I don't think the sub-ms scale of audition really has an
equivalent in vision, but the longer time scales, of tens of ms, might.
As an example: the binaural system (which requires the sub-ms
scale, and is extremely specialized) also has a long time constant. It
is expressed in the so-called sluggishness of the binaural system, which
cannot follow very fast changes in ITD (faster than on the order of
10 Hz). I guess the sluggishness has a parallel in vision.
        More specifically: Hermann Wagner had a paper in Nature with
Frost on the barn owl stereo vision, where they used methods taken from
auditory research. Hermann has a lot written on auditory motion, which
requires temporal processing, and he finds (at least in the barn owl)
quite a lot that is similar to visual motion detectors.

REPLY #3 (Franck Ramus)

I'm not sure how relevant this is to your question, but part of the
literature on dyslexia focuses on impairments of temporal processing in
both the auditory and the visual modality. The most explicit paper about
this is probably:

Stein, J., & Talcott, J. B. (1999). Impaired neuronal timing in
developmental dyslexia: The magnocellular hypothesis. Dyslexia, 5, 59-77.

You may find it hard to get hold of; if so, ask me for a copy. A less
explicit version is:

Stein, J., & Walsh, V. (1997). To see but not to read; the magnocellular
theory of dyslexia. Trends Neurosci., 20(4), 147-152.

A lot of the story relies on work by Galaburda emphasising the existence
of analogous magno/parvocellular pathways in the auditory and the visual
system, and finding abnormalities on the magno side of both modalities
in certain dyslexics:

Galaburda, A. M., Menard, M. T., & Rosen, G. D. (1994). Evidence for
aberrant auditory anatomy in developmental dyslexia.
Proc.Natl.Acad.Sci.U.S.A, 91(17), 8010-8013.

REPLY #4 (Eckard Blumschein)

I just vaguely remember a paper on temporal signal processing in
You might look into Brain Research 2000 (online via Neuroscion).

REPLY #5 (Pascal Belin)

You should check this excellent (but old) review:

Hammond, G R
Year: 1982
Title: Hemispheric differences in temporal resolution
Journal: Brain Cog.
Volume: 1
Pages: 95-118

REPLY #6 (Peter Cariani)

There are several lines of comparison.
I don't know of any comprehensive reviews.

The first line of comparison is between spatial visual rate-based
modulation transfer functions (MTFs) and auditory rate-based MTFs. Many
people in vision have examined rate curves to drifting gratings (e.g.
see work from Dan Pollen's lab and De Valois' book on Spatial Vision)
and find that the best modulation frequencies in visual cortical
populations are in the 4-8 Hz range. This can be compared with similar
best MTFs for AM tones in auditory cortex (work of Schreiner and
Langner, and Shamma). The ripple stimuli that Shamma developed are the
auditory rate-place analogues of drifting sinusoidal spatial gratings in
vision (see J. Neurophysiol. 76(5) pp 3503-3534). You should ask them
which are best. My recollection is that similar best modulation
frequencies are seen for visual stimuli in visual cortex as for acoustic
stimuli in auditory cortex, i.e. BMFs of 4-16 Hz. (One could think of
this commonality as having to do with factors that are generic to
cerebral cortex (organization, connectivity, mix of inhibition and
excitation, recovery time courses of cortical pyramidal cells) or due to
common functional demands for both audition and vision (e.g. perhaps
common perceptual integration times, if one thinks in terms of
thalamocortical oscillations).)

Phase-locking in vision could conceivably determine the limits of visual

Visual neurons in LGN and V1 phase-lock to drifting spatial patterns,
such that spatial intervals could be encoded through stimulus-driven
spatial patterns of temporal correlations between spikes. Thus, when an
image is drifted across a retina (the eyes are in constant drift, even
during fixation), there will be a spatial pattern of
temporally-correlated spikes in different retinotopic channels. When
there are no edges, the spikes produced roughly follow a Poisson
process, so that there is no spatial correlation. When there are edges,
spatial patterns of temporally-correlated spikes appear. In this view,
correlations encode spatial form, while uncorrelated rates encode
average luminance. Rather than the correlations between spikes in the
same neural channels that appear to encode frequency and periodicity in
audition, perhaps the visual system uses correlations between spikes in
different channels (i.e. more like binaural cross-correlation, with many
channels, not just pairs of channels; or Shamma's stereausis model or
global cross-correlations).
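
The distinction drawn above (independent Poisson-like firing when there
are no edges, temporally correlated spikes across channels when there
are) can be illustrated with a toy simulation. This is only a sketch
under stated assumptions, not a model taken from any of the cited work:
the firing rates, the 1 ms bin width, the use of a shared event train to
stand in for an "edge", and the function names are all made up for
illustration.

```python
import random

def poisson_train(rate_hz, n_bins, rng, dt=0.001):
    """Bernoulli approximation to a Poisson spike train: one 0/1 entry per bin."""
    p = rate_hz * dt
    return [1 if rng.random() < p else 0 for _ in range(n_bins)]

def coincidence_ratio(a, b):
    """Observed same-bin coincidences divided by the count expected by chance."""
    n = len(a)
    observed = sum(x & y for x, y in zip(a, b))
    expected = sum(a) * sum(b) / n
    return observed / expected

def simulate(seed=0, n_bins=200_000):
    rng = random.Random(seed)
    # "No edge" (assumed): two channels fire independently at 40 spikes/s.
    a = poisson_train(40, n_bins, rng)
    b = poisson_train(40, n_bins, rng)
    # "Edge" (assumed): both channels carry the same stimulus-locked events
    # (20/s) on top of independent background firing (20/s each).
    shared = poisson_train(20, n_bins, rng)
    c = [s | x for s, x in zip(shared, poisson_train(20, n_bins, rng))]
    d = [s | x for s, x in zip(shared, poisson_train(20, n_bins, rng))]
    return coincidence_ratio(a, b), coincidence_ratio(c, d)

no_edge, edge = simulate()
print("coincidence ratio, independent channels:", round(no_edge, 2))
print("coincidence ratio, shared edge events:  ", round(edge, 2))
```

The first ratio should come out near 1 (chance level) and the second
well above 1, even though the mean rates in the two conditions are about
the same. That is the sense in which cross-channel correlations could
carry form information that rates alone do not.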

On the theoretical and perceptual side, there are spatial theories of
visual form perception (Uttal's model) that are related to temporal
autocorrelation models for pitch, just as Reichardt's motion detection
model is related to the Jeffress model
(i.e. computation of temporal disparities across different input
There is a visual, spatial-frequency equivalent of the "pitch of the
fundamental" (see the De Valois book), and a temporal disparity depth
illusion (Pulfrich) in which time-of-arrival differences in the two
monocular pathways lead to

The Huggins and Bilsen phase-difference pitches are the auditory
equivalent of the Julesz random-dot stereogram (and I have wondered
whether Bela Julesz might have gotten the idea from Huggins, who was
also at Bell Labs in the 1950's -- many ideas and techniques have first
been developed in audition and then used in vision). There
is also a web demonstration of creation of visual form from temporal
*Lee & Blake, Science, 284, 1165-1168; they have demos at
These analogies all make sense if one maps spatial intervals to temporal
I recently wrote a minireview on temporal coding in different modalities
that is available at
http://peter-office.meei.harvard.edu/CarianiTempCodes.pdf.
References can be found there.

It has been known for over half a century that visual neurons phase-lock
to stimulus frequencies below flicker-fusion, say < 50-100 Hz. Bialek &
company showed that there is stimulus-related information in spike times
in fly vision at time resolutions below 1 msec (I have heard 100-200
usec, which is comparable to latency-jitters of first spikes in
the auditory nerve and auditory cortex (Phillips, 1988, Hearing Res, 40,
Heil, c. 1997, J Neurophysiol, 78, 2438-2454)). Other workers (Reinagel &
Jonathan Victor's group) have been finding spike precisions of 5-10 ms
down to 1 ms, at the limits of their current methods, in mammalian
systems (thalamus,

The rate-based party line (e.g. Shadlen) has been moving to shorter and
shorter integration times, and now there is talk of "instantaneous rates"
computed over 10 ms moving integration windows, rather than the tens or
hundreds of milliseconds of yesteryear.  Stanley Klein's group, looking
at vernier acuity for slowly
stimuli have estimated that spike jitters of 1 msec would be sufficient
to account for human performance.
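
To make the "instantaneous rate" idea concrete, here is a minimal sketch
of a rate computed over a moving integration window. The spike times and
the function name are hypothetical, and this is just the simplest
boxcar-window estimate, not any particular group's method.

```python
def instantaneous_rate(spike_times_ms, t_ms, window_ms=10.0):
    """Firing rate (spikes/s) from spikes in the window (t_ms - window_ms, t_ms]."""
    count = sum(1 for s in spike_times_ms if t_ms - window_ms < s <= t_ms)
    return count / (window_ms / 1000.0)

# Hypothetical spike times in ms: a short burst, then two later spikes.
spikes = [2.0, 5.5, 9.0, 31.0, 33.5]

# A 10 ms window ending at t = 10 ms sees only the burst.
print(instantaneous_rate(spikes, 10.0))

# A 100 ms window ending at t = 100 ms averages over all five spikes.
print(instantaneous_rate(spikes, 100.0, window_ms=100.0))
```

The same spike train gives a much higher estimate in the short window
than in the long one, which is why the choice of integration time
matters so much for what a "rate code" can resolve.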

Finally, one does not need endbulbs to preserve timing. Endbulbs are
nice for preserving timing, but convergence of many small inputs
(Central Limit Theorem) and/or well-timed inhibition can actually
improve it. See work by Smith, Joris, & Yin, J Neurophysiol 1998
Jun;79(6):3127-42 and earlier papers by Joris and others on improvement
of phase-locking through such processing. Things are obviously
different, timing-wise, at the cortex. While phase locking to
periodicities above 100-200 Hz is not at all obvious or strong in the
vast majority of cortical single
distributions of first spike latencies can be quite compact (100-200
usec), so this should provoke us to think about how this could be if
integration times are on the order of tens of milliseconds, as is
commonly assumed. So much of the work in the auditory CNS has been done
under anesthesia, which, among other things, smears out fine timing
So don't restrict fine timing to the auditory brainstem just yet. There
are many unsolved mysteries that lie before us, and what becomes of fine
timing is one of them.

Aniruddh D. Patel
The Neurosciences Institute
10640 John Jay Hopkins Drive
San Diego, CA 92121

Tel     858-626-2085
Fax     858-626-2099
Email   apatel@nsi.edu
Website http://www.nsi.edu/users/patel