
Re: audition vs. vision

Hi Ani,

There are several lines of comparison.
I don't know of any comprehensive reviews.

The first line of comparison is between spatial visual rate-based modulation
transfer functions (MTFs) and auditory rate-based MTFs. Many people in vision
have examined rate tuning curves to drifting gratings (e.g. see work from Dan
Pollen's lab and De Valois' book on Spatial Vision) and find that the best
modulation frequencies in visual cortical populations are in the 4-8 Hz range.
This can be compared with similar best MTFs for AM tones in auditory cortex
(work of Schreiner and Langner, and Shamma). The ripple stimuli that Shamma
developed are auditory rate-place analogues of drifting sinusoidal spatial
gratings in vision (see J. Neurophysiol. 76(5) pp 3503-3534). You should ask
them which references are best. My recollection is that similar best modulation
frequencies (BMFs) are seen for visual stimuli in visual cortex as for acoustic
stimuli in auditory cortex, i.e. BMFs of 4-16 Hz.
(One could think of this commonality as having to do with factors that are
generic to cerebral cortex (organization, connectivity, mix of inhibition and
excitation, recovery time courses of cortical pyramidal cells) or as due to
common functional demands for audition and vision (e.g. perhaps common
perceptual integration times, if one thinks in terms of thalamocortical
oscillations).)

Phase-locking in vision could conceivably determine the limits of visual acuity.

Visual neurons in LGN and V1 phase-lock to drifting spatial patterns, such that
spatial intervals could be encoded through stimulus-driven spatial patterns of
temporal correlations between spikes. Thus, when an image is drifted across a
retina (the eyes are in constant drift, even during fixation), there will be a
spatial pattern of temporally-correlated spikes in different retinotopic
channels. When there are no edges, the spikes produced roughly follow a Poisson
process, so that there is no spatial correlation. When there are edges, there
then appear spatial patterns of temporally-correlated spikes. In this view,
spatiotemporal correlations encode spatial form, while uncorrelated rates
encode avg.
Rather than the correlations between spikes in the same neural channels that
appear to encode frequency and periodicity in audition, perhaps the visual
system uses correlations between spikes in different channels (i.e. more like
binaural cross-correlation, with many channels, not just pairs of channels; or
like the stereausis model or global cross-correlations).
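
To make the cross-channel idea concrete, here is a minimal sketch in Python. It
is a toy, not a model of real LGN or V1 data: the firing rate (20 spikes/s), the
5 ms delay between the two channels, the 1 ms jitter, and all function names are
assumptions chosen for illustration. Two channels firing as independent Poisson
processes give a flat cross-correlogram, while two channels driven by the same
drifting edge give a peak at the lag set by the edge's travel time between them.

```python
import random


def poisson_train(rate_hz, duration_s, rng):
    """Homogeneous Poisson spike train built from exponential intervals."""
    t, spikes = 0.0, []
    while True:
        t += rng.expovariate(rate_hz)
        if t >= duration_s:
            return spikes
        spikes.append(t)


def locked_train(event_times, delay_s, jitter_s, rng):
    """One spike per stimulus event, after a fixed delay plus Gaussian jitter."""
    return sorted(t + delay_s + rng.gauss(0.0, jitter_s) for t in event_times)


def cross_correlogram(a, b, max_lag_s, bin_s):
    """Counts of spike-time differences (b - a) within +/- max_lag_s."""
    n_bins = int(round(2 * max_lag_s / bin_s))
    counts = [0] * n_bins
    for ta in a:
        for tb in b:
            lag = tb - ta
            if -max_lag_s <= lag < max_lag_s:
                idx = min(int((lag + max_lag_s) / bin_s), n_bins - 1)
                counts[idx] += 1
    return counts


def demo(seed=0):
    rng = random.Random(seed)
    rate_hz, duration_s = 20.0, 20.0   # assumed values, for illustration only
    max_lag_s, bin_s = 0.020, 0.001
    # No edges: the two channels fire as independent Poisson processes.
    flat = cross_correlogram(poisson_train(rate_hz, duration_s, rng),
                             poisson_train(rate_hz, duration_s, rng),
                             max_lag_s, bin_s)
    # Drifting edge: both channels follow the same edge crossings,
    # with channel B seeing each crossing 5 ms after channel A (assumed).
    crossings = poisson_train(rate_hz, duration_s, rng)
    chan_a = locked_train(crossings, 0.000, 0.001, rng)
    chan_b = locked_train(crossings, 0.005, 0.001, rng)
    peaked = cross_correlogram(chan_a, chan_b, max_lag_s, bin_s)
    peak_bin = peaked.index(max(peaked))
    peak_lag_s = -max_lag_s + (peak_bin + 0.5) * bin_s
    return flat, peaked, peak_lag_s


if __name__ == "__main__":
    flat, peaked, peak_lag_s = demo()
    print("largest bin, no edge:", max(flat))
    print("largest bin, edge:   ", max(peaked))
    print("peak lag (ms):       ", round(1000 * peak_lag_s, 1))
```

In this toy the mean rates of the two conditions are the same; only the
correlogram tells them apart, and the position of its peak carries the spatial
interval (given the drift speed).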

On the theoretical and perceptual side, there are spatial autocorrelation
theories of visual form perception (Uttal's model) that are related to temporal
autocorrelation models for pitch, just as Reichardt's motion detection model is
related to the Jeffress model (i.e. computation of temporal disparities across
different input channels).
There is a visual, spatial-frequency equivalent of the "pitch of the missing
fundamental" (see the De Valois book), and a temporal disparity depth illusion
(Pulfrich) in which time-of-arrival differences in the two monocular pathways
lead to apparent depth.
The Huggins and Bilsen phase-difference pitches are the auditory equivalent of
the Julesz random-dot stereogram (and I have wondered whether Bela Julesz might
have gotten the idea from Huggins, who was also at Bell Labs in the 1950s --
many ideas and techniques have first been developed in audition then used in
vision). There is also a web demonstration of creation of visual form from
temporal structure
*Lee & Blake, Science, 284, 1165-1168; they have demos at
These analogies all make sense if one maps spatial intervals to temporal ones.
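
The "missing fundamental" case mentioned above can be sketched numerically. The
following Python is only an illustration under assumed values (harmonics 3, 4
and 5 of 200 Hz, a 20 kHz sample rate, function names of my choosing), and it
autocorrelates the raw waveform rather than spike trains, which is a
simplification of the autocorrelation models of pitch. The signal has no
component at 200 Hz, yet its autocorrelation peaks at a lag of 5 ms, i.e. one
period of the absent fundamental.

```python
import cmath
import math


def harmonic_complex(f0_hz, harmonics, fs_hz, n_samples):
    """Sum of equal-amplitude cosines at the given harmonic numbers of f0."""
    return [sum(math.cos(2 * math.pi * h * f0_hz * i / fs_hz) for h in harmonics)
            for i in range(n_samples)]


def component_amplitude(x, freq_hz, fs_hz):
    """Amplitude of the sinusoidal component of x at freq_hz (one DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
              for i, v in enumerate(x))
    return 2 * abs(acc) / len(x)


def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation for lags 0..max_lag over a fixed span."""
    span = len(x) - max_lag
    return [sum(x[i] * x[i + lag] for i in range(span))
            for lag in range(max_lag + 1)]


def dominant_period(x, fs_hz, min_lag, max_lag):
    """Lag in seconds of the largest autocorrelation value in [min_lag, max_lag]."""
    ac = autocorrelation(x, max_lag)
    best = max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])
    return best / fs_hz


if __name__ == "__main__":
    fs = 20000                                          # assumed sample rate
    x = harmonic_complex(200.0, [3, 4, 5], fs, 2000)    # 600, 800, 1000 Hz
    print("amplitude at 200 Hz:", round(component_amplitude(x, 200.0, fs), 6))
    print("amplitude at 600 Hz:", round(component_amplitude(x, 600.0, fs), 6))
    # Search lags from 1 ms to 8 ms (20 to 160 samples).
    print("dominant period (s):", dominant_period(x, fs, 20, 160))
```

Replacing time with position along one spatial dimension (and Hz with cycles
per degree) gives the same arithmetic for a luminance profile, which is the
sense in which the spatial and temporal versions map onto each other.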

I recently wrote a minireview on temporal coding in different modalities that is
available at http://peter-office.meei.harvard.edu/CarianiTempCodes.pdf.
References can be found there.

It has been known for over half a century that visual neurons phase-lock to
frequencies below flicker-fusion, say < 50-100 Hz. Bialek & company showed that
there is stimulus-related information in spike times in fly vision at time
resolutions below 1 msec (I have heard 100-200 usec, which is comparable to
latency-jitters of first spikes in the auditory nerve and auditory cortex
(Phillips, 1988, Hearing Res, 40, 137-146; Heil, c. 1997, J Neurophysiol, 78,
2438-2454)). Other workers (Reinagel & Reid; Victor's group) have been finding
spike precisions of 5-10 ms down to 1 ms, at the limits of their current
methods, in mammalian systems (thalamus, cortex).
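
One way to relate spike-time precision to phase-locking is vector strength, a
common measure of how tightly spike phases cluster. The sketch below is a toy
under stated assumptions: one spike per stimulus cycle, Gaussian timing jitter
of 1 ms, and function names of my own. For Gaussian jitter with standard
deviation sigma, the expected vector strength at frequency f is
exp(-2 pi^2 f^2 sigma^2), so 1 ms of jitter leaves locking at 50 Hz nearly
intact while abolishing it at 1 kHz.

```python
import math
import random


def vector_strength(spike_times, freq_hz):
    """Length of the mean phase vector (0 = no locking, 1 = perfect locking)."""
    if not spike_times:
        return 0.0
    c = sum(math.cos(2 * math.pi * freq_hz * t) for t in spike_times)
    s = sum(math.sin(2 * math.pi * freq_hz * t) for t in spike_times)
    return math.hypot(c, s) / len(spike_times)


def jittered_spikes(freq_hz, n_cycles, jitter_s, rng):
    """One spike per stimulus cycle, displaced by Gaussian timing jitter."""
    period = 1.0 / freq_hz
    return [k * period + rng.gauss(0.0, jitter_s) for k in range(n_cycles)]


def expected_vs(freq_hz, jitter_s):
    """Expected vector strength for Gaussian jitter: exp(-2 pi^2 f^2 sigma^2)."""
    return math.exp(-2 * (math.pi * freq_hz * jitter_s) ** 2)


if __name__ == "__main__":
    rng = random.Random(1)
    for f in (50.0, 200.0, 1000.0):
        spikes = jittered_spikes(f, 2000, 0.001, rng)   # 1 ms jitter (assumed)
        print(f, round(vector_strength(spikes, f), 3), round(expected_vs(f, 0.001), 3))
```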

The rate-based party line (e.g. Shadlen) has been moving to shorter and shorter
integration times, and now there is talk of "instantaneous rates" computed over
10 ms moving integration windows, rather than the tens or hundreds of
milliseconds of yesteryear. Stanley Klein's group, looking at vernier acuity
for slowly moving stimuli, have estimated that spike jitters of 1 msec would be
sufficient to account for human performances.
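
For what a moving integration window does to a spike train, here is a small
sketch. The burst times and the function name are made up for illustration; the
only point is the arithmetic of counting spikes in a window that ends at time t
and dividing by the window length.

```python
from bisect import bisect_right


def windowed_rate(sorted_spikes, t, window_s):
    """Spikes per second in the window (t - window_s, t]; input must be sorted."""
    lo = bisect_right(sorted_spikes, t - window_s)
    hi = bisect_right(sorted_spikes, t)
    return (hi - lo) / window_s


if __name__ == "__main__":
    # Hypothetical burst: five spikes, 2 ms apart, starting at 100 ms.
    burst = [0.100, 0.102, 0.104, 0.106, 0.108]
    for t in (0.109, 0.130):
        print(t, windowed_rate(burst, t, 0.010), windowed_rate(burst, t, 0.100))
```

With the 10 ms window the estimate is about 500 spikes/s just after the burst
and 0 spikes/s 20 ms later; with the 100 ms window it is about 50 spikes/s at
both times, so the timing of the burst within the window is lost.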

Finally, one does not need endbulbs to preserve timing. Endbulbs are nice for
preserving timing, but convergence of many small inputs (central limit theorem)
and/or well-timed inhibition can actually improve it. See work by Smith, Joris,
& Yin, J Neurophysiol 1998 Jun;79(6):3127-42 and earlier papers by Joris and
others on improvement of phase-locking through such processing.
Things are obviously different, timing-wise, at the cortex. While phase locking
to periodicities above 100-200 Hz is not at all obvious or strong in the vast
majority of cortical single units, distributions of first spike latencies can
be quite compact (100-200 usec), so this should provoke us to think about how
this could be if integration times are on the order of tens of milliseconds, as
is commonly assumed. So much of the work in the auditory CNS has been done
under anesthesia, which, among other things, smears out fine timing information.
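
The statistical side of the convergence argument above can be sketched in a few
lines. This is a caricature, not a neuron model: it assumes the output spike
time is simply the mean of N input arrival times, each with independent
Gaussian jitter of 1 ms, and it ignores inhibition, thresholds and synaptic
dynamics altogether. Under those assumptions the output jitter shrinks as
1/sqrt(N).

```python
import random
import statistics


def output_jitter(n_inputs, input_jitter_s, n_trials, rng):
    """SD of the output time when it is taken as the mean of the input times."""
    out_times = []
    for _ in range(n_trials):
        arrivals = [rng.gauss(0.0, input_jitter_s) for _ in range(n_inputs)]
        out_times.append(sum(arrivals) / n_inputs)
    return statistics.pstdev(out_times)


if __name__ == "__main__":
    rng = random.Random(2)
    for n in (1, 4, 16):
        jitter_us = 1e6 * output_jitter(n, 0.001, 4000, rng)
        print(n, "inputs ->", round(jitter_us), "usec output jitter")
```

With 16 converging inputs the 1 ms input jitter comes down to roughly 250 usec,
which is the sense in which convergence can improve timing rather than merely
preserve it.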

So don't restrict fine timing to the auditory brainstem just yet. There are many
unsolved mysteries that lie before us, and what becomes of fine timing
is one of them.

Good luck,

Aniruddh Patel wrote:

> Dear List,
> Can anyone recommend a reference which compares the processing of
> temporal information in audition vs. vision from a neural standpoint?
> It is known that the auditory system has neuroanatomical
> specializations which help preserve the precise timing of afferent input
> (e.g. endbulbs of Held on bushy cells in the cochlear nucleus), but how
> does this compare with the visual system?
> Thanks,
> Ani Patel
> --
> Aniruddh D. Patel
> The Neurosciences Institute
> 10640 John Jay Hopkins Drive
> San Diego, CA 92121
> Tel     858-626-2085
> Fax     858-626-2099
> Email   apatel@nsi.edu
> Website http://www.nsi.edu/users/patel