Re: audition vs. vision (Peter Cariani)

Subject: Re: audition vs. vision
From:    Peter Cariani  <peter(at)EPL.MEEI.HARVARD.EDU>
Date:    Thu, 15 Mar 2001 15:27:37 -0500

Hi Ani,

There are several lines of comparison. I don't know of any comprehensive reviews.

The first line of comparison is between spatial visual rate-based MTFs and auditory rate-based MTFs. Many people in vision have examined rate tuning curves for drifting gratings (e.g. see work from Dan Pollen's lab and De Valois' book on Spatial Vision) and find that the best modulation frequencies in visual cortical populations are in the 4-8 Hz range. This can be compared with similar best MTFs for AM tones in auditory cortex (work of Schreiner and Langner, and Shamma). The ripple stimuli that Shamma developed are the auditory rate-place analogues of drifting sinusoidal spatial gratings in vision (see J. Neurophysiol. 76(5), pp. 3503-3534). You should ask them which references are best. My recollection is that similar best modulation frequencies are seen for visual stimuli in visual cortex as for acoustic stimuli in auditory cortex, i.e. BMFs of 4-16 Hz. (One could think of this commonality as due to factors that are generic to cerebral cortex (organization, connectivity, mix of inhibition and excitation, recovery time courses of cortical pyramidal cells) or to common functional demands on both audition and vision (e.g. perhaps common perceptual integration times, if one thinks in terms of thalamocortical oscillations).)

Phase-locking in vision could conceivably determine the limits of visual acuity. Visual neurons in the LGN and V1 phase-lock to drifting spatial patterns, such that spatial intervals could be encoded through stimulus-driven, spatial patterns of temporal correlations between spikes. Thus, when an image drifts across the retina (the eyes are in constant drift, even during fixation), there will be a spatial pattern of temporally correlated spikes in different retinotopic channels. Where there are no edges, the spikes produced roughly follow a Poisson process, so that there is no spatial correlation.
When there are edges, spatial patterns of temporally correlated spikes appear. In this view, spatiotemporal correlations encode spatial form, while uncorrelated rates encode average luminance. Rather than the within-channel correlations between spikes that appear to encode frequency and periodicity in audition, perhaps the visual system uses correlations between spikes in different channels (i.e. more like binaural cross-correlation, but with many channels rather than just pairs of channels; or like Shamma's stereausis model or global cross-correlations).

On the theoretical and perceptual side, there are spatial autocorrelation theories of visual form perception (Uttal's model) that are related to temporal autocorrelation models for pitch, just as Reichardt's motion-detection model is related to the Jeffress model (i.e. computation of temporal disparities across different input channels). There is a visual, spatial-frequency equivalent of the "pitch of the missing fundamental" (see the De Valois book), and a temporal-disparity depth illusion (Pulfrich) in which time-of-arrival differences in the two monocular pathways lead to apparent depth. The Huggins and Bilsen phase-difference pitches are the auditory equivalent of Julesz random-dot stereograms (and I have wondered whether Bela Julesz might have gotten the idea from Huggins, who was also at Bell Labs in the 1950s; many ideas and techniques were first developed in audition and then used in vision). There is also a web demonstration of the creation of visual form from temporal structure (Lee & Blake, Science, 284, 1165-1168). These analogies all make sense if one maps spatial intervals to temporal ones. I recently wrote a minireview on temporal coding in different modalities that is available on the web; references can be found there.

It has been known for over half a century that visual neurons phase-lock to stimulus frequencies below flicker fusion, say < 50-100 Hz.
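To make the edge-coding idea above concrete, here is a small simulation sketch (mine, not from any of the cited papers; all names and parameters are illustrative). Two retinotopic channels straddling a drifting edge receive the same stimulus-locked event train at a fixed retinotopic lag, so their cross-correlogram peaks sharply; two channels viewing a blank region fire as independent Poisson processes and show no such peak.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dt = 2.0, 0.001            # 2 s of spiking in 1 ms bins
n = int(T / dt)
p = 50 * dt                   # ~50 spikes/s per channel

# No edge: two independent Poisson spike trains (no spatial correlation).
a_flat = rng.random(n) < p
b_flat = rng.random(n) < p

# Edge: both channels driven by the same stimulus-locked event train,
# channel b delayed by 5 ms (a hypothetical retinotopic drift lag).
events = rng.random(n) < p
a_edge = events.copy()
b_edge = np.roll(events, 5)

def peak_xcorr(x, y, max_lag=20):
    """Peak normalized cross-correlation over lags of +/- max_lag bins."""
    x = x.astype(float) - x.mean()
    y = y.astype(float) - y.mean()
    return max(np.dot(x, np.roll(y, k)) for k in range(-max_lag, max_lag + 1)) \
        / (x.std() * y.std() * len(x))

# The edge pair peaks near 1; the independent pair stays near 0.
print("no edge:", round(peak_xcorr(a_flat, b_flat), 3))
print("edge   :", round(peak_xcorr(a_edge, b_edge), 3))
```

The "edge detector" here is just the peak of the pairwise cross-correlogram; a multi-channel version of the same computation is what the stereausis-like schemes mentioned above would generalize.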
Bialek & company showed that there is stimulus-related information in spike times in fly vision at time resolutions below 1 msec (I have heard 100-200 usec, which is comparable to the latency jitter of first spikes in the auditory nerve and auditory cortex; Phillips, 1988, Hearing Res, 40, 137-146; Heil, c. 1997, J Neurophysiol, 78, 2438-2454). Other workers (Reinagel & Reid; Jonathan Victor's group) have been finding spike precisions of 5-10 ms down to 1 ms, at the limits of their current methods, in mammalian systems (thalamus, cortex). The rate-based party line (e.g. Shadlen) has been moving to shorter and shorter integration times, and now there is talk of "instantaneous rates" computed over 10 ms moving integration windows, rather than the tens or hundreds of milliseconds of yesteryear. Stanley Klein's group, looking at vernier acuity for slowly moving stimuli, has estimated that spike jitters of 1 msec would be sufficient to account for human performance.

Finally, one does not need endbulbs to preserve timing. Endbulbs are nice for preserving timing, but convergence of many small inputs (Central Limit Theorem) and/or well-timed inhibition can actually improve it. See Smith, Joris, & Yin, J Neurophysiol 1998 Jun;79(6):3127-42, and earlier papers by Joris and others on the improvement of phase-locking through such processing.

Things are obviously different, timing-wise, at the cortex. While phase-locking to periodicities above 100-200 Hz is not at all obvious or strong in the vast majority of cortical single units, distributions of first-spike latencies can be quite compact (100-200 usec), so this should provoke us to think about how that could be if integration times are on the order of tens of milliseconds, as is commonly assumed. Much of the work in the auditory CNS has been done under anesthesia, which, among other things, smears out fine timing information. So don't restrict fine timing to the auditory brainstem just yet.
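The Central Limit Theorem point about convergence can be sketched numerically (my illustration, with a deliberately simplified readout: the postsynaptic cell is assumed to fire at the mean arrival time of its converging inputs). If each of N inputs arrives with 1 ms of independent jitter, the output jitter shrinks roughly as 1/sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0        # std of each input's spike-time jitter, in ms
trials = 20000

jitters = {}
for n_inputs in (1, 4, 16, 64):
    # Each trial is one volley: n_inputs spikes, independently jittered.
    volleys = rng.normal(0.0, sigma, size=(trials, n_inputs))
    # Simplified readout: the cell fires at the mean arrival time.
    jitters[n_inputs] = volleys.mean(axis=1).std()
    print(f"{n_inputs:3d} inputs -> output jitter ~{jitters[n_inputs]:.3f} ms")
```

So 16 converging 1-ms-jitter inputs already give ~0.25 ms output jitter, which is the scale of the first-spike latency precisions mentioned above. Real coincidence detection is of course thresholded rather than an exact mean, but the sqrt(N) scaling is the essential point.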
There are many unsolved mysteries that lie before us, and what becomes of fine timing information is one of them.

Good luck,
Peter

Aniruddh Patel wrote:
> Dear List,
>
> Can anyone recommend a reference which compares the processing of
> temporal information in audition vs. vision from a neural standpoint?
>
> It is known that the auditory system has neuroanatomical
> specializations which help preserve the precise timing of afferent input
> (e.g. endbulbs of Held on bushy cells in the cochlear nucleus), but how
> does this compare with the visual system?
>
> Thanks,
> Ani Patel
>
> --
> Aniruddh D. Patel
> The Neurosciences Institute
> 10640 John Jay Hopkins Drive
> San Diego, CA 92121
>
> Tel 858-626-2085
> Fax 858-626-2099
> Email apatel(at)
> Website

This message came from the mail archive
maintained by:
DAn Ellis <>
Electrical Engineering Dept., Columbia University