audition vs. vision: responses (Aniruddh Patel)

Subject: audition vs. vision: responses
From:    Aniruddh Patel  <apatel(at)NSI.EDU>
Date:    Wed, 21 Mar 2001 14:18:03 -0800

Dear List,

Several people requested I post these responses to my query about audition vs. vision. Many thanks to all who replied.

Ani Patel

----------------------------------------------------------------------------------
ORIGINAL QUERY:

Can anyone recommend a reference which compares the processing of temporal information in audition vs. vision from a neural standpoint? It is known that the auditory system has neuroanatomical specializations which help preserve the precise timing of afferent input (e.g. endbulbs of Held on bushy cells in the cochlear nucleus), but how does this compare with the visual system?

----------------------------------------------------------------------------------
REPLY #1 (Jont Allen)

I made a lame shot at this, which is published:

@inproceedings{Allen00a
,author={Allen, J.B.}
,title={The intensity JND comes from Poisson neural noise: Implications for image coding}
,booktitle={Human Vision and Electronic Imaging V}
,editor={Rogowitz, B.E. and Pappas, T.N.}
,publisher={Proc. of SPIE}
,volume=3959
,month=jan
,address={PO Box 10, Bellingham, Washington 98227-0010}
,year={2000}
,pages={222--233}
,note_={}
}

It is at and is called Vision.pdf

I am not sure it is what you are looking for, but it is in the ball park.

----------------------------------------------------------------------------------
REPLY #2 (Israel Nelken)

There are at least two time scales at which you can compare vision and audition. I don't think the sub-ms scale of audition really has an equivalent in vision, but the longer time scales, of tens of ms, might. As an example: the binaural system (which requires the sub-ms scale, and is extremely specialized) also has a long time constant. It is expressed in the so-called sluggishness of the binaural system, which cannot follow very fast changes in ITD (faster than on the order of 10 Hz). I guess the sluggishness has a parallel in vision.
More specifically: Hermann Wagner had a paper in Nature with Barry Frost on barn owl stereo vision, where they used methods taken from auditory research. Hermann has written a lot on auditory motion, which requires temporal processing, and he finds (at least in the barn owl) quite a lot that is similar to visual motion detectors.

----------------------------------------------------------------------------------
REPLY #3 (Franck Ramus)

I'm not sure how relevant this is to your question, but part of the literature on dyslexia focuses on impairments of temporal processing in both the auditory and the visual modality. The most explicit paper about this is probably:

Stein, J., & Talcott, J. B. (1999). Impaired neuronal timing in developmental dyslexia: The magnocellular hypothesis. Dyslexia, 5, 59-77.

Perhaps you will find it hard to find; ask me for a copy then. A less explicit version is:

Stein, J., & Walsh, V. (1997). To see but not to read; the magnocellular theory of dyslexia. Trends Neurosci., 20(4), 147-152.

A lot of the story relies on work by Galaburda emphasising the existence of analogous magno/parvocellular pathways in the auditory and the visual system, and finding abnormalities on the magno side of both modalities in certain dyslexics:

Galaburda, A. M., Menard, M. T., & Rosen, G. D. (1994). Evidence for aberrant auditory anatomy in developmental dyslexia. Proc. Natl. Acad. Sci. U.S.A., 91(17), 8010-8013.

----------------------------------------------------------------------------------
REPLY #4 (Eckard Blumschein)

I just vaguely remember a paper on temporal signal processing in vision. You might look into Brain Research 2000 (online via Neuroscion).

----------------------------------------------------------------------------------
REPLY #5 (Pascal Belin)

You should check this excellent (but old) review:

Hammond, G R
Year: 1982
Title: Hemispheric differences in temporal resolution
Journal: Brain Cog.
Volume: 1
Pages: 95-118

----------------------------------------------------------------------------------
REPLY #6 (Peter Cariani)

There are several lines of comparison. I don't know of any comprehensive reviews.

The first line of comparison is between spatial visual rate-based MTF's and auditory rate-based MTF's. Many people in vision have examined rate tuning curves to drifting gratings (e.g. see work from Dan Pollen's lab and DeValois' book on Spatial Vision) and find that the best modulation frequencies in visual cortical populations are in the 4-8 Hz range. This can be compared with similar best MTF's for AM tones in auditory cortex (work of Schreiner and Langner, and Shamma). The ripple stimuli that Shamma developed are the auditory rate-place analogues of drifting sinusoidal spatial gratings in vision (see J. Neurophysiol. 76(5) pp 3503-3534). You should ask them which references are best. My recollection is that similar best modulation frequencies are seen for visual stimuli in visual cortex as for acoustic stimuli in auditory cortex, i.e. BMFs of 4-16 Hz. (One could think of this commonality as having to do with factors that are generic to cerebral cortex (organization, connectivity, mix of inhibition and excitation, recovery time courses of cortical pyramidal cells) or as due to common functional demands for both audition and vision (e.g. perhaps common perceptual integration times, if one thinks in terms of thalamocortical oscillations).)

Phase-locking in vision could conceivably determine the limits of visual acuity. Visual neurons in LGN and V1 phase-lock to drifting spatial patterns, such that spatial intervals could be encoded through stimulus-driven, spatial patterns of temporal correlations between spikes. Thus, when an image is drifted across a retina (the eyes are in constant drift, even during fixation), there will be a spatial pattern of temporally-correlated spikes in different retinotopic channels.
When there are no edges, the spikes produced roughly follow a Poisson process, so that there is no spatial correlation. When there are edges, there then appear spatial patterns of temporally-correlated spikes. In this view, spatiotemporal correlations encode spatial form, while uncorrelated rates encode average luminance. Rather than the correlations between spikes in the same neural channels that appear to encode frequency and periodicity in audition, perhaps the visual system uses correlations between spikes in different channels (i.e. more like binaural cross-correlation, with many channels, not just pairs of channels; or like Shamma's stereausis model or global cross-correlations).

On the theoretical and perceptual side, there are spatial autocorrelation theories of visual form perception (Uttal's model) that are related to temporal autocorrelation models for pitch, just as Reichardt's motion detection model is related to the Jeffress model (i.e. computation of temporal disparities across different input channels). There is a visual, spatial-frequency equivalent of the "pitch of the missing fundamental" (see the DeValois book), and a temporal disparity depth illusion (Pulfrich) in which time-of-arrival differences in the two monocular pathways lead to apparent depth. The Huggins and Bilsen phase-difference pitches are the auditory equivalent of Julesz random-dot stereograms (and I have wondered whether Bela Julesz might have gotten the idea from Huggins, who was also at Bell Labs in the 1950's -- many ideas and techniques have first been developed in audition and then used in vision). There is also a web demonstration of the creation of visual form from temporal structure (Lee & Blake, Science, 284, 1165-1168); they have demos at

These analogies all make sense if one maps spatial intervals to temporal ones. I recently wrote a minireview on temporal coding in different modalities that is available at
References can be found there.
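The idea of reading out temporal disparities across input channels (as in the Jeffress and Reichardt schemes mentioned above) can be sketched as a delay-and-coincidence count. This is only a toy illustration, not a model taken from any of the cited papers: the function names, the 50 spikes/s rate, the 1 ms bins and the 3 ms delay are all assumptions chosen for the example.

```python
import random

def poisson_train(rate_hz, duration_s, dt_s, rng):
    """Binary spike train: one Bernoulli draw per bin (discrete Poisson approximation)."""
    p_spike = rate_hz * dt_s
    n_bins = int(round(duration_s / dt_s))
    return [1 if rng.random() < p_spike else 0 for _ in range(n_bins)]

def delayed(train, lag_bins):
    """Copy of `train` shifted later in time by lag_bins (hypothetical second channel)."""
    return [0] * lag_bins + train[:len(train) - lag_bins]

def coincidences(a, b, lag_bins):
    """Count spikes in `a` that are matched by a spike in `b` lag_bins later."""
    return sum(x & y for x, y in zip(a, b[lag_bins:]))

def best_lag(a, b, max_lag_bins):
    """Lag (in bins) whose coincidence detector collects the most coincidences."""
    return max(range(max_lag_bins + 1), key=lambda k: coincidences(a, b, k))

if __name__ == "__main__":
    rng = random.Random(1)                      # fixed seed so the run is repeatable
    chan_a = poisson_train(50.0, 2.0, 0.001, rng)
    chan_b = delayed(chan_a, 3)                 # same events, arriving 3 ms later
    chan_c = poisson_train(50.0, 2.0, 0.001, rng)  # unrelated channel
    print("estimated lag (bins):", best_lag(chan_a, chan_b, 10))
    print("coincidences, correlated pair:", coincidences(chan_a, chan_b, 3))
    print("coincidences, independent pair:", coincidences(chan_a, chan_c, 0))
```

Two independent Poisson channels produce only chance coincidences, whereas channels driven by the same events produce a sharp peak at the lag that matches their time-of-arrival difference.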
It has been known for over half a century that visual neurons phase-lock to stimulus frequencies below flicker-fusion, say < 50-100 Hz. Bialek & company showed that there is stimulus-related information in spike times in fly vision at time resolutions below 1 msec (I have heard 100-200 usec), which is comparable to latency-jitters of first spikes in the auditory nerve and auditory cortex (Phillips, 1988, Hearing Res, 40, 137-146; Heil, c. 1997, J Neurophysiol, 78, 2438-2454). Other workers (Reinagel & Reid; Jonathan Victor's group) have been finding spike precisions of 5-10 ms down to 1 ms, at the limits of their current methods, in mammalian systems (thalamus, cortex). The rate-based party line (e.g. Shadlen) has been moving to shorter and shorter integration times, and now there is talk of "instantaneous rates" computed over 10 ms moving integration windows, rather than the tens or hundreds of milliseconds of yesteryear. Stanley Klein's group, looking at vernier acuity for slowly moving stimuli, have estimated that spike jitters of 1 msec would be sufficient to account for human performance.

Finally, one does not need endbulbs to preserve timing. Endbulbs are nice for preserving timing, but convergence of many small inputs (Central Limit theorem) and/or well-timed inhibition can actually improve it. See work by Smith, Joris, & Yin, J Neurophysiol 1998 Jun;79(6):3127-42 and earlier papers by Joris and others on improvement of phase-locking through such processing.

Things are obviously different, timing-wise, at the cortex. While phase locking to periodicities above 100-200 Hz is not at all obvious or strong in the vast majority of cortical single units, distributions of first spike latencies can be quite compact (100-200 usec), so this should provoke us to think about how this could be if integration times are on the order of tens of milliseconds, as is commonly assumed.
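As a rough illustration of the convergence point above: if an output spike time is approximated as the mean arrival time of N independently jittered inputs, its jitter should fall roughly as 1/sqrt(N). The sketch below makes exactly that simplifying assumption (Gaussian jitter, output time = mean of the arrivals); it is not the mechanism analysed by Smith, Joris, & Yin, and the function name and parameter values are hypothetical.

```python
import random
import statistics

def output_jitter(n_inputs, input_jitter_ms, n_trials=2000, seed=0):
    """Std. dev. (ms) of the output spike time across trials, when the output
    time is modelled as the mean arrival time of n_inputs inputs, each with
    independent Gaussian jitter of input_jitter_ms (an assumed toy model)."""
    rng = random.Random(seed)
    out_times = []
    for _ in range(n_trials):
        arrivals = [rng.gauss(0.0, input_jitter_ms) for _ in range(n_inputs)]
        out_times.append(statistics.fmean(arrivals))
    return statistics.pstdev(out_times)

if __name__ == "__main__":
    # Each input has 1 ms jitter; watch the output jitter shrink with convergence.
    for n in (1, 4, 16, 64):
        print(n, "inputs ->", round(output_jitter(n, 1.0), 3), "ms")
```

Under these assumptions, 16 converging inputs with 1 ms jitter each give an output jitter near 0.25 ms, which is one way that timing could be sharper after a synapse than before it.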
So much of the work in the auditory CNS has been done under anesthesia, which, among other things, smears out fine timing information. So don't restrict fine timing to the auditory brainstem just yet. There are many unsolved mysteries that lie before us, and what becomes of fine timing information is one of them.

----------------------------------------------------------------------------------
END OF REPLIES

--
Aniruddh D. Patel
The Neurosciences Institute
10640 John Jay Hopkins Drive
San Diego, CA 92121
Tel 858-626-2085
Fax 858-626-2099
Email apatel(at)
Website

This message came from the mail archive
maintained by:
DAn Ellis <>
Electrical Engineering Dept., Columbia University