Re: Granular synthesis and auditory segmentation (Peter Cariani)


Subject: Re: Granular synthesis and auditory segmentation
From:    Peter Cariani  <peter(at)epl.meei.harvard.edu>
Date:    Thu, 15 Oct 1998 16:44:36 +0100

While I agree with much of what Bates and Fabbri are saying regarding threshold crossing times (essentially I believe that frequency information is largely conveyed through an interspike interval code operating over the entire auditory nerve array -- i.e. all-order interval statistics of tens of thousands of auditory nerve fibers that constitute an autocorrelation-like representation of the stimulus), some common misconceptions have been expressed about how auditory nerve fibers respond to sounds. Many general neuroscience textbooks are absolutely terrible on these issues.

Meijer:
>I don't know about your neurons, but mine completely fail
>to replenish their synapses above about 1 or 2 kHz even
>after plenty of coffee.

Fabbri:
>... Actually many physiology books discuss the refractory period (the
>ability to replenish chemical balance) of cochlear neurons as operating
>to about 5 "KHz". ... A detail readily confirmed by the literature. ...
>Given that most telephony systems run 300 "Hz" to 3 "KHz", the 5 "KHz"
>refractory period does well in practical situations!

Relative refractory periods of auditory nerve fibers in the cat are generally on the order of 1-10 msec or more, with most having mean refractory periods around 4 or 5 msec. This can be deduced from the maximum sustained driven rates that are seen in the auditory nerve (generally a few hundred Hz). I think this is what the discussion of synaptic replenishment was alluding to. Minimum refractory periods can be as short as 750 usec, but one sees these as occasional spike doublets rather than sustained bursts or stimulus-entrained responses. While ANFs do not fire once every cycle for components above a few hundred Hz, they do phase-lock to stimulus components (at least) up to 4-5 kHz. Because there is synaptic jitter, one would have to look at longer and longer spike trains in order to detect significant phase-locking at higher frequencies, if it is indeed there.
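The point about needing longer and longer spike trains can be illustrated with the standard Goldberg & Brown vector-strength calculation and the Rayleigh test. This is only a sketch with made-up firing and jitter parameters (4 kHz component, 80 usec Gaussian jitter, firing on every 7th cycle), not real ANF data:

```python
import math
import random

random.seed(1)

def vector_strength(spike_times, freq):
    # Goldberg & Brown vector strength: 1 = perfect phase-locking, 0 = none
    c = sum(math.cos(2 * math.pi * freq * t) for t in spike_times)
    s = sum(math.sin(2 * math.pi * freq * t) for t in spike_times)
    return math.hypot(c, s) / len(spike_times)

def jittered_spikes(freq, n_spikes, jitter_sd):
    # spikes locked to cycle peaks but smeared by (hypothetical) synaptic
    # jitter; the fiber fires only on every 7th cycle, i.e. at a rate far
    # below the stimulus frequency
    period = 1.0 / freq
    return [k * 7 * period + random.gauss(0.0, jitter_sd) for k in range(n_spikes)]

freq = 4000.0      # 4 kHz component: 250 us period
jitter = 80e-6     # 80 us jitter -- large relative to the period, so locking is weak
for n in (50, 5000):
    r = vector_strength(jittered_spikes(freq, n, jitter), freq)
    rayleigh = 2 * n * r * r   # Rayleigh statistic; > ~13.8 means p < 0.001
    print(f"{n:5d} spikes: vector strength {r:.3f}, Rayleigh {rayleigh:.1f}")
```

With only 50 spikes the weak residual locking at 4 kHz is typically hard to distinguish from chance; with thousands of spikes the same vector strength gives an unambiguous Rayleigh statistic.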
A few people have looked, but there is some dispute over methodological details. I think there is also a question of whether anesthesia might affect phase-locking at higher frequencies (at other levels of the auditory system, barbiturate anesthetics appear to reduce the upper frequencies for which there is timing information, by whatever criterion, by about half).

The "volley principle" was proposed in the 1930's as a means by which ANFs collectively could temporally encode stimulus periodicities up to several kHz, despite the fact that no one ANF fires at that high a rate. There were some initial errors in the way the volley principle was conceptualized: it was thought that each ANF would fire deterministically once every Nth cycle, whereas it was later found that whether an ANF fires on any given cycle is stochastic rather than deterministic (e.g. a Poisson process with deadtime).

There are some far-reaching implications of phase-locking of responses that should be drawn out and discussed. These apply not only to the auditory system, but also to other sensory systems whose receptors follow the stimulus waveform (mechanoception, for flutter-vibration; vision, for drifting images). First, to the extent that there is phase-locking, the fine temporal structure of the stimulus is represented in the times-of-arrival of spikes. This means that the phase structure of the stimulus is contained in the times-of-arrival of spikes in the ANF array (albeit with different cochlear delays and absolute spike latencies for different ANFs). Second, to the extent that there is phase-locking, all-order interval distributions will roughly reflect the autocorrelation function of the stimulus. For each CF channel in the auditory nerve array, this is the autocorrelation function of the stimulus after it has undergone cochlear filtering.
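A minimal simulation of the corrected volley picture -- phase-locked but stochastic firing, a Poisson-like process with deadtime. All parameters here are illustrative, and synaptic jitter is omitted so that spikes fall exactly on stimulus cycles:

```python
import numpy as np

rng = np.random.default_rng(0)

F0 = 1000.0        # 1 kHz component -- faster than any single fiber can follow
CYCLE = 1.0 / F0
N_FIBERS = 50
N_CYCLES = 2000    # 2 seconds of stimulus
P_FIRE = 0.1       # chance of firing on any given cycle (stochastic, not every Nth)
DEADTIME = 0.004   # ~4 ms refractory period keeps rates near 100 spikes/s

def fiber_spikes():
    spikes, last = [], -np.inf
    for k in range(N_CYCLES):
        t = k * CYCLE
        if t - last >= DEADTIME and rng.random() < P_FIRE:
            spikes.append(t)
            last = t
    return np.array(spikes)

fibers = [fiber_spikes() for _ in range(N_FIBERS)]
mean_rate = np.mean([len(s) / (N_CYCLES * CYCLE) for s in fibers])
print(f"mean fiber rate: {mean_rate:.0f} spikes/s vs {F0:.0f} Hz stimulus")

# pool all-order interspike intervals (within fibers) across the array
intervals = np.concatenate(
    [(s[None, :] - s[:, None])[np.triu_indices(len(s), 1)] for s in fibers])
intervals = intervals[intervals < 0.02]   # lags up to 20 ms

# every pooled interval is an integer multiple of the 1 ms stimulus period,
# so the population interval distribution still carries the 1 kHz periodicity
on_grid = np.abs(intervals * F0 - np.rint(intervals * F0)) < 1e-6
peak_ms = int(np.argmax(np.bincount(np.rint(intervals * 1000).astype(int))[1:]) + 1)
print(f"on-period fraction: {on_grid.mean():.2f}, modal interval: {peak_ms} ms")
```

No single fiber fires anywhere near 1 kHz, yet the all-order interval distribution over the array is confined to multiples of the stimulus period, which is the collective temporal code at issue.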
When the autocorrelations of many overlapping frequency channels are combined, the form of the summary autocorrelation resembles that of the unfiltered stimulus (this is the autocorrelational equivalent of summing all of the individual components in a power spectrum). We see a rough resemblance between the forms of all-order interval distributions taken over populations of ANFs with many different CFs and those of the stimulus autocorrelation function, i.e. the patterns of major and minor peaks and their locations, for periodicities below a few kHz.

[I say "roughly reflect" because half-wave rectification does subtly alter the form of interspike interval distributions and autocorrelation functions. The autocorrelation of a sinusoidal component has the form of a cosine of the same frequency; the autocorrelation of a half-wave rectified component has flattened troughs, but closely resembles that of the unrectified component near the peaks. The positions of peaks and troughs are unchanged by half-wave rectification. When you listen to a half-wave rectified pure tone, it does sound very slightly different in timbre -- slightly buzzy -- but the pitch is the same, so the pure tone and its rectified counterpart don't sound all that different. I think Didier is right, however, in insisting on factoring in the effects of cochlear filtering first: there are simple thought experiments with waveforms having positive-negative asymmetries which nevertheless sound the same when their polarities are reversed. If we only listened to the positive parts of asymmetric waveforms, different polarities should sound different.]

To the extent that there is phase-locking, then, the time-of-arrival patterns carry the phase structure of the stimulus, while the interval patterns created by those times-of-arrival carry a representation of stimulus spectral structure via an autocorrelation-like temporal representation.
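The claim that half-wave rectification flattens the troughs of the autocorrelation but leaves the peak positions (and hence the periodicity estimate) unchanged is easy to check numerically; a sketch with an arbitrary 440 Hz tone:

```python
import numpy as np

fs = 20000                      # arbitrary sample rate
t = np.arange(int(0.1 * fs)) / fs
f0 = 440.0
x = np.sin(2 * np.pi * f0 * t)
x_hw = np.maximum(x, 0.0)       # half-wave rectified version

def norm_autocorr(sig):
    sig = sig - sig.mean()      # rectification introduces a DC offset; remove it
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    return ac / ac[0]

lo, hi = int(0.001 * fs), int(0.01 * fs)   # search lags of 1-10 ms,
                                           # skipping the trivial peak at zero lag
ac, ac_hw = norm_autocorr(x), norm_autocorr(x_hw)
peak = lo + np.argmax(ac[lo:hi])
peak_hw = lo + np.argmax(ac_hw[lo:hi])
print(f"peak lag: {peak / fs * 1000:.2f} ms vs {peak_hw / fs * 1000:.2f} ms (rectified)")
print(f"deepest trough: {ac[lo:hi].min():.2f} vs {ac_hw[lo:hi].min():.2f} (rectified)")
```

Both autocorrelations peak at the same lag, near the 1/440 s (about 2.27 ms) period, while the rectified signal's troughs are visibly shallower.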
Common, repeated time-of-arrival (phase) structure is a very basic and powerful heuristic for perceptual grouping that does not require channel selection or segregation. So it's possible to use phase in building up fused auditory objects, but then to use other aspects of the pattern, such as interspike interval statistics (which are phase-deaf), to encode the attributes of auditory objects (pitch, timbre, rhythm). There is a very interesting paper by Kubovy in Kubovy & Pomerantz, eds. (1981, Perceptual Organization, LEA) on phase effects and auditory grouping.

One of the big advantages of using spike timing is the precision of spike arrival times (clearly the 10-20 usec timing jnd's that are seen for binaural ITDs are due to phase-locked spike precisions, although it is not clear to me concretely how central ITD analyzers actually achieve those precisions; jnd's for pure tone frequencies near 1 kHz similarly correspond to temporal differences on the order of 20 usec). Other advantages are the extremely invariant interval patterns that are produced in the face of large changes in level, background noise, location in auditory space, and so on. This makes the problem of centrally representing auditory form invariances much simpler (one does not require compensatory mechanisms to adjust for level or s/n or locational differences or system nonlinearities).

If one takes these mechanisms seriously, then the things to look at are the (running) autocorrelations of various signals of interest (speech, music, environmental sounds, etc.). Running autocorrelations are expansions of the signal (e.g. RAC(t, tau) = X(t)*X(t-tau)) that require no contiguous temporal analysis window (as a spectrogram does), and hence can have as fine a time resolution as the time series that is the waveform being analyzed. It's a way of getting "instantaneous periodicity" (an oxymoron).

Peter Cariani

Peter Cariani, Ph.D.
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St, Boston MA 02114
tel (617) 573-4243, FAX (617) 720-4408
email peter(at)epl.meei.harvard.edu

McGill is running a new version of LISTSERV (1.8d on Windows NT). Information is available on the WEB at http://www.mcgill.ca/cc/listserv
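A minimal numpy sketch of the running autocorrelation RAC(t, tau) = X(t)*X(t-tau) described above -- windowless, so its time axis keeps the full resolution of the waveform itself. The 200 Hz test tone and the plain time-average used to summarize it are assumptions for illustration:

```python
import numpy as np

fs = 8000
t = np.arange(int(0.05 * fs)) / fs
x = np.sin(2 * np.pi * 200.0 * t)     # 200 Hz tone: 5 ms period

def running_autocorr(x, max_lag):
    # RAC(t, tau) = x(t) * x(t - tau): defined sample by sample, with no
    # contiguous analysis window, unlike a spectrogram frame
    n = len(x)
    rac = np.zeros((max_lag, n))
    for tau in range(max_lag):
        rac[tau, tau:] = x[tau:] * x[:n - tau]
    return rac

max_lag = int(0.01 * fs)              # lags up to 10 ms
rac = running_autocorr(x, max_lag)    # shape (lags, time): full time resolution

# averaging RAC over time recovers the ordinary autocorrelation;
# its peak sits at the 5 ms period of the tone
summary = rac[:, max_lag:].mean(axis=1)   # skip the start-up region
peak_lag = int(np.argmax(summary[1:]) + 1)
print(f"summary peak: {peak_lag / fs * 1000:.2f} ms")
```

Replacing the plain mean with a leaky exponential average over t would give a genuinely "running" summary while preserving the fine time resolution that motivates the representation.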


This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University