Re: purely spectral pitch (Peter Cariani)

Subject: Re: purely spectral pitch
From:    Peter Cariani  <peter(at)>
Date:    Mon, 30 Oct 2000 14:46:21 -0400

Hi John,

I'm not clear on how these psychophysical findings bear on the nature of the neural representations and processing mechanisms involved. Part of my original point was that we need to be as clear as possible about whether we are talking about inferences made from psychoacoustical data, from neural data, or from neurocomputational models. We also need to lay out, as self-consciously as possible, the assumptions that we use when we interpret data of one type to make inferences about another.

> There are two lines of evidence that suggest that the auditory system
> does not process such time structure [the temporal structure of spike
> coincidences and anticoincidences that are hypothesized to be present
> in the outputs of binaural cross-correlators].

(Brackets mine, pac.)

> Krumbholz and Patterson (1999, 2000) showed that the lowest discernible
> pitch for complex dichotic pitches and for complex tones unmasked by
> the binaural system is about an octave higher than the lowest pitch of
> stimuli that also provide temporal cues. This increase in the lower
> limit of pitch is to be expected from a mechanism that relies on
> spectral decomposition by the cochlea, because the ERB never gets
> below about 30 Hz and components less than 60 Hz or so apart cannot
> be resolved at any frequency.

Could you unpack this a bit? I haven't had a chance to read Krumbholz & Patterson yet, but the argument you present seems to assume that the dichotic pitch mechanism is based on some kind of frequency-domain, spectral representation because there is some correspondence with the behavior of ERBs, and these are conventionally defined in the frequency domain. We should bear in mind that these are psychophysical data that reflect the capacities of the whole system, and that inferences about the nature of the underlying neural processing are anything but straightforward.
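As a rough numerical check on the ERB figures in the quoted passage, here is a minimal sketch. It assumes the Glasberg and Moore (1990) approximation, ERB = 24.7 (4.37 F/1000 + 1) Hz with F in Hz, which is not cited in this message; the function name erb_hz is purely illustrative.

```python
def erb_hz(center_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at a given center frequency.

    Assumption: the Glasberg & Moore (1990) approximation, which the
    original message does not itself cite.
    """
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Tabulate the ERB at a few low and mid center frequencies.
    for f in (50, 100, 250, 500, 1000):
        print(f"{f:5d} Hz -> ERB of roughly {erb_hz(f):6.1f} Hz")
```

Under this approximation the bandwidth has a floor of a few tens of Hz at the lowest center frequencies, which is the behavior the quoted argument relies on. The formula describes psychophysical filter bandwidths only and says nothing about whether the underlying neural code is spectral or temporal.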
In neurocomputational terms, if one considers neural processing mechanisms that have a temporal contiguity window of 20-30 msec (e.g. coincidence arrays with a limit on the maximum relative delay), then the lowest periodicities that can be represented/distinguished are around 30 Hz, and inputs that arrive outside of the temporal contiguity window will be processed independently (and therefore not summed together). There could be different temporal contiguity/temporal integration windows for monaural and binaural processing, depending on the ranges of response latencies available to stations in the different pathways.

This idea of a temporal contiguity constraint for patterns is consistent with the spectral shape integration windows of Chistovich (1985, JASA) (fusion of 2 temporally-offset single-formant vowels into one 2-formant vowel percept), of Hall (low pitches produced by non-temporally overlapping harmonics), and of Turgeon & Bregman (fusion/masking experiments using temporally-offset flankers). On the other hand, the integration windows for loudness summation are much longer, so we shouldn't assume that all processing involves the same temporal constraints.

In any case, it seems to me that the argument rests on the assumption that psychophysically-observed ERBs must necessarily be explained in terms of purely spectral mechanisms (mainly because these are the terms in which current critical band models are cast). But I don't see anything that logically rules out a temporal neural mechanism for these phenomena. (Moore discussed some of these ideas in his textbook An Intro. to the Psychology of Hearing, 3rd ed.) A temporal model could be formulated that takes into account cochlear filtering and rate-level functions (the bandwidths of the filters and the rate-level functions affect the temporal patterns and rates of discharge that determine the shapes of population-wide interval distributions).
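The temporal contiguity argument above (a 20-30 msec window implying a lower limit of a few tens of Hz) can be sketched numerically. This is only a toy illustration under stated assumptions: idealized pulse trains stand in for spike trains, an all-order interval count limited to a maximum delay stands in for a coincidence array, and every name below is hypothetical rather than taken from any published model.

```python
from collections import Counter


def lowest_periodicity_hz(max_delay_ms: float) -> float:
    """Lowest periodicity whose full period still fits inside the window."""
    return 1000.0 / max_delay_ms


def pulse_train(period_ms: float, duration_ms: float) -> list:
    """Idealized spike times (ms): one spike per stimulus period."""
    return [k * period_ms for k in range(int(duration_ms // period_ms) + 1)]


def windowed_intervals(spike_times_ms, max_delay_ms: float) -> Counter:
    """Count all-order interspike intervals no longer than max_delay_ms.

    Intervals longer than the window are never paired, mimicking inputs
    that fall outside the contiguity window and are processed independently.
    """
    counts = Counter()
    times = sorted(spike_times_ms)
    for i, t0 in enumerate(times):
        for t1 in times[i + 1:]:
            interval = t1 - t0
            if interval > max_delay_ms:
                break  # times are sorted, so later intervals are longer still
            counts[round(interval)] += 1  # bin to the nearest millisecond
    return counts


if __name__ == "__main__":
    window_ms = 30.0
    print(f"lower limit for a {window_ms} ms window: "
          f"{lowest_periodicity_hz(window_ms):.1f} Hz")
    # A 100 Hz train (10 ms period) leaves intervals inside the window;
    # a 25 Hz train (40 ms period) leaves none.
    print("100 Hz:", dict(windowed_intervals(pulse_train(10.0, 200.0), window_ms)))
    print(" 25 Hz:", dict(windowed_intervals(pulse_train(40.0, 200.0), window_ms)))
```

With a 30 msec window the limit works out to about 33 Hz, and with a 20 msec window to 50 Hz. In this sketch a periodicity below the limit produces no intervals at all inside the window, so a purely temporal mechanism with a bounded delay range would also show a lower limit of pitch.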
The main difference between a temporal account and a rate-place account would lie not in the neural response properties per se (these are what they are), but in which aspects of the neural responses are analyzed by the central auditory system. Central use of temporal information is one way to bridge the gap that separates the coarseness of cochlear filters from the fineness of auditory perception (lateral inhibition is another).

> Krumbholz and Patterson (2000) and Culling and Colburn (2000) have found
> that binaural sluggishness applies to detection of temporal modulation of
> signals unmasked by the binaural system. In other words, it detects only
> the grossest of temporal structures.

Slow temporal modulation of a pitch is different from the representation of the pitch itself, and there may be different mechanisms that integrate auditory images on different timescales, such that fine distinctions can be made when the stimulus is stationary but changes in periodicity and/or location of the stimulus are registered more slowly. (That we can distinguish ITD differences on the order of tens of microseconds doesn't mean we must register changes in ITDs that fast.) So I don't quite see that binaural sluggishness necessarily implies only a very coarse temporal analysis in all processing domains and at all levels of the system.

-- Peter Cariani

This message came from the mail archive
maintained by:
Dan Ellis
Electrical Engineering Dept., Columbia University