
Re: purely spectral pitch

Hi John,

I'm not clear on how these psychophysical findings
bear on the nature of the neural representations
and processing mechanisms involved. Part of my
original point was that we need to be as clear as possible
about whether we are talking about inferences made
from psychoacoustical data, from neural data,
or from neurocomputational models. We also need to
lay out, as self-consciously as possible, the assumptions that we
use when interpreting data of one type to make inferences about
the others.

> There are two lines of evidence that suggest that the auditory system
> does not process such time structure [the temporal structure of
> spike coincidences and anticoincidences that are hypothesized to be
> present in the outputs of binaural cross-correlators].
[Brackets mine, pac.]
> Krumbholz and Patterson (1999, 2000) showed that the lowest discernible
> pitch for complex dichotic pitches and for complex tones unmasked by
> the binaural system is about an octave higher than the lowest pitch of
> stimuli that also provide temporal cues. This increase in the lower
> limit of pitch is to be expected from a mechanism that relies on
> spectral decomposition by the cochlea, because the ERB never gets
> below about 30 Hz and components less than 60 Hz or so apart cannot
> be resolved at any frequency.

Could you unpack this a bit?

I haven't had a chance to read Krumbholz & Patterson yet, but
the argument you present seems to assume that the dichotic pitch
mechanism is based on some kind of frequency-domain, spectral
representation, because its behavior corresponds to some degree with
that of ERBs, which are conventionally defined in the frequency
domain. We should bear in mind that these are psychophysical data that
reflect the capacities of the whole system, and that inferences about
the nature of the underlying neural processing are anything but straightforward.
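To make the spectral reading of the quoted argument concrete, here is a minimal Python sketch. It assumes the Glasberg & Moore (1990) ERB formula, ERB(f) = 24.7(4.37f/1000 + 1) Hz, and a hypothetical resolvability criterion (harmonic spacing of at least two ERBs at the harmonic's frequency), chosen only because it matches the quoted "60 Hz or so". It shows what a purely spectral account predicts; it says nothing about whether a temporal mechanism with a similar limit would fit the data equally well.

```python
# Minimal sketch of the spectral-resolvability argument, under stated assumptions.

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def is_resolvable(f0_hz, n_harmonics=20, criterion_erbs=2.0):
    """True if any of the first n_harmonics counts as resolved.

    Hypothetical criterion: harmonic n is resolved when the harmonic
    spacing (f0) is at least criterion_erbs times the ERB at n * f0.
    """
    return any(f0_hz >= criterion_erbs * erb_hz(n * f0_hz)
               for n in range(1, n_harmonics + 1))

def lowest_resolvable_f0(step_hz=0.5, f_max_hz=500.0, criterion_erbs=2.0):
    """Scan F0 upward in steps of step_hz; return the first resolvable F0, or None."""
    k = 1
    while k * step_hz <= f_max_hz:
        f0 = k * step_hz
        if is_resolvable(f0, criterion_erbs=criterion_erbs):
            return f0
        k += 1
    return None

print(round(erb_hz(50.0), 1))   # ERB near the bottom of the range -> 30.1
print(lowest_resolvable_f0())   # -> 63.5
```

Changing criterion_erbs moves the predicted limit roughly in proportion, so the prediction is only as good as the criterion assumed.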

In neurocomputational terms, consider neural processing mechanisms
that have a temporal contiguity window of 20-30 msec (e.g. coincidence
arrays with a limit on the maximum relative delay). In such mechanisms
the lowest periodicities that can be represented or distinguished are
around 30 Hz, and inputs that arrive outside of the temporal contiguity
window are processed independently (and therefore not summed together).
There could be different temporal contiguity/temporal integration
windows for monaural and binaural processing, depending on the ranges
of response latencies available to stations in the different pathways.
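A toy version of the window argument, in Python: a coincidence array that only compares spikes separated by at most max_lag_ms (the hypothetical contiguity window) builds an all-order interval count, and a periodicity is represented only if its period fits inside the window. Spike times are idealized (one spike per cycle, at whole-millisecond times); nothing here is a model of actual neurons.

```python
# Toy coincidence array with a bounded maximum delay (the "contiguity window").
# Assumptions: one spike per stimulus cycle, spike times in whole milliseconds.

def periodic_spikes(period_ms, duration_ms=500):
    """Idealized spike train: one spike at the start of every period."""
    return list(range(0, duration_ms, period_ms))

def interval_counts(spikes, max_lag_ms):
    """All-order interval counts for lags 1..max_lag_ms; longer lags are ignored."""
    counts = [0] * (max_lag_ms + 1)
    for i, t0 in enumerate(spikes):
        for t1 in spikes[i + 1:]:
            lag = t1 - t0
            if lag > max_lag_ms:
                break  # spikes are sorted, so every later lag is larger too
            counts[lag] += 1
    return counts

def represented_period_ms(spikes, max_lag_ms):
    """Most frequent interval inside the window, or None if no interval fits."""
    counts = interval_counts(spikes, max_lag_ms)
    lags = [lag for lag in range(1, max_lag_ms + 1) if counts[lag] > 0]
    if not lags:
        return None
    return max(lags, key=lambda lag: counts[lag])

window_ms = 30
print(round(1000 / window_ms, 1))                              # lowest rate, Hz -> 33.3
print(represented_period_ms(periodic_spikes(10), window_ms))   # 100 Hz input -> 10
print(represented_period_ms(periodic_spikes(40), window_ms))   # 25 Hz input -> None
```

With a 20 ms window the floor rises to 50 Hz. The point is only that, in this kind of mechanism, the floor is set by the longest available delay rather than by any spectral bandwidth.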

This idea of a temporal contiguity constraint for patterns is
consistent with the spectral shape integration windows of Chistovich
(1985, JASA) (fusion of two temporally offset single-formant
vowels into one two-formant vowel percept), of Hall (low pitches produced by
harmonics that do not overlap in time), and of Turgeon & Bregman
(fusion/masking experiments using temporally offset flankers). On the other
hand, the integration windows for loudness summation are much longer, so we
shouldn't assume that all processing involves the same temporal constraints.

In any case, it seems to me that the argument rests on the assumption that
psychophysically observed ERBs must necessarily be explained in terms of
purely spectral mechanisms (mainly because these are the terms in which
current critical-band models are cast). But I don't see anything that
logically rules out a temporal neural mechanism for these phenomena.
(Moore discussed some of these ideas in his textbook, An Introduction to
the Psychology of Hearing, 3rd ed.)
Such a model could be formulated to take into account cochlear filtering
and rate-level functions (the bandwidths of the filters and the rate-level
functions affect the temporal patterns and rates of discharge that determine
the shapes of population-wide interval distributions).
The main difference between a temporal account and a
rate-place account would lie not in the neural response properties per se
(these are what they are), but in which aspects of neural responses are
analyzed by the central auditory system. Central use of temporal information
is one way to bridge the gap between the coarseness of cochlear
filters and the fineness of auditory perception (lateral inhibition is another).
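As a rough illustration of the temporal alternative, the Python sketch below pools all-order interval counts across a few idealized channels, each phase-locked to one harmonic (4, 5 and 8) of a 100 Hz fundamental that is itself absent. The channel layout, the one-spike-per-cycle firing and the 15 ms lag limit are simplifying assumptions; real fibers skip cycles and cochlear filters pass more than one component. The pooled distribution still peaks at the fundamental period, although no channel is driven at 100 Hz.

```python
# Toy population-wide all-order interval distribution (idealized channels).
# Assumptions: one channel per harmonic, one spike per cycle, times in integer us.
from collections import Counter

def phase_locked_spikes(freq_hz, duration_us=200_000):
    """Spike at the same phase of every cycle; freq_hz must divide 1e6 exactly."""
    period_us = 1_000_000 // freq_hz
    return list(range(0, duration_us, period_us))

def all_order_intervals(spikes, max_lag_us):
    """Count every inter-spike interval (not just adjacent ones) up to max_lag_us."""
    counts = Counter()
    for i, t0 in enumerate(spikes):
        for t1 in spikes[i + 1:]:
            lag = t1 - t0
            if lag > max_lag_us:
                break
            counts[lag] += 1
    return counts

def population_interval_distribution(freqs_hz, max_lag_us=15_000):
    """Sum the per-channel interval counts into one pooled distribution."""
    pooled = Counter()
    for f in freqs_hz:
        pooled.update(all_order_intervals(phase_locked_spikes(f), max_lag_us))
    return pooled

pooled = population_interval_distribution([400, 500, 800])  # no 100 Hz component
best_lag_us, _ = pooled.most_common(1)[0]
print(best_lag_us, 1_000_000 / best_lag_us)  # -> 10000 100.0
```

The peak comes from the one interval shared by all three channels. In this toy each channel's contribution is also weighted by its firing rate, which is one place where rate-level functions would enter a fuller model.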

> Krumbholz and Patterson (2000) and Culling and Colburn (2000) have found
> that binaural sluggishness applies to detection of temporal modulation of
> signals unmasked by the binaural system. In other words, it detects only
> the grossest of temporal structures.

Slow temporal modulation of a pitch is different from the representation
of the pitch itself. There may be different mechanisms that integrate auditory
images on different timescales, such that fine distinctions can be made when
the stimulus is stationary, but changes in periodicity and/or location of the
stimulus are registered more slowly. (The fact that we can distinguish ITD
differences on the order of tens of microseconds doesn't mean we must register
changes in ITD that fast.) So I don't quite see that binaural sluggishness
necessarily implies only a very coarse temporal analysis in all processing
domains and at all levels of the system.
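The dissociation between fine resolution for stationary stimuli and slow registration of change can be sketched in Python as two stages: a cross-correlation estimate of ITD at 10-microsecond resolution, followed by a leaky integrator that tracks changes in that estimate. The 100 kHz sample rate, the noise stimulus, the 1 ms frame and the 150 ms time constant are hypothetical values chosen for illustration, not estimates of binaural sluggishness.

```python
# Two-timescale sketch: fine ITD estimation, slow tracking of ITD changes.
# Assumed values: 100 kHz sampling, Gaussian noise, 1 ms frames, 150 ms tau.
import math
import random

FS_HZ = 100_000   # 10 us per sample
MAX_LAG = 80      # search +/- 80 samples (+/- 800 us)

def delayed_noise_pair(delay_samples, n=2000, seed=0):
    """Left/right noise in which the right channel lags the left by delay_samples."""
    rng = random.Random(seed)
    src = [rng.gauss(0.0, 1.0) for _ in range(n + 2 * MAX_LAG)]
    left = src[MAX_LAG:MAX_LAG + n]
    right = src[MAX_LAG - delay_samples:MAX_LAG - delay_samples + n]
    return left, right

def itd_us(left, right):
    """Lag (in us) that maximizes the cross-correlation of left and right."""
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-MAX_LAG, MAX_LAG + 1):
        val = sum(left[i] * right[i + lag]
                  for i in range(max(0, -lag), min(n, n - lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag * 1_000_000 / FS_HZ

def leaky_track(estimates_us, dt_ms=1.0, tau_ms=150.0):
    """First-order low-pass filter applied to a per-frame ITD sequence."""
    alpha = dt_ms / tau_ms
    y, out = 0.0, []
    for x in estimates_us:
        y += alpha * (x - y)
        out.append(y)
    return out

# Stage 1: stationary stimuli whose ITDs differ by 10 us are told apart exactly.
print(itd_us(*delayed_noise_pair(4)), itd_us(*delayed_noise_pair(5)))  # -> 40.0 50.0

# Stage 2: an abrupt 0 -> 50 us step in the per-frame estimates.
track = leaky_track([0.0] * 200 + [50.0] * 1000)
print(round(track[209], 1), round(track[-1], 1))  # 10 ms and 1 s after the step -> 3.2 49.9
```

A 10 us difference that the first stage resolves exactly takes the second stage on the order of its time constant to register. This only shows that the two properties can coexist in one system; it does not show that the auditory system is built this way.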

-- Peter Cariani