
Autocorrelation-like representations and mechanisms

On Saturday, February 22, 2003, at 02:01  PM, Martin Braun wrote:
As to the pitch model of autocorrelation, also this is a model that is
anatomically and physiologically unrealistic. Ray Meddis, one of the
advocates of this model has now given it up, in favor of a new model
(see below) that is based on anatomical and physiological data that were
described in detail by Gerald Langner and me.

I do regret not reading my email while at ARO last week.

I don't think that Ray's recent proposed mechanism for central
processing of peripheral temporal patterns is necessarily a repudiation
of the notion of an early representation of pitch based on all-order
interval information. In any case, most pitch phenomena do behave in a
manner that is consistent with an autocorrelation-like representation
(more on this below), so any putative central neural mechanism needs to
behave accordingly. This is why Licklider constructed the neural
temporal autocorrelation architecture that he did. We don't stop saying
that the binaural system is a cross-correlator because inhibition may
be involved. Likewise, the monaural system as a whole may behave in
most important respects like an autocorrelator, even if the mechanism
is not a Licklider style network with synaptic delays and coincidence
counters that function as explicit pitch detectors. I do very much hope
that Ray's mechanism works (it would solve what I think is the central
open problem in auditory neurophysiology, how the auditory CNS uses
monaural peripheral timing information), but there are strong reasons
to doubt a priori that any mechanism based on modulation-tuning per se
will work (more below).
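
As a toy illustration of the point about Licklider-style networks (a sketch under stated assumptions, not Licklider's actual architecture and not a physiological model): a bank of delay lines feeding coincidence counters yields the same counts as an all-order interval histogram of the same spike train. The spike times, the integer time step, and both function names are hypothetical, chosen only for illustration.

```python
def coincidence_counts(spike_times, max_delay):
    """For each delay d, count spikes that coincide with a copy of the
    train delayed by d time steps (one 'coincidence counter' per delay)."""
    spike_set = set(spike_times)
    counts = [0] * (max_delay + 1)
    for d in range(1, max_delay + 1):
        counts[d] = sum(1 for t in spike_times if t + d in spike_set)
    return counts


def all_order_intervals(spike_times, max_delay):
    """Histogram of intervals between every pair of spikes, not just
    neighbouring ones, up to max_delay time steps."""
    times = sorted(spike_times)
    hist = [0] * (max_delay + 1)
    for i, t0 in enumerate(times):
        for j in range(i + 1, len(times)):
            lag = times[j] - t0
            if lag > max_delay:
                break
            hist[lag] += 1
    return hist


# Hypothetical spike train: mostly locked to a 5-step period, plus one extra spike.
spikes = [0, 5, 10, 12, 15, 20]
delay_line_output = coincidence_counts(spikes, 12)
interval_histogram = all_order_intervals(spikes, 12)
print(delay_line_output)
print(interval_histogram)
```

The two lists are identical by construction, which is the sense in which a system can "behave like an autocorrelator" whether or not it contains explicit delay lines.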

We need to make the distinction between neural representations of pitch
and the central neural mechanisms that analyze the representations.

In order to reverse-engineer the auditory system, we need to understand
3 things:
1) what auditory functions the system performs (detections,
discriminations, groupings, etc -- psychophysics)
2) the nature of the neural representations for auditory percepts (the
signals that the system uses)
3) the nature of the neural computational mechanisms by which the
system operates on its internal representations to realize auditory
functions

It makes a great deal of sense to work on all three problems
simultaneously, since they are all interrelated.
We need to find aspects of neural activity/information processing that
resemble or can support the perceptual dimensions and distinctions that
are observed psychophysically.

Meddis & Hewitt's simulation studies (1991, 1998) and our experimental
study (Cariani & Delgutte, 1996) were directed at the nature of the
early neural REPRESENTATION of pitch, and not at the central MECHANISMS
by which pitch is analyzed.

These studies showed that features of the global interspike interval
distribution of the auditory nerve predict very well a surprisingly
wide variety of pitch phenomena. Some of these are:

1) Pitch of pure tones
2) Low F0 pitch of complex tones, of both low and high harmonics
3) Invariance of pitch over a very wide range of stimulus levels
4) Invariance of pitch of complex tones with low harmonics with respect
to phase spectrum, and phase-dependent envelope effects for high
harmonics
5) Pitch shift of inharmonic complex tones
6) Low pitch of iterated ripple noise
7) Dominance frequency region for low pitch
8) Pitch equivalence between pure tones and low pitch of complex tones
9) Low pitch of AM noise
10) Relative pitch strength (salience)
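
A minimal sketch of the kind of analysis behind this list, for item 2 only (low pitch of a complex tone with a missing fundamental). It is not the Meddis & Hewitt model or the Cariani & Delgutte analysis: the "fibers" are hypothetical Bernoulli spike generators driven by the half-wave rectified stimulus, with no cochlear filtering, refractoriness, or adaptation, and the firing probability, fiber count, and lag search window (1.25 to 8 ms) are all assumptions made for illustration.

```python
import math
import random


def simulate_fibers(freqs_hz, fs, n_samples, n_fibers, peak_prob, rng):
    """Toy phase-locked fibers: spike probability per sample follows the
    half-wave rectified sum of sinusoids (an assumed, simplified drive)."""
    drive = [
        max(0.0, sum(math.sin(2 * math.pi * f * n / fs) for f in freqs_hz))
        / len(freqs_hz)
        for n in range(n_samples)
    ]
    return [
        [n for n, d in enumerate(drive) if rng.random() < peak_prob * d]
        for _ in range(n_fibers)
    ]


def all_order_interval_histogram(spike_trains, max_lag):
    """Pool intervals between all spike pairs within each fiber, across fibers."""
    hist = [0] * (max_lag + 1)
    for spikes in spike_trains:
        for i, t0 in enumerate(spikes):
            for j in range(i + 1, len(spikes)):
                lag = spikes[j] - t0
                if lag > max_lag:
                    break
                hist[lag] += 1
    return hist


def estimate_pitch_hz(hist, fs, min_lag, max_lag):
    """Pitch estimate = reciprocal of the most common interval in the window."""
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: hist[k])
    return fs / best_lag


FS = 20000                                   # samples per second (assumed)
rng = random.Random(0)                       # fixed seed for repeatability
harmonics = [600.0, 800.0, 1000.0]           # harmonics 3-5 of 200 Hz; no energy at 200 Hz
trains = simulate_fibers(harmonics, FS, 4000, 100, 0.1, rng)
max_lag = round(0.008 * FS)                  # 8 ms: excludes the 10 ms subharmonic peak
min_lag = round(0.00125 * FS)                # 1.25 ms: excludes the peak near zero lag
hist = all_order_interval_histogram(trains, max_lag)
pitch = estimate_pitch_hz(hist, FS, min_lag, max_lag)
print(f"estimated pitch: {pitch:.1f} Hz")
```

With these settings the most common pooled interval should fall near 5 ms, i.e. near the 200 Hz fundamental that is absent from the stimulus spectrum. The restricted lag window matters: intervals near 10 ms are almost as common as those near 5 ms, so a wider window would need an explicit rule for choosing among subharmonics.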

These representations are "anatomically and physiologically" realistic
-- we know that the information is present in the temporal discharge
patterns of neurons in the auditory nerve and cochlear nucleus, and at
the very least in the inputs to the inferior colliculus, and possibly
higher. There is nothing at all "unrealistic" about patterns of spikes.
These representations also can handle timbral distinctions associated
with differences in spectra of stationary sounds (e.g. vowel formant
structure, qualities associated with lower frequency resonances of
vocal tracts and musical instruments). My poster at ARO dealt with how
competition between intervals could potentially explain aspects of
pitch masking and harmonic resolution.

Frankly, I am always surprised by how quickly people are willing to
discard all-order interspike interval representations on the basis of
very limited evidence that is interpreted in the most shallow way.
There is also a long history in the auditory field of interval
representations being dismissed for a variety of flimsy reasons (e.g.
the old canard that auditory nerve fibers can't support periodicities
above 300 Hz) and the inadequacies of straw-man models. I just wish,
for once, that the same criticality could be applied to the central
neural analysis of rate-place patterns (someone give me a plausible
neurally-grounded account, at the level of the midbrain or higher, of how we
discriminate the pitches of 1000 Hz from 1050 Hz pure tones when levels
are roved between 60 and 90 dB SPL and/or when the two tones are
presented at different locations in auditory space).

We also need to be more judicious about where the all-order interval
models apply. The interesting ARO poster by Rebecca Watkinson and Chris
Plack that was mentioned involved transient phase-shifts that reminded
me a great deal of Kubovy's demonstrations of popping out by
transiently phase-shifted harmonics (when they are perceptually
separated from the rest of the harmonic complex, presumably they don't
contribute to the F0 pitch of the complex). Definitely, our models for
pitch need to take into account auditory grouping/ fusion/ object
formation factors and mechanisms. I think these mechanisms must precede
analysis of pitch (which, depending on where one puts pitch analysis,
may or may not mean that they are low down in the pathway). In any
case, these shifts are very interesting, and it remains to be seen
whether incorporating phase-shift resets in the autocorrelation model
(e.g. as in Patterson's strobed temporal integration processing or my
recurrent timing nets) will account for the observed pitch effects.

I do readily agree that no obvious neuronal autocorrelators have been
found in any abundance in the auditory pathway, but it is still
possible that something like an autocorrelation analysis is carried out
by other means. (Langner et al. had an interesting ARO poster with an
intriguing potential anatomical structure for comb
filtering/autocorrelation, but the physiological evidence is still
pretty scant -- too early to tell how real it is).

Modulation-tuned units have been found in abundance, but there are some
basic problems with these when it comes to pitch:
1) they cannot explain pitch equivalence between pure and complex tones
(big, big problem)
2) they are not likely to represent multiple competing pitches in a
robust fashion (e.g. two musical instruments playing notes a third
apart)
3) they are not likely to yield a representation that does not degrade
at high SPLs
4) they are not likely to explain the pitch shifts of inharmonic
complex tones
5) it's not clear if predicted pitches of low harmonics will be
invariant with respect to phase spectrum (as they should be)
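
A rough numerical illustration of item 1, under the assumption (made here, not stated in the post) that a modulation-tuned unit responds to periodicity in the stimulus envelope. The envelope is taken as the magnitude of the analytic signal of an equal-amplitude sum of sinusoids; the function names and the sampling rate are hypothetical.

```python
import cmath
import math


def envelope(freqs_hz, fs, n_samples):
    """Magnitude of the analytic signal of an equal-amplitude sum of sinusoids."""
    return [
        abs(sum(cmath.exp(2j * math.pi * f * n / fs) for f in freqs_hz))
        for n in range(n_samples)
    ]


def modulation_depth(env):
    """(max - min) / (max + min) of the envelope: 0 means unmodulated."""
    hi, lo = max(env), min(env)
    return (hi - lo) / (hi + lo)


FS = 20000  # samples per second (assumed)
pure_depth = modulation_depth(envelope([200.0], FS, 400))
complex_depth = modulation_depth(envelope([600.0, 800.0, 1000.0], FS, 400))
print(f"200 Hz pure tone, envelope modulation depth: {pure_depth:.3f}")
print(f"harmonics 3-5 of 200 Hz, envelope modulation depth: {complex_depth:.3f}")
```

The complex tone has a deeply modulated envelope with a 5 ms period, while the 200 Hz pure tone has a flat envelope, even though listeners match the two in pitch. An interval-based representation gives both stimuli a common 5 ms interval; an envelope-based one has nothing to work with for the pure tone.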

I'm sure Ray will test these kinds of contingencies in his model, and
we'll see how well it works. In the meantime, it would probably be best
to adopt a wait-and-see attitude before throwing out all-order interval
representations. We should welcome all attempts to grapple with the
problem of the central use of timing information (e.g. by R. Meddis, L.
Carney, S. Shamma, yourself and others, more power to you all). I
myself think that the problem may lie in our tacit expectations of the
nature of the central mechanisms and representations -- wouldn't life
be so much simpler if there were nice, level-invariant single-neuron
pitch detectors somewhere in the auditory brainstem, midbrain or
thalamus? It feels like we are missing something big. The problem could
involve our fixation with single neurons as the atomic level of signal
processing. Maybe the system doesn't work that way, and we need to
consider central representations that are based on patterns of firing
(across-neuron intervals, synchronies, latencies, rates) rather than
which specific neural elements are firing how much. The all-order
interval distributions still do follow the psychophysics better than
any of our proposed central analysis mechanisms.  I can also see how
one might process interval information completely in the time domain
with a rich set of delays and coincidence detectors ("neural timing
nets"), but there is no really obvious place where such hypothetical
processing could be carried out.

There is a time and place to be literal-minded and a time and place to
use one's imagination.
Science necessarily involves BOTH conceptual hypothesis formation and
empirically-based hypothesis testing.
When a problem is ill-defined and we don't understand the basic nature
of the solution, then we need to use our imaginations and temporarily
suspend disbelief in order to formulate and entertain new hypotheses.
Using our imaginations is the only way of "getting out of the box" when
we are stuck in a rut and none of our theories work very well (and
without a host of ad-hoc assumptions -- sometimes we are too clever for
our own good).
Once we are on the right track, then it is time to do "normal science"
and "puzzle-solving" (Kuhn's terms) -- to fill in the gaps and do
hard-nosed empirical testing of hypotheses.

The question is where on that continuum between ill-defined and
well-defined do we think the problem of the neural coding of pitch
currently lies?
Do we see a clear direction for the path ahead? How best should we move

Peter Cariani

Kubovy M (1981) Concurrent-pitch segregation and the theory of
indispensable attributes. In: Perceptual Organization (Kubovy M,
Pomerantz JR, eds), pp 55-98. Hillsdale, NJ: Lawrence Erlbaum Assoc.
Kubovy M, Jordan R (1979) Tone-segregation by phase: On the phase
sensitivity of the single ear. J Acoust Soc Am 66:100-106.

Peter Cariani, PhD
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St., Boston, MA 02114 USA

Assistant Professor
Department of Otology & Laryngology
Harvard Medical School

voice   (617) 573-4243
fax             (617) 720-4408
email   peter@epl.meei.harvard.edu
web             www.cariani.com

Pitch Shifts For Unresolved Complex Tones And The Implications For
Models Of Pitch Perception

*Rebecca Kensey Watkinson, Christopher John Plack
Department of Psychology, University of Essex, Colchester, United Kingdom

A Model of the Physiological basis of Pitch Perception.

*Raymond Meddis
Psychology, University of Essex, Colchester, United Kingdom

Little is known about how pitch is processed by the auditory nervous
system. Autocorrelation models of pitch extraction have been successful
in simulating a large number of psychophysical results in this area but
there is little support for the idea that the nervous system acts as an
explicit autocorrelation device. To address this issue, this poster
presents a design for a new model of pitch perception based upon
known neural architecture and also presents some preliminary pitch
analyses using the model. The model offers a physiologically plausible
system for periodicity coding that avoids the need for long delay lines
required by autocorrelation. The system incorporates a model of the
human auditory periphery including outer/middle ear transfer
characteristics, nonlinear frequency analysis and mechanical-electrical
transduction by inner hair cells. The resulting 'auditory nerve' spike
train is used as the input to three further stages of signal processing
thought to be located in the cochlear nucleus, central nucleus and the
external cortex of the inferior colliculus, respectively. The complete
model is implemented using DSAM, a development system for auditory
modelling. The output from the system is the activity of a single array
of neurons each sensitive to different periodicities. The pattern of
activity across this array is uniquely related to the fundamental
frequency of a harmonic complex. The testing of the model is still in
early stages but has so far been successfully tested using a range of
harmonic stimuli and iterated ripple noise stimuli. The poster will
report on current progress in testing and refining the model.