Christian Kaernbach wrote:
> Peter Cariani wrote:
> > 1) The "autocorrelation" model that they knocked down was not neural
> > model;
> It would be difficult to knock down all possible neural autocorrelation
> models. Our experiment gave a hint that there might be a problem with
> higher-order regularities which are seen by autocorrelation, but not by
> perception. I did not see a contribution up to now where somebody showed
> that neural autocorrelation would produce this asymmetry between first-
> and higher-order intervals. We supplied some modelers with our stimuli,
> and they could not make their models produce this asymmetry.
I'm not aware of the results of the other simulations, and I was
commenting on the model your original paper tested, which was not a
summary autocorrelation or population-interval model, but examined the
autocorrelation only of the upper partials, and without the LP noise. If
you have cochlear filtering and low-pass filtering that produces a
decline in phase-locking at frequencies of 2 kHz and above, then the
temporal structure of the ANF discharges follows the envelope of
higher-frequency, psychophysically-unresolved harmonics. At about 2 kHz
and fundamentals of 200 Hz, the pitches produced are sensitive to phase
spectrum because they are based on envelope shape, which is sensitive to
phase. I think the balance between interval representation of individual
partials and envelopes depends both on absolute frequency (via declining
strength of phase-locking) and harmonic number (via harmonic spacing
becoming a smaller fraction of filter bandwidths). Spike precedence may
also play a role. If the models in question neglect the broad tails of
tuning curves and/or spike precedence/recovery effects arising from high
degrees of stimulus-driven interneural synchrony (as one sees for CF's >
2 kHz), then these kinds of effects might not be produced.
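To illustrate the phase sensitivity of envelope shape mentioned above, here is a minimal Python sketch. It is only an illustration, not any of the models under discussion: the harmonic range (10 to 20 of a 200 Hz fundamental), the equal amplitudes, and the quadratic ("Schroeder-like") phase set are hypothetical choices, and the envelope is taken as the magnitude of the analytic signal. Both complexes have the same amplitude spectrum, yet their envelopes differ.

```python
import cmath
import math

def envelope(harmonics, phases, f0, times):
    """Magnitude of the analytic signal of a sum of equal-amplitude harmonics."""
    w = 2 * math.pi * f0
    return [
        abs(sum(cmath.exp(1j * (k * w * t + p)) for k, p in zip(harmonics, phases)))
        for t in times
    ]

f0 = 200.0                                     # fundamental in Hz
harmonics = list(range(10, 21))                # 2-4 kHz partials (hypothetical choice)
n = len(harmonics)
times = [i / (f0 * 400) for i in range(400)]   # one period, 400 samples

# Same amplitude spectrum, two different phase spectra.
cosine_phase = [0.0] * n
quadratic_phase = [math.pi * j * (j + 1) / n for j in range(n)]

peak_cosine = max(envelope(harmonics, cosine_phase, f0, times))
peak_quadratic = max(envelope(harmonics, quadratic_phase, f0, times))

print("peak envelope, cosine phase:   ", round(peak_cosine, 3))
print("peak envelope, quadratic phase:", round(peak_quadratic, 3))
```

In cosine phase all components align once per period, so the envelope is sharply peaked; with the quadratic phases they never fully align and the envelope is flatter, although the power spectrum (and hence the waveform autocorrelation) is unchanged.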
> > 2) The stimuli were harmonic complexes whose harmonics (F0=100 Hz)
> > were all above 5 kHz
> In our new submitted paper (you should know of it) we are down to 2 kHz,
> and the asymmetry is just the same. It is a real problem to go down much
> lower, as one should exclude frequencies lower than 15 times the
> fundamental. You did not specify whether in your below-2-kHz click
> trains resolvable harmonics were excluded (and masked for distortion
> products). In our understanding, it is not the frequency region but the
> resolvability that counts.
I tried many combinations of low-pass click trains with present and
missing fundamentals, although in neither the low-pass nor the
high-pass case did I introduce low-pass noise. The click trains that I
made, both low-pass and high-pass, produced strong, definite pitches,
and I must say that the masking effect when HP clicks are interspersed
in the HP case is striking.
I do agree with you that psychophysical resolvability and these masking
effects are coextensive with each other, but I think we disagree mostly
on what we believe to be the nature of the mechanisms underlying whether
harmonics are psychophysically resolvable. The question is really, what
determines the resolvability of harmonics, and this depends on the
nature of the neural representations that one supposes that the auditory
system uses for frequency and periodicity.
Your paper and some other earlier ones by Carlyon and others prompted me
to rethink some of these issues. While it is often not stated
explicitly, the underlying assumption for the hypothesis that there are
two neural mechanisms for processing resolved and unresolved harmonics
seems to be that pitches produced by resolved harmonics are the product
of a neural spectral pattern analysis, while those produced by
unresolved harmonics are the product of temporal discharge patterns that
are ultimately caused by the failure of filters to resolve the
individual partials. A deeper assumption is that the pitches of pure
tones are represented in some sort of rate-place map (possibly with
sharpening and cleaning up from synchrony, lateral inhibition,
efferents, and top-down selection). So then one thinks that the two
patterns of perceptual judgements, pitches of resolved and unresolved
harmonics, are due to the operation of qualitatively different kinds of
neural mechanisms. Is this a fair statement of your overall views?
My difficulty with the resolvability/unresolvability distinction has
been mainly with the tacit physiological assumptions that seem to
accompany it. When one looks at auditory nerve responses for stimuli
presented at moderate (60 dB SPL) levels or higher, one sees very little
resolution of individual harmonics in rate-place profiles for anything
but the first few harmonics (low harmonic numbers << 15). From the point
of view of rate-place profiles, almost everything looks like it should
be unresolved by the system.
The alternative general hypothesis is that frequency representation is
based not on spatial profiles of filter activation, but on an analysis
of the temporal patterns that are produced by the filters. The different
tunings are then not the vehicle for fine frequency discrimination, but
the means by which multiple frequencies/periodicities/auditory objects
can be simultaneously represented (which again may be why one can hear
speech pretty well with 4-6 channels in quiet, but everything goes to
pot if there is noise or competing sound). The filters still matter, but
in a different way. When one looks at the intervals produced in the
auditory nerve, one gets a much finer and more robust picture that
much better resembles the psychophysics of pitch perception, as Wever,
Siebert, Goldstein, Moore and many others have demonstrated over many
decades.
I believe that population-interval representations can handle these
observations. Models for the pitches produced by complex tones have been
proposed, but these can also be applied to pure tones and the hearing
out of individual partials in complex tones. This comes out of thinking
about the multiplicity of pitches that are heard when harmonic complexes
are presented. While our first approaches to estimating pitches from
population-interval distributions involved (essentially) peak-picking,
which is simplest to explain, a better approach is to examine which sets
of regular interval patterns are seen in the population-interval
distribution, i.e. patterns resembling those that would normally be
produced by a single partial at a particular frequency. If
one looks for all different patterns and the relative numbers of
intervals participating in those patterns, then one has a means of
estimating which pitches will be heard, both low pitches of the complex
and pitches of partials (one can compute the correlation between the
population-interval distribution and all possible partial patterns.)
This is a temporal analog of Parncutt's frequency domain model of pitch
multiplicity. The more harmonics, the less salient are the pitches of
partials relative to the low pitch. All other factors being equal, the
more intervals are produced by phase-locking to individual partials
(which depends on phase-locking), the better their representation in the
population-interval distribution and the greater their resolvability.
Filter bandwidths and harmonic numbers also come into play. In this kind
of representation, mistuned harmonics stand out, while tuned harmonics
tend to fuse together. There is also a way of dealing with partial
loudnesses and mutual masking in terms of the relative fractions of
intervals that are related to the respective patterns and how one
pattern reduces the fraction of the other.
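The pattern-matching idea in the preceding paragraph can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the models I am working on: the spike trains, the 0.1 ms time bins, the comb-shaped templates, and the helper names (interval_histogram, period_template, pattern_score) are all hypothetical. Each candidate period gets a template with entries at its integer multiples, and its score is the Pearson correlation between that template and the pooled all-order interval histogram.

```python
from math import sqrt

def interval_histogram(spike_trains, max_lag):
    """Pool all-order intervals (in integer time bins) across fibers."""
    hist = [0] * (max_lag + 1)           # index = interval length in bins
    for spikes in spike_trains:
        s = sorted(spikes)
        for i, a in enumerate(s):
            for b in s[i + 1:]:
                if b - a > max_lag:
                    break                # later spikes are even further away
                hist[b - a] += 1
    return hist

def period_template(period, max_lag):
    """Comb template: 1 at integer multiples of the period, 0 elsewhere."""
    t = [0] * (max_lag + 1)
    for lag in range(period, max_lag + 1, period):
        t[lag] = 1
    return t

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def pattern_score(hist, period):
    """Correlation between the histogram and one periodic pattern (lag 0 excluded)."""
    max_lag = len(hist) - 1
    return pearson(hist[1:], period_template(period, max_lag)[1:])

# Toy population, time in 0.1 ms bins: one fiber follows the 5 ms period
# (200 Hz), another is phase-locked to a 1 ms period (a 1 kHz partial).
fiber_f0 = list(range(0, 500, 50))
fiber_partial = list(range(0, 500, 10))
hist = interval_histogram([fiber_f0, fiber_partial], max_lag=200)

scores = {p: pattern_score(hist, p) for p in (50, 10, 7)}
for p, r in scores.items():
    print("period", p / 10.0, "ms -> score", round(r, 3))
```

In this toy case both the 5 ms and the 1 ms patterns score positively, while an unrelated 0.7 ms pattern scores negatively; the relative scores stand in for the relative fractions of intervals supporting each pattern.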
I have gone on way too long here already (I ask for your forbearance)
-- the ultimate point is that resolved harmonics need not be associated
with spectral pattern mechanisms, but that interval-based mechanisms can
also be envisioned. I am working on such models. We should not adopt
spectral pattern mechanisms by default or by custom or because we have
no reasonable alternatives.
> > 3. K & D assumed that each of their clicks would give rise to a spike in an auditory nerve fiber.
> Not precisely so. We only assumed that _most_ of the inter-spike intervals on the auditory nerve would correspond to inter-click
> (inter-stimulus) intervals. But this assumption is not crucial to our argument. Please let me cite from our General Discussion section:
> It is plausible that the "final" temporal structure contributing to pitch sensations (either directly or after a conversion
> into a place code) does not occur in the auditory nerve but at a higher location in the auditory system. We believe that at this stage the ISIs
> that matter are first-order ISIs. However, the consecutive spikes bounding these ISIs may originate from nonconsecutive spikes at the auditory
> nerve level.
The problems with first-order ISIs at any level of the system (and with
modulation detectors to analyze them) have to do with 1) the rate (and
hence, level) dependent nature of the distributions -- high spike rates
eliminate longer intervals, 2) problems of explaining the imperviousness
of the system to disruption by intervening transients, and 3) pitches of
resolved inharmonic complex tones (Schouten & De Boer). Since you shift
all processing of resolved harmonics over to a spectral pattern
mechanism, 2) and 3) don't apply, but 1) still does. (What bothers me
most is that this partitioning of mechanisms is never justified explicitly).
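Point 1) can be made concrete with a small Python sketch. The spike times are hypothetical (integer milliseconds), and the two helper functions are illustrative names, not part of any published model. A fiber firing once per 10 ms period shows the 10 ms interval in its first-order distribution; adding one intervening spike per period (a higher rate) removes it from the first-order distribution, while the all-order distribution still contains it.

```python
from collections import Counter

def first_order_intervals(spike_times):
    """Intervals between consecutive spikes only."""
    s = sorted(spike_times)
    return [b - a for a, b in zip(s, s[1:])]

def all_order_intervals(spike_times, max_interval):
    """Intervals between every pair of spikes, up to max_interval."""
    s = sorted(spike_times)
    out = []
    for i, a in enumerate(s):
        for b in s[i + 1:]:
            if b - a > max_interval:
                break                    # remaining spikes are further away
            out.append(b - a)
    return out

# Hypothetical spike times in ms.
sparse = list(range(0, 100, 10))                    # one spike per 10 ms period
dense = sorted(sparse + [t + 4 for t in sparse])    # plus an intervening spike

fo_sparse = Counter(first_order_intervals(sparse))
fo_dense = Counter(first_order_intervals(dense))
ao_dense = Counter(all_order_intervals(dense, 20))

print("first-order, low rate :", dict(fo_sparse))
print("first-order, high rate:", dict(fo_dense))
print("all-order,   high rate:", dict(ao_dense))
```

At the higher rate the first-order distribution contains only the 4 ms and 6 ms intervals, whereas the all-order distribution retains a clear mode at 10 ms.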
It seems to me that for these envelope-generated periodicities,
first-order and all-order interval distributions yield similar pitch
estimates. In any case, even if there were differences and the
psychophysics followed the first-order ISI, this would in no way
invalidate the population-interval models for (the much more important)
psychophysically-resolved harmonics. It also provides no positive
evidence that any kind of harmonic analysis of spectral pattern is being
carried out by the auditory system.
> > 5. In short, lower frequency hearing has more autocorrelation-like
> > qualities (intervening clicks don't mask much; ...),
> > while high frequency hearing has more modulation-like qualities
> > (intervening clicks mask, ...).
> Again, IMHO it is resolvability that counts. When I asked at the ASA meeting in Berlin what could be a generally accepted boundary between
> low- and high-frequency regions I was pointed to the 4 kHz boundary where neural phase locking ends. With our new publication we are well
> beyond this limit (i.e. intervening clicks mask in stimuli starting at 2 kHz). On the other hand, it is a very simple demonstration that
> intervening clicks don't mask in the high-frequency region AS LONG AS THERE ARE RESOLVABLE HARMONICS.
I'm sorry I wasn't there in Berlin. What counts as a low- or
high-frequency region depends on what you are interested in: coding of
carriers and fine structure, or coding of envelopes. Phase-locking
probably does not end abruptly at 4 kHz, and it declines rapidly above 2
kHz. In addition, because of cochlear lags, inter-CF synchrony to
modulated stimuli is very high for CF's above 2 kHz (see the PST
neurograms in J. Neurophysiol. 76:3:117-34.), and this I think sets up
spike precedence effects. Whether harmonics will be psychophysically
resolved (in my terms whether temporal patterns related to envelopes vs.
fine structure will dominate) I think will depend both on absolute
frequency and harmonic number.
I agree with you wholeheartedly that psychophysical resolvability counts
and that finding its neural correlates is a highly worthwhile project.