Re: Autocorrelation-like representations and mechanisms

Bob Carlyon (post at end):
Yes of course segregation is not all-or-none and slight mistunings can yield graded effects on pitch. Good point to remember. Thanks for the review reference.
The other caveat is that these are high frequency harmonics (> 2 kHz) -- I don't know whether the phase-dependent grouping mechanisms work in that regime.
Does anyone know offhand? Could be envelope/spike precedence effects instead. Maybe the grouping mechanisms operate on envelopes in those regimes rather than fine structure.

I usually assume that once one has an F0 difference of >= 12%, the effect of one complex's pitch frequency is very slight or negligible. Yes?

My rule of thumb is that whenever one has 1) different onsets, 2) different F0s, or 3) transient changes in phase, then one does need to worry about grouping effects. For each of these factors there is a gray region of partial effect. Unless one has some provision for grouping in one's model, whatever the representation, be it frequency or time domain, all-order or first-order interval, the model can be expected to deviate from what is perceived.

We need to sharpen our focus on the relationship between pitch and grouping and the time windows over which these factors play out. There is also another set of time windows related to timbre (Chistovich's formant-fusion studies) and loudness integration, not to mention integration of location information.

Ultimately we want an integrative theory that encompasses the basic dimensions of auditory perception: pitch, timbre (spectrum), loudness, location, and the organization of these attributes (grouping, scene analysis). Some might think that there are a large number of little, special-purpose mechanisms that take care of each of these aspects of auditory perception, while others might think that there are only a few very general representations that have different features/aspects related to the different attributes. It would be worthwhile to have a discussion as to the relative merits of each view.

Christian Kaernbach:
> You can imagine that I am once more disappointed to see how evidence against AC models is discarded as limited, and its interpretation as shallow. You name quite a number of phenomena that are accounted for by AC models. But science is not democracy: Evidence against a mechanism can not be counter-balanced by any number of studies where AC models do well.

First, if we go back to Martin Braun's post, I don't think that Meddis' model is evidence against an all-order interval neural representation of pitch. If it does prove to work, and I do sincerely hope the best for it (we could move on, avanti, to other things), then it provides a possible way of using timing information. In his original post Martin did not give any evidence or argument per se for why all-order interval representations are unrealistic and need to be discarded.

I agree that science is not democracy, but it is very rare, in neuroscience especially, that a counterexample is so well and clearly posed that it falsifies a whole general theory. I would rather that both be driven more by reasoned argument than by rhetoric. I do very strongly agree that we want to strive to test our models empirically as rigorously as possible, and that it is important that we be prepared to abandon our most cherished assumptions should they impede formulations of models that do work (this is our only defense against dogma, received truth, arbitrary authority, and self-deception). I think the psychophysical demonstrations that probe the limits of these temporal representations (e.g. yours and Bob Carlyon's) are important and useful, but I do think that your published interpretation in terms of neural responses and all-order interval representations has been very shallow. I am a very tolerant and patient person, and this would be entirely forgivable if straw-man "waveform AC" models were not accompanied by highly overblown rhetoric about falsifying all all-order interval models en masse. We all need to be clear about what the specific implications are, and the differences between waveform autocorrelations, all-order population-interval distributions, representations, and neural analysis mechanisms. I will be happier when 1) you make it clear that your interpretation falsifies only simple waveform AC models, and not necessarily all-order interval representations in general, 2) you recognize some of the problems with the first-order interval account (level-dependence, inability to deal with F0 or low harmonics), and 3) you are more forthright about what you think are the neural representations and mechanisms underlying lower, resolved harmonics.
For me it is much more important to account for the strong pitches of resolved harmonics and pure tones than these relatively weak pitches produced by very high harmonics and low S/N ratios -- they were absolutely right in the 1960's about the dominance region for pitch, but Schouten's reliance on high, unresolved harmonics ("residues") misled them into downplaying temporal models. I think the low frequency pitch and timbre phenomena are more important for music and speech perception, although it is certainly true that low frequency modulations of high frequency carriers lend more insight into perception in implant users.

I ran your stimuli through my AN model and Pressnitzer & Winter ran them through theirs, and both groups predicted pitches that were different from the waveform AC, and which qualitatively agreed with the weakness of the pitches to begin with and the masking effects of interleaved intervals. (One problem with their model is that they used narrow, critical-band, psychophysically derived filters rather than broader, physiologically realistic ones.) I believe that if one has a reasonable, even relatively simple AN model that incorporates physiological filters, rectification, something like an adaptive gain control (Geisler & Greenberg, JASA 1986), saturating rate-level functions, and spontaneous rate, one will obtain the kinds of spike precedence behavior and interclick interference that cause the all-order interval distribution to diverge from the waveform AC. I think we need to deal with spike precedence/synchrony/population recovery issues and what these do to interval distributions if we are to understand why cochlear implants only support periodicity discriminations up to hundreds of Hz rather than thousands of Hz.
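For readers less familiar with the terminology, the distinction between first-order and all-order interval distributions can be sketched on a toy spike train. This is only an illustration of the two definitions, not an AN model: the spike times below are made up, the two interleaved trains are perfectly regular, and none of the peripheral effects discussed above (filtering, adaptation, saturation, spontaneous rate) are included. The function and variable names are hypothetical.

```python
import numpy as np

def first_order_intervals(spike_times):
    """Intervals between consecutive spikes only."""
    t = np.sort(np.asarray(spike_times))
    return np.diff(t)

def all_order_intervals(spike_times, max_interval):
    """Intervals between every pair of spikes, consecutive or not,
    keeping only those no longer than max_interval."""
    t = np.sort(np.asarray(spike_times))
    diffs = t[None, :] - t[:, None]  # diffs[i, j] = t[j] - t[i]
    return diffs[(diffs > 0) & (diffs <= max_interval)]

# Hypothetical spike times in microseconds (integers keep comparisons exact).
train_a = np.arange(0, 100_000, 10_000)     # one spike every 10 ms
train_b = np.arange(2_500, 100_000, 7_000)  # interleaved spikes every 7 ms
merged = np.concatenate([train_a, train_b])

fo = first_order_intervals(merged)
ao = all_order_intervals(merged, max_interval=30_000)

# Count how often the 10 ms period shows up in each distribution.
n_first_order_10ms = int(np.count_nonzero(fo == 10_000))
n_all_order_10ms = int(np.count_nonzero(ao == 10_000))
print(n_first_order_10ms, n_all_order_10ms)
```

In this toy case every 10 ms interval of the first train is interrupted by a spike from the second, so the 10 ms period is absent from the first-order intervals (count 0) but still present in the all-order intervals (count 9). Whether real population-interval distributions behave this way for a given stimulus is exactly what the AN simulations are needed for.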

Rather than simply discarding them whenever there is any apparent anomaly, I think it is more constructive to think of ways in which the interval models might be modified. Remember, these models are still evolving, albeit slowly (there are very few, if any, people who are funded primarily to work on them). In the case of Bob Carlyon's (1996) study and others since, I think the psychophysicists have made it very clear that there are differences between low- and high-CF channels. Pressnitzer & Patterson showed that very low F0 information was mostly carried in lower-CF channels, and there are old studies of the temporal durations needed to hear a good pitch rather than a click for pure tones of different frequencies. I think these all suggest that different CF channels could have different temporal analysis windows (lower CFs have longer time windows, with a maximum of around 30 ms). The windows could be the result of the duration of ringing in each of the respective CF channels. If this were the case, then high-CF channels would be less well-suited to convey low periodicities (e.g. 100 Hz) and this would result in differences in pitch perception via low and high frequency carriers. One way or another all of the information needs to be combined across CFs -- I think it is likely that it is done in stages as one ascends the pathway with successively overlapping sets of inputs (as in Carney's model), and yes, there will be some CF-dependent processing going on in the meantime. The concept of a unified interval-based representation is still useful (in the same way that the idea of a Central Spectrum is useful) -- somewhere it all has to come together.

Martin Braun (3/5/03):

Peter Cariani wrote:
Modulation-tuned units have been found in abundance, but there are some basic problems with these when it comes to pitch:
1) they cannot explain pitch equivalence between pure and complex tones (big, big problem)

Martin: No problem. F0 periods that arise in the frequency laminae of the partials are forwarded to the frequency laminae of F0.

If this were the case, then there should be true pitch detectors that respond both to a pure tone and to a complex with a missing F0 at the same frequency. Schwarz & Tomlinson (1990) looked hard in the awake monkey cortex without success. Nobody has found true "pitch" units in any abundance at the level of the IC or higher (except for Riquimaroux's 15 cortical units -- hard to evaluate).

2) they are not likely to represent multiple competing pitches in a robust fashion (e.g. two musical instruments playing notes a third apart)

Martin: We only hear two pitches, if there are strong timbre labels attached to them. These are decoded in the cortex, which then feeds back to the pitch neurons in the midbrain.

Simultaneously play two notes a whole step (or more) apart on the piano. Do you hear one or two notes?

On what basis can you say that the notes MUST be separated in the cortex? How "anatomically and physiologically realistic" is this assertion?

Given that we presently have a very, very weak grasp on how the system works, I think that we need to be a great deal more careful before we dismiss models on the basis of our present evidence and understanding, and also before we go around ascribing specific general functions to particular stations. I do hope Ray's model works, and maybe it will behave in a manner that I don't expect -- I think it is great that there is another possibility on the table for our consideration and inspiration. But it still seems to me that even at the level of the IC, interval representations for low pitches still have more plausibility than those based on modulation-tuned pitch detectors (go look at Langner's spike raster plots to modulated tones).

3) they are not likely to yield a representation that does not degrade at high SPLs

Martin: Level stability is provided by lamina-based lateral inhibition in the midbrain.

Show me the neural data. I want to see MTFs and pitch representations that are every bit as good at 90 dB SPL as they are at 40 or 50 dB. In all studies of modulation tuning that I have seen, bandpass MTFs broaden at higher levels, unlike pitch perception and interval information. Lateral inhibition is often invoked to get finer tuned responses, but I have yet to see narrow tuning curves at high SPLs for central units with BFs below 1-2 kHz -- all those really nice and sharp tuning curves that go all the way up are invariably (in my experience reading the literature) for BFs > 5 kHz.

4) they are not likely to explain the pitch shifts of inharmonic complex tones

Martin: Period detectors register these pitch shifts, as calculated many years ago.

Only if they are based on autocorrelation or something roughly equivalent. Modulation detectors won't work; see Slaney, 1998. (On the other hand, temporal discharge patterns do show this. Greenberg 1980 found the temporal correlates of AM pitch shifts in his FFR study, which puts the interval patterns needed to account for pitch shift of inharmonic AM tones at least in the inputs to the IC. I saw similar patterns in local field potentials in the ICC.)
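The point about inharmonic AM tones can be illustrated with a small numerical sketch. It uses a plain waveform autocorrelation as a stand-in for "autocorrelation or something roughly equivalent"; it is not an interval-based neural model and not anyone's published implementation. The carrier and modulation frequencies, sampling rate, and lag search range are all assumed values chosen for illustration.

```python
import numpy as np

fs = 100_000            # sampling rate in Hz (assumed)
fc, fm = 2040.0, 200.0  # carrier and modulation frequency in Hz (assumed)
n_samples = fs // 10    # 100 ms of signal
t = np.arange(n_samples) / fs

# Sinusoidally amplitude-modulated tone; fc is not a multiple of fm,
# so the components (1840, 2040, 2240 Hz) form an inharmonic complex.
x = (1.0 + np.cos(2 * np.pi * fm * t)) * np.cos(2 * np.pi * fc * t)

def circular_autocorr(x, lag):
    """Mean product of x with a circularly shifted copy of itself.
    Exact here because 100 ms holds a whole number of cycles of
    every component of x."""
    return float(np.mean(x * np.roll(x, -lag)))

# Search lags from 3 ms to 7 ms (in samples) for the largest peak.
lags = np.arange(300, 701)
ac = np.array([circular_autocorr(x, int(lag)) for lag in lags])
best_lag = int(lags[np.argmax(ac)])

ac_pitch = fs / best_lag  # pitch estimate from the autocorrelation peak
envelope_rate = fm        # what a pure modulation-rate detector would report
print(best_lag, ac_pitch, envelope_rate)
```

With these values the autocorrelation peak falls near 4.9 ms (about 204 Hz, close to fc/10) rather than at the 5 ms envelope period, which is the direction of the classical pitch shift. A detector tuned only to the modulation rate would report 200 Hz regardless of where the carrier sits.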

-- Peter Cariani

Chistovich LA (1985) Central auditory processing of peripheral vowel spectra. J Acoust Soc Am 77:789-805.
Chistovich LA, Malinnikova TG (1984) Processing and accumulation of spectrum shape information over the vowel duration. Speech Communication 3:361-370.
Geisler CD, Greenberg S (1986) A two-stage nonlinear cochlear model possesses automatic gain control. J Acoust Soc Am 80:1359-1363.
Greenberg S (1980) Neural Temporal Coding of Pitch and Vowel Quality: Human Frequency-Following Response Studies of Complex Signals. Los Angeles: UCLA Working Papers in Phonetics #52.
Krishna BS, Semple MN (2000) Auditory temporal processing: responses to sinusoidally amplitude-modulated tones in inferior colliculus. J Neurophysiol 84:255-273.
Rees A, Møller AR (1987) Stimulus properties influencing the responses of inferior colliculus neurons to amplitude-modulated sounds. Hearing Res 27:129-143.
Riquimaroux H, Hashikawa T (1994) Units in the primary auditory cortex of the Japanese monkey can demonstrate a conversion of temporal and place pitch in the central auditory system. Journal de Physique IV 4:419-425.
Schwarz DWF, Tomlinson RWW (1990) Spectral response patterns of auditory cortical neurons to harmonic complex tones in alert monkey (Macaca mulatta). J Neurophysiol 64:282-298.
Slaney M (1998) Connecting correlograms to neurophysiology and psychoacoustics. In: Psychophysical and physiological advances in hearing (Palmer AR, Rees A, Summerfield AQ, Meddis R, eds), pp 563-569. London: Whurr.

On Wednesday, March 5, 2003, at 05:46 AM, Bob Carlyon wrote:

Hi Peter,

just a minor quibble, when you say:

The interesting ARO poster by Rebecca Watkinson and Chris Plack that was mentioned involved transient phase-shifts that reminded me a great deal of Kubovy's demonstrations of popping out by transiently phase-shifted harmonics (when they are perceptually separated from the rest of the harmonic complex, presumably they don't contribute to the F0 pitch of the complex).

(my italics)

Brian Moore's work shows that when a harmonic is mistuned by 3-8% it pops out of the complex but still contributes to its pitch. Similarly, a harmonic turned on, say, 40 ms before the rest of a complex may pop out, but still contribute to the pitch. Even a harmonic played to the opposite ear contributes more or less fully to the pitch, but it's easy to hear it out as a separate tone. The point is that grouping obeys different rules depending on the task in hand; it is not "all or none".

For a review of this issue, see (plug, plug)

C.J. Darwin and R.P. Carlyon (1995). "Auditory Grouping". In Handbook of Perception and Cognition, Volume 6: Hearing, edited by B.C.J. Moore. Academic, Orlando, Florida, pp 387-424.


Peter Cariani, PhD
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St., Boston, MA 02114 USA

Assistant Professor
Department of Otology & Laryngology
Harvard Medical School

voice (617) 573-4243
fax (617) 720-4408
email peter@epl.meei.harvard.edu
web www.cariani.com