Re: Melodic consonance (Peter Cariani)

Subject: Re: Melodic consonance
From:    Peter Cariani  <peter(at)>
Date:    Tue, 11 Jul 2000 16:33:57 -0400

Eckard Blumschein (>) wrote:
Alexandra Hettergot (>>) wrote:

> >I don't believe in the CBW concept's being essentially wrong (though I'd think other aspects (traditionally) apparent in music, as, e.g., timbral difference (i.e., due to both spectral and temporal structure), spatial distance, multichannel treatment, etc., worth to being considered in that respect, too).
>
> The latter is exactly what I would like to suggest. There are people like
> me who consider virtual pitch a plausible result of neural principles
> rather than a "gestalt" phenomenon. The same reasoning provides a
> functional understanding of the neural basis for sensations like
> consonance/dissonance on the most basic level, and for more emotional
> judgements like pleasantness at a higher level.

I think perhaps the conception of "gestalt" phenomena has become corrupted over the years to mean its opposite. Kohler and others thought in terms of psychoneural isomorphisms between analog, iconic, relational patterns of excitation and the percepts. Their perceptual gestalts were holistic patterns that were thought to be reflections and simple transformations of global patterns of neural activity. They thought in terms of field-like, spatiotemporal ionic current patterns in tissues, but these mechanisms were falsified by Lashley and Sperry, who implanted pieces of metal in the cortex. The metal presumably would have disrupted these current patterns and hence distorted their perceptual correlates, but it did not.

In the last 40 years, an associationistic connectionist model has dominated thinking, and the idea of the "perceptual gestalt" has become a global pattern that has been assembled or inferred from the pieces (i.e. break down the sensory inputs into perceptual atoms and then reassemble the atoms into unified objects and wholes). Feature detectors and feature-bindings are bound up with these assumptions.
The Gestaltists, however, rejected this notion that the system first breaks everything down into perceptual atoms and then reconstructs wholes, and argued instead for relational representations based on patterns (relational primitives). In this respect they have commonalities with the Gibsonians in looking for iconic, analog patterns and the means of extracting invariances from them. For the Gestaltists, perceptual Gestalts were thought to be the experiential consequences of the global features of these neural activity patterns.

In the context of summary autocorrelation and population-based interspike interval models for pitch (e.g. Meddis & Hewitt, Cariani & Delgutte), the representational primitives are interspike intervals -- relations between events -- that form an iconic, analog representation that resembles the autocorrelation function of the stimulus (taking into account cochlear filtering, the decline of phase-locking with frequency, and some temporal precedence effects in high CF regions). The most common intervals in population-interval distributions correspond to the pitches that are heard (in the few exceptions, the pitch is an octave higher or lower). The pattern of interval peaks in the distribution is a global feature of the activity over the whole neural population, so in this case we have a perceptual Gestalt that is based on spike-mediated relations, intervals, rather than spatially-distributed currents (spike information is conveyed axonally, so insertion of metal into tissues would likely not affect such a system's functioning very much -- it would be possible to have mechanisms based on spike correlations that had many of the same properties as Kohler's fields).
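
The relation between autocorrelation and a "missing fundamental" pitch can be illustrated with a minimal sketch. It is not from the original message, and it models only the autocorrelation function of the stimulus, not auditory-nerve interspike intervals, cochlear filtering, or phase-locking limits. The sample rate, the 200 Hz fundamental, the choice of harmonics 3-5, the 50-500 Hz search range, and all function names are assumptions chosen for illustration.

```python
import math


def harmonic_complex(f0, harmonics, fs, duration):
    """Sum of equal-amplitude cosine harmonics of f0 (the fundamental itself may be absent)."""
    n_samples = int(round(fs * duration))
    return [
        sum(math.cos(2 * math.pi * h * f0 * n / fs) for h in harmonics)
        for n in range(n_samples)
    ]


def autocorrelation(x, max_lag):
    """Raw (biased) autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def dft_magnitude(x, freq, fs):
    """Magnitude of the discrete Fourier component of x at a single frequency."""
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return math.hypot(re, im)


def estimate_period_pitch(x, fs, f_min=50.0, f_max=500.0):
    """Pick the lag with the largest autocorrelation inside an assumed pitch range."""
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)
    acf = autocorrelation(x, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return best_lag, fs / best_lag


FS = 8000   # assumed sample rate in Hz
F0 = 200    # assumed fundamental in Hz; not physically present in the stimulus

# Harmonics 3, 4 and 5 only: components at 600, 800 and 1000 Hz.
signal = harmonic_complex(F0, harmonics=(3, 4, 5), fs=FS, duration=0.1)
best_lag, pitch = estimate_period_pitch(signal, FS)

print("energy at 200 Hz:", dft_magnitude(signal, 200, FS))
print("energy at 600 Hz:", dft_magnitude(signal, 600, FS))
print("best lag (samples):", best_lag, "-> pitch estimate (Hz):", pitch)
```

With these assumed parameters the spectrum has essentially no component at 200 Hz, yet the largest autocorrelation peak in the search range falls at a lag of 40 samples (5 ms), the period of the absent fundamental. In the time-lag domain nothing has to be "filled in"; the common period of the harmonics is simply the dominant interval.
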
Rather than being a "pattern-completion" process, in which a "virtual" pitch is inserted for the "missing fundamental," as it appears if we look in the frequency domain, the process resembles the perception of any other periodicity when seen in terms of autocorrelation-like representations and operations.

> >btw, what will "local resonances" cause if not spectral pitches (due to spectral peaks) ?
>
> Well, local resonance actually performs a coarse spectral analysis.
> However, look at the neural pattern drawn by Secker-Walker and Searle
> (1990). Temporal period seems to be much more precise than what Vercoe
> named spectral "blockvoting". Furthermore, spectral resolution within
> cochlea cannot account for the astonishing frequency resolution of hearing.
> This also indicates that frequency discrimination by ear is based on
> perception of period rather than frequency.
>
> BTW, I have to correct myself. The paper by Shamma is based on the same
> original data by Miller and Sachs. So it would be highly desirable to have
> at least a second set of such data.

There was a series of papers by Delgutte in JASA in the mid-1980s based on a different (cat) dataset that showed similar results. We also looked at population-coding of vowels, and what one sees is very much like the Miller & Sachs data -- there are broad swaths of the auditory nerve array that are driven by the intense harmonics in a formant region, so there may be 4-6 different time patterns in different CF regions of the auditory nerve for a typical vowel. (I believe that this is why only a few frequency channels are needed for speech reception in quiet, e.g. Shannon's demonstration, and why implants work as well as they do -- the channels themselves are not encoding frequency, it is the temporal structure of the signals that they carry that does this.
If one believes this, then it becomes imperative to raise the carrier frequencies of implant devices so that finer, higher frequency temporal information can be conveyed by the system).

If one throws away all of the CF information, as Palmer did and we did, to construct a population-interval distribution, then one has a very nice, robust, precise representation of the formant structure in the patterns of short intervals that are present. Timbre can also be encoded purely temporally. The rate-place patterns, on the other hand, are coarse, only allowing for (poor) discriminations of individual components when they are separated by large fractions of an octave (say 1/3 - 1/2 octave). These profiles only worsen at higher levels. This looks bad for spectral pattern mechanisms for pitch that are based on rate-place profiles.

The picture for purely spectral peaks and spectral pitch more or less resembles that for pure tone discrimination above 5 kHz -- coarse, musically atonal, and vulnerable to changes in level. So there is a purely place-based pitch percept, but it may be much weaker and more ill-defined than what is commonly imagined. Strong, precise and level-invariant pitch percepts that have musical tonality (chroma, can support a melody) are always, to my knowledge, associated with an abundance of phase-locked spike information.

-- Peter Cariani

Peter Cariani, Ph.D.
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St., Boston, MA 02114 USA
tel(at)EPL (617) 573-4243
tel(at)MGH (617) 726-5419
FAX (617) 720-4408
Email peter(at)
Web:

This message came from the mail archive
maintained by:
DAn Ellis <>
Electrical Engineering Dept., Columbia University