
Re: effect of phase on pitch

R. Parncutt wrote:
> Pondering the evolutionary origins of the ear's "phase deafness" in most
> naturally occurring sounds, I have come up with the following argument. Does
> it make sense? Is there other literature on this subject that I have missed?

I definitely agree that the auditory system is essentially phase-deaf,
except around the edges (which is why the edges are interesting).

Where I think we would differ is this: I think it is possible that the
phase-deafness of the system is a result of interspike interval analyses
and of mechanisms that integrate/fuse invariant phase relationships into
unified objects, whereas you would hold that the system is phase-deaf
because it uses rate-place representations. Is this fair?

> In everyday listening environments, phase relationships are typically
> jumbled unrecognizably when sound is reflected off environmental objects;
> that is, when reflected sounds of varying amplitudes (depending on the
> specific configuration and physical properties of the reflecting materials)
> are added onto sound traveling in a direct line from the source. Thus, phase
> information does not generally carry information that can reliably aid a
> listener in identifying sound sources in a reverberant environment
> (Terhardt, 1988; see also Terhardt, 1991, 1992).

Let's consider an echo off one surface that introduces a time delay.
To the extent that the echo's time pattern resembles that of the
original stimulus, depending upon the delay between the two,
the sound and its echo can be fused into one object. In an
ecological situation, sound reflecting surfaces and
their properties are not changing rapidly.
The phase structure of echoes combined with the phase structure
of the direct sound will then form an invariant whole,
so that if one has a mechanism for fusing together
repeated relative phase patterns, echoes become fused
with the direct signal (i.e., fusion is a different strategy
for "echo suppression"). At short delays (<15 msec) one hears only
one sound; at longer delays the timbre of the one sound changes,
and at really long delays one hears two sounds. These differences
would be related to how the auditory system
integrates recurrent patterns with different delays.
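A minimal numerical sketch of the echo argument, assuming one reflection off a static surface and an idealized steady-state harmonic complex (the period, phases, delay and gain below are arbitrary illustrative values, not measurements). The direct sound plus its echo still repeats with the original period, and its energy stays at the same harmonic frequencies; the reflection only rescales harmonic k by the comb-filter factor |1 + g exp(-2 pi i k D / P)|.

```python
import cmath
import math

PERIOD, K = 80, 5                     # e.g. a 100 Hz F0 at an assumed 8 kHz rate
PHASES = [0.3, 1.1, 2.0, 0.7, 2.9]    # arbitrary, illustrative component phases
DELAY, GAIN = 13, 0.6                 # hypothetical reflection: 13 samples, 60% amplitude


def complex_tone(n):
    """Sample n of a steady-state harmonic complex with PERIOD samples per cycle."""
    return sum(math.cos(2 * math.pi * k * n / PERIOD + PHASES[k - 1])
               for k in range(1, K + 1))


def direct_plus_echo(n):
    """Direct sound plus one reflection delayed by DELAY samples, scaled by GAIN."""
    return complex_tone(n) + GAIN * complex_tone(n - DELAY)


def harmonic_amplitudes(one_period):
    """Amplitude of harmonics 1..K from a DFT taken over exactly one period."""
    p = len(one_period)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * n / p)
                    for n, s in enumerate(one_period))) * 2 / p
            for k in range(1, K + 1)]


mixed = [direct_plus_echo(n) for n in range(2 * PERIOD)]

# 1) The mixture still repeats every PERIOD samples: sound and echo share one time pattern.
still_periodic = all(abs(mixed[n] - mixed[n + PERIOD]) < 1e-9 for n in range(PERIOD))

# 2) Energy stays at the same harmonic frequencies; each harmonic k is only rescaled
#    by the comb-filter factor |1 + GAIN * exp(-2*pi*i*k*DELAY/PERIOD)|.
measured = harmonic_amplitudes(mixed[:PERIOD])
predicted = [abs(1 + GAIN * cmath.exp(-2j * math.pi * k * DELAY / PERIOD))
             for k in range(1, K + 1)]
```

Because the comb factor varies with k, the echo reshapes the relative amplitudes of the harmonics while leaving their frequencies and the recurrence period intact.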

In such situations, one would not generally be able to distinguish
one particular phase pattern from another, but it would be important
that the time structure of the signal and that of the echo be
largely similar in order for fusion to take place.

I don't think things get much more Gibsonian than this. If the auditory
system operates this way, then there is an invariant time pattern,
shared by the sound and its echo, that is extracted by the auditory
system. One way
to think about this is that the auditory system brings the correlation
structure of sound & echo into the nervous system by means of phase-
locked discharges.
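A sketch of the correlation point, assuming (as autocorrelation-type models such as Meddis & Hewitt's do) that the pooled all-order interval statistics of phase-locked fibers behave roughly like the autocorrelation of the stimulus. No cochlear filtering or spiking is modeled here; the stimulus is seeded Gaussian noise plus one reflection with a hypothetical delay and gain. Apart from lag 0, the autocorrelation peaks at the echo delay, so the sound/echo relation shows up in the interval statistics even though the noise itself has no particular phase pattern.

```python
import random


def add_echo(signal, delay, gain):
    """One reflection off a static surface: y[n] = x[n] + gain * x[n - delay]."""
    return [x + (gain * signal[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(signal)]


def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation r[lag] = sum_n y[n] * y[n + lag], lag = 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


rng = random.Random(1)                         # seeded so the sketch is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
DELAY, GAIN = 23, 0.5                          # hypothetical echo, delay in samples
r = autocorrelation(add_echo(noise, DELAY, GAIN), max_lag=50)

# Skip lag 0 (the signal's own energy); the largest remaining value sits at the echo delay.
peak_lag = max(range(1, 51), key=lambda lag: r[lag])
```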

This phase-locking is seen in every sensory
system, albeit on different time scales, so stimulus-driven time
structure has been around at least as long as sensory receptors
and sensory neurons. Essentially, if the fine structure of the
stimulus is present in the timings of discharges, then it is
possible to carry out very, very general kinds of pattern
recognition operations that extract invariant time structure
from what amounts to an analog, iconic representation of the sound.
This is much closer to Gibsonian ideas concerning mechanisms of
perception than are models based on spectral features (perceptual atoms)
and complex pattern-recognition operations.

> This is a matter of
> particular concern in an ecological approach, as non-reverberant
> environments are almost non-existent in the real world (anechoic rooms,
> mountain tops). On the other hand, again in real acoustic environments,
> spectral frequencies (that is, the frequencies of isolated components of
> complex sounds, or clear peaks in a running spectrum, forming frequency
> trajectories in time-varying sounds) cannot be directly affected by
> reflection off, or transmission through, environmental obstacles. They might
> be indirectly affected as a byproduct of the effect that such manipulations
> can have on amplitudes (e.g., a weakly defined peak could be pushed sideways
> if amplitudes increased on one side and decreased on the other), but such
> phenomena could hardly affect audible sound spectra.
> So for the auditory system to reliably identify sound sources, it needs to
> ignore phase information, which is merely a constant distraction, and focus
> as far as possible on a signal's spectral frequencies (and to a lesser
> extent on the relative amplitudes of individual components, keeping in mind
> that these, too, are affected by reflection and transmission).

In a sense we are saying similar things here.
Interspike interval distributions and rate-place profiles
are both "phase-deaf" representations, and form
analysis is based on such basic "phase-deaf" representations.

> The ear's
> phase deafness with regard to pitch perception is thus a positive attribute.
> In fact, it may be regarded as an important phylogenetic achievement - the
> result of a long evolutionary process in which animals whose ears allowed
> phase relationships to interfere with the identification of dangerous or
> otherwise important sound sources died before they could reproduce. If this
> scenario is correct, then it is no surprise that we are highly sensitive to
> small changes in frequency, and highly insensitive to phase relationships
> within complex sounds.

Localization of sound is important, but it is no less important to be able
to recognize the forms of sounds, to be able to distinguish and recognize
different sound sources. The reason that we talk so much in terms of
localization is that we understand more of how localization mechanisms
operate: what are the cues, what are the neural computations. One could
make an analogous argument that it is important to be able to detect
arbitrary recurring sound patterns that come in at different times, and
that therefore basic mechanisms evolved that integrate similar time
patterns over many delays. Such mechanisms would be deaf to the
particular phases of sounds, but sensitive to transient changes in
phase structure.

Birds and humans detect mistuned harmonics quite well. Why is
this? The harmonic complex has a constant phase structure that
recurs from period to period and the mistuned component has a
constant phase structure that recurs at its own unrelated period.
Phase relations between the complex and the mistuned component
are constantly changing. Two sounds are heard because
invariant waveform/phase patterns are fused together and
varying sets of relations are separated. Similar kinds of
considerations apply to double vowels with different F0's.
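A small numerical version of the mistuned-harmonic argument, with illustrative parameters chosen for the sketch (8 kHz sampling, 100 Hz F0, six equal-amplitude harmonics in cosine phase, the third harmonic raised by 4%). The harmonic complex repeats exactly from one period to the next; once one component is mistuned, its phase relative to the rest drifts by 2*pi*k*(mistuning) per period, so the period-to-period pattern keeps changing. This only demonstrates the waveform invariance; it is not a model of how the two percepts are segregated.

```python
import math

FS, F0, K = 8000, 100, 6     # assumed sample rate, fundamental, number of harmonics
P = FS // F0                 # 80 samples per period


def complex_tone(n, mistuned=None, mistuning=0.0):
    """Equal-amplitude cosine-phase harmonic complex at sample n. Optionally the
    harmonic numbered `mistuned` is shifted up by the fraction `mistuning`."""
    total = 0.0
    for k in range(1, K + 1):
        f = k * F0 * (1 + mistuning) if k == mistuned else k * F0
        total += math.cos(2 * math.pi * f * n / FS)
    return total


def period_to_period_change(n_periods, **kw):
    """Largest change between each period of the waveform and the next one."""
    x = [complex_tone(n, **kw) for n in range((n_periods + 1) * P)]
    return max(abs(x[n + P] - x[n]) for n in range(n_periods * P))


# The harmonic complex alone repeats exactly; with harmonic 3 mistuned by 4%,
# its relative phase drifts by 2*pi*3*0.04 (about 0.75 rad) every period.
harmonic_only = period_to_period_change(10)
with_mistuned = period_to_period_change(10, mistuned=3, mistuning=0.04)
```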

> Straightforward evidence of the ear's insensitivity to phase in the sounds
> of the real human environment has been provided by Heinbach (1988). He
> reduced natural sounds including speech (with or without background noise
> and multiple speakers) and music to their spectral contours, which he called
> the part-tone-time-pattern. In the process, he completely discarded all
> phase information. The length of the spectrum analysis window was carefully
> tuned to that of the ear, which depends on frequency. Finally, he
> resynthesized the original sounds, using random or arbitrary phase
> relationships. The resynthesized sounds were perceptually indistinguishable
> from the originals, even though their phase relationships had been shuffled.

Yes, but these sounds still had the same time-pattern
within each freq. channel and
the relations of time-patterns across channels
were presumably stable over the course
of the stimulus. If the interchannel
phase relations were constantly changing,
I think the sound would not have
the same quality. If you introduced
many random delays at different timepoints
into the different frequency channels,
I would think that these sounds would break apart.
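The premise of Heinbach's manipulation can be sketched as follows. This is a simplification, assuming a single DFT over one short frame rather than his frequency-dependent, ear-like analysis windows: each component keeps its magnitude, gets a random phase, and the frame is resynthesized. The magnitude spectrum comes back unchanged while the waveform does not; what this toy does not capture is the within-channel time structure discussed above.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def randomize_phases(x, rng):
    """Keep every component's magnitude but give it a random phase.

    Bins k and n-k are kept complex conjugates so the result is real; the DC bin
    (and the Nyquist bin for even n) are left untouched.
    """
    n = len(x)
    spec = dft(x)
    out = list(spec)
    for k in range(1, (n + 1) // 2):
        out[k] = abs(spec[k]) * cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
        out[n - k] = out[k].conjugate()
    return [v.real for v in idft(out)]


N = 64
# Illustrative frame: harmonics 1-6 of DFT bin 3, amplitudes 1/k, all in cosine phase.
original = [sum(math.cos(2 * math.pi * 3 * k * t / N) / k for k in range(1, 7))
            for t in range(N)]
shuffled = randomize_phases(original, random.Random(7))
```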

I've experimented with sequences of vowel periods having different
phase relationships. One can take the waveform of a vowel period
and flip its polarity and/or reverse it in time. This results in
4 possible operations for each vowel period. If you do this in
an orderly, regular, repeating way, the resulting waveform has
a pitch corresponding to the recurrence period of the
whole pattern. If you randomize the sequences,
the waveform has a very noisy pitch and has a
very different quality, and if you
introduce random time delays in between the vowel periods in
addition to the random phase switches, the pitch goes away.
Now the short-term spectral structure of this sound is constant,
but the time-relations between events in one vowel period and
another have been destroyed.
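A sketch of the vowel-period manipulation, assuming a synthetic stand-in for one vowel period (a single damped resonance, not a real vowel) and normalized autocorrelation as a crude proxy for the interval-based pitch cue. The four operations are identity, polarity flip, time reversal, and both; each leaves the magnitude spectrum of the period unchanged. Cycling through them in a fixed order gives a waveform that repeats exactly every four vowel periods; choosing them at random removes that long-range recurrence. Random inter-period delays are not included.

```python
import math
import random

P = 50  # samples per (hypothetical) vowel period

# Stand-in "vowel period": one damped resonance. It is asymmetric in time, so
# time reversal really changes the waveform. Purely illustrative, not a real vowel.
BASE = [math.exp(-n / 12) * math.sin(2 * math.pi * n / 9) for n in range(P)]

# The four operations: identity, polarity flip, time reversal, flip + reversal.
OPS = {
    "id": lambda p: list(p),
    "flip": lambda p: [-v for v in p],
    "rev": lambda p: p[::-1],
    "flip_rev": lambda p: [-v for v in p[::-1]],
}


def build(sequence):
    """Concatenate transformed copies of BASE, one per operation name."""
    out = []
    for name in sequence:
        out.extend(OPS[name](BASE))
    return out


def norm_autocorr(x, lag):
    """Correlation between x and x shifted by `lag`, normalized to [-1, 1]."""
    a, b = x[:-lag], x[lag:]
    num = sum(u * v for u, v in zip(a, b))
    return num / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))


regular = build(["id", "flip", "rev", "flip_rev"] * 10)     # orderly, repeating
rng = random.Random(0)
randomized = build([rng.choice(list(OPS)) for _ in range(40)])

r_regular_whole = norm_autocorr(regular, 4 * P)   # recurrence of the whole 4-period pattern
r_regular_one = norm_autocorr(regular, P)         # single vowel period: no recurrence
r_random_whole = norm_autocorr(randomized, 4 * P)
```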

Voice pitches of vowels thus can be seen as the result of
recurrent phase patterns that span vowel periods. It is the
delay between the patterns (the recurrence time) that
determines the pitch. If there are no recurrent phase patterns
there is no pitch. Recurrence time of phase
(time interval) defines frequency.
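A minimal sketch of "recurrence time defines frequency", using normalized autocorrelation as the recurrence detector (an assumed stand-in for an interval analysis, not a physiological model). The test tone contains only 600, 800 and 1000 Hz, i.e. harmonics 3 to 5 of a missing 200 Hz fundamental; the shortest lag at which the waveform recurs is 80 samples at the assumed 16 kHz rate, which reads out as 200 Hz.

```python
import math

FS = 16000  # assumed sample rate (Hz)


def tone(freqs, dur=0.05):
    """Sum of equal-amplitude cosines at the given frequencies."""
    return [sum(math.cos(2 * math.pi * f * i / FS) for f in freqs)
            for i in range(int(FS * dur))]


def pitch_from_recurrence(x, fmin=80.0, fmax=1000.0):
    """Return FS / lag for the shortest lag with the highest normalized
    autocorrelation in the search range: the recurrence time read as a frequency."""
    best_lag, best_r = None, -2.0
    for lag in range(int(FS / fmax), int(FS / fmin) + 1):
        a, b = x[:-lag], x[lag:]
        r = sum(u * v for u, v in zip(a, b)) / math.sqrt(
            sum(u * u for u in a) * sum(v * v for v in b))
        if r > best_r + 1e-9:          # ties go to the shorter lag
            best_lag, best_r = lag, r
    return FS / best_lag


# Harmonics 3-5 of 200 Hz: no energy at 200 Hz itself.
print(pitch_from_recurrence(tone([600, 800, 1000])))   # 200.0
```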

> It is nevertheless possible to create artificial stimuli for which clear,
> significant perceptual effects of phase relationships on perception can be
> demonstrated. For example, Patterson (1973, 1987) demonstrated that
> listeners can discriminate two harmonic complex tones on the basis of phase
> relationships alone.

I think that this discrim. was on the basis of a timbre difference.
I agree that phase relations can in some cases alter
the relative influence of particular harmonics
and thereby influence timbre.
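To illustrate how phase alone can change a stimulus while leaving its amplitude spectrum untouched (this is not a reconstruction of Patterson's stimuli), here is a sketch comparing two equal-amplitude harmonic complexes that differ only in component phases: cosine phase, where all components peak together once per period, and Schroeder's low-peak-factor phases, which spread the energy across the period. The crest factor (peak/RMS) is an assumed, crude index of envelope shape, not a measure taken from the studies cited.

```python
import math

N = 20            # harmonics 1..N, all with amplitude 1
SAMPLES = 400     # samples per period (assumed)


def one_period(phases):
    """One period of an equal-amplitude harmonic complex with the given phases."""
    return [sum(math.cos(2 * math.pi * (k + 1) * n / SAMPLES + ph)
                for k, ph in enumerate(phases))
            for n in range(SAMPLES)]


def crest_factor(x):
    """Peak over RMS: a crude index of how 'peaky' the envelope is."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


cosine_phase = [0.0] * N                                    # all peaks line up once per period
schroeder_phase = [math.pi * n * (n - 1) / N for n in range(1, N + 1)]  # low-peak-factor phases

cf_cosine = crest_factor(one_period(cosine_phase))          # sqrt(2N), about 6.32 for N = 20
cf_schroeder = crest_factor(one_period(schroeder_phase))    # much flatter envelope
```

Both tones have identical component amplitudes and identical RMS; only the temporal envelope within each period differs.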

> Moore (1977) demonstrated that the relative phase of
> the components affects the pitch of harmonic complex tones consisting of
> three components; for each tone, there were several possible pitches, and
> relative phase affected the probability of a listener hearing one of those
> as 'the' pitch.

These several possible pitches, I assume,
were associated with partials that could
be heard rather than with F0.

Again phase structure can subtly
alter the relative salience of
particular harmonics, and hence the
partials that are best heard.

>  Hartmann (1988) demonstrated that the audibility of a
> partial within a harmonic complex tone depends on its phase relationship
> with the other partials.


> Meddis & Hewitt (1991b) succeeded in modeling these
> various phase effects, which (as Moore, 1977, explained) generally apply
> only to partials falling within a single critical band or auditory filter.

I think what happens is that relative phase can affect which harmonic is
most effective at creating discharges that are phase-locked to it.

> In an ecological approach, the existence of phase sensitivity in such
> stimuli (or such comparisons between stimuli) might be explained as follows.
> These stimuli (or stimulus comparisons) do not normally occur in the human
> environment. So the auditory system has not had a chance to 'learn' (e.g.,
> through natural selection) to ignore the phase effects. As hard as the ear
> might 'try' to be phase deaf in the above cases, some phase sensitivity will
> always remain, for unavoidable physiological reasons.

But these effects are all extremely subtle. I don't think vowel quality
ever changes so radically that one hears a completely different vowel.
But why are there these kinds of subtle effects at all?
From a rate-perspective, one could argue for some
kind of slight rate-suppression that depended on relative phases of
closely spaced harmonics. The interval account would be similar,
except that instead of rate suppression, one would have interval
or synchrony suppression.

> There could, however, be some survival value associated with the ability to
> use phase relationships to identify sound sources during the first few tens
> of ms of a sound, before the arrival of interference from reflected waves in
> typical sound environments. On this basis, we might expect phase
> relationships at least to affect timbre, even in familiar sounds. Supporting
> evidence for this idea in the case of synthesized musical instrument sounds
> has recently been provided by Dubnov & Rodet (1997). In the case of speech
> sounds, Summerfield & Assmann (1990) found that pitch-period asynchrony
> aided in the separation of concurrent vowels; however, the effect was
> greater for less familiar sounds (specifically, it was observed at
> fundamental frequencies of 50 Hz but not 100 Hz). In both cases, phase
> relationships affected timbre but not pitch.
> The model of Meddis & Hewitt (1991a) is capable of accounting for known
> phase dependencies in pitch perception (Meddis & Hewitt, 1991b). This raises
> the question: why might it be necessary or worthwhile to model something
> that does not have demonstrable survival value for humans (whereas music
> apparently does have survival value, as evidenced by the universality of
> music in human culture).

It's certainly premature to judge what kinds of auditory
representations have or don't have "demonstrable
survival value for humans." Phase dependencies may
be side issues in ecological terms, but they do shed
light on basic auditory mechanisms.

Deciding what is evolutionarily relevant is difficult
at best.

In arguing that because music perception is culturally universal
it must have survival value,
I think one commits an evolutionary fallacy:
assuming that every capability is the result of
a particular adaptation to a
particular ecological demand.
Even Steven Pinker doesn't go this far.
At least he would say that music perception
could be a by-product of other adaptations.

It's very hard indeed to identify what the
inherent survival value of music would be. And there can be
generalist evolutionary strategies and general-purpose
pattern recognizers, so that evolutionary demands and
solutions need not always be so parochial.
(Most of vision isn't face recognition,
even if one thinks that face recognition is
a special-purpose module selected
for a special-purpose ecological demand;
we see all sorts
of complex patterns that our evolutionary forebears
never encountered. We were not evolutionarily selected to
read text such as this, but we can do it because our visual
mechanisms have sufficient generality that we can learn
to recognize letters and words.)

I'd rather we avoid particularistic adaptive "just-so"
stories to explain away peculiarities of our senses.

However, studying music perception is
very important even if music had/has no inherent survival value for
the species, because it gives us another window on complex modes
of auditory representation and processing. Music is an important
aspect of auditory cognition, and your work on the structure of
auditory cognition is quite valuable regardless of whether music is
essential to survival.

Very general kinds of pattern recognition mechanisms are possible
and could very well be based on the nature of the basic auditory
representations. For example, if an all-order interval analysis is
carried out by the central auditory system, the harmonic relations
(octaves, fifths, low-integer frequency ratios) all fall out of the
inherent harmonic structure of time intervals and their interactions.
(I've read your book and know you don't like these kinds of
Pythagorean relations. But there they are.......)
Our perception of octave similarities would be the result
of very basic similarities in interval representations
rather than the result of acquired
associations. According to this perspective, octave similarities
and the perception of missing fundamentals are consequences of the
operation of phylogenetically ancient neural coding systems.
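A deliberately idealized sketch of the all-order interval point, assuming spike trains that fire exactly once per period with no jitter, exact interval matching, and a 60 ms interval window; the "shared fraction" measure is hypothetical and is not a model from the literature. Even so, simple frequency ratios share many intervals (octave more than fifth, fifth more than fourth), while a ratio near the square root of 2 shares none within the window. Real interval distributions are broadened by jitter and contain intervals from many harmonics, so a practical comparison would need a matching tolerance.

```python
from fractions import Fraction

WINDOW = Fraction(6, 100)   # assumed longest interval considered: 60 ms


def all_order_intervals(f0):
    """All-order intervals (seconds, exact) of an idealized spike train that fires
    exactly once per period of f0: every multiple of 1/f0 up to WINDOW."""
    period = Fraction(1, f0)
    return {k * period for k in range(1, int(WINDOW / period) + 1)}


def shared_fraction(f_low, f_high):
    """Hypothetical similarity: fraction of the lower tone's intervals that also
    occur among the higher tone's intervals."""
    low, high = all_order_intervals(f_low), all_order_intervals(f_high)
    return Fraction(len(low & high), len(low))


for name, f_lo, f_hi in [("octave", 100, 200), ("fifth", 100, 150),
                         ("fourth", 300, 400), ("~tritone", 100, 141)]:
    print(name, shared_fraction(f_lo, f_hi))
# prints: octave 1 / fifth 1/2 / fourth 1/3 / ~tritone 0 (one per line)
```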

We may be phase-deaf, but much of our auditory perception
may be based on phase-locking nonetheless.

-- Peter Cariani