
UNmixing Sources, i.e., The Cocktail Party Effect

Dear List,

Several LISTeners have asked about the Stereo example I offered to
Robert Bolia and Brian Gygi on 15 October.  The following points
by Robert and Brian both focus on the relative importance of
"Spatial segregation" ...

- Robert's point was ...
"I'm not sure that I see how an MAA of 1-2 degrees (the 10
microsecond resolution of the binaural system) is crucial to
explaining the Cocktail Party Effect.  Has anyone ever done any
cocktail party experiments with talkers separated by as little as
1 degree?"

- Brian's further point was ...
"Let's take another absurd example.  You are listening to two talkers
through a *single* loudspeaker.  You would almost never think they
were one speaker, even though the angle of separation is zero.
Spatial separation is useful, but is far from the defining feature of
stream segregation, for the simple fact that in highly reverberant
environment, the necessary information (time of arrival, phase)
can often be ambiguous."

Robert questions MAA as crucial to the Cocktail Party Effect while
Brian distinguishes between Spatial & Stream segregation in his
Zero-Degree "Audible Angle", 2-Source, SINGLE loudspeaker.

Brian's 2-Source, SINGLE loudspeaker is the subject of the following
Stereo example that demonstrates a particular dynamic process always
in-action during perception.  What I'm about to describe has been
auditioned using a wide variety of source pairs.

The most dramatic example of a source pair is a "recorded book" where
ONE actor does ALL the reading and thus eliminates the question of
any "special" advantage created by using two DIFFERENT voices.
By "special", I refer to the usual spectral cues argument that different
voices (different spectral composition) offer THAT difference as an
independent method of Source segregation.

The experiment uses a "Stereo Mixer" readily available (<$100) from any
Radio Shack (Tandy) store.  Using a battery operated model, I've tested
many acoustic environments by travelling to different stereo systems
with two cassette players and the Stereo Mixer.  I've tested a range of
situations from damped to "highly reverberant environments".  The test
has been perceived "the same" in all environments! ...
        ... LISTeners ... please report any perceived exceptions.

This test was also demonstrated during an AES paper presentation in
Berlin, March 1993.

The setup is simple.  Feed one source (Chapter 1 of a recorded book?) to
one loudspeaker.  MIX that initial source with a second source (Chapter 2?)
and feed the mixture to the other loudspeaker.  Of course, your particular
choice of Left (loudspeaker) source vs Right (loudspeaker) source is not
an issue.  In fact, during the AES Berlin demo, I had the audio engineer
"reverse the room" so that Left attendees could hear what Right attendees
heard and, vice versa.
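
The routing described above can be sketched in a few lines of Python.  This
is only an illustration of the signal flow, not the mixer's actual circuit:
the function name build_feeds, the equal (unity) mixing gain, and the sample
values are all assumptions made for the example.

```python
def build_feeds(common, second):
    """Return (one_voice_feed, mixed_feed) for the two loudspeakers.

    common: samples of the source sent to BOTH loudspeakers (e.g. Chapter 1)
    second: samples of the source sent ONLY to the mixed loudspeaker (e.g. Chapter 2)
    Assumption: both sources are mixed at unity gain.
    """
    if len(common) != len(second):
        raise ValueError("sources must have the same number of samples")
    one_voice_feed = list(common)                          # 1-voice loudspeaker
    mixed_feed = [a + b for a, b in zip(common, second)]   # 2-voice loudspeaker
    return one_voice_feed, mixed_feed


# Hypothetical sample values standing in for the two recordings.
chapter_1 = [0.10, 0.20, -0.10, 0.00]
chapter_2 = [0.00, -0.20, 0.30, 0.10]
one_voice, mixed = build_feeds(chapter_1, chapter_2)
```

Swapping the two returned feeds between Left and Right corresponds to
"reversing the room".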

As pointed out by Brian, if you stand in front of the 2-Source speaker you
hear the mixed confusion of two voices.  You eventually perceive that two
voices are present as, over a period of time, you arrive at pauses in ONE
voice while the OTHER voice continues.  As you continue listening, you
realize that even smaller pauses allow you to catch small bursts of
intelligible speech from one voice or the other.  However, you also realize
that, as inferred by Brian, you can NOT continuously FOCUS on ONE voice
in the continued presence of the OTHER.

If you now move to the "normal" stereo listening position, i.e., central to
speakers, you hear a "normal", central, stereo image of the source common
to Left/Right and, the 2nd (mixed) source in ONLY ONE loudspeaker.
This should be a bit surprising since you have just ...
        ... UNmixed ...
the sound coming from the mixed (2-source) loudspeaker into ...
        1) A SINGLE, perceived, Central Stereo Image.
        2) A SINGLE, perceived, voice from the 2-voice loudspeaker.

This UNmixing effect suddenly becomes startling if you move to a (slightly)
different location!  Assuming you are still centrally located, pivot toward,
and walk one step closer to, the ONE VOICE loudspeaker (NOT the MIXED
loudspeaker).  Suddenly, two things happen ...
        1) The Central Stereo Image vanishes.
        2) You clearly hear ONE voice emanating from ONE loudspeaker
        and, the OTHER voice exclusively from the OTHER loudspeaker !

You can easily (and, CONTINUOUSLY!) FOCUS on the voice of CHOICE
just as you can during a typical Cocktail Party.  However, a person standing
near the MIXED speaker will still hear the MIXED voices, i.e., the mixture is
still emanating but, YOU perceive the INDIVIDUAL voices.

As mentioned above, this test is best performed using the voice of the
SAME person to source each test voice as "spectral segregation" is then
(essentially) ruled-out in this Stream Segregation demo.

A dynamic aspect of the Precedence Effect explains this perceived
Source segregation (UNmixing).  By stepping one foot closer to the
1-voice loudspeaker, the signal from the mixed loudspeaker is
delayed approximately 1 ms.  Thus, the common voice is detected
as BOTH a Source AND as a 1 ms reflection of that (common) Source.
But, the 1 ms "reflection" arrives as 1-component of a 2-voice mixture.
The Precedence Effect UNmixes that mixed arrival by SILENCING the
1 ms delayed component which allows the SECOND component to be
perceived as the ONLY sound arriving from the 2-voice loudspeaker.
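
The "approximately 1 ms" figure can be checked with a short calculation.
The sketch below assumes a speed of sound of 343 m/s (dry air at about
20 degrees C) and treats the extra path length to the mixed loudspeaker as
exactly one foot; the real path difference depends on the room geometry, and
the 44.1 kHz sample rate is likewise only an illustrative assumption.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at about 20 degrees C
FOOT_M = 0.3048              # one foot in metres


def delay_ms(path_difference_m, speed=SPEED_OF_SOUND_M_S):
    """Arrival-time difference, in milliseconds, for a given extra path length."""
    return 1000.0 * path_difference_m / speed


def delay_samples(path_difference_m, sample_rate_hz=44100, speed=SPEED_OF_SOUND_M_S):
    """The same delay expressed as a whole number of audio samples."""
    return round(sample_rate_hz * path_difference_m / speed)


one_foot_delay = delay_ms(FOOT_M)          # a little under 0.9 ms
one_foot_samples = delay_samples(FOOT_M)   # about 39 samples at 44.1 kHz
```

Under these assumptions a one-foot path difference gives a delay of roughly
0.9 ms, consistent with the rounded 1 ms used in the text.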

This does NOT violate the second law of thermodynamics ("UNmixing")
because, the 1-voice component is TIME-locked to the 2-voice mixture
and is "stream" segregated by the Precedence Effect.  But, clearly, each
loudspeaker was localized, i.e., EACH signal (1-voice and 2-voice) was
detected and Binaurally processed to SILENCE one-component.

An interesting verification of this dynamic UNmixing by the Precedence
Effect is provided by the additional effect of Source Fusion.  That is, the
1 ms delayed component of the 2-voice mixture is not only Silenced: its
LOUDNESS is FUSED with that of the 1-voice Source in the OTHER
loudspeaker.

This can easily be verified by reducing the common voice component in
the mixture to zero; the EFFECT of this reduction is best heard by simply
switching OFF/ON the common voice component in the mixture.

If the UNmixing is caused by the Precedence Effect then, the 1 ms
"reflected" component FUSES with the SINGLE voice of the OTHER
loudspeaker.  Thus, the ONLY audible effect of switching OFF/ON the
common voice component (in the mixture) is that the SINGLE voice
heard from the OTHER loudspeaker abruptly changes volume as you
switch OFF/ON the common voice component in the mixture, i.e., in
EITHER case (OFF/ON) you perceive only ONE voice in EACH
loudspeaker but, the loudness of the 1-voice loudspeaker CHANGES
        ... a direct result of FUSION.

The net result of the Cocktail Party Effect (CPE) is our ability to perceive
the separate sources otherwise MIXED in a normal acoustic environment.
Brian's case of 2-voices emanating from a single point is worst-case CPE.
But, also buried in Brian's point is the case of a "highly reverberant
environment" which is ALSO handled by the DYNAMICS of the Precedence
Effect ... Source reflections are silenced and fused with their corresponding
Sources.  But, FUSION itself implies that the signal from EACH Source is
detected so that REFLECTED energy CAN BE summed with Source energy.

I personally do not care which label we place on our ability to ...
        ... "Source Segregate".

The attempt to further analyze "Segregation" into Spatial, Stream, Spectral,
etc., is beside the point.

I personally believe Spectral analysis without specific Source identity is
worthless as the question of WHICH frequency components go with
WHICH source remains unresolved.

I personally believe Stream Analysis is INITIALLY accomplished by a
3-step SPATIAL analysis ...
        1) First step by Azimuth.
        2) Second step by resolving Vertical Half-Plane ambiguity.
        3) Third step by Elevation.
Reflections are employed to gauge Source distance and the combination
of 3-step Spatial and Distance (via reflections) forms a source map
populated by DATA Sources that are THEN analyzed for content/meaning.

My AES paper (Berlin, 1993) discusses this 3-step Spatial analysis.

Rich Fabbri
