
Re: Computational ASA -- how many sources can humans perceive?

My anecdotal experience with video game sound effects is similar to Murch's
statement:  humans don't do well at simultaneous real-time extraction of more
than two sources in a simulated 3D environment.  Langendijk, Wightman, and
Kistler (JASA, 2001, v.109:2123-2134) have done a different experiment on
sound localization in the presence of distracters and found that listeners
can judge azimuth and elevation of a target pretty well, especially if head
movement is allowed, but that does not address the ability to recognize the
distracters and the target simultaneously.

There is also the "attention" factor, especially if multiple listening
sessions are allowed, or if the listener is cued in advance.  For example,
if one listens to a musical passage with typical rhythmic and pitch
counterpoint, it is feasible to recognize and write down all the
instrumental sounds that are present over time (guitar, bass, bass drum,
cowbell, vocal harmony, etc.).  Further, if one is instructed to "listen for
the cough in the audience just after the trumpet fanfare," it is often
possible to identify a signal that might have been ignored or missed on
first hearing.

Summary:  although one-shot simultaneous recognition may be limited,
multiple takes and cues can improve the recognition score.

Rob Maher

> -----Original Message-----
> From: Valeriy_Shafiro@rush.edu [mailto:Valeriy_Shafiro@rush.edu]
> Sent: Friday, April 30, 2004 1:26 PM
> To: Maher, Rob
> Subject: Re: Computational ASA -- how many sources can humans
> perceive?
> From: "Maher, Rob" rmaher@ECE.MONTANA.EDU
> >It is sometimes argued that "humans can do separation, so the problem must
> >be soluble."  I would argue that humans do source _identification and
> >tracking_ very effectively, but perhaps humans do not actually solve the
> >computational _separation_ problem, in the sense that the individual vectors
> >'B', 'C', etc. are extracted in a neural signal processing context.
> I would like to ask a further question: Do we, in fact, know how many
> independent sound sources in a mixture humans can perceive?  Thus far I
> know of only one research report where human listeners were asked to
> identify sound sources in a recorded "real-world" sound mixture (Ellis, D.
> P. (1996). Prediction-driven computational auditory scene analysis).  We
> have been talking about this issue with Brian Gygi, and from the few
> related reports that Brian found, it appears that humans may not be that
> good at simultaneously perceiving independent sound sources.  For instance,
> Jennifer Tufts and Tom Frank (J. Acoust. Soc. Am. 101, 3107 (1997)) found
> that the accuracy of judging the number of talkers in a multitalker mixture
> drops considerably when there are more than 3 talkers.  There is also a
> report by David Huron (Music Perception, Vol. 19, No. 1 (2001) pp. 1-64,
> or on-line
> http://www.music-cog.ohio-state.edu/Huron/Publications/huron.v
> ) that the accuracy of estimating the number of musical lines in
> polyphonic music worsens considerably beyond 3.  Some anecdotal evidence
> for this limit also comes from movie sound effect designers.  This is a
> quotation from Walter Murch, a renowned sound effect artist: "There is a rule
> of thumb I use which is never to give the audience more than two-and-a-half
> things to think about aurally at any one moment. Now, those moments can
> shift very quickly, but if you take a five-second section of sound and feed
> the audience more than two-and-a-half conceptual lines at the same time,
> they can't really separate them out. There's just no way to do it, and
> everything becomes self-canceling." (cited from
>
> Any thoughts, comments, and references relevant to this issue are
>
> Valeriy Shafiro
> Communication Disorders and Sciences
> Rush University Medical Center
> Chicago, IL
>
> office (312) 942 - 3298
> lab    (312) 942 - 3316
> email: valeriy_shafiro@rush.edu