
Re: Computational ASA

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Computational ASA
From: "Maher, Rob" <rmaher@xxxxxxxxxxxxxxx>
Date: Fri, 30 Apr 2004 09:29:09 -0600
Delivery-date: Fri Apr 30 12:15:25 2004
Reply-to: "Maher, Rob" <rmaher@xxxxxxxxxxxxxxx>
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Jon--
I think the inherent difficulty of computational source separation has to do
with the generally ill-posed nature of the research problem:  given a
composite observation vector 'A' that is a linear sum of N unknown
time-varying signal vectors  'B', 'C', ..., determine estimates of 'B', 'C',
.... In other words, one equation in N unknowns, where N > 1.  Without some
other valid source of information, there can be no unique solution to the
problem.
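
As a minimal sketch of that non-uniqueness (the signals, their length, and the helper name `mix` are illustrative assumptions, not taken from any particular CASA system), two different pairs of sources can sum to exactly the same observation:

```python
import math

def mix(*sources):
    """Sum equal-length source signals sample by sample."""
    return [sum(samples) for samples in zip(*sources)]

n = 8  # number of samples (arbitrary, for illustration)

# Hypothetical "true" sources B and C: two sinusoids.
b = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
c = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
a = mix(b, c)  # the composite observation A

# Any signal d gives another valid decomposition,
# because (B + d) + (C - d) = B + C = A.
d = [0.5 * (-1) ** t for t in range(n)]
b_alt = [x + y for x, y in zip(b, d)]
c_alt = [x - y for x, y in zip(c, d)]
a_alt = mix(b_alt, c_alt)

# The mixtures agree (up to rounding) although the sources differ.
same_mixture = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, a_alt))
different_sources = any(abs(x - y) > 1e-6 for x, y in zip(b, b_alt))
print(same_mixture, different_sources)
```

Since d is arbitrary, the observation alone cannot distinguish between infinitely many candidate decompositions.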

To obtain the "other valid source of information," the CASA field has a
variety of threads.  One thread involves the use of conventional DSP
techniques to transform the composite signal into a (typically)
time-frequency representation, then to perform pattern extraction in the
transform domain.  Another thread uses biologically-inspired signal
processing via cochlear models and perceptually-derived nonlinear functions
borrowed from the perceptual audio coding field.  Yet another thread starts
with human psychoacoustical data in an attempt to exploit the cognitive
concepts of source segregation and streaming.
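
A toy sketch of the first thread, under a strong assumption that rarely holds for real audio: the sources occupy disjoint frequency bins, and we already know which bins belong to which source. Here a single-frame DFT stands in for the time-frequency transform, and the bin mask `keep` stands in for the pattern-extraction step; all names and values are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

n = 16
# Hypothetical sources: sinusoids centered exactly on bins 2 and 5.
b = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
c = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
a = [x + y for x, y in zip(b, c)]  # composite observation

# The "other valid source of information": we assume we know that
# source B lives only in bin 2 and its mirror bin n - 2.
keep = {2, n - 2}
spectrum = dft(a)
masked = [spectrum[k] if k in keep else 0 for k in range(n)]
b_est = idft(masked)

# Largest sample-wise error between the true and estimated source B.
max_err = max(abs(x - y) for x, y in zip(b, b_est))
print(max_err < 1e-9)
```

The recovery works here only because the prior knowledge in `keep` is exact and the sources do not overlap in frequency; once they do, a mask of this kind can no longer assign the shared bins uniquely.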

It is sometimes argued that "humans can do separation, so the problem must
be soluble."  I would argue that humans do source _identification and
tracking_ very effectively, but perhaps humans do not actually solve the
computational _separation_ problem, in the sense that the individual vectors
'B', 'C', etc. are extracted in a neural signal processing context.

A computational system that can reliably classify the number, identity,
and duration of overlapping sonic events seems like a first step in the
process.  Yet I don't know of any system to date that comes close to a
casual human listener's ability to determine the orchestration of a musical
selection or to recognize the doorbell at a noisy party.

We certainly need some new insights into the problem, so welcome aboard!

Rob Maher

--
Robert C. (Rob) Maher, Ph.D.
Associate Professor of Electrical and Computer Engineering
Montana State University-Bozeman
rob.maher@montana.edu


> -----Original Message-----
> From: Jon Boley [mailto:jdb@jboley.com]
> Sent: Friday, April 30, 2004 7:59 AM
> To: AUDITORY@LISTS.MCGILL.CA
> Subject: Computational ASA
>
>
> Hi all,
> I am a grad student in the University of Miami's Music
> Engineering program, and I am just starting to learn about
> auditory scene analysis, particularly computational ASA models.
>
> I know there are several CASA experts on this list, so I'd
> like to ask why source separation seems to be so difficult.
> It seems like the general consensus is that source
> separation is far too difficult, and research has focused on
> understanding features within a mix.  Yet, from what I've
> read, current methods of feature extraction work quite well.
> It only seems natural that we could write an algorithm that
> groups these features according to their perceived source and
> creates separate audio streams based on this information.
> While this would be much more difficult in noisy or
> reverberant environments, I would imagine it would be quite
> simple in a less complex environment.
> What is it that makes source separation so difficult?
>
> Thanks,
> Jon Boley
>
