Re: Computational ASA ("Maher, Rob" )


Subject: Re: Computational ASA
From:    "Maher, Rob"  <rmaher(at)ECE.MONTANA.EDU>
Date:    Fri, 30 Apr 2004 09:29:09 -0600

Jon--

I think the inherent difficulty of computational source separation has to do with the generally ill-posed nature of the research problem: given a composite observation vector 'A' that is a linear sum of N unknown time-varying signal vectors 'B', 'C', ..., determine estimates of 'B', 'C', .... In other words, one equation in N unknowns, where N > 1. Without some other valid source of information, there can be no unique solution to the problem.

To obtain the "other valid source of information," the CASA field has a variety of threads. One thread involves the use of conventional DSP techniques to transform the composite signal into a (typically) time-frequency representation, then to perform pattern extraction in the transform domain. Another thread uses biologically-inspired signal processing via cochlear models and perceptually-derived nonlinear functions borrowed from the perceptual audio coding field. Yet another thread starts with human psychoacoustical data in an attempt to exploit the cognitive concepts of source segregation and streaming.

It is sometimes argued that "humans can do separation, so the problem must be soluble." I would argue that humans do source _identification and tracking_ very effectively, but perhaps humans do not actually solve the computational _separation_ problem, in the sense that the individual vectors 'B', 'C', etc. are extracted in a neural signal processing context.

A computational system that is able to reliably classify the number, identity, and duration of overlapping sonic events seems like a first step in the process. Yet, I don't know of any system to date that comes close to a casual human's ability to determine the orchestration of a musical selection or recognize the doorbell at a noisy party.

We certainly need some new insights into the problem, so welcome aboard!

Rob Maher

--
Robert C. (Rob) Maher, Ph.D.
Associate Professor of Electrical and Computer Engineering
Montana State University-Bozeman
rob.maher(at)montana.edu

> -----Original Message-----
> From: Jon Boley [mailto:jdb(at)jboley.com]
> Sent: Friday, April 30, 2004 7:59 AM
> To: AUDITORY(at)LISTS.MCGILL.CA
> Subject: Computational ASA
>
>
> Hi all,
> I am a grad student in the University of Miami's Music
> Engineering program, and I am just starting to learn about
> auditory scene analysis, particularly computational ASA models.
>
> I know there are several CASA experts on this list, so I'd
> like to ask why source separation seems to be so difficult.
> It seems like the general consensus is that source
> separation is far too difficult, and research has focused on
> understanding features within a mix. Yet, from what I've
> read, current methods of feature extraction work quite well.
> It only seems natural that we could write an algorithm that
> groups these features according to their perceived source and
> creates separate audio streams based on this information.
> While this would be much more difficult in noisy or
> reverberant environments, I would imagine it would be quite
> simple in a less complex environment.
> What is it that makes source separation so difficult?
>
> Thanks,
> Jon Boley
>
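
The "one equation in N unknowns" point in the reply can be checked numerically. What follows is a minimal Python sketch and not part of the original message: the signals b, c, and d and the function names are hypothetical, chosen only to illustrate the N = 2 case. It builds a mixture a = b + c, then moves an arbitrary signal d from one source to the other and confirms that the new pair sums to the same observation.

```python
import math


def make_mixture(n=8):
    """Build two hypothetical sources and their sample-wise sum."""
    b = [math.sin(2 * math.pi * k / n) for k in range(n)]
    c = [0.5 * math.cos(4 * math.pi * k / n) for k in range(n)]
    a = [bk + ck for bk, ck in zip(b, c)]
    return a, b, c


def alternative_split(b, c, d):
    """Move an arbitrary signal d from c to b; the sum b + c is unchanged."""
    b_alt = [bk + dk for bk, dk in zip(b, d)]
    c_alt = [ck - dk for ck, dk in zip(c, d)]
    return b_alt, c_alt


def max_residual(a, x, y):
    """Largest absolute difference between a and x + y."""
    return max(abs(ak - (xk + yk)) for ak, xk, yk in zip(a, x, y))


a, b, c = make_mixture()
d = [0.3 * (-1) ** k for k in range(len(a))]  # arbitrary perturbation
b_alt, c_alt = alternative_split(b, c, d)

err_true = max_residual(a, b, c)         # the sources that generated a
err_alt = max_residual(a, b_alt, c_alt)  # a different pair, same sum
print(err_true, err_alt)
```

Both residuals are zero up to floating-point rounding, so the observation alone cannot distinguish (b, c) from (b_alt, c_alt). Any constraint that rules out d, such as harmonicity, common onset, or spatial cues, plays the role of the "other valid source of information" in the reply.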


This message came from the mail archive
http://www.auditory.org/postings/2004/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University