
Re: CASA problems and solutions



Dear John, John, Al, DeLiang and others,

To put my ten cents' worth of wisdom into the discussion of CASA
(computational auditory scene analysis) vs. ASA, with due respect to
DeLiang, I must say that the most striking difference between ASA performed
by humans and CASA is that the former works and the latter, with few
notable exceptions, does not (or at least not well enough). Therefore, I
venture to say that CASA engineers probably have nothing to lose if their
models try to emulate what a human listener does, as some of the models
actually attempt to do, such as Okuno's "agents" that monitor a suspected
source. From my point of view, the real trouble is that we, the
humanoidally challenged, are still far from understanding exactly how
people accomplish ASA, i.e., the list of tasks that Al Bregman enumerates.

One particular function that, to the disappointment of many (including
myself), we are now relegating to second rank is localization-based
segregation. To the studies listed by John Culling showing the relative
inefficiency of spatial source separation, I would like to add our own,
soon to appear in the Mierlo proceedings volume. The problem with spatial
separation, in my view, is not that the auditory system is a bad localizer
but that the location it signals is so prone to adaptation, as the Franssen
effect demonstrates and as Rachel Clifton has shown in her precedence
effect demonstration. Nothing is as difficult to localize as simultaneous,
ongoing steady-state or quasi-steady-state signals: isn't the placement of
the instruments the last thing you are concerned about when you hear a
symphony concert? The same should also be true of a cocktail party, where
the bulk of the acoustic energy is carried by vowels, which is why the
babble is often characterized as a "buzz". Can one identify the location of
individual bees around a hive? What the localization system is exquisitely
sensitive to are two things: the very first onset of a signal and a change
in source location (as Erv Hafter's studies have shown), neither of which
is prominent at a cocktail party.

Nevertheless, even if source localization does seem to be a disappointingly
poor segregation factor, John Bates's model may be closer than he thinks to
what the auditory system is actually doing. In fact, a strict Helmholtzian
view of the ear as a frequency analyzer seems at best incomplete, because
there appears to be a parallel broadband analysis without which Neal
Viemeister's now-classic measurements of the temporal modulation transfer
function (TMTF) could not have been obtained. Thus John's idea, that the ear
should first register that something has happened, and when (which
inevitably yields an interaural time difference reading), and only later
perform the frequency analysis, is not inconsistent with current auditory
theory.
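To make the "onset first, frequency analysis later" ordering concrete, here
is a toy sketch in Python. It is only an illustration under loudly stated
assumptions, not John Bates's model and not a model of the auditory
periphery: a fixed amplitude threshold on the raw (broadband) waveform
stands in for onset detection, the interaural time difference is taken to be
simply the difference between the two ears' onset times, and a naive DFT
over a short window after the earlier onset stands in for the later
frequency analysis. The function names (onset_index, dominant_frequency,
onset_then_frequency) and the parameter values (threshold, window) are
hypothetical.

```python
import math


def onset_index(signal, threshold=0.1):
    """Crude broadband onset detector (an assumption, not a physiological
    model): index of the first sample whose magnitude exceeds `threshold`,
    or None if the signal never exceeds it."""
    for i, x in enumerate(signal):
        if abs(x) > threshold:
            return i
    return None


def dominant_frequency(segment, fs):
    """Frequency (Hz) of the largest-magnitude bin of a naive DFT of
    `segment`, searching bins 1..N/2 (the DC bin is excluded)."""
    n = len(segment)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n


def onset_then_frequency(left, right, fs, threshold=0.1, window=64):
    """Step 1: register *when* something happened at each ear; the onset
    difference is read as an ITD in seconds (positive if the right ear lags).
    Step 2: only afterwards, run a frequency analysis on `window` samples
    starting at the earlier onset. Returns (itd, freq) or None."""
    l_on = onset_index(left, threshold)
    r_on = onset_index(right, threshold)
    if l_on is None or r_on is None:
        return None
    itd = (r_on - l_on) / fs
    start = min(l_on, r_on)
    source = left if l_on <= r_on else right
    freq = dominant_frequency(source[start:start + window], fs)
    return itd, freq
```

For example, a 1000-Hz tone burst sampled at 8 kHz that reaches the right
ear 3 samples after the left gives an ITD of 3/8000 s (0.375 ms) and a
dominant frequency of 1000 Hz. Note that this threshold detector fires only
once, at the very first onset in each ear, so it says nothing about sources
that merely continue or that start later.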

Bipartisanly yours (just to be politically up-to-date),

Pierre



****************************************************************************
Pierre Divenyi, Ph.D.
Speech and Hearing Research (151)
V.A. Medical Center, Martinez, CA 94553, USA
Phone: (925) 370-6745
Fax: (925) 228-5738
E-mail: pdivenyi@marva4.ebire.org
****************************************************************************