Re: speech/music (Malcolm Slaney)


Subject: Re: speech/music
From:    Malcolm Slaney <malcolm(at)INTERVAL.COM>
Date:    Tue, 31 Mar 1998 09:58:50 -0800

At 1:17 AM -0800 3/31/98, Sue Johnson wrote:
>I think there must be some way the brain splits up (deconvolves) the
>signal before applying a speech recogniser.

No! Who says the brain operates in only a bottom-up manner?

The best counter-example I know of is a song by Miriam Makeba in which she sings in an African click language. To my non-African ears, a click in the middle of speech is heard as speech, but when the same sound is accompanied by music it is heard as a drum beat. An ambiguous sound changes its grouping based on the context. A similar argument can be made about sine-wave speech: it has both speech-like and tone-like components.

Perhaps an even better example is the McGurk Effect (seeing a face articulate one syllable while hearing another changes which syllable you hear). Wow, a low-level auditory decision changes based on visual input! Certainly the visual system isn't connected to the auditory system at a low level. Some information *must* be travelling top-down. It's expectation-driven.

Most work on Computational Auditory Scene Analysis has assumed a bottom-up processing model. That's certainly the easy, engineering approach, but there is much evidence that life is not so simple. In a chapter I wrote called "A Critique of Pure Audition" I argue that there are so many processing stages that would have to come first (and they often conflict) that the process can't be purely bottom-up.

Bregman's book mostly discusses bottom-up grouping cues, but I'm sure there must be grouping based on language and similar high-level constructs. I don't know how you would prove it. (Anecdotally, I did notice at a Japanese cocktail party that it was easier to separate out the native English speakers, probably because their prosody fit my expectations.)

Many examples of these effects (including the Click Song) are available at
http://web.interval.com/papers/1997-056/

An early version of the chapter, before a massive edit to clean up the language, is online at
http://web.interval.com/papers/1995-010/

Unfortunately, copyright restrictions mean that the final chapter is not online. You'll have to get a copy of the book.

- Malcolm


This message came from the mail archive
http://www.auditory.org/postings/1998/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University