Tech report on location-based segregation (Nicole Roman)


Subject: Tech report on location-based segregation
From:    Nicole Roman  <niki(at)CIS.OHIO-STATE.EDU>
Date:    Wed, 17 Jul 2002 14:43:28 -0400

Dear Colleagues,

It is my pleasure to announce the availability of the following technical report.

Thanks for your attention,
Nicoleta Roman

************************************

"Speech segregation based on sound localization",
Technical Report #16, June 2002.
Department of Computer and Information Science
The Ohio State University

Nicoleta Roman, The Ohio State University
DeLiang Wang, The Ohio State University
Guy J. Brown, University of Sheffield

*************************************

Abstract
---------

At a cocktail party, we can selectively attend to a single voice and filter out all the other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel machine learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial location cues: interaural time differences (ITD) and interaural intensity differences (IID). The auditory masking effect motivates the notion of an “ideal” time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. We observe that within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic deviations for ITD and IID. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, we perform pattern classification in order to estimate ideal binary masks. A systematic evaluation shows that the resulting system produces masks very close to ideal binary ones, and gives a significant improvement in performance over an existing approach, as quantified by changes in signal-to-noise ratio before and after segregation.
**************************************

The manuscript is available for download at:
ftp://ftp.cis.ohio-state.edu/pub/tech-report/2002/TR16.pdf

Related sound demos can be found at:
http://www.cis.ohio-state.edu/~niki/soundemo.html

A preliminary version of this work is included in the Proceedings of 2002 ICASSP.
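
For readers unfamiliar with the "ideal" binary mask mentioned in the abstract, the following is a minimal sketch of the idea, not the report's implementation. It assumes that per-unit energies of the premixed target and interference are already available as nested lists (rows are frequency channels, columns are time frames). The function names, the toy energy values, and the simple energy-ratio SNR used here are all illustrative assumptions; the report's own filterbank, ITD/IID classifier, and SNR evaluation are not reproduced.

```python
import math


def ideal_binary_mask(target_energy, interference_energy):
    """Return a 0/1 mask: 1 where the target is stronger than the
    interference in a time-frequency (T-F) unit, 0 otherwise.

    Both inputs are lists of rows (hypothetical layout: one row per
    frequency channel, one column per time frame).
    """
    if len(target_energy) != len(interference_energy):
        raise ValueError("inputs must have the same number of channels")
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        if len(t_row) != len(i_row):
            raise ValueError("inputs must have the same number of frames")
        mask.append([1 if t > i else 0 for t, i in zip(t_row, i_row)])
    return mask


def energy_snr_db(mask, target_energy, interference_energy):
    """Ratio (in dB) of target to interference energy over the T-F units
    that the mask retains. This is a simplified proxy chosen for this
    sketch, not the evaluation measure defined in the report.
    """
    kept_target = 0.0
    kept_interference = 0.0
    for m_row, t_row, i_row in zip(mask, target_energy, interference_energy):
        for m, t, i in zip(m_row, t_row, i_row):
            kept_target += m * t
            kept_interference += m * i
    if kept_interference == 0.0:
        return math.inf
    return 10.0 * math.log10(kept_target / kept_interference)


# Toy energies (made up for illustration): 2 channels x 3 frames.
target = [[4.0, 1.0, 9.0],
          [0.5, 2.0, 0.1]]
interference = [[1.0, 3.0, 2.0],
                [2.0, 1.0, 0.4]]

mask = ideal_binary_mask(target, interference)

# An all-ones mask keeps every unit, i.e. the unprocessed mixture.
all_ones = [[1] * len(row) for row in target]
snr_before = energy_snr_db(all_ones, target, interference)
snr_after = energy_snr_db(mask, target, interference)

print(mask)  # [[1, 0, 1], [0, 1, 0]]
print(snr_after > snr_before)  # True
```

In the system described in the abstract, the premixed signals are not available at run time, so the mask is estimated by classifying ITD and IID features in each T-F unit; the sketch above only illustrates the target that such a classifier tries to match.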


This message came from the mail archive
http://www.auditory.org/postings/2002/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University