Re: Ph.D. dissertation announcement: Sound Source Segregation
Thanks for the announcement and making your thesis available online. I've
read the abstract, though not the full thesis.
It brought back to mind something that's been bothering me about the use of
spatial location in separating sounds. Typically in studying spatial
hearing in humans, we immobilize the head to a greater or lesser degree so
that our subjects won't "cheat" by moving their heads. In doing so, we may
be systematically ignoring an important cue for auditory scene analysis
(ASA). Why do the subjects want to move their heads? Is it because such
motions (as well as whole-body motions) cause the components from different
sound sources to behave differently, making it easier to segregate them?
Has anybody studied the role of head movements in ASA?
Albert S. Bregman,
Psychology Dept., McGill University
1205 Docteur Penfield Avenue
Canada H3A 1B1
Voice & Fax: +1 (514) 484-2592
----- Original Message -----
From: "Nicoleta Roman" <niki@xxxxxxxxxxxxxxxxxx>
Sent: Thursday, December 15, 2005 12:46 PM
Subject: Ph.D. dissertation announcement: Sound Source Segregation
> Dear auditory list members:
> I would like to bring to your attention my recently completed Ph.D.
> dissertation, entitled "Auditory-based algorithms for sound segregation
> in multisource and reverberant environments".
> An electronic version of the thesis is available at:
> Please find the abstract below.
> Nicoleta Roman
> At a cocktail party, we can selectively attend to a single voice and
> filter out other interferences. This perceptual ability has motivated a
> new field of study known as computational auditory scene analysis (CASA)
> which aims to build speech separation systems that incorporate auditory
> principles. The psychological process of figure-ground segregation
> suggests that the target signal should be segregated as foreground while
> the remaining stimuli are treated as background. Accordingly, the
> computational goal of CASA should be to estimate an ideal time-frequency
> (T-F) binary mask, which selects the target if it is stronger than the
> interference in a local T-F unit. This dissertation investigates four
> aspects of CASA processing: location-based speech segregation, binaural
> tracking of multiple moving sources, binaural sound segregation in
> reverberation, and monaural segregation of reverberant speech. For
> localization, the auditory system utilizes the interaural time
> difference (ITD) and interaural intensity difference (IID) between the
> ears. We observe that within a narrow frequency band, modifications to
> the relative strength of the target source with respect to the
> interference trigger systematic changes in ITD and IID, resulting in a
> characteristic clustering. Consequently, we propose a supervised
> learning approach to estimate the ideal binary mask. A systematic
> evaluation shows that the resulting system produces masks very close to
> the ideal binary ones and large speech intelligibility improvements. In
> realistic environments, source motion requires consideration. Binaural
> cues are strongly correlated with locations in T-F units dominated by
> one source, resulting in channel-dependent conditional probabilities.
> Consequently, we propose a multi-channel integration method of these
> probabilities in order to compute the likelihood function in a target
> space. Finally, a hidden Markov model is employed for forming continuous
> tracks and automatically detecting the number of active sources.
> Reverberation affects the ITD and IID cues. We therefore propose a
> binaural segregation system that combines target cancellation through
> adaptive filtering and a binary decision rule to estimate the ideal
> binary mask. A major advantage of the proposed system is that it imposes
> no restrictions on the interfering sources. Quantitative evaluations
> show that our system outperforms related beamforming approaches.
> Psychoacoustic evidence suggests that monaural processing plays a vital
> role in segregation. It is known that reverberation smears the
> harmonicity of speech signals. We therefore propose a two-stage
> separation system that combines inverse filtering of target room impulse
> response with pitch-based segregation. As a result of the first stage,
> the harmonicity of a signal arriving from the target direction is partially
> restored while signals arriving from other locations are further
> smeared, and this leads to improved segregation and considerable
> signal-to-noise ratio gains.
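
For readers unfamiliar with the ideal binary mask criterion mentioned in the abstract above (keep a T-F unit when the target is stronger than the interference in that unit), the following is a minimal sketch in Python. It is not taken from the thesis. The function name, the nested-list energy layout, and the `threshold_db` parameter are assumptions made for illustration, and the sketch presumes the target and interference energies are known separately before mixing, which is what makes the mask "ideal".

```python
import math

def ideal_binary_mask(target_energy, interference_energy, threshold_db=0.0):
    """Return a 0/1 mask with one entry per time-frequency (T-F) unit.

    Both inputs are hypothetical: lists of rows (one row per frequency
    channel), each row holding per-frame energies of the premixed signals.
    A unit is set to 1 when the local target-to-interference ratio, in dB,
    exceeds threshold_db (0 dB means "target stronger than interference").
    """
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            if i <= 0.0:
                # No interference energy: keep the unit only if the target is present.
                keep = t > 0.0
            elif t <= 0.0:
                # No target energy: the interference dominates.
                keep = False
            else:
                keep = 10.0 * math.log10(t / i) > threshold_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask

# Toy energies: 2 frequency channels x 3 time frames (made-up numbers).
target = [[4.0, 0.5, 2.0], [0.1, 3.0, 1.0]]
interference = [[1.0, 2.0, 2.0], [0.4, 0.5, 0.0]]
mask = ideal_binary_mask(target, interference)
```

With the strict comparison used here, a unit where target and interference are exactly equal (0 dB) is assigned to the background; lowering `threshold_db` would retain such units.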