Ph.D. dissertation announcement: Sound Source Segregation
Dear auditory list members:
I would like to bring to your attention my recently completed Ph.D.
dissertation, entitled "Auditory-based algorithms for sound segregation
in multisource and reverberant environments".
An electronic version of the thesis is available at:
Please find the abstract below.
At a cocktail party, we can selectively attend to a single voice and
filter out other interference. This perceptual ability has motivated a
new field of study known as computational auditory scene analysis (CASA)
which aims to build speech separation systems that incorporate auditory
principles. The psychological process of figure-ground segregation
suggests that the target signal should be segregated as foreground while
the remaining stimuli are treated as background. Accordingly, the
computational goal of CASA should be to estimate an ideal time-frequency
(T-F) binary mask, which selects the target if it is stronger than the
interference in a local T-F unit. This dissertation investigates four
aspects of CASA processing: location-based speech segregation, binaural
tracking of multiple moving sources, binaural sound segregation in
reverberation, and monaural segregation of reverberant speech. For
localization, the auditory system utilizes the interaural time
difference (ITD) and interaural intensity difference (IID) between the
ears. We observe that within a narrow frequency band, changes in
the relative strength of the target source with respect to the
interference produce systematic changes in ITD and IID, resulting in a
characteristic clustering. Consequently, we propose a supervised
learning approach to estimate the ideal binary mask. A systematic
evaluation shows that the resulting system produces masks very close to
the ideal binary ones and yields large speech intelligibility improvements. In
realistic environments, source motion requires consideration. Binaural
cues are strongly correlated with locations in T-F units dominated by
one source, resulting in channel-dependent conditional probabilities.
Consequently, we propose a method that integrates these probabilities
across channels in order to compute the likelihood function in a target
space. Finally, a hidden Markov model is employed to form continuous
tracks and automatically detect the number of active sources.
Reverberation affects the ITD and IID cues. We therefore propose a
binaural segregation system that combines target cancellation through
adaptive filtering and a binary decision rule to estimate the ideal
binary mask. A major advantage of the proposed system is that it imposes
no restrictions on the interfering sources. Quantitative evaluations
show that our system outperforms related beamforming approaches.
Psychoacoustic evidence suggests that monaural processing plays a vital
role in segregation. It is known that reverberation smears the
harmonicity of speech signals. We therefore propose a two-stage
separation system that combines inverse filtering of the target room impulse
response with pitch-based segregation. As a result of the first stage,
the harmonicity of a signal arriving from the target direction is partially
restored while signals arriving from other locations are further
smeared, and this leads to improved segregation and considerable
signal-to-noise ratio gains.
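
As a minimal illustration of the ideal binary mask described in the abstract (the mask selects the target wherever it is stronger than the interference in a local T-F unit), the rule can be sketched in Python as below. The function name and the representation of T-F energies as nested lists (rows are frequency channels, columns are time frames) are assumptions made for this example; it also assumes the premixed target and interference energies are available, which is what makes the mask "ideal". It is not the dissertation's implementation.

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return a binary mask with 1 where the target is stronger than the
    interference in a T-F unit, and 0 otherwise.

    Both arguments are hypothetical inputs: equally sized nested lists of
    non-negative energies, indexed as [frequency channel][time frame].
    """
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_energy, interference_energy)
    ]


# Toy example: 2 frequency channels x 2 time frames.
target = [[4.0, 1.0], [0.5, 9.0]]
interference = [[1.0, 2.0], [0.5, 3.0]]
mask = ideal_binary_mask(target, interference)
```

In this toy example the units with equal target and interference energy are assigned 0, since the target must be strictly stronger to be selected.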