
Tech report on speech segregation

Dear Auditory list,

It's my pleasure to announce the following technical report
available via WWW.

Thanks for your attention,

Guoning Hu

"Monaural speech segregation based on pitch tracking and amplitude

Technical Report #6, March 2002

Department of Computer and Information Science
The Ohio State University

        Guoning Hu, The Ohio State University
        DeLiang Wang, The Ohio State University

Speech segregation in the monaural condition has proven to be very
challenging. Monaural speech segregation has been studied in previous
systems that incorporate auditory scene analysis principles. A major problem
for these systems is their inability to deal with speech in the
high-frequency range. Psychoacoustic evidence suggests that different
perceptual mechanisms are involved in handling resolved and unresolved
harmonics. We propose a system that deals with resolved and unresolved
harmonics differently. For resolved harmonics, the system generates segments
based on temporal continuity and cross-channel correlation, and groups them
according to their periodicities. For unresolved harmonics, it generates
segments based on common amplitude modulation (AM) in addition to temporal
continuity and groups them according to AM repetition rates derived from
sinusoidal modeling. Underlying the segregation process is a pitch contour
that is first estimated from speech segregated according to global pitch and
then adjusted according to psychoacoustic constraints. Our system is
systematically evaluated, and it yields substantially better performance
than previous systems, especially in the high-frequency range.
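
As a rough illustration of the cross-channel correlation cue mentioned above, here is a minimal sketch. It assumes one common formulation: the correlation between the autocorrelation functions of two neighbouring frequency channels. This is not necessarily the exact definition used in the report. The function names, the frame length, and the toy signals (plain sinusoids and noise standing in for auditory filterbank outputs) are all hypothetical.

```python
import numpy as np

def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation of one channel frame for lags 0..max_lag-1."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(max_lag)])

def cross_channel_correlation(acf_a, acf_b):
    """Pearson correlation between the autocorrelations of two channels."""
    a = acf_a - acf_a.mean()
    b = acf_b - acf_b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

# Toy stand-ins for filterbank outputs (assumed values, not from the report):
fs = 16000                      # sampling rate in Hz
t = np.arange(320) / fs         # one 20 ms frame
ch1 = np.sin(2 * np.pi * 200 * t)          # channel driven by a 200 Hz component
ch2 = np.sin(2 * np.pi * 200 * t + 0.3)    # neighbour driven by the same component
ch3 = np.random.default_rng(0).standard_normal(t.size)  # neighbour dominated by noise

max_lag = 160
same = cross_channel_correlation(autocorrelation(ch1, max_lag),
                                 autocorrelation(ch2, max_lag))
diff = cross_channel_correlation(autocorrelation(ch1, max_lag),
                                 autocorrelation(ch3, max_lag))
print(f"same source: {same:.3f}, different source: {diff:.3f}")
```

In this sketch, channels that respond to the same periodic component give a correlation close to 1, while a channel dominated by noise gives a much lower value. A segment-forming stage could then merge neighbouring channels whose correlation exceeds some threshold.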

For WWW:

Related sound demos can be found at:

Preliminary versions (in pdf) of this work are included in