
Tech report on speech segregation

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Tech report on speech segregation
From: DeLiang Wang <dwang@xxxxxxxxxxxxxxxxxx>
Date: Wed, 8 Jul 1998 17:28:20 -0400
Comments: To: connectionists@cs.cmu.edu
Comments: cc: gbrown@cis.ohio-state.edu
Reply-to: DeLiang Wang <dwang@xxxxxxxxxxxxxxxxxx>
Sender: Research in auditory perception <AUDITORY@xxxxxxxxxxxxxxx>

The following technical report is available via FTP/WWW:

------------------------------------------------------------------
"Separation of Speech from Interfering Sounds
Based on Oscillatory Correlation"

Technical Report #24, June 1998
The Ohio State University Center for Cognitive Science
------------------------------------------------------------------

        DeLiang L. Wang, The Ohio State University
        Guy J. Brown, University of Sheffield

A multi-stage neural model is proposed for an auditory scene analysis task:
segregating speech from interfering sound sources. The core of the model is
a two-layer oscillator network that performs stream segregation on the basis
of oscillatory correlation. In the oscillatory correlation framework,
a stream is represented by a population of synchronized relaxation
oscillators, each of which corresponds to an auditory feature, and different
streams are represented by desynchronized oscillator populations. Lateral
connections between oscillators encode harmonicity as well as proximity in
frequency and time. The oscillator network is preceded by a model of the
auditory periphery and a stage in which mid-level auditory representations
are formed. The model has been systematically evaluated using a corpus of
voiced speech mixed with interfering sounds, and it improves the
signal-to-noise ratio for every mixture. Furthermore, the
pattern of improvements seems consistent with human performance. The
performance of our model is compared with other studies on computational
auditory scene analysis. A number of issues including biological plausibility
and real-time implementation are also discussed.
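
As a rough illustration of the evaluation measure mentioned above, the
sketch below computes a signal-to-noise ratio in dB and the change in SNR
between an original mixture and a segregated output. It assumes the
conventional energy-ratio definition of SNR; the exact definition used in
the report is not stated in this announcement, and the function names and
toy signals are hypothetical.

```python
import math


def snr_db(target, interference):
    """Conventional SNR in dB: 10 * log10(target energy / interference energy).

    Assumption: this is the standard energy-ratio definition, not
    necessarily the one used in the report.
    """
    signal_energy = sum(x * x for x in target)
    noise_energy = sum(x * x for x in interference)
    if signal_energy <= 0 or noise_energy <= 0:
        raise ValueError("both signals must have non-zero energy")
    return 10.0 * math.log10(signal_energy / noise_energy)


def snr_improvement_db(target, noise_before, noise_after):
    """Change in SNR (dB) when the residual interference shrinks
    from noise_before (in the mixture) to noise_after (after segregation)."""
    return snr_db(target, noise_after) - snr_db(target, noise_before)


# Toy signals (hypothetical): equal energy before, 1/100 of it after.
speech = [1.0, -1.0, 1.0, -1.0]
intrusion_before = [1.0, 1.0, 1.0, 1.0]
intrusion_after = [0.1, 0.1, 0.1, 0.1]

gain = snr_improvement_db(speech, intrusion_before, intrusion_after)
print(f"SNR improvement: {gain:.1f} dB")
```

A positive value means that the segregated output contains relatively less
interference than the original mixture.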

(28 pages, 384 KB compressed)


for anonymous ftp:
        FTP-HOST: ftp.cis.ohio-state.edu
        Directory: /pub/leon/Brown
        Filename: ccs98.ps.gz

for WWW:
        http://www.cis.ohio-state.edu/~dwang/reports.html

   (Some pages may not display in a PostScript viewer, but they should print OK)


Send comments to DeLiang Wang (dwang@cis.ohio-state.edu)

Email to AUDITORY should now be sent to AUDITORY@lists.mcgill.ca
LISTSERV commands should be sent to listserv@lists.mcgill.ca
Information is available on the WEB at http://www.mcgill.ca/cc/listserv
