
[no subject]

To: AUDITORY@xxxxxxxxxxxxxxx
From: tothl@xxxxxxxxxxxxxxx
Date: Mon, 9 Nov 1998 18:29:00 MET
Reply-to: tothl@xxxxxxxxxxxxxxx
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

>From tothl Mon Nov  9 18:29:16 +0100 1998 remote from inf.u-szeged.hu
Date: Mon, 9 Nov 1998 18:29:16 +0100 (MET)
From: Toth Laszlo <tothl@inf.u-szeged.hu>
X-Sender: tothl@csilla
To: auditory@lists.mcgill.ca
Subject: Re: your mail
In-Reply-To: <Pine.SUN.3.95.981109104657.14326C-100000@hebb.psych.mcgill.ca>

On Mon, 9 Nov 1998, Al Bregman wrote:

> If detection and classification of voiced vs. unvoiced segments plays a
> central role in algorithmic speech recognition, and if this classification
> is based on distinguishing the pitched vs. noisy parts of the signal, then
> such algorithms wouldn't be able to understand whispered speech, which is
> an easy task for humans.
>
I think that although studying extreme cases (whispered speech, noisy
speech, interrupted speech, etc.) gives the most clues about how speech
understanding works, for ASR "normal" speech is already a big enough
problem. Most of us would be glad if our speech recognizers did not work
for whispered speech, but otherwise were very robust.
As for voicedness in ASR, you surely know that current speech
recognizers throw this information away entirely. In fact, they want to
remove pitch, but they also remove the voiced/unvoiced decision.
In an HMM, it would indeed cause a problem in the case of whispered
speech, but in our system it will (hopefully) simply reinforce a
decision, without punishing the other cases.
As for human speech understanding, I think that voicedness is a very
robust acoustic cue, and I'm sure that we use it. It's another issue that
we can do without it. But I don't think, for example, that whispered
communication would work very well in high background noise, whereas the
voiced/unvoiced cue is quite robust in this case. So it may not
play a central role, but it can be very helpful.
I'd be very glad to hear the opinions of others.
               Laszlo Toth
        Hungarian Academy of Sciences         *   "Do it or don't do it.
  Research Group on Artificial Intelligence   *    Don't try it."
     e-mail: tothl@inf.u-szeged.hu            *
     http://www.inf.u-szeged.hu/~tothl        *                   /Yoda/

Email to AUDITORY should now be sent to AUDITORY@lists.mcgill.ca
LISTSERV commands should be sent to listserv@lists.mcgill.ca
Information is available on the WEB at http://www.mcgill.ca/cc/listserv

Follow-Ups:
- Laszlo Toth & the Robust Voicing Cue
  - From: Richard Pastore
