On Mon, 9 Nov 1998, Al Bregman wrote:

> If detection and classification of voiced vs. unvoiced segments plays a
> central role in algorithmic speech recognition, and if this classification
> is based on distinguishing the pitched vs. noisy parts of the signal, then
> such algorithms wouldn't be able to understand whispered speech, which is
> an easy task for humans.

I think that although studying extreme cases (whispered speech, noisy
speech, interrupted speech, etc.) gives the most clues about how humans
understand speech, for ASR "normal" speech is a big enough problem on its
own. Most of us would be happy with a recognizer that failed on whispered
speech, as long as it was very robust otherwise.
As for voicing in ASR, as you surely know, current speech recognizers
throw this information away entirely. Their front ends are designed to
remove pitch, but in doing so they also remove the voiced/unvoiced
distinction. In an HMM, using this cue would indeed cause problems for
whispered speech, but in our system it will (hopefully) simply reinforce
a decision when the cue is present, without penalizing the other cases.
As for human speech understanding, I think voicing is a very robust
acoustic cue, and I'm sure we use it. That we can also do without it is
another matter. But I don't think, for example, that whispered
communication works very well in high background noise, whereas the
voiced/unvoiced cue remains quite robust in that situation. So voicing
may not play a central role, but it can be very helpful.
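To make the periodicity argument concrete, here is a minimal sketch of one common way to compute a voicing cue: the peak of the normalized autocorrelation over lags in a typical pitch range. It is not the method used in our system. The 16 kHz sampling rate, 40 ms frame, 60-400 Hz pitch range and 0.5 threshold are illustrative assumptions, and `voicing_score` / `is_voiced` are hypothetical names. A white-noise frame stands in for whispered speech, and a 150 Hz sine stands in for a voiced frame.

```python
import math
import random


def voicing_score(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Peak normalized autocorrelation of `frame` over pitch-range lags.

    Values near 1.0 suggest a periodic (voiced) frame; values near 0.0
    suggest an aperiodic, noise-like (unvoiced or whispered) frame.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC offset
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 2)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a = x[: n - lag]                     # start of the frame
        b = x[lag:]                          # the frame shifted by `lag`
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0:                        # skip silent (all-zero) segments
            best = max(best, num / den)
    return best


def is_voiced(frame, sample_rate, threshold=0.5):
    """Hypothetical binary voiced/unvoiced decision with a fixed threshold."""
    return voicing_score(frame, sample_rate) >= threshold


if __name__ == "__main__":
    sr = 16000
    n = 640                                  # 40 ms at 16 kHz
    rng = random.Random(0)
    vowel_like = [math.sin(2 * math.pi * 150 * i / sr) for i in range(n)]
    whisper_like = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Sine power 0.5 vs. noise power 0.09: roughly 7 dB SNR.
    noisy_vowel = [s + rng.gauss(0.0, 0.3) for s in vowel_like]
    for name, frame in [("vowel", vowel_like),
                        ("whisper", whisper_like),
                        ("noisy vowel", noisy_vowel)]:
        print(name, round(voicing_score(frame, sr), 2), is_voiced(frame, sr))
```

A detector of this kind labels the noise-only "whisper" frame unvoiced throughout, which is exactly why a recognizer that relied on it could not handle whispered speech. The periodic frame, however, is still classified as voiced after moderate noise is added, which illustrates why the cue can remain useful in background noise.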
I'd be very glad to hear others' opinions.
               Laszlo Toth
        Hungarian Academy of Sciences         *   "Do it or don't do it.
  Research Group on Artificial Intelligence   *    Don't try it."
     e-mail: tothl@inf.u-szeged.hu            *
     http://www.inf.u-szeged.hu/~tothl        *                   /Yoda/

