5pSP3. Speaker-independent sound recognition.

ASA 124th Meeting New Orleans 1992 October

S. A. Hanna

Hanada Electronics, P.O. Box 23051, 2121 Carling Ave., Ottawa, ON K2A 4E2, Canada

Ann Stuart Laubstein

Carleton University

A speech recognition machine must be able to (i) recognize intended types by filtering out variability in tokens and (ii) segment a continuous speech stream. Because the set of possible phrases and sentences is infinite (at least in principle), the segments must be smaller units, such as words (practical only with a limited vocabulary), syllables, or phonemes. Phoneme-based recognition systems, such as the one proposed here, divide the speech signal into a string of phoneme-like units, which are subsequently used to recognize large vocabularies of words composed of these units. The broad-category classification of sound elements developed here is intended as the initial stage of a two-phase phoneme-based speech recognition system. The role of the first phase is to automatically break down the continuous signal into a string of speaker-independent broad sound classes. This is done in two steps. First, the continuous speech signal is segmented; second, each segment is assigned to one of the following acoustic categories: vowel-like, voiced-fricative-like, unvoiced-fricative-like, voiced-stop-like, unvoiced-stop-like, and silence. The classification algorithm has been applied to sentences uttered by five different speakers. Informal subjective tests indicate that classification accuracy ranges from 90% to 94%.
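
The two-step idea (segment, then assign a broad class) can be illustrated with a minimal Python sketch. The abstract does not describe the features or decision rules actually used, so everything here is an assumption made for illustration: the short-time energy and zero-crossing-rate features, the threshold values, the frame length, and the function names `frame_features`, `label_frame`, and `segment` are all hypothetical. The sketch also covers only three of the six categories (silence, vowel-like, unvoiced-fricative-like); separating the voiced-fricative and stop categories would need additional cues that are not attempted here.

```python
import math

def frame_features(signal, frame_len):
    """Return (energy, zero_crossing_rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def label_frame(energy, zcr, silence_energy=1e-4, high_zcr=0.3):
    """Assign a broad class to one frame (thresholds are illustrative guesses)."""
    if energy < silence_energy:
        return "silence"
    if zcr > high_zcr:
        # Noise-like, rapidly alternating signal.
        return "unvoiced-fricative-like"
    # Energetic, slowly varying signal.
    return "vowel-like"

def segment(signal, frame_len=80):
    """Merge runs of identically labeled frames into (label, first, end) segments.

    `first` and `end` are frame indices, with `end` exclusive.
    """
    segments = []
    for i, (energy, zcr) in enumerate(frame_features(signal, frame_len)):
        label = label_frame(energy, zcr)
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i + 1)
        else:
            segments.append((label, i, i + 1))
    return segments

# Synthetic test signal: silence, a slow sine, then a rapidly alternating signal.
demo = ([0.0] * 160
        + [math.sin(2 * math.pi * n / 40) for n in range(160)]
        + [0.5 if n % 2 == 0 else -0.5 for n in range(160)])

for label, first, end in segment(demo):
    print(label, first, end)
```

Fixed thresholds like these would not be speaker-independent on real speech; they only show how a segment-then-classify pipeline produces a string of broad class labels from a continuous signal.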