ASA 124th Meeting New Orleans 1992 October

2pSP16. Acoustical analysis of false starts in spontaneous speech.

Douglas O'Shaughnessy

INRS-Telecommunications, Univ. du Quebec, 16 Place du Commerce, Verdun, PQ H3E 1H6, Canada

In spontaneous speech, many false starts occur, in which a speaker interrupts the flow of speech to restart the utterance. The acoustic aspects of such restarts in a common database were examined for duration and fundamental frequency (F0). Automatically identifying the type of restart could improve speech recognition performance. For repeated words (or word portions), one version could be eliminated; for changed words, the unwanted words could be suppressed. The recognizer would then operate only on the intended words. In virtually all current recognizers, words in a restart either pass directly to the textual component of the recognizer or hinder a proper interpretation in the language-model component, since the language model is invariably trained only on fluent text. The spoken data consist of 42 speakers, each speaking about 30 different utterances (median length of about 12 words). There were 60 occurrences of simple repeated words (or portions), 30 cases of inserted words, and 25 occurrences of substituted new words. When a word was simply repeated in a restart, it usually had virtually the same prosody, but occasionally the repeated word had less stress. With a substitution or insertion in the restart, the modified word was virtually always more stressed. For restarts where the speaker stopped in the middle of a word and simply ``backed up,'' the pause lasted 100--400 ms in 85% of the examples. In about 75% of the interrupted words, the speaker stopped before completing the vowel of the intended word's first syllable. In virtually all examples, however, the speaker completed at least 100 ms of the word before pausing for at least 100 ms. When the pause occurred at a word boundary, the words repeated after the pause took one of two forms. Either they were a straight repetition with little prosodic change, or the repeated words were shortened by up to 50%.
Among the repeated words (after the pause) that preceded an inserted word, function words showed little or no shortening but usually had lower F0, whereas content words exhibited significant shortening and lower F0. This prosodic change applied only to nonprepausal words, because words immediately preceding a pause were often subject to significant prepausal lengthening. [Work supported by NSERC.]
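The abstract does not describe a detection algorithm, but its duration and F0 observations suggest what simple first-pass cues might look like. The following is a minimal Python sketch under stated assumptions, not the author's method. It assumes hypothetical per-word measurements (`WordToken`, with duration in ms and mean F0 in Hz) are already available from some segmenter and pitch tracker. It also treats the reported figures (a word fragment of at least 100 ms followed by a pause of 100 to 400 ms) as hard thresholds, even though that pause range covered only 85% of the examples. The second function expresses the repetition effects as a duration ratio and an F0 difference. Per the abstract, such a comparison is only meaningful for nonprepausal words, because prepausal lengthening would distort it.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract's reported observations. Using them as
# hard decision boundaries is an assumption of this sketch, not a published rule.
MIN_FRAGMENT_MS = 100.0
PAUSE_RANGE_MS = (100.0, 400.0)


@dataclass
class WordToken:
    """Hypothetical per-word measurement: duration in ms, mean F0 in Hz."""
    text: str
    duration_ms: float
    mean_f0_hz: float


def is_candidate_midword_restart(fragment_ms: float, pause_ms: float) -> bool:
    """Flag a cut-off word fragment followed by a pause as a possible 'backed up' restart."""
    lo, hi = PAUSE_RANGE_MS
    return fragment_ms >= MIN_FRAGMENT_MS and lo <= pause_ms <= hi


def compare_repetition(first: WordToken, repeat: WordToken) -> dict:
    """Compare a repeated word with its first instance.

    Intended only for nonprepausal words: a word right before a pause is
    often lengthened, which would mask any shortening of the repetition.
    """
    ratio = repeat.duration_ms / first.duration_ms
    return {
        "duration_ratio": ratio,                   # 0.5 means shortened by 50%
        "shortening_pct": (1.0 - ratio) * 100.0,   # positive when the repeat is shorter
        "f0_change_hz": repeat.mean_f0_hz - first.mean_f0_hz,  # negative means lower F0
    }
```

For example, `is_candidate_midword_restart(150.0, 250.0)` returns `True`, while a 80 ms fragment or a 500 ms pause is not flagged. Comparing `WordToken("want", 320.0, 140.0)` with a repeat `WordToken("want", 240.0, 120.0)` gives a duration ratio of 0.75 (25% shortening) and an F0 change of -20 Hz, the pattern the abstract reports for repeated content words.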