Nancy A. Daly
Rm. NE43-606, Spoken Language Systems Group, Lab. for Comput. Sci., MIT, Cambridge, MA 02139
Extemporaneously generated speech often contains verbal hesitations, filled pauses, and unfilled pauses, which reflect the speaker's uncertainty in formulating sentences, as in "Where iiiiiis um the nearest bank." This study attempts to describe the acoustic properties of these events using time-aligned orthographic and phonetic transcriptions of a subset of the spontaneous-speech VOYAGER urban navigation corpus, consisting of 3167 utterances from 66 speakers. The subset contains 564 verbal hesitations, 2518 unfilled pauses, and 148 filled pauses, concentrated in 49.6% of the utterances. Of the unfilled pauses, 74.4% occur in isolation; their durations are longer when they co-occur with verbal hesitations and filled pauses, by 46.1% and 363.9%, respectively. Over 70% of the verbal hesitations and filled pauses are followed by unfilled pauses, and they are longer than their isolated counterparts. The results thus suggest that there may be a mutually reinforcing effect among these acoustic events. An attempt has been made to identify verbal hesitations and filled pauses from relative duration, proximity to silence, and relative mean F0 using regression tree analyses, achieving a classification accuracy of approximately 70% on unseen data.