Nanette Veilleux, Ahwat Schlosser, Mari Ostendorf
ECS Dept., Boston Univ., 44 Cummington St., Boston, MA 02215
One of the difficulties in modeling the prosody of spontaneous speech is distinguishing hesitations from fluent intonational phrase breaks. The goal of this study is to identify acoustic and syntactic cues that can be used to better model hesitation phenomena. The study is based on a set of several hundred utterances of spontaneous speech from the ATIS corpus, each with prosodic constituents and hesitations labeled by hand and syntactic constituents and phonetic alignments labeled automatically. The first set of analyses compared acoustic correlates for prosodic phrase boundaries with and without hesitations. For constituents below the level of the intermediate phrase, the average duration of a pause is longer when there is a hesitation (mean 270 ms vs 19 ms), as is the mean normalized segment duration in the word-final syllable rhyme (mean 1.8 vs 0.2), both with significance p < 10^-5. The second set of analyses looked at the differences in syntactic structure at word boundaries with and without hesitations. Again, for constituents below the level of the intermediate phrase, results show that hesitations are likely to occur after a conjunction, auxiliary verb, or preposition, i.e., at locations where the entropy of next-word candidates is relatively high.
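To make the notion of next-word entropy concrete, the following minimal sketch estimates, for each word, the entropy (in bits) of the empirical distribution of the words that follow it. It is an illustration only: the abstract does not say how entropy was estimated in the study, the function name `next_word_entropy` is hypothetical, and the toy sentences are invented for the example and are not drawn from the ATIS corpus.

```python
import math
from collections import Counter, defaultdict


def next_word_entropy(sentences):
    """Return {word: entropy in bits of its empirical next-word distribution}.

    Hypothetical helper for illustration; uses raw bigram counts
    with no smoothing.
    """
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Count each (current word, following word) pair.
        for current, following in zip(tokens, tokens[1:]):
            followers[current][following] += 1

    entropy = {}
    for word, counts in followers.items():
        total = sum(counts.values())
        entropy[word] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Invented toy sentences (assumption: not ATIS data).
toy_corpus = [
    "show flights to boston",
    "show flights to denver",
    "show flights to dallas",
    "show flights to atlanta",
]
entropy_by_word = next_word_entropy(toy_corpus)
```

In this toy example the preposition "to" is followed by four equally frequent words, so its next-word entropy is 2 bits, while "show" and "flights" each have a single possible continuation and therefore zero entropy. This mirrors, on a very small scale, the observation that function words such as prepositions sit at points where the upcoming word is less predictable.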