5pSC7. Stochastic models of prosody for automatic spoken language systems.

Session: Friday Afternoon, December 6

Time: 3:35

Author: Nanette M. Veilleux
Location: Comput. Sci. Dept., Metropolitan College, Boston Univ., 755 Commonwealth Ave., Boston, MA 02215


The prosody of an utterance, especially the placement and type of prominences and phrase breaks, is important in human understanding of speech. Prosody provides listeners with cues to factors such as syntactic structure, discourse information, and semantic content. Although prosody provides helpful and sometimes necessary cues for human listeners, it has not been used to a significant degree in automatic speech recognition or understanding systems. Some of the difficulties stem from an incomplete understanding of the relationship between prosody and higher-level linguistic structures, e.g., discourse. Other barriers simply result from the intrinsically variable nature of prosodic production. In both cases, it can be useful to model statistically the relationship between prosody and the acoustic signal on the one hand, and between prosody and higher-level syntactic and semantic structures on the other. This work describes such a model, which expresses the mapping between the acoustic signal and syntactic/semantic structures, using prosody as an intermediate representation. Although the current model mainly uses syntax and word class information, extensions are described for handling some discourse phenomena. Results obtained with this approach on the ATIS and other speech understanding tasks will be analyzed and discussed. [Work supported by ARPA and NSF.]
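
The idea of using prosody as an intermediate representation can be illustrated with a minimal sketch. It is not the model described in this abstract. It assumes, purely for illustration, that each word boundary carries one of three hypothetical break labels, that boundaries are independent, and that two probability tables are available: P(acoustics | break) and P(break | candidate parse). The score of a parse is then the acoustic likelihood summed over the hidden break labels. All names and numbers below are invented.

```python
# Hypothetical break labels; a real system would use a richer prosodic inventory.
BREAKS = ("none", "minor", "major")


def utterance_likelihood(acoustic, syntactic):
    """Score one candidate parse against the acoustics.

    acoustic[i][b]  : assumed P(acoustic features at boundary i | break b)
    syntactic[i][b] : assumed P(break b at boundary i | candidate parse)

    Boundaries are treated as independent (a simplifying assumption), so the
    sum over all break sequences factors into a product of per-boundary sums.
    """
    total = 1.0
    for a_i, s_i in zip(acoustic, syntactic):
        total *= sum(a_i[b] * s_i[b] for b in BREAKS)
    return total


def pick_parse(acoustic, candidates):
    """Return the name of the candidate parse with the highest likelihood."""
    return max(candidates, key=lambda name: utterance_likelihood(acoustic, candidates[name]))


# Invented example: one boundary whose acoustics suggest a major break.
acoustic = [{"none": 0.1, "minor": 0.2, "major": 0.7}]
candidates = {
    "no_boundary_parse": [{"none": 0.8, "minor": 0.1, "major": 0.1}],
    "boundary_parse": [{"none": 0.1, "minor": 0.1, "major": 0.8}],
}
best = pick_parse(acoustic, candidates)
```

In this toy setting the parse that predicts a major break at the boundary scores higher, which is how prosodic evidence could help choose between competing syntactic hypotheses.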

ASA 132nd meeting - Hawaii, December 1996