5aSC38. Characterizing speech timing and speaking rate through subword parse trees.

Session: Friday Morning, December 6


Author: Grace Y. Chung
Location: Spoken Lang. Systems, Lab. Comput. Sci., Cambridge, MA 02139
Author: Stephanie Seneff
Location: Spoken Lang. Systems, Lab. Comput. Sci., Cambridge, MA 02139


A sublexical parse tree is used to model and study characteristics of speech timing in English. This representation is based on the ANGIE system [Seneff et al., Proc. ICSLP (Philadelphia, PA, 1996)], a hierarchical framework that captures morphological, syllabic, and phonological phenomena probabilistically. The duration of a unit in the tree is measured as a percentage of the total duration of its corresponding parent unit. This framework can be used both to conduct statistical studies that characterize temporal phenomena and to create duration models that aid speech recognition. A strategy has been developed in which unit durations in upper layers are successively normalized by their respective realizations in the layers below. This reduces the variance at each unit by enabling the sharing of statistical distributions to overcome sparse data problems. The normalized duration of a word node at the top of the tree can be taken as a parameter for speaking rate. The data were used to examine within-speaker rate variability and to study the second-order effects of speaking rate on relative durations of sublexical units. The experiments are being conducted in the ATIS domain [Glass et al., Proc. ARPA Spoken Language Technology Workshop, TX (1995), pp. 252–256].
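
The two measurements described above, relative duration within a parent and a normalized word-level duration used as a speaking-rate parameter, can be illustrated with a minimal sketch. It is not the ANGIE implementation. The tree layout, the labels, the durations, and the table of mean phone durations are all hypothetical. The normalization shown (observed duration divided by the duration expected from the units realized below) is one simple assumed reading of the strategy, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One unit in a hypothetical sublexical tree (word, syllable, phone)."""
    label: str
    duration_ms: Optional[float] = None          # observed; set on leaves only
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        """Observed duration: the leaf's own, or the sum over its children."""
        if not self.children:
            return float(self.duration_ms)
        return sum(child.total() for child in self.children)


def relative_durations(node: Node) -> List[Tuple[str, str, float]]:
    """List (parent, child, percent of parent duration) for the whole tree."""
    rows = []
    parent_total = node.total()
    for child in node.children:
        rows.append((node.label, child.label, 100.0 * child.total() / parent_total))
        rows.extend(relative_durations(child))
    return rows


def expected_duration(node: Node, leaf_means: Dict[str, float]) -> float:
    """Duration expected from the units actually realized below this node."""
    if not node.children:
        return leaf_means[node.label]
    return sum(expected_duration(child, leaf_means) for child in node.children)


def normalized_duration(node: Node, leaf_means: Dict[str, float]) -> float:
    """Assumed normalization: observed duration over expected duration."""
    return node.total() / expected_duration(node, leaf_means)


# Hypothetical two-syllable word with made-up phone durations (ms).
word = Node("WORD", children=[
    Node("SYL1", children=[Node("s", 90.0), Node("eh", 110.0)]),
    Node("SYL2", children=[Node("v", 60.0), Node("ax", 50.0), Node("n", 90.0)]),
])

# Hypothetical corpus-wide mean phone durations (ms), not taken from ATIS.
leaf_means = {"s": 120.0, "eh": 130.0, "v": 70.0, "ax": 60.0, "n": 120.0}

shares = relative_durations(word)
rate = normalized_duration(word, leaf_means)

for parent, child, percent in shares:
    print(f"{child} is {percent:.1f}% of {parent}")
print(f"normalized word duration: {rate:.2f}")
```

Under this assumed normalization, a word-level value below 1 means the word was spoken faster than the hypothetical means predict for its phone sequence, and a value above 1 means slower. Because the expectation is built from the units actually realized, words with different phonetic content can be compared on one scale.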

