ASA 127th Meeting M.I.T. 1994 June 6-10

4aSP5. Text-to-speech synthesis and statistical analysis of natural speech corpora.

Jan P. H. van Santen

AT&T Bell Labs., 600 Mountain Ave., P. O. Box 636, Murray Hill, NJ 07974-0636

A text-to-speech (TTS) system consists of modules that convert orthographic input into speech via various intermediate representations. The principles underlying these modules are quite diverse. For some languages, the orthography-to-phoneme module can draw on pronunciation rules from the research literature. For most modules, however, the literature lacks the information needed to implement them. Increasingly, these knowledge gaps are filled not by conducting new research but by applying statistical tools to speech corpora and incorporating the estimated parameter values directly into the TTS system. This presentation discusses a challenge to such statistical approaches: the input domain of TTS systems is open. Specifically, the number of very rare contextual constellations in this domain is so large that, although each one is individually improbable, the probability that at least one of them occurs even in a small text sample is quite high. Consequently, a corpus rarely covers all the constellations that the TTS system must nevertheless be prepared to handle. It is therefore important for statistical approaches to be based on corpus-independent regularities. Such regularities can be discovered with exploratory data analysis (a type of statistical analysis performed prior to parameter estimation), combined with an understanding of what could account for them. These points will be illustrated with examples from speech timing and corpus-based concatenation.
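The open-domain argument rests on simple probability: many types that are each rare can together carry enough probability mass that some of them almost surely appear. The Python sketch below illustrates this under purely hypothetical numbers (100,000 rare constellation types, each with probability 1e-7 per draw, a 500-draw text sample, and a 1,000,000-draw training corpus). The function names are illustrative, draws are assumed independent, and neither the figures nor the model are taken from the author's analysis or from any real corpus.

```python
# Hypothetical illustration of the "many rare events" argument; the numbers
# below are assumptions chosen for the example, not corpus measurements.


def p_any_rare(n_rare: int, p_each: float, n_draws: int) -> float:
    """P(at least one rare type occurs in n_draws independent draws).

    The rare types are treated as disjoint outcomes, so their combined
    probability per draw is n_rare * p_each.
    """
    rare_mass = n_rare * p_each
    return 1.0 - (1.0 - rare_mass) ** n_draws


def uncovered_rare_mass(n_rare: int, p_each: float, corpus_size: int) -> float:
    """Expected probability mass of rare types never seen in a training corpus.

    Each rare type is missed by all corpus_size draws with probability
    (1 - p_each) ** corpus_size.
    """
    return n_rare * p_each * (1.0 - p_each) ** corpus_size


def p_unseen_in_test(n_rare: int, p_each: float, corpus_size: int, n_draws: int) -> float:
    """Approximate P(a test sample hits at least one rare type absent from the corpus).

    Approximation: the random uncovered mass is replaced by its expectation.
    """
    missed = uncovered_rare_mass(n_rare, p_each, corpus_size)
    return 1.0 - (1.0 - missed) ** n_draws


if __name__ == "__main__":
    N_RARE, P_EACH = 100_000, 1e-7   # hypothetical rare-type inventory
    SAMPLE, CORPUS = 500, 1_000_000  # hypothetical test-sample and corpus sizes
    print(f"one given rare type in sample: {p_any_rare(1, P_EACH, SAMPLE):.1e}")  # 5.0e-05
    print(f"some rare type in sample:      {p_any_rare(N_RARE, P_EACH, SAMPLE):.3f}")  # 0.993
    print(f"rare mass unseen in corpus:    {uncovered_rare_mass(N_RARE, P_EACH, CORPUS):.4f}")  # 0.0090
    print(f"sample hits an unseen type:    {p_unseen_in_test(N_RARE, P_EACH, CORPUS, SAMPLE):.3f}")  # 0.989
```

Under these assumptions, any single rare constellation appears in the 500-draw sample with probability of only about 5e-5, yet at least one rare constellation appears with probability above 0.99. Even after a million-draw training corpus, roughly 90% of the rare mass remains unseen, so a new sample still very likely contains a constellation the corpus never covered.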

All posters will be on display from 11:00 am to 12:00 noon.