Dept. of Comput. and Inform. Sci., Univ. of Pennsylvania, Philadelphia, PA 19104
Raw text contains many ambiguities that must be resolved in speech synthesis. Numbers and abbreviations often have different pronunciations in different contexts, as in the phrase ``1240 people at 1240 St. Albans St.'' The token ``IV'' is pronounced differently in ``Henry IV,'' ``Section IV,'' and ``IV drug.'' Proper names, acronyms, and words with the same part of speech may also have context-sensitive pronunciations. Previous speech synthesizers have used heuristics or simple defaults to handle these cases. In recent work [Sproat et al., in International Conference on Spoken Language Processing (1992)], statistical decision procedures have been applied to this text normalization process. The talk will describe further developments in this work, using Bayesian discriminators and decision lists based on nearby words, word classes, and type of text as evidence for the selection between pronunciation variants. Examples will include implementations of these algorithms in the AT&T Bell Laboratories TTS synthesizer.
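As an illustration of the decision-list idea mentioned above, the following is a minimal sketch in Python, not the implementation used in the AT&T Bell Laboratories synthesizer. It assumes a tiny invented training set for the token ``IV'', two kinds of evidence (the neighboring words and a hypothetical word class of given names), and a smoothed log-likelihood ratio as the ranking score. The names TRAIN, NAME_CLASS, features, train, and classify, and the smoothing constant alpha, are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: (previous word, next word, pronunciation of "IV").
TRAIN = [
    ("Henry", "was", "the fourth"),
    ("George", "ruled", "the fourth"),
    ("Henry", "of", "the fourth"),
    ("Section", "describes", "four"),
    ("Section", "of", "four"),
    ("Chapter", "covers", "four"),
    ("an", "drug", "I V"),
    ("the", "drip", "I V"),
    ("an", "line", "I V"),
]

# Hypothetical word class: given names that may precede a regnal numeral.
NAME_CLASS = {"henry", "george", "louis"}


def features(prev, nxt):
    """Evidence for one occurrence: nearby words plus a word-class feature."""
    feats = {f"prev={prev.lower()}", f"next={nxt.lower()}"}
    if prev.lower() in NAME_CLASS:
        feats.add("prev_class=name")
    return feats


def train(examples, alpha=0.1):
    """Build a decision list: rules sorted by smoothed log-likelihood ratio."""
    counts = defaultdict(Counter)  # feature -> label -> count
    label_totals = Counter()
    for prev, nxt, label in examples:
        label_totals[label] += 1
        for feat in features(prev, nxt):
            counts[feat][label] += 1

    rules = []
    for feat, by_label in counts.items():
        total = sum(by_label.values())
        for label, c in by_label.items():
            # Ratio of this label's count to all competing labels' counts.
            score = math.log((c + alpha) / (total - c + alpha))
            rules.append((score, feat, label))
    rules.sort(key=lambda rule: -rule[0])  # strongest evidence first

    default = label_totals.most_common(1)[0][0]  # fallback when no rule fires
    return rules, default


def classify(rules, default, prev, nxt):
    """Return the label of the first (strongest) rule whose evidence is present."""
    feats = features(prev, nxt)
    for _score, feat, label in rules:
        if feat in feats:
            return label
    return default


if __name__ == "__main__":
    rules, default = train(TRAIN)
    for prev, nxt in [("Henry", "said"), ("Section", "of"), ("Louis", "arrived")]:
        print(prev, "IV", nxt, "->", classify(rules, default, prev, nxt))
```

In this sketch only the single strongest matching piece of evidence decides the pronunciation, which is what distinguishes a decision list from a Bayesian discriminator that combines all available evidence. The word-class feature lets the list handle a name such as ``Louis'' that never occurs in the toy training data.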