ASA 124th Meeting New Orleans 1992 October

5pSP12. Development of a female voice for a concatenative synthesis text-to-speech system.

Ann K. Syrdal

AT&T Bell Labs., Rm. 3E529, 101 Crawfords Corners Rd., Holmdel, NJ 07733

In response to demand from telecommunications customers, a female voice was developed for a concatenative synthesis text-to-speech system. The voice was initially developed for a 1000-element diphone concatenation system [J. P. Olive and M. Y. Liberman, J. Acoust. Soc. Am. Suppl. 1 78, S6 (1985)] and then extended to a 2500-element acoustic inventory with units of variable size [J. P. Olive, Workshop on Speech Synthesis, Autrans, France, ESCA, 25-30 (1990)]. Both systems were originally developed with male voices. In these synthesis-by-rule systems, the segments used for synthesis are obtained from natural speech and concatenated to synthesize any English utterance. Two difficulties arise for a female voice: (1) analysis and synthesis algorithms perform better on male than on female speech, and (2) telephone bandwidth constraints filter out more phonetically relevant high-frequency acoustic information for female speakers than for male speakers. Intelligibility testing was used to identify problematic acoustic elements, which were then replaced, and the introduction of new analysis and synthesis techniques improved intelligibility and quality.
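
The concatenation step described in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names `phones_to_diphones` and `concatenate`, the `#` silence symbol, the `a-b` unit naming, and the toy inventory are all hypothetical. The sketch also leaves out the smoothing at unit joins and the prosody control that a practical system would need.

```python
from typing import Dict, List, Sequence


def phones_to_diphones(phones: Sequence[str], silence: str = "#") -> List[str]:
    """Pad the phone sequence with silence and name each adjacent pair.

    The "a-b" naming scheme and the "#" silence symbol are illustrative
    assumptions, not the conventions of the system in the abstract.
    """
    padded = [silence, *phones, silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate(units: Sequence[str],
                inventory: Dict[str, List[float]]) -> List[float]:
    """Join stored waveform fragments end to end.

    A missing unit raises KeyError. No smoothing is applied at the joins.
    """
    samples: List[float] = []
    for name in units:
        if name not in inventory:
            raise KeyError(f"unit {name!r} missing from inventory")
        samples.extend(inventory[name])
    return samples


# Toy inventory: each hypothetical unit maps to a few made-up sample values.
toy_inventory: Dict[str, List[float]] = {
    "#-k": [0.0, 0.1],
    "k-ae": [0.2, 0.3, 0.4],
    "ae-t": [0.5],
    "t-#": [0.0],
}

units = phones_to_diphones(["k", "ae", "t"])
waveform = concatenate(units, toy_inventory)
```

Keeping the inventory as a plain lookup table also suggests how problematic elements found in intelligibility testing could be replaced one at a time: only the entry stored under the affected unit name would change.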