Ann K. Syrdal
AT&T Bell Labs., Rm. 3E529, 101 Crawfords Corners Rd., Holmdel, NJ 07733
In response to telecommunication customer demand, a female voice was developed for a concatenative synthesis text-to-speech system. The voice was initially developed for a 1000-element diphone concatenation system [J. P. Olive and M. Y. Liberman, J. Acoust. Soc. Am. Suppl. 1 78, S6 (1985)] and then extended to a 2500-element acoustic inventory with units of variable size [J. P. Olive, Workshop on Speech Synthesis, Autrans, France, ESCA, 25-30 (1990)]; both systems were originally developed with male voices. In these synthesis-by-rule systems, the segments used for synthesis are obtained from natural speech and concatenated to synthesize any English utterance. Two difficulties arise for a female voice: (1) analysis and synthesis algorithms perform better on male than on female speech, and (2) telephone bandwidth constraints filter out more phonetically relevant high-frequency acoustic information for female speakers than for male speakers. Intelligibility testing was used to identify and replace problematic acoustic elements, and the introduction of new analysis and synthesis techniques improved both intelligibility and quality.
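The concatenation step described above can be illustrated with a minimal sketch. It is not the implementation used in the systems cited; it only shows the general idea of mapping a phoneme sequence to diphone unit names and joining stored segments. The boundary symbol `#`, the phoneme labels, the function names, and the toy inventory are all hypothetical, and the sketch omits the join smoothing and prosody modification that a real system would need.

```python
from typing import Dict, List

SILENCE = "#"  # hypothetical boundary symbol, not taken from the abstract


def phonemes_to_diphones(phonemes: List[str]) -> List[str]:
    """Return the diphone unit names that cover a phoneme sequence.

    Each diphone spans the transition from one phoneme to the next,
    with silence added at both ends of the utterance.
    """
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def concatenate(units: List[str], inventory: Dict[str, List[float]]) -> List[float]:
    """Join the stored samples for each unit, in order.

    Raises KeyError if the inventory lacks a required unit.
    """
    missing = [u for u in units if u not in inventory]
    if missing:
        raise KeyError(f"units not in inventory: {missing}")
    samples: List[float] = []
    for unit in units:
        samples.extend(inventory[unit])
    return samples


# Toy inventory: placeholder sample values, not real speech data.
toy_inventory = {
    "#-k": [0.0, 0.1],
    "k-ae": [0.2, 0.3],
    "ae-t": [0.4],
    "t-#": [0.5, 0.0],
}

units = phonemes_to_diphones(["k", "ae", "t"])
waveform = concatenate(units, toy_inventory)
```

An inventory with variable-size units, as in the extended system, would require a unit-selection step in place of the fixed pairwise split shown here.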