4aSC29. Comparing techniques for synthesizing emotional speech.

Session: Thursday Morning, December 5


Author: Caroline Henton
Location: Digital Equipment Corp., 2 Results Way, MRO1 Marlborough, MA 01752


Comparative attempts to incorporate vocal emotions into synthetic speech using two different speech synthesizers are presented. On a Macintosh computer running the concatenative speech synthesizer MacinTalkPro2, emotions can be generated in synthetic speech using a very limited number of parameters [C. Henton, J. Acoust. Soc. Am. 95, 3010(A) (1994); C. Henton and B. Edelman, Multimedia Tools and Applications 3, 1–25 (1996)]. These parameters are average speaking pitch, pitch range, pitch movement, speech rate, volume, duration, and silence. Traditional formant synthesizers, such as Digital Equipment Corporation's DECtalk, allow emotional affect to be added by manipulating many more acoustic parameters and their interactions. The results of using the same limited set of parameters to create the same emotions in both systems will be analyzed, and output from the two synthesizers reading the same emotionally colored passage will be demonstrated. Suggested improvements and recommendations for incorporating emotional synthetic speech into applications will be offered, taking into account the sophistication of the user, the speed and memory limitations of the synthesizer and/or CPU, and the intended use of the application. The need for separate, gender-specific emotional profiles for adults and for children will also be explored.

ASA 132nd meeting - Hawaii, December 1996