4aSC34. A method for registration of a new voice in a text-to-speech synthesizer.

Session: Thursday Morning, December 5


Author: Takashi Saito
Location: IBM Res., Tokyo Res. Lab., IBM Japan Ltd., Japan
Author: Masaharu Sakamoto
Location: IBM Res., Tokyo Res. Lab., IBM Japan Ltd., Japan


In conventional text-to-speech systems, synthesis unit inventories are prepared in advance through a laborious process of speech data collection, analysis, and manual segmentation, and users of such systems cannot add unit inventories for new voices. This paper describes a method for registering a new speaker's voice as a synthesis unit inventory in a text-to-speech system. It gives users a voice-registration function analogous to the training function commonly provided in speech recognition systems. The approach is as follows: (1) create an initial unit inventory by manual segmentation and use it as a reference unit inventory; (2) have each new speaker utter the registration sentences following guidance speech synthesized in the reference speaker's voice, with its pitch range adjusted to that of the new speaker; (3) segment the new speech database automatically by phonemic alignment, constrained by the segment information of the reference unit inventory. Experimental results are presented for a waveform-concatenation-based TTS system recently developed by the authors [Saito et al., Proc. ICASSP'96, 381--384 (1996)].
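Steps (2) and (3) can be illustrated with a minimal sketch. The abstract does not say how the pitch range is adjusted or how the reference segment information constrains the alignment, so both techniques below are assumptions, not the authors' method: pitch adjustment is shown as a linear mapping of log-F0 mean and standard deviation, and the constrained alignment as a left-to-right Viterbi search in which each phoneme may start only within a tolerance window around the boundary predicted from the reference inventory. The function names `adjust_pitch_range` and `constrained_align`, the per-frame score matrix `loglik`, and the `window` parameter are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def adjust_pitch_range(ref_f0, ref_mean_log, ref_std_log, new_mean_log, new_std_log):
    """Map reference-speaker F0 values (Hz) into a new speaker's pitch range.

    Assumption: linear mapping of log-F0 statistics (mean and std).
    Unvoiced frames (F0 <= 0) are kept as 0.
    """
    out = []
    for f in ref_f0:
        if f <= 0:
            out.append(0.0)
            continue
        z = (math.log(f) - ref_mean_log) / ref_std_log  # normalized log-F0
        out.append(math.exp(new_mean_log + z * new_std_log))
    return out


def constrained_align(loglik, expected_starts, window):
    """Left-to-right Viterbi alignment with boundary-window constraints.

    loglik[t][j]: hypothetical log-likelihood of frame t under the j-th
    phoneme of the utterance transcription.
    expected_starts[j]: start frame of phoneme j predicted from the reference
    inventory's segment information (assumed already scaled to this
    utterance). Phoneme j > 0 may start only within +/- window frames of it;
    phoneme 0 always starts at frame 0.
    Returns the aligned start frame of each phoneme.
    """
    T, N = len(loglik), len(loglik[0])
    score = [[NEG_INF] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # 0 = stayed in j, 1 = entered j from j-1
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            enter = NEG_INF
            # Entering phoneme j at frame t is allowed only near its predicted start.
            if j > 0 and abs(t - expected_starts[j]) <= window:
                enter = score[t - 1][j - 1]
            best, move = (stay, 0) if stay >= enter else (enter, 1)
            if best > NEG_INF:
                score[t][j] = best + loglik[t][j]
                back[t][j] = move
    if score[T - 1][N - 1] == NEG_INF:
        raise ValueError("no alignment satisfies the boundary constraints")
    # Trace back from the last frame to recover where each phoneme started.
    starts = [0] * N
    j = N - 1
    for t in range(T - 1, 0, -1):
        if back[t][j] == 1:
            starts[j] = t
            j -= 1
    return starts
```

For example, with six frames whose scores favor phonemes 0, 1, and 2 in pairs, `constrained_align(loglik, [0, 2, 4], 1)` recovers the boundaries at frames 2 and 4, while tightening the window to 0 around `[0, 3, 4]` forces phoneme 1 to start at frame 3. Restricting boundaries to windows is one simple way to let a manually segmented reference inventory guide automatic segmentation; the actual constraint used in the paper may differ.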

