5pSC12. On the model of global F[inf 0] shape for Japanese text-to-speech systems.

Session: Friday Afternoon, December 6

Time: 4:50


Author: Yasushi Ishikawa
Location: Information Technol. R&D Ctr., Mitsubishi Electric Corp., 5-1-1 Ofuna, Kamakura, 247 Japan
Author: Takashi Ebihara
Location: Information Technol. R&D Ctr., Mitsubishi Electric Corp., 5-1-1 Ofuna, Kamakura, 247 Japan
Author: Kunio Nakajima
Location: Information Technol. R&D Ctr., Mitsubishi Electric Corp., 5-1-1 Ofuna, Kamakura, 247 Japan

Abstract:

Modeling F[inf 0] control is one of the most important problems for the naturalness of synthesized speech in Japanese text-to-speech (TTS) systems. In general, Japanese F[inf 0] control uses a two-stage model consisting of a global model and a local model. The local model generates the F[inf 0] contour for each accent phrase; the global model generates the parameters of the local model from the linguistic information of the accent phrase. A typical parameter for the global model is one derived from the tree structure obtained by syntactic analysis. However, in such a global model it is difficult to express the syntactic context of phrases, and syntactic analysis is itself a difficult problem. A global model is proposed that integrates F[inf 0] shape generation with syntactic analysis. The model is represented as a network of states, each of which represents a syntactic and prosodic state of the sentence. In the model, the linguistic class of the input accent phrase determines the state to move to, and a phrasal accent parameter for the local model is generated when the transition is taken. A training method for this network is also proposed. Prediction results showed that the model predicts the phrasal accent parameters with satisfactorily high accuracy. This strongly suggests that high-quality synthesized speech can be obtained with the model.
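
The state network described in the abstract can be pictured as a finite-state transducer: each input accent phrase's linguistic class selects a transition, and the transition emits a phrasal accent parameter for the local model. The following Python sketch illustrates only that mechanism. The state names, class labels, and numeric values in `TRANSITIONS`, and the function `generate_phrasal_accents`, are hypothetical and are not taken from the authors' system; in the proposed model the network would be obtained by training, not written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical transition table (all entries invented for illustration):
# (current state, linguistic class of accent phrase)
#     -> (next state, phrasal accent parameter emitted on the transition)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("start", "noun_phrase"): ("mid", 1.0),
    ("start", "adverbial"): ("mid", 0.8),
    ("mid", "noun_phrase"): ("mid", 0.6),
    ("mid", "adverbial"): ("mid", 0.5),
    ("mid", "predicate"): ("end", 0.4),
}


def generate_phrasal_accents(
    phrase_classes: List[str], start_state: str = "start"
) -> List[float]:
    """Walk the network once per accent phrase and collect emitted parameters."""
    state = start_state
    params: List[float] = []
    for phrase_class in phrase_classes:
        key = (state, phrase_class)
        if key not in TRANSITIONS:
            raise KeyError(f"no transition from {state!r} on class {phrase_class!r}")
        # The linguistic class decides the next state; the transition
        # itself produces the parameter passed to the local model.
        state, accent = TRANSITIONS[key]
        params.append(accent)
    return params


accents = generate_phrasal_accents(["noun_phrase", "adverbial", "predicate"])
print(accents)
```

Because the emitted value depends on the current state as well as on the phrase class, the same class can yield different parameters at different positions in a sentence, which is one way to read the abstract's point about expressing the syntactic context of phrases.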


ASA 132nd meeting - Hawaii, December 1996