ATR Interpreting Telecommun. Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-02 Japan
With the growing availability of large speech corpora, statistical models of prosody control have been studied intensively. These computational models have improved the naturalness of synthetic speech and are expected to provide additional suprasegmental information for speech recognition. Although these models have relied only on conventional statistical methods such as linear regression, regression trees, and neural networks, their success is owed to efforts to incorporate observed prosodic characteristics and qualitatively known control mechanisms. This talk reviews how observed prosodic characteristics have been modeled with these statistical tools, and how new statistical models can be designed to overcome the shortcomings of conventional ones. Investigating well-constrained models and their constraints is expected to lead to more efficient computational models and, through the modeling process itself, to a deeper understanding of prosody control mechanisms.