ATR Interpreting Telecommun. Res. Labs., 2-2, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, 619-02 Japan
The authors have analyzed the fundamental frequency (F0) contours of Japanese sentences spoken in four styles (normal, hurried, angry, and kind) with a view to synthesizing natural-sounding speech. Thirty-five sentences in each speaking style, spoken by a professional narrator, were analyzed. Parameters of the F0 generation model proposed by Fujisaki, such as the minimum value of F0 (F_min), the amplitude of the phrase commands (A_p), and the amplitude of the accent commands (A_a), are used here as key factors in the analysis. In the sentences spoken angrily, F_min is kept high, and the change due to both the phrase component and the accent component is minimal. Consequently, the F0 contours of sentences spoken angrily are flat. In the sentences spoken softly (the kind style), by contrast, the dynamic range due to the accent component is greater than in the other styles, and in order to keep it high, the amplitude of the phrase component is correspondingly suppressed. The F0 contours of the sentences spoken hurriedly are similar to those of the sentences spoken normally, except that the amplitude of the accent commands is slightly smaller. These parameters were found to be useful for expressing the differences among the speaking styles.
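
To make the roles of the three parameters concrete, the following is a minimal sketch of the Fujisaki model in its commonly stated form, in which log F0 is the sum of a baseline term, phrase components, and accent components. The abstract does not give the equations or any numerical values, so everything here is an assumption for illustration: the time constants ALPHA, BETA, and GAMMA are typical textbook values, the function names are hypothetical, and the two parameter sets are invented to mimic the qualitative description of the normal and angry styles, not taken from the study.

```python
import math

# Assumed, commonly quoted constants; not reported in the abstract.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent response


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, f_min, phrase_cmds, accent_cmds):
    """Return F0 (Hz) at each time.

    phrase_cmds: list of (onset_time, A_p)
    accent_cmds: list of (onset_time, offset_time, A_a)
    """
    contour = []
    for t in times:
        log_f0 = math.log(f_min)
        log_f0 += sum(a_p * phrase_response(t - t0) for t0, a_p in phrase_cmds)
        log_f0 += sum(
            a_a * (accent_response(t - t1) - accent_response(t - t2))
            for t1, t2, a_a in accent_cmds
        )
        contour.append(math.exp(log_f0))
    return contour


def dynamic_range(contour):
    """Ratio of the highest to the lowest F0 value in the contour."""
    return max(contour) / min(contour)


times = [i * 0.01 for i in range(200)]  # 0 to 2 s in 10 ms steps

# Hypothetical parameter sets, chosen only to mirror the qualitative findings.
normal_like = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
angry_like = f0_contour(times, 180.0, [(0.0, 0.1)], [(0.3, 0.8, 0.1)])
```

With these invented values, the angry-like contour starts from a higher minimum and has a smaller dynamic range than the normal-like one, which is the sense in which a high minimum F0 combined with small phrase and accent amplitudes yields a flat contour.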