5pSC2. Automatic detection of accentual phrase boundaries using prosodic features and phoneme boundaries.

Session: Friday Afternoon, December 6

Time: 2:20

Author: Toshio Hirai
Location: ATR Interpret. Telecom. Res. Labs., Kyoto, 61902 Japan
Author: Mari Ostendorf
Location: Boston Univ., Boston, MA 02215
Author: Norio Higuchi
Location: ATR ITL Kyoto, 61902 Japan
Author: Yoshinori Sagisaka
Location: ATR ITL Kyoto, 61902 Japan


There are two approaches to constructing a fundamental frequency (F0) control method for speech synthesis: statistical and rule-based. The statistical approach has the advantage of automatic training, but it requires a large corpus of speech annotated with prosodic boundaries. Recently, a method was proposed for high-accuracy detection of these boundaries [Ostendorf and Ross (1996)], given a set of prosodic boundary candidates that includes almost all of the correct boundaries. This paper proposes a detection method to generate these boundary candidates, specifically for accentual phrases, which are among the smallest prosodic units. The detection algorithm uses local maxima and minima of the F0 contour and low-energy regions of the speech waveform to find candidate regions that correspond to accentual phrases and pauses in speech. The candidate phrase boundaries are then aligned to the nearest phoneme boundaries, which are detected automatically by forced alignment with a speaker-independent speech recognition system, given a phoneme transcription. The method was applied to 250 read Japanese sentences. High detection accuracy (97%) was obtained, and almost all of the missed boundaries had valid candidates within ±3 phonemes. The number of insertion errors was less than twice the number of correct boundaries.
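The candidate-generation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how extrema are selected, how low-energy regions are thresholded, or what frame rate is used. The sketch assumes a continuous (already interpolated) F0 track sampled at a fixed frame shift, treats every strict local extremum and both edges of every low-energy region as a candidate point, and snaps each point to the nearest phoneme boundary. All function names, the 10 ms frame shift, and the energy threshold are hypothetical.

```python
from bisect import bisect_left


def local_extrema(f0):
    """Return frame indices of strict local maxima and minima of an F0 track.

    Assumes f0 is a list of numbers with no unvoiced gaps (hypothetical input).
    """
    indices = []
    for i in range(1, len(f0) - 1):
        left, mid, right = f0[i - 1], f0[i], f0[i + 1]
        is_max = mid > left and mid > right
        is_min = mid < left and mid < right
        if is_max or is_min:
            indices.append(i)
    return indices


def low_energy_regions(energy, threshold):
    """Return (start, end) frame pairs, end exclusive, where energy < threshold."""
    regions = []
    start = None
    for i, value in enumerate(energy):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(energy)))
    return regions


def snap_to_nearest(t, boundaries):
    """Return the entry of the sorted, non-empty list `boundaries` closest to t.

    Ties go to the earlier boundary.
    """
    pos = bisect_left(boundaries, t)
    if pos == 0:
        return boundaries[0]
    if pos == len(boundaries):
        return boundaries[-1]
    before, after = boundaries[pos - 1], boundaries[pos]
    return before if t - before <= after - t else after


def candidate_boundaries(f0, energy, phoneme_boundaries_ms,
                         frame_shift_ms=10, energy_threshold=0.1):
    """Collect candidate times (ms) and align them to phoneme boundaries.

    phoneme_boundaries_ms stands in for the output of a forced aligner;
    the parameter defaults are illustrative assumptions, not values from
    the abstract.
    """
    times = [i * frame_shift_ms for i in local_extrema(f0)]
    for start, end in low_energy_regions(energy, energy_threshold):
        times.append(start * frame_shift_ms)
        times.append(end * frame_shift_ms)
    # A set removes candidates that snap to the same phoneme boundary.
    return sorted({snap_to_nearest(t, phoneme_boundaries_ms) for t in times})
```

For example, with `f0 = [100, 120, 110, 90, 105, 130, 125]`, `energy = [0.5, 0.5, 0.05, 0.05, 0.5, 0.5, 0.5]`, and phoneme boundaries at `[0, 12, 33, 60]` ms, `candidate_boundaries` returns `[12, 33, 60]`. Keeping every extremum deliberately over-generates, consistent with the abstract's goal of a candidate set that contains almost all correct boundaries at the cost of insertions.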

