5pSC8. Posterior use of prosodic features to aid speech recognition.

Session: Friday Afternoon, December 6

Time: 3:50


Author: Keikichi Hirose
Location: Dept. of Information and Commun. Eng., Univ. of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113 Japan
Author: Kouji Iwano
Location: Dept. of Information and Commun. Eng., Univ. of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113 Japan
Author: Atsuhiro Sakurai
Location: TRDC, Texas Instruments, Tsukuba, 305 Japan

Abstract:

A method was proposed for the posterior use of prosodic features to ensure correct recognition and to detect recognition errors. Fundamental frequency contours (F0 contours) are generated for recognition hypotheses using the prosodic rules developed for speech synthesis, and are then compared with the observed contour. Partial analysis-by-synthesis absorbs unexpected variations in the observed contour. The method can detect recognition errors that are accompanied by changes in accent type and/or shifts in syntactic boundaries. Although syntactic boundaries are useful for speech recognition, detecting them through prior use of F0 contours is sometimes difficult, because they are only weakly marked in the contours. The method was therefore evaluated on how well it detects syntactic boundaries using pitch information. Preliminary results reported by K. Hirose and A. Sakurai [Proc. ICASSP-96, 809--812 (1996)] were further validated on the ATR continuous speech conference registration database. This database includes 37 major syntactic boundaries that are not preceded by long pauses but are accompanied by F0 rises reflecting phrase components. The method identifies these boundaries with 92% accuracy within a 2-mora position error, and with 86% accuracy within a 1-mora position error. The discussion will extend to augmenting the method with statistical techniques such as HMMs.
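The abstract does not spell out the prosodic rules or the comparison measure. The comparison step can be illustrated with a minimal sketch. It assumes the Fujisaki generation process model, whose phrase and accent components match the abstract's reference to phrase components, with typical textbook constants. Each hypothesis is reduced to phrase and accent commands and a log-F0 contour is generated from them. Hypotheses are then ranked by mean squared distance to the observed contour over voiced frames. The `Hypothesis` structure, all command timings and amplitudes, the distance measure, and the `boundary_accuracy` helper are hypothetical illustrations rather than the authors' implementation, and the partial analysis-by-synthesis step is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Fujisaki-model constants: typical textbook values, assumed for illustration.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling level of the accent component


def phrase_response(t: float) -> float:
    """Impulse response of the phrase control mechanism, Gp(t)."""
    return ALPHA * ALPHA * t * math.exp(-ALPHA * t) if t >= 0 else 0.0


def accent_response(t: float) -> float:
    """Step response of the accent control mechanism, Ga(t), capped at GAMMA."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA)


@dataclass
class Hypothesis:
    """Hypothetical reduction of one recognition hypothesis to prosodic commands."""
    text: str
    phrase_cmds: List[Tuple[float, float]]         # (onset T0 in s, amplitude Ap)
    accent_cmds: List[Tuple[float, float, float]]  # (onset T1, offset T2 in s, amplitude Aa)


def generate_log_f0(hyp: Hypothesis, times: Sequence[float], log_fb: float) -> List[float]:
    """ln F0(t) = ln Fb + sum of phrase components + sum of accent components."""
    contour = []
    for t in times:
        value = log_fb
        for t0, ap in hyp.phrase_cmds:
            value += ap * phrase_response(t - t0)
        for t1, t2, aa in hyp.accent_cmds:
            value += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(value)
    return contour


def contour_distance(generated: Sequence[float], observed: Sequence[Optional[float]]) -> float:
    """Mean squared log-F0 error over voiced frames (None marks an unvoiced frame)."""
    pairs = [(g, o) for g, o in zip(generated, observed) if o is not None]
    return sum((g - o) ** 2 for g, o in pairs) / len(pairs)


def rank_hypotheses(hyps, times, observed, log_fb):
    """Return (distance, text) pairs, best-matching hypothesis first."""
    return sorted((contour_distance(generate_log_f0(h, times, log_fb), observed), h.text)
                  for h in hyps)


def boundary_accuracy(detected: Sequence[int], reference: Sequence[int], tolerance: int) -> float:
    """Fraction of reference boundaries (mora indices) matched within +/- tolerance morae."""
    hits = sum(1 for d, r in zip(detected, reference) if abs(d - r) <= tolerance)
    return hits / len(reference)


# Usage: the observed contour is synthesized from hypothesis A (boundary at 0.8 s),
# with every tenth frame marked unvoiced.
times = [i * 0.01 for i in range(200)]
log_fb = math.log(100.0)
accents = [(0.1, 0.4, 0.4), (0.9, 1.3, 0.3)]
hyp_a = Hypothesis("A", [(0.0, 0.5), (0.8, 0.3)], accents)
hyp_b = Hypothesis("B", [(0.0, 0.5), (1.2, 0.3)], accents)
observed = [v if i % 10 else None for i, v in enumerate(generate_log_f0(hyp_a, times, log_fb))]
ranking = rank_hypotheses([hyp_a, hyp_b], times, observed, log_fb)
print(ranking[0][1])  # prints A
```

In this toy case hypothesis B places its phrase command, and hence its syntactic boundary, 0.4 s later than the command that produced the observed contour, so it matches worse. In a real system the command timings would come from applying the synthesis rules to each hypothesis. The `boundary_accuracy` helper mirrors the way results are reported above, as the share of reference boundaries found within a 1- or 2-mora position error.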


ASA 132nd meeting - Hawaii, December 1996