Kevin A. Lenzo
ATR Interpreting Telecommun. Res. Labs., Dept. 3, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-02 Japan
Ohio State Univ., Columbus, OH 43210
A speech recognition system is under development in which a general-purpose inference engine for natural language processing is loaded with domain knowledge and a preprocessing periphery that allows for the integration of prosodic information. Processing takes place in well-defined layers: results at one level of abstraction become data to be explained by hypotheses at the next, using a layered-abduction inference mechanism that integrates both top-down and bottom-up information [J. R. Josephson and S. G. Josephson, eds., Abductive Inference: Computation, Philosophy, Technology (Cambridge U. P., 1994)]. The hypothesis types and their interactions are based upon the converter/distributor (C/D) model of phonetic implementation [O. Fujimura, ``Syllable Timing Computation in the C/D Model,'' Proceedings of the Third International Conference on Spoken Language Processing (1994)]. Each hypothesis is annotated with a numerical magnitude that is used in computing a ``prosodic contour'' for the overall utterance, which aids in the generation of expectations, implications, and knowledge of which hypotheses can account for which data. Preprocessing events are explained by features, which, in turn, are explained by hypotheses at the demisyllabic layer, then at the syllabic layer, and, finally, at the word level.
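The layer-by-layer explanation described above can be illustrated with a minimal sketch. It is not the system's actual inference engine: the greedy covering strategy, the names `Hypothesis`, `abduce`, and `run_layers`, and all labels and magnitudes in the toy data are assumptions made for illustration. The sketch works bottom-up only, starts at the feature level, and omits the top-down expectations and the prosodic-contour computation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    label: str            # hypothetical name, e.g. a demisyllable or a word
    explains: frozenset   # labels of lower-layer items this hypothesis accounts for
    magnitude: float      # numerical annotation; used here only to break ties


def abduce(data, candidates):
    """Greedy stand-in for abduction: repeatedly accept the candidate that
    explains the most still-unexplained data (ties go to larger magnitude)."""
    unexplained = set(data)
    chosen = []
    while unexplained:
        best = max(
            candidates,
            key=lambda h: (len(h.explains & unexplained), h.magnitude),
        )
        if not best.explains & unexplained:
            break  # no remaining candidate explains anything new
        chosen.append(best)
        unexplained -= best.explains
    return chosen, unexplained


def run_layers(observations, layers):
    """Accepted hypotheses at one layer become the data for the next layer."""
    data = set(observations)
    trace = []
    for name, candidates in layers:
        chosen, residue = abduce(data, candidates)
        trace.append((name, [h.label for h in chosen], sorted(residue)))
        data = {h.label for h in chosen}
    return trace


# Toy data (entirely hypothetical): four features for one utterance.
features = {"stop_burst", "voicing", "vowel_formants", "nasal_murmur"}

layers = [
    ("demisyllable", [
        Hypothesis("ba-", frozenset({"stop_burst", "voicing", "vowel_formants"}), 0.8),
        Hypothesis("pa-", frozenset({"stop_burst", "vowel_formants"}), 0.6),
        Hypothesis("-an", frozenset({"vowel_formants", "nasal_murmur"}), 0.7),
    ]),
    ("syllable", [
        Hypothesis("syl:ban", frozenset({"ba-", "-an"}), 0.9),
        Hypothesis("syl:ba", frozenset({"ba-"}), 0.5),
    ]),
    ("word", [
        Hypothesis("word:ban", frozenset({"syl:ban"}), 1.0),
    ]),
]

trace = run_layers(features, layers)
for layer_name, accepted, residue in trace:
    print(layer_name, accepted, residue)
```

Each tuple in `trace` lists the layer name, the accepted hypotheses, and any data left unexplained at that layer. A fuller treatment would let higher-layer hypotheses feed expectations back down to rescore lower-layer candidates.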