ASA 126th Meeting Denver 1993 October 4-8

2pSP4. Duration modeling with hidden Markov models.

L. F. M. ten Bosch, X. Wang, and L. C. W. Pols

Inst. for Phonetic Sci., Univ. of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands

In hidden Markov modeling of speech signals, the statistics of speech characteristics are represented by the parameters of the hidden Markov model (HMM) after training. This procedure is purely statistical. This study concerns the incorporation of explicit knowledge into HMM training. To this end, one specific parameter, segment duration, was selected. In order to study the relation between duration and HMM modeling, three types of duration PDFs (DPDFs) are distinguished: (A) the DPDF defined by the segmented database used (the actual duration histogram); (B) the DPDF defined by the trained Markov model (i.e., by the transition matrix); and (C) the DPDF based on the HMM segmentation. While DPDF (A) is based on data and DPDF (B) is based on the trained model, DPDF (C) combines both features and is based on the available set of observation sequences. First, an explicit relation is formulated between the topology of the PLU, the three DPDFs, and the so-called Padé expansion. By using the generating function of the PDPT, it is possible to relate topological properties of PLUs on the one hand to algebraic properties of the DPDF on the other. Second, relations between the three DPDFs are presented by using two databases containing identical texts, read aloud at a normal and at a fast speaking rate. This procedure allows a comparison between variations in phonetic segment duration and the HMM parameters.
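As an illustration of DPDF (B), the following is a minimal sketch of how a duration distribution follows from a transition matrix alone. It is not the authors' implementation. It assumes a unit whose internal transitions are given by a hypothetical sub-stochastic matrix `trans` and whose per-state probabilities of leaving the unit are given by `exit_probs`; these names and the numerical values are illustrative and do not come from the study. Under these assumptions the probability of a duration of d frames is P(d) = pi · trans^(d-1) · exit_probs, and its generating function, the sum over d of P(d) z^d = z · pi · (I - z·trans)^(-1) · exit_probs, is a ratio of polynomials in z, which may be the kind of link to the Padé expansion that the abstract alludes to.

```python
import numpy as np


def duration_pdf(trans, exit_probs, max_dur, start=0):
    """Return P(d) for d = 1..max_dur for a unit entered in state `start`.

    trans      : (n, n) within-unit transition probabilities (assumed values)
    exit_probs : (n,) probability of leaving the unit from each state
    Each row of `trans` plus the matching entry of `exit_probs` sums to 1.
    """
    trans = np.asarray(trans, dtype=float)
    exit_probs = np.asarray(exit_probs, dtype=float)

    # occupancy[i] = probability of still being inside the unit, in state i
    occupancy = np.zeros(trans.shape[0])
    occupancy[start] = 1.0

    pdf = np.empty(max_dur)
    for d in range(max_dur):
        pdf[d] = occupancy @ exit_probs  # leave after exactly d + 1 frames
        occupancy = occupancy @ trans    # otherwise stay for one more frame
    return pdf


# Hypothetical three-state left-to-right topology with self-loops.
trans = [[0.6, 0.4, 0.0],
         [0.0, 0.7, 0.3],
         [0.0, 0.0, 0.5]]
exit_probs = [0.0, 0.0, 0.5]

pdf = duration_pdf(trans, exit_probs, max_dur=50)
mean_duration = float(np.sum(np.arange(1, 51) * pdf))  # truncated at 50 frames
```

In this sketch the topology is visible directly in the distribution: three states without skip transitions force a minimum duration of three frames, so P(1) and P(2) are zero, whereas a single state with a self-loop gives a geometric distribution. A histogram of measured segment durations, as in DPDF (A), could be compared against `pdf` on the same frame axis.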