M. Aksmanovic and L. Deng
Dept. of Elec. and Comput. Eng., Univ. of Waterloo, Waterloo, ON N2L 3G1, Canada
The hidden Markov model (HMM) has been used successfully in automatic speech recognition for almost two decades. In the standard formulation, each HMM state is associated with a generally distinct but stationary stochastic process. This makes the standard HMM inadequate for representing the nonstationary properties of the many speech segments that the HMM-state statistics are intended to describe. A generalized HMM has been developed to overcome this inadequacy by introducing state-dependent polynomial regression functions of time that serve as the time-varying means in the HMM's Gaussian output distributions [e.g., L. Deng, Signal Process. 27, 65--78 (1992)]. Recently, Aksmanovic and Deng extended this model so that the state-dependent nonstationary process contains multiple tracks of the polynomial functions. This new parametric class of nonstationary-state HMMs has been implemented and evaluated. Experiments on fitting models to speech data, on limited-vocabulary word recognition, and on phonetic classification demonstrated advantages of the nonstationary-state HMMs over the traditional stationary-state HMMs. Details of the model implementation and of the experimental results will be described. In particular, the focus will be on comparisons between single-track and multiple-track regression functions defined within the HMM states, and on comparisons among different orders of the state-dependent polynomial regression functions.
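The idea of a state whose Gaussian mean follows a polynomial in time can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scalar observations, a single variance per state, time counted in frames since the state was entered, and a multiple-track form in which one polynomial track is selected per state visit with a positive mixture weight. None of these details are given in the abstract, and all function names are hypothetical.

```python
import math


def poly_mean(coeffs, t):
    """Evaluate the time-varying mean b0 + b1*t + b2*t^2 + ... (Horner's rule)."""
    result = 0.0
    for b in reversed(coeffs):
        result = result * t + b
    return result


def segment_log_likelihood(obs, coeffs, variance):
    """Gaussian log-likelihood of the frames emitted during one state visit.

    Assumption: t is the frame index since state entry (0, 1, 2, ...).
    A single coefficient [b0] reduces to the stationary-state case.
    """
    ll = 0.0
    for t, o in enumerate(obs):
        resid = o - poly_mean(coeffs, t)
        ll += -0.5 * (math.log(2.0 * math.pi * variance) + resid * resid / variance)
    return ll


def multi_track_log_likelihood(obs, tracks, weights, variance):
    """Assumed multiple-track form: a weighted mixture over whole trajectories.

    Each track is a coefficient list; weights must be positive and sum to 1.
    Uses log-sum-exp for numerical stability.
    """
    terms = [
        math.log(w) + segment_log_likelihood(obs, c, variance)
        for c, w in zip(tracks, weights)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(x - m) for x in terms))


# A rising segment: a first-order (linear) mean fits it better than a constant mean.
obs = [1.0, 1.5, 2.0, 2.5]
stationary = segment_log_likelihood(obs, [1.75], 0.1)
linear = segment_log_likelihood(obs, [1.0, 0.5], 0.1)
two_tracks = multi_track_log_likelihood(obs, [[1.0, 0.5], [2.5, -0.5]], [0.5, 0.5], 0.1)
```

In this toy segment the linear track scores higher than the constant mean, which is the kind of model-fitting comparison across polynomial orders and track counts that the abstract describes.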