ATR Auditory and Visual Perception Res. Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan
ATR Human Information Processing Res. Labs.
A ``dynamic cepstrum'' is proposed as a new spectral parameter that outperforms the conventional cepstrum in speech recognition. The dynamic cepstrum incorporates both the static and dynamic aspects of speech spectral sequences by implementing forward masking, one of the most important mechanisms for extracting the spectral dynamics that provide acoustic cues in speech perception. Recent research on auditory perception [E. Miyasaka, J. Acoust. Soc. Jpn. 39, 614--623 (1983) (in Japanese)] reports that forward masking spreads more widely along the frequency axis as the masker-signal interval increases. This masking characteristic motivates a novel filtering methodology, time-dependent spectral filtering, which the dynamic cepstrum simulates. The parameter is obtained by subtracting the masking level from the current cepstrum. The masking level at the current time is calculated as the sum of the masking levels obtained by low-pass filtering the preceding spectral sequence, where the cut-off frequency of the low-pass filter shifts lower as the masker-signal interval increases. A /b, d, g, m, n, N/ recognition experiment that applies the dynamic cepstrum to hidden Markov models demonstrates that the dynamic cepstrum outperforms the combination of cepstrum and delta cepstrum, even though the combination requires twice as many parameters as the dynamic cepstrum.
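
The subtraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the filter shape nor the masking weights, so the Gaussian lifter, the geometric decay, and the parameter names `n_lags`, `decay`, `q0`, and `shrink` are all hypothetical. The sketch relies on the fact that low-pass filtering a log spectrum along the frequency axis is equivalent to weighting (liftering) its cepstral coefficients, so a lower cut-off corresponds to a narrower lifter.

```python
import numpy as np


def dynamic_cepstrum(cep, n_lags=4, decay=0.5, q0=12.0, shrink=0.7):
    """Subtract a forward-masking estimate from each cepstral frame.

    cep    : array of shape (n_frames, n_coef), one cepstral vector per frame.
    n_lags : number of preceding frames that act as maskers (assumed).
    decay  : per-frame attenuation of the masking level (assumed).
    q0     : lifter width, in quefrency bins, for the nearest masker (assumed).
    shrink : factor by which the lifter narrows per additional lag (assumed).
    """
    cep = np.asarray(cep, dtype=float)
    n_frames, n_coef = cep.shape
    quefrency = np.arange(n_coef)
    masking = np.zeros_like(cep)

    for k in range(1, n_lags + 1):
        # A longer masker-signal interval gives a lower cut-off, i.e. a
        # narrower lifter and a smoother masking pattern across frequency.
        cutoff = q0 * shrink ** (k - 1)
        lifter = np.exp(-0.5 * (quefrency / cutoff) ** 2)
        gain = decay ** k
        # Frame t is masked by the liftered frame t - k.
        masking[k:] += gain * lifter * cep[:-k]

    # Dynamic cepstrum: current cepstrum minus the summed masking level.
    return cep - masking
```

With these assumed settings the first frame passes through unchanged because it has no preceding maskers, and low-order coefficients (the broad spectral envelope) are suppressed more strongly than high-order ones, so the output keeps the same dimensionality as the input cepstrum while emphasizing spectral change.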