4aSC22. Optimization of speaker recognition model based on dynamic features of two-dimensional mel-cepstrum.

Session: Thursday Morning, December 5

Author: Chiyomi Miyajima
Location: Dept. of Intelligence and Comput. Sci., Nagoya Inst. of Technol., Gokiso, Showa, Nagoya, 466 Japan
Author: Tadashi Kitamura
Location: Dept. of Intelligence and Comput. Sci., Nagoya Inst. of Technol., Gokiso, Showa, Nagoya, 466 Japan

Abstract:

This paper describes the optimization of a TDMC-VQ-based speaker recognition model. The two-dimensional mel-cepstrum (TDMC) is defined as the two-dimensional Fourier transform of mel-frequency-scaled log spectra in the frequency and time domains. It consists of averaged and dynamic spectral features of the two-dimensional mel log spectra in the analyzed interval. TDMC has been shown to be very effective for word recognition and robust in noisy environments. The speaker model is based on vector quantization of the TDMC parameter and is created by a self-organizing algorithm. In this study, TDMC and a conventional mel-cepstrum were used as the speech parameters. Two refinements were studied: optimization by a weighting function applied to the components of the speech parameter, and a combination of the two parameters. To evaluate the proposed model, text-independent speaker identification experiments were carried out for eight female speakers. Each model is trained on about 18 s of speech and identifies 200 input sentences. The experimental results show that the proposed model with an optimal weighting function performs better than the usual method and achieves 100% identification for 1.4-s input speech. The model is also robust with respect to time interval.
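
As a rough illustration of the ideas in the abstract, the following Python sketch (NumPy assumed) takes the two-dimensional Fourier transform of a block of mel log spectra, keeps a low-order corner of it as a feature vector, and identifies a speaker by the smallest weighted vector-quantization distortion. The function names, the truncation orders `n_mod` and `n_quef`, the weighted squared-error distortion, and the use of already-trained codebooks are all assumptions made for illustration and are not taken from the paper. The self-organizing codebook training and the actual weighting function are not shown.

```python
import numpy as np


def tdmc(mel_log_spec):
    """2-D Fourier transform of a (frames x mel channels) log-spectrum block."""
    return np.fft.fft2(mel_log_spec)


def tdmc_features(mel_log_spec, n_mod=3, n_quef=4):
    """Keep a low-order corner of the TDMC as a real feature vector.

    Row 0 (zero frequency along the time axis) carries the time-averaged
    spectral shape; rows 1 and above carry dynamic (time-varying) features.
    n_mod and n_quef are hypothetical truncation orders.
    """
    n_frames = mel_log_spec.shape[0]
    block = tdmc(mel_log_spec)[:n_mod, :n_quef] / n_frames
    return np.concatenate([block.real.ravel(), block.imag.ravel()])


def avg_distortion(features, codebook, weights):
    """Mean weighted squared distance from each feature vector to its
    nearest codeword. weights has one entry per feature component."""
    diff = features[:, None, :] - codebook[None, :, :]
    dist = np.sum(weights * diff ** 2, axis=2)
    return dist.min(axis=1).mean()


def identify(features, codebooks, weights):
    """Return the speaker whose codebook gives the lowest distortion.

    codebooks maps a speaker label to an (n_codewords x dim) array that is
    assumed to have been trained beforehand.
    """
    scores = {name: avg_distortion(features, cb, weights)
              for name, cb in codebooks.items()}
    return min(scores, key=scores.get)
```

In this sketch, setting every entry of `weights` to 1 gives plain squared-error VQ, while unequal entries emphasize or suppress individual feature components, which is one simple way to think about a component weighting function.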

ASA 132nd meeting - Hawaii, December 1996