M. Medhi Homayounpour
CNRS/UA 1027, 19 rue des Bernardins, 75005 Paris, France
Inst. on Biomed. Res., Bulgarian Acad. of Sci., Bulgaria
IDIAP, Martigny, Switzerland
A text-independent speaker verification approach based on an unsupervised algorithm (Kohonen's self-organizing map, SOM) and the supervised LVQ3 algorithm [T. Kohonen, Proc. IEEE 78, 1464--1480 (1990)] is proposed. During the training phase, a general prototype SOM is formed from the input vectors of 20 speakers taken from a polyphone-like telephony database. An individual SOM is then built for each speaker by passing that speaker's vectors through the general prototype SOM, and each individual SOM is fine-tuned by means of the LVQ3 algorithm. In the verification phase, a short parametrized speech interval of 3-s duration from the speaker under verification is compared with both the general prototype SOM and the claimed speaker's individual SOM, yielding two accumulated distortions. The decision to accept or reject the speaker depends on whether the speaker's voice matches the claimed speaker's individual SOM or the general prototype SOM more closely. MFCC, LPCC, and LSP coefficients were used as distinctive features in these studies. Personal SOMs were trained using 100 s of speech from each speaker. A conventional LBG algorithm was also applied for comparison. With all three kinds of features, LVQ3 initialized by the SOM codebook was always more efficient than LVQ3 initialized by the LBG codebook.
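The verification decision described above can be illustrated with a minimal sketch. It is not the authors' implementation: the squared Euclidean distance, the `threshold` parameter, the function names, and the toy two-dimensional codebooks are all assumptions made for illustration. Each codebook stands in for the codewords of a trained SOM, and each frame stands in for one feature vector (for example, MFCC) from the 3-s test interval.

```python
def nearest_distortion(frame, codebook):
    """Squared Euclidean distance from a frame to its closest codeword (assumed metric)."""
    return min(
        sum((x - c) ** 2 for x, c in zip(frame, codeword))
        for codeword in codebook
    )


def accumulated_distortion(frames, codebook):
    """Sum of nearest-codeword distortions over all frames of the test interval."""
    return sum(nearest_distortion(frame, codebook) for frame in frames)


def verify(frames, speaker_codebook, general_codebook, threshold=0.0):
    """Accept the claim if the speaker codebook fits better than the general one.

    `threshold` is a hypothetical margin; the abstract does not state one.
    Returns (accepted, speaker_distortion, general_distortion).
    """
    d_speaker = accumulated_distortion(frames, speaker_codebook)
    d_general = accumulated_distortion(frames, general_codebook)
    return (d_general - d_speaker) > threshold, d_speaker, d_general


if __name__ == "__main__":
    # Toy codebooks and frames, invented purely for illustration.
    speaker_cb = [[0.0, 0.0], [1.0, 1.0]]
    general_cb = [[5.0, 5.0], [-5.0, -5.0]]
    claimed_frames = [[0.1, -0.1], [0.9, 1.1]]
    impostor_frames = [[4.8, 5.2], [-5.1, -4.9]]
    print(verify(claimed_frames, speaker_cb, general_cb))
    print(verify(impostor_frames, speaker_cb, general_cb))
```

In this sketch the two accumulated distortions are compared directly; a practical system would more likely normalize them by the number of frames and tune the margin on held-out data.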