ASA 124th Meeting New Orleans 1992 October

5pSP7. Speaker adaptation for segment-based isolated-word recognition.

Yasushi Yamazaki

Toru Sanada

Shinta Kimura

Speech Recognition Sec., Advanced Systems Res. Div., Fujitsu Labs. Ltd., 1015, Kamikodanaka, Nakahara-ku, Kawasaki 211, Japan

Our speaker-dependent recognition method (SD) requires 184 training words. To reduce this number, three types of speaker adaptation were studied. Each uses fewer training words to adapt segment spectrum templates to a new speaker. The first, SA1, learns a transfer function that converts the templates of an existing speaker into templates for the new speaker. The second, SA2, is based on the idea that the system uses the templates of multiple speakers whose voices resemble that of the new speaker. The third, SA3, adds templates extracted from the training words to speaker-independent templates built from many speakers. For reference, a speaker-independent method (SI) and a method using another speaker's templates (SO) were also evaluated. All methods were tested on a hard-to-recognize 100-word vocabulary spoken by five speakers. The average recognition rates of SD, SI, and SO are 95.6%, 92.8%, and 85.3%, respectively. Those of SA1, SA2, and SA3 with 18 training words are 88.6%, 91.6%, and 93.8%, respectively. Adaptation from other speakers' templates, as in SA1 and SA2, performs worse than SI, whereas adaptation from speaker-independent templates (SA3) is promising: it outperforms SI, although it remains below SD. In an evaluation using several ordinary 100-word vocabularies, the average recognition rates of SD and SA3 are 98.4% and 98.0%, respectively.
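The abstract describes the three adaptation schemes only at a high level. The Python sketch below illustrates the general shape of each one. It rests on simplifying assumptions that are ours and not the authors'. A segment spectrum is treated as a fixed-length vector. Recognition is reduced to nearest-template matching of a single segment, with word-level segment matching omitted. SA1's transfer function is reduced to a constant additive spectral offset. SA2's speaker similarity is an average nearest-template distance. All function names (`classify`, `sa1_shift`, `sa2_select`, `sa3_augment`) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
# Hypothetical layout: segment label -> list of template spectra.
Templates = Dict[str, List[Vector]]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two spectrum vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(spectrum: Vector, templates: Templates) -> str:
    """Return the label of the nearest template (nearest-neighbour matching)."""
    best_label, best_dist = "", math.inf
    for label, vectors in templates.items():
        for v in vectors:
            d = distance(spectrum, v)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


def sa1_shift(source: Templates, adapt: Dict[str, Vector]) -> Templates:
    """SA1-like: map one existing speaker's templates toward the new speaker.

    Assumption: the transfer function is a constant additive offset, estimated as
    the mean difference between the new speaker's adaptation spectra and the
    source speaker's first template for the same label.
    """
    labels = [lab for lab in adapt if lab in source]
    if not labels:
        return {lab: [list(v) for v in vs] for lab, vs in source.items()}
    dim = len(adapt[labels[0]])
    offset = [0.0] * dim
    for lab in labels:
        for i in range(dim):
            offset[i] += (adapt[lab][i] - source[lab][0][i]) / len(labels)
    # Apply the same offset to every template of the source speaker.
    return {lab: [[x + o for x, o in zip(v, offset)] for v in vs]
            for lab, vs in source.items()}


def sa2_select(speakers: Dict[str, Templates], adapt: Dict[str, Vector],
               k: int) -> Templates:
    """SA2-like: pool the templates of the k existing speakers closest to the new one."""
    def score(t: Templates) -> float:
        # Average distance from each adaptation spectrum to this speaker's
        # nearest template with the same label (assumed similarity measure).
        ds = [min(distance(v, u) for u in t[lab])
              for lab, v in adapt.items() if lab in t and t[lab]]
        return sum(ds) / len(ds) if ds else math.inf

    chosen = sorted(speakers, key=lambda s: score(speakers[s]))[:k]
    pooled: Templates = {}
    for s in chosen:
        for lab, vs in speakers[s].items():
            pooled.setdefault(lab, []).extend(list(v) for v in vs)
    return pooled


def sa3_augment(si: Templates, adapt: Dict[str, Vector]) -> Templates:
    """SA3-like: add the new speaker's spectra to the speaker-independent templates."""
    merged = {lab: [list(v) for v in vs] for lab, vs in si.items()}
    for lab, v in adapt.items():
        merged.setdefault(lab, []).append(list(v))
    return merged


if __name__ == "__main__":
    # Toy speaker-independent templates and one adaptation spectrum for "a".
    si = {"a": [[0.0, 0.0]], "i": [[4.0, 4.0]]}
    adapt = {"a": [2.5, 2.5]}
    test = [2.6, 2.4]
    print("SI :", classify(test, si))                      # nearest SI template is "i"
    print("SA3:", classify(test, sa3_augment(si, adapt)))  # added template makes it "a"
```

In this toy run the new speaker's "a" lies closer to the speaker-independent "i" template, so plain nearest-template matching mislabels it, and adding a single adaptation template (the SA3-like step) corrects it. This mirrors, only qualitatively, why augmenting speaker-independent templates can help; it says nothing about the reported recognition rates.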