ASA 125th Meeting Ottawa 1993 May

3pSP2. Evaluation of speaker normalization methods for vowel recognition using neural network and nearest-neighbor classifiers.

Gail A. Carpenter

Krishna K. Govindarajan

Cognitive and Neural Systems Dept., Boston Univ., 111 Cummington St., Boston, MA 02215

Intrinsic and extrinsic speaker normalization methods were compared using a neural network (fuzzy ARTMAP) and L[sub 1] and L[sub 2] K-nearest-neighbor (K-NN) categorizers, trained and tested on disjoint sets of speakers from the Peterson--Barney vowel database. Intrinsic methods included one unscaled representation, four psychophysical scales (bark, bark with end correction, mel, ERB), and three log scales, each tested on four combinations of F[sub 0], F[sub 1], F[sub 2], and F[sub 3]. Extrinsic methods included four speaker adaptation schemes, each combined with the 32 intrinsic methods: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). ARTMAP and K-NN showed similar trends, with K-NN performing better but requiring about ten times as much memory. Among intrinsic methods, ARTMAP and K-NN performed best using all the differences between bark-scaled F[sub i] (BDA). The ordering of performance for the extrinsic methods was LT, CSi, LS, and CS. For all extrinsic methods, ARTMAP performed best using BDA; K-NN chose psychophysical measures for all except CSi. [Work supported by BP, DARPA, NSF, AFOSR, ONR.]
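The feature pipeline described above can be sketched in code: bark-scale the formants, take all pairwise bark differences (a BDA-style intrinsic representation), optionally apply a centroid-subtraction extrinsic normalization (CS or CSi), and classify with an L[sub 1] or L[sub 2] K-NN. This is a minimal illustration under stated assumptions, not the authors' implementation: the Hz-to-bark formula used here is Traunmüller's (1990) approximation (the abstract does not specify which bark variant was used), and the data in the usage example are synthetic stand-ins, not the Peterson--Barney vowels.

```python
# Sketch of the abstract's pipeline: bark-difference features (BDA-style),
# centroid-subtraction normalization (CS / CSi), and L1 / L2 K-NN.
# Assumptions: Traunmueller (1990) bark formula; synthetic example data.
import numpy as np

def hz_to_bark(f_hz):
    """Hz-to-bark via Traunmueller's (1990) approximation (an assumption;
    the abstract does not state which bark formula was used)."""
    f = np.asarray(f_hz, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_differences(formants):
    """All pairwise differences between bark-scaled formants (BDA-style).

    formants: (n_samples, n_formants) array, e.g. F0..F3 in Hz.
    Returns an (n_samples, n_pairs) array of bark differences."""
    z = hz_to_bark(formants)
    n = z.shape[1]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return np.stack([z[:, j] - z[:, i] for i, j in pairs], axis=1)

def centroid_subtract(x, per_frequency=True):
    """Extrinsic normalization for one speaker's samples x.

    per_frequency=True subtracts each feature's own centroid (CSi-style);
    per_frequency=False subtracts a single centroid pooled across all
    frequencies (CS-style)."""
    if per_frequency:
        return x - x.mean(axis=0, keepdims=True)
    return x - x.mean()

def knn_classify(train_x, train_y, test_x, k=3, metric="l2"):
    """K-NN with an L1 or L2 distance; majority vote among the k nearest."""
    ord_ = 1 if metric == "l1" else 2
    preds = []
    for x in test_x:
        d = np.linalg.norm(train_x - x, ord=ord_, axis=1)
        nearest = train_y[np.argsort(d)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Usage on synthetic two-vowel data (placeholder values, not Peterson-Barney):
train_hz = np.array([[120, 300, 2300, 3000],   # /i/-like tokens
                     [125, 310, 2250, 2950],
                     [120, 700, 1100, 2500],   # /a/-like tokens
                     [130, 720, 1150, 2550]])
train_y = np.array([0, 0, 1, 1])
test_hz = np.array([[122, 305, 2280, 2980],
                    [128, 710, 1120, 2520]])

train_feat = bark_differences(train_hz)
test_feat = bark_differences(test_hz)
pred = knn_classify(train_feat, train_y, test_feat, k=3, metric="l1")
```

Disjoint speaker sets, as in the abstract, would mean all of one speaker's tokens land entirely in train or entirely in test; the sketch above omits that split for brevity.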