ASA 127th Meeting M.I.T. 1994 June 6-10

3pSP3. Neural network-based analysis of cues for vowel and consonant identification.

Jim Talley

Dept. of Linguist., 501 Calhoun Hall, Univ. of Texas at Austin, Austin, TX 78712

Many static and dynamic features of the acoustic speech signal have been proposed in the literature as cues for the identification of phonetic categories. Ultimately, such features' cue validity is most appropriately studied via well-designed perceptual experiments involving human subjects, but such studies are hindered by their expense (especially in terms of time) and by inherent confounds (stimuli must be sufficiently speech-like to be treated as speech). This paper examines ways in which neural networks (NNs) can be utilized as auxiliary tools in phonetics/speech perception research. Discussion includes the application of NNs to the task of learning relevant speech category discriminations from various restricted characterizations of a corpus of naturally spoken CVC syllables. That task examines learnability as a function of the available featural information. In addition, various "post-mortem" techniques (e.g., transfer function mapping, weight analysis, ...) are discussed which, when applied to trained NNs, yield estimates of the cue validity of (ensembles of) features with respect to phonetic category discriminations. These methods cannot be blindly interpreted as producing valid characterizations of human speech perception; however, they represent useful tools that are inexpensive and highly targetable (confounds can be controlled), and they can serve as guides to fruitful experiments with human subjects. [Work supported by NSF.]
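To make the general workflow concrete, the following is a minimal sketch in Python/NumPy of the two-stage idea the abstract describes: train a small network to discriminate two categories from a restricted feature characterization, then apply a simple "post-mortem" sensitivity analysis to estimate per-feature cue validity. Everything here is an illustrative assumption, not the paper's actual corpus, architecture, or analysis: the three-feature toy data, the network size, and the gradient-based relevance measure (a crude stand-in for the transfer function mapping and weight analysis mentioned above) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 tokens, 3 acoustic features. Only feature 0 (imagine
# F2 onset frequency) actually separates the two hypothetical
# categories; features 1 and 2 are uninformative noise.
n = 200
X = rng.normal(size=(n, 3))
y = (rng.random(n) < 0.5).astype(float)   # category labels (0 or 1)
X[:, 0] += 2.0 * y - 1.0                  # shift the informative feature

# One-hidden-layer network trained by hand-coded gradient descent
# on cross-entropy loss.
W1 = rng.normal(scale=0.5, size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(500):
    h = np.tanh(X @ W1 + b1)              # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()      # P(category 1 | features)
    d_out = ((p - y) / n)[:, None]        # dLoss/dz2, per token
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1.0 - h**2)   # backprop through tanh
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

# "Post-mortem" analysis: mean absolute gradient of the network's
# output with respect to each input feature, averaged over tokens,
# as one simple proxy for a feature's cue validity.
h = np.tanh(X @ W1 + b1)
p = sigmoid(h @ W2 + b2).ravel()
s = (p * (1.0 - p))[:, None]              # d sigmoid / d z2
grad_X = s * (((1.0 - h**2) * W2.ravel()) @ W1.T)
relevance = np.abs(grad_X).mean(axis=0)
print("estimated cue validity per feature:", relevance / relevance.sum())
```

On this toy problem the relevance estimate concentrates on feature 0, the only informative one; repeating the training under different restricted feature sets (ablations) would correspond to the learnability-as-a-function-of-featural-information question posed in the abstract.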