James D. Miller
Central Inst. for the Deaf, 818 S. Euclid, St. Louis, MO 63110
Relations between the perception of vowels and four acoustic parameters, fundamental frequency (F0) and the center frequencies of the first three formants (F1, F2, and F3), are described. Predictions of listeners' identifications of vowels are best when acoustic trajectories are based on all four parameters. These parameters can be taken separately to form a four-dimensional space, or they can be combined to form a three-dimensional space such as Miller's Auditory Perceptual Space (APS). The time-normalized paths through such spaces correlate best with listener responses. Temporal factors such as durations and speeds along these paths are, within limits, not critical. However, the direction of movement along the path can be crucial. While movement in the forward direction usually evokes the perception of the intended vowel, movement in the opposite direction may sometimes evoke the perception of another vowel. Recent work shows that neural networks trained with inputs based on F0, F1, F2, and F3 perform very similarly to humans listening to the waveforms of the isolated nuclei. These results will be reviewed and their implications for models of vowel perception will be discussed. [Work supported by NIDCD, AFOSR, and CID.]
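
The idea of a time-normalized path can be illustrated with a minimal sketch. It is not the author's procedure: the linear resampling, the function name time_normalize, the choice of 11 sample points, and all frequency values below are assumptions made purely for illustration. The sketch resamples a track of (F0, F1, F2, F3) frames onto a fixed grid of normalized time, so that two tokens tracing the same path at different durations become identical, while the time-reversed path remains distinct.

```python
import numpy as np


def time_normalize(track, n_points=11):
    """Resample a (frames x parameters) track onto n_points equally
    spaced instants of normalized time in [0, 1], using linear
    interpolation (an assumed, illustrative choice)."""
    track = np.asarray(track, dtype=float)
    t_old = np.linspace(0.0, 1.0, track.shape[0])
    t_new = np.linspace(0.0, 1.0, n_points)
    columns = [np.interp(t_new, t_old, track[:, k])
               for k in range(track.shape[1])]
    return np.column_stack(columns)


# Hypothetical tracks in Hz; columns are F0, F1, F2, F3.
# Both tokens follow the same path, sampled with 3 and 5 frames.
short_token = np.array([
    [120.0, 400.0, 1900.0, 2600.0],
    [118.0, 500.0, 1700.0, 2550.0],
    [115.0, 600.0, 1500.0, 2500.0],
])
long_token = np.array([
    [120.0, 400.0, 1900.0, 2600.0],
    [119.0, 450.0, 1800.0, 2575.0],
    [118.0, 500.0, 1700.0, 2550.0],
    [116.5, 550.0, 1600.0, 2525.0],
    [115.0, 600.0, 1500.0, 2500.0],
])

forward_short = time_normalize(short_token)
forward_long = time_normalize(long_token)

# Same path traversed in the opposite direction.
backward = time_normalize(short_token[::-1])

same_path = np.allclose(forward_short, forward_long)
same_direction = np.allclose(forward_short, backward)
print(same_path, same_direction)
```

In this toy case the two forward tokens coincide after normalization even though their frame counts differ, whereas the reversed token does not, which mirrors the distinction drawn above between temporal factors and direction of movement.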