Terrance M. Nearey
Dept. of Linguistics, Univ. of Alberta, Edmonton, AB T6G 2E7, Canada
Richard S. McGowan
Haskins Labs., New Haven, CT 06511
AT&T Bell Labs., Murray Hill, NJ 07974-0636
In speech perception, the emphasis is on issues related to the long-term representation of speech units, language-specific differences in speech perception by infants and adults, and new results from cochlear implant and other hearing research on the auditory coding of speech cues. New results on the question of prototypes for speech units, and on how speaker and contextual variability affect listeners' behavior, are reviewed. Next, there is a survey of recent findings on the effect of language background on infants' and adults' perception of speech. Finally, results from perceptual studies of cochlear implant patients involving systematically designed stimulus sets are summarized.

In speech production, the focus is on applications of instrumentation, on numerical simulation, and on sensory feedback. There is discussion of measuring instruments that are beginning to provide data on motor compensation and coproduction, and of sophisticated numerical simulations that are being developed to answer questions about vibratory mechanisms, airflow, and their relation to sound output. Finally, there is a discussion of the roles of receptor feedback and auditory feedback in articulatory coordination and timing.

In speech processing, stress is placed on the progress made in auditory representations, in computational models of language, and in statistical methods for text-to-speech synthesis. In auditory-based representations, 100- to 200-ms speech segments may serve as a basis both for distance measures in speech coding that reflect the perceptual acceptability of acoustic distortions, and for speech-recognition features that capture dynamic aspects of speech production. In spoken-language processing, the gap between stochastic models of speech (e.g., HMMs) and more traditional models of language structure (syntax) and content (semantics) is being bridged. Finally, text-to-speech synthesis is moving, albeit slowly, from hand-tuned to automatically trained systems.
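As a minimal illustration of the stochastic speech models mentioned above, the sketch below implements Viterbi decoding for a discrete HMM. The two-state "vowel/consonant" model, its probabilities, and the observation symbols are invented for illustration only; they are not from the surveyed work.

```python
# Minimal sketch of a discrete hidden Markov model (HMM) with Viterbi
# decoding, the kind of stochastic speech model referred to above.
# All states, symbols, and probabilities are hypothetical.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for an observation sequence."""
    # V[t][s] = (best probability of any path ending in state s at time t,
    #            predecessor state on that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: V[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ("vowel", "consonant")
start_p = {"vowel": 0.4, "consonant": 0.6}
trans_p = {"vowel": {"vowel": 0.3, "consonant": 0.7},
           "consonant": {"vowel": 0.7, "consonant": 0.3}}
emit_p = {"vowel": {"high_energy": 0.8, "low_energy": 0.2},
          "consonant": {"high_energy": 0.2, "low_energy": 0.8}}

decoded = viterbi(("low_energy", "high_energy", "low_energy"),
                  states, start_p, trans_p, emit_p)
print(decoded)  # ['consonant', 'vowel', 'consonant']
```

Real spoken-language systems chain many such phone-level models together and score word sequences with a language model; this toy example shows only the core dynamic-programming step.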