Acoust. Res. Dept., AT&T Bell Labs., 600 Mountain Ave., Murray Hill, NJ 07974
The purpose of this special session is to call the attention of the hearing-science community to the need for new knowledge on how speech segments 50 to 150 ms long (e.g., phonemes, diphones) are represented in the auditory system. In this session, the need for such knowledge will be addressed in the context of two specific speech-technology applications, low bit-rate coding and automatic speech recognition (ASR), both of which rely on processing speech information over segment-length durations. Schemes for low bit-rate coding rely on signal manipulations that extend over several tens of ms, and schemes for speech recognition rely on phonemic/articulatory information that extends over similar time intervals. Current research efforts focus on the psychophysics of stationary acoustic inputs, for example, masking, pitch perception, and sound segregation. Research efforts also address the cognitive aspects of speech perception, for example, lexical access. In contrast, research on the psychophysical aspects of speech dynamics, as manifested in acoustic properties over durations of tens of ms, is still in its infancy.