The ability to understand speech spoken by diverse speakers under a wide range of acoustic environmental conditions presents a central challenge to current theories of speech perception. Reverberation, background noise, and speaker variation all contribute to the heterogeneous acoustic realization of phonetic information from which listeners routinely derive linguistic information. Representations based on the detailed spectro-temporal properties of speech (e.g., the sound spectrogram) are not sufficiently stable under such conditions to provide a reasonable account of the processes by which the brain decodes spoken language. A new representational format for visualizing speech, based on the distribution of the modulation spectrum below 16 Hz (emphasizing modulation frequencies centered around 4 Hz) across critical-band-like channels, provides a high degree of stability under reverberant and low signal-to-noise-ratio conditions. This modulation-based representation appears to capture many essential properties of neurons in the primary auditory cortex, and is also well matched to the temporal dynamics of speech production. As such, it may provide a more principled basis than finer-grained spectro-temporal representations for understanding the mechanisms by which human listeners decode the speech signal under natural acoustic conditions.
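The core computation described above — isolating the slow amplitude modulations of speech within critical-band-like channels — can be sketched as follows. This is a minimal illustration, not the authors' implementation: a Butterworth band-pass filter stands in for a critical-band channel, the Hilbert transform supplies the amplitude envelope, and the function name `modulation_envelope` and all parameter values (filter orders, the 16 Hz cutoff) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_envelope(x, fs, band, mod_cut=16.0):
    """Slow (< mod_cut Hz) amplitude envelope of one carrier band.

    x: waveform; fs: sample rate (Hz); band: (lo, hi) carrier band in Hz.
    """
    # 1. Band-pass the signal into a critical-band-like carrier channel.
    sos = butter(4, band, btype="band", fs=fs, output="sos")
    sub = sosfiltfilt(sos, x)
    # 2. Extract the amplitude envelope via the Hilbert transform.
    env = np.abs(hilbert(sub))
    # 3. Low-pass the envelope to keep only modulations below ~16 Hz,
    #    the range the text identifies as carrying phonetic information.
    sos_lp = butter(4, mod_cut, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos_lp, env)

# Demo: a 1 kHz carrier amplitude-modulated at 4 Hz, roughly the
# syllable rate the modulation spectrum of speech peaks at.
fs = 16000
t = np.arange(2 * fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env = modulation_envelope(x, fs, band=(800, 1200))
# The dominant modulation frequency of the recovered envelope:
spec = np.abs(np.fft.rfft(env - env.mean()))
peak_hz = np.fft.rfftfreq(len(env), 1 / fs)[np.argmax(spec)]
print(round(peak_hz, 1))
```

Repeating this per channel of a filterbank and stacking the envelope energies over time yields a modulation-based display of the kind the text describes; the demo simply confirms that the 4 Hz modulation survives the analysis while the 1 kHz carrier detail is discarded.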