ASA 125th Meeting Ottawa 1993 May

2pSP4. Proper time-frequency distributions for speech.

Les E. Atlas

James W. Pitton

Patrick J. Loughlin

Dept. of Elec. Eng., FT-10, Univ. of Washington, Seattle, WA 98195

The usual representations of speech are obtained by sliding a window across the signal in time and applying Fourier or other analysis to each windowed segment. However, this approach is inconsistent with the notion of a proper distribution. A proper distribution is defined to have the usual properties of a joint probability density function, and it has been shown that proper time-frequency distributions exist for speech [Loughlin et al., Proc. ICASSP, V-125--V-128 (1992)]. The most obvious advantage of these new distributions is their improved simultaneous resolution in time and frequency. However, this advantage may not be useful for speech recognition, and other advantages could be exploited instead. For example, the signal-dependent kernel of a proper distribution may be insensitive to changes in room acoustics and could also effectively normalize out differences in vocal tract length. Also, a proper distribution's representation of a periodically excited, time-varying resonator is quite different from that of a spectrogram, and new features with greater pertinence for auditory modeling may be apparent. [Work supported by Boeing and the Washington Technology Center.]
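
The inconsistency between sliding-window analysis and a proper distribution can be illustrated numerically. The sketch below is not from the abstract. It assumes that "the usual properties of a joint probability density function" means nonnegativity together with correct marginals: summing over frequency should return the instantaneous energy |s(t)|^2, and summing over time should return the energy spectrum |S(f)|^2. The test signal, the Hann window length, and the function name `spectrogram` are hypothetical choices made only for this example.

```python
import numpy as np

def spectrogram(s, win):
    """Sliding-window spectrogram, scaled so it sums to the signal energy.

    Illustrative helper (not from the abstract); the window is applied
    circularly and `win` must have odd length.
    """
    n_samples = len(s)
    half = len(win) // 2
    # Embed the window in a full-length buffer, centered (circularly) on sample 0.
    h = np.zeros(n_samples)
    h[:half + 1] = win[half:]
    h[-half:] = win[:half]
    h /= np.sqrt(np.sum(h ** 2))  # unit-energy window
    # Row n is the squared magnitude spectrum of the signal windowed around sample n.
    rows = [np.abs(np.fft.fft(s * np.roll(h, n))) ** 2 / n_samples
            for n in range(n_samples)]
    return np.array(rows)

# Hypothetical test signal: a sinusoid that switches on abruptly at sample 64.
n = np.arange(256)
s = np.where(n >= 64, np.cos(2 * np.pi * 0.1 * n), 0.0)

P = spectrogram(s, np.hanning(31))

energy = np.sum(np.abs(s) ** 2)
time_marginal = P.sum(axis=1)   # sum over frequency, one value per time sample
freq_marginal = P.sum(axis=0)   # sum over time, one value per frequency bin

true_time_marginal = np.abs(s) ** 2
true_freq_marginal = np.abs(np.fft.fft(s)) ** 2 / len(s)

# Nonnegative, and the total energy is preserved ...
is_nonnegative = bool(P.min() >= 0.0)
energy_ok = bool(np.isclose(P.sum(), energy))
# ... but neither marginal matches the signal's own energy densities.
time_marginal_ok = bool(np.allclose(time_marginal, true_time_marginal))
freq_marginal_ok = bool(np.allclose(freq_marginal, true_freq_marginal))

print(is_nonnegative, energy_ok, time_marginal_ok, freq_marginal_ok)
```

Under these assumptions the spectrogram is nonnegative and preserves total energy, but its marginals are smoothed versions of |s(t)|^2 and |S(f)|^2 (blurred by the squared window and its spectrum, respectively), so it does not behave like a joint density of the signal's energy in time and frequency.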