ASA 127th Meeting M.I.T. 1994 June 6-10

1pSP16. Demodulation of AM--FM resonances in speech using energy separation.

Petros Maragos

School of Elec. & Comput. Eng., Georgia Inst. Technol., Atlanta, GA 30332

Thomas F. Quatieri

MIT Lincoln Lab., Lexington, MA 02173

James F. Kaiser

Rutgers Univ., Piscataway, NJ 08855

Alexandros Potamianos

Georgia Inst. of Technol., Atlanta, GA 30332

Motivated by theoretical and experimental evidence [e.g., in H. Teager and S. Teager, Proc. NATO ASI: Speech Production and Speech Modeling, Bonas, France (1989)] that various nonlinear phenomena during speech production cause modulations of the airflow, AM--FM models for speech resonances and a novel efficient algorithm to estimate their parameters were proposed in [P. Maragos, J. Kaiser, and T. Quatieri, IEEE Trans. Signal Process. 41, 3024--3051 (1993)]. The algorithm uses the differential operator $\Psi(x) = (\dot{x})^2 - x\ddot{x}$ to detect modulations in speech signals by tracking the physical energy implicit in the particular "source" producing the observed acoustic resonance signal and by separating this energy into its time-varying amplitude and frequency components. In this paper, experimental results are reported on using refinements of this energy separation algorithm to measure modulations in speech resonances. These results indicate that voiced speech signals, bandpass filtered around speech formants, contain significant amplitude and frequency modulations within a pitch period. These modulation features seem promising for applications to speech coding, synthesis, and recognition. Further, applying the algorithm to synthetic speech produced by conventional linear synthesizers did not yield the modulation patterns found in real speech. [P. Maragos is supported by the National Science Foundation. T. F. Quatieri is supported by the Department of the Air Force.]
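To make the energy separation idea concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a discrete-time counterpart of the operator, Psi[x(n)] = x(n)^2 - x(n-1)x(n+1), and one commonly cited discrete separation rule based on symmetric differences (often referred to as DESA-2). The abstract does not state which refinement was actually used, and the function names `teager_energy` and `desa2` are hypothetical. Bandpass filtering around a formant, which the abstract applies first, is omitted here.

```python
import math


def teager_energy(x):
    """Discrete energy operator on interior samples.

    Returns psi[k] = x[n]^2 - x[n-1]*x[n+1] with n = k + 1,
    so the output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """Sketch of a DESA-2 style energy separation (assumed variant).

    Returns (amplitudes, frequencies) for samples n = 2 .. len(x) - 3,
    with frequencies in radians per sample. Estimates are NaN wherever
    an energy value is not positive.
    """
    # Symmetric difference y(n) = x(n+1) - x(n-1); y[j] belongs to n = j + 1.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[k] belongs to n = k + 1
    psi_y = teager_energy(y)  # psi_y[k] belongs to n = k + 2

    amps, freqs = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align psi_x with psi_y at n = k + 2
        if px <= 0.0 or py <= 0.0:
            amps.append(float("nan"))
            freqs.append(float("nan"))
            continue
        # Clamp guards against rounding pushing the argument outside [-1, 1].
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(arg))
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs


# Usage: an unmodulated cosine should give constant estimates.
signal = [1.5 * math.cos(0.3 * n + 0.2) for n in range(64)]
amps, freqs = desa2(signal)
```

For a pure cosine with amplitude A and frequency W (below a quarter of the sampling rate), Psi[x] = A^2 sin^2(W) and Psi[y] = 4 A^2 sin^4(W), so the two formulas recover W and A exactly. For an AM-FM signal they return approximate instantaneous values, which is what lets modulations within a pitch period show up as sample-to-sample variation.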