LIMSI-CNRS, BP133, F91403, Orsay, France
The production of some speech sounds involves the periodic modulation of a noise component (e.g., the voiced fricatives in French, or aspiration noise in breathy vowels). According to the linear acoustic model of speech production, the speech signal is produced by filtering an excitation signal with a linear time-varying filter (which represents the vocal tract transfer function and sound radiation). For instance, in voiced fricatives, the excitation signal is a mixed signal resulting from the modulation of a frication noise source by the glottal flow. Two representations that take into account the nonstationary nature of mixed-excitation speech sounds were studied: cyclostationary analysis [using a spectral correlation (SC) estimator of the cyclic frequency-frequency spectrum] and nonstationary analysis [using a smoothed pseudo Wigner--Ville (WV) estimator of the Wigner--Ville spectrum]. The theoretical and experimental results obtained on test signals and actual speech show that some acoustic parameters can be estimated using these analysis methods (frequency of modulation using SC; time structure of the modulation spectrum and vocal tract filter transfer function using WV). Nevertheless, estimation of other acoustic parameters (for instance, the spectral density of the excitation noise) appeared difficult, if not impossible.
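
As an illustration of the mixed-excitation model described above, the following minimal sketch synthesizes a white noise source whose amplitude is modulated periodically, then recovers the modulation frequency from the spectrum of the instantaneous power. It rests on several assumptions that are not taken from the study: the sinusoidal envelope is a crude stand-in for the glottal flow, the vocal tract filter is omitted, and the squared-signal spectrum is a simple cyclostationarity check swapped in for the SC estimator, not that estimator itself. The function names and parameter values are hypothetical.

```python
import numpy as np


def synth_modulated_noise(f0=120.0, fs=16000, duration=1.0, depth=0.8, seed=0):
    """White noise with a periodic amplitude envelope (hypothetical test signal).

    The envelope 1 + depth * cos(2*pi*f0*t) is an assumed, simplified
    stand-in for the glottal flow; no vocal tract filtering is applied.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    noise = rng.standard_normal(t.size)
    envelope = 1.0 + depth * np.cos(2.0 * np.pi * f0 * t)
    return envelope * noise


def estimate_modulation_frequency(x, fs, fmin=50.0, fmax=400.0):
    """Estimate the modulation frequency from the spectrum of x**2.

    Periodically modulated noise has a periodic instantaneous power, so
    the spectrum of x**2 shows a line at the modulation frequency.
    The search band [fmin, fmax] is an assumed plausible pitch range.
    """
    power = x ** 2
    power = power - power.mean()  # remove the DC term before the FFT
    spectrum = np.abs(np.fft.rfft(power))
    freqs = np.fft.rfftfreq(power.size, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(spectrum[band])])


if __name__ == "__main__":
    fs = 16000
    x = synth_modulated_noise(f0=120.0, fs=fs)
    print(estimate_modulation_frequency(x, fs))
```

The ordinary power spectrum of such a signal is flat and carries no trace of the modulation, which is why a second-order periodicity measure is needed to expose it.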