Bishnu S. Atal
AT&T Bell Labs., Rm. 2D-461, 600 Mountain Ave., Murray Hill, NJ 07974
Recent interest in nonlinear prediction of speech has raised the need to assess the performance limitations of linear speech models. While nonlinearity is essential in the sound-production mechanism, it need not be reflected in a speech signal model: a linear model may produce the same perceptual effect for a human listener as a nonlinear one. The validity of a linear autoregressive model has been tested for the unvoiced segments of speech. In this experiment, a linear prediction residual is first computed from the speech signal. For unvoiced segments, the residual is replaced by white Gaussian noise whose energy is matched to that of the residual every 5 ms; for voiced segments, the residual is left unmodified. The linear prediction synthesis filter is then excited with this composite residual signal to reconstruct speech. The resulting speech quality has been evaluated in a formal MOS test in which 32 listeners rated speech material from 12 speakers, each uttering 3 sentences. The present approach yielded an MOS of 4.0; in the same test, 16-bit linear PCM obtained a score of 4.1. In conclusion, a linear model with random excitation is sufficient to reproduce unvoiced speech with very high perceptual quality. A demonstration tape will be played during the presentation.
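The residual-substitution procedure described above can be sketched as follows. This is a minimal illustration in Python with NumPy/SciPy, not the experimental system itself: the sampling rate, frame length, LPC order, and the zero-crossing voicing heuristic are assumptions (the abstract does not specify them), and for simplicity filter memory is not carried across analysis frames.

```python
import numpy as np
from scipy.signal import lfilter

def levinson(r, order):
    # Levinson-Durbin recursion: autocorrelation r[0..order] -> LPC
    # coefficients a[0..order] of the inverse (analysis) filter A(z).
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
        if err <= 0:  # guard against numerically degenerate frames
            break
    return a

def is_unvoiced(frame):
    # Crude placeholder voicing decision via zero-crossing rate; the
    # abstract does not describe the voiced/unvoiced classifier used.
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return zcr > 0.3

def resynthesize(x, fs=8000, order=10, frame_ms=20, sub_ms=5, seed=0):
    # Analyze frame by frame; in unvoiced frames replace the LPC residual
    # with white Gaussian noise, energy-matched every sub_ms (5 ms) block,
    # then excite the synthesis filter 1/A(z) with the composite residual.
    rng = np.random.default_rng(seed)
    N = int(fs * frame_ms / 1000)   # analysis frame length (assumed 20 ms)
    M = int(fs * sub_ms / 1000)     # energy-matching block (5 ms)
    out = np.zeros_like(x, dtype=float)
    for start in range(0, len(x) - N + 1, N):
        frame = x[start:start + N].astype(float)
        r = np.correlate(frame, frame, mode="full")[N - 1:N + order]
        if r[0] <= 0:
            continue                 # silent frame: leave output as zeros
        a = levinson(r, order)
        resid = lfilter(a, [1.0], frame)          # inverse filtering
        if is_unvoiced(frame):
            for s in range(0, N, M):
                target = np.sqrt(np.mean(resid[s:s + M] ** 2))
                noise = rng.standard_normal(M)
                noise *= target / (np.sqrt(np.mean(noise ** 2)) + 1e-12)
                resid[s:s + M] = noise            # energy-matched noise
        out[start:start + N] = lfilter([1.0], a, resid)  # synthesis 1/A(z)
    return out
```

Because the noise is scaled to the residual energy in each 5 ms block, the reconstructed unvoiced segments preserve the short-time energy contour of the original, which is the property the listening test evaluates.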