Inst. f. El. Nachrichtentechnik, Aachen Univ. of Technol. (RWTH), D-52056 Aachen, Germany
A predictor is presented which estimates the mean opinion score (MOS) for a given speech sample from the speech and noise transfer characteristics of a specific handset applied to an artificial ear. The critical band rate excitation pattern is computed in 50-ms blocks for the original and distorted speech signals and for additive room noise. For each block, three psychoacoustic parameters are computed. An intelligibility index (I) is evaluated using SNR analysis in each sub-band, taking simultaneous masking effects into account. Naturalness (N) is estimated from the spectral distance between original and distorted speech. A loudness index (L) is derived from loudness (computed similarly to ISO 532) using a trapezoid function: L decreases if the loudness exceeded 10% of the time is lower than 15 sone or higher than 45 sone. The MOS is predicted as a weighted sum of I, N, and L. The prediction results were verified by an opinion test comprising a total of 442 speech samples from several talkers, which were filtered to simulate typical transfer characteristics of handsets and presented in a noisy environment. Speech level and SNR were varied over a wide range. The comparison shows a good correlation (ρ = 0.93 for all samples and ρ = 0.96 for speech levels below 76 dB SPL).
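The loudness index and the final weighted sum can be illustrated with a minimal Python sketch. Only the 15 and 45 sone plateau limits, the 10% exceedance criterion, and the weighted-sum form come from the abstract. The function names, the linear ramp width of the trapezoid, the normalization of L to the range 0 to 1, and the nearest-rank percentile estimate are assumptions made for illustration; the abstract does not state the weights, so they must be supplied by the caller.

```python
from typing import Sequence, Tuple


def loudness_exceeded(block_loudness_sone: Sequence[float],
                      fraction: float = 0.10) -> float:
    """Loudness exceeded in `fraction` of the blocks (nearest-rank estimate)."""
    if not block_loudness_sone:
        raise ValueError("need at least one block")
    ordered = sorted(block_loudness_sone, reverse=True)
    # Pick the value that roughly `fraction` of the blocks lie above.
    k = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[k]


def loudness_index(n10_sone: float,
                   low: float = 15.0,
                   high: float = 45.0,
                   ramp: float = 10.0) -> float:
    """Trapezoid: 1.0 between `low` and `high`, falling linearly outside.

    `low` and `high` follow the abstract; `ramp` (the width in sone over
    which the index falls to 0) is a hypothetical value.
    """
    if low <= n10_sone <= high:
        return 1.0
    distance = (low - n10_sone) if n10_sone < low else (n10_sone - high)
    return max(0.0, 1.0 - distance / ramp)


def predict_mos(i: float, n: float, l: float,
                weights: Tuple[float, float, float]) -> float:
    """Weighted sum of I, N, and L; the weights are not given in the abstract."""
    w_i, w_n, w_l = weights
    return w_i * i + w_n * n + w_l * l
```

In use, the per-block loudness values of a sample would be passed to `loudness_exceeded`, the result to `loudness_index`, and the three indices to `predict_mos` together with weights fitted to opinion test data.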