Herman J. M. Steeneken
TNO Inst. for Human Factors, P.O. Box 23, 3769ZG, The Netherlands
Current objective measures for predicting speech intelligibility assume that intelligibility can be modeled as a simple sum of the contributions from individual frequency bands. The articulation index (AI) and the speech transmission index (STI) are both based on this assumption. There is evidence that the underlying assumption of mutually independent contributions is not valid and may lead to erroneous predictions for conditions with a limited frequency transfer, with spectral gaps, or with spectrally localized masking. An experiment was performed that focused on the contributions from individual frequency bands. For this purpose the speech signal was subdivided into seven octave bands with center frequencies ranging from 125 Hz to 8 kHz. For 26 different combinations of three or more octave bands, the CVC-word score (consonant-vowel-consonant nonsense words) was obtained at three signal-to-noise ratios. To improve the prediction of the observed intelligibility, a revised model was designed that accounts for the mutual dependency between adjacent octave bands by introducing a redundancy correction (second-order cross-product terms). A similar model was found for male and female speech. The data also provided a verification of the relation between signal-to-noise ratio and the contribution to the STI.
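
The difference between the additive assumption and a redundancy-corrected model can be illustrated with a minimal sketch. The abstract does not give the exact functional form or any coefficients, so everything specific here is an assumption made for illustration: the function names, the linear clipping of signal-to-noise ratio to a per-band transmission index over a 30-dB range, the use of a plain product of adjacent band indices as the cross-product term, and the weights `alpha` and `beta`, which are hypothetical placeholders rather than fitted values.

```python
from typing import Sequence


def transmission_index(snr_db: float, shift: float = 15.0, span: float = 30.0) -> float:
    """Map a band signal-to-noise ratio (dB) to a value in [0, 1].

    Assumption: linear mapping over a 30-dB range, clipped at both ends.
    """
    ti = (snr_db + shift) / span
    return min(1.0, max(0.0, ti))


def additive_index(ti: Sequence[float], alpha: Sequence[float]) -> float:
    """Additive model: weighted sum of independent band contributions."""
    if len(ti) != len(alpha):
        raise ValueError("need one weight per band")
    return sum(a * t for a, t in zip(alpha, ti))


def redundancy_corrected_index(
    ti: Sequence[float], alpha: Sequence[float], beta: Sequence[float]
) -> float:
    """Additive model minus a cross-product term for each adjacent band pair.

    Assumption: the correction for bands k and k+1 is beta[k] * ti[k] * ti[k+1].
    """
    if len(beta) != len(ti) - 1:
        raise ValueError("need one redundancy weight per adjacent band pair")
    correction = sum(b * ti[k] * ti[k + 1] for k, b in enumerate(beta))
    return additive_index(ti, alpha) - correction


# Seven octave bands, 125 Hz to 8 kHz. Weights are hypothetical placeholders,
# chosen only so that the corrected index is 1 when every band is fully usable.
alpha = [0.2] * 7
beta = [0.4 / 6] * 6

full_band = [transmission_index(15.0)] * 7       # all bands at high SNR
with_gaps = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # alternating spectral gaps

print(redundancy_corrected_index(full_band, alpha, beta))
print(redundancy_corrected_index(with_gaps, alpha, beta))
```

In this sketch the correction vanishes whenever two adjacent bands are not both present, so isolated bands contribute their full weight, while contiguous bands are discounted for the information they share. That is the qualitative behavior a redundancy correction is meant to capture for conditions with spectral gaps.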