Re: [AUDITORY] Speech Intelligibility Index vs. Direct Measurement

I would like to point out that these approaches cannot deal with some modern voice processing techniques and the distortions they introduce, such as:

1) Low bit rate coding

2) Noise suppression

3) Automatic level/gain control distortions

4) Sudden loss of speech signal (time clipping in VoIP)

A first attempt at a system that can handle these distortions when predicting intelligibility (based on PESQ, www.pesq.org) can be found in [1], while the latest speech quality assessment standard, POLQA [2] [3] (open access papers; see also www.polqa.info), is currently being extended towards intelligibility [4].

John Beerends

TNO

The Netherlands

[1] J. G. Beerends, R. A. van Buuren, J. M. Van Vugt, J. A. Verhave, “Objective Speech Intelligibility Measurement on the Basis of Natural Speech in Combination with Perceptual Modeling,” J. Audio Eng. Soc., vol. 57, pp. 299-308 (2009 May).

[2] J. G. Beerends, C. Schmidmer, J. Berger, M. Obermann, R. Ullman, J. Pomy and M. Keyhl, “Perceptual Objective Listening Quality Assessment (POLQA), The Third Generation ITU-T Standard for End-to-End Speech Quality Measurement Part I – Temporal Alignment,” J. Audio Eng. Soc., vol. 61, pp. 366-384 (2013 June).

[3] J. G. Beerends, C. Schmidmer, J. Berger, M. Obermann, R. Ullman, J. Pomy and M. Keyhl, “Perceptual Objective Listening Quality Assessment (POLQA), The Third Generation ITU-T Standard for End-to-End Speech Quality Measurement Part II – Perceptual Model,” J. Audio Eng. Soc., vol. 61, pp. 385-402 (2013 June).

[4] J. G. Beerends, “Extending P.863 ‘POLQA’ towards intelligibility testing, first results,” ITU-T Study Group 12, White contribution COM 12-C302 (2012 May).

From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Richard M. Warren
Sent: Friday, November 08, 2013 9:39 PM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Speech Intelligibility Index vs. Direct Measurement

The Speech Intelligibility Index (SII) [1] procedure has been used for many years to determine the intelligibility of passbands heard singly and in combination. The SII is a complex multi-step procedure that initially cancels contributions from filter slopes by combining intelligibility judgments of partially masked high-pass and low-pass speech having a series of cut-off frequencies. These data are used to calculate the band importance values, which can then be used to produce band intelligibility estimates. Studebaker and Sherbecoe (1991) [2] employed a commercially available broadband recording of W-22 word lists to produce the band importance estimates that are listed in Table B.3 of the SII. Recently, J.M. Kates (2013) [3] used the data reported by Studebaker and Sherbecoe to produce an improved method for determining SII importance values and intelligibility estimates, shown in his Figures 1 and 3.
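As a rough illustration of how band importance values feed an index of this kind, the sketch below computes an importance-weighted sum of band audibilities. The function names, the linear mapping of band signal-to-noise ratio from -15..+15 dB onto 0..1, and the four importance weights are placeholders chosen for illustration only. They are not the Table B.3 values, and the full ANSI S3.5 calculation contains further steps (for example masking and level distortion) that are omitted here.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB onto [0, 1] (assumed linear between -15 and +15 dB)."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def sii_like_index(importances, snrs_db):
    """Importance-weighted sum of band audibilities.

    Weights are normalised to sum to 1 here as a convenience; this is an
    assumption of the sketch, not a step taken from the standard.
    """
    if len(importances) != len(snrs_db):
        raise ValueError("need one SNR per band importance value")
    total = sum(importances)
    if total <= 0:
        raise ValueError("band importance values must sum to a positive number")
    return sum(
        (weight / total) * band_audibility(snr)
        for weight, snr in zip(importances, snrs_db)
    )


# Hypothetical four-band example: the weights and SNRs are made up.
example_importances = [0.2, 0.3, 0.3, 0.2]
example_snrs_db = [15.0, 0.0, -5.0, -15.0]
print(sii_like_index(example_importances, example_snrs_db))
```

The index is only an estimate: a separate transfer function is still needed to turn it into a predicted intelligibility score for a given speech material.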

But rather than trying to improve the SII procedure, Warren, Bashford, and Lenz (2011) [4] proposed a direct measurement alternative: the Rectangular Passband Intelligibility (RPI) procedure. We employed the same copy of the commercially available recording of W-22 lists employed by Studebaker and Sherbecoe. Instead of generating high-pass and low-pass speech, we used high-order FIR filtering to produce effectively vertical slopes for passbands at different center frequencies. Listeners were then able to use these rectangular passbands to provide direct intelligibility measurements of the same passbands, heard singly and in the combinations reported using the SII procedure. The relative intelligibilities obtained with the two procedures were generally similar, as can be seen in Figures 5 and 6 of Warren et al. Also, the RPI procedure is not limited to word lists: it had been used previously with sentences (Warren, Bashford, & Lenz, 2005) [5].
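To give a feel for what "effectively vertical slopes" means in practice, here is a minimal sketch of a high-order windowed-sinc band-pass FIR filter and a check of its gain inside and just outside the passband. This is not the filter used in [4]: the sampling rate, the 2001-tap length, the Blackman window, the band edges (roughly a 1/3-octave band around 1 kHz) and the function names are all assumptions made for illustration.

```python
import math


def rectangular_bandpass_taps(low_hz, high_hz, fs_hz, num_taps=2001):
    """Windowed-sinc band-pass FIR taps (difference of two ideal low-pass filters)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric, linear-phase filter")
    mid = (num_taps - 1) // 2
    f_lo = low_hz / fs_hz   # cut-offs in cycles per sample
    f_hi = high_hz / fs_hz
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (f_hi - f_lo)
        else:
            ideal = (math.sin(2.0 * math.pi * f_hi * k)
                     - math.sin(2.0 * math.pi * f_lo * k)) / (math.pi * k)
        # Blackman window to suppress the ripple of the truncated sinc.
        phase = 2.0 * math.pi * n / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2.0 * phase)
        taps.append(ideal * window)
    return taps


def gain_db(taps, freq_hz, fs_hz):
    """Magnitude response of the FIR filter at one frequency, in dB."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return 20.0 * math.log10(max(math.hypot(re, im), 1e-12))


FS_HZ = 16000.0  # assumed sampling rate
taps = rectangular_bandpass_taps(891.0, 1122.0, FS_HZ)

# Gain at the band centre and at points a fraction of an octave outside it.
for freq in (800.0, 1000.0, 1250.0):
    print(freq, round(gain_db(taps, freq, FS_HZ), 1))
```

With this many taps the transition from passband to stopband occupies only a few tens of Hz, so speech energy outside the nominal band contributes essentially nothing. That is the property that lets the passband be treated as rectangular.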

The RPI has several advantages over the SII procedure. These include: (1) the intrinsic advantage of direct measurement over estimation; (2) the RPI is simpler to use: it eliminates the requirement for measuring the intelligibility of high-pass and low-pass speech heard at various signal-to-noise ratios, and there is no need to calculate the intermediate importance values and then apply a transfer function in order to obtain intelligibility estimates; (3) the RPI procedure is the only one that can obtain passband intelligibility judgments under quiet conditions, which makes it possible to quantify the decrement produced by extraneous sounds and distortions.

REFERENCES

[1] ANSI S3.5, 1997, Reaffirmed, 2012. “Methods for the Calculation of the Speech Intelligibility Index,” (American National Standards Institute, New York).

[2] Studebaker, G.A., & Sherbecoe, R.L. “Frequency-Importance and Transfer Functions for Recorded CID W-22 Word Lists,” Journal of Speech and Hearing Research, 1991, Vol. 34, 427-438.

[3] Kates, J.M. “Improved Estimation of Frequency Importance Functions,” Journal of the Acoustical Society of America, 2013, Vol. 134, EL459-EL464.

[4] Warren, R.M., Bashford, J.A., Jr., & Lenz, P.W. “An Alternative to the Computational Speech Intelligibility Index Estimates: Direct Measurement of Rectangular Passband Intelligibilities,” Journal of Experimental Psychology: Human Perception and Performance, 2011, Vol. 37, 296-302.

[5] Warren, R.M., Bashford, J.A., Jr., & Lenz, P.W. “Intelligibilities of 1-Octave Rectangular Bands Spanning the Speech Spectrum When Heard Separately and Paired,” Journal of the Acoustical Society of America, 2005, Vol. 118, 3261-3266.

Richard M. Warren

Research Professor

and Distinguished Professor Emeritus

Department of Psychology

University of Wisconsin-Milwaukee

PO Box 413

Milwaukee, WI 53201

(414) 229-5328
