Re: Speech signal SPL & S/N --- Responses (Athanassios Protopapas)

Subject: Re: Speech signal SPL & S/N --- Responses
From:    Athanassios Protopapas  <protopap(at)WIESEL.SCILEARN.COM>
Date:    Wed, 18 Mar 1998 15:58:02 -0800

Dear colleagues:

I would like to thank you very much for responding to my query with a lot of very useful information. It looks like the ITU-T standard P.56 is the answer (see attached replies and documents referenced therein), but there are other bits that may also be of more general interest, so I am attaching here all the replies that I got.

Thanks to
Mitch Sommers (msommers(at)
Bruno H. Repp (repp(at)
Torben Poulsen (tp(at)
August Schick (SCHICK(at)
Tim Sherwood (timothy.sherwood(at)
John G. Beerends (J.G.Beerends(at)
Sebastian Moeller (moeller(at)
Mike Brookes (mike.brookes(at)IC.AC.UK)
for taking the time and trouble to help.

Thanassi Protopapas

--------------------------------------------------------------------
Here are the original questions:

>--What is the best way to control and calibrate sound-pressure levels
>during stimulus presentation across speech stimuli? (I've read about
>peak-to-peak estimated dB SPL--what is this and how do I do it?)
>
>--What is the best way to apply the maskers? Should I use noise
>generators, or generate the noise digitally and mix it with my digital
>speech signal? If I do it digitally, what software (and by extension what
>operating system) would be best suited for the job?
>
>--What is the best way to determine S/N ratio? Should I use rms amplitude?
>Over what portion of the signal? Again, what software will allow me to do
>this most efficiently?

--------------------------------------------------------------------
And the responses follow:
--------------------------------------------------------------------
>From msommers(at) Mon Mar 16 12:14:58 1998

There are several ways to deal with calibration of speech signals. Often, calibration is done using a pure-tone calibration tone (almost always 1 kHz) and levels are referenced relative to this. Probably the second most common is to use RMS amplitude levels. Other solutions are possible depending upon your application.
If you are interested in vowels, for example, you could record synthetic versions of the vowel using values similar to those in the natural signal.

--------------------------------------------------------------------
>From repp(at) Mon Mar 16 10:58:16 1998

I haven't worked on speech in a number of years and may not be up-to-date on the latest techniques. However, I have found signal-correlated noise (SCN) very useful in speech masking studies. SCN is generated by randomly reversing the sign of 50% of the speech samples and adding the resulting amplitude-modulated noise to the original speech signal. The relative weights given to the two components determine the S/N ratio. There is no need to normalize the amplitudes of the speech signal in any way because the noise has exactly the same amplitude envelope as the signal, so that the S/N ratio is constant across variations in amplitude. The only drawback of the technique is that SCN, being amplitude-modulated, retains some phonetic information. It depends on the purpose of the study whether this is a problem.

--------------------------------------------------------------------
>From tp(at)DAT.DTU.DK Tue Mar 17 00:28:40 1998

Re. Speech levels: In a paper by Carl Ludvigsen you may find the information you need:

Carl Ludvigsen: Comparison of certain measures of speech and noise level. Scandinavian Audiology 1992, vol 21, pp 23-29.

--------------------------------------------------------------------
>From SCHICK(at) Tue Mar 17 01:35:19 1998

Dear colleague,

ask Prof. Gerda and Hans Lazarus at the Univ. Bochum (Germany). Hans Lazarus did a lot in your field.
He is to this day responsible in the ICBEN for the branch "Noise and speech".

--------------------------------------------------------------------
>From timothy.sherwood(at) Tue Mar 17 07:30:38 1998

This may be of interest: an extract from Speech Audiometry, second edition, edited by Mike Martin, published by Whurr (ISBN 1-897635-12-5).

Extract from the chapter on Equipment for speech audiometry and its calibration - Tim Sherwood and Hilary Fuller

Measurement of word levels

For the audiologist using pre-recorded material, the level for the presentation of a word list to the patient can easily be set using the calibration tone recorded at the beginning of each list or at the beginning of the whole tape or disc. Once the level of the tone is set, adjustments to the level of the whole list are made with the audiometer attenuators which are, in turn, checked as part of the audiometric calibration procedure.

For those who are practising live voice testing or are recording new test material, however, the measurement of the levels of the individual words is important and is by no means a simple matter because of the nature of the speech signal. A plot of amplitude against time for a single word (see figure 2, sorry, not shown) shows the difficulty. It can clearly be seen that most of the acoustical energy of the word is in the vowel, yet much of the information is in the consonants and, in addition, the vowel has a very short rise time which might affect the measurement made. The problem is not quite as great as it might at first appear because there is a natural pattern to the relationship between the different phonemes of a word and this will not be broken as long as the speaker does not deliberately try to distort the word by, for example, speaking more 'clearly' than usual.

Fuller & Whittle (1982) made a study of the measurement of word levels for audiometric purposes.
They took as their criterion the Speech Detection Threshold (SDT), that is, the level at which the listener can just detect that the stimulus is present but cannot recognise the word, and investigated which of a number of different physical measures gave the best prediction of the subjectively determined SDT for normal hearing subjects. They included a wide range of instruments and measures in their study, including the VU meter, Peak Programme Meter (PPM), measures of total energy, the peak level, the RMS level measured with fast and slow meter characteristics and the maximum impulse level. The differences between the predictive abilities of the different measures were comparatively small, but the correlations jumped from 0.7 to 0.9 when the A-weighting was added to each of the measures.

Brady (1971) also made a study of the measurement of word level and showed that there was considerable variability in measurements made with a VU meter by non-expert observers. This work suggests that whilst the VU meter, which is the indicator usually fitted to an audiometer, gives an adequate objective measure of the subjective level of the word, it would be better to use a meter which is more easily read. A meter giving an RMS fast reading and with the ability to switch in A-weighting would be a better choice for this purpose.

When recorded lists are being prepared, it is possible to avoid total reliance on measurement of the levels of the words and to produce lists which have been subjectively equalized. The technique is to make a first recording with an experienced speaker using equal vocal effort or feedback from a monitoring meter to get the levels approximately equal. An iterative process is then begun, testing normal hearing subjects close to threshold, to determine which items of the list are particularly easy or difficult to perceive.
The levels of these words are then adjusted by lowering those of the easy items and raising the difficult ones until the words are all of approximately the same difficulty. In a similar manner the different word lists of a set can be examined to ensure that the whole corpus of recorded material is homogeneous.

The process of selecting material and setting the recorded levels of individual words by subjective testing is tedious and demands considerable investment of time, but if the resultant material is to have widespread use, the effort is well worthwhile. Hood & Poole (1977) spent considerable time studying one set of British recordings (MRC, 1974) and found that they were able to improve them markedly by adjusting the levels of the words. In the USA, Hirsh et al. (1952) made a classic study using subjective measurements during the development of the CID lists and this led to a reduction in the total number of test items from 84 to 36. Similarly, Markides (1978) performed a normative study on a set of recordings of the British Boothroyd lists (Boothroyd, 1968) recorded at Southampton University. This led to 3 of the 15 lists being omitted from the final tapes because they gave results which were significantly different from the other members of the set, but it meant that the rest of the recordings could be confidently used as a coherent set.

The final step in the preparation of recordings for use in speech audiometry is to preface each set with a calibration tone which can be used to set up the audiometer and to check the level of reproduction of the word lists. The duration of the tone should be sufficient to allow ample time for the adjustment of the audiometer, which means that it should be at least 60 seconds long. In practice, if the recordings have been edited so that the lists in a particular set have been equalised, it is unlikely that there would be any advantage in recording a separate calibration tone before each list.
The stability of the equipment should be more than adequate to allow for a complete test for both ears to be made, and a tone before each list would only be an encumbrance to the audiologist.

For the purpose of setting up the audiometer, a single pure tone at a frequency of 1 kHz should suffice. If a check is to be made of the response of the whole system, however, then a separately prepared recording containing a series of pure tones should be used. For speech audiometry, the range up to 8 kHz should be covered and tones at frequencies of 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, and 8 kHz are recommended.

It is important that care should be given to the choice of the recorded level of the calibration tone and to its relationship to the recorded level of the word lists. Ideally, the calibration tone should either be recorded at the mean level of the words or should bear a fixed relationship to it. The definition of the mean level will be dependent upon the method which has been used to equalise the constituent parts of the word lists, but as an example, if it is the peak levels of the words which have been equalised, it would be advisable to record the calibration tone at a level of 3 dB below the peaks.

Two factors need to be balanced in choosing the recording level used: the need to avoid the possibility of overloading the amplifier by having too high a level, and the danger that if too low a level is used, problems may arise through the introduction of excessive background noise. The available dynamic range will vary with the equipment which is being used, but a good compromise would seem to be to record the calibration tone at such a level that, when the hearing level control on the audiometer is set to zero, the level produced by the calibration signal is 20 dB SPL.
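
For anyone preparing such recordings digitally, the relationship between the calibration tone and the word peaks described in the extract above might be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure from the chapter: it reads "a level of 3 dB below the peaks" as the RMS level of the tone sitting 3 dB below the common peak level of the words, and the function names, the 16 kHz sample rate and the use of NumPy are all choices made here for the example.

```python
import numpy as np

def peak_level_db(x):
    """Peak level of a waveform in dB re digital full scale (1.0)."""
    return 20.0 * np.log10(np.max(np.abs(x)))

def rms_level_db(x):
    """RMS level of a waveform in dB re digital full scale (1.0)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))))

def calibration_tone(word_peak_db, offset_db=-3.0, freq_hz=1000.0,
                     dur_s=60.0, fs=16000):
    """Pure tone whose RMS level is word_peak_db + offset_db.

    ASSUMPTION: "level" is interpreted as RMS level; offset_db, freq_hz,
    dur_s and fs are illustrative defaults, not values from a standard.
    """
    target_rms = 10.0 ** ((word_peak_db + offset_db) / 20.0)
    amplitude = target_rms * np.sqrt(2.0)  # RMS of a sine is amplitude / sqrt(2)
    t = np.arange(int(round(dur_s * fs))) / fs
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)

# Example: words peak-equalised to -6 dB re full scale.
tone = calibration_tone(word_peak_db=-6.0, dur_s=1.0)
print("tone RMS level (dB): ", round(rms_level_db(tone), 2))
print("tone peak level (dB):", round(peak_level_db(tone), 2))
```

Under this reading the peak of the sine ends up almost exactly at the word peak level, since a sine's RMS is about 3 dB below its own peak, so the tone does not overload where the words do not.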
--------------------------------------------------------------------
>From J.G.Beerends(at) Tue Mar 17 08:34:00 1998

There is an ITU (International Telecommunication Union) recommendation that deals with this problem. It is called P.56, as attached below. There is software available within a special ITU software group.

[ATTACHMENT NOT INCLUDED DUE TO SIZE; SEE WEB LINKS BELOW INSTEAD]

--------------------------------------------------------------------
>From moeller(at) Tue Mar 17 23:35:46 1998

Regarding SPL measurements for speech, there exists a recommendation for telephony, used by the ITU-T (formerly CCITT). The relevant document is ITU-T Recommendation P.56 (Objective measurement of active speech level), and the electronic version can be obtained from the ITU-T. Further details can be found at

--------------------------------------------------------------------
>From mike.brookes(at)IC.AC.UK Tue Mar 17 23:58:52 1998

This topic is covered in ITU-T standard P.56: "Objective Measurement of Active Speech Level". The method described in the standard only works if the SNR is better than 21 dB or so. The standard refers to two papers (which I haven't looked at myself):

(1) R.W.Berry: Speech-volume measurements on telephone circuits, Proc IEE, Vol 118, No 2, pp 335-338, Feb 1971
(2) P.T.Brady: Equivalent Peak Level: a threshold independent speech level measure, JASA, Vol 44, pp 695-699, 1968.

--------------------------------------------------------------------

-------------------------------
Athanassios Protopapas, Ph.D.
Principal Scientist
Scientific Learning Corporation
1995 University Ave., Ste. 400
Berkeley, CA 94704-1074, U.S.A.
Ph:510-665-6721 Fx:510-665-1275
protopap(at)
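
For readers who want to mix the masker digitally, the signal-correlated noise (SCN) procedure in Bruno Repp's reply above might look roughly like this in Python. It is a minimal sketch under assumptions, not code from any of the respondents: the function names, the use of NumPy, and the synthetic test signal are invented here, and the S/N ratio is taken as the RMS ratio over the whole signal.

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude over the whole signal."""
    return np.sqrt(np.mean(np.square(x)))

def signal_correlated_noise(speech, rng):
    """Reverse the sign of a randomly chosen half of the samples (1-D input)."""
    noise = np.asarray(speech, dtype=float).copy()
    flip = rng.permutation(noise.size)[: noise.size // 2]
    noise[flip] *= -1.0
    return noise

def mix_with_scn(speech, snr_db, rng):
    """Add SCN to the speech at the requested S/N ratio in dB.

    SCN has the same sample magnitudes as the speech, hence the same RMS,
    so the noise gain follows directly from snr_db.
    Returns (mixture, scaled_noise).
    """
    speech = np.asarray(speech, dtype=float)
    noise = signal_correlated_noise(speech, rng)
    gain = 10.0 ** (-snr_db / 20.0)
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise

# Example with a stand-in signal (ASSUMPTION: real speech would be
# loaded from a sound file instead of this modulated sine).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
mixture, scaled_noise = mix_with_scn(speech, snr_db=6.0, rng=rng)
print("achieved S/N (dB):", 20 * np.log10(rms(speech) / rms(scaled_noise)))
```

Because the noise shares the amplitude envelope of the speech, the same gain gives the same S/N ratio for loud and soft stretches, which is the property Repp points out. The mixture may exceed full scale and would need rescaling before playback.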

This message came from the mail archive
maintained by:
Dan Ellis <>
Electrical Engineering Dept., Columbia University