
Re: Speech signal SPL & S/N --- Responses

Dear colleagues:

I would like to thank you very much for responding to my query with
a lot of very useful information.  It looks like the ITU-T standard
P.56 is the answer (see the attached replies and the documents referenced
therein), but other parts of the replies may be of more general
interest, so I am attaching here all the replies that I received.

Thanks to

Mitch Sommers (msommers@artsci.wustl.edu)
Bruno H. Repp (repp@haskins.yale.edu)
Torben Poulsen (tp@dat.dtu.dk)
August Schick (SCHICK@psychologie.uni-oldenburg.de)
Tim Sherwood (timothy.sherwood@npl.co.uk)
John G. Beerends (J.G.Beerends@research.kpn.com)
Sebastian Moeller (moeller@ika.ruhr-uni-bochum.de)
Mike Brookes (mike.brookes@IC.AC.UK)

for taking the time and trouble to help.

Thanassi Protopapas


Here are the original questions:

>--What is the best way to control and calibrate sound-pressure levels
>during stimulus presentation across speech stimuli?  (I've read about
>peak-to-peak estimated dB SPL--what is this and how do I do it?)
>--What is the best way to apply the maskers?  Should I use noise
>generators, or generate the noise digitally and mix it with my digital
>speech signal?  If I do it digitally, what software (and by extension what
>operating system) would be best suited for the job?
>--What is the best way to determine S/N ratio?  Should I use rms amplitude?
>Over what portion of the signal?  Again, what software will allow me to do
>this most efficiently?


And the responses follow:


>From msommers@artsci.wustl.edu  Mon Mar 16 12:14:58 1998

There are several ways to deal with calibration of speech signals.
Often, calibration is done using a pure-tone calibration signal
(almost always 1 kHz) and levels are referenced relative to this.
Probably the second most common is to use RMS amplitude levels.
Other solutions are possible depending upon your application.  If
you are interested in vowels, for example, you could record
synthetic versions of the vowel using values similar to those in the
natural signal.
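
As a minimal sketch of the first two options combined, the following
expresses the RMS level of a stimulus in dB relative to a 1 kHz
calibration tone and rescales the stimulus to a chosen relative level.
The function names, the sampling rate, and the noise used as a stand-in
for a speech recording are assumptions made for illustration; NumPy is
assumed to be available.

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2))

def level_re_tone_db(signal, cal_tone):
    """RMS level of `signal` in dB relative to the RMS of the calibration tone."""
    return 20.0 * np.log10(rms(signal) / rms(cal_tone))

def scale_to_level(signal, cal_tone, target_db):
    """Scale `signal` so that its RMS is `target_db` dB relative to the tone."""
    gain = rms(cal_tone) * 10.0 ** (target_db / 20.0) / rms(signal)
    return gain * np.asarray(signal, dtype=float)

fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs                       # one second of samples
cal_tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
rng = np.random.default_rng(0)
speech_like = 0.1 * rng.standard_normal(fs)  # stand-in for a speech recording

scaled = scale_to_level(speech_like, cal_tone, target_db=-6.0)
print(f"{level_re_tone_db(scaled, cal_tone):.2f} dB re calibration tone")
```

If the calibration tone is measured acoustically at some level in dB
SPL, the stimulus RMS level is that value plus the relative level
computed here, provided the playback gain is left unchanged.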


>From repp@haskins.yale.edu  Mon Mar 16 10:58:16 1998

I haven't worked on speech in a number of years and may not be up-to-date
on the latest techniques. However, I have found signal-correlated noise
(SCN) very useful in speech masking studies. SCN is generated by randomly
reversing the sign of 50% of the speech samples and adding the resulting
amplitude-modulated noise to the original speech signal. The relative
weights given to the two components determine the S/N ratio. There is no
need to normalize the amplitudes of the speech signal in any way because
the noise has exactly the same amplitude envelope as the signal, so that
the S/N ratio is constant across variations in amplitude. The only drawback
of the technique is that SCN, being amplitude-modulated, retains some
phonetic information. It depends on the purpose of the study whether this
is a problem.
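
The SCN recipe described above can be sketched as follows. This is an
illustration under stated assumptions, not a reference implementation:
each sample is flipped independently with probability 0.5 (one reading
of "50% of the samples"), the function names are made up for this
example, and a modulated noise burst stands in for a speech recording.

```python
import numpy as np

def signal_correlated_noise(speech, rng):
    """Flip the sign of each speech sample with probability 0.5."""
    speech = np.asarray(speech, dtype=float)
    signs = np.where(rng.random(speech.size) < 0.5, -1.0, 1.0)
    return signs * speech

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, attenuating the noise to reach `snr_db`.

    Assumes both inputs have the same RMS, which holds for SCN because
    every noise sample has the same magnitude as the speech sample.
    """
    noise_gain = 10.0 ** (-snr_db / 20.0)
    return np.asarray(speech, dtype=float) + noise_gain * noise

rng = np.random.default_rng(1)
fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs
# Stand-in for speech: random samples with a slow amplitude envelope.
speech = np.sin(2 * np.pi * 3 * t) ** 2 * rng.standard_normal(fs)

noise = signal_correlated_noise(speech, rng)
mixture = mix_at_snr(speech, noise, snr_db=6.0)
```

Because the noise and the speech share the same sample magnitudes, the
S/N ratio is set entirely by the relative weights: here the speech
weight is 1 and the noise weight is 10^(-6/20), which gives 6 dB.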


>From tp@DAT.DTU.DK  Tue Mar 17 00:28:40 1998

Re. Speech levels:

In a paper by Carl Ludvigsen you may find the information you need:

Carl Ludvigsen: Comparison of certain measures of speech and noise
level. Scandinavian Audiology 1992, vol 21, pp 23-29.


>From SCHICK@psychologie.uni-oldenburg.de  Tue Mar 17 01:35:19 1998

Dear colleague, ask Prof. Gerda and Hans Lazarus at the Univ.
Bochum (Germany). Hans Lazarus did a lot in your field. He is
responsible to this day in the ICBEN for the branch "Noise and speech".


>From timothy.sherwood@npl.co.uk  Tue Mar 17 07:30:38 1998

This may be of interest: an extract from Speech Audiometry, second
edition, edited by Mike Martin, published by Whurr (ISBN 1-897635-12-5).
The extract is from the chapter on Equipment for speech audiometry and
its calibration, by Tim Sherwood and Hilary Fuller.

Measurement of word levels

For the audiologist using pre-recorded material, the level for the
presentation of a word list to the patient can easily be set using the
calibration tone recorded at the beginning of each list or at the
beginning of the whole tape or disc. Once the level of the tone is set,
adjustments to the level of the whole list are made with the audiometer
attenuators which are, in turn, checked as part of the audiometric
calibration procedure.

For those who are practising live voice testing or are recording new
test material, however, the measurement of the levels of the individual
words is important and is by no means a simple matter because of the
nature of the speech signal. A plot of amplitude against time for a
single word (see figure 2; sorry, not shown) shows the difficulty.

It can clearly be seen that most of the acoustical energy of the word
is in the vowel, yet much of the information is in the consonants and,
in addition, the vowel has a very short rise time which might affect
the measurement made. The problem is not quite as great as it might at
first appear because there is a natural pattern to the relationship
between the different phonemes of a word and this will not be broken as
long as the speaker does not deliberately try to distort the word by,
for example, speaking more 'clearly' than usual.

Fuller & Whittle (1982) made a study of the measurement of word levels
for audiometric purposes. They took as their criterion the Speech
Detection Threshold (SDT), that is, the level at which the listener can
just detect that the stimulus is present but cannot recognise the word, and
investigated which of a number of different physical measures gave the
best prediction of the subjectively determined SDT for normal hearing
subjects. They included a wide range of instruments and measures in
their study, including the VU meter, Peak Programme Meter (PPM),
measures of total energy, the peak level, the RMS level measured with
fast and slow meter characteristics and the maximum impulse level. The
differences between the predictive ability of the different measures
were comparatively small, but the correlations jumped from 0.7 to 0.9
when the A-weighting was added to each of the measures. Brady (1971)
also made a study of the measurement of word level and showed that
there was considerable variability in measurements made with a VU meter
by non-expert observers.
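
For readers measuring digital recordings, here is a rough sketch of an
A-weighted RMS level. It is not the procedure used in the study cited
above: it applies the standard analog A-weighting magnitude curve in
the frequency domain over the whole signal, with no fast or slow meter
time weighting, and the function names are assumptions for this
example.

```python
import numpy as np

def a_weighting_db(f):
    """A-weighting gain in dB at frequency `f` (Hz), analog magnitude formula."""
    f2 = np.asarray(f, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    with np.errstate(divide="ignore"):       # f = 0 gives -inf dB, i.e. zero gain
        return 20.0 * np.log10(num / den) + 2.00

def a_weighted_rms(x, fs):
    """RMS of `x` after A-weighting applied as a zero-phase spectral gain."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    gain = 10.0 ** (a_weighting_db(freqs) / 20.0)
    weighted = np.fft.irfft(spectrum * gain, n=x.size)
    return np.sqrt(np.mean(weighted ** 2))

fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs
for f in (100.0, 1000.0):
    tone = np.sin(2 * np.pi * f * t)
    change_db = 20 * np.log10(a_weighted_rms(tone, fs) / np.sqrt(np.mean(tone ** 2)))
    print(f"{f:6.0f} Hz: A-weighting changes the level by {change_db:+.1f} dB")
```

The curve is normalised to 0 dB at 1 kHz and attenuates low
frequencies strongly, so it reduces the contribution of low-frequency
energy to the measured word level.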

This work suggests that whilst the VU meter, which is the indicator
usually fitted to an audiometer, gives an adequate objective measure of
the subjective level of the word, it would be better to use a meter
which is more easily read. A meter giving an RMS fast reading and with
the ability to switch in A-weighting would be a better choice for this

When recorded lists are being prepared, it is possible to avoid total
reliance on measurement of the levels of the words and to produce lists
which have been subjectively equalized. The technique is to make a
first recording with an experienced speaker using equal vocal effort or
feedback from a monitoring meter to get the levels approximately equal.
An iterative process is then begun, testing normal hearing subjects
close to threshold, to determine which items of the list are
particularly easy or difficult to perceive. The levels of these words
are then adjusted by lowering those of the easy items and raising the
difficult ones until the words are all of approximately the same
difficulty. In a similar manner the different word lists of a set can
be examined to ensure that the whole corpus of recorded material is

The process of selecting material and setting the recorded levels of
individual words by subjective testing is tedious and demands
considerable investment of time, but if the resultant material is to
have widespread use, the effort is well worthwhile. Hood & Poole (1977)
spent considerable time studying one set of British recordings (MRC,
1974) and found that they were able to improve them markedly by
adjusting the levels of the words. In the USA, Hirsh et al. (1952) made
a classic study using subjective measurements during the development of
the CID lists and this led to reduction in the total number of test
items from 84 to 36. Similarly Markides (1978) performed a normative
study on a set of recordings of the British Boothroyd lists (Boothroyd,
1968) recorded at Southampton University. This led to 3 of the 15 lists
being omitted from the final tapes because they gave results which were
significantly different from the other members of the set, but it meant
that the rest of the recordings could be confidently used as a coherent

The final step in the preparation of recordings for use in speech
audiometry is to preface each set with a calibration tone which can be
used to set up the audiometer and to check the level of reproduction of
the word lists. The duration of the tone should be sufficient to allow
ample time for the adjustment of the audiometer, which means that it
should be at least 60 seconds long. In practice, if the recordings have
been edited so that the lists in a particular set have been equalised,
it is unlikely that there would be any advantage in recording a
separate calibration tone before each list. The stability of the
equipment should be more than adequate to allow for a complete test for
both ears to be made, and a tone before each list would only be an
encumbrance to the audiologist.

For the purpose of setting up the audiometer, a single pure-tone at a
frequency of 1 kHz should suffice. If a check is to be made of the
response of the whole system, however, then a separately prepared
recording containing a series of pure-tones should be used. For speech
audiometry, the range up to 8 kHz should be covered and tones at
frequencies of 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, and 8 kHz
are recommended.
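
When the material is prepared digitally, the tones described above can
be generated along these lines. The 1 kHz frequency, the 60 second
minimum duration, and the list of check frequencies come from the
text; the sampling rate, amplitude, per-tone duration, and gap length
are placeholder assumptions, as are the function names.

```python
import numpy as np

CHECK_FREQS_HZ = (125, 250, 500, 1000, 2000, 4000, 8000)

def pure_tone(freq_hz, duration_s, fs, amplitude):
    """Sine tone of the given frequency, duration, and peak amplitude."""
    t = np.arange(int(round(duration_s * fs))) / fs
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)

def tone_series(freqs_hz, duration_s, gap_s, fs, amplitude):
    """Concatenate one tone per frequency, each followed by a silent gap."""
    gap = np.zeros(int(round(gap_s * fs)))
    parts = []
    for f in freqs_hz:
        parts.append(pure_tone(f, duration_s, fs, amplitude))
        parts.append(gap)
    return np.concatenate(parts)

fs = 44100          # assumed; must exceed twice the highest tone (8 kHz)
amplitude = 0.25    # assumed peak amplitude relative to full scale 1.0

# Set-up tone: 1 kHz, at least 60 seconds long.
setup_tone = pure_tone(1000, duration_s=60.0, fs=fs, amplitude=amplitude)

# System check: series of tones from 125 Hz to 8 kHz.
series = tone_series(CHECK_FREQS_HZ, duration_s=5.0, gap_s=1.0,
                     fs=fs, amplitude=amplitude)
```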

It is important that care should be given to the choice of the recorded
level of the calibration tone and to its relationship to the recorded
level of the word lists. Ideally, the calibration tone should either be
recorded at the mean level of the words or should bear a fixed
relationship to it. The definition of the mean level will be dependent
upon the method which has been used to equalise the constituent parts
of the word lists, but as an example, if it is the peak levels of the
words which have been equalised, it would be advisable to record the
calibration tone at a level of 3 dB below the peaks.

Two factors need to be balanced in choosing the recording level used;
the need to avoid the possibility of overloading the amplifier by
having too high a level and the danger that if too low a level is used,
problems may arise through the introduction of excessive background
noise. The available dynamic range will vary with the equipment which is
being used, but a good compromise would seem to be to record the
calibration tone at such a level that, when the hearing level control
on the audiometer is set to zero, the level produced by the calibration
signal is 20 dB SPL.


>From J.G.Beerends@research.kpn.com  Tue Mar 17 08:34:00 1998

There is an ITU (International Telecommunication Union) recommendation
that deals with this problem.
It is called P.56 as attached below.
There is software available within a special ITU software group.



>From moeller@rapido.aea.ruhr-uni-bochum.de  Tue Mar 17 23:35:46 1998

Regarding SPL measurements for speech, there
exists a recommendation for telephony, used
by ITU-T (formerly CCITT). The corresponding number
is ITU-T Recommendation P.56 (Objective
measurement of active speech level), and
the electronic version can be obtained from
the ITU-T. Further details can be found at



>From mike.brookes@IC.AC.UK  Tue Mar 17 23:58:52 1998

This topic is covered in ITU-T standard P.56: "Objective Measurement
of Active Speech Level". The method described in the standard only
works if the SNR is better than 21 dB or so.
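
The P.56 method itself is not described in this thread, so the
following does not implement it. As a rough illustration of why an
"active" level differs from a long-term RMS level, here is a much
simpler energy-gated measure: the mean power is taken only over frames
whose power lies within a fixed range of the loudest frame. The frame
length, the 30 dB gate, and the function name are arbitrary
assumptions for this sketch.

```python
import numpy as np

def gated_active_level_db(x, fs, frame_s=0.02, gate_db=30.0):
    """Return (level in dB re full scale 1.0, activity fraction).

    Only frames whose mean power is within `gate_db` of the loudest
    frame are counted as active. This is a simplified stand-in, not
    the ITU-T P.56 algorithm.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(frame_s * fs))
    n_frames = x.size // n
    frames = x[: n_frames * n].reshape(n_frames, n)
    power = np.mean(frames ** 2, axis=1)
    if power.max() == 0.0:
        raise ValueError("signal is silent; level is undefined")
    active = power >= power.max() * 10.0 ** (-gate_db / 10.0)
    level_db = 10.0 * np.log10(np.mean(power[active]))
    return level_db, float(np.mean(active))

fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs
burst = 0.5 * np.sin(2 * np.pi * 1000 * t)   # one second of signal
x = np.concatenate([burst, np.zeros(fs)])    # followed by one second of silence

active_db, activity = gated_active_level_db(x, fs)
long_term_db = 10.0 * np.log10(np.mean(x ** 2))
print(f"active {active_db:.2f} dB, long-term {long_term_db:.2f} dB, "
      f"activity {activity:.2f}")
```

With half of the recording silent, the long-term level comes out about
3 dB below the gated level, which is the kind of dependence on pause
length that an active-level measure is meant to remove. A fixed gate
like this one becomes unreliable when background noise approaches the
gate threshold.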

The standard refers to two papers (which I haven't looked at myself):
(1) R.W.Berry: Speech-volume measurements on telephone circuits,
    Proc IEE, Vol 118, No 2, pp 335-338, Feb 1971
(2) P.T.Brady: Equivalent Peak Level: a threshold independent speech level
    JASA, Vol 44, pp 695-699, 1968.


Athanassios Protopapas, Ph.D.
Principal Scientist
Scientific Learning Corporation
1995 University Ave., Ste. 400
Berkeley, CA 94704-1074, U.S.A.
Ph:510-665-6721 Fx:510-665-1275