
Re: [Fwd: technical notes on data used by Martin Braun]



Contrary to my suggestion last week, Martin Braun's data cannot be
the result of temporal sampling. However, I did *not* reach this conclusion
on the basis of the following earlier exchange in this discussion:

Alain de Cheveigné asked:

> The sort of things that come to mind are:
> - were period estimates derived with sample- or subsample- resolution?  At
> what sampling rate?

Bob Ladd answered:

> As noted above, these were not values of successive pitch periods, but
> values of successive analysis frames in acoustic F0 extraction, based
> on speech sampled at 16k.

Martin Braun (p.c.) has forwarded this as evidence that there was no binning
bias in his measurements. However, the most frequently used algorithm with
analysis frames (of fixed duration) is the autocorrelation method. The pitch period
can be estimated as the location of the first peak in the autocorrelation function
of the windowed signal. Unless the peak location is interpolated, its time resolution
is that of the sampled signal, i.e. 1/16000 of a second. Hence Alain's and my worries.
Apparently, the issue was not considered as a possible problem in Martin's paper.
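
To make the worry concrete, here is a minimal Python sketch of what whole-sample
lags would imply at 16 kHz. It assumes no interpolation of the autocorrelation
peak; the function names are illustrative and not taken from any of the programs
discussed here.

```python
import math

FS = 16000  # sampling rate in Hz, as stated by Bob Ladd

def f0_from_lag(lag_samples, fs=FS):
    """F0 in Hz for a pitch period of a whole number of samples."""
    return fs / lag_samples

def semitone_step(lag_samples):
    """Gap in semitones between the F0 values of two adjacent integer lags."""
    return 12 * math.log2((lag_samples + 1) / lag_samples)

for f0 in (120, 180, 240):
    lag = round(FS / f0)  # nearest whole-sample period
    print(f"near {f0} Hz: lag {lag} samples -> {f0_from_lag(lag):.2f} Hz, "
          f"next lag -> {f0_from_lag(lag + 1):.2f} Hz, "
          f"step {semitone_step(lag):.3f} semitones")
```

Under this assumption the possible F0 values lie roughly 0.13 semitones apart
near 120 Hz and roughly 0.26 semitones apart near 240 Hz, which is coarse enough
to distort counts in narrow pitch bins.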

Nevertheless, I have now been informed that the pitch extraction method used
in Martin's measurements was subharmonic summation (Dik Hermes, JASA 1988),
as implemented in GIPOS. This brings us much closer to the truth. This algorithm
does not exhibit a temporal sampling problem, but it does have a problem with
sampling in the frequency domain.

The SHS algorithm works in the spectral domain. It begins by downsampling
the signal to 2500 Hz, and it uses windows of 256 samples. The frequency
resolution, therefore, is 2500/256 = 9.8 Hertz. In order to improve on this rather coarse
frequency sampling, parabolic interpolation is applied at the end. But the
appropriateness of parabolic interpolation is reduced by several previous steps
in the algorithm:
   a non-linear peak enhancement on the squared spectrum;
   a non-linear smoothing of the squared spectrum;
   spline interpolation.
(I have described here the steps that are implemented in the freely available
SHS version in the Praat program; I have no access to the GIPOS implementation).
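
The general mechanism can be illustrated with a small Python sketch. It is NOT
the SHS algorithm and not Praat's code: it assumes a plain Hann-windowed DFT of
a pure sinusoid and fits a parabola to the magnitude of the highest spectral
sample and its two neighbours. The Hann window and all function names are
assumptions made for illustration; only the 2500 Hz rate and the 256-sample
window come from the description above.

```python
import cmath
import math

FS = 2500.0      # sampling rate after downsampling, from the text
N = 256          # window length in samples, from the text
BIN_HZ = FS / N  # spacing of the spectral samples, about 9.77 Hz

def hann_magnitude(freq_hz, k):
    """Magnitude of DFT bin k for a Hann-windowed sinusoid of freq_hz."""
    total = 0j
    for n in range(N):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / N)
        sample = math.sin(2 * math.pi * freq_hz * n / FS)
        total += window * sample * cmath.exp(-2j * math.pi * k * n / N)
    return abs(total)

def parabolic_peak_hz(freq_hz):
    """Estimate the frequency from the nearest bin and its two neighbours."""
    k = round(freq_hz / BIN_HZ)  # nearest spectral sample to the true frequency
    left, mid, right = (hann_magnitude(freq_hz, k + d) for d in (-1, 0, 1))
    # Vertex of the parabola through the three magnitudes, in bins from k.
    offset = 0.5 * (left - right) / (left - 2 * mid + right)
    return (k + offset) * BIN_HZ

for bins in (15.0, 15.25, 15.5, 15.75, 16.0):
    true_hz = bins * BIN_HZ
    error = parabolic_peak_hz(true_hz) - true_hz
    print(f"true {true_hz:7.2f} Hz  error {error:+.3f} Hz")
```

In this simplified setting the estimate is exact when the frequency falls on a
spectral sample and is pulled towards the nearest sample in between, so the
error repeats with the 9.8 Hz spacing. The non-linear steps listed above can
only change the shape of the peak further away from a parabola.
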
In order to check whether the 9.8 Hertz sampling has any effect on the outcomes
of SHS, I fed the algorithm with a linear sinusoidal sweep from 120 to 240 Hz.
In Praat, this goes like:

   Create Sound... sweep 0 1 16000 sin(2*pi*x*(120+0.5*120*x/xmax))
   To Pitch (shs)... 0.0001 50 15 1250 15 0.84 600 48
   Draw... 0 1 120 240 yes

When the resulting Pitch object is drawn, the 9.8 Hz sawtooth becomes
dramatically clear. In order to see how large the effect on binning is,
I ran the following Praat script:

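# Zero the counters for the quarter-semitone bins 14 to 60.
for bin from 14 to 60
   n'bin' = 0
endfor
# Same sweep and SHS settings as above.
Create Sound... sweep 0 1 16000 sin(2*pi*x*(120+0.5*120*x/xmax))
To Pitch (shs)... 0.0001 50 15 1250 15 0.84 600 48
numberOfFrames = Get number of frames
# Count every frame with a defined F0 in the bin nearest to 4 times its semitone value.
for frame to numberOfFrames
   f0_shs = Get value in frame... frame Semitones
   if f0_shs <> undefined
      bin = round (f0_shs * 4)
      n'bin' = n'bin' + 1
   endif
endfor
# Print one line per bin: the bin number and the number of frames in it.
echo Bin n
for bin from 14 to 60
   n = n'bin'
   printline 'bin' 'n'
endfor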

The output of this script, as anyone can check, is:

Bin n
14 65
15 144
16 127
17 125
18 148
19 240
20 160
21 135
22 132
23 169
24 243
25 164
26 140
27 140
28 192
29 257
30 160
31 156
32 176
33 264
34 180
35 172
36 176
37 287
38 196
39 156
40 199
41 294
42 202
43 181
44 228
45 292
46 196
47 199
48 321
49 225
50 203
51 312
52 253
53 208
54 279
55 304
56 204
57 288
58 320
59 228
60 161

We see that the number of values in each bin varies greatly.
This is not a result of any properties of the sweep: if instead of SHS
we simply use "To Pitch... 0.0001 50 600" in Praat, we get a smooth
monotonically increasing number of values per bin.
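
For comparison, the counts that an ideal, unbiased tracker would produce for
this sweep can be sketched in Python. The sweep formula has instantaneous
frequency 120 + 120*x Hz for x from 0 to 1 s. The sketch assumes that the
semitone scale is relative to 100 Hz (which is what makes bins 14 to 60 cover
the 120-240 Hz range) and ignores frames lost at the edges of the sound, so the
absolute numbers are only approximate. All names are illustrative.

```python
import math

REF_HZ = 100.0                  # ASSUMED reference of the semitone scale
BINS_PER_SEMITONE = 4           # from "round (f0_shs * 4)" in the script
F_START, F_END = 120.0, 240.0   # sweep limits in Hz
DURATION = 1.0                  # length of the sweep in seconds
TIME_STEP = 0.0001              # seconds between frames, from the Praat command

def bin_edge_hz(edge):
    """Frequency in Hz at a (possibly fractional) bin position."""
    return REF_HZ * 2 ** (edge / (12 * BINS_PER_SEMITONE))

def expected_frames(bin_index):
    """Frames an ideal tracker would put in one bin during the linear sweep."""
    lo = max(bin_edge_hz(bin_index - 0.5), F_START)
    hi = min(bin_edge_hz(bin_index + 0.5), F_END)
    # In a linear sweep, time spent in a band is proportional to its width in Hz.
    seconds = max(hi - lo, 0.0) * DURATION / (F_END - F_START)
    return seconds / TIME_STEP

counts = {b: expected_frames(b) for b in range(14, 61)}
for b, c in counts.items():
    print(b, round(c))
```

Because quarter-semitone bins get wider in Hertz as the pitch rises, these ideal
counts grow smoothly from bin to bin, without any peaks.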

So in SHS we see peaks in bins 19, 24, 29, 33, 37, and so on. These are
systematic errors, due to the use of a biased interpolation method
(there is no random variation in the numbers above).
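
As a rough check that these peaks follow the 9.8 Hz grid, the following Python
sketch takes the counts listed above, finds the local maxima, and converts the
peak bins back to Hertz. It assumes that the semitone values are relative to
100 Hz; the names are illustrative.

```python
REF_HZ = 100.0  # ASSUMED reference of the semitone scale
FIRST_BIN = 14

# Counts for bins 14 to 60, copied from the output listed above.
COUNTS = [65, 144, 127, 125, 148, 240, 160, 135, 132, 169, 243, 164,
          140, 140, 192, 257, 160, 156, 176, 264, 180, 172, 176, 287,
          196, 156, 199, 294, 202, 181, 228, 292, 196, 199, 321, 225,
          203, 312, 253, 208, 279, 304, 204, 288, 320, 228, 161]

def bin_to_hz(bin_index):
    """Centre frequency in Hz of a quarter-semitone bin."""
    return REF_HZ * 2 ** (bin_index / 48)

# A peak is a bin with more values than both of its neighbours.
peaks = [FIRST_BIN + i for i in range(1, len(COUNTS) - 1)
         if COUNTS[i] > COUNTS[i - 1] and COUNTS[i] > COUNTS[i + 1]]

# Average distance in Hz between neighbouring peaks.
mean_spacing = (bin_to_hz(peaks[-1]) - bin_to_hz(peaks[0])) / (len(peaks) - 1)

print("peak bins:", peaks)
print(f"mean spacing: {mean_spacing:.2f} Hz")
```

This simple criterion also flags bin 15 as a peak. Under the 100 Hz assumption
the average distance between neighbouring peaks comes out at about 9.7 Hz,
close to the 2500/256 Hz spacing of the spectral samples.
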
It is clear that this pitch extraction algorithm is unsuitable for binning.
Martin Braun's far-reaching claims about ACDEFG pitch preferences
require an unbiased binning method, and he should at least have
checked for bias by feeding a sweep or random data to the algorithm.
If his conclusions still stand after reanalysing the same data with an
unbiased interpolation method (as in Praat's default pitch extraction method,
which gives a straight line instead of a sawtooth), we may reconsider the
validity of his claims. Until then, we should be skeptical about the use of a
measurement method whose systematic error is larger than the effect
that it is supposed to have detected.

Best wishes,
   Paul

--

Paul Boersma
Institute of Phonetic Sciences, University of Amsterdam
Herengracht 338, 1016CG Amsterdam, The Netherlands
http://www.fon.hum.uva.nl/paul/
phone +31-20-5252385