Re: [Fwd: technical notes on data used by Martin Braun] (Paul Boersma)


Subject: Re: [Fwd: technical notes on data used by Martin Braun]
From:    Paul Boersma  <paul.boersma(at)HUM.UVA.NL>
Date:    Sat, 16 Jun 2001 20:24:52 +0200

Contrary to my suggestion last week, Martin Braun's data cannot be the result of temporal sampling. However, I did *not* reach this conclusion on the basis of the following earlier exchange in this discussion. Alain de Cheveigné asked:

> The sort of things that come to mind are:
> - were period estimates derived with sample- or subsample- resolution? At
> what sampling rate?

Bob Ladd answered:

> As noted above, these were not values of successive pitch periods, but
> values of successive analysis frames in acoustic F0 extraction, based
> on speech sampled at 16k.

Martin Braun (p.c.) has forwarded this as evidence that there was no binning bias in his measurements. However, the most frequently used algorithm with analysis frames (of fixed duration) is the autocorrelation method. The pitch period can be estimated as the location of the first peak in the autocorrelation function of the windowed signal, and the time resolution of this peak is exactly the same as that of the original signal, i.e. 1/16000 of a second. Hence Alain's worries and mine. Apparently, this possible problem was not considered in Martin's paper.

Nevertheless, I have now been informed that the pitch extraction method used in Martin's measurements was subharmonic summation (SHS; Dik Hermes, JASA 1988), as implemented in GIPOS. This brings us much closer to the truth. The SHS algorithm does not exhibit a temporal sampling problem, but it does have a problem with sampling in the frequency domain. It works in the spectral domain: it begins by downsampling the signal to 2500 Hz and uses windows of 256 samples, so its frequency resolution is 2500/256 = 9.77 Hz, or roughly 9.8 Hz. To improve on this rather coarse frequency sampling, parabolic interpolation is applied at the end.
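The two sampling grids at issue can be illustrated with a small Python sketch. This is not Praat or GIPOS code, and all function names are hypothetical. It lists the F0 values an autocorrelation tracker can return at 16 kHz if the peak is read off at an integer lag (no interpolation, which is the case the worry above is about), computes the 2500/256 Hz spectral grid, and shows that three-point parabolic interpolation recovers the vertex of a true parabola exactly but returns a biased offset for a non-parabolic peak. The Gaussian peak is an assumed stand-in; the actual SHS peak shape after its non-linear processing is not modelled here.

```python
import math


def autocorrelation_f0_candidates(fs_hz, f0_min_hz, f0_max_hz):
    """F0 values available when the autocorrelation peak is read off at an
    integer lag, i.e. with a time resolution of 1/fs_hz seconds."""
    lags = range(math.ceil(fs_hz / f0_max_hz), math.floor(fs_hz / f0_min_hz) + 1)
    return [fs_hz / lag for lag in lags]


def bin_spacing_hz(fs_hz, window_samples):
    """Spacing of the frequency grid of a spectrum over window_samples points."""
    return fs_hz / window_samples


def parabolic_offset(y_prev, y_peak, y_next):
    """Vertex of the parabola through three equally spaced samples, in bins
    relative to the middle sample (exact if the peak really is a parabola)."""
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:
        return 0.0  # no curvature information: keep the grid point
    return 0.5 * (y_prev - y_next) / denom


def gaussian_peak_error(true_offset, width_bins=1.0):
    """Error, in bins, of parabolic interpolation applied to a HYPOTHETICAL
    Gaussian peak centred true_offset bins from the nearest grid point."""
    samples = [math.exp(-((k - true_offset) ** 2) / (2.0 * width_bins ** 2))
               for k in (-1, 0, 1)]
    return parabolic_offset(*samples) - true_offset


if __name__ == "__main__":
    print(bin_spacing_hz(2500, 256))  # 9.765625 Hz
    # At 16 kHz only seven integer-lag F0 values fall between 150 and 160 Hz.
    print(autocorrelation_f0_candidates(16000, 150, 160))
    # The error depends only on the position within a bin, so the same
    # pattern repeats every grid step along the frequency axis.
    for offset in (-0.4, -0.2, 0.0, 0.2, 0.4):
        print(offset, round(gaussian_peak_error(offset), 4))
```

With this assumed peak shape the interpolated offset is pulled towards the nearest grid point, so the estimation error is a periodic function of frequency rather than random noise.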
But the appropriateness of parabolic interpolation is reduced by several earlier steps in the algorithm: a non-linear peak enhancement on the squared spectrum, a non-linear smoothing of the squared spectrum, and spline interpolation. (I have described here the steps implemented in the freely available SHS version in the Praat program; I have no access to the GIPOS implementation.)

To check whether the 9.8 Hz sampling has any effect on the outcomes of SHS, I fed the algorithm with a linear sinusoidal sweep from 120 to 240 Hz. In Praat, this goes as follows:

    Create Sound... sweep 0 1 16000 sin(2*pi*x*(120+0.5*120*x/xmax))
    To Pitch (shs)... 0.0001 50 15 1250 15 0.84 600 48
    Draw... 0 1 120 240 yes

When the resulting Pitch object is drawn, the 9.8 Hz sawtooth becomes dramatically clear. To see how large the effect on binning is, I ran the following Praat script, which counts the pitch values in quarter-semitone bins:

    for bin from 14 to 60
       n'bin' = 0
    endfor
    Create Sound... sweep 0 1 16000 sin(2*pi*x*(120+0.5*120*x/xmax))
    To Pitch (shs)... 0.0001 50 15 1250 15 0.84 600 48
    numberOfFrames = Get number of frames
    for frame to numberOfFrames
       f0_shs = Get value in frame... frame Semitones
       if f0_shs <> undefined
          bin = round (f0_shs * 4)
          n'bin' = n'bin' + 1
       endif
    endfor
    echo Bin n
    for bin from 14 to 60
       n = n'bin'
       printline 'bin' 'n'
    endfor

The output of this script, as anyone can check, is:

    Bin   n
    14    65
    15   144
    16   127
    17   125
    18   148
    19   240
    20   160
    21   135
    22   132
    23   169
    24   243
    25   164
    26   140
    27   140
    28   192
    29   257
    30   160
    31   156
    32   176
    33   264
    34   180
    35   172
    36   176
    37   287
    38   196
    39   156
    40   199
    41   294
    42   202
    43   181
    44   228
    45   292
    46   196
    47   199
    48   321
    49   225
    50   203
    51   312
    52   253
    53   208
    54   279
    55   304
    56   204
    57   288
    58   320
    59   228
    60   161

We see that the number of values in each bin varies greatly. This is not a result of any property of the sweep: if instead of SHS we simply use "To Pitch... 0.0001 50 600" in Praat, we get a smoothly, monotonically increasing number of values per bin.
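The contrast between the SHS histogram and a smooth one can be reproduced in outline with a small Python simulation. It is a sketch under stated assumptions, not a re-implementation of either Praat command: it assumes that the script's "Semitones" unit means semitones relative to 100 Hz, takes one frame every 0.0001 s at frame centres, and compares an error-free tracker with a hypothetical tracker that snaps to the 2500/256 Hz grid and then applies parabolic interpolation to an assumed Gaussian peak. The sweep's instantaneous frequency follows from differentiating the phase in the Create Sound formula, which gives 120 + 120*x/xmax Hz.

```python
import math
from collections import Counter

GRID_HZ = 2500 / 256   # SHS spectral grid spacing (about 9.77 Hz)
FRAME_STEP = 0.0001    # time step, as in the Praat commands
DURATION = 1.0         # sweep duration in seconds (xmax)
REF_HZ = 100.0         # ASSUMPTION: "Semitones" taken as semitones re 100 Hz


def sweep_f0(t):
    # The phase of sin(2*pi*x*(120 + 0.5*120*x/xmax)) is
    # 2*pi*(120*x + 60*x^2/xmax); differentiating and dividing by 2*pi
    # gives an instantaneous frequency of 120 + 120*x/xmax Hz.
    return 120.0 + 120.0 * t / DURATION


def ideal_tracker(f_hz):
    """Error-free pitch tracker: returns the true frequency."""
    return f_hz


def grid_parabolic_tracker(f_hz, width_bins=1.0):
    """HYPOTHETICAL biased tracker: nearest grid bin, then three-point
    parabolic interpolation on an assumed Gaussian peak."""
    k = round(f_hz / GRID_HZ)
    offset = f_hz / GRID_HZ - k
    a, b, c = (math.exp(-((j - offset) ** 2) / (2.0 * width_bins ** 2))
               for j in (-1, 0, 1))
    return (k + 0.5 * (a - c) / (a - 2.0 * b + c)) * GRID_HZ


def quarter_semitone_counts(tracker):
    """Count frames per quarter-semitone bin, like the Praat script."""
    counts = Counter()
    for i in range(round(DURATION / FRAME_STEP)):
        t = (i + 0.5) * FRAME_STEP  # frame centres
        semitones = 12.0 * math.log2(tracker(sweep_f0(t)) / REF_HZ)
        counts[round(semitones * 4)] += 1
    return counts


def bin_to_hz(bin_index):
    """Centre frequency of a quarter-semitone bin (re REF_HZ)."""
    return REF_HZ * 2.0 ** (bin_index / 48.0)


if __name__ == "__main__":
    ideal = quarter_semitone_counts(ideal_tracker)
    biased = quarter_semitone_counts(grid_parabolic_tracker)
    print("Bin ideal biased")
    for bin_index in range(14, 61):
        print(bin_index, ideal[bin_index], biased[bin_index])
    # Hz spacing between some of the over-populated SHS bins in the table.
    peaks = [19, 24, 29, 33, 37]
    print([round(bin_to_hz(q) - bin_to_hz(p), 2) for p, q in zip(peaks, peaks[1:])])
```

The ideal column rises steadily, because a quarter-semitone bin is wider in Hz at higher frequencies, while the hypothetical biased column rises and falls every few bins. Because the Gaussian peak is only a stand-in, the simulated peaks need not coincide with the SHS peaks; what carries over is the mechanism of a biased interpolator on a fixed grid. Under the 100 Hz assumption, the last line gives spacings of roughly 9 to 11 Hz between the bins that stand out in the SHS table, of the same order as the grid spacing.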
So in SHS we see peaks in bins 19, 24, 29, 33, 37, and so on. These are systematic errors, due to the use of a biased interpolation method (there is no random variation in the numbers above). Clearly, this pitch extraction algorithm is unsuitable for binning. Martin Braun's far-reaching claims about ACDEFG pitch preferences should rest on an unbiased binning method, and he should at least have verified this by feeding a sweep or random data to the algorithm. If his conclusions still stand after the same data are reanalysed with an unbiased interpolation method (as in Praat's default pitch extraction method, which gives a straight line instead of a sawtooth), we may reconsider the validity of his claims. Until then, we should be skeptical of a measurement method whose systematic error is larger than the effect that it is supposed to have detected.

Best wishes,

Paul

--
Paul Boersma
Institute of Phonetic Sciences, University of Amsterdam
Herengracht 338, 1016CG Amsterdam, The Netherlands
http://www.fon.hum.uva.nl/paul/
phone +31-20-5252385


This message came from the mail archive
http://www.auditory.org/postings/2001/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University