voiced/unvoiced ("Alain de Cheveigne'" )

Subject: voiced/unvoiced
From:    "Alain de Cheveigne'"  <alain(at)LINGUIST.JUSSIEU.FR>
Date:    Thu, 5 Nov 1998 13:42:28 +0100

>In speech processing the voiced/unvoiced decision is usually considered
>more difficult than the measurement of pitch itself.
>How would you measure how strong the sensation of "pitchedness"? Does
>this make sense at all, or it is a binary decision, that is, we either
>hear or don't hear a pitch?
>Especially, I'm looking for ideas about how to make the voiced/unvoiced
>detection of speech using auditory-like processing, eg. the summary
>autocorrelogram. In this case I'd guess I should measure how strongly the
>peak "dominates" the summary autocorrelogram. What would give a measure
>of this? E.g. a narrower peak means more definite pitch sensation than a
>wide, diffuse one? Or it is the height of the peak compared to its
>neighborhood that counts? If so, how wide "neighborhood" should I check?

Good question. I've seen various suggestions to the effect that pitch
might be weaker if there are multiple (ambiguous) peaks, or if the
period peak is wide. More quantitatively, Kaernbach and Demany (1998)
suggested using the ratio between peak and background within a 6 ms
portion of the autocorrelation of the waveform as a measure of pitch
strength (the period was 10 ms). This measure is not entirely without
problems.

Yost (1996) showed that the ratio of period-peak to zero-lag peak of
the autocorrelation of the waveform is a good predictor of pitch
strength of IRN (iterated rippled noise) stimuli. Wiegrebe, Patterson,
Demany and Carlyon (1998) have recently refined this result. They
showed that the autocorrelation calculation must be modified for this
measure to be valid for a wider class of stimuli. The same measure
(peak ratios) can be derived from summary autocorrelograms, but I'm not
sure if it predicts pitch strength so well.

The ratio of period-peak to zero-lag peak of the autocorrelation
function is directly related to the depth of "period valleys" relative
to the rest of the cancellation pattern in my own cancellation model
(de Cheveigne, 1998).
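
As an illustration of the period-peak to zero-lag peak ratio discussed
above, here is a minimal Python sketch. It is not the code used in any
of the cited studies: the function names, the fixed 400-sample window,
the 20-150 sample lag range and the two test signals are all assumptions
chosen for the example. It works on the raw waveform rather than on a
summary autocorrelogram, and it does not include the modification of the
autocorrelation calculation described by Wiegrebe et al. (1998).

```python
import math
import random


def acf(x, lag, n):
    """Autocorrelation at `lag`, summed over a fixed window of n samples."""
    return sum(x[i] * x[i + lag] for i in range(n))


def peak_ratio(x, min_lag, max_lag, n):
    """Return (lag, ratio) for the highest ACF value in [min_lag, max_lag].

    ratio = ACF(lag) / ACF(0). It is close to 1 for a stationary periodic
    signal whose period lies in the lag range, and small for noise.
    Requires len(x) >= n + max_lag.
    """
    if len(x) < n + max_lag:
        raise ValueError("signal too short for the requested window and lags")
    zero_lag = acf(x, 0, n)
    if zero_lag == 0.0:
        return min_lag, 0.0  # silence: nothing to measure
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf(x, lag, n))
    return best_lag, acf(x, best_lag, n) / zero_lag


# Hypothetical test signals (assumptions, not data from the cited papers).
PERIOD = 100   # period of the periodic signal, in samples
WINDOW = 400   # integration window, in samples
MAX_LAG = 150  # largest lag searched, kept below 2 * PERIOD

# Two-harmonic periodic signal.
periodic = [
    math.sin(2 * math.pi * i / PERIOD) + 0.5 * math.sin(4 * math.pi * i / PERIOD)
    for i in range(WINDOW + MAX_LAG)
]

# Gaussian white noise with a fixed seed so the run is repeatable.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(WINDOW + MAX_LAG)]

periodic_lag, periodic_ratio = peak_ratio(periodic, 20, MAX_LAG, WINDOW)
noise_lag, noise_ratio = peak_ratio(noise, 20, MAX_LAG, WINDOW)

print("periodic:", periodic_lag, periodic_ratio)
print("noise:   ", noise_lag, noise_ratio)
```

The lag range is kept below twice the period so that the peak at the
period is not confused with its multiples. Turning the ratio into a
voiced/unvoiced decision would need a threshold, which this sketch
deliberately leaves open.
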
For perfectly periodic stimuli the ACF peak ratio is 1, while the
cancellation pattern valley ratio is 0.

The cancellation pitch model is related to the AMDF method of speech F0
estimation. The depth of the AMDF valley has been used as "periodicity
measure" related to voicing. A similar measure can be derived from
peaks of the autocorrelation function of the speech waveform.

A difficulty in applying perceptual models to speech processing is of
course that pitch and F0 are not quite the same thing. Also, voicing is
not quite synonymous with periodicity (glottal pulses are sometimes
irregular, and sometimes even occur in isolation). I wouldn't claim
that an AMDF-derived periodicity measure solves the problem of voicing
detection.

Alain

---
de Cheveigne, A. (1998). "Cancellation model of pitch perception,"
J. Acoust. Soc. Am. 103, 1261-1271.

Kaernbach, C., and Demany, L. (1998). "Psychophysical evidence against
the autocorrelation theory of pitch perception," J. Acoust. Soc. Am.
104, 2298-2306.

Wiegrebe, L., Patterson, R. D., Demany, L., and Carlyon, R. P. (1998).
"Temporal dynamics of pitch strength in regular interval noises,"
J. Acoust. Soc. Am. 104, 2307-2313.

Yost, W. A. (1996). "Pitch strength of iterated rippled noise,"
J. Acoust. Soc. Am. 100, 3329-3335.
---

------------------------------------------------------------------
Alain de Cheveigne'
Laboratoire de Linguistique Formelle, CNRS / Universite' Paris 7,
case 7003, 2 place Jussieu, 75251 Paris CEDEX 05, FRANCE.
phone: +33 1 44273633, fax: +33 1 44277919
e-mail: alain(at)linguist.jussieu.fr
http://www.linguist.jussieu.fr/~alain/
------------------------------------------------------------------

Email to AUDITORY should now be sent to AUDITORY(at)lists.mcgill.ca
LISTSERV commands should be sent to listserv(at)lists.mcgill.ca
Information is available on the WEB at http://www.mcgill.ca/cc/listserv

This message came from the mail archive
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University