
Re: voiced/unvoiced detection

At 10:11 AM 11/5/98 -0500, Keith D. Martin wrote:

>....  I subscribe to the interpretation that it is the
>alignment of these peaks across multiple channels that generates a pitch
>sensation rather than the "sharpness" of the peaks, either in individual
>channels or in the summary. This alignment is, of course, reflected in the
>summary autocorrelation, but summing across channels is only one of many
>ways of detecting it (this fact is pointed out in some of the papers from
>around 1990). And the width of the peak in the summary autocorrelation
>depends more on the strength of the various partials in a harmonic signal
>than it does on the "pitchiness" of the sound. So the degree of
>"pitchiness" might be related to the degree of across-channel structure in
>the image....

Just for the fun of making a historical argument, I would like to point out
that a similar idea was expressed in 1977 by Egbert de Boer ("Pitch
theories unified" in Psychophysics and Physiology of Hearing, E.F. Evans &
J.P. Wilson, eds., Academic Press, London, pp. 323-334). However, de Boer did not base
his model on autocorrelation. Rather, he obtained his pitch function
("cardinal function") by considering pitch formation to be a stochastic
process in which various alternative (instantaneous) pitches may coexist.
The width of the pitch peak, therefore, is synonymous with variability,
i.e., the function could be regarded as a density. Of course, pitch
uncertainty, i.e., pitch density, could look very similar to
autocorrelations, summed or not. I can't help having a personal preference
for the probability density interpretation because it is broad enough to
include summary autocorrelation as well as many other models.

The nicety of this model is that the problem of whispered or noisy speech
finds an instantaneous solution. The vocal tract may be excited by any good
old excitation waveform, Gaussian-like noise from the bronchi, an artificial
larynx vibrator, or the vocal folds in various stages of laryngitis,
producing a continuum of standard deviation magnitudes. Naturally, the
shape of the vocal tract does not care what the excitation waveform is and,
provided the excitation is sufficiently intense, the result will always be
the speech sound corresponding to the shape. That is, if I were able to
whisper louder than the highway noise, I could be perfectly intelligible
speaking in a car with the windows down, despite the fact that the
autocorrelation of the speech I am producing would be absolutely flat.
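
A small Python sketch of the contrast between the two kinds of excitation, under simplifying assumptions: it looks only at the raw source (no vocal-tract filtering), uses a hypothetical 1000-sample signal with one pulse every 50 samples for the voiced case and seeded Gaussian noise for the whispered case, and searches a normalized autocorrelation for its largest peak over a range of candidate periods. The helper names are illustrative only.

```python
import random

def normalized_autocorr(x, lag):
    """Biased autocorrelation estimate r(lag) / r(0), no windowing."""
    energy = sum(v * v for v in x)
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / energy

def best_period(x, min_lag, max_lag):
    """Return (lag, value) of the largest autocorrelation in a lag range."""
    scores = {lag: normalized_autocorr(x, lag) for lag in range(min_lag, max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

N, PERIOD = 1000, 50  # hypothetical signal length and pulse period, in samples

# Periodic (voiced-like) excitation: a unit pulse every PERIOD samples.
pulses = [1.0 if n % PERIOD == 0 else 0.0 for n in range(N)]

# Noise (whisper-like) excitation: Gaussian samples, seeded for repeatability.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

print(best_period(pulses, 20, 200))  # (50, 0.95)
print(best_period(noise, 20, 200))   # small value at an arbitrary lag
```

The pulse train yields a clear peak at its period (19 of its 20 pulses line up at lag 50), whereas the noise yields only small, randomly placed values, so no period stands out even though a filtered version of either source could carry the same vocal-tract shape.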

To continue history, in 1978 de Boer also wrote a more detailed version of
the above cited paper, called "Analytic pitch theories" which he never
published. Interested colleagues are encouraged to write him and request a
copy. He will be very surprised...


Pierre Divenyi
Experimental Audiology Research (151)
V.A. Medical Center, Martinez, CA 94553, USA
Phone: (925) 370-6745
Fax: (925) 228-5738
E-mail: pdivenyi@marva4.ebire.org
