Laszlo Toth & the Robust Voicing Cue

At 06:29 PM 11/9/98 MET, Laszlo Toth wrote:
>Considering human speech understanding, I think that voicedness is a very
>robust acoustic cue, and I'm sure that we use it. It's another issue that
>we can do without it.

   The several questions and brief discussion about the "robust
voiced-voiceless cue" for speech seem to trivialize a very complex problem
that many of us have been working on for the decades since Hirsh’s (JASA
1959) seminal study.  A few sentences should help illustrate the
complexity of the voicing contrast.  More than a decade ago Lisker noted that
approximately 32 different stimulus properties have been demonstrated to be
important for the voicing contrast and each could be considered a cue.  For
a few examples of the large literature on this topic, see studies by Lisker
et al. (Lang & Speech, 1977), Summerfield (JEP:HPP 1981; JASA 1982, w/
Haggard, JASA 1977), Darwin et al. (Speech Comm 1982; JASA 1983), Stevens &
Klatt (JASA, 1974), and several studies from my lab (e.g., JASA 1982,
1986; P&P 1988).
    The focus of the current exchange seems to be centered on the
possibility that the voicing contrast reflects the ability to identify the
nature of the early (post release) portion of the stimulus.  Over the years
a number of us have made conjectures centered around the notion that one
important cue could be the ability to discriminate activation at onset of
the higher formants by the voiced versus noise source.  In fact, our
initial categorical perception study with noise-buzz sequences (Miller et
al., JASA 1976) demonstrated the feasibility of such a conjecture.
Furthermore, Repp (Lang & Speech 1979) demonstrated that the voicing
boundary changes as a function of changes in amplitude of the aspiration
noise, and Al Bregman is correct that the voicing contrast is found with
whispered speech (probably with some alternative form of spectral contrast
and requiring a longer F1 cutback).  However, there are some languages where
there is the voicing contrast, yet aspiration is weak or absent.  Thus,
although the ability to recognize some aspect of the nature of the onset
portion of the stimulus seems to be important (and I have my own
conjectures about which ones), it is definitely not simply the ability to
discriminate between F0 and noise activation of the higher formants that is
singularly, or necessarily most, important.
    Given our current knowledge about the complex cues for voicing
contrast, it is difficult to imagine a model that accurately reflects human
perception of the voicing contrast - especially one that treats the
voiced-voiceless distinction as reflecting a single robust cue.  However,
the beginning of the current exchange on the List did include a
question about one specific cue.  That specific question concerns the role
of pitch identification in the type of onset recognition task possibly
represented by the voicing contrast.  This very complex issue is the focus of a
manuscript which Ed Crawley and I currently have under review.
  - Dick Pastore

Richard E. Pastore
Director, Center for Cognitive and Psycholinguistic Sciences
Professor of Psychology and Linguistics
Binghamton University (SUNY University Center)
Binghamton, NY 13902-6000


