Keith R. Kluender wrote:
>Neil Todd's thoughtful contribution was informative. I only add a brief
>caveat to this discussion regarding the assumption of phonetic stages.
>It is sometimes useful to consider how speech perception would work if
>phonemes did not exist as an intervening stage between acoustic input
>and lexicon. Speech perception researchers, including myself, have
>perseverated on perception of phonemes as if they are the real entities
>to be explained. Much of this heritage arises from the utility of
>phonemes to efficiently describe distinctions between morphemes. As
>such, phonemes are an invention by linguists to describe language at
>a given level of detail. Phonemes may or may not exist as a separable
>level of analysis in the process of speech perception.
>It sometimes is useful to imagine a lexicon that is primarily encoded in
>auditory dimensions. What is true is that, if one wishes to economically
>describe the variance in this lexical space (e.g., principal components),
>much or most of the variance in the space could be described in terms
>of dimensions that map fairly well on to phonetic distinctions. However,
>this simply recapitulates the linguists' descriptive claims. It does
>not necessarily afford phonemes any process role. Instead, phonemes may
>be an emergent property of a sufficiently well-populated lexical space.
>I have not scrutinized the literature Todd shares, so I'd like to maintain
>some caution before claiming that those findings can be reinterpreted
>without recourse to a phonetic stage of processing. I would be hopeful,
>however, that this is the case.
In summarising the neurological literature as I have discerned it, I did not
wish to give the impression that I subscribe to the phoneme construct or that
speech perception proceeds in three detached information-processing black-box
stages. The notion of a phonological encoding process is a convenient fiction
to describe what is more likely a hierarchical process, involving multiple
levels of analysis, phonetic features, sub-syllabic and syllabic features,
as distinct from a primitive acoustic level. At the risk of taxing the patience
of list members, here is another edited extract from the chapter.
In the last few years there has been a realisation that some new directions are
required in the field of speech perception if further progress is to be made
(Nygaard and Pisoni, 1995; Greenberg, 1996). This realisation has come about
due to the persistence of a number of issues.
The first issue is that of segmentation. Close examination of a speech signal
shows considerable overlap of phonetic units, reflecting the continuous nature
of vocal tract activity in speech. To bridge the gap between
these empirical observations and traditional linear phonologies, attempts have
been made to develop a more non-linear phonology which takes into account
the gestural nature of speech production (Fowler, 1996). Such developments,
however, offer few insights into the process by which phonetic information
is recovered. The second issue is that of invariance. Even if it is possible
to segment the signal, there is no invariant set of acoustic features or
properties that correspond uniquely to particular phonemes. This variability
has a number of sources. One source is coarticulation, which, like phonetic
overlap, is due to the continuous nature of vocal tract activity. The second
is speaker variability. Even when one allows for the often extreme differences
of phonetic realisation due to dialectal variation, considerable differences
still remain, due to factors such as variation in the size and shape of the
individual vocal tract, age, gender, and speaking rate.
There are two main reasons why the issues remain unresolved. The first is the
widespread presupposition that the proper unit of analysis is the phoneme or
phonetic feature. The evidence would appear to suggest, however, that the
system makes use of all levels of analysis, with no one level more important
than another. The second, more fundamental reason is the underlying assumption
of most theories and models of speech perception that there exists a lexicon of
abstract, canonical prototypes in LTM, which are compared with the incoming signal.
The assumption has two important consequences:
(1) Sources of variation are regarded as "noise", which must be normalised away
before recognition can take place.
(2) Prosody is regarded as peripheral to the process of speech recognition,
providing at best indirect cues to the recovery of morphosyntactic information.
The result is that potentially rich sources of linguistic (and paralinguistic)
information are disregarded. Speaker variability, for example, provides
information important not only for speaker identification but also for correct pragmatics
(Nygaard and Pisoni, 1995), and there is evidence that listeners retain much
relevant detail in LTM.
More serious, however, is the disregard of prosody, since there is ample evidence
that it plays an absolutely central role in spoken language. According to
Nygaard and Pisoni (1995) "it is apparent that prosody provides a crucial
connection between segments, features, words and higher-level grammatical
In addition, prosody provides useful information regarding lexical identity,
syntactic structure, and the semantic content of a talker's utterance" [p.74].
Indeed, it would not be unreasonable to suggest that it is prosody that actually
holds spoken language together. Without rhythm or melody, speech would be
incomprehensible or meaningless sound.
Greenberg, S. (1996) Understanding speech understanding: Towards a unified
theory of speech perception. In W. Ainsworth and S. Greenberg (Eds.) Proceedings
of the Workshop on The auditory basis of speech perception. Keele, July, 1996. pp 1-7.
Nygaard, L. and Pisoni, D. (1995) Speech perception: New directions in research
In J. Miller and P. Eimas (Eds.) Speech, language and communication. Handbook of
and Cognition 2nd Edition. Academic: San Diego. pp 63-96.
Fowler, C. (1996) Speaking. In Handbook of Perception and Action, Volume 2.
San Diego. pp 503-560.
I think you will agree, Keith, that my own position is actually very close to