Neil Todd's thoughtful contribution was informative. I would add only a brief
caveat to this discussion regarding the assumption of a phonetic stage.
It is sometimes useful to consider how speech perception would work if
phonemes did not exist as an intervening stage between acoustic input
and lexicon. Speech perception researchers, including myself, have
perseverated on the perception of phonemes as if they were the real entities
to be explained. Much of this heritage arises from the utility of
phonemes for efficiently describing distinctions between morphemes. As
such, phonemes are an invention of linguists for describing language at
a given level of detail. Phonemes may or may not exist as a separable
level of analysis in the process of speech perception.

It is sometimes useful to imagine a lexicon that is primarily encoded in
auditory dimensions. What is true is that, if one wishes to economically
describe the variance in this lexical space (e.g., with principal components),
much or most of that variance could be described in terms of dimensions
that map fairly well onto phonetic distinctions. However, this simply
recapitulates the linguists' descriptive claims. It does not necessarily
give phonemes any role in processing. Instead, phonemes may
be an emergent property of a sufficiently well-populated lexical space.
I have not scrutinized the literature Todd shares, so I'd like to maintain
some caution before claiming that those findings can be reinterpreted
without recourse to a phonetic stage of processing. I am hopeful,
however, that they can be.

By avoiding commitments to phonemes, I've found it easier to think about
the processes of perceptual development in speech perception. Many
studies (e.g., Werker, Kuhl) of changes in speech perception over
the first year of life are often interpreted as evidence that infants
"learn" the mapping of phonetic units specific to their language environment.
The functional question is: what is the cash value of phonemes to an infant?
Does the infant learn phonetic categories (which have no communicative
value in and of themselves) simply to be able to later recognize sequences
of phonemes to understand words? I would argue that it may be more sensible
to consider the infant to be developing an auditory lexicon based upon exposure
to words, and that the apparent development of phonetic categories is really
the typical result of perceptual learning, whereby the system comes to
efficiently encode variance and covariance in the environment. (Note: words
do not exist as segmented entities in the input, and related statistical
learning serves to parse fluent connected speech into words; see, e.g.,
Saffran's data from 7-month-olds.)
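
Saffran's result is usually described in terms of transitional probabilities
(TPs) between adjacent syllables: within a word the next syllable is highly
predictable, while across a word boundary it is not. The following is only a
minimal sketch under stated assumptions, not Saffran's procedure. The input is
assumed to be already syllabified, the syllables and word order are invented
for illustration, and the fixed `threshold` is a hypothetical simplification
(a local-minimum rule is another common choice). The function names are
likewise hypothetical. It operates on syllable tokens and makes no reference
to phonemes.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(b | a) for every adjacent syllable pair (a, b) in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor, so each row sums to 1.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, threshold=0.75):
    """Posit a word boundary wherever the forward TP falls below `threshold`.

    The fixed threshold is an illustrative assumption, not part of any
    published model.
    """
    if not syllables:
        return []
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            # Low predictability: close the current word before `b`.
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical trisyllabic "words", concatenated with no pauses between them.
lexicon = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["bi", "da", "ku"]}
order = ["A", "B", "C", "A", "C", "B", "A"]
stream = [syl for w in order for syl in lexicon[w]]

print(segment(stream))
# ['tupiro', 'golabu', 'bidaku', 'tupiro', 'bidaku', 'golabu', 'tupiro']
```

In this toy stream every within-word TP is 1.0 and every between-word TP is
0.5, so the dips in predictability fall exactly at the word boundaries.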

I realize that this has come a long way from concerns about how one separates
speech from other environmental sounds. It is worth considering, however, that
Dennis Klatt's LAFS (lexical access from spectra) model of speech recognition
may turn out to be closer to reality than would be apparent from contributions
by speech perception researchers such as myself. It may pay not to be so
sanguine that the goal of speech perception is to arrive at phonemes.

