Dear Stephen and List,
There can be no objective way to segment words from a sound stream, because a sound stream can be ambiguous: it lends itself to several interpretations that place the word boundaries differently.
Here are some examples of distinct verbal expressions that can be represented by the same sound stream:
They regret / their egret
Bookmark it / book market
Farmer Mays produces good, fresh corn / Farmer May’s produce is good, fresh corn
[In this case, the sound stream can be disambiguated by varying the fundamental frequency contour while leaving the formants unchanged. If there is a pitch peak on “pro,” one will hear “produce” as a noun. If there is a pitch peak on “duce,” one will hear “produce” as a verb.]
You can find many additional examples in an article entitled “Speech Recognition Follies” by David Pogue, New York Times, August 15, 2002. He collected his examples from errors made by speech recognition software.
There is a lengthy discussion of the problems posed for speech recognition software by the massive ambiguities of spoken utterances in Capire le parole [Understanding words] by Tullio de Mauro, Sagittari Laterza, Rome (1994).
A review of some of the problems in identifying word boundaries can be found in “Prosody in the Comprehension of Spoken Language: A Literature Review” by Anne Cutler, Delphine Dahan, and Wilma van Donselaar, Language and Speech 40 (1997), 141-201.