Objective Segmenting (Bernard Kripkee)

Subject: Objective Segmenting
From: Bernard Kripkee <kripkee(at)VERIZON.NET>
Date: Fri, 18 Nov 2005 09:39:07 -0500

Dear Stephen and List,

There can be no objective way to segment words from a sound stream because the sound stream can be ambiguous, lending itself to several different interpretations with different choices of word boundaries.

Here are some examples of distinct verbal expressions that can be represented by the same sound stream:

They regret / their egret

Bookmark it / book market

Farmer Mays produces good, fresh corn / Farmer May's produce is good, fresh corn

[In this case, the sound stream can be disambiguated by varying the fundamental frequency contour, leaving the formants invariant. If there is a pitch peak on "pro," one will hear "produce" as a noun. If there is a pitch peak on "duce," one will hear "produce" as a verb.]

You can find many additional examples in an article entitled "Speech Recognition Follies" by David Pogue, New York Times, August 15, 2002. He collected his examples from errors made by speech recognition software.

There is a lengthy discussion of the problems posed for speech recognition software by the massive ambiguities of spoken utterances in Capire le parole [Understanding words] by Tullio de Mauro, Sagittari Laterza, Rome (1994).

A review of some of the problems in identifying word boundaries can be found in "Prosody in the Comprehension of Spoken Language: A Literature Review" by Anne Cutler, Delphine Dahan, and Wilma van Donselaar, Language and Speech 40 (1997), 141-201.
Regards,
Bernard Kripkee
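
The boundary ambiguity in the "bookmark it / book market" example can be illustrated with a minimal sketch. It assumes a toy four-word lexicon and simplified, hypothetical ARPAbet-style transcriptions, none of which come from the message itself, and it ignores prosody entirely. Given one phoneme stream, it enumerates every way to split the stream into lexicon words:

```python
from typing import Dict, List, Tuple

# Toy lexicon: word -> simplified phoneme tuple.
# These transcriptions are hypothetical and chosen only for illustration.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "book": ("B", "UH", "K"),
    "bookmark": ("B", "UH", "K", "M", "AA", "R", "K"),
    "market": ("M", "AA", "R", "K", "IH", "T"),
    "it": ("IH", "T"),
}


def segmentations(
    stream: Tuple[str, ...],
    lexicon: Dict[str, Tuple[str, ...]] = LEXICON,
) -> List[List[str]]:
    """Return every way to split `stream` into a sequence of lexicon words."""
    if not stream:
        return [[]]  # one valid parse of nothing: the empty word list
    results: List[List[str]] = []
    for word, phones in sorted(lexicon.items()):
        n = len(phones)
        # Does this word's phoneme sequence match the start of the stream?
        if stream[:n] == phones:
            for rest in segmentations(stream[n:], lexicon):
                results.append([word] + rest)
    return results


# One phoneme stream with no boundary markers.
stream = ("B", "UH", "K", "M", "AA", "R", "K", "IH", "T")

for words in segmentations(stream):
    print(" ".join(words))
```

With this lexicon the single stream yields two complete parses, "book market" and "bookmark it", and nothing in the phoneme sequence alone selects between them.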

This message came from the mail archive
http://www.auditory.org/postings/2005/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University