Objective Segmenting (Bernard Kripkee)


Subject: Objective Segmenting
From:    Bernard Kripkee <kripkee(at)VERIZON.NET>
Date:    Fri, 18 Nov 2005 09:39:07 -0500

Dear Stephen and List,

There can be no objective way to segment words from a sound stream because the sound stream can be ambiguous, lending itself to several different interpretations with different choices of word boundaries.

Here are some examples of distinct verbal expressions that can be represented by the same sound stream:

They regret / their egret
Bookmark it / book market
Farmer Mays produces good, fresh corn / Farmer May's produce is good, fresh corn

[In this case, the sound stream can be disambiguated by varying the fundamental frequency contour, leaving the formants invariant. If there is a pitch peak on "pro," one will hear "produce" as a noun. If there is a pitch peak on "duce," one will hear "produce" as a verb.]

You can find many additional examples in an article entitled "Speech Recognition Follies" by David Pogue, New York Times, August 15, 2002. He collected his examples from errors made by speech recognition software.

There is a lengthy discussion of the problems posed for speech recognition software by the massive ambiguities of spoken utterances in Capire le parole [Understanding words] by Tullio de Mauro, Sagittari Laterza, Rome (1994).

A review of some of the problems in identifying word boundaries can be found in "Prosody in the Comprehension of Spoken Language: A Literature Review" by Anne Cutler, Delphine Dahan, and Wilma van Donselaar, Language and Speech 40 (1997), 141-201.
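The boundary ambiguity in pairs like "Bookmark it / book market" can be illustrated with a small sketch. It assumes speech has already been reduced to a discrete string of phoneme symbols and uses a hypothetical toy lexicon with ad hoc ASCII spellings (not a real transcription standard); the function name `segmentations` is likewise illustrative. It enumerates every way to split the string into lexicon words. More than one result means the string alone does not fix the word boundaries. Real acoustic input is continuous and carries prosodic cues (as in the "produce" example), which this abstraction ignores.

```python
from functools import lru_cache

# Hypothetical toy lexicon: ad hoc ASCII "phonemic" spellings mapped to words.
# "markit" stands in for "market" as pronounced, so "mark it" and "market"
# share the same symbol string.
LEXICON = {
    "buk": "book",
    "mark": "mark",
    "it": "it",
    "bukmark": "bookmark",
    "markit": "market",
}


def segmentations(stream: str) -> list[list[str]]:
    """Return every way to split `stream` into a sequence of lexicon words."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[tuple[str, ...], ...]:
        # An empty remainder has exactly one segmentation: no further words.
        if start == len(stream):
            return ((),)
        results = []
        # Try every prefix of the remainder that is a lexicon entry,
        # then combine it with every segmentation of what follows.
        for end in range(start + 1, len(stream) + 1):
            piece = stream[start:end]
            if piece in LEXICON:
                for rest in split_from(end):
                    results.append((LEXICON[piece],) + rest)
        return tuple(results)

    return [list(words) for words in split_from(0)]


if __name__ == "__main__":
    for words in segmentations("bukmarkit"):
        print(" ".join(words))
    # prints:
    # book mark it
    # book market
    # bookmark it
```

The memoized recursion (via `functools.lru_cache`) keeps the search from re-solving the same suffix, which matters once the lexicon and input grow.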
Regards,
Bernard Kripkee


This message came from the mail archive
http://www.auditory.org/postings/2005/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University