Re: Importance of "phase" in sound recognition (Ita Katz)


Subject: Re: Importance of "phase" in sound recognition
From:    ita katz  <itakatz@xxxxxxxx>
Date:    Mon, 11 Oct 2010 09:09:34 +0200
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Actually, the firing pattern of the auditory nerve is phase-locked to the stimulus, at least for frequencies that do not exceed ~3 kHz. The reason is that the ion channels in the hair cells are widened when the associated point on the basilar membrane is at the crest of the wave (think of a buckled rod: the 'outer' part is decompressed and the inner part is compressed). When the ion channel is widened, ions flow more easily and the potential develops faster. At higher frequencies the capacitance of the cell, which acts as a low-pass filter, decreases the phase-locking.

So the ear performs analysis in both the spectral domain and the time domain. (Of course, the perception of phase is another question.)

On Sun, Oct 10, 2010 at 11:35 PM, John Bates <jkbates@xxxxxxxx> wrote:

> Emad,
>
> Here's something else to consider for your research.
>
> Traditionally, it has been dogma that the cochlea responds only to a
> sound's amplitude spectrum; therefore we should not hear changes caused
> by varying phase. Yet it has been shown repeatedly that we do hear
> changes in sounds as their phase spectrum is varied. How can this be?
>
> Let's look at the problem: in terms of spectral analysis, we find that as
> we vary the phase, the amplitude spectrum is invariant. Therefore, we
> conclude that the perceived changes in the sound are associated with
> changes in the phase spectrum. Somehow, the ear must be responding to a
> supposedly irrelevant phase spectrum. But where is the evidence?
>
> Here's an idea: if we look at the signal's waveform, we notice that its
> pattern also varies in accord with the phase variations. Thus, it would
> appear that, in lieu of a phase analyzer, the ear "reads" waveforms. As
> absurd as this might seem, how else could the sound changes be heard? We
> are thus convinced that the cochlea must be processing a phase/waveform
> source. Now we ask, "What is the most available and usable expression of
> waveform?"
>
> Spatial patterns can be described in terms of their inflection points, in
> our case having time-space locations identified by sequences of real and
> complex zeros, readily obtained physically by finding the waveform
> derivatives (H. Voelker and A. Requicha). By using delay lines to
> preserve past events for present use (the cochlea?), meaningful temporal
> patterns in the stream of zeros (pitch?) can be recognized. Information
> such as amplitude and direction of arrival can be associated with
> patterns of events that are referenced to the zeros. In simple terms: the
> ear processes sound in the time domain, not the frequency domain. The
> trick is to find out how the ear does these things. And keep in mind that
> they are done in real time and are synchronized with the signal waveform.
>
> So, there you are: the most likely answer for you, as far as I can see,
> is that the cochlea and its various parts must derive meaningful
> information from signal waveforms by recognizing patterns in the temporal
> sequences of their zeros.
>
> John Bates
>
> From: emad burke
> To: AUDITORY@xxxxxxxx
> Sent: Tuesday, October 05, 2010 11:23 AM
> Subject: About importance of "phase" in sound recognition
>
> Dear List,
>
> I've been confused about the role of "phase" information of the sound
> (e.g. speech) signal in speech recognition and, more generally, in human
> perception of audio signals. I've been reading conflicting arguments and
> publications regarding the extent of the importance of phase information.
> If there is a border between short- and long-term phase information that
> clarifies this extent of importance, can anybody please point me to a
> convincing reference in that respect? In summary, I just want to know
> what the consensus in the community is about the role of phase in speech
> recognition, if there is any consensus at all.
>
> Best,
> Emad
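[Archive editor's note] Ita's point about the hair cell's membrane capacitance acting as a low-pass filter can be made concrete with a first-order RC model. The corner frequency below is purely illustrative (real values vary with species and cochlear position); this is a sketch of the principle, not physiological data:

```python
import math

# Assumed, illustrative corner frequency for the hair-cell membrane.
fc = 1000.0  # Hz

def ac_gain(f):
    """Magnitude response of a first-order RC low-pass: 1/sqrt(1 + (f/fc)^2)."""
    return 1.0 / math.sqrt(1.0 + (f / fc) ** 2)

# The AC (cycle-by-cycle) component of the receptor potential shrinks with
# frequency, which is why phase-locking fades at high frequencies.
for f in (250, 1000, 4000):
    print(f"{f:>5} Hz: AC component at {ac_gain(f):.2f} of low-frequency gain")
```

At 4 kHz the cycle-by-cycle component is down to roughly a quarter of its low-frequency value in this toy model, consistent with the observation that phase-locking degrades well before it disappears.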

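[Archive editor's note] The fact that both posters lean on — changing the phase spectrum leaves the amplitude spectrum untouched while reshaping the waveform and its zero crossings — is easy to verify numerically. A minimal NumPy sketch (the frequencies and phase values are arbitrary choices, not from the thread):

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs  # one second at 8 kHz

def harmonic_tone(phases):
    """Sum of harmonics of 200 Hz with the given starting phases."""
    return sum(np.sin(2 * np.pi * 200 * (k + 1) * t + p)
               for k, p in enumerate(phases))

a = harmonic_tone([0.0, 0.0, 0.0])
b = harmonic_tone([0.0, 1.3, 2.1])  # same harmonics, different phases

# The amplitude spectra are numerically identical...
spec_diff = np.max(np.abs(np.abs(np.fft.rfft(a)) - np.abs(np.fft.rfft(b))))
print(f"max amplitude-spectrum difference: {spec_diff:.2e}")

# ...but the waveforms, and hence their zero-crossing patterns, differ.
def zero_crossings(x):
    return np.flatnonzero(np.diff(np.signbit(x)))

print("waveforms identical:", np.allclose(a, b))
print("zero crossings identical:",
      np.array_equal(zero_crossings(a), zero_crossings(b)))
```

Any purely magnitude-based front end treats `a` and `b` as the same signal; a time-domain (waveform or zero-based) front end does not.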

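[Archive editor's note] Bates's suggestion that meaningful patterns, including pitch, can be read from the stream of zeros can be illustrated crudely: for a simple periodic tone, the spacing of upward zero crossings recovers the period. The signal, sample rate, and fundamental below are arbitrary choices of mine, and this is a toy, not a cochlear model:

```python
import numpy as np

fs = 16000
f0 = 220.0  # hypothetical fundamental, Hz
t = np.arange(int(0.1 * fs)) / fs
x = np.sin(2 * np.pi * f0 * t) + 0.4 * np.sin(2 * np.pi * 2 * f0 * t + 0.9)

# Positive-going zero crossings: sign flips from negative to non-negative.
s = np.signbit(x)
upward = np.flatnonzero(s[:-1] & ~s[1:])

# The median spacing between successive upward crossings gives the period.
period_samples = np.median(np.diff(upward))
print(f"estimated f0: {fs / period_samples:.1f} Hz")
```

For this clean two-harmonic tone the estimate lands near 220 Hz; real speech would need the delay-line pattern matching Bates alludes to rather than a bare interval histogram.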
This message came from the mail archive
/home/empire6/dpwe/public_html/postings/2010/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University