Re: [AUDITORY] Logan's theorem - a challenge (Ken Grant)


Subject: Re: [AUDITORY] Logan's theorem - a challenge
From:    Ken Grant  <ken.w.grant@xxxxxxxx>
Date:    Tue, 28 Sep 2021 04:43:32 -0400

This discussion reminded me of an important and often overlooked paper by
Ron Cole and Brian Scott:

Cole, R. A., & Scott, B. (1974). Toward a theory of speech perception.
Psychological Review, 81(4), 348–374. https://doi.org/10.1037/h0036656
<https://psycnet.apa.org/doi/10.1037/h0036656>

V/r

Ken

On Tue, Sep 28, 2021 at 2:42 AM Prof Leslie Smith <l.s.smith@xxxxxxxx> wrote:

> I sent this originally to Alain de Cheveigné, but perhaps I should have
> made it more public. Here goes.
>
> Dear Alain:
>
> I did some related work with my student Madhuranda Pahar some while ago:
> it ended up with the publication linked to below.
>
> What we did was to resynthesize speech (or any other sound) from
> zero-crossings (positive-going only) in band-limited signals (using the
> gammatone filterbank) plus some information about the maximal size of the
> signal in the previous half-cycle.
>
> In essence, given a surprisingly small number of channels, plus a little
> information about the signal level (i.e. a log-based coding of the signal
> amplitude in the previous half-cycle, using 4 or 5 values - called
> threshold levels in the paper), one can quite easily make out the speech.
>
> It's not a wonderful paper, and could do with more work and more examples,
> and the resynthesis is not particularly straightforward (but that's not
> important - what matters is the possibility of resynthesis, as the brain
> interprets the AN signal, rather than re-creating it). And we'd never
> heard of Logan's theorem (unfortunately!).
>
> Still, I hope this might be of interest. I believe I have the Matlab code
> still (but it could do with being reworked).
>
> The paper can be found at
> http://www.cs.stir.ac.uk/~lss/recentpapers/PID6701133.pdf
>
> Reference: M. Pahar, L.S. Smith, Coding and Decoding Speech using a
> Biologically Inspired Coding System, presented at IEEE SSCI 2020
> (virtual conference), 1-4 December 2020.
> DOI 10.1109/SSCI47803.2020.9308328.
>
> --Leslie Smith
>
> Alain de Cheveigne wrote:
> > Hi all,
> >
> > Here's a challenge for the young nimble minds on this list, and the old
> > and wise.
> >
> > Logan's theorem states that a signal can be reconstructed from its zero
> > crossings, to a scale, as long as the spectral representation of that
> > signal is less than an octave wide. It sounds like magic given that zero
> > crossing information is so crude. How can the full signal be recovered
> > from a sparse series of time values (with signs but no amplitudes)?
> > "Band-limited" is clearly a powerful assumption.
> >
> > Why is this of interest in the auditory context? The band-limited
> > premise is approximately valid for each channel of the cochlear
> > filterbank (sometimes characterized as a 1/3 octave filter). While
> > cochlear transduction is non-linear, Logan's theorem suggests that any
> > information lost due to that non-linearity can be restored, within each
> > channel. If so, cochlear transduction is "transparent", which is
> > encouraging for those who like to speculate about neural models of
> > auditory processing. An algorithm applicable to the sound waveform can
> > be implemented by the brain with similar results, in principle.
> >
> > Logan's theorem has been invoked by David Marr for vision and several
> > authors for hearing (some refs below). The theorem is unclear as to how
> > the original signal should be reconstructed, which is an obstacle to
> > formulating concrete models, but in these days of machine learning it
> > might be OK to assume that the system can somehow learn to use the
> > information, granted that it's there. The hypothesis has far-reaching
> > implications, for example it implies that spectral resolution of central
> > auditory processing is not limited by peripheral frequency analysis (as
> > already assumed by for example phase opponency or lateral inhibitory
> > hypotheses).
> >
> > Before venturing further along this limb, it's worth considering some
> > issues. First, Logan made clear that his theorem only applies to a
> > perfectly band-limited signal, and might not be "approximately valid"
> > for a signal that is "approximately band-limited". No practical
> > signal is band-limited, if only because it must be time limited, and
> > thus the theorem might conceivably not be applicable at all. On the
> > other hand, half-wave rectification offers much richer information than
> > zero crossings, so perhaps the end result is valid (information
> > preserved) even if the theorem is not applicable stricto sensu. Second,
> > there are many other imperfections such as adaptation, stochastic
> > sampling to a spike-based representation, and so on, that might affect
> > the usefulness of the hypothesis.
> >
> > The challenge is to address some of these loose ends. For example:
> > (1) Can the theorem be extended to make use of a halfwave-rectified
> > signal rather than zero crossings? Might that allow it to be applicable
> > to practical time-limited signals?
> > (2) What is the impact of real cochlear filter characteristics,
> > adaptation, or stochastic sampling?
> > (3) In what sense can one say that the acoustic signal is "available"
> > to neural signal processing? What are the limits of that concept?
> > (4) Can all this be formulated in a way intelligible to
> > non-mathematical auditory scientists?
> >
> > This is the challenge. The reward is - possibly - a better
> > understanding of how our brain hears the world.
> >
> > Alain
> >
> > ---
> > Logan BF, Jr. (1977) Information in the zero crossings of bandpass
> > signals. Bell Syst. Tech. J. 56:487–510.
> >
> > Marr, D. (1982) VISION - A Computational Investigation into the Human
> > Representation and Processing of Visual Information. W.H. Freeman and
> > Co, republished by MIT press 2010.
> >
> > Heinz, M.G., Swaminathan J. (2009) Quantifying Envelope and
> > Fine-Structure Coding in Auditory Nerve Responses to Chimaeric Speech,
> > JARO 10: 407–423. DOI: 10.1007/s10162-009-0169-8.
> >
> > Shamma, S, Lorenzi, C (2013) On the balance of envelope and temporal
> > fine structure in the encoding of speech in the early auditory system,
> > J. Acoust. Soc. Am. 133, 2818–2833.
> >
> > Parida S, Bharadwaj H, Heinz MG (2021) Spectrally specific temporal
> > analyses of spike-train responses to complex sounds: A unifying
> > framework. PLoS Comput Biol 17(2): e1008155.
> > https://doi.org/10.1371/journal.pcbi.1008155
> >
> > de Cheveigné, A. (in press) Harmonic Cancellation, a Fundamental of
> > Auditory Scene Analysis. Trends in Hearing
> > (https://psyarxiv.com/b8e5w/).
>
>
> --
> Prof Leslie Smith (Emeritus)
> Computing Science & Mathematics,
> University of Stirling, Stirling FK9 4LA
> Scotland, UK
> Tel +44 1786 467435
> Web: http://www.cs.stir.ac.uk/~lss
> Blog: http://lestheprof.com
>

-- 
Ken W. Grant, Ph.D.
Chief, Scientific and Clinical Studies Section
America Building, Room 5601
Walter Reed National Military Medical Center
4954 North Palmer Road
Bethesda, MD 20889-5630

OFFICE: 301-319-7043
CELL: 301-919-2957

kenneth.w.grant.civ@xxxxxxxx
ken.w.grant@xxxxxxxx
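
[Editorial illustration, not part of the original message.] The coding idea Leslie Smith describes in the quoted text (positive-going zero crossings in a band-limited channel, plus a coarse log-based code for the signal size in the preceding half-cycle) might be sketched roughly as follows. This is only a minimal sketch under stated assumptions and is not the code or the exact scheme from the Pahar and Smith paper: the function names `log_level` and `encode_band`, the -40 dB floor, the even spacing of levels in dB, and the linear interpolation of crossing times are all hypothetical choices, and the input is assumed to be one channel that has already been band-pass filtered (no gammatone filterbank is implemented here).

```python
import math


def log_level(peak, n_levels=4, floor_db=-40.0):
    """Quantize a peak magnitude (full scale = 1.0) into one of n_levels
    codes spaced evenly in dB between floor_db and 0 dB (assumed scheme)."""
    if peak <= 0.0:
        return 0
    db = 20.0 * math.log10(peak)
    step = -floor_db / n_levels
    idx = int((db - floor_db) // step)
    # Clamp so that very quiet or over-range peaks still get a valid code.
    return max(0, min(n_levels - 1, idx))


def encode_band(x, fs, n_levels=4):
    """Encode one band-limited channel as (time_s, level_code) events.

    An event is emitted at each positive-going zero crossing. Its level code
    describes the largest magnitude seen in the negative half-cycle that
    immediately precedes the crossing.
    """
    events = []
    peak = 0.0
    for n in range(1, len(x)):
        prev, cur = x[n - 1], x[n]
        if prev < 0.0:
            # Track the size of the current negative half-cycle.
            peak = max(peak, -prev)
        if prev < 0.0 <= cur:
            # Linear interpolation between the two samples around zero.
            frac = -prev / (cur - prev)
            events.append(((n - 1 + frac) / fs, log_level(peak, n_levels)))
            peak = 0.0
    return events


if __name__ == "__main__":
    fs = 8000
    # A 100 Hz tone at half of full scale stands in for one filter channel.
    tone = [0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
    for t, code in encode_band(tone, fs):
        print(f"crossing at {t:.5f} s, level code {code}")
```

The sketch covers only the encoding side. Whether, and how well, the original waveform can be recovered from such events is exactly the open question raised in the thread, and nothing here should be read as a reconstruction algorithm.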


This message came from the mail archive
src/postings/2021/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University