Re: Is correlation any good for pitch perception? ("Richard F. Lyon" )


Subject: Re: Is correlation any good for pitch perception?
From:    "Richard F. Lyon"  <DickLyon(at)ACM.ORG>
Date:    Mon, 19 Jan 2004 22:53:05 -0800

Peter, Bertrand, Craig, Dmitry, Shihab, and all,

I've been out of auditory work for quite a few years now, but still lurk on this list.

This discussion of pitch and auto-correlation sounds so much like what we heard one and two decades ago that it's nostalgic, in a funny kind of way.

Peter, thanks for carrying the torch.

Dick

ps. The first time I talked about auditory correlograms at a Navy workshop on AI and bionics, J.C.R. Licklider fell asleep against the back wall of the room. When I mentioned his name and his theory, his wife elbowed him in the ribs to wake him up. I think his interest was more on the AI side in those days (c. 1984).

At 10:27 PM -0800 01/19/2004, Craig Atencio wrote:

>Dear Peter,
>
>You are completely correct in pointing out my faulty memory. You described dominance regions for pitch but not unresolved harmonics in the second of your 1996 papers. I was thinking of a talk Bertrand gave out here in Berkeley late last year. The conclusion that I claimed he stated was correct, though it was not in the context of autocorrelation models. The move toward using Shamma's ideas is also correct and was stated in the Berkeley talk and a talk given near Paris last year.
>
>The main point, of course, was to request a longer article on the state-space embedding technique described by Dmitry. It is an interesting idea that needs more detail made available, especially the idea of how it could be implemented in a neural system.
>
>Thanks also for your detailed analysis of autocorrelation results in your mailing.
>
>-Craig
>
>---------------------
>Craig Atencio
>Department of Bioengineering UCSF/UCB
>W.M. Keck Center for Integrative Neuroscience UCSF
>513 Parnassus Ave.
>HSE 834, Box 0732
>San Francisco, CA, 94143-0732, USA
>http://www.keck.ucsf.edu/~craig
>office: 415-476-1762 (UCSF)
>cell: 510-708-6346
>
>----- Original Message -----
>From: Peter Cariani <peter(at)epl.meei.harvard.edu>
>To: AUDITORY(at)LISTS.MCGILL.CA
>Cc: Craig Atencio <craig(at)phy.ucsf.edu>
>Sent: Monday, January 19, 2004 9:37 PM
>Subject: Re: Is correlation any good for pitch perception?
>
>Dear Craig and Eckhard,
>
>Craig, which of our 1996 results are you thinking of "that overestimate pitch salience of unresolved harmonics"? I can't think of any offhand -- do you have some of our results confused with those generated by computer models?
>
>P. A. Cariani and B. Delgutte, "Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, and the dominance region for pitch," J. Neurophysiology, vol. 76, pp. 1698-1734, 1996.
>
>A difficulty with estimating pitch saliences has been the relative dearth of data on salience per se (Fastl's papers notwithstanding). We stated that the peak-to-mean ratio in the population-wide all-order interval distribution qualitatively corresponded to pitch salience. The more general concept is that the salience of the pitch is related to the fraction of intervals related to a particular periodicity (n/F0, where n = 1, 2, 3, ...) amongst all others.
>
>We found that the salience of high-Fc AM tones just outside the pitch existence region was near 1 and the salience of AM noise was near 1.4. The ordering of the saliences did correspond well to the ordering in Fastl's paper and in the rest of the literature that was then available (pitches of harmonics in the dominance region have higher saliences than higher ones; pitches of resolved harmonics 3-5 had higher saliences than unresolved ones 6-12).
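The peak-to-mean salience measure described above can be sketched in a few lines of numpy. This is an editorial illustration only: the toy spike generator, bin width, and 25 ms lag cap are assumptions for the sketch, not the analysis of the 1996 papers.

```python
import numpy as np

def all_order_intervals(spikes, max_lag):
    """All-order interspike intervals: time differences between every
    pair of spikes (not just successive ones), up to max_lag seconds."""
    d = spikes[None, :] - spikes[:, None]
    return d[(d > 0) & (d <= max_lag)]

def salience_peak_to_mean(spike_trains, f0, max_lag=0.025, bin_w=0.0001):
    """Peak-to-mean ratio of the pooled all-order interval histogram,
    with the 'peak' read at the fundamental period 1/f0."""
    pooled = np.concatenate([all_order_intervals(s, max_lag) for s in spike_trains])
    hist, edges = np.histogram(pooled, bins=np.arange(0, max_lag, bin_w))
    peak_bin = np.argmin(np.abs(edges[:-1] + bin_w / 2 - 1.0 / f0))
    return hist[peak_bin] / hist.mean()

# Toy population: 30 "fibers" phase-locked to a 160 Hz periodicity,
# versus 30 fibers firing at random with the same mean rate.
rng = np.random.default_rng(0)
period = 1.0 / 160
locked = [np.sort(np.arange(50) * period + rng.normal(0, 0.0002, 50))
          for _ in range(30)]
unlocked = [np.sort(rng.uniform(0, 50 * period, 50)) for _ in range(30)]

print(salience_peak_to_mean(locked, 160))    # well above 1: strong periodicity
print(salience_peak_to_mean(unlocked, 160))  # near 1: no pitch-related peak
```

The absolute numbers depend on jitter and bin width; only the ordering (locked well above 1, random near 1) carries the point.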
>
>Population-interval models (I speak of the results of my own simulations) predict lower saliences as carrier frequencies increase (because of weaker phase locking to the carrier, a smaller fraction of ANFs being driven due to the asymmetry of tuning curves, and the more dispersed nature of intervals associated with envelopes rather than individual partials). They also predict lower saliences with harmonic number. If you have a logarithmic distribution of CFs, the higher the harmonic number, the fewer the fibers that are excited (proportionally) by a single harmonic bracketed by others.
>
>If you're talking about low F0's (the salience of F0 = 80 Hz harmonic complexes in our paper), subjectively, the 80 Hz pitch of those stimuli is strong (these had components at the fundamental, in contrast to Carlyon's stimuli). Certainly if you go over to the piano and play a melody in the register near 80 Hz, the pitches aren't qualitatively weaker than an octave or two above it. This is a subjective observation.
>
>It is true, though, that the early models would not have accounted well for saliences of very low pitches (< 50-60 Hz), because the early models did not discount longer intervals. One of the subsequent evolutions of pitch models in the last 10 years has been the realization that the lower limit of pitch has implications for interval-based models, e.g.
>
>D. Pressnitzer, R. D. Patterson, and K. Krumbholz, "The lower limit of melodic pitch," J. Acoust. Soc. Am., vol. 109, pp. 2074-2084, 2001.
>
>This was driven, I think, in part by Carlyon & Shackleton's (1994) work on low-F0 periodicities in high-Fc channels (high harmonic numbers), which has led some of us to hypothesize that high-CF channels may have shorter interval analysis windows than lower-CF ones.
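The connection between a finite interval-analysis window and the lower limit of pitch can be made concrete. In this editorial sketch, the 33 ms window, the 5-harmonic stimuli, and the test frequencies are arbitrary choices, not values from Pressnitzer et al.: a periodicity only leaves interval evidence if its period fits inside the longest analyzed lag.

```python
import numpy as np

def interval_evidence(f0, max_lag=0.033, fs=16000, dur=0.5):
    """Normalized autocorrelation at the period 1/f0, computed only over
    lags up to max_lag -- a finite interval-analysis window."""
    t = np.arange(int(dur * fs)) / fs
    x = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 6))  # harmonics 1-5
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]
    period_lag = round(fs / f0)
    if period_lag >= int(max_lag * fs):
        return 0.0  # period longer than the window: no interval evidence at all
    return ac[period_lag]

print(interval_evidence(80))  # 12.5 ms period fits in a 33 ms window: strong peak
print(interval_evidence(25))  # 40 ms period exceeds the window: returns 0.0
```

With a hard 33 ms cutoff the lower limit falls near 30 Hz; a graded window shape (as probed psychophysically) would make salience decline smoothly rather than vanish.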
>It may well be the case that there are some differences in the processing of intervals according to CF, but this does not change the core hypothesis that there is a global temporal representation of periodicities below around 4 kHz. (I believe that this strong, precise, level-invariant representation coexists with a much weaker and coarser (rate) place representation that covers the range of cochlear resonances (50-20,000 Hz); i.e. a duplex model much like Licklider's.)
>
>You have to realize that the earlier studies were simply trying to predict pitch on the basis of interval patterns. We didn't deal with questions around the fringes of the pitch existence region. To deal with these questions, one needs to grapple with the length of the interval analysis windows (what is the longest interval that is analyzed?). An autocorrelation that encompasses indefinitely long lags has indefinitely precise frequency resolution (like a vernier principle) -- we know the frequency resolution of the auditory system is quite fine, but nevertheless limited. Goldstein and Srulowicz assumed first-order intervals, which in effect produces an exponential window, but there are many problems with first-order intervals. They are rate-dependent -- pitch representations would shift as SPLs and firing rates increased. The other major problem comes from interference when one has two harmonic complexes (say n = 1-6) of different F0's -- if they are 20% apart in frequency, their pitches do not obliterate each other in the manner that would be expected if the representations were based on first-order intervals. "Higher-order peaks" are necessary to account for hearing multiple concurrent F0's. This interference is a big problem for first-order intervals and for central representations of pitch that rely on bandpass MTF's.
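The interference argument against first-order intervals can be shown with a toy simulation. Everything here (two idealized phase-locked spike trains, the jitter, the histogram settings, and the choice of 100 and 125 Hz rather than exactly 20% spacing) is an editorial assumption for illustration:

```python
import numpy as np

def spike_union(periods, n_cycles=400, jitter=0.0002, seed=0):
    """Superimpose spike trains, each phase-locked to its own periodicity."""
    rng = np.random.default_rng(seed)
    trains = [np.arange(1, n_cycles) * p + rng.normal(0, jitter, n_cycles - 1)
              for p in periods]
    return np.sort(np.concatenate(trains))

def hist_peak(intervals, max_lag=0.012, bin_w=0.0002):
    """Center of the most populated interval-histogram bin."""
    h, edges = np.histogram(intervals, bins=np.arange(0, max_lag, bin_w))
    return edges[np.argmax(h)] + bin_w / 2

spikes = spike_union([1 / 100, 1 / 125])  # two concurrent "F0s": 100 and 125 Hz
first_order = np.diff(spikes)             # intervals between successive spikes only
d = spikes[None, :] - spikes[:, None]
all_order = d[d > 0]                      # intervals between every spike pair

# The all-order histogram keeps a clear peak at one of the two periods
# (8 or 10 ms); the first-order histogram is dominated by the short gaps
# between spikes of *different* trains, away from either period.
print(hist_peak(all_order))
print(hist_peak(first_order))
```

The point is qualitative: once the two trains are mixed, successive-spike intervals lose the period structure that higher-order intervals retain.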
>
>Krumbholz, Patterson, Nobbe, & Fastl have recently probed the form of the (putative) interval analysis windows:
>
>[2] K. Krumbholz, R. D. Patterson, A. Nobbe, and H. Fastl, "Microsecond temporal resolution in monaural hearing without spectral cues?," J. Acoust. Soc. Am., vol. 113, pp. 2790-2800, 2003.
>
>These refinements of interval models appear to be capable of handling the decline of salience of resolved and unresolved harmonics at both ends of the spectrum. For low frequencies, salience is limited by the length/shape of the interval analysis window; for high frequencies, it is limited by the factors outlined above.
>
>AUTOCORRELATION: REPRESENTATIONS & COMPUTATIONS
>===============================================
>
>Are you really sure that our auditory system uses autocorrelation at all? (Terez)
>
>"There are indeed at best scant indications for autocorrelation merely inside brain." (Eckhard)
>
>We have to be clear about neural representations and analyses ("computations").
>
>It is abundantly clear that the all-order interspike interval distributions have forms that resemble stimulus autocorrelation functions in essential ways, up to the frequency limits of phase-locking. A temporal, interval-based representation of periodicity and spectrum exists at early stages of auditory processing, at least up to the level of the midbrain. There are tens if not hundreds of papers in the auditory literature that support this.
>
>The mechanism by which this information is utilized/analyzed by the auditory system is unknown, and I agree that the evidence for neural autocorrelators per se (a la Licklider) is quite scant.
>However, the evidence for a central rate-place representation of pitch (F0-pitch and low-frequency pure tone pitches as well) is also very weak -- I see no convincing physiological evidence for harmonic templates (BF's << 5 kHz) or of robust physiological resolution of harmonics higher than the second. We want to see a representation of lower-frequency sounds (< 4 kHz) that is precise, largely level-invariant, capable of supporting multiple objects (two musical instruments playing different notes), and that accounts for the pitches of pure and complex tones.
>
>A central rate-place coding of pitch is not out of the question -- I can imagine ways that we might have missed some kind of covert, distributed representation -- but I can also imagine similar (but more elegant) possibilities for central temporal representations, and there is no compelling reason to think that the central codes must be rate-based. Cortical neurons don't behave anything like rate integrators -- it's evident from the nature of single-unit responses that some other functional information-processing principle must be operant; we desperately need fresh ideas -- alternative notions of information processing.
>
>It's premature to rule anything out yet. We understand the cortical correlates of very few auditory percepts with any degree of confidence, such that we can say that it is this or that kind of code. Even in vision, where there are orders of magnitude more workers on both cortical and theoretical fronts, as far as I can see this is also the case -- they do not have a coherent theory of how the cortex works -- how the information is represented and analyzed. They cannot explain how we (and most animals) distinguish triangles from circles or why images disappear when they are stabilized on the retina.
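The resemblance between pooled all-order interval distributions and the stimulus autocorrelation, which Peter appeals to above, can be reproduced with a toy inhomogeneous-Poisson "fiber". The stimulus, the half-wave-rectification front end, and the firing rate below are crude editorial assumptions, not a model of the auditory nerve:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 10000
t = np.arange(3 * fs) / fs
x = np.sin(2 * np.pi * 125 * t) + 0.5 * np.sin(2 * np.pi * 250 * t)

# Inhomogeneous-Poisson "fiber": firing probability follows the half-wave
# rectified stimulus, a crude stand-in for auditory-nerve phase locking.
rate = np.maximum(x, 0)
rate *= 150 / rate.mean()                  # ~150 spikes/s on average
spikes = t[rng.random(len(t)) < rate / fs]

# All-order interval histogram, ignoring very short intervals
# (within-burst pairs) and lags beyond 12 ms.
d = spikes[None, :] - spikes[:, None]
iv = d[(d > 0.002) & (d < 0.012)]
hist, edges = np.histogram(iv, bins=np.arange(0.002, 0.012, 0.0005))
best_iv = edges[np.argmax(hist)] + 0.00025

# Stimulus autocorrelation over the same range of lags.
lags = np.arange(1, int(0.012 * fs))
ac = np.array([x[:-l] @ x[l:] for l in lags])
best_ac = lags[np.argmax(ac)] / fs

print(best_iv, best_ac)  # both near the 8 ms period of the 125 Hz fundamental
```

The interval histogram and the signal autocorrelation peak at the same lag, which is the sense in which the early representation is "autocorrelation-like" without any neural autocorrelator being implied.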
>
>I think the central problem in auditory neurophysiology is to determine what becomes of this superabundant, high-quality, invariant interval information. Laurel Carney and Shihab Shamma have proposed models for utilizing phase information, but it remains to be seen whether there is compelling physiological evidence for these models. It's also not yet clear to me how these models behave in terms of pitch. I have been working on time-domain strategies that bypass the need to compute autocorrelations explicitly (pitch is notoriously relative, which is not consistent with explicit absolute estimation mechanisms), but I do not yet see any strong positive physiological evidence for these either.
>
>Amidst all this equivocation about mechanisms, let us not forget what we know about early representations, i.e. that the interval-based "autocorrelation-like" representation does exist at the levels of the auditory nerve, cochlear nuclei, and midbrain. Even at the midbrain, I think it is likely that the interval-based representation is of higher quality and greater reliability than those based on bandpass MTFs or on rate-place profiles.
>
>I think it's likely that, whatever the mechanism turns out to be, it will involve this interval information and it will also be tied in closely with scene analysis mechanisms (harmonicity grouping).
>
>If one is fairly certain that the information is temporally coded to begin with, then one looks first for temporal processing mechanisms.
>
>--Peter Cariani
>
>Peter Cariani, PhD
>Eaton Peabody Laboratory of Auditory Physiology
>Massachusetts Eye & Ear Infirmary
>243 Charles St., Boston, MA 02114 USA
>
>On Friday, January 16, 2004, at 05:34 PM, Craig Atencio wrote:
>
>Dear Dmitry,
>
>My understanding recently was that autocorrelation may not be the best measure of periodicity pitch because it performs too well.
>Papers in 1996 by Cariani and Delgutte showed that pitch salience was overestimated for unresolved harmonics. For most everything else the model worked quite well. Note that these studies were a neurophysiological test of Licklider's original autocorrelation idea, where he gave a basic schematic of how autocorrelation might be implemented in a neural system. I recently heard a talk where Delgutte said that he was moving more in the direction of Shamma's earlier work based on spatio-temporal representations of auditory signals.
>
>I did read your earlier ICASSP paper and took a look at the Matlab files on the website. Unfortunately, the code is not available for viewing and the ICASSP paper lacks some detail. I, for one, was really intrigued by your idea, so I kindly suggest that you write a longer paper and submit it to JASA. The editors there are always interested in pitch, as are the reviewers and readers. (I believe that last year Pierre Divenyi proposed this as well.)
>
>You also mentioned that a basic analog system could implement your idea. Including that in the JASA paper would be great too. It would be helpful to see that so we could determine which neural center, if any, might be able to implement your idea.
>
>Best wishes,
>
>Craig
>
>---------------------
>Craig Atencio
>Department of Bioengineering UCSF/UCB
>W.M. Keck Center for Integrative Neuroscience UCSF
>513 Parnassus Ave.
>HSE 834, Box 0732
>San Francisco, CA, 94143-0732, USA
>http://www.keck.ucsf.edu/~craig
>office: 415-476-1762 (UCSF)
>cell: 510-708-6346
>
>----- Original Message -----
>From: "Dmitry Terez" <terez(at)SOUNDMATHTECH.COM>
>To: <AUDITORY(at)LISTS.MCGILL.CA>
>Sent: Friday, January 16, 2004 1:21 PM
>Subject: Is correlation any good for pitch perception?
>
>Dear Auditory List Members,
>
>I would like to convey some thoughts on the much-discussed subject of how the human auditory system can use autocorrelation analysis for pitch perception.
>
>Are you really sure that our auditory system uses autocorrelation at all? Has anybody seen it really happening in the brain? As far as I understand (forgive me if I am wrong), beyond the cochlea not much is really known about the real mechanism behind exceptionally robust human pitch perception. I am not an auditory scientist, but it just looks to me that the correct answer on your part is "We do not know."
>
>I do think that the correlation function has two fatal drawbacks, as far as pitch detection is concerned (I am talking about a classical auto- or cross-correlation function of a signal, that is, a multiply-and-add type of operation, as defined in any textbook on signal processing).
>
>The first fatal drawback of correlation is the abundance of secondary peaks due to the complex harmonic structure of a signal. For some real signals we are dealing with every day, such as speech, the secondary peaks in the correlation function due to speech formants (vocal tract resonances) are sometimes about the same height as the main peaks due to signal periodicity (pitch).
>
>The second fatal drawback of correlation is its pitch strength (salience) property for simple and complex tones. In other words, the main peaks in the correlation function computed for, e.g., a simple sine wave are too wide. Meanwhile, I would expect a simple tone to cause the same or even stronger pitch sensation than a complex tone with the same fundamental frequency.
>
>I think that it would be strange if evolution resulted in such a suboptimal mechanism of perceiving sound periodicity.
>
>As some of you may know, recently we introduced a new revolutionary concept of pitch detection. It has nothing to do with correlation (although one can see some similarity) or the spectrum of a signal.
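Terez's second objection (the width of the correlation peak for a pure tone) is easy to reproduce numerically. In this editorial sketch the sampling rate, harmonic count, and half-height width measure are arbitrary choices, not anything from his paper:

```python
import numpy as np

fs = 16000
t = np.arange(fs // 2) / fs  # 0.5 s of signal
f0 = 200

def norm_autocorr(x):
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    return ac / ac[0]

def peak_width(ac, lag, level=0.5):
    """Width (in samples) of the region around `lag` where the
    autocorrelation stays above `level` times its peak value."""
    thresh = level * ac[lag]
    lo = hi = lag
    while lo > 0 and ac[lo - 1] > thresh:
        lo -= 1
    while hi < len(ac) - 1 and ac[hi + 1] > thresh:
        hi += 1
    return hi - lo + 1

period = fs // f0  # 80 samples
sine = np.sin(2 * np.pi * f0 * t)
complex_tone = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 11))

w_sine = peak_width(norm_autocorr(sine), period)
w_complex = peak_width(norm_autocorr(complex_tone), period)
print(w_sine, w_complex)  # the pure tone's period peak is several times wider
```

The first objection (formant-driven secondary peaks) would need a speech-like signal and is not reproduced here; whether broad peaks are truly a "fatal" drawback for perception, rather than for peak-picking algorithms, is exactly what the thread disputes.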
It > >is > >basically based on "unfolding" a scalar signal in several > >dimensions - > >a concept of signal "embedding", as it is called in nonlinear and >chaotic signal processing. >The ICASSP paper and the Matlab demo are available from >http://www.soundmathtech.com/pitch > >You can also read our US patent application publication No. >20030088401 >at http://www.uspto.gov/patft > >Although a purely digital implementation is described, I can build a >simple analog electro-mechanical device (basically a mechanical part >followed by a two-dimensional grid of "neurons" for projecting an >output) that is based on the same principle and is exceptionally >robust at detecting pitch. > >My question is: Can our auditory system use this type of processing >for pitch perception? > >Is it possible to find some mechanism that can perform this kind of >processing, perhaps between the cochlea and the brain? > >I do not expect a quick answer. Please, take your time, maybe next > >10 > >years =8A > >Also, I would like to add that although words like "chaos >theory", "phase space" or "signal embedding" might seem not relevant >to your research on pitch perception, they are now, in fact. This is >an entirely new game=8A > > >Best Regards, > >Dmitry E. Terez, Ph.D. > >SoundMath Technologies, LLC >P.O. 
Box 846 >Cherry Hill, New Jersey, 08003 >USA >e-mail: dterez AT soundmathtech.com --============_-1137540773==_ma============ Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable <!doctype html public "-//W3C//DTD W3 HTML//EN"> <html><head><style type=3D"text/css"><!-- blockquote, dl, ul, ol, li { padding-top: 0 ; padding-bottom: 0 } --></style><title>Re: Is correlation any good for pitch perception?</title></head><body> <div>Peter, Bertrand, Craig, Dmitry, Shihab, and all,</div> <div><br></div> <div>I've been out of auditory work for quite a few years now, but still lurk on this list.</div> <div><br></div> <div>This discussion of pitch and auto-correlation sounds so much like what we heard one and two decades ago that's it nostalgic, in a funny kind of way.</div> <div><br></div> <div>Peter, thanks for carrying the torch.</div> <div><br></div> <div>Dick</div> <div><br></div> <div>ps. the first time I talked about auditory correlograms at a Navy workshop on AI and bionics, J.C.R. Licklider fell asleep against the back wall of the room.&nbsp; When I mentioned his name and his theory, his wife elbowed him in the ribs to wake him up.&nbsp; I think his interest was more on the AI side in those days (c. 1984).</div> <div><br></div> <div><br></div> <div><br></div> <div>At 10:27 PM -0800 01/19/2004, Craig Atencio wrote:</div> <blockquote type=3D"cite" cite>&nbsp;</blockquote> <blockquote type=3D"cite" cite>Dear Peter,</blockquote> <blockquote type=3D"cite" cite>&nbsp;</blockquote> <blockquote type=3D"cite" cite>You are completely correct in pointing out my faulty memory. You descibed dominance regions for pitch but not unresolved harmonics in the second of your 1996 papers. I was thinking of a talk Bertrand gave out here in Berkeley late last year. The conclusion that I claimed he stated was correct, though it was not in the context of autocorrelation models. 
The move toward using Shamma's ideas is also correct and was stated in the Berkeley talk and a talk given near Paris last year.</blockquote> <blockquote type=3D"cite" cite>&nbsp;</blockquote> <blockquote type=3D"cite" cite>The main point, of course, was to request a longer article on the state-space embedding technique described by Dmitry. It is an interesting idea that needs more detail made available. Especially the idea of how it could be implemented in a neural system.</blockquote> <blockquote type=3D"cite" cite>&nbsp;</blockquote> <blockquote type=3D"cite" cite>Thanks also for your detailed analysis of autocorrelation results in your mailing.</blockquote> <blockquote type=3D"cite" cite>&nbsp;</blockquote> <blockquote type=3D"cite" cite>-Craig</blockquote> <blockquote type=3D"cite" cite>&nbsp;</blockquote> <blockquote type=3D"cite" cite>---------------------<br> Craig Atencio<br> Department of Bioengineering UCSF/UCB<br> W.M. Keck Center for Integrative Neuroscience UCSF<br> 513 Parnassus Ave.<br> HSE 834, Box 0732<br> San Francisco, CA, 94143-0732, USA</blockquote> <blockquote type=3D"cite" cite><a href=3D"http://www.keck.ucsf.edu/~craig">http://www.keck.ucsf.edu/~craig</a ><br> office: 415-476-1762 (UCSF)<br> cell: 510-708-6346</blockquote> <blockquote type=3D"cite" cite>&nbsp;</blockquote> <blockquote type=3D"cite" cite>&nbsp;<br> <blockquote>----- Original Message -----</blockquote> <blockquote><b>From:</b> <a href=3D"mailto:peter(at)epl.meei.harvard.edu">Peter Cariani</a></blockquote> <blockquote><b>To:</b> <a href=3D"mailto:AUDITORY(at)LISTS.MCGILL.CA">AUDITORY(at)LISTS.MCGILL.CA</a></block= quote > <blockquote><b>Cc:</b> <a href=3D"mailto:craig(at)phy.ucsf.edu">Craig Atencio</a></blockquote> <blockquote><b>Sent:</b> Monday, January 19, 2004 9:37 PM</blockquote> <blockquote><b>Subject:</b> Re: Is correlation any good for pitch perception?</blockquote> <blockquote><br></blockquote> <blockquote>Dear Craig and Eckhard<br> <br> Craig, which of our 1996 
results are you thinking of<br> &quot;that overestimate pitch salience of unresolved harmonics&quot;?<br> I can't think of any offhand -- do you have some of our results<br> confused with those generated by computer models?<br> <br> P. A. Cariani and B. Delgutte, &quot;Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. II. Pitch shift, pitch ambiguity, phase-invariance, pitch circularity, and the dominance region for pitch.,&quot;<i> J. Neurophysiology</i>, vol. 76, pp. 1698-1734, 1996.<br> <br> A difficulty with estimating pitch saliences has been the relative<br> dearth of data on salience per se (Fastl's papers notwithstanding).<br> We stated that the peak-to-mean ratio in population-wide all-order interval distribution<br> qualitatively corresponded to pitch salience. The more general concept is<br> that the salience of the pitch is related to the fraction of intervals related to<br> a particular periodicity (n/F0, where n=3D 1,2,3,....) amongst all others.<br> <br> We found that the salience of high-Fc AM tones just outside the pitch existence<br> region was near 1 and the salience of AM noise was near 1.4. The ordering of the<br> saliences did correspond well to the ordering in Fastl's paper and in the<br> rest of the literature that was then available (pitches of harmonics in the dominance<br> region have higher saliences than higher ones, pitches of resolved harmonics 3-5 had<br> higher saliences than unresolved ones 6-12).<br> <br> Population-interval models (I speak of the results of my own simulations)<br> predict lower saliences as carrier frequencies increase<br> (because of weaker phase locking to the carrier, a smaller fraction of ANFs being<br> driven due to the asymmetry of tuning curves, and the more dispersed nature of<br> intervals associated with envelopes rather than individual partials).</blockquote> <blockquote>They also predict lower saliences with harmonic number. 
If you have a logarithmic<br> distribution of CFs, the higher the harmonic number, the fewer the fibers that are<br> excited (proportionally) by a single harmonic bracketed by others.<br> <br> If you're talking about low F0's (the salience of F0=3D80 Hz harmonic complexes in our paper),<br> subjectively, the 80 Hz pitch of those stimuli is strong (these had components at the<br> fundamental, in contrast to Carlyon's stimuli). Certainly if you go over to the piano and play a melody in the register near 80 Hz, the pitches aren't qualitatively weaker than an octave or two above it.<br> This is a subjective observation.<br> <br> It is true, though, that the early models would not have accounted well for saliences of<br> very low pitches (&lt; 50-60 Hz), because the early models did not discount longer intervals.<br> One of subsequent evolutions of pitch models in the last 10 years has been the realization<br> that lower limit of pitch has implications for interval-based models, e.g.<br> <br> D. Pressnitzer, R. D. Patterson, and K. Krumboltz, &quot;The lower limit of melodic pitch,&quot; J. Acoust. Soc. Am., vol. 109, pp. 2074-2084, 2001.<br> <br> This was driven, I think, in part from Carlyon &amp; Shackleton's (1994) work on low-F0<br> periodicities in high-Fc channels (high harmonic numbers), which has led some of us to<br> hypothesize that high CF channels may have shorter interval analysis windows than lower-CF ones.<br> It may well be the case that there are some differences in processing of intervals according to CF, but this does not change the core hypothesis that there is a global temporal representation of<br> periodicities below around 4 kHz. (I believe that this strong, precise level-invariant representation coexists with a much weaker and coarser (rate) place representation that covers the range of cochlear resonances (50-20,000 Hz; i.e. 
a duplex model much like Licklider's).<br> <br> You have to realize that the earlier studies were simply trying to predict pitch on the basis of interval patterns. We didn't deal with questions around the fringes of the pitch existence region. To deal with these questions, one needs to grapple with the length of the interval analysis windows (what is the longest interval that is analyzed?). An autocorrelation that encompasses indefinitely long lags as indefinitely precise frequency resolution (like a vernier principle) -- we know the frequency resolution of the auditory system is quite fine, but nevertheless limited. Goldstein and Srulowicz assumed first order intervals, which in effect produces an exponential window, but there are many problems with first-order intervals (they are rate-dependent -- pitch representations would shift as SPLs and firing rates increased; the other major problem comes from interference when one has 2 harmonic complexes (say n=3D 1-6) of different F0's -- if they are 20% apart in frequency, their pitches do not obliterate each other in the manner that would be expected if the representations were based on first-order intervals). &quot;Higher-order peaks&quot; are necessary to account for hearing multiple concurrent F0's. This interference is a big problem for first-order intervals and for central representations of pitch that rely on bandpass MTF's.<br> <br> Recently Krumbholz, Patterson, Nobbe, &amp; Fastl have recently probed the form of the (putative) interval analysis windows:<br> <br> [2] K. Krumbholz, R. D. Patterson, A. Nobbe, and H. Fastl, &quot;Microsecond temporal resolution in monaural hearing without spectral cues?,&quot; J Acoust Soc Am, vol. 113, pp. 2790-800, 2003.<br> <br> These refinements of interval models appear to be capable of handling decline of salience of resolved<br> and unresolved harmonics at both ends of the spectrum. 
For low frequencies, salience is limited by<br> the length/shape of the interval analysis window; for high frequencies, it is limited by the factors outlined above.<br> <br> <br> AUTOCORRELATION: REPRESENTATIONS &amp; COMPUTATIONS<br> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<span ></span>=3D=3D<br> <blockquote>Are you really sure that our auditory system uses autocorrelation at all? (Terez)<br> </blockquote> </blockquote> <blockquote><br> &quot;There are indeed at best scant indications for autocorrelation merely inside brain.&quot; (Eckhard)</blockquote> <blockquote><br> We have to be clear about neural representations and analyses (&quot;computations&quot;).<br> <br> It is abundantly clear that the all-order interspike interval distributions have forms that resemble stimulus autocorrelation functions in essential ways, up to the frequency limits of phase-locking.<br> A temporal, interval-based representation of periodicity and spectrum exists at early stages<br> of auditory processing, at least up to the level of the midbrain. There are tens if not hundreds<br> of papers in the auditory literature that support this.<br> <br> The mechanism by which this information is utilized/analyzed by the auditory system is<br> unknown, and I agree that the evidence for neural autocorrelators per se (a la Licklider)<br> is quite scant. However, the evidence for central rate-place representation of pitch<br> =460-pitch and low frequency pure tone pitches as well) is also very weak -- I see no<br> convincing physiological evidence for harmonic templates (BF's &lt;&lt; 5 kHz) or of<br> robust physiological resolution of harmonics higher than the second. 
We want to see<br> a representation of lower-frequency sounds (&lt; 4kHz) that is precise, largely level-invariant,<br> capable of supporting multiple objects (two musical instruments playing different notes),<br> and that accounts for the pitches of pure and complex tones.<br> <br> A central rate-place coding of pitch is not out of the question -- I can imagine<br> ways that we might have missed some kind of covert, distributed representation --<br> but I can also imagine similar (but more elegant) possibilities for central temporal representations,<br> and there is no compelling reason to think that the central codes must be rate-based.<br> Cortical neurons don't behave anything like rate integrators --<br> its evident from the nature of single unit responses that some other functional information-processing principle must be operant; we desperately need fresh ideas -- alternative notions of information processing.<br> <br> It's premature to rule anything out yet.<br> We understand the cortical correlates of very few auditory percepts with any degree<br> of confidence, such that we can say that it is this or that kind of code. Even in vision,<br> where there are orders of magnitude more workers on both cortical and theoretical<br> fronts, as far as I can see this is also the case -- they do not have a coherent<br> theory of how the cortex works -- how the information is represented and<br> analyzed -- they cannot explain how we (and most animals) distinguish triangles from circles<br> or why images disappear when they are stabilized on the retina.<br> <br> I think the central problem in auditory neurophysiology is to determine what becomes of this<br> superabundant, high quality, invariant, interval information. Laurel Carney<br> and Shihab Shamma have proposed models for utilizing phase information, but<br> it remains to be seen whether there is compelling physiological evidence for<br> these models. 
It's also not yet clear to me how these models behave in terms of pitch.<br> I have been working on time-domain strategies that bypass the need to compute<br> autocorrelations explicitly (pitch is notoriously relative, which is not consistent with<br> explicit absolute estimation mechanisms), but I do not yet see any strong positive physiological evidence for these either.<br> <br> Amidst all this equivocation about mechanisms, let us not forget what we know about early representations, i.e. that the interval-based &quot;autocorrelation-like&quot; representation does exist at the levels of the auditory nerve, cochlear nuclei, and midbrain. Even at the midbrain, I think it is likely that the interval-based representation<br> is of higher quality and greater reliability than those based on bandpass MTFs or on rate-place profiles.<br> <br> I think it's likely that, whatever the mechanism turns out to be, it will involve this interval information and it will also be tied in closely with scene analysis mechanisms (harmonicity grouping).<br> <br> If one is fairly certain that the information is temporally-coded to begin with, then one looks first for temporal processing mechanisms.<br> <br> --Peter Cariani<br> <br> Peter Cariani, PhD<br> Eaton Peabody Laboratory of Auditory Physiology<br> Massachusetts Eye &amp; Ear Infirmary</blockquote> <blockquote>243 Charles St., Boston, MA 02114 USA<br> <br> <br> On Friday, January 16, 2004, at 05:34 PM, Craig Atencio wrote:<br> <blockquote>Dear Dmitry,<br> <br> My understanding recently was that autocorrelation may not be the best<br> measure of periodicity pitch because it performs too well. Papers in<br> 1996 by Cariani and Delgutte showed that pitch salience was<br> overestimated for unresolved harmonics. For most everything else the<br> model worked quite well. 
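[Archive editor's note] For readers unfamiliar with the model class under discussion: the core of a Licklider-style autocorrelation pitch estimate can be sketched in a few lines. This is a simplified illustration, not the Cariani &amp; Delgutte (1996) implementation; the sampling rate and signal parameters are arbitrary choices:

```python
import numpy as np

# Hedged sketch: a minimal autocorrelation pitch estimate picks the lag
# of the largest autocorrelation peak beyond lag zero as the period.
# Both a pure tone and a harmonic complex with the same fundamental
# yield the same period estimate, though the pure tone's peak is broad
# while the complex tone's is sharp -- one aspect of the salience
# behavior debated in this thread.

def acf_period(x, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation peak in [min_lag, max_lag]."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0 .. N-1
    return min_lag + int(np.argmax(r[min_lag:max_lag + 1]))

fs = 8000                      # sample rate (Hz), arbitrary for this demo
t = np.arange(0, 0.1, 1 / fs)  # 100 ms of signal
f0 = 200.0                     # fundamental (Hz) -> period of 40 samples

pure = np.sin(2 * np.pi * f0 * t)
complex_tone = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 6))

print(acf_period(pure, 20, 200))          # -> 40 samples (200 Hz)
print(acf_period(complex_tone, 20, 200))  # -> 40 samples
```

The overestimated salience for unresolved harmonics that Craig mentions concerns the height/sharpness of these peaks in the pooled neural version of this computation, not the location of the period estimate itself.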
Note that these studies were a<br> neurophysiological test of Licklider's original autocorrelation idea,<br> where he gave a basic schematic of how autocorrelation might be<br> implemented in a neural system. I recently heard a talk where Delgutte<br> said that he was moving more in the direction of Shamma's earlier work<br> based on spatio-temporal representations of auditory signals.<br> <br> I did read your earlier ICASSP paper and took a look at the Matlab<br> files on the website. Unfortunately, the code is not available for<br> viewing and the ICASSP paper lacks some detail. I, for one, was really<br> intrigued by your idea, so I kindly suggest that you write a longer<br> paper and submit it to JASA. The editors there are always interested<br> in pitch, as are the reviewers and readers. (I believe that last year<br> Pierre Divenyi proposed this as well.)<br> <br> You also mentioned that a basic analog system could implement your<br> idea. Including that in the JASA paper would be great too. It would be<br> helpful to see that so we could determine which neural center, if any,<br> might be able to implement your idea.<br> <br> Best wishes,<br> <br> Craig<br> <br> ---------------------<br> Craig Atencio<br> Department of Bioengineering UCSF/UCB<br> W.M. 
Keck Center for Integrative Neuroscience UCSF<br> 513 Parnassus Ave.<br> HSE 834, Box 0732<br> San Francisco, CA, 94143-0732, USA<br> http://www.keck.ucsf.edu/~craig<br> office: 415-476-1762 (UCSF)<br> cell: 510-708-6346<br> <br> <br> ----- Original Message -----<br> From: &quot;Dmitry Terez&quot; &lt;terez(at)SOUNDMATHTECH.COM&gt;<br> To: &lt;AUDITORY(at)LISTS.MCGILL.CA&gt;<br> Sent: Friday, January 16, 2004 1:21 PM<br> Subject: Is correlation any good for pitch perception?<br> <blockquote>Dear Auditory List Members,<br> <br> I would like to convey some thoughts on the much-discussed subject of<br> how the human auditory system can use autocorrelation analysis for pitch<br> perception.<br> <br> Are you really sure that our auditory system uses autocorrelation at all?<br> Has anybody seen it really happening in the brain? As far as I understand<br> (forgive me if I am wrong), beyond the cochlea not much is really<br> known about the real mechanism behind our exceptionally robust human<br> pitch perception. I am not an auditory scientist, but it just looks<br> to me that the correct answer on your part is "We do not know".<br> <br> I do think that the correlation function has two fatal drawbacks, as far<br> as pitch detection is concerned (I am talking about a classical auto-<br> or cross-correlation function of a signal, that is, a multiply-and-add<br> type of operation, as defined in any textbook on signal processing).<br> <br> The first fatal drawback of correlation is the abundance of secondary<br> peaks due to the complex harmonic structure of a signal. For some real<br> signals we are dealing with every day, such as speech, the secondary<br> peaks in the correlation function due to speech formants (vocal tract<br> resonances) are sometimes about the same height as the main peaks due<br> to signal periodicity (pitch).<br> <br> The second fatal drawback of correlation is its pitch strength<br> (salience) property for simple and complex tones. In other words, the<br> main peaks in the correlation function computed for, e.g., a simple<br> sine wave are too wide. Meanwhile, I would expect a simple tone to<br> cause the same or even stronger pitch sensation than a complex tone<br> with the same fundamental frequency.<br> <br> I think that it would be strange if evolution resulted in such a<br> suboptimal mechanism of perceiving sound periodicity.<br> <br> As some of you may know, we recently introduced a new, revolutionary<br> concept of pitch detection. It has nothing to do with correlation<br> (although one can see some similarity) or the spectrum of a signal.<br> It is based on "unfolding" a scalar signal in several dimensions -<br> a concept of signal "embedding", as it is called in nonlinear and<br> chaotic signal processing.<br> The ICASSP paper and the Matlab demo are available from<br> http://www.soundmathtech.com/pitch<br> <br> You can also read our US patent application publication No. 20030088401<br> at http://www.uspto.gov/patft<br> <br> Although a purely digital implementation is described, I can build a<br> simple analog electro-mechanical device (basically a mechanical part<br> followed by a two-dimensional grid of "neurons" for projecting an<br> output) that is based on the same principle and is exceptionally<br> robust at detecting pitch.<br> <br> My question is: Can our auditory system use this type of processing<br> for pitch perception?<br> <br> Is it possible to find some mechanism that can perform this kind of<br> processing, perhaps between the cochlea and the brain?<br> <br> I do not expect a quick answer. Please, take your time, maybe the next 10<br> years...<br> <br> Also, I would like to add that although words like "chaos<br> theory", "phase space" or "signal embedding" might seem not relevant<br> to your research on pitch perception, they are now, in fact. This is<br> an entirely new game...<br> <br> <br> Best Regards,<br> <br> Dmitry E. Terez, Ph.D.<br> <br> SoundMath Technologies, LLC<br> P.O. Box 846<br> Cherry Hill, New Jersey, 08003<br> USA<br> e-mail: dterez AT soundmathtech.com</blockquote> </blockquote> </blockquote> </blockquote> </blockquote> <div><br></div> </body> </html>


This message came from the mail archive
http://www.auditory.org/postings/2004/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University