Re: Time and space (Peter Cariani)


Subject: Re: Time and space
From:    Peter Cariani  <peter(at)epl.meei.harvard.edu>
Date:    Mon, 9 Jun 1997 16:34:40 +0000

Neil Todd wrote:

> Dear Peter
> On Thu Jun 5 22:42 BST 1997 Peter Cariani wrote:
> > Lastly, in Bertrand's and my work in the auditory nerve
> > it also became apparent to me that what is needed for
> > periodicity pitch is an autocorrelation-like analysis
> > of periodicity pitch, .....
>
> > My impression is that the
> > modulation detector idea will not work for the pitch
> > shift of AM tones (first effect of pitch shift, or
> > de Boer's rule)......
>
> > I think these are critical issues for models based on
> > time-to-place that I tried to bring up in Montreal last
> > summer (I didn't do it to give you a hard time, I promise).
>
> Yes, it is very clear that you are a strong proponent of this
> class of model and certainly it is true that power spectrum
> type models only respond strongly to first order intervals.
> However, if you had actually read the proceedings (and
> listened to what I said then) you would have noticed that I did
> indeed address these issues. Autocorrelation models are good,
> but they are not perfect. To save you looking up your copy of
> the Proc. ICMPC96 I quote below the relevant section.
>
> "The next phenomenon we consider is that of virtual pitch
> shift. Virtual pitch shift is described as the shift in
> perceived pitch of a complex in which all the partials have
> been shifted by a constant amount, so that they are no longer
> harmonic. In the literature there are generally two distinct
> accounts of pitch shift (Hartmann and Doty, 1996). Proponents
> of temporal theories argue that pitch shift can be only
> accounted for in the fine temporal structure of the AN
> response, since the envelope remains invariant. Earlier
> place-pattern accounts are that pitch shift results from the
> disparity between a central pattern and the excitation pattern,
> i.e. a best fitting harmonic series.

Whether represented in terms of temporal autocorrelation or in
terms of spectral pattern analysis, the Schouten-de Boer
experiments indicated that the fine structure of the stimulus
must be represented, either the fine structure of the waveform
or the fine structure of the power spectrum.

> It is of interest to see if pitch shift can be predicted by
> [place-time] pattern matching against the sensory memory traces
> without the spiking component. The signals for this example
> were obtained from Example 21 of the ASA Auditory
> Demonstrations CD and consist of the 4th, 5th and 6th harmonics
> of a 200 Hz complex. As would be expected (without spiking) as
> the three partials shift upwards their interaction terms remain
> relatively invariant at 200 Hz. However, the pitch strength as
> measured by response of the recognition space (Figure 2) does
> indeed show the correct type of pitch shift including pitch
> ambiguity. The strongest response is obtained for the harmonic
> complex 800, 1000, 1200 Hz, although the estimated pitch is a
> little less than 200 Hz. The 860, 1060, 1260 Hz complex
> produces a pitch estimate at about 210 Hz. The 900, 1100, 1300
> Hz complex produces an apparent bimodal response with one
> estimate at about 215 Hz and another at about 185 Hz. The 960,
> 1160, 1360 Hz complex also gives a bimodal response with
> estimates of about 230 Hz and 190 Hz.

I looked up your papers again in the ICMPC proceedings. I do
remember thinking that the IC modulation tunings were way sharper
than anything I had seen in the literature; looking at the first
paper (fig. 2) there are multiple, sharp peaks for Fm's of 200,
400, and 800 Hz. I'm not at all knocking your valiant attempt to
make a model that spans several processing levels -- I think this
is a good thing to strive for -- but do these response patterns
seem at all "physiological" to you? So much rests on the sharpness
of that tuning.......

Looking at Fig. 2 in the second paper -- the one on pitch shift --
you get an estimated pitch of 195 Hz for the perfectly harmonic
case (800, 1000, 1200). For the shifted case (860, 1060, 1260),
de Boer's rule would predict pitches at 215 and 172 Hz, whereas
your plot shows a global maximum at 205 Hz. The peak at 185 Hz is
yet smaller than one at 195 Hz. What psychophysical results were
these estimated pitches compared against?

> Clearly this simple pattern matching model does seem to give an
> account of pitch shift without fine structure. How then may
> this be reconciled with the fine structure account? In order to
> investigate the effect of fine structure in the model, a
> shifted complex (900, 1100, 1300 Hz) was presented to the model
> including the spiking component (see Figure 3). It is clear
> from figure 3 that although fine structure appears to be only
> weakly represented in the model it does have the effect of
> shifting the interaction term to about 215 Hz. It appears then
> that both fine structure and central matching contribute to
> pitch shift. It may be that the central auditory system
> combines envelope and neural timing information (Cooke and
> Brown, 1994) into a single image which is available for
> learning and recognition. It is known that there are some small
> discrepancies between current timing models and experimental
> data. e.g. the slope of the pitch shift in the case of Meddis
> and Hewitt (1991) and the predicted mistuning of the
> fundamental in the case of Hartmann and Doty (1996). It may be
> that the hybrid model proposed here may account for these small
> discrepancies of the timing models, but this requires testing."
> [Proc. ICMPC 96. p. 180-181.]

But for this stimulus (900, 1100, 1300) I would expect very weak
low pitches: a pitch at the true fundamental (100 Hz), and fainter
ambiguous pitches around 185 Hz and 222 Hz.
Your model doesn't predict any of these discrete pitches -- the
estimates have a broad, high-pass character starting with about
190 Hz. The model doesn't seem to work......

> To be quite honest, I am quite agnostic as to what mechanism is
> responsible at a sub-cortical level for the time to place
> mapping. Both power spectra and autocorrelation (which are
> Fourier twins don't forget) do this. Actually, the most
> neurologically plausible model I have seen is Gerald Langner's
> (neither autocorrelation nor power spectrum) since it includes
> both DCN and VCN components. Either way it does not alter the
> cortical model I have proposed.

All I am suggesting, as gently as possible, is that you should
pay more attention to the nature of the elements that are supposed
to be carrying out your informational operations -- just check to
see if they in any way resemble what is seen physiologically. Even
given highly tuned elements, the model doesn't seem to work --- so
it's time to either go back and recheck for errors of
implementation or time to rethink your basic assumptions.

Peter Cariani
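
For readers following the numbers in this exchange, the
fine-structure (autocorrelation) account of pitch shift can be
sketched numerically. This is only a minimal illustration under
stated assumptions, not the model of either correspondent: it
assumes three equal-amplitude partials, uses the textbook result
that the normalized autocorrelation of such a sum is the mean of
cos(2*pi*f*lag) over the partials, and searches an arbitrarily
chosen 150-260 Hz pitch range. The function names and parameters
are hypothetical.

```python
import math


def normalized_acf(lag_s, partials_hz):
    """Normalized autocorrelation of a sum of equal-amplitude sinusoids.

    Assumption: equal amplitudes and distinct frequencies, so the ACF is
    simply the mean of cos(2*pi*f*lag) over the partials.
    """
    return sum(math.cos(2 * math.pi * f * lag_s) for f in partials_hz) / len(partials_hz)


def pitch_candidates(partials_hz, lo_hz=150.0, hi_hz=260.0, step_s=1e-6):
    """Return (pitch_hz, acf_value) pairs for local ACF maxima, strongest first.

    The search range and lag step are illustrative choices, not values
    taken from the message above.
    """
    n_start = round(1.0 / hi_hz / step_s)   # shortest lag <-> highest pitch
    n_stop = round(1.0 / lo_hz / step_s)    # longest lag <-> lowest pitch
    lags = [n * step_s for n in range(n_start, n_stop + 1)]
    values = [normalized_acf(lag, partials_hz) for lag in lags]

    peaks = []
    for i in range(1, len(values) - 1):
        # Interior local maximum of the sampled ACF.
        if values[i - 1] < values[i] >= values[i + 1]:
            peaks.append((1.0 / lags[i], values[i]))
    return sorted(peaks, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    for partials in ([800.0, 1000.0, 1200.0], [860.0, 1060.0, 1260.0]):
        print(partials)
        for pitch_hz, strength in pitch_candidates(partials)[:2]:
            print(f"  candidate pitch {pitch_hz:6.1f} Hz, ACF {strength:.3f}")
```

For the harmonic complex the strongest candidate sits at the 200 Hz
fundamental; for the shifted complex the strongest candidate moves
above 200 Hz even though the 200 Hz spacing of the partials is
unchanged, with a weaker second candidate below 200 Hz. The exact
values from this sketch are not claimed to reproduce the de Boer's
rule figures or the model estimates quoted in the message, which
may rest on different assumptions.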


This message came from the mail archive
http://www.auditory.org/postings/1997/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University