
Re: Temporal Envelope based pitch perception

Dear Imran,

It sounds to me like you might be noticing what Schouten (1940) and later Schouten, Ritsma, and Cardozo (1962) described as the first effect of pitch shift: the pitch of a complex stimulus depends on the rate of modulation (or the beating caused by the frequency separation among components), but it also moves up and down in linear proportion to the center frequency of the complex. The second effect of pitch shift is a further shift related to the spacing of the components. These effects can easily shift the perceived pitch by 50-60 Hz, as shown in the 1962 paper.
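
As a rough illustration of the first effect, here is a minimal sketch that assumes the usual first-order approximation: for a complex centered at fc with component spacing g, the most prominent pitch lies near fc/n, where n is the integer closest to fc/g. The function name and the example values are illustrative only and are not taken from the paper, and the sketch ignores the second effect entirely.

```python
def first_effect_pitch(center_hz, spacing_hz):
    """Approximate residue pitch under the first effect of pitch shift.

    Assumption: the pitch is center_hz / n, with n the integer nearest to
    center_hz / spacing_hz. Near half-integer ratios the percept is
    ambiguous, so the rounding there should not be taken literally.
    """
    n = max(1, round(center_hz / spacing_hz))
    return center_hz / n


# Spacing (modulation rate) fixed at 200 Hz; only the center frequency moves.
for fc in (2000.0, 2030.0, 2060.0):
    print(fc, round(first_effect_pitch(fc, 200.0), 1))
```

With the spacing held at 200 Hz, moving the center from 2000 Hz to 2060 Hz moves the predicted pitch from 200 Hz to 206 Hz, even though the envelope rate has not changed.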

Erick Gallun

Research Investigator
National Center for Rehabilitative Auditory Research
Portland VA Medical Center

Schouten, Ritsma, and Cardozo (1962). "Pitch of the Residue," JASA, 34, 1418-24.

>On Tue, 2010-02-02 at 16:51 +0100, Imran Dhamani wrote:
>> Hi everyone.
>> I have a question about envelope-based pitch perception and would be
>> grateful for any answers. Thanks in advance.
>> According to the studies that I have read so far on the importance of
>> temporal envelope cues in speech perception, the pitch/fundamental
>> frequency (F0) can be reliably represented by temporal envelope cues
>> alone in normal-hearing, hearing-impaired, and cochlear-implanted
>> listeners (at least within a certain range of F0). In a simple
>> laboratory experiment I also found that my subjective judgement of the
>> pitch of speech sounds (words/sentences), as a trained listener, was
>> within about 50-60 Hz of the objective estimate of the pitch/F0
>> obtained with LPC/autocorrelation or cepstral analysis in Matlab and
>> Praat.
>> In another series of experiments, I channel vocoded speech sounds
>> (alternately using a 500 Hz sine wave carrier and a broadband noise
>> (BBN) carrier) with envelope cut-off frequencies ranging from 50 to
>> 500 Hz and 8 to 24 bands (based on the Greenwood function/map). There
>> was a drastic mismatch between the objective estimates of the
>> fundamental frequency/pitch of the original and the vocoded stimuli
>> across all conditions (for example, if the pitch of the original
>> stimulus was 120 Hz, the objectively estimated pitch of the vocoded
>> stimulus was around 70-80 Hz).
>> Moreover, the mismatch between the original and vocoded stimuli was
>> smaller with the sine wave carrier and with higher envelope cut-off
>> frequencies. In the next set of trials I generated various
>> pitch-shifted versions (relatively preserving the temporal
>> information) of the same set of speech stimuli and vocoded them using
>> the same variables. Surprisingly, I found no drastic change (just a
>> 10-20 Hz change) in the objectively estimated pitch even when I
>> shifted the original stimulus pitch by a ratio of 70 (F0=220-250 Hz).
>> Later I processed the speech stimuli with a cochlear implant
>> simulation using carrier rates from 400 to 10000 and 10 to 22
>> channels, and found similar (within 5-10 Hz) objectively estimated
>> pitch values for the original and simulated stimuli. My questions
>> are as follows:
>> 1) Are these findings due to a technical error (probably in the
>> objective pitch estimation of the vocoded stimuli) or some other
>> mistake? (Or can subjective findings mask objective data?)
>> 2) Is pitch representation solely dependent on temporal envelope
>> cues, or are there other contributors, such as carrier frequency
>> (other than the Nyquist-Shannon theorem), envelope cut-off, envelope
>> extraction method, temporal analysis/sample length, etc., which may
>> also play a major role?
>> 3) How is the pitch information encoded and extracted in such a
>> complex temporal envelope of speech sounds (is it completely different
>> from the periodicity-based or spectrum-based pitch extraction mode)?
>> 4) If I band-pass filter the envelope information (based on auditory
>> filters), will the filter/channel containing the pitch/F0
>> information have a different (probably more periodic) envelope than
>> the others, and might the auditory system extract the pitch
>> information from the complex envelope in this way?
>> Best regards,
>> Imran Dhamani 
>> PhD student.