Re: A new paradigm?(On pitch and periodicity (was "correction to post"))

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: A new paradigm?(On pitch and periodicity (was "correction to post"))
From: Ranjit Randhawa <rsran@xxxxxxxxxxx>
Date: Thu, 8 Sep 2011 11:44:12 -0400

Dear Steve,

The model I am proposing depends on analyzing frequency at each point along the basilar membrane (BM), with no filter banks. This means that the magnitude of that frequency can be given in terms of the magnitudes of its harmonics, using the rate of change of energy directly by summation. It also means that the harmonic series is limited by the upper range of the cochlea, 20 kHz, so the number of terms in the harmonic series decreases as higher frequencies are considered. Since the number of terms for the higher frequencies is limited, I conjectured that this is the reason why phase locking tends to decrease above about 4 kHz, and why the quality of the sound decreases compared with a tone at a much lower frequency, which will have many more terms in its harmonic series.

The only way to proceed with the analysis, at least as I have discovered so far, is to start with the highest component of the series: the highest associated frequency is evaluated first and subtracted before the next lower harmonic is evaluated. By the time the lower-numbered harmonics are evaluated, the ones that tend to define pitch, the signal is fairly clean. Hence, noise enhancement due to the dx(t)/dt part of the rate of change of energy (x(t)*dx(t)/dt) is removed automatically.

Since magnitude is available directly from the summation of the rate of change of energy, the phase of each harmonic can be determined by choosing the maximum magnitude from the results obtained by rotating the input vector, which is sized to equal the wavelength of the frequency being analyzed. The amount of rotation is limited, as it depends on the harmonic being analyzed, and the point at which the maximum is found also defines the phase of the harmonic. The use of energy allows for such a criterion.

For a periodic signal, there will be one frequency at which the maximum sum of the magnitudes of the harmonic series components equals the total evaluated by summing the absolute value of the rate of change of energy, providing a means of choosing the fundamental. This is more complicated than using a modified form of auto-correlation, but I felt it was required to allow an explanation of the "party" effect.

I did want to clarify that one is not using a filter bank at all, since I don't believe that such a thing actually exists in wetware. Hence, it was necessary that the approach include a means by which the higher frequency components can be removed and their impact on the overall signal noted. I have tried to understand your reference to the Teager energy operator, and have to admit that my mathematical skills were not up to it. I have tried to approach the problem at a more fundamental level and hope that this clarification provides additional details.
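
As a rough illustration of two points made above (the shrinking number of harmonic terms below 20 kHz, and the top-down order of evaluation with subtraction), here is a minimal Python sketch. It is not the method described in this post: the amplitude and phase of each harmonic are found by a least-squares fit to a sine/cosine pair, which stands in for the energy-rate summation and vector-rotation search. The names harmonic_count and peel_harmonics, the sampling rate, and the test signal are all hypothetical.

```python
import numpy as np

UPPER_LIMIT_HZ = 20_000.0  # upper range of the cochlea as stated in the post


def harmonic_count(f0_hz, upper_hz=UPPER_LIMIT_HZ):
    """Number of harmonics of f0_hz (fundamental included) at or below upper_hz."""
    return int(upper_hz // f0_hz)


def peel_harmonics(x, fs_hz, f0_hz, upper_hz=UPPER_LIMIT_HZ):
    """Estimate harmonics from the highest down, subtracting each before the next.

    Assumption: each harmonic is fitted by least squares onto a cosine/sine
    pair. This is a stand-in for the energy-based evaluation in the post.
    Returns ({k: (amplitude, phase)}, residual), using the convention
    amplitude * cos(2*pi*k*f0*t + phase).
    """
    t = np.arange(len(x)) / fs_hz
    residual = np.asarray(x, dtype=float).copy()
    # Keep only harmonics below both the stated limit and the Nyquist frequency.
    ks = [k for k in range(1, harmonic_count(f0_hz, upper_hz) + 1)
          if k * f0_hz < fs_hz / 2]
    found = {}
    for k in reversed(ks):  # highest harmonic first
        c = np.cos(2 * np.pi * k * f0_hz * t)
        s = np.sin(2 * np.pi * k * f0_hz * t)
        (a, b), *_ = np.linalg.lstsq(np.column_stack([c, s]), residual, rcond=None)
        found[k] = (float(np.hypot(a, b)), float(np.arctan2(-b, a)))
        residual -= a * c + b * s  # remove this harmonic before going lower
    return found, residual


# Fewer terms are available as the frequency under analysis rises.
low_terms = harmonic_count(100.0)     # 200 terms
high_terms = harmonic_count(4000.0)   # 5 terms

# Hypothetical test signal: three harmonics of 200 Hz, 20 whole periods long.
fs_hz = 48_000.0
t = np.arange(4800) / fs_hz
signal = (1.0 * np.cos(2 * np.pi * 200 * t)
          + 0.5 * np.cos(2 * np.pi * 400 * t + 0.3)
          + 0.25 * np.cos(2 * np.pi * 600 * t))
found, residual = peel_harmonics(signal, fs_hz, 200.0)
```

With an analysis window that spans a whole number of periods, the harmonics are mutually orthogonal, so the order of subtraction does not change the fitted values in this sketch. The top-down order matters in the post because of the dx(t)/dt term, which the sketch does not model.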

Regards,
Randy Randhawa


On 9/7/2011 7:05 AM, Steve Beet wrote:

Hi Ranjit,

In respect of the paragraph below, what you're suggesting is essentially the same as the Teager energy operator. I applied a "stabilised" form of this idea to the output of an auditory filter-bank, loosely based on a very early version of Dick Lyon's auditory model, in the late 1980s. I extended it to include estimates of the signal energy, the phase velocity of the travelling wave within the cochlea (analogous to Yegnanarayana's "modified group delay"), and the dominant frequency at each point along the basilar membrane. There are some examples of these parameters in this paper:

http://stevebeet.supanet.com/assets/archives/IOA92.zip

and a more detailed description of the analysis method is in this one (I don't have an electronic copy for this I'm afraid):

S. W. Beet, "Automatic speech recognition using a reduced auditory representation and position-tolerant discrimination", Computer Speech and Language, Vol. 4, pp. 17-33, January 1990.

It might be worth taking a look at these before trying your ideas out - the presence of the dx(t)/dt term in your equation will make any results very susceptible to background noise and distortion unless you take some measures akin to those described in the Computer Speech and Language paper.
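
For readers unfamiliar with the Teager energy operator mentioned above, a minimal Python sketch of its usual discrete form (often called the Teager-Kaiser operator) follows. It is the textbook operator, not the "stabilised" form described in the Computer Speech and Language paper, and the function name, sampling rate, tone parameters, and noise level are assumptions chosen for illustration.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the first and
    last samples have no neighbour on one side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Hypothetical test tone: 1 kHz, amplitude 0.5, sampled at 16 kHz.
fs_hz = 16_000.0
f_hz = 1_000.0
amp = 0.5
n = np.arange(1024)
omega = 2 * np.pi * f_hz / fs_hz          # radians per sample
tone = amp * np.cos(omega * n + 0.7)

psi = teager_energy(tone)
# For a pure tone the operator is constant: amp**2 * sin(omega)**2.
expected = amp ** 2 * np.sin(omega) ** 2

# A small amount of additive noise (assumed level) makes the output fluctuate,
# which illustrates the sensitivity to background noise mentioned above.
rng = np.random.default_rng(0)
psi_noisy = teager_energy(tone + 0.01 * rng.standard_normal(n.size))
spread_clean = float(psi.std())
spread_noisy = float(psi_noisy.std())
```

For the clean tone the output depends on both amplitude and frequency, which is why the operator is often used as a joint amplitude and frequency tracker.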

Good luck with your ideas!

Steve Beet



On Tue, 6 Sep 2011 12:53:12 -0400
Ranjit Randhawa <rsran@xxxxxxxxxxx> wrote:

If one were to consider a pure sinusoid in the phase domain (one where
the axes are x(t) and dx(t)/dt), the locus would be a circle. The area
of this circle would give us the magnitude, though how to determine this
requires a different approach as the integration over 2pi would be zero.
If we consider the product x(t)*dx(t)/dt as the rate of change of energy
it would have a sign associated with it, then it is possible to
determine this area, though the resulting algorithm would be too simple
and fall apart for more complex signals since we don't know the period.
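
The paragraph above can be checked numerically. For x(t) = A*sin(w*t), the product x(t)*dx(t)/dt equals (A^2 * w / 2) * sin(2*w*t), so its integral over one period is zero, while the integral of its absolute value over one period is 2*A^2 whatever the frequency. Strictly, the locus in the (x, dx/dt) plane is an ellipse with semi-axes A and A*w, and a circle only when w = 1. The Python sketch below assumes the period is known, which is the limitation the paragraph itself points out. The function name, sampling rate, and tone parameters are hypothetical.

```python
import numpy as np


def energy_rate(x, fs_hz):
    """x(t) * dx(t)/dt, with the derivative estimated by finite differences."""
    return x * np.gradient(x, 1.0 / fs_hz)


# Hypothetical test tone: 500 Hz, amplitude 0.8, exactly one period of samples.
fs_hz = 100_000.0
f_hz = 500.0
amp = 0.8
samples_per_period = int(round(fs_hz / f_hz))      # 200 samples
t = np.arange(samples_per_period) / fs_hz
x = amp * np.sin(2 * np.pi * f_hz * t)

rate = energy_rate(x, fs_hz)
dt = 1.0 / fs_hz
signed_total = float(np.sum(rate) * dt)            # close to zero
absolute_total = float(np.sum(np.abs(rate)) * dt)  # close to 2 * amp**2

# Recover the amplitude from the absolute total.
amp_estimate = float(np.sqrt(absolute_total / 2.0))
```

Because the absolute total does not depend on frequency, it gives the magnitude directly once the span of one period is known. Finding that span for a complex signal is the hard part.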
