
Re: A new paradigm?(On pitch and periodicity (was "correction to post"))

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: A new paradigm?(On pitch and periodicity (was "correction to post"))
From: Ranjit Randhawa <rsran@xxxxxxxxxxx>
Date: Fri, 9 Sep 2011 11:00:48 -0400
Approved-by: rsran@xxxxxxxxxxx
Comments: To: Steve Beet <steve.beet@xxxxxxxx>
Delivery-date: Fri Sep 9 12:11:49 2011
In-reply-to: <14368_1315501792_4E68F6E0_14368_184_1_20110908175606.8cd6a00d.steve.beet@xxxxxxxx>
Reply-to: Ranjit Randhawa <rsran@xxxxxxxxxxx>
Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.1) Gecko/20110830 Thunderbird/6.0.1

Hi Steve,

Thanks for your comments. I believe the hangup is in some of the details of how summation of the bivalent rate of change of energy signal should be handled. I make the assumption that the vector length is equal to the wavelength of the frequency being considered, and within this vector the change of signs will take place at an edge, defined to occur when the rate of change of energy is zero. For a pure tone there are basically four edges in the phase domain that define the wavelength. The way that I use the concept of "evaluative bivalence", mentioned earlier, is to sum each side of an edge separately, and then calculate a sum based on equal absolute values on each side of the edge, leaving behind a remainder that then becomes available to a lower harmonic. Of course there is no remainder when the positive sum on one side of an edge is equal to the negative sum on the other, which should occur for well defined resonant systems.

I am sure this is about as clear as mud, but it should become a bit clearer by playing around with a circle on a phase diagram and then considering this description in terms of two halves, a 180 deg phase shift, for different starting points on this circle. The basic rules should become obvious.

The nice thing about this approach, in my humble opinion, is that one does not consider a sinusoid as a single value but as a harmonic series, and therefore combination tones fall out automatically, not to mention dichotic hearing, which forces one to consider that tonal resolution takes place at a higher level of the CNS rather than at the peripheral system. But I will have to leave that line of discussion to a future time.

I do have some results, but unfortunately I don't have the ability to provide a URL for wider distribution. I will send you separately a Microsoft Word .docx file that I hope you will be able to open and peruse. Any comments would be welcome.
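
One possible reading of the edge-based summation described at the start of this message, for a pure tone only, can be sketched as follows. This is an illustrative interpretation, not the implementation behind the results mentioned here. The function names, the midpoint sampling, and the pairing rule (matched part = smaller absolute sum, remainder = the difference) are all assumptions made for the sketch.

```python
import math

def energy_rate_samples(amplitude, freq_hz, sample_rate_hz):
    """x(t)*dx/dt over one period of a pure tone, sampled at bin midpoints."""
    omega = 2.0 * math.pi * freq_hz
    n = int(round(sample_rate_hz / freq_hz))
    rates = []
    for k in range(n):
        t = (k + 0.5) / sample_rate_hz  # midpoints avoid landing exactly on an edge
        x = amplitude * math.sin(omega * t)
        dx = amplitude * omega * math.cos(omega * t)  # analytic derivative
        rates.append(x * dx)
    return rates

def segment_sums(rates, dt):
    """Sum the rate between successive edges (sign changes of the rate)."""
    sums = []
    current = 0.0
    previous_positive = None
    for r in rates:
        positive = r >= 0.0
        if previous_positive is not None and positive != previous_positive:
            sums.append(current)  # an edge was crossed: close this segment
            current = 0.0
        current += r * dt
        previous_positive = positive
    sums.append(current)
    return sums

def match_across_edges(sums):
    """Assumed pairing rule: for each edge, return (matched, remainder)."""
    return [(min(abs(a), abs(b)), abs(abs(a) - abs(b)))
            for a, b in zip(sums, sums[1:])]

sample_rate_hz = 100_000.0
rates = energy_rate_samples(amplitude=1.0, freq_hz=100.0,
                            sample_rate_hz=sample_rate_hz)
sums = segment_sums(rates, dt=1.0 / sample_rate_hz)
pairs = match_across_edges(sums)
print(sums)
print(pairs)
```

For a unit-amplitude tone the four segments between edges come out with alternating sign and equal magnitude, so under this reading no remainder is left over.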

Thanks again and of course the best,
Randy

On 9/8/2011 12:56 PM, Steve Beet wrote:

Hi again, Randy,

A brief postscript:

Another problem which I forgot to mention is that the displacement of the basilar membrane and/or cilia is rarely a simple damped sinusoid, especially at the apical end. Even if the speech signal is "clean", you can have two or more harmonics at similar amplitudes at the same point on the membrane, and the simple x(t).dx(t)/dt expression you mentioned can give rapidly varying (even negative) values for instantaneous "energy". You'll need to do something to convert these values to a more "robust" form - in my RAR analysis I dealt with it by applying a signal-dependent weighting function to the instantaneous values and integrating over a short period of time. This is simple and can be justified mathematically, but you might be able to come up with something better!

Cheers.

Steve Beet


On Thu, 8 Sep 2011 17:17:58 +0100
Steve Beet <steve.beet@xxxxxxxx> wrote:

Dear Randy,

I think we might be talking at cross purposes here. The "filter bank" I referred to could equally well be called a "linear model of wave propagation within the cochlea". The filter-bank I used was simply an empirical linear approximation to a real (non-linear, and quite possibly non-deterministic) transfer function between the incident pressure wave and the displacement of the cilia on the basilar membrane.

I should mention perhaps that there is no way I would claim my "reduced auditory representation" is either complete or precise in its characterisation of human perception - that was not my intention.

The point I was trying to make in my email was that your characterisation of "signal energy" as "x(t).dx(t)/dt" is essentially the same as the Teager energy operator which has been widely investigated, both in the context of auditory-inspired, and conventional linear (Fourier-based) signal analysis.
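
For readers who have not met it, the Teager energy operator is usually written in discrete time (Kaiser's formulation) as psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below is background only: it applies that standard textbook form to a sampled pure tone, and it does not attempt to reproduce either the x(t).dx(t)/dt product or the "stabilised" variant discussed in this thread. The function names are hypothetical.

```python
import math

def tone(amplitude, freq_hz, sample_rate_hz, n_samples):
    """Sampled pure tone amplitude*sin(2*pi*f*n/fs)."""
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [amplitude * math.sin(step * n) for n in range(n_samples)]

def teager_kaiser(samples):
    """Discrete Teager-Kaiser operator, defined for interior samples only."""
    return [samples[n] ** 2 - samples[n - 1] * samples[n + 1]
            for n in range(1, len(samples) - 1)]

sample_rate_hz = 16_000.0
psi_1k = teager_kaiser(tone(1.0, 1000.0, sample_rate_hz, 64))
psi_2k = teager_kaiser(tone(1.0, 2000.0, sample_rate_hz, 64))

# For a pure tone the output is the constant A^2 * sin^2(2*pi*f/fs),
# so it grows with frequency as well as with amplitude.
print(psi_1k[0], psi_2k[0])
```

The frequency dependence visible here is the same kind of high-frequency emphasis raised later in this message.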

I also wanted to point out that this direct characterisation of signal energy as the sum of kinetic and potential energy in the system, calculated as "x(t).dx(t)/dt", is fine if you're analysing a simple resonant system, but in the real world, the acoustic environment is complicated (even if it is only a single voice) and the numerical results usually become swamped by any higher frequencies in the signal being analysed. I was merely trying to suggest that you need to consider the practicalities of how you measure "energy" before you commit yourself to a particular set of equations.

It's all well and good to say "start with the highest of the series" but with the definition of energy which you propose, the energy at higher frequencies will be over-emphasised and I suspect you will have trouble differentiating between background and/or quantisation noise and the highest harmonics of the signal. However, I might be wrong - and I'd love to hear about the results of any experiments you do.
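
A small numerical sketch of the over-emphasis concern: for a pure tone A.sin(wt), the product x(t).dx(t)/dt equals (A^2.w/2).sin(2wt), so its average absolute value over a fixed time window is 2.A^2.f and grows in proportion to frequency. The code below checks this for two equal-amplitude tones. The function name, the 10 ms window, and the sample rate are assumptions chosen for illustration, and the derivative is taken analytically rather than estimated from samples.

```python
import math

def mean_abs_energy_rate(amplitude, freq_hz, sample_rate_hz, window_s):
    """Mean of |x(t)*dx/dt| for a pure tone over a fixed time window."""
    omega = 2.0 * math.pi * freq_hz
    n = int(round(window_s * sample_rate_hz))
    total = 0.0
    for k in range(n):
        t = k / sample_rate_hz
        x = amplitude * math.sin(omega * t)
        dx = amplitude * omega * math.cos(omega * t)  # analytic derivative
        total += abs(x * dx)
    return total / n

low = mean_abs_energy_rate(1.0, 400.0, 160_000.0, 0.01)
high = mean_abs_energy_rate(1.0, 4000.0, 160_000.0, 0.01)
ratio = high / low
print(low, high, ratio)
```

Two tones of the same amplitude a decade apart in frequency therefore differ by roughly a factor of ten in this measure, which is why a weak high-frequency component or broadband noise can dominate the result.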

Again, good luck with all this!

Steve Beet



On Thu, 08 Sep 2011 11:44:12 -0400
Ranjit Randhawa <rsran@xxxxxxxxxxx> wrote:

Dear Steve,
The model I am proposing depends on analyzing frequency at each point
along the BM (no filter banks), which then means that magnitude of that
frequency can be given in terms of the magnitude of its harmonics, based
on using the rate of change of energy directly by summation. What this
then means is that the harmonic series is limited by the upper range of
the cochlea, 20 kHz, and the number of terms of the harmonic series will
decrease as higher level frequencies are considered. Since the number of
terms for the higher frequencies is limited, it was conjectured by me
that it was the reason why phase locking tends to decrease above about 4
kHz, and the quality of the sound decreases as compared with a tone at
much lower frequencies which will have many more terms in the harmonic
series.
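
The arithmetic behind the shrinking harmonic series can be sketched as below. The function name is hypothetical, the 20 kHz ceiling is the figure quoted above, and the count is assumed to include the fundamental as the first term.

```python
def harmonic_count(fundamental_hz, upper_limit_hz=20_000.0):
    """Number of harmonics k*f (k = 1, 2, ...) at or below the upper limit."""
    if fundamental_hz <= 0:
        raise ValueError("fundamental_hz must be positive")
    return int(upper_limit_hz // fundamental_hz)

for f in (100.0, 440.0, 4000.0, 8000.0):
    print(f, harmonic_count(f))
```

A 100 Hz tone has 200 terms available under this count, while a 4 kHz tone has only 5.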
The only way to proceed with the analyses, at least as discovered by me
so far, requires that the analyses start with the highest component of
the series, meaning that the highest associated frequency is first
evaluated and therefore subtracted before the next lower harmonic is
evaluated. Meaning that by the time the lower numbered harmonics are
evaluated, the ones that tend to define pitch, the signal is fairly
clean. Hence, noise enhancement due to the dx(t)/dt part of the rate of
change of energy (x(t)*dx(t)/dt) is removed automatically.
Since magnitude is available directly from the summation of the rate of
change of energy, phase for each of the harmonics can be determined by
using the criterion of choosing the maximum magnitude from the results
derived by rotating the input vector, sized to be equal to the
wavelength of the frequency being analyzed. The amount of rotation is
limited as it depends on the harmonic being analyzed, and the point at
which the maximum is found also defines the phase of the harmonic. The
use of energy allows for such a criterion. For a periodic signal, there
will be one frequency at which the maximum sums of the magnitudes of the
harmonic series components will equal the total evaluated by summing the
absolute value of the rate of change of energy, providing a means of
choosing the fundamental. This is more complicated than using a modified
form of auto-correlation, but I felt it was required to allow explanation
of the "party" effect.
I did want to clarify that one is not using a filter bank at all, since
I don't believe that such a thing actually exists in wetware. Hence, it
was necessary that the method include a means by which the higher
frequency components can be removed and their impact on the overall signal
noted. I have tried to understand your reference to the Teager energy
operator, and have to admit that my mathematical skills were not up to
it. I have tried to approach the problem at a more fundamental level and
hope that this clarification provides additional details of this.
Regards,
Randy Randhawa


On 9/7/2011 7:05 AM, Steve Beet wrote:

Hi Ranjit,

In respect of the paragraph below, what you're suggesting is essentially the same as the Teager energy operator. I applied a "stabilised" form of this idea to the output of an auditory filter-bank, loosely based on a very early version of Dick Lyon's auditory model, in the late 1980s. I extended it to include estimates of the signal energy, the phase velocity of the travelling wave within the cochlea (analogous to Yegnanarayana's "modified group delay"), and the dominant frequency at each point along the basilar membrane. There are some examples of these parameters in this paper:

http://stevebeet.supanet.com/assets/archives/IOA92.zip

and a more detailed description of the analysis method is in this one (I don't have an electronic copy for this I'm afraid):

"Automatic speech recognition using a reduced auditory representation and position-tolerant discrimination. S. W. Beet. Computer Speech and Language, Vol. 4, pp 17-33. January 1990."

It might be worth taking a look at these before trying your ideas out - the presence of the dx(t)/dt term in your equation will make any results very susceptible to background noise and distortion unless you take some measures akin to those described in the Computer Speech and Language paper.

Good luck with your ideas!

Steve Beet



On Tue, 6 Sep 2011 12:53:12 -0400
Ranjit Randhawa <rsran@xxxxxxxxxxx> wrote:

If one were to consider a pure sinusoid in the phase domain (one where
the axes are x(t) and dx(t)/dt), the locus would be a circle. The area
of this circle would give us the magnitude, though how to determine this
requires a different approach as the integration over 2pi would be zero.
If we consider the product x(t)*dx(t)/dt as the rate of change of energy,
it would have a sign associated with it, and it is then possible to
determine this area, though the resulting algorithm would be too simple
and fall apart for more complex signals since we don't know the period.
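
A minimal numerical sketch of the point about signs, for a pure tone with known period: for x(t) = A.sin(wt) the product x(t)*dx(t)/dt equals (A^2.w/2).sin(2wt), so its signed sum over one period cancels, while the sum of its absolute value comes to 2.A^2 whatever the frequency. The function name and sampling parameters are assumptions for illustration, and the derivative is computed analytically.

```python
import math

def energy_rate_sums(amplitude, freq_hz, sample_rate_hz):
    """Return (signed_sum, absolute_sum) of x*dx/dt over one period."""
    omega = 2.0 * math.pi * freq_hz
    n = int(round(sample_rate_hz / freq_hz))  # samples in one period
    dt = 1.0 / sample_rate_hz
    signed = 0.0
    absolute = 0.0
    for k in range(n):
        t = k * dt
        x = amplitude * math.sin(omega * t)
        dx = amplitude * omega * math.cos(omega * t)  # analytic derivative
        rate = x * dx
        signed += rate * dt
        absolute += abs(rate) * dt
    return signed, absolute

signed_100, absolute_100 = energy_rate_sums(1.0, 100.0, 100_000.0)
signed_400, absolute_400 = energy_rate_sums(1.0, 400.0, 100_000.0)
print(signed_100, absolute_100)
print(signed_400, absolute_400)
```

The sketch only works because the period is supplied up front, which is exactly the limitation noted above for more complex signals.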
