
Re: Get lost, Mr. Cochlea!! --- The Brain (Jont Allen )

I am sorry but I have not seen your response until just now. I suspect
the date on your computer is off, and my window was set too small
to see your message pop up.

Ramdas Kumaresan wrote:

> Dear Jont:
> Jont Allen wrote in
> http://sound.media.mit.edu/dpwe-bin/mhmessage.cgi/AUDITORY/postings/2001/135
> >The ear IS similar to a floating point converter. The ear does not have
> >an infinite dynamic range or signal to noise ratio. This limited dynamic range
> >shows up as masking. Do you disagree?
> I don't know,  but masking of a weak signal due to an intense signal in
> its neighbourhood, is it entirely due to what happens in the periphery?
> (We know about asymmetry, spreading and shifting of excitation to higher
> frequencies.)
> What if the periphery still accurately (to the extent it is
> allowable by timing jitter etc) represents the weak and intense signal
> combo and the higher centers ignore the weak component, say, because
> there  is much more precise  phase locking to the intense signal.
> I am not too hot on the trail in masking.
> Is it established that the information loss (masking) is entirely due to
> the
> periphery?

As I understand it, so-called "suppressive masking" (which I don't view as real masking,
but that gets me in lots of hot water with several people) happens in the cochlea.
This is the same as "two-tone suppression" and the so-called "upward spread of masking."
It is caused by compression within the cochlea, due to the outer hair cells.

The other type of masking is due to neural noise. This is the component that causes true
masking, in the sense of the floating point converter that I mentioned above.
This does not occur in the cochlea, but in the auditory nerve. The point process

> Jont Allen wrote:
> >The auditory nerve signal is not about zero crossing.  Even zero crossing
> >are not exact, and would have jitter. But masking is NOT timing jitter.
> We thought the classical theory of a neuron firing says that if the
> membrane potential exceeds a threshold then it fires. If so, then it IS
> some form of zero or level crossing detector. It is a question of how
> the cochlear mechanics transforms the signal and presents it to the
> neuron/haircell.

If you make a histogram of the time of the spike relative to the signal, you
would see that the spike does not code zero crossings, rather it codes
a half wave rectified version of the signal. Personally I would not call
this a zero crossing detector any more than a half-wave rectifier is a
zero crossing detector. One important difference is that the half-wave signal
has intensity information encoded in it.
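The period-histogram argument above can be tried numerically. What follows is only a toy sketch, not a model of a real auditory nerve fiber: it assumes spikes are drawn independently in each sample with probability proportional to a half-wave rectified sinusoid, and the function name `period_histogram` and all of its parameter values are made up for illustration.

```python
import numpy as np

def period_histogram(freq_hz=500.0, amplitude=1.0, duration_s=20.0,
                     fs=50_000, peak_rate=200.0, n_bins=20, seed=0):
    """Bin simulated spike times by stimulus phase (0..1 of one period).

    Assumption (illustrative only): the firing rate is peak_rate times the
    half-wave rectified stimulus, and spikes are independent per sample.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * fs)) / fs
    stimulus = amplitude * np.sin(2 * np.pi * freq_hz * t)
    rate = peak_rate * np.maximum(stimulus, 0.0)   # spikes/s, zero on negative half
    spikes = rng.random(t.size) < rate / fs        # Bernoulli approximation
    phase = (freq_hz * t[spikes]) % 1.0            # spike time relative to the cycle
    counts, _ = np.histogram(phase, bins=n_bins, range=(0.0, 1.0))
    return counts

soft = period_histogram(amplitude=1.0)
loud = period_histogram(amplitude=2.0)
print(soft)
print(loud)
```

Under these assumptions the histogram follows the positive half-cycle of the stimulus rather than clustering at the zero crossings, and the louder tone yields more spikes per bin, which is the intensity information a pure zero-crossing code would not carry.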

> Zero-crossings, as descriptors of a signal,  have acquired
> an undeserved  bad reputation. As we have pointed out in our
> original post, the zero-crossings of a STIMULUS SIGNAL,
> themselves are NOT of much use.
> But there are ways to carry  reliably in zero-crossings (of other
> related signals)
> information about the temporal envelope and phase of a stimulus signal,
> thereby implicitly, but completely representing  a signal.
> This is our Main point.
> Those familiar with speech signal processing  know
> about what is called Line-Spectrum-Frequencies (LSFs)
> originally proposed by Fumitada Itakura, which represent
> the spectral envelope of a signal. These LSFs are used reliably and
> successfully in speech coding, recognition etc. These are
> indeed 'zero-crossings' that represent the
> spectral envelope, except that these zero-crossings occur along
> the frequency axis, instead of time axis. Thus, there is already
> evidence
> albeit in the other (frequency) domain that these
> zero-crossings are reliable.
> On a lighter note, I asked Yadong Wang  (my grad student), two years
> ago, to take a look at zero-crossings after reading your 1985 paper
> in which you seemed to be saying that the auditory nerve signal IS based
> on zero-crossings. (Jont B.Allen, "Cochlear Modeling", IEEE Acoustics,
> Speech and Signal Processing Magazine,January 1985, p.3-28.)  Refer to
> Figure 25
> and Figure 26 in this paper. Quoting from captions of Figure 25:
> "Based on the model of the haircell, we assume here that the information
> is carried by the zero-crossings of the multitudinous narrow band
> signals. This is because the hair cell cilia appear to act as a switch,
> given moderate and high level signals, transforming the signals
> to peak-clipped signal. In an infinitely peak-clipped signal the
> information is coded by the zero-crossings..."
> It is heart breaking to see that you would abandon zero-crossings and us
> midstream.

You got me there. I was investigating how far one could go with zero-crossings.
I concluded (not in that paper) that it is hard to represent intensity (loudness)
with such models. This is mathematically well quantified by Ben Logan's theory
of reconstruction of signals from their zero crossings. As I understand his theory,
when you can do it, you lose the scale factor information. I would guess that the
same is true of LSPs. Ghitza's multi-level crossing is, I believe, an attempt to get around
this problem, somewhat inspired by the distribution of thresholds in
auditory nerve fibers.  We now believe, as first proposed by Fletcher, that
loudness is coded by the overall rate of firing. However, this is unlikely to be
a simple one-to-one code. Namely, loudness is not just a measure of the total rate.
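The loss of the scale factor is easy to see numerically, since multiplying a signal by any positive gain leaves the sign of every sample unchanged. The sketch below is a toy illustration only: the two-component test signal, the gains, and the helper names `zero_crossings` and `level_crossing_counts` are assumptions made for the example, and the fixed-level counter is a crude stand-in for the idea of multiple thresholds, not Ghitza's actual model.

```python
import numpy as np

def zero_crossings(x):
    """Indices i where the sign of x changes between samples i and i+1."""
    return np.flatnonzero(np.signbit(x[:-1]) != np.signbit(x[1:]))

def level_crossing_counts(x, levels):
    """Number of crossings of each fixed level (illustrative helper)."""
    return [int(zero_crossings(x - level).size) for level in levels]

# Hypothetical test signal: two sinusoids, sampled for one second.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 13 * t + 0.7)

quiet = 0.1 * x    # same waveform, 40 dB apart in level
loud = 10.0 * x

same_zeros = np.array_equal(zero_crossings(quiet), zero_crossings(loud))
levels = [0.5, 2.0, 8.0]
quiet_counts = level_crossing_counts(quiet, levels)
loud_counts = level_crossing_counts(loud, levels)
print(same_zeros, quiet_counts, loud_counts)
```

The zero crossings of the quiet and loud versions coincide exactly, so they cannot distinguish the two levels, whereas the counts at nonzero thresholds differ because the quiet signal never reaches them.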

> Ramdas Kumaresan
> Yadong Wang


> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Get lost, Mr. Cochlea!! --- The Brain
> From:    Jont Allen  <jba@RESEARCH.ATT.COM>
> Date:    Tue, 27 Feb 2001 00:01:19 -0500
> Yadong,
> This is all very cute, and I dont want to be accused of not having a
> sense of humor,
> (clearly you do, and it is refreshing), but there is a thing called
> masking.
> Information is lost in the early auditory stages, due to neural coding.
> The auditory nerve signal is not about zero crossing.  Even zero
> crossing
> are not exact, and would have jitter. But masking is NOT timing jitter.
> The ear IS similar to a floating point converter. The ear does not have
> an infinite
> dynamic
> range or signal to noise ratio. This limited dynamic range shows up as
> masking.
> Do you disagree?
> Jont

Jont B. Allen
AT&T Labs-Research, Shannon Laboratory, E161
180 Park Ave., Florham Park NJ, 07932-0971
973/360-8545voice, x7111fax, http://www.research.att.com/~jba