
Re: The natural spectrogram



At 11:09 29.01.2004 -0800, Julius Smith wrote:
>At 01:48 AM 1/28/2004, Eckard Blumschein wrote:
>>...So far I can neither imagine the
>>STFT itself to be natural nor a spectrogram based on it. Wouldn't this
>>require to naturally choose size of the window?
>
>Yes -- and as a function of frequency.  We normally call it a
>"multiresolution" STFT.

In this sense, and because they need no arbitrary windows, both the actual
cochlear function and the suggested natural spectrogram are distinguished
by a resolution that slides continuously with frequency rather than a
'multi' resolution that changes in steps. Using the STFT requires at least
an arbitrary decision about how many window sizes to use at each moment.
Perhaps there is no natural preference for any particular set of choices.

>
>>Wouldn't one have to decide further arbitrary parameters like the degree
>>of overlap?
>
>This is just a sampling-rate issue.  If computational cost is no object,
>one can simply choose maximum overlap (i.e., a "sliding FFT" instead of a
>"hopping FFT").  On the other hand, FFT filter banks can usually be
>downsampled quite a lot and still give equivalent end results.  In this
>context, your window is your anti-aliasing filter for
>downsampling.  Reference: Jont B. Allen, "Short Term Spectral Analysis,
>Synthesis, and Modification by Discrete Fourier Transform", IEEE ASSP-25(3).

The cochlea is not subject to a sampling rate, and the natural spectrogram
adapts to the given sampled input in much the same way as the 'sliding' FFT
does. Neither the cochlea nor the natural spectrogram requires an
anti-aliasing filter. You are quite right: the wider the frequency range,
the more sense it makes to downsample the input to the low-frequency
windows. The calamity is that there is perhaps no way to downsample in a
sliding manner. Even with the most powerful computers, a multidimensional
field of FFTs, possibly complicated further by a sophisticated downsampling
structure, looks anything but practical and parsimonious. Its first
dimension is the frequency steps of the multiresolution, its second the
temporal steps of the overlap, and its third the frequency-dependent
downsampling ratios. Jont's paper dates back to 1977, when one could not
yet imagine PCs doing that job. So I consider it of historical value.
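
To give a feeling for the size of such a field of FFTs, here is a minimal
sketch that counts the frames needed per window length for a sliding FFT
(hop of one sample) and for a hopping FFT with 50% overlap. The 48 kHz
sampling rate, the three window lengths, and the helper name frame_count
are assumptions chosen for illustration only; they do not come from the
discussion above.

```python
def frame_count(n_samples, window, hop):
    """Number of full windows of length `window` that fit into `n_samples`
    when the window advances by `hop` samples each time."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


# Assumed for illustration: one second of signal at 48 kHz.
n_samples = 48000

# Hypothetical window lengths standing in for the "frequency steps of the
# multiresolution": short windows for high, long windows for low frequencies.
windows = [256, 1024, 4096]

for w in windows:
    sliding = frame_count(n_samples, w, 1)       # maximum overlap
    hopping = frame_count(n_samples, w, w // 2)  # 50% overlap
    print(f"window {w:5d}: sliding {sliding:6d} frames, hopping {hopping:4d} frames")
```

Per-band downsampling would add a third loop over decimation ratios, which
is the third dimension mentioned above.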

>
>>  Doesn't any usual spectrogram incompletely represent the information?
>
>STFTs are normally invertible, in my experience, even in the presence of
>aliasing due to downsampling (it gets canceled in the reconstruction).  The
>classic spectrogram discards phase, so it is not exactly invertible.  Of
>course, it is well known that phase can be reconstructed from STFT
>magnitude to a large extent for typical signals and analysis conditions.

Neither the cochlea nor the natural spectrogram is subject to such
consequences of inappropriate theory.


>>Isn't the usual spectrogram subject to the notorious trade-off between
>>spectral and temporal resolution?
>
>Well sure, but we can let the human ear tell us where to be on that
>trade-off.

This might not be quite correct, for several reasons. The smallest product
delta t times delta f achieved by hearing is much smaller than the
uncertainty principle allows.
Aren't about 10 microseconds and 1 Hz realistic? The product is 10^-5 << 1.
In principle, the frequency resolution of the natural spectrogram is not
restricted at all.
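
The arithmetic can be checked in a few lines. The two figures are the ones
quoted above. The bound used for comparison is an assumption: the Gabor
form of the uncertainty relation for RMS widths, delta t times delta f >=
1/(4 pi). Other definitions of the widths give other constants.

```python
import math

# Figures quoted in the message (estimates, not measurements made here).
delta_t = 10e-6  # seconds
delta_f = 1.0    # hertz

product = delta_t * delta_f

# Assumption: Gabor limit for RMS duration and RMS bandwidth.
gabor_limit = 1.0 / (4.0 * math.pi)

print(f"delta_t * delta_f = {product:.1e}")
print(f"Gabor limit       = {gabor_limit:.4f}")
print(f"limit / product   = {gabor_limit / product:.0f}")
```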


>>Was there any physiological justification for STFT which could include the
>>rectification?
>>Is there close similarity to measurement of BM motion and neural pattern?
>
>I don't understand the first question.  My understanding of rectification
>is that this is the nature of how the hair cells respond to basilar membrane
>vibration. Firing increases when the membrane pushes one way, but not the
>other.

That is my understanding of rectification, too.

>The STFT implements a filter bank, and the output of that filter
>bank can be rectified accordingly (applied to real time-domain signals at
>the STFT filter-bank output, of course).

The latter is the point. The complex FT, including the STFT, does not
deliver a real time-domain signal at all, but magnitude and phase. The
usual spectrogram shows magnitude vs. time. A magnitude is non-negative and
therefore cannot be rectified. Consequently, the usual spectrogram fails to
convey the information responsible for audible effects of polarity, e.g.
the difference between positive and negative clicks.
The natural spectrogram is the more natural of the two because it is based
on the Fourier cosine transform (FCT), which directly provides the input to
rectification.
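
The polarity argument can be illustrated with a small sketch. It compares a
positive and a negative click: the DFT magnitudes are identical, while a
real cosine projection changes sign and therefore gives different results
after half-wave rectification. The function cosine_transform below is a
generic cosine projection (the real part of the DFT), assumed here as a
stand-in; it is not claimed to be the exact FCT used in the natural
spectrogram. All names and the 16-point length are illustrative.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) complex DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def cosine_transform(x):
    """Real cosine projection: C[k] = sum_n x[n] * cos(2*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def half_wave_rectify(values):
    """Keep positive values, set negative values to zero."""
    return [max(v, 0.0) for v in values]


N = 16
pos_click = [0.0] * N
pos_click[3] = 1.0                   # positive click at sample 3
neg_click = [-v for v in pos_click]  # same click with inverted polarity

# Magnitude spectra: polarity is lost.
mag_pos = [abs(c) for c in dft(pos_click)]
mag_neg = [abs(c) for c in dft(neg_click)]
magnitudes_equal = all(
    math.isclose(a, b, abs_tol=1e-12) for a, b in zip(mag_pos, mag_neg)
)

# Real cosine projections: polarity survives as a sign flip.
cos_pos = cosine_transform(pos_click)
cos_neg = cosine_transform(neg_click)
sign_flipped = all(
    math.isclose(a, -b, abs_tol=1e-12) for a, b in zip(cos_pos, cos_neg)
)

# After rectification the two clicks are no longer alike.
rectified_differ = half_wave_rectify(cos_pos) != half_wave_rectify(cos_neg)

print("magnitudes equal:  ", magnitudes_equal)
print("cosine sign flipped:", sign_flipped)
print("rectified differ:  ", rectified_differ)
```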

Proponents of the complex FT might claim that the FCT is just the real part
of the FT. Formally this is largely correct. However, it ignores several
important flaws arising from arbitrary preconditions of complex calculus.
Let's skip the two most basic arbitrary choices (origin and sign of the
imaginary part).

The complex FT always tacitly introduces redundancy. For what I suggest
calling an 'effectual signal', the most 'correct' input to the FT fills a
window that is located symmetrically about zero but is padded with zeros
for the time to come, in which the signal is unknown. The word 'effectual'
indicates correspondence to the so-called causal signal. The effectual
signal differs from a simply time-mirrored (anti-causal) one.

The zeros introduce two mutually cancelling fictitious components, each of
which alone would violate causality. That is why the traditional
spectrogram exhibits non-causality. Non-causal output before any input is
obvious nonsense, and it shows the more strikingly the wider the window is
chosen. The Fourier transform of any real-valued signal, causal or
effectual, shows Hermitian symmetry. In other words, its real part is
symmetrical over frequency. Negative frequency does not have any particular
physical meaning. It is just an artefact of complex calculus.
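
The symmetry of the real part can be checked numerically. The sketch below
takes a real-valued test signal whose second half is zero-padded and
verifies X[N-k] = conj(X[k]) for its DFT, so that the real part is even and
the imaginary part is odd over frequency. The signal values and the length
of 8 are hypothetical, chosen only for illustration.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) complex DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


# Hypothetical real-valued signal: four known samples, then zero padding.
signal = [0.9, -0.4, 0.3, 0.7, 0.0, 0.0, 0.0, 0.0]
N = len(signal)
X = dft(signal)

# Index (N - k) % N is the DFT bin of the negative frequency -k.
hermitian = all(
    cmath.isclose(X[k], X[(N - k) % N].conjugate(), abs_tol=1e-9)
    for k in range(N)
)
real_even = all(
    math.isclose(X[k].real, X[(N - k) % N].real, abs_tol=1e-9) for k in range(N)
)
imag_odd = all(
    math.isclose(X[k].imag, -X[(N - k) % N].imag, abs_tol=1e-9) for k in range(N)
)

print("Hermitian symmetry:", hermitian)
print("real part even:    ", real_even)
print("imaginary part odd:", imag_odd)
```

The negative-frequency half therefore carries no information beyond what
the non-negative half already contains.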

>
>I suppose you're posting to the right list!

In 1844, Ohm dismissed Seebeck's observation. He supposed that a missing
fundamental cannot be heard. Let's be open to the insight that the cochlea
performs a real-valued rather than a complex-valued frequency analysis.
Then we are in a position to deal with the further steps of physiological
signal processing on a sounder basis. As a first result, I offered a 'joint
autocorrelation' hypothesis. Hopefully, it will reconcile Peter and
Christian. It could also, for the first time, plausibly explain to Ohm why
Seebeck was right. Chen-Gia Tsai posted to this list his observation of a
pitch at 9f0/4 (if I recall correctly). I imagine hundreds of experts here
lurking for something new. Of course, corrections are always and everywhere
unwelcome. I do not intend to hurt anybody. If someone feels offended, I
apologize. I will sincerely try to go on responding privately to all
objections and requests.

Eckard