
Re: [AUDITORY] Logan's theorem - a challenge



Hi Alain and members,

I will attempt to give a bit of non-mathematical intuition behind information in the zeros.

Zeros are cool! For example, if you know the zeros (roots) of a polynomial, you know everything about it (up to a scaling factor). Hence, recovering a polynomial from just its zeros is no surprise. Folks do it in high school too.
A practical question is: do you have access to all its zeros (roots)? If yes, excellent. If not, then we have to state the necessary and sufficient conditions the polynomial should obey to enable its recovery from zeros. To give a hint, polynomials can have complex roots, and it is no easy task to search for the imaginary roots if all you see is your polynomial wiggling on the real axis!
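To make the polynomial example concrete, here is a tiny numpy sketch (the particular roots are just an illustration): the roots pin down the polynomial up to that scale factor.

```python
import numpy as np

# Knowing the roots of a polynomial fixes it up to a scale factor.
roots = [1.0, -2.0, 3.0]
coeffs = np.poly(roots)  # monic polynomial with these roots
# (x - 1)(x + 2)(x - 3) = x^3 - 2x^2 - 5x + 6, i.e. coeffs [1, -2, -5, 6]
```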

Now, let's switch gears to signals which are oscillatory. Polynomials can be oscillatory too, but those will have many terms with a huge mix of degrees.
Let's instead consider sine waves as oscillatory signals; they often give a compact representation of oscillations.
Can zeros (or zero-crossings, ZCs) of an oscillatory signal be sufficient for its recovery? B.F. Logan Jr. addressed some aspects of this in http://languagelog.ldc.upenn.edu/myl/Logan1977.pdf (Theorems 8 and 9).
Broadly, he presented the conditions under which the ZCs of an oscillatory signal uniquely define the signal. Note that he did not present any reconstruction algorithm.
===
What are the conditions under which ZCs suffice to represent an oscillatory signal? He gave two conditions. This was very nice of him as it made the paper popular! Let's get an intuition in a few steps.
- Firstly, what if the signal never crosses zero? A DC offset can make an oscillatory signal wiggle above zero voltage all the time. No zeros, hence no recovery possible using ZCs! So the signal should be bandpass (no DC). This makes sure that we get some ZCs.
- Secondly, the Nyquist criterion says that an oscillatory signal must be sampled at a rate at least twice the maximum frequency in its spectrum to enable perfect recovery from just the samples.
We must make sure the ZC rate satisfies the Nyquist rate. If the signal is bandpass with at most an octave of bandwidth, this is satisfied: the ZC rate no longer falls below the Nyquist rate, and thus we have hope of recovery just from the ZCs.
- Thirdly, we got the ZCs, but do they uniquely define this oscillatory signal? Or can there be another oscillatory signal which satisfies the above conditions and has its ZCs at the same instants? Here is the key contribution of Logan's paper. He presents a condition which, if obeyed by the signal, guarantees that its ZCs define it uniquely. Interestingly, this condition links back to polynomials, but in a more abstract way. I will not expand on it here.
===
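A quick numerical illustration of the second condition (the band edges, number of components, and random seed below are my own arbitrary choices): an octave-band signal with no DC produces ZCs at a density comparable to its Nyquist rate.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48_000                          # sample rate, Hz
t = np.arange(fs) / fs               # one second of time
f_lo, f_hi = 500.0, 1000.0           # one-octave passband, no DC

# Random oscillatory signal: a sum of sinusoids confined to the octave band.
freqs = rng.uniform(f_lo, f_hi, 20)
phases = rng.uniform(0.0, 2 * np.pi, 20)
x = np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None]).sum(axis=0)

# Count sign changes, i.e. zero crossings, over the one-second window.
zc_rate = int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))
# zc_rate lands between 2*f_lo and 2*f_hi, around the band's Nyquist rate
```

The ZC density stays tied to the band-edge frequencies, which is the intuition behind requiring a bandpass signal of limited bandwidth.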
So, nothing got solved: as Alain suggested, we still don't have a reconstruction algorithm to get the signal back from its ZCs. But we now know some conditions which can guarantee recovery, and some day we will have an algorithm for it. Also, since Logan's paper, multiple researchers have worked on this topic. For example, what Malcolm et al. suggested is an iterative approach (projections onto convex sets). These approaches are very interesting, even more so when they come with guarantees on convergence.
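Here is a minimal sketch of that POCS idea (the toy passband, signal, and iteration count are my own assumptions, not the setup of Malcolm's paper): alternate between projecting onto the set of signals with the known passband and the set of signals consistent with a half-wave-rectified observation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2048
band = slice(200, 399)                     # known sub-octave passband (rFFT bins)

# Toy ground-truth bandpass signal (used only to make the observation).
spec = np.zeros(n // 2 + 1, complex)
spec[band] = rng.standard_normal(199) + 1j * rng.standard_normal(199)
x_true = np.fft.irfft(spec, n)
x_true /= np.abs(x_true).max()

hwr = np.maximum(x_true, 0.0)              # observation: half-wave rectified signal
pos = hwr > 0

x = hwr.copy()                             # start from the observation itself
for _ in range(200):
    # Projection 1: keep only the known passband in the frequency domain.
    X = np.fft.rfft(x)
    keep = np.zeros(len(X), bool)
    keep[band] = True
    X[~keep] = 0.0
    x = np.fft.irfft(X, n)
    # Projection 2: match the HWR observation in the time domain.
    x[pos] = hwr[pos]                      # positive samples are known exactly
    x[~pos] = np.minimum(x[~pos], 0.0)     # elsewhere the signal is non-positive

err = float(np.max(np.abs(x - x_true)))    # shrinks as the iterations proceed
```

Both constraint sets are convex and the true signal lies in their intersection, which is why alternating the two projections homes in on it.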

Anyway, there is a much bigger picture here. Sampling ZCs implies sampling time instants instead of amplitudes.
I find this interesting as it breaks away from the uniform sampling (and analysis) framework employed in signal processing. Researchers continue to work on this aspect. I am one of them :). One of our works which might be of interest to you is [1].

With new approaches, we might be able to reach a bigger class of oscillatory signals than what Logan considered, and further, design algorithms (maybe DNN-style) to understand the information encoded in the ZCs of arbitrary time-limited (not bandlimited) signals.
To summarize, ZCs might be the "dark matter" of signals and exploring these can potentially re-define the way we do DSP.

[1] Neeraj Kumar Sharma, Time-Instant Sampling Based Encoding of Time-Varying Acoustic Spectrum, AIP Conference Proceedings 1703, 100003 (2015) https://aip.scitation.org/doi/pdf/10.1063/1.4939431

Cheers,
Neeraj
https://neerajww.github.io/

On Tue, Sep 28, 2021 at 5:27 PM Alain de Cheveigne <alain.de.cheveigne@xxxxxxxxxx> wrote:
Hi Malcolm,

Every time I come across a problem it invariably turns out that you, or Dick, or Shihab, have figured it out in the 80s or 90s :-). 

POCS is very elegant. The fact that you implemented it in an auditory model and got a good result suggests that the idea is indeed sound.  And yet, you say the outcome was not perfect, and it’s uncertain exactly how it might depend on further imperfections of a biological implementation. It would be nice to have a formal proof that relates imperfection of outcome to imperfection of premises.  And that is understandable by those who would benefit from it.

I see 3 categories of customer: (a) those, like you, or I, or Shihab, that trust that the result holds and think it's useful, (b) those who are skeptical and/or put off by too much hand-waving, and (c) those who have no idea what we’re talking about, but should. Categories (b) and especially (c) need catering for.

The band-limited constraint is powerful, but brittle.  A band-limited signal extends over all time, allowing for LOTS of zero-crossings.  A time-limited signal has access to much less. The ~1/3 octave bandwidth of an auditory filter might seem narrow enough, and yet its impulse response (which determines the temporal span and thus the available zero-crossings) extends for just a few cycles before falling below the noise floor. That’s VERY far from the ideal.  On the other hand, as you say, a half-wave rectified signal contains much more information, so intuitively the consequence might hold even if the premise does not.  It would be nice to make that slightly more formal.

The aim is to make things simple.  If a Logan-like theorem holds, and peripheral transduction is `transparent’, we don’t need to worry about it when developing a neural model of auditory processing. We can conceptualize the model as having full access to the acoustic waveform, no information lost. Implementation details are important, but we can work on them with the trust that there is no obstacle of principle. That’s how I understand your ICASSP paper.

This is potentially useful enough to justify the effort to get it right to the satisfaction of all categories, particularly as the implications are non-trivial and possibly contentious.  I’m not sure everyone is happy with the idea that spectral resolution is NOT limited by auditory filter bandwidth...

Alain
(please cc to Alain.de.Cheveigne@xxxxxxxxx, as we’re having mail server problems)


> On 26 Sep 2021, at 16:01, Malcolm Slaney <000001757ffb5fe1-dmarc-request@xxxxxxxxxxxxxxx> wrote:
>
> POCS.
>
> Projections onto Convex Sets [1].
>
> Dick Lyon and I used POCS to invert [2] our favorite auditory model.  A contemporaneous paper [3] from Shamma’s lab did the same.
>
> Both the band-limited constraint and the known positive values of the signal define convex sets.  We know in the frequency domain many parts of the spectrum are equal to zero.  And in the time domain we know the values that are positive.  We can iterate between the time and the frequency domain, each time projecting onto the appropriate constraint, to find the best solution.
>
> I didn’t work out the theory, but since the auditory filter bank has a bandwidth of less than an octave, I think there must be only a single solution.  In practice, just a handful of back and forth iterations was sufficient to find the solution.
>
> Piece of cake.  :-)
>
> Our interest in this problem was not to generate audio, a cute parlor trick, but to show that the auditory representation we were working with did not lose any perceptually important information.
>
> — Malcolm
> P.S.  Reconstruction from zero crossings requires infinite resolution of the time of the zero crossing.  That would be hard to do with a spike representation. Fortunately, there is a LOT more information in the HWR signal.
>
> [1] https://en.wikipedia.org/wiki/Projections_onto_convex_sets
>
> [2] M. Slaney, D. Naar.  R. Lyon. Auditory model inversion for sound separation. Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994. https://engineering.purdue.edu/~malcolm/apple/icassp94/CorrelogramInversion.pdf
>
> [3] X. Yang; K. Wang; S.A. Shamma. Auditory representations of acoustic signals. IEEE Transactions on Information Theory, Volume: 38, Issue: 2, March 1992. https://ieeexplore.ieee.org/document/119739
>
>> On Sep 25, 2021, at 11:03 PM, Alain de Cheveigne <alain.de.cheveigne@xxxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> Here’s a challenge for the young nimble minds on this list, and the old and wise.
>>
>> Logan’s theorem states that a signal can be reconstructed from its zero crossings, to a scale, as long as the spectral representation of that signal is less than an octave wide.  It sounds like magic given that zero crossing information is so crude. How can the full signal be recovered from a sparse series of time values (with signs but no amplitudes)?  “Band-limited” is clearly a powerful assumption.
>>
>> Why is this of interest in the auditory context?  The band-limited premise is approximately valid for each channel of the cochlear filterbank (sometimes characterized as a 1/3 octave filter).  While cochlear transduction is non-linear, Logan’s theorem suggests that any information lost due to that non-linearity can be restored, within each channel. If so, cochlear transduction is “transparent”, which is encouraging for those who like to speculate about neural models of auditory processing. An algorithm applicable to the sound waveform can be implemented by the brain with similar results, in principle. 
>>
>> Logan’s theorem has been invoked by David Marr for vision and several authors for hearing (some refs below). The theorem is unclear as to how the original signal should be reconstructed, which is an obstacle to formulating concrete models, but in these days of machine learning it might be OK to assume that the system can somehow learn to use the information, granted that it’s there.  The hypothesis has far-reaching implications, for example it implies that spectral resolution of central auditory processing is not limited by peripheral frequency analysis (as already assumed by for example phase opponency or lateral inhibitory hypotheses).
>>
>> Before venturing further along this limb, it’s worth considering some issues.  First, Logan made clear that his theorem only applies to a perfectly band-limited signal, and might not be “approximately valid” for a signal that is “approximately band-limited”.  No practical signal is band-limited, if only because it must be time limited, and thus the theorem might conceivably not be applicable at all.  On the other hand, half-wave rectification offers much richer information than zero crossings, so perhaps the end result is valid (information preserved) even if the theorem is not applicable stricto sensu.  Second, there are many other imperfections such as adaptation, stochastic sampling to a spike-based representation, and so on, that might affect the usefulness of the hypothesis.
>>
>> The challenge is to address some of these loose ends. For example:
>> (1) Can the theorem be extended to make use of a halfwave-rectified signal rather than zero crossings? Might that allow it to be applicable to practical time-limited signals?
>> (2) What is the impact of real cochlear filter characteristics, adaptation, or stochastic sampling? 
>> (3) In what sense can one say that the acoustic signal is "available” to neural signal processing?  What are the limits of that concept?
>> (4) Can all this be formulated in a way intelligible by non-mathematical auditory scientists?
>>
>> This is the challenge.  The reward is - possibly - a better understanding of how our brain hears the world.
>>
>> Alain
>>
>> ---
>> Logan BF, JR. (1977) Information in the zero crossings of bandpass signals. Bell Syst. Tech. J. 56:487–510.
>>
>> Marr, D. (1982) VISION - A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Co, republished by MIT press 2010.
>>
>> Heinz, M.G., Swaminathan J. (2009) Quantifying Envelope and Fine-Structure Coding in Auditory Nerve Responses to Chimaeric Speech, JARO 10: 407–423
>> DOI: 10.1007/s10162-009-0169-8.
>>
>> Shamma, S, Lorenzi, C (2013) On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system, J. Acoust. Soc. Am. 133, 2818–2833.
>>
>> Parida S, Bharadwaj H, Heinz MG (2021) Spectrally specific temporal analyses of spike-train responses to complex sounds: A unifying framework. PLoS Comput Biol 17(2): e1008155. https://doi.org/10.1371/journal.pcbi.1008155
>>
>> de Cheveigné, A. (in press) Harmonic Cancellation, a Fundamental of Auditory Scene Analysis. Trends in Hearing (https://psyarxiv.com/b8e5w/).