
Re: [AUDITORY] Logan's theorem - a challenge



Hi Malcolm,

Every time I come across a problem it invariably turns out that you, or Dick, or Shihab, have figured it out in the 80s or 90s :-).  

POCS is very elegant. The fact that you implemented it in an auditory model and got a good result suggests that the idea is indeed sound.  And yet, you say the outcome was not perfect, and it’s uncertain exactly how it might depend on further imperfections of a biological implementation. It would be nice to have a formal proof that relates imperfection of outcome to imperfection of premises, and that is understandable by those who would benefit from it.

I see 3 categories of customer: (a) those who, like you, me, or Shihab, trust that the result holds and think it's useful, (b) those who are skeptical and/or put off by too much hand-waving, and (c) those who have no idea what we’re talking about, but should. Categories (b) and especially (c) need catering for.

The band-limited constraint is powerful, but brittle.  A band-limited signal extends over all time, allowing for LOTS of zero-crossings.  A time-limited signal has access to much less. The ~1/3 octave bandwidth of an auditory filter might seem narrow enough, and yet its impulse response (which determines the temporal span and thus the available zero-crossings) extends for just a few cycles before falling below the noise floor. That’s VERY far from the ideal.  On the other hand, as you say, a half-wave rectified signal contains much more information, so intuitively the consequence might hold even if the premise does not.  It would be nice to make that slightly more formal.
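
As a toy illustration of the POCS iteration Malcolm describes in the quoted message below, here is a minimal Python sketch. It is not the method of the ICASSP paper. Assumptions (none of them from this thread): a periodic discrete signal of 1024 samples whose spectrum is exactly confined to FFT bins 40 to 70 (less than an octave), a noise-free half-wave rectified observation, and function names that are purely hypothetical. Because the signal is periodic and exactly band-limited, the sketch sidesteps the time-limiting issue raised above rather than addressing it.

```python
import numpy as np


def make_bandpass_signal(n, lo_bin, hi_bin, rng):
    """Random real periodic signal whose spectrum is zero outside [lo_bin, hi_bin]."""
    spec = np.zeros(n // 2 + 1, dtype=complex)
    k = hi_bin - lo_bin + 1
    spec[lo_bin:hi_bin + 1] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    return np.fft.irfft(spec, n=n)


def project_band(x, lo_bin, hi_bin):
    """Projection onto the band-limited set: zero every FFT bin outside the band."""
    spec = np.fft.rfft(x)
    kept = np.zeros_like(spec)
    kept[lo_bin:hi_bin + 1] = spec[lo_bin:hi_bin + 1]
    return np.fft.irfft(kept, n=len(x))


def project_hwr(x, hwr):
    """Projection onto signals consistent with the half-wave rectified data:
    where hwr > 0 the value is known; elsewhere the signal must be <= 0."""
    return np.where(hwr > 0, hwr, np.minimum(x, 0.0))


def pocs_invert_hwr(hwr, lo_bin, hi_bin, n_iter=500):
    """Alternate between the time-domain and frequency-domain constraints."""
    x = project_band(hwr, lo_bin, hi_bin)
    for _ in range(n_iter):
        x = project_band(project_hwr(x, hwr), lo_bin, hi_bin)
    return x


def relative_error(estimate, target):
    return float(np.linalg.norm(estimate - target) / np.linalg.norm(target))


rng = np.random.default_rng(0)
N, LO, HI = 1024, 40, 70                 # assumed toy values; 70/40 < 2
x_true = make_bandpass_signal(N, LO, HI, rng)
hwr = np.maximum(x_true, 0.0)            # idealized half-wave rectification

err_start = relative_error(project_band(hwr, LO, HI), x_true)
err_end = relative_error(pocs_invert_hwr(hwr, LO, HI), x_true)
print(f"relative error: band-passed HWR {err_start:.3f}, after POCS {err_end:.2e}")
```

Both projections map onto sets that contain the true signal, so the distance to it cannot grow from one iteration to the next. How fast it shrinks, and what happens with realistic filters, noise, or spikes, is exactly what this sketch does not answer.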

The aim is to make things simple.  If a Logan-like theorem holds, and peripheral transduction is “transparent”, we don’t need to worry about it when developing a neural model of auditory processing. We can conceptualize the model as having full access to the acoustic waveform, no information lost. Implementation details are important, but we can work on them with the trust that there is no obstacle of principle. That’s how I understand your ICASSP paper.

This is potentially useful enough to justify the effort to get it right to the satisfaction of all categories, particularly as the implications are non-trivial and possibly contentious.  I’m not sure everyone is happy with the idea that spectral resolution is NOT limited by auditory filter bandwidth...

Alain
(please cc to Alain.de.Cheveigne@xxxxxxxxx, as we’re having mail server problems)


> On 26 Sep 2021, at 16:01, Malcolm Slaney <000001757ffb5fe1-dmarc-request@xxxxxxxxxxxxxxx> wrote:
> 
> POCS.
> 
> Projections onto Convex Sets [1].
> 
> Dick Lyon and I used POCS to invert [2] our favorite auditory model.  A contemporaneous paper [3] from Shamma’s lab did the same. 
> 
> Both the band-limited constraint and the known positive values of the signal define convex sets.  We know in the frequency domain many parts of the spectrum are equal to zero.  And in the time domain we know the values that are positive.  We can iterate between the time and the frequency domain, each time projecting onto the appropriate constraint, to find the best solution.
> 
> I didn’t work out the theory, but since the auditory filter bank has a bandwidth of less than an octave, I think there must be only a single solution.  In practice, just a handful of back and forth iterations was sufficient to find the solution.
> 
> Piece of cake.  :-)
> 
> Our interest in this problem was not to generate audio, a cute parlor trick, but to show that the auditory representation we were working with did not lose any perceptually important information.
> 
> — Malcolm
> P.S.  Reconstruction from zero crossings requires infinite resolution of the time of each zero crossing.  That would be hard to do with a spike representation. Fortunately, there is a LOT more information in the HWR signal.
> 
> [1] https://en.wikipedia.org/wiki/Projections_onto_convex_sets
> 
> [2] M. Slaney, D. Naar, R. Lyon. Auditory model inversion for sound separation. Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994. https://engineering.purdue.edu/~malcolm/apple/icassp94/CorrelogramInversion.pdf
> 
> [3] X. Yang; K. Wang; S.A. Shamma. Auditory representations of acoustic signals. IEEE Transactions on Information Theory, Volume: 38, Issue: 2, March 1992. https://ieeexplore.ieee.org/document/119739
> 
>> On Sep 25, 2021, at 11:03 PM, Alain de Cheveigne <alain.de.cheveigne@xxxxxxxxxx> wrote:
>> 
>> Hi all,
>> 
>> Here’s a challenge for the young nimble minds on this list, and the old and wise.
>> 
>> Logan’s theorem states that a signal can be reconstructed from its zero crossings, up to a scale factor, as long as the spectral representation of that signal is less than an octave wide.  It sounds like magic given that zero crossing information is so crude. How can the full signal be recovered from a sparse series of time values (with signs but no amplitudes)?  “Band-limited” is clearly a powerful assumption.
>> 
>> Why is this of interest in the auditory context?  The band-limited premise is approximately valid for each channel of the cochlear filterbank (sometimes characterized as a 1/3 octave filter).  While cochlear transduction is non-linear, Logan’s theorem suggests that any information lost due to that non-linearity can be restored, within each channel. If so, cochlear transduction is “transparent”, which is encouraging for those who like to speculate about neural models of auditory processing. An algorithm applicable to the sound waveform can be implemented by the brain with similar results, in principle.  
>> 
>> Logan’s theorem has been invoked by David Marr for vision and several authors for hearing (some refs below). The theorem is unclear as to how the original signal should be reconstructed, which is an obstacle to formulating concrete models, but in these days of machine learning it might be OK to assume that the system can somehow learn to use the information, granted that it’s there.  The hypothesis has far-reaching implications, for example it implies that spectral resolution of central auditory processing is not limited by peripheral frequency analysis (as already assumed by for example phase opponency or lateral inhibitory hypotheses).
>> 
>> Before venturing further along this limb, it’s worth considering some issues.  First, Logan made clear that his theorem only applies to a perfectly band-limited signal, and might not be “approximately valid” for a signal that is “approximately band-limited”.  No practical signal is band-limited, if only because it must be time limited, and thus the theorem might conceivably not be applicable at all.  On the other hand, half-wave rectification offers much richer information than zero crossings, so perhaps the end result is valid (information preserved) even if the theorem is not applicable stricto sensu.  Second, there are many other imperfections such as adaptation, stochastic sampling to a spike-based representation, and so on, that might affect the usefulness of the hypothesis.
>> 
>> The challenge is to address some of these loose ends. For example:
>> (1) Can the theorem be extended to make use of a halfwave-rectified signal rather than zero crossings? Might that allow it to be applicable to practical time-limited signals?
>> (2) What is the impact of real cochlear filter characteristics, adaptation, or stochastic sampling?  
>> (3) In what sense can one say that the acoustic signal is "available” to neural signal processing?  What are the limits of that concept?
>> (4) Can all this be formulated in a way intelligible to non-mathematical auditory scientists?
>> 
>> This is the challenge.  The reward is - possibly - a better understanding of how our brain hears the world.
>> 
>> Alain
>> 
>> ---
>> Logan BF, JR. (1977) Information in the zero crossings of bandpass signals. Bell Syst. Tech. J. 56:487–510.
>> 
>> Marr, D. (1982) VISION - A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Co, republished by MIT press 2010.
>> 
>> Heinz, M.G., Swaminathan J. (2009) Quantifying Envelope and Fine-Structure Coding in Auditory Nerve Responses to Chimaeric Speech, JARO 10: 407–423
>> DOI: 10.1007/s10162-009-0169-8.
>> 
>> Shamma, S, Lorenzi, C (2013) On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system, J. Acoust. Soc. Am. 133, 2818–2833.
>> 
>> Parida S, Bharadwaj H, Heinz MG (2021) Spectrally specific temporal analyses of spike-train responses to complex sounds: A unifying framework. PLoS Comput Biol 17(2): e1008155. https://doi.org/10.1371/journal.pcbi.1008155
>> 
>> de Cheveigné, A. (in press) Harmonic Cancellation, a Fundamental of Auditory Scene Analysis. Trends in Hearing (https://psyarxiv.com/b8e5w/).