

Subject: Re: [AUDITORY] Logan's theorem - a challenge
From:    Alain de Cheveigne  <alain.de.cheveigne@xxxxxxxx>
Date:    Tue, 28 Sep 2021 09:44:55 +0100

Hi Malcolm,

Every time I come across a problem it invariably turns out that you, or Dick, or Shihab, have figured it out in the 80s or 90s :-).

POCS is very elegant. The fact that you implemented it in an auditory model and got a good result suggests that the idea is indeed sound. And yet, you say the outcome was not perfect, and it's uncertain exactly how it might depend on further imperfections of a biological implementation. It would be nice to have a formal proof that relates imperfection of outcome to imperfection of premises, and that is understandable by those who would benefit from it.

I see 3 categories of customer: (a) those, like you or me or Shihab, who trust that the result holds and think it's useful, (b) those who are skeptical and/or put off by too much hand-waving, and (c) those who have no idea what we're talking about, but should. Categories (b) and especially (c) need catering for.

The band-limited constraint is powerful, but brittle. A band-limited signal extends over all time, allowing for LOTS of zero-crossings. A time-limited signal has access to far fewer. The ~1/3 octave bandwidth of an auditory filter might seem narrow enough, and yet its impulse response (which determines the temporal span and thus the available zero-crossings) extends for just a few cycles before falling below the noise floor. That's VERY far from the ideal. On the other hand, as you say, a half-wave rectified signal contains much more information, so intuitively the consequence might hold even if the premise does not. It would be nice to make that slightly more formal.

The aim is to make things simple. If a Logan-like theorem holds, and peripheral transduction is 'transparent', we don't need to worry about it when developing a neural model of auditory processing. We can conceptualize the model as having full access to the acoustic waveform, with no information lost. Implementation details are important, but we can work on them with the trust that there is no obstacle of principle. That's how I understand your ICASSP paper.

This is potentially useful enough to justify the effort to get it right to the satisfaction of all categories, particularly as the implications are non-trivial and possibly contentious. I'm not sure everyone is happy with the idea that spectral resolution is NOT limited by auditory filter bandwidth...

Alain

(please cc to Alain.de.Cheveigne@xxxxxxxx, as we're having mail server problems)

> On 26 Sep 2021, at 16:01, Malcolm Slaney <000001757ffb5fe1-dmarc-request@xxxxxxxx> wrote:
>
> POCS.
>
> Projections onto Convex Sets [1].
>
> Dick Lyon and I used POCS to invert [2] our favorite auditory model. A contemporaneous paper [3] from Shamma's lab did the same.
>
> Both the band-limited constraint and the known positive values of the signal define convex sets. We know that in the frequency domain many parts of the spectrum are equal to zero, and in the time domain we know the values that are positive. We can iterate between the time and the frequency domain, each time projecting onto the appropriate constraint, to find the best solution.
>
> I didn't work out the theory, but since the auditory filter bank has a bandwidth of less than an octave, I think there must be only a single solution. In practice, just a handful of back-and-forth iterations was sufficient to find the solution.
>
> Piece of cake. :-)
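For the benefit of categories (b) and (c) above, a minimal numpy sketch of the alternating projections just described might look as follows. It is a toy illustration, not the implementation of [2] or [3]: it assumes a single, ideally band-limited channel observed through half-wave rectification, and the sampling rate, passband, iteration count, and function names are arbitrary choices.

    import numpy as np

    def project_band(x, fs, f_lo, f_hi):
        # Projection onto the convex set of signals whose spectrum is zero
        # outside [f_lo, f_hi]: the band-limited constraint.
        X = np.fft.rfft(x)
        f = np.fft.rfftfreq(len(x), 1.0 / fs)
        X[(f < f_lo) | (f > f_hi)] = 0.0
        return np.fft.irfft(X, n=len(x))

    def project_hwr(x, hwr):
        # Projection onto the convex set of signals consistent with the
        # half-wave rectified observation: equal to hwr where hwr > 0,
        # non-positive where hwr == 0.
        y = x.copy()
        pos = hwr > 0
        y[pos] = hwr[pos]
        y[~pos] = np.minimum(y[~pos], 0.0)
        return y

    def pocs_reconstruct(hwr, fs, f_lo, f_hi, n_iter=30):
        # Alternate between the two projections, starting from the
        # rectified signal itself.
        x = hwr.copy()
        for _ in range(n_iter):
            x = project_hwr(x, hwr)
            x = project_band(x, fs, f_lo, f_hi)
        return x

    # Toy test: a narrow band of noise around 1 kHz, half-wave rectified.
    fs = 16000
    rng = np.random.default_rng(0)
    x_true = project_band(rng.standard_normal(fs), fs, 900, 1100)
    x_rec = pocs_reconstruct(np.maximum(x_true, 0.0), fs, 900, 1100)
    print(np.corrcoef(x_true, x_rec)[0, 1])  # should approach 1 if the iteration converges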
> Our interest in this problem was not to generate audio, a cute parlor trick, but to show that the auditory representation we were working with did not lose any perceptually important information.
>
> -- Malcolm
>
> P.S. Reconstruction from zero crossings requires infinite resolution of the time of the zero crossing. That would be hard to do with a spike representation. Fortunately, there is a LOT more information in the HWR signal.
>
> [1] https://en.wikipedia.org/wiki/Projections_onto_convex_sets
>
> [2] M. Slaney, D. Naar, R. Lyon (1994) Auditory model inversion for sound separation. Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing. https://engineering.purdue.edu/~malcolm/apple/icassp94/CorrelogramInversion.pdf
>
> [3] X. Yang, K. Wang, S.A. Shamma (1992) Auditory representations of acoustic signals. IEEE Transactions on Information Theory 38(2). https://ieeexplore.ieee.org/document/119739
>
>> On Sep 25, 2021, at 11:03 PM, Alain de Cheveigne <alain.de.cheveigne@xxxxxxxx> wrote:
>>
>> Hi all,
>>
>> Here's a challenge for the young nimble minds on this list, and the old and wise.
>>
>> Logan's theorem states that a signal can be reconstructed from its zero crossings, up to a scale factor, as long as the spectral representation of that signal is less than an octave wide. It sounds like magic given that zero crossing information is so crude. How can the full signal be recovered from a sparse series of time values (with signs but no amplitudes)? "Band-limited" is clearly a powerful assumption.
>>
>> Why is this of interest in the auditory context? The band-limited premise is approximately valid for each channel of the cochlear filterbank (sometimes characterized as a 1/3 octave filter). While cochlear transduction is non-linear, Logan's theorem suggests that any information lost due to that non-linearity can be restored, within each channel. If so, cochlear transduction is "transparent", which is encouraging for those who like to speculate about neural models of auditory processing. An algorithm applicable to the sound waveform can, in principle, be implemented by the brain with similar results.
>>
>> Logan's theorem has been invoked by David Marr for vision and by several authors for hearing (some refs below). The theorem is unclear as to how the original signal should be reconstructed, which is an obstacle to formulating concrete models, but in these days of machine learning it might be OK to assume that the system can somehow learn to use the information, granted that it's there. The hypothesis has far-reaching implications; for example, it implies that the spectral resolution of central auditory processing is not limited by peripheral frequency analysis (as already assumed by, for example, phase opponency or lateral inhibition hypotheses).
>>
>> Before venturing further along this limb, it's worth considering some issues. First, Logan made clear that his theorem only applies to a perfectly band-limited signal, and might not be "approximately valid" for a signal that is "approximately band-limited". No practical signal is band-limited, if only because it must be time limited, and thus the theorem might conceivably not be applicable at all. On the other hand, half-wave rectification offers much richer information than zero crossings, so perhaps the end result is valid (information preserved) even if the theorem is not applicable stricto sensu. Second, there are many other imperfections, such as adaptation, stochastic sampling to a spike-based representation, and so on, that might affect the usefulness of the hypothesis.
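To get an empirical feel for the first of these issues (it is nothing like a proof), one might try the same style of alternating projections using only the sign of the signal, i.e. the zero-crossing information that Logan's theorem assumes, applied to a windowed and therefore only approximately band-limited signal. A rough numpy sketch, with all parameters arbitrary; the renormalisation inside the loop reflects the fact that reconstruction is defined only up to a scale:

    import numpy as np

    def project_band(x, fs, f_lo, f_hi):
        # Same band-limiting projection as in the sketch further up.
        X = np.fft.rfft(x)
        f = np.fft.rfftfreq(len(x), 1.0 / fs)
        X[(f < f_lo) | (f > f_hi)] = 0.0
        return np.fft.irfft(X, n=len(x))

    def project_sign(x, s):
        # Projection onto the set of signals whose sample-wise sign agrees
        # with s: samples with the wrong sign are clipped to zero.
        y = x.copy()
        y[np.sign(y) != s] = 0.0
        return y

    fs, n = 16000, 4096
    rng = np.random.default_rng(1)
    # A sub-octave noise band (900-1100 Hz); the window makes it time-limited,
    # hence only approximately band-limited.
    x_true = project_band(rng.standard_normal(n), fs, 900, 1100) * np.hanning(n)
    s = np.sign(x_true)                  # keep only the zero-crossing (sign) pattern

    x = rng.standard_normal(n)           # start from unrelated noise
    for _ in range(200):
        x = project_band(x, fs, 900, 1100)
        x = project_sign(x, s)
        x /= np.max(np.abs(x)) + 1e-12   # scale is arbitrary; keep the iterate bounded
    print(np.corrcoef(x_true, x)[0, 1])  # near 1 would suggest the zero crossings
                                         # still pin down the waveform

Swapping the sign constraint for the half-wave rectified constraint of the earlier sketch is, in effect, an empirical version of question (1) below.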
>> The challenge is to address some of these loose ends. For example:
>> (1) Can the theorem be extended to make use of a half-wave rectified signal rather than zero crossings? Might that allow it to be applicable to practical time-limited signals?
>> (2) What is the impact of real cochlear filter characteristics, adaptation, or stochastic sampling?
>> (3) In what sense can one say that the acoustic signal is "available" to neural signal processing? What are the limits of that concept?
>> (4) Can all this be formulated in a way intelligible to non-mathematical auditory scientists?
>>
>> This is the challenge. The reward is - possibly - a better understanding of how our brain hears the world.
>>
>> Alain
>>
>> ---
>> Logan, B.F., Jr. (1977) Information in the zero crossings of bandpass signals. Bell Syst. Tech. J. 56:487-510.
>>
>> Marr, D. (1982) Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Co, republished by MIT Press 2010.
>>
>> Heinz, M.G., Swaminathan, J. (2009) Quantifying envelope and fine-structure coding in auditory nerve responses to chimaeric speech. JARO 10:407-423. DOI: 10.1007/s10162-009-0169-8.
>>
>> Shamma, S., Lorenzi, C. (2013) On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system. J. Acoust. Soc. Am. 133:2818-2833.
>>
>> Parida, S., Bharadwaj, H., Heinz, M.G. (2021) Spectrally specific temporal analyses of spike-train responses to complex sounds: A unifying framework. PLoS Comput Biol 17(2): e1008155. https://doi.org/10.1371/journal.pcbi.1008155
>>
>> de Cheveigné, A. (in press) Harmonic cancellation, a fundamental of auditory scene analysis. Trends in Hearing (https://psyarxiv.com/b8e5w/).

