Re: Granular synthesis and auditory segmentation

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Granular synthesis and auditory segmentation
From: Peter Cariani <peter@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 15 Oct 1998 16:44:36 +0100
Organization: Eaton Peabody Laboratory
Reply-to: peter@xxxxxxxxxxxxxxxxxxxx
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
While I agree with much of what Bates and Fabbri are saying regarding
threshold crossing times (essentially, I believe that
frequency information is largely conveyed through an interspike
interval code operating over the entire auditory nerve
array -- i.e. the all-order interval statistics of tens
of thousands of auditory nerve fibers, which constitute an
autocorrelation-like representation of the stimulus), some common
misconceptions have been expressed about how auditory nerve fibers
respond to sounds.

Many general neuroscience textbooks are absolutely
terrible on these issues.

Meijer:
>I don't know about your neurons, but mine completely fail
>to replenish their synapses above about 1 or 2 kHz even
>after plenty of coffee.

Fabbri:
        ... Actually many physiology books discuss the refractory
        period (the ability to replenish chemical balance) of
        Cochlear neurons as operating to about 5"KHz".
        ... A detail readily confirmed by the literature.
        ... Given that most telephony systems run 300"Hz" to 3"KHz",
        the 5"KHz" refractory period does well in practical situations!


Relative refractory periods of auditory nerve fibers in
the cat are generally on the order of 1 - 10 msec or more,
with most having mean refractory periods around 4 or 5 msec.
This can be deduced from maximum sustained driven rates that
are seen in the auditory nerve (generally a few hundred Hz).
(I think this is what the discussion of synaptic replenishment was
alluding to.) Minimum refractory periods can be as short as
750 usec, but one sees these as occasional spike doublets
rather than sustained bursts or stimulus-entrained responses.
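
The link between refractory period and maximum sustained rate can be sketched with a little arithmetic. The snippet below is only an illustration under a strong simplifying assumption that is not in the post above: the relative refractory period is treated as a fixed dead time, and the fiber as a Poisson process with that dead time. The function name `saturating_rate` and the 5000/s drive value are hypothetical choices made for the example.

```python
def saturating_rate(drive_hz, dead_time_s):
    """Mean output rate of a Poisson process with a fixed (non-extending) dead time.

    drive_hz is the rate the process would have with no dead time at all.
    The output rate can never exceed 1 / dead_time_s.
    """
    return drive_hz / (1.0 + drive_hz * dead_time_s)


# Dead times spanning the 1-10 msec range quoted in the text.
for dead_ms in (1.0, 4.0, 5.0, 10.0):
    ceiling_hz = 1000.0 / dead_ms                      # absolute upper bound
    strong = saturating_rate(5000.0, dead_ms / 1000.0)  # strongly driven fiber
    print(f"dead time {dead_ms:4.1f} ms: ceiling {ceiling_hz:6.1f} Hz, "
          f"rate at strong drive {strong:6.1f} Hz")
```

With dead times of 4 or 5 msec the ceiling is 250 or 200 spikes/sec, which is consistent with sustained driven rates of a few hundred Hz.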

While auditory nerve fibers (ANF's) do not fire once every cycle for components above
a few hundred Hz, they do phase-lock to stimulus components
(at least) up to 4-5 kHz. Because there is synaptic jitter,
one would have to look at longer and longer spike trains in order
to detect significant phase-locking at higher frequencies
if it is indeed there. A few people have looked, but there
is some dispute over methodological details. I think there is also
a question of whether anesthesia might affect phase-locking
at higher frequencies (at other levels of the auditory
system, barbiturate anesthetics appear to reduce by half the upper
frequencies for which there is timing information,
by whatever criterion).

The "volley principle" was proposed in the 1930's as a means
by which ANF's collectively could temporally encode stimulus
periodicities up to several kHz, despite the fact that no
one ANF fires at that high a rate. There were some initial
errors in the way the volley principle was conceptualized,
i.e. it was thought that each ANF would fire deterministically
once every N-th cycle, whereas it was later found that whether
the ANF fires on any given cycle is stochastic rather than
deterministic (e.g. a Poisson process with deadtime).
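
The stochastic form of the volley principle can be sketched as a small simulation. This is a toy model, not a description of real fibers: each model fiber gets one chance to fire per stimulus cycle, fires with a fixed probability, has a fixed dead time, and its spikes are jittered around a common phase. All names and parameter values (`simulate_fiber`, 2 kHz tone, firing probability 0.1, 1 msec dead time, 50 usec jitter, 100 fibers) are assumptions chosen for illustration.

```python
import random


def simulate_fiber(rng, freq_hz, duration_s, p_fire, dead_time_s, jitter_s):
    """Return (cycle_index, spike_time) pairs for one phase-locked model fiber."""
    spikes = []
    last = -1.0  # time of the previous spike; starts well before t = 0
    n_cycles = int(round(freq_hz * duration_s))
    for k in range(n_cycles):
        if rng.random() >= p_fire:
            continue  # the fiber stochastically skips this cycle
        t = k / freq_hz + rng.gauss(0.0, jitter_s)  # phase-locked, with jitter
        if t - last < dead_time_s:
            continue  # still refractory
        spikes.append((k, t))
        last = t
    return spikes


FREQ_HZ = 2000.0
DURATION_S = 0.2
rng = random.Random(0)
fibers = [simulate_fiber(rng, FREQ_HZ, DURATION_S, 0.1, 1e-3, 5e-5)
          for _ in range(100)]

rates = [len(s) / DURATION_S for s in fibers]
n_cycles = int(round(FREQ_HZ * DURATION_S))
covered = {k for s in fibers for k, _ in s}  # cycles marked by at least one fiber
coverage = len(covered) / n_cycles

print(f"fastest single fiber: {max(rates):.0f} spikes/sec (tone is {FREQ_HZ:.0f} Hz)")
print(f"fraction of cycles marked by the population: {coverage:.3f}")
```

No single model fiber comes close to firing on every cycle, yet the population as a whole marks nearly every cycle of the tone.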

There are some far-reaching implications of phase-locking of
responses that should be drawn out and discussed. These
apply not only to the auditory system, but also to other sensory
systems whose receptors follow the stimulus waveform
(mechanoception, for flutter-vibration; vision, for drifting images).

First, to the extent that there is phase-locking, the fine
temporal structure of the stimulus is represented in the
times-of-arrivals of spikes. This means that the phase
structure of the stimulus is contained in the times-of-arrivals
of spikes in the ANF array (albeit with different cochlear
delays and absolute spike latencies for different ANF's).

Second, to the extent that there is phase-locking, all-order
interval distributions will roughly reflect the autocorrelation
functions of the stimulus. For each CF-channel in the auditory
nerve array, this is the autocorrelation function of the
stimulus after it has undergone cochlear filtering. When
autocorrelations of many overlapping frequency channels are
combined, the form of the summary autocorrelation resembles
that of the unfiltered stimulus (this is the
autocorrelational equivalent of summing all of
the individual components in a power spectrum). We see a
rough resemblance between the forms of all-order interval
distributions taken over populations of ANF's with many
different CF's and those of the stimulus autocorrelation
function, i.e. patterns of major and minor peaks and their
locations for periodicities below a few kHz.
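
A rough sketch of the population all-order interval distribution for a pure tone follows. It rests on assumptions that are not in the post: spikes are modeled as one Bernoulli trial per cycle with Gaussian jitter, refractoriness and cochlear filtering are ignored, and intervals are computed within each fiber and then pooled across fibers. The names and values (`phase_locked_spikes`, 500 Hz tone, 40 fibers, 0.1 msec bins) are hypothetical.

```python
import random


def phase_locked_spikes(rng, period_s, n_cycles, p_fire, jitter_s):
    """Sorted spike times: at most one jittered spike per stimulus cycle."""
    times = [k * period_s + rng.gauss(0.0, jitter_s)
             for k in range(n_cycles) if rng.random() < p_fire]
    return sorted(times)


def all_order_intervals(spikes, max_lag_s):
    """Intervals between every pair of spikes (not only neighbours) up to max_lag_s."""
    out = []
    for i, t_i in enumerate(spikes):
        for t_j in spikes[i + 1:]:
            lag = t_j - t_i
            if lag > max_lag_s:
                break  # spikes are sorted, so later ones are even further away
            out.append(lag)
    return out


PERIOD_S = 2e-3      # 500 Hz tone
BIN_S = 1e-4         # 0.1 msec histogram bins
MAX_LAG_S = 10e-3
N_BINS = 100         # MAX_LAG_S / BIN_S

rng = random.Random(1)
hist = [0] * N_BINS
for _ in range(40):  # pool intervals over 40 model fibers
    spikes = phase_locked_spikes(rng, PERIOD_S, 250, 0.2, 1e-4)
    for lag in all_order_intervals(spikes, MAX_LAG_S):
        idx = int(lag / BIN_S)
        if idx < N_BINS:
            hist[idx] += 1

# Largest bin between 1 and 3 msec, reported as the bin centre.
peak_bin = max(range(10, 30), key=lambda b: hist[b])
peak_lag_s = (peak_bin + 0.5) * BIN_S
print(f"first interval peak near {peak_lag_s * 1e3:.2f} msec "
      f"(stimulus period {PERIOD_S * 1e3:.2f} msec)")
```

The histogram has peaks near integer multiples of the stimulus period and troughs in between, which is the same pattern of peak locations as the cosine-shaped autocorrelation of the tone.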

[I say "roughly reflect", because half-wave rectification
does subtly alter the form of interspike interval
distributions and autocorrelation functions.
The autocorrelation of a sinusoidal component has the
form of a cosine of the same frequency; the autocorrelation
of a half-wave rectified component has flattened troughs,
but closely resembles that of the unrectified component
near the peaks. The positions of peaks and troughs are
unchanged by half-wave rectification. When you listen to
a half-wave rectified pure tone, it does sound very
slightly different in timbre, slightly buzzy, but the
pitch is the same, so the pure tone and its rectified
counterpart don't sound all that different. I think
Didier is right, however, in insisting on factoring
in the effects of cochlear filtering first -- there
are simple thought experiments with waveforms having
positive-negative asymmetries which nevertheless still
sound the same when polarities are reversed. If we
only listened to the positive parts of asymmetric
waveforms, different polarities should sound different.]
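
The effect of half-wave rectification on the autocorrelation of a sinusoid can be checked numerically. The sketch below assumes a sampled sinusoid with 40 samples per cycle and an unbiased autocorrelation estimate; the function names and the sampling choices are illustrative, not from the post.

```python
import math


def autocorrelation(x, max_lag):
    """Unbiased autocorrelation estimate r[lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) / (n - lag)
            for lag in range(max_lag + 1)]


def peak_lag(r, lo, hi):
    """Lag in [lo, hi] at which r is largest."""
    return max(range(lo, hi + 1), key=lambda lag: r[lag])


PERIOD = 40  # samples per cycle (for example 200 Hz sampled at 8 kHz)
N = 1600     # 40 full cycles

sine = [math.sin(2.0 * math.pi * i / PERIOD) for i in range(N)]
rectified = [max(s, 0.0) for s in sine]  # half-wave rectification

r_sine = autocorrelation(sine, 60)
r_rect = autocorrelation(rectified, 60)

sine_peak = peak_lag(r_sine, 20, 60)
rect_peak = peak_lag(r_rect, 20, 60)
sine_trough = r_sine[PERIOD // 2] / r_sine[0]  # normalized value at half a period
rect_trough = r_rect[PERIOD // 2] / r_rect[0]

print(f"peak lag: sine {sine_peak}, rectified {rect_peak} samples")
print(f"normalized value at half a period: sine {sine_trough:.3f}, "
      f"rectified {rect_trough:.3f}")
```

Both autocorrelations peak at a lag of one period. At half a period the unrectified sinusoid reaches its full negative trough, while the rectified version bottoms out at zero, which is the flattened trough described above.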

To the extent that there is phase-locking, in
the time-of-arrival patterns one then has phase
structure, whereas in the interval patterns created
by the times-of-arrivals one has a representation
of stimulus spectral structure via an autocorrelation-like
temporal representation. Common, repeated
time-of-arrival (phase) structure is a very basic & powerful
heuristic for perceptual grouping that does not
require channel-selection or segregation. So it's
possible to use phase in building up fused auditory
objects, but then to use other aspects of the pattern,
such as interspike interval statistics (which are
phase-deaf) to encode the attributes of auditory
objects (pitch, timbre, rhythm). There is a very
interesting paper by Kubovy in Kubovy & Pomerantz,
eds. (1981, Perceptual Organization, LEA) on phase
effects and auditory grouping.

One of the big advantages of using spike timing is
the precision of spike arrival times (clearly the
10-20 usec timing jnd's that are seen for
binaural ITD's are due to phase-locked spike
precision, although it is not clear to me concretely how
central ITD analyzers actually achieve that
precision. Jnd's for pure tone frequencies near 1 kHz
similarly correspond to temporal differences on
the order of 20 usec). Another advantage is the
extremely invariant nature of the interval patterns
that are produced, in the face of large changes in
level, background noise, location in auditory space,
etc. This makes the problem of centrally
representing auditory form invariances much simpler
(one does not require compensatory mechanisms to
adjust for level or s/n or locational differences
or system nonlinearities).

If one takes these mechanisms seriously, then the
things to look at are (running) autocorrelations of
various signals of interest (speech, music, environmental
sounds, etc.). Running autocorrelations are
expansions of the signal (e.g. RAC(t, tau) =
X(t)*X(t-tau), the product of the signal with a
delayed copy of itself) that require no contiguous
temporal analysis window (as a spectrogram does),
and hence can have as fine a time resolution as the
time series that is the waveform being analyzed.
It's a way of getting "instantaneous periodicity"
(an oxymoron).
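
A minimal sketch of the running autocorrelation RAC(t, tau) = X(t)*X(t-tau) follows. The expansion itself has one value per sample and per lag, with no analysis window. The summing step in `dominant_lag` is an addition made here only to read off a period from the toy signal; it is not part of the formula. The test signal (a sinusoid whose period switches from 40 to 25 samples) and all names are assumptions for illustration.

```python
import math


def running_autocorrelation(x, max_lag):
    """Return rac[t][tau] = x[t] * x[t - tau]; entries with t < tau are 0.0."""
    return [[x[t] * x[t - tau] if t >= tau else 0.0
             for tau in range(max_lag + 1)]
            for t in range(len(x))]


def dominant_lag(rac, t_start, t_stop, min_lag):
    """Lag >= min_lag with the largest RAC summed over t_start <= t < t_stop."""
    max_lag = len(rac[0]) - 1
    summed = [sum(rac[t][tau] for t in range(t_start, t_stop))
              for tau in range(max_lag + 1)]
    return max(range(min_lag, max_lag + 1), key=lambda tau: summed[tau])


# Toy signal: period of 40 samples for the first 400 samples, then 25 samples.
x = [math.sin(2.0 * math.pi * n / 40.0) for n in range(400)]
x += [math.sin(2.0 * math.pi * n / 25.0) for n in range(400)]

rac = running_autocorrelation(x, 45)

early = dominant_lag(rac, 100, 400, 10)  # stretch inside the first segment
late = dominant_lag(rac, 500, 800, 10)   # stretch inside the second segment
print(f"dominant lag early: {early} samples, late: {late} samples")
```

Each row of `rac` describes a single instant, so the change in periodicity shows up at the sample where it happens rather than being smeared over a window.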

Peter Cariani

Peter Cariani, Ph.D.
Eaton Peabody Laboratory
    of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St, Boston MA 02114

tel       (617) 573-4243
FAX      (617) 720-4408
email    peter@epl.meei.harvard.edu
