Re: Granular synthesis and auditory segmentation

Richard Fabbri wrote:

> Does this set of facts make anyone wonder why we bother to sample
> the WHOLE waveform and continue to apply Fourier transforms to these
> Bipolar samples when Nature does Not ?!
> Nature has a much simpler solution in Time Domain!

I don't know about your neurons, but mine completely fail
to replenish their synapses above about 1 or 2 kHz, even
after plenty of coffee. Of course there is a role for
non-Fourier-type processing too, but no simple time-domain
scheme covers the entire audible range of 20 Hz to 20 kHz.

Didier Depireux clarified the issue very nicely:

> The half-wave rectification occurs _after_ the frequency
> decomposition performed on the basilar membrane, i.e.
> after you have decomposed the signal into frequency
> channels.
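To make that ordering concrete, here is a minimal Python sketch (NumPy assumed). A crude filterbank built from ideal FFT-domain band masks stands in for the basilar-membrane frequency decomposition, and half-wave rectification is applied to each channel only afterwards. The helper names, band centers and bandwidth ratio are hypothetical choices for illustration. Real cochlear models use gammatone or similar filters, and this does not model any particular system.

```python
import numpy as np


def bandpass_channels(signal, fs, centers, bandwidth_ratio=0.25):
    """Split `signal` into frequency channels with ideal FFT-domain band masks.

    Hypothetical stand-in for basilar-membrane filtering: each channel keeps
    only the frequencies within +/- (bandwidth_ratio / 2) * fc of its center fc.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    channels = []
    for fc in centers:
        half_bw = 0.5 * bandwidth_ratio * fc
        mask = (freqs >= fc - half_bw) & (freqs <= fc + half_bw)
        channels.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return np.array(channels)


def half_wave_rectify(x):
    """Keep positive excursions and zero the negative ones."""
    return np.maximum(x, 0.0)


# Usage: a 440 Hz + 3 kHz test signal, decomposed first, rectified second.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
signal = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 3000 * t)
centers = [440.0, 1000.0, 3000.0]
channels = half_wave_rectify(bandpass_channels(signal, fs, centers))
print(channels.max(axis=1))  # large for 440 Hz and 3 kHz, near 0 for 1 kHz
```

Because rectification happens per channel, each channel's output follows only the fine structure of its own band. Rectifying the whole waveform first would instead mix the components and add harmonics before any frequency analysis.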

For these and other reasons, I'm more interested in granular
synthesis applications that operate up to, say, 5 or 6 kHz by
taking into account this frequency decomposition performed
on the basilar membrane. To me, a relatively weak effect that
breaks down above 1 or 2 kHz would seem of little use. In
my experience, using a logarithmic frequency scale and a linear
time axis makes the "textural aspect" (i.e., the patterning)
of sound textures perceptually fairly invariant to their
position in the time-frequency plane, within a typical area of
[0 s, 1 s] by [500 Hz, 5 kHz]. That is also what I would want,
since it keeps the perceptual qualities of overall time-frequency
"position" and sound "pattern" largely independent, just as
in vision the texture of an object doesn't appear to change
with, or interfere with, the position of that object in the
visual field.
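A minimal sketch of that kind of time-frequency mapping, assuming image rows map to exponentially spaced frequencies between 500 Hz and 5 kHz and columns map linearly to time over 1 s. The function names, the 16 kHz sample rate, the top-row-is-highest-pitch convention and the plain brightness-weighted sum of sines are all assumptions for illustration, not a description of any specific implementation. The point it shows is the invariance: equal row steps multiply frequency by a constant factor, so moving a pattern up or down changes its overall pitch but leaves the frequency ratios inside it unchanged.

```python
import numpy as np


def row_frequencies(n_rows, f_low=500.0, f_high=5000.0):
    """Exponentially spaced frequencies: equal steps on a log-frequency axis."""
    return f_low * (f_high / f_low) ** (np.arange(n_rows) / (n_rows - 1))


def image_to_sound(image, fs=16000, duration=1.0, f_low=500.0, f_high=5000.0):
    """Scan columns left to right over `duration` seconds (linear time axis).

    Each row drives a sine at its log-spaced frequency, weighted by pixel
    brightness. The top row is assumed to be the highest frequency.
    """
    n_rows, n_cols = image.shape
    freqs = row_frequencies(n_rows, f_low, f_high)[::-1]  # row 0 = top = highest
    samples_per_col = int(fs * duration / n_cols)
    segments = []
    for c in range(n_cols):
        # A global time base keeps each sine's phase continuous across columns.
        t = (c * samples_per_col + np.arange(samples_per_col)) / fs
        tones = np.sin(2 * np.pi * freqs[:, None] * t[None, :])
        segments.append(image[:, c] @ tones)
    return np.concatenate(segments)


# Shifting a pattern by a fixed number of rows scales all its frequencies
# by the same factor, so the ratios inside the pattern are preserved.
freqs = row_frequencies(64)
pattern = freqs[[10, 20, 30]]
shifted = freqs[[30, 40, 50]]
print(pattern[1:] / pattern[:-1], shifted[1:] / shifted[:-1])  # equal ratios

image = np.zeros((64, 16))
image[10, 3] = 1.0  # one bright pixel: a short tone burst in the 4th time slot
sound = image_to_sound(image)
```

The sample rate is kept well above twice the 5 kHz upper limit to avoid aliasing. Loudness normalization and any perceptual weighting are deliberately left out to keep the sketch short.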

Of course the "art" is to optimize this preservation of
invariants in the cross-modal mapping, while maximizing
resolution and ease of perception (including "proper"
grouping and segregation, possibly by manipulating the
sound textures).

Best wishes,

Peter Meijer



