Re: Granular synthesis and auditory segmentation ("Richard J. Fabbri" )

Subject: Re: Granular synthesis and auditory segmentation
From:    "Richard J. Fabbri"  <fabbri(at)NETAXIS.COM>
Date:    Thu, 15 Oct 1998 05:22:15 -0400

Peter,

Thank you for pasting the note of Mssr. Didier with yours, as it now
gives me the opportunity to discuss both of your responses in this
single email.

>I don't know about your neurons, but mine completely fail
>to replenish their synapses above about 1 or 2 kHz even
>after plenty of coffee.

... Actually, many physiology books discuss the refractory period (the
ability to replenish chemical balance) of Cochlear neurons as operating
to about 5 kHz.

... A detail readily confirmed by the literature.

... Given that most telephony systems run 300 Hz to 3 kHz, the 5 kHz
refractory period does well in practical situations!

>Of course there is a role for non-Fourier type processing too, but
>no simple scheme covers the entire audible [20 Hz, 20 kHz] range.

... True, this simple scheme merely covers all of speech communications.

... Singing (resonant redundancies) and mechanical vibrations in string
and wind instruments have other cues.

>Didier Depireux clarified the issue very nicely:
>
>> The half-wave rectification occurs _after_ the frequency
>> decomposition performed on the basilar membrane, i.e.
>> after you have decomposed the signal into frequency channels.

... Ah, The Place Theory!

... If you truly believe in Fourier Analysis then you also believe in
the inverse transform. However, the local, resonant response of a
stretched membrane (the Place Theory) is only useful for a sinusoidal
drive. Speech presents complex structures in time, and a resonant
response at any GIVEN time (at a PLACE) on the Basilar membrane is NOT
the same thing as a FULL spectral analysis, which produces amplitude
and PHASE information at MANY "frequencies" such that an inverse
transform is possible.

... It is also a well known fact that (Binaural) localization has a
10 microsecond resolution (1 to 2 spatial degrees), and that this
resolution is crucial to explaining the Cocktail Party Effect.

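As a rough check on how a 10 microsecond interaural time difference
relates to an angle of about 1 degree, here is a minimal sketch using a
simple far-field path-difference model. The 0.20 m ear spacing, the
343 m/s speed of sound, and the function names are illustrative
assumptions and do not come from this thread; a real head is not two
points in free space.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 C
EAR_SPACING = 0.20      # m, assumed effective distance between the ears


def itd_seconds(azimuth_deg, d=EAR_SPACING, c=SPEED_OF_SOUND):
    """Interaural time difference for a distant source at the given azimuth.

    Simple model: the extra path to the far ear is d * sin(azimuth).
    """
    return d * math.sin(math.radians(azimuth_deg)) / c


def azimuth_for_itd(itd, d=EAR_SPACING, c=SPEED_OF_SOUND):
    """Invert the model: azimuth (degrees) that yields the given ITD."""
    return math.degrees(math.asin(itd * c / d))


if __name__ == "__main__":
    angle = azimuth_for_itd(10e-6)
    # With the assumed values this comes out at about 1 degree.
    print(f"10 microseconds corresponds to about {angle:.2f} degrees")
```

Under these assumptions a 10 microsecond difference sits near the low
end of the 1 to 2 degree range quoted above; a smaller assumed ear
spacing moves it toward the upper end.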
A 10 microsecond resolution implies a Fourier sample window of
approximately the same size, which implies a "spectral" resolution of
100 kHz, i.e., quite useless if acoustic analysis is the goal.

>the "textural aspect" (i.e., the patterning)
>in sound textures perceptually relatively invariant to their
>position in the time-frequency plane with a typical [0s, 1s]
>by [500 Hz, 5 kHz] area. That is also what I would want, since
>it keeps the perceptual qualities of overall time-frequency
>"position" and sound "pattern" largely independent, just as
>in vision the texture of an object doesn't appear to change
>and interfere with position of that object in the visual field.
>
>Of course the "art" is to optimize this preservation of
>invariants in the cross-modal mapping, while maximizing
>resolution and ease of perception (including "proper"
>grouping and segregation, possibly by manipulating the
>sound textures).

... I also work with graphical speech patterns.

... But, my structures are time-locked to SOURCES in the acoustic
environment.

... And, Self-Organized Neural Maps detect and classify these
structures.

... You must solve the SOURCE localization problem before ANY (source)
analysis is EVER attempted.

... I wish to make this point emphatically: one can ONLY analyze a
SOURCE and, before one CAN do SOURCE analysis, one MUST isolate the
information from THAT source.

... This is precisely what occurs during the Cocktail Party Effect,
i.e., a SOURCE is isolated and analysis is focused on the information
produced by THAT source.

... Spectral analysis of an acoustic point in space is essentially
useless, since the SOURCE with which each resultant frequency is to be
ASSOCIATED remains unknown!

But, this result is to be expected, as the typical FFT sample window of
10 milliseconds is 1,000 times larger than the raw human localization
resolution of 10 microseconds, i.e., the 1 to 2 spatial degrees of
SOURCE resolution are completely buried. Actually, it's more
appropriate to say that spatial resolution has been lost in the
10 millisecond AVERAGING process used to calculate "spectral"
components.

Rich Fabbri
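The window-length arithmetic in the message can be checked with the
usual rule of thumb that a Fourier window of duration T resolves
frequencies no finer than about 1/T. The sketch below is illustrative
only; the function and constant names are assumptions, and 1/T ignores
the details of any particular window shape.

```python
def frequency_resolution_hz(window_seconds):
    """Approximate Fourier frequency resolution (about 1/T) for a window of T seconds."""
    if window_seconds <= 0:
        raise ValueError("window duration must be positive")
    return 1.0 / window_seconds


LOCALIZATION_WINDOW_S = 10e-6  # 10 microseconds, the binaural resolution cited
FFT_WINDOW_S = 10e-3           # 10 milliseconds, the typical FFT window cited

if __name__ == "__main__":
    # A 10 microsecond window gives a resolution of roughly 100 kHz.
    print(f"{frequency_resolution_hz(LOCALIZATION_WINDOW_S):.0f} Hz")
    # A 10 millisecond window gives a resolution of roughly 100 Hz.
    print(f"{frequency_resolution_hz(FFT_WINDOW_S):.0f} Hz")
    # The two window lengths differ by a factor of about 1,000.
    print(f"ratio: {FFT_WINDOW_S / LOCALIZATION_WINDOW_S:.0f}")
```

The same factor of about 1,000 that buys usable frequency resolution is
the factor by which timing detail is averaged away.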

This message came from the mail archive
maintained by:
DAn Ellis <>
Electrical Engineering Dept., Columbia University