Re: Granular synthesis and auditory segmentation (Robert Bolia)

Subject: Re: Granular synthesis and auditory segmentation
From:    Robert Bolia  <rbolia(at)FALCON.AL.WPAFB.AF.MIL>
Date:    Thu, 15 Oct 1998 11:10:50 -0400

Richard,

I'm not sure that I see how an MAA of 1-2 degrees (the 10 microsecond
resolution of the binaural system) is crucial to explaining the Cocktail
Party Effect. Has anyone ever done any cocktail party experiments with
talkers separated by as little as 1 degree?

Bob Bolia.

Robert S. Bolia
Research Scientist
Veridian
Air Force Research Laboratory
Wright-Patterson Air Force Base, Ohio

>>> "Richard J. Fabbri" <fabbri(at)NETAXIS.COM> 10/15 5:22 AM >>>

... Ah, The Place Theory!

... If you truly believe in Fourier Analysis then you also believe in the
inverse transform. However, the local, resonant response of a stretched
membrane (the Place Theory) is only useful for a sinusoidal drive. Speech
presents complex structures in time and, a resonant response at any GIVEN
time (at a PLACE) on the Basilar membrane is NOT the same thing as a FULL
spectral analysis which produces amplitude and PHASE information at MANY
"frequencies" such that an inverse transform is possible.

... It is also a well known fact that (Binaural) localization has a
10 microsecond resolution (1 to 2 spatial degrees) and, that this
resolution is crucial to explaining the Cocktail Party Effect.

... 10 microsecond resolution implies a Fourier sample window of approx the
same size --- which implies a "spectral" resolution of 100 "kHz", i.e.,
quite useless if acoustic analysis is the goal.

>the "textural aspect" (i.e., the patterning)
>in sound textures perceptually relatively invariant to their
>position in the time-frequency plane with a typical [0s, 1s]
>by [500 Hz, 5 kHz] area. That is also what I would want, since
>it keeps the perceptual qualities of overall time-frequency
>"position" and sound "pattern" largely independent, just as
>in vision the texture of an object doesn't appear to change
>and interfere with position of that object in the visual field.
>
>Of course the "art" is to optimize this preservation of
>invariants in the cross-modal mapping, while maximizing
>resolution and ease of perception (including "proper"
>grouping and segregation, possibly by manipulating the
>sound textures).

... I also work with graphical speech patterns.

... But, my structures are time-locked to SOURCES in the acoustic
environment.

... And, Self-Organized Neural Maps detect and classify these structures.

... You must solve the SOURCE localization problem before ANY (source)
analysis is EVER attempted.

... I wish to make this point emphatically - one can ONLY analyze a SOURCE
and, before one CAN do SOURCE analysis, one MUST isolate the information
from THAT source.

... This is precisely what occurs during the Cocktail Party Effect, i.e.,
a SOURCE is isolated and analysis is focused on the information produced
by THAT source.

... Spectral analysis of an Acoustic point in space is essentially useless
since the resultant frequencies to be ASSOCIATED with a PARTICULAR SOURCE
remain unknown!

... But, this result is to be expected as the typical FFT sample window of
10 milliseconds is 1,000 times larger than the raw, human, localization
resolution of 10 microseconds, i.e., the 1 to 2 Spatial Degrees of SOURCE
resolution are completely buried - actually, it's more appropriate to say
that Spatial Resolution has been lost in the 10 millisecond AVERAGING
process used to calculate "spectral" components.

Rich Fabbri

McGill is running a new version of LISTSERV (1.8d on Windows NT).
Information is available on the WEB at
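
The figures quoted in this exchange (1 to 2 degrees against 10 microseconds, a 10 microsecond window against 100 kHz, and the factor of 1,000 between a 10 millisecond and a 10 microsecond window) can be checked with a few lines of arithmetic. What follows is only a rough sketch and not anything either poster wrote: it assumes a simple sine-law model of interaural time difference, an ear separation of 0.18 m, a sound speed of 343 m/s, and the usual rule of thumb that a window of length T resolves frequencies about 1/T apart. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value (air at roughly 20 C)
EAR_SEPARATION = 0.18   # m, assumed value; not taken from the thread


def itd_seconds(azimuth_deg, ear_separation=EAR_SEPARATION, c=SPEED_OF_SOUND):
    """Interaural time difference for a source at the given azimuth,
    using the simple sine-law approximation (an assumption)."""
    return (ear_separation / c) * math.sin(math.radians(azimuth_deg))


def spectral_resolution_hz(window_seconds):
    """Rule-of-thumb frequency resolution of an analysis window: about 1/T."""
    return 1.0 / window_seconds


if __name__ == "__main__":
    # Time differences corresponding to the 1 to 2 degree MAA under discussion.
    for deg in (1.0, 2.0):
        print(f"{deg:.0f} deg -> ITD of {itd_seconds(deg) * 1e6:.1f} microseconds")

    # Frequency resolution of the two window lengths mentioned in the thread.
    for window in (10e-6, 10e-3):
        res = spectral_resolution_hz(window)
        print(f"{window * 1e6:.0f} microsecond window -> about {res:.0f} Hz")

    # Ratio between the FFT window and the localization resolution.
    print(f"window ratio: {10e-3 / 10e-6:.0f}")
```

Under these assumptions a 1 degree offset from straight ahead gives a time difference of roughly 10 microseconds, which is in line with the figure both posters use. The sketch says nothing about whether that resolution matters for the Cocktail Party Effect, which is the question Bob raises.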

This message came from the mail archive
maintained by:
DAn Ellis <>
Electrical Engineering Dept., Columbia University