the digital Foley artist ("Robert E. Remez")

Subject: the digital Foley artist
Date:    Thu, 2 Apr 1998 15:13:33 -0500

TO: Multiple recipients of list AUDITORY

The nonstationary spectrum of speech and its frequent discontinuities create the problem that Dan Ellis and Brian Karlsen have noted here, with conviction. In addition to agreeing, though, I would like to dissuade list readers from concluding that the deficiencies of contemporary accounts of perceptual organization--the assignment of acoustic constituents to coherent streams--are restricted to the domain of vocally produced sound.

While I have argued that the standard framework for discussing auditory perceptual analysis clearly fails when challenged to explain the perceptual coherence of speech (and I have taken the standard approach to be Auditory Scene Analysis and its computational implementations), the standard account goes largely untested with mechanical sources of sound. Instead, the tests of a Gestalt-derived conceptualization rely on arbitrarily designed patterns composed for the ideal domain of audiofrequency oscillators and noise generators. Paul Iverson's thesis, and Dan Ellis's, are notable exceptions to my caricature, and indicate how far we have yet to go toward understanding the perceptual organization of sounds produced by complex nonvocal mechanical events.

On such grounds, I have proposed that we reserve our endorsement of Auditory Scene Analysis as the accurate description of the analytical function that promotes perceptual coherence; it is at least plausible to speculate that tests with complex mechanical sources of sound will reveal that vocally produced sound is just one instance of this class, in which analytic mechanisms are adequate to handle nonstationary, discontinuous and heterogeneous spectra. For a variety of independent reasons, this seems like the best bet; expectation-driven processes work quite well in artificial analyzers (Dennis Klatt used to contrast the natural constraints with those that govern the implementations of engineers), in which memory is cheap and durable. They are far less inviting as descriptions of the human operator.

So, to conclude: a low-level, fast, automatic function operating without expectation is clearly implicated by perceptual studies of the human listener as the means by which organization is achieved for speech. Whether this is a domain-specific solution to the problem of organizing the speech stream, or a general auditory function that accommodates complex albeit ordinary mechanical sources of sound, remains to be discovered.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
= Robert E. Remez               212.854.4247 (office)                   =
= Professor and Chair           212.854.2464 (lab)                      =
= Department of Psychology      212.854.2069 (dept)                     =
= Barnard College               212.854.3601 (fax)                      =
= 3009 Broadway                                                         =
= New York, New York 10027-6598                                         =
= email: remez(at)                                                      =
= Home page:                                                            =
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

This message came from the mail archive
maintained by:
Dan Ellis
Electrical Engineering Dept., Columbia University