the digital Foley artist

The nonstationary spectrum of speech and its frequent discontinuities
create the problem that Dan Ellis and Brian Karlsen have noted here,
with conviction. In addition to agreeing, though, I would like to
dissuade the list readers from concluding that the deficiencies of
the contemporary accounts of perceptual organization--the assignment of
acoustic constituents to coherent streams--are restricted to the domain
of vocally produced sound. While I have argued that the standard framework
for discussing auditory perceptual analysis clearly fails when challenged
to explain the perceptual coherence of speech--and I have taken the standard
approach to be Auditory Scene Analysis and its computational implementations--
the standard account goes largely untested with mechanical sources of
sound. Instead, the tests of a Gestalt-derived conceptualization rely on
arbitrarily designed patterns composed for the ideal domain of audiofrequency
oscillators and noise generators. Paul Iverson's thesis, and Dan
Ellis's, are notable exceptions to my caricature, and indicate how far
we have yet to go toward understanding the perceptual organization of sounds
produced by complex nonvocal mechanical events.

On such grounds, I have proposed that we reserve our endorsement of
Auditory Scene Analysis as the accurate description of the analytical
function that promotes perceptual coherence; it is at least plausible
to speculate that tests with complex mechanical sources of sound will
reveal that vocally produced sound is just one instance of this class,
in which analytic mechanisms are adequate to handle nonstationary,
discontinuous and heterogeneous spectra. For a variety of independent
reasons, this seems like a best bet; expectation-driven processes work
quite well in artificial analyzers (Dennis Klatt used to contrast the
natural constraints with those that govern the implementations of
engineers) in which memory is cheap and durable. They are far less
inviting as descriptions of the human operator.

So--to conclude--a low-level, fast, automatic function operating without
expectation is clearly implicated by perceptual studies of the human
listener as the means by which organization is achieved for speech.
Whether this is a domain-specific solution to the problem of organizing
the speech stream, or whether this is a general auditory function
which accommodates complex albeit ordinary mechanical sources of
sound remains to be discovered.

