Ph.D. thesis on computational audition available



Dear friends,
you might find the following thesis interesting.

Redundancy Reduction for Computational Audition, a Unifying Approach.

Paris Smaragdis,
Massachusetts Institute of Technology, Media Laboratory,
May 2001.

Abstract

Computational audition has always been a subject of multiple theories.
Unfortunately very few place audition in the grander scheme of
perception, and even fewer facilitate formal and robust definitions as
well as efficient implementations. In our work we set forth to address
these issues. We present mathematical principles that unify the
objectives of lower level listening functions, in an attempt to
formulate a global and plausible theory of computational audition. Using
tools to perform redundancy reduction, and adhering to theories of its
incorporation in a perceptual framework, we pursue results that support
our approach. Our experiments focus on three major auditory functions:
preprocessing, grouping and scene analysis. For auditory preprocessing,
we prove that it is possible to evolve cochlear-like filters by
adaptation to natural sounds. Following that and using the same
principles as in preprocessing, we present a treatment that collapses
the heuristic set of Gestalt auditory grouping rules down to one
efficient and formal rule. We successfully apply the same elements once
again to form an auditory scene analysis foundation, capable of
detection, autonomous feature extraction, and separation of sources in
real-world complex scenes. Our treatment was designed to be independent
of parameter estimations and data representations specific to the
auditory domain. Some of our experiments
have been replicated in other domains of perception, providing equally
satisfying results, and a potential for defining global ground rules for
computational perception, even outside the realm of our five senses.

The documents and some of the media examples are to be found at:

http://sound.media.mit.edu/~paris/phd

Best regards,
Paris