Auditory Toolbox for Matlab
I'm very happy to announce that version 2.0 of the Matlab Auditory
Toolbox is now available for download. More information,
documentation, and download links are available at
The auditory toolbox builds on top of the capabilities of a commercial
calculation system known as Matlab. Matlab provides all the IO routines
and provides a general environment for programming. The Matlab auditory
toolbox extends Matlab's capabilities by providing a number of auditory
models. The toolbox is a small amount of code, all written in a very
high-level scripting language, so it is easy to understand and modify
(if necessary).
The first version of this toolbox was published while I was at Apple.
Apple has graciously given Interval permission to update the toolbox,
fixing some bugs and adding some new features. The primary modules
provided by this toolbox are gammatone filters, Meddis' hair cell model,
Lyon's cochlear model, correlograms, Seneff's ear model, and some common
representations from the speech world.
A large number of people have helped make this toolbox better by
providing both code and feedback. Certainly Richard F. Lyon, Ray Meddis,
Richard Duda, Chris Pal, Kate Nguyen, and Alain de Cheveigné have made
large contributions. Thank you.
I hope you find this toolbox useful. There are no guarantees; after all,
this code is free. Please let me know if you download this and want to
be notified of updates.
P.S. I am interested in other contributions to this toolbox. Please
let me know if you have an implementation of an auditory model in Matlab
which might fit into this package.
What is the Auditory Toolbox?
This report describes a collection of tools that implement several
popular auditory models for a numerical programming environment called
MATLAB. This toolbox will be useful to researchers who are interested
in how the auditory periphery works and want to compare and test their
theories. This toolbox will also be useful to speech and auditory
engineers who want to see how the human auditory system represents
sounds.
This version of the toolbox fixes several bugs, especially in the
Gammatone and MFCC implementations, and adds several new functions. This
report was previously published as Apple Computer Technical Report #45.
We appreciate receiving permission from Apple Computer to republish
their code and to update this package.
There are many ways to describe and represent sounds. The figure below
shows one taxonomy based on signal dimensionality. A simple waveform is
a one-dimensional representation of sound. The two-dimensional
representation describes the acoustic signal as a time-frequency image.
This is the typical approach for sound and speech analysis. This toolbox
includes conventional tools such as the short-time Fourier transform
(STFT or spectrogram) and several cochlear models that estimate auditory
nerve firing “probabilities” as a function of time. Finally, the next
level of abstraction is to summarize the periodicities of the cochlear
output with the correlogram. The correlogram provides a powerful
representation that makes it easier to understand multiple sounds and to
perform auditory scene analysis.
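As a rough illustration of the correlogram idea described above, the following Python sketch computes a short-time autocorrelation for each channel and sums across channels to find a shared periodicity. It is not the toolbox's Matlab implementation: the function names, the half-wave rectified sinusoids standing in for cochlear channel outputs, and the lag limits are all assumptions made for this example.

```python
import math

def autocorrelation(signal, max_lag):
    """Unnormalised autocorrelation of one channel for lags 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def correlogram_frame(channels, max_lag):
    """One row of autocorrelation values per channel (hypothetical helper)."""
    return [autocorrelation(ch, max_lag) for ch in channels]

def half_wave_sine(period, length):
    """Toy stand-in for a cochlear channel: a half-wave rectified sinusoid."""
    return [max(0.0, math.sin(2 * math.pi * n / period)) for n in range(length)]

# Two assumed channels: one repeats every 40 samples, the other every 20.
channels = [half_wave_sine(40, 400), half_wave_sine(20, 400)]
frame = correlogram_frame(channels, max_lag=100)

# Summing across channels emphasises the lag at which every channel repeats.
summary = [sum(row[lag] for row in frame) for lag in range(101)]
min_lag = 10  # skip the trivial peak around lag 0
common_period = max(range(min_lag, 101), key=lambda lag: summary[lag])
print(common_period)
```

In this toy case the summed autocorrelation peaks at a lag of 40 samples, the shortest period common to both channels, even though the second channel on its own peaks at 20.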
What does the Auditory Toolbox contain?
Six types of auditory time-frequency representations are implemented in
this toolbox:
1. Richard F. Lyon has described an auditory model based on a
transmission line model of the basilar membrane and followed by several
stages of adaptation. This model can represent sound at either a fine
time scale (probabilities of an auditory nerve firing) or at the longer
time scales characteristic of the spectrogram or MFCC analysis. The
LyonPassiveEar command implements this particular ear model.
2. Roy Patterson has proposed a model of psychoacoustic filtering
based on critical bands. This auditory front-end combines a Gammatone
filter bank with a model of hair cell dynamics proposed by Ray Meddis.
This auditory model is implemented using the MakeERBFilters,
ERBFilterBank, and MeddisHairCell commands.
3. Stephanie Seneff has described a cochlear model that combines a
critical band filterbank with models of detection and automatic gain
control. This toolbox implements stages I and II of her model.
4. Conventional FFT analysis is represented using the spectrogram.
Both narrow band and wide band spectrograms are possible. See the
spectrogram command for more information.
5. A common front-end for many speech recognition systems consists
of Mel-frequency cepstral coefficients (MFCC). This technique combines
an auditory filter-bank with a cosine transform to give a rate
representation roughly similar to the auditory system. See the mfcc
command for more information. In addition, a common technique known as
rasta is included to filter the coefficients, simulating the effects of
masking and providing speech recognition systems a measure of
6. Conventional speech-recognition systems often use
linear-predictive analysis to model a speech signal. The forward
transform, proclpc, and its inverse, synlpc, are included.
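To make the linear-predictive analysis in item 6 a little more concrete, here is a minimal Python sketch of the textbook autocorrelation method with the Levinson-Durbin recursion. It is only an illustration under stated assumptions: it does not reproduce the interface or the internals of proclpc or synlpc, and the function names and the decaying-exponential test signal are invented for this example.

```python
def autocorr(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err), where x[n] is predicted as sum(a[k] * x[n - 1 - k])
    and err is the remaining prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1.
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        prev = a[:]
        a[m] = k_m
        for k in range(m):
            a[k] = prev[k] - k_m * prev[m - 1 - k]
        err *= 1.0 - k_m * k_m
    return a, err

# Assumed test signal: the impulse response of x[n] = 0.9 * x[n - 1].
signal = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorr(signal, 2), order=2)
print(coeffs, residual)
```

For this signal a second-order predictor should recover a first coefficient close to 0.9 and a second coefficient close to zero, since one past sample already predicts the next almost perfectly.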