
summary of replies

Dear list, some time ago I posted a message asking about PCA as a
pre-processing technique for neural net inputs. Some of you requested a
post with the replies I got. So here it is, with a little delay (sorry
about that).
Thanks a lot for all your help and suggestions.
Luis Felipe Oliveira

Malcolm Slaney:
See also
a good solution to this problem.  The basic input format, anchor
models, should solve your problem.

Matt Flax:
You could try to use PDP++ ... it is a neural network computation
environment. It allows you to use CSS to pre-process the input ... which
might mean you need to write C++-style code to take wav files and
transform them ....

John Hershey:
Matlab makes all of this fairly easy. You can write your own programs to
read the wav files, perform windowed FFTs, perform PCA on the power
spectra (or log power spectra), and implement your neural net.
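
John's Matlab recipe translates almost line for line to Python; here is a
rough sketch of my own (not his code) using numpy only, with illustrative
parameter values (256-sample Hann frames, 8 retained components):

```python
# Sketch of the windowed-FFT -> log power spectrum -> PCA pipeline.
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Slice x into overlapping Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * win
                     for i in range(n_frames)])

def log_power_spectra(frames):
    """FFT each frame and take the log power spectrum."""
    spec = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spec) ** 2 + 1e-10)

def pca(features, n_components=8):
    """Project mean-centered features onto the top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

x = np.random.randn(4096)            # stand-in for samples read from a wav file
feats = log_power_spectra(frame_signal(x))
reduced = pca(feats)                 # one 8-dim input vector per frame
```

Each frame then becomes a short fixed-length vector you can feed to the net.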

Christian Spevak:
A standard technique for audio preprocessing is the extraction of
mel-frequency cepstral coefficients (MFCCs). Logan (2000) argued that this
can be regarded as an approximation of the KL transform.

@inproceedings{Logan00,
 author =       {Beth Logan},
 title =        {Mel frequency cepstral coefficients for music modeling},
 crossref =     {ISMIR00},
}

@proceedings{ISMIR00,
 title =        {Proceedings of the First International Symposium on Music
                 Information Retrieval (ISMIR)},
 booktitle =    {Proceedings of the First International Symposium on Music
                 Information Retrieval (ISMIR)},
 year =         {2000},
 address =      {Plymouth, Massachusetts},
 month =        oct,
 key =          {ISMIR},
 url =          {http://ciir.cs.umass.edu/music2000},
}

A Matlab implementation is included e.g. in Slaney's Auditory Toolbox.

@techreport{Slaney98,
 author =       {Malcolm Slaney},
 title =        {Auditory {T}oolbox {V}ersion 2},
 institution =  {Interval Research Corporation},
 address =      {Palo Alto, California},
 year =         {1998},
 type =         {Interval Technical Report},
 number =       {1998-010},
 url =          {http://www.slaney.org/malcolm/pubs.html},
}
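
The MFCC chain Christian describes (mel-scaled triangular filterbank on
the power spectrum, log, then a decorrelating DCT) fits in a few lines of
numpy. This is my own illustrative sketch, not the Auditory Toolbox, and
the parameters (20 filters, 16 kHz, 13 coefficients) are just common
defaults:

```python
# Compact MFCC sketch: power spectrum -> mel filterbank -> log -> DCT-II.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=20, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sr / 2),
                                  n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:mid] = np.linspace(0, 1, mid - lo, endpoint=False)
        fb[i, mid:hi] = np.linspace(1, 0, hi - mid, endpoint=False)
    return fb

def mfcc(frame, fb, n_coeffs=13):
    """MFCCs of one windowed frame via log mel energies and a type-II DCT."""
    power = np.abs(np.fft.rfft(frame, n=2 * (fb.shape[1] - 1))) ** 2
    logmel = np.log(fb @ power + 1e-10)
    n = fb.shape[0]
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                 * np.arange(n_coeffs)[:, None])
    return dct @ logmel

fb = mel_filterbank()
coeffs = mfcc(np.hanning(400) * np.random.randn(400), fb)
```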

In my own work, I have used MFCCs in combination with self-organizing maps
to classify sound. If you are interested:

Christian Spevak, Richard Polfreman and Martin Loomes. Towards detection of
perceptually similar sounds: investigating self-organizing maps. In:
Proceedings of the AISB'01 Symposium on Creativity in Arts and Sciences,
pp. 45-50. York, 2001.

Other approaches in timbre recognition build on the extraction of multiple
features, such as spectral centroid, spectral flux, RMS etc or higher-level
features characterizing e.g. rhythmic texture. A good overview of such
feature extraction techniques has recently been given by Tzanetakis:

@phdthesis{Tzanetakis02,
 author =       {George Tzanetakis},
 title =        {Manipulation, Analysis and Retrieval Systems for Audio
                 Signals},
 school =       {Princeton University},
 year =         {2002},
 url =          {http://www.cs.princeton.edu/~gtzan/},
}
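
The frame-level features Christian lists (spectral centroid, spectral
flux, RMS) each reduce to a line or two; this sketch is mine, not
Tzanetakis's code:

```python
# Per-frame spectral centroid, spectral flux, and RMS features.
import numpy as np

def frame_features(frames):
    """Centroid, flux, and RMS from a (n_frames, frame_len) array."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.arange(mag.shape[1])
    # Centroid: magnitude-weighted mean frequency bin of each frame.
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)
    # Flux: how much the spectrum changes from one frame to the next.
    diff = np.diff(mag, axis=0, prepend=mag[:1])
    flux = np.sqrt((diff ** 2).sum(axis=1))
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return np.stack([centroid, flux, rms], axis=1)

frames = np.random.randn(10, 256)
feats = frame_features(frames)       # shape (10, 3): one row per frame
```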

Huseyin Hacihabiboglu:
Audio data contained in WAV or AIFF files are bulky for any algorithm
which is not psychoacoustically motivated (using ANNs is such a case).
Therefore, feature vectors which define the salient properties of sounds
are needed. Different approaches have been taken, employing different
feature vector types. In my opinion, the most fruitful approaches used TF
representations (in one way or another) of sound as feature vectors. What
I understand from your case is that you're trying PCA to find a couple of
principal components of the sound file as feature vectors. I would
recommend

"P. Herrera, X. Amatriain, E. Batlle, and X. Serra, Towards
instrument segmentation for music content description: A critical review
of instrument classification techniques, Proceedings of the ICMC99, 1999."

as a good overview (i.e. not only of ANNs but also of other fancy methods
such as rough sets) of the different approaches taken.

I worked on a similar topic during my MSc, and you can reach my thesis
and related publications at http://husshho.port5.com

As for computer programs, I would recommend MATLAB, which has a Neural
Network Toolbox (http://www.mathworks.com) as well as utilities to read
wav and aiff files. Other than that, you can also try Octave
(http://www.octave.org), which is a free GPL tool, but I don't know if it
includes any neural network toolbox. In any case, Octave may at least
help you with reading the aiff files. (In Turkish we say: "Fair enough
for the roast of the cheap meat!") :)

Ramin Pichevar:
I think that a lot of processing has to be done before you can apply your
signal to your neural network. PCA and the Karhunen-Loeve transform assume
that your signal is stationary, which is not the case for sound sources.
In addition, the Levinson-Durbin algorithm (an implementation of the
Karhunen-Loeve transform (KLT)) is very CPU-intensive. Yes, in fact the
KLT is a compression technique (so that it will take fewer neurons in the
input layer), but timbre is a "detail". Are you sure that by doing a KLT
you don't lose important timbral information? If you use the KLT anyway,
then an HMM (Hidden Markov Model) or a classical neural network
(backprop, etc.) could be used for the classification. But don't use the
KLT on raw (.wav) data. Use one of the feature extraction algorithms used
for sound processing [1].

You didn't mention what type of network you want to use, but most of the
classical neural networks are static and unsuitable for this task in the
general sense. There is a new tendency in this field toward more
biological approaches ([2][3][4] and many others). If you opt for this
approach, you should begin with a cochlear filterbank, followed by
envelope detection [2]. Hope this helps a little.

Falou !,

@article{Picone93,
 author =  {Joseph W. Picone},
 title =   {Signal Modeling Techniques in Speech Recognition},
 journal = {Proceedings of the IEEE},
 volume =  81,
 number =  9,
 pages =   {1215-1247},
 month =   sep,
 year =    1993,
}

@article{WangBrown99,
 author =  {D. Wang and G. J. Brown},
 title =   {Separation of Speech from Interfering Sounds Based on
            Oscillatory Correlation},
 journal = {IEEE Transactions on Neural Networks},
 volume =  10,
 number =  3,
 pages =   {684-697},
 month =   may,
 year =    1999,
}

@inproceedings{RouatPichevar02,
 author =    {J. Rouat and R. Pichevar},
 title =     {Nonlinear Speech Processing Techniques for Source Segregation},
 booktitle = {{EUSIPCO2002}},
 year =      2002,
}

@inproceedings{PichevarRouat02,
 author =    {R. Pichevar and J. Rouat},
 title =     {Double-Vowel Segregation Based on a {Cochleotopic/AMtopic} Map
              Using a Biological Neural Network},
 booktitle = {{APCAM2002}},
 year =      2002,
}
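
The front end Ramin recommends (cochlear filterbank, then envelope
detection) can be approximated crudely as follows. This sketch is mine:
a bank of Butterworth bandpass filters on log-spaced center frequencies
with half-wave rectification and lowpass smoothing, standing in for the
gammatone filterbank a faithful implementation would use; all parameter
values are illustrative.

```python
# Crude cochlear filterbank + envelope detection using scipy.
import numpy as np
from scipy.signal import butter, sosfilt

def cochlear_envelopes(x, sr=16000, n_channels=16, lo=100.0, hi=6000.0):
    """Return an (n_channels, len(x)) array of band envelopes."""
    centers = np.geomspace(lo, hi, n_channels)   # log-spaced like the cochlea
    sos_env = butter(2, 50.0 / (sr / 2), output="sos")  # 50 Hz smoother
    envs = np.empty((n_channels, len(x)))
    for i, fc in enumerate(centers):
        band = [fc / 2 ** 0.25, fc * 2 ** 0.25]  # ~half-octave bands
        sos = butter(2, [f / (sr / 2) for f in band], btype="band",
                     output="sos")
        y = sosfilt(sos, x)
        # Half-wave rectify, then lowpass to extract the envelope.
        envs[i] = sosfilt(sos_env, np.maximum(y, 0.0))
    return envs

x = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)  # 440 Hz test tone
envs = cochlear_envelopes(x)
```

The channel envelopes (rather than raw samples) would then drive the
oscillatory network.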

Ali Taylan Cemgil:
I did many years ago such an experiment. I have an online paper that
you can find at http://www.mbfys.kun.nl/~cemgil/papers/netenga.ps.gz

Jay Reynolds:
Check out:


Rua Haszard Morris:
We are in a similar situation; we have time-based pitch contours and face
motion data (17*3 dimensions) that we'd like to input to a neural net or
some kind of classification stats.  I am in the middle of looking at
reducing the motion data using a PCA-based technique.

I believe that a PCA would result in time-based data (but with fewer,
say 7, dimensions), and would therefore not be suitable for direct input
to a neural net. However, such data could be modelled and the parameters
for the model input to the net/stats.

For example, we are using a polynomial model to reduce time-based pitch
contours (from speech data) to coefficients, which are then fed into a
discriminant analysis for classification. We are having moderate success
with this approach, but are still looking at what to do with the motion
data.
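
The contour-to-coefficients idea is tiny in code. This sketch is my own
(names and the cubic order are illustrative, not Rua's setup): fit a
low-order polynomial to each time-based contour and use the coefficients,
not the raw samples, as the fixed-length classifier input.

```python
# Reduce variable-length contours to fixed-length polynomial coefficients.
import numpy as np

def contour_features(contour, order=3):
    """Polynomial coefficients of a contour, fit over normalized time."""
    t = np.linspace(0.0, 1.0, len(contour))   # works for any contour length
    return np.polyfit(t, contour, order)      # order+1 coefficients

rising = contour_features(np.linspace(100, 200, 50))   # rising pitch contour
falling = contour_features(np.linspace(200, 100, 80))  # different length is fine

# Both contours map to the same-sized feature vector, so they can be fed
# to a net or a discriminant analysis regardless of original duration.
```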

So... I would appreciate it if you could forward to me any suggestions
people send you on this question. If we have any breakthroughs or
successes, I'll let you know...