
summary of replies

Dear list, some time ago I posted a message asking about PCA as a
pre-processing technique for neural net inputs. Some of you requested a
post with the replies I got. So here it is, with a little delay (sorry
about that).
Thanks a lot for all your help and suggestions.
Luis Felipe Oliveira

Malcolm Slaney:
See also
a good solution to this problem.  The basic input format, anchor
models, should solve your problem.

Matt Flax:
You could try to use PDP++ ... it is a neural network computation
environment. It allows you to use CSS to pre-process the input ... which
might mean you need to write C++-style code to take wav files and
transform them ....

John Hershey:
Matlab makes all of this fairly easy. You can write your own programs to
read the wav files, perform windowed FFTs, perform PCA on the power
spectra (or log power spectra), and implement your neural net.
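
John's Matlab recipe translates almost line for line to Python; here is a
rough sketch of my own (not his code) using numpy only, with illustrative
parameter values (256-sample Hann frames, 8 retained components):

```python
# Sketch of the windowed-FFT -> log power spectrum -> PCA pipeline.
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Slice x into overlapping Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * win
                     for i in range(n_frames)])

def log_power_spectra(frames):
    """FFT each frame and take the log power spectrum."""
    spec = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spec) ** 2 + 1e-10)

def pca(features, n_components=8):
    """Project mean-centered features onto the top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

x = np.random.randn(4096)            # stand-in for samples read from a wav file
feats = log_power_spectra(frame_signal(x))
reduced = pca(feats)                 # one 8-dim input vector per frame
```

Each frame then becomes a short fixed-length vector you can feed to the net.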

Christian Spevak:
A standard technique for audio preprocessing is the extraction of
mel-frequency cepstral coefficients (MFCCs). Logan (2000) argued that this
can be regarded as an approximation of the KL transform.

@inproceedings{Logan00,
 author =       {Beth Logan},
 title =        {Mel frequency cepstral coefficients for music modeling},
 crossref =     {ISMIR00},
}

@proceedings{ISMIR00,
 title =        {Proceedings of the First International Symposium on Music
                 Information Retrieval (ISMIR)},
 booktitle =    {Proceedings of the First International Symposium on Music
                 Information Retrieval (ISMIR)},
 year =         {2000},
 address =      {Plymouth, Massachusetts},
 month =        oct,
 key =          {ISMIR},
 url =          {http://ciir.cs.umass.edu/music2000},
}

A Matlab implementation is included e.g. in Slaney's Auditory Toolbox.

@techreport{Slaney98,
 author =       {Malcolm Slaney},
 title =        {Auditory {T}oolbox {V}ersion 2},
 institution =  {Interval Research Corporation},
 address =      {Palo Alto, California},
 year =         {1998},
 type =         {Interval Technical Report},
 number =       {1998-010},
 url =          {http://www.slaney.org/malcolm/pubs.html},
}
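
The MFCC chain Christian describes (mel-scaled triangular filterbank on
the power spectrum, log, then a decorrelating DCT) fits in a few lines of
numpy. This is my own illustrative sketch, not the Auditory Toolbox, and
the parameters (20 filters, 16 kHz, 13 coefficients) are just common
defaults:

```python
# Compact MFCC sketch: power spectrum -> mel filterbank -> log -> DCT-II.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=20, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sr / 2),
                                  n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:mid] = np.linspace(0, 1, mid - lo, endpoint=False)
        fb[i, mid:hi] = np.linspace(1, 0, hi - mid, endpoint=False)
    return fb

def mfcc(frame, fb, n_coeffs=13):
    """MFCCs of one windowed frame via log mel energies and a type-II DCT."""
    power = np.abs(np.fft.rfft(frame, n=2 * (fb.shape[1] - 1))) ** 2
    logmel = np.log(fb @ power + 1e-10)
    n = fb.shape[0]
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                 * np.arange(n_coeffs)[:, None])
    return dct @ logmel

fb = mel_filterbank()
coeffs = mfcc(np.hanning(400) * np.random.randn(400), fb)
```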

In my own work, I have used MFCCs in combination with self-organizing maps
to classify sound. If you are interested:

Christian Spevak, Richard Polfreman and Martin Loomes. Towards detection of
perceptually similar sounds: investigating self-organizing maps. In:
Proceedings of the AISB'01 Symposium on Creativity in Arts and Sciences,
pp. 45-50. York, 2001.

Other approaches in timbre recognition build on the extraction of multiple
features, such as spectral centroid, spectral flux, RMS etc or higher-level
features characterizing e.g. rhythmic texture. A good overview of such
feature extraction techniques has recently been given by Tzanetakis:

@phdthesis{Tzanetakis02,
 author =       {George Tzanetakis},
 title =        {Manipulation, Analysis and Retrieval Systems for Audio
                 Signals},
 school =       {Princeton University},
 year =         {2002},
 url =          {http://www.cs.princeton.edu/~gtzan/},
}
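
The frame-level features Christian lists (spectral centroid, spectral
flux, RMS) each reduce to a line or two; this sketch is mine, not
Tzanetakis's code:

```python
# Per-frame spectral centroid, spectral flux, and RMS features.
import numpy as np

def frame_features(frames):
    """Centroid, flux, and RMS from a (n_frames, frame_len) array."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.arange(mag.shape[1])
    # Centroid: magnitude-weighted mean frequency bin of each frame.
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)
    # Flux: how much the spectrum changes from one frame to the next.
    diff = np.diff(mag, axis=0, prepend=mag[:1])
    flux = np.sqrt((diff ** 2).sum(axis=1))
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return np.stack([centroid, flux, rms], axis=1)

frames = np.random.randn(10, 256)
feats = frame_features(frames)       # shape (10, 3): one row per frame
```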

Huseyin Hacihabiboglu:
Audio data contained in WAV or AIFF files are bulky for any algorithm
which is not psychoacoustically motivated (using ANNs is such a case).
Therefore, feature vectors which define the salient properties of sounds
are needed. Different approaches have been taken, employing different
feature vector types. In my opinion, the most fruitful approaches used TF
representations (in one way or another) of sound as feature vectors. What
I understand from your case is that you're trying PCA to find a couple of
principal components of the sound file as feature vectors. I would
recommend

"P. Herrera, X. Amatriain, E. Batlle, and X. Serra, Towards
instrument segmentation for music content description: A critical review
of instrument classification techniques, Proceedings of the ICMC99, 1999."

as a good overview (i.e. not only of ANNs but also of other fancy methods
such as rough sets) of the different approaches taken.

I worked on a similar topic during my MSc, and you can reach my thesis
and related publications at http://husshho.port5.com

As for computer programs, I would recommend MATLAB, which has a Neural
Network Toolbox (http://www.mathworks.com) as well as utilities to read
wav and aiff files. Other than that, you can also try Octave
(http://www.octave.org), which is a free GPL tool, but I don't know if it
includes any neural network toolbox. In any case, Octave may at least
help you with reading the aiff files. (In Turkish we say: "Fair enough
for the roast of the cheap meat!") :)

Ramin Pichevar:
I think that a lot of processing has to be done before you can apply your
signal to your neural network. PCA and the Karhunen-Loeve transform assume
that your signal is stationary, which is not the case for sound sources.
In addition, the Levinson-Durbin algorithm (an implementation of the
Karhunen-Loeve transform (KLT)) is very CPU-intensive. Yes, in fact the
KLT is a compression technique (so that it will take fewer neurons in the
input layer), but timbre is a "detail". Are you sure that by doing a KLT
you don't lose important timbral information? If you use the KLT anyway,
then an HMM (Hidden Markov Model) or a classical neural network
(backprop, etc.) could be used for the classification. But don't use the
KLT on raw (.wav) data. Use one of the feature extraction algorithms used
for sound processing [1].

You didn't mention what type of network you want to use, but most of the
classical neural networks are static and unsuitable for this task in the
general sense. There is a new tendency in this field toward more
biological approaches ([2][3][4] and many others). If you opt for this
approach, you should begin with a cochlear filterbank, followed by
envelope detection [2]. Hope this helps a little.

Falou !,

@article{Picone93,
 author =  {Joseph W. Picone},
 title =   {Signal Modeling Techniques in Speech Recognition},
 journal = {Proceedings of the IEEE},
 volume =  81,
 number =  9,
 pages =   {1215-1247},
 month =   sep,
 year =    1993,
}

@article{WangBrown99,
 author =  {D. Wang and G. J. Brown},
 title =   {Separation of Speech from Interfering Sounds Based on
            Oscillatory Correlation},
 journal = {IEEE Transactions on Neural Networks},
 volume =  10,
 number =  3,
 pages =   {684-697},
 month =   may,
 year =    1999,
}

@inproceedings{RouatPichevar02,
 author =    {J. Rouat and R. Pichevar},
 title =     {Nonlinear Speech Processing Techniques for Source Segregation},
 booktitle = {{EUSIPCO2002}},
 year =      2002,
}

@inproceedings{PichevarRouat02,
 author =    {R. Pichevar and J. Rouat},
 title =     {Double-Vowel Segregation Based on a {Cochleotopic/AMtopic} Map
              Using a Biological Neural Network},
 booktitle = {{APCAM2002}},
 year =      2002,
}
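
The front end Ramin recommends (cochlear filterbank, then envelope
detection) can be approximated crudely as follows. This sketch is mine:
a bank of Butterworth bandpass filters on log-spaced center frequencies
with half-wave rectification and lowpass smoothing, standing in for the
gammatone filterbank a faithful implementation would use; all parameter
values are illustrative.

```python
# Crude cochlear filterbank + envelope detection using scipy.
import numpy as np
from scipy.signal import butter, sosfilt

def cochlear_envelopes(x, sr=16000, n_channels=16, lo=100.0, hi=6000.0):
    """Return an (n_channels, len(x)) array of band envelopes."""
    centers = np.geomspace(lo, hi, n_channels)   # log-spaced like the cochlea
    sos_env = butter(2, 50.0 / (sr / 2), output="sos")  # 50 Hz smoother
    envs = np.empty((n_channels, len(x)))
    for i, fc in enumerate(centers):
        band = [fc / 2 ** 0.25, fc * 2 ** 0.25]  # ~half-octave bands
        sos = butter(2, [f / (sr / 2) for f in band], btype="band",
                     output="sos")
        y = sosfilt(sos, x)
        # Half-wave rectify, then lowpass to extract the envelope.
        envs[i] = sosfilt(sos_env, np.maximum(y, 0.0))
    return envs

x = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)  # 440 Hz test tone
envs = cochlear_envelopes(x)
```

The channel envelopes (rather than raw samples) would then drive the
oscillatory network.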

Ali Taylan Cemgil:
I did many years ago such an experiment. I have an online paper that
you can find at http://www.mbfys.kun.nl/~cemgil/papers/netenga.ps.gz

Jay Reynolds:
Check out:


Rua Haszard Morris:
We are in a similar situation; we have time-based pitch contours and face
motion data (17*3 dimensions) that we'd like to input to a neural net or
some kind of classification stats.  I am in the middle of looking at
reducing the motion data using a PCA-based technique.

I believe that a PCA would result in time-based data (but with fewer,
say 7, dimensions), and would therefore not be suitable for direct input
to a neural net. However, such data could be modelled and the parameters
for the model input to the net/stats.

For example, we are using a polynomial model to reduce time-based pitch
contours (from speech data) to coefficients, which are then fed into a
discriminant analysis for classification. We are having moderate success
with this approach, but are still looking at what to do with the motion
data.
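
The contour-to-coefficients idea is tiny in code. This sketch is my own
(names and the cubic order are illustrative, not Rua's setup): fit a
low-order polynomial to each time-based contour and use the coefficients,
not the raw samples, as the fixed-length classifier input.

```python
# Reduce variable-length contours to fixed-length polynomial coefficients.
import numpy as np

def contour_features(contour, order=3):
    """Polynomial coefficients of a contour, fit over normalized time."""
    t = np.linspace(0.0, 1.0, len(contour))   # works for any contour length
    return np.polyfit(t, contour, order)      # order+1 coefficients

rising = contour_features(np.linspace(100, 200, 50))   # rising pitch contour
falling = contour_features(np.linspace(200, 100, 80))  # different length is fine

# Both contours map to the same-sized feature vector, so they can be fed
# to a net or a discriminant analysis regardless of original duration.
```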

So... I would appreciate it if you could forward to me any suggestions
people send you on this question. If we have any breakthroughs or
successes, I'll let you know...