# Sound Analysis Tools

Lonce LaMar Wyse wrote:

> > Last year, I began a discussion on the list about analysis tools for
> > sound processing and resynthesis that proved to be interesting for many
> > people on the auditory list.
>
> Quite interested.
>
> I can't say I recall the previous discussion - do you have it archived
> so that you could email me a copy?

I think it is archived somewhere on the Auditory mailing list web pages.

> > My research is about the analysis and resynthesis of musical sounds.
>
> What distinction do you mean to make between musical and nonmusical sounds?

I should have said tuned and untuned sounds.

I will start the discussion by describing my purposes and the algorithm I use.

My primary goal is to separate the harmonic part and the stochastic part of
the sound.

There are many existing algorithms for analysing the harmonic part of a sound:

- heterodyne filtering
- phase vocoder
- FFT interpolation (PARSHL, MQ)
- wavelet transforms (Toshio Irino)
- wavelet transform interpolations (Dan Ellis)

I chose FFT interpolation:
for every time frame I compute a windowed FFT, using a Gaussian window.
Multiplying the signal by the window convolves the spectrum with the
transform of the window (and in this particular case, the transform of
a Gaussian is... a Gaussian).
So, when the signal is a sine wave, I obtain a Gaussian in my spectrum.
What I want is the exact continuous frequency of my sine wave.
What I have is a frequency-sampled Gaussian.
On a log scale, a Gaussian is a parabola.
So, quadratic interpolation of this spectrum on a log scale gives
an accurate value of the frequency and amplitude.
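As a concrete sketch of the parabolic-peak step (my own illustration, not the author's code; the function name and the window width `sigma_frac` are assumptions):

```python
import numpy as np

def estimate_peak(signal, fs, sigma_frac=0.125):
    """Estimate a sinusoid's frequency and amplitude by quadratic
    interpolation of the log-magnitude spectrum under a Gaussian window.
    (Illustrative sketch; parameter choices are assumptions.)"""
    n = len(signal)
    # Gaussian window: its Fourier transform is also a Gaussian, so on
    # a log scale each spectral peak is exactly a parabola.
    t = np.arange(n) - (n - 1) / 2.0
    window = np.exp(-0.5 * (t / (sigma_frac * n)) ** 2)
    spectrum = np.abs(np.fft.rfft(signal * window))
    k = int(np.argmax(spectrum))
    # Log magnitudes at the peak bin and its two neighbours.
    a, b, c = np.log(spectrum[k - 1:k + 2])
    # Vertex of the fitted parabola: fractional-bin offset in (-1/2, 1/2).
    p = 0.5 * (a - c) / (a - 2 * b + c)
    freq = (k + p) * fs / n
    # Parabola height, rescaled from spectral peak to sine amplitude.
    amp = np.exp(b - 0.25 * (a - c) * p) * 2.0 / window.sum()
    return freq, amp
```

For example, a 440.3 Hz sine sampled at 44.1 kHz in a 1024-sample frame (bin width about 43 Hz) is located to a small fraction of a bin this way.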

For each frame I store the frequency, phase and amplitude of each peak
of the spectrum (i.e., of each sine-wave component).

Before resynthesis, I select the peaks that correspond to harmonics
(using a harmonicity criterion).
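The harmonicity criterion is not spelled out above; one simple version (my assumption, not necessarily the one used) keeps only the peaks lying near integer multiples of an estimated fundamental:

```python
def select_harmonics(peaks, f0, tol=0.03):
    """Keep the spectral peaks close to an integer multiple of the
    fundamental f0. One possible harmonicity criterion (a sketch; the
    exact criterion used in the analysis above is not specified).

    peaks: list of (frequency_hz, amplitude) tuples from one frame.
    tol:   allowed relative deviation from the nearest harmonic.
    """
    selected = []
    for freq, amp in peaks:
        n = round(freq / f0)           # nearest harmonic number
        if n >= 1 and abs(freq - n * f0) <= tol * n * f0:
            selected.append((freq, amp))
    return selected
```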

On resynthesis, I link the peaks from one frame to the next.
I achieve resynthesis by cubic interpolation of the phase (as in
McAulay-Quatieri): the synthesized harmonic part has the same phases as
the original.
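The McAulay-Quatieri cubic phase interpolation can be sketched like this (my paraphrase of the published method; the function and variable names are mine):

```python
import numpy as np

def cubic_phase_track(theta0, omega0, theta1, omega1, T, n):
    """Phase trajectory across one frame of length T seconds, sampled at
    n points: a cubic polynomial matching phase (rad) and frequency
    (rad/s) at both frame boundaries, with the 2*pi*M ambiguity resolved
    for maximal smoothness, following McAulay-Quatieri. (Sketch only.)"""
    # Unwrapping integer M that keeps the trajectory maximally smooth.
    M = round(((theta0 - theta1 + omega0 * T)
               + (omega1 - omega0) * T / 2.0) / (2.0 * np.pi))
    d = theta1 - theta0 - omega0 * T + 2.0 * np.pi * M
    # Cubic coefficients: theta(t) = theta0 + omega0*t + alpha*t^2 + beta*t^3.
    alpha = 3.0 * d / T**2 - (omega1 - omega0) / T
    beta = -2.0 * d / T**3 + (omega1 - omega0) / T**2
    t = np.linspace(0.0, T, n)
    return theta0 + omega0 * t + alpha * t**2 + beta * t**3
```

Each linked partial then contributes `amp(t) * cos(theta(t))` along such a trajectory, with the amplitude interpolated linearly between frames.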

So, I subtract the resynthesized harmonic part from the original sound in
the time domain, and I obtain the difference: i.e., the noisy part of the
sound.

By the way, I know that Xavier Serra developed an algorithm very similar to
this one. But does he subtract the harmonic sound in the time domain or in
the spectral domain? (From his paper, I think it is in the spectral domain.)

This analysis is very useful: since it gives the difference between the
original signal and the synthetic one, it gives a good idea of the quality
of the analysis itself. So it is possible to adapt many parameters (such as
the window length) to obtain the best results.

This kind of algorithm is very efficient for tuned sounds; it allows you
to analyse sounds with strong vibrato and glissando.

It can also be useful for time stretching and frequency shifting, or for
sound editing and morphing (there, you only use the frequency and
amplitude information, so distortions can appear for noisy sounds because
the phase relationships can't be used).

It is far less efficient for noisy sounds such as cymbal sounds.
In this case, I think the problem is that the ear works on a roughly
logarithmic frequency scale, whereas the FFT uses a linear one.

The solution may be constant-Q analysis (such as Dan Ellis's or
Toshio Irino's). The problem is that high-speed continuous wavelet
transforms are still under development...
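For intuition, a naive constant-Q analysis of a single frame might look like this (a textbook-style sketch with made-up parameters; it is not Ellis's or Irino's method, and it is far from the fast implementations mentioned above):

```python
import numpy as np

def naive_cq_frame(signal, fs, f_min=55.0, bins_per_octave=12, n_bins=60):
    """Constant-Q magnitudes of one frame: geometrically spaced centre
    frequencies, each analysed over a window holding a fixed number of
    cycles (Q), so the resolution follows a log frequency scale.
    (Naive O(N^2) sketch; all parameters are illustrative.)"""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    mags = np.zeros(n_bins)
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = int(np.ceil(Q * fs / f_k))      # window length: Q cycles of f_k
        n_k = min(n_k, len(signal))
        t = np.arange(n_k)
        # Windowed complex exponential kernel at the bin's centre frequency.
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f_k * t / fs)
        mags[k] = np.abs(np.dot(signal[:n_k], kernel)) / n_k
    return mags
```

With 12 bins per octave starting at 55 Hz, a 220 Hz tone should peak at bin 24 (two octaves up), illustrating how low bins get long windows (fine frequency resolution) and high bins get short ones (fine time resolution).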

So, wait and see?

Is there somebody on the list working on such an algorithm?

Perfecto Herrera-Boyer wrote:

> I am a doctorate student too, and I expect to develop my thesis on
> sound synthesis, directed by Xavier Serra. I am working at the
> Institut Universitari de l'Audiovisual (IUA), a research/production
> center belonging to the Universitat Pompeu Fabra. My
> academic/professional profile merges Cognitive Science and Sound
> Technology, although nowadays I am more interested in the latter than
> in the former.
>
> Here at the IUA we are working mainly with SMS (Spectral Modeling
> Synthesis), but there is somebody who works with Lemur.

Can you describe SMS on the auditory mailing list?

Maybe Kelly Fitz (the author of Lemur) is on the list and can describe his
software to us.

Maybe somebody at IRCAM can describe their SuperPhaseVocoder (SVP) and
AudioSculpt (an impressive piece of software) to us.

> Maybe, as a starting point, it would be interesting to review or
> abstract the discussion held last year, because I am a new member of
> HEARING.

For those who are interested, I think you can find an archive of the
previous discussions (and paper references) on the auditory mailing list
web pages...

If you really want, it may be possible to make an abstract of the previous
discussion and post it to the list.

```
+-------------------------------------------------------------+
|                                                             |\
|  Thierry Rochebois           Doctorant                      | +
|  IEF   ARCEMA/TSI    Bat 220   UPS    91405 ORSAY Cedex     | |
|  thierry@ief-paris-sud.fr                                   | |
|  http://maury.ief-paris-sud.fr:8001/~thierry/welcome.html   | |
|                                                             | |
+-------------------------------------------------------------+ |
\                                                             \|
+-------------------------------------------------------------+
```