
Re: Acoustical similarity

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Acoustical similarity
From: "Regis Rossi A. Faria" <regis@xxxxxxxxxx>
Date: Mon, 5 Feb 2007 14:59:39 -0200
Comments: To: James McDermott <jamesmichaelmcdermott@GMAIL.COM>
Delivery-date: Mon Feb 5 12:07:14 2007
In-reply-to: <e442154b0702050217n4fcf0af2hadded4c2e806529b@mail.gmail.com>
References: <e442154b0702050217n4fcf0af2hadded4c2e806529b@mail.gmail.com>
Reply-to: "Regis Rossi A. Faria" <regis@xxxxxxxxxx>
Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
User-agent: Thunderbird 1.5.0.9 (Windows/20061207)

Hi Bruno,

I have used the wavelet transform to extract features (patterns/cues for expressive content) in the past, and more recently I used PEAQ (Perceptual Evaluation of Audio Quality) techniques to measure the similarity (or quality degradation) between two sounds: the original and the encoded/decoded version (i.e., one that has passed through an encoding/decoding process).

The latter is an objective (algorithmic) audio quality evaluation method standardized by the ITU as Recommendation ITU-R BS.1387.

Kind regards,
Regis

James McDermott wrote:

From:    "Bruno L. Giordano"

I am looking for "general" metrics of the acoustical (not perceived)
similarity between mono signals independent of a features extraction
stage (e.g., peak level, harmonicity etc.).

Ideally, this metric would operate on a low-level representation of the
signal (ideally the waveform).


Hi Bruno,

I am doing work which involves measuring similarity for machine
learning applications. One standard method (e.g., in evolutionary
computation) is to take a mean square error over the magnitude or
power spectrum: i.e., for two signals x and y of length N, window them,
take the DFT of each window, and then take the magnitude of each
bin, to produce two sequences of spectra, X_i and Y_i, where i indexes
the window. With j indexing the frequency bin, the distance is then

d(x, y) = sum_i sum_j (X_i[j] - Y_i[j])^2
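
As a rough illustration, the spectral distance described above could be sketched in Python as follows. This is only a sketch under assumptions of mine, not anything specified in the post: the Hann window, the window length (win_len), the hop size (hop) and the function names are all illustrative choices, and the DFT is computed naively with the standard library to keep the example self-contained.

```python
import cmath
import math


def magnitude_spectra(signal, win_len=64, hop=32):
    """Return one magnitude spectrum per Hann-windowed frame of the signal.

    win_len, hop and the Hann window are assumed, illustrative choices.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len)
              for n in range(win_len)]
    spectra = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(win_len)]
        spectrum = []
        # Naive DFT, keeping only the non-redundant bins 0..win_len/2.
        for j in range(win_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * j * n / win_len)
                      for n in range(win_len))
            spectrum.append(abs(acc))
        spectra.append(spectrum)
    return spectra


def spectral_distance(x, y, win_len=64, hop=32):
    """Sum over windows i and bins j of (X_i[j] - Y_i[j])^2."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    X = magnitude_spectra(x, win_len, hop)
    Y = magnitude_spectra(y, win_len, hop)
    return sum((a - b) ** 2
               for Xi, Yi in zip(X, Y)
               for a, b in zip(Xi, Yi))
```

Because only magnitudes are compared, a polarity-inverted copy of a signal ends up at (numerically) zero distance from the original under this measure.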

You can indeed define a purely time-domain distance measure:

d(x, y) = sum_n (x[n] - y[n])^2 / N

but it seems to be pretty useless: e.g., if we construct y by
phase-inverting x (y[n] = -x[n]), we get a very large distance between
them, even though they sound exactly the same.
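
The phase-inversion problem is easy to check numerically. The snippet below is a minimal sketch that assumes the time-domain distance is a mean square error (squared differences averaged over N); the 440 Hz test tone, the 8 kHz sample rate and the function name are my own illustrative choices.

```python
import math


def time_domain_mse(x, y):
    """Mean squared sample-by-sample difference between two signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


fs = 8000                       # assumed sample rate in Hz
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(800)]
y = [-s for s in x]             # polarity-inverted copy of x
silence = [0.0] * len(x)

d_inverted = time_domain_mse(x, y)        # four times the power of x
d_silence = time_domain_mse(x, silence)   # the power of x
```

Since x[n] - y[n] = 2 x[n], the inverted copy comes out four times as far from x as silence does, which is the opposite of what a listener would report.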

As you know, in other applications (such as automatic classification),
the extraction of features is more common.

I'd be interested to hear more about your application. Why don't you
want to extract features?

James
