
Re: Acoustical similarity

Hello James,

I use features for the quantification of the acoustical correlates of behavioral data.

However, the very process of defining features requires making guesses (i.e., assumptions) about what is perceptually relevant.

While this problem might not be especially pressing for a researcher working on simple, synthetic stimuli, it can become painful when the perception of complex everyday sounds is of interest. Indeed, given the informational richness of the latter, it is possible that the researcher, in defining the features, does not capture everything a listener uses.

Hence the idea of general metrics that do not depend on a feature-extraction stage.


James McDermott wrote:
From:    "Bruno L. Giordano"

I am looking for "general" metrics of the acoustical (not perceived)
similarity between mono signals independent of a features extraction
stage (e.g., peak level, harmonicity etc.).

Ideally, this metric would operate on a low-level representation of the
signal (ideally the waveform).

Hi Bruno,

I am doing work which involves measuring similarity for machine
learning applications. One standard method (e.g., in evolutionary
computation) is to take a mean square error over the magnitude or
power spectrum: i.e., for two signals x and y of length N, window them,
take the DFT of each window, and then take the magnitude of each
bin, to produce two sequences of spectra, X_i and Y_i. With i indexing
the windows and j the frequency bins, the distance is

d(x, y) = sum_i (sum_j (X_i[j] - Y_i[j])^2)
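
The spectral distance above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Hann window, the frame size and hop, and the function names (`frames`, `magnitude_spectrum`, `spectral_distance`) are not from the original message, and the naive DFT is used only to keep the sketch free of dependencies (a real application would use an FFT library). The sum is left unnormalised, as in the formula; dividing by the number of terms would give a mean square error.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split a signal into Hann-windowed frames (window choice is an assumption)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / size) for t in range(size)]
    return [
        [signal[start + t] * window[t] for t in range(size)]
        for start in range(0, len(signal) - size + 1, hop)
    ]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame, via a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_distance(x, y, size=64, hop=32):
    """Sum over frames i and bins j of (X_i[j] - Y_i[j])^2."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    total = 0.0
    for frame_x, frame_y in zip(frames(x, size, hop), frames(y, size, hop)):
        X = magnitude_spectrum(frame_x)
        Y = magnitude_spectrum(frame_y)
        total += sum((a - b) ** 2 for a, b in zip(X, Y))
    return total
```

Because only magnitudes are compared, a signal and its phase-inverted copy are at (numerically) zero distance under this measure.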

You can indeed define a purely time-domain distance measure:

d(x, y) = sum_n (x[n] - y[n])^2 / N

but it seems to be pretty useless: e.g., if we construct y by
phase-inverting x (y[n] = -x[n]), we get a very large distance between
them, even though they sound exactly the same.
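
The phase-inversion example can be checked numerically. The sketch below assumes the time-domain distance is the mean squared sample difference (so that positive and negative differences cannot cancel); the function name `time_domain_distance` and the test sine are hypothetical, chosen only for illustration.

```python
import math


def time_domain_distance(x, y):
    """Mean squared sample difference: sum_n (x[n] - y[n])^2 / N."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# A unit-amplitude sine and its phase-inverted copy, y[n] = -x[n].
x = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
y = [-s for s in x]

power = sum(s * s for s in x) / len(x)  # mean power of x
d = time_domain_distance(x, y)          # equals 4 * power for y = -x
```

Since x[n] - y[n] = 2 x[n] when y = -x, the distance is four times the mean power of x, even though the two signals are perceptually identical.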

As you know, in other applications (such as automatic classification),
the extraction of features is more common.

I'd be interested to hear more about your application and why you
don't want to extract features?