
Re: Intermediate representation for music analysis

I would add to this that, using an FFT, it is quite easy to measure the
component frequencies of a complex signal with precision that is finer
than the bin spacing. One just needs to estimate the rate of change of
the phase of a component within a bin, i.e. how far the phase in that
bin advances between successive, overlapping FFT frames. This technique, which has been
described in the context of the so-called phase vocoder algorithm,
permits the frequency of each signal component resolved by the FFT to be
estimated more precisely than the limit apparently imposed by the FFT
bin spacing in the frequency domain.
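
The phase-based estimate described above can be sketched in Python with NumPy. This is a minimal illustration, not a full phase vocoder. It assumes a single stationary sinusoid dominates bin k, uses a Hann window and two frames `hop` samples apart, and `phase_vocoder_freq` is a hypothetical helper name. A component at f Hz advances in phase by 2*pi*f*hop/fs between the two frames; subtracting the advance expected at the bin centre and wrapping the remainder gives the component's offset from that centre.

```python
import numpy as np

def phase_vocoder_freq(x, fs, n_fft, hop, k):
    """Estimate the frequency (Hz) of the component in FFT bin k from the
    phase advance between two Hann-windowed frames `hop` samples apart.
    Hypothetical helper, assuming one stationary sinusoid dominates bin k."""
    win = np.hanning(n_fft)
    X0 = np.fft.rfft(win * x[:n_fft])
    X1 = np.fft.rfft(win * x[hop:hop + n_fft])
    # Phase advance a component exactly at the bin centre would show over `hop` samples
    expected = 2 * np.pi * k * hop / n_fft
    # Measured advance minus expected, wrapped into [-pi, pi)
    dphi = np.angle(X1[k]) - np.angle(X0[k]) - expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
    # Convert the residual phase advance into a deviation from bin k, in bins
    deviation = dphi * n_fft / (2 * np.pi * hop)
    return (k + deviation) * fs / n_fft

fs, n_fft, hop = 8000.0, 1024, 256   # bin spacing fs / n_fft = 7.8125 Hz
t = np.arange(2048) / fs
x = np.cos(2 * np.pi * 440.0 * t)
k = int(np.argmax(np.abs(np.fft.rfft(np.hanning(n_fft) * x[:n_fft]))))
print(k * fs / n_fft)                            # bin-centre estimate
print(phase_vocoder_freq(x, fs, n_fft, hop, k))  # phase-based estimate
```

For this 440 Hz tone the bin-centre estimate is 437.5 Hz, while the phase-based estimate comes out very close to 440 Hz. The wrapped residual is only unambiguous when the true frequency lies within n_fft / (2 * hop) bins of the bin centre (2 bins here), which is why the hop is kept small relative to the frame length.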

Best regards,

Hugh McDermott, PhD
Principal Research Fellow
Department of Otolaryngology
The University of Melbourne
384 - 388 Albert Street,
East Melbourne.  3002
Phone: +61 3 9929 8665
Fax: +61 3 9663 6086
E-mail: hughm@xxxxxxxxxxxxxx
Web page: http://www.medoto.unimelb.edu.au/people/mcdermoh/

-----Original Message-----
From: AUDITORY Research in Auditory Perception
[mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Bob Masta
Sent: Monday, 17 July 2006 11:01 PM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Intermediate representation for music analysis

Note that no matter what sort of analysis you do, the frequency
resolution is determined by the reciprocal of the analysis window
duration.  So if you want fine resolution for the low frequencies, you
need a long sample set, even if you only need much coarser resolution at
the high frequencies (due to the log nature of hearing).
So, why not just take a long FFT?  Even though they have linear
frequency spacing, FFTs have been heavily optimized for efficient
computation.  I wonder if it might be better to use a conventional FFT
and lumping some upper bins together to form quasi-log bands, rather
than using a less-efficient log-spaced filter bank.
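
To make the resolution arithmetic and the bin-lumping idea concrete, here is a minimal sketch in Python with NumPy. The line spacing of an n-point FFT at sample rate fs is fs/n, which equals 1/T Hz for a window of T seconds, so a 1 s window gives 1 Hz lines. The helper `quasi_log_bands`, the third-octave band width and the 50 Hz lowest band edge are illustrative assumptions; each band's power is simply the sum of the squared magnitudes of the FFT bins whose centre frequencies fall inside it.

```python
import numpy as np

def quasi_log_bands(x, fs, fraction=3, f_low=50.0):
    """Lump linear FFT bins into 1/fraction-octave bands (hypothetical helper).

    Returns band edges (Hz), summed power per band, and the number of FFT
    bins that fell into each band.
    """
    n = len(x)
    power = np.abs(np.fft.rfft(np.hanning(n) * x)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # line spacing fs / n = 1 / T Hz
    # Band edges 1/fraction octave apart, from f_low up to (at most) Nyquist
    n_bands = int(np.floor(fraction * np.log2((fs / 2) / f_low)))
    edges = f_low * 2.0 ** (np.arange(n_bands + 1) / fraction)
    band_power = np.zeros(n_bands)
    bin_count = np.zeros(n_bands, dtype=int)
    for i in range(n_bands):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        band_power[i] = power[in_band].sum()
        bin_count[i] = int(in_band.sum())
    return edges, band_power, bin_count

fs = 8000.0
t = np.arange(int(fs)) / fs           # 1 s window -> 1 Hz line spacing
x = np.cos(2 * np.pi * 440.0 * t)
edges, band_power, bin_count = quasi_log_bands(x, fs)
print(bin_count[0], bin_count[-1])    # few bins in the lowest band, many in the highest
print(edges[np.argmax(band_power)])   # lower edge of the band containing 440 Hz
```

With a 1 s window the lowest third-octave band (50 to about 63 Hz) collects only about a dozen 1 Hz bins, while the top band (about 2540 to 3200 Hz) collects several hundred; that surplus of resolution at the top is what lumping trades away.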

There is one weakness to that approach, however: if you set the
overall FFT length so that the lowest band you want to handle exactly
matches the lowest FFT spectral line, then the next spectral line will
be at *twice* that frequency, a full octave higher, so there will be no
nice fractional-octave alignment.  If you really need that,
a log filter bank may be best.
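
A quick way to see the mismatch: FFT lines sit at k times the line spacing, so lines k and k + 1 are log2((k + 1)/k) octaves apart. That is a whole octave for k = 1, and it shrinks only slowly as k grows. The sketch below uses hypothetical helper names and takes a semitone as the yardstick purely for illustration.

```python
import math

def line_spacing_semitones(k):
    """Spacing, in semitones, between FFT lines k and k + 1 (line k sits at k * df)."""
    return 12 * math.log2((k + 1) / k)

def first_line_finer_than(semitones):
    """Smallest line index k whose spacing to line k + 1 is below `semitones`."""
    k = 1
    while line_spacing_semitones(k) >= semitones:
        k += 1
    return k

for k in (1, 2, 3, 4, 8, 16):
    print(k, round(line_spacing_semitones(k), 2))
print(first_line_finer_than(1.0))   # 17
```

So in this illustration, sub-semitone line spacing starts only at line 17, which means the window would have to span at least about 17 periods of the lowest frequency of interest (T >= 17 / f_min).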

However, the way I have seen this handled is to assume (hope?) that
there will be plenty of upper harmonics in the signal, many of which
will fall into regions of the FFT where the resolution (considered on an
octave basis) is much higher.  By looking at a few of these upper
harmonics, it was possible to figure out what the actual fundamental
frequency was to a similarly high resolution.
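
The harmonic-based refinement described above can be illustrated with a simple least-squares fit. This is a sketch under stated assumptions, not necessarily the method that was used: the partials are assumed to be exactly harmonic, a rough f0 is assumed to be known well enough to assign harmonic numbers, each harmonic's frequency is taken as its peak bin centre (error up to half a bin), and `f0_from_harmonics` is a hypothetical name. Because harmonic h is fitted as h * f0, a half-bin error on that harmonic becomes an error of only half a bin divided by h on f0.

```python
import numpy as np

def f0_from_harmonics(x, fs, f0_rough, harmonics):
    """Refine a rough fundamental estimate from the peak bins of its harmonics.

    Hypothetical helper: for each harmonic number h, take the FFT bin with the
    largest magnitude within +/- f0_rough / 2 of h * f0_rough, then fit
    f_h = h * f0 by least squares, giving f0 = sum(h * f_h) / sum(h ** 2).
    """
    n = len(x)
    mag = np.abs(np.fft.rfft(np.hanning(n) * x))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    hs, peaks = [], []
    for h in harmonics:
        idx = np.flatnonzero(np.abs(freqs - h * f0_rough) <= f0_rough / 2)
        if idx.size == 0:
            continue  # this harmonic lies above the Nyquist frequency
        peaks.append(freqs[idx[np.argmax(mag[idx])]])
        hs.append(h)
    hs = np.asarray(hs, dtype=float)
    peaks = np.asarray(peaks)
    return float(np.sum(hs * peaks) / np.sum(hs ** 2))

fs, n = 8000.0, 1024                  # bin spacing fs / n = 7.8125 Hz
t = np.arange(n) / fs
# Synthetic tone: 110 Hz fundamental plus harmonics 2..30 with 1/h amplitudes
x = sum(np.cos(2 * np.pi * h * 110.0 * t) / h for h in range(1, 31))
mag = np.abs(np.fft.rfft(np.hanning(n) * x))
f0_rough = np.argmax(mag) * fs / n    # centre of the fundamental's bin
f0_fine = f0_from_harmonics(x, fs, f0_rough, range(1, 31))
print(f0_rough, f0_fine)
```

In this noiseless example the rough estimate is the fundamental's bin centre, 109.375 Hz (0.625 Hz low), while the fit over harmonics 1 to 30 lands within about 0.02 Hz of 110 Hz.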

Best regards,

Bob Masta