auditory filters without tails

Dear Experts on auditory modeling,

I am developing auditorily motivated methods for extracting
multiple pitches from musical audio (i.e., polyphonic music transcription).

A question related to auditory filters is burning in my mind:


The shape of the magnitude response of the "gammatone" auditory filters
can be characterized as "rounded-exponential + TAILS around the passband".
Due to the tails, a filter centered on 1 kHz, for example,
exhibits only up to 65 dB attenuation for frequencies in the range 0-500 Hz.
This is NOT sufficient for music analysis: the filters at higher bands
cannot really reject the dominant sinusoidal components
at lower frequencies.
I would like to get rid of the tails of the auditory filter altogether.
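To see where these tails come from, the magnitude response of an analog (continuous-time) gammatone filter can be evaluated directly. Below is a minimal Python sketch, assuming a 4th-order gammatone with bandwidth b = 1.019 ERB and the ERB formula of Glasberg and Moore (1990), ERB = 24.7(4.37 f/1000 + 1). Neither assumption is taken from this message, and Slaney's digital implementation [2] will differ somewhat from this analog response.

```python
import numpy as np

def erb_hz(fc):
    # Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore 1990 formula)
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_gain_db(f, fc, order=4):
    """Gain in dB of the analog gammatone t^(n-1) exp(-2 pi b t) cos(2 pi fc t),
    normalized to 0 dB at fc. Both the positive- and negative-frequency
    terms of the cosine carrier are kept, so the low-frequency tail is exact."""
    b = 1.019 * erb_hz(fc)
    f = np.asarray(f, dtype=float)

    def h(x):
        # Fourier transform up to a constant factor that cancels in the normalization
        return (b + 1j * (x - fc)) ** (-order) + (b + 1j * (x + fc)) ** (-order)

    return 20.0 * np.log10(np.abs(h(f)) / np.abs(h(fc)))

fc = 1000.0
for f in (0.0, 250.0, 500.0, 750.0):
    print(f"{f:6.0f} Hz: {gammatone_gain_db(f, fc):7.1f} dB")
```

The attenuation levels off toward 0 Hz instead of continuing to grow, which is the tail behaviour described above.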


Can anyone provide me with a digital IIR filter implementation of the
Roex(p) frequency response (without the tails) as proposed in [1]?
The gammatone implementation by Slaney in [2] includes the tails,
thereby implementing a Roex(p,w,t)-type response rather than the simpler
Roex(p) model in [1].
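For reference, both weighting functions can be written in terms of the normalized frequency deviation g = |f - fc|/fc. The Python sketch below assumes the usual forms, Roex(p): W(g) = (1 + pg)exp(-pg), and Roex(p,w,t): W(g) = (1 - w)(1 + pg)exp(-pg) + w(1 + tg)exp(-tg), with p = 4 fc/ERB (which gives the Roex(p) filter that ERB) and the Glasberg and Moore (1990) ERB formula as an assumption. The tail parameters w and t are hypothetical values chosen only to make the tail visible. W is a power weighting, so the attenuation in dB is -10 log10 W.

```python
import numpy as np

def erb_hz(fc):
    # Assumed ERB formula (Glasberg & Moore 1990); not taken from [1]
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def roex_p(f, fc):
    """Roex(p) power weighting; p = 4 fc / ERB gives this filter the assumed ERB."""
    p = 4.0 * fc / erb_hz(fc)
    g = np.abs(np.asarray(f, dtype=float) - fc) / fc
    return (1.0 + p * g) * np.exp(-p * g)

def roex_pwt(f, fc, w, t):
    """Roex(p,w,t): Roex(p) passband plus a shallower tail of relative level w and slope t."""
    p = 4.0 * fc / erb_hz(fc)
    g = np.abs(np.asarray(f, dtype=float) - fc) / fc
    return (1.0 - w) * (1.0 + p * g) * np.exp(-p * g) + w * (1.0 + t * g) * np.exp(-t * g)

fc = 1000.0
w, t = 1e-3, 8.0  # hypothetical tail parameters, for illustration only
for f in (0.0, 250.0, 500.0, 750.0):
    att_p = -10.0 * np.log10(roex_p(f, fc))
    att_pwt = -10.0 * np.log10(roex_pwt(f, fc, w, t))
    print(f"{f:6.0f} Hz: Roex(p) {att_p:6.1f} dB, Roex(p,w,t) {att_pwt:6.1f} dB")
```

Far from fc the w-term dominates, so the Roex(p,w,t) attenuation is bounded by the tail while the Roex(p) attenuation keeps increasing.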


It seems to me that steeper auditory filters (without the tails) should
be used for the computational analysis of audio.
This is because, in music, the components in neighbouring bands are sinusoids,
not flat noise as in the experiments in which the auditory filter
shape was originally measured and derived.
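As an illustration of the sinusoidal case (not the IIR implementation asked for above), a tail-free Roex(p) channel can be approximated offline by scaling FFT bins. The sketch below swaps in this zero-phase, block-based FFT weighting technique: each bin is multiplied by sqrt(W), since W is a power weighting. The signal levels, sample rate, and the Glasberg and Moore (1990) ERB formula are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np

def roex_p_weight(f, fc):
    # Roex(p) power weighting, p = 4 fc / ERB; ERB formula assumed (Glasberg & Moore 1990)
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    p = 4.0 * fc / erb
    g = np.abs(f - fc) / fc
    return (1.0 + p * g) * np.exp(-p * g)

def roex_p_channel(x, fs, fc):
    """Zero-phase, tail-free band signal: scale each FFT bin by sqrt(W)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.fft.irfft(spec * np.sqrt(roex_p_weight(freqs, fc)), n=len(x))

fs, n = 16000, 16000                 # 1 s block, so integer frequencies fall exactly on bins
tt = np.arange(n) / fs
# Loud 500 Hz sinusoid plus a partial 40 dB weaker at 1 kHz
x = 1.0 * np.sin(2 * np.pi * 500 * tt) + 0.01 * np.sin(2 * np.pi * 1000 * tt)
y = roex_p_channel(x, fs, 1000.0)

amp = 2.0 * np.abs(np.fft.rfft(y)) / n  # sinusoid amplitudes in the 1 kHz channel
print(f"500 Hz leak: {20 * np.log10(amp[500]):.1f} dB, "
      f"1 kHz partial: {20 * np.log10(amp[1000]):.1f} dB")
```

Because the weighting is applied circularly to a whole block, it is non-causal; a practical analyser would need windowing and overlap-add, which this sketch omits.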

From a theoretical perspective (although my goals are quite application-oriented),
I quote below one paragraph from [1] which indicates that the tails
of the auditory filter are simply due to the approach to absolute threshold!
Please let me know if I am completely on the wrong track or if
the conclusion by Patterson and Moore below is no longer valid.

Quotation from [1, p.144]:
 "The dynamic range of the filters shown in Fig. 3.13 decreases with decreasing
  [notched] noise level. This suggests that the tails of the auditory
  filter are simply a consequence of masked threshold approaching absolute
  threshold at wide notch bandwidths. A similar conclusion was reached by
  Glasberg et al. (1984b) who measured the characteristic of the
  auditory filter over a greater dynamic range than had been done
  previously, using a relatively high noise spectrum level, 45 dB,
  and maskers with very wide notches. They also used listeners with a
  wide range of ages and, correspondingly, a wide range of absolute
  thresholds. The tails that typically flank the passband of the auditory
  filter had previously been interpreted as being an inherent part of
  the filter shape. If this were true, the filters of Glasberg et al.
  should have had typical passbands with extended tails because of the
  greater dynamic range of the measurements. Instead their filters
  showed passbands with markedly increased dynamic ranges and the same
  small tails as found previously. Although the decrease in the slope
  of the curve relating threshold to notch width often occurs well before
  the signal reaches absolute threshold, the approach to absolute threshold
  does seem to be the crucial factor in producing the tails of the filter."


[1] Patterson, Moore, "Auditory filters and excitation patterns as
    representations of frequency resolution," In Frequency Selectivity in
    Hearing, Moore (Ed.), Academic Press, 1986.
[2] Slaney, "An Efficient Implementation of the Patterson-Holdsworth
    Cochlear Filter Bank," Apple TR #35, 1993.

Any comments and advice are greatly appreciated.
With best regards,

Anssi Klapuri    klap@cs.tut.fi   http://www.cs.tut.fi/~klap
work: Tampere University of Tech., P.O.Box 553, FIN-33101 Tampere, Finland
tel.: +358 3 3115 2124, fax: +358 3 3115 4954, gsm: +358 50 364 8208