Re: Definition and Measurement of Harmonicity

Chris and Others,

I assume here that this topic refers to the second of Jim Beauchamp's
definitions: "the degree to which single sounds contain only harmonics" of
a common fundamental frequency.  I also assume that this is to be
determined by examining the signal itself.  In practice this will be a
sampled signal x(n), n = 0, 1, ..., N-1.

On this basis, I think we need more general and applicable measures than
the ones so far mentioned in this stream, for the following reasons.

Reinhart Frosch's formula only applies to a particular model of
thick plucked or struck strings.  It also doesn't give an overall measure
of inharmonicity, since it only applies to the individual partials.  In
addition, it doesn't take into account the relative strengths of the
various partials (e.g. a signal would still be almost perfectly harmonic,
even if some of the partial frequencies are grossly in error, provided that
the corresponding partial amplitudes are very small).  It takes no account
of noise of any sort in the signal, which is often the reason that a signal
is less than perfectly harmonic.

Similar comments can be made about Jim Beauchamp's formula.  Incidentally,
I couldn't find any measures of pitch salience on a quick browse through
Slaney's Auditory Toolbox, as suggested by Brian Gygi.

Possibly the biggest problem in practice with both the above formulas is
that they assume that the partial frequencies are known.  However, deriving
these from the observed signal is a very difficult problem in
practice.  Papers are still being published even for the simplest case of a
single sine wave in noise.  Methods based on the DFT (or FFT) are often
used to derive the partial frequencies in more complex cases, but
simple-minded approaches using such methods are not very
accurate.  Subspace methods can also be used, but are computationally
expensive.  However, I think it is fair to say that the problem of finding
good estimates of the partial frequencies is still very much an open
problem.  (And I suspect that the estimates would have to be very good
indeed to use the proposed formulas.)
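
As a rough illustration of the accuracy problem with simple-minded DFT
methods, the sketch below picks the largest DFT bin of a single noise-free
sinusoid.  The test frequency, the record length and the function name
dft_peak_frequency are assumptions made for this example only.  The
estimate can only land on a bin centre, so the error can be as large as
half a bin (0.5/N cycles per sample).

```python
import math


def dft_peak_frequency(x):
    """Return the frequency (cycles/sample) of the largest DFT bin in (0, 0.5)."""
    n = len(x)
    best_bin, best_power = 1, -1.0
    for k in range(1, n // 2):
        re = sum(x[i] * math.cos(2.0 * math.pi * k * i / n) for i in range(n))
        im = sum(x[i] * math.sin(2.0 * math.pi * k * i / n) for i in range(n))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin / n


# Hypothetical test signal: one sinusoid whose frequency lies between bins.
N = 256
TRUE_FREQ = 0.1234  # cycles per sample, chosen arbitrarily
signal = [math.cos(2.0 * math.pi * TRUE_FREQ * i) for i in range(N)]

estimate = dft_peak_frequency(signal)
error_cents = 1200.0 * math.log2(estimate / TRUE_FREQ)
print(f"true {TRUE_FREQ:.4f}, estimated {estimate:.4f}, "
      f"error {error_cents:+.1f} cents")
```

Interpolating between bins or zero-padding reduces this error, but noise
and neighbouring partials then become the limiting factors.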

My conclusion is that, instead of the above, we need something similar to
the concept of the degree of voicing, which is commonly used in speech
coding work.  The degree of voicing is a measure of the degree to which a
signal is periodic - i.e. the degree to which it is harmonic in the above
sense.

There is a good survey of voicing determination in "Pitch and Voicing
Determination" by W.J. Hess in Chapter 1 of "Advances in Speech Signal
Processing" by S. Furui and M.M. Sondhi (Eds), Marcel Dekker, 1992.

As pointed out in that article, many pitch estimators produce estimates of
the degree of voicing as a by-product of their pitch estimates.  There is
thus a large choice of harmonicity measures available.

Two common measures of harmonicity that are easily appreciated in their
own right are briefly described below (in their original contexts they
were associated with further algorithms for pitch estimation or coding,
or both).

Firstly, consider the autocorrelation function (ACF) of x(n), denoted by R(k).

If a signal x(n) is purely harmonic, it will also be perfectly periodic
with some period K, so that x(n+K) = x(n) for all n.  In this case it is
easy to show that R(k) is also periodic with the same period.  Also, R(0)
is the global maximum value of R(k), but this same maximum value is also
achieved by R(K), R(2*K), etc.  However, if x(n) is not periodic, R(0)
will be larger than R(K), R(2*K), ...

In the case of a signal observed only on the interval [0, N-1], an
appropriate (re-)definition of the ACF is

    R(k) = MEAN (x(n) * x(n+k)),

where the mean is taken over all terms for which both n and n+k are in the
range [0, N-1].  For a harmonic signal the ACF defined like this will still
be approximately periodic and have almost equal major peaks at k = 0, K,
2*K, ..., provided that N is large enough to cover several periods (N >
2.5*K is often considered adequate).
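
A minimal sketch of this finite-record ACF follows.  The function name acf
and the two-harmonic test signal are assumptions for illustration.  The
record covers five periods, so the peaks at k = K and k = 2*K should match
R(0) closely, while the value at k = K/2 should be much lower.

```python
import math


def acf(x, k):
    """Mean of x(n) * x(n+k) over all n with both n and n+k in [0, N-1]."""
    terms = len(x) - k
    if not 0 < terms <= len(x):
        raise ValueError("lag k must satisfy 0 <= k <= N-1")
    return sum(x[n] * x[n + k] for n in range(terms)) / terms


# Hypothetical periodic test signal: fundamental plus second harmonic.
K = 40    # period in samples
N = 200   # record length, five periods (N > 2.5*K)
x = [math.sin(2.0 * math.pi * n / K) + 0.5 * math.sin(4.0 * math.pi * n / K)
     for n in range(N)]

for lag in (0, K // 2, K, 2 * K):
    print(f"R({lag}) = {acf(x, lag):+.4f}")
```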

These considerations lead to a frequently used measure of harmonicity,
defined as

    H1 = MAX (R(k)) / R(0),

where the maximum is taken over k in the range [1, N-1].  H1 is nominally
at most 1 (with the finite-record ACF above it can exceed 1 at lags close
to N, where the mean has very few terms); in the purely harmonic case it
will be near 1, whereas for a noise-like signal it will be near zero.

It may be possible for H1 to be large for some non-harmonic signals, but in
practice this measure has been found to be a reasonably good indicator of
voicing (or harmonicity) for speech signals, and is widely used.  It has
the great advantage of being very simple to compute.
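
The sketch below computes H1 for three test signals.  It is an
illustration under stated assumptions, not a reference implementation: the
function names, the test signals and the lag limits are all choices made
here.  In particular, the search stops at lag N/2.5 rather than N-1 so
that every mean has a reasonable number of terms, and the min_lag
parameter is an addition that lets the caller exclude very short lags.

```python
import math
import random


def acf(x, k):
    """Mean of x(n) * x(n+k) over all n with both n and n+k in [0, N-1]."""
    terms = len(x) - k
    return sum(x[n] * x[n + k] for n in range(terms)) / terms


def h1(x, min_lag=1, max_lag=None):
    """H1 = MAX(R(k)) / R(0), with k restricted to [min_lag, max_lag]."""
    if max_lag is None:
        max_lag = int(len(x) / 2.5)  # assumption: ignore poorly averaged lags
    r0 = acf(x, 0)
    return max(acf(x, k) for k in range(min_lag, max_lag + 1)) / r0


random.seed(0)  # fixed seed so the example is repeatable
N, K = 400, 40
harmonic = [math.sin(2.0 * math.pi * n / K)
            + 0.5 * math.sin(4.0 * math.pi * n / K) for n in range(N)]
noisy = [h + random.gauss(0.0, 0.3) for h in harmonic]
noise = [random.gauss(0.0, 1.0) for _ in range(N)]

for name, sig in (("harmonic", harmonic), ("harmonic + noise", noisy),
                  ("white noise", noise)):
    print(f"{name:17s} H1 = {h1(sig):.3f}")
```

Note that a smooth signal gives R(1) close to R(0) whether or not it is
periodic, so a larger min_lag may be preferable when the plausible range
of periods is known.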

A variation of this measure is to apply it to the residual signal following
linear prediction, instead of to the signal itself.

A second measure of harmonicity can be obtained by fitting a harmonic
signal to the observed signal, e.g. using least squares.  That is, we write
x(n) = p(n) + e(n), where p(n) is a purely harmonic (periodic) signal and
e(n) is an error (or residual) term.

We can write p(n) in the form   p(n) = SUM (A(k)*cos(k*w0*n + ph(k)) ),
where the sum is taken over all harmonics k*w0 up to the Nyquist frequency
(i.e. k ranges over [1, 2, ..., floor(pi / w0)] ).  To perform the fit we
then find the amplitudes A(k), the fundamental frequency (or pitch) w0 and
the phases ph(k) that minimize the energy of the error sequence e(n).

This is a highly nonlinear problem in general, but it becomes linear and
simple to compute if the fundamental frequency is known.  Hence in practice
this method usually begins by finding a best pitch estimate w0, using any
good method (see the Hess article), and then solving for the amplitudes and
phases.  The details can be found in a number of articles by R.J. McAulay
and T.F. Quatieri on sinusoidal coding (e.g. Chapter 4 of "Speech Coding
and Synthesis" by W.B. Kleijn and K.K. Paliwal (Eds), Elsevier Science,

The first harmonicity measure that results from this analysis is then

    H2 = SUM(p(n)^2) / SUM(e(n)^2) , where n ranges over [0, N-1];

i.e. the harmonic-to-residual ratio (similar to signal-to-noise ratio).  H2
is large in the purely harmonic case and small in the noise-only case.

Instead of H2 we could also consider

    H3 = SUM(p(n)^2) / [SUM(e(n)^2) + SUM(p(n)^2)] = H2 / (H2 + 1),

which is near 1 in the purely harmonic case and near zero in the noise-only
case, just like H1.  (McAulay and Quatieri also give other related measures
in their papers.)
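
The following sketch shows the linear step for a known w0 and the two
ratios.  It is not McAulay and Quatieri's algorithm: the fundamental is
simply assumed to be given, and each harmonic is written as
a*cos(k*w0*n) + b*sin(k*w0*n), which is equivalent to A(k)*cos(k*w0*n +
ph(k)) but linear in the unknowns.  All function names, the num_harmonics
parameter and the test signals are assumptions for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def harmonic_fit(x, w0, num_harmonics=None):
    """Least-squares fit of harmonics of a known w0; returns (p, e)."""
    if num_harmonics is None:
        # Assumption: keep harmonics strictly below Nyquist, because the
        # sine column vanishes when k*w0 equals pi exactly.
        num_harmonics = int(math.pi / w0)
        if num_harmonics * w0 > math.pi - 1e-9:
            num_harmonics -= 1
    n_samples = len(x)
    basis = []
    for k in range(1, num_harmonics + 1):
        basis.append([math.cos(k * w0 * n) for n in range(n_samples)])
        basis.append([math.sin(k * w0 * n) for n in range(n_samples)])
    # Normal equations of the least-squares problem.
    gram = [[sum(u * v for u, v in zip(bi, bj)) for bj in basis]
            for bi in basis]
    rhs = [sum(u * v for u, v in zip(bi, x)) for bi in basis]
    coef = solve_linear(gram, rhs)
    p = [sum(c * bi[n] for c, bi in zip(coef, basis))
         for n in range(n_samples)]
    e = [xn - pn for xn, pn in zip(x, p)]
    return p, e


def h2_h3(p, e):
    """Return (H2, H3) from the harmonic part p(n) and the residual e(n)."""
    ep = sum(v * v for v in p)
    ee = sum(v * v for v in e)
    h2 = ep / ee if ee > 0.0 else math.inf
    h3 = ep / (ep + ee) if ep + ee > 0.0 else 0.0
    return h2, h3


random.seed(1)  # fixed seed so the example is repeatable
N = 400
W0 = 2.0 * math.pi / 50.0  # hypothetical known fundamental (rad/sample)
tone = [math.cos(W0 * n) + 0.5 * math.cos(2.0 * W0 * n + 0.3)
        + random.gauss(0.0, 0.1) for n in range(N)]
noise = [random.gauss(0.0, 1.0) for _ in range(N)]

for name, sig in (("tone + noise", tone), ("noise only", noise)):
    p, e = harmonic_fit(sig, W0, num_harmonics=5)
    h2, h3 = h2_h3(p, e)
    print(f"{name:13s} H2 = {h2:8.3f}   H3 = {h3:.3f}")
```

Solving the normal equations directly keeps the sketch short; for long
records or many harmonics a QR-based least-squares solver would be the
numerically safer choice.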

Unlike H1, however, these measures are clearly tied to the degree to which
a signal is harmonic.  But they require much more computation (though still
not an enormous amount).

All the above assumes implicitly that the signal is stationary, which is at
best an approximation in real cases.  The case in which the fundamental
varies slowly can be handled in sinusoidal coding, but sudden changes are
more problematic.  Also, it is implicit that the residual e(n) is a
noise-like signal, not other harmonic complexes (as in musical chords).

        Harvey Holmes

At 05:23 15/01/2005, Reinhart Frosch wrote:
The inharmonicity of piano strings is treated in
section 12.3 of the book "The Physics of Musical
Instruments", by Fletcher and Rossing (Springer,
2nd ed. 1998).

The basic equation for the frequency of the k-th
partial tone is:

f[k] = f[1i] * k * (1 + k^2 * B)^0.5 ;

here, f[1i] is the fundamental frequency of an
idealized string that has the same length, mass
and tension as the real string but is infinitely
flexible (i.e., has no stiffness).

B = 0 corresponds to a string without stiffness
and thus to a harmonic complex tone;
B is an "inharmonicity coefficient".
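
As a small worked example of the equation above, the sketch below lists
the partial frequencies of a stiff string and their deviation, in cents,
from exact multiples of f[1i].  The values of f[1i] and B and the function
name are hypothetical, chosen only to make the stretching visible.

```python
import math


def stiff_string_partials(f1_ideal, b, num_partials):
    """f[k] = f[1i] * k * sqrt(1 + k^2 * B) for k = 1 .. num_partials."""
    return [f1_ideal * k * math.sqrt(1.0 + k * k * b)
            for k in range(1, num_partials + 1)]


F1_IDEAL = 220.0  # Hz, hypothetical fundamental of the idealized string
B = 0.0004        # hypothetical inharmonicity coefficient

for k, f in enumerate(stiff_string_partials(F1_IDEAL, B, 8), start=1):
    cents = 1200.0 * math.log2(f / (k * F1_IDEAL))
    print(f"k = {k}: f[k] = {f:8.2f} Hz, {cents:+6.2f} cents from k*f[1i]")
```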

Reinhart Frosch,
(r. Physics Dept., ETH Zurich.)
CH-5200 Brugg.

>-- Original Message --
>Date:         Thu, 13 Jan 2005 14:50:20 +0000
>Reply-To: Chris Share <cshare01@xxxxxxxxx>
>From: Chris Share <cshare01@xxxxxxxxx>
>Subject:      Definition and Measurement of Harmonicity
>To: AUDITORY@xxxxxxxxxxxxxxx
>I'm interested in analysing musical signals in terms of their
>harmonicity.
>There are numerous references to harmonicity in the literature
>however I can't find a precise definition of it. Is there an
>agreed definition for this term?
>If someone could point me to some relevant literature it would
>be very much appreciated.
>Chris Share