
Re: Rhythm

Dear Bill and List,

You may be interested in a neuroanatomically and neurophysiologically
plausible account of rhythm perception, but first it is important to
clear up a few points of terminology in this area which, historically,
has been rather muddled. The following are related but quite
distinct phenomena:

(1) recognition of metrical categories

(2) temporal grouping

(3) discrimination of filled time durations

(4) discrimination of single or multiple empty intervals (event rate)

(5) tempo or beat rate discrimination

The distinction between (1) and (2) is very important because it is
quite possible to have ametrical and nonperiodic rhythms, e.g. bird
song or plainchant. That is, we may hear events as being grouped in
time without associating a well-defined beat or quantising into
categories (1:2, 1:3, etc).

The distinction between (3) and (4) is very important because (a)
music has articulation, i.e. staccato-legato variation, and (b)
filled and empty durations are perceptually quite different, e.g. the
filled duration illusion.

The distinction between (4) and (5) is essential because it is
possible to discriminate, say, an 8 Hz from a 10 Hz click train
without "hearing" a beat.

For the last few years I have been promoting a sensory-motor theory
of rhythm, time and beat perception. This theory accounts for the
complex of phenomena above by the interaction of:  (A) sensory
systems; (B) the motor system; and (C) an interpretation, planning
and control system.

The first tenet of this theory is that what unites all our senses is
that they code for change in the physical environment. The existence
of multi-modal cortex and multi-sensory binding (e.g. audio-visual
speech) implies that our senses share a common form of temporal
coding. This common code, it is suggested, is manifest in the
cerebral cortex in the form of dynamic spatio-temporal receptive
fields (RFs), i.e. receptive fields that code for motion. One way
of viewing a dynamic RF is in the form of a filter which has both a
temporal and a spatial tuning. The cortex is populated by many
millions of neurones possessing these dynamic RFs having a
distribution of tunings over a range of spatial and temporal scales.
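
As a rough illustration of the "bank of tuned filters" view, here is a
minimal sketch, not the model itself. It keeps only the temporal
tuning (spatial tuning is omitted) and assumes complex Gabor kernels,
a quality factor q = 3 and octave-spaced centre frequencies from
0.5 Hz to 32 Hz (the range quoted under point (4) later in this
post). The function name channel_rms and all parameter values are
hypothetical choices made for the example.

```python
import numpy as np

def channel_rms(signal, fs, centre_freqs, q=3.0):
    """RMS output of each channel in a bank of temporal Gabor filters.

    Each channel is a Gaussian-windowed complex sinusoid with unit gain
    at its centre frequency; q sets the tuning sharpness (assumed).
    """
    rms = []
    for f in centre_freqs:
        sigma_t = q / (2.0 * np.pi * f)        # window narrows as tuning rises
        half = int(np.ceil(4.0 * sigma_t * fs))
        tk = np.arange(-half, half + 1) / fs
        window = np.exp(-0.5 * (tk / sigma_t) ** 2)
        kernel = window / window.sum() * np.exp(2j * np.pi * f * tk)
        # "valid" discards the edges where the kernel overhangs the signal
        out = np.convolve(signal, kernel, mode="valid")
        rms.append(np.sqrt(np.mean(np.abs(out) ** 2)))
    return np.array(rms)

if __name__ == "__main__":
    fs = 200.0
    t = np.arange(0.0, 20.0, 1.0 / fs)
    centres = 0.5 * 2.0 ** np.arange(7)        # 0.5, 1, 2, ..., 32 Hz
    for rate in (8.0, 10.0):
        profile = channel_rms(np.cos(2.0 * np.pi * rate * t), fs, centres)
        print(rate, np.round(profile, 3))
```

The profile of responses across channels differs for the 8 Hz and
10 Hz inputs, which is the sense in which a population of tuned RFs
could support rate discrimination without any beat being involved.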

The second tenet is that our sensory systems and the neurodynamics
of the motor system are adapted, or tuned in, to the temporal
properties of the external environment and the biodynamics of the
body. This implies both that the temporal sensitivity of any sensory
system will reflect the temporal properties of its environment, and
that motor expectations will reflect the biodynamics of the body.

The temporal sensitivity of any sensory system depends on the
distribution of temporal tunings of the dynamic RFs in the cortex. In
the case of the motor system, two important structures are the
cerebellum, which receives inputs from multi-modal sensory cortex,
and the basal ganglia, which receives inputs from the entire
neocortex. Multi-modal sensory cortex in turn also receives
reciprocal inputs from the premotor cortex, thus completing the
sensory-motor loop,  so that the motor system influences the
interpretation of sensory information. Although the functions of the
cerebellum and basal ganglia are still very much a mystery, both are
known to be implicated in motor timing. One well-established role
that the cerebro-cerebellum is thought to play is that of a
feedforward control mechanism, representing an internalisation of
the biodynamics of the body.

An implication of the motor aspect of the second tenet is that the
natural propensity of the body to rhythmicity will also be reflected
in any internal feedforward model.  Three important evolutionarily
primitive cyclical behaviours which correspond with the biodynamics
of the body are: mastication, locomotion and respiration. In addition
to cerebellar control these primitive actions are also controlled by
means of low-level spinal circuits referred to as central pattern
generators (CPGs). Although CPGs may be viewed as clock-like, they
are in fact controlled *tonically* (as opposed to phasically) from
the brain-stem, and do not reflect any inherent neural oscillatory
mechanism (i.e. it's not a tick-tock).

One important consequence of a sensory-motor perspective is that
internalised rhythmicity of the body will have an effect on
perception, via the sensory-motor loops. To put this crudely, the way
we hear or see may be determined, at least in part, by the
biomechanical properties of our own bodies. Although this proposition
may seem at first sight to be verging on the absurd,  it is possible
to show that the three most important phonological levels of speech
rhythm, the syllable, foot and intonational phrase, can all be
identified with the three evolutionarily primitive rhythmic
behaviours above. Also I have some recent data which shows that an
individual's preferred tempo correlates with simple biomechanical

However, to return to the specifics.

(1) Metrical categories arise naturally from a wavelet
representation: the ratios 1:2, 1:3, 1:4 are the first four terms of
the harmonic series which emerges from a scale-space representation.
Recognition can be modelled as the correlation between a template for
each of the categories and any test rhythm. Event rate dependency is
a consequence of two things: (a) the distribution of temporal
frequencies of the RFs (each of the category templates will look a
little different depending on the rate, i.e. there is a set of
rate-dependent templates for each category); and (b) since the RFs
are causal, they effectively form a memory.

(2) Temporal grouping also emerges naturally from a scale-space
representation, most evident when the temporal component of the RFs
is low-pass, and does not require any prior categories.

(3) The filled interval illusion may be explained by the fact that the
transform of a square pulse is a sin^2(x)/x^2 form, whereas that of two
or more short events is a harmonic series (see

(4) Empty interval and event rate discrimination may also be
explained by the distribution of  temporal frequencies of the
cortical RFs, i.e. we are maximally sensitive  when the event rate
falls in the centre of the cortical distribution. This is
approximately 0.5 Hz - 32 Hz (log distribution), but may extend a
little further either side. (Incidentally, this same mechanism also
accounts for AM and FM detection/discrimination at low modulation

(5) A beat is an entirely different beast. It is not something we
perceive, but something we impose onto a periodic signal. As has been
pointed out, there is a well-defined existence region which
corresponds approximately to locomotor rates. For this reason many
have suggested that this is the origin of beat induction. According
to the sensory-motor perspective a beat is literally an imagined
movement. The most likely location in the brain for such an imagined
movement is the sensory-motor loop mentioned above (main structures
are posterior parietal lobe, pre-motor cortex, cerebro-cerebellum
and basal ganglia). Once a beat has been induced it has a certain
inertia of its own and influences the way the brain interprets the
auditory image flow.
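
Points (1) and (2) above can be sketched in a few lines. This is a
minimal illustration under strong assumptions, not the published
model: the low-pass RF is replaced by a single Gaussian smoother,
rhythms are rescaled to a fixed 1 s span (so the rate-dependent
template sets of point (1) are not modelled), and correlation is taken
at zero lag only. The names smooth_onsets, render, classify and
count_groups, the sampling rate and the smoothing widths are all
hypothetical.

```python
import numpy as np

FS = 200.0  # sampling rate in Hz (assumed)

def smooth_onsets(onsets, sigma, t):
    """Impulse train at `onsets`, low-passed by a Gaussian of width sigma (s)."""
    onsets = np.asarray(onsets, dtype=float)
    return np.exp(-0.5 * ((t[:, None] - onsets[None, :]) / sigma) ** 2).sum(axis=1)

# (1) Category recognition as correlation with a template per category.
def render(intervals, sigma=0.03):
    """Unit-norm smoothed onset train, rescaled so the rhythm spans 1 s."""
    intervals = np.asarray(intervals, dtype=float)
    onsets = np.concatenate(([0.0], np.cumsum(intervals))) / intervals.sum()
    t = np.arange(-0.2, 1.2, 1.0 / FS)
    signal = smooth_onsets(onsets, sigma, t)
    return signal / np.linalg.norm(signal)

TEMPLATES = {f"1:{k}": render((1, k)) for k in (1, 2, 3, 4)}

def classify(intervals):
    """Return the best-matching category and the score for every template."""
    test = render(intervals)
    scores = {name: float(test @ template) for name, template in TEMPLATES.items()}
    return max(scores, key=scores.get), scores

# (2) Grouping from low-pass smoothing alone; no categories are involved.
def count_groups(onsets, sigma):
    """Number of local maxima of the smoothed onset train at scale sigma."""
    t = np.arange(min(onsets) - 4 * sigma, max(onsets) + 4 * sigma, 1.0 / FS)
    y = smooth_onsets(onsets, sigma, t)
    peaks = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])
    return int(peaks.sum())

if __name__ == "__main__":
    # A slightly uneven short-long pair of intervals (seconds).
    print(classify([0.35, 0.65])[0])
    # Two bursts of three events, viewed at fine, medium and coarse scales.
    events = [0.0, 0.2, 0.4, 1.5, 1.7, 1.9]
    print([count_groups(events, s) for s in (0.03, 0.3, 2.0)])
```

In this toy version the peak count falls as the smoothing width
grows, so events merge into groups and groups into a single phrase,
without any metrical template being consulted.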

If you are at all interested in reading more, the following may be of


Neil Todd

On expressive timing.

Shaffer, L.H., Clarke, E.F. and Todd, N.P.McAngus (1985) Meter and
rhythm in piano playing. Cognition 20, 61-77.

Shaffer, L.H. and Todd, N.P.McAngus (1994) The interpretative
component in musical performance. In R. Aiello and J. Sloboda (Eds)
Musical Perceptions. OUP. pp. 258-270.

Todd, N.P.McAngus (1985)  A model of expressive timing in  tonal
music.  Music Perception 3, 33-58.

Todd, N.P.McAngus (1989) A computational model of rubato.
Contemporary Music Review  3, 69-88.

Todd, N.P.McAngus (1992) The dynamics of dynamics: a model of musical
expression. J. Acoust. Soc. Am. 91(6), 3540-3550.

On rhythm perception and time discrimination.

Todd, N.P.McAngus (1994) The auditory primal sketch: A multi-scale
model of rhythmic grouping.  J. New Music Research  23(1), 25-70.

Todd, N.P.McAngus (1995) The kinematics of musical expression. J.
Acoust. Soc. Am. 97(3), 1940-1949.

Todd, N.P.McAngus (1996) An auditory cortical theory of auditory
stream segregation. Network : Computation in Neural Systems. 7, 349-

Todd, N.P.McAngus (1996). Towards a theory of the principal monaural
pathway: pitch, time and auditory grouping. In W. Ainsworth and S.
Greenberg (Eds). Proceedings of the  International Workshop on the
auditory basis of speech perception. Keele, July, 1996. pp 216-221.

Todd, N.P.McAngus (1996). Towards a theory of the central auditory
system III: Time. In B. Pennycook and E. Costa-Giomi (Eds.)
Proceedings of the Fourth International Conference on Music
Perception and Cognition. Montreal, August, 1996. pp 185-190.

Todd, N.P.McAngus (to appear) A model of auditory image flow I:
Architecture. British Journal of Audiology

Todd, N.P.McAngus (to appear) A model of auditory image flow II:
Detection of amplitude and frequency modulation. British Journal of

Todd, N.P.McAngus and Brown, G.J. (1996) Visualization of rhythm,
time and metre. Artificial Intelligence Review   10, 253-273.

Todd, N.P.McAngus and Clarke, E.F. (1995)  The perception of rhythmic
structure in expressive musical performance. Proceedings of the 15th
International Congress of Acoustics. Volume III. pp 459-462.

Todd, N.P.McAngus and Lee, C.S. (to appear) A sensory-motor theory of
speech perception: Implications for learning, organisation and
recognition. To appear in W. Ainsworth and S. Greenberg (Eds).
Listening to Speech. OUP.

General reviews

Clarke, E.F. (to appear) Rhythm and Tempo(?) In D.Deutsch. Psychology
of Music(?) Second Edition.

O'Boyle, D.J. (1997). On the human neuropsychology of timing of
simple repetitive movements. In: Bradshaw,C.M. & Szabadi, E. (Eds.),
Time and behaviour. Psychological and neurobehavioural analyses (pp.
459-515). Amsterdam: Elsevier Press (in press).