[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Cariani's question: "What is the visual analogue of pitch?"



The debate between Kubovy and Neuhoff is interesting,
although it will take some time to digest.
I found that the working URL for Kubovy's papers is:
http://www.people.virginia.edu/~mk9y/mySite/papers.html

There are a number of provocative interchanges between music and film
that always come to mind in these discussions: the abstract films of the
Dadaist Hans Richter and Eggeling's Symphonie Diagonale. This, I think,
is the closest visual art comes to music, where repetition and rhythm of
form and movement play strong roles. On the music-as-visual-form front,
I taught Psychology of Music last fall and used Stephen Malinowski's
Music Animation Machine piano-roll music animations to help visualize
melodic structure:
http://www.well.com/user/smalin/mam.html
It's worth having a look at it (and his tapes) if you're interested in
these issues.

The Gestaltists certainly included melody and rhythm as examples of
coherent, relational organizations. Melodic and rhythmic grouping
mechanisms arguably form the "chunks" that cause us to parse music in
particular ways that are then described by the cognitivists in terms of
nested hierarchical organizations.

Along with Handel's Listening (1989), I've found Snyder's book Music
and Memory very useful in developing these notions in musical contexts.

I agree that the relation between audition and vision is not simple.
We understand neither system well. Pitch is not frequency per se, and
visual form is not simply a spatial pattern of activation on the retina,
but there are nevertheless parallels between the kinds of correlational
invariances and transformations that underlie, say, magnification
invariance of form in vision and transpositional invariance of chords
and melodies in music. One looks at various binocular spatial-disparity
effects (stereograms) and there are temporal analogues in the binaural
system (Huggins pitch). Time delays in the binocular system map to depth
(the Pulfrich effect), while they map to azimuthal location in audition.
The correspondences are not those that would be predicted by simple
analogies, but neither do they seem arbitrary.

I tend to think of timbre as the auditory analogue of visual texture
and color, and melody as an auditory analogue of visual figure or
contour. Because of eye movements, a figure is constantly being
presented to different retinal locations, such that the spatiotemporal
(spike) volley pattern associated with the spatial form is re-presented
to the system over and over again. We can imagine circuits that build up
this invariant volley pattern as a stable object. A repeated series of
notes likewise creates an auditory volley pattern that is repeated, and
the same kind of mechanism would create an auditory image of the whole
repeated sequence.

When the melody is transposed, we hear the similarity of the patterns,
but also the shift in pitch (upward or downward) of the pattern as a
whole: i.e., apparent movement of an object. Music theory is rife with
all sorts of metaphors of movement (rhythmic, melodic, tonal, thematic,
etc.), which involve this combination of an invariant pattern (object)
being transformed in a manner that preserves its essential organization
(the organization that made it a stable object in the first place).
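As a toy illustration of this kind of invariance (my own sketch, not
anything from the literature): the sequence of intervals between
successive notes is the relational pattern that survives transposition,
even though every note lands on a different "channel".

```python
# Transpositional invariance, minimally: a melody's interval pattern
# (relations between successive pitches) is unchanged when the whole
# melody is shifted, even though every absolute pitch differs.

def intervals(melody):
    """Successive pitch differences in semitones."""
    return [b - a for a, b in zip(melody, melody[1:])]

def transpose(melody, shift):
    """Shift every note by the same number of semitones."""
    return [note + shift for note in melody]

# Opening of "Frere Jacques" as MIDI note numbers (C4 = 60).
melody = [60, 62, 64, 60]

print(intervals(melody))                # [2, 2, -4]
print(intervals(transpose(melody, 5)))  # [2, 2, -4]: same relational pattern
print(transpose(melody, 5))             # [65, 67, 69, 65]: different channels
```

The point of the sketch is just that the invariant is trivially
available in the relational (interval) description, while a
channel-by-channel (absolute-pitch) description shares nothing between
the two versions.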
The paper by Pitts & McCulloch (1947) on How We Know Universals had the
right spirit in trying to conceive of a mechanism, but their neural
coding assumptions -- re: the nature of the representations -- were, I
think, flawed. The pattern invariants could be volley patterns of
spikes, rather than channel patterns (rate-place profiles in auditory
and visual areas). This might explain why our sensory systems so
effortlessly recognize the similarity of the patterns even when they
are transposed or translated onto completely different sets of neural
channels (different retinotopic and cochleotopic positions in neural
maps). It's easy to move temporal patterns around in neural systems,
but much harder to move spatial patterns. In the 1930s, Lashley
recognized the problems these channel translations pose for
"switchboard models" of vision. But today our thinking is so enamored
of features and rate-channel codes that it becomes nearly impossible to
conceive of anything else.

--Peter Cariani

On Tuesday, January 20, 2004, at 06:32  PM, Eliot Handelman wrote:

John Neuhoff wrote:

Stephen Handel once said that an analogy between vision and audition
could be "seductive, but misleading". In my opinion, Kubovy & Van
Valkenburg's "Pitch is to space as audition is to vision" idea has
some serious drawbacks.

I've been thinking recently about the relation of hearing to vision as
it applies to the perception of music, e.g., the "construction" by the
mind of a melody, such that when you listen there is a sense of a
highly structured whole, or of a trend towards wholeness. In my work,
which is about computational analysis of music, I've come to find that
a useful approach is one that analogizes from computer vision -- i.e.,
hierarchically builds up larger entities -- "objects" -- from low-level
features, things like orientational trends -- in a way that seems
highly evocative of the patterns of computation that vision is known to
imply. It's interesting to speculate that the procedures for listening
to music might map rather gracefully from visual processes to hearing,
and perhaps even involve certain visual specializations.

It would be useful to know, in this regard, whether we possess
auditory orientation-selective cells -- which doesn't seem implausible.
If these existed, then almost certainly some sort of hierarchic
computation would take advantage of them. I haven't seen any research
that directly supports this, though.

pitch::space = audition::vision strikes me as much too simple. If I'm
right in thinking that music is a kind of auditory-system analogue to
vision, then there are very many more factors that need to be
accommodated. The most important of these, I think, are "parallelisms"
-- i.e., repetitions (in structure, for instance, and potentially at a
very local level) that preserve a sense of "object constancy" -- e.g.,
the transposition of a rhythmically-shaped interval or two. Even in
very simple music -- like "Happy Birthday" -- these can be
confoundedly complex for a program to work out. That gives an
indication of the complexity of the brain that a beast who regards
this as simple entertainment must possess.
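A minimal sketch of how such a parallelism analysis might begin (my own
toy illustration, emphatically not my actual program): match windows of
the melody by their interval patterns, so that literal repeats and
transposed repeats fall out of the same comparison. The hard part --
choosing window sizes, tolerating rhythmic variation, ranking competing
parallelisms -- is exactly what the toy omits.

```python
# Find "parallelisms" in a melody: pairs of positions whose interval
# patterns match. Interval matching treats a literal repeat as a
# transposition by zero, so both kinds of object constancy are caught.

def intervals(melody):
    """Successive pitch differences in semitones, as a hashable tuple."""
    return tuple(b - a for a, b in zip(melody, melody[1:]))

def find_parallelisms(melody, window):
    """Return (earlier, later) start-index pairs whose `window`-interval
    patterns (i.e. window+1 notes) are identical up to transposition."""
    seen = {}      # interval pattern -> list of start indices
    matches = []
    for i in range(len(melody) - window):
        patt = intervals(melody[i:i + window + 1])
        for j in seen.get(patt, []):
            matches.append((j, i))
        seen.setdefault(patt, []).append(i)
    return matches

# First two phrases of "Happy Birthday" (MIDI, key of C):
# "Hap-py birth-day to you, hap-py birth-day to you"
melody = [67, 67, 69, 67, 72, 71, 67, 67, 69, 67, 74, 72]

print(find_parallelisms(melody, 3))  # [(0, 6)]: the opening four-note
                                     # gesture recurs at note 6
</antml>```

Even here the program only recovers the crudest parallelism; deciding
that the two phrases are "the same phrase with a different cadence"
requires the kind of hierarchic object-building described above.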

If parallelism analysis corresponds to visual object-constancy
analysis, then surely the analogy would go something like this:
relations-between-pitch::space = audition::vision.

Just a few thoughts, but I'd be glad for any feedback.

-- eliot

-------

Eliot Handelman Ph.D
Montreal, Canada