
Re: correlation of vision and audition



Dear Colleagues,

Thanks to those of you who sent in information about the mutual
interaction of visual and auditory signals.

Here are the abbreviated replies that I received from two lists:  (1)
AUDITORY, and (2) ICAD (International Conference on Auditory Display).
There are a number of useful references in them.

Jason:
If you want to join the AUDITORY list, send a message to Dan Ellis
<dpwe@ICSI.Berkeley.EDU>, asking him to put you on the list.

- Al Bregman


Here is the request for information that I sent to these lists:

-------------- request for information starts here ----------------
>Dear Colleagues,
>
>A doctoral student in the music recording program at McGill, who is taking
>my course in auditory perception, wrote me to ask about a phenomenon that
>he had observed while doing animation.  I include a part of his message:
>
>---------- Forwarded message ----------
>Date: Fri, 6 Feb 1998 14:15:35 -0500 (EST)
>From: Jason Corey <corey@music.mcgill.ca>
>
>While creating and editing a soundtrack for an animated film, it became
>apparent that sounds occurring synchronously in time with visual events on
>the screen had an effect on how I perceived the visuals.  In one
>particular scene there was a great deal of activity in the animation,
>with different events happening at different positions on the screen.
>Without a soundtrack there were many events that were not perceived
>until a sound effect was synchronized with the particular visual
>events.
>
>It seems that by having sound accompany a visual, many more details
>of the visual are perceived than without a soundtrack.  It would seem
>that our auditory system helps the visual system to "sort out" the
>details when there are many events happening visually.  Maybe this
>somehow helps us focus on certain visual details more than others.  My
>guess is that by changing the soundtrack to a given moving picture, it is
>possible to alter what the subject will focus on.
>
>I was wondering how much work has been done in this area.  Any help would
>be appreciated.
> ...
>
>Jason Corey
>--------- end of forwarded message ---------------
>
>I know about the work on the ventriloquism effect by Thomas, 1941, by
>Witkin et al., 1952, and by Bertelson, Radeau and others more recently, but
>this research is more concerned with the PERCEIVED LOCATION of the sound
>or visual object rather than its behavior over time.  I think the question
>is about how a correlated sound helps in the visual parsing of a display.
>
>Concerning the reverse, how a correlated visual event would help in the
>parsing of an auditory presentation, we have probably all seen one person
>using hand motions in the air to direct another person's attention to a
>particular part in a complicated piece of music.
>
>I am looking for information about these phenomena - both when audition
>assists visual parsing and the reverse.
>
>I know about some of the research that looks at the role of visual cues in
>the perception of speech (Dodd, 1977, 1979, 1980; Massaro, 1987; McGurk
>and colleagues).  I also am aware of some research on the influence of
>correlated sounds on apparent motion (phi) by Gilbert (1939) and by
>O'Leary and Rhodes (1984).  There is also the work of Spelke and her
>colleagues on the integration of correlated visual and auditory events by
>infants.
>
>I have also heard of a study by Sumby and Pollack on how the seeing of a
>speaker can help the listener to understand speech in noise, but I
>don't have the exact reference.
>
>The effects of visual input on speech understanding may be entirely due to
>the fact that the visual input actually supplies additional
>information about the identity of the speech signal.  Has anybody
>determined whether any part of the effect may be in assisting the listener
>to parse the mixture of sounds that might be present?
>
>Does anyone know of articles, chapters or technical reports specifically
>on the issue of how information from one modality helps us to parse
>mixtures in another?  Or of the exact reference to the Sumby-Pollack
>article?
>
>If so, I'd appreciate hearing about them.
>
>Thanks,
>
>Al
>
>----------------------------------------------------------------------
>Albert S. Bregman,  Professor,  Dept of Psychology,  McGill University
>1205  Docteur Penfield Avenue,   Montreal,  Quebec,  Canada   H3A 1B1.
>Phone: +1 514-398-6103 Fax: -4896  Email: bregman@hebb.psych.mcgill.ca
>Lab Web Page: http://www.psych.mcgill.ca/labs/auditory/laboratory.html
----------------------------------------------------------------------

---------- the request for information ends here --------------------



Here are the answers I received:

----------- the set of received answers starts here -----------------


The full Sumby-Pollack reference is this:

Sumby, W.H., and Pollack, I. (1954). Visual contribution to speech
intelligibility in noise. J. Acoust. Soc. Am. 26: 212-215.

(Sent by Steve Greenberg <steveng@ICSI.Berkeley.EDU>,
John Neuhoff <neuhoffj@lafvax.lafayette.edu>,
Peter F. Assmann <assmann@utdallas.edu>, and
Thomas Anthony Campbell <campbelt@mailexcite.com>)


John Neuhoff also sent these:

Stein, B. E., Meredith, M. A., Huneycutt, W. S., & McDade, L. (1989).
Behavioral indices of multisensory integration: Orientation to visual cues
is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1(1), 12-24.

Stein, B. E., & Meredith, M. A. (1994). The Merging of the Senses.
Cambridge, MA: MIT Press.

Perrott, D. R., Sadralodabai, T., Saberi, K., & Strybel, T. Z. (1991).
Aurally aided visual search in the central visual field: Effects of visual
load and visual enhancement of the target. Human Factors, 33(4), 389-400.

Scott Lipscomb also gave a talk at the ASA meeting last May entitled
"Perceptual measures of visual and auditory cues in film music".  The
abstract is online at:
<http://sound.media.mit.edu/AUDITORY/asamtgs/asa97pen/5aMU/5aMU3.html>

--John Neuhoff
______________________________________________________________
John Neuhoff                    Phone:  610-250-5287
Department of Psychology        FAX:    610-250-5349
Lafayette College               email:  neuhoffj@lafayette.edu
Easton, PA 18042
www.lafayette.edu/~neuhoffj/neuhoff.htm
--------------------------------------------------------------


Date: Fri, 6 Feb 1998 21:04:55 -0500 (EST)
To: Al Bregman <bregman@hebb.psych.mcgill.ca>
From: Harold Fiske <hfiske@julian.uwo.ca>

Annabel Cohen's work may be of interest to you; see, e.g., the issue of
_Psychomusicology_ she edited recently (one or two issues back -- sorry I
do not have the exact reference immediately available, but it is easily found).

Harold Fiske
University of Western Ontario

---------------------------------------------------------

From Nick.Zacharov@research.nokia.com Sat Feb  7 03:39:04 1998

Hi

The issue of multimodal interaction seems to have been studied to a fair
extent within the broadcast & film industry.  However, most of that work
relates to AV sync and perceived quality.  I have a few refs. on this if
you require.

The overall cognition process and the effective enhancement or degradation
associated with multimodal stimuli is well described in 'Stein B. E.,
Meredith M. A., The Merging of the Senses, pp. 148-156, MIT Press, 1993'.
This book studies the perception of both cats and humans (apparently having
similar cognitive processes) and looks at numerous multimodal interactions.

I hope this helps

yours

-------------------------------------------
Nick Zacharov
Nokia Research Center
Speech & Audio Systems
PO Box 100               Tel +358-3-272-5786
33721 Tampere            Fax +358-3-272-5899
FINLAND
-------------------------------------------------------------------

Date: Sat, 07 Feb 1998 10:28:16 -0600
Organization: The University of Texas at Dallas
To: Al Bregman <bregman@hebb.psych.mcgill.ca>
From: "Peter F Assmann,GR4126,2435,GR41" <assmann@utdallas.edu>

Al, here is the Sumby and Pollack reference, along with a
set of other papers concerned with the issue of how
visual cues from lipreading supplement those provided by
the acoustic signal.  Chris Darwin brought my attention
to the paper by Driver, which addresses the role of vision
in selective attention to a target voice against the
background of a competing voice.  Driver created an audio
signal consisting of a mixture of two voices speaking a list
of target and distractor words, and the task was to identify
as many words as possible.  In combination with the audio
signal, subjects saw the face of one of the talkers on a
video monitor (synchronized with one of the voices in the
audio signal).  Performance in recognizing the target voice
was around 58%, but when the video monitor was moved (either
100 cm to the left or right), identification improved to
about 77% correct.  The apparent change in the visual source
caused listeners to hear the corresponding sound source as
coming from a different direction and thereby enhanced the
segregation of the two voices.
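
For concreteness, here is a minimal sketch (in Python) of how one might
check whether a 58%-versus-77% difference could plausibly be chance.  The
per-condition word count of 120 and the resulting counts are invented for
illustration; they are not Driver's data:

from scipy.stats import chi2_contingency

n = 120                      # assumed words per condition (hypothetical)
same_loc = round(0.58 * n)   # video above the real loudspeaker
moved    = round(0.77 * n)   # video displaced 100 cm to one side

table = [[same_loc, n - same_loc],     # correct vs. incorrect counts
         [moved,    n - moved]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")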

-Peter

====================================================================

Driver, J. (1996). Enhancement of selective listening by illusory
mislocation of speech sounds due to lip-reading. Nature 381: 66-67.

Summerfield, Q. (1991). Visual perception of phonetic gestures. In:
 Modularity and the Motor Theory of Speech Perception. Edited by
 Mattingly, I. and Studdert-Kennedy, M., Ch. 6, pp. 117-138.
 L. Erlbaum & Associates: Hillsdale, N.J.

Summerfield, Q. (1987). Some preliminaries to a comprehensive
account of audio-visual speech perception. In Hearing by eye:
The psychology of lip-reading. Edited by B. Dodd and R. Campbell.
(Erlbaum, N.J.)

Grant, K.W., Braida, L.D. and Renn, R.J. (1994). Auditory supplements
to speechreading: Combining amplitude envelope cues from
different spectral regions of speech. Journal of the Acoustical
Society of America 95: 1065-1073.

Grant, K.W., Ardell, L.H., Kuhl, P.K. and Sparks, D.W. (1985).
The contribution of fundamental frequency, amplitude envelope,
and voicing duration cues to speechreading in normal-hearing subjects.
Journal of the Acoustical Society of America 77: 671-677.

-----------------------------------------------------------

Date: Mon, 9 Feb 1998 09:38:05 GMT
To: Al Bregman <bregman@hebb.psych.mcgill.ca>
From: bob.carlyon@mrc-apu.cam.ac.uk (Bob Carlyon)

Al,
Do you know about the Nature article by Jon Driver, in which "target" and
"distractor" speech were mixed into a common loudspeaker?  He found that a
video of someone producing the target speech produced better identification
when the video monitor was located above a "dummy" speaker to one side of
the real one, compared to when it was above the real speaker.
Reference: "Enhancement of selective listening by illusory mislocation of
speech sounds due to lip-reading", Nature 381, p66-68 [1996]

He has also done a lot of work with Charles Spence showing the various ways
in which the identification of visual targets can be affected by auditory
"cues" which "attract attention" towards (or away from) the target.  Also
vice versa.  Some of this is in P&P 1997.

cheers

bob

------------------------------------------------------------------------------
Dr. Bob Carlyon, MRC Applied Psychology Unit, 15 Chaucer Rd.
CAMBRIDGE CB2 2EF, England. Phone: (44) 1223 355294 ext 720
FAX:   (44) 1223 359062.  email: bob.carlyon@mrc-apu.cam.ac.uk
----------------------------------------------------------------------------



Date: Mon, 09 Feb 1998 13:38:22 +0100
From: Jon Barker <j.barker@dcs.shef.ac.uk>
Reply-To: jon@kom.auc.dk
Organization: Sheffield University

Al Bregman wrote:

> The effects of visual input on speech understanding may be entirely due to
> the fact that the visual input actually supplies additional
> information about the identity of the speech signal.  Has anybody
> determined whether any part of the effect may be in assisting the listener
> to parse the mixture of sounds that might be present?
>

Yes,

Driver (1996), "Can't remember the title!", Nature 381, pages 66-68,
addresses this question.

He was interested in whether the ventriloquist illusion could actually
segregate speech sources such that recognition scores in simultaneous
speaker tasks would be higher.

Listeners were presented with two simultaneous speakers (over a single
loudspeaker) and shown the lip movements for just one speaker.  They were
asked to repeat the words of the talker whose lip movements they were
watching.  The (rather remarkable) finding is that when the video and
audio are presented from different spatial locations, the ventriloquist
illusion shifts the apparent location of one of the voices, and this
illusory spatial segregation improves the listeners' ability to identify
the words in the mixture -- i.e., it acts like a real spatial separation.

In another experiment, reported in the same paper, it was demonstrated how
seeing the lips of one of the talkers could actually help the listener to
identify the words of the *unseen* speaker.

The results would seem to implicate visual cues in the segregation
itself - not just as extra information to help identification after
segregation.

Cheers

Jon
--
Jon Barker, Speech and Hearing, Computer Science,
University of Sheffield, Sheffield  S1 4DP, UK
Phone: +44-(0)114-22 21800;  FAX: +44-(0)114-278 0972
Email: j.barker@dcs.shef.ac.uk;  URL: http://www.dcs.shef.ac.uk/~jon

---------------------------------------------------

To: bregman@hebb.psych.mcgill.ca
Date:   Mon, 09 Feb 1998 07:57:51 -0700
From: "Thomas Anthony Campbell" <campbelt@mailexcite.com>


Dear Al,

There's a good section in Goldstein's latest edition of Sensation and
Perception on McGurk-like effects for plucked and bowed instruments:
seeing a plucked instrument when the sound is that of a bowed instrument
makes the perceiver report that sound as plucked.  Similar effects don't
occur for written words combined with sounds (Fowler and Dekle, 1991).
Perhaps you've seen all this.

It seems to me there might be something in this audio-visual correlation
of *movements* during auditory-visual object identification.  Written words
don't move.  However, bows, fingers, mouths and the sounds of speech and
music do move.  *When* that correlation occurs I'd really like to know.

As for assistance of visual parsing by sound, we've found that a repeated
acoustic "one" can have a small facilitatory effect on *serial recall* of
lip-read digit lists.  This facilitatory effect over silent lips isn't
very clear cut.  Perhaps a repeated "one" helps listeners concentrate on
parsing the lip-read digits more.  There are other explanations, I'm sure.
For me, it's an open question.

I'm in search of similar papers that involve manipulations of
audio-visual synchrony in McGurk-like effects.  If you hear of any such
papers then please send the references on to me.

Hope this helps!

Tom Campbell
Research student
Dept of Psychology
Reading University
Whiteknights
Reading
ENGLAND

Fowler, C. & Dekle, D.J. (1991). Listening with eye and hand: Cross-modal
contributions to speech perception. J of Exp Psy: HPP 17, 816-828.


---------------------------------------------------

From: Scott Lipscomb <lipscomb@utsa.edu>
To: "'Auditory List'" <auditory@vm1.mcgill.ca>
Cc: "'Al Bregman'" <bregman@hebb.psych.mcgill.ca>

List members:

I must confess that I was THRILLED to see the following posting by Al
Bregman and his student, Jason Corey.  There has been quite a flurry of
activity in the experimental investigation of film music and its
associated context ... e.g., animation, motion pictures, etc.  Both my
Master's thesis (1990) and doctoral dissertation (1995) at UCLA looked
at these issues from an experimental aesthetic frame of reference.  With
the assistance of Dr. Roger Kendall, a 7-year series of experiments was
run.  The results of the Master's thesis were published in Psychomusicology
(vol. 13, nos. 1&2) in 1994 ... in fact, the entire volume is devoted to
Film Music, guest-edited by Annabel Cohen.  Dr. Cohen and one of her
students have also done some important research in this area (see
Marshall & Cohen, 1988 in Music Perception, vol 6).

More germane, however, are the results of my dissertation, which
addressed the issue of A-V synchronization ... directly relevant to
Jason's question(s) below.  I am hoping to publish these results soon,
so they will be generally available.  Some aspects of the study may be
viewed at my Web address ... http://music.utsa.edu/~lipscomb

I would welcome any additional discussion of the role of Film Music
and/or comments about my research.

Sincerely,
Scott Lipscomb


------------------------------------------------

Date: Mon, 09 Feb 1998 15:00:41 -0400 (AST)
From: Annabel Cohen <acohen@upei.ca>

Hi Al,

My interest in auditory-visual correlation, as you call it, stems
from my research on the effects of music in film perception.
I came up with a framework that had at its core the notion of the
control of visual attention by audio-visual congruence, so you can
see why I am interested in your query.  A statement of this
framework has appeared in several publications that I will send to
you (also a set for Jason Corey).  The framework was applied in two
studies, one entailing a filmed animation (2 triangles
and a circle) and the other, interactions among wolves.
The data in support of the notion of visual capture by audition,
however, are scant -- in part, I think, due to the technology available
to me at the time, but also due to a window of tolerance on what is
regarded as temporally congruent.  There are also separate issues
of phase and pattern to be considered.  Shared pattern is probably
more important than shared phase for nonverbal materials, though for
verbal materials, precision in all aspects of timing is likely more critical.

Roger Kendall and Scott Lipscomb have also focused on the issue of
structural congruence in the film-music domain, and one of their
publications can be found in the journal issue of Psychomusicology
(1994, Vol 1-2) that I am sending.

Bill Thompson's research in the same volume on the effects of
soundtracks on perceived closure might also be relevant.

Finally, in the same volume, buried in a paper by Iwamiya, is
evidence that asynchronous timing negatively influences judgements
about music-video excerpts.

I have recently been listening to your CD demonstrations
and felt that there might be a connection between auditory capture of
a structurally similar auditory stream and auditory capture of a
structurally similar visual display.

I'd be interested in your thoughts on this and the other information,
as I am unaware of the Sumby and Pollack work you mentioned.

Best regards,

Annabel

------------------------------------------------

Date:         Mon, 9 Feb 1998 15:01:26 -0500
From: Robert Zatorre <MD37@MUSICA.MCGILL.CA>

For some interesting behavioral as well as neurophysiological evidence
pertaining to auditory visual interactions, see the following:

Stein, B.E., Wallace, M.T. and Meredith, M.A. (1995) Neural mechanisms
mediating attention and orientation to multisensory cues. In The Cognitive
Neurosciences, M. Gazzaniga Ed., MIT press, Cambridge, Mass., pp. 683-702.

Knudsen, E.I. and Brainard, M.S. (1995) Creating a unified representation of
visual and auditory space in the brain. Annual Review of Neuroscience, 18,
19-43.

These studies have shown that under at least some conditions, stimuli that
are subthreshold in one or the other modality alone can be responded to
when they're combined, hence demonstrating some nonlinear interactions.
Stein and colleagues have provided neurophysiological evidence that
neurons within the deep layers of the feline superior colliculus contain
topographic maps of both visual and auditory space, and that large
enhancements of response are observed to combined auditory and visual
stimulation; this structure may therefore be one substrate for integration
of the two modalities. In addition, it has been shown that inputs from
unimodal neurons within polysensory cortical areas are important
determinants of SC integration of response.
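
A convenient way to express the nonlinearity is the multisensory
enhancement index used in this literature: the percent change of the
combined-stimulus response relative to the best single-modality response.
A minimal sketch in Python (the response values are invented for
illustration, not taken from these studies):

def enhancement_index(av_response, a_response, v_response):
    # Percent by which the combined (AV) response exceeds the best
    # unimodal response; values above zero indicate enhancement.
    best_unimodal = max(a_response, v_response)
    return 100.0 * (av_response - best_unimodal) / best_unimodal

# e.g., 4 spikes/trial to auditory alone, 6 to visual alone, and 18 to
# the combined stimulus give a 200% enhancement:
print(enhancement_index(18, 4, 6))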


Robert J. Zatorre, Ph.D.
Montreal Neurological Institute
3801 University St.
Montreal, QC Canada H3A 2B4
phone: 1-514-398-8903
fax: 1-514-398-1338
e-mail: md37@musica.mcgill.ca

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

From: Scott Lipscomb <lipscomb@utsa.edu>
To: "'Al Bregman'" <bregman@hebb.psych.mcgill.ca>

Al:

Good to hear from you .... GREAT!  I will look forward to hearing from
Jason.  As you might imagine, principles of "auditory scene analysis"
play a central role in any discussion of film soundtracks ... what with
the sound effects, music, dialogue, etc.

Hope all is well.

Scott

----------------------------------------------------------------

Date: Mon, 09 Feb 1998 21:23:22 -0500
From: Jarrell Pair <jarrell@cc.gatech.edu>

Dr. Bregman,

I am a graduate student studying at Georgia Tech's Graphics,
Visualization, and Usability (GVU) Center.  At GVU AudioLab, as part of my
master's degree work, we are designing an experiment to evaluate and
document audio-visual cross modal effects in virtual environments.  Below
is a list of references I have found while conducting a literature search.
Also, I have included a rough initial description of the type of
experiment we are hoping to conduct once we have settled on a solid
experimental design.  Any comments you may have would be greatly
appreciated.  I would be very interested in any similar work being
conducted at McGill or any additional references you may come across.

Best Regards,

Jarrell Pair


------------------------------------------------------------------------

ASTM Subcommittee E18.02 on Principles of Psychophysical Test Methods.
Basic Principles of Sensory Evaluation.  Philadelphia: American Society
for Testing and Materials, 1968.

This book has a section titled "Intercorrelation of the Senses".  It
references some interesting experiments conducted in the Soviet Union on
the effects of audio on vision.  These were conducted during the Cold
War, making the data's value questionable.  However, the results were
compiled and reviewed in:

London, I.D., "Research on Sensory Interaction in the Soviet Union,"
Psychological Bulletin, Vol. 51, No. 6, 1954, pp. 531-568.

------------------------------------------------------------------------

Dember, William Norton.  The Psychology of Perception.

This text has a section on "Intermodal Manipulations".  It also mentions
Ivan London's review of Soviet literature.  Dember's book claims that
audio affects visual sensitivity and acuity.

Specifically, the Russian studies claim:

-Sensitivity to white light in the fovea is increased under auditory
stimulation of moderate intensity

-Auditory stimulation increases sensitivity to blue-green light, but
lowers sensitivity to orange-red.

------------------------------------------------------------------------

Murch, Gerald M.  Visual and Auditory Perception.

Murch's book includes a section on "Auditory and Visual Interaction".
It mentions a 1967 study by Posner indicating that the auditory system
can retain information better than the visual system.  It notes that
a growing body of work "suggests a close interrelationship between the
short-term storage mechanisms of vision and audition".  Unlike the
Russian studies, Posner's work is well respected and referenced.

------------------------------------------------------------------------

Neuman, W. R.  Beyond HDTV: Exploring Subjective Responses to Very High
Definition Television.  MIT Media Laboratory, July 1990.

This document supposedly describes a much-discussed but never-referenced
experiment at M.I.T.  Have you ever read this paper?  I have been
unsuccessful in my attempts to find it.


------------------------------------------------------------------------

Stein, Barry E. and Meredith, M. Alex. The Merging of the Senses.  Cambridge,
MA:  MIT Press, 1993.


------------------------------------------------------------------------

Welch, R.B. and Warren, D.H.  Intersensory Interactions. In K. Boff, L. Kaufman,
and  J.P. Thomas (eds.) Handbook of Perception and Human Performance, Volume 1.
New York: J. Wiley and Sons, 1986.


------------------------------------------------------------------------


Audition/Vision Cross Modal Effects:
A Research Proposal

 Sound designers in the film and television industry have long relied on
the notion that high quality sound can be used to enhance the audience's
visual perception of moving images.  This concept has never been formally
researched or published.  Recently, the debate over standards for high
definition television (HDTV) sparked a degree of interest in the effect of
audio on visual perception.  W.R. Neuman set up an informal experiment in
which subjects were required to watch two video sequences.  In both
sequences, the visual information was identical.  However, the first
sequence was encoded with low quality audio while the second sequence
incorporated high quality sound.  The subjects were subsequently asked to
compare the two sequences.  Nearly every subject stated that the second
video sequence was visually superior, though the only difference was its
better implemented sound.  Neuman explained: "It turns out that there is
an 'Audio Effect' in the evaluation of picture quality.  In fact, the
effect of improved audio is as strong on picture quality ratings as it is
on audio quality ratings."

 This audio effect, if it does indeed exist, would have a significant
impact on the design of virtual environments.  Designers of virtual
environments seek to create a feeling of presence, or immersion, for
users.  Primarily, this goal has been pursued by creating worlds with
convincing 3D interactive graphics, chiefly engaging the human sense of
sight.  Unlike sight, the sense of hearing is often neglected in the
implementation of a virtual world.  Recent work indicates that the
integration of spatial audio in a virtual environment enhances a user's
sense of presence (Hendrix and Barfield 290-301).  Despite considerable
evidence of its immersive potential, audio is often treated as the poor
stepchild of virtual reality.  The plight of audio in interface design is
explained by Cohen and Wenzel: "Audio alarms and signals have been with
us since long before there were computers, but even though music and
visual arts are considered sibling muses, a disparity exists between the
exploitation of sound and graphics in interfaces. . . .  For whatever
reasons, the development of user interfaces has historically been focused
more on visual modes than aural." (291)  This trend is in part due to
technical resource limitations of computer systems.  Designers were
forced to sacrifice audio quality for graphics performance.  However,
these restrictions no longer exist.  In the past several years, dedicated
audio ASICs (application-specific integrated circuits) coupled with fast
CPUs have made it feasible to implement high fidelity, immersive audio in
graphically intensive virtual environments.  A more definite understanding
of how sound affects visual perception would allow virtual environment
designers to optimally distribute resources among the audio and visual
aspects of a virtual world as a method of maximizing presence.

The Significance of Audio-Visual Cross Modal Effects in Virtual
Environments: A Research Proposal

 At Georgia Tech's Graphics, Visualization, and Usability (GVU) Center, the
Virtual Environments (VE) Group has developed virtual worlds for treating
individuals suffering from socially debilitating phobias.  The first of
these systems was built to treat fear of heights.  Audio in this system
was functionally minimal and usually disabled.  Experimental trials with
actual patients indicated that this approach to therapy could be
effective.  Subsequently, a PC-based system was developed to treat fear of
flying.  Audio was a major component of the fear-of-flying virtual
environment project.  Many users mentioned how sound greatly contributed
to making the experience more "real".  This system is currently being used
by psychotherapists in Atlanta, Cleveland, and San Diego.

 Recently, a new virtual environment was built to treat Vietnam veterans
suffering from post-traumatic stress disorder (PTSD).  This system is
currently being evaluated at the Atlanta Veterans Administration hospital.
It consists of a helicopter ride and a jungle walk.  The PTSD system
incorporates a complex, high fidelity soundscape.  Source sounds were
gathered from Hollywood production libraries and from tapes provided by
the Veterans Administration.  Explosions were processed to maximize their
low frequency, or bass, characteristics.  These sounds were played back
for the user over high quality headphones.  Also, a speaker designed to
respond well to low frequencies was mounted under the user's chair.  This
scheme allowed the user to feel and hear the impact of explosions and
aftershocks.  When the system was being developed, many users initially
used it without sound enabled.  Subsequently, they experienced the
environment with the three-dimensional soundscape.  They invariably
commented that the sound transformed their perception of the environment.
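
 For readers curious what such processing might involve, here is a minimal
sketch (in Python) of one way to boost the low-frequency content of an
explosion recording -- a crude low-shelf boost.  The filename, cutoff
frequency, and gain are illustrative assumptions, not the actual values
used for the PTSD system:

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, lfilter

def boost_bass(x, fs, cutoff_hz=120.0, gain_db=6.0, order=4):
    # Isolate the band below cutoff_hz and mix it back in at a higher
    # level -- a rough low-shelf boost.  Parameters are illustrative.
    b, a = butter(order, cutoff_hz / (fs / 2.0), btype='low')
    low = lfilter(b, a, x, axis=0)           # low-frequency band only
    gain = 10.0 ** (gain_db / 20.0) - 1.0    # extra level for that band
    y = x + gain * low
    return y / np.max(np.abs(y))             # normalize to avoid clipping

fs, x = wavfile.read('explosion.wav')        # hypothetical 16-bit source
x = x.astype(np.float64) / 32768.0
wavfile.write('explosion_bass.wav', fs,
              (boost_bass(x, fs) * 32767.0).astype(np.int16))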

 These comments on audio in the PTSD virtual environment prompt a need for
a formal experiment.  This experiment would have two aims.  One goal would
be to document the contribution audio provides to the overall sense of
presence.  Secondly, and more importantly, it would aim to provide
additional understanding as to whether and how audio affects visual
perception.  In other words, evidence would be gained to support or
challenge the idea of the audio effect Neuman mentioned.  Data from this
experiment could eventually be used to build a method for quantifying the
sense of presence in a virtual environment.

 This experiment would essentially be a perception study.  Extensive
consultation with psychologists would be necessary to formulate a credible
experimental design.  A general sketch of the experiment is given below.

 The experiment would consist of two tasks.  The first task would involve
a replication of Neuman's HDTV experiment, this time performed using
formal experimental methods.  Subjects would be asked
to watch television in a mock-up of a living room.  This living room would
include a two-way mirror behind which evaluators could observe as
necessary.  A living room mock-up as described is available at the Georgia
Center for Advanced Telecommunications Technology (GCATT) within the
Broadband Telecommunications Center (BTC).  Two identical video sequences
would be played.  One would have low fidelity monaural sound.  The second
video would use compact disc quality stereo sound.  Subjects would be asked
to fill out a questionnaire concerning their opinion of the overall
quality of each video sequence.  The video sequences and audio could be
provided by Professor James Oliverio, director of GVU AudioLab, who has
extensive experience as a sound designer for film and television.  As
mentioned earlier, this portion of the experiment would be used to verify
Neuman's results.  Neuman's experiment is widely referred to, but it has
never been taken seriously by the audio research community due to its
informality and the fact that it has not been further verified through
replication and publication.
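
 For what it is worth, here is a minimal sketch of one way such ratings
might be compared: a paired test on each subject's picture-quality ratings
under the two audio conditions.  The ratings below are invented
placeholders, not data:

from scipy.stats import wilcoxon

# Hypothetical 1-7 picture-quality ratings, one pair per subject.
low_fi  = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3]
high_fi = [5, 4, 6, 6, 4, 5, 4, 6, 5, 4]

stat, p = wilcoxon(high_fi, low_fi)   # paired, nonparametric
print(f"Wilcoxon W = {stat}, p = {p:.4f}")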

 The subjects would next be asked to complete the experiment's second
task.  Subjects would don a virtual reality helmet and experience three
versions of the same virtual world.  The subjects would first enter the
environment without sound.  The second version would have monaural, low
fidelity sound.  The third would have stereo, CD quality sound with
3-D audio effects.  Subjects would be asked to complete an extensive
questionnaire concerning the relative visual quality of each environment.
They would also be asked to rate their sense of "being there", or
presence.  The virtual world used for this experiment would most likely
be the environment used for the previously mentioned post-traumatic
stress disorder treatment system.
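
 To make the "3-D audio" condition concrete, here is a toy sketch of the
simplest possible spatialization: delaying and attenuating the signal at
the ear farther from the source.  Real 3-D audio engines use measured
head-related transfer functions; the parameters below are rough textbook
values, not those of any particular system:

import numpy as np

def spatialize(x, fs, azimuth_deg):
    # Return stereo (n, 2) audio from mono x; azimuth > 0 = source right.
    az = np.radians(azimuth_deg)
    itd_s = 0.00066 * abs(np.sin(az))        # interaural delay, <= ~0.66 ms
    delay = int(round(itd_s * fs))           # delay (samples) for far ear
    ild = 10.0 ** (-6.0 * abs(np.sin(az)) / 20.0)  # far ear up to 6 dB down
    far = np.concatenate([np.zeros(delay), x])[:len(x)] * ild
    left, right = (far, x) if azimuth_deg > 0 else (x, far)
    return np.column_stack([left, right])

fs = 44100
beep = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1-second test tone
stereo = spatialize(beep, fs, azimuth_deg=45)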

 After an adequate number of subjects is run, data from the questionnaires
would be compiled.  Trends in the results would hopefully provide insight
into how audio and visual cues interact in video sequences and immersive
virtual environments.  The results could also provide an incremental step
toward a method for quantifying presence in a virtual reality world.


Preliminary References

Cohen, M., and Wenzel., E. M. The Design of Multidimensional Sound Interfaces.
In W. Barfield and T. Furness III, editors, Virtual Environments and Advanced
Interface Design. Oxford University Press, New York, New York, 1995.

Goldstein, Bruce E., Sensation and Perception. New York: Brooks/Cole, 1996.

Hendrix, C., and Barfield, W. The Sense of Presence Within Virtual
Environments.    Presence: Teleoperators and Virtual Environments 5, 3 (Summer
1996), 290-301.

Moore, Brian C.J., An Introduction to the Psychology of Hearing.  New York:
Academic Press, 1997.

Neuman, W. R.  Beyond HDTV: Exploring Subjective Responses to Very High
Definition Television.  MIT Media Laboratory, July 1990.

Prosen, C.A., Moody, D.B., Stebbins, W.C., and Hawkins, J.E., Jr., Auditory
intensity discrimination after selective loss of cochlear outer hair cells.
Science, 212, 1286-1288.

Stein, Barry E. and Meredith, M. Alex. The Merging of the Senses.  Cambridge,
MA:  MIT Press, 1993.

Welch, R.B. and Warren, D.H.  Intersensory Interactions. In K. Boff, L. Kaufman,
and  J.P. Thomas (eds.) Handbook of Perception and Human Performance, Volume 1.
New York: J. Wiley and Sons, 1986.


--------------------------------------------------------
Jarrell Pair
Research Assistant, Virtual Environments Group
Assistant Director, AudioLab
Graphics, Visualization, and Usability Center
Georgia Institute of Technology
Web:   http://www.cc.gatech.edu/gvu/people/jarrell.pair
Phone: (404)-894-4144 (Lab)
Fax:    (404)-894-0673
--------------------------------------------------------