Re: correlation of vision and audition (Al Bregman)


Subject: Re: correlation of vision and audition
From:    Al Bregman  <bregman(at)hebb.psych.mcgill.ca>
Date:    Wed, 11 Feb 1998 18:39:56 -0500

Dear Colleagues,

Thanks to those of you who sent in information about the mutual interaction of visual and auditory signals. Here are the abbreviated replies that I received from two lists: (1) AUDITORY, and (2) ICAD (International Conference on Auditory Display). There are a number of useful references in them.

Jason: If you want to join the AUDITORY list, send a message to Dan Ellis <dpwe(at)ICSI.Berkeley.EDU>, asking him to put you on the list.

- Al Bregman

Here is the request for information that I sent to these lists:

-------------- request for information starts here ----------------

>Dear Colleagues,
>
>A doctoral student in the music recording program at McGill, who is taking my course in auditory perception, wrote me to ask about a phenomenon that he had observed while doing animation. I include a part of his message:
>
>---------- Forwarded message ----------
>Date: Fri, 6 Feb 1998 14:15:35 -0500 (EST)
>From: Jason Corey <corey(at)music.mcgill.ca>
>
>While creating and editing a soundtrack for an animated film, it became apparent that sounds occurring synchronously in time with visual events on the screen had an effect on how I perceived the visuals. For one particular scene there happened to be a great deal of activity in the animation, with different events happening at different positions on the screen. Without a soundtrack there were many events that were not perceived until a sound effect was synchronised with the particular visual events.
>
>It seems that by having sound accompany a visual, many more details of the visual are perceived than without a soundtrack. It would seem that our auditory system helps the visual system to "sort out" the details when there are many events happening visually. Maybe this somehow helps us focus on certain visual details more than others. My guess is that by changing the soundtrack to a given moving picture, it is possible to alter what the subject will focus on.
>
>I was wondering how much work has been done in this area. Any help would be appreciated.
> ...
>
>Jason Corey
>--------- end of forwarded message ---------------
>
>I know about the work on the ventriloquism effect by Thomas (1941), by Witkin et al. (1952), and by Bertelson, Radeau and others more recently, but this research is more concerned with the PERCEIVED LOCATION of the sound or visual object rather than its behavior over time. I think the question is about how a correlated sound helps in the visual parsing of a display.
>
>Concerning the reverse, how a correlated visual event would help in the parsing of an auditory presentation, we have probably all seen one person using hand motions in the air to direct another person's attention to a particular part in a complicated piece of music.
>
>I am looking for information about these phenomena - both when audition assists visual parsing and the reverse.
>
>I know about some of the research that looks at the role of visual cues in the perception of speech (Dodd, 1977, 1979, 1980; Massaro, 1987; McGurk and colleagues). I am also aware of some research on the influence of correlated sounds on apparent motion (phi) by Gilbert (1939) and by O'Leary and Rhodes (1984). There is also the work of Spelke and her colleagues on the integration of correlated visual and auditory events by infants.
>
>I have also heard of a study by Sumby and Pollack on how seeing a speaker can help the listener to understand speech in noise, but I don't have the exact reference.
>
>The effects of visual input on speech understanding may be entirely due to the fact that the visual input actually supplies additional information about the identity of the speech signal. Has anybody determined whether any part of the effect may be in assisting the listener to parse the mixture of sounds that might be present?
>
>Does anyone know of articles, chapters or technical reports specifically on the issue of how information from one modality helps us to parse mixtures in another? Or of the exact reference to the Sumby-Pollack article?
>
>If so, I'd appreciate hearing about them.
>
>Thanks,
>
>Al
>
>----------------------------------------------------------------------
>Albert S. Bregman, Professor, Dept of Psychology, McGill University
>1205 Docteur Penfield Avenue, Montreal, Quebec, Canada H3A 1B1.
>Phone: +1 514-398-6103 Fax: -4896 Email: bregman(at)hebb.psych.mcgill.ca
>Lab Web Page: http://www.psych.mcgill.ca/labs/auditory/laboratory.html
>----------------------------------------------------------------------

---------- the request for information ends here --------------------

Here are the answers I received:

----------- the set of received answers starts here -----------------

The full Sumby-Pollack reference is this:

Sumby, W.H. and Pollack, I. (1954) Visual contribution to speech intelligibility in noise, J. Acoust. Soc. Am. 26: 212-215.

(Sent by Steve Greenberg <steveng(at)ICSI.Berkeley.EDU>, John Neuhoff <neuhoffj(at)lafvax.lafayette.edu>, "Peter F Assmann,GR4126,2435,GR41" <assmann(at)utdallas.edu>, and "Thomas Anthony Campbell" <campbelt(at)mailexcite.com>)

John Neuhoff also sent these:

Stein, B. E., Meredith, M. A., Huneycutt, W. S., & McDade, L. (1989). Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1(1), 12-24.

Stein, B. E., & Meredith, M. A. (1994). The Merging of the Senses. Cambridge, MA: MIT Press.

Perrott, D. R., Sadralodabai, T., Saberi, K., & Strybel, T. Z. (1991). Aurally aided visual search in the central visual field: Effects of visual load and visual enhancement of the target. Human Factors, 33(4), 389-400.

Scott Lipscomb also gave a talk at the ASA meeting last May entitled "Perceptual measures of visual and auditory cues in film music". The abstract is online at: <http://sound.media.mit.edu/AUDITORY/asamtgs/asa97pen/5aMU/5aMU3.html>

--John Neuhoff

______________________________________________________________
John Neuhoff                  Phone: 610-250-5287
Department of Psychology      FAX:   610-250-5349
Lafayette College             email: neuhoffj(at)lafayette.edu
Easton, PA 18042              www.lafayette.edu/~neuhoffj/neuhoff.htm
--------------------------------------------------------------

Date: Fri, 6 Feb 1998 21:04:55 -0500 (EST)
To: Al Bregman <bregman(at)hebb.psych.mcgill.ca>
From: Harold Fiske <hfiske(at)julian.uwo.ca>

ANNABEL COHEN'S WORK MAY BE OF INTEREST TO YOU. SEE, E.G., THE ISSUE OF _PSYCHOMUSICOLOGY_ SHE EDITED RECENTLY (ONE OR TWO ISSUES BACK -- SORRY I DO NOT HAVE THE EXACT REFERENCE IMMEDIATELY AVAILABLE, BUT IT IS EASILY FOUND).

HAROLD FISKE
UNIVERSITY OF WESTERN ONTARIO

---------------------------------------------------------

From Nick.Zacharov(at)research.nokia.com Sat Feb 7 03:39:04 1998

Hi,

The issue of multimodal interaction seems to have been studied to a fair extent within the broadcast & film industry. However, most of this work relates to AV sync and perceived quality. I have a few refs. on this if you require.
The overall cognition process and the effective enhancement or degradation associated with multimodal stimuli is well described in Stein B. E., Meredith M. A., The Merging of the Senses, pp. 148-156, MIT Press, 1993. This book studies the perception of both cats and humans (which apparently have similar cognitive processes) and looks at numerous multimodal interactions.

I hope this helps.

Yours,

-------------------------------------------
Nick Zacharov
Nokia Research Center
Speech & Audio Systems
PO Box 100           Tel +358-3-272-5786
33721 Tampere        Fax +358-3-272-5899
FINLAND
-------------------------------------------------------------------

Date: Sat, 07 Feb 1998 10:28:16 -0600
Organization: The University of Texas at Dallas
To: Al Bregman <bregman(at)hebb.psych.mcgill.ca>
From: "Peter F Assmann,GR4126,2435,GR41" <assmann(at)utdallas.edu>

Al, here is the Sumby and Pollack reference, along with a set of other papers concerning the issue of how visual cues from lipreading supplement those provided by the acoustic signal. Chris Darwin brought my attention to the paper by Driver, which addresses the role of vision in selective attention to a target voice against the background of a competing voice. Driver created an audio signal of a mixture of two voices speaking a list of target and distractor words, and the task was to identify as many words as possible. In combination with the audio signal, subjects saw the face of one of the talkers on a video monitor (synchronized with one of the voices in the audio signal). Performance in recognizing the target voice was around 58%, but when the video monitor was moved (either 100 cm to the left or right), identification improved to about 77% correct. The apparent change in the visual source caused listeners to hear the corresponding sound source as coming from a different direction and thereby enhanced the segregation of the two voices.

-Peter

====================================================================

Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature 381: 66-68.

Summerfield, Q. (1991). Visual perception of phonetic gestures. In: Modularity and the Motor Theory of Speech Perception. Edited by Mattingly, I. and Studdert-Kennedy, M., Ch. 6, pp. 117-138. L. Erlbaum & Associates: Hillsdale, N.J.

Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In Hearing by eye: The psychology of lip-reading. Edited by B. Dodd and R. Campbell. (Erlbaum, N.J.)

Grant, K.W., Braida, L.D. and Renn, R.J. (1994). Auditory supplements to speechreading: Combining amplitude envelope cues from different spectral regions of speech. Journal of the Acoustical Society of America 95: 1065-1073.

Grant, K.W., Ardell, L.H., Kuhl, P.K. and Sparks, D.W. (1985). The contribution of fundamental frequency, amplitude envelope, and voicing duration cues to speechreading in normal-hearing subjects. Journal of the Acoustical Society of America 77: 671-677.

-----------------------------------------------------------

Date: Mon, 9 Feb 1998 09:38:05 GMT
To: Al Bregman <bregman(at)hebb.psych.mcgill.ca>
From: bob.carlyon(at)mrc-apu.cam.ac.uk (Bob Carlyon)

al, do you know about the Nature article by Jon Driver, in which "target" and "distractor" speech were mixed into a common loudspeaker?
He found that a video of someone producing the target speech produced better identification when the video monitor was located above a "dummy" speaker to one side of the real one, compared to when it was above the real speaker.

Reference: "Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading", Nature 381, pp. 66-68 [1996]

He has also done a lot of work with Charles Spence showing the various ways in which the identification of visual targets can be affected by auditory "cues" which "attract attention" towards (or away from) the target. Also vice versa. Some of this is in P&P 1997.

cheers

bob

------------------------------------------------------------------------------
Dr. Bob Carlyon, MRC Applied Psychology Unit, 15 Chaucer Rd.
CAMBRIDGE CB2 2EF, England.
Phone: (44) 1223 355294 ext 720   FAX: (44) 1223 359062.
email: bob.carlyon(at)mrc-apu.cam.ac.uk
----------------------------------------------------------------------------

Date: Mon, 09 Feb 1998 13:38:22 +0100
From: Jon Barker <j.barker(at)dcs.shef.ac.uk>
Reply-To: jon(at)kom.auc.dk
Organization: Sheffield University

Al Bregman wrote:
> The effects of visual input on speech understanding may be entirely due to
> the fact that the visual input actually supplies additional
> information about the identity of the speech signal. Has anybody
> determined whether any part of the effect may be in assisting the listener
> to parse the mixture of sounds that might be present?
>

Yes, Driver 1996, "Can't remember the title!", Nature 381, pages 66-68, addresses this question. He was interested in whether the ventriloquist illusion could actually segregate speech sources such that recognition scores in simultaneous-speaker tasks would be higher. Listeners were presented with two simultaneous speakers (using a loudspeaker) and shown the lip movements for just one speaker. They were asked to repeat the words of the talker whose lip movements they were watching. The (rather remarkable) finding is that when the video and audio are presented from different spatial locations, the ventriloquist illusion shifts the apparent location of one of the voices, and this illusory spatial segregation improves listeners' ability to identify the words in the mixture - i.e., it acts like a real spatial separation.

In another experiment, reported in the same paper, it was demonstrated how seeing the lips of one of the talkers could actually help the listener to identify the words of the *unseen* speaker. The results would seem to implicate visual cues in the segregation itself - not just as extra information to help identification after segregation.

Cheers

Jon

--
Jon Barker, Speech and Hearing, Computer Science,
University of Sheffield, Sheffield S1 4DP, UK
Phone: +44-(0)114-22 21800; FAX: +44-(0)114-278 0972
Email: j.barker(at)dcs.shef.ac.uk; URL: http://www.dcs.shef.ac.uk/~jon

---------------------------------------------------

To: bregman(at)hebb.psych.mcgill.ca
Date: Mon, 09 Feb 1998 07:57:51 -0700
From: "Thomas Anthony Campbell" <campbelt(at)mailexcite.com>

Dear Al,

There's a good section in Goldstein's latest edition of Sensation and Perception on McGurk-like effects for plucked and bowed instruments: seeing a plucked instrument when the sound is that of a bowed instrument makes the perceiver report that sound as plucked. Similar effects don't occur for written words combined with sounds (Fowler and Dekle, 1991). Perhaps you've seen all this.
It seems to me there might be something in this audio-visual correlation of *movements* during auditory-visual object identification. Written words don't move. However, bows, fingers, mouths and the sounds of speech and music do move. *When* that correlation occurs I'd really like to know.

As for assistance of visual parsing by sound, we've found that a repeated acoustic "one" can have a small facilitatory effect on *serial recall* of lip-read digit lists. This facilitatory effect over silent lips isn't very clear cut. Perhaps a repeated "one" helps listeners concentrate on parsing the lip-read digits more. There are other explanations, I'm sure. For me, it's an open question.

I'm in search of similar papers that involve manipulations of audio-visual synchrony in McGurk-like effects. If you hear of any such papers then please send the references on to me.

Hope this helps!

Tom Campbell
Research student
Dept of Psychology
Reading University
Whiteknights
Reading
ENGLAND

Fowler, C. & Dekle, D.J. (1991) Listening with eye and hand: Cross-modal contributions to speech perception. J of Exp Psy: HPP 17, 816-828

---------------------------------------------------

From: Scott Lipscomb <lipscomb(at)utsa.edu>
To: "'Auditory List'" <auditory(at)vm1.mcgill.ca>
Cc: "'Al Bregman'" <bregman(at)hebb.psych.mcgill.ca>

List members:

I must confess that I was THRILLED to see the posting by Al Bregman and his student, Jason Corey. There has been quite a flurry of activity in the experimental investigation of film music and its associated context ... e.g., animation, motion pictures, etc. Both my Master's thesis (1990) and doctoral dissertation (1995) at UCLA looked at these issues from an experimental aesthetic frame of reference. With the assistance of Dr. Roger Kendall, a 7-year series of experiments was run. The results of the Master's thesis were published in Psychomusicology (vol. 13, nos. 1 & 2) in 1994 ... in fact, the entire volume is devoted to Film Music, guest-edited by Annabel Cohen. Dr. Cohen and one of her students have also done some important research in this area (see Marshall & Cohen, 1988, in Music Perception, vol. 6). More germane, however, are the results of my dissertation, which addressed the issue of A-V synchronization ... directly relevant to Jason's question(s). I am hoping to publish these results soon, so they will be generally available. Some aspects of the study may be viewed at my Web address ... http://music.utsa.edu/~lipscomb

I would welcome any additional discussion of the role of Film Music and/or comments about my research.

Sincerely,
Scott Lipscomb

----------------------------------------------------------------

Date: Mon, 09 Feb 1998 15:00:41 -0400 (AST)
From: Annabel Cohen <acohen(at)upei.ca>

Hi Al,

My interest in auditory-visual correlation, as you call it, stems from my research on the effects of music in film perception. I came up with a framework that had at its core the notion of the control of visual attention by audio-visual congruence, so you can see why I am interested in your query. A statement of this framework has appeared in several publications that I will send to you (also a set for Jason Corey). The framework was applied in two studies, one entailing a filmed animation (2 triangles and a circle) and the other, interactions among wolves. The data in support of the notion of visual capture by audition, however, are scant, in part I think due to the technology available to me at the time, but also due to a window of tolerance on what is regarded as temporally congruent. There are also separate issues of phase and pattern to be considered.
Shared pattern is probably more important than shared phase for nonverbal materials, though for verbal materials, precision in all aspects of timing is likely more critical. Roger Kendall and Scott Lipscomb have also focused on the issue of structural congruence in the film-music domain, and one of their publications will be found in the journal issue of Psychomusicology (1994, Vol. 13, Nos. 1-2) that I am sending. Bill Thompson's research in the same volume on the effects of soundtracks on perceived closure might also be relevant. Finally, in the same volume, buried in a paper by Iwamiya, is evidence that asynchronous timing negatively influences judgements about music-video excerpts.

I have recently been listening to your CD demonstrations and felt that there might be a connection between auditory capture of a structurally similar auditory stream and auditory capture of a structurally similar visual display. I'd be interested in your thoughts on this and the other information, as I am unaware of the Sumby and Pollack work you mentioned.

Best regards,

Annabel

------------------------------------------------

Date: Mon, 9 Feb 1998 15:01:26 -0500
From: Robert Zatorre <MD37(at)MUSICA.MCGILL.CA>

For some interesting behavioral as well as neurophysiological evidence pertaining to auditory-visual interactions, see the following:

Stein, B.E., Wallace, M.T. and Meredith, M.A. (1995) Neural mechanisms mediating attention and orientation to multisensory cues. In The Cognitive Neurosciences, M. Gazzaniga, Ed., MIT Press, Cambridge, Mass., pp. 683-702.

Knudsen, E.I. and Brainard, M.S. (1995) Creating a unified representation of visual and auditory space in the brain. Annual Review of Neuroscience, 18, 19-43.

These studies have shown that, under at least some conditions, stimuli that are subthreshold in one or the other modality alone can be responded to when they are combined, hence demonstrating some nonlinear interactions. Stein and colleagues have provided neurophysiological evidence that neurons within the deep layers of the feline superior colliculus contain topographic maps of both visual and auditory space, and that large enhancements of response are observed to combined auditory and visual stimulation; this structure may therefore be one substrate for integration of the two modalities. In addition, it has been shown that inputs from unimodal neurons within polysensory cortical areas are important determinants of SC integration of response.

Robert J. Zatorre, Ph.D.
Montreal Neurological Institute
3801 University St.
Montreal, QC Canada H3A 2B4
phone: 1-514-398-8903
fax: 1-514-398-1338
e-mail: md37(at)musica.mcgill.ca

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

From: Scott Lipscomb <lipscomb(at)utsa.edu>
To: "'Al Bregman'" <bregman(at)hebb.psych.mcgill.ca>

Al:

Good to hear from you .... GREAT! I will look forward to hearing from Jason. As you might imagine, principles of "auditory scene analysis" play a central role in any discussion of film soundtracks ... what with the sound effects, music, dialogue, etc.

Hope all is well.

Scott

----------------------------------------------------------------

Date: Mon, 09 Feb 1998 21:23:22 -0500
From: Jarrell Pair <jarrell(at)cc.gatech.edu>

Dr. Bregman,

I am a graduate student at Georgia Tech's Graphics, Visualization, and Usability (GVU) Center. At the GVU AudioLab, as part of my master's degree work, we are designing an experiment to evaluate and document audio-visual cross-modal effects in virtual environments.
Below is a list of references I have found while conducting my literature research. I have also included a rough initial description of the type of experiment we are hoping to conduct once we have a properly designed study. Any comments you may have would be greatly appreciated. I would be very interested in any similar work being conducted at McGill or any additional references you may come across.

Best Regards,

Jarrell Pair

----------------------------------------------------------------------

Main Title: "Basic principles of sensory evaluation". Philadelphia, American Society for Testing and Materials [1968].
Author: ASTM Subcommittee E18.02 on Principles of Psychophysical Test Methods.

This book has a section titled "Intercorrelation of the Senses". It references some interesting experiments conducted in the Soviet Union on the effects of audio on vision. These were conducted during the Cold War, making the data's value questionable. However, the results were compiled and reviewed in London, I.D., "Research on Sensory Interaction in the Soviet Union," Psychological Bulletin, Vol. 51, No. 6, 1954, pp. 531-568.

----------------------------------------------------------------------

Main Title: The psychology of perception.
Author: Dember, William Norton.

This text has a section on "Intermodal Manipulations". It also mentions Ivan London's review of the Soviet literature. Dember's book claims that audio affects visual sensitivity and acuity. Specifically, the Russian studies claim:

- Sensitivity to white light in the fovea is increased under auditory stimulation of moderate intensity.
- Auditory stimulation increases sensitivity to blue-green light, but lowers sensitivity to orange-red.

----------------------------------------------------------------------

Main Title: Visual and auditory perception.
Author: Murch, Gerald M.

Murch's book includes a section on "Auditory and Visual Interaction". It mentions a 1967 study by Posner indicating that the auditory system can retain information better than the visual system. It notes that a growing body of work "suggests a close interrelationship between the short-term storage mechanisms of vision and audition". Unlike the Russian studies, Posner's work is well respected and referenced.

----------------------------------------------------------------------

Neuman, W. R., Beyond HDTV: Exploring Subjective Responses to Very High Definition Television. MIT Media Laboratory, July 1990.

This document supposedly describes a much discussed but never referenced experiment at M.I.T. Have you ever read this paper? I have been unsuccessful in my attempts at finding it.

----------------------------------------------------------------------

Stein, Barry E. and Meredith, M. Alex. The Merging of the Senses. Cambridge, MA: MIT Press, 1993.

----------------------------------------------------------------------

Welch, R.B. and Warren, D.H. Intersensory Interactions. In K. Boff, L. Kaufman, and J.P. Thomas (eds.) Handbook of Perception and Human Performance, Volume 1. New York: J. Wiley and Sons, 1986.
----------------------------------------------------------------------

Audition/Vision Cross Modal Effects: A Research Proposal

Sound designers in the film and television industry have long relied on the notion that high quality sound can be used to enhance the audience's visual perception of moving images. This concept has never been formally researched or published. Recently, the debate over standards for high definition television (HDTV) sparked a degree of interest in the effect of audio on visual perception. W.R. Neuman set up an informal experiment in which subjects were required to watch two video sequences. In both sequences, the visual information was identical. However, the first sequence was encoded with low quality audio while the second sequence incorporated high quality sound. The subjects were subsequently asked to compare the two sequences. Nearly every subject stated that the second video sequence was visually superior, though the only difference was that it had better implemented sound. As Neuman explained, "It turns out that there is an 'Audio Effect' in the evaluation of picture quality. In fact, the effect of improved audio is as strong on picture quality ratings as it is on audio quality ratings."

This audio effect, if it does indeed exist, would have a significant impact on the design of virtual environments. Designers of virtual environments seek to create a feeling of presence, or immersion, for users. Primarily, this goal has been pursued by creating worlds with convincing 3D interactive graphics; the sense of presence is achieved chiefly by engaging the human sense of sight. Unlike sight, the sense of hearing is often neglected in the implementation of a virtual world. Recent work indicates that the integration of spatial audio in a virtual environment enhances a user's sense of presence (Hendrix and Barfield 290-301). Despite considerable evidence of its immersive potential, audio is often banished as the poor stepchild of virtual reality. The plight of audio in interface design is explained by Cohen and Wenzel:

Audio alarms and signals have been with us since long before there were computers, but even though music and visual arts are considered sibling muses, a disparity exists between the exploitation of sound and graphics in interfaces. . . . For whatever reasons, the development of user interfaces has historically been focused more on visual modes than aural. (291)

This trend is in part due to technical resource limitations of computer systems. Designers were forced to sacrifice audio quality for graphics performance. However, these restrictions no longer exist. In the past several years, dedicated audio ASICs (application-specific integrated circuits) coupled with fast CPUs have made it feasible to implement high fidelity, immersive audio in graphically intensive virtual environments. A more definite understanding of how sound affects visual perception would allow virtual environment designers to optimally distribute resources between the audio and visual aspects of a virtual world as a way of maximizing presence.

The Significance of Audio-Visual Cross Modal Effects in Virtual Environments: A Research Proposal

At Georgia Tech's Graphics, Visualization, and Usability (GVU) Center, the Virtual Environments (VE) Group has developed virtual worlds for treating individuals suffering from socially debilitating phobias.
The first of these systems was built to treat fear of heights. Audio in this system was functionally minimal and usually disabled. Experimental trials with actual patients indicated that this approach to therapy could be effective. Subsequently, a PC-based system was developed to treat fear of flying. Audio was a major component of the fear-of-flying virtual environment project. Many users mentioned how sound greatly contributed to making the experience more "real". This system is currently being used by psychotherapists in Atlanta, Cleveland, and San Diego.

Recently, a new virtual environment was built to treat Vietnam veterans suffering from post-traumatic stress disorder (PTSD). This system is currently being evaluated at the Atlanta Veterans Administration hospital. It consists of a helicopter ride and a jungle walk. The PTSD system incorporates a complex, high fidelity soundscape. Source sounds were gathered from Hollywood production libraries and from tapes provided by the Veterans Administration. Explosions were processed to maximize their low frequency, or bass, characteristics. These sounds were played back for the user over high quality headphones. Also, a speaker designed to respond well to low frequencies was mounted under the user's chair. This scheme allowed the user to feel and hear the impact of explosions and aftershocks. When the system was being developed, many users initially used it without sound enabled. Subsequently, they experienced the environment with the three-dimensional soundscape. Users invariably commented that the sound transformed their perception of the environment.

These comments on audio in the PTSD virtual environment prompt the need for a formal experiment. This experiment would have two aims. One goal would be to document the contribution audio provides to the overall sense of presence. Secondly, and more importantly, it would aim to provide additional understanding as to whether and how audio affects visual perception. In other words, evidence would be gained to support or challenge the idea of the audio effect Neuman mentioned. Data from this experiment could eventually be used to build a method for quantifying the sense of presence in a virtual environment.

This experiment would essentially be a perception study. Extensive consultation with psychologists would be necessary to formulate a credible experimental design. A general sketch of the experiment is given below.

The experiment would consist of two tasks. The first task would involve a replication of Neuman's HDTV experiment, performed using formal experimental methods. Subjects would be asked to watch television in a mock-up of a living room. This living room would include a two-way mirror behind which evaluators could observe as necessary. A living room mock-up as described is available at the Georgia Center for Advanced Telecommunications Technology (GCATT) within the Broadband Telecommunications Center (BTC). Two identical video sequences would be played. One would have low fidelity monaural sound; the second would use compact disc quality stereo sound. Subjects would be asked to fill out a questionnaire concerning their opinion of the overall quality of each video sequence. The video sequences and audio could be provided by Professor James Oliverio, director of the GVU AudioLab, who has extensive experience as a sound designer for film and television. As mentioned earlier, this portion of the experiment would be used to verify Neuman's results.
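As a rough illustration only of the kind of within-subjects comparison this replication implies (the ratings, scale, and variable names below are hypothetical and are not taken from Neuman's study or from this proposal), the paired visual-quality ratings from the two audio conditions could be analyzed along these lines:

import numpy as np
from scipy import stats

# Hypothetical 1-10 "visual quality" ratings, one pair per subject, for the
# same video sequence shown with low-fidelity mono vs. CD-quality stereo audio.
quality_with_lofi_audio = np.array([6, 5, 7, 6, 5, 6, 7, 5, 6, 6])
quality_with_hifi_audio = np.array([8, 7, 8, 7, 7, 8, 9, 7, 8, 7])

# Paired (within-subjects) t-test: do identical pictures receive higher
# visual-quality ratings when they are accompanied by better audio?
t_stat, p_value = stats.ttest_rel(quality_with_hifi_audio, quality_with_lofi_audio)
mean_diff = np.mean(quality_with_hifi_audio - quality_with_lofi_audio)
print(f"mean rating difference = {mean_diff:.2f}")
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")

If an "audio effect" of the kind Neuman describes exists, the visual-quality difference should be reliably positive even though the two video sequences are identical.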
Neuman's experiment is widely referred to, but it has never been taken seriously by the audio research community because of its informality and because it has not been verified through replication and publication.

The subjects would next be asked to complete the experiment's second task. Subjects would don a virtual reality helmet and experience three identical virtual worlds. The subjects would first enter a virtual environment without sound. The second world would have monaural, low fidelity sound. A third world would have stereo, CD-quality sound with 3-D audio effects. Subjects would be asked to complete an extensive questionnaire. The questionnaire would ask questions concerning the relative visual quality of each environment. Subjects would also be asked to rate their sense of "being there", or presence. The virtual world used for this experiment would most likely be the environment used for the previously mentioned post-traumatic stress disorder treatment system.

After an adequate number of subjects has been run, data from the questionnaires would be compiled. Trends in the results would hopefully provide insight into how audio and visual cues interact in video sequences and in immersive virtual environments. The study could also provide an incremental step toward a method for quantifying presence in a virtual world.

Preliminary References

Cohen, M., and Wenzel, E. M. The Design of Multidimensional Sound Interfaces. In W. Barfield and T. Furness III, editors, Virtual Environments and Advanced Interface Design. Oxford University Press, New York, New York, 1995.

Goldstein, Bruce E., Sensation and Perception. New York: Brooks/Cole, 1996.

Hendrix, C., and Barfield, W. The Sense of Presence Within Virtual Environments. Presence: Teleoperators and Virtual Environments 5, 3 (Summer 1996), 290-301.

Moore, Brian C.J., An Introduction to the Psychology of Hearing. New York: Academic Press, 1997.

Neuman, W. R., Beyond HDTV: Exploring Subjective Responses to Very High Definition Television. MIT Media Laboratory, July 1990.

Prosen, C.A., Moody, D.B., Stebbins, W.C., and Hawkins, J.E., Jr., Auditory intensity discrimination after selective loss of cochlear outer hair cells. Science, 212, 1286-1288.

Stein, Barry E. and Meredith, M. Alex. The Merging of the Senses. Cambridge, MA: MIT Press, 1993.

Welch, R.B. and Warren, D.H. Intersensory Interactions. In K. Boff, L. Kaufman, and J.P. Thomas (eds.) Handbook of Perception and Human Performance, Volume 1. New York: J. Wiley and Sons, 1986.

--------------------------------------------------------
Jarrell Pair
Research Assistant, Virtual Environments Group
Assistant Director, AudioLab
Graphics, Visualization, and Usability Center
Georgia Institute of Technology
Web: http://www.cc.gatech.edu/gvu/people/jarrell.pair
Phone: (404)-894-4144 (Lab)
Fax: (404)-894-0673
--------------------------------------------------------

