Re: Head Movements and Sound Source Segregation (Erik Larsen)


Subject: Re: Head Movements and Sound Source Segregation
From:    Erik Larsen  <elarsen(at)MIT.EDU>
Date:    Mon, 19 Dec 2005 09:11:15 -0500

On the same note, head motion helps enormously with externalizing virtual
sound sources. I once had a demo of a system with generalized HRTFs that
was not very convincing if you kept your head still. But they had also
implemented variable HRTFs and a head tracker, so that the external
position of the virtual source remained consistent with head movements,
and as soon as you did move your head the image of the source was very
convincing. Although this is entirely anecdotal, I'm sure others have had
similar experiences. The integration of two separate sensory signals that
are consistent with one another must be a very powerful cue to the brain;
and if the cues are not consistent, yet not wholly incompatible, the
brain will create a percept that is somewhat compatible with both, a la
McGurk (McGurk and MacDonald 1976, Nature 264:746-748).

It doesn't seem too surprising that there is a large difference in
front-back resolution between moving your head and moving the sound
source.

Earlier in this discussion it was mentioned that in noisy situations you
would orient your head so as to obtain the largest possible SNR. I think
it is important to keep in mind that the visual channel has a very high
SNR and is, in noisy situations, most likely worth much more than a few
acoustic dB (Sumby and Pollack 1954, JASA 26:212-215); so I agree with
Al Bregman's observation that you would probably watch the talker. Of
course, a near-optimal acoustic SNR may be attainable while keeping the
talker in view as well.

Erik
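
As a concrete illustration of the head-tracking idea above: a minimal
sketch, in Python, of how a head-tracked binaural renderer keeps a
virtual source world-locked. The function name, angle conventions, and
values are illustrative assumptions, not details of the demo system.

    # A minimal sketch of head-tracked binaural rendering (hypothetical,
    # not the demo system described above). The renderer keeps the
    # virtual source world-locked by subtracting the tracked head yaw
    # before selecting an HRTF for the current frame.

    def head_relative_azimuth(source_world_az_deg, head_yaw_deg):
        """Source azimuth as seen from the rotated head, in degrees,
        wrapped to (-180, 180]. Positive = toward the listener's left."""
        az = (source_world_az_deg - head_yaw_deg) % 360.0
        return az - 360.0 if az > 180.0 else az

    # Without tracking, head yaw is effectively assumed to be 0, so the
    # image turns with the listener (it stays head-locked). With
    # tracking, a source at 30 degrees left sweeps toward the midline as
    # the head turns toward it, exactly as a real external source would:
    for yaw_deg in (0.0, 15.0, 30.0):
        print(yaw_deg, head_relative_azimuth(30.0, yaw_deg))
    # prints 30.0, then 15.0, then 0.0 for the head-relative azimuth
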
A.J. Aranyosi wrote:
> Dear list,
>
> I'm also not aware of any references that describe the contribution of
> head motion to auditory scene analysis. However, in addition to the
> arguments by Wallach, there have been some experimental studies that
> confirm the contribution of head motion to sound source localization.
> Back in 1993 I did an undergraduate thesis project with Steve Colburn,
> in which we mounted the head of a KEMAR acoustic manikin on a stepper
> motor. The motor was driven by a rotation sensor placed on top of a
> listener's head. Front-back confusions that arose when the head was
> stationary were resolved by moving the manikin head to match the
> listener's head motion. When the manikin head was moved opposite to
> the listener's head motion, listeners' front-back judgments were
> reversed. We never collected enough data to publish these results
> while I was there. However, Wightman and Kistler published a study in
> 1995 showing that head motion resolved front-back confusions, while
> sound source motion did not. So it seems that we need proprioceptive
> sensing of the head motion in addition to changes in the acoustic
> signal.
>
> -A.J.
>
> Michael Mandel wrote:
>
>> Also, in the late 1930s Hans Wallach gave a geometrical argument for
>> the use of motion in localization: say that you hear a sound and can
>> localize it, using interaural timing and level differences, to the
>> median plane. If you turn your head to the left, a sound source in
>> front of you will move towards your right ear, while one behind you
>> will move towards your left ear.
>>
>> Furthermore, if the sound source is elevated above or below the
>> horizontal plane, it will move less than if it were in the plane. In
>> the extreme, when the sound source is directly above you, its cues
>> don't change at all as you rotate your head.
>>
>> See e.g. Wallach, H., "On Sound Localization," JASA 10(1), 1938,
>> p. 83.
>>
>> -Mike
>>
>> On Sun, Dec 18, 2005 at 01:10:40PM +0100, Christian Kaernbach wrote:
>>
>>> Dear Al,
>>>
>>> This is an interesting question. I know of no work directly
>>> addressing head movements and auditory scene analysis. The role of
>>> head movements in sound source localization has certainly been well
>>> studied, quite some time ago (see postscript). However, whether head
>>> movements are relevant only to localization, or whether they also
>>> help to separate sound sources, would be an interesting field of
>>> research. I would reckon that in a typical cocktail-party situation
>>> the listener would move the head until he/she found the optimal SNR
>>> between the desired signal and the rest of the sound field. Once
>>> this position is found, it should not be helpful to move the head
>>> further. Sure, it would make the desired sound source a moving
>>> target, but all the other sound sources would move around in the
>>> same way. The first approach should be to monitor what listeners
>>> actually do in difficult auditory scenes. I could imagine that in
>>> case of repetitions (the important phrase comes twice, e.g. because
>>> the speaker realized that it did not come through) the listener
>>> might be inclined (sic!) to try a different head position, so as to
>>> reduce redundancy between the two communications.
>>>
>>> Best regards,
>>> Christian
>>>
>>> PS: Let me write on the role of head movements in sound source
>>> localization (SSL) in a postscript, (a) because I am not a real
>>> expert on this issue, and (b) because many listers might know plenty
>>> about it. I could not tell where I have this knowledge from, most
>>> probably from oral communication early in my career. The two primary
>>> cues for SSL are intensity differences and delay differences. These
>>> two cues are, however, quite ambiguous: all sound sources on the
>>> famous "cone of confusion" induce the same delay and intensity
>>> difference. Think of zero delay and intensity difference: this holds
>>> for all sound sources on the median plane, i.e. from ahead, above,
>>> behind, below, etc. Nevertheless, it was noted early on that humans
>>> can discriminate well between sound sources from ahead and from
>>> behind. I was told the anecdote that in the early days of SSL
>>> research this performance was attributed to (from today's viewpoint)
>>> weird supposed mechanisms, such as a sound-pressure sensitivity of
>>> the chest. Later on, head fixation came into use, and much of the
>>> ahead/behind discrimination performance went away. Another mechanism
>>> involved in this performance is the spectral filtering by the outer
>>> ear (head-related transfer functions, HRTFs), but this mechanism can
>>> only be helpful if the sound (or its supposed spectrum) is known to
>>> the listener. So with sine tones of varying loudness, ahead/behind
>>> discrimination depends critically on the participant's ability to
>>> move his/her head. I suppose that much of this can be found in Jens
>>> Blauert's book "Spatial Hearing". ... Note that head movements for
>>> the improvement of SSL are quite a nice example of the role of
>>> action in perception, up to the point where some say "Perception is
>>> a behavior, a specific kind of action aiming at driving home a
>>> maximum amount of information on the object of interest." (Is this
>>> Gibsonian?)
>>>
>>> --
>>> Christian Kaernbach
>>> Institut für Psychologie
>>> Karl-Franzens-Universität Graz
>>> 8010 Graz
>>> Austria
>>> www.kaernbach.de fechner.uni-graz.at
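
As a concrete footnote to Wallach's argument and the cone-of-confusion
discussion above: a minimal worked sketch under a deliberately
simplified model (two point ears 18 cm apart, straight-line paths, no
head shadow or pinna filtering; all constants and names below are
illustrative assumptions, not taken from the thread). It shows that a
stationary head cannot tell front from back by ITD alone, that a head
turn shifts front and back sources in opposite directions, and that
elevated sources shift less.

    # Simplified free-field ITD model illustrating Wallach's geometry.
    # Assumed constants: 9 cm half ear spacing, 343 m/s speed of sound.
    import math

    EAR_HALF_SPACING = 0.09   # metres (assumption)
    SPEED_OF_SOUND = 343.0    # m/s

    def itd_us(az_deg, el_deg, head_yaw_deg=0.0):
        """ITD in microseconds for a far-field source at world azimuth
        az (0 = ahead, 90 = left) and elevation el, heard by a head
        rotated head_yaw_deg to the left. Positive = left ear leads."""
        az = math.radians(az_deg - head_yaw_deg)  # head-relative azimuth
        el = math.radians(el_deg)
        # Lateral component of the source direction (toward the left ear)
        lateral = math.sin(az) * math.cos(el)
        return 2 * EAR_HALF_SPACING * lateral / SPEED_OF_SOUND * 1e6

    # Head still: sources straight ahead and straight behind give the
    # same ITD -- the cone of confusion.
    print(round(itd_us(0, 0)), round(itd_us(180, 0)))        # 0 0

    # Head turned 10 degrees left: the front source moves toward the
    # right ear (ITD goes negative), the back source toward the left
    # ear (ITD goes positive); the sign of the change resolves
    # front vs. back.
    print(round(itd_us(0, 0, 10)), round(itd_us(180, 0, 10)))  # -91 91

    # An elevated source changes less for the same head turn, and one
    # directly overhead does not change at all:
    print(round(itd_us(0, 60, 10)))   # -46
    print(round(itd_us(0, 90, 10)))   # 0
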


This message came from the mail archive
http://www.auditory.org/postings/2005/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University