Re: correlation of vision and audition ("PETER B.L. Meijer" )

Subject: Re: correlation of vision and audition
From:    "PETER B.L. Meijer"  <meijer(at)NATLAB.RESEARCH.PHILIPS.COM>
Date:    Thu, 12 Feb 1998 11:01:00 +0100

February 12, 1998

Annabel Cohen wrote to Al Bregman, in response to his query
[I'm not sure she is on this list, so I'll cc this to her]:

> There are also separate issues of phase and pattern to
> be considered. Shared pattern is probably more important
> than shared phase for nonverbal materials, though for
> verbal, precision in all aspects of timing is likely more
> critical.

Yes, I expect that too, which is why I stressed correlation ("shared pattern", with short delays allowed) over synchrony ("phase", no delay) in my non-speech, non-music application. I'm glad that Al posted some references to the literature that touch on this to some extent.

Annabel:

> I have recently been listening to your [Al's, PM] CD demonstrations
> and felt that there might be a connection between auditory capture of
> a structurally similar auditory stream and auditory capture of
> a structurally similar visual display.

You can hardly get more structural similarity between video and audio streams than via the spectrographic complement, it seems to me, although that is probably not at all the application area you are after. To me it was the whole reason for exploring it as a sensory substitution approach based on audio-visual cross-modal effects and techniques. The complementary nature, as well as the generality, of "real-time" spectrographic synthesis (auditory) and analysis (visual) in my case *still* leads to inevitable delays when working with video frames: the images, now acting as spectrograms, are captured and fixed before the associated sounds are synthesized, to avoid spectrographic synthesis of blurred images.
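The spectrographic synthesis of a single captured frame can be illustrated with a minimal Python/NumPy sketch. It assumes a mapping loosely in the spirit of The vOICe: columns are played one after another from left to right over about a second, vertical position sets frequency (top row highest), and brightness sets loudness. The function name `image_to_soundscape`, the 500 Hz to 5 kHz range and the exponential frequency spacing are illustrative assumptions, not the implementation described in the 1992 paper.

```python
import numpy as np


def image_to_soundscape(image, duration=1.0, sample_rate=22050,
                        f_low=500.0, f_high=5000.0):
    """Hypothetical image-to-sound sketch (not The vOICe's actual code).

    Columns are played one after another, left to right (time-multiplexing);
    each row drives a sine at a fixed frequency, with the top row highest;
    pixel brightness in [0, 1] sets that sine's amplitude.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    samples_per_col = int(round(duration * sample_rate / cols))
    # Assumed mapping: exponentially spaced frequencies, f_high (top) to f_low (bottom).
    freqs = f_high * (f_low / f_high) ** (np.arange(rows) / max(rows - 1, 1))
    t = np.arange(cols * samples_per_col) / sample_rate
    out = np.zeros_like(t)
    for c in range(cols):
        seg = slice(c * samples_per_col, (c + 1) * samples_per_col)
        # Every pixel in column c is one sound source; sum them in this time slot.
        out[seg] = image[:, c] @ np.sin(2 * np.pi * np.outer(freqs, t[seg]))
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out


# With 64 columns in a 1 s scan, each column gets about 15.6 ms, so the
# frequency resolution within one slot is only on the order of 1 / 0.0156 s = 64 Hz.
slot = 1.0 / 64
print(f"{slot * 1000:.1f} ms per column, ~{1 / slot:.0f} Hz resolution")
```

Passing one fixed array, rather than updating pixels during the scan, corresponds to capturing and fixing the frame before synthesis, at the cost of the delay mentioned above.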
There are not just a few sound sources associated with each complete video frame now (as with normal movies!); every pixel in the image acts as a sound source. Approaches like regular spatial audio therefore wouldn't work, and things *have* to be time-multiplexed (hence delayed) to allow distinguishing all sound sources, with the auditory uncertainty relation and critical bands setting the main limits. Short-term auditory/visual memory of course also sets limits, but with spectrographic sounds lasting on the order of a second, I do not yet know of any indication that these limits would be more severe than the others. However, I would be most interested to learn of any further literature on significant auditory/visual memory-related limits here that others might know of.

Best wishes,

Peter Meijer

P.B.L. Meijer, ``An Experimental System for Auditory Image Representations,''
IEEE Transactions on Biomedical Engineering, Vol. 39, No. 2, pp. 112-121, Feb 1992.
On-line at

Also see the experiments with Roy Patterson's AIM software at

My Feb 10 icad response:

> Dear Al,
>
> > It seems that by having sound accompany a visual, many more
> > details of the visual are perceived than without a soundtrack.
>
> > I am looking for information about these phenomena - both
> > when audition assists visual parsing or the reverse.
>
> > Does anyone know of articles, chapters or technical reports
> > specifically on the issue of how information from one modality
> > helps us to parse mixtures in another?
>
> I don't have publications on that, but the phenomenon is quite
> familiar to me. If one uses The vOICe as a sighted person, so
> hearing soundscapes *and* viewing the screen at ("almost") the
> same time, certain visual patterns stand out and attract attention
> due to their very characteristic sounds. Examples are vertical
> (rhythmic) and (to a lesser extent) horizontal striping patterns.
> The association persists for a while even *after* using The vOICe,
> in the sense that visual striping patterns keep attracting visual
> attention to a larger degree than usual until the effect wears off.
> This after-effect seems simply Pavlovian to me. Note also that the
> "synchronization" of audio and video need not be truly simultaneous,
> but may involve short fixed delays, as long as the brain can do
> its correlation work: it may be more about correlation than about
> synchronization.
>
> It also works both ways, in the sense that sometimes one sees
> objects and wonders why one does not (consciously) hear them,
> and then, after focussing auditory attention, one *does* hear
> them (I have a demo of that on my website, involving the grip
> of a kitchen cabinet door that can be hard to hear at first,
> but can be heard once you know what you should be hearing based
> on the visuals).
>
> In the terminology of your example one could say that The vOICe
> creates vision-correlated soundtracks (of a very special kind,
> of course) on-the-fly, hence the close analogy with your topic.
>
> Best wishes,
>
> Peter Meijer
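The distinction between correlation (shared pattern, short fixed delays allowed) and synchrony (shared phase, no delay) has a simple signal-processing analogue: lagged cross-correlation. The sketch below is only that analogue, not a model of what the brain does; the helper `best_lag_correlation` and the synthetic random "visual" and "audio" feature streams are hypothetical.

```python
import numpy as np


def best_lag_correlation(visual, audio, max_lag):
    """Hypothetical helper: find the delay at which two feature streams match best.

    Returns (lag, r), where r is the Pearson correlation between visual[t]
    and audio[t + lag], searched over 0 <= lag <= max_lag.
    """
    visual = np.asarray(visual, dtype=float)
    audio = np.asarray(audio, dtype=float)
    n = min(len(visual), len(audio))
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        x = visual[: n - lag]   # earlier samples of the visual stream
        y = audio[lag:n]        # audio shifted back by `lag` steps
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


rng = np.random.default_rng(0)
visual = rng.random(200)                              # e.g. a per-frame visual feature
audio = np.concatenate([np.zeros(5), visual[:-5]])    # same pattern, 5 steps later
print("zero-lag r:", np.corrcoef(visual, audio)[0, 1])  # synchrony only
print("best (lag, r):", best_lag_correlation(visual, audio, max_lag=20))
```

The zero-lag correlation of the delayed copy is small, while the lag search recovers the 5-step delay with r close to 1, which is the sense in which a shared pattern can be detected even when the shared phase is missing.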

This message came from the mail archive
maintained by:
DAn Ellis <>
Electrical Engineering Dept., Columbia University