
Re: correlation of vision and audition



                                        February 12, 1998


Annabel Cohen wrote to Al Bregman, in response to his query:
[I'm not sure she is on this list, so I'll cc this to her]

> There are also separate issues of phase and pattern to
> be considered. Shared pattern is probably more important
> than shared phase for nonverbal materials, though for
> verbal, precision in all aspects of timing is likely more
> critical.

Yes, I expect that too, which is why I stressed correlation
("shared pattern", with short delays allowed) over synchrony
("phase", no delay) in my non-speech, non-music application.
I'm glad that Al posted some references to the literature
that touches on this.
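To make that distinction concrete, here is a minimal sketch
(plain numpy; the helper name, envelope inputs and 200 ms
search window are illustrative assumptions, not taken from
any of the referenced work) that scores "shared pattern" by
looking for a cross-correlation peak anywhere within a short
lag window, instead of demanding alignment at zero lag:

    import numpy as np

    def best_lag(audio_env, visual_env, fs, max_delay=0.2):
        # Find the short lag (in seconds) at which two equal-length
        # envelope signals, sampled at fs Hz, correlate most
        # strongly. A clear peak at a nonzero lag indicates shared
        # pattern (correlation) without strict synchrony (which
        # would pin the peak at lag 0). Assumes the signals are
        # much longer than the search window.
        a = audio_env - audio_env.mean()
        v = visual_env - visual_env.mean()
        max_shift = int(max_delay * fs)
        scores = {}
        for k in range(-max_shift, max_shift + 1):
            a_seg = a[max(0, -k):len(a) - max(0, k)]
            v_seg = v[max(0, k):len(v) - max(0, -k)]
            scores[k] = float(np.mean(a_seg * v_seg))
        k_best = max(scores, key=scores.get)
        return k_best / fs

On this reading, two streams can share their pattern at, say,
a 100 ms delay and still be bound together, which is all the
correlation view asks for.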

Annabel:

> I have recently been listening to your [Al's, PM] CD demonstrations
> and felt that there might be a connection between auditory capture of
> a structurally similar auditory stream and auditory capture of a
> structurally similar visual display.

It seems to me that you can hardly get more structural
similarity between video and audio streams than via the
spectrographic complement, although that is probably not
at all the application area you are after. For me, that
structural similarity was the whole reason for exploring
the spectrographic mapping as a sensory substitution
approach based on audio-visual cross-modal effects and
techniques.

The complementary nature, as well as the generality, of
"real-time" spectrographic synthesis (auditory) and analysis
(visual) in my case *still* leads to inevitable delays when
working with video frames: each image that is to act as a
spectrogram is captured and fixed before the associated
sounds are synthesized, to avoid spectrographically
synthesizing blurred images. Unlike normal movies, where
only a few sound sources go with each complete video frame,
here every pixel in the image acts as a sound source.
Approaches like regular spatial audio therefore wouldn't
work, and things *have* to be time-multiplexed (hence
delayed) to allow all sound sources to be distinguished,
with the auditory uncertainty relation and the critical
bands setting the main limits. Short-term auditory/visual
memory of course also sets limits, but with spectrographic
sounds lasting on the order of a second I so far know of
no indications that these would be more severe than the
other limits. However, I would be most interested to learn
of literature on any significant auditory/visual
memory-related limits here that others might know of.
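To make the time-multiplexing concrete, here is a minimal
sketch of such a pixel-per-source, column-by-column synthesis
(plain numpy; the helper name, the 500-5000 Hz range, the
exponential frequency spacing and the one-second sweep are
illustrative assumptions, not exact system parameters):

    import numpy as np

    def image_to_soundscape(img, duration=1.0, fs=44100,
                            f_lo=500.0, f_hi=5000.0):
        # Treat a grayscale image (rows x cols, values 0..1) as a
        # spectrogram: every pixel is a sine "sound source" whose
        # frequency is fixed by its row (top = high) and whose
        # amplitude is its brightness. Columns are played left to
        # right over 'duration' seconds, so the sources are
        # time-multiplexed and distinguishable, at the cost of
        # delay.
        rows, cols = img.shape
        # Exponential frequency spacing; row 0 (top) maps to f_hi.
        freqs = f_lo * (f_hi / f_lo) ** (
            np.arange(rows)[::-1] / (rows - 1))
        n_col = int(duration * fs / cols)  # samples per column
        out = np.zeros(cols * n_col)
        for c in range(cols):
            t = (c * n_col + np.arange(n_col)) / fs  # global time
            # One sine oscillator per pixel in this column, summed.
            out[c * n_col:(c + 1) * n_col] = np.sum(
                img[:, c, None] * np.sin(2 * np.pi * freqs[:, None] * t),
                axis=0)
        return out / max(np.max(np.abs(out)), 1e-12)  # normalize

A vertical stripe pattern in img then comes out as a rhythmic
amplitude pulsing and a horizontal stripe as a steady tone,
which fits the striping-pattern observations in the response
quoted below; the auditory uncertainty relation (delta-t times
delta-f on the order of one) caps how many columns and rows
remain resolvable within a single sweep.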

Best wishes,

Peter Meijer


P.B.L. Meijer, "An Experimental System for Auditory
Image Representations," IEEE Transactions on Biomedical
Engineering, Vol. 39, No. 2, pp. 112-121, Feb. 1992. On-line
at http://ourworld.compuserve.com/homepages/Peter_Meijer/voicebme.html

Also see the experiments with Roy Patterson's AIM software at
http://ourworld.compuserve.com/homepages/Peter_Meijer/aumodel.htm


My Feb 10 ICAD response:

> Dear Al,
>
> > It seems that by having sound accompany a visual, many more
> > details of the visual are perceived than without a soundtrack.
>
> > I am looking for information about these phenomena - both
> > when audition assists visual parsing or the reverse.
>
> > Does anyone know of articles, chapters or technical reports
> > specifically on the issue of how information from one modality
> > helps us to parse mixtures in another?
>
> I don't have publications on that, but the phenomenon is quite
> familiar to me. If one uses The vOICe as a sighted person, thus
> hearing soundscapes *and* viewing the screen at ("almost") the
> same time, certain visual patterns stand out and attract attention
> due to their very characteristic sounds. Examples are vertical
> (rhythmic) and, to a lesser extent, horizontal striping patterns.
> The association persists for a while even *after* using The vOICe,
> in the sense that visual striping patterns keep attracting visual
> attention to a larger degree than usual until the effect wears off.
> This after-effect seems simply Pavlovian to me. Note also that the
> "synchronization" of audio and video need not be truly simultaneous,
> but may involve short fixed delays, as long as the brain can do
> its correlation work: it may be more about correlation than about
> synchronization.
>
> It also works both ways, in the sense that sometimes one sees
> objects and wonders why one does not (consciously) hear them,
> and then, after focussing auditory attention, one *does* hear
> them (I have a demo of that on my website, involving the grip
> of a kitchen cabinet door that can be hard to hear at first,
> but can be heard once you know what you should be hearing based
> on the visuals).
>
> In the terminology of your example one could say that The vOICe
> creates vision-correlated soundtracks (of a very special kind,
> of course) on-the-fly, hence the close analogy with your topic.
>
> Best wishes,
>
> Peter Meijer