
Cross-modality comodulation and the release from masking




John Hershey wrote:

> Ken, I've been reading your recent postings to the auditory list with
> increasing interest. I am developing a model of audio-visual interaction in
> sound localization that parallels
> aspects of your work, especially:

> > ... We are currently looking into the correlation, on a sentence-by-sentence
> > basis, between the time course of lip opening and the rms amplitude
> > fluctuations in the speech signals, both broadband and in selected spectral
> > bands (especially the F2 region), as an explanation for the differences across
> > sentences. These results indicate that cross-modal comodulation between visual
> > and acoustic signals can reduce stimulus uncertainty in auditory detection and
> > reduce thresholds for detection.

> I would like to read your work more closely. Would you send me references or
> postscript reprints if they're available?
>
> Here is an abstract of ours submitted to the 5th Annual Joint Symposium on
> Neural Computation, Saturday, May 16, 1998  (John Hershey & Javier Movellan)
>
> Title:  Looking for sounds:  using audio-visual mutual information to locate
> sound sources.
>
> Abstract:
>
> Evidence from psychophysical experiments shows that, in humans,
> localization of acoustic signals is strongly influenced by synchrony
> with visual signals (e.g., this effect is at work when sound coming
> from the side of your TV feels as if it were coming from the mouth of
> the actors).  This effect, known as ventriloquism, suggests that speaker
> localization is a multimodal process and that the perceptual system
> is tuned to finding dependencies between audio and visual signals.
> In spite of this evidence, most systems for automatic speaker
> localization use only acoustic information, taking advantage of
> standard stereo cues.
>
> In this talk we present a progress report on a real time audio-visual
> system for automatic speaker localization and tracking.  The current
> implementation uses input from a single camera and a microphone
> and looks for regions of the visual landscape that correlate highly with
> the acoustic signal.  These regions are tagged as highly likely to
> contain an acoustic source.  We will discuss our experience with the
> system and its potential theoretical and practical applications.
>
> John Hershey
> Cognitive Science, UCSD.

Hi John,

The work you will be describing sounds interesting. I have an HTML version of a
condensed two-page manuscript describing work that I will present at the ASA
meeting in Seattle in June. You can find the paper at the following address:

  ASACondensedMS.htm
(http://members.aol.com/kwgrant/ASACondensedMS.htm)
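
Regarding the system in your abstract: the idea of tagging regions of the image
whose changes track the acoustic signal is easy to picture. Here is a rough
sketch in Python of a per-pixel correlation map between frame-to-frame intensity
change and the short-time acoustic energy. This is my own guess at the core
computation, not your implementation; the grayscale frames and the
one-RMS-value-per-video-frame alignment are assumptions on my part.

import numpy as np

def av_correlation_map(frames, audio_energy):
    # frames: (T, H, W) grayscale video; audio_energy: (T,) short-time RMS
    # of the audio, one value per video frame (both assumed as inputs).
    # Frame-to-frame intensity change stands in for visual motion.
    motion = np.abs(np.diff(frames.astype(float), axis=0))   # (T-1, H, W)
    energy = np.abs(np.diff(audio_energy.astype(float)))     # (T-1,)

    # Pearson correlation between visual and acoustic change at each pixel.
    motion = motion - motion.mean(axis=0)
    energy = energy - energy.mean()
    cov = (motion * energy[:, None, None]).mean(axis=0)
    denom = motion.std(axis=0) * energy.std() + 1e-12
    return cov / denom                                       # (H, W) map

The brightest region of the returned map would then mark the most likely source
location, e.g. np.unravel_index(np.argmax(corr_map), corr_map.shape).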

I have just begun to make measurements of the correlation between lip kinematics
and acoustic amplitude envelope fluctuations derived from the whole speech
signal as well as from separate spectral regions of speech (roughly
corresponding to the F1, F2, and F3 regions). Depending on the phonetic content
of the sentence (e.g., /u/ obscures many of the lip movements), the correlation
may be strong or weak. But the area of lip opening and the amplitude envelope in
the F2 region (800-2500 Hz) appear to be the most strongly correlated. This is
very preliminary, so I don't know how well it will hold up for different
sentences and different talkers. But it is consistent with what we know about
the information conveyed by speechreading: place of articulation is the primary
cue derived from speechreading, and place cues are carried primarily by F2
transitions.
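
In case it is useful to see that computation spelled out, here is one way to do
it, sketched in Python: band-pass the speech into a region such as F2, take the
RMS amplitude envelope frame by frame at the video rate, and correlate it with
the measured area of lip opening. The fourth-order Butterworth filter, the
default band edges, and the audio/video frame alignment below are placeholders
of mine rather than a fixed recipe.

import numpy as np
from scipy.signal import butter, filtfilt

def band_rms_envelope(speech, fs, band, frame_len):
    # RMS amplitude envelope of speech band-pass filtered to `band` (Hz),
    # computed over consecutive frames of `frame_len` audio samples.
    b, a = butter(4, band, btype='bandpass', fs=fs)
    x = filtfilt(b, a, speech)
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def lip_envelope_correlation(lip_area, speech, fs, video_rate, band=(800, 2500)):
    # lip_area: area of lip opening per video frame; speech: audio samples.
    # Returns the Pearson correlation between the lip-area track and the RMS
    # envelope of the chosen spectral band (the default here is the F2 region).
    frame_len = int(round(fs / video_rate))   # audio samples per video frame
    env = band_rms_envelope(speech, fs, band, frame_len)
    n = min(len(env), len(lip_area))
    return np.corrcoef(lip_area[:n], env[:n])[0, 1]

Running the same call with different band edges gives the F1 and F3 versions for
comparison.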

Please keep me posted about your progress in this area and don't hesitate to
contact me if you have any questions.

Ken

Ken W. Grant
Research Audiologist
Army Audiology & Speech Center, Research Section
Walter Reed Army Medical Center, Washington, DC 20307-5001
grant@nicom.com
Tel: (202) 782-8596   Fax: (202) 782-9228

