
Re: [AUDITORY] Why is it that joint speech-enhancement with ASR is not a popular research topic?



Hi Samer,

We did extensive work on robust speech recognition in developing
our FAA Certified speech recognition system for aircraft pilots.
With respect to your questions, I will offer the following
observations:

- Humans will alter their speech production when speaking in the
presence of noise (the Lombard effect) ... hence training and testing
with "additive noise" may not be reflective of "real world" conditions
(e.g. in an aircraft cockpit).

- In our experience, speech "enhancement" (as judged by humans)
typically does not improve the performance of speech recognition
systems, and will often degrade it, even after retraining on that
"enhanced" speech.  It seems that the recognition system is more
sensitive (than humans) to the sometimes minor artifacts
introduced by the enhancement process.

Best regards, Scott.
J. Scott Merritt, President and Founder
VoiceFlight System LLC
www.voiceflight.com


On Mon, 25 Jun 2018 22:09:13 -0700
Samer Hijazi <hijazi@xxxxxxxxxxxxxx> wrote:

> Hi Phil,
> 
> Thanks for your insightful response and for pointing me to your publication
> on this topic from 2003.
> 
> I am particularly intrigued by your comment:
> " It would be wrong to start with clean speech, add noise, use that as
> input and clean speech + text as training targets, because in real life
> speech& other sound sources don't combine like that. "
> 
> There are many recent publications on speech enhancement that use a
> simple additive-noise model, and sometimes an RIR simulator, and they are
> publishing impressive results. Is there a need to incorporate anything
> beyond RIRs to generalize the training dataset and create a solution that
> would work properly in the real world?
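
Concretely, the simulation pipeline in question (convolve clean speech with an RIR, then add noise scaled to a target SNR) can be sketched as follows. This is a toy NumPy illustration with synthetic signals; the function name and the two-tap "room" are assumptions for illustration, not any particular paper's recipe:

```python
import numpy as np

def simulate_noisy(clean, noise, rir, snr_db):
    """Simple 'RIR + additive noise' simulation: reverberate the clean
    speech, then add noise scaled to the requested SNR in dB."""
    # Reverberate: convolve clean speech with the room impulse response
    reverbed = np.convolve(clean, rir)[: len(clean)]
    # Tile or trim the noise to match the signal length
    noise = np.resize(noise, len(reverbed))
    # Scale noise power so that 10*log10(sig_pow / noise_pow) == snr_db
    sig_pow = np.mean(reverbed ** 2)
    noise_pow = np.mean(noise ** 2)
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverbed + scale * noise

# Toy usage with synthetic signals (1 s at 16 kHz)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
rir = np.zeros(800)
rir[0], rir[200] = 1.0, 0.5        # toy two-tap "room"
noisy = simulate_noisy(clean, noise, rir, snr_db=10.0)
```

The caveat raised in this thread is that real recordings are not produced this way: talkers react to the noise they hear, so data simulated like this may not transfer to conditions such as a cockpit.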
> 
> Regards,
> 
> Samer
> 
> 
> 
> On Mon, Jun 25, 2018 at 9:13 PM Phil Green <p.green@xxxxxxxxxxxxxxx> wrote:
> 
> >
> >
> > On 25/06/2018 17:00, Samer Hijazi wrote:
> >
> > Thanks Laszlo and Phil,
> > I am not speaking about doing ASR in two steps, I am speaking about doing
> > the ASR and speech enhancement jointly in a multi-objective learning process.
> >
> > Ah, you mean multitask learning. That didn't come across at all in your
> > first mail.
> >
> > There are many papers showing that if you use related objectives to
> > train your network, you will get better results on both objectives than
> > you would get if you trained for each one separately.
> >
> > An early paper on this, probably the first application to ASR, was
> >
> > *Parveen & Green, Multitask Learning in Connectionist Robust ASR using
> > Recurrent Neural Networks, Eurospeech 2003.*
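
Schematically, this kind of multitask setup shares one encoder between a recognition head and an enhancement head and sums the two losses, so both error signals shape the shared representation. Below is a toy linear NumPy sketch of that weight sharing (synthetic targets and hand-derived gradients; it is not the recurrent architecture of the Parveen/Green paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: input frame -> shared encoding -> two task outputs
d_in, d_hid, d_asr, d_enh, n = 8, 6, 4, 8, 100
W = 0.1 * rng.standard_normal((d_hid, d_in))   # shared encoder
A = 0.1 * rng.standard_normal((d_asr, d_hid))  # "recognition" head
B = 0.1 * rng.standard_normal((d_enh, d_hid))  # "enhancement" head

# Synthetic inputs and two correlated targets derived from the same source
X = rng.standard_normal((d_in, n))
T1 = rng.standard_normal((d_asr, d_in)) @ X    # stand-in for ASR targets
T2 = rng.standard_normal((d_enh, d_in)) @ X    # stand-in for clean-speech targets

lr, losses = 0.01, []
for _ in range(500):
    H = W @ X                        # shared representation
    E1 = A @ H - T1                  # recognition-head error
    E2 = B @ H - T2                  # enhancement-head error
    losses.append(np.mean(E1 ** 2) + np.mean(E2 ** 2))
    # Both task errors backpropagate into the shared encoder W
    dH = A.T @ (2 * E1 / E1.size) + B.T @ (2 * E2 / E2.size)
    A -= lr * (2 * E1 / E1.size) @ H.T
    B -= lr * (2 * E2 / E2.size) @ H.T
    W -= lr * dH @ X.T
```

The point of the shared `W` is that gradients from both tasks accumulate in it, which is where the cross-task benefit (or, with simulated data, the misleadingly large benefit) comes from.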
> >
> > And it seems obvious that if we use speech content (i.e. text) and the
> > perfect speech waveform as two independent but correlated targets, we will
> > end up with better text recognition and better speech enhancement; am I
> > missing something?
> >
> >
> > It would be wrong to start with clean speech, add noise, use that as input
> > and clean speech + text as training targets, because in real life speech &
> > other sound sources don't combine like that. That's why the spectacular
> > results in the Parveen/Green paper are misleading.
> >
> > HTH
> >
> > --
> > *** note email is now p.green@xxxxxxxxxx ***
> > Professor Phil Green
> > SPandH
> > Dept of Computer Science
> > University of Sheffield
> > *** note email is now p.green@xxxxxxxxxx ***
> >
> >