
[AUDITORY] ISCA SIGML seminar: Audio Spectrogram Transformer for Audio Scene Analysis



Dear colleagues,

We are hosting a talk that might be of interest to the people on this list.

Next Wednesday (16 Jun) at 5pm (UTC+0), Yuan Gong from MIT will talk
about audio scene analysis with transformers. The details of the talk
can be found at the end of this email and on the seminar webpage
https://homepages.inf.ed.ac.uk/htang2/sigml/seminar/.

The link to the talk will be distributed through our mailing list
https://groups.google.com/g/isca-sigml. If you are interested, please
subscribe and stay tuned!

Best,
Hao

---

Title: Audio Spectrogram Transformer for Audio Scene Analysis

Abstract: Audio scene analysis is an active research area and has a wide
range of applications. Since the release of AudioSet, great progress has
been made in advancing model performance, which mostly comes from the
development of novel model architectures and attention modules. However,
we find that appropriate training techniques are equally important for
building audio tagging models, but have not received the attention they
deserve. In the first part of the talk, I will present PSLA, a
collection of training techniques that can noticeably boost the model
accuracy.
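
[Editor's note: to make "training techniques" concrete, here is a hedged
sketch of one generic trick commonly used for audio tagging (mixup-style
blending of spectrograms and their multi-label targets). It is purely an
illustration and not necessarily part of the PSLA recipe presented in
the talk; the function name and parameters are assumptions.]

import torch

def mixup(spec, labels, alpha=0.5):
    # Blend random pairs of spectrograms and their multi-hot label vectors
    # with a Beta-distributed mixing coefficient (illustrative only).
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(spec.size(0))
    mixed_spec = lam * spec + (1 - lam) * spec[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_spec, mixed_labels

# Usage: spec is (batch, n_mels, n_frames); labels is (batch, n_classes) multi-hot.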

In the past decade, convolutional neural networks
(CNNs) have been widely adopted as the main building block for
end-to-end audio classification models, which aim to learn a direct
mapping from audio spectrograms to corresponding labels. To better
capture long-range global context, a recent trend is to add a
self-attention mechanism on top of the CNN, forming a CNN-attention
hybrid model. However, it is unclear whether the reliance on a CNN is
necessary, and whether neural networks purely based on attention are
sufficient to obtain good performance in audio classification. In the
second part of the talk, I will answer this question by introducing the
Audio Spectrogram Transformer (AST), the first convolution-free, purely
attention-based model for audio classification.
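
[Editor's note: as a rough illustration of what "convolution-free,
purely attention-based" means in practice, here is a minimal PyTorch
sketch that splits a spectrogram into patches, embeds them with a linear
projection, and classifies with a standard Transformer encoder. The
class name, patch size, width, depth, and head count are illustrative
assumptions, not the actual AST configuration from the talk.]

import torch
import torch.nn as nn

class SpectrogramTransformer(nn.Module):
    # Patchify a spectrogram, embed patches linearly, and classify with a
    # plain Transformer encoder: no convolutions anywhere.
    def __init__(self, n_mels=128, n_frames=1024, patch=16, dim=192,
                 depth=4, heads=4, n_classes=527):
        super().__init__()
        n_patches = (n_mels // patch) * (n_frames // patch)
        self.patchify = nn.Unfold(kernel_size=patch, stride=patch)
        self.embed = nn.Linear(patch * patch, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):                  # spec: (batch, n_mels, n_frames)
        x = self.patchify(spec.unsqueeze(1))  # (batch, patch*patch, n_patches)
        x = self.embed(x.transpose(1, 2))     # (batch, n_patches, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos
        x = self.encoder(x)
        return self.head(x[:, 0])             # classify from the class token

logits = SpectrogramTransformer()(torch.randn(2, 128, 1024))  # -> (2, 527)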

Bio: Yuan Gong is a postdoctoral associate at the MIT Computer Science
and Artificial Intelligence Laboratory (CSAIL). He received his Ph.D.
degree in Computer Science from the University of Notre Dame, and his
B.S. degree in Biomedical Engineering from Fudan University. He won the
2017 AVEC depression detection challenge, and one of his papers was
nominated for the best student paper award at Interspeech 2019.
His current research interests include audio scene analysis,
speech-based health systems, and voice anti-spoofing.