PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
TLDR
This paper proposes pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset, and investigates the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks.
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly
Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
TLDR
Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
Multi-level Attention Model for Weakly Supervised Audio Classification
TLDR
A multi-level attention model, consisting of multiple attention modules applied to intermediate neural network layers, achieves a state-of-the-art mean average precision (mAP) of 0.360, outperforming the single attention model and the Google baseline system.
Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
TLDR
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Audio Set Classification with Attention Model: A Probabilistic Perspective
This paper investigates Audio Set classification. Audio Set is a large-scale weakly labelled dataset (WLD) of audio clips. In a WLD, only the presence of a label is known, without knowing the
Deep Neural Network Baseline for DCASE Challenge 2016
TLDR
The DCASE Challenge 2016 contains tasks for Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), and audio tagging, and DNN baselines indicate that DNNs can be successful in many of these tasks, but may not always perform better than the baselines.
Sound Event Detection and Time–Frequency Segmentation from Weakly Labelled Data
TLDR
A time–frequency (T–F) segmentation framework trained on weakly labelled data is proposed to tackle the sound event detection and separation problem; predicted onset and offset times can be obtained from the T–F segmentation masks.
Weakly Labelled AudioSet Tagging With Attention Neural Networks
TLDR
This work connects attention neural networks with multiple instance learning (MIL) methods, and proposes decision-level and feature-level attention neural networks for audio tagging, which achieve a state-of-the-art mean average precision.
Surrey-CVSSP system for DCASE2017 challenge task4
TLDR
A learnable gating activation function for selecting informative local features in a weakly supervised mode is proposed, along with a new batch-level balancing strategy to tackle the data imbalance problem.