• Publications
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
TLDR
This paper proposes pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset, and investigates the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks.
Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
TLDR
Experimental results show that the proposed two-stage polyphonic sound event detection and localization method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
TLDR
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Weakly Labelled AudioSet Tagging With Attention Neural Networks
TLDR
This work connects attention neural networks with multiple instance learning (MIL) methods, and proposes decision-level and feature-level attention neural networks for audio tagging, which achieve a state-of-the-art mean average precision.
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
TLDR
The proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.
DCASE 2018 Challenge Surrey cross-task convolutional neural network baseline
TLDR
A cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a “CNN Baseline” system that implements CNNs with 4 layers and 8 layers, derived from AlexNet and VGG in computer vision.
DCASE 2018 Challenge baseline with convolutional neural networks
TLDR
A Python implementation of baselines for the five DCASE 2018 tasks: 1) acoustic scene classification, 2) general-purpose audio tagging, 3) bird audio detection, 4) weakly-labeled semi-supervised sound event detection and 5) multi-channel audio tagging. The baseline source code implements convolutional neural networks, including AlexNetish and VGGish, which are networks originating from computer vision.
Capsule Routing for Sound Event Detection
TLDR
This work proposes a neural network architecture that uses the recently-proposed capsule routing mechanism to train a network that can learn global coherence implicitly, thereby improving generalization performance.
Two-Stage Sound Event Localization and Detection Using Intensity Vector and Generalized Cross-Correlation (Technical Report)
TLDR
A two-stage polyphonic sound event detection and localization method that localizes and detects overlapping sound events in different environments, improves the performance of both SED and DOA estimation, and performs significantly better than the baseline method.
Weakly labelled AudioSet Classification with Attention Neural Networks
TLDR
This work investigates audio tagging on AudioSet, a dataset consisting of over 2 million audio clips and 527 classes, and proposes decision-level and feature-level attention neural networks for audio tagging, which achieve a state-of-the-art mean average precision.