Capsule Routing for Sound Event Detection

@inproceedings{Iqbal2018CapsuleRF,
  title={Capsule Routing for Sound Event Detection},
  author={Turab Iqbal and Yong Xu and Qiuqiang Kong and Wenwu Wang},
  booktitle={2018 26th European Signal Processing Conference (EUSIPCO)},
  year={2018},
  pages={2255--2259}
}
The detection of acoustic scenes is a challenging problem in which environmental sound events must be detected from a given audio signal. This includes classifying the events as well as estimating their onset and offset times. We approach this problem with a neural network architecture that uses the recently proposed capsule routing mechanism. A capsule is a group of activation units representing a set of properties for an entity of interest, and the purpose of routing is to identify part-whole…
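To make the routing idea concrete, the following is a minimal pure-Python sketch of routing-by-agreement as described in "Dynamic Routing Between Capsules" (listed in the references). It is not the network from this paper: the function names `squash`, `softmax`, and `route`, the toy `u_hat` values, and the choice of three iterations are all assumptions made for illustration.

```python
import math

def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, n_iters=3):
    """Routing-by-agreement (illustrative sketch).

    u_hat[i][j] is the prediction vector that lower capsule i makes
    for upper capsule j. Returns the upper capsule outputs v and the
    coupling coefficients c used to compute them.
    """
    n_lower, n_upper = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits
    v, c = [], []
    for _ in range(n_iters):
        # Each lower capsule distributes its output over the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_upper):
            # Weighted sum of predictions for upper capsule j, then squash.
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                 for d in range(dim)]
            v.append(squash(s))
        # Increase the logit where a prediction agrees with the output.
        for i in range(n_lower):
            for j in range(n_upper):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c

# Toy input (assumed values): both lower capsules agree on upper capsule 0
# and make opposite predictions for upper capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
v, c = route(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
```

In this toy case the two lower capsules agree on upper capsule 0, so its output vector ends up longer and the coupling coefficients shift toward it, which is the part-whole agreement the routing step is meant to capture.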

Polyphonic Sound Event Detection by Using Capsule Neural Networks
TLDR
Extensive evaluations carried out on three publicly available datasets are reported, showing how the CapsNet-based algorithm not only outperforms standard CNNs but also achieves the best results with respect to the state-of-the-art algorithms.
A Capsule Neural Networks Based Approach for Bird Audio Detection Technical Report
TLDR
A system for bird audio detection based on the innovative CapsNet architecture, with the aim of incentivizing the network to learn global coherence implicitly and to identify part-whole relationships between capsules, thereby improving generalization performance in detecting the presence of bird songs under various environmental conditions.
A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling
TLDR
A network architecture designed mainly for audio tagging that can also be used for weakly supervised acoustic event detection (AED). It consists of a modified DenseNet as the feature extractor and a global average pooling (GAP) layer to predict frame-level labels at inference time.
Duration Robust Weakly Supervised Sound Event Detection
  • Heinrich Dinkel, Kai Yu
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
It is shown that, for this task, subsampling the temporal resolution with a neural network improves the F1 score as well as its robustness to short, sporadic sound events, and that double thresholding is a more robust and predictable post-processing method.
Improving performance and inference on audio classification tasks using capsule networks
TLDR
This paper proposes a capsule network architecture suited to audio classification tasks, suggests modifications for regularization and multi-label classification, develops insights into the data using capsule outputs, and shows the utility of the learned network for transfer learning.
Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events
TLDR
This work modifies this attention-based feedforward structure so that the resulting model can use audio as well as video to compute sound event predictions, and makes a compelling case for devoting more attention to research in multimodal audiovisual classification.
Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
TLDR
A hierarchical pooling structure is proposed to improve the performance of weakly labeled sound event detection systems; it yields remarkable improvements for three types of pooling function without adding any parameters.
Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation
TLDR
A novel polyphonic sound event detection (PSED) framework that incorporates Multi-Type-Multi-Scale time-frequency representations (TFRs), which reveal acoustic patterns in a complementary manner, and achieves a 7% reduction in error rate compared with the state-of-the-art solutions on the TUT-SED 2016 dataset.
Capsule Networks - A survey
...

References

SHOWING 1-10 OF 30 REFERENCES
Ensemble of Convolutional Neural Networks for Weakly-Supervised Sound Event Detection Using Multiple Scale Input
TLDR
The proposed model, an ensemble of convolutional neural networks to detect audio events in the automotive environment, achieved 2nd place in audio tagging and 1st place in sound event detection.
Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging
TLDR
A weakly supervised method that not only predicts the tags but also indicates the temporal locations of the acoustic events that occurred; the attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames.
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
TLDR
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
A convolutional neural network approach for acoustic scene classification
TLDR
This paper proposes the use of a CNN trained to classify short sequences of audio, represented by their log-mel spectrogram, and introduces a training method that can be used under particular circumstances in order to make full use of small datasets.
Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly
Detection and classification of acoustic scenes and events: An IEEE AASP challenge
TLDR
An overview of systems submitted to the public evaluation challenge on acoustic scene classification and detection of sound events within a scene as well as a detailed evaluation of the results achieved by those systems are provided.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
This work combines these two approaches in a convolutional recurrent neural network (CRNN), applies it to a polyphonic sound event detection task, and observes a considerable improvement on four different datasets consisting of everyday sound events.
Dynamic Routing Between Capsules
TLDR
It is shown that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
TLDR
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
Capsules for Object Segmentation
TLDR
The proposed convolutional-deconvolutional capsule network, called SegCaps, shows strong results for the task of object segmentation with a substantial decrease in parameter space and is able to handle large image sizes, unlike baseline capsules.
...