ATTENTION-BASED CONVOLUTIONAL NEURAL NETWORK FOR AUDIO EVENT CLASSIFICATION WITH FEATURE TRANSFER LEARNING
@inproceedings{Chen2018ATTENTIONBASEDCN, title={ATTENTION-BASED CONVOLUTIONAL NEURAL NETWORK FOR AUDIO EVENT CLASSIFICATION WITH FEATURE TRANSFER LEARNING}, author={Tianxiang Chen and Udit Gupta}, year={2018} }
Audio event classification is an urgent unsolved problem in content-based information retrieval (CBIR), with numerous applications. This paper describes Pindrop’s submission to the “Making Sense of Sound” challenge. In this submission we address the challenge of classifying audio excerpts by their origin, using convolutional neural networks with feature transfer learning. We use a pretrained VGGish network to extract feature embeddings. Our results show a remarkable…
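The abstract is truncated and does not give the architecture, so the following is only a rough sketch of the general idea named in the title: attention pooling over per-frame embeddings from a pretrained feature extractor. It assumes 128-dimensional frame embeddings (the size VGGish produces) and uses random numbers in place of real embeddings. The names `attention_pool` and `query` are hypothetical, and the single dot-product scoring vector is an illustrative choice, not the authors' method.

```python
import math
import random

EMBED_DIM = 128  # VGGish emits one 128-dimensional embedding per audio frame


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse a clip's frame embeddings into a single vector.

    frames: list of T embeddings, each a list of floats
    query:  hypothetical learned scoring vector of the same length
    Returns (pooled_vector, attention_weights).
    """
    # One relevance score per frame: dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    # Weighted average of the frames, one dimension at a time.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(len(query))
    ]
    return pooled, weights


# Stand-in data: random values, NOT real VGGish output.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(5)]
query = [rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]

pooled, weights = attention_pool(frames, query)
```

In a trained model the query vector would be learned jointly with the classifier, and the pooled vector would feed the classification layers. With an all-zero query the weights are uniform and the result reduces to plain mean pooling.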
One Citation
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
- Computer Science, IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
This paper proposes pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset, and investigates the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks.
References
CNN architectures for large-scale audio classification
- Computer Science, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Audio Set Classification with Attention Model: A Probabilistic Perspective
- Computer Science, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper investigates Audio Set classification. Audio Set is a large-scale weakly labelled dataset (WLD) of audio clips. In a WLD only the presence of a label is known, without knowing the…
Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks
- Computer Science, ISMIR
- 2015
A range of label-preserving audio transformations are applied and pitch shifting is found to be the most helpful augmentation method for music data augmentation, reaching the state of the art on two public datasets.
Audio Set: An ontology and human-labeled dataset for audio events
- Computer Science, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
ESC: Dataset for Environmental Sound Classification
- Computer Science, ACM Multimedia
- 2015
A new annotated collection of 2,000 short clips comprising 50 classes of various common sound events, and an abundant unified compilation of 250,000 unlabeled auditory excerpts extracted from recordings available through the Freesound project are presented.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Computer Science, ICML
- 2015
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Adam: A Method for Stochastic Optimization
- Computer Science, ICLR
- 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Progressive Neural Networks
- Computer Science, ArXiv
- 2016
This work evaluates the progressive networks architecture extensively on a wide variety of reinforcement learning tasks, and demonstrates that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
Freesound technical demo
- Computer Science, ACM Multimedia
- 2013
This demo introduces Freesound to the multimedia community and shows its potential as a research resource.
Vggish: A vgg-like audio classification model
- https://github.com/DTaoo/VGGish, 2017.
- 2017