Corpus ID: 214797346

Meta learning based audio tagging

@inproceedings{Xu2018MetaLB,
  title={Meta learning based audio tagging},
  author={Kele Xu and Boqing Zhu and Dezhi Wang and Yuxing Peng and Huaimin Wang and Lilun Zhang and Bo Li},
  booktitle={DCASE},
  year={2018}
}
In this paper, we describe our solution for the general-purpose audio tagging task, one of the subtasks of the DCASE 2018 challenge. The solution employs both deep learning methods and shallow learners built on statistical features. For single models, different deep convolutional neural network architectures are tested with different kinds of input, ranging from the raw signal and log-scaled Mel-spectrograms (log Mel) to Mel-frequency cepstral coefficients… 
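As a rough illustration of those three input types, the sketch below computes them with librosa; the toolkit choice and all parameter values here are assumptions for illustration, not the settings reported in the paper.

import librosa
import numpy as np

# Raw signal: a mono waveform at a fixed sample rate (illustrative values).
y, sr = librosa.load("example.wav", sr=44100, mono=True)

# Log-scaled Mel-spectrogram (log Mel).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel)

# Mel-frequency cepstral coefficients (MFCC).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

print(y.shape, log_mel.shape, mfcc.shape)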

Citations

Multi-Representation Knowledge Distillation For Audio Classification
TLDR
A novel end-to-end collaborative learning framework takes multiple representations as input and trains the models in parallel; the approach improves classification performance and achieves state-of-the-art results on both acoustic scene classification and general audio tagging tasks.
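As a rough sketch of training parallel models on multiple representations with mutual soft-label supervision (deep-mutual-learning style), consider the toy PyTorch step below; the stand-in models, dimensions, class count, and loss weighting are illustrative assumptions, not the cited paper's configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 41  # e.g. the DCASE 2018 general-purpose tagging label set
net_a = nn.Linear(64, n_classes)   # stands in for a log-Mel branch
net_b = nn.Linear(128, n_classes)  # stands in for a raw-waveform branch
opt = torch.optim.Adam(list(net_a.parameters()) + list(net_b.parameters()))

x_a, x_b = torch.randn(8, 64), torch.randn(8, 128)  # two views of one batch
y = torch.randint(0, n_classes, (8,))

logits_a, logits_b = net_a(x_a), net_b(x_b)
ce = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
# Each branch also distils from the other's (detached) soft predictions.
kl_ab = F.kl_div(F.log_softmax(logits_a, dim=1),
                 F.softmax(logits_b, dim=1).detach(), reduction="batchmean")
kl_ba = F.kl_div(F.log_softmax(logits_b, dim=1),
                 F.softmax(logits_a, dim=1).detach(), reduction="batchmean")
loss = ce + 0.5 * (kl_ab + kl_ba)
opt.zero_grad()
loss.backward()
opt.step()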
Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data
TLDR
A state-of-the-art general audio tagging model is first employed to predict weak labels for unlabeled data; a weakly supervised architecture based on a convolutional recurrent neural network is then developed to produce strong annotations (onsets and offsets) of sound events with the aid of the unlabeled data and its predicted labels.
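The pseudo-labelling step described above might be sketched as follows; the tagger model, the confidence threshold, and the multi-label thresholding are all assumptions for illustration.

import torch

@torch.no_grad()
def pseudo_label(tagger, unlabeled_batches, threshold=0.9):
    # Assign weak labels to unlabeled clips using a trained tagging model,
    # keeping only clips the model is confident about (assumed mechanism).
    tagger.eval()
    pairs = []
    for x in unlabeled_batches:
        probs = torch.sigmoid(tagger(x))          # multi-label tag scores
        confident = probs.max(dim=1).values >= threshold
        weak_labels = (probs[confident] >= 0.5).float()
        pairs.append((x[confident], weak_labels))
    return pairs  # (clip batch, predicted weak labels) for retraining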
Large-Scale Whale-Call Classification by Transfer Learning on Multi-Scale Waveforms and Time-Frequency Features
TLDR
An effective data-driven approach based on pre-trained convolutional neural networks, using multi-scale waveforms and time-frequency feature representations, is developed to classify whale calls from a large open-source dataset recorded by sensors carried by whales.
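A minimal sketch of the transfer-learning ingredient, assuming an ImageNet-pretrained backbone from torchvision; the backbone choice, the two-class head, and the frozen layers are illustrative, not the paper's exact recipe.

import torch.nn as nn
import torchvision

# Start from a pretrained CNN and swap the classifier head.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)  # whale call vs. background

# Optionally freeze early layers and fine-tune only the later ones.
for p in model.layer1.parameters():
    p.requires_grad = False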

References

Showing 1-10 of 26 references
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
TLDR
A shrinking deep neural network (DNN) framework incorporating unsupervised feature learning is proposed to handle the multi-label classification task; a symmetric or asymmetric deep denoising auto-encoder (syDAE or asyDAE) generates new data-driven features from the logarithmic Mel-filter bank features.
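A minimal sketch of a symmetric denoising auto-encoder over log-Mel filter bank frames, in the spirit of the summary above; the layer sizes and noise level are assumptions.

import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_in=64, n_hidden=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        noisy = x + 0.1 * torch.randn_like(x)  # input corruption
        code = self.enc(noisy)                 # data-driven feature
        return self.dec(code), code

ae = DenoisingAE()
x = torch.randn(16, 64)                  # a batch of log-Mel frames
recon, feats = ae(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruct the clean frames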
Learning Environmental Sounds with Multi-scale Convolutional Neural Network
TLDR
A novel end-to-end network called WaveMsNet is proposed, based on multi-scale convolution operations and a two-phase training method; it obtains better audio representations by improving frequency resolution and learning filters across the whole frequency range.
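A hedged sketch of a multi-scale waveform front end in the spirit of WaveMsNet: parallel 1-D convolutions with different kernel lengths, hence different time-frequency trade-offs, concatenated along the channel axis. Kernel sizes, strides, and channel counts are illustrative assumptions.

import torch
import torch.nn as nn

class MultiScaleFrontEnd(nn.Module):
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(1, 32, kernel_size=k, stride=4, padding=k // 2)
            for k in (11, 51, 101)  # short, medium and long filters
        ])

    def forward(self, wave):                 # wave: (batch, 1, samples)
        outs = [b(wave) for b in self.branches]
        n = min(o.shape[-1] for o in outs)   # align lengths before concat
        return torch.cat([o[..., :n] for o in outs], dim=1)

feats = MultiScaleFrontEnd()(torch.randn(2, 1, 16000))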
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
TLDR
This paper explores the use of a multi-channel CNN for the classification task, which extracts features from different channels in an end-to-end manner, and explores the use of the mixup method, which provides higher prediction accuracy and robustness than previous models.
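Mixup itself is compact enough to state directly: training pairs and their one-hot labels are blended with a Beta-distributed weight. The alpha value below is a common choice, not necessarily the paper's.

import numpy as np

def mixup(x, y_onehot, alpha=0.2):
    # Convex combination of each sample with a randomly paired sample.
    lam = np.random.beta(alpha, alpha)
    idx = np.random.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[idx]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[idx]
    return x_mix, y_mix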
Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network
TLDR
Inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed that removes outliers from the training dataset and further improves the classification robustness of the multi-scale DenseNet.
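One plausible reading of sample dropout, sketched under that assumption: each epoch a random fraction of the training set is left out, so outlier samples cannot influence every update. The rate and mechanism here are guesses for illustration, not the cited paper's definition.

import numpy as np

def sample_dropout_indices(n_samples, drop_rate=0.1, rng=np.random):
    # Keep a random subset of sample indices for this epoch (assumed scheme).
    keep = rng.random(n_samples) >= drop_rate
    return np.flatnonzero(keep)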
Learning environmental sounds with end-to-end convolutional neural network
Yuji Tokozume, T. Harada. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
TLDR
This paper proposes a novel end-to-end environmental sound classification (ESC) system using a convolutional neural network (CNN) and achieves a 6.5% improvement in classification accuracy over the state-of-the-art log-mel CNN with static and delta log-mel features, simply by combining the proposed system with the log-mel CNN.
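The combination mentioned at the end is a late fusion of class posteriors; a minimal sketch follows, with the equal weighting being an assumption rather than the paper's reported scheme.

import numpy as np

def fuse(probs_raw, probs_logmel, w=0.5):
    # Weighted average of the two systems' class posteriors.
    return w * probs_raw + (1.0 - w) * probs_logmel

p = fuse(np.array([0.2, 0.8]), np.array([0.6, 0.4]))
pred = int(np.argmax(p))  # fused class decision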
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both the training set and the label vocabulary, finding that analogues of the CNNs used in image classification do well on this audio classification task, and that larger training and label sets help up to a point.
Squeeze-and-Excitation Networks
TLDR
This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked to form SENet architectures that generalise extremely effectively across different datasets.
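A minimal SE block sketch: global average pooling (squeeze), a two-layer gating MLP (excitation), and channel-wise rescaling. The reduction ratio of 16 follows the paper's default; the rest of the wiring is one standard rendering.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, C, H, W)
        s = x.mean(dim=(2, 3))            # squeeze: global average pool
        w = self.gate(s)                  # excitation: per-channel weights
        return x * w[:, :, None, None]    # recalibrate channel responses

out = SEBlock(64)(torch.randn(2, 64, 8, 8))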
Aggregated Residual Transformations for Deep Neural Networks
TLDR
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintained complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
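Cardinality, the number of parallel transformation paths, is realised compactly as a grouped convolution; the sketch below shows a 32-group 3x3 convolution as used inside a ResNeXt bottleneck, with channel counts matching the common 32x4d configuration.

import torch
import torch.nn as nn

# 32 groups of 4 channels each: cardinality 32, bottleneck width 4.
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32)
out = grouped(torch.randn(2, 128, 56, 56))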
Classification of audio signals using AANN and GMM
Pairwise Decomposition with Deep Neural Networks and Multiscale Kernel Subspace Learning for Acoustic Scene Classification
TLDR
A system for acoustic scene classification is proposed, using pairwise decomposition with deep neural networks, dimensionality reduction by multiscale kernel subspace learning, and a nearest-neighbour classifier.
...