Transfer learning of weakly labelled audio

@article{Diment2017TransferLO,
  title={Transfer learning of weakly labelled audio},
  author={Aleksandr Diment and Tuomas Virtanen},
  journal={2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year={2017},
  pages={6-10}
}
  • Aleksandr Diment, T. Virtanen
  • Published 1 October 2017
  • Computer Science
  • 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Many machine learning tasks have been shown to be solvable with impressive levels of success given large amounts of training data and computational power. For problems that lack data sufficient to achieve high performance, methods for transfer learning can be applied. These refer to performing the new task with prior knowledge of the nature of the data, gained by first performing a different task for which training data is abundant. Shown successful for other machine learning tasks… 
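
To make the idea concrete, below is a minimal sketch (assuming a PyTorch-style pipeline on log-mel spectrogram input) of the general transfer-learning recipe the abstract describes: a CNN feature extractor trained on an abundant source task is reused, frozen, for a small weakly labelled target task where only clip-level tags are available. The architecture, class counts, and hyperparameters are illustrative assumptions, not the authors' actual setup.

import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # Feature extractor over log-mel spectrogram input of shape (batch, 1, mels, frames).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# 1) Source model, assumed to have been trained on a data-rich source task
#    (here it only has random weights, standing in for real pretrained ones).
source_model = AudioCNN(n_classes=527)

# 2) Target model for the data-scarce task: copy the learned feature extractor,
#    freeze it, and train only a new clip-level classification head.
target_model = AudioCNN(n_classes=10)
target_model.features.load_state_dict(source_model.features.state_dict())
for p in target_model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(target_model.classifier.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()  # weak labels: multi-label tags per clip, no timestamps

# Dummy batch: 4 single-channel log-mel "clips" with clip-level tag vectors.
x = torch.randn(4, 1, 64, 128)
y = torch.randint(0, 2, (4, 10)).float()
loss = criterion(target_model(x), y)
loss.backward()
optimizer.step()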

Citations

Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
TLDR
This work describes a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data and proposes methods to learn representations using this model which can be effectively used for solving the target task.
Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification
TLDR
A new type of pooling layer is proposed aimed at compensating non-relevant information of audio events by applying an adaptive transformation of the convolutional feature maps in the temporal axis that follows a uniform distance subsampling criterion on the learned feature space.
Exploring Deep Transfer Learning Techniques for Alzheimer’s Dementia Detection
TLDR
A large comparative analysis of varying transfer learning models, focusing less on model customization and more on pre-trained models and pre-training datasets, revealed insightful relations among models, data types, and data labels in this research area.
Improving Gender Identification in Movie Audio Using Cross-Domain Data
TLDR
This work acquires VAD labels for movie audio by aligning it with subtitle text, and trains a recurrent neural network model for VAD to predict gender using feature embeddings obtained from a model pre-trained for large-scale audio classification.
Acoustic Scene Classification Using A Deeper Training Method for Convolution Neural Network
TLDR
This paper presents a deep learning framework for acoustic scene classification (ASC), recognizing environmental sounds, and proposes a novel convolutional neural network (CNN) architecture that enforces deep learning of the middle convolutional layers.
Jazz Solo Instrument Classification with Convolutional Neural Networks, Source Separation, and Transfer Learning
TLDR
This paper builds upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns.
Machine learning in acoustics: Theory and applications.
TLDR
This work surveys the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics, and highlights ML developments in four acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, and environmental sounds in everyday scenes.
Machine learning in acoustics: a review
TLDR
This work surveys the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics, and highlights ML developments in five acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, seismic exploration, and environmental sounds in everyday scenes.
Design and Implementation of Fast Spoken Foul Language Recognition with Different End-to-End Deep Neural Network Architectures
TLDR
The proposed system outperformed state-of-the-art pre-trained neural networks on the novel foul language dataset and proved to reduce the computational cost with minimal trainable parameters.
Four-way Classification of Tabla Strokes with Models Adapted from Automatic Drum Transcription
TLDR
A new, diverse tabla dataset suitably annotated for the task is presented and the use of transfer learning on a state-of-the-art pre-trained multiclass CNN drums model is explored, finding that the 1-way models provide the best mean f-score while the drums pre-trained and tabla-adapted 3-way models generalize better for the most scarce target class.

References

SHOWING 1-10 OF 24 REFERENCES
A Survey on Transfer Learning
TLDR
The relationship between transfer learning and other related machine learning techniques, such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift, is discussed.
Audio event and scene recognition: A unified approach using strongly and weakly labeled data
  • B. Raj, Anurag Kumar
  • Computer Science
  • 2017 International Joint Conference on Neural Networks (IJCNN)
  • 2017
TLDR
The main method is based on manifold regularization on graphs, in which it is shown that the unified learning can be formulated as a constrained optimization problem which can be solved by an iterative concave-convex procedure (CCCP).
A joint detection-classification model for audio tagging of weakly labelled data
TLDR
This work proposes a joint detection-classification (JDC) model to detect and classify the audio clip simultaneously and shows that the JDC model reduces the equal error rate (EER) from 19.0% to 16.9%.
Domain Adaptation with Structural Correspondence Learning
TLDR
This work introduces structural correspondence learning to automatically induce correspondences among features from different domains in order to adapt existing models from a resource-rich source domain to a resource-poor target domain.
Improving SVM accuracy by training on auxiliary data sources
TLDR
Experiments show that when the training data set is very small, training with auxiliary data can produce large improvements in accuracy, even when the auxiliary data is significantly different from the training (and test) data.
Understanding the difficulty of training deep feedforward neural networks
TLDR
The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.
Sparse Autoencoder-Based Feature Transfer Learning for Speech Emotion Recognition
TLDR
A sparse autoencoder method for feature transfer learning for speech emotion recognition, using a common emotion-specific mapping rule learnt from a small set of labelled data in a target domain to improve the performance relative to learning each source domain independently.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), Budapest, Hungary, 3 Sep 2016.
TLDR
The proposed SED system is compared against the state-of-the-art mono-channel method on the development subset of the TUT Sound Events Detection 2016 database, and the usage of spatial and harmonic features is shown to improve the performance of SED.
Constructing informative priors using transfer learning
TLDR
An algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task, which relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.