Transfer learning of weakly labelled audio
@article{Diment2017TransferLO,
  title={Transfer learning of weakly labelled audio},
  author={Aleksandr Diment and Tuomas Virtanen},
  journal={2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year={2017},
  pages={6--10}
}
Many machine learning tasks have been shown to be solvable with impressive levels of success, given large amounts of training data and computational power. For problems that lack sufficient data to achieve high performance, transfer learning methods can be applied. These methods perform the new task with prior knowledge of the nature of the data, gained by first performing a different task for which training data is abundant. Shown successful for other machine learning tasks…
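The abstract describes transfer learning only in general terms. As a toy illustration of one simple form of it, parameter transfer (fine-tuning from weights learned on a data-rich source task), the sketch below trains a logistic-regression model on a large synthetic source task and then uses its weights to initialize training on a small, related target task. This is not the method of the paper: the synthetic data, the linear model, and all function names (`train_logreg`, `make_task`, `accuracy`, `predict`) are hypothetical and chosen for illustration. Only the Python standard library is used.

```python
import math
import random

# Toy parameter-transfer sketch (hypothetical, not the paper's method):
# learn weights on a large synthetic "source" task, then fine-tune them
# on a tiny, related "target" task.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x) -> float:
    """Probability of the positive class under weights w."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train_logreg(data, init_w, epochs, lr=0.1):
    """Plain SGD on the logistic loss, starting from a copy of init_w."""
    w = list(init_w)
    for _ in range(epochs):
        for x, y in data:
            g = predict(w, x) - y  # derivative of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w


def accuracy(data, w) -> float:
    """Fraction of examples classified correctly at threshold 0.5."""
    correct = sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


def make_task(true_w, n, rng):
    """Synthetic linearly separable task; x[0] = 1.0 acts as a bias input."""
    data = []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(len(true_w) - 1)]
        y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
        data.append((x, y))
    return data


rng = random.Random(0)
source_true = [0.0, 2.0, -1.0, 0.5]  # abundant source task
target_true = [0.2, 1.8, -1.2, 0.4]  # related target task (assumed similar)
source = make_task(source_true, 2000, rng)
target_train = make_task(target_true, 10, rng)  # scarce target data
target_test = make_task(target_true, 500, rng)

dim = len(source_true)
w_source = train_logreg(source, [0.0] * dim, epochs=5)
w_transfer = train_logreg(target_train, w_source, epochs=20)  # fine-tune
w_scratch = train_logreg(target_train, [0.0] * dim, epochs=20)  # baseline

acc_transfer = accuracy(target_test, w_transfer)
acc_scratch = accuracy(target_test, w_scratch)
print(f"transfer: {acc_transfer:.3f}  scratch: {acc_scratch:.3f}")
```

Printing both accuracies lets one compare fine-tuning from source weights against training from scratch on the same ten target examples; the numbers depend on the random seed and are not results from the paper.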
16 Citations
Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This work describes a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data and proposes methods to learn representations using this model which can be effectively used for solving the target task.
Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
A new type of pooling layer is proposed, aimed at compensating for non-relevant information in audio events by applying an adaptive transformation of the convolutional feature maps along the temporal axis that follows a uniform distance subsampling criterion on the learned feature space.
Exploring Deep Transfer Learning Techniques for Alzheimer’s Dementia Detection
- Computer Science
- Frontiers in Computer Science
- 2021
A large comparative analysis of varying transfer learning models focusing less on model customization but more on pre-trained models and pre-training datasets revealed insightful relations among models, data types, and data labels in this research area.
Improving Gender Identification in Movie Audio Using Cross-Domain Data
- Computer Science
- INTERSPEECH
- 2018
This work acquires VAD labels for movie audio by aligning it with subtitle text, and trains a recurrent neural network model for VAD to predict gender using feature embeddings obtained from a model pre-trained for large-scale audio classification.
Acoustic Scene Classification Using A Deeper Training Method for Convolution Neural Network
- Computer Science
- 2019 International Symposium on Electrical and Electronics Engineering (ISEE)
- 2019
This paper presents a deep learning framework for acoustic scene classification (ASC), which recognizes environmental sounds, and proposes a novel convolutional neural network (CNN) architecture that enforces deeper learning of the middle convolutional layers.
Jazz Solo Instrument Classification with Convolutional Neural Networks, Source Separation, and Transfer Learning
- Computer Science
- ISMIR
- 2018
This paper builds upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns.
Machine learning in acoustics: Theory and applications.
- Physics
- The Journal of the Acoustical Society of America
- 2019
This work surveys the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics, and highlights ML developments in four acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, and environmental sounds in everyday scenes.
Machine learning in acoustics: a review
- Physics
- arXiv
- 2019
This work surveys the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics, and highlights ML developments in five acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, seismic exploration, and environmental sounds in everyday scenes.
Design and Implementation of Fast Spoken Foul Language Recognition with Different End-to-End Deep Neural Network Architectures
- Computer Science
- Sensors
- 2021
The proposed system outperformed state-of-the-art pre-trained neural networks on the novel foul language dataset and was shown to reduce the computational cost while using few trainable parameters.
Four-way Classification of Tabla Strokes with Models Adapted from Automatic Drum Transcription
- Computer Science
- ISMIR
- 2021
A new, diverse tabla dataset suitably annotated for the task is presented, and the use of transfer learning on a state-of-the-art pre-trained multiclass CNN drums model is explored. The 1-way models provide the best mean F-score, while the drum-pretrained, tabla-adapted 3-way models generalize better for the scarcest target class.
References
A Survey on Transfer Learning
- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2010
The relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift are discussed.
Audio event and scene recognition: A unified approach using strongly and weakly labeled data
- Computer Science
- 2017 International Joint Conference on Neural Networks (IJCNN)
- 2017
The main method is based on manifold regularization on graphs, and it is shown that the unified learning can be formulated as a constrained optimization problem that can be solved by the iterative concave-convex procedure (CCCP).
A joint detection-classification model for audio tagging of weakly labelled data
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
This work proposes a joint detection-classification (JDC) model to detect and classify the audio clip simultaneously and shows that the JDC model reduces the equal error rate (EER) from 19.0% to 16.9%.
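The result above is reported as an equal error rate (EER), the operating point at which the false-positive rate equals the false-negative (miss) rate. A minimal sketch of how an EER can be computed from detection scores, assuming binary labels (1 = event present) and a plain threshold sweep without ROC interpolation, so it is only an approximation on small sets. The function name `equal_error_rate` and the toy scores are hypothetical; this is not the evaluation code used in the cited work.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate equal error rate (EER) for binary detection scores.

    Each observed score (plus +inf) is tried as a threshold; an example is
    predicted positive when score >= threshold. The function returns the
    mean of the false-positive rate (FPR) and false-negative rate (FNR) at
    the threshold where the two are closest. No ROC interpolation is done.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr, fnr = fp / n_neg, fn / n_pos
        if abs(fpr - fnr) < best_gap:
            best_gap, best_eer = abs(fpr - fnr), (fpr + fnr) / 2
    return best_eer


# Made-up scores: one negative (0.7) outranks one positive (0.4).
scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.8, 0.9, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(equal_error_rate(scores, labels))  # 0.25
```

At threshold 0.7 one of four negatives is accepted and one of four positives is missed, so both error rates are 0.25.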
Domain Adaptation with Structural Correspondence Learning
- Computer Science
- EMNLP
- 2006
This work introduces structural correspondence learning to automatically induce correspondences among features from different domains in order to adapt existing models from a resource-rich source domain to a resource-poor target domain.
Improving SVM accuracy by training on auxiliary data sources
- Computer Science
- ICML
- 2004
Experiments show that when the training data set is very small, training with auxiliary data can produce large improvements in accuracy, even when the auxiliary data is significantly different from the training (and test) data.
Understanding the difficulty of training deep feedforward neural networks
- Computer Science
- AISTATS
- 2010
The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.
Sparse Autoencoder-Based Feature Transfer Learning for Speech Emotion Recognition
- Computer Science
- 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction
- 2013
A sparse autoencoder method for feature transfer learning in speech emotion recognition, which uses a common emotion-specific mapping rule learned from a small set of labelled data in a target domain to improve performance relative to learning each source domain independently.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), Budapest, Hungary, 3 Sep 2016.
- Computer Science
- 2016
The proposed SED system is compared against the state of the art mono channel method on the development subset of TUT sound events detection 2016 database and the usage of spatial and harmonic features are shown to improve the performance of SED.
Constructing informative priors using transfer learning
- Computer Science
- ICML
- 2006
An algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task, which relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent.
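As a generic illustration (the summary does not say how the algorithm actually estimates the covariance), a Gaussian prior $\mathcal{N}(\mu, \Sigma)$ on the model parameters $w$ enters maximum a posteriori (MAP) training as a quadratic penalty. The "commonly used but overly simplistic independence assumption" corresponds to a diagonal $\Sigma$; a full covariance couples the parameters. The symbols $w$, $\mu$, $\Sigma$, $\sigma_j$ and the likelihood $p(y_i \mid x_i, w)$ are notation introduced here.

$$
\hat{w} = \arg\min_{w} \; -\sum_{i=1}^{n} \log p(y_i \mid x_i, w) \;+\; \tfrac{1}{2}\,(w - \mu)^{\top} \Sigma^{-1} (w - \mu)
$$

$$
\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_d^2) \;\Rightarrow\; \tfrac{1}{2}\,(w - \mu)^{\top} \Sigma^{-1} (w - \mu) = \tfrac{1}{2} \sum_{j=1}^{d} \frac{(w_j - \mu_j)^2}{\sigma_j^2}
$$

With a diagonal $\Sigma$ the penalty separates into independent per-parameter terms; off-diagonal entries of a full $\Sigma$ introduce cross terms $(w_j - \mu_j)(w_k - \mu_k)$, which is what allows parameters to be dependent.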
Audio Set: An ontology and human-labeled dataset for audio events
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
This work describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that aims to bridge the gap in data availability between image and audio research and to substantially stimulate the development of high-performance audio event recognizers.