Learning to Match Transient Sound Events Using Attentional Similarity for Few-shot Sound Recognition

@article{Chou2019LearningTM,
  title={Learning to Match Transient Sound Events Using Attentional Similarity for Few-shot Sound Recognition},
  author={Szu-Yu Chou and Kai-Hsiang Cheng and Jyh-Shing Roger Jang and Yi-Hsuan Yang},
  journal={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  pages={26-30}
}
In this paper, we introduce a novel attentional similarity module for the problem of few-shot sound recognition. Given a few examples of an unseen sound event, a classifier must be quickly adapted to recognize the new sound event without much fine-tuning. The proposed attentional similarity module can be plugged into any metric-based few-shot learning method, allowing the resulting model to match even short, transient sound events. Extensive experiments on two datasets show that…
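The core idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch rendering of attentional similarity, assuming frame-level embeddings and a simple linear attention scorer; the paper's exact architecture may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalSimilarity(nn.Module):
    # Illustrative sketch only: score a query clip against a support clip by
    # attending over per-frame similarities instead of pooling features first,
    # so brief, transient events are not averaged away.
    def __init__(self, dim):
        super().__init__()
        self.att = nn.Linear(dim, 1)  # per-frame attention scorer (an assumption)

    def forward(self, query, support):
        # query: (T_q, D) and support: (T_s, D) frame-level embeddings
        q = F.normalize(query, dim=-1)
        s = F.normalize(support, dim=-1)
        sim = q @ s.t()                                             # (T_q, T_s)
        w_q = torch.softmax(self.att(query).squeeze(-1), dim=0)    # (T_q,)
        w_s = torch.softmax(self.att(support).squeeze(-1), dim=0)  # (T_s,)
        return w_q @ sim @ w_s  # scalar attention-weighted similarity

For example, AttentionalSimilarity(64)(torch.randn(200, 64), torch.randn(150, 64)) compares a 200-frame query with a 150-frame support clip and returns one similarity score that a metric-based few-shot classifier can rank.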

Citations

Few-Shot Sound Event Detection
TLDR
This work adapts state-of-the-art metric-based few-shot learning methods to automate the detection of similar-sounding events, requiring only one or a few examples of the target event, and develops a method to automatically construct a partial set of labeled examples to reduce user labeling effort.
Metric Learning with Background Noise Class for Few-Shot Detection of Rare Sound Events
TLDR
This paper aims to achieve few-shot detection of rare sound events from query sequences that contain not only the target events but also other events and background noise, and proposes metric learning with a background noise class for few-shot detection.
Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers
TLDR
Novel approaches to few-shot sound event detection are proposed utilizing region proposals and the Perceiver architecture, which is capable of accurately localizing sound events with very few examples of each class of interest.
Few-Shot Acoustic Event Detection Via Meta Learning
TLDR
This paper formulates the few-shot AED problem and explores different ways of utilizing traditional supervised methods for this setting, as well as a variety of meta-learning approaches that are conventionally used to solve the few-shot classification problem.
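For readers unfamiliar with the meta-learning side, the sketch below shows a generic MAML-style episode (one inner gradient step on the support set, then a query-set loss backpropagated through it); this is a common recipe, not necessarily the paper's exact method.

import torch
from torch.func import functional_call

def maml_episode(model, loss_fn, support, query, lr_inner=0.01):
    # Generic MAML-style episode (illustrative sketch, not the paper's code).
    (sx, sy), (qx, qy) = support, query
    params = dict(model.named_parameters())
    # Inner loop: one gradient step on the support set yields "fast weights".
    inner_loss = loss_fn(functional_call(model, params, (sx,)), sy)
    grads = torch.autograd.grad(inner_loss, params.values(), create_graph=True)
    fast = {k: p - lr_inner * g for (k, p), g in zip(params.items(), grads)}
    # Outer loss: evaluate the adapted weights on the query set; backprop
    # through this loss updates the meta-parameters through the inner step.
    return loss_fn(functional_call(model, fast, (qx,)), qy)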
Multi-label Few-shot Learning for Sound Event Recognition
TLDR
A one-vs.-rest episode selection strategy is proposed to mitigate the complexity of forming an episode, and the strategy is applied to the multi-label few-shot problem.
A Mutual learning framework for Few-shot Sound Event Detection
TLDR
This work proposes to update class prototypes with transductive inference, bringing them as close to the true class centers as possible, and to use the updated prototypes to fine-tune the feature extractor.
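A rough illustration of transductive prototype refinement (the blending weight and iteration count below are arbitrary assumptions, not the paper's settings):

import torch
import torch.nn.functional as F

def refine_prototypes(protos, query_emb, steps=3, blend=0.5):
    # Pull each class prototype toward the unlabeled query embeddings that are
    # softly assigned to it, nudging prototypes toward the true class centers.
    for _ in range(steps):
        w = F.softmax(-torch.cdist(query_emb, protos), dim=1)           # (Q, C)
        soft_means = (w.t() @ query_emb) / w.sum(0).unsqueeze(1).clamp_min(1e-8)
        protos = (1 - blend) * protos + blend * soft_means              # (C, D)
    return protos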
An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments
TLDR
This paper is aimed at providing the audio recognition community with a carefully annotated dataset for FSL and OSR comprised of 1360 clips from 34 classes divided into pattern sounds and unwanted sounds.
Few-Shot Continual Learning for Audio Classification
TLDR
This work introduces a few-shot continual learning framework for audio classification, where a trained base classifier is continuously expanded to recognize novel classes from only a few labeled examples at inference time, enabling fast and interactive model updates by end-users with minimal human effort.
Who Calls The Shots? Rethinking Few-Shot Learning for Audio
TLDR
A series of experiments leads to audio-specific insights on few-shot learning, some of which are at odds with recent findings in the image domain: there is no best one-size-fits-all model, method, or support set selection criterion, and the choice depends on the expected application scenario.
Few-Shot Bioacoustic Event Detection Using Prototypical Network with Background Class (Technical Report)
TLDR
Experimental results show that the proposed prototypical network-based method for few-shot bioacoustic event detection can effectively distinguish target events and background noise.
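The background-class trick is easy to picture: alongside the per-class prototypes, one extra prototype is built from noise segments, and query frames nearest to it are treated as non-events. A minimal sketch (variable names are assumptions):

import torch

def prototypes_with_background(support_emb, support_lab, bg_emb):
    # Class prototypes are support-set means; the extra row is a
    # background-noise prototype built from noise-only segments.
    classes = support_lab.unique()
    protos = torch.stack([support_emb[support_lab == c].mean(0) for c in classes])
    return torch.cat([protos, bg_emb.mean(0, keepdim=True)], dim=0)  # (C + 1, D)

Query frames are then assigned to the nearest prototype, and frames that land on the last row count as background rather than as a target event.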

References

Showing 1-10 of 26 references
Learning to Recognize Transient Sound Events using Attentional Supervision
TLDR
This paper presents an attempt to learn a neural network model that recognizes more than 500 different sound events from the audio part of user-generated videos (UGV), establishing a new state-of-the-art for the DCASE17 and AudioSet datasets.
Ensemble of Convolutional Neural Networks for Weakly-Supervised Sound Event Detection Using Multiple Scale Input
TLDR
The proposed model, an ensemble of convolutional neural networks for detecting audio events in the automotive environment, achieved 2nd place in audio tagging and 1st place in sound event detection.
Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
TLDR
This work describes a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data and proposes methods to learn representations using this model which can be effectively used for solving the target task.
Learning to Compare: Relation Network for Few-Shot Learning
TLDR
A conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each; the framework is easily extended to zero-shot learning.
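What distinguishes the Relation Network is that the similarity metric is itself learned. A compact sketch (hidden size and layer choices are assumptions):

import torch
import torch.nn as nn

class RelationModule(nn.Module):
    # Illustrative sketch: a small MLP scores concatenated (query, prototype)
    # embedding pairs, replacing a fixed distance with a learned one.
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # relation score in [0, 1]
        )

    def forward(self, query, protos):
        # query: (Q, D), protos: (C, D) -> relation scores (Q, C)
        q = query.unsqueeze(1).expand(-1, protos.size(0), -1)
        p = protos.unsqueeze(0).expand(query.size(0), -1, -1)
        return self.net(torch.cat([q, p], dim=-1)).squeeze(-1)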
Matching Networks for One Shot Learning
TLDR
This work employs ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories to learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
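The readout in Matching Networks is essentially attention over the support set, which is why no fine-tuning is needed at test time. A minimal sketch:

import torch
import torch.nn.functional as F

def matching_predict(query_emb, support_emb, support_onehot):
    # A query's label distribution is an attention-weighted sum of support
    # labels, with cosine-similarity attention from query to support examples.
    att = torch.softmax(
        F.normalize(query_emb, dim=-1) @ F.normalize(support_emb, dim=-1).t(),
        dim=1,
    )                                # (Q, S) attention over support examples
    return att @ support_onehot      # (Q, C) predicted label distribution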
Event Localization in Music Auto-tagging
TLDR
This paper proposes a convolutional neural network architecture that is able to make accurate frame-level predictions of tags in unseen music clips by using only clip-level annotations in the training phase, and presents qualitative analyses showing the model can indeed learn certain characteristics of music tags.
Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks
TLDR
A model based on convolutional neural networks that relies only on weakly-supervised data for training and is able to detect frame-level information, e.g., the temporal position of sounds, even when it is trained merely with clip-level labels.
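The Gaussian-filter idea can be sketched as a depthwise temporal convolution whose per-class bandwidth is learned (the kernel width and parameterization below are assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

class EventGaussianSmoothing(nn.Module):
    # Illustrative sketch: smooth frame-level class scores with a learnable
    # per-class Gaussian so event positions emerge from clip-level training.
    def __init__(self, n_classes, width=9):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(n_classes))  # per-class bandwidth
        self.width = width

    def forward(self, frame_scores):  # (T, C) frame-level class scores
        t = torch.arange(self.width).float() - self.width // 2
        sigma = self.log_sigma.exp()
        kernel = torch.exp(-t[None, :] ** 2 / (2 * sigma[:, None] ** 2))
        kernel = kernel / kernel.sum(1, keepdim=True)           # (C, width)
        x = frame_scores.t().unsqueeze(0)                       # (1, C, T)
        y = F.conv1d(x, kernel.unsqueeze(1), padding=self.width // 2,
                     groups=kernel.size(0))                     # depthwise conv
        return y.squeeze(0).t()                                 # (T, C) smoothed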
Siamese Neural Networks for One-Shot Image Recognition
TLDR
A method for learning siamese neural networks which employ a unique structure to naturally rank similarity between inputs and are able to achieve strong results that exceed those of other deep learning models, with near state-of-the-art performance on one-shot classification tasks.
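The Siamese recipe in one picture: a single shared encoder embeds both inputs, and a learned weighting of the component-wise L1 distance yields a same-class probability. A sketch (the encoder argument is a stand-in assumption):

import torch
import torch.nn as nn

class SiameseScorer(nn.Module):
    # Illustrative sketch: one encoder (shared weights) embeds both inputs;
    # a linear layer over |e_a - e_b| yields a same-class probability.
    def __init__(self, encoder, dim):
        super().__init__()
        self.encoder = encoder
        self.out = nn.Linear(dim, 1)

    def forward(self, a, b):
        ea, eb = self.encoder(a), self.encoder(b)
        return torch.sigmoid(self.out((ea - eb).abs())).squeeze(-1)

At one-shot test time, the query is scored against one example per class and the highest-scoring class is predicted.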
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.