Detection and Classification of Acoustic Scenes and Events

@article{Stowell2015DetectionAC,
  title={Detection and Classification of Acoustic Scenes and Events},
  author={Dan Stowell and Dimitrios Giannoulis and Emmanouil Benetos and Mathieu Lagrange and Mark D. Plumbley},
  journal={IEEE Transactions on Multimedia},
  year={2015},
  volume={17},
  pages={1733--1746}
}
For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the… 

Classification Study of Sound and Image Events Using Event Detection Systems
TLDR
The state of the art in automatically classifying audio scenes, and in automatically detecting and classifying audio events, is reported on.
Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning
TLDR
A method that learns features in an unsupervised manner from high-resolution spectrogram patches and integrates them within a deep neural network framework to detect and classify acoustic events.
Acoustic scene classification using spectrograms
TLDR
This work investigates the performance of an automatic classification system that recognizes the environments or places where an audio sample was originally recorded, using the database of the DCASE 2016 challenge and considering 15 different categories. It concludes that features obtained in the visual domain can support the development of an efficient classification system for this application.
Audio Event Detection using Weakly Labeled Data
TLDR
It is shown that audio event detection using weak labels can be formulated as a multiple-instance learning problem, and two frameworks for solving multiple-instance learning are suggested, one based on support vector machines and the other on neural networks.
A Multi-task Learning Approach Based on Convolutional Neural Network for Acoustic Scene Classification
TLDR
This paper combines the acoustic scene classification (ASC) task and the sound event detection (SED) task, and proposes a new CNN approach with multi-task learning (MTL) that uses SED as an auxiliary task so that the model pays more attention to sound event information.
Analysis of the Sound Event Detection Methods and Systems
TLDR
A number of problems associated with the development of sound event detection systems are presented, such as the deviation for each environment and each sound category, overlapping audio events, and unreliable training data.
Multi-microphone acoustic events detection and classification for indoor monitoring
TLDR
A system to classify seven indoor acoustic events, based on the baseline algorithm of the DCASE’2016 challenge, is analysed; the results with multi-microphone configurations confirm the improvement, as they cover a larger part of the space at the same time.
An improved weakly supervised learning system for detection of sound events in domestic environments
TLDR
An improved weakly supervised learning framework based on a convolutional recurrent neural network (CRNN) is employed to obtain sound event timestamps using a small fully labeled dataset together with large-scale weakly labeled and unlabeled datasets.

References

SHOWING 1-10 OF 78 REFERENCES
A database and challenge for acoustic scene classification and event detection
TLDR
This paper introduces a newly-launched public evaluation challenge dealing with two closely related tasks of the field: acoustic scene classification and event detection.
IEEE AASP Scene Classification Challenge Using Hidden Markov Models and Frame Based Classification
TLDR
Two algorithms to discriminate between different scenes, using hidden Markov models (HMMs) and Gaussian mixture models (GMMs), yielded 72% correct classification with 10-fold cross-validation and 62% accuracy, respectively.
Acoustic Scene Classification
TLDR
An account of the state-of-the-art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce, and a range of different algorithms submitted for a data challenge to provide a general and fair benchmark for ASC techniques.
Detection and classification of acoustic scenes and events: An IEEE AASP challenge
TLDR
An overview of systems submitted to the public evaluation challenge on acoustic scene classification and detection of sound events within a scene as well as a detailed evaluation of the results achieved by those systems are provided.
An i-Vector Based Approach for Audio Scene Detection
TLDR
The i-vector system is state-of-the-art in Speaker Verification and Scene Detection and outperforms conventional Gaussian Mixture Model (GMM)-based approaches; it compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC.
Spectral vs. spectro-temporal features for acoustic event detection
  • Courtenay V. Cotton, D. Ellis
  • Computer Science, Physics
    2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
  • 2011
TLDR
This work proposes an approach to detecting and modeling acoustic events that directly describes temporal context, using convolutive non-negative matrix factorization (NMF), and discovers a set of spectro-temporal patch bases that best describe the data.
Acoustic scene classification using sparse feature learning and event-based pooling
TLDR
The results show that learned features outperform MFCCs, event-based pooling achieves higher accuracy than uniform pooling and, furthermore, a combination of the two methods performs even better than either one used alone.
CLEAR Evaluation of Acoustic Event Detection and Classification Systems
TLDR
In this paper, the various systems for the tasks of acoustic event detection (AED) and acoustic event classification (AEC), and their results, are presented.
The CLEAR 2006 Evaluation
TLDR
The evaluation tasks in CLEAR 2006 included person tracking, face detection and tracking, person identification, head pose estimation, vehicle tracking, and acoustic scene analysis; an overview of the results is given.
Acoustic Scene Classification: Classifying environments from the sounds they produce
TLDR
An account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce, and a range of different algorithms submitted for a data challenge to provide a general and fair benchmark for ASC techniques.