Polyphonic sound event detection using multi label deep neural networks

Emre Çakir, Toni Heittola, Heikki Huttunen, Tuomas Virtanen
2015 International Joint Conference on Neural Networks (IJCNN)
In this paper, the use of multi-label neural networks is proposed for the detection of temporally overlapping sound events in realistic environments. Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification. The model is evaluated with recordings from realistic everyday environments, and the obtained overall accuracy is 63.8%. The method is compared against a state-of-the-art method using non-negative matrix factorization as a…
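The multi-label idea in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the layer sizes, the 40-dimensional frame features, the random weights, the 0.5 threshold, and all function names are assumptions made for the example. The point it shows is that each event class gets its own sigmoid output, so several events can be active in the same frame.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_features, n_hidden, n_classes, seed=0):
    # Hypothetical one-hidden-layer network with small random weights.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }

def forward(params, frames):
    # frames: (n_frames, n_features) spectral features, one row per frame.
    hidden = np.maximum(0.0, frames @ params["W1"] + params["b1"])
    # Independent sigmoid per class (not softmax), so classes can overlap.
    return sigmoid(hidden @ params["W2"] + params["b2"])

def binary_cross_entropy(probs, targets, eps=1e-7):
    # Multi-label loss: mean binary cross-entropy over frames and classes.
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs)
                          + (1.0 - targets) * np.log(1.0 - probs)))

def detect_events(probs, threshold=0.5):
    # Binarize each class independently (threshold value is an assumption).
    return (probs >= threshold).astype(int)

# Usage with made-up data: 5 frames, 40 features, 3 event classes.
frames = np.random.default_rng(1).normal(size=(5, 40))
params = init_params(n_features=40, n_hidden=16, n_classes=3)
probs = forward(params, frames)
activity = detect_events(probs)  # (5, 3) binary event-activity matrix
```

Because the outputs are thresholded per class, a frame row such as `[1, 1, 0]` marks two simultaneously active events, which a single-label softmax classifier could not express.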

Multi-label vs. combined single-label sound event detection with deep neural networks
This paper compares two deep learning methods for the detection of environmental sound events, combined single-label classification and multi-label classification, and investigates the accuracy of both methods on audio with different levels of polyphony.
A polyphonic sound event detection (SED) system built from multiple models: one model based on deep neural networks (DNN) detects car sound events, and five models based on bi-directional gated recurrent unit recurrent neural networks (BGRU-RNN) detect other sound events.
Convolutional Recurrent Neural Networks for Rare Sound Event Detection
A convolutional recurrent neural network (CRNN) is proposed for rare sound event detection that provides significant performance improvement over two other deep learning based methods mainly due to its capability of longer term temporal modeling.
This paper presents a multi label bi-directional recurrent neural network to model the temporal evolution of sound events, and explores data augmentation techniques that have shown success in sound classification.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
This work combines convolutional and recurrent neural networks in a convolutional recurrent neural network (CRNN), applies it to a polyphonic sound event detection task, and observes a considerable improvement on four different datasets consisting of everyday sound events.
Convolutional Neural Networks with Multi-task Loss for Polyphonic Sound Event Detection
A multi-task loss function is coupled with different neural networks and applied to a polyphonic sound event detection task, and the approach is compared with DNN, CNN and CBRNN methods.
Recurrent neural networks for polyphonic sound event detection in real life recordings
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single
Multi-frame Concatenation for Detection of Rare Sound Events Based on Deep Neural Network
It is shown that the number of concatenated frames affects the accuracy of SED, and the influence of different frame concatenations on detecting sound events is illustrated.
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
The proposed SED system is compared against the state-of-the-art mono-channel method on the development subset of the TUT sound events detection 2016 database, and the use of spatial and harmonic features is shown to improve the performance of SED.
Joint Measurement of Multi-channel Sound Event Detection and Localization Using Deep Neural Network
This paper extracts the phase feature and amplitude feature of the sound spectrum from each audio channel, avoiding feature extraction limited by other microphone arrays.


Sound event detection using non-negative dictionaries learned from annotated overlapping events
O. Dikmen, A. Mesaros. 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
This paper proposes a method which bypasses the need to build separate sound models and learns non-negative dictionaries for the sound content and their annotations in a coupled sense, and very promising results are obtained using only a small amount of data.
Recognition of acoustic events using deep neural networks
For an acoustic event classification task containing 61 distinct classes, the classification accuracy of the neural network classifier exceeds that of the conventional Gaussian mixture model based hidden Markov model classifier.
Supervised model training for overlapping sound events based on unsupervised source separation
Two iterative approaches based on the EM algorithm are proposed to select the stream most likely to contain the target sound, giving a reasonable increase of 8 percentage units in detection accuracy.
Acoustic event detection in real life recordings
A system for acoustic event detection in recordings from real-life environments using a network of hidden Markov models is presented; it is capable of recognizing almost one third of the events, and the temporal positioning of the events is not correct for 84% of the time.
Detecting sound events in basketball video archive
A method for detecting sound events in a basketball game is proposed, focusing on the detection of cheering sounds with a template-matching-based approach; it can be used in basketball video content retrieval and highlight extraction.
Context-dependent sound event detection
The two-step approach was found to improve the results substantially compared to the context-independent baseline system, and the detection accuracy can be almost doubled by using the proposed context-dependent event detection.
Improving deep neural networks for LVCSR using rectified linear units and dropout
Modelling deep neural networks with rectified linear unit (ReLU) non-linearities with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Deep learning for monaural speech separation
The joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint, is proposed to enhance the separation performance of monaural speech separation models.