• Publications
  • Influence
Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria
  • T. Virtanen
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language…
  • 1 March 2007
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented and enables a better separation quality than the previous algorithms.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
TUT database for acoustic scene classification and sound event detection
The recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models are presented.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structures, robust to unseen DOA values, reverberation, and low SNR scenarios.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources
Recurrent neural networks for polyphonic sound event detection in real life recordings
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single
A multi-device dataset for urban acoustic scene classification
The acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task are introduced, and the performance of a baseline system in the task is evaluated.
Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition
The results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB.
Clotho: an Audio Captioning Dataset
Clotho, a dataset for audio captioning consisting of 4981 audio samples of 15 to 30 seconds duration and 24 905 captions of eight to 20 words length, and a baseline method to provide initial results, is presented.