Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria
  • T. Virtanen
  • Mathematics, Computer Science
  • IEEE Transactions on Audio, Speech, and Language…
  • 1 March 2007
TLDR
An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented.
  • 978
  • 82
TUT database for acoustic scene classification and sound event detection
TLDR
We introduce the TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments.
  • 397
  • 63
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
We combine a CNN and an RNN in a convolutional recurrent neural network (CRNN) and apply it to a polyphonic SED task.
  • 286
  • 40
Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition
TLDR
This paper proposes to use exemplar-based sparse representations to model speech corrupted by additive noise as a linear combination of noise and speech exemplars for noise robust automatic speech recognition.
  • 371
  • 28
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional space.
  • 121
  • 28
A multi-device dataset for urban acoustic scene classification
TLDR
This paper introduces the acoustic scene classification task of the DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system on the task.
  • 155
  • 27
Recurrent neural networks for polyphonic sound event detection in real life recordings
TLDR
We present an approach to polyphonic sound event detection in real-life recordings based on bi-directional long short-term memory (BLSTM) recurrent neural networks (RNNs), trained to map acoustic features of a mixture signal, consisting of sounds from multiple classes, to binary activity indicators of each event class.
  • 226
  • 26
Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network
TLDR
This paper proposes a deep neural network that estimates the directions of arrival (DOA) of multiple sound sources with good precision and generates SPS with a high signal-to-noise ratio.
  • 96
  • 14
Voice Conversion Using Dynamic Kernel Partial Least Squares Regression
TLDR
We propose to use the dynamic kernel partial least squares (DKPLS) technique to model nonlinearities as well as to capture the dynamics in the data.
  • 124
  • 13
Polyphonic sound event detection using multi label deep neural networks
TLDR
In this paper, the use of multi-label neural networks is proposed for the detection of temporally overlapping sound events in realistic environments.
  • 199
  • 12