• Publications
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
This paper describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that aims to bridge the gap in data availability between image and audio research and to substantially stimulate the development of high-performance audio event recognizers.
CNN architectures for large-scale audio classification
TLDR
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition
TLDR
The results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB.
Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio
TLDR
A source separation evaluation revealed that when using large dictionaries, the proposed method produces a better separation quality in less time, and is up to 8 times faster than conventional algorithms.
An exemplar-based NMF approach to audio event detection
TLDR
A novel, exemplar-based method for audio event detection based on non-negative matrix factorisation, which models events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events.
Techniques for Noise Robustness in Automatic Speech Recognition
TLDR
For some reason, this book on techniques for noise robustness in automatic speech recognition tends to be the representative book on this website.
Exemplar-based Recognition of Speech in Highly Variable Noise
TLDR
This work compares several exemplar-based factorisation and decoding algorithms in pursuit of higher noise robustness in speech recognition, and shows that the proposed exemplar-based techniques offer a substantial improvement in the noise robustness of speech recognition.
Acquisition of ordinal words using weakly supervised NMF
TLDR
Constrained subspace NMF (CSNMF) is proposed as an extension to NMF that aims to better deal with ordinal data and thus increase the learning rate of the grounding information with an ordinal structure.
Compressive Sensing for Missing Data Imputation in Noise Robust Speech Recognition
TLDR
This paper introduces a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing, which can impute missing features using larger time windows such as entire words.
Self-taught assistive vocal interfaces: an overview of the ALADIN project
TLDR
The overall learning framework, the user-centred design and evaluation aspects, database collection and approaches taken to combat problems such as noise and erroneous input are described.