Publications
Theano: A Python framework for fast computation of mathematical expressions
TLDR
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models, and recently introduced functionalities and improvements are discussed.
Attention-Based Models for Speech Recognition
TLDR
The attention mechanism is extended with features needed for speech recognition, and a novel, generic method of adding location-awareness to the attention mechanism is proposed to reduce the phoneme error rate.
Regularizing Neural Networks by Penalizing Confident Output Distributions
TLDR
It is found that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.
End-to-end attention-based large vocabulary speech recognition
TLDR
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
Robust 3D Action Recognition with Random Occupancy Patterns
TLDR
This work extracts semi-local features called random occupancy pattern (ROP) features, which employ a novel sampling scheme that effectively explores an extremely large sampling space, and utilizes a sparse coding approach to robustly encode these features.
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
TLDR
A variety of structural and optimization improvements to the Listen, Attend and Spell model are explored, which significantly improve performance, and a multi-head attention architecture is introduced, which offers improvements over the commonly used single-head attention.
Sequence-to-Sequence Models Can Directly Translate Foreign Speech
TLDR
A recurrent encoder-decoder deep neural network architecture is presented that directly translates speech in one language into text in another, illustrating the power of attention-based models.
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
TLDR
Initial results demonstrate that this new approach achieves phoneme error rates comparable to state-of-the-art HMM-based decoders on the TIMIT dataset.
Unsupervised Speech Representation Learning Using WaveNet Autoencoders
TLDR
A regularization scheme is introduced that forces the representations to focus on the phonetic content of the utterance, and performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task is reported.
Towards Better Decoding and Language Model Integration in Sequence to Sequence Models
TLDR
An attention-based seq2seq speech recognition system that directly transcribes recordings into characters is analysed, and two shortcomings are observed: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used.
...