Recurrent neural network based language model
Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
Librispeech: An ASR corpus based on public domain audio books
It is shown that acoustic models trained on LibriSpeech give a lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
Extensions of recurrent neural network language model
Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to a more than 15-fold speedup in both the training and testing phases, along with ways to reduce the number of parameters in the model.
A time delay neural network architecture for efficient modeling of long temporal contexts
This paper proposes a time delay neural network architecture which models long-term temporal dependencies with training times comparable to standard feed-forward DNNs and uses sub-sampling to reduce computation during training.
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
A method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training is described, using the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI.
Deep Neural Network Embeddings for Text-Independent Speaker Verification
It is found that the embeddings outperform i-vectors for short speech segments and are competitive on long duration test conditions, which are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.
JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS): A Surgical Activity Dataset for Human Motion Modeling
Dexterous surgical activity is of interest to many researchers in human motion modeling. In this paper, we describe a dataset of surgical activities and release it for public use. The dataset was...
Audio augmentation for speech recognition
This paper investigates audio-level speech augmentation methods which directly process the raw signal, and presents results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios.
Highway long short-term memory RNNs for distant speech recognition
This paper extends deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers, and introduces latency-controlled bidirectional LSTMs (BLSTMs), which can exploit the whole history while keeping the latency under control.