Librispeech: An ASR corpus based on public domain audio books
- Vassil Panayotov, Guoguo Chen, Daniel Povey, S. Khudanpur
- Computer Science, IEEE International Conference on Acoustics…
- 19 April 2015
It is shown that acoustic models trained on LibriSpeech give a lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself.
The Kaldi Speech Recognition Toolkit
- Daniel Povey, Arnab Ghoshal, Karel Veselý
- Computer Science
- 2011
This paper describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
- David Snyder, D. Garcia-Romero, Gregory Sell, Daniel Povey, S. Khudanpur
- Computer Science, IEEE International Conference on Acoustics…
- 15 April 2018
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
A time delay neural network architecture for efficient modeling of long temporal contexts
- Vijayaditya Peddinti, Daniel Povey, S. Khudanpur
- Computer Science, Interspeech
- 2015
This paper proposes a time delay neural network architecture that models long-term temporal dependencies with training times comparable to standard feed-forward DNNs and uses sub-sampling to reduce computation during training.
The HTK book version 3.4
- S. Young, Gunnar Evermann, P. Woodland
- Computer Science
- 16 September 2006
MUSAN: A Music, Speech, and Noise Corpus
- David Snyder, Guoguo Chen, Daniel Povey
- Computer Science, arXiv.org
- 28 October 2015
This report introduces a new corpus of music, speech, and noise suitable for training models for voice activity detection (VAD) and music/speech discrimination, and demonstrates its use for music/speech discrimination on broadcast news and for VAD in speaker identification.
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
- Daniel Povey, Vijayaditya Peddinti, S. Khudanpur
- Computer Science, Interspeech
- 8 September 2016
Minimum Phone Error and I-smoothing for improved discriminative training
- Daniel Povey, P. Woodland
- Computer Science, IEEE International Conference on Acoustics…
- 13 May 2002
The Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria are smoothed approximations to the phone and word error rates, respectively. The paper also discusses I-smoothing, a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE).
Deep Neural Network Embeddings for Text-Independent Speaker Verification
- David Snyder, D. Garcia-Romero, Daniel Povey, S. Khudanpur
- Computer Science, Interspeech
- 20 August 2017
These are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora. The two representations are complementary, and their fusion improves on the baseline at all operating points.
Boosted MMI for model and feature-space discriminative training
- Daniel Povey, D. Kanevsky, Brian Kingsbury, B. Ramabhadran, G. Saon, Karthik Venkat Ramanan
- Computer Science, IEEE International Conference on Acoustics…
- 12 May 2008
This paper presents a modified form of the maximum mutual information (MMI) objective function that gives improved results for discriminative training by boosting the likelihoods of paths in the denominator lattice that have a higher phone error relative to the correct transcript.