DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1
John S. Garofolo, Lori Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett and Nancy L. Dahlgren
Incorporating Discriminative DPGMM Posteriorgrams for Low-Resource ASR
DPGMM posteriorgrams are appended to acoustic features to increase their discriminability and enhance ASR systems; experimental results on the WSJ corpora show the proposal consistently improves ASR performance, with even larger gains on smaller, lower-resource datasets.
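The feature-appending idea above amounts to concatenating frame-level posteriorgrams onto the base acoustic features. A minimal sketch, assuming frame-aligned NumPy arrays (the names, dimensions, and helper function are illustrative, not from the paper):

```python
import numpy as np

def append_posteriorgrams(acoustic_feats, posteriorgrams):
    """Concatenate frame-level posteriorgrams onto acoustic features.

    acoustic_feats: (T, D) array, e.g. MFCCs or filterbanks per frame.
    posteriorgrams: (T, K) array of per-frame DPGMM component posteriors
                    (each row sums to 1).
    Returns a (T, D + K) augmented feature matrix.
    """
    assert acoustic_feats.shape[0] == posteriorgrams.shape[0]
    return np.concatenate([acoustic_feats, posteriorgrams], axis=1)

# Toy example: 100 frames of 13-dim MFCCs plus a 50-component posteriorgram.
mfcc = np.random.randn(100, 13)
post = np.random.dirichlet(np.ones(50), size=100)  # rows sum to 1
augmented = append_posteriorgrams(mfcc, post)
print(augmented.shape)  # (100, 63)
```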
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition
Experimental results show that MixSpeech achieves better accuracy than baseline models without data augmentation, and outperforms a strong data augmentation method, SpecAugment, on these recognition tasks.
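The mixup idea underlying MixSpeech can be sketched as interpolating two feature sequences with a Beta-sampled weight (which would also weight the two transcripts' losses during training). A minimal illustration, assuming equal-length spectrogram-like arrays; variable lengths and label handling are omitted:

```python
import numpy as np

def mix_features(x1, x2, alpha=0.5):
    """Mixup-style interpolation of two feature sequences.

    x1, x2: (T, F) spectrogram-like arrays of the same shape.
    alpha:  Beta-distribution parameter controlling the mixing weight.
    Returns the mixed features and the weight lam; in training, lam
    would also weight the losses of the two utterances' transcripts.
    """
    lam = np.random.beta(alpha, alpha)
    mixed = lam * x1 + (1.0 - lam) * x2
    return mixed, lam

# Toy example: mixing an all-ones and an all-zeros "spectrogram".
a = np.ones((200, 80))
b = np.zeros((200, 80))
mixed, lam = mix_features(a, b)
print(mixed.shape)  # (200, 80)
```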
Broad Phonetic Classification of ASR using Visual Based Features
The results show that the proposed method (VS-HMM-GM-MBTI-CNN-VQ) is a viable approach to broad phonetic classification, with potential applications in automatic speech recognition and automatic language identification.
E2E-SINCNET: Toward Fully End-To-End Speech Recognition
The proposed E2E-SincNet is a novel, fully end-to-end ASR model that maps raw waveforms to text transcripts by merging two recent and powerful paradigms: SincNet and the joint CTC-attention training scheme.
End-to-End Trainable Self-Attentive Shallow Network for Text-Independent Speaker Verification
This paper proposes a novel framework for speaker verification (SV), the end-to-end trainable self-attentive shallow network (SASN), which incorporates a time-delay neural network (TDNN) and a self-attentive pooling mechanism based on the self-attentive x-vector system during the utterance-embedding phase; it is highly efficient and provides more accurate speaker verification than GE2E.
Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-Corpus Generalization
This study proposes a long short-term memory (LSTM) network for complex spectral mapping and examines the importance of the training corpus for cross-corpus generalization, finding that a training corpus containing utterances from different channels can significantly improve performance on untrained corpora.
Surprisal-Triggered Conditional Computation with Neural Networks
This model is used both to extract features and to predict observations in a stream of input observations, and can match the performance of a baseline in which the big network is always used, with 15% fewer FLOPs.
WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement
Experimental results on speech denoising and compressed speech restoration tasks confirm that with the SRU and the restricted feature map, WaveCRN performs comparably to other state-of-the-art approaches with notably reduced model complexity and inference time.
A Factorial Deep Markov Model for Unsupervised Disentangled Representation Learning from Speech
Latent representations learned by the Factorial Deep Markov Model outperform a baseline i-vector system on speaker verification and dialect identification while also reducing the error rate of a phone recognition system in a domain mismatch scenario.
An End-to-end Approach for Lexical Stress Detection based on Transformer
This work proposes an end-to-end approach using a sequence-to-sequence Transformer model to estimate lexical stress, achieving a 6.36% phoneme error rate on the TIMIT dataset, improving on the 7.2% error rate reported in other studies.