Corpus ID: 65148724

DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST

  title={DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM \{TIMIT\} | NIST},
  author={John S. Garofolo and Lori Lamel and William M. Fisher and Jonathan G. Fiscus and David S. Pallett and Nancy L. Dahlgren},

Revisiting SincNet: An Evaluation of Feature and Network Hyperparameters for Speaker Recognition
The main finding is that the stride and window size of the feature extractor play a crucial role in obtaining good performance, and by using optimal values for these two hyperparameters, traditional features are able to match the performance of sinc features.
Speech Acoustic Modelling from Raw Phase Spectrum
This paper investigates the possibility and efficacy of acoustic modelling using the raw short-time phase spectrum, and studies the usefulness of the raw wrapped, unwrapped and minimum-phase phase spectra as well as the phase of the source and filter components for acoustic modelling.
On the Robustness and Training Dynamics of Raw Waveform Models
The accuracy of raw waveform acoustic models for automatic speech recognition is found to be comparable to, or better than, their MFCC-based counterparts in matched conditions and notably improved by using a better alignment.
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling
The sign spectrum is demonstrated to carry information related to the temporal structure of the signal as well as the speech’s source component, enabling unique signal characterisation and reconstruction in acoustic modelling for automatic speech recognition.
Multi-scale Generative Adversarial Networks for Speech Enhancement
Speech Enhancement Multi-scale Generative Adversarial Networks (SEMGAN), whose generator and discriminator networks are structured on the basis of fully convolutional neural networks (FCNNs), achieve superior performance in comparison with the optimally modified log-spectral amplitude estimator (OMLSA) and SEGAN in different noisy conditions.
Deep learning and structured data
This thesis investigates deep learning from a spectrum of different perspectives, and studies the question of generalization, which is one of the most fundamental notions in machine learning theory, and shows how it differs from the conventional view in the regime of deep learning.
Filtered and unfiltered sentences produce different spectral context effects in vowel categorization
Vowel categorization was examined following context sentences that naturally possessed desired spectral properties without any filtering, which maximizes acoustic control over stimulus materials, but vastly understates the acoustic variability of speech.
Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production
Three-dimensional bottleneck features are analysed and it is shown that for vowels, their spatial representation is very close to the familiar F1:F2 vowel quadrilateral, which suggests that these networks derive representations specific to particular phonetic categories, with properties similar to those used by human perception.
Co-channel Speech Separation Based on Amplitude Modulation Spectrum Analysis
  • Qi Hu, M. Liang
  • Computer Science
  • Circuits Syst. Signal Process.
  • 2014
This paper proposes an approach to exploit the amplitude modulation spectrum (AMS) and to perform the separation based on the framework of computational auditory scene analysis (CASA), which utilizes the periodicity encoded in the AMS and then makes the channel selection.
In-set/out-of-set speaker recognition in sustained acoustic scenarios using sparse data
For situations in which the background environment type remains constant between train and test, an in-set/out-of-set speaker recognition system that takes advantage of information gathered from the environmental noise can be formulated, and it realizes significant improvement when only extremely limited amounts of train/test data are available.