Corpus ID: 10425838

Using VTLN for broadcast news transcription

  Do Yeong Kim, Srinivasan Umesh, Mark John Francis Gales, Thomas Hain, Philip C. Woodland
Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes as it is typically dependent on only a single parameter, allowing the warp factors to be robustly calculated on little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible to approximate the Jacobian associated with the VTLN transformation. A new, simple, linear…
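The conventional scheme mentioned in the abstract, coding the data at several warp factors and keeping the best one, can be sketched as a grid search. This is a minimal illustration under stated assumptions, not the method proposed in the paper: the two-segment piecewise-linear warp, the 0.8 knee ratio, the 0.80 to 1.20 grid, the direction convention for the warp factor, and the toy formant-matching score are all hypothetical choices made for the example. In a real recogniser the score would be the acoustic-model likelihood of features recomputed at each warp factor.

```python
import numpy as np


def piecewise_linear_warp(freq_hz, alpha, nyquist_hz=8000.0, knee_ratio=0.8):
    """Warp a frequency axis with a two-segment piecewise-linear function.

    Below the knee the axis is scaled by alpha. Above the knee a second
    linear segment is used so that the Nyquist frequency maps onto itself.
    The knee position is an assumption made for this sketch.
    """
    f = np.asarray(freq_hz, dtype=float)
    knee = knee_ratio * nyquist_hz
    upper_slope = (nyquist_hz - alpha * knee) / (nyquist_hz - knee)
    return np.where(f <= knee, alpha * f, alpha * knee + upper_slope * (f - knee))


def select_warp_factor(score_fn, alphas):
    """Evaluate score_fn at every candidate warp factor and keep the best."""
    scores = np.array([score_fn(a) for a in alphas])
    return float(alphas[int(np.argmax(scores))]), scores


def make_toy_score(observed_hz, reference_hz):
    """Build a stand-in score: negative squared error between warped
    observed formants and reference formants (hypothetical, not a likelihood)."""
    observed_hz = np.asarray(observed_hz, dtype=float)
    reference_hz = np.asarray(reference_hz, dtype=float)

    def score(alpha):
        warped = piecewise_linear_warp(observed_hz, alpha)
        return -float(np.sum((warped - reference_hz) ** 2))

    return score


# Toy speaker whose formants sit a factor of 1.1 below the reference.
reference = np.array([500.0, 1500.0, 2500.0])
observed = reference / 1.1
alphas = np.linspace(0.8, 1.2, 21)  # candidate warp factors in steps of 0.02
best_alpha, scores = select_warp_factor(make_toy_score(observed, reference), alphas)
print(f"selected warp factor: {best_alpha:.2f}")
```

Each grid point costs one full pass of feature extraction and scoring, which is the expense the abstract refers to when it says the data must be explicitly coded at multiple warp factors.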
Applying vocal tract length normalization to meeting recordings
This work investigates the behaviour of the VTLN warping factor and shows that a stable estimate is not obtained, and instead it appears to be influenced by the context of the meeting, in particular the current conversational partner.
Vocal tract length normalisation (VTLN) is a well-known rapid adaptation technique. VTLN as a linear transformation in the cepstral domain results in the scaling and translation factors. The warping…
Combining vocal tract length normalization with hierarchial linear transformations
A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented and experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity.
A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics
This paper develops an approach to warp factor estimation in Vocal Tract Length Normalization (VTLN) whose recognition performance is comparable to conventional VTLN while being computationally more efficient.
Bias Adaptation for Vocal Tract Length Normalization
This paper presents a complete and comprehensible derivation of the bias transformation for VTLN and implements it in a unified framework for statistical parametric speech synthesis and recognition.
Study of jacobian compensation using linear transformation of conventional MFCC for VTLN
This paper presents a linear transformation to obtain warped features from unwarped features during vocal-tract length normalisation (VTLN) within the conventional MFCC framework without any modification in the signal processing steps involved during the feature extraction stage.
Effect of jacobian compensation in linear transformation based VTLN under matched and mis-matched speaker conditions
In this paper we study the effect of using the Jacobian in different linear transformation (LT) based methods of VTLN. In conventional VTLN, the Jacobian is highly non-linear and cannot be computed and…
VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC
In this paper, we propose a method to analytically obtain a linear-transformation on the conventional Mel frequency cepstral coefficients (MFCC) features that corresponds to conventional vocal tract…
Speaker adaptation with an Exponential Transform
This work presents a linear transform called the Exponential Transform (ET), which integrates aspects of CMLLR, VTLN and STC/MLLT into a single transform with jointly trained components, and finds that the axis along which male and female speakers differ is automatically learned.
Speaker normalisation for large vocabulary multiparty conversational speech recognition
One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording…


An investigation into vocal tract length normalisation
It was found that if multiple iterations of constrained MLLR are used, there is no additional advantage to also using VTLN, and that, as previously reported, the effects of VTLN and unconstrained MLLR are largely additive.
Vocal tract normalization as linear transformation of MFCC
This paper shows that Mel-frequency warping can equally well be integrated into the framework of VTN as linear transformation on the cepstrum and there is a strong interdependence of VTN and Maximum Likelihood Linear Regression for the case of Gaussian emission probabilities.
Vocal tract normalization equals linear transformation in cepstral space
  • M. Pitz, H. Ney
  • Mathematics, Computer Science
  • IEEE Transactions on Speech and Audio Processing
  • 2005
It is shown that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR), which can explain previous experimental results that improvements obtained by VTN and subsequent MLLR are not additive in some cases.
Recent advances in broadcast news transcription
Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, which were previously developed in the context of the recognition of conversational telephone speech, have been successfully applied to the BN-E task for the first time.
Speaker normalization with all-pass transforms
This work develops a novel speaker normalization scheme by exploiting the fact that frequency domain transformations similar to that inherent in VTLN can be accomplished entirely in the cepstral domain through the use of conformal maps.
Speaker normalization using efficient frequency warping procedures
  • L. Lee, R. Rose
  • Computer Science
  • 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filter-bank in mel-frequency cepstrum feature analysis are presented.
The 1998 HTK system for transcription of conversational telephone speech
This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST 1998 Hub5E evaluation; experimental results for each stage of the multi-pass decoding scheme are presented.
Maximum likelihood linear transformations for HMM-based speech recognition
  • M. Gales
  • Computer Science
  • Comput. Speech Lang.
  • 1998
The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.