# Using VTLN for broadcast news transcription

@inproceedings{Kim2004UsingVF, title={Using VTLN for broadcast news transcription}, author={Do Yeong Kim and Srinivasan Umesh and Mark John Francis Gales and Thomas Hain and Philip C. Woodland}, booktitle={INTERSPEECH}, year={2004} }

Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes as it is typically dependent on only a single parameter, allowing the warp factors to be robustly calculated on little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible to approximate the Jacobian associated with the VTLN transformation. A new, simple, linear…

#### 47 Citations

Applying vocal tract length normalization to meeting recordings

- Computer Science
- INTERSPEECH
- 2005

This work investigates the behaviour of the VTLN warping factor and shows that a stable estimate is not obtained, and instead it appears to be influenced by the context of the meeting, in particular the current conversational partner.

PAID I BIAS ADAPTATION FOR VOCAL TRACT LENGTH NORMALIZATION

- 2013

Vocal tract length normalisation (VTLN) is a well known rapid adaptation technique. VTLN as a linear transformation in the cepstral domain results in the scaling and translation factors. The warping…

Combining vocal tract length normalization with hierarchial linear transformations

- Computer Science
- 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2012

A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented and experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity.

A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics

- Computer Science
- INTERSPEECH
- 2008

This paper develops an approach to warp factor estimation in Vocal Tract Length Normalization (VTLN) whose recognition performance is comparable to conventional VTLN while being computationally more efficient.

Bias Adaptation for Vocal Tract Length Normalization

- Computer Science
- 2013

This paper presents a complete and comprehensible derivation of the bias transformation for VTLN and implements it in a unified framework for statistical parametric speech synthesis and recognition.

Study of jacobian compensation using linear transformation of conventional MFCC for VTLN

- Computer Science
- INTERSPEECH
- 2008

This paper presents a linear transformation to obtain warped features from unwarped features during vocal-tract length normalisation (VTLN) within the conventional MFCC framework without any modification in the signal processing steps involved during the feature extraction stage.

Effect of jacobian compensation in linear transformation based VTLN under matched and mis-matched speaker conditions

- Mathematics
- 2010 National Conference On Communications (NCC)
- 2010

In this paper we study the effect of using the Jacobian in different linear transformation (LT) based methods of VTLN. In conventional VTLN, the Jacobian is highly non-linear and cannot be computed and…

VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC

- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2012

In this paper, we propose a method to analytically obtain a linear-transformation on the conventional Mel frequency cepstral coefficients (MFCC) features that corresponds to conventional vocal tract…

Speaker adaptation with an Exponential Transform

- Computer Science
- 2011 IEEE Workshop on Automatic Speech Recognition & Understanding
- 2011

This paper describes a linear transform called the Exponential Transform (ET), which integrates aspects of CMLLR, VTLN and STC/MLLT into a single transform with jointly trained components, and finds that the axis along which male and female speakers differ is automatically learned.

Speaker normalisation for large vocabulary multiparty conversational speech recognition

- Computer Science
- 2009

One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording…

#### References

An investigation into vocal tract length normalisation

- Computer Science
- EUROSPEECH
- 1999

It was found that if multiple iterations of constrained MLLR are used there is no additional advantage to also using VTLN, and that, as previously reported, the effects of VTLN and unconstrained MLLR are largely additive.

Vocal tract normalization as linear transformation of MFCC

- Computer Science
- INTERSPEECH
- 2003

This paper shows that Mel-frequency warping can equally well be integrated into the framework of VTN as a linear transformation on the cepstrum, and that there is a strong interdependence of VTN and Maximum Likelihood Linear Regression for the case of Gaussian emission probabilities.

Vocal tract normalization equals linear transformation in cepstral space

- Mathematics, Computer Science
- IEEE Transactions on Speech and Audio Processing
- 2005

It is shown that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR), which can explain previous experimental results that improvements obtained by VTN and subsequent MLLR are not additive in some cases.

Recent advances in broadcast news transcription

- Computer Science
- 2003

Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, which were previously developed in the context of the recognition of conversational telephone speech, have been successfully applied to the BN-E task for the first time.

Speaker normalization with all-pass transforms

- Computer Science
- ICSLP
- 1998

This work develops a novel speaker normalization scheme by exploiting the fact that frequency domain transformations similar to that inherent in VTLN can be accomplished entirely in the cepstral domain through the use of conformal maps.

Speaker normalization using efficient frequency warping procedures

- Computer Science
- 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
- 1996

An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filter-bank in mel-frequency cepstrum feature analysis are presented.

The 1998 HTK system for transcription of conversational telephone speech

- Computer Science
- 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)
- 1999

This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST 1998 Hub5E evaluation, and presents experimental results for each stage of the multi-pass decoding scheme.

Maximum likelihood linear transformations for HMM-based speech recognition

- Computer Science
- Comput. Speech Lang.
- 1998

The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.