Unsupervised Training on Large Amounts of Broadcast News Data

@article{Ma2006UnsupervisedTO,
  title={Unsupervised Training on Large Amounts of Broadcast News Data},
  author={Jeff Z. Ma and Spyridon Matsoukas and Owen Kimball and Richard M. Schwartz},
  journal={2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings},
  year={2006},
  volume={3},
  pages={III-III}
}
This paper presents our recent effort to improve our Arabic broadcast news (BN) recognition system by using thousands of hours of untranscribed Arabic audio for unsupervised training. Unsupervised training is first carried out on the 1,900-hour English Topic Detection and Tracking (TDT) data and compared with the lightly-supervised training method that we have used for the DARPA EARS evaluations. The comparison shows that unsupervised training produces a 21.7% relative…


Key Quantitative Results

  • The comparison shows that unsupervised training produces a 21.7% relative reduction in word error rate (WER), comparable to the gain obtained with lightly-supervised training.
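The 21.7% figure is a *relative* reduction: the drop in WER divided by the baseline WER, not an absolute difference in percentage points. The minimal Python sketch below shows both quantities under their standard definitions (WER as word-level edit distance over the reference length). The function names and the example WER values are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution, or match when cost == 0
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a baseline WER of 20.0% falling to 15.66%
# is a 4.34-point absolute drop, i.e. a 21.7% relative reduction.
print(round(relative_reduction(20.0, 15.66), 3))
```

Because the measure is relative, the same 21.7% figure corresponds to a larger absolute drop on a system with a higher baseline WER.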
