Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus

@inproceedings{Kurata2006UnsupervisedAO,
  title={Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus},
  author={Gakuto Kurata and Shinsuke Mori and Masafumi Nishimura},
  booktitle={2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings},
  year={2006},
  volume={1},
  pages={I-I}
}
The target uses of large vocabulary continuous speech recognition (LVCSR) systems are spreading. It takes a lot of time to build a good LVCSR system specialized for the target domain because experts need to manually segment the corpus of the target domain, which is a labor-intensive task. In this paper, we propose a new method to adapt an LVCSR system to a new domain. In our method, we stochastically segment a Japanese raw corpus of the target domain. Then a domain-specific language model (LM…