Inducing a Bilingual Lexicon from Short Parallel Multiword Sequences

Authors: Andrew M. Finch, Taisuke Harada, Kumiko Tanaka-Ishii, Eiichiro Sumita
Journal: ACM Trans. Asian & Low-Resource Lang. Inf. Process.

This article proposes a technique for mining bilingual lexicons from pairs of parallel short word sequences. The technique builds a generative model from a corpus of training data consisting of such pairs. The model is a hierarchical nonparametric Bayesian model that directly induces a bilingual lexicon while training. The model learns in an unsupervised manner and is designed to exploit characteristics of the language pairs being mined. The proposed model is capable of utilizing commonly used…