A Minimally Supervised Approach for Detecting and Ranking Document Translation Pairs

@inproceedings{Krstovski2011AMS,
  title={A Minimally Supervised Approach for Detecting and Ranking Document Translation Pairs},
  author={Kriste Krstovski and David A. Smith},
  booktitle={WMT@EMNLP},
  year={2011}
}
We describe an approach for generating a ranked list of candidate document translation pairs without the use of a bilingual dictionary or machine translation system. We developed this approach as an initial filtering step for extracting parallel text from large, multilingual (but non-parallel) corpora. We represent bilingual documents in a vector space whose basis vectors are the overlapping tokens found in both languages of the collection. Using this representation, weighted by tf·idf, we…
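The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract is truncated before it states how pairs are scored, so cosine similarity is an assumption here. Whitespace tokenization, the smoothed idf formula, computing idf over the combined collection, and all function names (`tokenize`, `shared_vocabulary`, `tfidf_vectors`, `cosine`, `rank_candidates`) are likewise hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper may differ.
    return text.lower().split()


def shared_vocabulary(source_docs, target_docs):
    """Tokens that occur on both language sides (numbers, names, codes, ...)."""
    src_vocab = {tok for doc in source_docs for tok in tokenize(doc)}
    tgt_vocab = {tok for doc in target_docs for tok in tokenize(doc)}
    return src_vocab & tgt_vocab


def tfidf_vectors(docs, vocab):
    """One sparse tf-idf vector (dict) per document, restricted to vocab."""
    token_lists = [[t for t in tokenize(d) if t in vocab] for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(t for toks in token_lists for t in set(toks))
    vectors = []
    for toks in token_lists:
        term_freq = Counter(toks)
        # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            t: count * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_docs, target_docs):
    """For each source document, target indices sorted by descending score."""
    vocab = shared_vocabulary(source_docs, target_docs)
    # Assumption: idf is computed over the combined bilingual collection.
    vectors = tfidf_vectors(source_docs + target_docs, vocab)
    src_vecs = vectors[:len(source_docs)]
    tgt_vecs = vectors[len(source_docs):]
    rankings = []
    for src_vec in src_vecs:
        scored = [(j, cosine(src_vec, tgt_vec)) for j, tgt_vec in enumerate(tgt_vecs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        rankings.append(scored)
    return rankings
```

Calling `rank_candidates(english_docs, spanish_docs)` on two lists of strings returns, for each English document, a list of `(target_index, score)` pairs with the best candidate first. Only tokens shared by both sides contribute to the score, which is why no dictionary or translation system is needed in this sketch.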