A Cross-Lingual Word Kernel SVM for SMT Training Corpus Selection

  • Xiwu Han
  • Published 2009 in
    2009 WRI World Congress on Computer Science and…


Instead of collecting ever more parallel training corpora, this paper aims to improve SMT performance by exploiting the full potential of the existing parallel corpora. Inspired by the mechanism of string subsequence and word sequence kernels, we first propose a cross-lingual word kernel (CWK) SVM to classify SMT training corpus as literal translation and…
DOI: 10.1109/CSIE.2009.278
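
The abstract names string subsequence and word sequence kernels as the inspiration for the CWK. As background only, here is a minimal sketch of a standard gap-weighted word sequence kernel between two tokenized sentences. It is monolingual and brute-force, and it is not the paper's CWK, whose cross-lingual construction is not described in this excerpt. The function names, the subsequence length `n`, and the decay factor `lam` are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
import math


def subsequence_features(tokens, n, lam):
    """Map every length-n word subsequence (gaps allowed) to its weight.

    Each occurrence contributes lam ** span, where span is the number of
    tokens covered from the first to the last matched word, so occurrences
    with larger gaps count less when 0 < lam < 1.
    """
    feats = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        span = idx[-1] - idx[0] + 1
        feats[tuple(tokens[i] for i in idx)] += lam ** span
    return feats


def word_sequence_kernel(s, t, n=2, lam=0.5):
    """Inner product of the gap-weighted subsequence feature vectors."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)


def normalized_kernel(s, t, n=2, lam=0.5):
    """Length-normalized kernel; returns 0.0 if either self-kernel is zero."""
    denom = math.sqrt(
        word_sequence_kernel(s, s, n, lam) * word_sequence_kernel(t, t, n, lam)
    )
    return word_sequence_kernel(s, t, n, lam) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    a = "the cat sat".split()
    b = "the cat ran".split()
    print(word_sequence_kernel(a, b))  # shared subsequence: ("the", "cat")
    print(normalized_kernel(a, b))
```

Enumerating all index combinations is exponential in `n` and only suitable for short sentences; published formulations of these kernels use dynamic programming instead. A Gram matrix of such values could be passed to any SVM implementation that accepts precomputed kernels.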

4 Figures and Tables


