Lu Xiang

Since the quality of statistical machine translation (SMT) depends heavily on the size and quality of the training data, many approaches have been proposed for automatically mining bilingual text from comparable corpora. However, existing solutions are restricted to extracting either bilingual sentences or sub-sentential fragments. Instead, we present…
This paper presents our system for the CIPS-SIGHAN-2014 bakeoff task of Chinese word segmentation. The system adopts a character-based joint approach, which combines a character-based generative model and a character-based discriminative model. To further improve cross-domain performance, an external dictionary is employed. In addition,…