Data selection has shown significant improvements in the effective use of training data by extracting sentences from large general-domain corpora to adapt statistical machine translation (SMT) systems to in-domain data. This paper performs an in-depth analysis of three different sentence selection techniques. The first is cosine tf-idf, which comes from the …
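A minimal sketch of the tf-idf cosine selection idea, assuming scikit-learn and toy corpora; the variable names `in_domain` and `general_pool` and the top-k cut-off are illustrative only, not taken from the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Illustrative corpora; a real experiment would load the actual in-domain
# and general-domain data sets.
in_domain = [
    "the patient was given 5 mg of the drug",
    "side effects include nausea and dizziness",
]
general_pool = [
    "the parliament adopted the resolution",
    "the drug reduced nausea in most patients",
    "stock prices fell sharply on monday",
]

# Fit tf-idf on the in-domain corpus and represent it by its centroid vector.
vectorizer = TfidfVectorizer()
in_domain_vecs = vectorizer.fit_transform(in_domain)
centroid = np.asarray(in_domain_vecs.mean(axis=0))

# Score each general-domain sentence by cosine similarity to the centroid
# and keep the top-k most in-domain-like sentences.
pool_vecs = vectorizer.transform(general_pool)
scores = cosine_similarity(pool_vecs, centroid).ravel()
top_k = 2
selected = [general_pool[i] for i in scores.argsort()[::-1][:top_k]]
print(selected)
```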
This paper introduces our participation in the WMT13 shared tasks on Quality Estimation for machine translation, which is performed without using reference translations. We submitted results for Task 1.1 (sentence-level quality estimation), Task 1.2 (system selection) and Task 2 (word-level quality estimation). In Task 1.1, we used an enhanced version of the BLEU metric …
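For illustration, a minimal sentence-level BLEU computation with NLTK; the hypothesis and pseudo-reference strings are invented, and the paper's enhanced BLEU variant is not reproduced here:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented example: an MT hypothesis scored against a pseudo-reference,
# since quality estimation has no human reference available.
hypothesis = "the cat sat on the mat".split()
pseudo_reference = "there is a cat on the mat".split()

# Smoothing keeps the score non-zero when higher-order n-grams do not match.
smooth = SmoothingFunction().method1
score = sentence_bleu([pseudo_reference], hypothesis, smoothing_function=smooth)
print(f"sentence-level BLEU: {score:.3f}")
```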
This paper describes the NLP2CT Grammatical Error Detection and Correction system for the CoNLL 2013 shared task, with a focus on errors of articles or determiners (ArtOrDet), noun number (Nn), prepositions (Prep), verb form (Vform) and subject-verb agreement (SVA). A hybrid model is adopted for this task. The process starts with spellchecking, as …
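As a toy illustration of the spellchecking stage that opens such a pipeline, assuming only the Python standard library and a tiny hand-made word list (the real system's dictionary and the downstream error-type classifiers are not shown):

```python
import difflib

# Tiny hand-made vocabulary; a real system would use a full dictionary.
VOCAB = ["the", "their", "there", "government", "signed", "an", "agreement"]

def spellcheck(token: str) -> str:
    """Return the closest in-vocabulary word, or the token unchanged."""
    if token.lower() in VOCAB:
        return token
    candidates = difflib.get_close_matches(token.lower(), VOCAB, n=1, cutoff=0.6)
    return candidates[0] if candidates else token

print([spellcheck(t) for t in "teh goverment signed an agreemnt".split()])
```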
Conventional machine translation evaluation metrics tend to perform well on certain language pairs but poorly on others. Furthermore, some evaluation metrics work only on particular language pairs and are not language-independent. Finally, ignoring linguistic information usually leaves such metrics with a low correlation with human …
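Correlation with human judgments is typically measured with standard correlation coefficients; a minimal sketch with SciPy, using invented metric and human scores:

```python
from scipy.stats import pearsonr, spearmanr

# Invented segment-level scores: one metric score and one human adequacy
# judgment per translated sentence.
metric_scores = [0.42, 0.35, 0.58, 0.61, 0.27]
human_scores = [3.0, 2.5, 4.0, 4.5, 2.0]

r, _ = pearsonr(metric_scores, human_scores)
rho, _ = spearmanr(metric_scores, human_scores)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```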
In this paper, we propose a Chinese word segmentation model for micro-blog text. Although Conditional Random Field (CRF) models have been applied to word segmentation before, this is the first time they have been applied to segmentation in the domain of Chinese micro-blogs. Unlike common article genres, micro-blogs have gradually become …
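A minimal sketch of character-based CRF segmentation with BMES tags, assuming the third-party package sklearn-crfsuite and toy training data; the feature template here (current character plus its neighbours) is far simpler than what a real system would use:

```python
import sklearn_crfsuite

def char_features(sent, i):
    # Features for one character: itself and its immediate neighbours.
    return {
        "char": sent[i],
        "prev": sent[i - 1] if i > 0 else "<BOS>",
        "next": sent[i + 1] if i < len(sent) - 1 else "<EOS>",
    }

def to_features(sent):
    return [char_features(sent, i) for i in range(len(sent))]

# Toy training data: each sentence is a character string labelled with
# BMES tags (B = begin, M = middle, E = end, S = single-character word).
train_sents = ["我爱北京", "天气很好"]
train_tags = [["S", "S", "B", "E"], ["B", "E", "S", "S"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([to_features(s) for s in train_sents], train_tags)

test = "我爱天气"
print(list(zip(test, crf.predict([to_features(test)])[0])))
```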
This paper aims at effective use of training data by extracting sentences from large general-domain corpora to adapt statistical machine translation systems to domain-specific data. We regard this task as a problem of filtering training sentences with respect to the target domain via different similarity metrics. Thus, we give new insights into when data …
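As one example of the interchangeable similarity metrics such a filter can use, a sketch of plain word-overlap (Jaccard) scoring against the in-domain vocabulary; the corpora and the threshold value are invented for illustration:

```python
def jaccard(a: set, b: set) -> float:
    # Word-overlap similarity between two sets of tokens.
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Illustrative in-domain vocabulary and general-domain candidate sentences.
in_domain_vocab = set("the patient was given the drug and reported nausea".split())
general_pool = [
    "the drug reduced nausea in most patients",
    "stock prices fell sharply on monday",
]

# Keep only sentences whose overlap with the in-domain vocabulary clears
# an (illustrative) threshold.
threshold = 0.15
kept = [s for s in general_pool
        if jaccard(set(s.split()), in_domain_vocab) >= threshold]
print(kept)
```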