This paper describes our ongoing work on grammatical error correction (GEC). Focusing on all possible error types in a real-life environment, we propose a factored statistical machine translation (SMT) model for this task. We consider error correction as a series of language translation problems guided by various kinds of linguistic information, as factors that …
This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. The proposed approach is based on a graph-based label propagation technique. A nearest-neighbor similarity graph is constructed over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label …
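To make the idea of label propagation over a nearest-neighbor similarity graph concrete, here is a minimal sketch in Python. It is not the model from the paper: the vertices are toy feature vectors rather than character trigrams, cosine similarity and the simple clamped averaging update are assumptions chosen for brevity, and the function names (`knn_graph`, `propagate`) and the `B`/`E` labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(vectors, k):
    """Link each vertex to its k most similar vertices (directed edges)."""
    graph = {}
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        sims.sort(reverse=True)
        graph[i] = [(j, s) for s, j in sims[:k] if s > 0]
    return graph


def propagate(graph, seeds, labels, iterations=20):
    """Spread label distributions from seed vertices to unlabeled ones.

    Seed vertices are clamped to their known distribution; every other
    vertex takes the similarity-weighted average of its neighbours.
    """
    uniform = {y: 1.0 / len(labels) for y in labels}
    dist = {i: dict(seeds.get(i, uniform)) for i in graph}
    for _ in range(iterations):
        updated = {}
        for i, neighbours in graph.items():
            total = sum(w for _, w in neighbours)
            if i in seeds:
                updated[i] = dict(seeds[i])
            elif total == 0:
                updated[i] = dist[i]  # isolated vertex keeps its distribution
            else:
                updated[i] = {
                    y: sum(w * dist[j][y] for j, w in neighbours) / total
                    for y in labels
                }
        dist = updated
    return dist


# Toy data: vertices 0 and 2 are labeled, vertices 1 and 3 are not.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
seeds = {0: {"B": 1.0, "E": 0.0}, 2: {"B": 0.0, "E": 1.0}}
result = propagate(knn_graph(vectors, k=1), seeds, labels=["B", "E"])
predicted = {i: max(d, key=d.get) for i, d in result.items()}
```

In this toy setup each unlabeled vertex inherits the label of the seed it most resembles; a real system would build the graph over n-gram types and feed the propagated distributions into a downstream sequence model.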
This study investigates building a better Chinese word segmentation model for statistical machine translation. It aims to leverage word boundary information, automatically learned from bilingual character-based alignments, to induce a preferable segmentation model. We propose treating the induced word boundaries as soft constraints to bias the …
This paper describes the NLP2CT Grammatical Error Detection and Correction system for the CoNLL 2013 shared task, with a focus on errors of article or determiner (ArtOrDet), noun number (Nn), preposition (Prep), verb form (Vform) and subject-verb agreement (SVA). A hybrid model is adopted for this special task. The process starts with spell-checking …
This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the "segmentation agreements" between the two different types of view are used to overcome the scarcity of label information on unlabeled data. The proposed approach trains a …
A sentence boundary detection (SBD) system is normally quite sensitive to the genre of the data it is trained on. Genre here often refers to shifts in text topic and to new language domains. Although new detection models can be retrained for different languages or new text genres, the previous model has to be thrown away and the creation …
This paper aims at learning a better probabilistic context-free grammar with latent annotations (PCFG-LA) by using a graph propagation (GP) technique. We propose leveraging GP to regularize the lexical model of the grammar. The proposed approach constructs k-nearest neighbor (k-NN) similarity graphs over words with identical pre-terminal …
This paper presents weighted accuracy and diversity (WAD), a novel measure for evaluating the quality of a classifier ensemble and assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis: in a robust classifier ensemble, each member should be not only accurate but also different from every other member. In …
This work was carried out to improve the probability of interception in frequency-sweep mode and to lower the system complexity in non-sweep mode of an electronic intelligence system for frequency-agile radar signals. A low-complexity receiver structure and its associated demodulation algorithm with high probability of intercept are also presented. Moreover, the …