This paper describes our ongoing work on grammatical error correction (GEC). Focusing on all possible error types in a real-life environment, we propose a factored statistical machine translation (SMT) model for this task. We consider error correction as a series of language translation problems guided by various linguistic information, as factors that...
This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. The proposed approach is based on a graph-based label propagation technique. One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label...
In semi-arid areas, multiple equilibrium states of an ecosystem (e.g., grassland and desert) are found to coexist, and the transition from grassland to desert is often abrupt at the boundary. A simple ecosystem model is developed to provide the biophysical explanation of this phenomenon. The model has three variables: living biomass, wilted biomass, and...
BACKGROUND: With the increase in motor vehicles, ambient air pollution related to traffic exhaust has become an important environmental issue in China. Because of their fast growth and development, children are more susceptible to ambient air pollution exposure. Many chemicals from traffic exhaust, such as carbon monoxide, nitrogen dioxide, and lead, have...
This paper describes the NLP2CT Grammatical Error Detection and Correction system for the CoNLL 2013 shared task, with a focus on the errors of article or determiner (ArtOrDet), noun number (Nn), preposition (Prep), verb form (Vform) and subject-verb agreement (SVA). A hybrid model is adopted for this special task. The process starts with spell-checking...
This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the "segmentation agreements" between the two different types of view are used to overcome the scarcity of the label information on unlabeled data. The proposed approach trains a...
This study investigates building a better Chinese word segmentation model for statistical machine translation. It aims at leveraging word boundary information, automatically learned by bilingual character-based alignments, to induce a preferable segmentation model. We propose dealing with the induced word boundaries as soft constraints to bias the...
A sentence boundary detection (SBD) system is normally quite sensitive to the genre of the data it is trained on. Genre here refers to shifts in text topic and to new language domains. Although new detection models can be retrained for different languages or new text genres, the previous model has to be thrown away and the creation...
This paper aims at learning a better probabilistic context-free grammar with latent annotations (PCFG-LA) by using a graph propagation (GP) technique. We propose leveraging the GP to regularize the lexical model of the grammar. The proposed approach constructs k-nearest neighbor (k-NN) similarity graphs over words with identical pre-terminal...