Xiaodong Zeng

This paper introduces a graph-based semisupervised joint model of Chinese word segmentation and part-of-speech tagging. The proposed approach is based on a graph-based label propagation technique. One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label…
BACKGROUND: With the increase in motor vehicles, ambient air pollution related to traffic exhaust has become an important environmental issue in China. Because of their fast growth and development, children are more susceptible to ambient air pollution exposure. Many chemicals from traffic exhaust, such as carbon monoxide, nitrogen dioxide, and lead, have…
This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the “segmentation agreements” between the two different types of view are used to overcome the scarcity of the label information on unlabeled data. The proposed approach trains a…
In semi-arid areas, multiple equilibrium states of an ecosystem (e.g., grassland and desert) are found to coexist, and the transition from grassland to desert is often abrupt at the boundary. A simple ecosystem model is developed to provide the biophysical explanation of this phenomenon. The model has three variables: living biomass, wilted biomass, and…
A dynamical ecosystem model with three variables, living biomass, wilted biomass and available soil wetness, is developed to examine the vegetation–soil water interaction in semi-arid areas. The governing equations are based on the mass conservation law. The physical and biophysical processes are formulated with the parameters estimated from observational…
Conventional machine translation evaluation metrics tend to perform well on certain language pairs but poorly on others. Furthermore, some evaluation metrics work only on certain language pairs and are not language-independent. Finally, the failure to consider linguistic information usually leads these metrics to correlate poorly with human…
This paper describes our ongoing work on grammatical error correction (GEC). Focusing on all possible error types in a real-life environment, we propose a factored statistical machine translation (SMT) model for this task. We consider error correction as a series of language translation problems guided by various linguistic information, as factors that…
Based on physical and biophysical considerations, mathematical analysis, and some approximate formulations generally adopted in meteorology and ecology, an ecological dynamic model of grassland is developed. The model consists of three interactive variables, i.e., the biomass of living grass, the biomass of wilted grass, and the soil wetness. The major…
This paper describes the NLP2CT Grammatical Error Detection and Correction system for the CoNLL 2013 shared task, with a focus on the errors of article or determiner (ArtOrDet), noun number (Nn), preposition (Prep), verb form (Vform) and subject-verb agreement (SVA). A hybrid model is adopted for this special task. The process starts with spellchecking as…