Huiming Duan

The CIPS-SIGHAN CLP 2012 Chinese Word Segmentation on MicroBlog Corpora Bakeoff was held in the autumn of 2012. The bakeoff task focused on the performance of Chinese word segmentation algorithms on MicroBlog corpora. Seventeen groups submitted 20 results, among which the best system has P, R and F values all near 95%, and the ...
Gelatin is a well-known biopolymer with a long history of use, mainly as a gelling agent in the food industry. This paper reports a new method for producing recombinant hydroxylated human-derived gelatin in Pichia pastoris KM71. Three independent expression cassettes encoding a specific length of gelatin, prolyl 4-hydroxylase (P4H, EC, ...
This paper presents a maximum entropy (ME)-based model for Chinese noun phrase metaphor recognition. Metaphor recognition is treated as a classification task between metaphorical and literal meaning. Our experiments show that the ME-based metaphor recognizer is significantly better than the example-based methods within the same ...
NP identification is a challenging subtask of NLP. The existing literature mainly focuses on base noun phrases and maximal-length noun phrases, and treats them as a sequence labeling problem. In this paper, unlike existing approaches, we concentrate on a special subcategory of Chinese NP, the classifier noun phrase (CNP), and present a new approach which uses ...
In contemporary Chinese, there is a subclass of verbs called dummy verbs. After briefly introducing the lexical meanings of two typical dummy verbs, ‘Jiayi’ and ‘Jinxing’, this paper discusses their grammatical attributes in detail and further explores their functions as markers of syntactic constituents and semantic roles.
The increase in three-character words is attracting more and more attention from researchers. In the present paper, the ratio of three-character words unrecorded in the Grammatical Knowledge-base of Contemporary Chinese (henceforth, three-character unknown words) is obtained by an analysis of the tagged corpus of People’s Daily of 1998. The results show that the ...
