Yu-Ming Hsieh

Preparing a knowledge bank is a very difficult task. In this paper, we discuss knowledge extraction from the manually examined Sinica Treebank. Categorical information, word-to-word relations, word collocations, new syntactic patterns and sentence structures are obtained. A search system for Chinese sentence structures was developed in this study.
In order to obtain a high-precision, high-coverage grammar, we proposed a model to measure grammar coverage and designed a PCFG parser to measure the efficiency of the grammar. To generalize grammars, a grammar binarization method was proposed to increase the coverage of a probabilistic context-free grammar. In the meantime, linguistically-motivated feature…
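The abstract does not say which binarization scheme is used. As a minimal sketch only, the following assumes a simple right-binarization in which every rule for a category shares one intermediate symbol, and measures coverage as the fraction of test rules that can be rebuilt from the grammar. The names `binarize`, `coverage`, `TRAIN` and `TEST`, and the toy rules, are hypothetical and are not taken from the paper.

```python
def binarize(lhs, rhs):
    """Right-binarize one rule, sharing a single intermediate symbol lhs + '*'.

    Assumed scheme (not necessarily the paper's): A -> X1 X2 ... Xn becomes
    A -> X1 A*, A* -> X2 A*, ..., A* -> X(n-1) Xn.
    """
    rhs = tuple(rhs)
    if len(rhs) <= 2:
        return [(lhs, rhs)]
    mid = lhs + "*"
    rules = [(lhs, (rhs[0], mid))]
    for child in rhs[1:-2]:
        rules.append((mid, (child, mid)))
    rules.append((mid, (rhs[-2], rhs[-1])))
    return rules


def coverage(train, test, use_binarization):
    """Fraction of test rules whose pieces all occur in the training grammar."""
    if use_binarization:
        expand = binarize
    else:
        expand = lambda lhs, rhs: [(lhs, tuple(rhs))]
    grammar = {piece for lhs, rhs in train for piece in expand(lhs, rhs)}
    covered = sum(
        all(piece in grammar for piece in expand(lhs, rhs)) for lhs, rhs in test
    )
    return covered / len(test)


# Toy rule sets (hypothetical, for illustration only).
TRAIN = [("NP", ("Det", "Adj", "N")), ("NP", ("Num", "Adj", "Adj", "N"))]
TEST = [("NP", ("Det", "Adj", "N")), ("NP", ("Det", "Adj", "Adj", "N"))]

print(coverage(TRAIN, TEST, use_binarization=False))
print(coverage(TRAIN, TEST, use_binarization=True))
```

In this toy setting the second test rule never appears in training, but after binarization each of its pieces does, so it becomes covered. Sharing intermediate symbols this aggressively can also admit structures that were never observed, which is one reason precision and coverage have to be weighed together.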
Selecting the best structure from several ambiguous structures produced by a syntactic parser is a challenging issue. The quality of the solution depends on the precision of the structure evaluation methods. In this paper, we propose a general model (context-dependent probability re-estimation model, CDM) to enhance the estimation of structure probabilities.
To identify incorrect characters and correct errors, we developed two error detection systems with different dictionaries. The first system, called CKIP-WS, adopted the CKIP word segmentation system, which is based on the CKIP dictionary, as its core detection procedure; the other system, called G1-WS, used Google 1T uni-gram data to…
Event classification is one of the crucial tasks in lexical semantic representation. Traditionally, researchers have regarded process and state as two top-level events and discriminated between them by semantic and syntactic characteristics. In this paper, we add cause-result relativity as an auxiliary criterion to discriminate between process and state by…
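In natural language processing, syntactic parsing is a core step. In past research, a common technique has been for a parser to analyze sentences with a probabilistic context-free grammar (hereafter PCFG) trained from a treebank. For English, where large amounts of treebank data are available, PCFG parsing of English sentences performs well, with existing results showing that it can reach about 90%, and work has gone further to lexicalized parsing [6]. Given the limited size of the Chinese sentence-structure treebank, unlexicalized parsing is a starting point for research. In this paper, we study how to extract the best PCFG from the limited Sinica Treebank so that the extracted grammar rules have better coverage…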
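As a rough illustration of training a PCFG from a treebank, the sketch below estimates rule probabilities by relative frequency: the count of a rule divided by the count of all rules with the same left-hand side. It assumes trees are given as nested tuples `(label, child, ...)` with words as plain strings. That representation, the toy trees, and the names `productions` and `estimate_pcfg` are assumptions for illustration only; this is not the Sinica Treebank format or the paper's extraction procedure.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree."""
    label, children = tree[0], tree[1:]
    # A child is either a word (str) or a subtree whose first item is its label.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter(p for tree in trees for p in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Two toy trees (hypothetical data).
trees = [
    ("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"))),
    ("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "barks"))),
]
pcfg = estimate_pcfg(trees)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(lhs, "->", " ".join(rhs), prob)
```

With only two trees, each of the two NP expansions receives probability 0.5 and any unseen expansion receives none, which shows on a small scale why coverage is a concern when the treebank is limited.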
This paper presents a method to enhance a Chinese parser in parsing conjunctive structures. Long conjunctive structures cause long-distance dependencies and tremendous syntactic ambiguities. Pure syntactic approaches can hardly determine the boundaries of conjunctive phrases properly. In this paper, we propose a divide-and-conquer approach which overcomes the…