Shallow Parsing By Weighted Probabilistic Sum

Abstract

In this paper, we define the chunking problem as a classification of words and present a weighted probabilistic model for text chunking. The proposed model exploits context features around the focus word, and to alleviate the sparse data problem it integrates general features with specific features. In the training stage, we select useful features after measuring the information gain ratio of each feature and assign higher weights to more informative features based on that ratio. At application time, we classify words into chunk labels while checking the consistency of chunk beginnings and endings. The experimental results show that the model combining general and specific features alleviates the sparse data problem. In addition, the weighted probabilistic model based on information gain ratio outperforms the non-weighted model.
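The paper itself specifies the exact model; as a rough illustration of the two ideas stated in the abstract, the following minimal Python sketch (a) scores a feature by its information gain ratio and (b) assigns a word's chunk label by a weighted sum of per-feature conditional probabilities, using those ratios as weights. All names here (information_gain_ratio, classify, cond_probs, and so on) are illustrative assumptions, not the authors' implementation.

    import math
    from collections import Counter, defaultdict

    def entropy(values):
        """Shannon entropy of a list of discrete values."""
        total = len(values)
        counts = Counter(values)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain_ratio(examples, feature_index):
        """Information gain of one feature over the chunk labels,
        normalized by the feature's split information."""
        labels = [label for _, label in examples]
        base = entropy(labels)
        # Partition the training examples by the value of this feature.
        partitions = defaultdict(list)
        for features, label in examples:
            partitions[features[feature_index]].append(label)
        total = len(examples)
        remainder = sum(len(part) / total * entropy(part)
                        for part in partitions.values())
        gain = base - remainder
        split_info = entropy([f[feature_index] for f, _ in examples])
        return gain / split_info if split_info > 0 else 0.0

    def classify(features, weights, cond_probs, chunk_labels):
        """Pick the chunk label maximizing the weighted sum of
        per-feature conditional probabilities P(label | feature value)."""
        best_label, best_score = None, float("-inf")
        for label in chunk_labels:
            score = sum(weights[i] * cond_probs.get((i, value, label), 0.0)
                        for i, value in enumerate(features))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

A consistency check on chunk boundaries (e.g., that a chunk-end label follows a matching chunk-begin label), as mentioned in the abstract, would be applied on top of these per-word decisions.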


Cite this paper

@inproceedings{Hwang2001ShallowPB,
  title  = {Shallow Parsing By Weighted Probabilistic Sum},
  author = {Young-Sook Hwang and So-Young Park and Hoo-Jung Chung and Yong-Jae Kwak and Hae-Chang Rim},
  year   = {2001}
}