Guoyang Shen

Protecting search ranking results against spam is an important issue for contemporary web search engines. This paper addresses one major type of web spam: link spam. Most previous anti-link-spam work used a single snapshot of web data to detect spam, and thus did not take advantage of ...
We present CCRFs (Cascaded Conditional Random Fields), a cascaded approach to scaling Conditional Random Fields (CRFs) for Chinese POS tagging (labeling). General CRFs work well on POS tagging but run into difficulty with a large training dataset and tag set because of the high computational cost of training. CCRFs organize all tags in a hierarchy and ...
Relation extraction aims to identify the relations between pairs of named entities. In this paper, we address relation extraction by discovering dependency tree patterns (a pattern is an embedded dependency subtree indicating a relation instance). Our approach finds an optimal rule (pattern) set automatically based on the proposed ...
