Huihsin Tseng

We present a Chinese word segmentation system submitted to the closed track of the SIGHAN Bakeoff 2005. Our segmenter was built using a conditional random field sequence model, which provides a framework for using a large number of linguistic features such as character identity, morphological features, and character reduplication features. Because our morphological features ...
The prevalence in Chinese of grammatical structures that translate into English in different word orders is an important cause of translation difficulty. While previous work has used phrase-structure parses to deal with such ordering problems, we introduce a richer set of Chinese grammatical relations that describes more semantically abstract relations ...
Part-of-speech tagging, like any supervised statistical NLP task, is more difficult when test sets are very different from training sets, for example when tagging across genres or language varieties. We examined the problem of POS tagging of different varieties of Mandarin Chinese (PRC-Mainland, PRC-Hong Kong, and Taiwan). An analytic study first showed that ...
We propose a simple yet effective approach to context-sensitive synonym discovery for Web search queries based on co-click analysis, i.e., analyzing queries that lead to clicks on the same documents. In addition to deriving word-based synonyms, we also derive concept-based synonyms with the help of query segmentation. Evaluation results show that this approach ...
This paper describes a classifier that assigns semantic thesaurus categories to unknown Chinese words (words that appear in the Sinica Corpus but not in the CiLin thesaurus or the Chinese Electronic Dictionary). The focus of the paper differs in two ways from previous research in this area. Prior research on Chinese unknown words mostly focused on ...
This pilot study aims to design a Chinese morphological analyzer that can predict the syntactic and semantic properties of nominal, verbal, and adjectival compounds. The morphological structures of compound words contain information essential to determining their syntactic and semantic characteristics. In particular, morphological ...
Search with synonyms is a challenging problem for Web search, as it can easily cause intent drifting. In this paper, we propose a practical solution to this issue, based on co-clicked query analysis, i.e., analyzing queries leading to clicking the same documents. Evaluation results on Web search queries show that synonyms obtained from this approach ...
User clicks on a URL in response to a query are extremely useful predictors of the URL’s relevance to that query. Exact-match click features tend to suffer from severe data sparsity in Web ranking. Such sparsity is particularly pronounced for new URLs or long queries, where each distinct query-URL pair rarely occurs. To remedy this, we present a ...