Learn More
Parsing is one of the important processes for natural language processing and, in general, a large-scale CFG is used to parse a wide variety of sentences. For many languages, a CFG is derived from a large-scale syntactically annotated corpus, and many parsing algorithms using CFGs have been proposed. However, we could not apply them to Japanese since a(More)
Automatic query expansion has been known to be the most important method in overcoming the word mismatch problem in information retrieval. Thesauri have long been used by many researchers as a tool for query expansion. However only one type of thesaurus has generally been used. In this paper we analyze the characteristics of dierent thesaurus types and(More)
This paper proposes an efficient example sampling method for example-based word sense disam-biguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead(More)
Analyzing compound nouns is one of the crucial issues for natural language processing systems , in particular for those systems that aim at a wide coverage of domaius. In this paper, we propose a mcthod to analyze structures of Japanese compound nouns by using both word collocations statistics and a thesaurus. An experiment is conducted with 160,000 word(More)
Text categorization is the classification of documents with respect to a set of prede-fined categories. In this paper, we propose a new probabilistic model for text catego-rization, that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages; 1) it considers(More)
SUMMARY In this study we introduce AdjScales, a method for scaling similar adjectives by their strength. It combines existing Web-based computational linguistic techniques in order to automatically differentiate between similar adjectives that describe the same property by strength. Though this kind of information is rarely present in most of the lexical(More)
Linguistic knowledge plays a crucial role in natural language processing. Constructing large linguistic knowledge bases requires a lot of human effort and much cost. There have been many attempts to construct linguistic knowledge automatically , based on two primary strategies: knowledge extraction from annotated corpora and the augmentation of existing(More)
Text categorization can be viewed as a process of category search, in which one or more categories for a test document are searched for by using given training documents with known categories. In this paper a cluster-based search with a probabilistic clustering algorithm is proposed and evaluated on two data sets. The eciency, eectiveness, and noise(More)