Learn More
One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intellegent information processing. Generally, text representation inludes two tasks: indexing and weighting. This paper has comparatively studied TFÃIDF, LSI and multi-word for text representation. We used a Chinese and an English document(More)
One of the deficiencies of mutual information is its poor capacity to measure association of words with unsymmetrical co-occurrence, which has large amounts for multi-word expression in texts. Moreover, threshold setting, which is decisive for success of practical implementation of mutual information for multi-word extraction, brings about many parameters(More)
In this paper, we proposed a new approach using ontology to improve precision of terminology extraction from documents. Firstly, a linguistic method was used to extract the terminological patterns from documents. Then, similarity measures within the framework of ontology were employed to rank the semantic dependency of the noun words in a pattern. Finally,(More)
As a hybrid of N-gram in natural language processing and collocation in statistical linguistics , multi-word is becoming a hot topic in area of text mining and information retrieval. In this paper, a study concerning distribution of multi-words is carried out to explore a theoretical basis for probabilistic term-weighting scheme. Specifically, the Poisson(More)
—Text representation, which is a fundamental and necessary process for text-based intelligent information processing, includes the tasks of determining the index terms for documents and producing the numeric vectors corresponding to the documents. In this paper, multi-word, which is regarded as containing more contextual semantics than individual word and(More)