Corpus ID: 18959397

Text Classification and Distributional features techniques in Datamining and Warehousing

Srikanth Bethu, G. Babu, J. Vinoda, E. Priyadarshini, M. R. Rao
Text categorization is traditionally done using term frequency and inverse document frequency. This method is unreliable because unimportant words may appear often in a document: their term frequency rises, and the document may be assigned to the wrong category. Distributional features are introduced to reduce such misclassification. In the distributional features, the distribution of the words in…
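As a rough illustration of the contrast the abstract draws, the sketch below computes plain TF-IDF weights and one position-based feature. The function names `tf_idf` and `first_appearance` are hypothetical, and the first-appearance position is only an assumed example of a distributional feature; the truncated abstract does not say which features the authors use.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times log inverse document frequency.
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def first_appearance(doc, term):
    """Relative position (0.0 = start) of the first occurrence of term.

    Assumed example of a distributional feature; returns None if absent.
    """
    if term not in doc:
        return None
    return doc.index(term) / len(doc)


docs = [["price", "stock", "price"], ["stock", "game"]]
weights = tf_idf(docs)
position = first_appearance(["a", "b", "c", "b"], "b")
```

TF-IDF depends only on how often a term occurs, so a frequent but incidental word can still receive a high weight. A feature such as `first_appearance` adds information about where in the document the word occurs, which frequency alone cannot capture.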


Distributional Features for Text Categorization
In this paper, distributional features, which express the distribution of a word in a document, are used to describe the word; they require only a little additional cost, while categorization performance can be significantly improved.
Distributional Word Clusters vs. Words for Text Categorization
An approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier with a word-cluster representation is studied, which significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency.
An Improved k-Nearest Neighbor Algorithm for Text Categorization
An improved kNN algorithm is proposed, which uses different numbers of nearest neighbors for different categories, rather than a fixed number across all categories, and is promising for some cases, where estimating the parameter k via cross-validation is not allowed.
A Framework of Feature Selection Methods for Text Categorization
A theoretical framework of feature selection (FS) methods based on two basic measurements, frequency measurement and ratio measurement, is proposed, along with a novel method called weighed frequency and odds (WFO) that combines the two measurements with trained weights.
Dimension Reduction in Text Classification with Support Vector Machines
Novel dimension reduction methods are adopted to dramatically reduce the dimension of the document vectors, and decision functions for the centroid-based classification algorithm and support vector classifiers are introduced to handle the classification problem where a document may belong to multiple classes.
A Loss Function Analysis for Classification Methods in Text Categorization
This paper presents a formal analysis of popular text classification methods, focusing on their loss functions whose minimization is essential to the optimization of those methods, and whose…
An Improved KNN Text Classification Algorithm Based on Clustering
The simulation results show that the algorithm proposed in this paper can not only effectively reduce the actual number of training samples and lower the calculation complexity, but also improve the accuracy of KNN text classification algorithm.
Text Mining Application Programming
Developers will be able to tap into the bevy of information available online in ways they never thought possible and students will have a thorough understanding of the theory and practical application of text mining.
Text Mining Application Programming, Cengage Learning, 2008
“Distributional Features for Text Categorization,” Proc. 17th European Conf. Machine Learning (ECML 06), pp. 497-508, 2006