Inductive learning algorithms and representations for text categorization

@inproceedings{Dumais1998InductiveLA,
  title={Inductive learning algorithms and representations for text categorization},
  author={Susan T. Dumais and John C. Platt and David Heckerman and Mehran Sahami},
  booktitle={CIKM '98},
  year={1998}
}
1. ABSTRACT Text categorization – the assignment of natural language texts to one or more predefined categories based on their content – is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate…
Feature Preparation in Text Categorization
Text categorization is an important application of machine learning to the field of document information retrieval. Most machine learning methods treat text documents as feature vectors. We report…
On feature distributional clustering for text categorization
TLDR
This work describes a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier that yields high performance text classification that can outperform other recent methods in terms of categorization accuracy and representation efficiency.
Support vector machines for text categorization
TLDR
This paper compares artificial neural network and support vector machine algorithms for use as text classifiers of news items and identifies a reduction in feature set that provides improved results.
Combining machine learning and hierarchical structures for text categorization
TLDR
This dissertation focuses on the use of hierarchical classification structures, such as the UMLS Metathesaurus or the Yahoo! hierarchy of topics, to build and train machine learning algorithms for text categorization using a variation of the Hierarchical Mixtures of Experts (HME) model adapted for text categorization.
Ranking and selecting terms for text categorization via SVM discriminate boundary
TLDR
An SVM-based feature ranking and selecting method for text categorization is proposed that achieves higher classification performance than existing feature selection based on LSI and χ² statistics values.
Machine learning in automated text categorization
TLDR
This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Experiments with multi-label text classifier on the Reuters collection
Text categorization is the classification to assign a text document to an appropriate category in a predefined set of categories. We present an approach on hierarchical text categorization that is a…
A statistical learning model of text classification for support vector machines
TLDR
This model explains why and when SVMs perform well for text classification and connects the statistical properties of text-classification tasks with the generalization performance of an SVM in a quantitative way.
A Novel Term Weighting Scheme for Automated Text Categorization
  • H. Xu, Chunping Li
  • Computer Science
  • Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007)
  • 2007
TLDR
This paper presents a new term weighting scheme that considers more information provided by the term distribution among different categories and shows that it is more effective than three other popular schemes.
Machine Learning in Automated Text Categorization: a Bibliography
ATC is the activity of automatically building, by means of machine learning techniques, automatic text classifiers, i.e. systems capable of assigning to a text document one or more thematic…

References

A comparison of two learning algorithms for text categorization
TLDR
It is shown that both algorithms achieve reasonable performance and allow controlled tradeoffs between false positives and false negatives, and the stepwise feature selection in the decision tree algorithm is particularly effective in dealing with the large feature sets common in text categorization.
Automated learning of decision rules for text categorization
TLDR
It is shown that machine-generated decision rules appear comparable to human performance, while using the identical rule-based representation, and compared with other machine-learning techniques.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are…
An evaluation of phrasal and clustered representations on a text categorization task
TLDR
It is shown that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for a syntactic phrase indexing than for word-based indexing.
Expert network: effective and efficient learning from human decisions in text categorization and retrieval
TLDR
The simplicity of the model, the high recall-precision rates, and the efficient computation together make ExpNet preferable as a practical solution for real-world applications.
An example-based mapping method for text categorization and retrieval
TLDR
It is evident that the LLSF approach uses the relevance information effectively within human decisions of categorization and retrieval, and achieves a semantic mapping of free texts to their representations in an indexing language.
Training algorithms for linear text classifiers
TLDR
This work proposes that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers for IR tasks, and theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms.
Context-sensitive learning methods for text categorization
TLDR
RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods, and this result is viewed as a confirmation of the usefulness of classifiers that represent contextual information.
A comparison of event models for naive bayes text classification
TLDR
It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
Context-sensitive learning methods for text categorization
Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts for phrases, are evaluated on a number of large text categorization problems. These algorithms both construct classi...