Text Classification from Labeled and Unlabeled Documents using EM

@article{Nigam2004TextCF,
  title={Text Classification from Labeled and Unlabeled Documents using EM},
  author={Kamal Nigam and Andrew McCallum and Sebastian Thrun and Tom Michael Mitchell},
  journal={Machine Learning},
  year={2000},
  volume={39},
  pages={103--134}
}
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier.
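The loop the abstract describes is compact enough to sketch: initialize naive Bayes on the labeled documents alone, then alternate between probabilistically labeling the unlabeled pool (E-step) and refitting the classifier on everything (M-step). The following is a minimal illustration, not the authors' implementation (it omits refinements such as down-weighting the unlabeled data); it assumes scikit-learn, dense word-count matrices, and integer labels 0..K-1 that all appear in the labeled set.

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_l, y_l, X_u, n_classes, n_iter=10):
    """Semi-supervised naive Bayes via EM (simplified sketch).
    X_l, X_u: dense word-count matrices; y_l: labels 0..n_classes-1,
    all classes present, so predict_proba columns align with arange."""
    clf = MultinomialNB()
    clf.fit(X_l, y_l)                     # initialize from labeled docs only
    n_u = X_u.shape[0]
    for _ in range(n_iter):
        # E-step: class posteriors for every unlabeled document
        p = clf.predict_proba(X_u)        # shape (n_u, n_classes)
        # M-step: refit on labeled docs (weight 1) plus each unlabeled doc
        # entered once per class, weighted by its class posterior
        X_all = np.vstack([X_l] + [X_u] * n_classes)
        y_all = np.concatenate([y_l, np.repeat(np.arange(n_classes), n_u)])
        w_all = np.concatenate([np.ones(len(y_l)), p.T.reshape(-1)])
        clf = MultinomialNB()
        clf.fit(X_all, y_all, sample_weight=w_all)
    return clf

Entering each unlabeled document once per class with its posterior as a sample weight is the soft-assignment view of EM for a mixture of multinomials; a hard variant would instead assign each document only to its argmax class.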
A Two Step Data Mining Approach for Amharic Text Classification
This paper implements an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and two classifiers: Naive Bayes (NB) and locally weighted learning (LWL).
Semi-supervised text classification from unlabeled documents using class associated words
A learning algorithm based on the combination of Expectation-Maximization and a Naïve Bayes classifier is introduced to classify fully unlabeled documents using class-associated words, and shows good classification capability with high accuracy.
Text classification from positive and unlabeled documents
This paper explores an efficient extension of the standard Support Vector Machine approach, called SVMC (Support Vector Mapping Convergence), for the TC-WON tasks, and shows that when the positive training data is not too under-sampled, SVMC significantly outperforms other methods.
Semi-supervised Text Classification Using Partitioned EM
This paper proposes a clustering-based partitioning technique that first partitions the training documents in a hierarchical fashion using hard clustering, and prunes the tree using the labeled data after running the expectation-maximization algorithm in each partition.
Active Learning with Labeled and Unlabeled Documents in Text Categorization
In many real-world learning problems, preparing labeled examples for training is very expensive. In this paper, a method for designing a classifier is suggested: at first, an initial naïve Bayesian…
Classification from Positive and Unlabeled Documents
Most existing studies of text classification assume that the training data are completely labeled. In reality, however, many information retrieval problems can be more accurately described as…
A model for handling approximate, noisy or incomplete labeling in text classification
A Bayesian model, BayesANIL, is capable of estimating uncertainties associated with the labeling process, and provides an intuitive modification to the EM iterations by re-estimating the empirical…
Text Classification by Labeling Words
This paper proposes a method that combines clustering and feature selection to label a set of representative words for each class, and can effectively rank the words in the unlabeled set according to their importance.
Automatic Text Classification from Labeled and Unlabeled Data
This chapter presents a semi-supervised text classification framework that is based on the radial basis function (RBF) neural networks and can learn for classification effectively from a very small quantity of labeled training samples and a large pool of additional unlabeled documents.
Combining Labeled and Unlabeled Data for Multiclass Text Categorization
This paper develops a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems.

References

Showing 1-10 of 78 references
Employing EM and Pool-Based Active Learning for Text Classification
This paper shows how a text classifier’s need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method…
Employing EM in Pool-Based Active Learning for Text Classification
This paper shows how a text classifier's need for labeled training data can be reduced by a combination of active learning and Expectation-Maximization (EM) on a pool of unlabeled data.
Combining labeled and unlabeled data with co-training
A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.
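The co-training loop this summary refers to can also be sketched briefly: two classifiers, one per view, each label the unlabeled examples they are most confident about, and those labels grow the shared training set. This is an illustrative simplification (the original algorithm draws from a small random pool and adds fixed numbers of positive and negative examples per round); the function name and the naive Bayes base learners are assumptions, not the paper's code.

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_training(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, per_round=5):
    """Simplified co-training over two views (dense count matrices)."""
    L1, L2, y = X1_l, X2_l, y_l
    unlabeled = np.arange(X1_u.shape[0])      # indices still unlabeled
    for _ in range(rounds):
        c1 = MultinomialNB().fit(L1, y)       # classifier for view 1
        c2 = MultinomialNB().fit(L2, y)       # classifier for view 2
        for clf, X_view in ((c1, X1_u), (c2, X2_u)):
            if unlabeled.size == 0:
                break
            proba = clf.predict_proba(X_view[unlabeled])
            top = np.argsort(-proba.max(axis=1))[:per_round]
            pick = unlabeled[top]             # most confident examples
            # both views' labeled matrices grow with the newly labeled docs
            L1 = np.vstack([L1, X1_u[pick]])
            L2 = np.vstack([L2, X2_u[pick]])
            y = np.concatenate([y, clf.predict(X_view[pick])])
            unlabeled = np.setdiff1d(unlabeled, pick)
    return c1, c2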
Expert network: effective and efficient learning from human decisions in text categorization and retrieval
The simplicity of the model, the high recall-precision rates, and the efficient computation together make ExpNet preferable as a practical solution for real-world applications.
Improving Text Classification by Shrinkage in a Hierarchy of Classes
This paper shows that the accuracy of a naive Bayes text classifier can be improved by taking advantage of a hierarchy of classes, and adopts an established statistical technique called shrinkage that smooths the parameter estimates of a data-sparse child with those of its parent in order to obtain more robust parameter estimates.
Committee-Based Sampling For Training Probabilistic Classifiers
A general method is presented for efficiently training probabilistic classifiers by selecting for training only the more informative examples in a stream of unlabeled examples; the method is particularly attractive because it evaluates the expected information gain from a training example implicitly.
Active Learning with Committees for Text Categorization
This paper reports on experiments using a committee of Winnow-based learners and demonstrates that this approach can reduce the number of labeled training examples required, relative to a single Winnow learner, by 1-2 orders of magnitude.
A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data
A classifier structure and learning algorithm are presented that make effective use of unlabelled data to improve performance; the structure is a "mixture of experts" equivalent to the radial basis function (RBF) classifier but, unlike RBFs, is amenable to likelihood-based training.
A comparison of event models for naive Bayes text classification
It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
Context-sensitive learning methods for text categorization
RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods; these results are viewed as confirmation of the usefulness of classifiers that represent contextual information.