Text Classification from Labeled and Unlabeled Documents using EM

  • K. Nigam, A. McCallum, S. Thrun, Tom Michael Mitchell
  • Published 2004
  • Computer Science
  • Machine Learning
  • This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive…
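
The abstract is cut off before it describes the algorithm in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the truncated classifier is a multinomial naive Bayes model with Laplace smoothing: train on the labeled documents, then alternate between probabilistically labeling the unlabeled documents (E-step) and retraining on all documents (M-step). The function names (`train_nb`, `posterior`, `em_naive_bayes`) and the word-count input format are hypothetical choices made for illustration.

```python
import math

def train_nb(docs, weights, n_classes, vocab_size):
    """M-step: Laplace-smoothed multinomial naive Bayes estimates.

    docs    -- list of word-count vectors, each of length vocab_size
    weights -- per-document class memberships (hard 0/1 or soft probabilities)
    """
    prior = [1.0] * n_classes                             # add-one smoothing on classes
    word = [[1.0] * vocab_size for _ in range(n_classes)] # add-one smoothing on words
    for counts, w in zip(docs, weights):
        for c in range(n_classes):
            prior[c] += w[c]
            for t, n in enumerate(counts):
                word[c][t] += w[c] * n
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_word = []
    for row in word:
        row_total = sum(row)
        log_word.append([math.log(x / row_total) for x in row])
    return log_prior, log_word

def posterior(counts, log_prior, log_word):
    """E-step for one document: P(class | document) under the current model."""
    scores = [lp + sum(n * lw for n, lw in zip(counts, row))
              for lp, row in zip(log_prior, log_word)]
    top = max(scores)                                     # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def em_naive_bayes(labeled, labels, unlabeled, n_classes, n_iter=10):
    """Assumed scheme: initialize from labeled data, then iterate E and M steps."""
    vocab_size = len(labeled[0])
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    params = train_nb(labeled, hard, n_classes, vocab_size)
    for _ in range(n_iter):
        soft = [posterior(d, *params) for d in unlabeled]           # E-step
        params = train_nb(labeled + unlabeled, hard + soft,
                          n_classes, vocab_size)                    # M-step
    return params

# Toy data (made up): words 0-1 tend to co-occur, as do words 2-3.
labeled = [[2, 0, 0, 0], [0, 0, 2, 0]]
labels = [0, 1]
unlabeled = [[1, 2, 0, 0], [1, 3, 0, 0], [0, 0, 1, 2], [0, 0, 1, 3]]
params = em_naive_bayes(labeled, labels, unlabeled, n_classes=2)
print(posterior([0, 3, 0, 0], *params))
```

In this toy setup, word 1 never appears in a labeled document, so a classifier trained on the labeled data alone cannot separate the classes for a document containing only that word. The unlabeled documents show word 1 co-occurring with word 0, which lets the EM iterations associate it with class 0.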
    2,977 Citations

    Semi-supervised text classification from unlabeled documents using class associated words
    • 10
    Text classification from positive and unlabeled documents
    • 67
    Semi-supervised Text Classification Using Partitioned EM
    • 23
    • Highly Influenced
    A model for handling approximate, noisy or incomplete labeling in text classification
    • 20
    Text Classification by Labeling Words
    • 198
    Automatic Text Classification from Labeled and Unlabeled Data
    • 8
    Combining Labeled and Unlabeled Data for MultiClass Text Categorization
    • 128
    • Highly Influenced


    Employing EM and Pool-Based Active Learning for Text Classification
    • 797
    Employing EM in Pool-based Active Learning for Text Classification
    • Kamal Nigam
    • 1998
    • 183
    • Highly Influential
    Combining labeled and unlabeled data with co-training
    • 5,044
    Expert network: effective and efficient learning from human decisions in text categorization and retrieval
    • 484
    Improving Text Classification by Shrinkage in a Hierarchy of Classes
    • 624
    Committee-Based Sampling For Training Probabilistic Classifiers
    • 449
    Active Learning with Committees for Text Categorization
    • 186
    A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data
    • 328
    A comparison of event models for naive bayes text classification
    • 3,587
    Context-sensitive learning methods for text categorization
    • 571