Employing EM and Pool-Based Active Learning for Text Classification

Andrew McCallum and Kamal Nigam
This paper shows how a text classifier’s need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of active learning to use the unlabeled pool for explicitly estimating document density when selecting examples for labeling. Then active learning is combined with Expectation-Maximization (EM) in order to “fill in” the class labels of those documents that remain unlabeled. Experimental results show that…
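The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which classifier, disagreement measure, or density estimate is used, so the multinomial naive Bayes model, the KL-divergence-to-the-mean disagreement score, and every function name below (`train_nb`, `posterior`, `em_fill_in`, `density_weighted_disagreement`) are assumptions made for illustration. Documents are plain lists of tokens, and the density value is supplied by the caller.

```python
import math
from collections import Counter

def train_nb(docs, soft_labels, classes, vocab):
    """Fit multinomial naive Bayes from fractional (soft) class memberships."""
    prior = {c: 1.0 for c in classes}                       # Laplace smoothing
    word = {c: {w: 1.0 for w in vocab} for c in classes}
    for doc, weights in zip(docs, soft_labels):
        counts = Counter(doc)
        for c in classes:
            prior[c] += weights[c]
            for w, n in counts.items():
                word[c][w] += weights[c] * n
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        z = sum(word[c].values())
        log_word[c] = {w: math.log(v / z) for w, v in word[c].items()}
    return log_prior, log_word

def posterior(model, doc, classes):
    """Return P(class | doc) as a dict; every token must be in the vocabulary."""
    log_prior, log_word = model
    scores = {c: log_prior[c] + sum(log_word[c][w] for w in doc) for c in classes}
    top = max(scores.values())                              # stabilise the exponentials
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(expd.values())
    return {c: v / z for c, v in expd.items()}

def em_fill_in(labeled, unlabeled, classes, iterations=5):
    """EM: probabilistically 'fill in' labels of unlabeled docs, then retrain."""
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    l_docs = [d for d, _ in labeled]
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(l_docs, hard, classes, vocab)
    for _ in range(iterations):
        soft = [posterior(model, d, classes) for d in unlabeled]            # E-step
        model = train_nb(l_docs + unlabeled, hard + soft, classes, vocab)   # M-step
    return model

def density_weighted_disagreement(committee_posteriors, density):
    """Mean KL divergence of each member to the committee mean, times density.

    `density` is an assumed, caller-supplied estimate of how typical the
    document is of the unlabeled pool.
    """
    classes = list(committee_posteriors[0])
    k = len(committee_posteriors)
    mean = {c: sum(p[c] for p in committee_posteriors) / k for c in classes}
    kl = sum(
        sum(p[c] * math.log(p[c] / mean[c]) for c in classes if p[c] > 0)
        for p in committee_posteriors
    ) / k
    return density * kl
```

In a pool-based loop under these assumptions, one would score each unlabeled document with `density_weighted_disagreement`, request a label for the highest-scoring one, and then rerun `em_fill_in` on the enlarged labeled set and the remaining pool.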
This paper has 873 citations.
