Pool-based Active Learning for Text Classification

  • Kamal Nigam, Andrew McCallum
  • Published 1998

Abstract

This paper shows how a text classifier's need for labeled training documents can be reduced by employing a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of active learning to use the unlabeled pool by explicitly estimating document density when selecting examples for labeling. Then active learning is combined with Expectation-Maximization in order to "fill in" the class labels of those documents that remain unlabeled. Experimental results show that the improvements to active learning reduce the need for labelings by one-third over previous QBC approaches, and that the combination of EM and active learning requires only slightly more than half as many labeled training examples to achieve the same accuracy as either EM or active learning alone.
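The density-weighted selection step described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the disagreement or density measures, so the sketch assumes committee disagreement is scored as the mean KL divergence of each member's class distribution to the committee average, and density is approximated by average cosine similarity to the rest of the pool. All function names (`kl_to_mean`, `density`, `select_query`) are hypothetical.

```python
import math

def kl_to_mean(dists):
    """Assumed disagreement score: mean KL divergence of each committee
    member's class distribution to the committee's average distribution."""
    k, n = len(dists), len(dists[0])
    mean = [sum(d[c] for d in dists) / k for c in range(n)]
    total = 0.0
    for d in dists:
        for p, m in zip(d, mean):
            if p > 0:  # 0 * log(0) is treated as 0
                total += p * math.log(p / m)
    return total / k

def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density(i, pool):
    """Assumed density estimate: average similarity of document i
    to every other document in the unlabeled pool."""
    sims = [cosine(pool[i], x) for j, x in enumerate(pool) if j != i]
    return sum(sims) / len(sims) if sims else 0.0

def select_query(pool, committee_predictions):
    """Return the index of the pool document with the highest
    density-weighted committee disagreement."""
    scores = [density(i, pool) * kl_to_mean(committee_predictions[i])
              for i in range(len(pool))]
    return max(range(len(pool)), key=scores.__getitem__)

# Toy pool: documents 0-2 are near-duplicates, document 3 is an outlier.
pool = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
# Two committee members, two classes; members disagree on documents 0 and 3.
committee_predictions = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.1, 0.9]],
]
chosen = select_query(pool, committee_predictions)
```

In this toy pool the committee disagrees equally on documents 0 and 3, but document 0 lies in a dense region while document 3 is an outlier, so the density weighting favors document 0. The EM step that fills in labels for the remaining unlabeled documents is not shown here.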

Cite this paper

@inproceedings{Nigamy1998PoolbasedAL, title={Pool-based Active Learning for Text Classification}, author={Kamal Nigam and Andrew McCallum}, year={1998} }