Building text classifiers using positive and unlabeled examples
@article{Liu2003BuildingTC, title={Building text classifiers using positive and unlabeled examples}, author={B. Liu and Yang Dai and Xiaoli Li and Wee Sun Lee and Philip S. Yu}, journal={Third IEEE International Conference on Data Mining}, year={2003}, pages={179-186} }
We study the problem of building text classifiers using positive and unlabeled examples. Existing techniques are based on the same idea, which builds a classifier in two steps, and each technique uses a different method for each step. We first introduce some new methods for the two steps and perform a comprehensive evaluation of all possible combinations of methods for the two steps. We then propose a more principled approach to solving the problem based on a biased formulation of SVM, and…
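As a rough illustration of the biased SVM idea (not the authors' implementation), the sketch below treats every unlabeled document as a provisional negative and penalizes errors on known positives more heavily than errors on the noisy "negatives" via per-class weights; the TF-IDF features, the scikit-learn estimator, and the penalty values c_pos and c_neg are illustrative assumptions, and in practice the penalty ratio would be tuned on held-out data.

```python
# Minimal biased-SVM sketch (illustrative assumptions throughout, not the paper's code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def train_biased_svm(pos_docs, unl_docs, c_pos=10.0, c_neg=1.0):
    """Treat unlabeled docs as provisional negatives, with asymmetric error penalties."""
    docs = list(pos_docs) + list(unl_docs)
    labels = [1] * len(pos_docs) + [0] * len(unl_docs)
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(docs)
    # class_weight scales the misclassification cost per class (c_pos >> c_neg keeps
    # the classifier from sacrificing known positives to fit the noisy "negatives").
    classifier = LinearSVC(C=1.0, class_weight={1: c_pos, 0: c_neg})
    classifier.fit(features, labels)
    return vectorizer, classifier
```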
675 Citations
Building High-Performance Classifiers Using Positive and Unlabeled Examples for Text Classification
- Computer Science, ISNN
- 2012
An improved iterative classification approach is proposed as an extension of Biased-SVM, and it is shown to be effective for text classification and to outperform Biased-SVM and other two-step techniques.
A Novel Reliable Negative Method Based on Clustering for Learning from Positive and Unlabeled Examples
- Computer Science, AIRS
- 2008
A novel method for the first step is proposed, which clusters the unlabeled and positive examples to identify reliable negative documents and then runs SVM iteratively; experiments show that it is efficient and effective.
Semi-Supervised Text Classification Using Positive and Unlabeled Data
- Computer Science, AMT
- 2006
This method combines graph-based semi-supervised learning with the two-step method for solving the PU-learning problem, and the results indicate that the improved method performs well when the size of the positive set P is small.
Tri-Training Based Learning from Positive and Unlabeled Data
- Computer Science, 2008 International Symposiums on Information Processing
- 2008
A new tri-training algorithm for the LPU problem is proposed that combines step 1 of three LPU algorithms to extract a set of reliable negative examples, which is used to build an initial classifier for tri-training and to replace the bootstrap sampling procedure, which has not been regarded as a good method.
An Evaluation of Two-Step Techniques for Positive-Unlabeled Learning in Text Classification
- Computer Science
- 2014
Five combinations of techniques for the two-step approach to the positive-unlabeled (PU) learning problem are evaluated, and the experiments find that using the Rocchio method in step 1 and the Expectation-Maximization method in step 2 is the most effective combination.
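For readers unfamiliar with that combination, a minimal sketch of a Rocchio-style step 1 follows; the TF-IDF representation, the alpha/beta weights, and the prototype comparison rule are standard textbook choices assumed here, not details taken from the evaluated systems. Step 2 would then train a naive Bayes classifier on P and the returned reliable negatives and refine it with EM over the remaining unlabeled documents.

```python
# Rocchio-style reliable-negative extraction for step 1 (illustrative sketch only).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def rocchio_reliable_negatives(pos_docs, unl_docs, alpha=16.0, beta=4.0):
    vectorizer = TfidfVectorizer(stop_words="english")   # rows are L2-normalized by default
    X = vectorizer.fit_transform(list(pos_docs) + list(unl_docs))
    P, U = X[:len(pos_docs)], X[len(pos_docs):]
    p_mean = np.asarray(P.mean(axis=0)).ravel()
    u_mean = np.asarray(U.mean(axis=0)).ravel()
    # Rocchio prototypes for the positive class and for the (mostly negative) unlabeled pool.
    c_pos = alpha * p_mean - beta * u_mean
    c_unl = alpha * u_mean - beta * p_mean
    c_pos /= np.linalg.norm(c_pos) or 1.0
    c_unl /= np.linalg.norm(c_unl) or 1.0
    # An unlabeled document closer to the unlabeled prototype is kept as a reliable negative.
    sim_pos, sim_unl = U @ c_pos, U @ c_unl
    return np.flatnonzero(sim_unl > sim_pos)
```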
A More Accurate Text Classifier for Positive and Unlabeled data
- Computer Science
- 2005
Comprehensive experiments demonstrate that the proposed CoTrain-Active approach is superior to Biased-SVM, which is regarded as the previous best method, and is especially suitable for situations where the given positive dataset P is extremely small.
Reliable Negative Extracting Based on kNN for Learning from Positive and Unlabeled Examples
- Computer Science, J. Comput.
- 2009
A new reliable-negative extraction algorithm for step 1 is proposed that adopts the kNN algorithm to rank unlabeled examples by their similarity to their k nearest positive examples, and sets a threshold so that unlabeled examples whose similarity falls below it are labeled as reliable negatives, rather than following the common method of labeling positive examples.
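A hedged sketch of the scoring idea described above, under assumed details: cosine similarity over TF-IDF vectors, the mean of the k largest similarities as the score, and an arbitrary threshold value; the paper's exact similarity measure and threshold selection may differ.

```python
# kNN-based reliable-negative selection (illustrative sketch; parameters are assumptions).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def knn_reliable_negatives(pos_docs, unl_docs, k=5, threshold=0.1):
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(list(pos_docs) + list(unl_docs))
    P, U = X[:len(pos_docs)], X[len(pos_docs):]
    sims = cosine_similarity(U, P)            # similarity of each unlabeled doc to each positive doc
    k = min(k, P.shape[0])
    top_k = np.sort(sims, axis=1)[:, -k:]     # k most similar positive documents per unlabeled doc
    scores = top_k.mean(axis=1)
    # Unlabeled documents whose score falls below the threshold become reliable negatives.
    return np.flatnonzero(scores < threshold)
```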
Building text classifiers using positive, unlabeled and ‘outdated’ examples
- Computer Science, Concurr. Comput. Pract. Exp.
- 2016
The results show that the proposed method Transfer‐1DNF can extract more reliable negative examples with lower error rates, and the classifier outperforms the baseline algorithms.
Co-EM Support Vector Machine Based Text Classification from Positive and Unlabeled Examples
- Computer Science, 2008 First International Conference on Intelligent Networks and Intelligent Systems
- 2008
This paper introduces a novel method based on multi-view algorithms for learning from positive and unlabeled examples (LPU), using the co-EM SVM algorithm, which was previously used for semi-supervised learning.
A New PU Learning Algorithm for Text Classification
- Computer Science, MICAI
- 2005
This paper adopts the traditional two-step approach, making use of both positive and unlabeled examples, and improves the 1-DNF algorithm so that it identifies many more reliable negative documents with a very low error rate.
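For context, the 1-DNF idea referenced above can be sketched roughly as follows: collect "positive features" that occur more frequently in the positive set than in the unlabeled set, then treat unlabeled documents containing none of them as reliable negatives. The binary bag-of-words representation and the document-frequency comparison below are assumptions for illustration, not the improved variant described in the paper.

```python
# 1-DNF-style reliable-negative extraction (rough illustrative sketch).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def one_dnf_reliable_negatives(pos_docs, unl_docs):
    vectorizer = CountVectorizer(stop_words="english", binary=True)
    X = vectorizer.fit_transform(list(pos_docs) + list(unl_docs))
    P, U = X[:len(pos_docs)], X[len(pos_docs):]
    # Positive features: terms whose document frequency is higher in P than in U.
    df_pos = np.asarray(P.mean(axis=0)).ravel()
    df_unl = np.asarray(U.mean(axis=0)).ravel()
    positive_features = np.flatnonzero(df_pos > df_unl)
    # Unlabeled documents containing no positive feature are taken as reliable negatives.
    hits = np.asarray(U[:, positive_features].sum(axis=1)).ravel()
    return np.flatnonzero(hits == 0)
```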
References
SHOWING 1-10 OF 47 REFERENCES
Partially Supervised Classification of Text Documents
- Computer Science, ICML
- 2002
This paper studies the problem of identifying documents of a particular topic or class from a set P of documents of that class and a large set M of mixed documents, shows that it can be posed as a constrained optimization problem, and shows that under appropriate conditions, solutions to the constrained optimization problem give good solutions to the partially supervised classification problem.
Text Classification from Labeled and Unlabeled Documents using EM
- Computer Science, Machine Learning
- 2004
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.
Combining labeled and unlabeled data with co-training
- Computer Science, COLT '98
- 1998
A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
Combining Labeled and Unlabeled Data for MultiClass Text Categorization
- Computer Science, ICML
- 2002
This paper develops a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems.
A sequential algorithm for training text classifiers
- Computer Science, SIGIR '94
- 1994
An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task and reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
The Value of Unlabeled Data for Classification Problems
- Computer Science, ICML 2000
- 2000
It is demonstrated that Fisher information matrices can be used to judge the asymptotic value of unlabeled data, and this methodology is applied to both passive partially supervised learning and active learning.
One-Class SVMs for Document Classification
- Computer Science, J. Mach. Learn. Res.
- 2001
The SVM approach as represented by Schoelkopf was superior to all the methods except the neural network one, to which it was essentially comparable, although occasionally worse.
Enhancing Supervised Learning with Unlabeled Data
- Computer Science, ICML
- 2000
A new semi-supervised learning method called co-learning is presented, designed to use unlabeled data to enhance standard supervised learning algorithms by leveraging the fact that different algorithms have different representations of their hypotheses and are likely to detect different patterns in the labeled data.
A re-examination of text categorization methods
- Computer Science, SIGIR '99
- 1999
The results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when the categories contain over 300 instances.