• Corpus ID: 189928209

Active Learning by Greedy Split and Label Exploration

Alyssa Herbst and Bert Huang
Annotating large unlabeled datasets can be a major bottleneck for machine learning applications. We introduce a scheme for inferring labels of unlabeled data at a fraction of the cost of labeling the entire dataset. We refer to the scheme as greedy split and label exploration (GSAL). GSAL greedily queries an oracle (or human labeler) and partitions a dataset to find data subsets that have mostly the same label. GSAL can then infer labels by majority vote of the known labels in each subset. GSAL… 
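
The abstract describes GSAL only at a high level. What follows is a minimal hypothetical sketch of its query, split, and majority-vote loop, not the authors' algorithm. Several details are assumptions made for illustration: the splitting rule (a median cut on the feature with the largest spread), the per-subset query budget `queries_per_subset`, the `purity` threshold, the depth limit, and the name `gsal_sketch`.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def _spread(X: Sequence[Sequence[float]], idx: List[int], j: int) -> float:
    """Sum of squared deviations of feature j over the points in idx."""
    vals = [X[i][j] for i in idx]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def gsal_sketch(
    X: Sequence[Sequence[float]],
    oracle: Callable[[int], int],
    queries_per_subset: int = 3,
    purity: float = 1.0,
    max_depth: int = 5,
    seed: int = 0,
) -> List[int]:
    """Hypothetical GSAL-style labeling (not the authors' code).

    Queries a few points per subset, splits subsets whose queried labels
    are not pure enough, and labels each final subset by majority vote.
    """
    if not X:
        return []
    if queries_per_subset < 1:
        raise ValueError("queries_per_subset must be at least 1")
    rng = random.Random(seed)
    known = {}                      # index -> label returned by the oracle
    labels = [0] * len(X)
    stack = [(list(range(len(X))), 0)]
    while stack:
        idx, depth = stack.pop()
        # Spend part of the budget on not-yet-queried points in this subset.
        fresh = [i for i in idx if i not in known]
        for i in rng.sample(fresh, min(queries_per_subset, len(fresh))):
            known[i] = oracle(i)
        votes = Counter(known[i] for i in idx if i in known)
        top_label, top_count = votes.most_common(1)[0]
        is_pure = top_count / sum(votes.values()) >= purity

        left: List[int] = []
        right: List[int] = []
        if not is_pure and depth < max_depth and len(idx) > 1:
            # Assumed split rule: median cut on the highest-spread feature.
            j = max(range(len(X[0])), key=lambda f: _spread(X, idx, f))
            cut = sorted(X[i][j] for i in idx)[len(idx) // 2]
            left = [i for i in idx if X[i][j] < cut]
            right = [i for i in idx if X[i][j] >= cut]

        if left and right:
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
        else:
            # Leaf: infer unqueried labels by majority vote, keep oracle labels.
            for i in idx:
                labels[i] = known.get(i, top_label)
    return labels
```

With an oracle that always returns the same label, the root subset comes back pure after three queries. Every remaining point is then labeled by vote, which is where the savings over labeling the whole dataset would come from.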

Multi-class active learning for image classification

An uncertainty measure is proposed that generalizes margin-based uncertainty to the multi-class case and is easy to compute, so that active learning can handle a large number of classes and large data sizes efficiently.
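
One common way to generalize margin-based uncertainty to many classes is the gap between the two largest predicted class probabilities, often called best-versus-second-best. This measure is an assumption here, not something taken from the summary above. The sketch below assumes each row is a probability vector over at least two classes, and its function names are hypothetical:

```python
from typing import List, Sequence


def bvsb_margin(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest class probabilities.

    A small gap means the classifier is torn between two classes.
    """
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]


def select_most_uncertain(prob_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k samples with the smallest margins."""
    order = sorted(range(len(prob_rows)), key=lambda i: bvsb_margin(prob_rows[i]))
    return order[:k]
```

The measure needs only one sort per sample, so its cost grows slowly with the number of classes. This is consistent with the summary's emphasis on cheap computation.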

An Analysis of Active Learning Strategies for Sequence Labeling Tasks

This paper surveys previously used query selection strategies for sequence models, proposes several novel algorithms to address their shortcomings, and conducts a large-scale empirical comparison.
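
As a concrete example of one query selection strategy for sequence models, the sketch below scores each unlabeled sequence by its average per-token label entropy and queries the highest-scoring one. It assumes that per-token marginal label distributions are already available, for instance from forward-backward in a linear-chain CRF. The function names are hypothetical, and this is only one of the kinds of strategies such a comparison covers.

```python
import math
from typing import Sequence


def token_entropy(marginals: Sequence[Sequence[float]], normalize: bool = True) -> float:
    """Sum of per-token label entropies, divided by length if normalize is set."""
    total = 0.0
    for dist in marginals:
        total -= sum(p * math.log(p) for p in dist if p > 0)
    return total / len(marginals) if normalize else total


def pick_query(pool: Sequence[Sequence[Sequence[float]]]) -> int:
    """Index of the unlabeled sequence with the highest token entropy."""
    return max(range(len(pool)), key=lambda i: token_entropy(pool[i]))
```

Normalizing by length keeps the score from simply favoring long sequences. The unnormalized total is the alternative when longer sequences should count for more.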

Data Programming: Creating Large Training Sets, Quickly

A paradigm for the programmatic creation of training sets called data programming is proposed in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict.
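
To make labeling functions concrete, here is a toy sketch with three hypothetical sentiment heuristics, each of which either votes or abstains. It is a deliberate simplification. Data programming learns a generative model of labeling-function accuracies and correlations to resolve noise and conflicts, whereas this sketch uses an unweighted majority vote, returning no label on ties, purely for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

ABSTAIN = None  # a labeling function returns this when it has no opinion
LabelingFunction = Callable[[str], Optional[int]]


def lf_positive_word(text: str) -> Optional[int]:
    return 1 if "great" in text.lower() else ABSTAIN


def lf_negative_word(text: str) -> Optional[int]:
    return 0 if "awful" in text.lower() else ABSTAIN


def lf_exclamation(text: str) -> Optional[int]:
    return 1 if text.rstrip().endswith("!") else ABSTAIN


def majority_vote(votes: Sequence[Optional[int]]) -> Optional[int]:
    """Combine noisy, possibly conflicting votes; abstain on ties."""
    cast = [v for v in votes if v is not ABSTAIN]
    if not cast:
        return ABSTAIN
    ranked = Counter(cast).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_dataset(data: Sequence[str], lfs: Sequence[LabelingFunction]) -> List[Optional[int]]:
    """Apply every labeling function to every item and combine the votes."""
    return [majority_vote([lf(x) for lf in lfs]) for x in data]
```

For example, "Awful but great!" receives conflicting votes (0 from one function, 1 from two others). The vote resolves it to 1, while a sentence that no function covers stays unlabeled.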

Adversarial Label Learning

This work proposes adversarial label learning, a weakly supervised method that trains classifiers to perform well against an adversary that chooses labels for the training data. It minimizes an upper bound on the classifier's error rate using projected primal-dual subgradient descent.

Confidence-based active learning

  • Mingkun Li, I. Sethi
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2006
This paper proposes confidence-based active learning, a new approach based on identifying and annotating uncertain samples. It takes advantage of the probability-preserving and ordering properties of current classifiers and is robust without additional computational effort.

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.

Interactive Structure Learning with Structural Query-by-Committee

This work presents a generalization of the query-by-committee active learning algorithm to the interactive structure learning setting. It studies the algorithm's consistency and rate of convergence, both theoretically and empirically, with and without noise.
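
The standard query-by-committee idea that the structural version generalizes can be sketched as follows. Keep a committee of hypotheses and query the point on which their votes disagree most, measured here by vote entropy. This is plain (non-structural) QBC on a hypothetical committee of 1-D threshold classifiers, not the paper's structural algorithm.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Entropy of the committee's vote distribution (0 means unanimous)."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def qbc_select(pool: Sequence[float], committee: List[Callable[[float], int]]) -> int:
    """Index of the pool point with the highest committee disagreement."""
    return max(range(len(pool)),
               key=lambda i: vote_entropy([h(pool[i]) for h in committee]))


# Hypothetical committee: threshold classifiers still consistent with the data.
committee = [lambda x, t=t: int(x >= t) for t in (0.3, 0.5, 0.7)]
```

On the pool `[0.1, 0.4, 0.9]` the committee is unanimous at the two ends and split at 0.4, so that middle point is the one queried.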

Importance weighted active learning

This work presents a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions that uses importance weighting to correct sampling bias, and is able to give rigorous label complexity bounds for the learning process.
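
The importance-weighting mechanism can be sketched in a few lines. Each point is queried with some probability p, and a queried point's loss is weighted by 1/p. The weighted sum over queried points, divided by n, is then an unbiased estimate of the average loss over all n points. The paper's rule for choosing p is not reproduced here: `query_prob` is a hypothetical caller-supplied function, and the other names are likewise illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple


def iwal_sample(
    xs: Sequence[float],
    query_prob: Callable[[float], float],
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Query each point with probability p; return (index, weight=1/p) pairs."""
    rng = random.Random(seed)
    queried = []
    for i, x in enumerate(xs):
        p = query_prob(x)
        if not 0.0 < p <= 1.0:
            raise ValueError("query probability must lie in (0, 1]")
        if rng.random() < p:  # flip a p-biased coin
            queried.append((i, 1.0 / p))
    return queried


def weighted_error(
    queried: Sequence[Tuple[int, float]],
    xs: Sequence[float],
    labels: Sequence[int],
    predict: Callable[[float], int],
) -> float:
    """Importance-weighted estimate of the 0-1 error over all len(xs) points."""
    total = sum(w * (predict(xs[i]) != labels[i]) for i, w in queried)
    return total / len(xs)
```

Because the weights undo the sampling bias, estimates computed this way can be used to compare hypotheses even when query probabilities vary from point to point.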

Active Learning with Statistical Models

This work shows how the principles of statistically optimal data selection, previously applied to feedforward neural networks, may also be used to select data for two alternative, statistically based learning architectures: mixtures of Gaussians and locally weighted regression.

Hierarchical sampling for active learning

This work presents an active learning scheme that exploits cluster structure in data.