Corpus ID: 195767214

The Practical Challenges of Active Learning: Lessons Learned from Live Experimentation

Authors: Jean-François Kagy, Tolga Kayadelen, Ji Ma, Afshin Rostamizadeh, Jana Strnadová
We tested, in a live setting, the use of active learning to select text sentences for human annotation, with the annotations used to train a Thai segmentation machine learning model. In our study, two concurrent annotated samples were constructed: one through random sampling of sentences from a text corpus, and the other through model-based scoring and ranking of sentences from the same corpus. In the course of the experiment, we observed the effect of significant changes to the learning environment which are…
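The two concurrent samples described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which score the model assigned, so the hypothetical `least_confidence` score (one minus the model's top predicted probability), the toy probabilities in `top_prob`, and all function names are assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence


def random_sample(corpus: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw k sentences uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(corpus), k)


def ranked_sample(corpus: Sequence[str], k: int,
                  score: Callable[[str], float]) -> List[str]:
    """Score every sentence with the model and keep the k highest-scoring ones."""
    return sorted(corpus, key=score, reverse=True)[:k]


# Hypothetical model output: probability of the most likely segmentation per sentence.
top_prob: Dict[str, float] = {"s1": 0.99, "s2": 0.55, "s3": 0.80, "s4": 0.62}


def least_confidence(sentence: str) -> float:
    """Hypothetical score: higher means the model is less sure of its segmentation."""
    return 1.0 - top_prob[sentence]


corpus = list(top_prob)
print(random_sample(corpus, 2))                     # baseline sample
print(ranked_sample(corpus, 2, least_confidence))  # -> ['s2', 's4']
```

Both samples draw from the same corpus, so any difference in downstream model quality can be attributed to the selection rule rather than to the data source.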


Power to the Oracle? Design Principles for Interactive Labeling Systems in Machine Learning

Based on a literature review, this work identifies five design principles for interactive labeling systems, offers a frame for detecting common ground in how the corresponding solutions are implemented, and aims to contribute design knowledge for the increasingly important class of interactive labeling systems.

Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification

This paper introduces a new component to the typical active learning (AL) framework: a data generator, which supplies artificial data to reduce the amount of user-labeled data that AL requires. It compares the extended AL framework with the original one, using similar acquisition functions and classifiers, across three AL-specific performance metrics on seven benchmark datasets.

Fluent: An AI Augmented Writing Tool for People who Stutter

This paper presents Fluent, an AI-augmented writing tool that helps people who stutter write scripts they can speak more fluently, which can be beneficial in important situations such as giving a talk or a presentation.

Yet Another Study on Active Learning and Human Pose Estimation

This paper proposes a practical active learning strategy, currently being tested in an industrial online environment, and presents an overview of the implemented strategy along with initial results.

Active Learning and the Total Cost of Annotation

This work explores one strategy for reducing annotation cost: using a model during annotation to automate some of the annotation decisions. It reports an 80% reduction in annotation cost compared with labeling randomly selected data with a single model.

Active Learning by Labeling Features

This paper proposes an active learning approach in which the machine solicits "labels" on features rather than instances, and shows that this method outperforms passive learning with features as well as traditional active learning with instances.

Active Learning Literature Survey

This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
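Uncertainty sampling is one of the query strategy frameworks that such a survey covers. The sketch below shows three common uncertainty measures computed from a model's predicted label distribution (least confidence, margin, and entropy) and how each picks a query from a small pool. The function names and the example pool are illustrative assumptions, not code from the report.

```python
import math
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """1 - P(most likely label): large when the top prediction is weak."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most likely labels; a SMALL margin means high uncertainty."""
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query(pool: List[Sequence[float]], strategy: str) -> int:
    """Return the index of the pool instance the strategy would send to the oracle."""
    if strategy == "margin":
        return min(range(len(pool)), key=lambda i: margin(pool[i]))
    measure = least_confidence if strategy == "least_confidence" else entropy
    return max(range(len(pool)), key=lambda i: measure(pool[i]))


pool = [[0.9, 0.05, 0.05], [0.5, 0.49, 0.01], [0.4, 0.3, 0.3]]
for s in ("least_confidence", "margin", "entropy"):
    print(s, query(pool, s))  # least_confidence 2, margin 1, entropy 2
```

The measures can disagree: margin focuses only on the top two labels, so it prefers the near-tie in the second instance, while least confidence and entropy prefer the flatter third distribution.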

Active Model Selection

This paper presents and analyzes a simplified version of the active learning task, "(budgeted) active model selection", which captures the pure exploration aspect of many active learning problems in a clean and simple problem formulation, and shows that it is NP-hard in general.
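To make the budgeted setting concrete: the learner has a fixed budget of probes to spend across candidate models, each with an unknown success rate, and once the budget is exhausted it must commit to one candidate. The sketch below assumes Bernoulli (coin-like) candidates with Beta(1, 1) priors and uses a simple round-robin spending policy; the candidates, the policy, and the name `round_robin_select` are illustrative assumptions and do not reproduce the paper's analysis or its hardness result.

```python
import random
from typing import List, Tuple


def round_robin_select(true_rates: List[float], budget: int,
                       seed: int = 0) -> Tuple[int, List[Tuple[int, int]]]:
    """Spend `budget` Bernoulli probes round-robin, then pick the best posterior mean.

    Candidate i succeeds on a probe with unknown probability true_rates[i]; the
    learner only observes probe outcomes. Under a Beta(1, 1) prior, candidate i's
    posterior mean success rate is (successes + 1) / (trials + 2).
    """
    rng = random.Random(seed)
    counts = [(0, 0) for _ in true_rates]  # (successes, trials) per candidate
    for t in range(budget):
        i = t % len(true_rates)  # round-robin: probe candidates in a fixed cycle
        successes, trials = counts[i]
        outcome = 1 if rng.random() < true_rates[i] else 0  # simulated probe
        counts[i] = (successes + outcome, trials + 1)
    posterior_means = [(s + 1) / (n + 2) for s, n in counts]
    best = max(range(len(true_rates)), key=lambda i: posterior_means[i])
    return best, counts


best, counts = round_robin_select([0.2, 0.5, 0.8], budget=30)
print(best, counts)
```

Round robin ignores what earlier probes revealed; an adaptive spending policy could instead concentrate the remaining budget on candidates that still look competitive.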

How well does active learning actually work? Time-based evaluation of cost-reduction strategies for language documentation.

The study finds that better example selection and label suggestions improve efficiency, but that their effectiveness depends strongly on annotator expertise; it stresses the need for cost-sensitive learning methods that adapt to annotators.

Reinforcement learning for active model selection

This paper considers applying reinforcement learning (RL) techniques to learn an effective spending policy, and shows that the performance of RL techniques is inferior to existing, simpler spending policies.

Active Learning with Model Selection

This work proposes an algorithm that actively samples data to simultaneously train a set of candidate models and select the best model from that set. It empirically demonstrates that the algorithm is nearly as effective as an active learning oracle that knows the optimal model in advance.

Efficiently learning the accuracy of labeling sources for selective sampling

This paper presents IEThresh (Interval Estimate Threshold), a strategy for selecting the expert(s) with the highest estimated labeling accuracy. On datasets such as the UCI mushroom dataset, IEThresh reaches a given level of accuracy with less than half the queries issued by all-experts labeling and less than a third of the queries required by random expert selection.
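A minimal sketch of the interval-estimate-threshold idea, as we read it: each expert earns a reward of 1 when its label agrees with the majority vote of the queried experts, and the experts kept for the next query are those whose upper confidence bound on mean reward is at least a fraction `eps` of the best bound. The normal-approximation interval, the optimistic bound for experts with fewer than two observations, the threshold `eps=0.8`, and all function names are assumptions; the paper's exact interval and update may differ.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def upper_interval(rewards: List[int], alpha: float = 0.05) -> float:
    """Upper end of a (1 - alpha) interval on an expert's mean reward.

    Uses a normal approximation (assumption). With fewer than two observations
    the bound is optimistic (1.0), so untested experts still get explored.
    """
    if len(rewards) < 2:
        return 1.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean(rewards) + z * stdev(rewards) / math.sqrt(len(rewards))


def select_experts(history: Dict[str, List[int]], eps: float = 0.8) -> List[str]:
    """Keep every expert whose upper interval is at least eps times the best one."""
    ui = {name: upper_interval(r) for name, r in history.items()}
    best = max(ui.values())
    return sorted(name for name, u in ui.items() if u >= eps * best)


def update(history: Dict[str, List[int]], labels: Dict[str, int]) -> None:
    """Reward each queried expert 1 if it agreed with the majority vote, else 0."""
    votes = list(labels.values())
    majority = max(set(votes), key=votes.count)
    for name, label in labels.items():
        history[name].append(1 if label == majority else 0)


history = {"a": [1] * 8, "b": [1, 0] * 4, "c": [0] * 7 + [1]}
print(select_experts(history))  # -> ['a', 'b']
```

Expert "b" survives despite a mean reward of 0.5 because its interval is still wide; raising `eps` (for example to 0.9) prunes it, trading exploration for fewer queries per label.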

Active Hidden Markov Models for Information Extraction

This paper considers the more challenging task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled documents, rather than fully labeled ones, are available for training, and describes an EM-style algorithm for learning HMMs from partially labeled data.

Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning

This paper proposes algorithms for integrating machine learning into crowd-sourced databases in order to combine the accuracy of human labeling with the speed and cost-effectiveness of machine learning classifiers, and designs the first active learning algorithms that meet all the requirements it identifies for this crowd-sourced setting.