Structured labeling for facilitating concept evolution in machine learning

@article{Kulesza2014StructuredLF,
  title={Structured labeling for facilitating concept evolution in machine learning},
  author={Todd Kulesza and Saleema Amershi and Rich Caruana and Danyel Fisher and Denis Xavier Charles},
  journal={Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
  year={2014}
}
Labeling data is a seemingly simple task required for training many machine learning systems, but is actually fraught with problems. This paper introduces the notion of concept evolution, the changing nature of a person's underlying concept (the abstract notion of the target class a person is labeling for, e.g., spam email, travel related web pages) which can result in inconsistent labels and thus be detrimental to machine learning. We introduce two structured labeling solutions, a novel… 

Evidence Humans Provide When Explaining Data-Labeling Decisions
TLDR
In a user study and a data experiment, it was found that some concepts could be partially defined through their relationship to frequently co-occurring concepts, rather than only through labeling.
The Exploratory Labeling Assistant: Mixed-Initiative Label Curation with Large Document Collections
TLDR
This paper proposes an interactive visual data analysis method that integrates human-driven label ideation, specification and refinement with machine-driven recommendations and uses unsupervised machine learning methods that provide suggestions and data summaries.
Label-and-Learn: Visualizing the Likelihood of Machine Learning Classifier's Success During Data Labeling
TLDR
Through a Label-and-Learn interface, this paper explores visualization strategies that leverage the data labeling task to enhance developers' knowledge about their dataset, including the likely success of the classifiers and the rationale behind the classifier's decisions.
OneLabeler: A Flexible System for Building Data Labeling Tools
TLDR
A conceptual framework for data labeling is proposed, and OneLabeler is built on that framework to support easy building of labeling tools for diverse usage scenarios; the expressiveness and utility of the system are demonstrated through ten example labeling tools built with OneLabeler.
Increasing the Speed and Accuracy of Data Labeling Through an AI Assisted Interface
TLDR
This work designed an AI labeling assistant that uses a semi-supervised learning algorithm to predict the most probable labels for each example and leverages these predictions to provide assistance in two ways: providing a label recommendation and reducing the labeler’s decision space by focusing their attention on only the most probable labels.
Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets
TLDR
Revolt eliminates the burden of creating detailed label guidelines by harnessing crowd disagreements to identify ambiguous concepts and create rich structures (groups of semantically related items) for post-hoc label decisions.
Local Decision Pitfalls in Interactive Machine Learning
TLDR
This work characterizes the utility of interactive feature selection through a combination of human-subjects experiments and computational simulations and finds that, in expectation, interactive modification fails to improve model performance and may hamper generalization due to overfitting.
Interactive Naming for Explaining Deep Neural Networks: A Formative Study
TLDR
A user interface for "interactive naming," which allows a human annotator to manually cluster significant activation maps in a test set into meaningful groups called "visual concepts", is developed and found that a large fraction of the activation maps have recognizable visual concepts, and that there is significant agreement between the different annotators about their denotations.
AILA: Attentive Interactive Labeling Assistant for Document Classification through Attention-Based Deep Neural Networks
TLDR
Assessment of labeling efficiency and accuracy showed that participants' labeling efficiency was significantly higher under the condition with IAM than under the condition without IAM, while the two conditions maintained roughly the same labeling accuracy.
Putting the Scientist in the Loop -- Accelerating Scientific Progress with Interactive Machine Learning
TLDR
A typical scientist's data collection and processing workflow is analyzed and many problems facing practitioners when attempting to capitalize on advances in machine learning and pattern recognition are highlighted.

References

Get another label? improving data quality and data mining using multiple, noisy labelers
TLDR
The results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
Learning in the Presence of Concept Drift and Hidden Contexts
TLDR
A family of learning algorithms that flexibly react to concept drift and can take advantage of situations where contexts reappear is described, including a heuristic that constantly monitors the system's behavior.
The problem of concept drift: definitions and related work
TLDR
This paper considers different types of concept drift, peculiarities of the problem, and gives a critical review of existing approaches to the problem.
Assisting Users with Clustering Tasks by Combining Metric Learning and Classification
TLDR
This work has developed a hybrid mechanism for combining the metric learner and the classifier, and presents results from a large number of trials based on human clusterings, in which it is shown that the combination scheme matches and often exceeds the performance of a method which exclusively uses either type of learner.
Learning to Tag using Noisy Labels
TLDR
This work investigates a method for training tagging algorithms using a reduced set of labels corresponding to topics derived from the tags, and shows that this method is comparable, in terms of annotation and retrieval performance, to the method of using tags directly as labels.
Learning to Tag from Open Vocabulary Labels
TLDR
This work presents a new approach that organizes these noisy tags into well-behaved semantic classes using topic modeling, and learns to predict tags accurately using a mixture of topic classes, and achieves comparable performance for classification and superior performance for retrieval.
A hybrid user model for news story classification
TLDR
An intelligent agent is designed to compile a daily news program for individual users, motivating a multi-strategy machine learning approach that induces user models consisting of separate models for long-term and short-term interests.
Learning consensus opinion: mining data from a labeling game
TLDR
A novel approach to collecting the individual user preferences over image-search results is presented: this work uses a collaborative game in which players are rewarded for agreeing on which image result is best for a query, which amounts to about 18 million expressed preferences between pairs.
Power to the People: The Role of Humans in Interactive Machine Learning
TLDR
It is argued that the design process for interactive machine learning systems should involve users at all stages: explorations that reveal human interaction patterns and inspire novel interaction methods, as well as refinement stages to tune details of the interface and choose among alternatives.
Identifying Mislabeled Training Data
TLDR
This paper uses a set of learning algorithms to create classifiers that serve as noise filters for the training data and suggests that for situations in which there is a paucity of data, consensus filters are preferred, whereas majority vote filters are preferable for situations with an abundance of data.