Leveraging Crowdsourcing Data for Deep Active Learning An Application: Learning Intents in Alexa

Jie Yang, Thomas Drake, Andreas C. Damianou, Yoelle Maarek
Proceedings of the 2018 World Wide Web Conference
This paper presents a generic Bayesian framework that enables any deep learning model to actively learn from targeted crowds. Our framework builds on recent advances in Bayesian deep learning and extends existing work by considering the targeted crowdsourcing approach, where multiple annotators with unknown expertise contribute an uncontrolled (and often limited) number of annotations. Our framework leverages the low-rank structure in annotations to learn individual annotator expertise, which…


A review and experimental analysis of active learning over crowdsourced data

This paper provides a comprehensive and systematic survey of the recent research on active learning in the hybrid human–machine classification setting, where crowd workers contribute labels to either directly classify data instances or to train machine learning models.

End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models

This paper shows how label aggregation can benefit from estimating the annotators' confusion matrices to improve the learning process, and shows that relabeling only 10% of the data with expert labels results in over 90% classification accuracy with an SVM.

Scalpel-CD: Leveraging Crowdsourcing and Deep Probabilistic Modeling for Debugging Noisy Training Data

Scalpel-CD is a first-of-its-kind system that leverages both human and machine intelligence to debug noisy labels from the training data of machine learning systems and is able to improve label quality with only 2.8% instances inspected by the crowd.

ActiveLink: Deep Active Learning for Link Prediction in Knowledge Graphs

A novel deep active learning framework, ActiveLink, is proposed that can be applied to actively train any neural link predictor. Inspired by recent advances in Bayesian deep learning, it takes a Bayesian view of neural link predictors, thereby enabling uncertainty sampling for deep active learning.

Machine learning from crowds: A systematic review of its applications

This work has analyzed many applications of machine learning using crowdsourced data following a systematic methodology, classifying them into different fields of study, highlighting several of their characteristics and showing the recent interest in the use of crowdsourcing for machine learning.

LABNET: A Collaborative Method for DNN Training and Label Aggregation

It is argued that training a DNN and aggregating labels are not two separate tasks, and LABNET, an iterative two-step method that connects data features, noisy labels, and aggregated labels, is proposed.

Online Label Aggregation: A Variational Bayesian Approach

This paper proposes a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method, designs a novel stochastic optimization scheme for incremental training, and derives the convergence bound of the proposed optimizer.

CrowdRL: An End-to-End Reinforcement Learning Framework for Data Labelling

CrowdRL is the first RL framework designed for the data labelling workflow. It seamlessly integrates task selection, task assignment, and truth inference, and fully utilizes the power of heterogeneous annotators (experts and crowdsourcing workers) together with machine learning models to infer the truth, which greatly improves the quality of data labelling.

Bayesian Ensembles of Crowds and Deep Learners for Sequence Tagging

This work develops a modular Bayesian method for aggregating sequence labels from multiple annotators and evaluates different models of annotator errors and labeling biases, showing that the sequential annotator model outperforms previous methods.



Active Learning from Crowds

A probabilistic model for learning from multiple annotators is employed; it can also learn annotator expertise even when the annotators are not consistently accurate across the task domain.

Active Learning from Crowds with Unsure Option

This paper allows the annotators to express that they are unsure about the assigned data instances, and proposes the ALCU-SVM algorithm, which achieves very promising performance on simulated and real crowdsourcing data.

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

A probabilistic framework is developed to make sensible decisions about candidate links and to identify unreliable human workers, improving the quality of the links while limiting the amount of work performed by the crowd.

Active Learning with Amazon Mechanical Turk

The utility of active learning in crowdsourcing is evaluated on two tasks, named entity recognition and sentiment detection, and it is shown that active learning outperforms random selection of annotation examples in a noisy crowdsourcing scenario.

Modeling annotator expertise: Learning when everybody knows a bit of something

This paper develops a probabilistic approach to learning from multiple annotators when the annotators may not only be unreliable but may also vary in expertise depending on the data they observe; the approach provides clear advantages over previously introduced multi-annotator methods.

Learning from crowds in the presence of schools of thought

This work presents a statistical model to estimate worker reliability and task clarity without resorting to the single gold standard assumption, instantiated by explicitly characterizing the grouping behavior to form schools of thought with a rank-1 factorization of a worker-task groupsize matrix.

On Quality Control and Machine Learning in Crowdsourcing

This paper considers two particular aspects of crowdsourcing and their interplay, data quality control (QC) and ML, reflecting on where crowdsourcing has been, where it is, and where it might go from here.

Deep Bayesian Active Learning with Image Data

This paper develops an active learning framework for high dimensional data, a task which has been extremely challenging so far, with very sparse existing literature, and demonstrates its active learning techniques with image data, obtaining a significant improvement on existing active learning approaches.

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.

Active Learning Literature Survey

This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.