Crowdsourced PAC Learning under Classification Noise

@inproceedings{Heinecke2019CrowdsourcedPL,
  title={Crowdsourced PAC Learning under Classification Noise},
  author={Shelby Heinecke and Lev Reyzin},
  booktitle={AAAI Conference on Human Computation \& Crowdsourcing},
  year={2019}
}
In this paper, we analyze PAC learnability from labels produced by crowdsourcing. In our setting, unlabeled examples are drawn from a distribution and labels are crowdsourced from workers who operate under classification noise, each with their own noise parameter. We develop an end-to-end crowdsourced PAC learning algorithm that takes unlabeled data points as input and outputs a trained classifier. Our three-step algorithm incorporates majority voting, pure-exploration bandits, and noisy-PAC… 
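The abstract is truncated above, but a toy end-to-end sketch of the three named ingredients might look as follows; the agreement-based worker scoring and the final aggregation are illustrative stand-ins, not the paper's actual procedure:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Toy setup: n unlabeled points get binary labels from m noisy workers,
# worker j flipping the true label with her own rate eta_j < 1/2.
n, m = 200, 7
true_labels = rng.integers(0, 2, size=n)
noise_rates = rng.uniform(0.05, 0.45, size=m)
crowd = np.array([[y if rng.random() > eta else 1 - y
                   for eta in noise_rates] for y in true_labels])

# Step 1 (majority voting): a provisional consensus label per example.
consensus = np.array([Counter(row).most_common(1)[0][0] for row in crowd])

# Step 2 (stand-in for pure-exploration bandits): score workers by
# agreement with the consensus and keep only the most reliable ones.
agreement = (crowd == consensus[:, None]).mean(axis=0)
good_workers = np.argsort(agreement)[-3:]

# Step 3 (stand-in for the noisy-PAC learner): aggregate the good
# workers' labels; a real pipeline would train a noise-tolerant
# classifier on these still-noisy labels.
final = np.array([Counter(row[good_workers]).most_common(1)[0][0]
                  for row in crowd])
print("label error after pipeline:", np.mean(final != true_labels))
```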

Toward a Perspectivist Turn in Ground Truthing for Predictive Computing

This article describes and advocates for a different paradigm, called data perspectivism, which moves away from traditional gold standard datasets toward the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of ML processes.

AutoFR: Automated Filter Rule Generation for Adblocking

This work designs an algorithm based on multi-armed bandits to generate filter rules while controlling the trade-off between blocking ads and avoiding site breakage, and introduces AutoFR, a reinforcement learning framework that fully automates the creation and evaluation of filter rules.
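As a rough illustration of the bandit framing only (not AutoFR's actual algorithm, action space, or reward; the rules and probabilities below are made up), a UCB1 loop over candidate rules could look like:

```python
import math, random

random.seed(0)

# Hypothetical candidate rules with unknown blocking/breakage behavior.
rules = ["||ads.example.com^", "##.banner", "||tracker.example.net^"]
p_block = [0.9, 0.6, 0.3]   # unknown P(rule blocks an ad) on a visit
p_break = [0.4, 0.1, 0.0]   # unknown P(rule breaks the page) on a visit
w = 0.7                     # trade-off weight: blocking vs. breakage

counts = [0] * len(rules)
totals = [0.0] * len(rules)

def reward(i):
    """Simulate one site visit with rule i applied; higher is better."""
    blocked = random.random() < p_block[i]
    broke = random.random() < p_break[i]
    return w * blocked - (1 - w) * broke

# UCB1: try each rule once, then always pull the best upper confidence bound.
for t in range(1, 501):
    if t <= len(rules):
        i = t - 1
    else:
        i = max(range(len(rules)),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]))
    counts[i] += 1
    totals[i] += reward(i)

best = max(range(len(rules)), key=lambda a: totals[a] / counts[a])
print("selected rule:", rules[best])
```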

Three-Way Decision for Handling Uncertainty in Machine Learning: A Narrative Review

A narrative review is presented of the state of the art in applications of three-way decision (TWD) across the areas of concern identified by the framework, highlighting both the strengths of the three-way methodology and the opportunities for further research.

A Smartphone Crowdsensing System Enabling Environmental Crowdsourcing for Municipality Resource Allocation with LSTM Stochastic Prediction

A smartphone crowdsensing system is proposed that uses citizens' reactions as human sensors at the edge of a municipal infrastructure, exploiting environmental crowdsourcing location-allocation capabilities to compensate for infrastructure malfunctions.

A smartphone-enabled crowdsensing and crowdsourcing system for predicting municipality resource allocation stochastic requirements

An inference engine model based on a Long Short-Term Memory (LSTM) neural network is incorporated to learn stochastic examples of incident occurrence; it can predict similar future events and thereby allocate a municipal department's resources efficiently before the problem emerges.
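A minimal sketch of such an inference engine, assuming PyTorch and an invented univariate "incidents per day" series; the paper's actual architecture, features, and data are not specified here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Invented data: a noisy weekly-seasonal count of reported incidents.
t = torch.arange(0, 200, dtype=torch.float32)
series = 10 + 5 * torch.sin(2 * torch.pi * t / 7) + torch.randn(200)

# Sliding windows: predict the next value from the previous 14 days.
win = 14
X = torch.stack([series[i:i + win] for i in range(len(series) - win)]).unsqueeze(-1)
y = series[win:].unsqueeze(-1)

class IncidentLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)            # out: (batch, win, hidden)
        return self.head(out[:, -1, :])  # predict from the last time step

model = IncidentLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print("final MSE:", loss.item())
```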

As if sand were stone. New concepts and metrics to probe the ground on which to build trustable AI

Different dimensions and related metrics are proposed for assessing the quality of the datasets used to build predictive models and Medical Artificial Intelligence (MAI), and it is argued that the proposed metrics are feasible to apply in real-world settings for the continuous development of trustable and interpretable MAI systems.

The Elephant in the Machine: Proposing a New Metric of Data Reliability and its Application to a Medical Case to Assess Classification Reliability

A novel reliability metric is presented that quantifies the extent to which a ground truth generated in multi-rater settings is a reliable basis for training and validating machine learning predictive models.

References


Efficient PAC Learning from the Crowd

It is shown that all good labelers can be identified when at least a majority of the labelers are good, and that any class F that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses.

Cost-Saving Effect of Crowdsourcing Learning

This paper theoretically studies the cost-saving effect of crowdsourcing learning and presents an upper bound on the minimally sufficient number of crowd labels for effective crowdsourcing learning.

Error Rate Analysis of Labeling by Crowdsourcing

Finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules are provided for the Dawid-Skene (and symmetric Dawid-Skene) crowdsourcing model.

Crowdsourcing label quality: a theoretical analysis

The quality of labels inferred from crowd workers by majority voting is studied theoretically, and the analysis shows that the label error rate decreases exponentially with the number of workers selected for each task.
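That exponential decay follows from a standard Hoeffding argument; a worked version under the simplifying assumption of independent workers with a common accuracy p > 1/2:

```latex
% Assume m independent workers, each correct with probability p > 1/2.
% The majority vote errs only if the fraction of correct workers falls
% below 1/2, i.e., deviates from its mean p by at least p - 1/2.
% Hoeffding's inequality then gives:
\Pr[\text{majority vote is wrong}] \le \exp\!\bigl(-2m\,(p - \tfrac{1}{2})^{2}\bigr)
% Example: p = 0.7, m = 30 gives exp(-2.4) ~ 0.09; doubling m to 60
% drives the bound down to exp(-4.8) ~ 0.008.
```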

Bandit-Based Task Assignment for Heterogeneous Crowdsourcing

This letter proposes a contextual bandit formulation for task assignment in heterogeneous crowdsourcing that handles the exploration-exploitation trade-off in worker selection; its regret bounds are investigated theoretically and its practical usefulness is demonstrated experimentally.

Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing

A new PAC algorithm is presented which, with probability at least 1 - δ, identifies a set of K arms with aggregate regret at most ε; a sample complexity bound for the algorithm is provided, a matching lower bound is established, and experiments demonstrate the superior performance of the proposed algorithm.
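A naive uniform-sampling sketch of PAC top-K identification (not the paper's optimal algorithm; the per-arm sample size comes from a simple Hoeffding calculation, and the Bernoulli arms and names are invented):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def pac_top_k(means, K, eps, delta):
    """Sample every arm enough that each empirical mean is within eps/2
    of its true mean w.h.p., then return the top-K empirical arms.
    Bernoulli arms; `means` would be hidden from a real caller."""
    n = len(means)
    # Hoeffding: t pulls per arm give |hat_mu - mu| <= eps/2
    # except with probability delta/n per arm.
    t = math.ceil((2 / eps**2) * math.log(2 * n / delta))
    emp = np.array([rng.binomial(t, mu) / t for mu in means])
    return np.argsort(emp)[-K:], t

arms = [0.9, 0.85, 0.8, 0.5, 0.4, 0.3]   # hypothetical worker accuracies
chosen, pulls = pac_top_k(arms, K=3, eps=0.1, delta=0.05)
print("chosen arms:", sorted(chosen.tolist()), "pulls per arm:", pulls)
```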

Majority Voting and Pairing with Multiple Noisy Labeling

This paper proposes strategies for utilizing multiple labels in supervised learning based on two basic ideas, majority voting and pairing, and shows that pairing with Beta estimation consistently performs well across different certainty levels.
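As a rough sketch of the Beta-estimation idea (the paper's exact pairing scheme is not reproduced here; this hypothetical snippet just summarizes each example's vote counts with a Beta posterior, yielding a soft label and a certainty estimate):

```python
from scipy.stats import beta

# Hypothetical crowd counts for three examples: (positive, negative) votes.
counts = [(5, 1), (3, 3), (1, 4)]

for pos, neg in counts:
    # Beta(1, 1) prior updated by the observed votes.
    post = beta(pos + 1, neg + 1)
    soft_label = post.mean()                  # posterior P(label = 1)
    # Probability mass on the majority side of 1/2.
    certainty = 1 - post.cdf(0.5) if soft_label >= 0.5 else post.cdf(0.5)
    print(f"votes {pos}+/{neg}-: soft label {soft_label:.2f}, "
          f"certainty {certainty:.2f}")
```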

Multi-Armed Bandit Algorithms for Crowdsourcing Systems with Online Estimation of Workers' Ability

This work develops the notion of Limited-information Crowdsourcing Systems (LCS), in which the task master assigns work based on knowledge of the workers' abilities acquired over time, and uses the simplified bounded KUBE (B-KUBE) algorithm as a solution.

On the Cost Complexity of Crowdsourcing

The theorems provide a general theoretical method for modeling the trade-off between cost and quality, which can be used to evaluate and design crowdsourcing algorithms and to characterize the complexity of crowdsourcing problems.

Learning From Noisy Examples

This paper shows that when the teacher may make independent random errors in classifying the example data, the strategy of selecting the rule most consistent with the sample suffices, and usually requires a feasibly small number of examples, provided noise affects less than half of the examples on average.
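For reference, the quantitative form of this guarantee (one standard statement of the Angluin-Laird bound for a finite class of N rules, noise rate η ≤ η_b < 1/2, accuracy ε, and confidence δ):

```latex
% Drawing m samples and outputting the rule with the fewest disagreements
% achieves error <= eps with probability >= 1 - delta once
m \;\ge\; \frac{2}{\varepsilon^{2}\,\left(1 - 2\eta_b\right)^{2}}\,\ln\frac{2N}{\delta}
% The (1 - 2 eta_b)^{-2} factor is the price of noise: as eta_b -> 1/2,
% labels carry vanishing signal and the required sample size blows up.
```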