Crowdsourced PAC Learning under Classification Noise
@inproceedings{Heinecke2019CrowdsourcedPL, title={Crowdsourced PAC Learning under Classification Noise}, author={Shelby Heinecke and L. Reyzin}, booktitle={AAAI Conference on Human Computation \& Crowdsourcing}, year={2019} }
In this paper, we analyze PAC learnability from labels produced by crowdsourcing. In our setting, unlabeled examples are drawn from a distribution and labels are crowdsourced from workers who operate under classification noise, each with their own noise parameter. We develop an end-to-end crowdsourced PAC learning algorithm that takes unlabeled data points as input and outputs a trained classifier. Our three-step algorithm incorporates majority voting, pure-exploration bandits, and noisy-PAC…
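The majority-voting ingredient mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's algorithm: labels are binary, each worker flips the true label independently with their own noise rate, and the names `noisy_label`, `majority_vote`, `crowd_label`, and `estimate_error` are hypothetical helpers introduced here. The bandit and noisy-PAC steps are not modeled.

```python
import random


def noisy_label(true_label, noise_rate, rng):
    """Return true_label, flipped with probability noise_rate (classification noise)."""
    return 1 - true_label if rng.random() < noise_rate else true_label


def majority_vote(labels):
    """Return the majority among binary labels; ties are broken in favor of 1."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def crowd_label(true_label, noise_rates, rng):
    """Ask one (simulated) worker per noise rate and aggregate by majority vote."""
    votes = [noisy_label(true_label, eta, rng) for eta in noise_rates]
    return majority_vote(votes)


def estimate_error(noise_rates, trials=2000, seed=0):
    """Empirical fraction of examples whose aggregated label is wrong."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        y = rng.randint(0, 1)  # hypothetical ground-truth label
        if crowd_label(y, noise_rates, rng) != y:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    # Assumed noise rates, all below 1/2; more workers should lower the error.
    print(estimate_error([0.3]))
    print(estimate_error([0.3] * 15))
```

With every noise rate below 1/2, adding workers should drive the aggregated error down, which is the intuition behind using majority voting before passing labels to a learner.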
10 Citations
Ground truthing from multi-rater labeling with three-way decision and possibility theory
- Computer Science
- Inf. Sci.
- 2021
Toward a Perspectivist Turn in Ground Truthing for Predictive Computing
- Computer Science
- ArXiv
- 2021
This article describes and advocates for a different paradigm, called data perspectivism, which moves away from traditional gold standard datasets toward the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of ML processes.
AutoFR: Automated Filter Rule Generation for Adblocking
- Computer Science
- ArXiv
- 2022
This work designs an algorithm based on multi-arm bandits to generate filter rules while controlling the trade-off between blocking ads and avoiding breakage, and introduces AutoFR, a reinforcement learning framework to fully automate the process of filter rule creation and evaluation.
Three-Way Decision for Handling Uncertainty in Machine Learning: A Narrative Review
- Computer Science
- IJCRS
- 2020
A narrative review of the state of the art of applications of three-way decision (TWD) is presented, organized by the areas of concern identified by the framework, highlighting both the strengths of the three-way methodology and the opportunities for further research.
A Smartphone Crowdsensing System Enabling Environmental Crowdsourcing for Municipality Resource Allocation with LSTM Stochastic Prediction
- Computer Science
- Sensors
- 2020
A smartphone crowdsensing system is proposed that relies on citizens’ reactions, as human sensors at the edge of a municipality infrastructure, to supplement malfunctions by exploiting environmental crowdsourcing location-allocation capabilities.
A smartphone-enabled crowdsensing and crowdsourcing system for predicting municipality resource allocation stochastic requirements
- Computer Science
- PCI
- 2020
An inference engine model is incorporated, based on a Long Short-Term Memory (LSTM) neural network that learns stochastically from examples of incidence occurrence and can predict a similar event in the future, thus efficiently allocating a municipality department resource before the problem emerges.
As if sand were stone. New concepts and metrics to probe the ground on which to build trustable AI
- Computer Science
- BMC Medical Informatics and Decision Making
- 2020
Different dimensions and related metrics to assess the quality of the datasets used to build predictive models and Medical Artificial Intelligence (MAI) are proposed, and it is argued that these metrics are feasible for application in real-world settings for the continuous development of trustable and interpretable MAI systems.
The Elephant in the Machine: Proposing a New Metric of Data Reliability and its Application to a Medical Case to Assess Classification Reliability
- Computer Science
- 2020
A novel reliability metric is presented to quantify the extent to which a ground truth, generated in multi-rater settings, is a reliable basis for the training and validation of machine learning predictive models.
Theoretical guarantee for crowdsourcing learning with unsure option
- Computer Science
- Pattern Recognition
- 2023
References
Efficient PAC Learning from the Crowd
- Computer Science
- COLT
- 2017
It is shown that all good labelers can be identified when at least the majority of labelers are good, and that any F that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses.
Cost-Saving Effect of Crowdsourcing Learning
- Computer Science
- IJCAI
- 2016
This paper theoretically studies the cost-saving effect of crowdsourcing learning, and presents an upper bound for the minimally-sufficient number of crowd labels for effective crowdsourcing learning.
Error Rate Analysis of Labeling by Crowdsourcing
- Computer Science
- 2013
Finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules for the Dawid-Skene (and Symmetric Dawid-Skene) crowdsourcing model are provided.
Crowdsourcing label quality: a theoretical analysis
- Biology
- Science China Information Sciences
- 2015
The quality of labels inferred from crowd workers by majority voting is theoretically studied and an analysis of label quality shows that the label error rate decreases exponentially with the number of workers selected for each task.
Bandit-Based Task Assignment for Heterogeneous Crowdsourcing
- Computer Science, Business
- Neural Computation
- 2015
This letter proposes a contextual bandit formulation for task assignment in heterogeneous crowdsourcing that is able to deal with the exploration-exploitation trade-off in worker selection and theoretically investigates the regret bounds and demonstrates its practical usefulness experimentally.
Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing
- Computer Science
- ICML
- 2014
A new PAC algorithm is proposed that, with probability at least 1 - δ, identifies a set of K arms with regret at most ε. The sample complexity bound of the algorithm is provided and a lower bound is established, which demonstrates the superior performance of the proposed algorithm.
Majority Voting and Pairing with Multiple Noisy Labeling
- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2019
This paper proposes strategies of utilizing multiple labels for supervised learning, based on two basic ideas: majority voting and pairing, and shows that pairing with Beta estimation always performs well under different certainty levels.
Multi-Armed Bandit Algorithms for Crowdsourcing Systems with Online Estimation of Workers' Ability
- Computer Science
- AAMAS
- 2018
This work develops a notion of Limited-information Crowdsourcing Systems (LCS), where the task master can assign the work based on some knowledge of the workers' ability acquired over time, and uses the simplified bounded KUBE (B-KUBE) algorithm as a solution.
On the Cost Complexity of Crowdsourcing
- Computer Science
- IJCAI
- 2018
The theorems provide a general theoretical method to model the trade-off between costs and quality, which can be used to evaluate and design crowdsourcing algorithms, and characterize the complexity of crowdsourcing problems.
Learning From Noisy Examples
- Computer Science
- Machine Learning
- 2005
This paper shows that when the teacher may make independent random errors in classifying the example data, the strategy of selecting the most consistent rule for the sample is sufficient, and usually requires a feasibly small number of examples, provided noise affects less than half the examples on average.