Corpus ID: 10403524

Efficient PAC Learning from the Crowd

@inproceedings{Awasthi2017EfficientPL,
  title={Efficient PAC Learning from the Crowd},
  author={Pranjal Awasthi and Avrim Blum and Nika Haghtalab and Yishay Mansour},
  booktitle={COLT},
  year={2017}
}
In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in… 
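
For intuition, here is a minimal Python sketch of the decoupled pipeline the abstract describes: gather k redundant crowd labels per example, aggregate by majority vote, then hand the result to an off-the-shelf learner. The query_crowd and train interfaces are hypothetical placeholders, and this is the baseline the paper argues against, not its algorithm; the paper's point is that treating labeling and learning jointly avoids paying this k-fold redundancy on every example.

    from collections import Counter

    def majority_vote(labels):
        """Aggregate one example's crowd labels by simple majority."""
        return Counter(labels).most_common(1)[0][0]

    def decoupled_pipeline(examples, query_crowd, train, k=5):
        """Two-stage baseline: label everything first, then learn.

        query_crowd(x, k) -> list of k noisy labels (hypothetical).
        train(X, y)       -> fitted classifier (any PAC learner).
        """
        y = [majority_vote(query_crowd(x, k)) for x in examples]
        return train(examples, y)
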
Citations

Crowdsourced PAC Learning under Classification Noise
TLDR: This paper develops an end-to-end crowdsourced PAC learning algorithm that takes unlabeled data points as input and outputs a trained classifier, combining majority voting, pure-exploration bandits, and noisy-PAC learning.
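
As one concrete instance of the pure-exploration component mentioned above, the sketch below runs successive elimination to identify the most reliable arm (e.g., worker). This is a generic textbook routine, not necessarily the paper's exact procedure; pull is a hypothetical interface returning a reward in [0, 1].

    import math

    def successive_elimination(pull, n_arms, delta=0.05, max_rounds=10000):
        """Pure-exploration bandit: find the arm with the highest mean.

        pull(i) -> stochastic reward in [0, 1], e.g. 1 if worker i's
        label agrees with a reference vote (hypothetical interface).
        """
        active = set(range(n_arms))
        sums = [0.0] * n_arms
        for t in range(1, max_rounds + 1):
            for i in active:
                sums[i] += pull(i)
            # Hoeffding-style confidence radius after t pulls per arm.
            rad = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
            means = {i: sums[i] / t for i in active}
            best = max(means.values())
            active = {i for i, m in means.items() if m + 2 * rad >= best}
            if len(active) == 1:
                break
        return max(active, key=lambda i: sums[i])
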
Semi-verified Learning from the Crowd with Pairwise Comparisons
TLDR: This work investigates the significantly more challenging case in which the majority of the crowd are incorrect, which renders learning impossible in general, and shows that under the semi-verified model of Charikar et al. (2017) the underlying function can still be learned, with the labeling cost significantly mitigated by the enriched and more easily obtained comparison queries.
Learning Halfspaces with Pairwise Comparisons: Breaking the Barriers of Query Complexity via Crowd Wisdom
TLDR: This paper shows that even when both labels and comparisons are corrupted by Massart noise, there is a polynomial-time algorithm that provably learns the underlying halfspace with near-optimal query complexity and noise tolerance, under the distribution-independent setting.
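
For context, the Massart (bounded) noise condition referenced in this summary requires each label to be flipped independently with probability at most a constant below one half; a standard formalization, with w* denoting the target halfspace's normal vector:

    % Massart noise with parameter \eta < 1/2:
    \Pr\left[\, y \neq \mathrm{sign}(w^* \cdot x) \mid x \,\right]
      = \eta(x) \le \eta < \tfrac{1}{2} \quad \text{for all } x.
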
Robust Learning from Untrusted Sources
TLDR: This work derives a procedure that allows for learning from all available sources yet automatically suppresses irrelevant or corrupted data, and shows that this method provides significant improvements over alternative approaches from robust statistics and distributed optimization.
Robust Density Estimation from Batches: The Best Things in Life are (Nearly) Free
TLDR: This work proves that, up to logarithmic factors, the optimal sample complexity is the same as for genuine, non-adversarial data, and derives the first polynomial-time sample-optimal algorithm for robust interval-based classification from batched data.
Distinguishing Distributions When Samples Are Strategically Transformed
TLDR: This paper gives necessary and sufficient conditions under which the principal can distinguish between agents of "good" and "bad" types, when the agent's type affects the distribution of samples it has access to.
Three-Way Decision for Handling Uncertainty in Machine Learning: A Narrative Review
TLDR: A narrative review of state-of-the-art applications of three-way decision (TWD) across the areas of concern identified by the framework, highlighting both the strengths of the three-way methodology and the opportunities for further research.
On the Sample Complexity of Adversarial Multi-Source PAC Learning
TLDR: A generalization bound is given that provides finite-sample guarantees for this learning setting, along with corresponding lower bounds; together these show that in a cooperative learning setting sharing data with other parties has provable benefits, even if some participants are malicious.
Noise in Classification
TLDR: This chapter considers the computational and statistical aspects of learning linear thresholds in the presence of noise and discusses approaches for dealing with negative results by exploiting natural assumptions on the data-generating process.

References

Showing 1-10 of 33 references
Vox Populi: Collecting High-Quality Labels from a Crowd
TLDR: This paper studies the problem of pruning low-quality teachers in a crowd in order to improve the label quality of the training set, and shows that this is in fact achievable with a simple and efficient algorithm, which does not require that each example be repeatedly labeled by multiple teachers.
Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems
TLDR: A new algorithm is given for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers, and it is shown that the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and nonadaptive scenarios.
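
The task-allocation paper above pairs assignment with an inference step; the sketch below is a much-simplified iterative estimator in the same spirit (alternate between reliability-weighted votes and agreement-based reliabilities), not the paper's actual message-passing algorithm. The answers layout is an assumed format.

    def iterative_inference(answers, n_tasks, n_workers, iters=20):
        """Jointly estimate task answers (+1/-1) and worker reliabilities.

        answers: dict mapping (task, worker) -> reported label in {+1, -1}.
        A simplified cousin of message-passing inference, for illustration.
        """
        rel = [1.0] * n_workers
        est = [1.0] * n_tasks
        for _ in range(iters):
            # Reliability-weighted vote for each task.
            tally = [0.0] * n_tasks
            for (t, w), a in answers.items():
                tally[t] += rel[w] * a
            est = [1.0 if s >= 0 else -1.0 for s in tally]
            # Worker reliability = average agreement with current estimates.
            agree = [0.0] * n_workers
            count = [0] * n_workers
            for (t, w), a in answers.items():
                agree[w] += a * est[t]
                count[w] += 1
            rel = [agree[w] / count[w] if count[w] else 0.0
                   for w in range(n_workers)]
        return est, rel
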
Adaptive Task Assignment for Crowdsourced Classification
TLDR: This work investigates the problem of task assignment and label inference for heterogeneous classification tasks and derives a provably near-optimal adaptive assignment algorithm that can lead to more accurate predictions at a lower cost when the available workers are diverse.
Online decision making in crowdsourcing markets: theoretical challenges
TLDR: This paper grew out of the authors' own frustration with modeling issues that inhibit theoretical research on online decision making for crowdsourcing; it is hoped that it will encourage the community to understand, debate, and ultimately address them.
Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction
TLDR: A crowdsourcing model in which workers are asked to rate the quality of items previously generated by other workers; it is shown that reliable evaluation is possible with an amount of work, required of the manager and of each worker, that does not scale with n.
Agnostic active learning
TLDR: This work states and analyzes the first active learning algorithm that finds an ε-optimal hypothesis in any hypothesis class when the underlying distribution has arbitrary forms of noise, achieving an exponential improvement over the usual sample complexity of supervised learning.
Quality management on Amazon Mechanical Turk
TLDR: This work presents algorithms that improve the existing state-of-the-art techniques, enabling the separation of bias and error, and illustrates how to incorporate cost-sensitive classification errors in the overall framework and how to seamlessly integrate unsupervised and supervised techniques for inferring the quality of the workers.
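
The bias/error separation above builds on confusion-matrix models in the tradition of Dawid and Skene; here is a minimal binary EM sketch of that tradition, with the initial rates and the data layout being illustrative assumptions rather than the paper's exact algorithm.

    import math

    def em_binary(labels, n_items, n_workers, iters=50):
        """Minimal Dawid-Skene-style EM for binary crowd labels.

        labels: dict mapping (item, worker) -> observed label in {0, 1}.
        Returns per-item posterior P(true = 1) and each worker's rates
        p1 = P(report 1 | true 1) and p0 = P(report 1 | true 0).
        """
        post = [0.5] * n_items
        p1 = [0.8] * n_workers          # initial true-positive rate guess
        p0 = [0.2] * n_workers          # initial false-positive rate guess
        prior = 0.5
        eps = 1e-9
        for _ in range(iters):
            # E-step: posterior log-odds of each item's true label.
            lo = [math.log(prior / (1 - prior))] * n_items
            for (i, w), y in labels.items():
                like1 = p1[w] if y == 1 else 1 - p1[w]
                like0 = p0[w] if y == 1 else 1 - p0[w]
                lo[i] += math.log(max(like1, eps)) - math.log(max(like0, eps))
            post = [1 / (1 + math.exp(-v)) for v in lo]
            # M-step: re-estimate worker rates and the class prior.
            n1 = [eps] * n_workers; d1 = [2 * eps] * n_workers
            n0 = [eps] * n_workers; d0 = [2 * eps] * n_workers
            for (i, w), y in labels.items():
                n1[w] += post[i] * y;        d1[w] += post[i]
                n0[w] += (1 - post[i]) * y;  d0[w] += 1 - post[i]
            p1 = [n1[w] / d1[w] for w in range(n_workers)]
            p0 = [n0[w] / d0[w] for w in range(n_workers)]
            prior = sum(post) / n_items
        return post, p1, p0
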
Efficient Learning of Linear Separators under Bounded Noise
TLDR: This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model, and thus opens up a new and exciting line of research.
Bandits with Knapsacks: Dynamic procurement for crowdsourcing∗
In a basic version of the dynamic procurement problem, the algorithm has a budget B to spend and faces n agents (potential sellers) arriving sequentially. The algorithm offers a price to each arriving agent, who sells if and only if the price is at least her private cost; the goal is to procure as many items as possible within the budget.
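
As a toy illustration of the posted-price setting just described (not the paper's bandit algorithm), consider committing to a single price and buying until the budget runs out; a real bandits-with-knapsacks method would instead adapt the offered price from accept/reject feedback. All names here are hypothetical.

    def posted_price_procurement(budget, price, agent_costs):
        """Toy fixed-price procurement: an agent accepts iff price >= cost.

        A bandits-with-knapsacks algorithm would learn the price online;
        this baseline just commits to one offer for the whole stream.
        """
        bought = 0
        for cost in agent_costs:
            if budget < price:
                break
            if price >= cost:
                budget -= price
                bought += 1
        return bought
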
The Strength of Weak Learnability
This paper addresses the problem of improving the accuracy of a hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to examples of the unknown concept, the learner can with high probability output a hypothesis that is correct on all but an arbitrarily small fraction of the instances; it is weakly learnable if the hypothesis need only perform slightly better than random guessing. The paper shows that these two notions are in fact equivalent.
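
The strong/weak equivalence in this paper is the root of boosting; the sketch below is AdaBoost, a later and better-known constructive instance of the same equivalence (Schapire's original construction is a recursive majority-of-majorities). The weak_learn interface is an assumption.

    import math

    def adaboost(X, y, weak_learn, rounds=50):
        """AdaBoost: drive down training error using only a weak learner.

        weak_learn(X, y, w) -> hypothesis h with weighted error < 1/2,
        where h(x) returns +1 or -1 and y is a list of +1/-1 labels.
        """
        n = len(X)
        w = [1.0 / n] * n
        ensemble = []
        for _ in range(rounds):
            h = weak_learn(X, y, w)
            err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
            err = min(max(err, 1e-9), 1 - 1e-9)   # guard against 0 or 1
            alpha = 0.5 * math.log((1 - err) / err)
            ensemble.append((alpha, h))
            # Reweight: boost the weight of misclassified examples.
            w = [wi * math.exp(-alpha * yi * h(xi))
                 for wi, xi, yi in zip(w, X, y)]
            total = sum(w)
            w = [wi / total for wi in w]
        return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
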