A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

Yi Yang, Zhong-Qiu Zhao, Quan-wei Bai, Qing Liu, Weihua Li
Because crowdsourced labels are noisy, label aggregation (LA) has emerged as a standard procedure for post-processing them. LA methods estimate true labels from crowdsourced labels by modeling worker quality. Most existing LA methods are iterative: they traverse all the crowdsourced labels multiple times, alternately updating the true labels and the worker qualities until convergence. Consequently, these methods have high space complexity O(TM) and time…
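To make the iterative scheme described in the abstract concrete, the following is a minimal sketch of a generic iterative aggregator (quality-weighted majority voting). It is not the model proposed in this paper; the function name `iterative_weighted_vote`, the `(task, worker, label)` input format, and the use of plain agreement rate as the worker-quality estimate are all assumptions made for illustration.

```python
from collections import defaultdict


def iterative_weighted_vote(labels, num_classes, max_iters=20):
    """Illustrative iterative label aggregation (hypothetical helper).

    labels: list of (task, worker, label) triples, label in range(num_classes).
    Returns ({task: estimated_label}, {worker: estimated_accuracy}).
    """
    workers = {w for _, w, _ in labels}
    quality = {w: 1.0 for w in workers}  # assumption: trust everyone equally at first
    truths = {}
    for _ in range(max_iters):
        # Pass 1 over all labels: quality-weighted vote per task.
        scores = defaultdict(lambda: [0.0] * num_classes)
        for task, worker, label in labels:
            scores[task][label] += quality[worker]
        # Ties are broken in favour of the lowest class index.
        new_truths = {
            t: max(range(num_classes), key=s.__getitem__)
            for t, s in scores.items()
        }
        # Pass 2 over all labels: worker quality = agreement with current estimates.
        hits, total = defaultdict(int), defaultdict(int)
        for task, worker, label in labels:
            total[worker] += 1
            hits[worker] += int(label == new_truths[task])
        quality = {w: hits[w] / total[w] for w in workers}
        if new_truths == truths:  # estimates stopped changing: converged
            break
        truths = new_truths
    return truths, quality


# Toy data: workers "a" and "b" agree on every task, worker "c" always disagrees.
toy = [(t, "a", t % 2) for t in range(4)]
toy += [(t, "b", t % 2) for t in range(4)]
toy += [(t, "c", 1 - t % 2) for t in range(4)]
truths, quality = iterative_weighted_vote(toy, num_classes=2)
```

Every iteration makes two full passes over the stored labels, which is the repeated traversal that the abstract identifies as the source of the cost of iterative methods.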


Multi-Label Truth Inference for Crowdsourcing Using Mixture Models

This paper proposes two novel probabilistic models MCMLI and MCMLD to address the multi-class multi-label inference problem in crowdsourcing and shows that these models significantly outperform existing competitive alternatives.

Aggregating Crowd Wisdom with Side Information via a Clustering-based Label-aware Autoencoder

A clustering-based label-aware autoencoder is proposed to alleviate label noise; it extends the framework of variational autoencoders and uses maximum a posteriori (MAP) estimation, which prevents the model from overfitting and from reaching trivial solutions.

Exploiting Worker Correlation for Label Aggregation in Crowdsourcing

It is argued that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; in response an enhanced Bayesian classifier combination (EBCC) model is proposed, with inference based on a mean-field variational approach.

Error Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing

Finite-sample exponential bounds on the error rate (in probability and in expectation) of general aggregation rules under the Dawid-Skene crowdsourcing model are provided and can be used to analyze many aggregation methods, including majority voting, weighted majority voting and the oracle Maximum A Posteriori rule.

Aggregating Crowd Wisdoms with Label-aware Autoencoders

This paper proposes a novel framework named Label-Aware Autoencoders (LAA), which integrates a classifier and a reconstructor into a unified model to infer labels in an unsupervised manner and introduces object ambiguity and latent aspects into LAA.

Truth Inference in Crowdsourcing: Is the Problem Solved?

It is believed that the truth inference problem is not fully solved; the limitations of existing algorithms are identified and promising research directions are pointed out.

Truth Inference at Scale: A Bayesian Model for Adjudicating Highly Redundant Crowd Annotations

A novel technique is proposed, based on a Bayesian graphical model with conjugate priors and simple iterative expectation-maximisation inference, which performs competitively with the state-of-the-art benchmark methods and is the only method that significantly outperforms the majority vote heuristic at one-sided level 0.025.

Streaming Bayesian Inference for Crowdsourced Classification

Streaming Bayesian Inference for Crowdsourcing (SBIC) is proposed, a new algorithm that has low complexity and can be used in a real-time online setting and has provable asymptotic guarantees both in the online and offline settings.

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

A new algorithm is given for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers, and it is shown that the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and nonadaptive scenarios.

Crowdsourcing with Self-paced Workers

The proposed SPCrowd (Self-Paced Crowd worker) first asks workers to complete a set of golden tasks with known annotations, then provides feedback to help workers capture the raw modes of tasks and to spark self-paced learning, which facilitates the estimation of workers’ quality and tasks’ difficulty.