Improve Learning from Crowds via Generative Augmentation

  • Zhendong Chu, Hongning Wang
  • Published 2021
  • Computer Science
  • Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
Crowdsourcing provides an efficient label collection schema for supervised machine learning. However, to control annotation cost, each instance in the crowdsourced data is typically annotated by a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to directly learn a classifier by augmenting the raw…
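
To make the sparsity issue concrete, here is a minimal Python sketch. It is not the paper's method: the toy annotation matrix, the function names, and the use of majority voting as the aggregation rule are all assumptions chosen for illustration. It shows how few (instance, annotator) cells carry a label when each instance is seen by only a few annotators, and how a simple majority vote turns those few labels into one label per instance.

```python
from collections import Counter

# Hypothetical toy data: rows are instances, columns are annotators.
# None means the annotator did not label that instance.
annotations = [
    [1, None, 1, 0, None],
    [None, 1, None, None, 1],
    [0, None, 0, None, None],
    [None, None, 1, 1, None],
]


def observed_fraction(matrix):
    """Share of (instance, annotator) cells that actually carry a label."""
    total = sum(len(row) for row in matrix)
    observed = sum(1 for row in matrix for label in row if label is not None)
    return observed / total


def majority_vote(row):
    """Most frequent label among the annotators who labeled this instance.

    Returns None if nobody labeled the instance.
    """
    labels = [label for label in row if label is not None]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


density = observed_fraction(annotations)
aggregated = [majority_vote(row) for row in annotations]
print(f"observed cells: {density:.0%}")
print("majority-vote labels:", aggregated)
```

With only two or three votes per instance, a single noisy annotator can flip the aggregated label, which is one way to see why sparse annotations limit the quality of models trained on them.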

Deep learning from crowds
A novel general-purpose crowd layer is proposed, which allows us to train deep neural networks end-to-end, directly from the noisy labels of multiple annotators, using only backpropagation.
Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds
This work proposes an information-theoretic approach, Max-MIG, for joint learning from crowds under a common assumption: the crowdsourced labels and the data are independent conditioned on the ground truth. It also devises an accurate data-crowds forecaster that employs both the data and the crowdsourced labels to forecast the ground truth.
Data Augmentation Generative Adversarial Networks
It is shown that a Data Augmentation Generative Adversarial Network (DAGAN) augments standard vanilla classifiers well and can enhance few-shot learning systems such as Matching Networks.
AggNet: Deep Learning From Crowds for Mitosis Detection in Breast Cancer Histology Images
An experimental study on learning from crowds that handles data aggregation directly as part of the learning process of the convolutional neural network (CNN) via an additional crowdsourcing layer (AggNet). The study gives valuable insights into the functionality of deep CNN learning from crowd annotations and proves the necessity of data aggregation integration.
Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
A minimax error rate is derived under a more practical setting for a broader class of crowdsourcing models that includes the Dawid and Skene (DS) model as a special case, and a worker clustering model is proposed, which is more practical than the DS model under real crowdsourcing settings.
Learning From Crowds
A probabilistic approach for supervised learning when the authors have multiple annotators providing (possibly noisy) labels but no absolute gold standard, and experimental results indicate that the proposed method is superior to the commonly used majority voting baseline.
Active Learning from Crowds
A probabilistic model for learning from multiple annotators that can also learn the annotator expertise even when their expertise may not be consistently accurate across the task domain is employed.
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
In this paper we present a method for learning a discriminative classifier from unlabeled or partially labeled data. Our approach is based on an objective function that trades off mutual information…
L_DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise
A novel information-theoretic loss function, L_DMI, is proposed, which is the first loss function that is provably robust to instance-independent label noise, regardless of noise pattern, and it can be applied to any existing classification neural networks straightforwardly without any auxiliary information.
Semi-Supervised Learning with Generative Adversarial Networks
This work extends Generative Adversarial Networks to the semi-supervised context by forcing the discriminator network to output class labels and shows that this method can be used to create a more data-efficient classifier and that it allows for generating higher quality samples than a regular GAN.