Corpus ID: 49556671

Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification

Vaibhav Sinha, Sukrut Rao, V. Balasubramanian
arXiv: Machine Learning
Many real-world problems can now be effectively solved using supervised machine learning. A major roadblock is often the lack of an adequate quantity of labeled data for training. A possible solution is to assign the task of labeling data to a crowd, and then infer the true label using aggregation methods. A well-known approach for aggregation is the Dawid-Skene (DS) algorithm, which is based on the principle of Expectation-Maximization (EM). We propose a new simple, yet effective, EM-based…
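As a rough illustration of the EM-based aggregation the abstract refers to, the following is a minimal sketch of the classic Dawid-Skene procedure. It is not the Fast Dawid-Skene variant proposed in the paper, which the truncated abstract above does not describe. The function names `dawid_skene` and `hard_labels`, the `(item, worker, label)` vote format, the vote-fraction initialisation, and the `smoothing` constant are all assumptions made for this example.

```python
from collections import defaultdict


def dawid_skene(votes, n_classes, n_iter=50, smoothing=0.01):
    """Classic Dawid-Skene EM (illustrative sketch, not the paper's method).

    votes: list of (item, worker, label) tuples, labels in range(n_classes).
    Returns a dict mapping each item to a list of class posteriors.
    """
    items = sorted({i for i, _, _ in votes})
    workers = sorted({w for _, w, _ in votes})
    by_item = defaultdict(list)
    for i, w, l in votes:
        by_item[i].append((w, l))

    # Initialise posteriors with per-item vote fractions (soft majority vote).
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, l in by_item[i]:
            counts[l] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per worker.
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        # conf[w][k][l] estimates P(worker w answers l | true class is k).
        # The smoothing term (an assumption here) keeps every entry positive.
        conf = {w: [[smoothing] * n_classes for _ in range(n_classes)]
                for w in workers}
        for i in items:
            for w, l in by_item[i]:
                for k in range(n_classes):
                    conf[w][k][l] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [v / row_sum for v in conf[w][k]]

        # E-step: recompute each item's posterior over the true class.
        for i in items:
            scores = []
            for k in range(n_classes):
                s = prior[k]
                for w, l in by_item[i]:
                    s *= conf[w][k][l]
                scores.append(s)
            z = sum(scores)
            post[i] = [s / z for s in scores]
    return post


def hard_labels(post):
    """Pick the most probable class for each item."""
    return {i: max(range(len(p)), key=p.__getitem__) for i, p in post.items()}
```

Calling `hard_labels(dawid_skene(votes, n_classes=2))` on a list of `(item, worker, label)` tuples returns one inferred label per item. Unlike plain majority voting, the per-worker confusion matrices let the estimate weight each worker's votes by how reliable that worker appears to be.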
Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution
A scalable methodology for estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task and reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies is developed and implemented.
Discovering Biased News Articles Leveraging Multiple Human Annotations
The goal is to compare domain experts to crowd workers, to show that media bias can be detected automatically, and to contribute to a trustworthy media ecosystem by automatically identifying politically biased news articles.
Some people aren't worth listening to: periodically retraining classifiers with feedback from a team of end users
A classifier is demonstrated that can learn which users tend to be unreliable, filtering their feedback out of the loop, thus improving performance in subsequent iterations.


Active Learning for Crowd-Sourced Databases
Two new active learning algorithms are presented to combine humans and algorithms in a crowd-sourced database, based on the theory of non-parametric bootstrap, which makes their results applicable to a broad class of machine learning models.
Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
A meta-algorithm is applied, based on a metric labeling formulation of the rating-inference problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels.
Variational Inference for Crowdsourcing
By choosing the prior properly, both BP and MF perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions.
Learning Supervised Topic Models for Classification and Regression from Crowds
This article proposes two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds, and develops an efficient stochastic variational inference algorithm.
Using Crowdsourcing and Active Learning to Track Sentiment in Online Media
A system for tracking economic sentiment in online media that has been deployed since August 2009 is described, which uses annotations provided by a cohort of non-expert annotators to train a learning system to classify a large body of news items.
Online crowdsourcing: Rating annotators and obtaining cost-effective labels
  • P. Welinder, P. Perona
  • Computer Science
  • 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops
  • 2010
A model of the labeling process which includes label uncertainty, as well as a multi-dimensional measure of the annotators' ability, is proposed, from which an online algorithm is derived that estimates the most likely value of the labels and the annotator abilities.
Minimax Optimal Convergence Rates for Estimating Ground Truth from Crowdsourced Labels
Crowdsourcing has become a primary means for label collection in many real-world machine learning applications. A classical method for inferring the true labels from the noisy labels provided by…
The Multidimensional Wisdom of Crowds
A method for estimating the underlying value of each image from (noisy) annotations provided by multiple annotators, based on a model of the image formation and annotation process, which predicts ground truth labels on both synthetic and real data more accurately than state-of-the-art methods.
Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing
Experimental results demonstrate that the proposed algorithm for multi-class crowd labeling problems is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.
Deep learning from crowds
A novel general-purpose crowd layer is proposed, which allows us to train deep neural networks end-to-end, directly from the noisy labels of multiple annotators, using only backpropagation.