Corpus ID: 1623378

Dynamic Allocation of Crowd Contributions for Sentiment Analysis during the 2016 U.S. Presidential Election

@article{Sameki2016DynamicAO,
  title={Dynamic Allocation of Crowd Contributions for Sentiment Analysis during the 2016 U.S. Presidential Election},
  author={M. Sameki and M. Gentil and Kate K. Mays and Lei Guo and Margrit Betke},
  journal={ArXiv},
  year={2016},
  volume={abs/1608.08953}
}
Opinions about the 2016 U.S. Presidential Candidates have been expressed in millions of tweets that are challenging to analyze automatically. Crowdsourcing the analysis of political tweets effectively is also difficult, due to large inter-rater disagreements when sarcasm is involved. Each tweet is typically analyzed by a fixed number of workers and majority voting. We here propose a crowdsourcing framework that instead uses a dynamic allocation of the number of workers. We explore two dynamic…
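The contrast the abstract draws, between a fixed number of workers with majority voting and a dynamic allocation, can be illustrated with a small sketch. This is a generic confidence-threshold stopping rule, not the paper's actual allocation strategy; the function name, thresholds, and simulated worker model are all hypothetical:

```python
import random

def dynamic_allocation(get_label, min_workers=3, max_workers=7, agreement=0.8):
    """Request crowd labels one at a time; stop early once the leading
    label's share of votes reaches the agreement threshold (illustrative
    rule only, not the paper's exact policy)."""
    votes = {}
    n = 0
    while n < max_workers:
        label = get_label()                 # ask one more crowd worker
        votes[label] = votes.get(label, 0) + 1
        n += 1
        top = max(votes.values())
        if n >= min_workers and top / n >= agreement:
            break                           # early consensus: save budget
    return max(votes, key=votes.get), n

# Simulated workers: 90% label a hypothetical tweet "positive".
random.seed(0)
label, used = dynamic_allocation(
    lambda: "positive" if random.random() < 0.9 else "negative")
```

On an easy tweet, workers agree quickly and the loop stops at the minimum of three votes; on an ambiguous or sarcastic tweet, votes split and the full budget of seven workers is spent, which is the intuition behind allocating more effort only where disagreement appears.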
Citations

How Does Tweet Difficulty Affect Labeling Performance of Annotators?
It is found that there is indeed a relationship between both factors, assuming that annotators have labeled some tweets before: labels assigned to easy tweets are more reliable than those assigned to difficult tweets.
BUOCA: Budget-Optimized Crowd Worker Allocation
This work proposes an algorithm that computes a budget-optimized crowd worker allocation (BUOCA), trains a machine learning system that predicts the optimal number of crowd workers needed to maximize labeling accuracy, and envisages a human-machine system for performing budget-optimized data analysis at a scale beyond the feasibility of crowdsourcing.
Predicting Worker Disagreement for More Effective Crowd Labeling
A crowd labeling framework is proposed: a disagreement predictor is trained on a small seed of documents, and this predictor is used to decide which documents of the complete corpus should be labeled and which should be checked for document-inherent ambiguities before assigning (and potentially wasting) worker effort on them. Expand
How do annotators label short texts? Toward understanding the temporal dynamics of tweet labeling
Crowdsourcing is a popular means to obtain human-crafted information, for example labels of tweets, which can then be used in text mining tasks. It is shown that annotators undergo two phases: a learning phase, during which they build a conceptual model of the characteristics determining the sentiment of a tweet, and an exploitation phase, during which they use their conceptual model.
Analyzing crowd workers' learning behavior to obtain more reliable labels
This thesis explores label reliability from two perspectives: first, how the label reliability of crowd workers develops over time during an actual labeling task, and second, how it is affected by the difficulty of the documents to be labeled.
Performance Comparison of Crowdworkers and NLP Tools on Named-Entity Recognition and Sentiment Analysis of Political Tweets
We report results of a comparison of the accuracy of crowdworkers and seven Natural Language Processing (NLP) toolkits in solving two important NLP tasks, named-entity recognition (NER) and…

References

SHOWING 1-10 OF 20 REFERENCES
A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
A system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a micro-blogging service, offers a new and timely perspective on the dynamics of the electoral process and public opinion.
Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment
It is found that the mere number of messages mentioning a party reflects the election result, and that joint mentions of two parties are in line with real-world political ties and coalitions.
Semi-supervised recognition of sarcastic sentences in Twitter and Amazon
This paper experiments with semi-supervised sarcasm identification on two very different data sets: a collection of 5.9 million tweets collected from Twitter, and a collection of 66,000 product reviews from Amazon.
Online Task Assignment in Crowdsourcing Markets
This work presents a two-phase exploration-exploitation assignment algorithm and proves that it is competitive with respect to the optimal offline algorithm, which has access to the unknown skill levels of each worker.
Big Social Data Analytics in Journalism and Mass Communication
This article presents an empirical study that investigated and compared two “big data” text analysis methods: dictionary-based analysis, perhaps the most popular automated analysis approach in social…
Joint Crowdsourcing of Multiple Tasks
This paper proposes a framework called JOCR (Joint Crowdsourcing, pronounced as “Joker”) for analyzing joint allocations of many tasks to a pool of workers, and poses the challenge of developing efficient algorithms for it.
Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria
An empirical study is conducted to examine the effect of noisy annotations on the performance of sentiment classification models, and to evaluate the utility of annotation selection on classification accuracy and efficiency.
Affective News: The Automated Coding of Sentiment in Political Texts
The objective here is to outline and validate a new automated measurement instrument for sentiment analysis in political texts, using a dictionary-based approach consisting of a simple word count of the frequency of keywords in a text from a predefined dictionary.
Efficient crowdsourcing for multi-class labeling
It is shown that each task can be answered correctly with probability 1 − ε as long as the redundancy per task is O((K/q) log(K/ε)), where each task can have any of the K distinct answers equally likely and q is the crowd-quality parameter defined through a probabilistic model.
Identifying Sarcasm in Twitter: A Closer Look
This work reports on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author, and uses this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm.