Modeling sequential annotations for sequence labeling with crowds

  • Xiaolei Lu, Tommy W. S. Chow
  • Published 19 October 2021
  • Computer Science
  • IEEE Transactions on Cybernetics
Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling. Unlike tagging independent instances, the quality of a crowd-annotated label sequence depends on each annotator's expertise in capturing the internal dependencies for each token in the sequence. In this article, we propose modeling sequential annotation for sequence labeling with crowds (SA-SLC). First, a conditional probabilistic model is developed to…
1 Citation

Classification-oriented Dawid Skene model for transferring intelligence from crowds to machines

A Classification-Oriented Dawid Skene (CODS) model is developed, which achieves three objectives simultaneously in this context, including learning a classifier that is capable of labelling future items without further assistance from crowd workers.



A Bayesian Approach for Sequence Tagging with Crowds

This work proposes a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels and finds that this approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations.

Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling

This work proposes Consensus Network, a novel framework that can be trained on annotations from multiple sources and dynamically aggregates source-specific knowledge through a context-aware attention module, yielding a model that reflects the agreement among multiple sources.

Sembler: Ensembling Crowd Sequential Labeling for Improved Quality

The proposed Sembler model, a statistical model for ensembling crowd sequential labelings, is evaluated on a real Twitter data set and a synthetic biological data set, and is found to be particularly accurate when more than half of the annotators make mistakes.

Sequence labeling with multiple annotators

A probabilistic approach for sequence labeling using Conditional Random Fields (CRF) for situations where label sequences from multiple annotators are available but there is no actual ground truth.

Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks

An empirical Bayesian algorithm called SpEM is proposed that iteratively eliminates the spammers and estimates the consensus labels based only on the good annotators. It is motivated by defining a spammer score that can be used to rank the annotators.

Aggregating and Predicting Sequence Labels from Crowd Annotations

A suite of methods for aggregating sequential crowd labels to infer a best single set of consensus annotations and using crowd annotations as training data for a model that can predict sequences in unannotated text are evaluated.

Learning From Crowds

A probabilistic approach for supervised learning when the authors have multiple annotators providing (possibly noisy) labels but no absolute gold standard, and experimental results indicate that the proposed method is superior to the commonly used majority voting baseline.

Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.

Modeling annotator expertise: Learning when everybody knows a bit of something

This paper develops a probabilistic approach for learning from annotators who may be unreliable and whose expertise varies depending on the data they observe; the approach provides clear advantages over previously introduced multi-annotator methods.

Learning from crowdsourced labeled data: a survey

This survey introduces the basic concepts of label quality and learning models, and presents openly accessible real-world data sets collected from crowdsourcing systems, along with open source libraries and tools.