Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing

Boaz Shmueli, Jan Fell, Soumya Ray, Lun-Wei Ku
The use of crowdworkers in NLP research is growing rapidly, in tandem with the exponential increase in research production in machine learning and AI. Ethical discussion regarding the use of crowdworkers within the NLP research community is typically confined in scope to issues related to labor conditions such as fair pay. We draw attention to the lack of ethical considerations related to the various tasks performed by workers, including labeling, evaluation, and production. We find that the… 


Resolving the Human Subjects Status of Machine Learning's Crowdworkers

This analysis exposes a potential loophole in the Common Rule, where researchers can elude research ethics oversight by splitting data collection and analysis into distinct studies.

CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation

A novel framework, CrowdWorkSheets, is introduced for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.

Quantifying and Avoiding Unfair Qualification Labour in Crowdsourcing

The burden on workers can be reduced while still collecting high-quality data, as shown by a study of the correlation between qualifications and work quality on two NLP tasks.

Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation

An array of literature that provides insights into ethical considerations around crowdsourced dataset annotation is surveyed, and a concrete set of recommendations and considerations for dataset developers at various stages of the ML data pipeline is put forth.

Use of Formal Ethical Reviews in NLP Literature: Historical Trends and Current Practices

A detailed quantitative and qualitative analysis of the ACL Anthology is conducted, and the trends in the field are compared to those of other related disciplines, such as cognitive science, machine learning, data mining, and systems.

Measuring Ethics in AI with AI: A Methodology and Dataset Construction

This paper proposes using the newfound capabilities of AI technologies to augment AI measurement by training a model to classify publications related to ethical issues and concerns, and highlights the implications of AI metrics, in particular their contribution towards developing trustworthy and fair AI-based tools and technologies.

Risk-graded Safety for Handling Medical Queries in Conversational AI

A corpus of human-written English-language medical queries and the responses of different types of systems, labelled with both crowdsourced and expert annotations, suggests that, while these tasks can be automated, caution should be exercised, as errors can potentially be very serious.

ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI

We present the first English corpus study on abusive language towards three conversational AI systems gathered ‘in the wild’: an open-domain social bot, a rule-based chatbot, and a task-based system.

Hourly Wages in Crowdworking: A Meta-Analysis

In the past decade, crowdworking on online labor market platforms has become an important source of income for a growing number of people worldwide. This development has led to increasing political

Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments

XTREMESPEECH, a new hate speech dataset containing 20,297 social media passages from Brazil, Germany, India and Kenya, is presented; novel tasks with accompanying baselines are established, and an interpretability analysis of BERT’s predictions is performed.



Ethical Considerations in NLP Shared Tasks

A number of ethical issues, along with other areas of concern related to the competitive nature of shared tasks, are presented, and a framework for the organisation of and participation in shared tasks that can help mitigate these issues is proposed.

Mechanical Turk is Not Anonymous

While Amazon’s Mechanical Turk (AMT) online workforce has been characterized by many people as being anonymous, we expose an aspect of AMT’s system design that can be exploited to reveal a surprising

The Social Impact of Natural Language Processing

A number of social implications of NLP are identified, and their ethical significance and ways to address them are discussed.

Internet-based crowdsourcing and research ethics: the case for IRB review

It is demonstrated that the crowdsourcing model of research has the potential to cause harm to participants, manipulates the participant into continued participation, and uses participants as experimental subjects, and it is concluded that protocols relying on this model require institutional review board scrutiny.

Integrating Ethics into the NLP Curriculum

This tutorial aims to empower NLP researchers and practitioners with tools and resources to teach others how to apply NLP techniques ethically, and will present high-level strategies for developing an ethics-oriented curriculum, based on experience and best practices.

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

It is found that all three of the widely-used MLMs the authors evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs, a benchmark for measuring some forms of social bias in language models against protected demographic groups in the US.

Last Words: Amazon Mechanical Turk: Gold Mine or Coal Mine?

The article sets out to define precisely what MTurk is and what it is not, in the hope that this will point out opportunities for the community to deliberately value ethics above cost savings.

Social Bias Frames: Reasoning about Social and Power Implications of Language

It is found that while state-of-the-art neural models are effective at high-level categorization of whether a given statement projects unwanted social bias, they are not effective at spelling out more detailed explanations in terms of Social Bias Frames.

Conducting behavioral research on Amazon’s Mechanical Turk

It is shown that, taken as a whole, Mechanical Turk can be a useful tool for many researchers, and how the behavior of workers compares with that of experts and laboratory subjects is discussed.

Conversational Markers of Constructive Discussions

This work proposes a framework for analyzing conversational dynamics in order to determine whether a given task-oriented discussion is worth having or not, and applies it to conversations naturally occurring in an online collaborative world exploration game developed and deployed to support this research.