Corpus ID: 248478175

Handling and Presenting Harmful Text

@article{Derczynski2022HandlingAP,
  title={Handling and Presenting Harmful Text},
  author={Leon Derczynski and Hannah Rose Kirk and Abeba Birhane and Bertie Vidgen},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.14256}
}
Textual data can pose a risk of serious harm. These harms can be categorised along three axes: (1) the harm type (e.g. misinformation, hate speech or racial stereotypes); (2) whether it is elicited as a feature of the research design from directly studying harmful content (e.g. training a hate speech classifier or auditing unfiltered large-scale datasets) versus spuriously invoked from working on unrelated problems (e.g. language generation or part-of-speech tagging) but with datasets that… 
1 Citation

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate

TLDR
HatemojiCheck is presented, a test suite of 3,930 short-form statements for evaluating performance on hateful language expressed with emoji, and the HatemojiBuild dataset is created using a human-and-model-in-the-loop approach to address weaknesses in existing hate detection models.

References

SHOWING 1-10 OF 63 REFERENCES

Ethical and social risks of harm from Language Models

TLDR
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs) by analyzing a wide range of established and anticipated risks, drawing on multidisciplinary literature from computer science, linguistics, and social sciences.

Directions in abusive language training data, a systematic review: Garbage in, garbage out

TLDR
This paper systematically reviews 63 publicly available training datasets that have been created to train abusive language classifiers, and reports on the creation of a dedicated website for cataloguing abusive language data, hatespeechdata.com.

Language (Technology) is Power: A Critical Survey of “Bias” in NLP

TLDR
A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of “bias” and to center work around the lived experiences of members of communities affected by NLP systems.

The Harm in Hate Speech

The Harm in Hate Speech. By Jeremy Waldron. Cambridge: Harvard University Press, 2012. 292 pp. $26.95 cloth. This well-crafted volume by Jeremy Waldron, who teaches law and philosophy at New York University…

Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science

TLDR
It is argued that data statements will help alleviate issues related to exclusion and bias in language technology, lead to better precision in claims about how natural language processing research can generalize and thus better engineering results, protect companies from public embarrassment, and ultimately lead to language technology that meets its users in their own preferred linguistic style.

Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models

TLDR
An in-depth analysis of GPT-2, the most downloaded text generation model on HuggingFace, is presented, assessing biases related to occupational associations for different protected categories by intersecting gender with religion, sexuality, ethnicity, political affiliation, and continental name origin.

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

TLDR
This work manually audits the quality of 205 language-specific corpora released with five major public datasets, recommends techniques to evaluate and improve multilingual corpora, and discusses potential risks that come with low-quality data releases.

Annotating Online Misogyny

TLDR
A comprehensive taxonomy of labels for annotating misogyny in natural written language is presented, and a high-quality dataset of annotated posts sampled from social media is introduced.

To “See” is to Stereotype: Image Tagging Algorithms, Gender Recognition, and the Accuracy-Fairness Trade-off

TLDR
In an evaluation of five proprietary image-tagging algorithms, gender inference is found to be hindered in three of them when a background is introduced, and the algorithm whose output is most consistent with human stereotyping processes is the one that best recognizes gender.

Lessons from archives: strategies for collecting sociocultural data in machine learning

TLDR
It is argued that a new specialization should be formed within ML that is focused on methodologies for data collection and annotation: efforts that require institutional frameworks and procedures for sociocultural data collection.
...