Fast, Accurate, and Healthier: Interactive Blurring Helps Moderators Reduce Exposure to Harmful Content

@inproceedings{das2020fast,
  title={Fast, Accurate, and Healthier: Interactive Blurring Helps Moderators Reduce Exposure to Harmful Content},
  author={Anubrata Das and Brandon Dang and Matthew Lease},
  booktitle={AAAI Conference on Human Computation \& Crowdsourcing (HCOMP)},
  year={2020}
}
While most user content posted on social media is benign, other content, such as violent or adult imagery, must be detected and blocked. Unfortunately, such detection is difficult to automate, due to high accuracy requirements, the cost of errors, and nuanced rules for acceptable content. Consequently, social media platforms today rely on a vast workforce of human moderators. However, mounting evidence suggests that exposure to disturbing content can cause lasting psychological and emotional distress…


The Psychological Well-Being of Content Moderators

An estimated 100,000 people work today as commercial content moderators. These moderators are often exposed to disturbing content, which can lead to lasting psychological and emotional distress.

The Psychological Well-Being of Content Moderators: The Emotional Labor of Commercial Moderation and Avenues for Improving Support

This literature review investigates moderators’ psychological symptomatology, drawing on other occupations involving trauma exposure to further guide understanding of both symptoms and support mechanisms, and introduces wellness interventions.

Awe Versus Aww: The Effectiveness of Two Kinds of Positive Emotional Stimulation on Stress Reduction for Online Content Moderators

When people have the freedom to create and post content on the internet, particularly anonymously, they do not always respect the rules and regulations of the websites on which they post.

Leveraging Large-scale Multimedia Datasets to Refine Content Moderation Models

This paper proposes the CM-Refinery framework that leverages large-scale multimedia datasets to automatically extend initial training datasets with hard examples that can refine content moderation models, while significantly reducing the involvement of human annotators.

Handling and Presenting Harmful Text

Practical advice is provided on how textual harms should be handled, presented, and discussed, and HarmCheck, a resource for reflecting on research into textual harms, is introduced to encourage ethical, responsible, and respectful research in the NLP community.

Handling and Presenting Harmful Text in NLP Research

This work provides an analytical framework categorising harms on three axes, and introduces HarmCheck, a documentation standard for handling and presenting harmful text in research.

(Re)Politicizing Digital Well-Being: Beyond User Engagements

The psychological costs of the attention economy are often considered through the binary of harmful design and healthy use, with digital well-being chiefly characterised as a matter of personal responsibility.

"Give Everybody [..] a Little Bit More Equity": Content Creator Perspectives and Responses to the Algorithmic Demonetization of Content Associated with Disadvantaged Groups

It was found creators had concerns about YouTube's algorithmic system stereotyping content featuring vulnerable demographics in harmful ways -- creators believed these demonetization errors led to a range of economic, social, and personal harms.

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

It is found that the RLHF models are increasingly difficult to red team as they scale, while a flat trend with scale is found for the other model types; releasing these findings transparently accelerates the community's ability to develop shared norms, practices, and technical standards.

A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues

The experimental results show that HMCEval achieves around 99% evaluation accuracy with half of the human effort spared, indicating that HMCEval provides reliable evaluation outcomes while substantially reducing human effort.

But Who Protects the Moderators? The Case of Crowdsourced Image Moderation

This work designs and conducts experiments in which blurred graphic and non-graphic images are filtered by human moderators on Amazon Mechanical Turk (AMT), and observes how obfuscation affects the moderation experience with respect to image classification accuracy, interface usability, and worker emotional well-being.

Testing Stylistic Interventions to Reduce Emotional Impact of Content Moderation Workers

It is found that simple grayscale transformations provide an easy-to-implement solution that can significantly change the emotional impact of content reviews, and that a full blur intervention can be challenging for reviewers.

Who moderates the moderators?: crowdsourcing abuse detection in user-generated content

This paper introduces a framework to address the problem of moderating online content using crowdsourced ratings, and presents efficient algorithms to accurately detect abuse that only require knowledge about the identity of a single 'good' agent, who rates contributions accurately more than half the time.

Volunteer Moderators in Twitch Micro Communities: How They Get Involved, the Roles They Play, and the Emotional Labor They Experience

This study reports on interviews with 20 people who moderate for Twitch micro communities, defined as channels that are built around a single or group of streamers, rather than the broadcast of an event.

EmoMadrid: An emotional pictures database for affect research

Emotional scenes are, along with facial expressions, the most widely employed stimuli in the affective sciences. However, compared to facial expressions, available emotional scene databases are scarce.

Digital detritus: 'Error' and the logic of opacity in social media content moderation

It is argued that content's value to the platform as a potentially revenue-generating commodity is the key criterion to which all moderation decisions are ultimately reduced. The result is commercialized online spaces that offer far less in terms of political and democratic challenge to the status quo, and that may serve to reify and consolidate power rather than confront it.

Fast violence detection in video

Inspired by psychology results that suggest that kinematic features alone are discriminant for specific actions, this work proposes a novel method which uses extreme acceleration patterns as the main feature and is at least 15 times faster than current generic action recognition methods.

Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter

It is found that amateur annotators are more likely than expert annotators to label items as hate speech, and that systems training on expert annotations outperform systems trained on amateur annotations.

Moderation Practices as Emotional Labor in Sustaining Online Communities: The Case of AAPI Identity Work on Reddit

Recommendations for improving moderation in online communities centered around identity work are provided and implications of emotional labor in the design of Reddit and similar platforms are discussed.

Baseline Results for Violence Detection in Still Images

A new database is established, and the Bag-of-Words (BoW) model, frequently adopted in the image classification domain, is used to discriminate violent from non-violent images; the effectiveness of four different feature representations is tested within the BoW framework.