An Exploratory Analysis of the Relation between Offensive Language and Mental Health

Ana-Maria Bucur, Marcos Zampieri, Liviu P. Dinu
In this paper, we analyze the interplay between the use of offensive language and mental health. We acquire publicly available datasets created for offensive language identification and depression detection, and we train computational models to compare the use of offensive language in social media posts written by groups of individuals with and without a self-reported depression diagnosis. We also look at samples written by groups of individuals whose posts show signs of depression according to…

A Psychologically Informed Part-of-Speech Analysis of Depression in Social Media

An extensive part-of-speech analysis of the discourse of social media users with depression, providing insights regarding the way in which depressed individuals are expressing themselves on social media platforms, allowing for better-informed computational models to help monitor and prevent mental illnesses.

Early Risk Detection of Pathological Gambling, Self-Harm and Depression Using BERT

The contributions of the BLUE team are presented in the 2021 edition of the eRisk workshop, in which they tackle the problems of early detection of gambling addiction, self-harm and estimating depression severity from social media posts.

Natural language processing as a tool to identify the Reddit particularities of cancer survivors around the time of diagnosis and remission: A pilot study

In the current study, we analyzed 15297 texts from 39 cancer survivors who posted or commented on Reddit in order to detect the language particularities of cancer survivors from online discourse. We

Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

MOLD, the Marathi Offensive Language Dataset, is introduced, the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages.

A Computational Exploration of Pejorative Language in Social Media

In this paper we study pejorative language, an under-explored topic in computational linguistics. Unlike existing models of offensive language and hate speech, pejorative language manifests itself

Hostility Detection in Online Hindi-English Code-Mixed Conversations

This paper proposes a novel hierarchical neural network architecture to identify hostile posts/comments/replies in online Hindi-English Code-Mixed conversations and leverages large multilingual pre-trained (mLPT) models like mBERT, XLMR, and MuRIL to do so.

WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments

This paper addresses the identification of toxic, engaging, and fact-claiming comments on social media using large pre-trained transformer models and multitask learning.

Sequence-to-Sequence Lexical Normalization with Multilingual Transformers

This work proposes a sentence-level sequence-to-sequence model based on mBART, which frames the problem as a machine translation problem, and improves performance on extrinsic, downstream tasks through normalization compared to models operating on raw, unprocessed, social media text.

Influence of El Niño decaying pace on low latitude tropical cyclogenesis over the western North Pacific

The modulation of tropical cyclone (TC) activity at low latitudes (within 10°) over the western North Pacific (WNP) by the decaying pace of El Niño is investigated in this study. During rapidly decaying

Life is not Always Depressing: Exploring the Happy Moments of People Diagnosed with Depression

In this work, we explore the relationship between depression and manifestations of happiness in social media. While the majority of works surrounding depression focus on symptoms, psychological



A Dataset for Research on Depression in Social Media

A methodology for automatically collecting large samples of depression and non-depression posts from online social media is presented and a benchmark is performed on the dataset to establish a point of reference for researchers who are interested in using it.

A Test Collection for Research on Depression and Language Use

A novel early detection task is proposed, together with a novel effectiveness measure for systematically comparing early detection algorithms; the measure takes into account both the accuracy of the decisions taken by an algorithm and the delay in detecting positive cases.

The role of personality, age, and gender in tweeting about mental illness

Language-derived personality and demographic estimates show surprisingly strong performance in distinguishing users that tweet a diagnosis of depression or PTSD from random controls, reaching an area under the receiver operating characteristic curve (AUC) of around 0.8 in all the authors' binary classification tasks.

Discovering Shifts to Suicidal Ideation from Mental Health Content in Social Media

This paper develops a statistical methodology to infer which individuals could undergo transitions from mental health discourse to suicidal ideation, and utilizes semi-anonymous support communities on Reddit as unobtrusive data sources to infer the likelihood of these shifts.

CLPsych 2019 Shared Task: Predicting the Degree of Suicide Risk in Reddit Posts

The shared task for the 2019 Workshop on Computational Linguistics and Clinical Psychology (CLPsych’19) introduced an assessment of suicide risk based on social media postings, using data from Reddit

CLPsych 2015 Shared Task: Depression and PTSD on Twitter

This paper presents a summary of the Computational Linguistics and Clinical Psychology (CLPsych) 2015 shared and unshared tasks. These tasks aimed to provide apples-to-apples comparisons of various

Predicting Depression via Social Media

It is found that social media contains useful signals for characterizing the onset of depression in individuals, as measured through decrease in social activity, raised negative affect, highly clustered egonetworks, heightened relational and medicinal concerns, and greater expression of religious involvement.

Methods in predictive techniques for mental health status on social media: a critical review

A systematic literature review of the state of the art in predicting mental health status using social media data, focusing on characteristics of the study design, methods, and research design, finds 75 studies in this area published between 2013 and 2018.

Depression and Self-Harm Risk Assessment in Online Forums

This work introduces a large-scale general forum dataset consisting of users with self-reported depression diagnoses matched with control users, and proposes methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrates that this approach outperforms strong previously proposed methods.

Offensive Language Identification in Greek

OGTD is a manually annotated dataset containing 4,779 posts from Twitter labeled as offensive or not offensive; several computational models are trained and tested on this data.