Investigation and modeling of the structure of texting language
TLDR
The nature and type of compressions used in SMS texts are investigated, and a Hidden Markov Model based word-model for TL is developed, which results in a 35% reduction of the relative word level error rates.
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
TLDR
HateXplain is introduced, the first benchmark hate speech dataset covering multiple aspects of the issue, and it is observed that existing state-of-the-art models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.
On the permanence of vertices in network communities
TLDR
Compared to other metrics, permanence provides a more accurate estimate of how closely a derived community structure matches the ground-truth community and is more sensitive to perturbations in the network.
Spread of Hate Speech in Online Social Media
TLDR
This study presents the first cross-sectional view of how hateful users diffuse hate content on the social media site Gab and finds that hateful users are far more densely connected among themselves.
An automatic approach to identify word sense changes in text media across timescales
TLDR
An unsupervised and automated method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books and millions of tweets posted per day is proposed.
Metrics for Community Analysis
TLDR
A survey of the state-of-the-art metrics used for the detection and the evaluation of community structure and a comparative analysis of these metrics in measuring the goodness of the underlying community structure are presented.
Thou shalt not hate: Countering Online Hate Speech
TLDR
This paper creates and releases the first ever dataset for counterspeech using comments from YouTube, and performs a rigorous measurement study characterizing the linguistic structure of counterspeeches for the first time.
That’s sick dude!: Automatic identification of word sense change across different timescales
TLDR
An unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books is proposed and can be applied for lexicography, as well as for applications like word sense disambiguation or semantic search.
Hateminers : Detecting Hate speech against Women
TLDR
The machine learning models developed for the Automatic Misogyny Identification (AMI) shared task at EVALITA 2018 are presented and the winning model is released for public use.
Hate begets Hate: A Temporal Study of Hate Speech.
TLDR
The first temporal analysis of hate speech on Gab.com, a social media site with a very loose moderation policy, generates temporal snapshots of Gab from millions of posts and users and calculates an activity vector based on the DeGroot model to identify hateful users.