Atalaya at SemEval 2019 Task 5: Robust Embeddings for Tweet Classification

@inproceedings{Prez2019AtalayaAS,
  title={Atalaya at SemEval 2019 Task 5: Robust Embeddings for Tweet Classification},
  author={Juan Manuel P{\'e}rez and Franco M. Luque},
  booktitle={*SEMEVAL},
  year={2019}
}
In this article, we describe our participation in HatEval, a shared task aimed at the detection of hate speech against immigrants and women. We focused on the Spanish subtasks, building on our previous experience with sentiment analysis in this language. We trained linear classifiers and Recurrent Neural Networks, using classic features such as bag-of-words, bag-of-characters, and word embeddings, as well as recent techniques such as contextualized word representations. In particular, we…

Towards Interpretable Multilingual Detection of Hate Speech against Immigrants and Women in Twitter at SemEval-2019 Task 5
TLDR
Two deep neural networks (a Bidirectional Gated Recurrent Unit and a character-level Convolutional Neural Network (CNN)) and one machine learning model are developed by exploiting linguistic features to detect hate speech against women and immigrants on Twitter in multilingual contexts, particularly in English and Spanish.
ANDES at SemEval-2020 Task 12: A Jointly-trained BERT Multilingual Model for Offensive Language Detection
TLDR
A single model was jointly-trained by fine-tuning Multilingual BERT to tackle the task across all the proposed languages, with a performance close to top-performing systems in spite of sharing the same parameters across all languages.
Notebook for PAN at CLEF 2021
TLDR
The HaMor submission for the Profiling Hate Speech Spreaders on Twitter task at PAN 2021 ranked 19th out of 66 participating teams, according to the averaged accuracy of 73% reached by the proposed models over the two languages.
Multilingual Offensive Language Identification with Cross-lingual Embeddings
TLDR
This paper takes advantage of available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources, and shows that the approach compares favorably to the best systems submitted to recent shared tasks on these three languages.
IMT Mines Ales at HASOC 2019: Automatic Hate Speech Detection
TLDR
The LGI2P team from IMT Mines Ales has trained a fastText model for each proposed language and obtained promising results on the Hindi dataset for both sub-tasks A and B of the Hate Speech and Offensive Content Identification in Indo-European Languages 2019 shared task.
Multilingual Offensive Language Identification for Low-resource Languages
TLDR
Results for all languages confirm the robustness of cross-lingual contextual embeddings and transfer learning for this task, and project predictions on comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish.
Using Transfer-based Language Models to Detect Hateful and Offensive Language Online
TLDR
Four deep learners based on the Bidirectional Encoder Representations from Transformers, with either general or domain-specific language models, were tested against two datasets containing tweets labelled as either ‘Hateful’, ‘Normal’ or ‘Offensive’; the results indicate that the attention-based models profoundly confuse hate speech with offensive and normal language.
Leveraging Multi-domain, Heterogeneous Data using Deep Multitask Learning for Hate Speech Detection
TLDR
Convolutional Neural Network (CNN) based multi-task learning (MTL) models leverage information from multiple sources to obtain state-of-the-art performance with respect to the existing systems.

References

Atalaya at TASS 2018: Sentiment Analysis with Tweet Embeddings and Data Augmentation
TLDR
This work presents team Atalaya's participation in the task of polarity classification of tweets, which followed standard techniques in preprocessing, representation and classification, and also explored some novel ideas.
Deep Learning for Hate Speech Detection in Tweets
TLDR
These experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods by ~18 F1 points.
Learned in Translation: Contextualized Word Vectors
TLDR
Adding context vectors to a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation to contextualize word vectors improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter
TLDR
The paper describes the organization of the SemEval 2019 Task 5 about the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter, and provides an analysis and discussion about the participant systems and the results they achieved in both subtasks.
Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation
TLDR
This paper describes the system (HIT-SCIR) submitted to the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies, which was ranked first according to LAS and outperformed the other systems by a large margin.
Deep Contextualized Word Representations
TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
Overview of the EVALITA 2018 Hate Speech Detection Task
TLDR
The Hate Speech Detection task is a shared task on Italian social media for the detection of hateful content, proposed for the first time at EVALITA 2018, providing two datasets from two online social platforms that differ from a linguistic and communicative point of view.
Overview of the Task on Automatic Misogyny Identification at IberEval 2018
TLDR
The datasets, the evaluation methodology, an overview of the proposed systems and the obtained results are presented, some conclusions are drawn and the future work is discussed.
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network
TLDR
This paper introduces a new method based on a deep neural network combining convolutional and gated recurrent networks that is able to capture both word sequence and order information in short texts, and sets a new benchmark by outperforming on 6 out of 7 datasets by between 1 and 13% in F1.