Corpus ID: 209421111

3Idiots at HASOC 2019: Fine-tuning Transformer Neural Networks for Hate Speech Identification in Indo-European Languages

@inproceedings{Mishra20193IdiotsAH,
  title={3Idiots at HASOC 2019: Fine-tuning Transformer Neural Networks for Hate Speech Identification in Indo-European Languages},
  author={Shubhanshu Mishra},
  booktitle={FIRE},
  year={2019}
}
We describe our team 3Idiots’s approach for participating in the 2019 shared task on hate speech and offensive content (HASOC) identification in Indo-European languages. Our approach relies on fine-tuning pre-trained monolingual and multilingual transformer (BERT) based neural network models. Furthermore, we also investigate an approach based on labels joined from all sub-tasks. This resulted in good performance on the test set. Among the eight shared tasks, our solution won the first place for…
AI_ML_NIT_Patna @HASOC 2020: BERT Models for Hate Speech Identification in Indo-European Languages
The current paper describes the system submitted by team AI_ML_NIT_Patna. The task aims to identify offensive language in a code-mixed dataset of comments in Indo-European languages offered for…
Exploring multi-task multi-lingual learning of transformer models for hate speech and offensive speech identification in social media
TLDR
A multi-task and multilingual approach based on recently proposed Transformer Neural Networks is used to solve three sub-tasks for hate speech, showing that it is possible to utilize different combined approaches to obtain models that generalize easily across languages and tasks, while trading off slight accuracy for a much reduced inference-time compute cost.
Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages
TLDR
The HASOC track intends to stimulate development in Hate Speech for Hindi, German and English by identifying Hate Speech in Social Media using LSTM networks processing word embedding input.
TUB at HASOC 2020: Character based LSTM for Hate Speech Detection in Indo-European Languages
TLDR
Among the state-of-the-art deep learning models used in the experiments, the character-based LSTM achieved the best results on detecting hate speech content in tweets.
Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020
TLDR
The multilingual joint training approach is found to be the best trade-off between computational efficiency of model deployment and the model’s evaluation performance; the paper also shows the utility of task label marginalization, joint label classification, and joint training on multilingual datasets.
NSIT & IIITDWD @ HASOC 2020: Deep learning model for hate-speech identification in Indo-European languages
TLDR
The target is to present deep learning models to detect hate speech and offensive content in three languages (English, Hindi, and German) using Convolutional Neural Networks, bi-directional long short-term memory, and hybrid models (CNN+BiLSTM).
Siva@HASOC-Dravidian-CodeMix-FIRE-2020: Multilingual Offensive Speech Detection in Code-mixed and Romanized Text
TLDR
This paper proposes a novel and flexible approach of selective translation and transliteration to be able to reap better results out of fine-tuning and ensembling multilingual transformer networks like XLM-RoBERTa and mBERT.
FlorUniTo@TRAC-2: Retrofitting Word Embeddings on an Abusive Lexicon for Aggressive Language Detection
TLDR
The FlorUniTo team investigated the applicability of using an abusive lexicon to enhance word embeddings towards improving detection of aggressive language, showing promising improvements across languages.
Non-neural Structured Prediction for Event Detection from News in Indian Languages
TLDR
This method is based on structured prediction using only word n-gram and regex features and does not rely on any recent deep learning or neural network methods; it was the best performing system on all tasks and languages.
Hate-Alert@DravidianLangTech-EACL2021: Ensembling strategies for Transformer-based Offensive language Detection
TLDR
An exhaustive exploration of different transformer models for Offensive Language Identification in Dravidian Languages at EACL 2021 is presented and a genetic algorithm technique for ensembling different models is provided.

References

Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages
TLDR
The HASOC track intends to stimulate development in Hate Speech for Hindi, German and English by identifying Hate Speech in Social Media using LSTM networks processing word embedding input.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Semi-supervised Named Entity Recognition in noisy-text
TLDR
The models described in this paper are based on linear chain conditional random fields (CRFs), use the BIEOU encoding scheme, and leverage random feature dropout for up-sampling the training data, and include word clusters and pre-trained distributed word representations, updated gazetteer features, and global context predictions.
Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets
TLDR
The effectiveness of multi-dataset-multi-task learning in training neural models for four sequence tagging tasks on Twitter data, namely part-of-speech tagging, chunking, super sense tagging, and named entity recognition, is studied.
Sentiment Analysis with Incremental Human-in-the-Loop Learning and Lexical Resource Customization
TLDR
This work provides a free, open and GUI-based sentiment analysis tool that allows for a) relabeling predictions and/or adding labeled instances to retrain the weights of a given model, and b) customizing lexical resources to account for false positives and false negatives in sentiment lexicons.
Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora
TLDR
This paper analyzes six popular benchmark datasets where tweets are annotated with sentiment labels to identify patterns and correlations of meta-data features of tweets and users with potential to improve sentiment analysis applications on social media data.
Capturing Signals of Enthusiasm and Support Towards Social Issues from Twitter
TLDR
This paper analyzes the robustness of a prior framework for tagging tweets across the dimensions of enthusiasm and support, and offers an alternative or supplemental classification schema and prediction model to standard sentiment analysis and stance detection.
Enthusiasm and support: alternative sentiment classification for social movements on social media
TLDR
It is suggested that enthusiastic and supportive tweets are more prevalent in tweets about social causes than other types of tweets on Twitter.