IIITT at CASE 2021 Task 1: Leveraging Pretrained Language Models for Multilingual Protest Detection

@inproceedings{Kalyan2021IIITTAC,
  title={IIITT at CASE 2021 Task 1: Leveraging Pretrained Language Models for Multilingual Protest Detection},
  author={Pawan Kalyan and D. Ramohan Reddy and Adeep Hande and Ruba Priyadharshini and Ratnasingam Sakuntharaj and Bharathi Raja Chakravarthi},
  booktitle={CASE},
  year={2021}
}
In a world of constant protests arising from events such as a global pandemic, climate change, and religious or political conflicts, there has always been a need to detect events and protests before they are amplified by news or social media. This paper describes our work on the sentence classification subtask of multilingual protest detection at CASE@ACL-IJCNLP 2021. We approached this task by employing various multilingual pre-trained transformer models to classify if any sentence… 


Multilingual Protest News Detection - Shared Task 1, CASE 2021
TLDR
This report benchmarks state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection and finds that monolingual models outperformed the multilingual models in a few evaluation scenarios.
Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021
TLDR
This paper reports the machine translation systems submitted by the IIITT team for the English→Marathi and English⇔Irish language pairs of the LoResMT 2021 shared task. For English→Marathi, the team fine-tunes IndicTrans, a pretrained multilingual NMT model, using an external parallel corpus as input for additional training.
Pegasus@Dravidian-CodeMix-HASOC2021: Analyzing Social Media Content for Detection of Offensive Text
TLDR
This paper employs two Transformer-based prototypes that placed in the top 8 for all tasks of the HASOC-DravidianCodeMix FIRE 2021 shared task, and introduces two methods for detecting offensive comments and posts in Tamil and Malayalam.
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages
TLDR
This paper describes work for the Dravidian-CodeMix shared task at FIRE 2021. It fine-tunes pretrained models such as ULMFiT and multilingual BERT on the code-mixed dataset, on its transliteration (TRAI), on English translations of the TRAI data (TRAA), and on the combination of all three.

References

Showing 1–10 of 38 references
Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context setting
TLDR
This paper presents an overview of the CLEF-2019 Lab ProtestNews on Extracting Protests from News, set in the context of generalizable natural language processing. Neural networks yield the best results.
Protest Event Detection: When Task-Specific Models Outperform an Event-Driven Method
TLDR
Two approaches for identifying protest events in news in English are presented and it is shown that developing dedicated architectures and models for each task outperforms simpler solutions based on the propagation of labels from lexical items to documents.
Multilingual Protest News Detection - Shared Task 1, CASE 2021
TLDR
This report benchmarks state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection and finds that monolingual models outperformed the multilingual models in a few evaluation scenarios.
A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries
TLDR
This work proposes a coherent set of tasks for protest information collection in the context of generalizable natural language processing, including news article classification, event sentence detection, and event extraction that address the challenge of building generalizable NLP tools that perform well independent of the source of the text.
Analyzing ELMo and DistilBERT on Socio-political News Classification
TLDR
This study evaluates the robustness of two state-of-the-art deep contextual language representations, ELMo and DistilBERT, on supervised learning of binary protest news classification and sentiment analysis of product reviews. The results suggest that DistilBERT transfers generic semantic knowledge to other domains better than ELMo.
Cross-Context News Corpus for Protest Event-Related Knowledge Base Construction
TLDR
This work presents a gold-standard corpus of protest events drawn from various local and international English-language sources in several countries. The corpus has the variety and quality needed to develop and benchmark text classification and event extraction systems in a cross-context setting, which contributes to the generalizability and robustness of automated text processing systems.
Multilingual Protest Event Data Collection with GATE
TLDR
This paper presents a finite-state approach to collecting protest event features from short texts in several European languages, built with the General Architecture for Text Engineering (GATE). It also reports the results of an annotation performance evaluation.
How Multilingual is Multilingual BERT?
TLDR
It is concluded that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs, and that the model can find translation pairs.
Automated Extraction of Socio-political Events from News (AESPEN): Workshop and Shared Task Report
TLDR
The report shows the volume and variety of both the data sources and the event information collection approaches related to socio-political events. It also shows the need to close the gap between automated text processing techniques and the requirements of the social and political sciences.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
This paper introduces BERT, a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pretrained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
...
...