Team_Swift at SemEval-2020 Task 9: Tiny Data Specialists through Domain-Specific Pre-training on Code-Mixed Data
@inproceedings{Malte2020Team_SwiftAS,
  title     = {Team_Swift at SemEval-2020 Task 9: Tiny Data Specialists through Domain-Specific Pre-training on Code-Mixed Data},
  author    = {Aditya Malte and P. Bhavsar and Sushant Rathi},
  booktitle = {SemEval@COLING},
  year      = {2020}
}
Code-mixing is an interesting phenomenon where the speaker switches between two or more languages in the same text. In this paper, we describe an unconventional approach to tackling the SentiMix Hindi-English challenge (uid: aditya_malte). Instead of directly fine-tuning large contemporary Transformer models, we train our own domain-specific embeddings and use them for downstream tasks. We also discuss how this technique provides comparable performance while making for a much more deployable…
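To make the abstract's idea concrete, here is a minimal sketch of one plausible reading of the approach: pre-train fastText-style subword embeddings directly on code-mixed (Hinglish) text and feed averaged sentence vectors to a lightweight downstream sentiment classifier, instead of fine-tuning a large Transformer. This is not the authors' released code; the toy corpus, hyperparameters, library choices (gensim, scikit-learn), and helper names are illustrative assumptions.

import numpy as np
from gensim.models import FastText
from sklearn.linear_model import LogisticRegression

# Hypothetical Hindi-English code-mixed tweets with binary sentiment labels.
corpus = [
    ("movie bahut achhi thi , loved it", 1),
    ("yaar this phone is kharab , totally useless", 0),
    ("kya mast song hai , on repeat all day", 1),
    ("service bohot slow thi , very disappointing", 0),
]
sentences = [text.split() for text, _ in corpus]
labels = [label for _, label in corpus]

# Domain-specific pre-training: character n-gram (subword) embeddings learned
# directly on the code-mixed corpus rather than fine-tuning a large Transformer.
emb = FastText(vector_size=64, window=3, min_count=1, sg=1, epochs=50)
emb.build_vocab(corpus_iterable=sentences)
emb.train(corpus_iterable=sentences, total_examples=len(sentences), epochs=emb.epochs)

def sentence_vector(tokens):
    # Average the subword-aware token vectors into a single sentence vector.
    return np.mean([emb.wv[t] for t in tokens], axis=0)

# Downstream task: a small classifier on top of the pre-trained embeddings.
X = np.stack([sentence_vector(s) for s in sentences])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict([sentence_vector("kitna accha movie , loved it".split())]))

Because fastText composes word vectors from character n-grams, transliterated or misspelled Hinglish tokens unseen at training time still receive sensible embeddings, which is one reason such small, domain-specific models can remain competitive while being far cheaper to deploy.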
One Citation
SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets. SemEval@COLING, 2020.