WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets

@article{Nguyen2020WNUT2020T2,
  title={WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets},
  author={Dat Quoc Nguyen and Thanh Vu and Afshin Rahimi and Mai Hoang Dao and Linh The Nguyen and Long Doan},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.08232}
}
In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems obtain very high performance, up to 0.91 F1 score, and (ii) the majority of the submissions achieve…
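
Since systems in this shared task are compared by F1 score on the INFORMATIVE class of a binary tweet classifier, a minimal sketch of how that metric can be computed is shown below; this assumes scikit-learn and illustrative label lists, and is not the official evaluation script.

# Minimal sketch (not the official WNUT-2020 evaluation script): precision,
# recall, and F1 on the INFORMATIVE class of a binary tweet classification task.
from sklearn.metrics import f1_score, precision_score, recall_score

gold = ["INFORMATIVE", "UNINFORMATIVE", "INFORMATIVE", "UNINFORMATIVE"]  # illustrative labels
pred = ["INFORMATIVE", "INFORMATIVE", "INFORMATIVE", "UNINFORMATIVE"]

print("Precision:", precision_score(gold, pred, pos_label="INFORMATIVE"))
print("Recall:   ", recall_score(gold, pred, pos_label="INFORMATIVE"))
print("F1:       ", f1_score(gold, pred, pos_label="INFORMATIVE"))
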


ISWARA at WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets using BERT and FastText Embeddings
TLDR
Results show that pairing BERT with word occurrences outperforms fastText, with an F1-score, precision, recall, and accuracy on test data of 76%, 81%, 72%, and 79%, respectively.
NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets
TLDR
It is found that domain-specific pre-trained BERT models lead to the best performance, and the standalone CT-BERT model proved to be highly competitive, leading to a shared first place in the shared task.
IRLab@IITBHU at WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets using BERT
TLDR
This paper reports the submission to the shared Task 2: Identification of informative COVID-19 English tweets at W-NUT 2020, and briefly explains two models that showed promising results in tweet classification tasks: DistilBERT and FastText.
LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets
TLDR
This system is an ensemble of various machine learning methods, leveraging both traditional feature-based classifiers as well as recent advances in pre-trained language models that help in capturing the syntactic, semantic, and contextual features from the tweets. Expand
TATL at WNUT-2020 Task 2: A Transformer-based Baseline System for Identification of Informative COVID-19 English Tweets
TLDR
Inspired by the recent advances in pretrained Transformer language models, a simple yet effective baseline for the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets is proposed.
CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets - RoBERTa Ensembles and The Continued Relevance of Handcrafted Features
TLDR
This paper explores improving the performance of a pre-trained transformer-based language model fine-tuned for text classification through an ensemble implementation that makes use of corpus level information and a handcrafted feature.
Not-NUTs at WNUT-2020 Task 2: A BERT-based System in Identifying Informative COVID-19 English Tweets
TLDR
A model is proposed that, given an English tweet, automatically identifies whether that tweet bears informative content regarding COVID-19 or not, and ensembling different BERTweet model configurations achieves competitive results that are only shy of those by top-performing teams by roughly 1% in terms of F1 score on the informative class.
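
Several of the systems above ensemble multiple fine-tuned checkpoints. A hypothetical soft-voting sketch (averaging per-class probabilities, not any team's actual code) might look like this:

# Hypothetical soft-voting sketch: average class probabilities from several
# fine-tuned models and take the argmax. The label order is assumed to be
# [UNINFORMATIVE, INFORMATIVE].
import numpy as np

def soft_vote(prob_matrices):
    """prob_matrices: list of (n_tweets, 2) probability arrays, one per model."""
    avg = np.mean(np.stack(prob_matrices, axis=0), axis=0)
    return avg.argmax(axis=1)  # 1 = INFORMATIVE under the assumed label order
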
Linguist Geeks on WNUT-2020 Task 2: COVID-19 Informative Tweet Identification using Progressive Trained Language Models and Data Augmentation
TLDR
This work focuses on developing a language system that can differentiate between Informative and Uninformative tweets associated with COVID-19 for WNUT-2020 Shared Task 2, and employs deep transfer learning models such as BERT along with other techniques such as Noisy Data Augmentation and Progressive Training.
CIA_NITT at WNUT-2020 Task 2: Classification of COVID-19 Tweets Using Pre-trained Language Models
TLDR
The models for the WNUT-2020 Shared Task 2, which involves identification of COVID-19 related informative tweets, are presented: a first model based on CT-BERT achieves an F1-score of 88.7%, and a second model, an ensemble of CT-BERT, RoBERTa, and SVM, achieves an F1-score of 88.52%.
DATAMAFIA at WNUT-2020 Task 2: A Study of Pre-trained Language Models along with Regularization Techniques for Downstream Tasks
TLDR
Experiments show that adding regularization to a pre-trained RoBERTa model makes it robust to data and annotation noise and can improve overall performance by more than 1.2%.

References

Showing 1-10 of 17 references
NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets
TLDR
It is found that domain-specific pre-trained BERT models lead to the best performance, and the standalone CT-BERT model proved to be highly competitive, leading to a shared first place in the shared task.
#GCDH at WNUT-2020 Task 2: BERT-Based Models for the Detection of Informativeness in English COVID-19 Related Tweets
TLDR
A transformer-based approach to the detection of informativeness in English tweets on the topic of the current COVID-19 pandemic is presented, along with a Naive Bayes classifier and a support vector machine as baseline systems.
COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter
TLDR
COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19, shows a 10-30% marginal improvement compared to its base model, BERT-Large, on five different classification datasets.
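
As a hedged illustration of how such a domain-specific checkpoint is typically used for this task (the model id below is assumed to be the publicly released CT-BERT checkpoint on the Hugging Face hub; this is not code from the CT-BERT paper):

# Sketch: a sequence classifier initialized from CT-BERT, ready for fine-tuning.
# "digitalepidemiologylab/covid-twitter-bert-v2" is assumed to be the public
# checkpoint name; substitute whichever checkpoint you actually use.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "digitalepidemiologylab/covid-twitter-bert-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

inputs = tokenizer("Official case counts were updated today.", return_tensors="pt")
logits = model(**inputs).logits  # classification head is untrained: fine-tune before use
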
NutCracker at WNUT-2020 Task 2: Robustly Identifying Informative COVID-19 Tweets using Ensembling and Adversarial Training
TLDR
The ensemble of COVID-Twitter-BERT and RoBERTa models trained using adversarial training produces an F1-score of 0.9096 on the test data of WNUT-2020 Task 2 and ranks 1st on the leaderboard.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TLDR
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR
It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Bag of Tricks for Efficient Text Classification
TLDR
A simple and efficient baseline for text classification is explored that shows that the fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation.
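
A minimal sketch of a fastText supervised baseline of this kind, assuming the Python fasttext package and a file of __label__-prefixed tweets (the file name and hyperparameters are illustrative):

# Sketch of a fastText baseline. train.txt is assumed to contain one tweet per
# line prefixed with its label, e.g. "__label__INFORMATIVE <tweet text>".
import fasttext

model = fasttext.train_supervised(input="train.txt", lr=0.5, epoch=25, wordNgrams=2)
labels, probs = model.predict("new confirmed cases reported in the city today")
print(labels, probs)
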
Fast-join: An efficient method for fuzzy token matching based string similarity join
TLDR
This paper proposes a new similarity metric, called “fuzzy token matching based similarity”, which extends token-based similarity functions by allowing fuzzy match between two tokens, and achieves high efficiency and result quality, and significantly outperforms state-of-the-art methods.
The measurement of observer agreement for categorical data.
TLDR
A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
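
For reference, the kappa-type agreement statistic underlying such observer reliability studies (and commonly reported for annotation quality in corpora like the one above) has the standard form

\kappa = \frac{p_o - p_e}{1 - p_e}

where p_o is the observed agreement between annotators and p_e is the agreement expected by chance.
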