Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition

Abbas Ghaddar, Philippe Langlais, Ahmad Rashid, Mehdi Rezagholizadeh
Transactions of the Association for Computational Linguistics
Abstract: In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity. We introduce NRB, a new testbed carefully designed to diagnose Name Regularity Bias of NER models. Our results indicate that all state-of-the-art models we tested show such a bias, with BERT fine-tuned models significantly outperforming feature-based (LSTM-CRF) ones on NRB, despite having comparable (sometimes lower) performance on standard benchmarks. To…
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
A new KD algorithm, Combined-KD, is introduced, which takes advantage of two promising approaches (a better training scheme and more efficient data augmentation) and achieves state-of-the-art results on the GLUE benchmark, out-of-domain generalization, and adversarial robustness compared to competitive methods.
Knowledge Distillation with Noisy Labels for Natural Language Understanding
This is the first study on knowledge distillation (KD) with noisy labels in Natural Language Understanding (NLU); the scope of the problem is documented, and two methods to mitigate the impact of label noise are presented.
End-to-End Self-Debiasing Framework for Robust NLU Training
A simple yet effective debiasing framework whereby the shallow representations of the main model are used to derive a bias model and both models are trained simultaneously, which leads to competitive OOD results.
How Do Your Biomedical Named Entity Models Generalize to Novel Entities?
This work systematically analyzes three types of recognition abilities of BioNER models: memorization, synonym generalization, and concept generalization, in order to improve the generalizability of state-of-the-art (SOTA) models on five benchmark datasets.
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
The proposed RAIL-KD approach considerably outperforms other state-of-the-art intermediate-layer KD methods in both performance and training time, and acts as a regularizer that improves the generalizability of the student model.
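As a rough illustration of the random intermediate-layer mapping idea (a sketch, not the authors' implementation), one can randomly select as many teacher layers as the student has, keep them in depth order, and penalize the distance between the paired representations. The function name `rail_kd_layer_loss` and the plain-list representation format are assumptions made for this sketch:

```python
import random

def rail_kd_layer_loss(teacher_layers, student_layers, seed=None):
    """Randomly map a subset of teacher layers onto the student's layers
    (one teacher layer per student layer, kept in depth order) and return
    the mean squared difference between the paired representations.

    Layers are given as lists of floats of equal length; in practice these
    would be hidden-state tensors of matching (projected) dimension.
    """
    rng = random.Random(seed)
    # sample which teacher layers to align, preserving depth order
    chosen = sorted(rng.sample(range(len(teacher_layers)), len(student_layers)))
    loss = 0.0
    for t_idx, s_rep in zip(chosen, student_layers):
        t_rep = teacher_layers[t_idx]
        loss += sum((t - s) ** 2 for t, s in zip(t_rep, s_rep)) / len(s_rep)
    return loss / len(student_layers)
```

Because a fresh random mapping is drawn each call (each training step), the student cannot overfit to any single fixed teacher layer, which is the regularizing effect described above.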
What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?
A simple method is provided that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions and shows that this method enhances robustness and increases accuracy on both natural and adversarial datasets.
Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition
Two variants of DATNet are investigated to explore effective feature fusion between high- and low-resource settings, and a novel Generalized Resource-Adversarial Discriminator (GRAD) is proposed to address noisy and imbalanced training data.
Robust Named Entity Recognition with Truecasing Pretraining
This work addresses the problem of robustness of NER systems in data with noisy or uncertain casing, using a pretraining objective that predicts casing in text, or a truecaser, leveraging unlabeled data.
Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures
It is shown that without targeting a specific bias, the sentence augmentation improves the robustness of transformer models against multiple biases, and that models can still be vulnerable to the lexical overlap bias, even when the training data does not contain this bias.
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
This paper trains a naive model that makes predictions exclusively from dataset biases, then trains a robust model in an ensemble with the naive one, encouraging the robust model to focus on other patterns in the data that are more likely to generalize.
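A common instantiation of this kind of bias ensembling is a product of experts: the bias-only and robust models' log-probabilities are summed and renormalized, the training loss is taken on the combined distribution (so the robust model gains nothing from patterns the bias model already captures), and only the robust model is used at test time. A minimal sketch, with a hypothetical function name and plain log-probability lists standing in for model outputs:

```python
import math

def product_of_experts_logprobs(bias_logprobs, robust_logprobs):
    """Combine a bias-only model and a robust model as a product of
    experts: add their log-probabilities per class, then renormalize.

    Training cross-entropy is computed on the returned distribution;
    at test time the robust model's own predictions are used alone.
    """
    combined = [b + r for b, r in zip(bias_logprobs, robust_logprobs)]
    log_z = math.log(sum(math.exp(c) for c in combined))
    return [c - log_z for c in combined]
```

When the bias model is already confident on a biased example, the combined loss is small regardless of the robust model's output, which is what steers the robust model away from the bias.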
FreeLB: Enhanced Adversarial Training for Language Understanding
A novel adversarial training algorithm - FreeLB - is proposed, that promotes higher robustness and invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples.
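The core of this style of adversarial training is an inner loop that takes several ascent steps on an additive embedding perturbation (projected onto a norm ball) while accumulating gradients for the final averaged parameter update. The following is a toy sketch of that loop only, using a hand-supplied gradient function in place of autograd; the function names, the simplified L2 projection, and treating the input gradient as the accumulated gradient are all assumptions of the sketch, not FreeLB's actual implementation:

```python
def adversarial_inner_loop(embedding, grad_fn, steps=3, alpha=0.1, eps=0.5):
    """Take `steps` ascent steps on a perturbation delta added to the
    embedding, projecting delta onto an L2 ball of radius `eps`, and
    accumulate the gradient at each perturbed point (averaged over steps).

    `grad_fn(x)` returns the gradient of the loss w.r.t. the input x.
    """
    delta = [0.0] * len(embedding)
    accumulated = [0.0] * len(embedding)
    for _ in range(steps):
        x = [e + d for e, d in zip(embedding, delta)]
        g = grad_fn(x)
        # accumulate the gradient for the final averaged update
        accumulated = [a + gi / steps for a, gi in zip(accumulated, g)]
        # ascend on delta, then project back onto the eps-ball
        delta = [d + alpha * gi for d, gi in zip(delta, g)]
        norm = sum(d * d for d in delta) ** 0.5
        if norm > eps:
            delta = [d * eps / norm for d in delta]
    return accumulated
```

Averaging gradients over the ascent steps effectively minimizes the risk over a region around each input rather than at a single point, which is the invariance property described above.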
A Boundary-aware Neural Model for Nested Named Entity Recognition
This work proposes a boundary-aware neural model for nested NER that leverages entity boundaries to predict entity categorical labels, which decreases computation cost and relieves the error-propagation problem of layered sequence-labeling models.
Counterfactual Generator: A Weakly-Supervised Method for Named Entity Recognition
This paper decomposes a sentence into two parts, entity and context, rethinks the relationship between them and model performance from a causal perspective, and proposes the Counterfactual Generator, which produces counterfactual examples through interventions on existing observational examples to enhance the original dataset.
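The basic intervention under the entity/context decomposition can be sketched as an entity swap: replace the entity span with a different entity of the same type while leaving the context untouched, so a model trained on the augmented data must rely on context rather than entity surface form. This is a minimal sketch under that assumption; `make_counterfactual` and its half-open span convention are hypothetical, not the paper's API:

```python
def make_counterfactual(tokens, span, replacement):
    """Replace the entity tokens at `span` (start, end; end exclusive)
    with a same-type entity `replacement`, keeping the context intact.
    The label sequence would be adjusted analogously in a full pipeline.
    """
    start, end = span
    return tokens[:start] + replacement + tokens[end:]
```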
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
This paper proposes a method that automatically detects and ignores dataset-specific patterns, which are hypothesized to reflect dataset bias, by training a lower-capacity model in an ensemble with a higher-capacity model.
Adversarial training for multi-context joint entity and relation extraction
Adversarial training is demonstrated to be a regularization method that allows improving the state-of-the-art effectiveness on several datasets in different contexts (i.e., news, biomedical, and real estate data) and for different languages (English and Dutch).