Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization

@article{Taill2020ContextualizedEI,
  title={Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization},
  author={Bruno Taill{\'e} and Vincent Guigue and Patrick Gallinari},
  journal={Advances in Information Retrieval},
  year={2020},
  volume={12036},
  pages={383--391}
}
Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context. This is intuitively useful for generalization, especially in Named-Entity Recognition where it is crucial to detect mentions never seen during training. However, standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions. In this paper, we perform an empirical… 

A Realistic Study of Auto-regressive Language Models for Named Entity Typing and Recognition

TLDR
A method is proposed to select seen and rare/unseen names when only the pre-trained model is accessible; results reported on these groups show that auto-regressive language models, used as meta-learners, can perform NET and NER fairly well.

Probing Pre-trained Auto-regressive Language Models for Named Entity Typing and Recognition

TLDR
A new methodology for probing auto-regressive LMs for NET and NER generalization is proposed; it draws inspiration from human linguistic behavior by resorting to meta-learning, and introduces a novel procedure to assess the model's memorization of NEs and to report the impact of that memorization on the results.

Evaluation of contextual embeddings on less-resourced languages

TLDR
It is shown that monolingual BERT models generally dominate, with a few exceptions such as the dependency parsing task, where they are not competitive with ELMo models trained on large corpora.

What do we really know about State of the Art NER?

TLDR
A broad evaluation of NER is performed on a popular dataset, taking into consideration the various text genres and sources that constitute it, and some reporting practices are recommended for NER researchers that could help provide a better understanding of a SOTA model's performance in the future.

HardEval: Focusing on Challenging Tokens to Assess Robustness of NER

TLDR
An evaluation method that focuses on subsets of tokens representing specific sources of errors (unknown words and label shift or ambiguity) provides a system-agnostic basis for evaluating specific sources of NER errors and assessing room for improvement in terms of robustness.

Separating Retention from Extraction in the Evaluation of End-to-end Relation Extraction

TLDR
This paper proposes two experiments confirming that retention of known facts is a key factor of performance on standard benchmarks and suggests that a pipeline model able to use intermediate type representations is less prone to over-rely on retention.

Investigation on Data Adaptation Techniques for Neural Named Entity Recognition

TLDR
This work investigates the impact of large monolingual unlabeled corpora and synthetic data from the original labeled data on the performance of three different named entity recognition tasks.

Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!

TLDR
A small empirical study is proposed to quantify the impact of the most common mistake, finding that it leads to overestimating the final RE performance by around 5% on ACE05, and the paper calls for unifying the evaluation setting in end-to-end RE.

How Do Your Biomedical Named Entity Models Generalize to Novel Entities?

TLDR
It is found that although BioNER models achieve state-of-the-art performance on BioNER benchmarks based on overall performance, they have limitations in identifying synonyms and new biomedical concepts such as COVID-19.

How Do Your Biomedical Named Entity Recognition Models Generalize to Novel Entities?

TLDR
This work systematically analyzes three types of recognition abilities of BioNER models (memorization, synonym generalization, and concept generalization) and finds that although the current best models achieve state-of-the-art performance on benchmarks based on overall performance, they have limitations in identifying synonyms and new biomedical concepts.

References

Improving Language Understanding by Generative Pre-Training

TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Deep Contextualized Word Representations

TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Generalisation in named entity recognition: A quantitative analysis

Using Linguistic Features to Improve the Generalization Capability of Neural Coreference Resolvers

TLDR
It is shown that generalization improves only slightly by merely using a set of additional linguistic features; however, employing features and subsets of their values that are informative for coreference resolution considerably improves generalization.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Universal Language Model Fine-tuning for Text Classification

TLDR
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.

GloVe: Global Vectors for Word Representation

TLDR
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature (global matrix factorization and local context window methods) and produces a vector space with meaningful substructure.

One billion word benchmark for measuring progress in statistical language modeling

TLDR
A new benchmark corpus to be used for measuring progress in statistical language modeling, with almost one billion words of training data, is proposed, which is useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques.

Transfer Learning for Entity Recognition of Novel Classes

TLDR
This work is the first direct comparison of these previously published approaches for entity recognition problems where the class labels in the source and target domains are different, and empirically demonstrates when each of the published approaches tends to do well.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity