Linguistic Knowledge and Transferability of Contextual Representations

@article{Liu2019LinguisticKA,
  title={Linguistic Knowledge and Transferability of Contextual Representations},
  author={Nelson F. Liu and Matt Gardner and Yonatan Belinkov and Matthew E. Peters and Noah A. Smith},
  journal={ArXiv},
  year={2019},
  volume={abs/1903.08855}
}
Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. […] However, language model pretraining on more data gives the best results.
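The analysis behind these findings uses probing classifiers: lightweight models trained on frozen contextual representations to predict linguistic labels, whose accuracy is read as a measure of what the representations encode. Below is a minimal sketch of that setup, assuming random placeholder features and labels and a scikit-learn logistic-regression probe; it is illustrative only, not the paper's actual pipeline.

# Minimal probing sketch: the contextualizer stays frozen and only a linear
# classifier is trained on its output vectors. Features and labels here are
# random stand-ins for real frozen embeddings and gold linguistic tags.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, dim, n_classes = 2000, 256, 17       # hypothetical sizes
X = rng.normal(size=(n_tokens, dim))           # stand-in for per-token contextual embeddings
y = rng.integers(0, n_classes, size=n_tokens)  # stand-in for gold labels (e.g. POS tags)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)      # the probe is the only trained component
probe.fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
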
Quantifying the Contextualization of Word Representations with Semantic Class Probing
TLDR
This work quantifies the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embedding.
On the Hierarchical Information in a Single Contextualised Word Representation (Student Abstract)
TLDR
It is shown that, with no fine-tuning, a single contextualised representation encodes enough syntactic and semantic sentence-level information to significantly outperform a non-contextual baseline in classifying the 5-class sentiment of its ancestor constituents at multiple levels of the constituency tree.
On the Universality of Deep Contextual Language Models
TLDR
This work explores the notion of ‘Universality’ by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings.
On the Linguistic Representational Power of Neural Machine Translation Models
TLDR
It is shown that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information.
Context Analysis for Pre-trained Masked Language Models
TLDR
A detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models suggests significant differences in contextual impact between the two model architectures.
Probing Pretrained Language Models for Lexical Semantics
TLDR
A systematic empirical analysis across six typologically diverse languages and five different lexical tasks indicates patterns and best practices that hold universally, but also points to prominent variations across languages and tasks.
Multilingual Probing of Deep Pre-Trained Contextual Encoders
TLDR
This work comprehensively evaluates and analyses multilingual variants of existing encoders, from a typological perspective among others, on probing datasets constructed for 6 non-English languages.
Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models
TLDR
This work offers a systematic exploration of how much transfer occurs when models are denied any information about word identity via random scrambling, and finds that only BERT shows high rates of transfer into scrambled domains, and for classification but not sequence labeling tasks.
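The scrambling manipulation described above can be pictured as one fixed random permutation of the vocabulary, so every word type is consistently replaced by another type and surface identity carries no usable signal. The sketch below is a toy illustration under that assumption, not necessarily the cited paper's exact procedure.

# Toy sketch of scrambling word identity with one fixed permutation of the
# vocabulary; the corpus and vocabulary are placeholders.
import random

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
shuffled = list(vocab)
random.seed(0)
random.shuffle(shuffled)
scramble = dict(zip(vocab, shuffled))  # each type maps consistently to one type

scrambled_corpus = [[scramble[w] for w in sent] for sent in corpus]
for original, mixed in zip(corpus, scrambled_corpus):
    print(" ".join(original), "->", " ".join(mixed))
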
Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
TLDR
A model pre-training framework, Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to produce pre-training data, mitigating issues of existing general-purpose language models.

References

Showing 1-10 of 74 references
Dissecting Contextual Word Embeddings: Architecture and Representation
TLDR
There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Deep Contextualized Word Representations
TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
What do you learn from context? Probing for sentence structure in contextualized word representations
TLDR
A novel edge probing task design is introduced and a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline are constructed to investigate how sentence structure is encoded across a range of syntactic, semantic, local, and long-range phenomena.
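Edge probing keeps the encoder frozen and trains a small classifier on pooled span representations to label a span or span pair (for example, a constituent, a dependency arc, or a coreference link). A minimal sketch with random placeholder token vectors follows; the pooling and feature layout are illustrative assumptions, not the exact design of the cited work.

# Edge-probing sketch: mean-pool frozen token vectors over each span and
# concatenate the span vectors as input features for a probing classifier.
# Token vectors here are random stand-ins for real encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 12, 768
token_vecs = rng.normal(size=(seq_len, dim))

def span_repr(vectors, start, end):
    """Mean-pool token vectors over the half-open span [start, end)."""
    return vectors[start:end].mean(axis=0)

span_a = span_repr(token_vecs, 2, 3)               # e.g. a head span
span_b = span_repr(token_vecs, 5, 8)               # e.g. a dependent span
pair_features = np.concatenate([span_a, span_b])   # fed to a small trained classifier
print(pair_features.shape)                         # (1536,)
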
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
TLDR
It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
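A common way to operationalize this kind of targeted evaluation is number agreement: a language model is counted as correct when it scores the grammatical verb form above the ungrammatical one for the same sentence prefix. The sketch below assumes a hypothetical score_next_word stub in place of a trained model.

# Agreement-evaluation sketch: compare model scores for grammatical vs.
# ungrammatical verb forms. score_next_word is a hypothetical placeholder.
def score_next_word(prefix, word):
    """Stand-in for log P(word | prefix) under a trained language model."""
    return 0.0  # replace with a real model's next-word log-probability

examples = [
    # (prefix, grammatical verb, ungrammatical verb)
    ("the keys to the cabinet", "are", "is"),
    ("the author of the books", "is", "are"),
]

correct = sum(
    score_next_word(prefix, good) > score_next_word(prefix, bad)
    for prefix, good, bad in examples
)
print(f"agreement accuracy: {correct}/{len(examples)}")
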
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
TLDR
This work proposes a framework that facilitates better understanding of the encoded representations of sentence vectors and demonstrates the potential contribution of the approach by analyzing different sentence representation mechanisms.
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.
Learned in Translation: Contextualized Word Vectors
TLDR
Adding context vectors from a deep LSTM encoder of an attentional sequence-to-sequence model trained for machine translation improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing
TLDR
A novel method for multilingual transfer that utilizes deep contextual embeddings, pretrained in an unsupervised fashion, that consistently outperforms the previous state-of-the-art on 6 tested languages, yielding an improvement of 6.8 LAS points on average.
Deep RNNs Encode Soft Hierarchical Syntax
TLDR
A set of experiments demonstrates that deep recurrent neural networks trained on highly varied supervision learn internal representations that capture soft hierarchical notions of syntax.
Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks
TLDR
This paper investigates the quality of vector representations learned at different layers of NMT encoders and finds that higher layers are better at learning semantics while lower layers tend to be better for part-of-speech tagging.