Corpus ID: 237571775

MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models

Qianchu Liu, Fangyu Liu, Nigel Collier, Anna Korhonen, Ivan Vulić
Recent work has indicated that pretrained language models (PLMs) such as BERT and RoBERTa can be transformed into effective sentence and word encoders even via simple self-supervised techniques. Inspired by this line of work, in this paper we propose a fully unsupervised approach to improving word-in-context (WiC) representations in PLMs, achieved via a simple and efficient WiC-targeted fine-tuning procedure: MirrorWiC. The proposed method leverages only raw texts sampled from Wikipedia, assuming…


XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization
Experimental results show that even when no tagged instances are available for a target language, models trained solely on English data attain competitive performance at distinguishing different meanings of a word, including for distant languages.
ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
ConSERT, a Contrastive Framework for Self-Supervised SEntence Representation Transfer, is presented; it adopts contrastive learning to fine-tune BERT in an unsupervised yet effective way and achieves new state-of-the-art performance on STS tasks.
Probing Pretrained Language Models for Lexical Semantics
A systematic empirical analysis across six typologically diverse languages and five different lexical tasks indicates patterns and best practices that hold universally, but also points to prominent variations across languages and tasks.
Language-agnostic BERT Sentence Embedding
This work adapts multilingual BERT to produce language-agnostic sentence embeddings for 109 languages that improve average bi-text retrieval accuracy over 112 languages to 83.7%, well above the 65.5% achieved by the prior state-of-the-art on Tatoeba.
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
A simple but effective approach to WSD using nearest-neighbor classification on contextualized word embeddings (CWEs); it is shown that the pre-trained BERT model is able to place polysemic words into distinct 'sense' regions of the embedding space, while ELMo and Flair NLP do not seem to possess this ability.
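The nearest-neighbor WSD idea described above can be sketched in a few lines. The toy vectors below stand in for BERT-derived contextualized word embeddings (in the paper these would be hidden states for the target word); the sense labels and centroid-averaging step are illustrative assumptions, not the paper's exact setup.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy contextualized word embeddings (CWEs) of "bank" in labeled contexts;
# a real system would extract these from BERT's hidden layers.
train = {
    "bank(finance)": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "bank(river)":   [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

# One centroid per sense: the average of its training CWEs.
centroids = {
    sense: [sum(col) / len(vecs) for col in zip(*vecs)]
    for sense, vecs in train.items()
}

def disambiguate(cwe):
    # 1-nearest-neighbor classification against the sense centroids.
    return max(centroids, key=lambda s: cosine(cwe, centroids[s]))

query = [0.85, 0.15, 0.05]  # a new occurrence of "bank" in a finance context
print(disambiguate(query))  # prints "bank(finance)"
```

The same scheme scales to real CWEs unchanged: only the embedding source and the sense inventory grow.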
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
A large-scale Word in Context dataset, called WiC, based on annotations curated by experts, for generic evaluation of context-sensitive representations; it is shown that existing models have surpassed the performance ceiling of the standard evaluation dataset for this purpose.
SimCSE: Simple Contrastive Learning of Sentence Embeddings
This paper describes an unsupervised approach in which an input sentence predicts itself under a contrastive objective, with only standard dropout used as noise; it shows that contrastive learning theoretically regularizes pretrained embeddings' anisotropic space to be more uniform, and that it better aligns positive pairs when supervised signals are available.
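The core trick in the summary above, using two independent dropout masks on the same sentence as a positive pair inside an in-batch InfoNCE loss, can be sketched with toy vectors. The random "sentence embeddings", dimensionality, and temperature value are illustrative assumptions standing in for a real PLM encoder.

```python
import math
import random

random.seed(0)

def dropout(vec, p=0.1):
    # Standard dropout: zero each entry with probability p, rescale survivors.
    return [0.0 if random.random() < p else v / (1.0 - p) for v in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "sentence embeddings" standing in for a PLM encoder's output.
batch = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]

# Two forward passes with independent dropout masks give two views of
# each sentence: the positive pair in the contrastive objective.
view1 = [dropout(e) for e in batch]
view2 = [dropout(e) for e in batch]

tau = 0.05  # temperature hyperparameter (assumed value)
n = len(batch)
loss = 0.0
for i in range(n):
    # InfoNCE with in-batch negatives: view2[i] is the positive for view1[i],
    # all other sentences in the batch serve as negatives.
    sims = [math.exp(cosine(view1[i], view2[j]) / tau) for j in range(n)]
    loss += -math.log(sims[i] / sum(sims))
loss /= n
print(loss)
```

In the actual method this loss is backpropagated through the encoder, pulling the two dropout views of each sentence together while pushing apart views of different sentences.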
Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation
This work shows that contextual embeddings can be used to achieve unprecedented gains in Word Sense Disambiguation (WSD) tasks, and analyses the robustness of the approach when ignoring part-of-speech and lemma features, when requiring disambiguation against the full sense inventory, and in revealing shortcomings to be improved.
SenseBERT: Driving Some Sense into BERT
This paper proposes a method that employs weak supervision directly at the word-sense level: the model is pre-trained to predict not only the masked words but also their WordNet supersenses, achieving a lexical-semantic-level language model without the use of human annotation.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.