Corpus ID: 237581532

BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology

@article{Gessler2021BERTHU,
  title={BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology},
  author={Luke Gessler and Nathan Schneider},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.09780}
}
An important question concerning contextualized word embedding (CWE) models like BERT is how well they can represent different word senses, especially those in the long tail of uncommon senses. Rather than build a WSD system as in previous work, we investigate contextualized embedding neighborhoods directly, formulating a query-by-example nearest neighbor retrieval task and examining ranking performance for words and senses in different frequency bands. In an evaluation on two English sense…
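
As a concrete (and heavily simplified) illustration of the query-by-example setup described in the abstract, the sketch below embeds one occurrence of a target word with BERT and ranks other occurrences of the same lemma by cosine similarity, so that ranking quality can be inspected per sense. It assumes the Hugging Face transformers library; the toy occurrences list and the single-subword lookup are simplifications for illustration, not the authors' actual pipeline.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_occurrence(sentence, target_word):
    """Return the contextualized vector of the first subword of target_word."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]            # (seq_len, hidden_dim)
    target_id = tokenizer.convert_tokens_to_ids(target_word)  # single-piece words only
    position = (enc["input_ids"][0] == target_id).nonzero()[0].item()
    return hidden[position]

# Hypothetical occurrences of the lemma "bank", each with a gold sense label.
occurrences = [
    ("She walked along the bank of the river.", "bank", "riverside"),
    ("He deposited the check at the bank.", "bank", "institution"),
    ("The bank approved her loan application.", "bank", "institution"),
]

# Query by example: embed one occurrence, then rank the rest by cosine similarity.
query_sentence, query_word, query_sense = occurrences[1]
query_vec = embed_occurrence(query_sentence, query_word)
ranked = sorted(
    (o for i, o in enumerate(occurrences) if i != 1),
    key=lambda o: -torch.cosine_similarity(query_vec, embed_occurrence(o[0], o[1]), dim=0).item(),
)
# A good sense representation should rank same-sense occurrences first.
for sentence, _, sense in ranked:
    print(f"{sense:12s} {sentence}")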

References

Showing 1–10 of 33 references
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Presents a simple but effective approach to WSD using nearest-neighbor classification on CWEs, and shows that the pre-trained BERT model is able to place polysemic words into distinct 'sense' regions of the embedding space, while ELMo and Flair NLP do not appear to possess this ability.
FEWS: Large-Scale, Low-Shot Word Sense Disambiguation with the Dictionary
Introduces FEWS (Few-shot Examples of Word Senses), a new low-shot WSD dataset automatically extracted from example sentences in Wiktionary; it provides a large training set covering many more senses than previous datasets and a comprehensive evaluation set containing few- and zero-shot examples of a wide variety of senses.
SenseBERT: Driving Some Sense into BERT
Proposes a method that applies weak supervision directly at the word-sense level: the model is pre-trained to predict not only the masked words but also their WordNet supersenses, yielding a lexical-semantic-level language model without the use of human annotation.
Recent Trends in Word Sense Disambiguation: A Survey
Provides an extensive overview of current advances in WSD, describing the state of the art in terms of resources for the task (i.e., sense inventories and reference datasets for training and testing) as well as automatic disambiguation approaches, detailing their peculiarities, strengths, and weaknesses.
Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords
Using a contextualized "pseudoword" as a stand-in for a static embedding in the input layer and then performing masked prediction of a word in the sentence, this work investigates the geometry of the BERT space in a controlled manner around individual instances.
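
To make the pseudoword idea above more concrete, here is a rough, assumption-laden sketch (not that paper's actual code): an arbitrary vector is swapped in for one word's static embedding via Hugging Face's inputs_embeds argument, and BERT's masked prediction at another position is read off to see how the model interprets that region of the space. The midpoint-of-two-embeddings pseudoword and the example sentence are purely illustrative.

import torch
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The cat sat on the [MASK]."
enc = tokenizer(sentence, return_tensors="pt")
input_ids = enc["input_ids"][0]

with torch.no_grad():
    # Ordinary (static) input embeddings for the sentence.
    embeds = model.get_input_embeddings()(enc["input_ids"]).clone()  # (1, seq_len, hidden)

    # A stand-in "pseudoword" vector: here just the midpoint of two static
    # embeddings, representing an arbitrary point in the space to probe.
    table = model.get_input_embeddings().weight
    cat_id = tokenizer.convert_tokens_to_ids("cat")
    dog_id = tokenizer.convert_tokens_to_ids("dog")
    pseudo = (table[cat_id] + table[dog_id]) / 2

    # Swap the pseudoword in at the position of "cat".
    cat_pos = (input_ids == cat_id).nonzero()[0].item()
    embeds[0, cat_pos] = pseudo

    # Masked prediction elsewhere in the sentence reveals how BERT reads it.
    mask_pos = (input_ids == tokenizer.mask_token_id).nonzero()[0].item()
    logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits

print(tokenizer.convert_ids_to_tokens(logits[0, mask_pos].topk(5).indices.tolist()))
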
Linguistic Knowledge and Transferability of Contextual Representations
Finds that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases but fail on tasks requiring fine-grained linguistic knowledge.
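
A minimal sketch of the frozen-representation probing setup that reference describes, under stated assumptions: BERT stays frozen, a linear classifier is trained on mean-pooled last-layer features, and the tiny labeled dataset below is purely hypothetical (the original work probes many NLU tasks, not this toy one).

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def sentence_vector(text):
    """Mean-pool the frozen last-layer token vectors into one sentence vector."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)
    return hidden.mean(dim=0).tolist()

# Hypothetical toy task: classify sentences as being about animals (0) or vehicles (1).
texts = ["The dog barked all night.", "A cat slept on the sofa.",
         "The truck stalled on the highway.", "Her bike has a flat tire."]
labels = [0, 0, 1, 1]

# The linear probe: only the logistic regression weights are trained.
probe = LogisticRegression(max_iter=1000).fit([sentence_vector(t) for t in texts], labels)
print(probe.predict([sentence_vector("The horse galloped across the field.")]))
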
Annotating WordNet
Reports on the state of the ongoing effort to increase the connectivity of WordNet by sense-tagging its glosses, with the goal of creating a more integrated lexical resource.
Word sense disambiguation: A survey
Introduces the reader to the motivations for resolving word ambiguity, describes the task, and surveys supervised, unsupervised, and knowledge-based approaches.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Presents a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models; the benchmark favors models that represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
CxGBERT: BERT meets Construction Grammar
Concludes that BERT does indeed have access to a significant amount of information, much of which linguists typically call constructional information; provides insights into what deep learning methods learn from text; and shows that information contained in constructions is redundantly encoded in lexico-semantics.