Corpus ID: 224803428

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

@article{Orr2021BootlegCT,
  title={Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation},
  author={Laurel Orr and Megan Leszczynski and Simran Arora and Sen Wu and Neel Guha and Xiao Ling and Christopher R{\'e}},
  journal={ArXiv},
  year={2021},
  volume={abs/2010.10363}
}
A challenge for named entity disambiguation (NED), the task of mapping textual mentions to entities in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail entities. Humans use subtle reasoning patterns based on knowledge of entity facts, relations, and types to disambiguate unfamiliar entities. Inspired by these patterns, we introduce Bootleg, a self-supervised NED system that is explicitly grounded in reasoning patterns for disambiguation. We…
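As a concrete illustration of the task and of why tail entities are hard, here is a minimal, hypothetical candidate-ranking baseline (the candidate table, entity IDs, and counts are all invented for illustration): a popularity prior resolves head entities well but can never pick the tail entities Bootleg targets.

```python
# Toy named entity disambiguation: map a textual mention to a knowledge-base
# entity by scoring candidates. A popularity prior works for head entities but
# fails on tail entities that share a surface form -- the failure mode the
# paper targets with reasoning over facts, relations, and types.

# Hypothetical candidate table: mention string -> [(entity_id, train_count)].
CANDIDATES = {
    "lincoln": [
        ("Abraham_Lincoln", 9000),
        ("Lincoln_Nebraska", 400),
        ("Lincoln_Motor_Company", 150),
    ],
}

def disambiguate_by_popularity(mention):
    """Pick the candidate entity seen most often in the training data."""
    cands = CANDIDATES.get(mention.lower(), [])
    if not cands:
        return None
    return max(cands, key=lambda c: c[1])[0]

# The head entity wins regardless of context -- even in the sentence
# "Lincoln is the capital of Nebraska".
print(disambiguate_by_popularity("Lincoln"))
```

A context-aware disambiguator would instead use type and relation signals (e.g., "capital of" selects a city) to recover the tail entity.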
Citations

Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP
It is found that the retrievers exhibit popularity bias, significantly under-performing on rarer entities that share a name; e.g., they are twice as likely to retrieve erroneous documents on queries for the less popular entity under the same name.
Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text
This work proposes a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain and achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR.
Metadata Shaping: Natural Language Annotations for the Tail
Language models (LMs) have made remarkable progress, but still struggle to generalize beyond the training data to rare linguistic patterns. Since rare entities and facts are prevalent in the queries…
A Knowledge Graph Entity Disambiguation Method Based on Entity-Relationship Embedding and Graph Structure Embedding
EDEGE (Entity Disambiguation based on Entity and Graph Embedding) is proposed, which utilizes the semantic embedding vector of entity relationships and the embedding vector of subgraph structure features to improve the precision and recall of entity disambiguation.
Goodwill Hunting: Analyzing and Repurposing Off-the-Shelf Named Entity Linking Systems
This work lays out and investigates two challenges faced by individuals or organizations building NEL systems, and shows how tailoring a simple technique for patching models using weak labeling can provide a 25% absolute improvement in accuracy on sports-related errors.
Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins
Ember is proposed, a system that abstracts and automates keyless joins to generalize context enrichment and allows users to develop no-code pipelines for five domains, including search, recommendation, and question answering; it can exceed alternatives by up to 39% recall, with as little as a single-line configuration change.
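To make the idea of a similarity-based keyless join concrete, here is a minimal sketch (not Ember's actual implementation; the bag-of-words "embedding" is a stand-in for the learned representations a real system would use, and all row texts are invented):

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for a learned embedding: bag-of-words token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyless_join(left_rows, right_rows):
    """Join rows by embedding similarity instead of a shared key column:
    each left row is matched to its nearest right row."""
    right_vecs = [(r, embed(r)) for r in right_rows]
    return {
        l: max(right_vecs, key=lambda rv: cosine(embed(l), rv[1]))[0]
        for l in left_rows
    }

pairs = keyless_join(
    ["apple iphone 12"],
    ["iPhone 12 smartphone by Apple", "Samsung Galaxy S21"],
)
```

The point of the abstraction is that the user never specifies a join key; the system matches rows on semantic similarity alone.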
ReTraCk: A Flexible and Efficient Framework for Knowledge Base Question Answering
Shuang Chen, Qian Liu, Zhiwei Yu, Chin-Yew Lin, Jian-Guang Lou, Feng Jiang. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, 2021.
ReTraCk is a neural semantic parsing framework for large-scale knowledge base question answering (KBQA) that includes a retriever to retrieve relevant KB items efficiently, a transducer to generate logical forms with syntax-correctness guarantees, and a checker to improve the transduction procedure.
Robustness Gym: Unifying the NLP Evaluation Landscape
Robustness Gym (RG), a simple and extensible evaluation toolkit that unifies four standard evaluation paradigms (subpopulations, transformations, evaluation sets, and adversarial attacks), is proposed.
Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems
The goal of this tutorial is to introduce feature store systems and to discuss the challenges and current solutions in managing these new embedding-centric pipelines.
Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking
It is found that the multilingual ability of BERT leads to robust performance in both monolingual and multilingual settings, and that zero-shot language transfer is surprisingly robust.

References

Showing 1–10 of 68 references
KORE: keyphrase overlap relatedness for entity disambiguation
A novel notion of semantic relatedness between two entities, each represented as a set of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases, is developed; it improves the quality of prior link-based models and eliminates the need for explicit interlinkage between entities.
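The keyphrase-overlap idea can be illustrated with a simplified sketch (this is not KORE's exact measure, which also scores partially overlapping phrases and uses its own weighting scheme; the entity names, keyphrases, and weights below are invented):

```python
def keyphrase_relatedness(phrases_a, phrases_b):
    """Weighted-overlap relatedness between two entities, each given as a
    dict of keyphrase -> weight. Shared phrases contribute their minimum
    weight; the sum is normalized by the total weight mass (a Jaccard-style
    simplification of KORE's measure)."""
    shared = set(phrases_a) & set(phrases_b)
    overlap = sum(min(phrases_a[p], phrases_b[p]) for p in shared)
    total = sum(phrases_a.values()) + sum(phrases_b.values()) - overlap
    return overlap / total if total else 0.0

# Invented keyphrase profiles for two hypothetical entities.
nirvana = {"rock band": 0.9, "grunge": 0.7}
pearl_jam = {"rock band": 0.8, "seattle": 0.5}
score = keyphrase_relatedness(nirvana, pearl_jam)
```

Because relatedness is computed from keyphrase sets alone, no explicit hyperlink structure between the two entities is required.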
Entity-aware ELMo: Learning Contextual Entity Representation for Entity Disambiguation
Utilizing E-ELMo for local entity disambiguation, this work outperforms all state-of-the-art local and global models on the popular benchmarks, improving micro-average accuracy by about 0.5% on AIDA test-b with the Yago candidate set.
Robust Disambiguation of Named Entities in Text
A robust method for collective disambiguation is presented, harnessing context from knowledge bases and using a new form of coherence graph; it significantly outperforms prior methods in accuracy, with robust behavior across a variety of inputs.
Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All
Pair-Linking, a novel iterative solution for the MINTREE optimization problem, is designed; it is not only more accurate but also surprisingly faster than many state-of-the-art collective linking algorithms.
Knowledge Enhanced Contextual Word Representations
After integrating WordNet and a subset of Wikipedia into BERT, the knowledge-enhanced BERT (KnowBert) demonstrates improved perplexity, the ability to recall facts as measured in a probing task, and downstream performance on relationship extraction, entity typing, and word sense disambiguation.
Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking
This study proposes an extreme simplification of the entity linking setup that works surprisingly well: simply cast it as a per-token classification over the entire entity vocabulary; on an entity linking benchmark, this model improves the entity representations over plain BERT.
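The "per-token classification over the entire entity vocabulary" framing can be sketched as follows (the tiny entity vocabulary and the hand-made logits below stand in for a real contextual encoder's outputs):

```python
# Entity linking cast as per-token classification: every token independently
# receives a score for each entity in the vocabulary (plus an "O" label for
# tokens that are not part of any entity mention), and the argmax is the link.

ENTITY_VOCAB = ["O", "Paris_France", "Paris_Hilton"]

def link_tokens(tokens, logits):
    """logits: one score list per token, aligned with ENTITY_VOCAB.
    Returns the argmax entity label for each token."""
    return [
        ENTITY_VOCAB[max(range(len(row)), key=row.__getitem__)]
        for row in logits
    ]

tokens = ["She", "visited", "Paris"]
# Hand-made stand-in logits; a BERT-style encoder would produce these.
logits = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
links = link_tokens(tokens, logits)
```

The appeal of the formulation is its simplicity: no candidate generation or mention detection stage, just one classification head over the entity vocabulary.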
Entity Disambiguation for Knowledge Base Population
This work presents a state-of-the-art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very few resources.
Fast and Accurate Entity Linking via Graph Embedding
This paper proposes a framework for entity linking that leverages graph embeddings to perform collective disambiguation, and implements and evaluates a reference pipeline that uses DBpedia as the knowledge base and leverages specific algorithms for fast candidate search and high-performance state-space search optimization.
Entity linking at the tail: sparse signals, unknown entities, and phrase models
A web-scale unsupervised entity linking system for a commercial search engine that addresses these requirements by combining new developments in sparse signal recovery, which identify the most discriminative features from noisy free-text web documents, with explicit modeling of out-of-knowledge-base entities to improve precision at the tail.
Entity Linking via Joint Encoding of Types, Descriptions, and Context
This work presents a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and its fine-grained types.