MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

Xinyin Ma, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Weiming Lu
Entity retrieval, which aims at disambiguating mentions to canonical entities from massive KBs, is essential for many tasks in natural language processing. Recent progress in entity retrieval shows that the dual-encoder structure is a powerful and efficient framework for nominating candidates when entities are identified only by their descriptions. However, these approaches ignore the property that the meanings of entity mentions diverge in different contexts and relate to different portions of the descriptions, which are…
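The multi-view idea described above can be illustrated with a minimal sketch: each entity is represented by several view vectors (e.g., one per description sentence), and a mention is scored against an entity by its best-matching view rather than a single pooled description vector. All names and vectors below are illustrative assumptions, not the paper's actual model.

```python
# Sketch of dual-encoder retrieval with multi-view entity representations.
# A mention matches an entity if it is close to ANY view of that entity,
# so the entity score is the maximum cosine similarity over its views.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def score_entity(mention_vec, view_vecs):
    # Max over views: the mention only needs to match one facet of the entity.
    return max(cosine(mention_vec, v) for v in view_vecs)

def retrieve(mention_vec, entities, k=2):
    # entities: {name: [view vectors]}; rank candidates by best-view similarity.
    ranked = sorted(entities,
                    key=lambda e: score_entity(mention_vec, entities[e]),
                    reverse=True)
    return ranked[:k]

# Toy example: a mention of "Apple" in a tech context.
mention = [0.9, 0.1, 0.0]
entities = {
    "Apple (company)": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "Apple (fruit)":   [[0.0, 0.2, 1.0]],
}
print(retrieve(mention, entities, k=1))  # -> ['Apple (company)']
```

A single-vector dual encoder would have to average these facets into one embedding; taking the max over views lets a context-specific mention align with the single most relevant portion of the description.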


Named Entity Linking with Entity Representation by Multiple Embeddings

It is shown that the representations of KB entities can be adjusted using only KB data, and the adjustment can improve NEL performance, and it is found that tuning on diverse news provides better embeddings.

Bi-Link: Bridging Inductive Link Predictions from Text via Contrastive Learning of Transformers and Prompts

This paper proposes Bi-Link, a contrastive learning framework with probabilistic syntax prompts for link predictions, and designs a symmetric link prediction model, establishing bidirectional linking between forward prediction and backward prediction to better express symmetric relations.

Connecting a French Dictionary from the Beginning of the 20th Century to Wikidata

This paper describes a new lexical resource in which all dictionary entries of the history and geography part are connected to a Wikidata identifier, making it easier to automate the identification, comparison, and verification of historically situated representations against current data sources.

Proxy-based Zero-Shot Entity Linking by Effective Candidate Retrieval

  • Maciej Wiatrak, Eirini Arvaniti, Angus Brayne, Jonas Vetterle, Aaron Sim
  • Computer Science
  • 2023
This work shows that pairing a proxy-based metric learning loss with an adversarial regularizer provides an efficient alternative to hard negative sampling in the candidate retrieval stage, and shows competitive performance on the recall@1 metric, thereby providing the option to leave out the expensive candidate ranking step.

AcX: System, Techniques, and Experiments for Acronym Expansion

The design and implementation of AcX are described, three new acronym expansion benchmarks are proposed, state-of-the-art techniques are compared on them, and ensemble techniques that improve on any single technique are proposed.

Improving Zero-Shot Entity Retrieval through Effective Dense Representations

This work proposes a simple approach for improving candidate generation by efficiently embedding mention-entity pairs in dense space through a BERT-based bi-encoder and introduces a new pooling function and incorporate entity type side-information.

Autoregressive Entity Retrieval

Entities are at the center of how we represent and aggregate knowledge. For instance, encyclopedias such as Wikipedia are structured by entities (e.g., one per article). The ability to retrieve such…

Scalable Zero-shot Entity Linking with Dense Entity Retrieval

This paper introduces a simple and effective two-stage approach for zero-shot linking, based on fine-tuned BERT architectures, and shows that it performs well in the non-zero-shot setting, obtaining the state-of-the-art result on TACKBP-2010.

Learning Dense Representations for Entity Retrieval

We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are…

Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation

A novel embedding method specifically designed for NED that jointly maps words and entities into the same continuous vector space and extends the skip-gram model by using two models.

Zero-Shot Entity Linking by Reading Entity Descriptions

It is shown that strong reading comprehension models pre-trained on large unlabeled data can generalize to unseen entities, and domain-adaptive pre-training (DAP) is proposed to address the domain shift problem associated with linking unseen entities in a new domain.

Robust Disambiguation of Named Entities in Text

A robust method for collective disambiguation is presented, by harnessing context from knowledge bases and using a new form of coherence graph that significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.

Learning relatedness measures for entity linking

This paper formalizes the problem of learning entity relatedness as a learning-to-rank problem, and proposes a methodology to create reference datasets on the basis of manually annotated data.

Robust named entity disambiguation with random walks

This article presents two novel approaches, guided by a natural notion of semantic similarity and based on learning-to-rank, for the collective disambiguation of all entities mentioned in a document at the same time.

Learning Cross-Context Entity Representations from Text

Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context-dependent representations of…