Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization

Nishant Yadav, Nicholas Monath, Rico Angell, Manzil Zaheer, Andrew McCallum
Efficient k-nearest neighbor search is a fundamental task, foundational for many problems in NLP. When similarity is measured by the dot-product between dual-encoder vectors or by ℓ2-distance, many scalable and efficient search methods already exist. But not so when similarity is measured by more accurate and expensive black-box neural similarity models, such as cross-encoders, which jointly encode the query and candidate neighbor. The cross-encoders’ high computational cost typically limits…
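The gap the paper targets can be illustrated with a minimal numpy sketch (a toy bilinear scorer stands in for the cross-encoder; this is the general matrix-factorization idea, not the paper's exact algorithm): score a small query-by-candidate block exactly with the expensive model, then factorize that score matrix so that cheap dot-products between the resulting embeddings reproduce the scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an expensive black-box cross-encoder scorer;
# the real model jointly encodes the query and candidate text.
W = rng.normal(size=(8, 8))
def cross_encoder_score(q, c):
    return float(q @ W @ c)

queries = rng.normal(size=(5, 8))
candidates = rng.normal(size=(20, 8))

# Score a modest query-by-candidate block exactly with the cross-encoder...
S = np.array([[cross_encoder_score(q, c) for c in candidates] for q in queries])

# ...then factorize it (truncated SVD) so dot-products between the learned
# query and candidate embeddings reproduce the expensive scores.
U, s, Vt = np.linalg.svd(S, full_matrices=False)
r = 5                       # r = min(S.shape) makes the factorization exact here
q_emb = U[:, :r] * s[:r]    # query embeddings
c_emb = Vt[:r].T            # candidate embeddings

err = np.linalg.norm(q_emb @ c_emb.T - S) / np.linalg.norm(S)
print(err)  # near zero: dot-products now stand in for cross-encoder calls
```

After the factorization, nearest-neighbor search over `c_emb` reduces to standard maximum-inner-product search, for which fast indexes exist.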

Scalable Zero-shot Entity Linking with Dense Entity Retrieval

This paper introduces a simple and effective two-stage approach for zero-shot linking, based on fine-tuned BERT architectures, and shows that it performs well in the non-zero-shot setting, obtaining the state-of-the-art result on TACKBP-2010.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
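The "adaptive estimates of lower-order moments" can be written out in a few lines; the sketch below follows the update rule from the Adam paper (hyperparameter defaults as published), applied to a toy quadratic objective.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient (first moment) and of the
    # squared gradient (second moment).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction: the averages are initialized at zero, so early
    # estimates are scaled up to be unbiased.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = v = np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(np.linalg.norm(theta))  # small: iterates approach the minimum at [0, 0]
```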

CUR matrix decompositions for improved data analysis

An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.
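A small numpy sketch of the idea (a simplified variant, not the paper's exact sampling scheme): leverage scores are computed from the top singular vectors, columns and rows are sampled with probability proportional to their leverage, and the matrix is reconstructed as C·U·R from actual columns and rows of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Exactly rank-6 data matrix (50 x 40).
A = rng.normal(size=(50, 6)) @ rng.normal(size=(6, 40))

def leverage(M, k):
    # Statistical leverage of each column of M: squared column norms of the
    # top-k right singular vectors.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return np.sum(Vt[:k] ** 2, axis=0)

k, c = 6, 12
pc = leverage(A, k);   pc = pc / pc.sum()     # column sampling probabilities
pr = leverage(A.T, k); pr = pr / pr.sum()     # row sampling probabilities
cols = rng.choice(A.shape[1], size=c, replace=False, p=pc)
rows = rng.choice(A.shape[0], size=c, replace=False, p=pr)

C = A[:, cols]                                 # actual columns of A
R = A[rows, :]                                 # actual rows of A
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # small linking matrix
rel_err = np.linalg.norm(C @ U @ R - A) / np.linalg.norm(A)
print(rel_err)  # ~0: A is exactly rank 6 and the sampled columns/rows span it
```

Unlike SVD factors, C and R are made of real columns and rows of the data, which keeps the decomposition interpretable.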

Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks

This work presents a simple yet efficient data augmentation strategy called Augmented SBERT, where the cross-encoder is used to label a larger set of input pairs to augment the training data for the bi-encoder, and shows that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method.
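The silver-labeling step at the heart of the strategy can be sketched in a few lines (a toy word-overlap function stands in for the cross-encoder; the pair-selection step, which the paper stresses is crucial, is omitted here):

```python
# Toy stand-in for the expensive cross-encoder: Jaccard word overlap.
def cross_encoder(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

# Small gold-labeled training set (pair, similarity score).
gold = [("a cat sat", "the cat sat", 0.9)]

# Unlabeled pairs the cross-encoder will silver-label.
unlabeled_pairs = [("dogs bark loudly", "dogs bark"),
                   ("the sky is blue", "grass is green")]

# The cross-encoder labels the extra pairs, augmenting the data the
# bi-encoder is then trained on.
silver = [(a, b, cross_encoder(a, b)) for a, b in unlabeled_pairs]
train_set = gold + silver
print(len(train_set))  # 3
```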

RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

This work proposes an optimized training approach, called RocketQA, to improve dense passage retrieval, which significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions, and demonstrates that end-to-end QA performance can be improved when built on the RocketQA retriever.

DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

This work proposes DiPair — a novel framework for distilling fast and accurate models on text pair tasks that is both highly scalable and offers improved quality-speed tradeoffs.

Accelerating Large-Scale Inference with Anisotropic Vector Quantization

A family of anisotropic quantization loss functions is developed that leads to a new variant of vector quantization, one that penalizes the component of a datapoint's quantization residual parallel to the datapoint more heavily than its orthogonal component.
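The core of the loss is a residual decomposition, sketched below in numpy (a simplified illustration of the weighting idea, with a hypothetical weight `eta`, not the paper's derived loss): errors parallel to the datapoint distort inner products with aligned queries most, so they cost more.

```python
import numpy as np

def anisotropic_loss(x, x_hat, eta=4.0):
    # Split the quantization residual into components parallel and
    # orthogonal to the datapoint x, then weight the parallel part by eta.
    r = x - x_hat
    x_unit = x / np.linalg.norm(x)
    r_par = (r @ x_unit) * x_unit
    r_orth = r - r_par
    return eta * (r_par @ r_par) + r_orth @ r_orth

x = np.array([1.0, 0.0])
# Two reconstructions with the same Euclidean error of 0.1:
along = np.array([0.9, 0.0])   # residual parallel to x
across = np.array([1.0, 0.1])  # residual orthogonal to x
print(anisotropic_loss(x, along) > anisotropic_loss(x, across))  # True
```

Under plain squared error the two reconstructions would be equally good; the anisotropic loss breaks the tie in favor of preserving dot-products.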

Zero-Shot Entity Linking by Reading Entity Descriptions

It is shown that strong reading comprehension models pre-trained on large unlabeled data can generalize to unseen entities, and domain-adaptive pre-training (DAP) is proposed to address the domain shift associated with linking unseen entities in a new domain.

Billion-Scale Similarity Search with GPUs

This paper proposes a novel design for k-selection that enables the construction of high-accuracy brute-force, approximate, and compressed-domain search based on product quantization, and applies it in different similarity search scenarios.
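The brute-force mode rests on a standard algebraic trick that turns distance computation into one large matrix multiply, which is what maps well to GPUs. A numpy sketch of that exact-search baseline (conceptual only; the actual GPU implementation lives in the FAISS library):

```python
import numpy as np

rng = np.random.default_rng(0)
d, nb, nq, k = 16, 1000, 3, 5
xb = rng.normal(size=(nb, d)).astype(np.float32)   # database vectors
xq = rng.normal(size=(nq, d)).astype(np.float32)   # query vectors

# Brute-force L2 search via ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2:
# the cross term is a single matrix multiply over all query/database pairs.
d2 = np.sum(xq ** 2, 1, keepdims=True) - 2 * xq @ xb.T + np.sum(xb ** 2, 1)
I = np.argsort(d2, axis=1)[:, :k]                  # indices of k nearest
D = np.take_along_axis(d2, I, axis=1)              # their squared distances
print(I.shape, D.shape)  # (3, 5) (3, 5)
```

The k-selection step (here a full `argsort`) is exactly the part the paper redesigns for GPU efficiency.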