Adaptive Nearest Neighbor Machine Translation

@article{Zheng2021AdaptiveNN,
  title={Adaptive Nearest Neighbor Machine Translation},
  author={Xin Zheng and Zhirui Zhang and Junliang Guo and Shujian Huang and Boxing Chen and Weihua Luo and Jiajun Chen},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.13022}
}
kNN-MT, recently proposed by Khandelwal et al. (2020a), successfully combines a pre-trained neural machine translation (NMT) model with token-level k-nearest-neighbor (kNN) retrieval to improve translation accuracy. However, the traditional kNN algorithm used in kNN-MT simply retrieves the same number of nearest neighbors for each target token, which may cause prediction errors when the retrieved neighbors include noise. In this paper, we propose Adaptive kNN-MT to dynamically determine the…
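
To make the mechanism in the abstract concrete, below is a minimal NumPy sketch of how token-level kNN retrieval is typically interpolated with the NMT distribution, and where an adaptively chosen k would enter. The function and variable names (knn_mt_step, datastore_keys, lambda_) are illustrative assumptions, not the authors' released implementation.

import numpy as np

def knn_mt_step(hidden, datastore_keys, datastore_values, p_nmt,
                k=8, temperature=10.0, lambda_=0.5):
    # hidden:           decoder state at the current target position, shape (d,)
    # datastore_keys:   cached decoder states built from parallel data, shape (N, d)
    # datastore_values: target token ids paired with each key, shape (N,)
    # p_nmt:            output distribution of the base NMT model, shape (vocab_size,)
    vocab_size = p_nmt.shape[0]

    # Retrieve the k nearest cached states by squared L2 distance
    # (a real system would use an ANN index such as FAISS instead of a full scan).
    dists = np.sum((datastore_keys - hidden) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]

    # Turn negative distances into weights and aggregate them per target token.
    weights = np.exp(-dists[nearest] / temperature)
    p_knn = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        p_knn[datastore_values[idx]] += w
    p_knn /= p_knn.sum()

    # Vanilla kNN-MT uses a fixed k and lambda_ for every token; the adaptive
    # variant described above instead lets a lightweight network decide, per
    # step, how much to rely on retrieval (including ignoring it entirely).
    return lambda_ * p_knn + (1.0 - lambda_) * p_nmt

In this sketch k is still a constant; the point of the paper is precisely that this constant is replaced by a per-token decision.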

Nearest Neighbor Knowledge Distillation for Neural Machine Translation

This paper proposes to move the time-consuming kNN search forward to the preprocessing phase, and then introduces k-Nearest-Neighbor Knowledge Distillation (kNN-KD), which trains the base NMT model to directly learn the knowledge of kNN.

Efficient Cluster-Based k-Nearest-Neighbor Machine Translation

This work explores a more efficient kNN-MT and proposes a cluster-based Compact Network that performs feature reduction in a contrastive learning manner, compressing context features into vectors with over 90% lower dimensionality while showing good generalization on unseen domains.

Towards Robust k-Nearest-Neighbor Machine Translation

To alleviate the impact of noise, this paper proposes a confidence-enhanced kNN-MT model with robust training, which not only achieves improvements over current kNN-MT models but also exhibits better robustness.

Dynamic Fusion Nearest Neighbor Machine Translation via Dempster-Shafer Theory

This paper proposes an approach based on Dempster–Shafer theory (DST) to dynamically fuse different probability distributions to suit different low-resource translation scenarios, and demonstrates that the approach yields larger improvements and is more robust than traditional kNN-MT.

Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation

This paper proposes a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for k-nearest-neighbor retrieval; it introduces an autoencoder task based on the target language and inserts lightweight adapters into the original NMT model to map the token-level representations of this task to the ideal representations of the translation task.

Low Resource Retrieval Augmented Adaptive Neural Machine Translation

This work proposes KNN-Kmeans MT, a sample-efficient algorithm that improves retrieval-based augmentation in low-resource settings by adding a K-means filtering layer after the kNN step; the authors conjecture that the observed improvement comes from eliminating bad neighbors, since their retrieval databases are small and retrieving a fixed number of neighbors adds noise to the model.
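
As a rough illustration of what a K-means filtering layer after the kNN step could look like, the sketch below clusters the retrieved neighbor keys and keeps only the cluster whose centroid is closest to the query; the exact rule used in KNN-Kmeans MT may differ, and kmeans_filter_neighbors is a hypothetical helper name.

import numpy as np
from sklearn.cluster import KMeans

def kmeans_filter_neighbors(query, neighbor_keys, neighbor_values, n_clusters=2):
    # Cluster the retrieved keys and discard the cluster(s) far from the query,
    # on the assumption that distant clusters mostly contain noisy neighbors.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(neighbor_keys)
    centroid_dists = np.linalg.norm(km.cluster_centers_ - query, axis=1)
    keep = km.labels_ == np.argmin(centroid_dists)
    return neighbor_keys[keep], neighbor_values[keep]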

Chunk-based Nearest Neighbor Machine Translation

Experiments on machine translation in two settings, static domain adaptation and “on-the-fly” adaptation, show that the chunk-based kNN-MT model leads to a significant speed-up (up to 4 times) with only a small drop in translation quality.

Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation

This work uses supervised contrastive learning to learn the distinctive retrieval representation derived from the original context representation of kNN-MT, and proposes a fast and effective approach to constructing hard negative samples.
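
For orientation, a supervised contrastive objective of the kind mentioned here can be written so that retrieval representations sharing the same target token are pulled together while all others (including constructed hard negatives) are pushed apart. The sketch below is a generic InfoNCE-style formulation in NumPy; the paper's actual loss and negative-sampling scheme may differ.

import numpy as np

def supervised_contrastive_loss(reps, token_ids, temperature=0.1):
    # reps:      L2-normalized retrieval representations, shape (B, d)
    # token_ids: target token id associated with each representation, shape (B,)
    sim = reps @ reps.T / temperature                    # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos_mask = token_ids[:, None] == token_ids[None, :]  # same target token = positive
    np.fill_diagonal(pos_mask, False)
    # Average the log-probability of positives for each anchor that has at least one.
    pos_counts = pos_mask.sum(axis=1)
    pos_logprob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    valid = pos_counts > 0
    return -(pos_logprob[valid] / pos_counts[valid]).mean()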

Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation

This paper proposes PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT to build datastores of better quality, and designs a novel contrastive alignment objective to mitigate the representation gap between the NMT model and pre-trained models, enabling the NMT model to retrieve from a better datastore.

Nearest Neighbor Non-autoregressive Text Generation

This work proposes a novel training strategy that learns edit operations on neighbors to improve non-autoregressive (NAR) text generation, outperforms an NAR baseline on the WMT’14 En-De dataset, and reports an analysis of the neighbor examples used in the proposed method.

References

Nearest Neighbor Machine Translation

We introduce k-nearest-neighbor machine translation (kNN-MT), which predicts tokens with a nearest neighbor classifier over a large datastore of cached examples, using representations from a neural translation model for similarity search.

Generalization through Memorization: Nearest Neighbor Language Models

It is suggested that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail.
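
The datastore behind this nearest-neighbor language model (and behind kNN-MT above) can be pictured as follows: every training prefix is encoded into a key, paired with the token that actually followed it as the value, and at test time the distribution over retrieved values is interpolated with the base model's prediction. In the sketch below, encode_context stands in for whatever representation function the trained model exposes.

import numpy as np

def build_datastore(corpus_token_ids, encode_context):
    # corpus_token_ids: iterable of token-id sequences from the training corpus
    # encode_context:   function mapping a prefix (list of ids) to a vector of shape (d,)
    keys, values = [], []
    for sentence in corpus_token_ids:
        for t in range(1, len(sentence)):
            keys.append(encode_context(sentence[:t]))  # key: representation of the prefix
            values.append(sentence[t])                 # value: the token that followed it
    return np.stack(keys), np.array(values)

Retrieval then works exactly as in the interpolation sketch near the top of the page: encode the current prefix, look up its nearest keys, and mix the induced token distribution with the model's own prediction.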

Non-Parametric Adaptation for Neural Machine Translation

This work proposes a novel n-gram level retrieval approach that relies on local phrase-level similarities, making it possible to retrieve neighbors that are useful for translation even when overall sentence similarity is low, and combines this with an expressive neural network that lets the model extract information from the noisy retrieved context.

BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA

BERT-kNN outperforms BERT on cloze-style QA by large margins without any further training and excels for rare facts.

Search Engine Guided Neural Machine Translation

An attention-based neural machine translation model is extended by allowing it to access the entire training set of parallel sentence pairs even after training; the extended model significantly outperforms the baseline approach.

Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings

This work proposes an approach that adapts models with domain-aware feature embeddings, which are learned via an auxiliary language modeling task, and allows the model to assign domain-specific representations to words and output sentences in the desired domain.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Iterative Domain-Repaired Back-Translation

This paper proposes a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair (DR) model to refine translations in synthetic bilingual data and designs a unified training framework to optimize paired DR and NMT models jointly.

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
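
The soft-search described here is the (additive) attention mechanism; as a reminder of the computation, the sketch below scores each source annotation against the previous decoder state, normalizes the scores into alignment weights, and forms a context vector as their weighted sum. The parameter names (W_s, W_h, v) follow the usual additive-attention notation and are not taken from this page.

import numpy as np

def additive_attention(prev_decoder_state, encoder_states, W_s, W_h, v):
    # prev_decoder_state: s_{i-1}, shape (d_dec,)
    # encoder_states:     source annotations h_1..h_T, shape (T, d_enc)
    # W_s: (d_att, d_dec), W_h: (d_att, d_enc), v: (d_att,)
    scores = np.tanh(encoder_states @ W_h.T + prev_decoder_state @ W_s.T) @ v  # (T,)
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                      # soft alignment over source positions
    context = alphas @ encoder_states           # weighted sum fed into the next prediction
    return context, alphas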

Guiding Neural Machine Translation with Retrieved Translation Pieces

This paper proposes a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process; the method compares favorably to an alternative retrieval-based method with respect to accuracy, speed, and simplicity of implementation.