Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
@inproceedings{Naskar2021EnergyBasedRI,
  title     = {Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models},
  author    = {Subhajit Naskar and Pedram Rooshenas and Simeng Sun and Mohit Iyyer and Andrew McCallum},
  booktitle = {ACL},
  year      = {2021}
}
The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and resulted in alternative training algorithms (Ranzato et al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However, MLE training remains the de facto approach for autoregressive NMT because of its computational efficiency and stability. Despite this mismatch between the training objective and task…
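The abstract is truncated above, but the title describes the core idea: rerank candidate translations from an autoregressive NMT model with a separately trained energy-based model. The sketch below is a minimal, hedged illustration of that reranking step under assumed interfaces; `energy_fn` is a hypothetical stand-in for a learned sequence-level energy scorer (lower energy meaning a better translation), not the authors' actual model.

```python
# Minimal sketch of energy-based reranking of an n-best list (not the authors' code).
# Assumption: an autoregressive NMT model supplies candidate translations, and a
# trained energy model scores each (source, candidate) pair; lower energy is better.

from typing import Callable, List, Tuple


def rerank_by_energy(
    source: str,
    candidates: List[str],
    energy_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Sort candidates by energy instead of the NMT model's log-probability."""
    scored = [(hyp, energy_fn(source, hyp)) for hyp in candidates]
    return sorted(scored, key=lambda pair: pair[1])  # ascending energy


if __name__ == "__main__":
    # Toy stand-in for a trained energy network: it simply prefers longer
    # hypotheses, purely for illustration.
    toy_energy = lambda src, hyp: -float(len(hyp.split()))

    nbest = ["the cat sat", "the cat sat on the mat", "cat sat mat"]
    for hyp, energy in rerank_by_energy("die Katze saß auf der Matte", nbest, toy_energy):
        print(f"{energy:6.1f}  {hyp}")
```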
17 Citations
Residual Energy-Based Models for Text
- Computer Science · J. Mach. Learn. Res.
- 2021
This work finds experimentally that the answer is affirmative when one has access to the training data for the model, and guardedly affirmative even if one does not, suggesting that the auto-regressive models can be improved by incorporating the (globally normalized) discriminators into the generative process.
Improving Joint Training of Inference Networks and Structured Prediction Energy Networks
- Computer Science · SPNLP
- 2020
This paper designs a compound objective to jointly train both cost-augmented and test-time inference networks along with the energy function, and proposes joint parameterizations for the inference networks that encourage them to capture complementary functionality during learning.
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
- Computer Science · COLING
- 2020
It is shown that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary, and the paper advocates for decision rules that take the translation distribution into account holistically.
Quality-Aware Decoding for Neural Machine Translation
- Computer Science · ArXiv
- 2022
An extensive comparison of various possible candidate generation and ranking methods across four datasets and two model classes shows that quality-aware decoding consistently outperforms MAP-based decoding according both to state-of-the-art automatic metrics and to human assessments.
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
- Computer Science · ArXiv
- 2022
The theoretical connections between the two paradigms are explored; it is shown that methods such as KL-control developed for RM can also be construed as belonging to DM, and that while DM differs from RM, it can suffer from similar training difficulties, such as high gradient variance.
Searching for COMETINHO: The Little Metric That Could
- Computer Science · EAMT
- 2022
This paper explores optimization techniques, pruning, and knowledge distillation to create more compact and faster COMET versions, and presents DISTIL-COMET, a lightweight distilled version that is 80% smaller and 2.128x faster while attaining performance close to the original model and above strong baselines such as BERTSCORE and PRISM.
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
- Computer Science · ArXiv
- 2022
This paper proposes Transcormer, a Transformer model with a novel sliding language modeling (SLM) objective for sentence scoring, which avoids the limitations of CLM and MLM while inheriting their advantages, and thus achieves high effectiveness and efficiency in scoring.
Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models
- Computer Science · ACL
- 2022
A novel exact n-best search algorithm for neural sequence models is proposed, and it is shown that intrinsic uncertainty affects model uncertainty as the model tends to overly spread out the probability mass for uncertain tasks and sentences.
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
- Computer Science · ACL
- 2022
It is shown that it is possible to directly train a second-stage model that performs re-ranking on a set of summary candidates, and that the mixture-of-experts re-ranker SummaReranker learns to select a better candidate and consistently improves the performance of the base model.
RMBR: A Regularized Minimum Bayes Risk Reranking Framework for Machine Translation
- Computer Science · ArXiv
- 2022
A regularized MBR reranking framework (RMBR) that considers semantic-based similarity and computes the expected utility for each candidate over a truncated candidate list; the proposed quality regularizer and uncertainty regularizer are incorporated into the framework (see the sketch after this list).
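As referenced in the RMBR entry above, the following is a minimal sketch of truncated minimum Bayes risk reranking: each candidate is scored by its expected utility against a truncated support set of other candidates used as pseudo-references. The `overlap` utility here is a toy assumption standing in for the semantic similarity metric and regularizers the paper actually uses.

```python
# Hedged sketch of truncated MBR reranking (illustration only, not the paper's code).

from typing import Callable, List


def mbr_rerank(
    candidates: List[str],
    utility: Callable[[str, str], float],
    support_size: int = 8,
) -> str:
    """Return the candidate with the highest expected utility against a truncated
    support set of the other candidates (used as pseudo-references)."""
    support = candidates[:support_size]

    def expected_utility(hyp: str) -> float:
        others = [ref for ref in support if ref != hyp]
        if not others:
            return 0.0
        return sum(utility(hyp, ref) for ref in others) / len(others)

    return max(candidates, key=expected_utility)


if __name__ == "__main__":
    # Toy utility: unigram Jaccard overlap; a real system would use a learned
    # semantic similarity metric as the entry above describes.
    def overlap(hyp: str, ref: str) -> float:
        h, r = set(hyp.split()), set(ref.split())
        return len(h & r) / max(len(h | r), 1)

    print(mbr_rerank(["a cat sat", "the cat sat", "the cat sat down"], overlap))
```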
References
Showing 1–10 of 45 references
Residual Energy-Based Models for Text Generation
- Computer Science · ICLR
- 2020
This work investigates un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level, and shows that residual EBMs yield lower perplexity compared to locally normalized baselines.
A Study of Reinforcement Learning for Neural Machine Translation
- Computer Science · EMNLP
- 2018
A systematic study on how to train better NMT models using reinforcement learning, providing a comprehensive comparison of several important factors and proposing a new method to leverage RL to further boost the performance of NMT systems trained with source/target monolingual data.
Improving Neural Machine Translation Models with Monolingual Data
- Computer Science · ACL
- 2016
This work pairs monolingual training data with automatic back-translations so it can be treated as additional parallel training data, and obtains substantial improvements on the WMT 15 English→German task and the low-resource IWSLT 14 Turkish→English task.
On the Weaknesses of Reinforcement Learning for Neural Machine Translation
- Computer Science · ICLR
- 2020
It is proved that one of the most common RL methods for MT does not optimize the expected reward, as well as show that other methods take an infeasibly long time to converge.
Attention is All you Need
- Computer Science · NIPS
- 2017
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, being applied successfully to English constituency parsing with both large and limited training data.
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
- Computer Science · COLING
- 2020
It is shown that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary, and the paper advocates for decision rules that take the translation distribution into account holistically.
Incorporating BERT into Neural Machine Translation
- Computer Science · ICLR
- 2020
A new algorithm named BERT-fused model is proposed, in which BERT is first used to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms.
An Actor-Critic Algorithm for Sequence Prediction
- Computer Science · ICLR
- 2017
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL), which conditions the critic network on the ground-truth output and leads to improved performance on both a synthetic task and German-English machine translation.
On integrating a language model into neural machine translation
- Computer Science · Comput. Speech Lang.
- 2017
On the use of BERT for Neural Machine Translation
- Computer Science · EMNLP
- 2019
This work compares various ways to integrate a pretrained BERT model with an NMT model, investigates the impact of the monolingual data used for BERT training on the final translation quality, and assesses how BERT's pretrained representations affect model robustness.