Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning

@inproceedings{Chen2020SequenceGW,
  title={Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning},
  author={Liqun Chen and Ke Bai and Chenyang Tao and Yizhe Zhang and Guoyin Wang and Wenlin Wang and Ricardo Henao and Lawrence Carin},
  booktitle={AAAI},
  year={2020}
}
Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that… 
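As an illustration of how an optimal-transport regularizer can be attached to sequence training, the sketch below computes an entropy-regularized (Sinkhorn) OT cost between the word embeddings of a generated sequence and a reference, and adds it to a maximum-likelihood loss. The cosine cost, the weight lam, and the function names are illustrative assumptions on my part; the paper's exact formulation is not shown in the truncated abstract above.

import numpy as np

def sinkhorn_ot(x, y, eps=0.1, n_iters=100):
    """Entropy-regularized OT distance between two embedding sequences.

    x: (n, d) embeddings of the generated sequence
    y: (m, d) embeddings of the reference sequence
    Returns an approximate optimal-transport cost under a cosine cost matrix.
    """
    # Cosine cost: 1 - cosine similarity between every pair of token embeddings.
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    C = 1.0 - xn @ yn.T                        # (n, m) cost matrix

    a = np.full(x.shape[0], 1.0 / x.shape[0])  # uniform mass on generated tokens
    b = np.full(y.shape[0], 1.0 / y.shape[0])  # uniform mass on reference tokens
    K = np.exp(-C / eps)                       # Gibbs kernel

    u = np.ones_like(a)
    for _ in range(n_iters):                   # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = np.diag(u) @ K @ np.diag(v)            # transport plan
    return float(np.sum(T * C))

# Hypothetical combined objective (names and weight are assumptions): the MLE
# loss plus an OT penalty that encourages global semantic agreement with the
# reference sequence.
def combined_loss(mle_loss, gen_emb, ref_emb, lam=0.1):
    return mle_loss + lam * sinkhorn_ot(gen_emb, ref_emb)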

Citations

Adaptive Prior-Dependent Correction Enhanced Reinforcement Learning for Natural Language Generation
TLDR
This work proposes adaptive prior-dependent correction (APDC) to enhance RL: the distribution induced by the distances between the ground-truth word and all other words is used to correct the agent's stochastic policy, and the advantage function is estimated from differences of Q-values obtained via Monte Carlo rollouts.
Improving Adversarial Text Generation with n-Gram Matching
TLDR
The experimental results show that the model trained with mixed rewards from both n-gram matching and the discriminator outperforms other GAN-based models in terms of BLEU score and the quality-diversity trade-off at a parity of computational budget.
Non-Parallel Text Style Transfer with Self-Parallel Supervision
TLDR
LaMer is proposed, a novel text style transfer framework based on large-scale language models that first mines the roughly parallel expressions in the non-parallel datasets with scene graphs, and then employs MLE training, followed by imitation learning refinement, to leverage the intrinsic parallelism within the data.
An Optimal-Transport-Based Reinforcement Learning Approach for Computation Offloading
TLDR
This paper builds a collaborative computation offloading model in cloud and edge computing and formulates a multi-objective optimization problem and proposes an Optimal-Transport-Based RL approach to resolve the offloading problem and make the optimal offloading decision for minimizing the overall cost of delay and energy consumption.
Re-evaluating Word Mover's Distance
TLDR
An analogy between WMD and L1-normalized BOW is introduced and it is shown that not only the performance of WMD but also the distance values resemble those of BOW in high dimensional spaces.
The Influence of Structural Information on Natural Language Processing
TLDR
The Influence of Structural Information on Natural Language Processing by researchers at the Massachusetts Institute of Technology and the University of Massachusetts Amherst is studied.

References

Showing 1-10 of 71 references
Connecting the Dots Between MLE and RL for Sequence Generation
TLDR
A generalized entropy regularized policy optimization formulation is presented, and it is shown that the apparently distinct algorithms can all be reformulated as special instances of the framework, with the only difference being the configurations of a reward function and a couple of hyperparameters.
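Read alongside the TLDR above, the generalized objective in that line of work is, roughly, an entropy-regularized policy-optimization problem; the notation below (variational distribution q, reward R, weights alpha and beta) is my paraphrase rather than a quotation from the paper.

\mathcal{L}(q,\theta) \;=\; \mathbb{E}_{q(\mathbf{y}\mid \mathbf{x})}\big[R(\mathbf{y}\mid \mathbf{x})\big] \;-\; \alpha\, D_{\mathrm{KL}}\!\big(q(\mathbf{y}\mid \mathbf{x})\,\big\|\,p_\theta(\mathbf{y}\mid \mathbf{x})\big) \;+\; \beta\, H(q)

Particular choices of the reward R and the weights (alpha, beta) then recover MLE, reward-augmented maximum likelihood, or standard policy-gradient RL as special cases.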
Cold-Start Reinforcement Learning with Softmax Policy Gradient
TLDR
A reinforcement learning method based on a softmax value function that requires neither warm-start training nor sample variance reduction is described; it combines the advantages of policy-gradient methods with the efficiency and simplicity of maximum-likelihood approaches.
Improving Sequence-to-Sequence Learning via Optimal Transport
TLDR
This work imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features in sequence-to-sequence models and shows consistent improvements over a wide variety of NLP tasks.
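A common way to make such sequence-level OT supervision differentiable is to represent each generated position by the softmax-weighted average of word embeddings rather than a hard token; the sketch below, which could feed the sinkhorn_ot sketch shown earlier, is an assumption-laden illustration rather than the paper's implementation.

def soft_sequence_embeddings(probs, embedding_matrix):
    """Differentiable 'soft' token embeddings.

    probs: (T, V) per-step softmax distributions from the decoder
    embedding_matrix: (V, d) word-embedding table
    Each step is represented by the probability-weighted average embedding,
    so an OT cost computed on these vectors is differentiable with respect
    to the decoder outputs (only the forward computation is shown here).
    """
    return probs @ embedding_matrix            # (T, d)

# Usage sketch (gamma is an assumed weight, not taken from the paper):
# ot_cost = sinkhorn_ot(soft_sequence_embeddings(probs, E), ref_embeddings)
# loss = cross_entropy + gamma * ot_cost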
Self-Critical Sequence Training for Image Captioning
TLDR
This paper considers the problem of optimizing image captioning systems using reinforcement learning, and shows that by carefully optimizing systems using the test metrics of the MSCOCO task, significant gains in performance can be realized.
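The self-critical baseline described above can be written in a few lines: the reward of the model's own greedy decode serves as the baseline for a sampled sequence. The reward source and variable names below are placeholders.

def scst_loss(sample_logprob, sample_reward, greedy_reward):
    """Self-critical policy-gradient surrogate for one sequence.

    sample_logprob: summed log-probability of a sampled sequence
    sample_reward:  metric score (e.g., CIDEr) of the sampled sequence
    greedy_reward:  metric score of the greedy-decoded sequence (the baseline)

    Sequences that beat the model's own greedy decode are reinforced and
    worse ones suppressed; minimizing this surrogate follows the policy
    gradient -(r_sample - r_greedy) * grad log p(sample).
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sample_logprob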
An Actor-Critic Algorithm for Sequence Prediction
TLDR
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL), in which the critic network is conditioned on the ground-truth output; this method leads to improved performance on both a synthetic task and German-English machine translation.
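A minimal sketch of the actor update implied above, assuming the critic (trained separately with access to the ground-truth output) supplies a value estimate for every candidate next token; shapes and names are mine.

import numpy as np

def actor_surrogate_loss(token_probs, q_values):
    """Actor objective for one decoding step in an actor-critic sequence model.

    token_probs: (V,) decoder softmax over candidate next tokens
    q_values:    (V,) critic estimates of the return for emitting each token

    Treating q_values as constants, the gradient of this surrogate pushes
    probability mass toward tokens the critic scores highly, i.e.
    sum_a Q_hat(a) * grad p(a | prefix).
    """
    return -float(np.dot(token_probs, q_values))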
Reward Augmented Maximum Likelihood for Neural Structured Prediction
TLDR
This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework, and shows that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards.
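The key result summarized above can be stated via an exponentiated-payoff target distribution; the temperature tau and the notation below are mine, not quoted from the paper.

q(\mathbf{y} \mid \mathbf{y}^{*}; \tau) \;=\; \frac{\exp\{ r(\mathbf{y}, \mathbf{y}^{*}) / \tau \}}{\sum_{\mathbf{y}'} \exp\{ r(\mathbf{y}', \mathbf{y}^{*}) / \tau \}},
\qquad
\mathcal{L}_{\mathrm{RAML}}(\theta) \;=\; -\sum_{(\mathbf{x},\mathbf{y}^{*})} \mathbb{E}_{\mathbf{y} \sim q(\cdot \mid \mathbf{y}^{*};\tau)} \big[ \log p_\theta(\mathbf{y} \mid \mathbf{x}) \big]

The conditional distribution that optimizes the regularized expected reward is proportional to the exponentiated, temperature-scaled reward, so training reduces to maximum likelihood on samples drawn from q.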
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
TLDR
Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update.
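A hedged sketch of the update described above: partial sequences are completed by Monte Carlo rollouts, the discriminator's score of the completions serves as the reward, and the generator is updated with REINFORCE. generator_sample and discriminator below are assumed callables supplied by the surrounding training code, not SeqGAN's actual API.

import numpy as np

def rollout_reward(partial_seq, generator_sample, discriminator, n_rollouts=16):
    """Monte Carlo estimate of the reward for a partial sequence.

    Completes the prefix n_rollouts times with the generator's own sampling
    policy and averages the discriminator's probability that each completed
    sequence is real.
    """
    scores = [discriminator(generator_sample(partial_seq)) for _ in range(n_rollouts)]
    return float(np.mean(scores))

def seqgan_generator_loss(step_logprobs, step_rewards):
    """REINFORCE surrogate: -sum_t r_t * log p(y_t | y_<t)."""
    return -float(np.dot(step_logprobs, step_rewards))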
Multi-Reward Reinforced Summarization with Saliency and Entailment
TLDR
This work addresses three important aspects of a good summary via a reinforcement learning approach with two novel reward functions, ROUGESal and Entail, on top of a coverage-based baseline, and shows superior performance when these rewards are combined with traditional metric-based (ROUGE) rewards.
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
TLDR
This work proposes a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead.
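The curriculum above amounts to flipping a coin at every decoding step during training: feed the true previous token with a probability that decays over training, otherwise feed the model's own prediction. The sketch uses an inverse-sigmoid decay, one of the schedules discussed in that paper; the constant k is an arbitrary choice.

import numpy as np

def teacher_forcing_prob(step, k=1000.0):
    """Inverse-sigmoid decay of the probability of feeding the true token."""
    return k / (k + np.exp(step / k))

def choose_previous_token(true_prev, model_prev, step, rng=np.random):
    """Scheduled sampling: mix ground-truth and self-generated history.

    Early in training the decoder mostly sees the true previous token;
    later it increasingly conditions on its own predictions, closing the
    train/test mismatch.
    """
    if rng.random() < teacher_forcing_prob(step):
        return true_prev
    return model_prev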
A Deep Reinforced Model for Abstractive Summarization
TLDR
A neural network model with a novel intra-attention that attends over the input and the continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL), producing higher-quality summaries.
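The combined training method mentioned above is commonly expressed as a convex mixture of the two losses; the weight gamma below is my notation.

\mathcal{L}_{\mathrm{mixed}} \;=\; \gamma\, \mathcal{L}_{\mathrm{rl}} \;+\; (1-\gamma)\, \mathcal{L}_{\mathrm{ml}}, \qquad 0 \le \gamma \le 1

Here L_rl is a self-critical policy-gradient loss with a summarization metric such as ROUGE as the reward, and L_ml is the standard teacher-forced cross-entropy loss.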