Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning
@inproceedings{Chen2020SequenceGW, title={Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning}, author={Liqun Chen and Ke Bai and Chenyang Tao and Yizhe Zhang and Guoyin Wang and Wenlin Wang and Ricardo Henao and Lawrence Carin}, booktitle={AAAI}, year={2020} }
Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that…
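The abstract is truncated before the regularizer is spelled out; as a rough sketch of the kind of quantity involved, the snippet below computes an entropy-regularized (Sinkhorn-style) optimal transport distance between the word embeddings of a generated sequence and a reference sequence. All function and variable names are illustrative and not taken from the paper's implementation.

```python
# Illustrative sketch (not the paper's code): a Sinkhorn-style optimal transport
# distance between two embedded token sequences, the kind of soft sequence-level
# matching an OT regularizer builds on.
import numpy as np

def sinkhorn_ot_distance(X, Y, beta=0.5, n_iters=50):
    """X: (n, d) embeddings of generated tokens; Y: (m, d) reference embeddings."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-8)
    C = 1.0 - Xn @ Yn.T                        # (n, m) cosine transport cost
    K = np.exp(-C / beta)                       # Gibbs kernel (entropic regularization)
    a = np.full(X.shape[0], 1.0 / X.shape[0])   # uniform mass on generated tokens
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])   # uniform mass on reference tokens
    u = np.ones_like(a)
    for _ in range(n_iters):                    # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]             # approximate transport plan
    return float((T * C).sum())                 # OT distance = <T, C>

# Toy usage with random vectors standing in for word embeddings.
rng = np.random.default_rng(0)
gen, ref = rng.normal(size=(7, 16)), rng.normal(size=(9, 16))
print(sinkhorn_ot_distance(gen, ref))
```

A distance of this kind can be added to the training objective as a sequence-level regularizer, providing a denser learning signal than a single scalar reward.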
7 Citations
Adaptive Prior-Dependent Correction Enhanced Reinforcement Learning for Natural Language Generation
- Computer Science, AAAI
- 2021
This work proposes adaptive prior-dependent correction (APDC) to enhance RL: the distribution obtained by computing distances between the ground-truth word and all other words is used to correct the agent's stochastic policy, and the advantage function is estimated from differences of Q-values obtained via Monte Carlo rollouts.
Improving Adversarial Text Generation with n-Gram Matching
- Computer Science, PACLIC
- 2021
The experimental results show that the model trained with mixed rewards from both n-gram matching and the discriminator outperforms other GAN-based models in terms of BLEU score and quality-diversity trade-off at a parity of computational budget.
Non-Parallel Text Style Transfer with Self-Parallel Supervision
- Computer Science, ArXiv
- 2022
LaMer is proposed, a novel text style transfer framework based on large-scale language models that first mines the roughly parallel expressions in the non-parallel datasets with scene graphs, and then employs MLE training, followed by imitation learning refinement, to leverage the intrinsic parallelism within the data.
An Optimal-Transport-Based Reinforcement Learning Approach for Computation Offloading
- Computer Science, 2021 IEEE Wireless Communications and Networking Conference (WCNC)
- 2021
This paper builds a collaborative computation-offloading model for cloud and edge computing, formulates a multi-objective optimization problem, and proposes an optimal-transport-based RL approach that makes offloading decisions so as to minimize the overall cost of delay and energy consumption.
Re-evaluating Word Mover's Distance
- Computer Science, ArXiv
- 2021
An analogy between WMD and L1-normalized BOW is introduced, and it is shown that in high-dimensional spaces not only the performance of WMD but also its distance values resemble those of BOW.
The Influence of Structural Information on Natural Language Processing
- Linguistics
- 2020
The influence of structural information on natural language processing is studied by researchers at the Massachusetts Institute of Technology and the University of Massachusetts Amherst.
References
Showing 1-10 of 71 references
Connecting the Dots Between MLE and RL for Sequence Generation
- Computer Science, DeepRLStructPred@ICLR
- 2019
A generalized entropy regularized policy optimization formulation is presented, and it is shown that the apparently distinct algorithms can all be reformulated as special instances of the framework, with the only difference being the configurations of a reward function and a couple of hyperparameters.
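As a hedged sketch of the general shape of such an objective (notation here is illustrative, not the paper's), a variational distribution q over output sequences is fit against the task reward R and the model policy p_θ, with MLE and common RL variants recovered by particular settings of R and the weights α, β:

```latex
% Entropy-regularized policy optimization, sketched: objective over q and theta,
% and the closed-form update for q at fixed theta.
\mathcal{L}(q, \theta)
  = \mathbb{E}_{q(\mathbf{y} \mid \mathbf{x})}\!\left[ R(\mathbf{y} \mid \mathbf{y}^{*}) \right]
    - \alpha \, \mathrm{KL}\!\left( q(\mathbf{y} \mid \mathbf{x}) \,\|\, p_{\theta}(\mathbf{y} \mid \mathbf{x}) \right)
    + \beta \, \mathbb{H}(q),
\qquad
q(\mathbf{y} \mid \mathbf{x})
  \propto \exp\!\left\{ \frac{\alpha \log p_{\theta}(\mathbf{y} \mid \mathbf{x}) + R(\mathbf{y} \mid \mathbf{y}^{*})}{\alpha + \beta} \right\}.
```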
Cold-Start Reinforcement Learning with Softmax Policy Gradient
- Computer Science, NIPS
- 2017
A reinforcement learning method based on a softmax value function that requires neither warm-start training nor sample variance reduction is described; it combines the advantages of policy-gradient methods with the efficiency and simplicity of maximum-likelihood approaches.
Improving Sequence-to-Sequence Learning via Optimal Transport
- Computer Science, ICLR
- 2019
This work imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features in sequence-to-sequence models, and shows consistent improvements across a wide variety of NLP tasks.
Self-Critical Sequence Training for Image Captioning
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This paper considers the problem of optimizing image captioning systems using reinforcement learning, and shows that by carefully optimizing systems using the test metrics of the MSCOCO task, significant gains in performance can be realized.
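A minimal sketch of the self-critical baseline, assuming hypothetical model.sample, model.greedy_decode, and reward_fn interfaces: the reward of the model's own greedy decode is subtracted from the sampled sequence's reward, so no learned baseline or critic is needed.

```python
# Illustrative sketch (interfaces are hypothetical, not from the paper's code):
# self-critical sequence training uses the greedy decode's reward as the baseline
# in a REINFORCE-style update on a sampled sequence.
import torch

def scst_loss(model, images, references, reward_fn):
    sampled_ids, sampled_logprobs = model.sample(images)      # stochastic decode, (B, T) each
    with torch.no_grad():
        greedy_ids, _ = model.greedy_decode(images)           # baseline decode
    r_sample = reward_fn(sampled_ids, references)             # (B,) e.g. CIDEr scores
    r_greedy = reward_fn(greedy_ids, references)              # (B,)
    advantage = (r_sample - r_greedy).detach()                # self-critical advantage
    # Push up log-probs of samples that beat the model's own greedy output.
    return -(advantage.unsqueeze(1) * sampled_logprobs).sum(dim=1).mean()
```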
An Actor-Critic Algorithm for Sequence Prediction
- Computer Science, ICLR
- 2017
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL), in which the critic network is conditioned on the ground-truth output; the method leads to improved performance on both a synthetic task and German-English machine translation.
Reward Augmented Maximum Likelihood for Neural Structured Prediction
- Computer Science, NIPS
- 2016
This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework, and shows that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards.
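A hedged sketch of that objective (notation illustrative): instead of sampling from the model as in policy gradient, targets are drawn from an exponentiated, temperature-scaled reward distribution centered on the ground truth and fit by maximum likelihood.

```latex
% Reward-augmented maximum likelihood, sketched: likelihood is maximized under
% outputs weighted by their exponentiated, scaled reward around y*.
\mathcal{L}_{\mathrm{RAML}}(\theta)
  = - \sum_{(\mathbf{x},\, \mathbf{y}^{*})} \sum_{\mathbf{y}}
      q(\mathbf{y} \mid \mathbf{y}^{*}; \tau) \, \log p_{\theta}(\mathbf{y} \mid \mathbf{x}),
\qquad
q(\mathbf{y} \mid \mathbf{y}^{*}; \tau)
  = \frac{\exp\{ r(\mathbf{y}, \mathbf{y}^{*}) / \tau \}}{Z(\mathbf{y}^{*}, \tau)}.
```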
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
- Computer Science, AAAI
- 2017
By modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing the policy gradient update.
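A hedged sketch of that policy-gradient view (notation illustrative): the generator G_θ is the policy, and the discriminator's score on completed sequences, averaged over N Monte Carlo rollouts from each partial prefix, stands in for the action value.

```latex
% SeqGAN-style generator update, sketched: REINFORCE with the discriminator's
% score on rolled-out complete sequences as the per-step action value.
\nabla_{\theta} J(\theta)
  \approx \mathbb{E}\!\left[ \sum_{t=1}^{T}
      \nabla_{\theta} \log G_{\theta}(y_t \mid Y_{1:t-1}) \,
      Q_{D_{\phi}}^{G_{\theta}}(Y_{1:t-1}, y_t) \right],
\qquad
Q_{D_{\phi}}^{G_{\theta}}(Y_{1:t-1}, y_t)
  \approx \frac{1}{N} \sum_{n=1}^{N} D_{\phi}\!\left( Y^{(n)}_{1:T} \right).
```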
Multi-Reward Reinforced Summarization with Saliency and Entailment
- Computer Science, NAACL
- 2018
This work addresses three important aspects of a good summary via a reinforcement learning approach with two novel reward functions, ROUGESal and Entail, on top of a coverage-based baseline, and shows superior performance when these rewards are combined with traditional metric-based (ROUGE) rewards.
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- Computer Science, NIPS
- 2015
This work proposes a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead.
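A minimal sketch of that curriculum, assuming hypothetical decoder and embed callables: at each decoder step the ground-truth previous token is fed with probability eps and the model's own prediction otherwise, with eps decayed over training (an inverse-sigmoid schedule is one common choice).

```python
# Illustrative sketch (interfaces are hypothetical): scheduled sampling mixes
# teacher forcing and free-running decoding with a decaying probability eps.
import math
import torch

def scheduled_sampling_step(decoder, embed, prev_gold, prev_pred, eps):
    """prev_gold / prev_pred: (B,) previous token ids; eps in [0, 1]."""
    use_gold = torch.rand(prev_gold.shape[0]) < eps      # per-example coin flip
    prev_tok = torch.where(use_gold, prev_gold, prev_pred)
    return decoder(embed(prev_tok))                       # next-step logits / state

def inverse_sigmoid_decay(step, k=1000.0):
    # eps starts near 1 (mostly teacher forcing) and decays toward 0.
    return k / (k + math.exp(step / k))
```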
A Deep Reinforced Model for Abstractive Summarization
- Computer Science, ICLR
- 2018
A neural network model with a novel intra-attention that attends over the input and the continuously generated output separately, together with a new training method that combines standard supervised word prediction and reinforcement learning (RL), produces higher-quality summaries.
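A hedged sketch of the combined objective described (notation illustrative): a self-critical policy-gradient loss on a sampled summary y^s, with the greedy decode ŷ as baseline, is interpolated with the standard maximum-likelihood loss by a weight γ.

```latex
% Mixed training objective, sketched: RL (self-critical) and ML losses combined.
L_{\mathrm{mixed}} = \gamma \, L_{\mathrm{rl}} + (1 - \gamma) \, L_{\mathrm{ml}},
\qquad
L_{\mathrm{rl}} = -\,\bigl( r(\mathbf{y}^{s}) - r(\hat{\mathbf{y}}) \bigr)
  \sum_{t} \log p_{\theta}\!\left( y^{s}_{t} \mid y^{s}_{<t}, \mathbf{x} \right).
```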