Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

@inproceedings{Kreutzer2017BanditSP,
  title={Bandit Structured Prediction for Neural Sequence-to-Sequence Learning},
  author={Julia Kreutzer and Artem Sokolov and Stefan Riezler},
  booktitle={ACL},
  year={2017}
}
Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-based recurrent neural networks. Furthermore, we show how to incorporate control variates into our… 
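To make the truncated description concrete, here is a minimal sketch of the objective this framework optimizes. Since no gold standard structures are available, learning minimizes the expected task loss of the model's own sampled outputs; the notation below (Δ for the task loss of a sampled output ỹ, p_θ for the attention-based sequence-to-sequence model) is illustrative rather than quoted from the paper:

\[
J(\theta) = \mathbb{E}_{\tilde{y} \sim p_\theta(\cdot \mid x)}\big[\Delta(\tilde{y})\big],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\tilde{y} \sim p_\theta(\cdot \mid x)}\big[\Delta(\tilde{y})\,\nabla_\theta \log p_\theta(\tilde{y} \mid x)\big].
\]

A control variate enters as a baseline subtracted from the observed loss, for instance a running average \bar{\Delta} of past feedback:

\[
\hat{g} = \big(\Delta(\tilde{y}) - \bar{\Delta}\big)\,\nabla_\theta \log p_\theta(\tilde{y} \mid x),
\]

which leaves the gradient estimate unbiased, since \mathbb{E}[\nabla_\theta \log p_\theta] = 0, while reducing its variance.

The same update written as a runnable toy, assuming a log-linear policy over a five-element output space in place of a full sequence-to-sequence model (pure numpy; all names and the simulated losses are illustrative, not from the paper):

import numpy as np

rng = np.random.default_rng(0)
num_outputs = 5                                     # toy "structure" space
theta = np.zeros(num_outputs)                       # policy parameters
true_losses = np.array([0.9, 0.7, 0.1, 0.8, 0.6])   # simulated task losses
baseline, lr, n = 0.0, 0.1, 0

def policy(theta):
    """Softmax distribution over candidate outputs."""
    p = np.exp(theta - theta.max())
    return p / p.sum()

for step in range(2000):
    p = policy(theta)
    y = rng.choice(num_outputs, p=p)                # predict an output structure
    # Bandit feedback: a noisy loss for the prediction, no gold structure.
    loss = float(true_losses[y] + 0.05 * rng.standard_normal())
    grad_log_p = -p                                 # d log p(y)/d theta = onehot(y) - p
    grad_log_p[y] += 1.0
    # Score-function gradient step with control variate (loss - baseline).
    theta -= lr * (loss - baseline) * grad_log_p
    n += 1
    baseline += (loss - baseline) / n               # running average, updated after use

print(np.round(policy(theta), 3))                   # should put most mass on index 2 (lowest loss)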

Citations

Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
TLDR
A reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback, combining the advantage actor-critic algorithm with an attention-based neural encoder-decoder architecture to effectively optimize traditional corpus-level machine translation metrics.
A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation
We present an approach to interactive-predictive neural machine translation that attempts to reduce human effort from three directions: firstly, instead of requiring humans to select, correct, or …
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
TLDR
Improvements of over 1 BLEU can be obtained by integrating into RL for NMT a regression-based reward estimator trained on cardinal feedback for 800 translations, showing that RL is possible even from small amounts of fairly reliable human feedback and pointing to great potential for applications at larger scale.
Learning from Chunk-based Feedback in Neural Machine Translation
TLDR
It is demonstrated how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback.
The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
TLDR
A standard neural machine translation system is built and extended in two ways: (1) robust reinforcement learning techniques to learn effectively from the bandit feedback, and (2) domain adaptation using data selection from a large corpus of parallel data.
Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction
TLDR
A general convergence proof for applying sparse SZO optimization to Lipschitz-continuous, nonconvex, stochastic objectives, and an experimental evaluation on linear bandit structured prediction tasks with sparse word-based feature representations that confirms the theoretical results.
Can Neural Machine Translation be Improved with User Feedback?
TLDR
This paper presents the first real-world application of methods for improving neural machine translation with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform, to improve task-specific and machine translation quality metrics.
Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications
TLDR
This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback, and proposes a self-regulation approach, where the learner decides which type of feedback to choose for each input.
Interactive-Predictive Neural Machine Translation through Reinforcement and Imitation
TLDR
An interactive-predictive neural machine translation framework for easier model personalization using reinforcement and imitation learning; in simulation experiments on two language pairs, systems get close to the performance of supervised training with much less human effort.
Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation
TLDR
These experiments on in-domain and cross-domain adaptation reveal the importance of exploration and reward scaling, and provide empirical counter-evidence to Choshen et al.'s (2020) claim that the success of policy gradient algorithms is determined by the shape of output distributions rather than the reward.

References

Showing 1–10 of 57 references
Stochastic Structured Prediction under Bandit Feedback
TLDR
An experimental evaluation on natural language processing problems over exponential output spaces is presented, comparing convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm.
Learning Structured Predictors from Bandit Feedback for Interactive NLP
Structured prediction from bandit feedback describes a learning scenario where, instead of having access to a gold standard structure, a learner only receives partial feedback in the form of the loss …
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence that made the optimization problem easier.
Sequence Level Training with Recurrent Neural Networks
TLDR
This work proposes a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
TLDR
Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks; the GRU is found to be comparable to the LSTM, with both clearly outperforming the traditional tanh unit.
Search-based structured prediction
TLDR
Searn is an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision, and comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
TLDR
Qualitatively, the proposed RNN Encoder–Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
TLDR
This paper proposes a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
On the difficulty of training recurrent neural networks
TLDR
This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem, and empirically validates the hypothesis and proposed solutions.
On the importance of initialization and momentum in deep learning
TLDR
It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.