Theano: A Python framework for fast computation of mathematical expressions
TLDR
The performance of Theano is compared against that of Torch7 and TensorFlow on several machine learning models, and recently introduced functionalities and improvements are discussed.
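Since the paper's subject is Theano's compiled symbolic-expression workflow, a minimal usage sketch may help; it shows symbolic definition, automatic differentiation, and compilation. It assumes a working Theano install, and the variable names are illustrative rather than taken from the paper.

```python
import theano
import theano.tensor as T

x = T.dvector('x')       # symbolic input vector
y = T.sum(x ** 2)        # symbolic expression: sum of squares
grad = T.grad(y, x)      # symbolic gradient, derived automatically

# Compile the expression graph into a callable function.
f = theano.function([x], [y, grad])

value, gradient = f([1.0, 2.0, 3.0])  # value == 14.0, gradient == [2., 4., 6.]
```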
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
TLDR
This work proposes Meta-Dataset, a new benchmark for training and evaluating few-shot learning models that is large-scale, consists of diverse datasets, and presents more realistic tasks, along with a new set of baselines for quantifying the benefit of meta-learning on Meta-Dataset.
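Benchmarks of this kind are consumed as episodes. The following sketch shows the generic N-way k-shot episode construction such benchmarks build on; it is illustrative only and is not Meta-Dataset's actual sampler, which additionally varies the number of ways and shots per episode and mixes source datasets.

```python
import random

def sample_episode(examples_by_class, n_way=5, k_shot=1, n_query=10):
    """Illustrative N-way k-shot episode sampler (not Meta-Dataset's own)."""
    classes = random.sample(list(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(examples_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```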
An Actor-Critic Algorithm for Sequence Prediction
TLDR
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL), in which the critic network is conditioned on the ground-truth output; the method leads to improved performance on both a synthetic task and German-English machine translation.
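A hedged sketch of the training signal this describes: the actor's token probabilities are weighted by Q-values from a critic that has access to the ground-truth target. The tensor shapes and callables here are illustrative assumptions, not the paper's code.

```python
import torch

def actor_loss(actor_logits, critic_q):
    """actor_logits: (T, V) unnormalized scores over the vocabulary per step.
    critic_q:      (T, V) critic's Q-value estimate for emitting each token,
                   computed with access to the ground-truth target sequence.
    The actor is trained to raise the probability of high-Q tokens."""
    probs = torch.softmax(actor_logits, dim=-1)
    # Descend on the negative expected Q under the actor's distribution,
    # i.e. ascend expected return, as in actor-critic RL.
    return -(probs * critic_q).sum()
```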
On Using Monolingual Corpora in Neural Machine Translation
TLDR
This work investigates how to leverage abundant monolingual corpora in neural machine translation, improving results for En-Fr and En-De translation, and extends the approach to high-resource language pairs such as Cs-En and De-En.
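The mechanism studied is integrating a language model trained on monolingual text into the NMT decoder; one variant, shallow fusion, simply mixes the two models' scores at decoding time. A minimal sketch, where the mixing weight beta and the score arrays are illustrative:

```python
import numpy as np

def shallow_fusion_scores(log_p_tm, log_p_lm, beta=0.2):
    """Combine translation-model and language-model log-probabilities over
    the next-token vocabulary; beta is a tuned mixing weight."""
    return log_p_tm + beta * log_p_lm

# During beam search, rank candidate tokens by the fused score instead of
# the translation model's score alone.
```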
Probabilistic Model-Agnostic Meta-Learning
TLDR
This paper proposes a probabilistic meta-learning algorithm that samples models for a new task from a model distribution trained via a variational lower bound, and shows how reasoning about ambiguity can also be used for downstream active learning problems.
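A hedged sketch of the core idea in gradient-based form: perturb the meta-learned parameters with learned noise to sample one model, then adapt it on the task's support set. The noise parameterization, loss callable, and learning rate are illustrative assumptions; the paper trains the distribution via a variational bound.

```python
import torch

def sample_adapted_params(theta, log_sigma, support_loss_fn, lr=0.01):
    """theta, log_sigma: meta-learned tensors with requires_grad=True.
    Sample one model for a new task, then take a gradient step on the
    few-shot support loss (illustrative, not the paper's exact procedure)."""
    noise = torch.randn_like(theta) * log_sigma.exp()
    phi = theta + noise                  # sample from the model distribution
    loss = support_loss_fn(phi)          # few-shot support loss
    grad, = torch.autograd.grad(loss, phi)
    return phi - lr * grad               # task-adapted parameters
```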
Bridging the Gap Between Value and Policy Based Reinforcement Learning
TLDR
A new RL algorithm, Path Consistency Learning (PCL), is developed that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces, and it significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.
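For reference, the soft consistency that PCL enforces can be written down directly: for the entropy-regularized objective with temperature \(\tau\), the optimal policy \(\pi^*\) and values \(V^*\) satisfy, along any sub-trajectory of length \(d\),

```latex
V^*(s_t) - \gamma^d V^*(s_{t+d}) =
  \sum_{i=0}^{d-1} \gamma^i
  \bigl[ r(s_{t+i}, a_{t+i}) - \tau \log \pi^*(a_{t+i} \mid s_{t+i}) \bigr],
```

and PCL trains the policy and value function by minimizing the squared violation of this identity over both on- and off-policy paths.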
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
TLDR
Trust-PCL, an off-policy trust region method, is proposed; it follows from the observation that the optimal policy and state values of a maximum-reward objective with a relative-entropy regularizer satisfy a set of multi-step pathwise consistencies along any path.
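Schematically, the regularized objective behind these consistencies augments expected reward with a relative-entropy penalty to a prior (previous) policy \(\tilde{\pi}\); this is a simplified form, and the paper's exact objective also retains an entropy term and defines the penalty along trajectories:

```latex
\mathcal{O}(\pi) =
  \mathbb{E}_{\pi}\!\left[\sum_t \gamma^t\, r(s_t, a_t)\right]
  - \lambda\, \mathrm{KL}\!\left(\pi \,\|\, \tilde{\pi}\right).
```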
Unsupervised Perceptual Rewards for Imitation Learning
TLDR
This work presents a method that identifies the key intermediate steps of a task from only a handful of demonstration sequences and automatically selects the most discriminative features for recognizing these steps.
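A hedged sketch of the pipeline the summary describes, using pre-computed visual features: rank features by how well they separate the intermediate steps, then score new frames against a step's feature prototype. The selection criterion and scoring rule here are illustrative, not the paper's exact procedure.

```python
import numpy as np

def select_discriminative_features(step_feats, k=32):
    """step_feats: list of (n_i, d) feature arrays, one per intermediate step.
    Keep the k features with highest between-step / within-step variance
    (an illustrative criterion)."""
    means = np.stack([f.mean(axis=0) for f in step_feats])           # (S, d)
    between = means.var(axis=0)                                      # (d,)
    within = np.mean([f.var(axis=0) for f in step_feats], axis=0)    # (d,)
    return np.argsort(-(between / (within + 1e-8)))[:k]

def perceptual_reward(frame_feat, step_mean, feat_idx):
    """Negative distance to a step's mean over the discriminative features."""
    return -np.linalg.norm(frame_feat[feat_idx] - step_mean[feat_idx])
```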
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning
TLDR
This work exploits the insight that demonstrations from other tasks can be used to constrain the set of possible reward functions by learning a "prior" that is specifically optimized for the ability to infer expressive reward functions from limited numbers of demonstrations.
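A hedged sketch of the "prior" idea in gradient-based form: meta-learn an initialization of reward parameters such that a few demonstrations from a new task suffice to adapt them. The IRL loss callable and the MAML-style update are illustrative assumptions about the mechanism.

```python
import torch

def adapt_reward(theta0, irl_loss_fn, demos, lr=0.1, steps=5):
    """Adapt meta-learned reward parameters `theta0` to a new task from a
    small set of demonstrations; `irl_loss_fn(theta, demos)` is assumed to
    return a differentiable IRL objective (e.g. negative demo likelihood)."""
    theta = theta0.detach().clone().requires_grad_(True)
    for _ in range(steps):
        loss = irl_loss_fn(theta, demos)
        grad, = torch.autograd.grad(loss, theta)
        theta = (theta - lr * grad).detach().requires_grad_(True)
    return theta
```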
...