Inspired by recent work in machine translation and object detection, we introduce an attention-based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through…
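A minimal sketch of the deterministic ("soft") attention step that makes end-to-end backpropagation possible: the context vector is a convex combination of image feature vectors, weighted by scores computed from the decoder state. This is illustrative only, not the paper's implementation; all names and shapes are assumptions.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """features: (L, D) annotation vectors, hidden: (H,) decoder state."""
    # Unnormalized alignment score for each of the L image regions.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v          # (L,)
    # Softmax turns scores into attention weights that sum to one.
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    # Expected (attention-weighted) feature vector; differentiable end to end.
    context = alpha @ features                                    # (D,)
    return context, alpha

# Toy usage with random inputs.
L, D, H, A = 196, 512, 256, 128
ctx, alpha = soft_attention(
    np.random.randn(L, D), np.random.randn(H),
    np.random.randn(D, A), np.random.randn(H, A), np.random.randn(A))
```

Because every step above is differentiable, this variant trains with standard backpropagation; the stochastic ("hard") variant instead samples one region per step and is trained by maximizing a variational lower bound.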
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano is being actively…
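A minimal usage sketch of the define/optimize/evaluate workflow the abstract describes: a symbolic expression and its gradient are compiled into an optimized callable.

```python
import theano
import theano.tensor as T

x = T.dvector('x')                 # symbolic input vector
y = T.sum(x ** 2)                  # symbolic expression to evaluate
gy = T.grad(y, x)                  # symbolic gradient, derived automatically

f = theano.function([x], [y, gy])  # compile into optimized CPU/GPU code
print(f([1.0, 2.0, 3.0]))          # -> [array(14.0), array([2., 4., 6.])]
```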
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We…
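A schematic sketch (not the paper's algorithm) of the actor-critic idea for sequence generation: the actor samples tokens from its own predictions, matching test-time conditions, and a critic's per-token value estimates stand in for the full-sequence reward in a policy-gradient update. Names, shapes, and the random "critic" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, seq_len = 5, 4
logits = rng.normal(size=(seq_len, vocab))     # actor outputs at each step

# The actor samples from its own distribution rather than being fed the
# ground-truth prefix, which is what it will face at test time.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
tokens = [rng.choice(vocab, p=p) for p in probs]

# Hypothetical critic: per-token value estimates of the eventual task reward
# (e.g. BLEU); in the paper these come from a trained critic network.
values = rng.normal(size=seq_len)

# REINFORCE-style gradient of log-probability weighted by the critic's values.
grad = np.zeros_like(logits)
for t, a in enumerate(tokens):
    one_hot = np.eye(vocab)[a]
    grad[t] = (one_hot - probs[t]) * values[t]  # d log p(a_t) / d logits_t
```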
Recent works on end-to-end neural network-based architectures for machine translation have shown promising results for English-French and English-German translation. Unlike these language pairs, however, in the majority of scenarios there is a lack of high-quality parallel corpora. In this work, we focus on applying neural machine translation to…
Reward function design and exploration time are arguably the biggest obstacles to the deployment of reinforcement learning (RL) agents in the real world. In many real-world tasks, designing a suitable reward function takes considerable manual engineering and often requires additional and potentially visible sensors to be installed just to measure whether…
We formulate a new notion of softmax temporal consistency that generalizes the standard hard-max Bellman consistency usually considered in value-based reinforcement learning (RL). In particular, we show how softmax consistent action values correspond to optimal policies that maximize entropy regularized expected reward. More importantly, we establish that…
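A sketch of the relations the abstract refers to, in notation I am assuming (temperature $\tau$, deterministic transition to $s'$): the hard max in the Bellman backup is replaced by a log-sum-exp "softmax", and the optimal entropy-regularized policy then satisfies a per-transition consistency linking values, rewards, and log-probabilities.

```latex
\[
  V^{*}(s) \;=\; \tau \log \sum_{a} \exp\!\Big(\tfrac{r(s,a) + \gamma V^{*}(s')}{\tau}\Big),
  \qquad
  V^{*}(s) - \gamma V^{*}(s') \;=\; r(s,a) - \tau \log \pi^{*}(a \mid s)
  \quad \text{for all } (s,a).
\]
```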
Recently there has been growing interest in building "active" visual object recognizers, as opposed to "passive" recognizers, which classify a given static image into a predefined set of object categories. In this paper we propose to generalize recent end-to-end active visual recognizers into a controller-recognizer framework. In this framework, the…
Trust region methods, such as TRPO, are often used to stabilize policy optimization algorithms in reinforcement learning (RL). While current trust region strategies are effective for continuous control, they typically require a prohibitively large amount of on-policy interaction with the environment. To address this problem, we propose an off-policy trust…
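For context, a sketch of the standard trust-region policy optimization objective the abstract refers to (TRPO; notation mine, not quoted from the paper): maximize the surrogate advantage subject to a bound on the average KL divergence from the previous policy, which is what makes the updates on-policy and sample-hungry.

```latex
\[
  \max_{\theta}\;
    \mathbb{E}_{s,a \sim \pi_{\theta_{\mathrm{old}}}}
      \Big[ \tfrac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\,
            A^{\pi_{\theta_{\mathrm{old}}}}(s,a) \Big]
  \quad \text{s.t.} \quad
  \mathbb{E}_{s}\big[ D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s)
    \,\|\, \pi_{\theta}(\cdot \mid s)\big) \big] \;\le\; \delta .
\]
```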