Corpus ID: 53278498

Towards Governing Agent's Efficacy: Action-Conditional β-VAE for Deep Transparent Reinforcement Learning

Authors: John Yang, Gyujeong Lee, Minsung Hyun, Simyung Chang, Nojun Kwak
We tackle the black-box issue of deep neural networks in reinforcement learning (RL) settings, where neural agents learn to maximize reward gains in an uncontrollable way. Such a learning approach is risky when the environment has an expansive state space, because it is then almost impossible to foresee all unwanted outcomes and penalize them with negative rewards beforehand. Unlike the reverse analysis of learned neural features in previous works, our proposed method…

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
This work introduces DARLA (DisentAngled Representation Learning Agent), a multi-stage RL agent that learns to see before learning to act and significantly outperforms conventional baselines in zero-shot domain adaptation scenarios.
Curiosity-Driven Exploration by Self-Supervised Prediction
This work formulates curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. The formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and ignores the aspects of the environment that cannot affect the agent.
Contingency-Aware Exploration in Reinforcement Learning
This study develops an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games. The results confirm that contingency-awareness is an extremely powerful concept for tackling exploration problems in reinforcement learning.
Reinforcement Learning with Unsupervised Auxiliary Tasks
The proposed agent significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, with a mean speedup in learning of 10× and an average of 87% expert human performance on Labyrinth.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Visualizing and Understanding Atari Agents
A method for generating useful saliency maps is introduced and used to show 1) what strong agents attend to, 2) whether agents are making decisions for the right or wrong reasons, and 3) how agents evolve during learning.
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings
This work shows that continuous latent representations of trajectories can be learned and are effective in solving temporally extended and multi-stage problems. The method also provides a built-in prediction mechanism by predicting the outcome of closed-loop policy behavior.
Deep Reinforcement Learning from Human Preferences
This work explores goals defined in terms of (non-expert) human preferences between pairs of trajectory segments in order to effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion.
Variational Option Discovery Algorithms
This work highlights a tight connection between variational option discovery methods and variational autoencoders, introduces Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from that connection, and proposes a curriculum learning approach.
Dueling Network Architectures for Deep Reinforcement Learning
This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.