Corpus ID: 237365528

Learning to Synthesize Programs as Interpretable and Generalizable Policies

@inproceedings{trivedi2021learning,
  title={Learning to Synthesize Programs as Interpretable and Generalizable Policies},
  author={Dweep Trivedi and Jesse Zhang and Shao-Hua Sun and Joseph J. Lim},
  booktitle={Neural Information Processing Systems},
}
Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. decision trees, state… 

Programmatic Reinforcement Learning without Oracles

This work proposes a programmatically interpretable RL framework that conducts program architecture search on top of a continuous relaxation of the architecture space defined by programming-language grammar rules. Policy architectures are learned jointly with policy parameters via bilevel optimization using efficient policy-gradient methods, so no pretrained oracle is required.

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

This work proposes CodeRL, a new framework for program synthesis through pretrained language models (LMs) and deep reinforcement learning (RL): it treats the code-generating LM as an actor network and introduces a critic network trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor.
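As a toy sketch of the actor-critic idea described above (all names here are hypothetical stand-ins; CodeRL itself uses pretrained transformer LMs and unit-test feedback), a critic's predicted correctness can serve as a dense reward weighting a REINFORCE-style update over candidate programs:

```python
import math
import random

random.seed(0)

def critic_score(program: str) -> float:
    """Pretend critic: predicts high functional correctness for
    programs containing 'return' (a deliberately trivial proxy)."""
    return 0.9 if "return" in program else 0.1

def reinforce_step(logits, candidates, lr=0.5):
    """One policy-gradient step on a categorical 'actor' over candidate
    programs, using critic scores as dense reward signals."""
    zmax = max(logits)                       # stable softmax
    exps = [math.exp(z - zmax) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    rewards = [critic_score(c) for c in candidates]
    baseline = sum(r * p for r, p in zip(rewards, probs))  # variance reduction
    # d/dz_i E[r] = p_i * (r_i - baseline) for a softmax-categorical policy
    grads = [p * (r - baseline) for p, r in zip(probs, rewards)]
    return [z + lr * g for z, g in zip(logits, grads)]

candidates = ["return x + 1", "pass", "x += 1"]
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = reinforce_step(logits, candidates)
best = candidates[max(range(len(logits)), key=lambda i: logits[i])]
print(best)
```

The actor's probability mass drifts toward the candidate the critic scores highest, which is the mechanism that lets correctness predictions act as a training signal even without executing every program.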

A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges

A comprehensive review of existing work on eXplainable RL (XRL) is provided, along with a new taxonomy in which prior works are categorized into model-explaining, reward-explaining, state-explaining, and task-explaining methods.

Code as Policies: Language Model Programs for Embodied Control

Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. This work finds that these code-writing LLMs can be…

Unsupervised Learning of Neurosymbolic Encoders

This work integrates modern program synthesis techniques with the variational autoencoding (VAE) framework, in order to learn a neurosymbolic encoder in conjunction with a standard decoder, which leads to more interpretable and factorized latent representations compared to fully neural encoders.

Synthesized Differentiable Programs

  • 2022
This paper presents a combined algorithm for synthesizing syntactic programs, compiling them into the weights of a neural network, and then tuning the resulting model, forming an efficient algorithm for inducing abstract algorithmic structure and a corresponding set of desirable complex programs.

Explainable Pathfinding for Inscrutable Planners with Inductive Logic Programming

This work builds on inductive logic programming techniques that allow background knowledge and inductive biases to be combined, constructing an explainable graph that represents solutions for all states in the state space and can be explained to a human.

Iterative Genetic Improvement: Scaling Stochastic Program Synthesis

This work proposes a new framework for stochastic program synthesis, called iterative genetic improvement, to address the problem of efficiently searching the vast space of programs. Results indicate that the method has considerable advantages over several representative stochastic program synthesis techniques, in both scalability and solution quality.
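The core loop of genetic improvement can be illustrated with a minimal sketch, assuming a toy "program" encoded as a list of integer instructions and a fitness function counting matches against a target behavior (none of these names come from the paper; its actual framework iterates improvement over intermediate program distributions):

```python
import random

random.seed(1)

TARGET = [3, 1, 4, 1, 5]  # desired behavior of the synthesized program

def fitness(program):
    """Higher is better: positions where the program matches the target."""
    return sum(a == b for a, b in zip(program, TARGET))

def mutate(program):
    """Point mutation: replace one instruction with a random one."""
    child = list(program)
    child[random.randrange(len(child))] = random.randrange(10)
    return child

def genetic_improvement(program, steps=2000):
    """Accept a mutation only if it does not decrease fitness,
    so progress toward the target is monotone."""
    for _ in range(steps):
        child = mutate(program)
        if fitness(child) >= fitness(program):
            program = child
    return program

result = genetic_improvement([0, 0, 0, 0, 0])
print(fitness(result))
```

Because worsening mutations are rejected, each matched position is locked in, which is what makes this kind of stochastic search scale better than blind enumeration of the program space.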

From Solution Synthesis to Student Attempt Synthesis for Block-Based Visual Programming Tasks

This work investigates a crucial component of student modeling: the ability to automatically infer students' misconceptions in order to predict (synthesize) their behavior. It introduces a novel benchmark, StudentSyn, centered around the following challenge: for a given student, synthesize the student's attempt on a new target task after observing that student's attempt on a reference task.

Competition-level code generation with AlphaCode

AlphaCode is introduced, a system for code generation that achieved an average ranking in the top 54.3% in simulated evaluations on recent programming competitions on the Codeforces platform, marking the first time an artificial intelligence system has performed competitively in programming competitions.

Imitation-Projected Programmatic Reinforcement Learning

The experiments show that PROPEL, which exploits contemporary combinatorial methods for this task, can significantly outperform state-of-the-art approaches for learning programmatic policies.

Modular Multitask Reinforcement Learning with Policy Sketches

Experiments show that using the approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.

Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis

Reinforcement learning is performed on top of a supervised model with an objective that explicitly maximizes the likelihood of generating semantically correct programs, which leads to improved accuracy of the models, especially in cases where the training data is limited.

Program Synthesis Guided Reinforcement Learning

This work proposes model predictive program synthesis, which trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty.

Few-Shot Bayesian Imitation Learning with Logical Program Policies

This work proposes an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples, and argues that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.

Discovering symbolic policies with deep reinforcement learning

This work uses an autoregressive recurrent neural network to generate control policies represented by tractable mathematical expressions and proposes an “anchoring” algorithm that distills pre-trained neural network-based policies into fully symbolic policies, one action dimension at a time, to scale to environments with multidimensional action spaces.

Verifiable Reinforcement Learning via Policy Extraction

This work proposes VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy and its Q-function, and shows that it substantially outperforms two baselines.
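The distillation step behind such approaches can be sketched in miniature: query an expert policy (standing in for the DNN) for action labels on sampled states, then fit an interpretable depth-1 decision stump that imitates it. All names here are hypothetical, and VIPER's Q-function-based state reweighting is omitted for brevity:

```python
import random

random.seed(0)

def expert_policy(state: float) -> int:
    """Stand-in for the DNN policy: act (1) iff the state is below 0.25."""
    return 1 if state < 0.25 else 0

# 1) Sample states and record the expert's actions (imitation dataset).
states = [random.uniform(-1, 1) for _ in range(200)]
actions = [expert_policy(s) for s in states]

# 2) Fit a one-threshold stump by exhaustively trying candidate splits.
def fit_stump(xs, ys):
    """Return (threshold, training accuracy) of the best rule 'act iff x < t'."""
    best = None
    for t in sorted(xs):
        acc = sum((x < t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best

threshold, acc = fit_stump(states, actions)
print(round(threshold, 2), acc)
```

The learned threshold lands just above the expert's true decision boundary of 0.25, and unlike the opaque expert, the resulting one-rule policy can be read, audited, and formally verified.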

Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning

This paper proposes a novel framework for efficient multi-task reinforcement learning that trains agents to employ hierarchical policies that decide when to use a previously learned policy and when to learn a new skill.

RobustFill: Neural Program Learning under Noisy I/O

This work directly compares both approaches for automatic program learning on a large-scale, real-world learning task and demonstrates that the strength of each approach is highly dependent on the evaluation metric and end-user application.

Program Guided Agent

Experimental results on a 2D Minecraft environment not only demonstrate that the proposed framework learns to reliably accomplish program instructions and achieves zero-shot generalization to more complex instructions but also verify the efficiency of the proposed modulation mechanism for learning the multitask policy.