Corpus ID: 237365528

Learning to Synthesize Programs as Interpretable and Generalizable Policies

@inproceedings{Trivedi2021LearningTS,
  title={Learning to Synthesize Programs as Interpretable and Generalizable Policies},
  author={Dweep Trivedi and Jesse Zhang and Shao-Hua Sun and Joseph J. Lim},
  booktitle={Neural Information Processing Systems},
  year={2021}
}
Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. decision trees, state… 
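
As a rough illustration of the program-as-policy idea described above, the sketch below executes a hand-written program in a Karel-style DSL as the policy for a toy corridor world and scores it by environment reward. The DSL, the environment, and all names (CorridorEnv, run_program) are hypothetical stand-ins, not the paper's actual setup.

```python
# Minimal sketch (not the paper's method): a program in a tiny Karel-style
# DSL is executed as a policy in a toy corridor world and scored by reward.
class CorridorEnv:
    """Agent starts at cell 0 and earns reward 1.0 for reaching the last cell."""
    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def front_is_clear(self):
        return self.pos < self.length - 1

    def move(self):
        if self.front_is_clear():
            self.pos += 1

    def reward(self):
        return 1.0 if self.pos == self.length - 1 else 0.0


def run_program(program, env, max_steps=100):
    """Interpret a tiny DSL with two statement forms: ('move',) and ('while', cond_name, body)."""
    steps = 0

    def execute(stmts):
        nonlocal steps
        for stmt in stmts:
            if steps >= max_steps:
                return
            if stmt[0] == "move":
                env.move()
                steps += 1
            elif stmt[0] == "while":
                _, cond, body = stmt
                while getattr(env, cond)() and steps < max_steps:
                    execute(body)

    execute(program)
    return env.reward()


# A human-readable policy: "while the front is clear, move forward".
program = [("while", "front_is_clear", [("move",)])]
print(run_program(program, CorridorEnv(length=5)))  # -> 1.0
```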

Programmatic Reinforcement Learning without Oracles

This work proposes a programmatically interpretable RL framework that performs program architecture search over a continuous relaxation of the architecture space defined by programming-language grammar rules; policy architectures are learned jointly with policy parameters via bilevel optimization using efficient policy-gradient methods, so no pretrained oracle is required.
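
A toy, assumption-laden sketch of the continuous-relaxation idea only (not the paper's algorithm): a "grammar" offers two candidate rules for a one-node program, architecture logits mix them via softmax, and parameter and architecture updates alternate on the same data. The paper's setup instead uses policy gradients and reward; everything below is a simplified supervised analogue.

```python
# DARTS-style continuous relaxation over two candidate "grammar rules":
# f(x) = a*x (linear) or f(x) = b*x**2 (quadratic), mixed by softmax(alpha).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=256)
y = 3.0 * x**2                      # target is quadratic

a, b = 0.1, 0.1                     # rule parameters
alpha = np.zeros(2)                 # architecture logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(2000):
    w = softmax(alpha)
    pred = w[0] * a * x + w[1] * b * x**2
    err = pred - y
    # Inner step: gradient update of the rule parameters.
    a -= 0.1 * np.mean(2 * err * w[0] * x)
    b -= 0.1 * np.mean(2 * err * w[1] * x**2)
    # Outer step: gradient update of the architecture logits through the softmax.
    grad_w = np.array([np.mean(2 * err * a * x), np.mean(2 * err * b * x**2)])
    grad_alpha = w * (grad_w - np.dot(w, grad_w))   # softmax Jacobian
    alpha -= 0.1 * grad_alpha

print(softmax(alpha))  # the weight on the quadratic rule ends up larger
```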

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

This work proposes CodeRL, a framework for program synthesis that combines pretrained language models (LMs) with deep reinforcement learning (RL): the code-generating LM is treated as an actor network, and a critic network is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor.
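
The sketch below illustrates the actor-critic recipe in this summary in a heavily simplified, hypothetical form: a toy recurrent "LM" samples program tokens, a stub test harness returns a correctness reward, and a small critic provides a dense per-token baseline for REINFORCE. None of this is the CodeRL implementation; the model sizes, run_tests, and the target sequence are invented for illustration.

```python
import torch
import torch.nn as nn

VOCAB, LENGTH = 8, 6
TARGET = [1, 2, 3, 4, 5, 6]          # pretend this token sequence "passes the tests"

actor = nn.GRU(input_size=VOCAB, hidden_size=32, batch_first=True)
actor_head = nn.Linear(32, VOCAB)
critic = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(actor_head.parameters())
                       + list(critic.parameters()), lr=1e-2)

def run_tests(tokens):
    """Stub unit-test harness: fraction of positions matching the target."""
    return sum(int(t == g) for t, g in zip(tokens, TARGET)) / LENGTH

for step in range(300):
    h = torch.zeros(1, 1, 32)
    inp = torch.zeros(1, 1, VOCAB)
    log_probs, values, tokens = [], [], []
    for _ in range(LENGTH):
        out, h = actor(inp, h)
        dist = torch.distributions.Categorical(logits=actor_head(out[0, 0]))
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        values.append(critic(out[0, 0]))
        tokens.append(tok.item())
        inp = torch.zeros(1, 1, VOCAB)
        inp[0, 0, tok] = 1.0
    reward = run_tests(tokens)                       # terminal correctness reward
    returns = torch.full((LENGTH,), reward)
    values = torch.cat(values)
    advantage = returns - values.detach()            # critic acts as a dense baseline
    policy_loss = -(torch.stack(log_probs) * advantage).sum()
    critic_loss = ((values - returns) ** 2).mean()
    opt.zero_grad()
    (policy_loss + critic_loss).backward()
    opt.step()
```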

ProTo: Program-Guided Transformer for Program-Guided Tasks

ProTo is demonstrated to outperform previous state-of-the-art methods on GQA visual reasoning and 2D Minecraft policy-learning datasets, and to generalize better to unseen, complex, and human-written programs.

A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges

This survey provides a comprehensive review of existing work on eXplainable RL (XRL) and introduces a new taxonomy in which prior works are categorized into model-explaining, reward-explaining, state-explaining, and task-explaining methods.

Code as Policies: Language Model Programs for Embodied Control

Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be…

Synthesized Differentiable Programs

This paper presents a combined algorithm for synthesizing syntactic programs, compiling them into the weights of a neural network, and then tuning the resulting model, yielding an efficient algorithm for inducing abstract algorithmic structure and a corresponding set of desirable complex programs.

Explainable Pathfinding for Inscrutable Planners with Inductive Logic Programming

This work builds on inductive logic programming techniques that allow background knowledge and inductive biases to be combined, constructing an explainable graph that represents solutions for all states in the state space and can be explained to a human.

Iterative Genetic Improvement: Scaling Stochastic Program Synthesis

This work proposes a new framework for stochastic program synthesis, called iterative genetic improvement, to address the problem of searching the vast space of programs efficiently; experiments indicate that the method has considerable advantages over several representative stochastic program synthesis techniques in terms of both scalability and solution quality.
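
A minimal, hypothetical sketch of the iterative-improvement loop described above: candidate programs are small arithmetic expressions, fitness is error against a hidden target, and each iteration reseeds random mutation with the best program found so far. The paper's system operates on real programs; this toy only illustrates the loop structure.

```python
import random

random.seed(0)
OPS = ["+", "-", "*"]
TERMS = ["x", "1", "2", "3"]

def random_expr(depth=2):
    """Generate a random arithmetic expression over x."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    return f"({random_expr(depth - 1)} {random.choice(OPS)} {random_expr(depth - 1)})"

def mutate(expr):
    """Either replace the whole expression or splice in a random subexpression."""
    if random.random() < 0.5:
        return random_expr()
    return f"({expr} {random.choice(OPS)} {random_expr(1)})"

def error(expr):
    """Fitness: absolute error against a hidden target function on a few test points."""
    target = lambda x: x * x + 2 * x
    return sum(abs(eval(expr, {"x": v}) - target(v)) for v in range(-5, 6))

best = random_expr()
for generation in range(200):              # each outer iteration reseeds the search
    candidates = [mutate(best) for _ in range(20)] + [best]
    best = min(candidates, key=error)
print(best, error(best))
```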

From Solution Synthesis to Student Attempt Synthesis for Block-Based Visual Programming Tasks

This work investigates a crucial component of student modeling, the ability to automatically infer students’ misconceptions in order to predict (synthesize) their behavior, and introduces a novel benchmark, StudentSyn, centered around the following challenge: for a given student, synthesize the student’s attempt on a new target task after observing the student’s attempt on a fixed reference task.

MP-CodeCheck: Evolving Logical Expression Code Anomaly Learning with Iterative Self-Supervision

This work presents MP-CodeCheck, a machine programming (MP) system that identifies anomalous code patterns within logical program expressions, and compares it against ControlFlag, a state-of-the-art self-supervised code anomaly detection system, finding that MP-CodeCheck is more space- and time-efficient.

References

Showing 1-10 of 131 references

Synthesizing Programmatic Policies that Inductively Generalize

This work proposes a learning framework called adaptive teaching, which learns a state machine policy by imitating a teacher; in contrast to traditional imitation learning, the teacher adaptively updates itself based on the structure of the student.

Programmatically Interpretable Reinforcement Learning

This work proposes a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward, and demonstrates that NDPS is able to discover human-readable policies that pass some significant performance bars.

Imitation-Projected Programmatic Reinforcement Learning

Experiments show that PROPEL can significantly outperform state-of-the-art approaches for learning programmatic policies, and that it can exploit contemporary combinatorial methods for this task.

Modular Multitask Reinforcement Learning with Policy Sketches

Experiments show that using the approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.

Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis

Reinforcement learning is performed on top of a supervised model with an objective that explicitly maximizes the likelihood of generating semantically correct programs, which leads to improved accuracy of the models, especially in cases where the training data is limited.

Program Synthesis Guided Reinforcement Learning

This work proposes model predictive program synthesis, which trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty.

Few-Shot Bayesian Imitation Learning with Logical Program Policies

This work proposes an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples, and argues that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.

Discovering symbolic policies with deep reinforcement learning

This work uses an autoregressive recurrent neural network to generate control policies represented by tractable mathematical expressions and proposes an “anchoring” algorithm that distills pre-trained neural network-based policies into fully symbolic policies, one action dimension at a time, to scale to environments with multidimensional action spaces.
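
A heavily simplified, hypothetical illustration of distilling a policy into symbolic form one action dimension at a time: for each dimension, the candidate expression that best matches a stand-in "pretrained" policy on sampled states is selected. The actual anchoring algorithm searches expressions with a learned generator rather than scoring a small fixed library; all names below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(500, 2))           # columns: s0, s1

def pretrained_policy(s):
    """Stand-in for a neural policy with a 2-D action."""
    return np.stack([np.tanh(2 * s[:, 0]), s[:, 0] * s[:, 1]], axis=1)

# A small library of candidate symbolic expressions over the state.
candidates = {
    "s0": lambda s: s[:, 0],
    "s1": lambda s: s[:, 1],
    "s0*s1": lambda s: s[:, 0] * s[:, 1],
    "tanh(2*s0)": lambda s: np.tanh(2 * s[:, 0]),
    "sin(s1)": lambda s: np.sin(s[:, 1]),
}

actions = pretrained_policy(states)
symbolic_policy = []
for dim in range(actions.shape[1]):                   # one action dimension at a time
    errors = {name: np.mean((f(states) - actions[:, dim]) ** 2)
              for name, f in candidates.items()}
    symbolic_policy.append(min(errors, key=errors.get))
print(symbolic_policy)  # -> ['tanh(2*s0)', 's0*s1']
```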

Verifiable Reinforcement Learning via Policy Extraction

This work proposes VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy and its Q-function, and shows that it substantially outperforms two baselines.
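
The sketch below gives a rough, hypothetical rendering of the distillation idea (not the released VIPER code): a shallow decision tree imitates an "oracle" policy via DAgger-style data aggregation over states visited by the student on a toy control task. VIPER additionally resamples states using the oracle's Q-function, which this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def oracle_policy(obs):
    """Stand-in for a DNN policy: push right (action 1) when left of center."""
    return (obs[:, 0] < 0).astype(int)

def rollout(policy, n=200):
    """Toy 1-D control task: the chosen action nudges the first state dimension."""
    states, s = [], np.array([0.5, 0.0])
    for _ in range(n):
        states.append(s.copy())
        a = policy(s[None, :])[0]
        s[0] += (0.1 if a == 1 else -0.1) + rng.normal(0, 0.05)
        s[1] = rng.normal(0, 0.5)
    return np.array(states)

tree = DecisionTreeClassifier(max_depth=3)
obs_data, act_data = [], []
for it in range(5):                                   # DAgger iterations
    student = oracle_policy if it == 0 else (lambda o: tree.predict(o))
    visited = rollout(student)                        # states the student actually reaches
    obs_data.append(visited)
    act_data.append(oracle_policy(visited))           # oracle labels those states
    tree.fit(np.concatenate(obs_data), np.concatenate(act_data))

print(tree.score(np.concatenate(obs_data), np.concatenate(act_data)))  # agreement with the oracle
```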

Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning

This paper proposes a novel framework for efficient multi-task reinforcement learning that trains agents to employ hierarchical policies that decide when to use a previously learned policy and when to learn a new skill.
...