Prompting Decision Transformer for Few-Shot Policy Generalization

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot…
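The abstract is cut off before it describes how the prompt is built, so the following is only a minimal sketch of one plausible reading: a short demonstration segment is prepended to the agent's recent history before the sequence is passed to a Transformer. The helper names (`returns_to_go`, `build_input`), the `(return-to-go, state, action)` step layout, and the `context_len` parameter are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical step layout: (return-to-go, state, action).
Step = Tuple[float, Sequence[float], Sequence[float]]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return the undiscounted sum of future rewards at each time step."""
    rtg: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_input(prompt: List[Step], history: List[Step], context_len: int) -> List[Step]:
    """Prepend a demonstration prompt to the most recent `context_len` steps.

    Assumption: the model sees the prompt first, then the recent history.
    """
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    recent = history[-context_len:]
    return list(prompt) + recent


# Toy usage: a two-step demonstration prompt and a three-step history.
demo_rtg = returns_to_go([1.0, 2.0])
prompt = [(demo_rtg[0], [0.0], [1.0]), (demo_rtg[1], [0.5], [0.0])]
history = [(5.0, [0.1], [0.0]), (4.0, [0.2], [1.0]), (2.0, [0.3], [1.0])]
tokens = build_input(prompt, history, context_len=2)
```

In this toy setup, `tokens` holds the two prompt steps followed by the last two history steps; a real model would embed each element and add positional information, which is omitted here.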


A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration

Proposes a Memory-related Multi-task Method (M3), which builds a model from exploration data to extract action-effect features and store them in memory while an action-prediction model is trained.

Transformers are Adaptable Task Planners

Proposes a Transformer Task Planner that learns high-level actions from demonstrations by leveraging object attribute-based representations, and shows generalization to unseen preferences using a single demonstration as a prompt in a simulated dishwasher-loading task.

GeoECG: Data Augmentation via Wasserstein Geodesic Perturbation for Robust Electrocardiogram Prediction

This paper proposes a physiologically-inspired data augmentation method to improve performance and increase the robustness of heart disease detection based on ECG signals, and designs a ground metric that recognizes the difference between ECG signals based on physiologically determined features.



Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning

Decision Transformer: Reinforcement Learning via Sequence Modeling

Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

TACO: Learning Task Decomposition via Temporal Alignment for Control

This work proposes a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration, and shows that this approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.

Generalizing from a Few Examples: A Survey on Few-Shot Learning

A thorough survey of Few-Shot Learning (FSL) that categorizes FSL methods from three perspectives: data, which uses prior knowledge to augment the supervised experience; model, which uses prior knowledge to reduce the size of the hypothesis space; and algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.

Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering

This paper proposes Fast Parameter Adaptation for Image-Text Modeling (FPAIT), which learns to learn to jointly understand image and text data from a few examples and leverages dynamic linear transformations to alleviate the side effects of the small training set.

Conservative Q-Learning for Offline Reinforcement Learning

Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

One-Shot Visual Imitation Learning via Meta-Learning

A meta-imitation learning method that enables a robot to learn how to learn more efficiently, allowing it to acquire new skills from just a single demonstration, and requires data from significantly fewer prior tasks for effective learning of new skills.