Visual Reaction: Learning to Play Catch With Your Drone

@article{Zeng2020VisualRL,
  title={Visual Reaction: Learning to Play Catch With Your Drone},
  author={Kuo-Hao Zeng and Roozbeh Mottaghi and Luca Weihs and Ali Farhadi},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}
In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself. Visual reaction entails predicting future changes in a visual environment and planning accordingly. We study the problem of visual reaction in the context of playing catch with a drone in visually rich synthetic environments. This is a challenging problem since the agent is required to learn (1) how…
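The abstract describes a forecast-then-plan loop: predict where the thrown object will be, then move the drone to intercept. As a minimal sketch of that loop (all function names are hypothetical, and a constant-gravity Euler rollout stands in for the paper's learned visual forecaster):

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2

def forecast_trajectory(pos, vel, dt=0.05, steps=20):
    """Roll the thrown object's state forward under gravity
    (a physics stand-in for the paper's learned forecaster,
    which predicts from visual observations instead)."""
    pos, vel = pos.astype(float), vel.astype(float)
    traj = []
    for _ in range(steps):
        vel = vel + GRAVITY * dt   # semi-implicit Euler update
        pos = pos + vel * dt
        traj.append(pos.copy())
    return np.stack(traj)

def plan_move(agent_pos, traj, max_step=0.5):
    """Greedy planner (hypothetical stand-in for the learned policy):
    step toward the closest point on the forecast trajectory,
    limited to the agent's per-step displacement."""
    target = traj[np.argmin(np.linalg.norm(traj - agent_pos, axis=1))]
    direction = target - agent_pos
    dist = np.linalg.norm(direction)
    if dist < 1e-8:
        return agent_pos
    return agent_pos + direction / dist * min(max_step, dist)
```

In the paper itself both stages are learned end to end from pixels; this sketch only illustrates how a forecaster's output feeds the planner.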


Pushing it out of the Way: Interactive Visual Navigation
This paper introduces the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent’s actions, and finds that agents equipped with an NIE exhibit significant improvements in their navigational capabilities.
AllenAct: A Framework for Embodied AI Research
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
Learning About Objects by Learning to Interact with Them
This work presents a computational framework to discover objects and learn their physical properties following the paradigm of learning from interaction, and shows that the agent learns efficiently and effectively, not just for objects it has interacted with before but also for novel instances of seen categories and for novel object categories.
No-frills Dynamic Planning using Static Planners
This paper addresses the task of interacting with dynamic environments where the changes in the environment are independent of the agent through the context of trapping a moving ball with a UR5 robotic arm and proposes an approach to utilize a static planner for dynamic tasks using a Dynamic Planning add-on.
Imitative Learning for Multi-Person Action Forecasting
This work proposes a novel imitative learning framework built on inverse reinforcement learning that explores multiple plausible futures by extrapolating a series of latent variables and taking them into account when generating predictions.
Learning to estimate UAV created turbulence from scene structure observed by onboard cameras
This work addresses the complex problem of estimating in-flight turbulence caused by obstacles, in particular by complex structures in cluttered environments, using a memory-based model in the form of a recurrent neural network.
Deep KKL: Data-driven Output Prediction for Non-Linear Systems
This work proposes a predictor structure based on the Kazantzis-Kravaris/Luenberger (KKL) observer and shows that KKL fits well into the general framework and proposes a constructive solution for this predictor that solely relies on a small set of trajectories measured from the system.


Target-driven visual navigation in indoor scenes using deep reinforcement learning
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization and proposes the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
Unsupervised Learning for Physical Interaction through Video Prediction
An action-conditioned video prediction model is developed that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames, and is partially invariant to object appearance, enabling it to generalize to previously unseen objects.
Activity Forecasting
The unified model uses state-of-the-art semantic scene understanding combined with ideas from optimal control theory to achieve accurate activity forecasting and shows how the same techniques can improve the results of tracking algorithms by leveraging information about likely goals and trajectories.
"What Happens If..." Learning to Predict the Effect of Forces in Images
A deep neural network model is designed that learns long-term sequential dependencies of object movements while taking into account the geometry and appearance of the scene by combining Convolutional and Recurrent Neural Networks.
Visual Interaction Networks: Learning a Physics Simulator from Video
The Visual Interaction Network is introduced, a general-purpose model for learning the dynamics of a physical system from raw visual observations, consisting of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks.
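The dynamics predictor in this line of work aggregates learned pairwise "effects" between objects before updating each object's state. A toy sketch of one such interaction step (dimensions, weights, and names are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; a real model learns these weights from video.
STATE, EFFECT = 4, 8
W_pair = rng.normal(size=(2 * STATE, EFFECT)) * 0.1
W_update = rng.normal(size=(STATE + EFFECT, STATE)) * 0.1

def relu(x):
    return np.maximum(x, 0.0)

def interaction_step(states):
    """One interaction-network update: compute an 'effect' vector for
    every ordered pair of objects, sum the effects arriving at each
    receiver, then update each state from (state, aggregated effect)."""
    n = states.shape[0]
    agg = np.zeros((n, EFFECT))
    for i in range(n):
        for j in range(n):
            if i != j:
                agg[i] += relu(np.concatenate([states[i], states[j]]) @ W_pair)
    return np.concatenate([states, agg], axis=1) @ W_update
```

In the full model, a convolutional front-end produces the per-object state vectors from raw frames; here they are assumed to be given.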
An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders
A conditional variational autoencoder is proposed for predicting the dense trajectory of pixels in a scene—what will move in the scene, where it will travel, and how it will deform over the course of one second.
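The key idea of conditioning a VAE on an image is that sampling different latent codes yields different plausible futures for the same scene. A minimal decoder-side sketch (weights and sizes are made up for illustration; a trained model would learn them):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes: latent code, image feature, and forecast horizon.
LATENT, FEAT, HORIZON = 8, 16, 10
W_dec = rng.normal(size=(LATENT + FEAT, HORIZON * 2)) * 0.1

def sample_futures(image_feat, n_samples=5):
    """Draw several plausible 2-D trajectories for one image by
    sampling different latent codes through a (linear, toy) decoder,
    mimicking how a conditional VAE produces diverse forecasts."""
    futures = []
    for _ in range(n_samples):
        z = rng.normal(size=LATENT)  # latent selects 'which future'
        out = np.concatenate([z, image_feat]) @ W_dec
        futures.append(out.reshape(HORIZON, 2))
    return np.stack(futures)
```

Each call returns an (n_samples, HORIZON, 2) array of candidate pixel trajectories; diversity comes entirely from the sampled latents.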
Learning to Act by Predicting the Future
The presented approach utilizes a high-dimensional sensory stream and a lower-dimensional measurement stream that provides a rich supervisory signal, which enables training a sensorimotor control model by interacting with the environment.
Anticipating Visual Representations from Unlabeled Video
This work presents a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects, and applies recognition algorithms to the predicted representations to anticipate objects and actions.
AI2-THOR: An Interactive 3D Environment for Visual AI
AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the development of visually intelligent models.
Visual Semantic Navigation using Scene Priors
This work proposes to use Graph Convolutional Networks for incorporating the prior knowledge into a deep reinforcement learning framework and shows how semantic knowledge improves performance significantly and improves in generalization to unseen scenes and/or objects.