Corpus ID: 219708374

Learning About Objects by Learning to Interact with Them

Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi
Much of the remarkable progress in computer vision has been focused on fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks. In contrast, humans often learn about their world with little to no external supervision. Taking inspiration from infants learning from their environment through play and interaction, we present a computational framework to discover objects and learn their physical properties along this paradigm of Learning from Interaction…
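To make the learning-from-interaction paradigm concrete, here is a minimal, purely illustrative sketch (not the paper's architecture): an agent probes an unknown object with force impulses, observes the resulting displacements, and inverts simple kinematics to recover a physical property (mass). The `apply_force` environment and all parameters are hypothetical stand-ins for a real simulator.

```python
import random

def apply_force(true_mass, force, dt=0.1):
    """Toy 'environment': displacement of a frictionless point mass of
    true_mass after a force impulse applied for dt seconds (hypothetical)."""
    accel = force / true_mass
    return 0.5 * accel * dt ** 2

def estimate_mass(env_step, base_force=10.0, trials=20, dt=0.1):
    """Estimate mass purely by interacting: push, observe, invert F = m*a."""
    estimates = []
    for _ in range(trials):
        f = base_force * random.uniform(0.5, 1.5)  # vary the applied force
        disp = env_step(f)                         # observe displacement
        accel = 2 * disp / dt ** 2                 # invert the kinematics
        estimates.append(f / accel)                # m = F / a
    return sum(estimates) / len(estimates)

# An agent probing an unseen object of mass 3.0 recovers it from motion alone.
est = estimate_mass(lambda f: apply_force(3.0, f))
```

No labels are ever provided; the supervision signal comes entirely from the agent's own actions and their observed physical consequences, which is the core idea the abstract describes.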


AllenAct: A Framework for Embodied AI Research
Introduces AllenAct, a modular and flexible learning framework designed around the unique requirements of Embodied AI research, with first-class support for a growing collection of embodied environments, tasks, and algorithms.
Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery
Introduces Act the Part, which learns how to interact with articulated objects to discover and segment their parts by coupling action selection with motion segmentation, isolating structures to make perceptual part recovery possible without semantic labels.
O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning
Proposes a unified affordance learning framework that learns object-object interaction for various tasks using physical simulation (SAPIEN), ShapeNet models with rich geometric diversity, and an object-kernel point convolution network to reason about detailed interactions between two objects.
VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects
Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. This paper proposes object-centric actionable visual priors as a novel perception-interaction handshaking point: the perception system outputs more actionable guidance than kinematic structure estimation by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals.
Where2Act: From Pixels to Actions for Articulated 3D Objects
Proposes a learning-from-interaction framework with an online data sampling strategy that allows the network to be trained in simulation (SAPIEN) and to generalize across categories. It also proposes, discusses, and evaluates novel network architectures that, given image and depth data, predict the set of actions possible at each pixel and the regions over articulated parts that are likely to move under force.


Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
Presents a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner, fine-tuning an existing DeepMask instance segmentation network on self-labeled training data acquired by the robot.
AI2-THOR: An Interactive 3D Environment for Visual AI
AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the construction of visually intelligent models.
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network…
Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments
Evaluations show that the proposed information-theoretic approach allows a robot to efficiently determine the composite structure of its environment, and that the probabilistic model allows straightforward integration of multiple modalities, such as movement data and static scene features.
Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection
Presents a computational model for weakly-supervised object detection, based on prior knowledge modelling, exemplar learning, and learning with video contexts, which can beat state-of-the-art fully supervised performance while learning from very few samples per object category.
Visual Reaction: Learning to Play Catch With Your Drone
The results show that a model integrating a forecaster with a planner outperforms a set of strong baselines based on tracking, as well as pure model-based and model-free RL baselines.
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
This study points toward an account of human vision with generative physical knowledge at its core and various recognition models as helpers enabling efficient inference.
Scaling and Benchmarking Self-Supervised Visual Representation Learning
It is shown that by scaling along various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks, such as object detection, surface normal estimation, and visual navigation using reinforcement learning.
Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
Proposes a principled probabilistic formulation of object saliency as a sampling problem that allows the model to learn, from a large corpus of unlabelled images, which patches of an image are of greatest interest and most likely to correspond to an object.
Watch and learn: Semi-supervised learning of object detectors from videos
We present a semi-supervised approach that localizes multiple unknown object instances in long videos. We start with a handful of labeled boxes and iteratively learn and label hundreds of thousands…