• Corpus ID: 219708374

Learning About Objects by Learning to Interact with Them

Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi

Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks. In contrast, humans often learn about their world with little to no external supervision. Taking inspiration from infants learning from their environment through play and interaction, we present a computational framework to discover objects and learn their physical properties along this paradigm of Learning from Interaction…

AllenAct: A Framework for Embodied AI Research

This paper introduces AllenAct, a modular and flexible learning framework designed around the unique requirements of Embodied AI research, with first-class support for a growing collection of embodied environments, tasks and algorithms.

Where2Act: From Pixels to Actions for Articulated 3D Objects

This paper proposes a learning-from-interaction framework with an online data sampling strategy that allows the network to be trained in simulation (SAPIEN) and to generalize across categories. It also proposes, discusses, and evaluates novel network architectures that, given image and depth data, predict the set of actions possible at each pixel and the regions over articulated parts that are likely to move under the force.

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes

This paper takes the first step in building an AI system that learns inter-object functional relationships in 3D indoor environments. Its key technical contributions are modeling prior knowledge by training over large-scale scenes and designing interactive policies for effectively exploring the training scenes and quickly adapting to novel test scenes.

Ask4Help: Learning to Leverage an Expert for Embodied Tasks

This paper proposes the Ask4Help policy, a policy that augments agents with the ability to request and then use expert assistance, thereby reducing the cost of querying the expert.

Pay Self-Attention to Audio-Visual Navigation

These thorough experiments validate the superior performance (both quantitative and qualitative) of FSAAVN in comparison with the state of the art, and also provide unique insights about the choice of visual modalities, visual/audio encoder backbones and fusion patterns.

Reasoning about Actions over Visual and Linguistic Modalities: A Survey

This paper surveys existing tasks, benchmark datasets, various techniques and models, and their respective performance concerning advancements in RAC in the vision and language domain and outlines potential directions for future research.

Sound Adversarial Audio-Visual Navigation

This work designs an acoustically complex environment in which, besides the target sound, there exists a sound attacker playing a zero-sum game with the agent, and develops a joint training mechanism by employing the property of a centralized critic with decentralized actors.

Interactron: Embodied Adaptive Object Detection

The idea is to continue training during inference and adapt the model at test time without any explicit supervision via interacting with the environment, and its performance is on par with a model trained with full supervision for those environments.

VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects

This paper proposes object-centric actionable visual priors as a novel perception-interaction handshaking point, in which the perception system outputs more actionable guidance than kinematic structure estimation by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals.



Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction

This work fine-tunes an existing DeepMask instance segmentation network on the self-labeled training data acquired by the robot, and presents a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner.

AI2-THOR: An Interactive 3D Environment for Visual AI

AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks and facilitate building visually intelligent models.

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network

Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments

Evaluations show that the proposed information-theoretic approach allows a robot to efficiently determine the composite structure of its environment, and the probabilistic model allows straightforward integration of multiple modalities, such as movement data and static scene features.

Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection

A computational model for weakly-supervised object detection, based on prior knowledge modelling, exemplar learning and learning with video contexts, which can beat the state-of-the-art full-training based performances by learning from very few samples for each object category.

Visual Reaction: Learning to Play Catch With Your Drone

The results show that the model that integrates a forecaster with a planner outperforms a set of strong baselines that are based on tracking as well as pure model-based and model-free RL baselines.

Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning

This study points towards an account of human vision with generative physical knowledge at its core, and various recognition models as helpers leading to efficient inference.

Scaling and Benchmarking Self-Supervised Visual Representation Learning

It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.

Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection

A principled probabilistic formulation of object saliency as a sampling problem that allows us to learn, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object.

Watch and learn: Semi-supervised learning of object detectors from videos

We present a semi-supervised approach that localizes multiple unknown object instances in long videos. We start with a handful of labeled boxes and iteratively learn and label hundreds of thousands