Corpus ID: 219708374

Learning About Objects by Learning to Interact with Them

@article{Lohmann2020LearningAO,
  title={Learning About Objects by Learning to Interact with Them},
  author={Martin Lohmann and Jordi Salvador and Aniruddha Kembhavi and Roozbeh Mottaghi},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.09306}
}
Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks. In contrast, humans often learn about their world with little to no external supervision. Taking inspiration from infants learning from their environment through play and interaction, we present a computational framework to discover objects and learn their physical properties along this paradigm of Learning from Interaction… 
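The abstract stops short of implementation detail, so as a loose illustration of what a learning-from-interaction training loop can look like, here is a minimal PyTorch sketch. Everything in it is a hypothetical stand-in (the tiny two-head network, the push-derived movement labels, the mass-class proxy); it is not the authors' architecture or objective.

```python
# Minimal sketch of a learning-from-interaction loop (illustrative only;
# the environment labels and network below are hypothetical, not the paper's).
import torch
import torch.nn as nn

class InteractionModel(nn.Module):
    """Two-head network: per-pixel objectness plus a coarse mass class."""
    def __init__(self, num_mass_classes: int = 3):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.mask_head = nn.Conv2d(16, 1, kernel_size=1)                 # movable-object mask
        self.mass_head = nn.Conv2d(16, num_mass_classes, kernel_size=1)  # physical-property proxy

    def forward(self, rgb):
        feats = torch.relu(self.backbone(rgb))
        return self.mask_head(feats), self.mass_head(feats)

model = InteractionModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(100):
    rgb = torch.rand(1, 3, 64, 64)  # stand-in for an egocentric frame
    # Interaction supplies labels for free: pixels that moved after a push,
    # and how hard the push had to be (a crude proxy for mass).
    moved = (torch.rand(1, 1, 64, 64) > 0.5).float()
    mass = torch.randint(0, 3, (1, 64, 64))
    mask_logits, mass_logits = model(rgb)
    loss = (nn.functional.binary_cross_entropy_with_logits(mask_logits, moved)
            + nn.functional.cross_entropy(mass_logits, mass))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is the supervision source: labels come from the outcomes of the agent's own pushes rather than from human annotation.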

Citations

AllenAct: A Framework for Embodied AI Research
TLDR
This paper introduces AllenAct, a modular and flexible learning framework designed around the unique requirements of Embodied AI research, which provides first-class support for a growing collection of embodied environments, tasks, and algorithms.
Where2Act: From Pixels to Actions for Articulated 3D Objects
TLDR
This paper proposes a learning-from-interaction framework with an online data-sampling strategy that allows the network to be trained in simulation (SAPIEN) and to generalize across categories. It also proposes, discusses, and evaluates novel network architectures that, given image and depth data, predict the set of actions possible at each pixel and the regions over articulated parts that are likely to move under the applied force.
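As a rough sketch of the per-pixel affordance prediction described in this TLDR, the toy network below takes RGB-D input and scores, at every pixel, how likely each action type is to be possible there. The six-action set and layer sizes are assumptions for illustration, not Where2Act's actual design.

```python
# Minimal sketch of per-pixel action-affordance prediction (illustrative;
# the action set and architecture are assumptions, not the paper's).
import torch
import torch.nn as nn

NUM_ACTIONS = 6  # e.g. pushes/pulls along a few directions (assumed)

class PixelAffordanceNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 4 input channels: RGB plus depth.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # One actionability score per action type at every pixel.
        self.head = nn.Conv2d(32, NUM_ACTIONS, 1)

    def forward(self, rgbd):
        return torch.sigmoid(self.head(self.encoder(rgbd)))

net = PixelAffordanceNet()
rgbd = torch.rand(1, 4, 128, 128)      # stand-in RGB-D frame
scores = net(rgbd)                     # (1, NUM_ACTIONS, 128, 128)
best = scores.flatten(1).argmax(dim=1) # most promising (action, pixel) to try
```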
IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes
TLDR
This paper takes a first step toward building AI systems that learn inter-object functional relationships in 3D indoor environments. Its key technical contributions are modeling prior knowledge by training over large-scale scenes, and designing interactive policies that effectively explore the training scenes and quickly adapt to novel test scenes.
Sound Adversarial Audio-Visual Navigation
TLDR
This work designs an acoustically complex environment in which, besides the target sound, a sound attacker plays a zero-sum game with the agent, and develops a joint training mechanism employing a centralized critic with decentralized actors.
Interactron: Embodied Adaptive Object Detection
TLDR
The idea is to continue training during inference and adapt the model at test time, without any explicit supervision, by interacting with the environment; the resulting performance is on par with a model trained with full supervision for those environments.
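The test-time adaptation idea can be pictured as keeping the optimizer running during deployment and updating the detector with a self-supervised signal gathered by moving through the scene. The consistency loss below is an assumed surrogate, not Interactron's actual objective.

```python
# Minimal sketch of test-time adaptation via interaction (illustrative;
# the detector and the self-supervised loss are stand-ins).
import torch
import torch.nn as nn

detector = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(detector.parameters(), lr=1e-3)

def self_supervised_loss(preds_t, preds_t1):
    # Assumed surrogate: encourage consistent predictions across consecutive
    # frames gathered by moving in the environment (no ground-truth labels).
    return nn.functional.mse_loss(preds_t, preds_t1)

for t in range(20):                      # interaction steps at *test* time
    frame_t = torch.rand(1, 3, 32, 32)   # observation before moving
    frame_t1 = torch.rand(1, 3, 32, 32)  # observation after moving
    loss = self_supervised_loss(detector(frame_t), detector(frame_t1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```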
A Survey of Embodied AI: From Simulators to Research Tasks
TLDR
An encyclopedic survey of the three main research tasks in embodied AI – visual exploration, visual navigation, and embodied question answering – covering state-of-the-art approaches, evaluation metrics, and datasets.
Shaping embodied agent behavior with activity-context priors from egocentric video
TLDR
This work introduces an approach to discover activity-context priors from in-the-wild egocentric video captured with human-worn cameras, encoding the video-based prior as an auxiliary reward function that encourages an agent to bring compatible objects together before attempting an interaction.
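Concretely, a prior of this kind can be turned into reward shaping via a lookup over mined object-compatibility scores. The prior table, reward weight, and object names below are hypothetical, for illustration only.

```python
# Minimal sketch of an auxiliary reward shaped by activity-context priors
# (illustrative; the prior table and weight are assumptions, not the paper's).
from typing import Set

# Hypothetical compatibility prior mined from egocentric video:
# pairs of objects that tend to co-occur in the same activity.
ACTIVITY_PRIOR = {
    ("kettle", "mug"): 0.9,
    ("knife", "cutting_board"): 0.8,
    ("pan", "stove"): 0.7,
}

def auxiliary_reward(target: str, nearby: Set[str], weight: float = 0.1) -> float:
    """Bonus for bringing objects compatible with `target` into reach
    before attempting the interaction."""
    bonus = sum(score for (a, b), score in ACTIVITY_PRIOR.items()
                if target in (a, b) and ({a, b} - {target}) <= nearby)
    return weight * bonus

# At each step the agent adds this shaping term to the task reward:
r_shaped = 1.0 + auxiliary_reward("kettle", {"mug", "spoon"})  # -> 1.0 + 0.09
```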
VAT-MART: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects
TLDR
This paper proposes object-centric actionable visual priors as a novel perception-interaction handshaking point: the perception system outputs more actionable guidance than kinematic structure estimation by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordances and trajectory proposals.
…

References

Showing 1–10 of 72 references
Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
TLDR
This work presents a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner, fine-tuning an existing DeepMask instance segmentation network on self-labeled training data acquired by the robot.
AI2-THOR: An Interactive 3D Environment for Visual AI
TLDR
AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the development of visually intelligent models.
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network.
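For orientation, the object-at-a-time recurrent inference this excerpt alludes to can be caricatured as a GRU that, at each step, emits a presence probability and a crude pose latent for one more object. This is a toy rendering; the real Attend-Infer-Repeat model performs amortized variational inference with discrete presence variables and a learned generative model.

```python
# Minimal sketch of recurrent, object-at-a-time scene parsing (illustrative;
# latent sizes and heads are assumptions, not the AIR model).
import torch
import torch.nn as nn

class RecurrentSceneParser(nn.Module):
    def __init__(self, max_objects: int = 3):
        super().__init__()
        self.max_objects = max_objects
        self.encode = nn.Linear(28 * 28, 64)
        self.rnn = nn.GRUCell(input_size=64, hidden_size=32)
        self.z_pres = nn.Linear(32, 1)   # "is there another object?" logit
        self.z_where = nn.Linear(32, 3)  # crude scale + translation latent

    def forward(self, image):
        x = self.encode(image.flatten(1))
        h = torch.zeros(image.size(0), 32)
        objects = []
        for _ in range(self.max_objects):  # attend to one object per step
            h = self.rnn(x, h)
            pres = torch.sigmoid(self.z_pres(h))
            objects.append((pres, self.z_where(h)))
        return objects

parser = RecurrentSceneParser()
out = parser(torch.rand(2, 1, 28, 28))  # list of (presence prob, where latent)
```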
Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments
TLDR
Evaluations show that the proposed information-theoretic approach allows a robot to efficiently determine the composite structure of its environment, and that the probabilistic model allows straightforward integration of multiple modalities, such as movement data and static scene features.
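One way to picture the targeted-exploration idea: maintain a belief for each candidate object boundary and push where the belief's entropy, and hence the expected information gain, is highest. The Bernoulli beliefs and action names below are made up for illustration.

```python
# Minimal sketch of information-gain-driven action selection (illustrative;
# a Bernoulli belief per candidate boundary, not the paper's full model).
import math

def entropy(p: float) -> float:
    """Entropy of a Bernoulli belief, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Belief that each candidate segment boundary is real, keyed by the push
# action that would test it (hypothetical numbers).
beliefs = {"push_left": 0.5, "push_top": 0.9, "push_right": 0.65}

# Pick the action whose outcome is currently most uncertain, i.e. whose
# observation is expected to be most informative.
best_action = max(beliefs, key=lambda a: entropy(beliefs[a]))  # -> "push_left"
```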
Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection
TLDR
A computational model for weakly supervised object detection, based on prior-knowledge modelling, exemplar learning, and learning with video contexts, which can beat state-of-the-art fully supervised performance by learning from very few samples per object category.
Visual Reaction: Learning to Play Catch With Your Drone
TLDR
The results show that a model integrating a forecaster with a planner outperforms a set of strong tracking-based baselines, as well as pure model-based and model-free RL baselines.
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
TLDR
This study points towards an account of human vision with generative physical knowledge at its core, and various recognition models as helpers leading to efficient inference.
Scaling and Benchmarking Self-Supervised Visual Representation Learning
TLDR
It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.
Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
TLDR
A principled probabilistic formulation of object saliency as a sampling problem that allows us to learn, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object.
Watch and learn: Semi-supervised learning of object detectors from videos
We present a semi-supervised approach that localizes multiple unknown object instances in long videos. We start with a handful of labeled boxes and iteratively learn and label hundreds of thousands of object instances.
…