Learning About Objects by Learning to Interact with Them
@article{Lohmann2020LearningAO,
  title   = {Learning About Objects by Learning to Interact with Them},
  author  = {Martin Lohmann and Jordi Salvador and Aniruddha Kembhavi and Roozbeh Mottaghi},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2006.09306}
}
Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks. In contrast, humans often learn about their world with little to no external supervision. Taking inspiration from infants learning from their environment through play and interaction, we present a computational framework to discover objects and learn their physical properties along this paradigm of Learning from Interaction…
16 Citations
AllenAct: A Framework for Embodied AI Research
- Computer Science, ArXiv
- 2020
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
Where2Act: From Pixels to Actions for Articulated 3D Objects
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This paper proposes a learning-from-interaction framework with an online data-sampling strategy that allows the network to be trained in simulation (SAPIEN) and to generalize across categories. It also proposes, discusses, and evaluates novel network architectures that, given image and depth data, predict the set of actions possible at each pixel and the regions over articulated parts that are likely to move under the force.
IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes
- Computer Science, ICLR
- 2022
This paper takes a first step toward building AI systems that learn inter-object functional relationships in 3D indoor environments. Its key technical contributions are modeling prior knowledge by training over large-scale scenes and designing interactive policies for effectively exploring the training scenes and quickly adapting to novel test scenes.
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
- Computer Science, ArXiv
- 2022
This paper proposes Ask4Help, a policy that augments agents with the ability to request, and then use, expert assistance, thereby reducing the cost of querying the expert.
Pay Self-Attention to Audio-Visual Navigation
- Computer Science, ArXiv
- 2022
Thorough experiments validate the superior performance (both quantitative and qualitative) of FSAAVN in comparison with the state of the art, and also provide unique insights into the choice of visual modalities, visual/audio encoder backbones, and fusion patterns.
Reasoning about Actions over Visual and Linguistic Modalities: A Survey
- Computer Science, ArXiv
- 2022
This paper surveys existing tasks, benchmark datasets, various techniques and models, and their respective performance concerning advancements in RAC in the vision and language domain, and outlines potential directions for future research.
Sound Adversarial Audio-Visual Navigation
- Computer Science, ICLR
- 2022
This work designs an acoustically complex environment in which, besides the target sound, there exists a sound attacker playing a zero-sum game with the agent, and develops a joint training mechanism by employing the property of a centralized critic with decentralized actors.
Interactron: Embodied Adaptive Object Detection
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
The idea is to continue training during inference and adapt the model at test time without any explicit supervision via interacting with the environment, and its performance is on par with a model trained with full supervision for those environments.
VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects
- Computer Science, ICLR
- 2022
This paper proposes object-centric actionable visual priors as a novel perception-interaction handshaking point: the perception system outputs more actionable guidance than kinematic structure estimation by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordances and trajectory proposals.
References
Showing 1-10 of 67 references
Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
- Computer Science, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2019
This work presents a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner, fine-tuning an existing DeepMask instance segmentation network on self-labeled training data acquired by the robot.
AI2-THOR: An Interactive 3D Environment for Visual AI
- Computer Science, ArXiv
- 2017
AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks and facilitate building visually intelligent models.
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
- Computer Science, NIPS
- 2016
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network…
Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments
- Computer Science, IEEE Transactions on Robotics
- 2014
Evaluations show that the proposed information-theoretic approach allows a robot to efficiently determine the composite structure of its environment, and the probabilistic model allows straightforward integration of multiple modalities, such as movement data and static scene features.
Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection
- Computer Science, 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
A computational model for weakly-supervised object detection, based on prior knowledge modelling, exemplar learning and learning with video contexts, which can beat the state-of-the-art full-training based performances by learning from very few samples for each object category.
Visual Reaction: Learning to Play Catch With Your Drone
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
The results show that the model that integrates a forecaster with a planner outperforms a set of strong baselines that are based on tracking as well as pure model-based and model-free RL baselines.
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
- Computer Science, Physics, NIPS
- 2015
This study points towards an account of human vision with generative physical knowledge at its core, and various recognition models as helpers leading to efficient inference.
Scaling and Benchmarking Self-Supervised Visual Representation Learning
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.
Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
- Computer Science, 2013 IEEE Conference on Computer Vision and Pattern Recognition
- 2013
This work presents a principled probabilistic formulation of object saliency as a sampling problem, which allows learning, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object.
Watch and learn: Semi-supervised learning of object detectors from videos
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
We present a semi-supervised approach that localizes multiple unknown object instances in long videos. We start with a handful of labeled boxes and iteratively learn and label hundreds of thousands…