Interactron: Embodied Adaptive Object Detection

  • Klemen Kotar, Roozbeh Mottaghi
  • Published 1 February 2022
  • Computer Science
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Over the years various methods have been proposed for the problem of object detection. Recently, we have witnessed great strides in this domain owing to the emergence of powerful deep neural networks. However, there are typically two main assumptions common among these approaches. First, the model is trained on a fixed training set and is evaluated on a pre-recorded test set. Second, the model is kept frozen after the training phase, so no further updates are performed after the training is… 


Learning to View: Decision Transformers for Active Object Detection

This paper uses reinforcement learning (RL) methods to control a robot so that it captures images that maximize detection quality, and provides exhaustive analyses of the reward distribution and observation space.

ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

This paper proposes ProcTHOR, a framework for procedurally generating Embodied AI environments in which to train and evaluate embodied agents across navigation, interaction, and manipulation tasks, and demonstrates it with a sample of 10,000 generated houses and a simple neural model.

Ask4Help: Learning to Leverage an Expert for Embodied Tasks

This paper proposes the Ask4Help policy, which augments agents with the ability to request and then use expert assistance, thereby reducing the cost of querying the expert.

ActMAD: Activation Matching to Align Distributions for Test-Time-Training

This work proposes to adapt a trained model to distribution shifts at test time via Activation Matching (ActMAD), which analyzes the model's activations and aligns the activation statistics of out-of-distribution (OOD) test data with those of the training data, modeling the distribution of each feature in multiple layers across the network.
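To make "aligning activation statistics" concrete, here is a minimal sketch of an activation-matching objective. It assumes that per-feature means and variances are taken over the batch axis at every channel and spatial position, and that the gap to the stored training statistics is measured with an L1 distance. The function names are hypothetical, and the gradient step that would update the network to minimize this loss is omitted, so this is an illustration of the idea rather than the ActMAD implementation.

```python
import numpy as np


def feature_stats(acts):
    """Per-feature mean and variance over the batch axis.

    acts: array of shape (N, C, H, W) holding one layer's activations.
    Returns two arrays of shape (C, H, W), so every channel and spatial
    position is treated as a separate feature.
    """
    return acts.mean(axis=0), acts.var(axis=0)


def activation_matching_loss(test_acts, train_stats):
    """Sum over layers of the L1 gap between test and training statistics.

    test_acts:   list of per-layer activation arrays from a test batch.
    train_stats: list of (mean, var) pairs precomputed on training data.
    """
    loss = 0.0
    for acts, (mu_train, var_train) in zip(test_acts, train_stats):
        mu_test, var_test = feature_stats(acts)
        # Assumed L1 distance, averaged over all features in the layer.
        loss += np.abs(mu_test - mu_train).mean()
        loss += np.abs(var_test - var_train).mean()
    return float(loss)


# Usage: two hypothetical layers, with a shift of +0.5 added to every activation.
rng = np.random.default_rng(0)
train_acts = [rng.normal(size=(256, 4, 3, 3)), rng.normal(size=(256, 8, 2, 2))]
train_stats = [feature_stats(a) for a in train_acts]
test_acts = [a + 0.5 for a in train_acts]
print(activation_matching_loss(test_acts, train_stats))  # approximately 1.0
```

A constant shift moves every per-feature mean by 0.5 and leaves the variances unchanged, so each of the two layers contributes about 0.5. Keeping statistics per position instead of pooling over space follows the summary's wording about modeling "the distribution of each feature"; in an actual test-time update, this loss would be backpropagated into the model's parameters.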

Online Object Representations with Contrastive Learning

This paper proposes a self-supervised approach for learning object representations from monocular videos and finds that, given a limited set of objects, object correspondences naturally emerge under contrastive learning without requiring explicit positive pairs.

Self-supervisory Signals for Object Discovery and Detection

The proposed self-supervision, provided by a robot that traverses an environment and learns representations of the objects it encounters, yields effective environment-specific object discovery and detection at little or no human labeling cost.

Embodied Visual Active Learning for Semantic Segmentation

This work extensively evaluates the proposed models using the photorealistic Matterport3D simulator and shows that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations.

A self-supervised learning system for object detection using physics simulation and multi-view pose estimation

This work presents an autonomous process for training a Convolutional Neural Network for object detection and pose estimation in robotic setups; results show that the proposed approach outperforms popular training processes that rely on synthetic (but not physically realistic) data and manual annotation.

One-Shot Unsupervised Cross-Domain Detection

This paper presents an object detection algorithm that performs unsupervised cross-domain adaptation using only one target sample, seen at test time. It introduces a multi-task architecture that adapts in one shot to any incoming sample by iteratively solving a self-supervised task on it.

Move to See Better: Towards Self-Supervised Amodal Object Detection

This paper proposes a self-supervised framework that improves an object detector in unseen scenarios by moving an agent around a 3D environment and aggregating multi-view RGB-D information.

A dataset for developing and benchmarking active vision

The paper shows that, although increasingly accurate and fast, state-of-the-art object detection is still severely affected by object scale, occlusion, and viewing direction, all of which matter for robotics applications.

Incremental Learning of Object Detectors without Catastrophic Forgetting

This work presents a method for learning object detectors incrementally when neither the original training data nor annotations for the original classes in the new training set are available, and reports object detection results on the PASCAL VOC 2007 and COCO datasets.

Embodied Amodal Recognition: Learning to Move to Perceive Objects

Experimental results show that embodied agents (agents that can move) achieve better visual recognition performance than passive ones, and that, to improve their visual recognition, agents can learn strategic paths that differ from shortest paths.

Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction

This work presents a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner, fine-tuning an existing DeepMask instance segmentation network on the self-labeled training data the robot acquires.