Corpus ID: 246442335

Interactron: Embodied Adaptive Object Detection

Klemen Kotar and Roozbeh Mottaghi
Over the years, various methods have been proposed for the problem of object detection. Recently, we have witnessed great strides in this domain owing to the emergence of powerful deep neural networks. However, there are typically two main assumptions common among these approaches. First, the model is trained on a fixed training set and is evaluated on a pre-recorded test set. Second, the model is kept frozen after the training phase, so no further updates are performed after the training is…
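The adaptive setting the abstract contrasts with the frozen-model assumption can be illustrated with a minimal, hypothetical sketch (not the Interactron model itself): a "detector" with a single scalar parameter keeps taking gradient steps at deployment time, using a self-supervised loss computed from each incoming frame rather than ground-truth labels.

```python
# Hypothetical sketch of test-time adaptation, assuming a toy scalar
# "detector" parameter and a made-up self-supervised consistency loss.
# It is NOT the paper's architecture; it only shows the update loop.

def self_supervised_loss(param, frame):
    # Toy stand-in loss: squared gap between the parameter and a
    # pseudo-target derived from the observation itself (no labels).
    pseudo_target = sum(frame) / len(frame)
    return (param - pseudo_target) ** 2

def adapt_online(param, frames, lr=0.1, steps=5):
    """Update `param` after deployment, a few gradient steps per frame."""
    for frame in frames:
        pseudo_target = sum(frame) / len(frame)
        for _ in range(steps):
            grad = 2.0 * (param - pseudo_target)  # d(loss)/d(param)
            param -= lr * grad
    return param

frames = [[0.2, 0.4], [0.5, 0.7]]   # fake observations from an environment
adapted = adapt_online(0.0, frames)  # parameter drifts toward recent frames
```

The point of the sketch is only the control flow: unlike the frozen-model setting, the parameters continue to change as the agent observes its environment.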
1 Citation


ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
The proposed PROCTHOR, a framework for procedural generation of Embodied AI environments, enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks.


Online Object Representations with Contrastive Learning
A self-supervised approach for learning representations of objects from monocular videos is proposed and found that given a limited set of objects, object correspondences will naturally emerge when using contrastive learning without requiring explicit positive pairs.
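The claim above, that object correspondences emerge from contrastive learning, rests on an InfoNCE-style objective: pull an anchor embedding toward its positive and push it away from negatives. A minimal single-anchor version of that loss, with hypothetical toy embeddings and a plain dot-product similarity, looks like this:

```python
import math

def infonce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor embedding.

    Similarity is a plain dot product; the loss is the negative log
    probability of the positive among all candidates (softmax over logits).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    max_l = max(logits)                        # stabilize the softmax
    exps = [math.exp(l - max_l) for l in logits]
    return -math.log(exps[0] / sum(exps))

# An anchor close to its positive yields a small loss; a mismatched
# positive yields a large one.
low = infonce_loss([1, 0], [0.9, 0.1], [[-1, 0], [0, 1]])
high = infonce_loss([1, 0], [0, 1], [[0.9, 0.1], [-1, 0]])
```

The embeddings and temperature here are illustrative; the cited paper's exact similarity function and sampling scheme may differ.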
Embodied Visual Active Learning for Semantic Segmentation
This work extensively evaluates the proposed models using the photorealistic Matterport3D simulator and shows that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations.
A self-supervised learning system for object detection using physics simulation and multi-view pose estimation
An autonomous process for training a Convolutional Neural Network for object detection and pose estimation in robotic setups and results show that the proposed approach outperforms popular training processes relying on synthetic (but not physically realistic) data and manual annotation.
Move to See Better: Towards Self-Supervised Amodal Object Detection
A self-supervised framework to improve an object detector in unseen scenarios by moving an agent around in a 3D environment and aggregating multi-view RGB-D information is proposed.
Pix2seq: A Language Modeling Framework for Object Detection
Pix2Seq is presented, a simple and generic framework for object detection that achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.
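Pix2Seq's central move is to cast detection as sequence prediction by quantizing continuous box coordinates into discrete tokens. A minimal sketch of that quantization step follows; the bin count and token layout here are illustrative, not the paper's exact vocabulary:

```python
def box_to_tokens(box, image_size, bins=1000):
    """Quantize [x_min, y_min, x_max, y_max] into discrete coordinate tokens.

    Each coordinate is normalized by the image width/height and mapped to
    one of `bins` integer tokens (clipped to the valid range).
    """
    w, h = image_size
    scales = (w, h, w, h)
    return [min(bins - 1, int(c / s * bins)) for c, s in zip(box, scales)]

def tokens_to_box(tokens, image_size, bins=1000):
    """Invert the quantization (up to one bin of rounding error)."""
    w, h = image_size
    scales = (w, h, w, h)
    return [t / bins * s for t, s in zip(tokens, scales)]

tokens = box_to_tokens([32.0, 64.0, 128.0, 256.0], (640, 480))
roundtrip = tokens_to_box(tokens, (640, 480))
```

Once boxes are tokens, a language-model-style decoder can emit them autoregressively alongside class tokens, which is what lets a generic sequence framework compete with specialized detectors.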
Embodied Amodal Recognition: Learning to Move to Perceive Objects
Experimental results show that agents with embodiment (movement) achieve better visual recognition performance than passive ones and in order to improve visual recognition abilities, agents can learn strategic paths that are different from shortest paths.
A dataset for developing and benchmarking active vision
It is shown that, although increasingly accurate and fast, the state of the art for object detection is still severely impacted by object scale, occlusion, and viewing direction, all of which matter for robotics applications.
Incremental Learning of Object Detectors without Catastrophic Forgetting
This work presents a method to learn object detectors incrementally, when neither the original training data nor annotations for the original classes in the new training set are available, and presents object detection results on the PASCAL VOC 2007 and COCO datasets.
Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
This work presents a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner, fine-tuning an existing DeepMask instance segmentation network on the self-labeled training data acquired by the robot.
Learning About Objects by Learning to Interact with Them
This work presents a computational framework to discover objects and learn their physical properties under this paradigm of Learning from Interaction, and reveals that the agent learns efficiently and effectively, not just for objects it has interacted with before, but also for novel instances from seen categories as well as novel object categories.