• Publications
3D Bounding Box Estimation Using Deep Learning and Geometry
TLDR
Although conceptually simple, this method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance-level segmentation, and flat ground priors, and produces state-of-the-art results for 3D viewpoint estimation on the Pascal 3D+ dataset.
Multiview RGB-D Dataset for Object Instance Detection
TLDR
This work presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments, including a subset of objects from the BigBird dataset, along with an approach for detection and recognition.
Synthesizing Training Data for Object Detection in Indoor Scenes
TLDR
This work charts new opportunities for training detectors for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples.
6-DOF GraspNet: Variational Grasp Generation for Object Manipulation
TLDR
This work formulates grasp generation as sampling a set of grasps using a variational autoencoder, then assesses and refines the sampled grasps using a grasp evaluator model. The method is trained purely in simulation and works in the real world without any extra steps.
PoseRBPF: A Rao–Blackwellized Particle Filter for 6-D Object Pose Tracking
TLDR
This work formulates the 6D object pose tracking problem in the Rao-Blackwellized particle filtering framework, where the 3D rotation and the 3D translation of an object are decoupled, and achieves state-of-the-art results on two 6D pose estimation benchmarks.
Visual Representations for Semantic Target Driven Navigation
TLDR
This work proposes to use semantic segmentation and detection masks as observations obtained by state-of-the-art computer vision algorithms and use a deep network to learn navigation policies on top of representations that capture spatial layout and semantic contextual cues.
Deep Convolutional Features for Image Based Retrieval and Scene Categorization
TLDR
This paper examines several pooling strategies derived for CNN features and demonstrates superior performance on the image retrieval task (INRIA Holidays) at a fraction of the computational cost, with relatively small memory requirements.
LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation
TLDR
A novel framework for 6D pose estimation of unseen objects is proposed. It presents a network that reconstructs a latent 3D representation of an object from a small number of reference views at inference time and can render the latent 3D representation from arbitrary views.
Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes
TLDR
This work proposes an end-to-end network that efficiently generates a distribution of 6-DoF parallel-jaw grasps directly from a depth recording of a scene. It treats 3D points of the recorded point cloud as potential grasp contacts and reduces the dimensionality of the grasp representation to 4-DoF, which greatly facilitates the learning process.
6-DOF Grasping for Target-driven Object Manipulation in Clutter
TLDR
This work presents a method that plans 6-DOF grasps for any desired object in a cluttered scene from partial point cloud observations and can even reason about effective grasp sequences to retrieve objects that are not immediately accessible.