3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
TLDR
The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
TLDR
This work creates an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. It also proposes the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field, which enforces spatio-temporal consistency in the 7D space-time-chroma space.
Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression
TLDR
This paper introduces a generalized version of IoU (GIoU) as a loss into state-of-the-art object detection frameworks and shows a consistent improvement in their performance, using both the standard IoU-based and the new GIoU-based performance measures on popular object detection benchmarks.
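As an illustration of the metric named above, here is a minimal sketch of GIoU for two axis-aligned boxes, using the commonly stated definition: IoU minus the fraction of the smallest enclosing box not covered by the union. The function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example, not the authors' code.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Penalize the empty space inside the enclosing box.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form assumed here: 1 - GIoU, which is 0 for identical boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, which is 0 for every pair of disjoint boxes, this value keeps decreasing as the boxes move apart, which is what makes it usable as a regression signal when there is no overlap.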
Universal Correspondence Network
TLDR
A convolutional spatial transformer to mimic patch normalization in traditional features like SIFT is proposed, which is shown to dramatically boost accuracy for semantic correspondences across intra-class shape variations.
SEGCloud: Semantic Segmentation of 3D Point Clouds
TLDR
SEGCloud is an end-to-end framework for 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI), and fully connected Conditional Random Fields (FC-CRF).
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
TLDR
A semi-automatic framework that employs existing detection methods and enhances them using two main constraints: framing of query images sampled on panoramas to maximize the performance of 2D detectors, and multi-view consistency enforcement across 2D detections that originate in different camera locations.
JRDB: A Dataset and Benchmark of Egocentric Robot Visual Perception of Humans in Built Environments
TLDR
A novel egocentric dataset is presented, collected from the authors' social mobile manipulator JackRabbot. It incorporates data from traditionally underrepresented scenes such as indoor environments and pedestrian areas, all from the ego-perspective of the robot, both stationary and navigating.
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
TLDR
The Free-Form Deformation (FFD) layer is a powerful new building block for deep learning models that manipulate 3D data, and DeformNet uses this FFD layer combined with shape retrieval for smooth and detail-preserving 3D reconstruction of qualitatively plausible point clouds with respect to a single query image.
Completing 3D object shape from one depth image
TLDR
This work takes an exemplar-based approach: retrieve similar objects in a database of 3D models using view-based matching and transfer the symmetries and surfaces from retrieved models to fully automatically reconstruct a 3D model from any category.
Weakly Supervised 3D Reconstruction with Adversarial Constraint
Supervised 3D reconstruction has witnessed significant progress through the use of deep neural networks. However, this increase in performance requires large-scale annotations of 2D/3D data. In
...