3D ShapeNets: A deep representation for volumetric shapes
- Zhirong Wu, Shuran Song, Jianxiong Xiao
- Computer Science, Computer Vision and Pattern Recognition
- 21 June 2014
This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network, and shows that this 3D deep representation enables significant performance improvement over the state of the art in a variety of tasks.
ShapeNet: An Information-Rich 3D Model Repository
- Angel X. Chang, T. Funkhouser, F. Yu
- Computer Science, ArXiv
- 9 December 2015
ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model, such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, and keywords, as well as other planned annotations.
SUN RGB-D: A RGB-D scene understanding benchmark suite
- Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao
- Computer Science, Computer Vision and Pattern Recognition
- 7 June 2015
This paper introduces an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks, and presents a dataset that makes it possible to train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
- F. Yu, Yinda Zhang, Shuran Song, Ari Seff, Jianxiong Xiao
- Computer Science, ArXiv
- 10 June 2015
This work proposes to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop, and constructs a new image dataset, LSUN, which contains around one million labeled images for each of 10 scene categories and 20 object categories.
Matterport3D: Learning from RGB-D Data in Indoor Environments
- Angel X. Chang, Angela Dai, Yinda Zhang
- Computer Science, International Conference on 3D Vision
- 18 September 2017
Matterport3D is introduced, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes, which enables a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
Semantic Scene Completion from a Single Depth Image
- Shuran Song, F. Yu, Andy Zeng, Angel X. Chang, M. Savva, T. Funkhouser
- Computer Science, Computer Vision and Pattern Recognition
- 28 November 2016
The semantic scene completion network (SSCNet) is introduced, an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum.
Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation
- He Wang, Srinath Sridhar, Jingwei Huang, Julien P. C. Valentin, Shuran Song, L. Guibas
- Computer Science, Computer Vision and Pattern Recognition
- 9 January 2019
The proposed method is able to robustly estimate the pose and size of unseen object instances in real environments while also achieving state-of-the-art performance on standard 6D pose estimation benchmarks.
3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions
- Andy Zeng, Shuran Song, M. Nießner, Matthew Fisher, Jianxiong Xiao, T. Funkhouser
- Computer Science, Computer Vision and Pattern Recognition
- 27 March 2016
3DMatch is presented, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data and that consistently outperforms other state-of-the-art approaches by a significant margin.
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
- Shuran Song, Jianxiong Xiao
- Computer Science, Computer Vision and Pattern Recognition
- 7 November 2015
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.
Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
- Shuran Song, Jianxiong Xiao
- Computer Science, IEEE International Conference on Computer Vision
- 1 December 2013
A unified benchmark dataset of 100 RGBD videos with high diversity is constructed, different kinds of RGBD tracking algorithms using 2D or 3D models are proposed, and a quantitative comparison of various algorithms with RGB or RGBD input is presented.