Charles Ruizhongtai Qi

Learn More
Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs(More)
3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-theart methods rely on CNNs to address this problem. Recently, we witness two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view(More)
Both 3D models and 2D images contain a wealth of information about everyday objects in our environment. However, it is difficult to semantically link together these two media forms, even when they feature identical or very similar objects. We propose a <i>joint</i> embedding space populated by both 3D shapes and 2D images of objects, where the distances(More)
Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well(More)
Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3DCNNs) seems like a plausible and promising next step. Unfortunately, the(More)
Our method completes a partial 3D scan using a 3D Encoder-Predictor network that leverages semantic features from a 3D classification network. The predictions are correlated with a shape database, which we use in a multi-resolution 3D shape synthesis step. We obtain completed high-resolution meshes that are inferred from partial, low-resolution input scans.(More)
1. Details on Model Training Training for Our Volumetric CNNs To produce occupancy grids from meshes, the faces of a mesh are subdivided until the length of the longest edge is within a single voxel; then all voxels that intersect with a face are marked as occupied. For 3D resolution 10,30 and 60 we generate voxelizations with central regions 10, 24, 54 and(More)
Few prior works study deep learning on point sets. PointNet [20] is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. In this work, we introduce a hierarchical neural network(More)
This paper describes our efforts to include a hands-on component in the teaching of core concepts of digital signal processing. The basis of our approach was the low-cost and open-source &#x201C;Stanford Lab in a Box.&#x201D; This system, with its easy to use Arduino-like programming interface allowed students to see how fundamental DSP concepts such as(More)
In Sec B we extend the robustness test to compare PointNet with VoxNet on incomplete input. In Sec C we provide more details on neural network architectures, training parameters and in Sec D we describe our detection pipeline in scenes. Then Sec E illustrates more applications of PointNet, while Sec F shows more analysis experiments. Sec G provides a proof(More)