Xiaozhi Chen

The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving. Our method exploits stereo imagery to place proposals in the form of 3D bounding boxes. We formulate the problem as minimizing an energy function encoding object size priors, ground plane as well as several depth-informed features that reason about ...
Recent advances in object detection have exploited object proposals to speed up object searching. However, many existing object proposal generators have strong localization bias or require computationally expensive diversification strategies. In this paper, we present an effective approach to address these issues. We first propose a simple and useful ...
The goal of this paper is to perform 3D object detection from a single monocular image in the domain of autonomous driving. Our method first aims to generate a set of candidate class-specific object proposals, which are then run through a standard CNN pipeline to obtain high-quality object detections. The focus of this paper is on proposal generation. In ...
This paper aims at high-accuracy 3D object detection in autonomous driving scenarios. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes. We encode the sparse 3D point cloud with a compact multi-view representation. The network is composed of ...
The goal of this paper is to perform 3D object detection in the context of autonomous driving. Our method aims at generating a set of high-quality 3D object proposals by exploiting stereo imagery. We formulate the problem as minimizing an energy function that encodes object size priors, placement of objects on the ground plane as well as several depth ...
Object proposals have been widely used in object detection to speed up object searching. However, many existing object proposal generators have poor localization quality, which weakens the performance of object detectors. In this paper, we present an effective approach to improve the localization quality of object proposals. We leverage the ...
Sparse Coding, a popular feature coding method, has shown superior performance in visual recognition tasks. Different pooling methods, such as average pooling and max pooling, are commonly employed after feature coding. However, it has not been clearly explained which characteristic accounts for the success of a pooling method. In this paper, a new pooling ...
We focus on the problem of recognizing actions in still images, and this paper provides an approach that arranges features of different semantic parts in spatial order. Our approach includes three components: (1) a semantic learning algorithm that collects a set of part detectors, (2) an efficient detection method that divides multiple images by the same ...
The Bag-of-Parts (BoP) model, which employs distinctive parts to represent images, has shown superior performance in vision recognition tasks. Our work is motivated by the need to reduce redundancy among tens of thousands of parts. We propose a novel method to learn a compact latent representation from redundant part responses. We address this problem by ...
Recent advances in salient object detection have exploited the deep Convolutional Neural Network (CNN) to represent high-level semantics. However, due to the presence of convolutional and pooling layers, it is difficult for a CNN to generate saliency maps with sharp boundaries. In this paper, we propose a multi-scale mask-based Fast R-CNN framework that generates ...