Corpus ID: 221083362

HoliCity: A City-Scale Data Platform for Learning Holistic 3D Structures

@article{Zhou2020HoliCityAC,
  title={HoliCity: A City-Scale Data Platform for Learning Holistic 3D Structures},
  author={Yichao Zhou and Jingwei Huang and Xili Dai and Linjie Luo and Zhili Chen and Yi Ma},
  journal={ArXiv},
  year={2020},
  volume={abs/2008.03286}
}
We present HoliCity, a city-scale 3D dataset with rich structural information. Currently, this dataset has 6,300 real-world panoramas of resolution $13312 \times 6656$ that are accurately aligned with the CAD model of downtown London with an area of more than 20 km$^2$, in which the median reprojection error of the alignment of an average image is less than half a degree. This dataset aims to be an all-in-one data platform for research of learning abstracted high-level holistic 3D structures…
CTRL-C: Camera calibration TRansformer with Line-Classification
Single image camera calibration is the task of estimating the camera parameters from a single input image, such as the vanishing points, focal length, and horizon line. In this work, we propose…
Extreme Rotation Estimation using Dense Correlation Volumes
TLDR
This work presents a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting, where the images have little or no overlap, and proposes a network design that can automatically learn implicit cues as to their geometric relationship by comparing all pairs of points between the two input images.
Fully Convolutional Line Parsing
TLDR
This work presents a one-stage Fully Convolutional Line Parsing network (F-Clip) that detects line segments from images and achieves a significantly better trade-off between efficiency and accuracy, resulting in a real-time line detector at up to 73 FPS on a single GPU.

References

MegaDepth: Learning Single-View Depth Prediction from Internet Photos
  • Z. Li, Noah Snavely
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
TLDR
This work proposes to use multi-view Internet photo collections, a virtually unlimited data source, to generate training data via modern structure-from-motion and multi-view stereo (MVS) methods, and presents a large depth dataset called MegaDepth based on this idea.
Recovering 3D Planes from a Single Image via Convolutional Neural Networks
TLDR
A novel plane structure-induced loss is proposed to train the network to simultaneously predict a plane segmentation map and the parameters of the 3D planes, which significantly outperforms existing methods, both qualitatively and quantitatively.
The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes
TLDR
This paper generates a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations, and conducts experiments with DCNNs that show how the inclusion of SYNTHIA in the training stage significantly improves performance on the semantic segmentation task.
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture
  • D. Eigen, R. Fergus
  • Computer Science
  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional…
U-Net: Convolutional Networks for Biomedical Image Segmentation
TLDR
It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Vision meets robotics: The KITTI dataset
TLDR
A novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research, using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras and a high-precision GPS/IMU inertial navigation system.
Robust Multiple Structures Estimation with J-Linkage
TLDR
The proposed solution is based on random sampling and conceptual data representation, and a tailored agglomerative clustering, called J-linkage, is used to group points belonging to the same model.
Single-Image Piece-Wise Planar 3D Reconstruction via Associative Embedding
TLDR
A novel two-stage method based on associative embedding, inspired by its recent success in instance segmentation, that is able to detect an arbitrary number of planes and facilitate many real-time applications such as visual SLAM and human-robot interaction.
Joint 2D-3D-Semantic Data for Indoor Scene Understanding
TLDR
A dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations, enables development of joint and cross-modal learning models and potentially unsupervised approaches utilizing the regularities present in large-scale indoor spaces.
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
TLDR
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.