FloorLevel-Net: Recognizing Floor-Level Lines With Height-Attention-Guided Multi-Task Learning

  • Mengyang Wu, Wei Zeng, Chi-Wing Fu
  • IEEE Transactions on Image Processing
The ability to recognize the position and order of the floor-level lines that divide adjacent building floors can benefit many applications, for example, urban augmented reality (AR). This work tackles the problem of locating floor-level lines in street-view images, using a supervised deep learning approach. Unfortunately, very little data is available for training such a network – current street-view datasets contain either semantic annotations that lack geometric attributes, or rectified… 



Deep Recognition of Vanishing-Point-Constrained Building Planes in Urban Street Views

This work designs a novel convolutional neural network architecture that generates a geometric segmentation of per-pixel orientations from a single street-view image, and proposes to rectify the pixel-wise segmentation into perspectively projected quads based on the spatial proximity between the segmentation masks and exterior line segments detected through image processing.

DeepFacade: A Deep Learning Approach to Facade Parsing With Symmetric Loss

Qualitative results show that the method helps deep convolutional neural networks predict more accurate, visually pleasing, and symmetric shapes; it is the first to incorporate a symmetry constraint into end-to-end training of deep neural networks for facade parsing.

The Cityscapes Dataset for Semantic Urban Scene Understanding

This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, which exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.

Recovering 3D Planes from a Single Image via Convolutional Neural Networks

A novel plane-structure-induced loss is proposed to train a network that simultaneously predicts a plane segmentation map and the parameters of the 3D planes; the method significantly outperforms existing methods, both qualitatively and quantitatively.

Cars Can’t Fly Up in the Sky: Improving Urban-Scene Segmentation via Height-Driven Attention Networks

  • Sungha Choi, J. Kim, J. Choo
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet), for improving semantic segmentation of urban-scene images. It achieves new state-of-the-art performance on the Cityscapes benchmark by a large margin among ResNet101-based segmentation models.

Recovering Surface Layout from an Image

This paper takes the first step towards constructing the surface layout, a labeling of the image into geometric classes, by learning appearance-based models of these geometric classes, which coarsely describe the 3D scene orientation of each image region.

ATLAS: A Three-Layered Approach to Facade Parsing

A novel approach for semantic segmentation of building facades that incorporates additional meta-knowledge in the form of weak architectural principles, which enforces architectural plausibility and consistency on the final reconstruction.

CCNet: Criss-Cross Attention for Semantic Segmentation

  • Zilong Huang, Xinggang Wang, Wenyu Liu
  • Computer Science
    2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
This work proposes a Criss-Cross Network (CCNet) for obtaining contextual information in a more effective and efficient way, and achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are new state-of-the-art results.

Designing deep networks for surface normal estimation

This paper proposes to build upon the decades of hard work in 3D scene understanding to design a new CNN architecture for the task of surface normal estimation, and shows that incorporating several constraints and meaningful intermediate representations in the architecture leads to state-of-the-art performance on surface normal estimation.

Pyramid Scene Parsing Network

This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.