Quadtree Generating Networks: Efficient Hierarchical Scene Parsing with Sparse Convolutions

@article{Chitta2020QuadtreeGN,
  title={Quadtree Generating Networks: Efficient Hierarchical Scene Parsing with Sparse Convolutions},
  author={Kashyap Chitta and Jos{\'e} Manuel {\'A}lvarez and Martial Hebert},
  journal={2020 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2020},
  pages={2009-2018}
}
Semantic segmentation with Convolutional Neural Networks is a memory-intensive task due to the high spatial resolution of feature maps and output predictions. [] Key Method Our quadtree representation enables hierarchical processing of an input image, with the most computationally demanding layers only being used at regions in the image containing boundaries between classes.

Figures and Tables from this paper

Parallel Complement Network for Real-Time Semantic Segmentation of Road Scenes

TLDR
A lightweight real-time segmentation model, named Parallel Complement Network (PCNet), is proposed to address the challenging task with fewer parameters, and provides the ability to overcome the problem of similar feature encoding among different classes, and further produces discriminative representations.

N-QGN: Navigation Map from a Monocular Camera using Quadtree Generating Networks

TLDR
The objective is to create an adaptive depth map prediction that only extract details that are essential for the obstacle avoidance and can significantly reduce the number of output information without major loss of accuracy.

Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation

TLDR
This work presents a novel pyramidal output representation to ensure parsimony with the "specialize and fuse" process for semantic segmentation, and achieves state-of-the-art performance on three widely-used semantic segmentsation datasets—ADE20K, COCO-Stuff, and Pascal-Context.

Hierarchical neural reconstruction for path guiding using hybrid path and photon samples

TLDR
This paper designs a light-weight neural network to directly operate convolutions on a sparse quadtree, which regresses a high-quality hierarchical sampling distribution, which enables more fine-grained directional sampling with less memory usage, effectively advancing the practicality and efficiency of neural path guiding.

StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks

TLDR
This paper proposes a computationally efficient method that leverages a deep neural network to detect occupancy from stereo images directly and achieves real-time performance on the onboard computer (NVIDIA Jetson TX2) and achieves better IoU and CD scores with only 2% of the computation cost of the state-of-the-art stereo model.

CEGFNet: Common Extraction and Gate Fusion Network for Scene Parsing of Remote Sensing Images

TLDR
This work proposes an end-to-end common extraction and gate fusion network (CEGFNet) to capture both high-level semantic features and low-level spatial details for scene parsing of remote sensing images and achieves state-of-the-art performance.

References

SHOWING 1-10 OF 52 REFERENCES

Fully convolutional networks for semantic segmentation

TLDR
The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

TLDR
This work introduces new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and uses them to develop Spatially-Sparse Convolutional networks, which outperform all prior state-of-the-art models on two tasks involving semantic segmentation of 3D point clouds.

RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation

TLDR
RefineNet is presented, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections and introduces chained residual pooling, which captures rich background context in an efficient manner.

ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation

TLDR
A deep architecture that is able to run in real time while providing accurate semantic segmentation, and a novel layer that uses residual connections and factorized convolutions in order to remain efficient while retaining remarkable accuracy is proposed.

Quadtree Convolutional Neural Networks

TLDR
The method decomposes and represents the image as a linear quadtree that is only refined in the non-empty portions of the image, enabling QCNN to learn from sparse images much faster and process high resolution images without the memory constraints faced by traditional CNNs.

Context Encoding for Semantic Segmentation

TLDR
The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN, and can improve the feature representation of relatively shallow networks for the image classification on CIFAR-10 dataset.

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

TLDR
This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries and applies the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.

SBNet: Sparse Blocks Network for Fast Inference

TLDR
This work leverages the sparsity structure of computation masks and proposes a novel tiling-based sparse convolution algorithm that is effective on LiDAR-based 3D object detection, and reports significant wall-clock speed-ups compared to dense convolution without noticeable loss of accuracy.

Rethinking Atrous Convolution for Semantic Image Segmentation

TLDR
The proposed `DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Learning Deconvolution Network for Semantic Segmentation

TLDR
A novel semantic segmentation algorithm by learning a deep deconvolution network on top of the convolutional layers adopted from VGG 16-layer net, which demonstrates outstanding performance in PASCAL VOC 2012 dataset.
...