Mesh Reconstruction from Aerial Images for Outdoor Terrain Mapping Using Joint 2D-3D Learning

  • Qiaojun Feng, Nikolay A. Atanasov
  • 2021 IEEE International Conference on Robotics and Automation (ICRA)
This paper addresses outdoor terrain mapping using overhead images obtained from an unmanned aerial vehicle. Dense depth estimation from aerial images during flight is challenging. While feature-based localization and mapping techniques can deliver real-time odometry and sparse points reconstruction, a dense environment model is generally recovered offline with significant computation and storage. This paper develops a joint 2D-3D learning approach to reconstruct local meshes at each camera… 

TerrainMesh: Metric-Semantic Terrain Reconstruction from Aerial Images Using Joint 2D-3D Learning

Quantitative and qualitative evaluations using real aerial images show the potential of the joint 2D-3D learning approach to reconstruct a local metric-semantic mesh at each camera keyframe maintained by a visual odometry algorithm, in support of environmental monitoring and surveillance applications.

Smooth Mesh Estimation from Depth Data using Non-Smooth Convex Optimization

  • Antoni Rosinol, L. Carlone
  • Computer Science
    2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2021
This work builds a smooth and accurate 3D mesh that substantially improves the state-of-the-art on direct mesh reconstruction while running in real-time using a primal-dual method.



Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
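The residual idea summarized above can be sketched in a few lines: a block outputs F(x) + x, so its layers only need to learn a correction to the identity mapping. This is a minimal illustration, assuming small fully connected layers in place of the convolutional layers used in practice; the helper names `relu`, `linear`, and `residual_block` are hypothetical.

```python
def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def linear(weights, xs):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]


def residual_block(xs, w1, w2):
    """Return relu(F(x) + x), where F(x) = W2 * relu(W1 * x).

    Assumption: two bias-free fully connected layers stand in for the
    convolution and normalization layers of a real residual network.
    """
    fx = linear(w2, relu(linear(w1, xs)))
    # Skip connection: add the input back before the final activation.
    return relu([f + x for f, x in zip(fx, xs)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights F(x) = 0, so the block passes the input through.
print(residual_block([1.0, 2.0], zeros, zeros))  # [1.0, 2.0]
```

With all-zero weights the block reduces to the identity on non-negative inputs, which suggests why stacking such blocks need not make optimization harder.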

Mesh R-CNN

This work proposes a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object. It augments Mask R-CNN with a mesh prediction branch that outputs meshes with varying topological structure.

Laplacian surface editing

This work argues that geometric detail is an intrinsic property of a surface and that, consequently, surface editing is best performed by operating over an intrinsic surface representation, and provides such a representation, based on the Laplacian of the mesh, by encoding each vertex relative to its neighborhood.
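Encoding each vertex relative to its neighborhood can be illustrated with differential (Laplacian) coordinates: each vertex minus the centroid of its one-ring neighbors. The sketch below assumes uniform neighbor weights and a triangle mesh given as index triples; the function names are hypothetical and not taken from any library.

```python
from collections import defaultdict


def vertex_neighbors(faces):
    """Map each vertex index to the set of vertices it shares an edge with."""
    neighbors = defaultdict(set)
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors


def laplacian_coordinates(vertices, faces):
    """Uniform-weight differential coordinates: vertex minus neighbor centroid.

    Assumption: uniform weights; other weighting schemes are possible.
    """
    neighbors = vertex_neighbors(faces)
    deltas = []
    for i, v in enumerate(vertices):
        ring = neighbors.get(i)
        if not ring:
            # Isolated vertex: no neighborhood to encode against.
            deltas.append((0.0, 0.0, 0.0))
            continue
        centroid = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        deltas.append(tuple(v[k] - centroid[k] for k in range(3)))
    return deltas


# A unit square whose center vertex (index 4) is raised by 1 in z.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
            (0.0, 1.0, 0.0), (0.5, 0.5, 1.0)]
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
print(laplacian_coordinates(vertices, faces)[4])  # (0.0, 0.0, 1.0)
```

The raised center vertex gets a coordinate of (0, 0, 1): the vector records the local bump rather than the absolute position, so it is unchanged if the whole mesh is translated.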

Accelerating 3D deep learning with PyTorch3D

1. Accelerating 3D Deep Learning with PyTorch3D, arXiv 2007.08501
2. Mesh R-CNN, ICCV 2019
3. SynSin: End-to-end View Synthesis from a Single Image, CVPR 2020
4. Fast Differentiable Raycasting for

A General Differentiable Mesh Renderer for Image-Based 3D Reasoning

This work proposes a naturally differentiable rendering framework that can directly render a colorized mesh using differentiable functions and back-propagate efficient supervision signals from various forms of image representations to the mesh vertices and their attributes.

A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction From an Open Aerial Dataset

  • Jin Liu, Shunping Ji
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This work proposes a novel network, RED-Net, for wide-range depth inference, built from a recurrent encoder-decoder structure that regularizes cost maps across depths with a 2D fully convolutional network as its framework. It shows that the RED-Net model pre-trained on the synthetic WHU dataset can be efficiently transferred to very different multi-view aerial image datasets without any fine-tuning.

TerrainFusion: Real-time Digital Surface Model Reconstruction based on Monocular SLAM

Compared with traditional structure from motion (SfM) based approaches, the presented system is able to output both large-scale high-quality DEM and orthomosaic in real-time with low computational cost.

Learning Joint 2D-3D Representations for Depth Completion

This paper designs a simple yet effective neural network block that learns to extract joint 2D and 3D features from RGBD data and shows that it outperforms the state-of-the-art on the challenging KITTI depth completion benchmark.

Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras

This work is the first to learn the camera intrinsic parameters, including lens distortion, from video in an unsupervised manner, thereby allowing us to extract accurate depth and motion from arbitrary videos of unknown origin at scale.

Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera

This work develops a deep regression model that learns a direct mapping from sparse depth (and color image) input to dense depth prediction, and proposes a self-supervised training framework that requires only sequences of color and sparse depth images, without the need for dense depth labels.