HDMapNet: An Online HD Map Construction and Evaluation Framework

  • Qi Li, Yue Wang, Yilun Wang, Hang Zhao
  • Published 13 July 2021
  • Computer Science
  • 2022 International Conference on Robotics and Automation (ICRA)
Constructing HD semantic maps is a central component of autonomous driving. However, traditional pipelines require a vast amount of human effort and resources to annotate and maintain the semantics in the map, which limits their scalability. In this paper, we introduce the problem of HD semantic map learning, which dynamically constructs the local semantics based on onboard sensor observations. We also introduce a semantic map learning method, dubbed HDMapNet. HDMapNet encodes image…

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Argoverse 2 (AV2), a collection of three datasets for perception and forecasting research in the self-driving domain that supports self-supervised learning and the emerging task of point cloud forecasting, is introduced.

SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation and Prediction

This paper proposes a novel network named SuperFusion, which fuses LiDAR and camera data at multiple levels, proposes a new metric to evaluate long-range HD map prediction, and applies the generated HD map to a downstream path planning task.

MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

Qualitative results show that MapTR maintains stable and robust map construction quality in complex and various driving scenes, and is of great application value in autonomous driving.

You Only Label Once: 3D Box Adaptation from Point Cloud to Image via Semi-Supervised Learning

A learning-based 3D box adaptation approach that automatically adjusts minimum parameters of the 360° Lidar 3D bounding box to perfectly fit the image appearance of panoramic cameras. It is the first to focus on image-level cuboid refinement, which balances accuracy and efficiency well and dramatically reduces the labeling effort for accurate cuboid annotation.

Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents

This paper describes MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model), which takes RGB images and sparse Lidar points as inputs and performs 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks. It shows that MMISM performs on par with or even better than single-task models.

V2HDM-Mono: A Framework of Building a Marking-Level HD Map with One or More Monocular Cameras

Marking-level high-definition maps (HD maps) are of great significance for autonomous vehicles, especially in large-scale, appearance-changing scenarios where autonomous vehicles rely on markings for…

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

A practical guidebook for improving the performance of BEV perception tasks with camera, LiDAR, and fusion inputs is introduced, and future research directions in this area are pointed out.

Vision-Centric BEV Perception: A Survey

Vision-centric BEV perception has recently received increased attention from both industry and academia due to its inherent merits, including presenting a natural representation of the world and…

Scene Representation in Bird’s-Eye View from Surrounding Cameras with Transformers

This work proposes a transformer-based encoder-decoder structure to translate the image features from different cameras into the BEV frame, which takes advantage of the context information in the individual image and the relationship between images in different views.

UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View

A new method is proposed that unifies spatial and temporal fusion, merging them into a single mathematical formulation, and that can support long-range fusion, which is hard to achieve in conventional BEV methods.



Predicting Semantic Map Representations From Images Using Pyramid Occupancy Networks

  • Thomas Roddick, R. Cipolla
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This work presents a simple, unified approach for estimating bird's-eye-view maps of the environment directly from monocular images using a single end-to-end deep learning architecture.

Machine Learning Assisted High-Definition Map Creation

  • Jialin Jiao
  • Computer Science
    2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC)
  • 2018
This paper first introduces the characteristics and layers of HD Maps, then provides a formal summary of the HD Map creation workflow, and, most importantly, presents the machine learning techniques used by the industry to minimize the amount of manual work in the process of HD Map creation.

The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes

The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25000 high-resolution images annotated into 66 object categories with additional, instance-specific labels for 37 classes, aiming to significantly further the development of state-of-the-art methods for visual road-scene understanding.

PointPillars: Fast Encoders for Object Detection From Point Clouds

Benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds, and a lean downstream network is proposed.

The Cityscapes Dataset for Semantic Urban Scene Understanding

This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.

Restricted Deformable Convolution-Based Road Scene Semantic Segmentation Using Surround View Cameras

This paper addresses 360-degree road scene semantic segmentation using surround view cameras, which are widely equipped in existing production cars, and proposes Restricted Deformable Convolution (RDC), which can effectively model geometric transformations by learning the shapes of convolutional filters conditioned on the input feature map.

Cross-View Semantic Segmentation for Sensing Surroundings

A novel visual task called Cross-view Semantic Segmentation is introduced, along with a framework named View Parsing Network (VPN) to address it. The experimental results show that the model can effectively make use of information from different views and multiple modalities to understand spatial information.

Semantic alignment of LiDAR data at city scale

An all-to-all, non-rigid, global alignment of LiDAR data collected with Google Street View cars in urban environments is implemented that provides better results than alternatives during experiments with data from large regions of New York, San Francisco, Paris, and Rome.

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.

A robust pose graph approach for city scale LiDAR mapping

A refined structure of the factor graph considering systematical initialization bias is introduced, where the scan-matching factors are twice validated through a novel classifier and a robust optimization strategy for reconstructing globally consistent 3D High-Definition maps at city scale.