GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving

@inproceedings{Chen2021GeoSim,
  title={GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving},
  author={Yun Chen and Frieda Rong and Shivam Duggal and Shenlong Wang and Xinchen Yan and Sivabalan Manivasagam and Shangjie Xue and Ersin Yumer and Raquel Urtasun},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}
  • Published 16 January 2021
Scalable sensor simulation is an important yet challenging open problem for safety-critical domains such as self-driving. Current works in image simulation either fail to be photorealistic or do not model the 3D environment and the dynamic objects within, losing high-level control and physical realism. In this paper, we present GeoSim, a geometry-aware image composition process which synthesizes novel urban driving scenarios by augmenting existing images with dynamic objects extracted from… 
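GeoSim's full pipeline is learned, but the core geometry-aware step, compositing an inserted object only where it is unoccluded by the existing scene, can be illustrated as a per-pixel depth test. The sketch below is a minimal illustration under assumed inputs (a rendered object with its depth map, plus background depth), not the paper's implementation; all names are hypothetical.

```python
import numpy as np

def composite_with_depth(bg_rgb, bg_depth, obj_rgb, obj_depth, obj_mask):
    """Paste a rendered object into a background image, using per-pixel
    depth to resolve occlusion: an object pixel wins only where it lies
    closer to the camera than the background surface."""
    visible = obj_mask & (obj_depth < bg_depth)  # object in front of scene
    out = bg_rgb.copy()
    out[visible] = obj_rgb[visible]
    return out, visible

# Toy 2x2 example: the object covers every pixel but is occluded at (1, 1),
# where the background surface (depth 2.0) is closer than the object (5.0).
bg_rgb = np.zeros((2, 2, 3))
bg_depth = np.array([[10.0, 10.0], [10.0, 2.0]])
obj_rgb = np.ones((2, 2, 3))
obj_depth = np.full((2, 2), 5.0)
obj_mask = np.ones((2, 2), dtype=bool)

out, visible = composite_with_depth(bg_rgb, bg_depth, obj_rgb, obj_depth, obj_mask)
# visible is True at 3 of the 4 pixels; the occluded pixel keeps the background.
```

This depth test is what distinguishes geometry-aware composition from naive 2D pasting: without the scene's 3D structure, an inserted car would incorrectly overwrite objects that should occlude it.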


SAC-GAN: Structure-Aware Image-to-Image Composition for Self-Driving
This work presents an end-to-end neural network trained to seamlessly compose an object represented as a cropped patch from an object image, into a background scene image, and evaluates the network in terms of quality, composability, and generalizability of the composite images.
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D
KITTI-360, successor of the popular KITTI dataset, is a suburban driving dataset which comprises richer input modalities, comprehensive semantic instance annotations and accurate localization to facilitate research at the intersection of vision, graphics and robotics.
Instance Segmentation in CARLA: Methodology and Analysis for Pedestrian-oriented Synthetic Data Generation in Crowded Scenes
The evaluation results show that per-pedestrian depth aggregation obtained from the instance segmentation is more precise than previously available approximations based on bounding boxes, especially in the context of crowded scenes in urban automated driving.
Depth-SIMS: Semi-Parametric Image and Depth Synthesis
This paper presents a compositing image synthesis method that generates RGB canvases with well-aligned segmentation maps and sparse depth maps, coupled with an inpainting network that transforms the RGB canvases into high-quality RGB images and the sparse depth maps into pixelwise dense depth maps.
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
Argoverse 2 (AV2) is introduced: a collection of three datasets for perception and forecasting research in the self-driving domain that supports self-supervised learning and the emerging task of point cloud forecasting.
Block-NeRF: Scalable Large Scene Neural View Synthesis
It is demonstrated that when scaling NeRF to render city-scale scenes spanning multiple blocks, it is vital to decompose the scene into individually trained NeRFs, which decouples rendering time from scene size, enables rendering to scale to arbitrarily large environments, and allows per-block updates of the environment.
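The decomposition described above implies a compositing step at render time: choose the block models whose training region covers the current camera pose and blend their outputs. A minimal sketch of such blending, assuming each block's render is already available and using simple inverse-distance weights (one plausible choice, not necessarily the paper's exact scheme; all names are hypothetical):

```python
import numpy as np

def blend_block_renders(cam_xy, block_centers, block_rgbs, radius):
    """Combine renders from independently trained per-block models:
    keep only blocks whose coverage radius contains the camera, then
    blend their RGB outputs with normalized inverse-distance weights."""
    d = np.linalg.norm(block_centers - cam_xy, axis=1)
    keep = d < radius                      # discard out-of-range blocks
    w = 1.0 / np.maximum(d[keep], 1e-6)    # nearer blocks weigh more
    w = w / w.sum()
    return np.tensordot(w, block_rgbs[keep], axes=1)

# Three blocks; the camera at (1, 0) is inside the first two coverage radii.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [100.0, 0.0]])
rgbs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rendered = blend_block_renders(np.array([1.0, 0.0]), centers, rgbs, radius=50.0)
# ≈ [0.9, 0.1, 0.0]: the nearest block dominates, the far block is dropped.
```

Because each block is trained and stored independently, a single block can be retrained when its part of the city changes without touching the others, which is the per-block update property the summary highlights.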
Waymo Open Dataset: Panoramic Video Panoptic Segmentation
The Waymo Open Dataset is presented, a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving; a new benchmark for Panoramic Video Panoptic Segmentation is proposed, along with a number of strong baselines based on the DeepLab family of models.
Fast Object Placement Assessment
This work proposes a pioneering fast object placement assessment (OPA) model with several innovations that bridges the performance gap with slow OPA models while running significantly faster.
A Survey on Safety-Critical Scenario Generation for Autonomous Driving – A Methodological Perspective
A comprehensive taxonomy of existing safety-critical scenario generation algorithms is provided, dividing them into three categories: data-driven generation, adversarial generation, and knowledge-based generation; the survey also discusses five main challenges of current works (fidelity, efficiency, diversity, transferability, and controllability) and the research opportunities these challenges open up.
A Survey on Safety-critical Scenario Generation from Methodological Perspective
This survey provides a comprehensive taxonomy of existing algorithms of safety-critical scenario generation by dividing them into three categories: data-driven generation, adversarial generation, and knowledge-based generation and discusses useful tools for scenario generation, including simulation platforms and packages.


SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving
This paper presents a simple yet effective approach to generate realistic scenario sensor data, based only on a limited amount of lidar and camera data collected by an autonomous vehicle, using texture-mapped surfels to efficiently reconstruct the scene from an initial vehicle pass or set of passes, preserving rich information about object 3D geometry and appearance, as well as the scene conditions.
AADS: Augmented autonomous driving simulation using data-driven algorithms
This work combines augmented real-world pictures with a simulated traffic flow to create photorealistic simulation images and renderings that are ready for training and testing of AD systems from perception to planning.
LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World
This work develops a novel simulator that captures both the power of physics-based and learning-based simulation, and showcases LiDARsim's usefulness for perception-algorithm testing on long-tail events and end-to-end closed-loop evaluation on safety-critical scenarios.
Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes
This work proposes an alternative paradigm which combines real and synthetic data for learning semantic instance segmentation and object detection models, and introduces a novel dataset of augmented urban driving scenes with 360-degree images that are used as environment maps to create realistic lighting and reflections on rendered objects.
Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning
This work proposes a truly differentiable rendering framework that is able to directly render colorized mesh using differentiable functions and back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images.
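The silhouette branch of this idea can be sketched compactly: each face contributes a smooth per-pixel occupancy probability (a sigmoid of the signed squared distance to the face's 2D projection), and faces are fused with a probabilistic union so gradients reach every face, not just the closest one. The snippet below assumes the per-pixel signed distances are already computed and shows only this aggregation step; names and the toy inputs are illustrative.

```python
import numpy as np

def soft_silhouette(signed_dists, sigma=1e-2):
    """Aggregate per-face signed distances (positive inside a face's 2D
    projection, shape [n_faces, n_pixels]) into a differentiable
    silhouette via a probabilistic union over faces."""
    # Per-face soft occupancy: sigmoid of signed squared distance.
    prob = 1.0 / (1.0 + np.exp(-np.sign(signed_dists) * signed_dists**2 / sigma))
    # Probabilistic union: a pixel is covered if any face covers it.
    return 1.0 - np.prod(1.0 - prob, axis=0)

# Two faces, two pixels: pixel 0 is inside face 0; pixel 1 is outside both.
sil = soft_silhouette(np.array([[0.5, -0.5],
                                [-0.5, -0.5]]))
# sil[0] is near 1, sil[1] near 0, but both vary smoothly with the distances.
```

Smaller `sigma` sharpens the silhouette toward a hard rasterization, while larger values spread gradient signal further from face boundaries; that trade-off is what makes the rendering usefully differentiable.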
Rendering synthetic objects into legacy photographs
This work proposes a method to realistically insert synthetic objects into existing photographs without requiring access to the scene or any additional scene measurements, and shows that the method is competitive with other insertion methods while requiring less scene information.
Augmented LiDAR Simulator for Autonomous Driving
This letter proposes a novel LiDAR simulator that augments real point clouds with synthetic obstacles (e.g., vehicles, pedestrians, and other movable objects) and describes an obstacle-placement strategy that is critical for performance enhancement.
Monocular Neural Image Based Rendering With Continuous View Control
The experiments show that both proposed components, the transforming encoder-decoder and depth-guided appearance mapping, lead to significantly improved generalization beyond the training views and in consequence to more accurate view synthesis under continuous 6-DoF camera control.
Layer-structured 3D Scene Inference via View Synthesis
We present an approach to infer a layer-structured 3D representation of a scene from a single input image. This allows us to infer not only the depth of the visible pixels, but also to capture the…
RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces
It is found that RELATE is also amenable to physically realistic scene editing and that it significantly outperforms prior art in object-centric scene generation in both synthetic data and real-world data (street traffic scenes).