A Deep Temporal Fusion Framework for Scene Flow Using a Learnable Motion Model and Occlusions

  • René Schuster, Christian Unger, Didier Stricker
  • 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
Motion estimation is one of the core challenges in computer vision. With traditional dual-frame approaches, occlusions and out-of-view motions are a limiting factor, especially in the context of environmental perception for vehicles due to the large (ego-)motion of objects. Our work proposes a novel data-driven approach for temporal fusion of scene flow estimates in a multi-frame setup to overcome the issue of occlusion. Contrary to most previous methods, we do not rely on a constant motion… 

M-FUSE: Multi-frame Fusion for Scene Flow Estimation

This paper proposes a novel multi-frame approach that considers an additional preceding stereo pair and fuses forward and backward flow estimates, which allows temporal information to be integrated on demand. It also develops an improved two-frame baseline by incorporating an advanced stereo method.

RAFT-3D: Scene Flow using Rigid-Motion Embeddings

  • Zachary Teed, Jia Deng
  • Computer Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
RAFT-3D, a new deep architecture for scene flow, is introduced. It is based on the RAFT model developed for optical flow but iteratively updates a dense field of pixelwise SE3 motion instead of 2D motion, and it relies on rigid-motion embeddings, which represent a soft grouping of pixels into rigid objects.

SF2SE3: Clustering Scene Flow into SE(3)-Motions via Proposal and Selection

We propose SF2SE3, a novel approach to estimate scene dynamics in form of a segmentation into independently moving rigid objects and their SE(3)-motions. SF2SE3 operates on two consecutive stereo

RGB-D SLAM Using Scene Flow in Dynamic Environments

A novel motion removal visual system built on ORB-SLAM2 is proposed, which extracts pixel-level dynamic objects in RGB-D image sequences with a scene flow method. Experiments demonstrate that the approach improves tracking accuracy and works robustly in both highly and slightly dynamic scenes.

Neural Scene Flow Prior

This paper revisits the scene flow problem with an approach that relies predominantly on runtime optimization and strong regularization, and introduces a neural scene flow prior, which uses the architecture of neural networks as a new type of implicit regularizer.

TemporalStereo: Efficient Spatial-Temporal Stereo Matching Network

We present TemporalStereo, a coarse-to-fine based online stereo matching network which is highly efficient, and able to effectively exploit the past geometry and context information to boost the

Binary TTC: A Temporal Geofence for Autonomous Navigation

This method is the first to offer TTC information (binary or coarsely quantized) at sufficiently high frame-rates for practical use and predicts with low latency whether the observer will collide with an obstacle within a certain time, which is often more critical than knowing exact, per-pixel TTC.

RMS-FlowNet: Efficient and Robust Multi-Scale Scene Flow Estimation for Large-Scale Point Clouds

This work proposes a novel flow embedding design which can predict more robust scene flow in conjunction with Random-Sampling for multiscale scene flow prediction and shows that the model presents a competitive ability to generalize towards the real-world scenes of KITTI data set without fine-tuning.

Dense Feature Learning and Compact Cost Aggregation for Deep Stereo Matching

Comprehensive experimental results show that the 3D cost volume components obtained by the proposed DFL and CCA modules generally contain more multi-scale semantic information and thus can largely improve the final disparity regression accuracy.



3D Scene Flow Estimation with a Piecewise Rigid Scene Model

This work proposes to represent the dynamic scene as a collection of rigidly moving planes, into which the input images are segmented, and shows that a view-consistent multi-frame scheme significantly improves accuracy, especially in the presence of occlusions, and increases robustness against adverse imaging conditions.

A Fusion Approach for Multi-Frame Optical Flow Estimation

This work presents a simple, yet effective fusion approach for multi-frame optical flow that benefits from longer-term temporal cues and ranks first among published results in the MPI Sintel and KITTI 2015 benchmarks.

Piecewise Rigid Scene Flow

A novel model is introduced that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. It achieves leading performance levels, exceeding competing 3D scene flow methods, and even yields better 2D motion estimates than all tested dedicated optical flow techniques.

Self-Supervised Monocular Scene Flow Estimation

  • Junhwa Hur, S. Roth
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This work designs a single convolutional neural network (CNN) that successfully estimates depth and 3D motion simultaneously from a classical optical flow cost volume, and adopts self-supervised learning with 3D loss functions and occlusion reasoning to leverage unlabeled data.

Fast Multi-frame Stereo Scene Flow with Motion Segmentation

A new multi-frame method is presented for efficiently computing scene flow and camera ego-motion for a dynamic scene observed from a moving stereo camera rig. The method consistently outperforms OSF, which is currently ranked second on the KITTI benchmark.

PWOC-3D: Deep Occlusion-Aware End-to-End Scene Flow Estimation

This paper proposes PWOC-3D, a compact CNN architecture to predict scene flow from stereo image sequences in an end-to-end supervised setting, and proposes a novel self-supervised strategy to predict occlusions from images (learned without any labeled occlusion data).

Object scene flow for autonomous vehicles

A novel model and dataset for 3D scene flow estimation are proposed, with an application to autonomous driving. The model represents each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object.

SENSE: A Shared Encoder Network for Scene-Flow Estimation

A compact network for holistic scene flow estimation, called SENSE, is introduced. It shares common encoder features among four closely related tasks (optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation), which leads to a compact and efficient model at inference time.

RGB-D flow: Dense 3-D motion estimation using color and depth

It is shown that scene flow can be reliably computed using RGB-D data, overcoming depth noise and outperforming previous results on a variety of scenes.

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

  • N. Mayer, Eddy Ilg, …, T. Brox
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This paper proposes three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks and presents a convolutional network for real-time disparity estimation that provides state-of-the-art results.