Accurate and Efficient Stereo Matching via Attention Concatenation Volume

@article{Xu2022AccurateAE,
  title={Accurate and Efficient Stereo Matching via Attention Concatenation Volume},
  author={Gangwei Xu and Yun Wang and Junda Cheng and Jinhui Tang and Xin Yang},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.12699}
}
Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for stereo matching of high accuracy and efficiency. In this paper, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. The ACV can be…
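The idea in the abstract can be illustrated with a minimal sketch on a single scanline. This is not the authors' implementation: the function names, the plain softmax over disparities used as attention, and the list-based `[channel][x]` feature layout are all assumptions made for illustration, and the learned components of a real network are omitted.

```python
import math

def correlation_volume(left, right, max_disp):
    """corr[d][x]: mean over channels of left[c][x] * right[c][x - d]."""
    channels, width = len(left), len(left[0])
    corr = [[0.0] * width for _ in range(max_disp)]
    for d in range(max_disp):
        for x in range(d, width):  # positions with x - d < 0 stay at 0
            corr[d][x] = sum(left[c][x] * right[c][x - d]
                             for c in range(channels)) / channels
    return corr

def attention_weights(corr):
    """Softmax over the disparity axis, independently for each pixel x.

    Assumption: a plain softmax stands in for learned attention.
    """
    max_disp, width = len(corr), len(corr[0])
    weights = [[0.0] * width for _ in range(max_disp)]
    for x in range(width):
        peak = max(corr[d][x] for d in range(max_disp))  # numerical stability
        exps = [math.exp(corr[d][x] - peak) for d in range(max_disp)]
        total = sum(exps)
        for d in range(max_disp):
            weights[d][x] = exps[d] / total
    return weights

def concatenation_volume(left, right, max_disp):
    """concat[d][x]: left features at x followed by right features at x - d."""
    channels, width = len(left), len(left[0])
    volume = [[[0.0] * (2 * channels) for _ in range(width)]
              for _ in range(max_disp)]
    for d in range(max_disp):
        for x in range(d, width):
            volume[d][x] = ([left[c][x] for c in range(channels)]
                            + [right[c][x - d] for c in range(channels)])
    return volume

def attention_concatenation_volume(left, right, max_disp):
    """Scale every concatenated feature vector by its attention weight."""
    weights = attention_weights(correlation_volume(left, right, max_disp))
    concat = concatenation_volume(left, right, max_disp)
    return [[[weights[d][x] * v for v in concat[d][x]]
             for x in range(len(concat[d]))]
            for d in range(max_disp)]
```

Disparities whose correlation is low receive small weights, so their concatenated features contribute little to later aggregation, which is the filtering effect the abstract describes.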

CGI-Stereo: Accurate and Real-Time Stereo Matching via Context and Geometry Interaction

The core of CGI-Stereo is a Context and Geometry Fusion (CGF) block that adaptively fuses context and geometry information for more accurate and efficient cost aggregation, while also providing feedback to feature learning to guide more effective contextual feature extraction.

References


HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching

HITNet is a novel neural network architecture for real-time stereo matching that not only reasons geometrically about disparities but also infers slanted plane hypotheses, which allow geometric warping and upsampling operations to be performed more accurately.

Group-Wise Correlation Stereo Network

Group-wise correlation provides an efficient representation for measuring feature similarity, loses less information than full correlation, and retains better performance than previous methods when the number of parameters is reduced.
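As a rough illustration of the group-wise correlation summarized in this entry, the sketch below splits the feature channels into groups and computes one mean inner product per group and disparity on a single scanline. The function name and the list-based `[channel][x]` layout are assumptions for illustration, not code from the referenced paper.

```python
def groupwise_correlation(left, right, max_disp, num_groups):
    """volume[g][d][x]: mean of left[c][x] * right[c][x - d] over channels c in group g."""
    channels, width = len(left), len(left[0])
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    per_group = channels // num_groups
    volume = [[[0.0] * width for _ in range(max_disp)]
              for _ in range(num_groups)]
    for g in range(num_groups):
        group = range(g * per_group, (g + 1) * per_group)
        for d in range(max_disp):
            for x in range(d, width):  # positions with x - d < 0 stay at 0
                volume[g][d][x] = sum(left[c][x] * right[c][x - d]
                                      for c in group) / per_group
    return volume
```

With `num_groups=1` this reduces to a single full correlation map per disparity; larger values keep one similarity value per channel group instead of collapsing all channels into one number.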

Pyramid Stereo Matching Network

PSMNet is a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling, which exploits global context information by aggregating context at different scales and locations to form a cost volume, and a 3D CNN.

A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos

This benchmark is the first to cover the important use case of hand-held mobile devices; it also provides high-resolution DSLR camera images and data at significantly higher temporal and spatial resolution.

End-to-End Learning of Geometry and Context for Deep Stereo Regression

We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem’s geometry to form a cost volume using deep feature

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

  • N. Mayer, Eddy Ilg, T. Brox
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This paper proposes three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks and presents a convolutional network for real-time disparity estimation that provides state-of-the-art results.

Are we ready for autonomous driving? The KITTI vision benchmark suite

The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.

Bilateral Grid Learning for Stereo Matching Networks

A novel edge-preserving cost volume upsampling module based on the slicing operation in the learned bilateral grid that outperforms existing published real-time deep stereo matching networks, as well as some complex networks on the KITTI stereo datasets.

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

This paper proposes a cost volume formulation that is both memory- and time-efficient and is complementary to existing multi-view stereo and stereo matching approaches based on 3D cost volumes. Applying the cascade cost volume to the representative MVS-Net yields a 35.6% improvement on the DTU benchmark.

DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch

A differentiable PatchMatch module is developed that allows us to discard most disparities without requiring full cost volume evaluation and is able to efficiently compute the cost volume for high likelihood hypotheses and achieve savings in both memory and computation.