Self-Supervised Learning of Domain Invariant Features for Depth Estimation
@inproceedings{Akada2021SelfSupervisedLO,
  title     = {Self-Supervised Learning of Domain Invariant Features for Depth Estimation},
  author    = {Hiroyasu Akada and S. Bhat and Ibraheem Alhashim and Peter Wonka},
  booktitle = {2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2022},
  pages     = {997-1007}
}
We tackle the problem of unsupervised synthetic-to-real domain adaptation for single image depth estimation. An essential building block of single image depth estimation is an encoder-decoder task network that takes RGB images as input and produces depth maps as output. In this paper, we propose a novel training strategy to force the task network to learn domain invariant representations in a self-supervised manner. Specifically, we extend self-supervised learning from traditional…
4 Citations
Learning Feature Decomposition for Domain Adaptive Monocular Depth Estimation
- Computer Science · 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2022
A novel UDA method for MDE, referred to as Learning Feature Decomposition for Adaptation (LFDA), learns to decompose the feature space into content and style components, achieving superior accuracy and lower computational cost compared to state-of-the-art approaches.
LocalBins: Improving Depth Estimation by Learning Local Distributions
- Computer Science · ECCV
- 2022
This work proposes a novel architecture for depth estimation from a single image based on the popular encoder-decoder architecture that is frequently used as a starting point for all dense regression tasks and evolves the architecture in two ways.
Domain Decorrelation with Potential Energy Ranking
- Computer Science · ArXiv
- 2022
PoER is proposed to decouple the object feature and the domain feature in given images, promoting the learning of label-discriminative representations while cutting out the irrelevant correlations between the objects and the background.
BYEL : Bootstrap on Your Emotion Latent
- Computer Science · ArXiv
- 2022
This work proposes a framework Bootstrap Your Emotion Latent (BYEL), which uses only synthetic images in training and performs better than the baseline on the macro F1 score metric.
References
Showing 1-10 of 67 references
T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks
- Computer Science · ECCV
- 2018
A framework that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network that can be trained end-to-end, leading to good results, even surpassing early deep-learning methods that use real paired data.
DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation
- Computer Science · ECCV
- 2018
Dual Channel-wise Alignment Networks (DCAN) are presented, a simple yet effective approach to reduce domain shift at both pixel-level and feature-level in deep neural networks for semantic segmentation.
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
- Computer Science · ECCV
- 2016
This work proposes an unsupervised framework to learn a deep convolutional neural network for single-view depth prediction, without requiring a pre-training stage or annotated ground-truth depths, and shows that this network, trained on less than half of the KITTI dataset, gives performance comparable to that of state-of-the-art supervised methods for single-view depth estimation.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This paper proposes a novel training objective that enables a convolutional neural network to learn to perform single-image depth estimation despite the absence of ground-truth depth data, and produces state-of-the-art results for monocular depth estimation on the KITTI driving dataset.
High Quality Monocular Depth Estimation via Transfer Learning
- Computer Science · ArXiv
- 2018
A convolutional neural network for computing a high-resolution depth map given a single RGB image with the help of transfer learning, which outperforms state-of-the-art on two datasets and also produces qualitatively better results that capture object boundaries more faithfully.
AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
The proposed AdaDepth, an unsupervised domain adaptation strategy for the pixel-wise regression task of monocular depth estimation, performs competitively with other established approaches on depth estimation tasks and achieves state-of-the-art results in a semi-supervised setting.
DualGAN: Unsupervised Dual Learning for Image-to-Image Translation
- Computer Science, Biology · 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
A novel dual-GAN mechanism is developed, which enables image translators to be trained from two sets of unlabeled images from two domains, and can even achieve comparable or slightly better results than conditional GANs trained on fully labeled data.
Unsupervised Adversarial Depth Estimation Using Cycled Generative Networks
- Computer Science · 2018 International Conference on 3D Vision (3DV)
- 2018
A novel unsupervised deep learning approach for predicting depth maps and showing that the depth estimation task can be effectively tackled within an adversarial learning framework is presented.
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
- Computer Science · NIPS
- 2014
This paper employs two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally, and applies a scale-invariant error to help measure depth relations rather than scale.
CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A novel pixel-wise adversarial domain adaptation algorithm that leverages image-to-image translation methods for data augmentation and introduces a cross-domain consistency loss that enforces the adapted model to produce consistent predictions.