Cross-Modal Learning for Domain Adaptation in 3D Semantic Segmentation

  • Maximilian Jaritz, Tuan-Hung Vu, Raoul de Charette, Émilie Wirbel, Patrick Pérez
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
Domain adaptation is an important task to enable learning when labels are scarce. While most works focus only on the image modality, there are many important multi-modal datasets. In order to leverage multi-modality for domain adaptation, we propose cross-modal learning, where we enforce consistency between the predictions of two modalities via mutual mimicking. We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target… 
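The mutual-mimicking idea above — each modality is pushed toward the other's prediction on unlabeled target data — can be sketched with a symmetric KL-divergence consistency term. This is an illustrative numpy sketch, not the paper's implementation; the function names (`softmax`, `cross_modal_loss`) are my own, and in practice each mimicking target would be detached from the gradient.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    """KL(p || q), averaged over points; eps guards log(0)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def cross_modal_loss(logits_2d, logits_3d):
    """Symmetric cross-modal consistency: 2D mimics 3D and vice versa.

    logits_2d, logits_3d: (num_points, num_classes) per-point class scores
    from the image branch and the point-cloud branch.
    """
    p2d = softmax(logits_2d)
    p3d = softmax(logits_3d)
    return kl_div(p3d, p2d) + kl_div(p2d, p3d)
```

On labeled source data this term is combined with the usual per-modality cross-entropy; on unlabeled target data only the consistency term applies, which is what transfers knowledge across the domain gap.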

UniDA3D: Unified Domain Adaptive 3D Semantic Segmentation Pipeline

The results demonstrate that, by easily coupling UniDA3D with off-the-shelf 3D segmentation baselines, domain generalization ability of these baselines can be enhanced.

Learning 3D Semantic Segmentation with only 2D Image Supervision

This paper investigates how to supervise the training of 3D semantic segmentation models using only labeled 2D image collections, via multi-view fusion. It addresses several novel issues with this approach: how to select trusted pseudo-labels, how to sample 3D scenes containing rare object categories, and how to decouple input features taken from 2D images from pseudo-labels during training.

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

This work proposes PiMAE, a self-supervised pre-training framework that promotes 3D and 2D interaction through three aspects, and designs a unique cross-modal reconstruction module to enhance representation learning for both modalities.

Crossmodal Few-shot 3D Point Cloud Semantic Segmentation

This paper introduces a novel crossmodal few-shot learning approach for 3D point cloud semantic segmentation, in which the point cloud to be segmented is taken as query while one or few labeled 2D RGB images are taken as support to guide the segmentation of query.

Teaching robots to see clearly: optimizing cross-modal domain adaptation models through sequential classification and user evaluation

This work aims to improve xMUDA, a state-of-the-art multi-modal UDA model, by incorporating a multi-step binary classification algorithm that allows certain data labels to be prioritized; alongside human evaluation, the mIoU and accuracy of the final output are reported.

Enhancing 3D Point Cloud Semantic Segmentation Using Multi-Modal Fusion With 2D Images

The performance of an open-source multimodal algorithm, MVPNet, is improved on the 3D semantic segmentation task by using KPConv as a more robust and stronger 3D backbone.

xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

This work proposes cross-modal UDA (xMUDA), which assumes the availability of both 2D images and 3D point clouds for 3D semantic segmentation; the two input spaces are heterogeneous and can be impacted differently by domain shift.

DADA: Depth-Aware Domain Adaptation in Semantic Segmentation

This work proposes a unified depth-aware UDA framework that leverages in several complementary ways the knowledge of dense depth in the source domain, and achieves state-of-the-art performance on different challenging synthetic-2-real benchmarks.

Self-Supervised Model Adaptation for Multimodal Semantic Segmentation

A multimodal semantic segmentation framework is proposed that dynamically adapts the fusion of modality-specific features, in a self-supervised manner, while being sensitive to object category, spatial location, and scene context.

ESL: Entropy-guided Self-supervised Learning for Domain Adaptation in Semantic Segmentation

Entropy-guided Self-supervised Learning (ESL), which leverages entropy as a confidence indicator to produce more accurate pseudo-labels, is proposed; it consistently outperforms strong SSL baselines and achieves state-of-the-art results.
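Entropy-guided pseudo-label selection can be sketched in a few lines: compute the entropy of each softmax prediction and keep only low-entropy (confident) predictions as pseudo-labels. A minimal numpy sketch, with a hypothetical threshold parameter `max_entropy` (the actual selection scheme in the paper may differ):

```python
import numpy as np

def entropy(probs, eps=1e-8):
    """Shannon entropy of each per-point class distribution (nats)."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def select_pseudo_labels(probs, max_entropy=0.5):
    """Keep argmax predictions whose entropy is below the threshold.

    probs: (num_points, num_classes) softmax outputs on target data.
    Returns per-point labels, with -1 marking points ignored during
    self-training because the prediction is too uncertain.
    """
    labels = probs.argmax(axis=-1)
    labels[entropy(probs) > max_entropy] = -1
    return labels
```

A near-one-hot prediction has entropy close to 0 and is kept, while a near-uniform prediction (entropy close to log of the class count) is discarded, so the self-training signal comes only from confident target points.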

Learning to Adapt Structured Output Space for Semantic Segmentation

A multi-level adversarial network is constructed to effectively perform output space domain adaptation at different feature levels and it is shown that the proposed method performs favorably against the state-of-the-art methods in terms of accuracy and visual quality.
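Output-space adversarial adaptation, as described above, trains a discriminator to tell source from target segmentation outputs while the segmentation network tries to fool it. The toy numpy sketch below uses a fixed linear discriminator to show the shape of the loss only; a real implementation uses a convolutional discriminator trained jointly with the segmenter, and all names here (`discriminator`, `adversarial_loss`) are illustrative, not from the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(seg_probs, w, b):
    """Linear discriminator on flattened softmax output maps.

    seg_probs: (batch, num_pixels, num_classes) segmentation softmax outputs.
    Returns the probability that each output map comes from the source domain.
    """
    flat = seg_probs.reshape(seg_probs.shape[0], -1)
    return sigmoid(flat @ w + b)

def adversarial_loss(target_probs, w, b, eps=1e-8):
    """Loss on the segmentation network for target inputs: its outputs
    should fool the discriminator, i.e. be scored as 'source' (label 1)."""
    d = discriminator(target_probs, w, b)
    return float(-np.mean(np.log(d + eps)))
```

The multi-level variant in the paper applies such a discriminator at more than one feature level; here only the final output space is shown.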

Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation

A semi-supervised approach named Alleviating Semantic-level Shift (ASS) is proposed, which promotes distribution consistency from both global and local views and can beat the oracle model trained on the whole target dataset.

mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets

This paper formulates the task as a multi-source domain adaptation and label unification problem and proposes a novel method for it that significantly outperforms all competing methods.

Semi-supervised Domain Adaptation with Subspace Learning for visual recognition

A novel domain adaptation framework, named Semi-supervised Domain Adaptation with Subspace Learning (SDASL), is proposed; it jointly explores invariant low-dimensional structures across domains to correct data distribution mismatch and leverages available unlabeled target examples to exploit the underlying intrinsic information in the target domain.

Bidirectional Learning for Domain Adaptation of Semantic Segmentation

A bidirectional learning framework for domain adaptation of segmentation is proposed, together with a self-supervised learning algorithm that learns a better segmentation adaptation model and in return improves the image translation model.

Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

  • Jonathan Munro, D. Damen
  • 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
  • 2019
This work proposes a multi-modal approach for adapting action recognition models to novel environments, employing late fusion of the two modalities commonly used in action recognition (RGB and Flow), with multiple domain discriminators, so alignment of modalities is jointly optimised with recognition.