• Corpus ID: 239768699

AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally Consistent Video Semantic Segmentation

  • Yizhe Zhang, Shubhankar Borse, Hong Cai, Fatih Murat Porikli
In video segmentation, generating temporally consistent results across frames is as important as achieving framewise accuracy. Existing methods rely either on optical flow regularization or on fine-tuning with test data to attain temporal consistency. However, optical flow is not always available and reliable, and it is expensive to compute. Fine-tuning the original model at test time is also computationally costly. This paper presents an efficient, intuitive, and unsupervised online adaptation method…
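The test-time adaptation idea outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: both "networks" are stand-in per-pixel linear classifiers, and the specific choices (hard pseudo-labels from the frozen main model, cross-entropy updates to a small auxiliary model, probability-averaged fusion) are assumptions used only to make the adapt-while-inferring loop concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical stand-ins: per-pixel linear classifiers over C-dim features.
# The large main model stays frozen; only the small auxiliary model adapts.
C, K = 8, 3                       # feature channels, segmentation classes
W_main = rng.normal(size=(C, K))  # frozen main segmentation model
W_aux = np.zeros((C, K))          # lightweight auxiliary model, updated online

def adapt_on_frame(feats, lr=0.5):
    """One unsupervised test-time step: train the auxiliary model with
    cross-entropy toward the frozen main model's hard pseudo-labels."""
    global W_aux
    pseudo = np.eye(K)[softmax(feats @ W_main).argmax(axis=1)]
    p_aux = softmax(feats @ W_aux)
    W_aux -= lr * feats.T @ (p_aux - pseudo) / len(feats)

def fused_prediction(feats):
    # Fuse the two models by averaging their class probabilities.
    return 0.5 * (softmax(feats @ W_main) + softmax(feats @ W_aux))

# Simulate a short video clip: each "frame" is 200 pixels of C-dim features.
frames = [rng.normal(size=(200, C)) for _ in range(5)]
agree_before = np.mean(softmax(frames[0] @ W_aux).argmax(1)
                       == softmax(frames[0] @ W_main).argmax(1))
for f in frames:                  # online: adapt as each frame arrives
    for _ in range(20):
        adapt_on_frame(f)
agree_after = np.mean(softmax(frames[0] @ W_aux).argmax(1)
                      == softmax(frames[0] @ W_main).argmax(1))
print(agree_before, agree_after)
```

Because only the small auxiliary model is updated, the per-frame adaptation cost stays low and the main model's predictions remain stable, while the fused output drifts toward temporally consistent decisions.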


Learning Blind Video Temporal Consistency
An efficient approach based on a deep recurrent network for enforcing temporal consistency in a video that can handle multiple and unseen tasks, including but not limited to artistic style transfer, enhancement, colorization, image-to-image translation and intrinsic image decomposition.
Exploiting Temporality for Semi-Supervised Video Segmentation
This work tackles the issue of label scarcity by using consecutive frames of a video, where only one frame is annotated, and proposes a deep, end-to-end trainable model which leverages temporal information in order to make use of easy-to-acquire unlabeled data.
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow
This paper proposes a novel framework for joint video semantic segmentation and optical flow estimation that is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.
Online Meta Adaptation for Fast Video Object Segmentation
Conventional deep neural networks based video object segmentation (VOS) methods are dominated by heavily fine-tuning a segmentation model on the first frame of a given video, which is time-consuming…
Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video
  • S. Jain, Xin Wang, Joseph E. Gonzalez
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
We present Accel, a novel semantic video segmentation system that achieves high accuracy at low inference cost by combining the predictions of two network branches: (1) a reference branch that…
Semantic Video Segmentation by Gated Recurrent Flow Propagation
A deep, end-to-end trainable methodology for video segmentation that is capable of leveraging the information present in unlabeled data, besides sparsely labeled frames, in order to improve semantic estimates.
Dynamic Warping Network for Semantic Video Segmentation
This paper proposes a novel framework named Dynamic Warping Network (DWNet), which adaptively warps inter-frame features to improve the accuracy of warping-based models, and introduces a temporal consistency loss, comprising a feature consistency loss and a prediction consistency loss, to explicitly supervise the warped features.
Improving Semantic Segmentation via Video Propagation and Label Relaxation
This paper presents a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks, and introduces a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries.
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
Online Adaptive Video Object Segmentation (OnAVOS) is proposed which updates the network online using training examples selected based on the confidence of the network and the spatial configuration and adds a pretraining step based on objectness, which is learned on PASCAL.
Temporal Memory Attention for Video Semantic Segmentation
A Temporal Memory Attention Network (TMANet) is proposed to adaptively integrate the long-range temporal relations over the video sequence based on the self-attention mechanism without exhaustive optical flow prediction.