Dynamic Face Video Segmentation via Reinforcement Learning

@inproceedings{Wang2020DynamicFV,
  title={Dynamic Face Video Segmentation via Reinforcement Learning},
  author={Yujiang Wang and Jie Shen and Mingzhi Dong and Yang Wu and Shiyang Cheng and Maja Pantic},
  booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={6957--6967}
}
  • Yujiang Wang, Jie Shen, Mingzhi Dong, Yang Wu, Shiyang Cheng, Maja Pantic
  • Published 2 July 2019
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
For real-time semantic video segmentation, most recent works utilised a dynamic framework with a key scheduler to make online key/non-key decisions. Some works used a fixed key scheduling policy, while others proposed adaptive key scheduling methods based on heuristic strategies, both of which may lead to suboptimal global performance. To overcome this limitation, we model the online key decision process in dynamic video segmentation as a deep reinforcement learning problem and learn an… 
Online Keyframe Selection Scheme for Semantic Video Segmentation
  • M. Awan, Jitae Shin
  • Computer Science
    2020 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia)
  • 2020
TLDR
This paper proposes an online keyframe selection scheme for video analysis tasks based on deep reinforcement learning that can be used for the simultaneous decision of keyframe in any real-time video based task.
Semantic video segmentation with dynamic keyframe selection and distortion-aware feature rectification
Abstract: Per-frame segmentation methods have a high computational cost and are therefore insufficient to cope with the fast-inference needs of semantic video segmentation. To
ST-VLAD: Video Face Recognition Based on Aggregated Local Spatial-Temporal Descriptors
TLDR
A novel video face recognition algorithm is proposed based on an aggregated local spatial-temporal descriptor (ST-VLAD), followed by a novel Fisher Criterion-based weight-learning method, which portrays the local information of the video more accurately, therefore largely improving the representation ability of description vectors.
Multi-feature fusion network for road scene semantic segmentation
TLDR
A lightweight semantic segmentation model is proposed that achieves high accuracy and comparable speed on the Cityscapes and CamVid datasets, while using fewer convolutional layers and ResNet so that it does not take up a lot of resources.
FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild
TLDR
This work proposes a simple yet effective method to explicitly incorporate facial semantics into age estimation, so that the model would learn to correctly focus on the most informative facial components from unaligned facial images regardless of head pose and nonrigid deformation.
RoI Tanh-polar Transformer Network for Face Parsing in the Wild
TLDR
The proposed method improves the state-of-the-art for face parsing in the wild and does not require facial landmarks for alignment, as well as proposing a hybrid residual representation learning block, coined HybridBlock, that contains convolutional layers in both the Tanh-polar space and the Tanh-Cartesian space.
Dilated Convolutions with Lateral Inhibitions for Semantic Image Segmentation
TLDR
Experimental results on three benchmark datasets show that the LI-based segmentation models outperform the baseline on all of them, thus verifying the effectiveness and generality of the proposed LI-Convs.
A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline
TLDR
The proposed end-to-end optimization framework learns the best non-myopic policy for dynamically controlling the resolution of input video streams to globally optimize energy efficiency.
MemX: An Attention-Aware Smart Eyewear System for Personalized Moment Auto-capture
TLDR
This research presents a meta-modelling system that automates the labor-intensive, time-consuming, and expensive process of computer programming called “hacking”.

References

Low-Latency Video Semantic Segmentation
TLDR
A framework for video semantic segmentation is developed, which incorporates two novel components: a feature propagation module that adaptively fuses features over time via spatially variant convolution, thus reducing the cost of per-frame computation; and an adaptive scheduler that dynamically allocates computation based on accuracy prediction.
Budget-Aware Deep Semantic Video Segmentation
TLDR
This work formalizes the frame selection as a Markov Decision Process, and specifies a Long Short-Term Memory network to model a policy for selecting the frames, and develops a policy-gradient reinforcement-learning approach for approximating the gradient of the authors' non-decomposable and non-differentiable objective.
Dynamic Video Segmentation Network
TLDR
This paper explores the use of a decision network to adaptively assign different frame regions to different networks based on a metric called expected confidence score, and shows that this network is able to achieve up to 70.4% mIoU at 19.8 fps on the Cityscapes dataset.
Clockwork Convnets for Video Semantic Segmentation
TLDR
This work defines a novel family of "clockwork" convnets driven by fixed or adaptive clock signals that schedule the processing of different layers at different update rates according to their semantic stability, and extends clockwork scheduling to adaptive video processing by incorporating data-driven clocks that can be tuned on unlabeled video.
Semantic Video Segmentation by Gated Recurrent Flow Propagation
TLDR
A deep, end-to-end trainable methodology for video segmentation that is capable of leveraging the information present in unlabeled data, besides sparsely labeled frames, in order to improve semantic estimates.
Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video
  • S. Jain, Xin Wang, Joseph E. Gonzalez
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
We present Accel, a novel semantic video segmentation system that achieves high accuracy at low inference cost by combining the predictions of two network branches: (1) a reference branch that
Architecture Search of Dynamic Cells for Semantic Video Segmentation
TLDR
This work proposes a neural architecture search solution, where the choice of operations together with their sequential arrangement are predicted by a separate neural network, and shows that such generalisation leads to stable and accurate results across common benchmarks, such as the Cityscapes and CamVid datasets.
Face Mask Extraction in Video Sequence
TLDR
This work introduces an end-to-end trainable model for face mask extraction in video sequence that works on a per-sequence basis, and proposes a novel loss function, called segmentation loss, to directly optimise the intersection over union (IoU) performances.
Inter-BMV: Interpolation with Block Motion Vectors for Fast Semantic Segmentation on Video
TLDR
This work proposes a new scheme that propagates features using the block motion vectors present in compressed video, instead of optical flow, and bi-directionally warps and fuses features from enclosing keyframes to capture scene context on each video frame, which enables it to substantially accelerate segmentation on video.