VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution

@inproceedings{Xie2022VFHQAH,
  title={VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution},
  author={Liangbin Xie and Xintao Wang and Honglun Zhang and Chao Dong and Ying Shan},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2022},
  pages={656--665}
}
  • Published 6 May 2022
  • Computer Science
Most existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which was designed specifically for speaker identification and whose frames are of low quality. As a consequence, VFSR models trained on this dataset cannot produce visually pleasing results. In this paper, we develop an automatic and scalable pipeline to collect a high-quality video face dataset (VFHQ), which contains over 16,000 high-fidelity clips of diverse interview…
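The low-quality-training-data problem the abstract describes is usually simulated when building (V)FSR benchmarks by synthetically degrading high-resolution frames into low-resolution inputs. The sketch below is a hypothetical, minimal first-order degradation (blur, then downsample, then noise), not the paper's actual pipeline; the function name, kernel choice, and parameters are illustrative assumptions.

```python
import numpy as np

def synthesize_lr(hr, scale=4, noise_sigma=5.0, rng=None):
    """Hypothetical LR synthesis for (V)FSR training pairs:
    blur -> downsample -> additive Gaussian noise.

    hr: 2-D uint8 array (a single grayscale frame).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = hr.shape
    # 3x3 box blur as a stand-in for a Gaussian anti-aliasing kernel
    kernel = np.ones((3, 3)) / 9.0
    padded = np.pad(hr.astype(np.float64), 1, mode="edge")
    blurred = sum(
        padded[i:i + h, j:j + w] * kernel[i, j]
        for i in range(3) for j in range(3)
    )
    # Downsample by simple striding, then corrupt with sensor-like noise
    lr = blurred[::scale, ::scale]
    lr = lr + rng.normal(0.0, noise_sigma, lr.shape)
    return np.clip(lr, 0, 255).astype(np.uint8)

# Usage: a 64x64 frame becomes a noisy 16x16 input at scale=4
hr_frame = np.full((64, 64), 128, dtype=np.uint8)
lr_frame = synthesize_lr(hr_frame, scale=4)
```

Real benchmark pipelines typically add further stages (e.g. video compression artifacts), but the blur/downsample/noise core above is the common starting point.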

References


Deep Learning-based Face Super-resolution: A Survey

This survey presents a comprehensive review of deep learning-based FSR methods in a systematic manner and roughly categorizes existing methods according to the utilization of facial characteristics.

Learning to Have an Ear for Face Super-Resolution

A novel method that uses both audio and a low-resolution image to perform extreme face super-resolution (a 16x increase of the input size), showing experimentally that audio can assist in recovering attributes such as gender, age, and identity, and thus improve the correctness of the high-resolution image reconstruction.

Exemplar Guided Face Image Super-Resolution Without Facial Landmarks

A convolutional neural network (CNN)-based solution, GWAInet, is proposed that applies 8x super-resolution to face images guided by another unconstrained HR face image of the same person, with possible differences in age, expression, pose, or size.

VGGFace2: A Dataset for Recognising Faces across Pose and Age

A new large-scale face dataset named VGGFace2 is introduced, which contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject, and the automated and manual filtering stages to ensure a high accuracy for the images of each identity are described.

Video Face Super-Resolution with Motion-Adaptive Feedback Cell

This paper proposes a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way, and can achieve better performance in the case of extremely complex motion scenarios.

Face Super-Resolution Guided by Facial Component Heatmaps

This paper proposes a method that explicitly incorporates structural information of faces into the face super-resolution process by using a multi-task convolutional neural network (CNN) and achieves superior face hallucination results and outperforms the state-of-the-art.

VoxCeleb: A Large-Scale Speaker Identification Dataset

This paper proposes a fully automated pipeline based on computer vision techniques to create a large-scale text-independent speaker identification dataset collected "in the wild", and shows that a CNN-based architecture obtains the best performance for both identification and verification.

Self-Enhanced Convolutional Network for Facial Video Hallucination

A self-enhanced convolutional network for facial video hallucination is proposed that makes full use of preceding super-resolved frames and a temporal window of adjacent low-resolution frames, and achieves excellent performance on general video super-resolution in a single-shot setting.

FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation

A state-of-the-art video denoising algorithm based on a convolutional neural network architecture that exhibits several desirable properties such as fast runtimes, and the ability to handle a wide range of noise levels with a single network model.

Blind Video Temporal Consistency via Deep Video Prior

This work shows that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior, and proposes a carefully designed iteratively reweighted training strategy to address the challenging multimodal inconsistency problem.