Patch-VQ: ‘Patching Up’ the Video Quality Problem

  • Zhenqiang Ying, Maniratnam Mandal, Deepti Ghadiyaram, Alan Bovik (University of Texas at Austin; Facebook AI)
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem for social and streaming media applications. Efficient and accurate video quality predictors are needed to monitor and guide the processing of billions of shared, often imperfect, user-generated content (UGC) videos. Unfortunately, current NR models are limited in their prediction capabilities on real-world, "in-the-wild" UGC video data. To advance progress on this problem, we created the largest… 
A Deep Learning based No-reference Quality Assessment Model for UGC Videos
A very simple but effective UGC VQA model is proposed that trains an end-to-end spatial feature extraction network to learn quality-aware spatial feature representations directly from the raw pixels of video frames.
Blindly Assess Quality of In-the-Wild Videos via Quality-aware Pre-training and Motion Perception
This work proposes to transfer knowledge from image quality assessment (IQA) databases with authentic distortions and from large-scale action recognition datasets with rich motion patterns, and then trains the proposed model on the target VQA databases using a mixed list-wise ranking loss function.
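The summary names a "mixed list-wise ranking loss" but does not define it. As a stand-in to show what a list-wise ranking loss looks like, here is a minimal sketch of the generic ListNet top-one loss (an assumption, not the loss used in that work): predicted scores and MOS values for one list of videos are each turned into a probability distribution with a softmax, and the loss is their cross-entropy. The function names are illustrative.

```python
import math
from typing import Sequence


def _softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Generic ListNet top-one loss for one list (e.g. one mini-batch of videos):
    cross-entropy between softmax(mos) and softmax(pred).

    Assumption: this is a stand-in, not the 'mixed' loss from the cited work.
    """
    if len(pred) != len(mos) or not pred:
        raise ValueError("pred and mos must be non-empty and of equal length")
    target = _softmax(mos)   # "ground-truth" top-one probabilities
    model = _softmax(pred)   # model's top-one probabilities
    return -sum(t * math.log(q) for t, q in zip(target, model))
```

Because the softmax ignores a constant offset, the loss only rewards getting the relative ordering and spacing of scores within the list right, which is the point of a list-wise objective; it is minimized when the predictions equal the MOS values up to an additive constant.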
FAVER: Blind Quality Prediction of Variable Frame Rate Videos
A first-of-a-kind blind VQA model for evaluating high frame rate (HFR) videos is proposed, dubbed the Framerate-Aware Video Evaluator w/o Reference (FAVER). It uses extended models of spatial natural scene statistics that encompass space-time wavelet-decomposed video signals to conduct efficient, frame-rate-sensitive quality prediction.
KonVid-150k: A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild
MLSP-VQA models trained on KonVid-150k are shown to set a new state of the art for cross-test performance on KoNViD-1k and LIVE-Qualcomm, with SRCCs of 0.83 and 0.64, respectively. MLSP-VQA is also exceptionally well suited to training at scale, compared with deep transfer learning methods.
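SRCC (Spearman's rank-order correlation coefficient) is the headline metric in the entry above. A minimal sketch of the textbook definition, not code from the KonVid-150k authors: rank the predicted scores and the MOS values (tied values share the average of their ranks), then take the Pearson correlation of the two rank vectors. The helper names are illustrative; in practice `scipy.stats.spearmanr` does the same job.

```python
import math
from typing import Sequence


def average_ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # advance j to the last position of the current run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation; raises ZeroDivisionError if an input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def srcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Spearman rank-order correlation = Pearson correlation of the average ranks."""
    if len(pred) != len(mos) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(pred), average_ranks(mos))
```

Since only ranks enter the computation, any strictly increasing mapping of the predictions leaves SRCC unchanged; for example, predictions `[1, 2, 3, 4, 5]` against MOS values `[1, 4, 9, 16, 25]` give an SRCC of 1.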
StarVQA: Space-Time Attention for Video Quality Assessment
A novel space-time attention network for the VQA problem, named StarVQA, is proposed. It encodes the space-time position of each patch into the input of the Transformer to capture long-range spatiotemporal dependencies in a video sequence.
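To make "the space-time position of each patch" concrete, here is a minimal sketch of a generic ViT-style patchification step, assuming non-overlapping square patches on grayscale frames stored as `video[t][y][x]`; this is an illustration, not StarVQA's actual preprocessing. Each patch is flattened and tagged with its (frame, patch-row, patch-col) index, which a model could use to look up a learned position embedding. `patchify` is a hypothetical name.

```python
from typing import Sequence

Video = Sequence[Sequence[Sequence[float]]]  # video[t][y][x], grayscale


def patchify(video: Video, p: int) -> tuple[list[list[float]], list[tuple[int, int, int]]]:
    """Split every frame into non-overlapping p x p patches and record the
    (frame, patch-row, patch-col) position of each patch.

    Assumption: generic patch tokenization, not StarVQA's exact pipeline.
    """
    t_len, h, w = len(video), len(video[0]), len(video[0][0])
    if p <= 0 or h % p or w % p:
        raise ValueError("frame height and width must be divisible by p")
    patches: list[list[float]] = []
    positions: list[tuple[int, int, int]] = []
    for t in range(t_len):
        for r in range(h // p):
            for c in range(w // p):
                # flatten the patch in row-major order
                patches.append([video[t][r * p + dy][c * p + dx]
                                for dy in range(p) for dx in range(p)])
                positions.append((t, r, c))
    return patches, positions
```

Keeping the frame index separate from the spatial indices is what lets a space-time model distinguish "same place, different time" from "same time, different place" when attending across patches.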
Generalised Score Distribution: A Two-Parameter Discrete Distribution Accurately Describing Responses from Quality of Experience Subjective Experiments
The proposed Generalised Score Distribution (GSD) properly describes the response distributions observed in typical MQA experiments, and the results indicate that the GSD outperforms the approach based on the sample empirical distribution when it comes to bootstrapping.
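The GSD itself is not sketched here. For context, this is a minimal sketch of the baseline it is compared against: a percentile bootstrap of the MOS that resamples the observed ratings with replacement, i.e. draws from the sample's empirical distribution. The function name, 2000 replicates, and 95% level are illustrative assumptions.

```python
import random
from statistics import mean
from typing import Sequence


def bootstrap_mos_ci(ratings: Sequence[int], n_boot: int = 2000,
                     alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the MOS of one stimulus.

    Each replicate resamples the observed ratings with replacement (the
    empirical-distribution baseline), not from a fitted GSD.
    """
    if not ratings:
        raise ValueError("ratings must be non-empty")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(ratings)
    boot_means = sorted(mean(rng.choices(ratings, k=n)) for _ in range(n_boot))
    lo = boot_means[int(alpha / 2 * n_boot)]
    hi = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

A known weakness of this baseline, and one motivation for a parametric model such as the GSD, is visible with unanimous ratings: `bootstrap_mos_ci([4, 4, 4])` returns `(4, 4)`, a zero-width interval, because resampling can never produce a score that was not observed.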
Perceptual Quality Assessment of UGC Gaming Videos
In recent years, with the vigorous development of the video game industry, the proportion of gaming videos on major video websites like YouTube has dramatically increased. However, relatively little
Subjective Quality Assessment of User-Generated Content Gaming Videos
Owing to the rapid development of the digital game industry, the growing popularity of online user-generated content (UGC) videos for games has accelerated the development of perceptual video
Subjective and Objective Analysis of Streamed Gaming Videos
A novel UGC gaming video resource is created, called the LIVE-YouTube Gaming video quality (LIVE-YT-Gaming) database, comprising 600 real UGC gaming videos, and a subjective human study is conducted on this data, yielding 18,600 human quality ratings recorded by 61 human subjects.
A strong baseline for image and video quality assessment
This work presents a simple yet effective unified model for perceptual quality assessment of images and videos that achieves comparable performance using only a single global feature derived from a backbone network (ResNet-18 in this work).


The Konstanz natural video database (KoNViD-1k)
  • Vlad Hosu, F. Hahn, D. Saupe
  • Computer Science
    2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX)
  • 2017
This paper reports on KoNViD-1k, a subjectively annotated VQA database of 1,200 public-domain video sequences, fairly sampled from a large public video dataset, YFCC100m, and aimed at "in the wild" authentic distortions.
Rank Correlation Methods
Rank correlation coefficients are statistical indices that measure the degree of association between two variables having ordered categories. Some well-known rank correlation coefficients are those
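Kendall's tau is a widely used rank correlation coefficient and is often reported as KRCC alongside SRCC in VQA papers. A minimal O(n²) sketch of the tau-b variant, which counts concordant and discordant pairs and corrects the denominator for ties; this is the standard textbook formula, and `kendall_tau_b` is an illustrative name (`scipy.stats.kendalltau` is the usual library route).

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: (C - D) / sqrt((n0 - ties_x) * (n0 - ties_y)),
    where n0 = n(n-1)/2 and ties_x / ties_y count pairs tied in x / y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                # tied in both variables: counts toward both tie totals
                ties_x += 1
                ties_y += 1
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
```

For example, `[1, 2, 3, 4]` against `[1, 3, 2, 4]` has one swapped pair out of six, so C = 5, D = 1 and tau-b = 4/6 ≈ 0.667; SRCC on the same data is 0.8, illustrating that the two coefficients weight disagreements differently.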
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
This work conducts a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights into both subjective video quality studies and objective VQA model design.
InceptionTime: Finding AlexNet for Time Series Classification
An important step towards finding the AlexNet network for time series classification (TSC) is taken by presenting InceptionTime, an ensemble of deep Convolutional Neural Network models inspired by the Inception-v4 architecture, which outperforms HIVE-COTE's accuracy while also being scalable.
Large-Scale Study of Perceptual Video Quality
This paper constructs a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of complex, authentic distortions. It demonstrates the value of the new resource, called the LIVE Video Quality Challenge Database (LIVE-VQC), by comparing leading NR video quality predictors on it.
Two-Level Approach for No-Reference Consumer Video Quality Assessment
  • J. Korhonen
  • Computer Science
    IEEE Transactions on Image Processing
  • 2019
A new approach for learning-based video quality assessment is proposed, based on computing features at two levels: low-complexity features are first computed for the full sequence, and high-complexity features are then extracted from a subset of representative video frames selected using the low-complexity features.
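The entry describes the two-level idea but not the exact features or selection rule. A minimal sketch under two hypothetical choices: the cheap per-frame feature is the mean absolute difference from the previous frame, and the selection rule splits the sequence into k equal segments and keeps, in each, the frame whose cheap feature is closest to the segment mean. The expensive features would then be computed only on the returned frame indices. Frames are flattened grayscale pixel lists; both function names are illustrative.

```python
from typing import Sequence


def frame_activity(frames: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical low-complexity feature: mean absolute pixel difference
    from the previous frame (0.0 for the first frame)."""
    acts = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        acts.append(sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur))
    return acts


def select_representative(features: Sequence[float], k: int) -> list[int]:
    """Hypothetical selection rule: split the sequence into k contiguous
    segments and, in each one, pick the frame whose low-complexity feature
    is closest to that segment's mean."""
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(features)")
    picks = []
    for s in range(k):
        start, end = s * n // k, (s + 1) * n // k  # non-empty because k <= n
        seg = range(start, end)
        seg_mean = sum(features[i] for i in seg) / (end - start)
        picks.append(min(seg, key=lambda i: abs(features[i] - seg_mean)))
    return picks
```

The design choice being illustrated is the cost split: `frame_activity` touches every frame once with trivial arithmetic, while the expensive stage runs on only `k` frames.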
Thirteen ways to look at the correlation coefficient
In 1885, Sir Francis Galton first defined the term “regression” and completed the theory of bivariate correlation. A decade later, Karl Pearson developed the index that we still use to
Oberlo. 10 YouTube Statistics Every Marketer Should Know in 2020. [Online] Available: https://
  • 2020
Quality assessment of in-the-wild videos
  • 2019