Corpus ID: 236469478

Insights from Generative Modeling for Neural Video Compression

  • Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt
  • Published 2021
  • Engineering, Computer Science
  • ArXiv
While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present recent neural video codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for…


Feedback Recurrent Autoencoder for Video Compression
This work proposes a new network architecture, based on common and well-studied components, for learned video compression operating in low-latency mode, and yields state-of-the-art MS-SSIM/rate performance on the high-resolution UVG dataset.
Deep Generative Video Compression
This work proposes an end-to-end, deep generative modeling approach to compress temporal sequences with a focus on video that builds upon variational autoencoder models for sequential data and combines them with recent work on neural image compression.
Video Compression With Rate-Distortion Autoencoders
A deep generative model for lossy video compression is presented that outperforms the state-of-the-art learned video compression networks based on motion compensation or interpolation and opens up novel video compression applications, which have not been feasible with classical codecs.
Deep Generative Video Compression with Temporal Autoregressive Transforms
State-of-the-art learned methods for lossy video compression (Han et al., 2018; Liu et al., 2019; Lu et al., 2019; Habibian et al., 2019; Yang et al., 2020) build on sequential latent variable models…
Learned Video Compression via Joint Spatial-Temporal Correlation Exploration
This paper proposes a one-stage learning approach that encapsulates flow as quantized features from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors to exploit second-order correlations.
Joint Autoregressive and Hierarchical Priors for Learned Image Compression
It is found that in terms of compression performance, autoregressive and hierarchical priors are complementary and can be combined to exploit the probabilistic structure in the latents better than all previous learned models.
Neural Inter-Frame Compression for Video Coding
This work presents an inter-frame compression approach for neural video coding that can seamlessly build on different existing neural image codecs, and proposes to compute residuals directly in latent space instead of in pixel space so that the same image compression network can be reused for both key frames and intermediate frames.
Channel-Wise Autoregressive Entropy Models for Learned Image Compression
This work introduces two enhancements, channel-conditioning and latent residual prediction, that lead to network architectures with better rate-distortion performance than existing context-adaptive models while minimizing serial processing.
Lossy Image Compression with Normalizing Flows
This work proposes a deep image compression method that is able to go from low bit-rates to near lossless quality by leveraging normalizing flows to learn a bijective mapping from the image space to a latent representation.
Variable Rate Deep Image Compression With a Conditional Autoencoder
The proposed scheme provides a better rate-distortion trade-off than the traditional variable-rate image compression codecs such as JPEG2000 and BPG and shows comparable and sometimes better performance than the state-of-the-art learned image compression models that deploy multiple networks trained for varying rates.