Optical Flow and Mode Selection for Learning-based Video Coding

Théo Ladune, Pierrick Philippe, Wassim Hamidouche, Lu Zhang, Olivier Déforges
2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)
This paper introduces a new method for inter-frame coding based on two complementary autoencoders: MOFNet and CodecNet. MOFNet computes and conveys the Optical Flow and a pixel-wise coding Mode selection. The optical flow is used to predict the frame to be coded. The coding mode selection enables competition between a direct copy of the prediction and transmission through CodecNet. The proposed coding scheme is assessed under the Challenge on Learned Image Compression 2020…
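
The two steps named in the abstract, flow-based prediction and pixel-wise mode selection, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `warp_nearest` and `select_mode`, the nearest-neighbour sampling, the `(dx, dy)` flow layout, and the constant array standing in for the CodecNet output are all assumptions made for illustration.

```python
import numpy as np

def warp_nearest(reference, flow):
    """Backward-warp a (H, W) reference frame with a (H, W, 2) flow field.

    flow[..., 0] is the horizontal and flow[..., 1] the vertical displacement
    (an assumed layout). Nearest-neighbour sampling with border clamping is
    used here for simplicity.
    """
    h, w = reference.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return reference[src_y, src_x]

def select_mode(prediction, coded, alpha):
    """Pixel-wise mode selection (hypothetical formulation).

    alpha == 1 copies the prediction; alpha == 0 takes the transmitted value.
    """
    return alpha * prediction + (1.0 - alpha) * coded

# Toy 4x4 example: every pixel is predicted from its right-hand neighbour.
reference = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
prediction = warp_nearest(reference, flow)

# Stand-in for the CodecNet output (assumption: a constant frame).
coded = np.full((4, 4), 100.0)

# Copy the prediction in the left half, use the coded value in the right half.
alpha = np.zeros((4, 4))
alpha[:, :2] = 1.0
decoded = select_mode(prediction, coded, alpha)
```

In this toy setup the left half of `decoded` comes from the warped reference and the right half from the stand-in coded frame, which mirrors the competition between copy and transmission described above.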


Conditional Coding for Flexible Learned Video Compression

This paper introduces a novel framework for end-to-end learned video coding. Image compression is generalized through conditional coding to exploit information from reference frames, allowing to

Generalized Difference Coder: A Novel Conditional Autoencoder Structure for Video Compression

This paper proposes the generalized difference coder, a special case of a conditional coder designed to avoid limiting bottlenecks. It achieves average rate savings of 27.8% compared to a standard autoencoder while adding only a moderate complexity overhead of less than 7%.

CANF-VC: Conditional Augmented Normalizing Flows for Video Compression

An end-to-end learning-based video compression system, termed CANF-VC, based on conditional augmented normalizing flows (CANF), that leverages the conditional ANF to learn a video generative model for conditional inter-frame coding.

Deep Contextual Video Compression

This paper proposes a deep contextual video compression framework to enable a paradigm shift from predictive coding to conditional coding. It uses feature-domain context as the condition, which lets the high-dimensional context carry rich information to both the encoder and the decoder and helps reconstruct high-frequency content for higher video quality.

B-CANF: Adaptive B-frame Coding with Conditional Augmented Normalizing Flows

B-CANF is the first attempt at applying conditional augmented normalizing models to both conditional motion and inter-frame coding, and features frame-type adaptive coding that learns better bit allocation for hierarchical B-frame coding.

AIVC: Artificial Intelligence based Video Codec

This paper introduces AIVC, an end-to-end neural video codec, based on two conditional autoencoders MNet and CNet, for motion compensation and coding, which offers performance competitive with the recent video coder HEVC under several established test conditions.

Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression

A latent prior is introduced that exploits the correlation among latent representations to reduce temporal redundancy, and a dual spatial prior is proposed to reduce spatial redundancy in a parallel-friendly manner.



DVC: An End-To-End Deep Video Compression Framework

This paper proposes the first end-to-end video compression deep model that jointly optimizes all the components for video compression, and shows that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and even be on par with the latest standard H.265 in terms of MS-SSIM.

Neural Inter-Frame Compression for Video Coding

This work presents an inter-frame compression approach for neural video coding that can seamlessly build up on different existing neural image codecs and proposes to compute residuals directly in latent space instead of in pixel space to reuse the same image compression network for both key frames and intermediate frames.

Learning for Video Compression With Hierarchical Quality and Recurrent Enhancement

The experiments validate that the HLVC approach advances the state-of-the-art of deep video compression methods, and outperforms the "Low-Delay P (LDP) very fast" mode of x265 in terms of both PSNR and MS-SSIM.

Variational image compression with a scale hyperprior

It is demonstrated that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR).

Feedback Recurrent Autoencoder for Video Compression

This work proposes a new network architecture, based on common and well studied components, for learned video compression operating in low latency mode, and yields state of the art MS-SSIM/rate performance on the high-resolution UVG dataset.

Overview of the High Efficiency Video Coding (HEVC) Standard

The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality.

The H.264/MPEG4 advanced video coding standard and its applications

The technology behind the new H.264/MPEG4-AVC standard is discussed, focusing on the main distinct features of its core coding technology and its first set of extensions, known as the fidelity range extensions (FRExt).

End-to-end Optimized Image Compression

Across an independent set of test images, it is found that the optimized method generally exhibits better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods, and a dramatic improvement in visual quality is observed, supported by objective quality estimates using MS-SSIM.

Optical Flow Estimation Using a Spatial Pyramid Network

The Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters, which makes it more efficient and appropriate for embedded applications.

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume, and outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks.