Corpus ID: 239885704

NeRV: Neural Representations for Videos

@inproceedings{Chen2021NeRVNR,
  title={NeRV: Neural Representations for Videos},
  author={Hao Chen and Bo He and Hanyu Wang and Yixuan Ren and Ser-Nam Lim and Abhinav Shrivastava},
  booktitle={NeurIPS},
  year={2021}
}
We propose a novel neural representation for videos (NeRV) which encodes videos in neural networks. Unlike conventional representations that treat videos as frame sequences, we represent videos as neural networks taking frame index as input. Given a frame index, NeRV outputs the corresponding RGB image. Video encoding in NeRV is simply fitting a neural network to video frames, and the decoding process is a simple feedforward operation. As an image-wise implicit representation, NeRV outputs the whole…
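The encode-by-fitting / decode-by-feedforward idea described in the abstract can be sketched with a toy coordinate network. Everything below (layer sizes, positional-encoding width, the names `positional_encoding` and `nerv_forward`) is hypothetical and much smaller than the paper's actual architecture; it only illustrates that decoding a frame is a single forward pass on the frame index:

```python
import numpy as np

def positional_encoding(t, num_freqs=8):
    """Encode a scalar frame index into sin/cos features (hypothetical width)."""
    freqs = 2.0 ** np.arange(num_freqs)
    angles = np.pi * t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

rng = np.random.default_rng(0)
H, W = 4, 4            # toy frame size; real NeRV emits full-resolution frames
in_dim, hidden = 16, 32

# Randomly initialised toy "decoder": embedding -> hidden -> H*W*3 RGB values.
# In NeRV these weights would be trained by fitting the network to the video.
W1 = rng.normal(scale=0.1, size=(in_dim, hidden))
W2 = rng.normal(scale=0.1, size=(hidden, H * W * 3))

def nerv_forward(t):
    """Decoding is one feedforward pass: frame index in, RGB frame out."""
    e = positional_encoding(t)               # shape (16,)
    h = np.maximum(e @ W1, 0.0)              # ReLU hidden layer
    rgb = 1.0 / (1.0 + np.exp(-(h @ W2)))    # sigmoid keeps values in [0, 1]
    return rgb.reshape(H, W, 3)

frame = nerv_forward(0.5)  # normalised frame index in [0, 1]
```

Encoding would amount to optimising `W1` and `W2` so that `nerv_forward(t)` reproduces frame `t` of a given video, after which the weights themselves are the video representation.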
Implicit Neural Video Compression
TLDR
The method, which is called implicit pixel flow (IPF), offers several simplifications over established neural video codecs: it does not require the receiver to have access to a pretrained neural network, does not use expensive interpolation-based warping operations, and does not require a separate training dataset.
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
TLDR
This work rethinks the traditional image + video discriminator pair and designs a holistic discriminator that aggregates temporal information by simply concatenating frames' features, which decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024² videos for the first time.
Implicit Neural Representations for Image Compression
TLDR
This work proposes the first comprehensive image compression pipeline based on INRs, including quantization, quantization-aware retraining and entropy coding, and finds that the approach to source compression with INRs vastly outperforms similar prior work, is competitive with common compression algorithms designed specifically for images, and closes the gap to state-of-the-art learned approaches based on Rate-Distortion Autoencoders.
Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
TLDR
An INR-based video generator that improves the motion dynamics by manipulating the space and time coordinates differently and a motion discriminator that efficiently identifies the unnatural motions without observing the entire long frame sequences are introduced.
Meta-Learning Sparse Compression Networks
TLDR
This paper introduces the first method allowing for sparsification to be employed in the inner-loop of commonly used Meta-Learning algorithms, drastically improving both compression and the computational cost of learning INRs.
Leveraging Bitstream Metadata for Fast and Accurate Video Compression Correction
TLDR
This work develops a deep learning architecture capable of restoring detail to compressed videos which leverages the underlying structure and motion information embedded in the video bitstream, and shows that this improves restoration accuracy compared to prior compression correction methods and is competitive with recent deep-learning-based video compression methods in rate-distortion performance.
MINER: Multiscale Implicit Neural Representations
TLDR
A novel implicit representation framework called MINER is presented that is well suited for very large visual signals such as images, videos, and 3D volumes, resulting in a dramatic decrease in inference and training time, while requiring fewer parameters and less memory than state-of-the-art representations.
From data to functa: Your data point is a function and you should treat it like one
TLDR
This paper refers to the data as functa, and proposes a framework for deep learning on functa, which has various compelling properties across data modalities, in particular on the canonical tasks of generative modeling, data imputation, novel view synthesis and classification.
LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
TLDR
LilNetX is introduced, an end-to-end trainable technique for neural networks that enables learning models with specified accuracy-rate-computation trade-off and constructs a joint training objective that penalizes the self information of network parameters in a reparameterized latent space to encourage small model size.
Learning Cross-Video Neural Representations for High-Quality Frame Interpolation
TLDR
Cross-Video Neural Representation (CURE) is proposed as the first video interpolation method based on neural fields, which represents the video as a continuous function parameterized by a coordinate-based neural network, whose inputs are the spatiotemporal coordinates and outputs are the corresponding RGB values.

References

Showing 1–10 of 64 references
Conditional Entropy Coding for Efficient Video Compression
TLDR
A very simple and efficient video compression framework that focuses only on modeling the conditional entropy between frames; it outperforms H.265 and other deep learning baselines in MS-SSIM on higher-bitrate UVG video, and all video codecs at lower framerates.
DVC: An End-To-End Deep Video Compression Framework
TLDR
This paper proposes the first end-to-end video compression deep model that jointly optimizes all the components for video compression, and shows that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and even be on par with the latest standard in terms of MS-SSIM.
Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
TLDR
This paper presents an image compression architecture using a convolutional autoencoder, and then generalizes image compression to video compression, by adding an interpolation loop into both encoder and decoder sides, to achieve higher image compression performance.
Learned Video Compression
TLDR
This work presents a new algorithm for video coding, learned end-to-end for the low-latency mode, which outperforms all existing video codecs across nearly the entire bitrate range, and is the first ML-based method to do so.
Scale-Space Flow for End-to-End Optimized Video Compression
TLDR
This paper proposes scale-space flow, an intuitive generalization of optical flow that adds a scale parameter to allow the network to better model uncertainty and outperform analogous state-of-the art learned video compression models while being trained using a much simpler procedure and without any pre-trained optical flow networks.
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
TLDR
This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
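The efficient sub-pixel convolution layer summarized above ends with a pixel-shuffle rearrangement that turns `C*r²` low-resolution feature channels into `C` channels at `r×` resolution. A minimal NumPy sketch of that rearrangement (single image, channels-first layout assumed; the function name `pixel_shuffle` follows common deep-learning usage, not this paper's code):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Output pixel (c, h*r + i, w*r + j) comes from input channel
    c*r*r + i*r + j at spatial position (h, w).
    """
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into r x r sub-pixel blocks
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)
```

Learning the upscaling filters in LR space and only shuffling at the very end is what makes the layer cheap enough for real-time super-resolution.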
Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units
TLDR
This paper proposes a novel, simple yet effective activation scheme called concatenated ReLU (CRelu) and theoretically analyze its reconstruction property in CNNs and integrates CRelu into several state-of-the-art CNN architectures and demonstrates improvement in their recognition performance on CIFAR-10/100 and ImageNet datasets with fewer trainable parameters.
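CReLU itself is a one-line activation: concatenate ReLU applied to the input and to its negation, which preserves both signs of the response while doubling the channel dimension. A NumPy sketch (the function name `crelu` and the default axis are our choices):

```python
import numpy as np

def crelu(x, axis=-1):
    """Concatenated ReLU: concat(relu(x), relu(-x)); doubles the given axis."""
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)], axis=axis)

out = crelu(np.array([1.0, -2.0, 0.5]))
# → array([1. , 0. , 0.5, 0. , 2. , 0. ])
```

Because negative activations survive (in the second half of the channels), a CNN using CReLU can halve the filter count in a layer without losing sign information, which is where the parameter savings come from.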
Deep Image Prior
TLDR
It is shown that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, superresolution, and inpainting.
Video Compression through Image Interpolation
TLDR
This paper presents an alternative in an end-to-end deep learning codec that outperforms today's prevailing codecs, such as H.261, MPEG-4 Part 2, and performs on par with H.264.
Doubly Convolutional Neural Networks
TLDR
This paper proposes doubly convolutional neural networks (DCNNs), which significantly improve the performance of CNNs by further exploring this idea and shows that DCNN can serve the dual purpose of building more accurate models and/or reducing the memory footprint without sacrificing the accuracy.