Gradient Forward-Propagation for Large-Scale Temporal Video Modelling

@inproceedings{malinowski2021gradient,
  title={Gradient Forward-Propagation for Large-Scale Temporal Video Modelling},
  author={Mateusz Malinowski and Dimitrios Vytiniotis and Grzegorz Swirszcz and Viorica Patraucean and Jo{\~a}o F. M. Carreira},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}
How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. For temporal signals, this introduces high latency and hinders real-time learning. It also creates a coupling between consecutive layers, which limits model parallelism and increases memory consumption. In this paper, we build upon Sideways, which avoids blocking by… 

An In-depth Study of Stochastic Backpropagation

This paper interprets SBP as an efficient way to implement stochastic gradient descent by applying dropout to the backward pass, which yields significant memory savings and reduced training run time with minimal impact on overall model accuracy.

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

  • Feng Cheng, Ming Xu, Wei Xia
  • Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2022
Stochastic Backpropagation reduces the GPU memory cost by eliminating the need to cache activation values corresponding to the dropped backward paths, whose amount can be controlled by an adjustable keep-ratio.
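The keep-ratio mechanism described above can be sketched in a few lines. This is an illustrative toy for a single linear layer, not the paper's implementation; the function name and signature are assumptions made for this sketch.

```python
import numpy as np

def sbp_linear_grad(x, grad_out, keep_ratio=0.5, rng=None):
    """Toy stochastic-backpropagation gradient for a linear layer y = x @ w.

    Only a random keep_ratio fraction of the rows (e.g. video frames)
    contributes to the weight gradient, so activations for the dropped
    rows never need to be cached; the surviving gradient is rescaled by
    1/keep_ratio so it stays unbiased in expectation.
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape[0]) < keep_ratio   # backward paths to keep
    return x[keep].T @ grad_out[keep] / keep_ratio

# With keep_ratio=1.0 every backward path is kept and the gradient is exact.
x = np.random.randn(8, 4)
g = np.random.randn(8, 3)
exact = x.T @ g
assert np.allclose(exact, sbp_linear_grad(x, g, keep_ratio=1.0))
```

Lowering `keep_ratio` trades gradient variance for memory, since dropped rows of `x` never need to be stored between the forward and backward passes.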

Chicken Egg Fertility Identification using FOS and BP-Neural Networks on Image Processing

The FOS pattern for detecting the fertility of chicken eggs with a BP neural network still yields low accuracy, so improved methods are needed to obtain better results.

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

GPipe is introduced, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers by pipelining different sub-sequences of layers on separate accelerators, resulting in almost linear speedup when a model is partitioned across multiple accelerators.
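The "almost linear speedup" claim follows from simple micro-batch accounting. A minimal sketch (the function name and slot model are assumptions for illustration, not GPipe's API):

```python
def pipeline_slots(num_stages, num_microbatches):
    """Time slots to push all micro-batches through a pipeline of
    equal-cost stages (forward only): the first micro-batch takes
    num_stages slots, and each additional one finishes one slot later."""
    return num_stages + num_microbatches - 1

# With K accelerators each holding one stage, a single device would need
# K * M slots' worth of stage-work for M micro-batches; the pipeline's
# "bubble" overhead of K - 1 slots shrinks relative to M as M grows.
K, M = 4, 32
speedup = (K * M) / pipeline_slots(K, M)   # approaches K for large M
```

For `K=4, M=32` this gives a speedup of about 3.66 out of an ideal 4x, which is why GPipe recommends enough micro-batches to keep the pipeline bubble small.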

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics, and a new Two-Stream Inflated 3D ConvNet based on 2D ConvNet inflation is introduced.

WaveNet: A Generative Model for Raw Audio

WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.

Identity Mappings in Deep Residual Networks

The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
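The propagation formulation referred to above can be written out explicitly, following He et al.'s identity-mapping analysis:

```latex
% With identity skip connections, stacking residual blocks from depth l
% to depth L gives a plain sum, so the loss gradient reaches any
% shallower block l directly (the additive "1" term):
\[
  x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i), \qquad
  \frac{\partial E}{\partial x_l}
    = \frac{\partial E}{\partial x_L}
      \left( 1 + \frac{\partial}{\partial x_l}
        \sum_{i=l}^{L-1} F(x_i, W_i) \right).
\]
```

The additive identity term means the gradient cannot vanish even if the residual branch's Jacobian is small, which is the sense in which signals "directly propagate" between any two blocks.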

HMDB: A large video database for human motion recognition

This paper uses the largest action video database to date, with 51 action categories comprising around 7,000 manually annotated clips extracted from sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and to explore their robustness under various conditions.

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

This work introduces UCF101, currently the largest dataset of human actions, and provides baseline action recognition results on the new dataset using a standard bag-of-words approach, with an overall performance of 44.5%.

Sideways: Depth-Parallel Training of Video Models

It is shown that Sideways training of deep convolutional video networks not only still converges, but can also potentially exhibit better generalization compared to standard synchronized backpropagation.
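The depth-decoupling idea behind Sideways can be illustrated with a toy loop. This is a sketch of the overlap structure only, not the paper's algorithm: the per-layer objective (shrinking 0.5 * ||layer output||^2 from stale activations) and all names are assumptions made for this illustration.

```python
import numpy as np

def sideways_stream(ws, frames, lr=0.01):
    """Toy depth-decoupled training loop on a frame stream.

    Each layer updates its weights from an activation cached at the
    previous frame, so the forward pass for frame t never waits on a
    full synchronized backward pass; forward and (stale) backward work
    for different frames overlap across depth.
    """
    cache = [None] * len(ws)      # per-layer input cached from frame t-1
    for x in frames:
        h = x
        for l in range(len(ws)):
            if cache[l] is not None:
                stale_out = cache[l] @ ws[l]              # stale forward
                ws[l] = ws[l] - lr * cache[l].T @ stale_out  # decoupled update
            cache[l] = h                                  # cache for frame t+1
            h = h @ ws[l]                                 # forward continues
    return ws

rng = np.random.default_rng(0)
ws = [rng.standard_normal((8, 8)) * 0.5 for _ in range(3)]
before = np.linalg.norm(ws[0])
ws = sideways_stream(ws, [rng.standard_normal((1, 8)) for _ in range(50)])
```

Under the toy shrinkage objective the weight norms decrease even though no layer ever waits for a gradient computed from its current input, which is the staleness trade-off Sideways studies.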

A Practical Sparse Approximation for Real Time Recurrent Learning

The Sparse n-step Approximation (SnAp) to the RTRL influence matrix is introduced; it keeps only entries that are nonzero within n steps of the recurrent core, and it substantially outperforms other RTRL approximations of comparable cost, such as Unbiased Online Recurrent Optimization.
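The masked influence-matrix update can be sketched generically. The recursion below is the standard RTRL form; applying a fixed sparsity mask after each step is the SnAp-style approximation, and the function name and toy dimensions are illustrative assumptions.

```python
import numpy as np

def snap_update(J_prev, D_t, Imm_t, mask):
    """One step of a SnAp-style sparse RTRL approximation.

    Full RTRL propagates the influence matrix J_t = dh_t/dtheta via
    J_t = D_t @ J_prev + Imm_t, where D_t = dh_t/dh_{t-1} and Imm_t is
    the immediate Jacobian of h_t w.r.t. the parameters. A SnAp-style
    scheme zeroes every entry outside a fixed sparsity mask (entries
    that stay zero for n steps), bounding memory and compute.
    """
    return mask * (D_t @ J_prev + Imm_t)

rng = np.random.default_rng(1)
h, p = 4, 6                          # hidden size, parameter count
J = rng.standard_normal((h, p))
D = rng.standard_normal((h, h))
Imm = rng.standard_normal((h, p))
dense = snap_update(J, D, Imm, np.ones((h, p)))  # all-ones mask = full RTRL
mask = (rng.random((h, p)) < 0.5).astype(float)
sparse = snap_update(J, D, Imm, mask)            # masked entries stay zero
```

With an all-ones mask the update reduces to exact RTRL; sparser masks trade bias for the memory savings the paper quantifies.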

Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units

It is found that the temporal order matters more for the recently introduced 20BN Something-Something dataset where the task of fine-grained action recognition necessitates the model to do temporal reasoning.

Approximating Real-Time Recurrent Learning with Random Kronecker Factors

It is shown that KF-RTRL is an unbiased and memory-efficient online learning algorithm that captures long-term dependencies and almost matches the performance of TBPTT on real-world tasks, by training Recurrent Highway Networks on a synthetic string memorization task and on the Penn Treebank task, respectively.