• Corpus ID: 227012681

Privileged Knowledge Distillation for Online Action Detection

Authors: Peisen Zhao, Jiajie Wang, Lingxi Xie, Ya Zhang, Yanfeng Wang, Qi Tian
Online Action Detection (OAD) in videos is a per-frame labeling task for real-time prediction, where only the previous and current video frames are available. This paper presents a novel framework for online action detection based on learning with privileged information, in which the future frames, observable only at the training stage, are treated as a form of privileged information. Knowledge distillation is employed to transfer the privileged information from the offline teacher to… 
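
The abstract is truncated here and does not state the exact distillation objective, so the following is only a minimal sketch of generic soft-label knowledge distillation (temperature-softened KL divergence), not the paper's method. It assumes an offline teacher that sees future frames and an online student that does not, each emitting per-frame class logits; the function names and the `temperature` parameter are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened class distributions for one frame.

    Assumption: a generic soft-label distillation term, scaled by T^2 as is
    common practice; the source does not specify the loss actually used.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def per_frame_distillation(teacher_frames, student_frames, temperature=2.0):
    """Average the distillation term over all frames of a video clip."""
    losses = [
        distillation_loss(t, s, temperature)
        for t, s in zip(teacher_frames, student_frames)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical logits for two frames and three classes.
    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    student = [[1.0, 0.8, -0.5], [0.0, 0.9, 0.6]]
    print(per_frame_distillation(teacher, student))
```

In such a setup the term would typically be added to the student's ordinary per-frame classification loss, so the student learns from both the ground-truth labels and the teacher's softened predictions.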


Long Short-Term Transformer for Online Action Detection
LSTR provides an effective and efficient method to model long videos with fewer heuristics, which is validated by extensive empirical analysis and achieves state-of-the-art performance on three standard online action detection benchmarks.
GateHUB: Gated History Unit with Background Suppression for Online Action Detection
GateHUB, a Gated History Unit with Background Suppression, is presented; it comprises a novel position-guided gated cross-attention mechanism that enhances or suppresses parts of the history according to how informative they are for current-frame prediction.
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Stochastic Backpropagation reduces the GPU memory cost by eliminating the need to cache activation values corresponding to the dropped backward paths, whose amount can be controlled by an adjustable keep-ratio.
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
A framework is proposed to segment streaming videos online at test time using dynamic programming, showing its advantages over a greedy sliding-window approach; the framework is further improved by introducing the Online-Offline Discrepancy Loss (OODL), which encourages the segmentation results to have higher temporal consistency.
Continual Transformers: Redundancy-Free Attention for Online Inference
Novel formulations of the Scaled Dot-Product Attention are proposed, which enable Transformers to perform efficient online token-by-token inference in a continual input stream, and the produced outputs and learned weights are identical to those of the original Multi-Head Attention.


A Novel Online Action Detection Framework from Untrimmed Video Streams
Learning to Discriminate Information for Online Action Detection
A novel recurrent unit to explicitly discriminate the information relevant to an ongoing action from others is proposed, named Information Discrimination Unit (IDU), which enables the recurrent network with IDU to learn a more discriminative representation for identifying ongoing actions.
WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos
This work proposes WOAD, a weakly supervised framework that can be trained using only video-class labels and obtains the state-of-the-art results in the tasks of both online per-frame action recognition and online detection of action start.
StartNet: Online Detection of Action Start in Untrimmed Videos
The experimental results show that StartNet significantly outperforms the state-of-the-art by 15%-30% p-mAP under the offset tolerance of 1-10 seconds on THUMOS’14, and achieves comparable performance on ActivityNet with 10 times smaller time offset.
Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization
This work identifies two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation, and proposes a multi-branch neural network in which branches are enforced to discover distinctive action parts.
Online Action Detection in Untrimmed, Streaming Videos.
This work proposes three novel methods to specifically address the challenges in training ODAS models, including hard negative sample generation based on a Generative Adversarial Network (GAN) to distinguish ambiguous background, and explicit modeling of the temporal consistency between data around the action start and data succeeding the action start.
Online Action Detection
A realistic dataset composed of 27 episodes from 6 popular TV series and an evaluation protocol for fair comparison is introduced, showing this is a challenging problem for which none of the methods provides a good solution.
Temporal Recurrent Networks for Online Action Detection
A novel framework, the Temporal Recurrent Network (TRN), is proposed to model greater temporal context for each frame by simultaneously performing online action detection and anticipation of the immediate future, integrating both into a unified end-to-end architecture.
AFO-TAD: Anchor-free One-Stage Detector for Temporal Action Detection
A novel action detection architecture named anchor-free one-stage temporal action detector (AFO-TAD) is proposed which achieves better performance for detecting action instances with arbitrary lengths and high temporal resolution.
Residual Knowledge Distillation
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A), and devises an effective method to derive the student (S) and A from a given model without increasing the total computational cost.