Action Unit Memory Network for Weakly Supervised Temporal Action Localization

@article{Luo2021ActionUM,
  title={Action Unit Memory Network for Weakly Supervised Temporal Action Localization},
  author={Wang Luo and Tianzhu Zhang and Wenfei Yang and Jingen Liu and Tao Mei and Feng Wu and Yongdong Zhang},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={9964--9974}
}
  • Wang Luo, Tianzhu Zhang, +4 authors Yongdong Zhang
  • Published 29 April 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Weakly supervised temporal action localization aims to detect and localize actions in untrimmed videos with only video-level labels during training. However, without frame-level annotations, it is challenging to achieve localization completeness and relieve background interference. In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank. In the proposed…

Multi-Scale Structure-Aware Network for Weakly Supervised Temporal Action Detection
TLDR
This is the first work to fully explore the global and local structure information in a unified deep model for weakly supervised action detection, and extensive experimental results demonstrate that the proposed MSA-Net performs favorably against state-of-the-art methods.
Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
TLDR
This paper proposes a novel framework, where dense pseudo-labels are generated to provide completeness guidance for the model, and demonstrates the superiority of the method over existing state-of-the-art methods on four benchmarks: THUMOS’14, GTEA, BEOID, and ActivityNet.
Robust Pedestrian Attribute Recognition Using Group Sparsity for Occlusion Videos
TLDR
This paper formulates finding non-occluded frames as sparsity-based temporal attention of a crowded video to solve the uncorrelated attention issue, and proposes a novel group sparsity based temporal attention module.

References

Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization
TLDR
Proposes a Two-Stream Consensus Network (TSCN) to simultaneously address the challenges of weakly-supervised temporal action localization, together with a new attention normalization loss that encourages the predicted attention to act like a binary selection and promotes the precise localization of action instance boundaries.
Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization
TLDR
This work identifies two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation, and proposes a multi-branch neural network in which branches are enforced to discover distinctive action parts.
Background Suppression Network for Weakly-supervised Temporal Action Localization
Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given in the training stage while the only hint is video-level labels: whether each…
Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks
  • Zi-yi Liu, Le Wang, +4 authors G. Hua
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
TLDR
The Contrast-based Localization EvaluAtioN Network (CleanNet) is proposed with a new action proposal evaluator, which provides pseudo-supervision by leveraging the temporal contrast in snippet-level action classification predictions and is an integral part of CleanNet that enables end-to-end training.
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
TLDR
This work proposes a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks that attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos
TLDR
A novel weakly-supervised TAL framework called AutoLoc is developed to directly predict the temporal boundary of each action instance and a novel Outer-Inner-Contrastive (OIC) loss is proposed to automatically discover the needed segment-level supervision for training such a boundary predictor.
Learning Temporal Co-Attention Models for Unsupervised Video Action Localization
TLDR
This work proposes a two-step "clustering + localization" iterative procedure, which can be regarded as a direct extension of the weakly-supervised ACL model, and introduces new losses specially designed for ACL, including action-background separation loss and cluster-based triplet loss.
Weakly-Supervised Action Localization by Generative Attention Modeling
TLDR
This paper proposes to model the class-agnostic frame-wise probability conditioned on the frame attention using a conditional Variational Auto-Encoder (VAE), and demonstrates the advantage of the method and its effectiveness in handling the action-context confusion problem.
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
TLDR
A novel loss function for the localization network is proposed to explicitly consider temporal overlap and achieve high temporal localization accuracy in untrimmed long videos.
Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection
TLDR
This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection and designs a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM) fused with attention weights.