Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
@article{Huang2021ForegroundActionCN, title={Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization}, author={Linjiang Huang and Liang Wang and Hongsheng Li}, journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)}, year={2021}, pages={7982--7991} }
As a challenging task in high-level video understanding, weakly supervised temporal action localization has been attracting increasing attention. With only video-level annotations, most existing methods handle this task with a localization-by-classification framework, which generally adopts a selector to pick snippets with high action probabilities, namely the foreground. Nevertheless, the existing foreground selection strategies have a major limitation of only considering the…
17 Citations
Dual-Evidential Learning for Weakly-supervised Temporal Action Localization
- Computer Science, ECCV
- 2022
A generalized evidential deep learning framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional paradigm of EDL to adapt to the weakly-supervised multi-label classification goal and achieves state-of-the-art performance on THUMOS14 and ActivityNet1.2 benchmarks.
Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
- Computer Science
- 2022
A novel method from a category exclusion perspective, named Progressive Complementary Learning (ProCL), which gradually enhances the snippet-level supervision and introduces the background-aware pseudo complementary labeling in order to exclude more categories for snippets of less ambiguity.
Weakly-supervised Action Localization via Hierarchical Mining
- Computer Science, ArXiv
- 2022
A hierarchical mining strategy at the video level and the snippet level, i.e., hierarchical supervision and hierarchical consistency mining, is proposed to maximize the usage of the given annotations and prediction-wise consistency.
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
ASM-Loc is proposed, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods and entails three segment-centric components: dynamic segment sampling for compensating the contribution of short actions, intra- and inter-segment attention for modeling action dynamics and capturing temporal dependencies.
Forcing the Whole Video as Background: An Adversarial Learning Strategy for Weakly Temporal Action Localization
- Computer Science, ACM Multimedia
- 2022
An adversarial learning strategy is presented to break the limitation of mining pseudo background snippets, and a novel temporal enhancement network is designed to help the model construct temporal relations among affinity snippets based on the proposed strategy, further improving action localization performance.
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
This method seeks to mine the representative snippets in each video for propagating information between video snippets to generate better pseudo labels and obtains superior performance on two benchmarks, THUMOS14 and ActivityNet1.
Exploring Denoised Cross-video Contrast for Weakly-supervised Temporal Action Localization
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
This work proposes a novel denoised cross-video contrastive algorithm, aiming to enhance the feature discrimination ability of video snippets for accurate temporal action localization in the weakly-supervised setting.
Distilling Vision-Language Pre-training to Collaborate with Weakly-Supervised Temporal Action Localization
- Computer Science, ArXiv
- 2022
A novel distillation-collaboration framework with two branches acting as CBP and VLP respectively, which is effectively fused to promote a strong alliance for temporally localized action localization.
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization
- Computer Science, ArXiv
- 2022
A novel C3BN is proposed to achieve robust snippet predictions, together with a macro-micro consistency regularization strategy that forces the model to be invariant (or equivariant) to the transformations of snippets with respect to video semantics, snippet predictions and snippet features.
End-to-End Temporal Action Detection With Transformer
- Computer Science, IEEE Transactions on Image Processing
- 2022
TadTR is an end-to-end Transformer-based method for temporal action detection that achieves state-of-the-art performance on THUMOS14 and HACS Segments at a lower computation cost than previous detectors.
References
A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization
- Computer Science, AAAI
- 2021
This paper presents a novel framework named HAM-Net with a hybrid attention mechanism which includes temporal soft, semi-soft and hard attentions to address weakly supervised temporal action localization.
Action Completeness Modeling with Background Aware Networks for Weakly-Supervised Temporal Action Localization
- Computer Science, ACM Multimedia
- 2020
A novel weakly-supervised Action Completeness Modeling with Background Aware Networks (ACM-BANets) with an asymmetrical training strategy, to suppress both highly discriminative and ambiguous background frames to remove the false positives.
Background Suppression Network for Weakly-supervised Temporal Action Localization
- Computer Science, ArXiv
- 2020
Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given in the training stage while the only hint is video-level labels: whether each…
Weakly-Supervised Action Localization by Generative Attention Modeling
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper proposes to model the class-agnostic frame-wise probability conditioned on the frame attention using a conditional Variational Auto-Encoder (VAE), and demonstrates the advantage of the method and its effectiveness in handling the action-context confusion problem.
Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization
- Computer Science, ECCV
- 2020
A Two-Stream Consensus Network (TSCN) to simultaneously address weakly-supervised Temporal Action Localization challenges and a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries.
Modeling Sub-Actions for Weakly Supervised Temporal Action Localization
- Computer Science, IEEE Transactions on Image Processing
- 2021
This paper describes a novel approach to alleviate the contradiction for detecting more complete action instances by explicitly modeling sub-actions, and devises three complementary loss functions, namely, representation loss, balance loss and relation loss, to ensure the learned sub-actions are diverse and have clear semantic meanings.
Weakly-supervised Temporal Action Localization by Uncertainty Modeling
- Computer Science, AAAI
- 2021
A new perspective on background frames is presented where they are modeled as out-of-distribution samples regarding their inconsistency, and a background entropy loss is introduced to better discriminate background frames by encouraging their in-distribution (action) probabilities to be uniformly distributed over all action classes.
3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This work proposes a framework, called 3C-Net, which only requires video-level supervision (weak supervision) in the form of action category labels and the corresponding count to learn discriminative action features with enhanced localization capabilities.
Weakly-Supervised Action Localization With Background Modeling
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
A latent approach that learns to detect actions in long sequences given training videos with only whole-video class labels, and can be used to aggressively scale-up learning to in-the-wild, uncurated Instagram videos (where relevant frames and videos are automatically selected through attentional processing).
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning
- Computer Science, ECCV
- 2020
This work explicitly models the key instances assignment as a hidden variable and adopts an Expectation-Maximization (EM) framework, and derives two pseudo-label generation schemes to model the E and M process and iteratively optimize the likelihood lower bound.