Corpus ID: 236447715

Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021

  • Haisheng Su, Peiqin Zhuang, Yukun Li, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao
  • Published 2021
  • Computer Science
  • ArXiv
This technical report presents an overview of our solution submitted to the 2021 HACS Temporal Action Localization Challenge, on both the Supervised Learning Track and the Weakly-Supervised Learning Track. Temporal Action Localization (TAL) requires not only precisely locating the temporal boundaries of action instances, but also accurately classifying the untrimmed videos into specific categories. Weakly-Supervised TAL, however, involves locating the action instances using only video-level class…


Transferable Knowledge-Based Multi-Granularity Fusion Network for Weakly Supervised Temporal Action Detection
A novel framework for temporal action detection under weak supervision that uses convolutional kernels with varied dilation rates to enlarge the receptive fields, and a cascaded module with the proposed Online Adversarial Erasing mechanism that mines more relevant regions of target actions by feeding the erased feature maps of discovered regions back into the system.
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
A novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data.
AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos
A novel weakly-supervised TAL framework called AutoLoc is developed to directly predict the temporal boundary of each action instance and a novel Outer-Inner-Contrastive (OIC) loss is proposed to automatically discover the needed segment-level supervision for training such a boundary predictor.
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
A Local-Global Temporal Encoder (LGTE), which adopts a channel grouping strategy to efficiently encode both “local and global” temporal inter-dependencies, and a Temporal Boundary Regressor (TBR), designed to combine these two regression granularities in an end-to-end fashion, which achieves precise boundaries and reliable proposal confidence through progressive refinement.
Temporal Action Detection with Structured Segment Networks
The structured segment network (SSN) is presented, a novel framework which models the temporal structure of each action instance via a structured temporal pyramid and introduces a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness.
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
A novel loss function for the localization network is proposed to explicitly consider temporal overlap and achieve high temporal localization accuracy in untrimmed long videos.
Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network
RapNet introduces a novel relation-aware module that exploits bi-directional long-range relations between local features for context distilling, and generates more accurate proposals than existing state-of-the-art methods.
Multi-Granularity Generator for Temporal Action Proposal
By temporally adjusting the segment proposals with fine-grained information based on frame actionness, MGG achieves superior performance over state-of-the-art methods on the public THUMOS-14 and ActivityNet-1.3 datasets.
Single Shot Temporal Action Detection
This work proposes a novel Single Shot Action Detector (SSAD) network based on 1D temporal convolutional layers that skips the proposal generation step by directly detecting action instances in untrimmed video, and empirically investigates input feature types and fusion strategies to further improve detection accuracy.
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
An effective proposal generation method, named Boundary-Sensitive Network (BSN), which adopts a "local to global" fashion and significantly improves the state-of-the-art temporal action detection performance.
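
Several of the summaries above refer to the temporal overlap between a predicted segment and a ground-truth action instance. As a minimal sketch, and not code taken from any of the listed papers, temporal overlap is commonly measured as intersection-over-union on the time axis. The function name `temporal_iou` and the `(start, end)` tuple representation in seconds are assumptions made for illustration.

```python
def temporal_iou(pred, gt):
    """Temporal intersection-over-union of two (start, end) segments.

    Both segments are hypothetical (start, end) tuples in seconds,
    with start <= end. Returns a value in [0, 1].
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt

    # Length of the overlapping interval; zero if the segments are disjoint.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))

    # Union is the sum of both lengths minus the shared part.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection

    # Guard against two zero-length segments.
    return intersection / union if union > 0 else 0.0
```

A prediction is typically counted as correct when its overlap with a ground-truth instance of the same class exceeds a chosen threshold; the threshold itself depends on the benchmark and is not specified here.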