• Corpus ID: 235489782

Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track

Yuanhao Zhai, Le Wang, David S. Doermann, Junsong Yuan
This technical report presents our solution to the HACS Temporal Action Localization Challenge 2021, Weakly-Supervised Learning Track. The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos given only video-level labels. We adopt the two-stream consensus network (TSCN) [5] as the main framework in this challenge. The TSCN consists of a two-stream base model training procedure and a pseudo ground truth learning…


Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization

Proposes a Two-Stream Consensus Network (TSCN) to address the challenges of weakly-supervised temporal action localization, together with a new attention normalization loss that encourages the predicted attention to act like a binary selection and promotes precise localization of action instance boundaries.
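
To make the idea of an attention normalization loss concrete, here is a minimal sketch of one form such a loss can take: the mean of the lowest attention values minus the mean of the highest ones, so that minimizing it pushes attention toward 0 or 1. The function name `attention_norm_loss`, the parameter `s`, and its default value are assumptions made for illustration; the exact formulation in the cited paper may differ.

```python
def attention_norm_loss(attention, s=8):
    """Mean of the k lowest attention values minus mean of the k highest.

    `attention` holds one value in [0, 1] per video snippet.
    k = max(1, len(attention) // s); `s=8` is an illustrative assumption.
    """
    if not attention:
        raise ValueError("attention must be non-empty")
    k = max(1, len(attention) // s)
    ordered = sorted(attention)
    bottom_mean = sum(ordered[:k]) / k   # snippets treated as background
    top_mean = sum(ordered[-k:]) / k     # snippets treated as action
    return bottom_mean - top_mean
```

Under this sketch the loss is lowest (-1) when the selected low values are all 0 and the selected high values are all 1, and it is 0 when the attention is flat, which matches the "binary selection" behaviour described in the summary.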

Weakly-Supervised Action Localization With Background Modeling

Presents a latent approach that learns to detect actions in long sequences given training videos with only whole-video class labels. The approach can be used to aggressively scale up learning to in-the-wild, uncurated Instagram videos, where relevant frames and videos are automatically selected through attentional processing.

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

On HACS Segments, the state-of-the-art methods of action proposal generation and action localization are evaluated, and the new challenges posed by the dense temporal annotations are highlighted.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Introduces a new Two-Stream Inflated 3D ConvNet (I3D) based on 2D ConvNet inflation. After pre-training on Kinetics, I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101.

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

The recently proposed Temporal Ensembling has achieved state-of-the-art results on several semi-supervised learning benchmarks, but it becomes unwieldy when learning from large datasets. To address this, the paper proposes Mean Teacher, a method that averages model weights instead of label predictions.
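
The weight averaging mentioned above is commonly implemented as an exponential moving average (EMA) of the student weights. The following is a minimal sketch under that assumption; the function name `ema_update`, the dictionary-of-floats weight format, and the decay `alpha=0.99` are illustrative choices, not details taken from this page.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student by an EMA step.

    new_teacher[name] = alpha * teacher[name] + (1 - alpha) * student[name]
    Both arguments map parameter names to floats (a simplifying assumption).
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }
```

Calling `ema_update` once per training step keeps the teacher a smoothed copy of the student, so no per-example prediction history has to be stored.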

A Duality Based Approach for Realtime TV-L1 Optical Flow

This work presents a novel approach to solve the TV-L1 formulation, which is based on a dual formulation of the TV energy and employs an efficient point-wise thresholding step.
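
As a rough illustration of what a point-wise thresholding step looks like, the sketch below updates the flow at a single pixel from the linearized brightness residual `rho` and the image gradient `grad`. It follows the commonly stated three-case form of the TV-L1 thresholding rule; the function name `threshold_step` and the per-pixel tuple interface are assumptions for illustration, and the full method involves further steps not shown here.

```python
def threshold_step(u, grad, rho, lam, theta):
    """Apply a three-case thresholding update to the flow `u` at one pixel.

    u     : (ux, uy) current flow estimate
    grad  : (gx, gy) spatial gradient of the warped image
    rho   : linearized brightness residual at this pixel
    lam, theta : data weight and coupling parameter
    """
    gx, gy = grad
    g2 = gx * gx + gy * gy
    lt = lam * theta
    if rho < -lt * g2:
        dx, dy = lt * gx, lt * gy            # residual strongly negative
    elif rho > lt * g2:
        dx, dy = -lt * gx, -lt * gy          # residual strongly positive
    elif g2 > 0:
        dx, dy = -rho * gx / g2, -rho * gy / g2  # small residual: cancel it
    else:
        dx, dy = 0.0, 0.0                    # zero gradient, nothing to do
    return (u[0] + dx, u[1] + dy)
```

Because each pixel is handled independently, a step of this kind maps naturally onto array or GPU implementations.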