Connectionist Temporal Modeling for Weakly Supervised Action Labeling

@article{Huang2016ConnectionistTM,
  title={Connectionist Temporal Modeling for Weakly Supervised Action Labeling},
  author={De-An Huang and Li Fei-Fei and Juan Carlos Niebles},
  journal={ArXiv},
  year={2016},
  volume={abs/1607.08584}
}
We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework to efficiently evaluate all possible alignments via dynamic programming and explicitly enforce their consistency with…
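To illustrate what "evaluating all possible alignments via dynamic programming" can look like, here is a minimal sketch. It is not the ECTC formulation from the paper: it is a simplified forward recursion without blank labels or the consistency term, and the function name, the input layout (a list of per-frame class probabilities), and the rule that each action covers at least one frame are assumptions made for illustration only.

```python
def ordered_alignment_likelihood(frame_probs, action_order):
    """Sum the probability of every monotonic frame-to-action alignment.

    frame_probs: list of T lists, frame_probs[t][c] = probability of class c
                 at frame t (hypothetical input layout).
    action_order: list of class indices in the order the actions occur.
    Assumption: each action in the order covers at least one frame.
    """
    T = len(frame_probs)
    S = len(action_order)
    if S == 0 or S > T:
        return 0.0  # no valid alignment exists

    # alpha[s] = total probability of all partial alignments that end
    # at the current frame while in the s-th action of the order.
    alpha = [0.0] * S
    alpha[0] = frame_probs[0][action_order[0]]

    for t in range(1, T):
        new_alpha = [0.0] * S
        for s in range(S):
            stay = alpha[s]                           # same action continues
            advance = alpha[s - 1] if s > 0 else 0.0  # move to the next action
            new_alpha[s] = (stay + advance) * frame_probs[t][action_order[s]]
        alpha = new_alpha

    # Valid alignments must end in the last action of the order.
    return alpha[S - 1]


if __name__ == "__main__":
    # Three frames, two classes, uniform per-frame probabilities.
    probs = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
    print(ordered_alignment_likelihood(probs, [0, 1]))
```

The recursion costs O(T·S) operations, whereas enumerating the alignments explicitly grows combinatorially with the number of frames.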
SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation
TLDR
This work assumes that each training video comes only with the list of actions that occur in it, but not when, how often, or in which order they occur, and proposes an approach that can be trained end-to-end on such data.
Weakly Supervised Action Segmentation Using Mutual Consistency
TLDR
This paper proposes a new approach for weakly supervised action segmentation based on a two-branch network that achieves state-of-the-art results for action segmentation and action alignment while being fully differentiable and faster to train, since it does not require a costly alignment step during training.
Temporal Action Labeling using Action Sets
TLDR
This work introduces a system that automatically learns to temporally segment and label actions in a video, where the only supervision used is action sets.
Learning Temporal Action Proposals With Fewer Labels
TLDR
This work proposes a semi-supervised learning algorithm specifically designed for training temporal action proposal networks and shows that this approach consistently matches or outperforms the fully supervised state-of-the-art approaches.
Weakly Supervised Energy-Based Learning for Action Segmentation
This paper is about labeling video frames with action classes under weak supervision in training, where we have access to a temporal ordering of actions, but their start and end frames in training…
Weakly Supervised Energy-Based Learning for Action Segmentation
TLDR
A new constrained discriminative forward loss (CDFL) is used for training the HMM and GRU under weak supervision and gives superior results to those of the state of the art on the benchmark Breakfast Action, Hollywood Extended, and 50Salads datasets.
Weakly Supervised Gaussian Networks for Action Detection
TLDR
A novel method, called WSGN, is proposed that learns to detect actions from weak supervision using only video-level labels, leading to significant gains in action detection on two standard benchmarks, THUMOS14 and Charades.
A flexible model for training action localization with varying levels of supervision
TLDR
This work proposes a unifying framework that can handle and combine varying types of less demanding weak supervision; it is based on discriminative clustering and integrates different types of supervision as constraints on the optimization.

References

Automatic annotation of human actions in video
TLDR
This paper addresses the problem of automatic temporal annotation of realistic human actions in video using minimal manual supervision with a kernel-based discriminative clustering algorithm that locates actions in the weakly-labeled training data.
Weakly Supervised Action Labeling in Videos under Ordering Constraints
TLDR
It is shown that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner and evaluated on a new and challenging dataset of 937 video clips.
Watch-n-patch: Unsupervised understanding of actions and relations
TLDR
The model learns the high-level action co-occurrence and temporal relations between the actions in the activity video and is applied to unsupervised action segmentation and recognition, and also to a novel application that detects forgotten actions, which is called action patching.
Weakly-Supervised Alignment of Video with Text
TLDR
This paper proposes a method for aligning the two modalities of video and text, i.e., automatically providing a time (frame) stamp for every sentence, formulates this problem as an integer quadratic program, and solves its continuous convex relaxation using an efficient conditional gradient algorithm.
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
TLDR
A novel variant of long short-term memory deep networks is defined for modeling these temporal relations via multiple input and output connections and it is shown that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
End-to-End Learning of Action Detection from Frame Glimpses in Videos
TLDR
A fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions and uses REINFORCE to learn the agent's decision policy.
Learning Temporal Embeddings for Complex Video Analysis
TLDR
This paper proposes a scheme for incorporating temporal context based on past and future frames in videos, and compares this to other contextual representations, and shows how data augmentation using multi-resolution samples and hard negatives helps to significantly improve the quality of the learned embeddings.
Learning realistic human actions from movies
TLDR
A new method for video classification that builds upon and extends several recent ideas, including local space-time features, space-time pyramids, and multi-channel non-linear SVMs, is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
Modeling video evolution for action recognition
TLDR
The proposed method for capturing video-wide temporal information for action recognition postulates that a function capable of ordering the frames of a video temporally captures well the evolution of the appearance within the video.
Finding Actors and Actions in Movies
TLDR
This paper applies the proposed framework to the task of learning names of characters in the movie and demonstrates significant improvements over previous methods used for this task and explores the joint actor/action constraint and shows its advantage for weakly supervised action learning.