Parsing Videos of Actions with Segmental Grammars

@article{Pirsiavash2014ParsingVO,
  title={Parsing Videos of Actions with Segmental Grammars},
  author={Hamed Pirsiavash and Deva Ramanan},
  journal={2014 IEEE Conference on Computer Vision and Pattern Recognition},
  year={2014},
  pages={612-619}
}
  • H. Pirsiavash, D. Ramanan
  • Published 2014
  • Computer Science
  • 2014 IEEE Conference on Computer Vision and Pattern Recognition
Real-world videos of human activities exhibit temporal structure at various scales: long videos are typically composed of multiple action instances, where each instance is itself composed of sub-actions with variable durations and orderings. Temporal grammars can presumably model such hierarchical structure, but are computationally difficult to apply to long video streams. We describe simple grammars that capture hierarchical temporal structure while admitting inference with a finite-state…
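To make the segmental inference concrete, the following is a minimal sketch of a semi-Markov (segmental) Viterbi dynamic program over per-frame action scores, where bounding the maximum segment duration keeps the cost linear in video length. It is an illustration only: the function name segmental_viterbi, the score matrix, the duration bound max_dur, and the constant per-segment penalty are assumptions introduced here, not the paper's implementation, which additionally models grammar productions over sub-actions.

import numpy as np

def segmental_viterbi(frame_scores, max_dur, seg_penalty=0.2):
    """frame_scores: (T, K) array of per-frame scores for K action labels.
    Returns (start, end, label) segments covering frames [0, T)."""
    T, K = frame_scores.shape
    # prefix sums make any segment's per-label score an O(1) lookup
    cum = np.vstack([np.zeros((1, K)), np.cumsum(frame_scores, axis=0)])

    best = np.full((T + 1, K), -np.inf)        # best[t, k]: best parse of frames [0, t)
    best[0, :] = 0.0                           # ending with a segment labeled k
    back = np.zeros((T + 1, K, 2), dtype=int)  # (segment start, previous label)

    for t in range(1, T + 1):
        # bounding segment duration keeps the dynamic program linear in T
        for d in range(1, min(max_dur, t) + 1):
            s = t - d
            seg = cum[t] - cum[s]              # score of segment [s, t) for each label
            j = int(best[s].argmax())          # best label to precede this segment
            cand = best[s, j] + seg - seg_penalty   # (K,) candidate totals
            improved = cand > best[t]
            best[t, improved] = cand[improved]
            back[t, improved] = (s, j)

    # backtrack the labeled segments from the best final label
    segments, t, k = [], T, int(best[T].argmax())
    while t > 0:
        s, j = back[t, k]
        segments.append((s, t, k))
        t, k = s, j
    return segments[::-1]

if __name__ == "__main__":
    # toy video: three 10-frame actions in sequence (labels 0, 1, 2)
    scores = np.full((30, 3), -0.1)
    scores[:10, 0] = scores[10:20, 1] = scores[20:, 2] = 1.0
    print(segmental_viterbi(scores, max_dur=15))   # -> [(0, 10, 0), (10, 20, 1), (20, 30, 2)]

The constant per-segment penalty is only a stand-in for the learned duration and production costs a grammar would supply; with it, the toy example recovers the three ground-truth segments.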
Unsupervised Deep Networks for Temporal Localization of Human Actions in Streaming Videos
We propose a deep neural network which captures latent temporal features suitable for localizing actions temporally in streaming videos. This network uses unsupervised generative models containing…
Unsupervised Semantic Action Discovery from Video Collections
TLDR: This paper proposes a method for parsing a video into semantic steps in an unsupervised way, capable of providing a semantic "storyline" of the video composed of its objective steps.
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
TLDR: A novel variant of long short-term memory deep networks is defined for modeling temporal relations via multiple input and output connections, and it is shown that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
Unsupervised Hierarchical Dynamic Parsing and Encoding for Action Recognition
TLDR: A novel hierarchical dynamic parsing and encoding method is proposed to capture both the locally smooth dynamics and globally drastic dynamic changes of an action, forming a joint representation for action recognition.
Temporal Action Labeling using Action Sets
TLDR: This work introduces a system that automatically learns to temporally segment and label actions in a video, where the only supervision used is action sets.
Differentiable Grammars for Videos
TLDR: A novel algorithm is presented that learns a formal regular grammar from real-world continuous data such as videos, outperforms the state-of-the-art on several challenging datasets, and is more accurate at forecasting future activities in videos.
Action Recognition by Hierarchical Mid-Level Action Elements
TLDR: This work introduces an unsupervised method that is capable of distinguishing action-related segments from background segments and representing actions at multiple spatiotemporal resolutions, and develops structured models that capture a rich set of spatial, temporal, and hierarchical relations among the segments.
Temporal Action Detection Using a Statistical Language Model
TLDR: This work proposes a novel method for temporal action detection, including statistical length and language modeling to represent temporal and contextual structure, and reports state-of-the-art results on three datasets.
Asynchronous Temporal Fields for Action Recognition
TLDR: This work proposes a fully-connected temporal CRF model for reasoning over various aspects of activities that includes objects, actions, and intentions, where the potentials are predicted by a deep network.
Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints
TLDR: This work introduces a system that automatically learns to temporally segment and label actions in a video, where the only supervision used is action sets.

References

Showing 1-10 of 42 references
Unsupervised learning of event AND-OR grammar and semantics from video
TLDR: This work proposes to learn the event grammar under the information projection and minimum description length principles in a coherent probabilistic framework, without manual supervision about what events happen and when they happen, and presents a prototype system for video analysis integrating bottom-up detection, grammatical learning, and parsing.
Retrieving actions in movies
  • I. Laptev, P. Pérez
  • Computer Science
  • 2007 IEEE 11th International Conference on Computer Vision
  • 2007
TLDR: A new annotated human action dataset is introduced, and a new "keyframe priming" that combines discriminative models of human motion and shape within an action is shown to significantly improve the performance of action detection.
Recognizing multitasked activities from video using stochastic context-free grammar
TLDR: New parsing strategies to enable error detection and recovery in stochastic context-free grammar, and methods of quantifying group and individual behavior in activities with separable roles, are introduced.
Learning latent temporal structure for complex event detection
TLDR: A conditional model trained in a max-margin framework is utilized to automatically discover discriminative and interesting segments of video, while simultaneously achieving competitive accuracies on difficult detection and recognition tasks.
Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models
TLDR: This paper proposes a discriminative semi-Markov model approach and defines a set of features over boundary frames, segments, and neighboring segments, enabling it to conveniently capture a combination of local and global features that best represent each specific action type.
Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification
TLDR: A framework for modeling motion by exploiting the temporal structure of human activities is presented, which represents activities as temporal compositions of motion segments, and the algorithm is shown to perform better than other state-of-the-art methods.
Recognition of Visual Activities and Interactions by Stochastic Parsing
TLDR: A probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents is presented, and it is demonstrated how the system correctly interprets activities of multiple interacting objects.
Joint segmentation and classification of human actions in video
TLDR: This method is based on a discriminative temporal extension of the spatial bag-of-words model that has been popular in object recognition; classification is performed robustly within a multi-class SVM framework, while inference over the segments is done efficiently with dynamic programming.
Probabilistic event logic for interval-based event recognition
TLDR: This paper argues that holistic reasoning about the time intervals of events and their temporal constraints is critical in such domains to overcome the noise inherent in low-level video representations, and proposes a MAP inference algorithm for PEL that addresses the scalability of reasoning about an enormous number of time intervals and their constraints in a typical video.
Hidden Conditional Random Fields for Gesture Recognition
TLDR: This paper derives a discriminative sequence model with a hidden state structure, and demonstrates its utility both in a detection and in a multi-way classification formulation.