Actions ~ Transformations

@inproceedings{Wang2016ActionsT,
  title={Actions ~ Transformations},
  author={X. Wang and Ali Farhadi and Abhinav Gupta},
  booktitle={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
  pages={2658-2667}
}
What defines an action like "kicking ball"? [...] Motivated by recent advancements in video representation using deep learning, we design a Siamese network that models the action as a transformation on a high-level feature space. We show that our model gives improvements on standard action recognition datasets, including UCF101 and HMDB51. More importantly, our approach is able to generalize beyond learned action categories and shows significant performance improvement on cross-category […]
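The abstract's central idea — modeling an action as a transformation that maps a "precondition" state to an "effect" state in feature space — can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the per-action transformation matrices and the classification rule shown here are hypothetical stand-ins (in the paper they are learned jointly with a Siamese video encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 128     # dimensionality of the high-level feature space
NUM_ACTIONS = 4    # number of action categories (toy setting)

# Hypothetical per-action linear transformations T_a: precondition -> effect.
transforms = rng.standard_normal((NUM_ACTIONS, FEAT_DIM, FEAT_DIM)) / np.sqrt(FEAT_DIM)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def score_action(precondition_feat, effect_feat, action_id):
    """Score how well an action explains the observed state change:
    apply its transformation to the precondition embedding and
    compare the result to the effect embedding."""
    predicted_effect = transforms[action_id] @ precondition_feat
    return cosine(predicted_effect, effect_feat)

def classify(precondition_feat, effect_feat):
    """Pick the action whose transformation best maps precondition to effect."""
    scores = [score_action(precondition_feat, effect_feat, a)
              for a in range(NUM_ACTIONS)]
    return int(np.argmax(scores))

# Toy check: build an effect consistent with action 2 and recover it.
pre = rng.standard_normal(FEAT_DIM)
eff = transforms[2] @ pre
print(classify(pre, eff))  # -> 2
```

Because recognition reduces to "which transformation explains the state change", the model can in principle score transformations for categories it was not trained on, which is the cross-category generalization the abstract highlights.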
Action Recognition Based on Discriminative Embedding of Actions Using Siamese Networks
This paper trains a Siamese deep neural network with a contrastive loss on the low-dimensional representation of a pool of attributes learned in a universal Gaussian mixture model using factor analysis to classify actions by leveraging the corresponding class labels.
Asynchronous Temporal Fields for Action Recognition
This work proposes a fully-connected temporal CRF model for reasoning over various aspects of activities that include objects, actions, and intentions, where the potentials are predicted by a deep network.
Am I Done? Predicting Action Progress in Videos
A novel approach is introduced, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution, based on a combination of the Faster R-CNN framework and LSTM networks.
Encouraging LSTMs to Anticipate Actions Very Early
A new action anticipation method achieves high prediction accuracy even when only a very small percentage of a video sequence has been observed; it develops a multi-stage LSTM architecture that leverages context-aware and action-aware features, and introduces a novel loss function that encourages the model to predict the correct class as early as possible.
Procedural Generation of Videos to Train Deep Action Recognition Networks
This work proposes an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines, and generates a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for Procedural Human Action Videos.
Explainable Video Action Reasoning via Prior Knowledge and State Transitions
A novel action reasoning framework uses prior knowledge to explain semantic-level observations of video state changes; it can detect actions and explain how they are executed, much like the logical manner of human thinking.
Representation Learning for Action Recognition
The objective of this research work is to develop discriminative representations for human actions by combining the advantages of both low-level and high-level features; it demonstrates the efficacy of sparse representation in the identification of the human body under rapid and substantial deformation.
Pose from Action: Unsupervised Learning of Pose Features based on Motion
An unsupervised method is proposed to learn pose features from videos by exploiting a signal that is complementary to appearance and can be used as supervision: motion.
Joint Discovery of Object States and Manipulation Actions
This work proposes a joint model that learns to identify object states and to localize state-modifying actions and demonstrates successful discovery of seven manipulation actions and corresponding object states on a new dataset of videos depicting real-life object manipulations.
Long-Term Temporal Convolutions for Action Recognition
It is demonstrated that LTC-CNN models with increased temporal extents improve the accuracy of action recognition; the work also studies the impact of different low-level representations, such as raw video pixel values and optical flow vector fields, and shows the importance of high-quality optical flow estimation for learning accurate action models.

References

Showing 1–10 of 70 references
Modeling Actions through State Changes
  • A. Fathi, James M. Rehg
  • Computer Science
  • 2013 IEEE Conference on Computer Vision and Pattern Recognition
  • 2013
This paper proposes a weakly supervised method for learning the object and material states that are necessary for recognizing daily actions and demonstrates that this method can be used to segment discrete actions from a continuous video of an activity.
Finding action tubes
This work addresses the problem of action detection in videos using rich feature hierarchies derived from shape and kinematic cues and extracts spatio-temporal feature representations to build strong classifiers using Convolutional Neural Networks.
Trajectory-Based Modeling of Human Actions with Motion Reference Points
This paper proposes a simple representation specifically aimed at the modeling of human action recognition in videos that operates on top of visual codewords derived from local patch trajectories, and therefore does not require accurate foreground-background separation, which is typically a necessary step to model object relationships.
Learning realistic human actions from movies
A new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids, and multi-channel non-linear SVMs is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
Action Recognition by Hierarchical Mid-Level Action Elements
This work introduces an unsupervised method that is capable of distinguishing action-related segments from background segments and representing actions at multiple spatiotemporal resolutions, and develops structured models that capture a rich set of spatial, temporal and hierarchical relations among the segments.
Action Recognition with Actons
A two-layer structure for action recognition to automatically exploit a mid-level "acton" representation via a new max-margin multi-channel multiple instance learning framework, which yields the state-of-the-art classification performance on the YouTube and HMDB51 datasets.
Better Exploiting Motion for Better Action Recognition
It is established that adequately decomposing visual motion into dominant and residual motions, both in the extraction of the space-time trajectories and for the computation of descriptors, significantly improves action recognition algorithms.
Modeling video evolution for action recognition
The proposed method to capture video-wide temporal information for action recognition postulates that a function capable of ordering the frames of a video temporally captures well the evolution of the appearance within the video.
Action Recognition by Hierarchical Sequence Summarization
This work presents a hierarchical sequence summarization approach for action recognition that learns multiple layers of discriminative feature representations at different temporal granularities and shows that its complexity grows sub-linearly with the size of the hierarchy.
Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin
  • Yang Wang, Greg Mori
  • Computer Science, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2011
This work presents a discriminative part-based approach for human action recognition from video sequences using motion features, based on the recently proposed hidden conditional random field (HCRF) for object recognition, and demonstrates that MMHCRF outperforms HCRF in human action recognition.