Learning realistic human actions from movies

@inproceedings{Laptev2008LearningRH,
  title={Learning realistic human actions from movies},
  author={Ivan Laptev and Marcin Marszalek and Cordelia Schmid and Benjamin Rozenfeld},
  booktitle={2008 IEEE Conference on Computer Vision and Pattern Recognition},
  year={2008},
  pages={1--8}
}
The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. [...] We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video.
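As a rough illustration of what a text-based classifier for retrieving action samples from scripts might look like, the sketch below trains a bag-of-words perceptron on script sentences labeled as describing an action or not. The abstract does not specify the classifier or its features, so the perceptron, the function names (`tokenize`, `train_perceptron`, `predict`), and the toy sentences are all assumptions made for this example, not the authors' method.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase a script line and split it into alphabetic word tokens."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def train_perceptron(samples, epochs=20):
    """Train a binary bag-of-words perceptron.

    samples: list of (text, label) pairs, label +1 (action present) or -1.
    Returns a (weights, bias) pair.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            tokens = tokenize(text)
            score = bias + sum(weights[t] for t in tokens)
            if label * score <= 0:  # misclassified: move toward the label
                for t in tokens:
                    weights[t] += label
                bias += label
    return weights, bias


def predict(model, text):
    """Return +1 if the sentence is predicted to describe the action, else -1."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if score > 0 else -1


# Hypothetical script sentences for a "sit down" action class.
train = [
    ("Rick sits down at the table", 1),
    ("Ilsa walks to the door", -1),
    ("She sits on the sofa", 1),
    ("He opens the car door", -1),
]
model = train_perceptron(train)
print(predict(model, "Sam sits down at the piano"))
print(predict(model, "She opens the door"))
```

Sentences classified as positive would then point to the time-aligned video segments used as training samples for the visual classifier. A learned classifier of this kind can generalize over phrasing in a way that a fixed keyword match cannot.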
Automatic annotation of human actions in video
TLDR
This paper addresses the problem of automatic temporal annotation of realistic human actions in video using minimal manual supervision with a kernel-based discriminative clustering algorithm that locates actions in the weakly-labeled training data.
Learning human actions in video
TLDR
This dissertation develops state-of-the-art feature extraction algorithms that robustly encode video information for both action classification and action localization on realistic video data, and proposes two new approaches to describe local features in videos.
Action recognition by exploring data distribution and feature correlation
TLDR
This work proposes an automatic video annotation algorithm by integrating semi-supervised learning and shared structure analysis into a joint framework for human action recognition, and demonstrates that the proposed algorithm outperforms the compared algorithms for action recognition when it has only a few labeled samples.
Transfer Learning for Human Action Recognition
TLDR
This paper proposes a framework that transfers the knowledge about concepts from a previously labeled still image database to the target action video database, and indicates that it is indeed possible to enhance action recognition with the transferred knowledge of even a few concepts.
Action Recognition in Realistic Sports Videos
TLDR
This chapter provides a detailed study of the prominent methods devised for action localization and recognition in videos and argues that performing the recognition on temporally untrimmed videos and attempting to describe an action, instead of conducting a forced-choice classification, are essential for analyzing the human actions in a realistic environment.
Web-Based Classifiers for Human Action Recognition
TLDR
The idea is to use images collected from the Web to learn representations of actions and leverage this knowledge to automatically annotate actions in videos, and to use “ordered pose pairs” (OPP) for encoding the temporal ordering of poses in the action model.
Chapter 9 Action Recognition in Realistic Sports Videos
The ability to analyze the actions which occur in a video is essential for automatic understanding of sports. Action localization and recognition in videos are two main research topics in this
Investigating the impact of frame rate towards robust human action recognition
TLDR
Promising results indicate that well designed key-frame selection methods can produce a set of representative frames and eventually reduce the impact of frame rate on the performance of human action recognition.
Investigating time-sensitive topic model approaches for action recognition
In this paper, we present several attempts at using topic models for action recognition in videos. We show that time-sensitive topic models help recognizing actions when little training data
Action Recognition in Videos
We construct a fully automatic large-scale system for visual retrieval of realistic action samples of different human action classes from TV-series and movies. We first propose a text-driven approach

References

SHOWING 1-10 OF 34 REFERENCES
Retrieving actions in movies
  • I. Laptev, P. Pérez
  • Computer Science
    2007 IEEE 11th International Conference on Computer Vision
  • 2007
TLDR
A new annotated human action dataset is introduced and a new "keyframe priming" that combines discriminative models of human motion and shape within an action is shown to significantly improve the performance of action detection.
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
TLDR
The approach is not only able to classify different actions, but also to localize different actions simultaneously in a novel and complex video sequence.
Learning Motion Categories using both Semantic and Structural Information
TLDR
A novel generative model is presented, which extends probabilistic latent semantic analysis (pLSA), to capture both semantic and structural information for motion category recognition, and is shown to be better than existing unsupervised methods in both tasks of motion localisation and recognition.
Human action recognition with line and flow histograms
TLDR
A new shape descriptor based on the distribution of lines which are fitted to boundaries of human figures is introduced by using an entropy-based approach to densify the feature representation, thus, minimizing classification time without degrading accuracy.
OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning
  • Li-Jia Li, Li Fei-Fei
  • Computer Science
    2007 IEEE Conference on Computer Vision and Pattern Recognition
  • 2007
TLDR
This paper presents a novel object recognition algorithm that performs automatic dataset collecting and incremental model learning simultaneously, and adapts a non-parametric latent topic model and proposes an incremental learning framework.
Detecting People Using Mutually Consistent Poselet Activations
TLDR
A new algorithm for detecting people using poselets is developed which uses only 2D annotations which are much easier for naive human annotators and is the current best performer on the task of people detection and segmentation.
A Biologically Inspired System for Action Recognition
TLDR
The approach builds on recent work on object recognition based on hierarchical feedforward architectures and extends a neurobiological model of motion processing in the visual cortex and finds that sparse features in intermediate stages outperform dense ones and that using a simple feature selection approach leads to an efficient system that performs better with far fewer features.
Reliable Transition Detection in Videos: A Survey and Practitioner's Guide
TLDR
This survey emphasizes those different core concepts underlying the different detection schemes for the three most widely used video transition effects: hard cuts, fades and dissolves.
Names and faces in the news
  • Tamara L. Berg, A. Berg, +5 authors D. Forsyth
  • Computer Science
    Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
  • 2004
TLDR
It is shown quite good face clustering is possible for a dataset of inaccurately and ambiguously labelled face images, obtained by applying a face finder to approximately half a million captioned news images.
Behavior recognition via sparse spatio-temporal features
TLDR
It is shown that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and an alternative is proposed, and a recognition algorithm based on spatio-temporally windowed data is devised.