• Publications
  • Influence
Learning realistic human actions from movies
tl;dr
We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space- time pyramids and multi-channel non-linear SVMs to the spatiotemporal domain. Expand
  • 3,359
  • 573
  • Open Access
A Spatio-Temporal Descriptor Based on 3D-Gradients
tl;dr
In this work, we present a novel local descriptor for video sequences. Expand
  • 1,704
  • 186
  • Open Access
Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study
tl;dr
This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier based on two effective measures for comparing distributions, the Earth Mover’s Distance and the χ2 distance. Expand
  • 2,087
  • 146
  • Open Access
Actions in context
tl;dr
This paper exploits the context of natural dynamic scenes for human action recognition in video. The contribution of this paper is three-fold: (a) we automatically discover relevant scene classes and their correlation with human actions, (b) we show how to learn selected scene classes from video without manual supervision and (c) we develop a joint scene-action SVM-based classifier. Expand
  • 863
  • 122
Actions in context
tl;dr
We use movie scripts for automatic video annotation and apply text mining to discover scene classes which co-occur with given actions. Expand
  • 451
  • 67
  • Open Access
High Five: Recognising human interactions in TV shows
tl;dr
We develop a per-person descriptor that uses attention (head orientation) and the local spatial and temporal context in a neighbourhood of each detected person in a frame to improve upon the initial classification obtained with our descriptor. Expand
  • 157
  • 33
  • Open Access
Structured Learning of Human Interactions in TV Shows
tl;dr
The objective of this work is recognition and spatiotemporal localization of two-person interactions in TV video such as sitcoms and dramas. Expand
  • 142
  • 29
  • Open Access
Semantic Hierarchies for Visual Object Recognition
tl;dr
In this paper we propose to use lexical semantic networks to extend the state-of-the-art object recognition techniques. Expand
  • 293
  • 17
  • Open Access
Constructing Category Hierarchies for Visual Recognition
Class hierarchies are commonly used to reduce the complexity of the classification problem. This is crucial when dealing with a large number of categories. In this work, we evaluate class hierarchiesExpand
  • 171
  • 16
  • Open Access
Local Features and Kernels for Classification of Texture and Object Categories: An In-Depth Study
Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images asExpand
  • 185
  • 13
  • Open Access