Recognizing action at a distance
A novel motion descriptor based on optical flow measurements in a spatiotemporal volume for each stabilized human figure is introduced, along with an associated similarity measure for use in a nearest-neighbor framework.
End-to-End Learning of Action Detection from Frame Glimpses in Videos
A fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions and uses REINFORCE to learn the agent's decision policy.
A Hierarchical Deep Temporal Model for Group Activity Recognition
A 2-stage deep temporal model is presented: one LSTM model is designed to represent the action dynamics of individual people in a sequence, and another LSTM model is designed to aggregate person-level information for whole activity understanding.
Action recognition by learning mid-level motion features
  • A. Fathi, Greg Mori
  • Computer Science
  • IEEE Conference on Computer Vision and Pattern…
  • 23 June 2008
A method is developed for constructing mid-level motion features that are built from low-level optical flow information, tuned to discriminate between different classes of action, and efficient to compute at run-time.
Discriminative figure-centric models for joint action localization and recognition
This paper develops an algorithm for action recognition and localization in videos that does not require reliable human detection and tracking as input and uses a figure-centric visual word representation.
Discriminative Latent Models for Recognizing Contextual Group Activities
This paper proposes a novel framework for recognizing group activities that jointly captures the group activity, the individual person actions, and the interactions among them, and introduces a new feature representation called the action context (AC) descriptor.
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
A novel variant of long short-term memory deep networks is defined for modeling temporal relations between actions via multiple input and output connections, and it is shown that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
Recognizing objects in adversarial clutter: breaking a visual CAPTCHA
Efficient methods based on shape context matching are developed that can identify the word in an EZ-Gimpy image with a success rate of 92%, and the requisite 3 words in a Gimpy image 33% of the time.
Similarity-Preserving Knowledge Distillation
This paper proposes a new form of knowledge distillation loss that is inspired by the observation that semantically similar inputs tend to elicit similar activation patterns in a trained network.
A Discriminative Latent Model of Object Classes and Attributes
This work presents a discriminatively trained model for joint modelling of object class labels and their visual attributes, and captures the correlations among attributes using an undirected graphical model built from training data.