Corpus ID: 16469354

Team SRI-Sarnoff's AURORA System @ TRECVID 2011

@inproceedings{Cheng2011TeamSA,
  title={Team SRI-Sarnoff's AURORA System @ TRECVID 2011},
  author={Hui Cheng and Amir Tamrakar and Saad Ali and Qian Yu and Omar Javed and Jingen Liu and Ajay Divakaran and Harpreet S. Sawhney and Alexander G. Hauptmann and Mubarak Shah and Subhabrata Bhattacharya and M. Witbrock and Jon Curtis and Gerald Friedland and Robert Mertens and Trevor Darrell and R. Manmatha and James Allan},
  booktitle={TRECVID},
  year={2011}
}
In this paper, we present results from the experimental evaluation for the TRECVID 2011 MED11 (Multimedia Event Detection) task as part of Team SRI-Sarnoff's AURORA system, which is being developed under the IARPA ALADDIN Program. Our approach employs two classes of content descriptions for describing videos depicting diverse events: (1) low-level features and their aggregates, and (2) semantic concepts that capture scenes, objects, and atomic actions that are local in space-time. In this presentation…
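As a rough illustration of the two description classes mentioned in the abstract, the sketch below combines a bag-of-words aggregate of low-level descriptors with per-video semantic concept scores and trains a per-event SVM detector. It is a minimal stand-in, not the AURORA implementation: Python with NumPy and scikit-learn is assumed, and every name and data value is a synthetic placeholder.

# Minimal sketch (not the AURORA system): fuse (1) an aggregate of
# low-level descriptors with (2) semantic concept scores, then train a
# per-event SVM detector. All inputs below are synthetic placeholders.
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def bag_of_words(descriptors, codebook):
    """Aggregate local low-level descriptors into an L1-normalized
    histogram over a visual codebook (one form of 'feature aggregate')."""
    # assign each descriptor to its nearest codeword
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return normalize(hist[None, :], norm="l1")[0]

# synthetic stand-ins for a small video collection
n_videos, n_concepts = 40, 20
codebook = rng.normal(size=(64, 128))               # visual vocabulary
y = rng.integers(0, 2, size=n_videos)               # event present / absent
X = []
for label in y:
    descriptors = rng.normal(loc=float(label), size=(200, 128))  # local features
    low_level = bag_of_words(descriptors, codebook)              # description class (1)
    concept_scores = rng.random(n_concepts) + 0.5 * label        # description class (2)
    X.append(np.concatenate([low_level, concept_scores]))        # early fusion
X = np.asarray(X)

clf = LinearSVC(C=1.0).fit(X, y)                    # per-event detector
print("training accuracy:", clf.score(X, y))

Concatenation (early fusion) is only one option; training separate classifiers on each description class and fusing their scores (late fusion) is an equally plausible arrangement for this kind of system.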

Citations

High-level event recognition in unconstrained videos
TLDR: While the existing solutions vary, common key modules are identified and detailed descriptions along with some insights for each are provided, including extraction and representation of low-level features across different modalities, classification strategies, fusion techniques, etc.
Recognition of complex events in open-source web-scale videos: a bottom up approach
TLDR: This symposium proposal presents a systematic decomposition of complex events into hierarchical components and makes an in-depth analysis of how existing research is being used to cater to various levels of this hierarchy.
Complex Event Recognition Using Constrained Rank Optimization
TLDR: This chapter discusses a low-rank formulation, which combines the precisely annotated videos used to train the concepts with the rich concept scores, and demonstrates that the approach consistently improves the discriminativity of the concept scores by a significant margin.
Recognition of Complex Events in Open-source Web-scale Videos: Features, Intermediate Representations and Their Temporal Interactions
TLDR: This dissertation presents a systematic decomposition of complex events into hierarchical components, makes an in-depth analysis of how existing research is being used to cater to various levels of this hierarchy, and identifies three key stages where it makes novel contributions, keeping complex events in focus.
Action recognition by graph embedding and temporal classifiers
TLDR: A novel framework is created for selecting a set of prototypes from a labelled graph set while taking class discrimination into account, and experimental results show that such a discriminative prototype selection framework achieves superior results compared to other well-established prototype selection approaches, not only for human action recognition but also for the classification of various structured data.
Exploiting probabilistic relationships between action concepts for complex event classification
TLDR: A probabilistic framework is proposed that models the conditional relationships between concepts and events, together with an approximate yet tractable solution for inferring the posterior distribution used to perform event classification (a generic sketch of this style of concept-to-event inference follows this citation list).
Representing and Retrieving Video Shots in Human-Centric Brain Imaging Space
TLDR: This paper investigates a novel methodology of representing and retrieving video shots using human-centric high-level features derived in brain imaging space (BIS), where brain responses to the natural stimulus of video watching can be explored and interpreted.
Research Statement
TLDR: This research attempts to address some of the sub-problems that are crucial in the context of complex event recognition.
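
To make the concept-to-event inference idea referenced above concrete, the following generic example (not the cited paper's model; all probabilities are invented for illustration) combines binary action-concept detections under a naive conditional-independence assumption to obtain a posterior over events.

# Generic illustration: event posterior from noisy concept detections,
# assuming concepts are conditionally independent given the event.
import numpy as np

events = ["birthday_party", "changing_tire"]
concepts = ["singing", "blowing_candles", "jacking_car", "turning_wrench"]

# P(concept detected | event) -- illustrative values only
likelihood = np.array([
    [0.80, 0.70, 0.05, 0.05],   # birthday_party
    [0.05, 0.02, 0.70, 0.80],   # changing_tire
])
prior = np.array([0.5, 0.5])    # uniform prior over events

def event_posterior(detections):
    """detections[c] = 1 if concept c fired in the video, else 0."""
    per_concept = np.where(detections, likelihood, 1.0 - likelihood)
    joint = prior * per_concept.prod(axis=1)
    return joint / joint.sum()

post = event_posterior(np.array([1, 1, 0, 0]))   # singing + blowing_candles fired
for event, p in zip(events, post):
    print(f"P({event} | detections) = {p:.3f}")

Richer models along these lines represent the conditional relationships among concepts and events directly rather than assuming independence, which is where approximate but tractable inference becomes necessary.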
