Amir Tamrakar

Low-level appearance as well as spatio-temporal features, appropriately quantized and aggregated into Bag-of-Words (BoW) descriptors, have been shown to be effective in many detection and recognition tasks. However, their efficacy for complex event recognition in unconstrained videos has not been systematically evaluated. In this paper, we use the NIST…
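The quantize-and-aggregate step mentioned above can be sketched as follows. This is a minimal, generic illustration and not the paper's pipeline: it assumes a codebook learned offline (for example by k-means), hard assignment of each local descriptor to its nearest codeword by Euclidean distance, and L1 normalization of the resulting histogram. The function names `quantize` and `bow_histogram` are hypothetical.

```python
import math
from typing import List, Sequence


def quantize(descriptor: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Return the index of the codeword nearest (Euclidean) to the descriptor."""
    best_idx, best_dist = 0, math.inf
    for idx, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bow_histogram(descriptors: List[Sequence[float]],
                  codebook: List[Sequence[float]]) -> List[float]:
    """Hard-assign local descriptors to codewords; return an L1-normalized histogram.

    Assumption: hard assignment and L1 normalization; soft assignment or
    other normalizations are common alternatives.
    """
    counts = [0.0] * len(codebook)
    for desc in descriptors:
        counts[quantize(desc, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(bow_histogram(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

Each video is thus summarized by a fixed-length vector regardless of how many local features it contains, which is what lets BoW descriptors feed a standard classifier.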
We propose to use action, scene and object concepts as semantic attributes for classification of video events in in-the-wild content, such as YouTube videos. We model events using a variety of complementary semantic attribute features developed in a semantic concept space. Our contribution is to systematically demonstrate the advantages of this concept-based…
We present a framework for extracting image contours based on geometric and structural consistency among edge element locations and orientations. The paper presents two contributions. First, we observe that while the traditional edge orientation operators are based on first-order derivatives, orientation as the tangent of a localized curve requires third-order…
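For contrast, the traditional first-order approach referred to above can be sketched in a few lines: estimate the image gradient with central differences and take the edge (tangent) orientation as perpendicular to the gradient direction. This is only an illustration of the conventional operator, not the paper's third-order method; the function names and the choice of central differences (rather than, say, Sobel or Gaussian-derivative filters) are assumptions.

```python
import math
from typing import List


def gradient_orientation(img: List[List[float]], r: int, c: int) -> float:
    """First-order gradient direction (radians) at interior pixel (r, c).

    Uses central differences; rows increase downward (image convention).
    """
    gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
    gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return math.atan2(gy, gx)


def edge_tangent_orientation(img: List[List[float]], r: int, c: int) -> float:
    """Edge tangent orientation in [0, pi): perpendicular to the gradient."""
    return (gradient_orientation(img, r, c) + math.pi / 2) % math.pi


# A vertical step edge: intensity changes along columns only.
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(edge_tangent_orientation(img, 1, 1))  # pi/2, i.e. a vertical tangent
```

Because this estimate depends only on first derivatives at a single point, it ignores how the curve bends nearby, which is the limitation the abstract's observation addresses.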
Shape is an important cue for generic object recognition but can be insufficient without other cues such as object appearance. We explore a number of ways in which the geometric aspects of an object can be augmented with its appearance. The main idea is to construct a dense correspondence between the interior regions of two shapes based on a shape-based…
In this paper, we describe the evaluation results for the TRECVID 2012 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks as part of the SRI-Sarnoff AURORA system developed under the IARPA ALADDIN program. In the AURORA system, we incorporated various low-level features that capture color, appearance, motion, and audio information in…
We present a novel representation of images based on a decomposition into atomic patches which we call medial visual fragments and which is particularly suited for structural grouping. Specifically, we show that the medial axis/shock graph of a contour map partitions the image domain into non-overlapping regions, which together with the image information…
A key challenge underlying theories of vision is how the spatially restricted, retinotopically represented feature computations can be integrated to form abstract, coordinate-free object models. A resolution likely depends on the use of intermediate-level representations which can on the one hand be populated by local features and on the other hand be used…
In this paper, we present results from the experimental evaluation for the TRECVID 2011 MED11 (Multimedia Event Detection) task as part of Team SRI-Sarnoff’s AURORA system being developed under the IARPA ALADDIN Program. Our approach employs two classes of content descriptions for describing videos depicting diverse events: (1) low-level features and…
We present a novel approach to computational modeling of social interactions based on modeling of essential social interaction predicates (ESIPs) such as joint attention and entrainment. Based on sound social psychological theory and methodology, we collect a new “Tower Game” dataset consisting of audio-visual capture of dyadic interactions labeled with the…
We introduce the Tower Game Dataset for computational modeling of social interaction predicates. Existing research in affective computing has focused primarily on recognizing the emotional and mental state of a human based on external behaviors. Recent research in the social science community argues that engaged and sustained social interactions require the…