The problem of recognizing actions in realistic videos is challenging yet absorbing owing to its great potential in many practical applications. Most previous research is limited by its use of simplified action databases collected under controlled environments, or by its focus on excessively localized features without sufficiently encapsulating the spatio-temporal…
The human visual system observes and understands a scene/image by making a series of fixations. Every “fixation point” lies inside a particular region of arbitrary shape and size in the scene which can either be an object or just a part of it. We define as a basic segmentation problem the task of segmenting that region containing the…
Attention is an integral part of the human visual system and has been widely studied in the visual attention literature. Human eyes fixate on important locations in the scene, and every fixation point lies inside a particular region of arbitrary shape and size, which can either be an entire object or a part of it. Using that fixation point as an…
Traditional image stitching using parametric transforms such as homography produces perceptually correct composites only for planar scenes or parallax-free camera motion between source frames. This limits mosaicing to source images taken from the same physical location. In this paper, we introduce a smoothly varying affine stitching field which is…
Visual vocabulary construction is an integral part of the popular Bag-of-Features (BOF) model. When visual data scale up (in terms of the dimensionality of features and/or the number of samples), most existing algorithms (e.g., k-means) become impractical due to their prohibitive time and space requirements. In this paper we propose the random locality…
We present a method to recover a 3D texture-mapped architecture model from a single image. Both single-image-based modeling and architecture modeling are challenging problems. We handle these difficulties by employing constraints derived from shape symmetries, which are prevalent in architecture. We first present a novel algorithm to calibrate the camera…
Temporal texture accounts for a large proportion of motion commonly experienced in the visual world. Current temporal texture techniques extract primarily motion-based features for recognition. We propose a representation where both the spatial and the temporal aspects of texture are coupled together. Such a representation has the advantages of improving…
A recent line of research proposes employing Spectral Clustering (SC) to segment (group)\footnote{Throughout the paper, we use segmentation, clustering, and grouping, and their verb forms, interchangeably.} high-dimensional structural data, such as data (approximately) lying on subspaces\footnote{We follow~\cite{liu2010robust} and use the…