Learn More
In this paper, we present a home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor. Our contributions are two-fold: 1) We have created a publicly releasable human activity video database (i.e., named as RGBD-HuDaAct), which contains synchronized color-depth video streams,(More)
In this paper, we present an <i>automatic</i> web image mining system towards building a <i>universal</i> human age estimator based on facial information, which is applicable to all ethnic groups and various image qualities. First, a large (<391k) yet noisy human aging image dataset is crawled from the photo sharing website <i>Flickr</i> and <i>Google</i>(More)
The aim of this paper is to address the problem of recognizing human group activities in surveillance videos. This task has great potentials in practice, however was rarely studied due to the lack of benchmark database and the difficulties caused by large intra-class variations. Our contributions are two-fold. Firstly, we propose to encode the(More)
Along with the explosive growth of multimedia data, automatic multimedia tagging has attracted great interest of various research communities, such as computer vision, multimedia, and information retrieval. However, despite the great progress achieved in the past two decades, automatic tagging technologies still can hardly achieve satisfactory performance(More)
Recognizing complex human activities usually requires the detection and modeling of individual visual features and the interactions between them. Current methods only rely on the visual features extracted from 2-D images, and therefore often lead to unreliable salient visual feature detection and inaccurate modeling of the interaction context between(More)
—Automated scene analysis has been a topic of great interest in computer vision and cognitive science. Recently, with the growth of crowd phenomena in the real world, crowded scene analysis has attracted much attention. However, the visual occlusions and ambiguities in crowded scenes, as well as the complex behaviors and scene semantics, make the analysis a(More)
In this paper, we present a novel framework, called learning by propagability, for two essential data mining tasks, i.e., classification and regression. The whole learning process is driven by the philosophy that the data labels and the optimal feature representation jointly constitute a harmonic system, where the data labels are invariant with respect to(More)
Dense local trajectories have been successfully used in action recognition. However, for most actions only a few local motion features (e.g., critical movement of hand, arm, leg etc.) are responsible for the action label. Therefore, highlighting the local features which are associated with important motion parts will lead to a more discriminative action(More)