In this paper, we present a home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor. Our contributions are two-fold: 1) We have created a publicly releasable human activity video database (named RGBD-HuDaAct), which contains synchronized color-depth video streams, ...
Modern visual classification models generally include a feature pooling step, which aggregates local features over the region of interest into a statistic through a spatial pooling operation. Two commonly used operations are average pooling and max pooling. However, recent theoretical analysis has indicated that neither of these two pooling techniques ...
In this paper, we propose an intelligent photography system, which automatically and professionally generates/recommends user-favorite photo(s) from a wide view or a continuous view sequence. This task is quite challenging given that the evaluation of photo quality is under-determined and usually subjective. Motivated by the recent prevalence of online ...
The Convolutional Neural Network (CNN) has demonstrated promising performance in single-label image classification tasks. However, how a CNN best copes with multi-label images remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, ...
The aim of this paper is to address the problem of recognizing human group activities in surveillance videos. This task has great practical potential, but it has rarely been studied due to the lack of a benchmark database and the difficulties caused by large intra-class variations. Our contributions are two-fold. Firstly, we propose to encode the ...
In this paper, we present an automatic web image mining system towards building a universal human age estimator based on facial information, which is applicable to all ethnic groups and various image qualities. First, a large (<391k) yet noisy human aging image dataset is crawled from the photo sharing website Flickr and Google ...
In this paper, we present an automatic web image and video mining framework with the ultimate goal of building a universal human age estimator based on facial information, which is applicable to all ethnic groups and various image qualities. On the one hand, a large (391 k) yet noisy human aging image database is collected from Flickr and Google Image using a ...
We investigate the feature design and classification architectures in temporal action localization. This application focuses on detecting and labeling actions in untrimmed videos, which is more challenging than classifying presegmented videos. The major difficulty for action localization is the uncertainty of action occurrence and utilization of ...
We address the person re-identification problem by effectively exploiting a globally discriminative feature representation from a sequence of tracked human regions/patches. This is in contrast to previous person re-id works, which rely on either single frame based person to person patch matching, or graph based sequence to sequence matching. We show that a ...