Zhen-Zhong Lan

Learn More
Automatic video segmentation and action recognition has been a long-standing problem in computer vision. Much work in the literature treats video segmentation and action recognition as two independent problems; while segmentation is often done without a temporal model of the activity, action recognition is usually performed on pre-segmented clips. In this(More)
Most state-of-the-art action feature extractors involve differential operators, which act as highpass filters and tend to attenuate low frequency action information. This attenuation introduces bias to the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature enhancing technique that encodes(More)
Self-paced learning (SPL) is a recently proposed learning regime inspired by the learning process of humans and animals that gradually incorporates easy to more complex samples into training. Existing methods are limited in that they ignore an important aspect in learning: diversity. To incorporate this information, we propose an approach called self-paced(More)
Multimedia Event Detection is a multimedia retrieval task with the goal of finding videos of a particular event in an internet video archive, given example videos and descriptions. We focus here on mining features of example videos to learn the most characteristic features, which requires a combination of multiple complementary types of features. Generally,(More)
Viral videos that gain popularity through the process of Internet sharing are having a profound impact on society. Existing studies on viral videos have only been on small or confidential datasets. We collect by far the largest open benchmark for viral video study called CMU Viral Video Dataset, and share it with researchers from both academia and industry.(More)
We address the problem of action recognition in unconstrained videos. We propose a novel content driven pooling that leverages space-time context while being robust toward global space-time transformations. Being robust to such transformations is of primary importance in unconstrained videos where the action localizations can drastically shift between(More)
Inductive transfer learning and semi-supervised learning are two different branches of machine learning. The former tries to reuse knowledge in labeled out-of-domain instances while the later attempts to exploit the usefulness of unlabeled in-domain instances. In this paper, we bridge the two branches by pointing out that many semi-supervised learning(More)
We address the problem of action recognition in unconstrained videos. We propose a novel content driven pooling that leverages space-time context while being robust toward global space-time transformations. Being robust to such transformations is of primary importance in unconstrained videos where the action localizations can drastically shift between(More)
Multimedia Event Detection(MED) is a multimedia retrieval task with the goal of finding videos of a particular event in video archives, given example videos and event descriptions; different from MED, multimedia classification is a task that classifies given videos into specified classes. Both tasks require mining features of example videos to learn the(More)
Traditional feature selection algorithms require a large number of labeled training instances to find out the most informative subset of features. However, in many real-world applications, the labeled data are often difficult, expensive or time-consuming to obtain. Recently, several semi-supervised feature selection algorithms were proposed, which aim at(More)