Qingming Huang

Learn More
The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are 2 Authors Suppressed Due to Excessive Length presented. The number of tested trackers makes VOT 2014 the largest benchmark on short-term tracking to date. For(More)
In this paper, we propose a new computational model for visual saliency derived from the information maximization principle. The model is inspired by a few well acknowledged biological facts. To compute the saliency spots of an image, the model first extracts a number of sub-band feature maps using learned sparse codes. It adopts a fully-connected graph(More)
Text in images and video frames carries important information for visual content understanding and retrieval. In this paper, by using multiscale wavelet features, we propose a novel coarse-to-fine algorithm that is able to locate text lines even under complex background. First, in the coarse detection, after the wavelet energy feature is calculated to(More)
Improving human action recognition in videos is restricted by the inherent limitations of the visual data. In this paper, we take the depth information into consideration and construct a novel dataset of human daily actions. The proposed ACT4 dataset provides synchronized data from 4 views and 2 sources, aiming to facilitate the research of action analysis(More)
The Bag-of-visual Words (BoW) image representation has been applied for various problems in the fields of multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to the words in texts. However, massive experiments show that the commonly used visual(More)
Moving object segmentation in compressed domain plays an important role in many real-time applications, e.g. video indexing, video transcoding, video surveillance, etc. Because H.264/AVC is the up-to-date video-coding standard, few literatures have been reported in the area of video analysis on H.264/AVC compressed video. Compared with the former MPEG(More)
In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, as features from a certain CNN layer characterize an object of interest from only one aspect or one level, the performance of such trackers trained with features from one layer (usually(More)
Not withstanding its great success and wide adoption in Bag-of-visual Words representation, visual vocabulary created from single image local features is often shown to be ineffective largely due to three reasons. First, many detected local features are not stable enough, resulting in many noisy and non-descriptive visual words in images. Second, single(More)