Yueting Zhuang

Learn More
Key frame extraction has been recognized as one of the important research issues in video information retrieval. Although progress has been made in key frame extraction, the existing approaches are either computationally expensive or ine ective in capturing salient visual content. In this paper, we rst discuss the importance of key frame selection; and then(More)
In this paper, we propose a new image clustering algorithm, referred to as clustering using local discriminant models and global integration (LDMGI). To deal with the data points sampled from a nonlinear manifold, for each data point, we construct a local clique comprising this data point and its neighboring data points. Inspired by the Fisher criterion, we(More)
We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to(More)
Recently, deep learning approach, especially deep Convolutional Neural Networks (ConvNets), have achieved overwhelming accuracy with fast processing speed for image classification. Incorporating temporal structure with deep ConvNets for video representation becomes a fundamental problem for video content analysis. In this paper, we propose a new approach,(More)
A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner. In this paper, we propose a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (whole saliency maps). In principle, the proposed saliency(More)
A better similarity mapping function across heterogeneous high-dimensional features is very desirable for many applications involving multi-modal data. In this paper, we introduce coupled dictionary learning (DL) into supervised sparse coding for multi-modal (crossmedia) retrieval. We call this Supervised coupleddictionary learning with group structures for(More)
Rich multimedia content including images, audio and text are frequently used to describe the same semantics in E-Learning and Ebusiness web pages, instructive slides, multimedia cyclopedias, and so on. In this paper, we present a framework for cross-media retrieval, where the query example and the retrieved result(s) can be of different media types. We(More)
To reveal and leverage the correlated and complemental information between different views, a great amount of multi-view learning algorithms have been proposed in recent years. However, unsupervised feature selection in multiview learning is still a challenge due to lack of data labels that could be utilized to select the discriminative features. Moreover,(More)
The most commonly used method for content-based video retrieval is query by example. But the definition of video similarity brings great obstacle to further research. This paper puts forward a new approach to solve the difficulty. Firstly, it advances centroid feature vector of shot in order to reduce the storage of video database. Secondly, considering all(More)