Recognizing realistic actions from videos “in the wild”
TLDR
This paper presents a systematic framework for recognizing realistic actions from videos “in the wild”. It uses motion statistics to acquire stable motion features and clean static features, and uses PageRank to mine the most informative static features.
Learning multi-label scene classification
TLDR
A framework is presented for semantic scene classification, where a natural scene may contain multiple objects and can therefore be described by multiple class labels; the framework appears to generalize to other classification problems of the same nature.
DOTA: A Large-Scale Dataset for Object Detection in Aerial Images
TLDR
A large-scale Dataset for Object deTection in Aerial images (DOTA) is introduced and state-of-the-art object detection algorithms are evaluated on it, demonstrating that DOTA represents real Earth Vision applications well and is quite challenging.
Image Captioning with Semantic Attention
TLDR
This paper proposes a new algorithm that combines top-down and bottom-up approaches to natural language description through a model of semantic attention, and significantly outperforms the state-of-the-art approaches consistently across different evaluation metrics.
Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition
TLDR
This paper proposes a novel part learning approach based on a multi-attention convolutional neural network (MA-CNN), in which part generation and feature learning can reinforce each other, and shows the best performance on three challenging published fine-grained datasets.
iCoseg: Interactive co-segmentation with intelligent scribble guidance
TLDR
iCoseg, an automatic recommendation system that intelligently recommends where the user should scribble next, is proposed, and users following these recommendations can achieve good quality cutouts with significantly lower time and effort than exhaustively examining all cutouts.
Visual event recognition in videos by learning from web data
TLDR
This paper proposes a new aligned space-time pyramid matching method to measure the distance between two video clips, and a cross-domain learning method that learns an adapted classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between data distributions from two domains.
Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks
TLDR
The proposed CNN can achieve better performance in image sentiment analysis than competing algorithms and is able to improve the performance on Twitter images by inducing domain transfer with a small number of manually labeled Twitter images.
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark
TLDR
A new data set is introduced to encourage further research on visual emotion analysis; it started from 3+ million weakly labeled images of different emotions and ended up 30 times as large as the current largest publicly available visual emotion data set.
Towards Scalable Summarization of Consumer Videos Via Sparse Dictionary Selection
TLDR
This work formulates video summarization as a novel dictionary selection problem using sparsity consistency, where a dictionary of key frames is selected such that the original video can be best reconstructed from this representative dictionary.