Tag ranking
TLDR
This paper proposes a tag ranking scheme that automatically ranks the tags associated with a given image according to their relevance to the image content, and applies tag ranking to three applications: tag-based image search, tag recommendation, and group recommendation.
A generic framework of user attention model and its application in video summarization
TLDR
A generic framework of a user attention model is presented, which estimates the attention viewers may pay to video content, and a set of modeling methods for visual and aural attention is proposed.
Structure Aware Single-Stage 3D Object Detection From Point Cloud
TLDR
An auxiliary network is designed that converts the convolutional features of the backbone network back to point-level representations, and an efficient part-sensitive warping operation is developed to align the confidences with the predicted bounding boxes.
Spatio-Temporal AutoEncoder for Video Anomaly Detection
TLDR
A novel model called Spatio-Temporal AutoEncoder (ST AutoEncoder or STAE) is proposed, which uses deep neural networks to learn video representations automatically and extracts features from both the spatial and temporal dimensions by performing 3-dimensional convolutions, enhancing motion feature learning in videos.
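STAE's spatio-temporal feature extraction rests on 3-D convolution, where the kernel slides along the time axis as well as the two spatial axes. The minimal pure-Python sketch below illustrates only that operation; it is not the STAE network (no learned weights, channels, padding, pooling, or autoencoder structure), and the helper name `conv3d_valid` is hypothetical.

```python
from typing import List

# A single-channel clip indexed as video[t][y][x] (time, height, width).
Volume = List[List[List[float]]]


def conv3d_valid(video: Volume, kernel: Volume) -> Volume:
    """Hypothetical helper: 3-D convolution with stride 1 and no padding.

    As in most deep-learning frameworks, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):          # slide along time
        plane = []
        for y in range(H - kh + 1):      # slide along height
            row = []
            for x in range(W - kw + 1):  # slide along width
                acc = 0.0
                # Sum over a kt x kh x kw neighbourhood spanning several frames.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# A 2x1x1 temporal-difference kernel responds only where a pixel changes between frames.
clip = [
    [[0, 0], [0, 0]],  # frame 0
    [[1, 0], [0, 0]],  # frame 1: top-left pixel switches on
    [[1, 0], [0, 0]],  # frame 2: no change
]
motion = conv3d_valid(clip, [[[-1.0]], [[1.0]]])
print(motion)  # [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
```

A kernel that also spans several pixels (for example 3x3x3) mixes appearance and motion in a single response, which is the property the summary credits for better motion feature learning.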
Unified Video Annotation via Multigraph Learning
TLDR
This paper shows that various crucial factors in video annotation, including multiple modalities, multiple distance functions, and temporal consistency, correspond to different relationships among video units and can therefore be represented by different graphs. It then proposes optimized multigraph-based semi-supervised learning (OMG-SSL), which tackles these factors simultaneously in a unified scheme.
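The multigraph idea can be illustrated with standard graph-based label propagation run on a weighted sum of several graphs defined over the same video units. This is a simplified sketch, not OMG-SSL itself: the per-graph weights are fixed by hand here rather than optimized, and the update rule F <- beta * S F + (1 - beta) * Y with a symmetrically normalized affinity matrix S is a common choice assumed for illustration. The names `combine_graphs` and `propagate` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def combine_graphs(graphs: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of affinity matrices (e.g. one per modality or temporal link)."""
    n = len(graphs[0])
    return [[sum(w * g[i][j] for g, w in zip(graphs, weights)) for j in range(n)]
            for i in range(n)]


def propagate(W: Matrix, Y: List[float], beta: float = 0.8, iters: int = 100) -> List[float]:
    """Iterate F <- beta * S F + (1 - beta) * Y, with S = D^-1/2 W D^-1/2.

    Y[i] is 1.0 for units labelled with the concept and 0.0 for unlabelled ones.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalisation; isolated nodes get an all-zero row and column.
    S = [[W[i][j] / (deg[i] * deg[j]) ** 0.5 if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    F = list(Y)
    for _ in range(iters):
        F = [beta * sum(S[i][j] * F[j] for j in range(n)) + (1 - beta) * Y[i]
             for i in range(n)]
    return F


# Three video shots: a visual-similarity graph links 0-1, a temporal graph links 1-2.
visual = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
temporal = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
labels = [1.0, 0.0, 0.0]  # only shot 0 is annotated

print(propagate(visual, labels)[2])  # 0.0: shot 2 is unreachable with one graph
scores = propagate(combine_graphs([visual, temporal], [0.5, 0.5]), labels)
```

With both graphs, the temporal link carries the label from shot 0 through shot 1 to shot 2, giving scores that decrease along the chain (about 0.38, 0.31, 0.18).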
Learning to tag
TLDR
A multi-modality recommendation method based on both tag and visual correlation is proposed. Tag recommendation is formulated as a learning problem, and the RankBoost algorithm is applied to learn an optimal combination of the ranking features from the different modalities.
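The RankBoost step can be sketched as follows, assuming each candidate tag comes with a vector of ranking features (for instance one tag-correlation score and one visual-correlation score) and that supervision is a list of ordered pairs. Weak rankers here are single-feature thresholds, and the weight formula alpha = 0.5 * ln((1 + r) / (1 - r)) is one standard RankBoost choice for {0, 1}-valued rankers; the paper's actual features and weak learners are not given in this summary, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Features = List[float]                  # e.g. [tag_correlation, visual_correlation]
Model = List[Tuple[int, float, float]]  # (feature index, threshold, alpha)


def weak(x: Features, f: int, theta: float) -> float:
    """Threshold weak ranker: 1 if feature f exceeds theta, else 0."""
    return 1.0 if x[f] > theta else 0.0


def rankboost(items: List[Features], pairs: List[Tuple[int, int]], rounds: int = 10) -> Model:
    """pairs holds (lo, hi) index pairs meaning items[hi] should rank above items[lo]."""
    D = [1.0 / len(pairs)] * len(pairs)  # distribution over ordered pairs
    model: Model = []
    for _ in range(rounds):
        # Pick the weak ranker with the largest |r| under the current distribution.
        best = None
        for f in range(len(items[0])):
            for theta in sorted({x[f] for x in items}):
                r = sum(d * (weak(items[hi], f, theta) - weak(items[lo], f, theta))
                        for d, (lo, hi) in zip(D, pairs))
                if best is None or abs(r) > abs(best[2]):
                    best = (f, theta, r)
        bf, bt, r = best
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep the log finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((bf, bt, alpha))
        # Re-weight pairs: those the signed ranker misorders gain weight,
        # correctly ordered ones lose weight; then renormalise.
        D = [d * math.exp(alpha * (weak(items[lo], bf, bt) - weak(items[hi], bf, bt)))
             for d, (lo, hi) in zip(D, pairs)]
        total = sum(D)
        D = [d / total for d in D]
    return model


def score(model: Model, x: Features) -> float:
    """Final ranking function: weighted vote of the chosen weak rankers."""
    return sum(alpha * weak(x, f, theta) for f, theta, alpha in model)


# Three candidate tags; supervision says tag 2 > tag 1 > tag 0.
tags = [[0.1, 0.9], [0.5, 0.2], [0.9, 0.5]]
model = rankboost(tags, [(0, 1), (1, 2), (0, 2)], rounds=2)
ranked = sorted(range(len(tags)), key=lambda i: score(model, tags[i]), reverse=True)
print(ranked)  # [2, 1, 0]
```

The weak rankers only compare values within one feature, so features from different modalities need not share a scale; the learned weights decide how much each modality contributes to the final ranking.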
Clickage: towards bridging semantic and intent gaps via mining click logs of search engines
TLDR
It is argued that the massive amount of click data from commercial search engines provides a uniquely valuable data set for bridging the semantic and intent gaps, and preliminary studies on the power of large-scale click data are presented.
Counterfactual VQA: A Cause-Effect Look at Language Bias
TLDR
A novel counterfactual inference framework is proposed, which enables the language bias to be captured as the direct causal effect of questions on answers and reduced by subtracting the direct language effect from the total causal effect.
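The subtraction in the summary can be sketched on per-answer scores. The sketch assumes three score vectors are already available: the full model's scores, scores when only the question is given (image blocked), and a reference with both inputs blocked. How the paper produces these counterfactual scores is not reproduced here, and `debiased_scores` and all numbers are hypothetical.

```python
from typing import List


def debiased_scores(full: List[float], question_only: List[float],
                    reference: List[float]) -> List[float]:
    """Total effect minus direct language effect, per candidate answer."""
    total = [f - r for f, r in zip(full, reference)]            # effect of question + image
    direct = [q - r for q, r in zip(question_only, reference)]  # effect of the question alone
    return [t - d for t, d in zip(total, direct)]


answers = ["yes", "no", "red"]
full = [2.0, 0.5, 1.8]           # biased model favours the frequent answer "yes"
question_only = [1.5, 0.2, 0.3]  # the language prior alone also favours "yes"
reference = [0.1, 0.1, 0.1]

scores = debiased_scores(full, question_only, reference)
print(answers[max(range(len(answers)), key=lambda i: full[i])])    # yes
print(answers[max(range(len(answers)), key=lambda i: scores[i])])  # red
```

Because the same reference appears in both effects, it cancels and the result equals `full` minus `question_only`; keeping it explicit mirrors the total-minus-direct formulation in the summary.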
Towards a Relevant and Diverse Search of Social Images
TLDR
A diverse relevance ranking scheme is proposed that takes both relevance and diversity into account by exploring the content of images and their associated tags; the results show that the diversity of search results can be enhanced while maintaining a comparable level of relevance.
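One generic way to trade relevance against diversity is greedy re-ranking in the style of maximal marginal relevance (MMR), with redundancy measured by tag overlap. This is a swapped-in technique, not the paper's scheme (which also uses image content); `diverse_rerank`, the Jaccard redundancy measure, and the weight `lam` are all assumptions.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Tag-set overlap in [0, 1]; used here as a stand-in redundancy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diverse_rerank(relevance: Dict[str, float], tags: Dict[str, Set[str]],
                   k: int, lam: float = 0.7) -> List[str]:
    """Greedy MMR-style selection: lam * relevance - (1 - lam) * max overlap with picks."""
    selected: List[str] = []
    candidates = set(relevance)

    def gain(img: str) -> float:
        redundancy = max((jaccard(tags[img], tags[s]) for s in selected), default=0.0)
        return lam * relevance[img] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=gain)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
tags = {"a": {"beach", "sunset"}, "b": {"beach", "sunset"}, "c": {"beach", "dog"}}
print(diverse_rerank(relevance, tags, k=2))           # ['a', 'c']
print(diverse_rerank(relevance, tags, k=2, lam=1.0))  # ['a', 'b'] (relevance only)
```

Image "b" duplicates the tags of "a", so with `lam=0.7` its redundancy penalty lets the less relevant but different image "c" into the top two.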