MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- Jun Xu, Tao Mei, Ting Yao, Y. Rui
- Computer Science · Computer Vision and Pattern Recognition
- 5 January 2016
A detailed analysis of MSR-VTT in comparison to a complete set of existing datasets, together with a summarization of different state-of-the-art video-to-text approaches, shows that the hybrid Recurrent Neural Network-based approach, which combines single-frame and motion representations with a soft-attention pooling strategy, yields the best generalization capability on this dataset.
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
- Zhaofan Qiu, Ting Yao, Tao Mei
- Computer Science · IEEE International Conference on Computer Vision
- 1 October 2017
This paper devises multiple variants of bottleneck building blocks in a residual learning framework by simulating 3 × 3 × 3 convolutions with 1 × 3 × 3 convolutional filters on the spatial domain (equivalent to a 2D CNN) plus 3 × 1 × 1 convolutions to construct temporal connections on adjacent feature maps in time.
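To make the factorization in the summary above concrete, the following minimal sketch counts convolution weights for a full 3 × 3 × 3 kernel versus a 1 × 3 × 3 spatial kernel followed by a 3 × 1 × 1 temporal kernel. It is an illustration only, not the paper's implementation: the function names are hypothetical, and it assumes equal input and output channel counts, no bias terms, and a simple serial arrangement of the two convolutions (the paper describes several block variants).

```python
def conv3d_weights(c_in, c_out, kernel):
    """Weight count (bias excluded) of a 3D convolution with kernel (t, h, w)."""
    t, h, w = kernel
    return c_in * c_out * t * h * w


def full_3d(channels):
    """Weights of a single 3 x 3 x 3 convolution."""
    return conv3d_weights(channels, channels, (3, 3, 3))


def pseudo_3d(channels):
    """Weights of a 1 x 3 x 3 spatial conv followed by a 3 x 1 x 1 temporal conv.

    Assumption: both convolutions keep the same channel width.
    """
    spatial = conv3d_weights(channels, channels, (1, 3, 3))
    temporal = conv3d_weights(channels, channels, (3, 1, 1))
    return spatial + temporal


if __name__ == "__main__":
    channels = 64  # illustrative channel width, not taken from the paper
    print(full_3d(channels), pseudo_3d(channels))
```

Under these assumptions the ratio is 27 : 12 regardless of channel width, so the factorized pair uses fewer than half the weights of the full 3D kernel.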
Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition
- Jianlong Fu, Heliang Zheng, Tao Mei
- Computer Science · Computer Vision and Pattern Recognition
- 21 July 2017
A novel recurrent attention convolutional neural network (RA-CNN) which recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutually reinforced way and achieves the best performance in three fine-grained tasks.
A Deep Learning-Based Approach to Progressive Vehicle Re-identification for Urban Surveillance
- Xinchen Liu, Wu Liu, Tao Mei, Huadong Ma
- Computer Science · European Conference on Computer Vision
- 8 October 2016
This paper proposes a novel deep learning-based approach to PROgressive Vehicle re-ID, called “PROVID”, which treats vehicle Re-Id as two specific progressive search processes: coarse-to-fine search in the feature space, and near-to-distant search in the real-world surveillance environment.
Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition
- Heliang Zheng, Jianlong Fu, Tao Mei, Jiebo Luo
- Computer Science · IEEE International Conference on Computer Vision
- 1 October 2017
This paper proposes a novel part learning approach by a multi-attention convolutional neural network (MA-CNN), where part generation and feature learning can reinforce each other, and shows the best performances on three challenging published fine-grained datasets.
Exploring Visual Relationship for Image Captioning
- Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
- Computer Science · European Conference on Computer Vision
- 8 September 2018
This paper introduces a new design to explore the connections between objects for image captioning under the umbrella of an attention-based encoder-decoder framework, integrating both semantic and spatial object relationships into the image encoder.
Jointly Modeling Embedding and Translation to Bridge Video and Language
- Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Y. Rui
- Computer Science · Computer Vision and Pattern Recognition
- 7 May 2015
A novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding and outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
Destruction and Construction Learning for Fine-Grained Image Recognition
- Yue Chen, Yalong Bai, Wei Zhang, Tao Mei
- Computer Science · Computer Vision and Pattern Recognition
- 1 June 2019
A novel "Destruction and Construction Learning" (DCL) method to enhance the difficulty of fine-grained recognition and exercise the classification model to acquire expert knowledge.
Boosting Image Captioning with Attributes
- Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei
- Computer Science · IEEE International Conference on Computer Vision
- 5 November 2016
This paper presents Long Short-Term Memory with Attributes (LSTM-A) - a novel architecture that integrates attributes into the successful Convolutional Neural Networks plus Recurrent Neural Networks (RNNs) image captioning framework, by training them in an end-to-end manner.
PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance
- Xinchen Liu, Wu Liu, Tao Mei, Huadong Ma
- Computer Science · IEEE Transactions on Multimedia
- 1 March 2018
This paper proposes PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks, which not only utilizes the multimodality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also considers vehicle reidentification in two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space.
...
...