• Publications
  • Influence
Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning
tl;dr
We present a novel approach that combines Reinforcement Learning and Generative Adversarial Networks (GANS) to generate more human-like responses to questions. Expand
  • 65
  • 17
  • Open Access
Visual question answering: A survey of methods and datasets
tl;dr
Visual Question Answering is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Expand
  • 146
  • 12
  • Open Access
FVQA: Fact-Based Visual Question Answering
tl;dr
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural language processing communities, not least because it offers insight into the relationships between two important sources of information. Expand
  • 114
  • 12
  • Open Access
Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks
tl;dr
We jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks. Expand
  • 76
  • 9
  • Open Access
Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks
tl;dr
We propose to build a directed graph over the object regions of an image to model the relationships between objects. Expand
  • 40
  • 8
  • Open Access
Toward End-to-End Car License Plate Detection and Recognition With Deep Neural Networks
  • Hui Li, Peng Wang, C. Shen
  • Computer Science, Engineering
  • IEEE Transactions on Intelligent Transportation…
  • 26 September 2017
tl;dr
We propose a unified deep neural network, which can localize license plates and recognize the letters simultaneously in a single forward pass. Expand
  • 59
  • 6
  • Open Access
Piecewise Classifier Mappings: Learning Fine-Grained Learners for Novel Categories With Few Examples
tl;dr
We propose an end-to-end trainable deep network which is inspired by the state-of-the-art fine-grained recognition model and is tailored for the FSFG task. Expand
  • 26
  • 6
  • Open Access
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
tl;dr
We propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering. Expand
  • 130
  • 5
  • Open Access
Reading car license plates using deep neural networks
tl;dr
We tackle the problem of car license plate detection and recognition in natural scene images based on the powerful deep neural networks (DNNs). Expand
  • 38
  • 4
  • Open Access
Two-Stream 3-D convNet Fusion for Action Recognition in Videos With Arbitrary Size and Length
tl;dr
We propose an end-to-end pipeline named Two-stream 3-D-convNet Fusion, which can recognize human actions in videos of arbitrary size and length using multiple features. Expand
  • 133
  • 2