• Publications
  • Influence
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
TLDR
A combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions is proposed, demonstrating the broad applicability of this approach to VQA. Expand
Hierarchical Attention Networks for Document Classification
TLDR
Experiments conducted on six large scale text classification tasks demonstrate that the proposed architecture outperform previous methods by a substantial margin. Expand
Embedding Entities and Relations for Learning and Inference in Knowledge Bases
TLDR
It is found that embeddings learned from the bilinear objective are particularly good at capturing relational semantics and that the composition of relations is characterized by matrix multiplication. Expand
Learning deep structured semantic models for web search using clickthrough data
TLDR
A series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them are developed. Expand
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
TLDR
A benchmark task to recognize one million celebrities from their face images, by using all the possibly collected face images of this individual on the web as training data, which could lead to one of the largest classification problems in computer vision. Expand
Stacked Attention Networks for Image Question Answering
TLDR
A multiple-layer SAN is developed in which an image is queried multiple times to infer the answer progressively, and the progress that the SAN locates the relevant visual clues that lead to the answer of the question layer-by-layer. Expand
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
TLDR
An Attentional Generative Adversarial Network that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation and for the first time shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image. Expand
Domain Adaptation via Pseudo In-Domain Data Selection
TLDR
The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding. Expand
From captions to visual concepts and back
TLDR
This paper uses multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives, and develops a maximum-entropy language model. Expand
Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base
TLDR
This work proposes a novel semantic parsing framework for question answering using a knowledge base that leverages the knowledge base in an early stage to prune the search space and thus simplifies the semantic matching problem. Expand
...
1
2
3
4
5
...