In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For… (More)
Spectral clustering is one of the most popular clustering approaches. Despite its good performance, it is limited in its applicability to large-scale problems due to its high computational… (More)
We propose NEIL (Never Ending Image Learner), a computer program that runs 24 hours per day and 7 days per week to automatically extract visual knowledge from Internet data. NEIL uses a… (More)
We present an approach to utilize large amounts of web data for learning CNNs. Specifically inspired by curriculum learning, we present a two-step approach for CNN training. First, we use easy images… (More)
While neural networks have been successfully applied to many NLP tasks the resulting vector-based models are very difficult to interpret. For example it’s not clear how they achieve compositionality,… (More)
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a… (More)
There have been some recent efforts to build visual knowledge bases from Internet images. But most of these approaches have focused on bounding box representation of objects. In this paper, we… (More)
We explore design principles for general pixel-level prediction problems, from low-level edge detection to midlevel surface normal estimation to high-level semantic segmentation. Convolutional… (More)
Spectral clustering is one of the most popular clustering approaches. However, it is not a trivial task to apply spectral clustering to large-scale problems due to its computational complexity of… (More)
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. We propose learning this mapping using a recurrent neural network. Unlike previous approaches… (More)