Learn More
This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees.(More)
Motivated by recent successes on learning feature representations and on learning feature comparison functions, we propose a unified approach to combining both for training a patch matching system. Our system, dubbed Match-Net, consists of a deep convolutional network that extracts features from patches and a network of three fully connected layers that(More)
In this paper, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same item in an online shop. This is an extremely challenging task due to visual differences between street photos (pictures of people wearing clothing in everyday uncontrolled settings) and online shop photos (pictures of(More)
What do people care about in an image? To drive computational visual recognition toward more human-centric outputs, we need a better understanding of how people perceive and judge the importance of content in images. In this paper, we explore how a number of factors relate to human perception of importance. Proposed factors fall into 3 broad types: 1)(More)
When people describe a scene, they often include information that is not visually apparent; sometimes based on background knowledge, sometimes to tell a story. We aim to separate visual text—descriptions of what is being seen—from non-visual text in natural images and their descriptions. To do so, we first concretely define what it means to be visual,(More)
While previous results from univariate analysis showed that the activity level of the parahippocampal gyrus (PHG) but not the fusiform gyrus (FG) reflects selective maintenance of the cued picture category, present results from multi-voxel pattern analysis (MVPA) showed that the spatial response patterns of both regions can be used to differentiate the(More)
What is the story of an image? What is the relationship between pictures, language, and information we can extract using state of the art computational recognition systems? In an attempt to address both of these questions, we explore methods for retrieving and generating natural language descriptions for images. Ideally, we would like our generated textual(More)
Although deep convolutional neural networks (CNNs) have shown remarkable results for feature learning and prediction tasks, many recent studies have demonstrated improved performance by incorporating additional hand-crafted features or by fusing predictions from multiple CNNs. Usually, these combinations are implemented via feature concatenation or by(More)
We present an algorithm and implementation for distributed parallel training of single-machine multiclass SVMs. While there is ongoing and healthy debate about the best strategy for multiclass classification, there are some features of the single-machine approach that are not available when training alternatives such as one-vs-all, and that are quite(More)