The goal of this paper is to discover a set of discriminative patches which can serve as a fully unsupervised mid-level visual representation. The desired patches need to satisfy two requirements: 1) to be representative, they need to occur frequently enough in the visual world; 2) to be discriminative, they need to be different enough from the rest of the …
Given a large repository of geo-tagged imagery, we seek to automatically find visual elements, such as windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, such as the city of Paris. This is a tremendously difficult task, as the visual features distinguishing architectural elements of different places can be …
Considerable progress has been made in face recognition research over the last decade, especially with the development of powerful models of face appearance (e.g., eigenfaces). Despite the variety of approaches and tools studied, however, face recognition is not accurate or robust enough to be deployed in uncontrolled environments. Recently, a number of …
We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification. We show that by conditioning the predictions on object proposals with sufficient image support, our method can do well without complicated spatial …
Thermal infrared (IR) imagery offers a promising alternative to visible imagery for face recognition due to its relative insensitivity to variations in face appearance caused by illumination changes. Despite its advantages, however, thermal IR has several limitations, including its opacity to glass. The focus of this study is on the sensitivity of …
We consider the problem of semi-supervised bootstrap learning for scene categorization. Existing semi-supervised approaches are typically unreliable and face semantic drift because the learning task is under-constrained. This is primarily because they ignore the strong interactions that often exist between scene categories, such as the common attributes …
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method maps textual queries and visual features from various regions into a shared space where they are compared for relevance with an inner product. Our method exhibits significant improvements in answering questions such as "what …
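The relevance step described above (project the query and each region into one shared space, then compare with an inner product) can be sketched as follows. This is a minimal illustration, not the authors' model: the linear projections `w_text` and `w_img`, the softmax that turns scores into region weights, the helper names, and the toy identity matrices are all hypothetical assumptions, and plain lists stand in for learned tensors.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def project(w: Matrix, v: Sequence[float]) -> List[float]:
    """Linear map into the shared space (assumed); w has one row per shared dimension."""
    if any(len(row) != len(v) for row in w):
        raise ValueError("projection width must match the feature length")
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product used as the relevance score."""
    return sum(ai * bi for ai, bi in zip(a, b))


def region_weights(
    query: Sequence[float],
    regions: Sequence[Sequence[float]],
    w_text: Matrix,
    w_img: Matrix,
) -> List[float]:
    """Score each region against the query, then normalize with a softmax (an assumption)."""
    q = project(w_text, query)
    scores = [dot(q, project(w_img, r)) for r in regions]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy projections, not learned
    query = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights = region_weights(query, regions, identity, identity)
    best = max(range(len(weights)), key=weights.__getitem__)
    print(best, [round(w, 3) for w in weights])  # region 0 gets the largest weight
```

Keeping separate projections for text and image features lets the two inputs have different dimensionalities while still meeting in a common space; in a trained system both matrices would be learned rather than fixed.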
1-naphthol (1N), 2-naphthol (2N) and 8-quinolinol (8H) are common water pollutants. 1N and 2N are positional isomers, and 8H is isoelectronic to 1N and 2N. When ingested, these pollutants are transported in the blood by proteins such as human serum albumin (HSA). Binding of these pollutants to HSA has been explored to elucidate the specific …
We describe Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout [17], stochastic depth [6] and residual architectures [4, 5] as special cases. When viewed as a regularization method, swapout …
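The abstract is cut off before it states the sampling rule. As we understand the published method, each unit outputs Θ1 ⊙ X + Θ2 ⊙ F(X), where X is the layer input, F(X) is the residual branch, and Θ1, Θ2 are independent Bernoulli masks; the sketch below assumes that form. The function name `swapout`, the keep probabilities `p1` and `p2`, and the `shared` flag are hypothetical illustrations rather than names from the paper, and lists of floats stand in for tensors. Only the training-time forward pass is shown.

```python
import random
from typing import Callable, List, Sequence


def swapout(
    x: Sequence[float],
    f: Callable[[Sequence[float]], Sequence[float]],
    p1: float,
    p2: float,
    rng: random.Random,
    shared: bool = False,
) -> List[float]:
    """Return theta1 * x + theta2 * f(x) with Bernoulli masks theta1, theta2.

    p1, p2: probability that a mask entry is 1 (hypothetical parameter names).
    shared: if True, draw one theta1 and one theta2 for the whole layer
            instead of one pair per unit.
    """
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f must preserve the input length")

    def bern(p: float) -> float:
        # rng.random() lies in [0, 1), so p=1 always yields 1.0 and p=0 always 0.0.
        return 1.0 if rng.random() < p else 0.0

    if shared:
        t1, t2 = bern(p1), bern(p2)
        return [t1 * xi + t2 * fi for xi, fi in zip(x, fx)]
    return [bern(p1) * xi + bern(p2) * fi for xi, fi in zip(x, fx)]


def double(v: Sequence[float]) -> List[float]:
    """Toy stand-in for the residual branch F."""
    return [2.0 * a for a in v]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0]
    print(swapout(x, double, 1.0, 1.0, rng))  # residual unit x + F(x): [3.0, 6.0]
    print(swapout(x, double, 0.0, 1.0, rng))  # plain layer F(x): [2.0, 4.0]
    print(swapout(x, double, 0.5, 0.5, rng))  # random per-unit mixture
```

Under this form, p1 = p2 = 1 recovers a residual unit, p1 = 0 gives (unscaled) dropout on F(X), and p1 = 1 with a single layer-wide Θ2 (`shared=True`) gives stochastic depth, which matches the special cases listed in the abstract.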