Learn More
Human pose estimation requires a versatile yet well-constrained spatial model for grouping locally ambiguous parts together to produce a globally consistent hypothesis. Previous works either use local deformable models deviating from a certain template, or use a global mixture representation in the pose space. In this paper, we propose a new hierarchical(More)
We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the(More)
A video sequence of an underwater scene taken from above the water surface suffers from severe distortions due to water fluctuations. In this paper, we simultaneously estimate the shape of the water surface and recover the planar underwater scene without using any calibration patterns, image priors, multiple viewpoints or active illumination. The key idea(More)
Digital photo management is becoming indispensable for the explosively growing family photo albums due to the rapid popularization of digital cameras and mobile phone cameras. In an effective photo management system photo annotation is the most challenging task. In this paper, we develop several innovative interaction techniques for semi-automatic photo(More)
Competing with top human players in the ancient game of Go has been a long-term goal of artificial intelligence. Go's high branching factor makes traditional search techniques ineffective, even on leading-edge hardware, and Go's evaluation function could change drastically with one stone change. Recent works [Maddi-son et al. (2015); Clark & Storkey (2015)](More)
In this paper, we propose a new framework for training vision-based agent for First-Person Shooter (FPS) Game, in particular Doom. Our framework combines the state-of-the-art reinforcement learning approach (Asynchronous Advantage Actor-Critic (A3C) model [Mnih et al. (2016)]) with curriculum learning. Our model is simple in design and only uses game states(More)