Cross-domain image synthesis and recognition are typically considered as two distinct tasks in the areas of computer vision and pattern recognition. Therefore, it is not clear whether approaches addressing one task can be easily generalized or extended for solving the other. In this paper, we propose a unified model for coupled dictionary and feature space...
Decomposition of an image into multiple semantic components has been an effective research topic for various image processing applications such as image denoising, enhancement, and inpainting. In this paper, we present a novel self-learning based image decomposition framework. Based on the recent success of sparse representation, the proposed framework...
We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training time. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal...
Our aim is to show how state-of-the-art computer vision techniques can be used to advance prehensile analysis (i.e., understanding the functionality of human hands). Prehensile analysis is a broad field of multi-disciplinary interest, where researchers painstakingly manually analyze hours of hand-object interaction videos to understand the mechanics of hand...
We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D...
In this paper, we address the problem of robust face recognition using a single sample per person. Given only one training image per subject of interest, our proposed method is able to recognize query images with illumination or expression changes, or even those corrupted by occlusion. In order to model the above intra-class variations, we advocate the...
Maximum entropy inverse optimal control (MaxEnt IOC) is an effective means of discovering the underlying cost function of demonstrated human activity and can be used to predict human behavior over low-dimensional state spaces (e.g., forecasting of 2D trajectories). To enable inference in very large state spaces, we introduce an approximate MaxEnt IOC...
We present a novel learning-based method for single image super-resolution (SR). Given a single input low-resolution (LR) image (and its image pyramid), we propose to learn context-specific image sparse representation, which aims at modeling the relationship between low and high-resolution image patch pairs of different context categories in terms of the...