• Publications
Learning Temporal Regularity in Video Sequences
This work proposes two methods built on autoencoders, chosen for their ability to work with little to no supervision; one of them is a fully convolutional feed-forward autoencoder that learns both the local features and the classifiers in an end-to-end framework.
Predictable Dual-View Hashing
We propose a Predictable Dual-View Hashing (PDH) algorithm which embeds proximity of data samples in the original spaces. We create a cross-view Hamming space with the ability to compare information
Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
This work introduces the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images, and extends state-of-the-art methods for textual machine comprehension and visual question answering to the TQA dataset.
Multi-Directional Multi-Level Dual-Cross Patterns for Robust Face Recognition
Experimental results indicate that DCP outperforms state-of-the-art local descriptors for both face identification and face verification, and the best performance is achieved on the challenging LFW and FRGC 2.0 databases by deploying MDML-DCPs in a simple recognition scheme.
Mining Discriminative Triplets of Patches for Fine-Grained Classification
This work introduces triplets of patches with geometric constraints to improve the accuracy of patch localization, and automatically mines discriminative geometrically-constrained triplets for classification in a patch-based framework that only requires object bounding boxes.
ActionFlowNet: Learning Motion Representation for Action Recognition
This work proposes a multitask learning model, ActionFlowNet, that trains a single-stream network directly from raw pixels to jointly estimate optical flow and recognize actions with convolutional neural networks, capturing both appearance and motion in a single model.
Thermal-to-visible face recognition using partial least squares
The preprocessing and feature extraction stages are designed to reduce the modality gap between the thermal and visible facial signatures, and facilitate the subsequent one-vs-all PLS-based model building.
Face Identification Using Large Feature Sets
A large and rich set of feature descriptors is employed for face identification, using partial least squares to perform multichannel feature weighting; the method is extended to a tree-based discriminative structure to reduce the time required to evaluate probe samples.
Learning to Super Resolve Intensity Images From Events
This work proposes an end-to-end network that reconstructs high-resolution, high dynamic range (HDR) images directly from the event stream. The algorithm is evaluated on both simulated and real-world sequences to verify that it captures fine details of a scene and outperforms combinations of state-of-the-art event-to-image algorithms with state-of-the-art super-resolution schemes.