UNITER: UNiversal Image-TExt Representation Learning
TLDR
This work introduces UNITER, a UNiversal Image-TExt Representation learned through large-scale pre-training over four image-text datasets, which can power heterogeneous downstream V+L tasks with joint multimodal embeddings.
Patient Knowledge Distillation for BERT Model Compression
TLDR
This work proposes a Patient Knowledge Distillation approach to compress an original large model (teacher) into an equally effective lightweight shallow network (student), which translates into improved results on multiple NLP tasks with a significant gain in training efficiency, without sacrificing model accuracy.
MMD GAN: Towards Deeper Understanding of Moment Matching Network
TLDR
In evaluations on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, MMD-GAN significantly outperforms GMMN and is competitive with other representative GAN works.
Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-identification
TLDR
This work presents a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which enables the feature extractor to be aware of the current input video sequences, in a way that interdependency from the matching items can directly influence the computation of each other's representation.
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
TLDR
A novel adversarial training algorithm, FreeLB, is proposed that promotes higher invariance in the embedding space by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples.
Deep Structured Energy Based Models for Anomaly Detection
TLDR
This paper proposes deep structured energy based models (DSEBMs), where the energy function is the output of a deterministic deep neural network with structure, and develops novel model architectures to integrate EBMs with different types of data such as static data, sequential data, and spatial data.
Towards Pose Invariant Face Recognition in the Wild
TLDR
Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks demonstrate the superiority of the proposed Pose Invariant Model for face recognition in the wild over the state of the art.
Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data
TLDR
This work tracks the faces of casual walkers on more than 40 hours of egocentric video and automatically extracts nearly 5 million pairs of images connected by or from different face tracks, along with their weather and location context, under pose and lighting variations, to learn a rich feature representation for facial attribute classification.
Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification
TLDR
Evaluation on person attribute classification tasks involving facial and clothing attributes suggests that the models produced by the proposed method are fast and compact, and can closely match or exceed the state-of-the-art accuracy of strong baselines that use much more expensive models.
Risk Prediction with Electronic Health Records: A Deep Learning Approach
TLDR
A deep learning approach for phenotyping from patient EHRs is proposed, building a four-layer convolutional neural network model for extracting phenotypes and performing prediction; the proposed model is validated on a real-world EHR data warehouse under the specific scenario of predictive modeling of chronic diseases.
...