Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis
TLDR
This paper presents an extension of the Independent Subspace Analysis (ISA) algorithm that learns invariant spatio-temporal features from unlabeled video data. It finds that the method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations.
End-to-End Learning of Action Detection from Frame Glimpses in Videos
TLDR
A fully end-to-end approach to action detection in videos in which an agent learns to directly predict the temporal bounds of actions. REINFORCE is used to learn the agent's decision policy.
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
TLDR
This work defines a novel variant of long short-term memory (LSTM) networks that models temporal relations between actions via multiple input and output connections. It shows that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
Towards Viewpoint Invariant 3D Human Pose Estimation
TLDR
A viewpoint-invariant model for 3D human pose estimation from a single depth image. It uses a convolutional and recurrent network architecture with a top-down error feedback mechanism that self-corrects previous pose estimates in an end-to-end manner.
VideoSET: Video Summary Evaluation through Text
TLDR
This paper presents VideoSET (Video Summary Evaluation through Text), a text-based method for evaluating how well a video summary retains the semantic information contained in its original video.
Scaling Human-Object Interaction Recognition Through Zero-Shot Learning
TLDR
This work introduces a factorized model for human-object interaction (HOI) detection that disentangles reasoning about verbs and objects. As a result, at test time it can produce detections for novel verb-object pairs through zero-shot learning.
Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference
TLDR
This work develops a pruning and quantization approach that leverages sparse representations in the underlying cryptosystem to accelerate inference. It also derives an optimal approximation for popular activation functions that achieves maximally sparse encodings and minimizes approximation error.
Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks
TLDR
This work introduces an approach that automatically assesses surgeon performance by tracking and analyzing tool movements in surgical videos with region-based convolutional neural networks. It is the first approach to not only detect the presence of surgical tools but also spatially localize them in real-world laparoscopic surgical videos.
Dynamic Task Prioritization for Multitask Learning
TLDR
This work proposes dynamic task prioritization, which automatically prioritizes more difficult tasks by adaptively adjusting the mixing weight of each task’s loss objective. The method outperforms existing multitask methods and is competitive with modern single-task models on the COCO and MPII datasets.
Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos
TLDR
This work takes a modular neural network approach. Given a natural language query, it extracts the query's semantic structure and uses it to assemble a compositional network layout from the corresponding network modules. The approach achieves state-of-the-art results on the DiDeMo video retrieval dataset.