• Publications
Globally-optimal greedy algorithms for tracking a variable number of objects
We analyze the computational problem of multi-object tracking in video sequences. We formulate the problem using a cost function that requires estimating the number of tracks, as well as their birth…
  • 696
  • 112
Generating Videos with Scene Dynamics
We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future…
  • 756
  • 94
Detecting activities of daily living in first-person camera views
We present a novel dataset and novel algorithms for the problem of detecting activities of daily living (ADL) in first-person camera views. We have collected a dataset of 1 million frames of dozens of…
  • 564
  • 67
A large-scale benchmark dataset for event recognition in surveillance video
We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms with a focus on continuous visual event recognition (CVER) in outdoor…
  • 492
  • 61
Anticipating Visual Representations from Unlabeled Video
Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires…
  • 277
  • 27
Weakly Supervised Cascaded Convolutional Networks
Object detection is a challenging task in visual understanding domain, and even more so if the supervision is to be weak. Recently, few efforts to handle the task without expensive human annotations…
  • 139
  • 27
Assessing the Quality of Actions
While recent advances in computer vision have provided reliable methods to recognize actions in both images and videos, the problem of assessing how well people perform actions has been largely…
  • 86
  • 26
Bilinear classifiers for visual recognition
We describe an algorithm for learning bilinear SVMs. Bilinear classifiers are a discriminative variant of bilinear models, which capture the dependence of data on multiple factors. Such models are…
  • 119
  • 13
Parsing IKEA Objects: Fine Pose Estimation
We address the problem of localizing and estimating the fine-pose of objects in the image with exact 3D models. Our main focus is to unify contributions from the 1970s with recent advances in object…
  • 205
  • 11
Representation Learning by Learning to Count
We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance…
  • 140
  • 10