Rahul Sukthankar

Learn More
Stable local feature detection and representation is a fundamental component of many image registration and object recognition algorithms. Mikolajczyk and Schmid (June 2003) recently evaluated a variety of approaches and identified the SIFT [D. G. Lowe, 1999] algorithm as being the most resistant to common image deformations. This paper examines (and(More)
Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the(More)
Graph matching and MAP inference are essential problems in computer vision and machine learning. We introduce a novel algorithm that can accommodate both problems and solve them efficiently. Recent graph matching algorithms are based on a general quadratic programming formulation, which takes in consideration both unary and second-order terms reflecting the(More)
This paper studies the use of volumetric features as an alternative to popular local descriptor approaches for event detection in video sequences. Motivated by the recent success of similar ideas in object detection on static images, we generalize the notion of 2D box features to 3D spatio-temporal volumetric features. This general framework enables us to(More)
Motivated by recent successes on learning feature representations and on learning feature comparison functions, we propose a unified approach to combining both for training a patch matching system. Our system, dubbed Match-Net, consists of a deep convolutional network that extracts features from patches and a network of three fully connected layers that(More)
Real-world actions occur often in crowded, dynamic environments. This poses a difficult challenge for current approaches to video event detection because it is difficult to segment the actor from the background due to distracting motion from other objects in the scene. We propose a technique for event recognition in crowded videos that reliably identifies(More)
The ability of a robot team to reconfigure itself is useful in many applications: for metamorphic robots to change shape, for swarm motion towards a goal, for biological systems to avoid predators, or for mobile buoys to clean up oil spills. Inmany situations, auxiliary constraints, such as connectivity between team members or limits on the maximum(More)
An important aspect in designing interactive, action-based interfaces is reliably recognizing actions with minimal latency. High latency causes the system’s feedback to lag behind user actions and thus significantly degrades the interactivity of the user experience. This paper presents algorithms for reducing latency when recognizing actions. We use a(More)
We introduce the first visual dataset of fast foods with a total of 4,545 still images, 606 stereo pairs, 303 360° videos for structure from motion, and 27 privacy-preserving videos of eating events of volunteers. This work was motivated by research on fast food recognition for dietary assessment. The data was collected by obtaining three instances(More)