MOT16: A Benchmark for Multi-Object Tracking
TLDR
A new release of the MOTChallenge benchmark, which focuses on multiple people tracking. It not only offers a significant increase in the number of labeled boxes, but also provides multiple object classes besides pedestrians and the level of visibility for every object of interest.
One-Shot Video Object Segmentation
TLDR
One-Shot Video Object Segmentation (OSVOS) is based on a fully-convolutional neural network architecture that successively transfers generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot).
Tracking Without Bells and Whistles
TLDR
Overall, Tracktor yields tracking performance superior to that of any current tracking method, and the analysis exposes remaining unsolved tracking challenges to inspire future research directions.
MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking
TLDR
This paper describes the work toward MOTChallenge, a novel multiple object tracking benchmark aimed at addressing issues of standardization, and the way toward a unified evaluation framework for a more meaningful quantification of multi-target tracking.
Image-Based Localization Using LSTMs for Structured Feature Correlation
TLDR
Experimental results show that the proposed CNN+LSTM architecture for camera pose regression in indoor and outdoor scenes outperforms existing deep architectures and can localize images in hard conditions where classic SIFT-based methods fail.
Video Object Segmentation without Temporal Information
TLDR
Semantic One-Shot Video Object Segmentation is presented, based on a fully-convolutional neural network architecture that successively transfers generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot).
Learning by Tracking: Siamese CNN for Robust Target Association
This paper presents a novel approach to data association in pedestrian tracking, introducing a two-stage learning scheme to match pairs of detections. First, a
Learning an Image-Based Motion Context for Multiple People Tracking
TLDR
A novel method for multiple people tracking that leverages a generalized model of interactions among individuals. The model can encode the effect of undetected targets, making the tracker more robust to partial occlusions.
HOTA: A Higher Order Metric for Evaluating Multi-object Tracking
TLDR
This work presents a novel MOT evaluation metric, higher order tracking accuracy (HOTA), which explicitly balances the effects of accurate detection, association, and localization in a single unified metric for comparing trackers.
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos
TLDR
A novel approach is proposed that segments and tracks instances across space and time in a single stage. It is trained end-to-end to learn spatio-temporal embeddings as well as the parameters required to cluster pixels belonging to a specific object instance over an entire video clip.