Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies

@inproceedings{Sadeghian2017TrackingTU,
  title={Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies},
  author={Amir Sadeghian and Alexandre Alahi and Silvio Savarese},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={300--311}
}
The majority of existing solutions to the Multi-Target Tracking (MTT) problem do not combine cues over a long period of time in a coherent fashion. To address this challenge, we present a structure of Recurrent Neural Networks (RNN) that jointly reasons on multiple cues over a temporal window. Our method makes it possible to correct data association errors and recover observations from occluded states. We demonstrate the robustness of our data-driven approach by tracking multiple targets using their…
Recurrent Autoregressive Networks for Online Multi-object Tracking
TLDR
This work proposes the Recurrent Autoregressive Network (RAN), a temporal generative modeling framework to characterize the appearance and motion dynamics of multiple objects over time and achieves top-ranked results on the two benchmarks.
Multi-object Tracking with Neural Gating Using Bilinear LSTM
TLDR
A novel recurrent network model, the Bilinear LSTM, is proposed in order to improve the learning of long-term appearance models via a recurrent network based on intuitions drawn from recursive least squares.
An Online and Flexible Multi-object Tracking Framework Using Long Short-Term Memory
TLDR
A novel Siamese LSTM network is proposed that interprets both temporal and spatial components nonlinearly by learning trajectory features, and outputs a similarity score between two trajectories for data association.
FANTrack: 3D Multi-Object Tracking with Feature Association Network
TLDR
A data-driven approach to online multi-object tracking that uses a convolutional neural network for data association in a tracking-by-detection framework. It learns to perform global assignments in 3D purely from data, handles noisy detections and a varying number of targets, and is easy to train.
SoDA: Multi-Object Tracking with Soft Data Association
TLDR
This work proposes a novel approach to MOT that uses attention to compute track embeddings encoding the spatiotemporal dependencies between observed objects. This allows the model to relax hard data associations, which may otherwise lead to unrecoverable errors.
ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking
TLDR
This paper introduces a probabilistic autoregressive generative model to score tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion, and shows the generality of this model by using it to produce future representations in the challenging task of human motion prediction.
Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers
TLDR
MO3TR is presented: a truly end-to-end Transformer-based online multi-object tracking (MOT) framework that learns to handle occlusions, track initiation and termination without the need for an explicit data association module or any heuristics/post-processing.
Online Multi-Object Tracking Based on Feature Representation and Bayesian Filtering Within a Deep Learning Architecture
TLDR
A novel affinity model is built by jointly learning a more powerful feature representation and distance metric within a deep architecture, together with a recurrent neural network-based Bayesian filtering module that takes a hidden state of the LSTM network as input and performs recursive prediction and update to explicitly estimate the targets' states.
End-to-End Learning Deep CRF models for Multi-Object Tracking
TLDR
This paper proposes learning deep conditional random field (CRF) networks, aiming to model the assignment costs as unary potentials and the long-term dependencies among detection results as pairwise potentials, and uses a bidirectional long short-term memory (LSTM) network to encode the long-term dependencies.
DEFT: Detection Embeddings for Tracking
TLDR
This paper proposes an efficient joint detection and tracking model named DEFT, or “Detection Embeddings for Tracking”, which relies on an appearance-based object matching network jointly learned with an underlying object detection network.
...
...

References

SHOWING 1-10 OF 111 REFERENCES
Tracklet Association by Online Target-Specific Metric Learning and Coherent Dynamics Estimation
TLDR
A novel method based on online target-specific metric learning and coherent dynamics estimation for tracklet (track fragment) association by network flow optimization in long-term multi-person tracking that outperforms several state-of-the-art tracking methods.
Multi-target tracking by online learning of non-linear motion patterns and robust appearance models
  • Bo Yang, R. Nevatia
  • Computer Science
  • 2012 IEEE Conference on Computer Vision and Pattern Recognition
  • 2012
TLDR
An online approach to learning non-linear motion patterns and robust appearance models for multi-target tracking in a tracklet association framework is described, and significant improvements compared with state-of-the-art methods are shown.
Improving Multi-frame Data Association with Sparse Representations for Robust Near-online Multi-object Tracking
TLDR
This work proposes to formulate the multi-frame data association step as an energy minimization problem, designing an energy that efficiently exploits sparse representations of all detections, and proposes to use a structured sparsity-inducing norm to compute representations more suited to the tracking context.
Coupling detection and data association for multiple object tracking
TLDR
A novel framework for multiple object tracking in which the problems of object detection and data association are expressed by a single objective function, which follows the Lagrange dual decomposition strategy.
Multiple Hypothesis Tracking Revisited
TLDR
It is demonstrated that a classical MHT implementation from the 1990s can come surprisingly close to the performance of state-of-the-art methods on standard benchmark datasets, and it is shown that appearance models can be learned efficiently via a regularized least squares framework.
Long-Term Time-Sensitive Costs for CRF-Based Tracking by Detection
We present a Conditional Random Field (CRF) approach to tracking-by-detection in which we model pairwise factors linking pairs of detections and their hidden labels, as well as higher order
Learning by Tracking: Siamese CNN for Robust Target Association
This paper introduces a novel approach to the task of data association within the context of pedestrian tracking, by introducing a two-stage learning scheme to match pairs of detections. First, a
The Way They Move: Tracking Multiple Targets with Similar Appearance
We introduce a computationally efficient algorithm for multi-object tracking by detection that addresses four main challenges: appearance similarity among targets, missing data due to targets being
Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor
  • Wongun Choi
  • Computer Science
  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
TLDR
A novel Aggregated Local Flow Descriptor (ALFD) encodes the relative motion pattern between a pair of temporally distant detections using long-term interest point trajectories (IPTs); ablative analysis verifies the superiority of the ALFD metric over other conventional affinity metrics.
An online learned CRF model for multi-target tracking
  • Bo Yang, R. Nevatia
  • Computer Science
  • 2012 IEEE Conference on Computer Vision and Pattern Recognition
  • 2012
TLDR
The online CRF approach is more powerful at distinguishing spatially close targets with similar appearances, as well as in dealing with camera motions, and an efficient algorithm is introduced for finding an association with low energy cost.
...
...