Deep Occlusion Reasoning for Multi-camera Multi-target Detection

@inproceedings{Baqu2017DeepOR,
  title={Deep Occlusion Reasoning for Multi-camera Multi-target Detection},
  author={Pierre Baqu{\'e} and Fran{\c{c}}ois Fleuret and Pascal V. Fua},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={271--279}
}
  • Published 19 April 2017
People detection in single 2D images has improved greatly in recent years. [] One of the method's key ingredients is high-order CRF terms that model potential occlusions, giving the approach its robustness even when many people are present. The model is trained end-to-end and is shown to outperform several state-of-the-art algorithms on challenging scenes.
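The occlusion reasoning in the abstract can be illustrated, very loosely, with a toy mean-field update over binary ground-plane occupancy variables: cells that lie on the same camera ray discount each other's occupancy evidence. This is only a minimal sketch under assumed simplifications, not the paper's actual model (which learns its high-order potentials end-to-end from images); the `penalty` weight, the ray construction, and the function name are all illustrative choices.

```python
import numpy as np

def mean_field_occupancy(unary, rays, penalty=2.0, n_iters=20):
    """Toy mean-field inference for binary ground-plane occupancy.

    unary:   (N,) log-odds favouring "occupied" for each grid cell.
    rays:    list of index arrays; cells on the same camera ray occlude
             one another, so each ray contributes a soft penalty
             proportional to the expected occupancy of the OTHER cells.
    Returns: (N,) approximate marginal probabilities of occupancy.
    """
    q = 1.0 / (1.0 + np.exp(-unary))            # initialise from unaries
    for _ in range(n_iters):
        field = unary.copy()
        for ray in rays:
            total = q[ray].sum()
            # subtract the expected occupancy of competing cells on this ray
            field[ray] -= penalty * (total - q[ray])
        q = 1.0 / (1.0 + np.exp(-field))        # sigmoid of the mean field
    return q

# Two cells share a ray, a third is unobstructed: the shared cells
# suppress each other while the isolated cell keeps its unary belief.
unary = np.array([2.0, 1.0, 2.0])
q = mean_field_occupancy(unary, [np.array([0, 1])])
```

Here cell 2 sits on no ray, so its marginal stays exactly at the sigmoid of its unary, while cells 0 and 1 compete and both end up below their unary-only beliefs.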

Bringing Generalization to Deep Multi-View Pedestrian Detection
TLDR
This work proposes a novel Generalized MVD (GMVD) dataset assimilating diverse scenes with varying daytime, camera configurations, and numbers of cameras; it discusses the properties essential to bringing generalization to MVD and proposes a barebones model to incorporate them.
People Detection in a Depth Sensor Network via Multi-View CNNs trained on Synthetic Data
TLDR
An end-to-end multi-view deep learning architecture is proposed which takes three foreground-segmented overlapping depth images as input and predicts the marginal probability distribution of people present in the scene.
Human Detection and Segmentation via Multi-view Consensus
TLDR
For scenes with dynamic activities and camera motion, this work proposes a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training via coarse 3D localization in a voxel grid and fine-grained offset regression.
Custom Object Detection via Multi-Camera Self-Supervised Learning
TLDR
MCSSL associates bounding boxes between cameras with overlapping fields of view by leveraging epipolar geometry and state-of-the-art tracking and reID algorithms, and prudently generates two sets of pseudo-labels to fine-tune backbone and detection networks respectively in an object detection model.
Neural Scene Decomposition for Multi-Person Motion Capture
TLDR
A self-supervised approach to learning what is called a neural scene decomposition (NSD), which encodes 3D geometry and can be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data.
Multi-camera object tracking via deep metric learning
TLDR
A convolutional network and a triplet loss are used, respectively, to map an object and its position in each partial view to a vector in the hyperspace and to supervise representation learning; by applying a clustering algorithm to the transferred representations in the top view, information from multiple partial cameras is fused and unified.
Semantic Driven Multi-Camera Pedestrian Detection
TLDR
The experimental results show that the proposed approach outperforms state-of-the-art multi-camera pedestrian detectors, even some specifically trained on the target scenario, signifying the versatility and robustness of the proposed method without requiring ad hoc annotations nor human-guided configuration.
A Bayesian 3D Multi-view Multi-object Tracking Filter
TLDR
The key innovation is a high fidelity yet tractable 3D occlusion model, amenable to optimal Bayesian multi-view multi-object filtering, which seamlessly integrates, into a single Bayesian recursion, the sub-tasks of track management, state estimation, clutter rejection, and occlusions/misdetection handling.

References

Showing 1-10 of 38 references
Deep Multi-camera People Detection
TLDR
The core of the method is an architecture which makes use of a monocular pedestrian dataset, available at larger scale than multi-view ones, applies parallel processing to the multiple video streams, and jointly utilises them; it outperforms existing methods by large margins on the commonly used PETS 2009 dataset.
The WILDTRACK Multi-Camera Person Dataset
TLDR
A large-scale HD dataset named WILDTRACK is provided which finally makes advanced deep learning methods applicable to people detection problems, and state-of-the-art multi-camera detectors are benchmarked on this new dataset.
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
Multicamera People Tracking with a Probabilistic Occupancy Map
TLDR
It is demonstrated that the generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori.
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Robust multiple cameras pedestrian detection with multi-view Bayesian network
Higher Order Potentials in End-to-End Trainable Conditional Random Fields
TLDR
Two types of higher-order potentials can be included in a Conditional Random Field model embedded within a deep network, allowing inference with the efficient and differentiable mean-field algorithm and making it possible to implement the CRF model as a stack of layers in a deep network.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
How Far are We from Solving Pedestrian Detection?
TLDR
The gap between current state-of-the-art methods and the "perfect single frame detector" is investigated, the impact of training annotation noise on the detector performance is studied, and it is shown that one can improve even with a small portion of sanitised training data.
Learning Arbitrary Potentials in CRFs with Gradient Descent
TLDR
A new inference and learning framework that can learn arbitrary pairwise CRF potentials is introduced; it can easily be integrated into deep neural networks to allow end-to-end training.