End-to-End People Detection in Crowded Scenes
@article{Stewart2015EndtoEndPD,
  title={End-to-End People Detection in Crowded Scenes},
  author={Russell Stewart and Mykhaylo Andriluka and A. Ng},
  journal={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2015},
  pages={2325-2333}
}
Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. […] Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in…
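To illustrate what a loss "that operates on sets of detections" can look like, here is a minimal Python sketch. It is an assumption for illustration only, not the paper's implementation: the names `l1_box_distance` and `set_loss`, the L1 box distance, the `miss_penalty` value, and the brute-force search over assignments are all hypothetical choices. The only property it is meant to show is that the loss depends on the best one-to-one matching between predicted and ground-truth boxes, so the order in which boxes are emitted does not matter.

```python
from itertools import permutations


def l1_box_distance(a, b):
    """Sum of absolute coordinate differences between two (x1, y1, x2, y2) boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def set_loss(predicted, ground_truth, miss_penalty=100.0):
    """Order-invariant loss between two sets of boxes (illustrative sketch).

    Every one-to-one assignment of the smaller set to the larger set is tried
    by brute force, and the cheapest total L1 distance is kept. Each box left
    unmatched in the larger set adds a fixed, hypothetical miss_penalty.
    Brute force is only practical for a handful of boxes.
    """
    small, large = sorted((predicted, ground_truth), key=len)
    best = min(
        sum(l1_box_distance(s, large[j]) for s, j in zip(small, perm))
        for perm in permutations(range(len(large)), len(small))
    )
    return best + miss_penalty * (len(large) - len(small))


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(21, 20, 30, 30), (0, 0, 10, 11)]  # same people, emitted in a different order

print(set_loss(predicted, ground_truth))        # small: each box is close to its match
print(set_loss(predicted[::-1], ground_truth))  # identical: output order is irrelevant
```

A polynomial-time assignment solver would replace the permutation search for realistic numbers of detections; the brute-force version is used here only to keep the sketch dependency-free.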
426 Citations
People detection in crowded scenes via regional-based convolutional network
- Computer Science
- 2016 IEEE 13th International Conference on Signal Processing (ICSP)
- 2016
This work proposes an end-to-end framework that uses a convolutional network for feature representation, generates candidate proposals online while simultaneously optimizing their locations, and greatly outperforms traditional methods.
End-to-end crowd counting via joint learning local and global count
- Computer Science
- 2016 IEEE International Conference on Image Processing (ICIP)
- 2016
This work proposes an end-to-end convolutional neural network architecture that takes a whole image as its input and directly outputs the counting result, taking advantage of contextual information when predicting both local and global counts.
Social Scene Understanding: End-to-End Multi-person Action Localization and Collective Activity Recognition
- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
A single architecture is proposed that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme.
Multi-Layer Proposal Network for People Counting in Crowded Scene
- Computer Science
- 2017 10th International Conference on Intelligent Computation Technology and Automation (ICICTA)
- 2017
By detecting people's heads separately, this system avoids the problems caused by occlusion and achieves robust detection results regardless of whether the head faces toward or away from the camera.
Detection in Crowded Scenes: One Proposal, Multiple Predictions
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
The key to the approach is to let each proposal predict a set of correlated instances, rather than a single one as in previous proposal-based frameworks, which effectively handles the difficulty of detecting highly overlapped objects.
Learning to Filter Object Detections
- Computer Science
- GCPR
- 2017
A filtering network (FNet) is proposed, a method which replaces NMS with a differentiable neural network that allows joint reasoning and re-scoring of the generated set of hypotheses per image, and demonstrates that FNet, a feed-forward network architecture, is able to mimic NMS decisions, despite the sequential nature of NMS.
People detection in crowded scenes using hierarchical features
- Computer Science
- 2017 IEEE International Conference on Imaging Systems and Techniques (IST)
- 2017
We propose a new architecture (based on the Faster R-CNN framework) for people detection. Our model extracts the first, third, and fifth stages of the VGG16 network to form a robust feature map which consists…
Detective: An Attentive Recurrent Model for Sparse Object Detection
- Computer Science
- 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
This work proposes a training mechanism based on the Hungarian algorithm and a loss that balances the localization and classification tasks, which together allow Detective to achieve promising results on the PASCAL VOC dataset for object detection.
End-to-End Object Detection with Transformers
- Computer Science
- ECCV
- 2020
This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset.
Progressive End-to-End Object Detection in Crowded Scenes
- Computer Science
- 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
A progressive prediction method is proposed that first selects accepted queries likely to generate true positive predictions, then refines the remaining noisy queries according to the previously accepted predictions, and can significantly boost the performance of query-based detectors in crowded scenes.
References
SHOWING 1-10 OF 28 REFERENCES
Learning People Detectors for Tracking in Crowded Scenes
- Computer Science
- 2013 IEEE International Conference on Computer Vision
- 2013
This paper proposes a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes.
Detection and Tracking of Occluded People
- Medicine, Computer Science
- International Journal of Computer Vision
- 2013
This work observes that typical occlusions are due to overlaps between people and proposes a people detector tailored to various occlusion levels, leveraging the fact that person/person occlusions result in very characteristic appearance patterns that can help to improve detection results.
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
- Computer Science
- ICLR
- 2014
This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.
Pedestrian detection in crowded scenes
- Computer Science
- 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)
- 2005
Qualitative and quantitative results on a large data set confirm that the core part of the method is the combination of local and global cues via probabilistic top-down segmentation that allows examining and comparing object hypotheses with high precision down to the pixel level.
End-to-end integration of a Convolutional Network, Deformable Parts Model and non-maximum suppression
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This work trains a new model using a new structured loss function that considers all bounding boxes within an image, rather than isolated object instances, and enables the non-maximal suppression operation, previously treated as a separate post-processing stage, to be integrated into the model.
People-tracking-by-detection and people-detection-by-tracking
- Computer Science
- 2008 IEEE Conference on Computer Vision and Pattern Recognition
- 2008
This paper combines the advantages of both detection and tracking in a single framework using a hierarchical Gaussian process latent variable model (hGPLVM) and presents experimental results that demonstrate how this makes it possible to detect and track multiple people in cluttered scenes with reoccurring occlusions.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
- Computer Science
- 2014 IEEE Conference on Computer Vision and Pattern Recognition
- 2014
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Multi-cue onboard pedestrian detection
- Computer Science
- 2009 IEEE Conference on Computer Vision and Pattern Recognition
- 2009
Evaluating different features and classifiers in a sliding-window framework indicates that incorporating motion information improves detection performance significantly and the combination of multiple and complementary feature types can also help improve performance.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Filtered channel features for pedestrian detection
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A unifying framework is proposed, based on the observation that multiple top-performing pedestrian detectors can be modelled using an intermediate layer that filters low-level features in combination with a boosted decision forest; different filter families are explored experimentally.