Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

@inproceedings{Wen2021DetectionTA,
  title={Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark},
  author={Longyin Wen and Dawei Du and Pengfei Zhu and Qinghua Hu and Qilong Wang and Liefeng Bo and Siwei Lyu},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={7808--7817}
}
  • Longyin Wen, Dawei Du, Pengfei Zhu, Qinghua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu
  • Published 6 May 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
To promote the development of object detection, tracking, and counting algorithms for drone-captured videos, we construct a benchmark with a new large-scale drone-captured dataset, named DroneCrowd, formed by 112 video clips with 33,600 HD frames in various scenarios. Notably, we annotate 20,800 people trajectories with 4.8 million heads and several video-level attributes. Meanwhile, we design the Space-Time Neighbor-Aware Network (STNNet) as a strong baseline to solve object detection… 
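The totals quoted in the abstract imply some per-clip and per-frame averages. The short Python calculation below derives them. It assumes exactly 4,800,000 head annotations, whereas the abstract gives only the rounded figure of 4.8 million, so the derived values are approximate averages, not per-clip statistics reported by the authors.

```python
# Averages derived from the totals quoted in the DroneCrowd abstract.
NUM_CLIPS = 112
NUM_FRAMES = 33_600
NUM_TRAJECTORIES = 20_800
NUM_HEADS = 4_800_000  # assumption: "4.8 million heads" taken as an exact count

frames_per_clip = NUM_FRAMES / NUM_CLIPS
heads_per_frame = NUM_HEADS / NUM_FRAMES
trajectories_per_clip = NUM_TRAJECTORIES / NUM_CLIPS
heads_per_trajectory = NUM_HEADS / NUM_TRAJECTORIES

print(f"frames per clip:       {frames_per_clip:.1f}")        # 300.0
print(f"heads per frame:       {heads_per_frame:.1f}")        # 142.9
print(f"trajectories per clip: {trajectories_per_clip:.1f}")  # 185.7
print(f"heads per trajectory:  {heads_per_trajectory:.1f}")   # 230.8
```

The average of roughly 231 head annotations per trajectory stays below the average clip length of 300 frames, as it must if each person receives at most one head annotation per frame.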


PSGCNet: A Pyramidal Scale and Global Context Guided Network for Dense Object Counting in Remote-Sensing Images

PSGCNet is a novel framework for dense object counting in remote-sensing images that incorporates a pyramidal scale module (PSM) and a global context module (GCM): the PSM adaptively captures multi-scale information, and the GCM guides the model to select suitable scales from those generated by the PSM.
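To make the division of labor between the two modules concrete, the toy Python sketch below pools a 2-D map at several window sizes (standing in for a pyramidal scale module) and fuses the results with softmax weights computed from a global-average context value (standing in for a global context module). It is not PSGCNet's architecture: the function names, the average-pooling pyramid, and the `scale_params` values, which play the role of learned parameters, are all hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def avg_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 k x k average pooling; border windows average only in-bounds cells."""
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [x[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(window) / len(window)
    return out


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def global_context_fusion(x: Grid, kernel_sizes: List[int],
                          scale_params: List[float]) -> Grid:
    # Toy "pyramid": one smoothed copy of x per window size.
    pyramid = [avg_pool_same(x, k) for k in kernel_sizes]
    # Toy "global context": the global average of the input map.
    g = sum(map(sum, x)) / (len(x) * len(x[0]))
    # Hypothetical per-scale parameters turn the context into scale weights.
    weights = softmax([p * g for p in scale_params])
    h, w = len(x), len(x[0])
    return [[sum(wt * level[i][j] for wt, level in zip(weights, pyramid))
             for j in range(w)] for i in range(h)]


x = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
fused = global_context_fusion(x, kernel_sizes=[1, 3], scale_params=[0.0, 0.0])
print(fused[1][1])  # equal weights: 0.5 * 9.0 + 0.5 * 1.0 = 5.0
```

With all `scale_params` equal, the weights are uniform and the output is a plain average of the pyramid levels; giving one scale a larger parameter shifts weight toward that level as the (positive) context value grows.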

Crowd Density Estimation from Autonomous Drones Using Deep Learning: Challenges and Applications

This research rigorously investigates and analyzes existing methods for crowd flow estimation from UAVs and their applications, and presents a comprehensive performance evaluation of these methods for crowd counting using recent deep learning frameworks.

A point and density map hybrid network for crowd counting and localization based on unmanned aerial vehicles

A novel network named PDNet is presented, which uses a multi-task learning approach to combine point regression and density map regression, with components designed for density map regression and point regression, respectively.
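A common way to couple the two tasks is to derive a density map from the same head points used for point regression and to sum the two regression losses with a balancing weight. The Python sketch below assumes this standard formulation: each annotated point contributes a Gaussian kernel normalized to unit mass, so the map sums to the head count, and the combined loss is a density-map MSE plus a weighted point-distance term. The helper names, the kernel width `sigma`, the weight `lam`, and the assumption that predicted points are already matched one-to-one to ground-truth points (and lie inside the image) are illustrative, not PDNet's published design.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Point = Tuple[float, float]  # (row, col) in pixel coordinates


def gaussian_density_map(points: List[Point], h: int, w: int,
                         sigma: float = 1.5) -> Grid:
    """One unit-mass Gaussian per annotated point, renormalized over the grid,
    so the whole map sums to the number of points."""
    dmap = [[0.0] * w for _ in range(h)]
    for pr, pc in points:
        kernel = [[math.exp(-((i - pr) ** 2 + (j - pc) ** 2) / (2 * sigma ** 2))
                   for j in range(w)] for i in range(h)]
        total = sum(map(sum, kernel))
        for i in range(h):
            for j in range(w):
                dmap[i][j] += kernel[i][j] / total
    return dmap


def density_loss(pred: Grid, target: Grid) -> float:
    """Pixel-wise mean squared error between two density maps."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2 for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n


def point_loss(pred_pts: List[Point], gt_pts: List[Point]) -> float:
    """Mean squared Euclidean distance; assumes a 1:1 matching by index."""
    return sum((a - c) ** 2 + (b - d) ** 2
               for (a, b), (c, d) in zip(pred_pts, gt_pts)) / len(gt_pts)


def multitask_loss(pred_map: Grid, gt_map: Grid, pred_pts: List[Point],
                   gt_pts: List[Point], lam: float = 0.1) -> float:
    # lam is a hypothetical weight balancing the two regression tasks.
    return density_loss(pred_map, gt_map) + lam * point_loss(pred_pts, gt_pts)


heads = [(2.0, 3.0), (5.5, 6.0), (7.0, 1.0)]
gt_map = gaussian_density_map(heads, 10, 10)
print(round(sum(map(sum, gt_map)), 6))  # 3.0: the count is the map's sum
```

Because every kernel is renormalized over the image, integrating the density map recovers the count exactly, which is what makes density regression usable for counting.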

Intelligence-Led Policing and the New Technologies Adopted by the Hellenic Police

In the never-ending search by Law Enforcement Agencies (LEAs) for ways to reduce crime more effectively, the prevention of criminal activity is always considered the ideal solution. Since the 1990s, …

Cattle Detection Occlusion Problem

The proposed deep learning method outperformed commonly used competing algorithms for cow face detection, especially in very difficult cases, and mitigated the occlusion problem, that is, detecting hidden cattle in a large drone-captured dataset.

Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes

Unsupervised domain-adaptive DINO via cascading alignment (CA-DINO) is proposed. It consists of attention-enhanced double discriminators (AEDD) and weak restraints on category-level tokens (WROT), and yields a 41% relative improvement over the baseline on the Foggy Cityscapes benchmark.

Enhancing Drones for Law Enforcement and Capacity Monitoring at Open Large Events

This paper presents an artificial intelligence solution developed for the Castelldefels local police (Barcelona, Spain) to enhance the capabilities of drones used to surveil large events, and proposes a novel methodology for efficiently integrating deep learning algorithms into drone avionics.

SSAT: Self-Supervised Associating Network for Multiobject Tracking

This paper proposes a novel self-supervised learning method that uses short videos without any human-added labels, based on the idea that each video is a set of temporally corresponding image frames, and describes how a re-identification network trained in a self-supervised manner improves tracking performance.
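Whatever network produces the appearance embeddings, a tracker still has to link detections across consecutive frames. The Python sketch below shows one simple, generic way to do this: greedy matching by cosine similarity of re-identification embeddings, with a minimum-similarity threshold. It is not SSAT's association procedure; the function names, the 0.5 threshold, and the toy two-dimensional embeddings are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def greedy_associate(prev: List[Sequence[float]], curr: List[Sequence[float]],
                     min_sim: float = 0.5) -> Dict[int, int]:
    """Match current detections to previous ones in order of decreasing
    similarity. Returns {curr_index: prev_index}; current detections left
    unmatched would start new tracks."""
    pairs = sorted(((cosine(p, c), i, j)
                    for i, p in enumerate(prev) for j, c in enumerate(curr)),
                   reverse=True)
    used_prev: set = set()
    matches: Dict[int, int] = {}
    for sim, i, j in pairs:
        if sim < min_sim:
            break  # all remaining pairs are even less similar
        if i in used_prev or j in matches:
            continue  # each detection joins at most one track
        used_prev.add(i)
        matches[j] = i
    return matches


prev_emb = [[1.0, 0.0], [0.0, 1.0]]
curr_emb = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]
print(greedy_associate(prev_emb, curr_emb))  # {0: 1, 1: 0}; detection 2 is new
```

Greedy matching is a cheap stand-in for optimal bipartite assignment (for example, the Hungarian algorithm); it can make suboptimal choices when similarities are close, but it keeps the example short.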

Perception of Risks and Usefulness of Smart Video Surveillance Systems

The results provide insight into how people perceive smart video surveillance: respondents rate the presented system as fairly useful and slightly risky for their own privacy, and, interestingly, men rate the risk to their own privacy significantly higher than women do.
