Localized region context and object feature fusion for people head detection

  • Yule Li, Yong Dou, Xinwang Liu, Teng Li
  • Published 1 September 2016
  • Computer Science
  • 2016 IEEE International Conference on Image Processing (ICIP)
People head detection in crowded scenes is challenging due to the large variability in clothing and appearance, the small scale of people, and strong partial occlusions. Traditional bottom-up proposal methods and existing region proposal network approaches suffer from either poor recall or low precision. In this paper, we propose to improve both the recall and the precision of head detection in region proposal models by integrating local head information. Specifically, we first use a region… 

Multi-person Head Segmentation in Low Resolution Crowd Scenes Using Convolutional Encoder-Decoder Framework

This work proposes a multi-person head segmentation algorithm in crowded environments using a convolutional encoder-decoder network which is trained using head probability heatmaps and has demonstrated excellent performance on a challenging spectator crowd dataset.

Detecting Heads using Feature Refine Net and Cascaded Multi-scale Architecture

A novel method, Feature Refine Net (FRN), and a cascaded multi-scale architecture to improve the performance of small head detection, and the proposed channel weighting method enables FRN to make use of features alternatively and effectively.

HeadNet: An End-to-End Adaptive Relational Network for Head Detection

An effective adaptive relational network to capture context information, which is greatly helpful to suppress missed detection and achieve state-of-the-art results on two challenging datasets, i.e., HollywoodHeads and Brainwash.

Scale Mapping and Dynamic Re-Detecting in Dense Head Detection

This paper investigates the influence of head scale and contextual information, and proposes a scale-invariant method for head detection that can dynamically detect heads depending on the complexity of the image.

Head pose estimation with neural networks from surveillant images

This approach consists of two stages, head detection and pose estimation, and uses ResNet-50 as the backbone of the classifier, of which the input is the result of head detection.

TCM: Temporal Consistency Model for Head Detection in Complex Videos

A temporal consistency model (TCM) is proposed to enhance the performance of a generic object detector by integrating spatial-temporal information that exists among subsequent frames of a particular video by recovering missed detection and suppressing false positives.

Fully Convolutional Network for Crowd Size Estimation by Density Map and Counting Regression

A counting-by-regression framework is employed, where the human head is modeled as a Gaussian distribution, and a deeper and lighter fully convolutional network (FCN) is designed to be a crowd density map estimator.

Representations, Analysis and Recognition of Shape and Motion from Imaging Data

This paper presents a comparison between two core paradigms for computing scene flow from multi-view videos of dynamic scenes. In both approaches, shape and motion estimation are decoupled, in

Real-Time and Accurate UAV Pedestrian Detection for Social Distancing Monitoring in COVID-19 Pandemic

A lightweight pedestrian detection network is proposed that accurately detects pedestrians via human head detection in real time and then calculates the social distance between pedestrians in UAV images; multi-scale features and spatial attention are shown to contribute significantly to pedestrian detection performance.

Head mouse control system for people with disabilities

The designed human–machine interface is an assistive system that uses head movements and blinking for mouse control that allows people with disabilities to freely control mouse cursors and mouse buttons without wearing any equipment.



Context-Aware CNNs for Person Head Detection

This work leverages person-scene relations and proposes a global CNN model trained to predict positions and scales of heads directly from the full image via an energy-based model where the potentials are computed with a CNN framework.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Edge Boxes: Locating Object Proposals from Edges

A novel method for generating object bounding box proposals using edges is proposed, showing results that are significantly more accurate than the current state-of-the-art while being faster to compute.

Sample-Specific Late Fusion for Visual Category Recognition

This paper identifies the optimal fusion weights for each sample and pushes positive samples to top positions in the fusion score rank list; it formulates the problem as an L∞ norm constrained optimization problem and applies the Alternating Direction Method of Multipliers for the optimization.

Histograms of oriented gradients for human detection

  • N. Dalal, B. Triggs
  • Computer Science
  • 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)
  • 2005
It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

BING: Binarized normed gradients for objectness estimation at 300fps

To improve the localization quality of the proposals while maintaining efficiency, a novel fast segmentation method is proposed, and its effectiveness for improving BING’s localization performance is demonstrated when it is used in multi-thresholding straddling expansion (MTSE) post-processing.

Object Detection with Discriminatively Trained Part Based Models

We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classifications tasks.

End-to-End People Detection in Crowded Scenes

This work proposes a model that is based on decoding an image into a set of people detections, which takes an image as input and directly outputs a set of distinct detection hypotheses.