Unsupervised Multi-Object Detection for Video Surveillance Using Memory-Based Recurrent Attention Networks

Zhen He and Hangen He
Video surveillance has become ubiquitous with the rapid development of artificial intelligence. Multi-object detection (MOD) is a key step in video surveillance and has been widely studied for a long time. The majority of existing MOD algorithms follow the “divide and conquer” pipeline and utilize popular machine learning techniques to optimize algorithm parameters. However, this pipeline is usually suboptimal since it decomposes the MOD task into several sub-tasks and does not…


A Novel Low Processing Time System for Criminal Activities Detection Applied to Command and Control Citizen Security Centers

A novel application of deep learning, specifically a Faster Region-Based Convolutional Neural Network (Faster R-CNN), for detecting criminal activities, treated as “objects”, in real-time video.

A Framework for Automatic Building Detection from Low-Contrast Satellite Images

The contrast of an image is optimized to represent all the information using singular value decomposition (SVD) based on the discrete wavelet transform (DWT), and a line-segment detection scheme is applied to accurately detect building line segments.

Vessel Detection and Tracking Method Based on Video Surveillance

A method that allows the detection and tracking of ships using the video streams of existing monitoring systems for ports and rivers is presented and the results confirm the usability of the proposed solution.

Optical frequency and phase information-based fusion approach for image rotation symmetry detection.

Compared with known methods, the proposed method detects more multi-scale (skewed, small-scale, and regular) rotation symmetry centers and detects symmetry properties with significantly better accuracy.

Camouflage design, assessment and breaking techniques: a survey

This article discusses various camouflage design, assessment, and breaking techniques in the literature and addresses several issues of interest as well as future research direction in this area.



Accurate Single Stage Detector Using Recurrent Rolling Convolution

A novel single stage end-to-end trainable object detection network is proposed by introducing Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors which are deep in context.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

You Only Look Once: Unified, Real-Time Object Detection

Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

SSD: Single Shot MultiBox Detector

The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.

DenseBox: Unifying Landmark Localization with End to End Object Detection

DenseBox is introduced, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences through all locations and scales of an image; when combined with landmark localization during multi-task learning, DenseBox further improves object detection accuracy.

Object Detection with Discriminatively Trained Part Based Models

We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in

CRAFT Objects from Images

This paper calls the proposed method "CRAFT" (Cascade Region-proposal-network And FasT-rcnn), which tackles each task with a carefully designed network cascade, and shows that the cascade structure helps in both tasks.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

Monocular Pedestrian Detection: Survey and Experiments

An overview of the current state of the art of pedestrian detection from both methodological and experimental perspectives is provided and a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds is indicated.