Toulouse campus surveillance dataset: scenarios, soundtracks, synchronized videos with overlapping and disjoint views

Authors: Thierry Malon, Geoffrey Roman-Jimenez, Patrice Guyot, Sylvie Chambon, Vincent Charvillat, Alain Crouzil, André Péninou, Julien Pinquier, Florence Sèdes, Christine Sénac
Published in: Proceedings of the 9th ACM Multimedia Systems Conference
In surveillance applications, humans and vehicles are the most important common elements studied. Consequently, detecting and matching a person or a car that appears in several videos is a key problem. Many algorithms have been introduced, and a major related problem today is to evaluate and compare these algorithms precisely against a common ground truth. In this paper, our goal is to introduce a new dataset for evaluating multi-view based methods. This dataset aims at paving…


Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data
This work develops an alternative approach: a weakly supervised method for fine-tuning 3D object detectors for traffic observation cameras, showing in the process that large existing autonomous vehicle datasets can be leveraged for pre-training.
Story comparison for estimating field of view overlap in a video collection
A method is proposed for finding groups of videos with overlapping fields of view within a collection of videos acquired simultaneously by static cameras; it shows promising results.
A video summarization framework based on activity attention modeling using deep features for smart campus surveillance system
A keyframe extraction method is introduced that summarizes academic activities, producing a short representation of the target video while preserving all the essential activities present in the original video.
Audiovisual Annotation Procedure for Multi-view Field Recordings
An original procedure is proposed for producing manual annotations in different contexts, including multi-modal and multi-view documents, by using both audio and video annotations; it ensures consistency when audio or video is considered alone, and additionally provides audiovisual information at a richer level.
Estimation of Correspondent Trajectories in Multiple Overlapping Synchronized Videos Using Correlation of Activity Functions
The main idea is that two areas in two different videos that systematically contain objects at the same time are very likely to correspond to each other; the correspondence between cells is then used to find the reformulated trajectory in the other videos.
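The correspondence idea summarized above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each video is divided into a grid of cells, that a cell's activity function is a binary presence signal over the synchronized frames, that Pearson correlation measures how systematically two cells are active together, and that a cell is matched only when its best correlation reaches a hypothetical threshold of 0.5. All function names are illustrative.

```python
import numpy as np


def correlation_matrix(act_a: np.ndarray, act_b: np.ndarray) -> np.ndarray:
    """Pearson correlation between every cell of video A and every cell of video B.

    act_a: (n_cells_a, T) activity functions, 1.0 when an object is present in the cell at frame t.
    act_b: (n_cells_b, T) activity functions over the same synchronized frames.
    Returns an (n_cells_a, n_cells_b) matrix.
    """
    a = act_a - act_a.mean(axis=1, keepdims=True)
    b = act_b - act_b.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(a, axis=1, keepdims=True) @ np.linalg.norm(b, axis=1, keepdims=True).T
    # Cells that are never (or always) active have zero variance: report 0 correlation.
    denom[denom == 0] = np.inf
    return (a @ b.T) / denom


def match_cells(corr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Map each cell of A to its best-correlated cell of B, or -1 if the best score is below threshold."""
    best = corr.argmax(axis=1)
    best_val = corr[np.arange(corr.shape[0]), best]
    return np.where(best_val >= threshold, best, -1)


def transfer_trajectory(traj_cells, mapping: np.ndarray) -> list:
    """Translate a trajectory (sequence of cell indices in A) into cells of B, dropping unmatched cells."""
    return [int(mapping[c]) for c in traj_cells if mapping[c] >= 0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    act_b = (rng.random((4, 200)) < 0.3).astype(float)
    # Toy case: the 4 cells of video A observe the same areas as cells 2, 0, 3, 1 of video B.
    act_a = act_b[[2, 0, 3, 1]]
    mapping = match_cells(correlation_matrix(act_a, act_b))
    print(mapping.tolist())  # [2, 0, 3, 1]
    print(transfer_trajectory([0, 1, 1, 3], mapping))  # [2, 0, 0, 1]
```

In real footage the activity functions of corresponding cells are only similar, not identical, so the threshold decides how many weak matches are discarded.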
Negative filtering of CCTV Content - forensic video analysis framework
A negative filtering approach based on quality and usability/utility metadata is proposed; it makes it possible to eliminate, through automatic processing, video sequences that do not satisfy the requirements for their analysis.
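The filtering principle can be illustrated as follows. This is a minimal sketch under stated assumptions: the metadata fields (`height_px`, `fps`, `blur_ratio`) and the thresholds in the example are hypothetical and do not come from the paper. The only idea taken from the summary is that a sequence is discarded when its metadata fails an analysis requirement.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SequenceMetadata:
    # Hypothetical quality/usability fields; the paper's actual metadata schema is not given here.
    name: str
    height_px: int
    fps: float
    blur_ratio: float  # fraction of frames flagged as blurred


Requirement = Callable[[SequenceMetadata], bool]


def negative_filter(
    seqs: List[SequenceMetadata], requirements: List[Requirement]
) -> Tuple[List[SequenceMetadata], List[SequenceMetadata]]:
    """Split sequences into (kept, rejected); a sequence is rejected if it fails any requirement."""
    kept, rejected = [], []
    for s in seqs:
        (kept if all(req(s) for req in requirements) else rejected).append(s)
    return kept, rejected


if __name__ == "__main__":
    # Hypothetical requirements for a given analysis task.
    reqs = [
        lambda s: s.height_px >= 480,
        lambda s: s.fps >= 10,
        lambda s: s.blur_ratio <= 0.2,
    ]
    seqs = [
        SequenceMetadata("cam1", 720, 25.0, 0.05),
        SequenceMetadata("cam2", 240, 25.0, 0.0),   # resolution too low
        SequenceMetadata("cam3", 1080, 5.0, 0.1),   # frame rate too low
    ]
    kept, rejected = negative_filter(seqs, reqs)
    print([s.name for s in kept])  # ['cam1']
```

Filtering "negatively" on cheap metadata means the expensive analysis only runs on sequences that can actually support it.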
Audio Annotation on Myanmar Traditional Boxing Video by Enhancing DT
This system is intended to let boxing fans highlight, review, and replay the parts they want by removing unwanted video segments; it enhances classification accuracy to address the weaknesses of the decision tree.
Recent trends in crowd analysis: A review
The review aims to identify subareas of crowd analysis that are still unexplored or that seem to be rarely addressed through the prism of deep learning.
Improving Vehicle Re-Identification using CNN Latent Spaces: Metrics Comparison and Track-to-track Extension
Two main results are highlighted: i) the importance of the metric choice for vehicle re-identification, and ii) T2TP improves performance over I2TP, especially when coupled with MCD-based metrics.
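The two matching schemes compared above can be sketched as follows, reading I2TP as image-to-track and T2TP as track-to-track matching, as the paper's title suggests. This is an assumption-laden illustration, not the paper's pipeline: tracks are arrays of CNN embeddings, image-to-track distance is taken as the minimum distance to any image of a gallery track, track-to-track distance compares mean embeddings, and only Euclidean and cosine metrics are shown (the MCD-based metrics are not reproduced). Passing the metric as a parameter mirrors the point that metric choice matters.

```python
from typing import Callable, List

import numpy as np

Metric = Callable[[np.ndarray, np.ndarray], float]


def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.linalg.norm(x - y))


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    return float(1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def rank_image_to_track(query: np.ndarray, gallery: List[np.ndarray], metric: Metric) -> np.ndarray:
    """query: (D,) embedding of a single image; gallery: list of (n_i, D) tracks.

    Track distance = smallest distance between the query and any image of the track.
    Returns gallery indices sorted from best to worst match.
    """
    d = [min(metric(query, img) for img in track) for track in gallery]
    return np.argsort(d)


def rank_track_to_track(query_track: np.ndarray, gallery: List[np.ndarray], metric: Metric) -> np.ndarray:
    """query_track: (m, D) embeddings of the query track.

    Track distance = distance between mean embeddings (one simple aggregation among many).
    """
    q = query_track.mean(axis=0)
    d = [metric(q, track.mean(axis=0)) for track in gallery]
    return np.argsort(d)


if __name__ == "__main__":
    gallery = [np.array([[1.0, 0.0], [0.9, 0.1]]), np.array([[0.0, 1.0], [0.1, 0.9]])]
    query_track = np.array([[0.95, 0.05], [0.8, 0.2]])
    print(rank_image_to_track(query_track[0], gallery, euclidean).tolist())  # [0, 1]
    print(rank_track_to_track(query_track, gallery, cosine).tolist())  # [0, 1]
```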


A large-scale benchmark dataset for event recognition in surveillance video
We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms with a focus on continuous visual event recognition (CVER) in outdoor
AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video
A concept for automatic construction site monitoring by taking into account 4D information (3D over time), that is acquired from highly-overlapping digital aerial images, which largely supports automated methods toward full scene understanding.
3DPeS: 3D people dataset for surveillance and forensics
3DPeS is a new dataset for 3D/multi-view surveillance and forensic applications, designed for discussing and evaluating research results in people re-identification and other related activities (people detection, people segmentation and people tracking).
UMPM benchmark: A multi-person dataset with synchronized video and motion capture data for evaluation of articulated human motion and interaction
This paper presents the Utrecht Multi-Person Motion benchmark, which includes synchronized motion capture data and video sequences from multiple viewpoints for multi-person motion, including multi-person interaction. It is available to the research community to promote research in multi-person articulated human motion analysis.
Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking
Inspired by the well-received evaluation approach on the LFW dataset, a standard evaluation protocol is designed and MCAD is benchmarked under several scenarios. The results show that while an average accuracy of 85% is achieved in the closed-view scenario, performance drops significantly in the cross-view scenario.
Audio Surveillance
A general taxonomy, inspired by the more widespread video surveillance field, is proposed to systematically describe the methods covering background subtraction, event classification, object tracking, and situation analysis, highlighting the target applications of each described method and providing the reader with a systematic and schematic view.
Crowded Scene Analysis: A Survey
The survey presents background knowledge and the available features related to crowded scenes, and reviews existing models, popular algorithms, evaluation protocols, and system performance for the different aspects of crowded scene analysis.
A survey of approaches and trends in person re-identification
HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion
A baseline algorithm for 3D articulated tracking is described that uses a relatively standard Bayesian framework, with optimization in the form of Sequential Importance Resampling and Annealed Particle Filtering; a variety of likelihood functions, prior models of human motion, and the effects of algorithm parameters are explored.
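The Sequential Importance Resampling (SIR) step named above can be shown on a toy problem. This is a minimal one-dimensional sketch, not the HumanEva baseline: it assumes a random-walk motion prior and a Gaussian observation likelihood, and it omits the articulated 3D body model, image-based likelihoods, and the annealing layers of Annealed Particle Filtering. Parameter names and defaults are illustrative.

```python
import numpy as np


def sir_filter(observations, n_particles=500, process_std=1.0, obs_std=1.0, seed=0):
    """Track a 1D state from noisy observations with a basic SIR particle filter.

    Returns the posterior-mean estimate at each time step.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)  # samples from an assumed initial prior
    estimates = []
    for z in observations:
        # 1. Predict: propagate each particle through the random-walk motion prior.
        particles = particles + rng.normal(0.0, process_std, n_particles)
        # 2. Weight: Gaussian likelihood of the observation given each particle
        #    (computed in log space and shifted by the max for numerical stability).
        log_w = -0.5 * ((z - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # 3. Estimate: weighted posterior mean before resampling.
        estimates.append(float(np.dot(w, particles)))
        # 4. Resample: draw particles in proportion to their weights (multinomial resampling).
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_state = np.linspace(0.0, 10.0, 50)
    obs = true_state + rng.normal(0.0, 1.0, 50)
    est = sir_filter(obs)
    print("mean absolute error:", float(np.mean(np.abs(est - true_state))))
```

Resampling after every step keeps the particle set concentrated on likely states; annealed particle filtering refines this for high-dimensional body poses by weighting through a sequence of progressively sharper likelihoods.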
Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments
We introduce a new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, for training