Intra-Clip Aggregation For Video Person Re-Identification

  • Takashi Isobe, Jian Han, Fang Zhu, Yali Li, Shengjin Wang
  • Published 5 May 2019
  • Computer Science
  • 2020 IEEE International Conference on Image Processing (ICIP)
Video-based person re-identification has drawn massive attention in recent years due to its extensive applications in video surveillance. While deep learning based methods have led to significant progress, they use complementary information ineffectively, which can be attributed to the data augmentation required during training. Data augmentation has been widely used to mitigate overfitting and improve the representational ability of the network. However, the previous…


Towards Discriminative Representation Learning for Unsupervised Person Re-identification
This work proposes a cluster-wise contrastive learning algorithm (CCL) that iteratively optimizes feature learning and cluster refinement to learn noise-tolerant representations in an unsupervised manner, and adopts a progressive domain adaptation (PDA) strategy to gradually mitigate the domain gap between source and target data.
Channel Transformer Network
A novel parameter-free method named Channel Transformer Network (CTN) is proposed to decrease or increase the channels of convolutional neural network modules while retaining most of the information at lower computational complexity; it can also be used in other vision tasks such as image classification and object detection.


Video-Based Person Re-Identification With Accumulative Motion Context
The experimental results demonstrate that the proposed AMOC network significantly outperforms state-of-the-art methods for video-based re-identification and confirm the advantage of exploiting long-range motion context for video-based person re-identification, clearly validating the motivation.
Revisiting Temporal Modeling for Video-based Person ReID
This work comprehensively studies and compares four different temporal modeling methods for video-based person reID, and proposes a new attention generation network that adopts temporal convolution to extract temporal information among frames.
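The temporal-attention pooling idea summarized above can be illustrated with a minimal NumPy sketch: each frame feature is reduced to a scalar score, the scores are softmax-normalized over time, and the resulting weights pool the frames into one clip-level feature. The feature dimensions and the linear scoring function here are illustrative stand-ins for the paper's learned attention network, not its exact architecture.

```python
import numpy as np

def temporal_attention_pool(frame_feats, w):
    """Pool per-frame features into one clip feature via temporal attention.

    frame_feats: (T, D) array, one D-dim feature per frame.
    w:           (D,) scoring vector (a stand-in for a learned attention net).
    """
    scores = frame_feats @ w                         # (T,) one scalar per frame
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    return weights @ frame_feats                     # (D,) attention-weighted average

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))     # 8 frames, 16-dim features
clip_feat = temporal_attention_pool(feats, rng.standard_normal(16))
print(clip_feat.shape)  # (16,)
```

With a zero scoring vector the softmax weights become uniform, and the pooling degenerates to plain temporal average pooling, which is one of the baselines such comparisons typically include.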
Diversity Regularized Spatiotemporal Attention for Video-Based Person Re-identification
A new spatiotemporal attention model is proposed that automatically discovers a diverse set of distinctive body parts in video clips of people across non-overlapping cameras and outperforms the state-of-the-art approaches by large margins on multiple metrics.
Region-based Quality Estimation Network for Large-scale Person Re-identification
A novel Region-based Quality Estimation Network (RQEN) is proposed, in which an ingenious training mechanism enables the effective learning to extract the complementary region-based information between different frames.
See the Forest for the Trees: Joint Spatial and Temporal Recurrent Neural Networks for Video-Based Person Re-identification
  • Zhen Zhou, Yan Huang, Wei Wang, Liang Wang, T. Tan
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
This paper focuses on video-based person re-identification and builds an end-to-end deep neural network architecture to jointly learn features and metrics and integrates the surrounding information at each location by a spatial recurrent model when measuring the similarity with another pedestrian video.
Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention
This work uses 3D convolutions on the video volume, instead of 2D convolutions applied frame by frame, to extract spatial and temporal features simultaneously, and uses a non-local block to tackle the misalignment problem and capture spatial-temporal long-range dependencies.
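The non-local block mentioned above computes attention between every pair of spatio-temporal positions, so distant frames and regions can influence each other directly. The NumPy sketch below shows this pairwise-attention pattern in simplified form; the random projection matrices stand in for the learned 1x1x1 convolutions, and the shapes are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g):
    """Simplified non-local attention over N spatio-temporal positions.

    x: (N, D) features, one row per position (flattened T*H*W grid).
    w_theta, w_phi, w_g: (D, D) projections (stand-ins for learned
    1x1x1 convolutions in a real non-local block).
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    attn = softmax(theta @ phi.T / np.sqrt(x.shape[1]))  # (N, N) pairwise weights
    return x + attn @ g                                  # residual connection

rng = np.random.default_rng(1)
x = rng.standard_normal((4 * 4 * 4, 32))  # 4 frames of 4x4 feature maps, 32 channels
w = [rng.standard_normal((32, 32)) * 0.1 for _ in range(3)]
y = non_local(x, *w)
print(y.shape)  # (64, 32)
```

Because the (N, N) attention matrix spans all frames at once, a position in the first frame can attend to a well-aligned region in the last frame, which is how the block helps with the misalignment problem the summary describes.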
STA: Spatial-Temporal Attention for Large-Scale Video-based Person Re-Identification
A novel Spatial-Temporal Attention (STA) approach to tackle the large-scale person re-identification task in videos, which fully exploits the discriminative parts of a target person in both spatial and temporal dimensions and produces a 2-D attention score matrix.
Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-identification
This work presents a novel joint Attentive Spatial-Temporal Pooling Network (ASTPN) for video-based person re-identification, which makes the feature extractor aware of the current input video sequences, so that interdependencies between matched items can directly influence the computation of each other's representations.
Video Person Re-Identification by Temporal Residual Learning
The proposed framework aims to fully exploit the temporal information of video sequences and to tackle the poor spatial alignment of moving pedestrians, designing a temporal residual learning (TRL) module to simultaneously extract the generic and specific features of consecutive frames.
Spatial-Temporal Synergic Residual Learning for Video Person Re-Identification
The proposed Spatial-Temporal Synergic Residual Network (STSRN), which contains a spatial residual extractor, a temporal residual processor, and a spatial-temporal smooth module, achieves consistently superior performance over most state-of-the-art methods.