Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

@inproceedings{Liu2021SpatialTemporalCA,
  title={Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos},
  author={Jiawei Liu and Zhengjun Zha and Wei Wu and Kecheng Zheng and Qibin Sun},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={4368--4377}
}
  • Jiawei Liu, Zhengjun Zha, Wei Wu, Kecheng Zheng, Qibin Sun
  • Published 15 April 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Video-based person re-identification aims to match pedestrians from video sequences across non-overlapping camera views. The key factor for video person re-identification is to effectively exploit both spatial and temporal clues from video sequences. In this work, we propose a novel Spatial-Temporal Correlation and Topology Learning framework (CTL) to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation. Specifically, CTL utilizes a CNN backbone…
Learning Rich Features for Gait Recognition by Integrating Skeletons and Silhouettes
  • Yunjie Peng, Saihui Hou, Kang Ma, Yang Zhang, Yongzhen Huang, Zhiqiang He
  • Computer Science
  • ArXiv
  • 2021
TLDR
A simple yet effective bimodal fusion (BiFusion) network, which mines the complementary clues of skeletons and silhouettes, to learn rich features for gait identification.
Deep learning-based person re-identification methods: A survey and outlook of recent works
  • Zhang Ming, Min Zhu, +4 authors Yong Yang
  • Computer Science
  • ArXiv
  • 2021
TLDR
This work compares traditional and deep learning-based person Re-ID methods, then presents the main contributions of several surveys and analyzes their focused dimensions and shortcomings, and separates these methods into five categories according to their characteristics.
ERA: ENTITY–RELATIONSHIP AWARE VIDEO SUMMARIZATION
Video summarization aims to simplify large-scale video browsing by generating concise, short summaries that differ from but well represent the original video. Due to the scarcity of video annotations,…
ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN
  • Guande Wu, Jianzhe Lin, Claudio T. Silva
  • Computer Science
  • ArXiv
  • 2021
TLDR
A novel Entity–relationship Aware video summarization method (ERA) is proposed to address the primary problems of GAN-based methods; it introduces an Adversarial Spatio-Temporal network to construct the relationships among entities, which the authors argue should also be given high priority in summarization.
IntentVizor: Towards Generic Query Guided Interactive Video Summarization Using Slow-Fast Graph Convolutional Networks
  • Guande Wu, Jianzhe Lin, Claudio T. Silva
  • Computer Science
  • ArXiv
  • 2021
TLDR
A novel IntentVizor framework is proposed, which is an interactive video summarization framework guided by generic multi-modality queries that uses a set of intents to represent the inputs of users to design a new interactive visual analytic interface.

References

SHOWING 1-10 OF 54 REFERENCES
Temporal Coherence or Temporal Motion: Which Is More Critical for Video-Based Person Re-identification?
TLDR
This paper proposes a simple yet effective Adversarial Feature Augmentation (AFA) method, which highlights the temporal coherence features by introducing adversarial augmented temporal motion noise.
Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification
  • Yichao Yan, Jie Qin, +4 authors L. Shao
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
This work proposes a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to pursue better representational capabilities by modeling spatiotemporal dependencies in terms of multiple granularities to enhance the overall video representation.
MARS: A Video Benchmark for Large-Scale Person Re-Identification
TLDR
It is shown that CNN in classification mode can be trained from scratch using the consecutive bounding boxes of each identity, and the learned CNN embedding outperforms other competing methods considerably and has good generalization ability on other video re-id datasets upon fine-tuning.
Adaptive Graph Representation Learning for Video Person Re-Identification
TLDR
This work proposes an innovative adaptive graph representation learning scheme for video person Re-ID, which enables the contextual interactions between relevant regional features and proposes a novel temporal resolution-aware regularization, which enforces the consistency among different temporal resolutions for the same identities.
Relation-Guided Spatial Attention and Temporal Refinement for Video-Based Person Re-Identification
TLDR
Two relation-guided modules are proposed to learn reinforced feature representations for effective re-identification and enable the individual frames to complement each other in an aggregation manner, leading to robust video-level feature representations.
Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification
TLDR
A novel Spatial-Temporal Graph Convolutional Network (STGCN) is proposed to solve the occlusion problem and the visual ambiguity problem for visually similar negative samples and achieves state-of-the-art results on MARS and DukeMTMC-VideoReID datasets.
Recurrent Convolutional Network for Video-Based Person Re-identification
TLDR
A novel recurrent neural network architecture for video-based person re-identification that makes use of colour and optical flow information in order to capture appearance and motion information which is useful for video re-identification.
Adversarial Attribute-Text Embedding for Person Search With Natural Language Query
TLDR
A novel Adversarial Attribute-Text Embedding (AATE) network for person search with text query is proposed; in particular, a cross-modal adversarial learning module is proposed to learn discriminative and modality-invariant visual-textual features.
Appearance-Preserving 3D Convolution for Video-based Person Re-identification
TLDR
Appearance-Preserving 3D Convolution (AP3D), composed of two components, an Appearance-Preserving Module (APM) and a 3D convolution kernel, is able to model temporal information while maintaining the appearance representation quality.
Context Aware Graph Convolution for Skeleton-Based Action Recognition
  • Xikun Zhang, Chang Xu, D. Tao
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
A context aware graph convolutional network (CA-GCN), in which asymmetric relevance measurement and higher level representation are utilized to compute context information for more flexibility and better performance in skeleton based action recognition.