Multi-Scale Temporal Cues Learning for Video Person Re-Identification

@article{Li2020MultiScaleTC,
  title={Multi-Scale Temporal Cues Learning for Video Person Re-Identification},
  author={Jianing Li and Shiliang Zhang and Tiejun Huang},
  journal={IEEE Transactions on Image Processing},
  year={2020},
  volume={29},
  pages={4461--4473}
}
Temporal cues embedded in videos provide important clues for person Re-Identification (ReID). To efficiently exploit temporal cues with a compact neural network, this work proposes a novel 3D convolution layer called the Multi-scale 3D (M3D) convolution layer. The M3D layer is easy to implement and can be inserted into traditional 2D convolution networks to learn multi-scale temporal cues through end-to-end training. According to its insertion location, the M3D layer has two variants, i.e., local M3D…
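The abstract's core idea, parallel temporal convolutions with different dilation rates added as a residual branch on top of 2D frame features, can be sketched numerically. The following is a minimal NumPy illustration of a multi-scale dilated temporal convolution, not the authors' M3D implementation; the kernel weights, the dilation set, and the names `temporal_conv` and `m3d_block` are invented for illustration.

```python
import numpy as np

def temporal_conv(x, kernel, dilation):
    """1D convolution along the time axis (axis 0) with zero 'same' padding.

    x: (T, C) array of per-frame feature vectors; kernel: (K,) temporal
    weights shared across channels -- a toy stand-in for a k x 1 x 1
    temporal 3D convolution.
    """
    T, _ = x.shape
    K = len(kernel)
    pad = dilation * (K // 2)
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for t in range(T):
        for k in range(K):
            out[t] += kernel[k] * xp[t + k * dilation]
    return out

def m3d_block(x, dilations=(1, 2, 4)):
    """Residual multi-scale temporal block: the input plus a sum of dilated
    temporal convolutions, each dilation covering a different temporal range."""
    kernel = np.array([0.25, 0.5, 0.25])  # illustrative smoothing weights
    return x + sum(temporal_conv(x, kernel, d) for d in dilations)
```

On a (T, C) clip of frame features, each dilation branch aggregates over a different temporal span, so short-range and long-range motion cues are mixed within a single layer, which is the multi-scale property the abstract describes.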
Citations

Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
Proposes Spatio-Temporal Representation Factorization (STRF), a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID. STRF empirically improves the performance of various baseline architectures, achieving new state-of-the-art results under standard person re-ID evaluation protocols on three benchmarks.
Not 3D Re-ID: Simple Single Stream 2D Convolution for Robust Video Re-identification
Shows that global features extracted by a 2D convolution network are a sufficient representation for robust state-of-the-art video re-ID, without reliance on the complex and memory-intensive 3D convolutions or multi-stream network architectures found in other contemporary work.
Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-Identification
An iterative local-global collaboration learning approach that learns robust and discriminative person representations, jointly considering global video information and local frame-sequence information to better capture a person's diverse appearance for feature learning and pseudo-label estimation.
Long-Short Temporal–Spatial Clues Excited Network for Robust Person Re-identification
A long-short temporal–spatial clues excited network (LSTS-NET) for robust person re-ID across different scenes, which outperforms state-of-the-art methods in both robustness and accuracy.
Diverse part attentive network for video-based person re-identification
A Diverse Part Attentive Network (DPAN) that exploits discriminative and diverse body cues, achieving performance competitive with state-of-the-art methods on standard benchmarks.
Dual-path CNN with Max Gated block for Text-Based Person Re-identification
A novel Dual-path CNN with Max Gated block (DCMG) that extracts discriminative word embeddings and focuses the visual-textual association on the salient features of both modalities in text-based person re-identification.
Futuristic person re-identification over internet of biometrics things (IoBT): Technical potential versus practical reality
An overview that conceptualizes interpreting various futuristic cues on the IoT platform for achieving PRId, highlighting opportunities and key challenges of implementing such a futuristic PRId system over IoBT.
Interaction-Integrated Network for Natural Language Moment Localization
An Interaction-Integrated Network (I2N) containing several Interaction-Integrated Cells (I2Cs), built on the observation that the query sentence not only describes the video clip but also contains semantic cues about the structure of the entire video.
Learning discriminative features with a dual-constrained guided network for video-based person re-identification
A novel dual-constrained guided network (DCGN) that captures discriminative features by modeling relations across frames in two steps, alleviating frame redundancy from a global perspective.
Multi-View Spatial Attention Embedding for Vehicle Re-Identification
A multi-view branch network in which each branch learns a viewpoint-specific feature without parameter sharing, performing substantially better than a general feature learned by a uniform network at alleviating the negative effects of viewpoint variance.

References

Showing 1–10 of 97 references
Attribute-Driven Feature Disentangling and Temporal Aggregation for Video Person Re-Identification
An attribute-driven method for feature disentangling and frame re-weighting in video-based person re-identification, which outperforms existing state-of-the-art approaches.
Spatial and Temporal Mutual Promotion for Video-based Person Re-identification
Proposes a Refining Recurrent Unit (RRU) that recovers missing parts and suppresses noisy parts of the current frame's features by referring to historical frames, and uses a Spatial-Temporal Integration Module (STIM) to mine spatial-temporal information from those refined features.
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Introduces a new Two-Stream Inflated 3D ConvNet (I3D) based on 2D ConvNet inflation; I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics.
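The "inflation" mentioned above can be sketched in a few lines: a pretrained 2D kernel is repeated along a new temporal axis and rescaled so that a temporally constant ("boring") video yields the same response as the source image model. The following NumPy sketch illustrates that bootstrapping trick; the function name is ours, not from the paper.

```python
import numpy as np

def inflate_2d_kernel(w2d, t):
    """Inflate a 2D conv kernel of shape (C_out, C_in, H, W) into a 3D kernel
    of shape (C_out, C_in, t, H, W) by repeating it t times along the new
    temporal axis and dividing by t, so a static (temporally constant) input
    produces the same activation as the original 2D filter."""
    w3d = np.repeat(w2d[:, :, None, :, :], t, axis=2)
    return w3d / t
```

Summing the inflated kernel over its temporal axis recovers the original 2D weights exactly, which is what makes the image-pretrained initialization valid for video.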
VRSTC: Occlusion-Free Video Person Re-Identification
A novel Spatio-Temporal Completion network (STCnet) that explicitly handles the partial occlusion problem; the proposed approach outperforms the state of the art.
MARS: A Video Benchmark for Large-Scale Person Re-Identification
Shows that a CNN in classification mode can be trained from scratch using the consecutive bounding boxes of each identity; the learned CNN embedding considerably outperforms competing methods and generalizes well to other video re-ID datasets upon fine-tuning.
Person Re-identification via Recurrent Feature Aggregation
Shows that a progressive/sequential fusion framework based on a long short-term memory (LSTM) network aggregates the frame-wise human-region representation at each time stamp and yields a sequence-level human feature representation.
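The progressive-fusion idea can be illustrated without a full LSTM: frame features are folded into a running state one time step at a time, so the final state is a sequence-level descriptor. The exponential recurrence below is a hypothetical simplification standing in for the paper's LSTM, not its actual architecture.

```python
import numpy as np

def recurrent_aggregate(frames, alpha=0.8):
    """Sequentially fuse per-frame feature vectors into one sequence-level
    descriptor. A simple exponential recurrence stands in for an LSTM:
    each step mixes the running state with the current frame's features."""
    h = np.zeros_like(frames[0], dtype=float)
    for f in frames:
        h = alpha * h + (1.0 - alpha) * f
    return h
```

Unlike per-frame average pooling, this sequential update weights recent frames more heavily, which is the "progressive" aspect the summary refers to.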
Learning Spatiotemporal Features with 3D Convolutional Networks
The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with the current best methods on the other 2 benchmarks.
Two-Stream Convolutional Networks for Action Recognition in Videos
Proposes a two-stream ConvNet architecture that incorporates spatial and temporal networks, demonstrating that a ConvNet trained on multi-frame dense optical flow achieves very good performance despite limited training data.
Exploit the Unknown Gradually: One-Shot Video-Based Person Re-identification by Stepwise Learning
Proposes an approach that exploits unlabeled tracklets by gradually but steadily improving the discriminative capability of the CNN feature representation via stepwise learning.
Recurrent Convolutional Network for Video-Based Person Re-identification
A novel recurrent neural network architecture for video-based person re-identification that uses colour and optical-flow information to capture the appearance and motion cues useful for video re-identification.