Cascade Transformers for End-to-End Person Search
@article{Yu2022CascadeTF,
  title={Cascade Transformers for End-to-End Person Search},
  author={Rui Yu and Dawei Du and Rodney LaLonde and Daniel S. Davila and Christopher Funk and A. Hoogs and Brian Clipp},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  pages={7257--7266}
}
The goal of person search is to localize a target person from a gallery set of scene images, which is extremely challenging due to large scale variations, pose/viewpoint changes, and occlusions. In this paper, we propose the Cascade Occluded Attention Transformer (COAT) for end-to-end person search. Our three-stage cascade design focuses on detecting people in the first stage, while later stages simultaneously and progressively refine the representation for person detection and re…
11 Citations
Sequential Transformer for End-to-End Person Search
- Computer Science
- ArXiv
- 2022
The proposed SeqTR not only outperforms all existing person search methods with a 59.3% mAP on PRW but also achieves comparable performance to the state-of-the-art results with an mAP of 94.8% on CUHK-SYSU.
SAT: Scale-Augmented Transformer for Person Search
- Computer Science
- 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2023
In the three-stage design of the SAT framework, the first stage performs person detection, whereas the last two stages perform both detection and re-identification; the framework also introduces separate norm feature embeddings for the two tasks to reconcile the relationship between them in a joint person search model.
Gallery Filter Network for Person Search
- Computer Science
- 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2023
The Gallery Filter Network (GFN), a novel module which can efficiently discard gallery scenes from the search process, and benefit scoring for persons detected in remaining scenes, is described and demonstrated.
Deep Intra-Image Contrastive Learning for Weakly Supervised One-Step Person Search
- Computer Science
- ArXiv
- 2023
This paper argues that current intra-image contrast is shallow and suffers from spatial-level and occlusion-level variance, and presents a novel deep intra-image contrastive learning approach using a Siamese network.
Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search
- Computer Science
- ArXiv
- 2023
A novel one-step framework based on Self-similarity driven Scale-invariant Learning (SSL) is proposed to enhance the discriminative power of the features in an unsupervised manner, together with a dynamic multi-label prediction that progressively seeks true labels for training.
MEVID: Multi-view Extended Videos with Identities for Video Person Re-Identification
- Computer Science
- 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2023
A semi-automatic annotation framework and GUI that combines state-of-the-art real-time models for object detection, pose estimation, person ReID, and multi-object tracking is developed and evaluated.
An Efficient Person Search Method Using Spatio-Temporal Features for Surveillance Videos
- Computer Science
- Applied Sciences
- 2022
An efficient person search method is proposed that employs spatio-temporal features in surveillance videos: it not only considers the spatial features of persons in each frame but also utilizes the temporal relationship of the same person between adjacent frames.
Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network
- Computer Science
- ArXiv
- 2022
Extensive quantitative and qualitative experiments show that the proposed FIWHN can achieve a good balance between performance and efficiency, and is more conducive to downstream tasks to solve problems in low-pixel scenarios.
Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution
- Computer Science
- ArXiv
- 2022
A lightweight Cross-receptive Focused Inference Network (CFIN), a hybrid network composed of a Convolutional Neural Network (CNN) and a Transformer, which can achieve a good balance between computational cost and model performance.
Transformer in Transformer as Backbone for Deep Reinforcement Learning
- Computer Science
- ArXiv
- 2022
The Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making.
References
Sequential End-to-end Network for Efficient Person Search
- Computer Science
- AAAI
- 2021
A Sequential End-to-end Network (SeqNet) is proposed to extract superior features in person search, and a robust Context Bipartite Graph Matching (CBGM) algorithm is designed to effectively employ context information as an important complementary cue for person matching.
Person Search by Multi-Scale Matching
- Computer Science
- ECCV
- 2018
This work proposes a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model that favourably eliminates the need for constructing a computationally expensive image pyramid and a complex multi-branch network architecture.
Re-ID Driven Localization Refinement for Person Search
- Computer Science
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
A differentiable ROI transform layer is developed to effectively transform the bounding boxes from the original images so that the box coordinates can be supervised by the re-ID training other than the original detection task.
Anchor-Free Person Search
- Computer Science
- 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This work presents the Feature-Aligned Person Search Network (AlignPS), the first anchor-free framework to efficiently tackle this challenging task, and proposes an aligned feature aggregation module to generate more discriminative and robust feature embeddings by following a "re-id first" principle.
IAN: The Individual Aggregation Network for Person Search
- Computer Science
- Pattern Recognit.
- 2019
Hierarchical Online Instance Matching for Person Search
- Computer Science
- AAAI
- 2020
A Hierarchical Online Instance Matching (HOIM) loss is proposed which exploits the hierarchical relationship between detection and re-ID to guide the learning of the network and justifies the effectiveness of the proposed HOIM loss on learning robust features.
Norm-Aware Embedding for Efficient Person Search
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
A novel approach called Norm-Aware Embedding is presented to disentangle the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training.
RCAA: Relational Context-Aware Agents for Person Search
- Computer Science
- ECCV
- 2018
This paper proposes to use the target person as the query in a query-dependent relational network and incorporates relational spatial and temporal contexts into the framework, addressing the problem of searching for a target person in a gallery of whole scene images for which annotations of pedestrian bounding boxes are unavailable.
Query-Guided End-To-End Person Search
- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A novel query-guided end-to-end person search network (QEEPS) is proposed to address both person detection and re-identification; it outperforms the previous state of the art by a large margin.
Joint Detection and Identification Feature Learning for Person Search
- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
A new deep learning framework for person search that jointly handles pedestrian detection and person re-identification in a single convolutional neural network, trained with an Online Instance Matching (OIM) loss that converges much faster and better than the conventional Softmax loss.