Cascade Transformers for End-to-End Person Search

  title={Cascade Transformers for End-to-End Person Search},
  author={Rui Yu and Dawei Du and Rodney LaLonde and Daniel S. Davila and Christopher Funk and A. Hoogs and Brian Clipp},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  • Rui Yu, Dawei Du, …, Brian Clipp
  • Published 17 March 2022
  • Computer Science
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The goal of person search is to localize a target person from a gallery set of scene images, which is extremely challenging due to large scale variations, pose/viewpoint changes, and occlusions. In this paper, we propose the Cascade Occluded Attention Transformer (COAT) for end-to-end person search. Our three-stage cascade design focuses on detecting people in the first stage, while later stages simultaneously and progressively refine the representation for person detection and re… 

Sequential Transformer for End-to-End Person Search

The proposed SeqTR not only outperforms all existing person search methods with a 59.3% mAP on PRW but also achieves comparable performance to the state-of-the-art results with an mAP of 94.8% on CUHK-SYSU.

SAT: Scale-Augmented Transformer for Person Search

In the three-stage design of the SAT framework, the first stage performs person detection, whereas the last two stages perform both detection and re-identification. SAT also introduces separate norm feature embeddings for the two tasks to reconcile the relationship between them in a joint person search model.

Gallery Filter Network for Person Search

  • Lucas Jaffe, A. Zakhor
  • Computer Science
  • 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
  • 2023
This paper describes and demonstrates the Gallery Filter Network (GFN), a novel module that efficiently discards gallery scenes from the search process and benefits the scoring of persons detected in the remaining scenes.
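The description above does not say how scenes are scored or how the scene score feeds into person scoring. As a minimal sketch, assuming each gallery scene has one scene-level embedding compared to the query by cosine similarity (an assumption, not necessarily how GFN computes its scores), scene filtering followed by person ranking could look like this. The function name `filtered_search`, the `threshold` value, and the product of scene and person scores are all hypothetical.

```python
import math


def _cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def filtered_search(query, gallery, threshold=0.5):
    """Rank gallery persons for `query`, skipping scenes that fail a scene-level check.

    gallery maps scene_id -> (scene_embedding, [(person_id, person_embedding), ...]).
    `threshold` and the product score below are hypothetical choices for illustration.
    """
    ranked = []
    for scene_id, (scene_emb, persons) in gallery.items():
        scene_score = _cosine(query, scene_emb)
        if scene_score < threshold:
            continue  # whole scene discarded: none of its persons are compared
        for person_id, person_emb in persons:
            # Hypothetical: let the scene score modulate the person match score.
            score = scene_score * _cosine(query, person_emb)
            ranked.append((score, scene_id, person_id))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked


# Usage: scene_b does not resemble the query, so its person is never scored.
gallery = {
    "scene_a": ([1.0, 0.0], [("a1", [1.0, 0.0]), ("a2", [0.0, 1.0])]),
    "scene_b": ([0.0, 1.0], [("b1", [1.0, 0.0])]),
}
results = filtered_search([1.0, 0.0], gallery)
```

The saving comes from the `continue`: every person in a rejected scene is skipped without a per-person comparison.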

Deep Intra-Image Contrastive Learning for Weakly Supervised One-Step Person Search

This paper argues that current intra-image contrast is shallow and suffers from spatial-level and occlusion-level variance, and presents a novel deep intra-image contrastive learning method using a Siamese network.

Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search

This paper proposes a novel one-step framework based on self-similarity driven Scale-invariant Learning (SSL) that enhances the discriminative power of the features in an unsupervised manner, and introduces a dynamic multi-label prediction that progressively seeks true labels for training.

MEVID: Multi-view Extended Videos with Identities for Video Person Re-Identification

A semi-automatic annotation framework and GUI that combines state-of-the-art real-time models for object detection, pose estimation, person ReID, and multi-object tracking is developed and evaluated.

An Efficient Person Search Method Using Spatio-Temporal Features for Surveillance Videos

This paper presents an efficient person search method for surveillance videos that employs spatio-temporal features: it not only considers the spatial features of persons in each frame but also utilizes the temporal relationship of the same person between adjacent frames.

Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network

Extensive quantitative and qualitative experiments show that the proposed FIWHN achieves a good balance between performance and efficiency, and is better suited to downstream tasks in low-pixel scenarios.

Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution

This paper proposes a lightweight Cross-receptive Focused Inference Network (CFIN), a hybrid network composed of a Convolutional Neural Network (CNN) and a Transformer that achieves a good balance between computational cost and model performance.

Transformer in Transformer as Backbone for Deep Reinforcement Learning

The Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making.



Sequential End-to-end Network for Efficient Person Search

This paper proposes a Sequential End-to-end Network (SeqNet) to extract superior features in person search, and designs a robust Context Bipartite Graph Matching (CBGM) algorithm that effectively employs context information as an important complementary cue for person matching.

Person Search by Multi-Scale Matching

This work proposes a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model that favourably eliminates the need for constructing a computationally expensive image pyramid and a complex multi-branch network architecture.

Re-ID Driven Localization Refinement for Person Search

  • Chuchu Han, Jiacheng Ye, …, N. Sang
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
A differentiable ROI transform layer is developed to transform the bounding boxes from the original images, so that the box coordinates can be supervised by the re-ID training in addition to the original detection task.

Anchor-Free Person Search

This work presents the Feature-Aligned Person Search Network (AlignPS), the first anchor-free framework to efficiently tackle this challenging task, and proposes an aligned feature aggregation module to generate more discriminative and robust feature embeddings by following a "re-id first" principle.

Hierarchical Online Instance Matching for Person Search

This paper proposes a Hierarchical Online Instance Matching (HOIM) loss that exploits the hierarchical relationship between detection and re-ID to guide the learning of the network, and demonstrates the effectiveness of the HOIM loss in learning robust features.

Norm-Aware Embedding for Efficient Person Search

A novel approach called Norm-Aware Embedding is presented to disentangle the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training.
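The norm/angle split described above can be sketched in a few lines of plain Python. The norm of the embedding becomes a detection score and its unit-length direction becomes the re-ID feature, so rescaling an embedding changes how person-like it looks without changing whom it matches. The sigmoid with `bias` and `scale` is an assumption standing in for whatever learned calibration the actual method uses, and the function names are hypothetical.

```python
import math


def norm_aware_split(embedding, bias=0.0, scale=1.0):
    """Split one person embedding into (detection score, re-ID direction).

    The norm drives detection; the unit-length direction (angle) drives re-ID.
    The sigmoid calibration with `bias` and `scale` is a hypothetical stand-in
    for the learned mapping used by the actual method.
    """
    norm = math.sqrt(sum(x * x for x in embedding))
    # Larger norm -> higher person-ness score in (0, 1).
    det_score = 1.0 / (1.0 + math.exp(-scale * (norm - bias)))
    if norm == 0.0:
        direction = [0.0] * len(embedding)
    else:
        direction = [x / norm for x in embedding]
    return det_score, direction


def reid_similarity(dir_a, dir_b):
    """Cosine similarity of two unit-length directions (their dot product)."""
    return sum(a * b for a, b in zip(dir_a, dir_b))


# Usage: scaling an embedding changes its detection score but not its identity.
score_small, dir_small = norm_aware_split([0.3, 0.4])
score_large, dir_large = norm_aware_split([3.0, 4.0])
```

Because both tasks read from one vector, no extra re-ID branch is needed, which is the efficiency argument the summary alludes to.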

RCAA: Relational Context-Aware Agents for Person Search

To address the problem of searching for a target person in a gallery of whole-scene images for which pedestrian bounding-box annotations are unavailable, this paper uses the target person as the query in a query-dependent relational network and incorporates relational spatial and temporal contexts into the framework.

Query-Guided End-To-End Person Search

This paper proposes a novel query-guided end-to-end person search network (QEEPS) that addresses both person detection and re-identification, and outperforms the previous state of the art by a large margin.

Joint Detection and Identification Feature Learning for Person Search

This paper proposes a new deep learning framework for person search that jointly handles pedestrian detection and person re-identification in a single convolutional neural network, together with an Online Instance Matching (OIM) loss that converges much faster and better than the conventional Softmax loss.
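The loss this paper proposes is the Online Instance Matching (OIM) loss. Labeled identities are stored in a lookup table (LUT) updated by momentum, unlabeled detections fill a circular queue that supplies extra negatives, and the loss is a softmax cross-entropy over temperature-scaled cosine similarities to both memories. The following is a minimal, unbatched sketch in plain Python; the class name, the default `temperature` and `momentum` values, and the zero initialization are illustrative assumptions, and a real implementation would run on batched tensors with gradients.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


class OIMLossSketch:
    """Minimal OIM-style loss (hypothetical class; defaults are assumptions).

    lut:   one feature per labeled identity, updated by momentum
    queue: circular buffer of features from unlabeled detections
    """

    def __init__(self, num_ids, dim, queue_size, temperature=0.1, momentum=0.5):
        self.lut = [[0.0] * dim for _ in range(num_ids)]
        self.queue = [[0.0] * dim for _ in range(queue_size)]  # queue_size must be > 0
        self.q_ptr = 0
        self.temperature = temperature
        self.momentum = momentum

    def loss(self, feat, label):
        """Negative log-probability that L2-normalized `feat` is identity `label` (>= 0)."""
        logits = [sum(a * b for a, b in zip(feat, v)) / self.temperature
                  for v in self.lut + self.queue]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        return log_z - logits[label]

    def update(self, feat, label):
        """Momentum-update the LUT for a labeled feature, or enqueue an unlabeled one (label < 0)."""
        if label >= 0:
            mixed = [self.momentum * u + (1.0 - self.momentum) * f
                     for u, f in zip(self.lut[label], feat)]
            self.lut[label] = _normalize(mixed)
        else:
            self.queue[self.q_ptr] = list(feat)
            self.q_ptr = (self.q_ptr + 1) % len(self.queue)


# Usage: with empty (all-zero) memories every class is equally likely,
# so the loss starts at log(3 + 2); after one update it drops sharply.
oim = OIMLossSketch(num_ids=3, dim=2, queue_size=2)
x = _normalize([1.0, 1.0])
before = oim.loss(x, label=1)
oim.update(x, label=1)
after = oim.loss(x, label=1)
```

Unlike a Softmax classifier whose weight matrix must be learned, the LUT entries are copied from recent features, and the queue lets unlabeled people act as negatives; both are plausible reasons for the faster convergence the summary reports.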