Corpus ID: 236447478

Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification

  • Zefeng Ding, Changxing Ding, Zhiyin Shao, Dacheng Tao
  • Published 2021
  • Computer Science
  • ArXiv
Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions, text-to-image ReID remains a challenging problem. Accordingly, in this paper, we propose a Semantically Self-Aligned Network (SSAN) to handle the above problems. First, we propose a novel method that automatically extracts semantically aligned part-level…


Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search
A pose-guided multi-granularity attention network (PMA) is proposed, which employs pose information to learn latent semantic alignment between visual body parts and textual noun phrases. Experimental results show that this approach outperforms state-of-the-art methods by 15% in terms of the top-1 metric.
Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments
A Multi-granularity Image-text Alignments (MIA) model is proposed to alleviate the cross-modal fine-grained problem for better similarity evaluation in description-based person re-ID, and it obtains state-of-the-art performance on the CUHK-PEDES dataset.
Identity-Aware Textual-Visual Matching with Latent Co-attention
This paper proposes an identity-aware two-stage framework that learns to embed cross-modal features with a novel Cross-Modal Cross-Entropy (CMCE) loss and refines the matching results with a latent co-attention mechanism.
Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification
Domain adaptation experiments show that images generated by SPGAN are more suitable for domain adaptation and yield consistent and competitive re-ID accuracy on two large-scale datasets.
Context-Aware Attention Network for Image-Text Retrieval
A unified Context-Aware Attention Network (CAAN) is proposed, which selectively focuses on critical local fragments (regions and words) by aggregating the global context and simultaneously utilizes global inter-modal alignments and intra-modal correlations to discover latent semantic relations.
Densely Semantically Aligned Person Re-Identification
This work is the first to make use of fine-grained semantics to address misalignment problems in re-ID, and it constructs a set of densely semantically aligned part images (DSAP-images), in which the same spatial positions have the same semantics across different person images.
Context-Aware Multi-View Summarization Network for Image-Text Matching
A novel context-aware multi-view summarization network is proposed to summarize context-enhanced visual region information from multiple views, and an adaptive gating self-attention module is designed to extract representations of visual regions and words.
Person Re-identification Meets Image Search
By designing an unsupervised Bag-of-Words representation, this paper aims to bridge the gap between person re-identification and image search by integrating image search techniques into person re-identification, and shows that the system sets up an effective yet efficient baseline that is amenable to further supervised/unsupervised improvements.
Stacked Cross Attention for Image-Text Matching
Stacked Cross Attention is proposed to discover the full latent alignments, using both image regions and words in a sentence as context, and to infer image-text similarity; it achieves state-of-the-art results on the MS-COCO and Flickr30K datasets.
Cross-Modal Cross-Domain Moment Alignment Network for Person Search
  • Ya Jing, Wei Wang, Liang Wang, T. Tan
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This paper makes the first attempt to adapt the model to new target domains in the absence of pairwise labels and proposes a moment alignment network (MAN) to solve the cross-modal cross-domain person search task.