StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

@inproceedings{Sain2021StyleMeUpTS,
  title={StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval},
  author={Aneeshan Sain and Ayan Kumar Bhunia and Yongxin Yang and Tao Xiang and Yi-Zhe Song},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={8500--8509}
}
Sketch-based image retrieval (SBIR) is a cross-modal matching problem which is typically solved by learning a joint embedding space where the semantic content shared between the photo and sketch modalities is preserved. However, a fundamental challenge in SBIR has so far been largely ignored: sketches are drawn by humans, and considerable style variations exist amongst different users. An effective SBIR model needs to explicitly account for this style diversity, crucially, to generalise to…
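The joint-embedding formulation described in the abstract is usually trained with a triplet objective: a sketch embedding should lie closer to its matching photo than to any non-matching photo by some margin. The following is a minimal, hypothetical sketch of that objective on plain Python lists standing in for learned embeddings; it is an illustration of the general SBIR training signal, not the paper's actual (style-agnostic, meta-learned) method.

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(sketch, pos_photo, neg_photo, margin=0.2):
    # Standard triplet ranking objective: pull the matching photo
    # closer to the sketch embedding than the non-matching photo,
    # by at least `margin`. Zero loss once the pair is well separated.
    d_pos = l2_distance(sketch, pos_photo)
    d_neg = l2_distance(sketch, neg_photo)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (hypothetical values): the sketch is already nearest
# its true photo, so the loss vanishes.
sketch = [0.1, 0.9]
pos = [0.1, 0.8]   # matching photo embedding
neg = [0.9, 0.1]   # non-matching photo embedding
print(triplet_loss(sketch, pos, neg))  # → 0.0
```

In a real system the three vectors would come from modality-specific (or shared) encoder networks trained end-to-end; retrieval then reduces to a nearest-neighbour search over photo embeddings.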

ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval

TLDR
This work proposes an approach to jointly optimize sketch-to-photo synthesis and image retrieval, which achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.

Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval

TLDR
A novel Three-Stream Joint Training Network (3JOIN) for the ZS-SBIR task is proposed, which uses a teacher network to extract the implicit semantics of the samples without the aid of other semantics and transfers the learned knowledge to unseen classes.

More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

TLDR
This paper introduces a novel semi-supervised framework for cross-modal retrieval that can additionally leverage large-scale unlabelled photos to account for data scarcity, and treats generation and retrieval as two conjugate problems, where a joint learning procedure is devised for each module to mutually benefit from each other.

Sketch3T: Test-Time Training for Zero-Shot SBIR

TLDR
This paper extends ZS-SBIR to include a test-time training paradigm that can adapt using just one sketch, and designs a novel meta-learning based training paradigm to learn a separation between model updates incurred by this auxiliary task and those of the primary objective of discriminative learning.

Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

TLDR
A stroke subset selector that detects noisy strokes, leaving only those which make a positive contribution towards successful retrieval, and can be used in a plug-and-play manner to empower various sketch applications in ways that were not previously possible.

Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

TLDR
A framework that infuses gradient consensus for domain invariant learning, knowledge distillation for preserving old class information, and graph attention networks for message passing between old and novel classes is presented.

MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition

TLDR
This paper proposes to meta-learn instance-specific weights for a character-wise cross-entropy loss, which is specifically designed to work with the sequential nature of text data, and proposes a writer-adaptive MetaHTR framework which can be easily implemented on top of most state-of-the-art HTR models.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

TLDR
This paper proposes two novel cross-modal translation pre-text tasks for self-supervised feature learning: Vectorization and Rasterization and shows that the learned encoder modules benefit both raster-based and vector-based downstream approaches to analysing hand-drawn data.

Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences

TLDR
This paper introduces a novel feature learning framework that addresses the problem of visible-infrared person re-identification (VI-reID), and proposes to exploit dense correspondences between cross-modal person images to address cross-modal discrepancies at the pixel level, suppressing modality-related features from person representations more effectively.

Finding Badly Drawn Bunnies

TLDR
This paper proposes Geometry-Aware Classification Layer (GACL), a generic method that makes feature-magnitude-as-quality-metric possible and importantly does it without the need for specific quality annotations from humans.

References

SHOWING 1-10 OF 66 REFERENCES

Cross-domain Generative Learning for Fine-Grained Sketch-Based Image Retrieval

TLDR
A novel discriminative-generative hybrid model is proposed by introducing a generative task of cross-domain image synthesis that enforces the learned embedding space to preserve all the domain-invariant information that is useful for cross-domain reconstruction, thus explicitly reducing the domain gap as opposed to existing models.

Generalising Fine-Grained Sketch-Based Image Retrieval

  • Kaiyue Pang, Ke Li, Yi-Zhe Song
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
A novel unsupervised learning approach to model a universal manifold of prototypical visual sketch traits that can be used to parameterise the learning of a sketch/photo representation, and demonstrates the efficacy of this approach in enabling cross-category generalisation of FG-SBIR.

A Zero-Shot Framework for Sketch-based Image Retrieval

TLDR
Experiments on this new benchmark created from the “Sketchy” dataset demonstrate that the performance of these generative models is significantly better than several state-of-the-art approaches in the proposed zero-shot framework of the coarse-grained SBIR task.

Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma

TLDR
This work introduces a multi-modal FGIR dataset with both sketches and sentence descriptions provided as query modalities, and shows that on its own the sketch modality is much more informative than text, and that each modality can benefit the other when they are modelled jointly.

Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

TLDR
A novel network is designed that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels, enriched using cross-modal co-attention and hierarchical node fusion at every level to form a better embedding space for retrieval.

Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval

TLDR
This paper proposes a novel ZS-SBIR framework that jointly models sketches and photos into a common embedding space, in which a novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap.

More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

TLDR
This paper introduces a novel semi-supervised framework for cross-modal retrieval that can additionally leverage large-scale unlabelled photos to account for data scarcity, and treats generation and retrieval as two conjugate problems, where a joint learning procedure is devised for each module to mutually benefit from each other.

Sketch Less for More: On-the-Fly Fine-Grained Sketch-Based Image Retrieval

TLDR
A reinforcement learning based cross-modal retrieval framework that directly optimizes the rank of the ground-truth photo over a complete sketch drawing episode, and introduces a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, thus providing a more consistent rank list during retrieval.

Fine-Grained Sketch-Based Image Retrieval by Matching Deformable Part Models

  • Li
  • Computer Science
  • 2014
TLDR
This paper learns a deformable part-based model (DPM) as a mid-level representation to discover and encode the various poses in the sketch and image domains independently, after which graph matching is performed on the DPMs to establish pose correspondences across the two domains.

Sketch Me That Shoe

TLDR
A deep triplet-ranking model for instance-level SBIR is developed with a novel data augmentation and staged pre-training strategy to alleviate the issue of insufficient fine-grained training data.
...