• Corpus ID: 244270412

2nd Place Solution to Facebook AI Image Similarity Challenge Matching Track

@article{Jeon20212ndPS,
  title={2nd Place Solution to Facebook AI Image Similarity Challenge Matching Track},
  author={SeungKee Jeon},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.09113}
}
This paper presents the 2nd place solution to the Facebook AI Image Similarity Challenge: Matching Track on DrivenData. The solution is based on self-supervised learning and Vision Transformer (ViT). The main breakthrough comes from concatenating the query and reference images into a single image and asking ViT to predict directly from that image whether the query image used the reference image. The solution scored 0.8291 micro-average precision on the private leaderboard.
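
The concatenation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the layout or input size, so the side-by-side arrangement, the 224x224 resolution, and the function name `concat_pair` are all assumptions made for illustration, not the author's implementation. The resulting array is what a binary classifier such as a ViT would receive as its single input image.

```python
import numpy as np


def concat_pair(query: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Join a query and a reference image into one image.

    Both inputs are H x W x C arrays of the same shape. The side-by-side
    layout (query on the left, reference on the right) is an assumption;
    the abstract only says the two images are concatenated into one.
    """
    if query.shape != reference.shape:
        raise ValueError("query and reference must have the same shape")
    # Concatenate along the width axis: the result is H x 2W x C.
    return np.concatenate([query, reference], axis=1)


# Hypothetical 224x224 RGB inputs, filled with constants for the demo.
query_image = np.zeros((224, 224, 3), dtype=np.uint8)
reference_image = np.full((224, 224, 3), 255, dtype=np.uint8)
pair_image = concat_pair(query_image, reference_image)
```

A model trained on such pairs would output one score per pair, interpreted as the likelihood that the query was derived from the reference.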

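For readers unfamiliar with the reported metric, the following sketch computes micro-average precision over a pooled list of predictions. It assumes the common definition: all (query, reference, score) predictions are ranked together by score, and precision is accumulated at each correct pair, then divided by the number of ground-truth pairs. The function name and data layout are hypothetical, and the official challenge scorer may differ in details such as tie handling.

```python
def micro_average_precision(predictions, ground_truth):
    """Average precision over all predictions pooled across queries.

    predictions: iterable of (query_id, reference_id, score) tuples.
    ground_truth: set of (query_id, reference_id) pairs that truly match.
    Ties in score are broken by input order (an assumption of this sketch).
    """
    if not ground_truth:
        return 0.0
    # Rank every prediction globally, highest confidence first.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (query_id, reference_id, _score) in enumerate(ranked, start=1):
        if (query_id, reference_id) in ground_truth:
            true_positives += 1
            # Precision at this rank, counted once per correct pair.
            precision_sum += true_positives / rank
    # Missed ground-truth pairs contribute zero, which lowers the score.
    return precision_sum / len(ground_truth)


# Toy example with made-up identifiers and scores.
toy_predictions = [("q1", "r1", 0.9), ("q2", "r5", 0.8), ("q2", "r2", 0.7)]
toy_ground_truth = {("q1", "r1"), ("q2", "r2"), ("q3", "r3")}
toy_score = micro_average_precision(toy_predictions, toy_ground_truth)
```

Because the ranking is global, a confident false match for one query lowers the precision counted for every correct pair ranked below it, which is why well-calibrated pair scores matter for this metric.
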
Results and findings of the 2021 Image Similarity Challenge

The most difficult image transformations in the 2021 Image Similarity Challenge appear to involve either severe image crops or overlaying onto unrelated images, combined with local pixel perturbations.

References

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

A Simple Framework for Contrastive Learning of Visual Representations

Three findings are shown: the composition of data augmentations plays a critical role in defining effective predictive tasks; introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations; and contrastive learning benefits from larger batch sizes and more training steps than supervised learning does.