• Corpus ID: 238408090

2nd Place Solution to Google Landmark Recognition Competition 2021

  • Shubin Dai
  • Published 6 October 2021
  • Computer Science
  • ArXiv
Transformer-based architectures have recently shown encouraging progress in computer vision. In this work, we present our solution to the Google Landmark Recognition 2021 Challenge [10] held on Kaggle, which improves on our last year's solution [1] through three design changes: (1) using Swin [12] and CSWin [4] as backbones for feature extraction, (2) training on the full GLDv2 [15], and (3) using the full GLDv2 [15] images as the index image set for kNN search. With these modifications, our… 
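The kNN search mentioned in the abstract retrieves the index images most similar to a query embedding. A minimal NumPy sketch of cosine-similarity kNN over L2-normalized embeddings (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def knn_search(query_emb, index_embs, index_labels, k=5):
    """Cosine-similarity kNN over L2-normalized embeddings.

    query_emb:    (D,)   embedding of the query image
    index_embs:   (N, D) embeddings of the index image set
    index_labels: (N,)   landmark id of each index image
    Returns the labels and similarities of the top-k neighbours.
    """
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = idx @ q                  # cosine similarity to every index image
    top = np.argsort(-sims)[:k]    # indices of the k most similar images
    return index_labels[top], sims[top]
```

At the scale of the full GLDv2 index set, an approximate-nearest-neighbour library would replace the brute-force matrix product, but the ranking logic is the same.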

Tables from this paper

Google Landmark Retrieval 2021 Competition Third Place Solution
This work presents two solutions to the Google Landmark Challenges 2021: ensembles of Transformer and ConvNet models based on Sub-center ArcFace with dynamic margins, for both the retrieval and the recognition tracks.


1st Place Solution to Google Landmark Retrieval 2020
This paper presents the 1st place solution to the Google Landmark Retrieval 2020 Competition on Kaggle. The solution is based on metric learning to classify numerous landmark classes, and uses
Team JL Solution to Google Landmark Recognition 2019
The full pipeline, after ensembling the models and applying several steps of re-ranking strategies, scores 0.37606 GAP on the private leaderboard which won the 1st place in the competition.
Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset
This work presents a novel landmark retrieval/recognition system, robust to a noisy and diverse dataset, based on deep convolutional neural networks with metric learning, trained by cosine-softmax based losses.
2nd Place and 2nd Place Solution to Kaggle Landmark Recognition and Retrieval Competition 2019
This work presents a retrieval-based system for the landmark retrieval and recognition challenges, using models trained and predicted with the PaddlePaddle framework, which achieved 2nd place in both the Google Landmark Recognition 2019 and Retrieval 2019 competitions on Kaggle.
Attention-aware Generalized Mean Pooling for Image Retrieval
This paper applies attention mechanism to CNN, which aims at enhancing more relevant features that correspond to important keypoints in the input image to produce a compact global descriptor, which can be efficiently compared to other image descriptors by the dot product.
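The generalized mean (GeM) pooling underlying this descriptor reduces a CNN feature map to one value per channel via a learnable power mean. A minimal NumPy sketch (the attention weighting from the paper is omitted):

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean (GeM) pooling over a CNN feature map.

    feature_map: (C, H, W) activations. p=1 recovers average pooling,
    and p -> infinity approaches max pooling; p=3 is a common default.
    """
    x = np.clip(feature_map, eps, None)        # avoid non-positive bases
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)
```

The resulting (C,)-dimensional vector is then whitened and L2-normalized so descriptors can be compared with a dot product, as the summary above notes.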
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
This work develops the CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks, and introduces Locally-enhanced Positional Encoding (LePE), which handles the local positional information better than existing encoding schemes.
ArcFace: Additive Angular Margin Loss for Deep Face Recognition
This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
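ArcFace's additive angular margin adds a fixed angle m to the ground-truth class before rescaling the cosines, which tightens intra-class clusters. A minimal NumPy sketch of the margin logits, assuming precomputed embeddings and class-centre weights (not the authors' implementation):

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin (ArcFace) logits.

    embeddings: (B, D) feature vectors
    weights:    (C, D) one learned centre per class
    labels:     (B,)   ground-truth class ids
    The margin m widens only the target-class angle; all cosines are
    then rescaled by s before softmax cross-entropy.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)           # cosine to each class centre
    theta = np.arccos(cos)
    theta[np.arange(len(labels)), labels] += m  # penalize the target angle
    return s * np.cos(theta)
```

Because the target logit is shrunk by the margin, the network must push embeddings closer to their class centre to keep the loss low, which is what makes the learned features useful for retrieval.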
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient and is demonstrated the effectiveness of this method on scaling up MobileNets and ResNet.
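The compound coefficient ties depth, width, and resolution multipliers to a single exponent phi so that FLOPs roughly double per unit of phi. A sketch using the coefficients reported in the EfficientNet paper:

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """EfficientNet-style compound scaling.

    The base coefficients satisfy alpha * beta**2 * gamma**2 ~= 2,
    so total FLOPs grow roughly by 2**phi.
    Returns multipliers for (depth, width, input resolution).
    """
    return alpha ** phi, beta ** phi, gamma ** phi
```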
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  • Ze Liu, Yutong Lin, B. Guo
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
A hierarchical Transformer whose representation is computed with shifted windows, which has the flexibility to model at various scales, has linear computational complexity with respect to image size, and will also prove beneficial for all-MLP architectures.