• Corpus ID: 238408090

2nd Place Solution to Google Landmark Recognition Competition 2021

  • Shubin Dai
  • Published 6 October 2021
  • Computer Science
  • ArXiv
Transformer-based architectures have recently shown encouraging progress in computer vision. In this work, we present our solution to the Google Landmark Recognition 2021 Challenge [10] held on Kaggle, which improves on our last year's solution [1] through three design changes: (1) using Swin [12] and CSWin [4] as backbones for feature extraction, (2) training on the full GLDv2 [15], and (3) using the full GLDv2 [15] image set as the index for kNN search. With these modifications, our…
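The recognition pipeline described above labels a query image by nearest-neighbor search over embeddings of the index image set. A minimal sketch of that retrieval step, with hypothetical function and variable names and tiny toy embeddings in place of the real GLDv2-scale index:

```python
import numpy as np

def knn_landmark_label(query_emb, index_embs, index_labels, k=3):
    """Return the landmark label whose index images are most similar
    to the query, by summed cosine similarity over the top-k neighbors."""
    # L2-normalize so dot products equal cosine similarities
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = idx @ q                      # cosine similarity to each index image
    topk = np.argsort(-sims)[:k]        # k nearest index images
    scores = {}
    for i in topk:
        scores[index_labels[i]] = scores.get(index_labels[i], 0.0) + sims[i]
    return max(scores, key=scores.get)

# toy example: two landmark classes, query closest to class 0
index = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = [0, 0, 1]
print(knn_landmark_label(np.array([1.0, 0.05]), index, labels))  # → 0
```

At competition scale the brute-force matrix product would be replaced by an approximate nearest-neighbor index, but the scoring logic is the same.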

Tables from this paper

Google Landmark Retrieval 2021 Competition Third Place Solution

This work presents two solutions to the Google Landmark Challenges 2021, ensembles of transformers and ConvNet models based on Sub-center ArcFace with dynamic margins, for both the retrieval and the recognition tracks.
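The ArcFace loss family mentioned here trains embeddings by adding an angular margin to the true-class logit. A minimal single-center, fixed-margin sketch (the cited solution additionally uses sub-centers per class and dynamic per-class margins, which are omitted here; names and the s/m values are illustrative):

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=30.0, m=0.3):
    """ArcFace: add angular margin m to the target-class angle, then scale by s.
    embeddings: (N, D) L2-normalized rows; weights: (C, D) L2-normalized class centers."""
    cos = embeddings @ weights.T                      # cosine of angle to each class center
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    target = np.zeros_like(cos, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    # the margin penalizes the true class, forcing tighter intra-class clusters
    logits = np.where(target, np.cos(theta + m), cos)
    return s * logits
```

These logits are then fed to an ordinary softmax cross-entropy loss; at inference the margin is dropped and plain cosine similarity is used.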



1st Place Solution to Google Landmark Retrieval 2020

This paper presents the 1st place solution to the Google Landmark Retrieval 2020 Competition on Kaggle. The solution is based on metric learning to classify numerous landmark classes, and uses…

Team JL Solution to Google Landmark Recognition 2019

The full pipeline, after ensembling the models and applying several steps of re-ranking strategies, scores 0.37606 GAP on the private leaderboard which won the 1st place in the competition.

Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset

This work presents a novel landmark retrieval/recognition system, robust to a noisy and diverse dataset, based on deep convolutional neural networks with metric learning, trained by cosine-softmax based losses.

2nd Place and 2nd Place Solution to Kaggle Landmark Recognition and Retrieval Competition 2019

This work presents a retrieval-based system for the landmark retrieval and recognition challenges, using models trained and predicted with the PaddlePaddle framework, which achieved 2nd place in both the Google Landmark Recognition 2019 and Google Landmark Retrieval 2019 competitions on Kaggle.

Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval

This work introduces the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks, and demonstrates the suitability of the dataset for transfer learning by showing that image embeddings trained on it achieve competitive retrieval performance on independent datasets.

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

  • Ze Liu, Yutong Lin, B. Guo
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
A hierarchical Transformer whose representation is computed with shifted windows, giving it the flexibility to model at various scales and linear computational complexity with respect to image size; the shifted-window approach may also prove beneficial for all-MLP architectures.
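The key mechanism in the abstract above is window partitioning with a cyclic shift: attention is restricted to local windows (linear cost), and shifting the grid between layers lets information cross window boundaries. A toy sketch of just the partitioning step, on a plain 2-D grid instead of real multi-channel features (function name and sizes are illustrative):

```python
import numpy as np

def shifted_windows(x, window=2, shift=1):
    """Partition an (H, W) feature grid into non-overlapping windows after a
    cyclic shift; attention is then computed within each window, so cost is
    linear in image size rather than quadratic."""
    shifted = np.roll(x, (-shift, -shift), axis=(0, 1))  # cyclic shift of the grid
    H, W = shifted.shape
    return (shifted.reshape(H // window, window, W // window, window)
                   .transpose(0, 2, 1, 3)
                   .reshape(-1, window, window))          # one row per window

grid = np.arange(16).reshape(4, 4)
print(shifted_windows(grid).shape)  # → (4, 2, 2)
```

In the real model, a masked attention handles the wrapped-around positions introduced by the roll; that detail is omitted here.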

Attention-aware Generalized Mean Pooling for Image Retrieval

This paper applies attention mechanism to CNN, which aims at enhancing more relevant features that correspond to important keypoints in the input image to produce a compact global descriptor, which can be efficiently compared to other image descriptors by the dot product.
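Generalized mean (GeM) pooling, the base operation being attention-weighted here, collapses a CNN feature map into a global descriptor via a learnable power mean. A minimal sketch with a fixed exponent (names are illustrative; the attention weighting from the cited paper is omitted):

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized mean (GeM) pooling over spatial positions.
    feature_map: (C, H, W) activations; p=1 gives average pooling,
    large p approaches max pooling."""
    x = np.clip(feature_map, eps, None)  # activations assumed non-negative
    return np.mean(x.reshape(x.shape[0], -1) ** p, axis=1) ** (1.0 / p)

fmap = np.array([[[1.0, 2.0], [3.0, 4.0]]])  # one channel, 2x2 spatial grid
print(gem_pool(fmap, p=1.0))                 # → [2.5]
```

After L2 normalization, the resulting per-channel descriptor can be compared to other images' descriptors by a simple dot product, as the abstract notes.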

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
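The residual idea summarized above can be shown in a few lines: each block adds its input back to a learned transformation, so the block only has to learn a residual and the identity mapping is trivially representable. A toy dense-layer sketch (real ResNets use convolutions and batch normalization):

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = relu(x + F(x)): the block learns a residual F rather than a full
    mapping, which keeps very deep networks trainable."""
    relu = lambda z: np.maximum(z, 0.0)
    return relu(x + relu(x @ w1) @ w2)  # skip connection adds the input back

# with zero weights the residual F(x) is zero, so the block is the identity (for x >= 0)
x = np.array([1.0, 2.0])
zero = np.zeros((2, 2))
print(residual_block(x, zero, zero))  # → [1. 2.]
```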

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

This work develops the CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks, and introduces Locally-enhanced Positional Encoding (LePE), which handles local positional information better than existing encoding schemes.

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

A new scaling method is proposed that uniformly scales all dimensions of depth, width, and resolution using a simple yet highly effective compound coefficient; its effectiveness is demonstrated by scaling up MobileNets and ResNet.
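The compound coefficient works by growing depth, width, and resolution together from a single knob φ, with base multipliers chosen so total FLOPs roughly double per unit of φ. A sketch using the multipliers reported in the EfficientNet paper (α=1.2, β=1.1, γ=1.15, so α·β²·γ² ≈ 2):

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """EfficientNet compound scaling: one coefficient phi grows depth, width,
    and resolution together; with alpha * beta**2 * gamma**2 ≈ 2, FLOPs grow ≈ 2**phi."""
    return {
        "depth": alpha ** phi,       # multiplier on number of layers
        "width": beta ** phi,        # multiplier on channels per layer
        "resolution": gamma ** phi,  # multiplier on input image size
    }

print(compound_scale(1))  # multipliers for one scaling step from the baseline
```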