Using Keypoint Matching and Interactive Self Attention Network to verify Retail POSMs

Harshita Seth, Sonaal Kant, Muktabh Mayank Srivastava
Point of Sale Materials (POSMs) are the merchandising and decoration items that companies use to communicate product information and offers in retail stores. POSMs are part of companies’ retail marketing strategy and are often applied as stylized window displays around retail shelves. In this work, we apply computer vision techniques to the task of verifying POSMs in supermarkets by determining whether all desired components of a window display are present in a shelf image. We use…

Multi-label classification of promotions in digital leaflets using textual and visual information

This paper presents an end-to-end approach that classifies promotions within digital leaflets into their corresponding product categories using both visual and textual information, and demonstrates the effectiveness of this approach on two separate tasks.

Deep Matching and Validation Network: An End-to-End Solution to Constrained Image Splicing Localization and Detection

This paper proposes a novel deep convolutional neural network architecture, called Deep Matching and Validation Network (DMVN), which simultaneously localizes and detects image splicing and is optimized end to end to produce probability estimates and segmentation masks.

SuperPoint: Self-Supervised Interest Point Detection and Description

This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision and introduces Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation.

The Contextual Loss for Image Transformation with Non-Aligned Data

This work presents an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems.

Squeeze-and-Excitation Networks

This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets.

Dual Attention Network for Scene Segmentation

New state-of-the-art segmentation performance is achieved on three challenging scene segmentation datasets (Cityscapes, PASCAL Context, and COCO Stuff) without using coarse data.

Attention Is All You Need

A new, simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Non-local Neural Networks

This paper presents non-local operations as a generic family of building blocks for capturing long-range dependencies in computer vision and improves object detection/segmentation and pose estimation on the COCO suite of tasks.

Efficient Deep Learning for Stereo Matching

This paper proposes a matching network which is able to produce very accurate results in less than a second of GPU computation, and exploits a product layer which simply computes the inner product between the two representations of a siamese architecture.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.