Query2Prod2Vec: Grounded Word Embeddings for eCommerce

@inproceedings{Bianchi2021Query2Prod2VecGW,
  title={Query2Prod2Vec: Grounded Word Embeddings for eCommerce},
  author={Federico Bianchi and Jacopo Tagliabue and Bingqing Yu},
  booktitle={NAACL},
  year={2021}
}
We present Query2Prod2Vec, a model that grounds lexical representations for product search in product embeddings: in our model, meaning is a mapping between words and a latent space of products in a digital shop. We leverage shopping sessions to learn the underlying space and use merchandising annotations to build lexical analogies for evaluation: our experiments show that our model is more accurate than known techniques from the NLP and IR literature. Finally, we stress the importance of data… 
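
As a rough illustration of the idea in the abstract (not the authors' released code), the sketch below learns a prod2vec space from toy shopping sessions with gensim and grounds a query as the average of the embeddings of the products clicked for it; the session data, the query-to-click map and the choice of averaging as the aggregation step are all assumptions made for illustration.

# Minimal sketch: ground query representations in a prod2vec product space.
# Assumptions (not the paper's code): gensim for skip-gram, toy sessions of
# product IDs, and a simple average as the aggregation step.
import numpy as np
from gensim.models import Word2Vec

# 1. Shopping sessions: each session is a sequence of product IDs.
sessions = [
    ["sku_1", "sku_2", "sku_3"],
    ["sku_2", "sku_4"],
    ["sku_1", "sku_3", "sku_4", "sku_5"],
]

# 2. Learn the latent product space (prod2vec) with skip-gram.
prod2vec = Word2Vec(sentences=sessions, vector_size=32, window=3,
                    min_count=1, sg=1, epochs=50)

# 3. Ground each query in the products shoppers clicked after issuing it.
query_to_clicks = {
    "running shoes": ["sku_1", "sku_3"],
    "rain jacket": ["sku_4", "sku_5"],
}

def query_embedding(query: str) -> np.ndarray:
    """Represent a query as the average of its clicked products' vectors."""
    vectors = [prod2vec.wv[sku] for sku in query_to_clicks[query]
               if sku in prod2vec.wv]
    return np.mean(vectors, axis=0)

print(query_embedding("running shoes").shape)  # (32,)

Grounded query vectors built this way can then be scored against lexical analogies derived from merchandising annotations, as the abstract describes for evaluation.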

Citations of this paper

BERT Goes Shopping: Comparing Distributional Models for Product Representations

TLDR
This work proposes to transfer BERT-like architectures to eCommerce: the resulting model, Prod2BERT, is trained to generate representations of products through masked session modeling, and guidelines are provided to practitioners for training embeddings under a variety of computational and data constraints.
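
The TLDR above only names masked session modeling; a minimal sketch of what that objective could look like over product-ID sequences is given below (PyTorch, with toy sizes and data, positional encodings omitted, and no claim of matching Prod2BERT's actual architecture or hyperparameters).

# Rough sketch of masked session modeling: hide one product ID in a browsing
# session and train a small Transformer encoder to recover it. Vocabulary
# size, model size, and the toy batch are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = 1000      # toy number of product IDs
MASK_ID = 0       # reserved id used as the [MASK] token
D_MODEL = 64

class MaskedSessionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)  # predict the masked product id

    def forward(self, session_ids):
        return self.head(self.encoder(self.embed(session_ids)))

model = MaskedSessionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

# Toy batch: one session of four products with the third position masked out.
session = torch.tensor([[12, 57, 33, 8]])
masked = session.clone()
masked[0, 2] = MASK_ID
labels = torch.full_like(session, -100)   # -100 = ignored by the loss
labels[0, 2] = session[0, 2]              # only the masked slot is predicted

logits = model(masked)                    # (batch, seq_len, VOCAB)
loss = loss_fn(logits.view(-1, VOCAB), labels.view(-1))
loss.backward()
optimizer.step()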

FashionCLIP: Connecting Language and Images for Product Representations

TLDR
This work builds on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model for the fashion industry, and showcases its capabilities for retrieval, classification and grounding, and releases the model and code to the community.
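
Since the TLDR only mentions contrastive learning, here is a minimal sketch of the CLIP-style symmetric contrastive loss over paired image and text embeddings; the encoders are stubbed with random features and the temperature value is an assumption, not FashionCLIP's actual training code.

# Sketch of the CLIP-style symmetric contrastive objective used to pair
# product images with their descriptions. Only the loss computation is shown.
import torch
import torch.nn.functional as F

batch = 8
image_emb = F.normalize(torch.randn(batch, 512), dim=-1)  # from an image encoder
text_emb = F.normalize(torch.randn(batch, 512), dim=-1)   # from a text encoder
temperature = 0.07

logits = image_emb @ text_emb.t() / temperature   # pairwise similarities
targets = torch.arange(batch)                     # i-th image matches i-th text

loss = (F.cross_entropy(logits, targets) +         # image -> text direction
        F.cross_entropy(logits.t(), targets)) / 2  # text -> image direction
print(loss.item())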

Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction

TLDR
This work investigates grounded language learning through real-world data, by modelling teacher-learner dynamics via the natural interactions occurring between users and search engines, and shows how the resulting semantics for noun phrases exhibits compositional properties while being fully learnable without any explicit labelling.

"Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops

TLDR
Preliminary results from building a comparison pipeline designed to scale in a multi-shop scenario are presented; the design choices are described, together with extensive benchmarks on multiple shops to stress-test the pipeline.

EvalRS: a Rounded Evaluation of Recommender Systems

TLDR
EvalRS is proposed as a new type of challenge, in order to foster this discussion among practitioners and build, in the open, new methodologies for testing RSs “in the wild”.

SIGIR 2021 E-Commerce Workshop Data Challenge

TLDR
The need for efficient procedures for personalization is even clearer if the authors consider the e-commerce landscape more broadly, where the constraints of the problem are stricter due to smaller user bases and the realization that most users are not frequently returning customers.

First Workshop on Content Understanding and Generation for E-commerce

The shopping experience on any e-commerce website is largely driven by the content customers interact with. The large volume of diverse content on e-commerce platforms, and the advances in machine…

“Does it come in black?” CLIP-like models are zero-shot recommenders

TLDR
Leveraging a large model built for fashion, GradREC and its industry potential are introduced, and a first rounded assessment of its strengths and weaknesses is offered.

References

Showing 1-10 of 41 references

BERT Goes Shopping: Comparing Distributional Models for Product Representations

TLDR
This work proposes to transfer BERT-like architectures to eCommerce: the resulting model, Prod2BERT, is trained to generate representations of products through masked session modeling, and guidelines are provided to practitioners for training embeddings under a variety of computational and data constraints.

Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation

TLDR
This work proposes Meta-Prod2vec, a novel method to compute item similarities for recommendation that leverages existing item metadata and shows that the new item representations lead to better performance on recommendation tasks on an open music dataset.
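
One simplified way to picture how side information enters the skip-gram training (an approximation of the spirit of Meta-Prod2vec, not its exact loss) is to interleave metadata tokens such as category IDs with product IDs, so that items and their metadata land in one shared embedding space:

# Simplified approximation of Meta-Prod2vec: interleave each product with its
# category token, then train skip-gram so products and metadata co-occur.
# This mimics only the spirit of the paper's extra (item, metadata) terms.
from gensim.models import Word2Vec

category = {"sku_1": "cat_shoes", "sku_2": "cat_shoes", "sku_3": "cat_jackets"}
sessions = [["sku_1", "sku_2"], ["sku_2", "sku_3", "sku_1"]]

augmented = [[tok for sku in session for tok in (sku, category[sku])]
             for session in sessions]

model = Word2Vec(sentences=augmented, vector_size=16, window=4,
                 min_count=1, sg=1, epochs=100)
print(model.wv.most_similar("cat_shoes"))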

Deep Contextualized Word Representations

TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Revisiting Skip-Gram Negative Sampling Model with Regularization

TLDR
This work revisits skip-gram negative sampling and rectifies the SGNS model with quadratic regularization, and shows that this simple modification suffices to structure the solution in the desired manner.
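
For reference, the standard skip-gram negative sampling (SGNS) term for a word-context pair (w, c) with k negative samples, plus the kind of quadratic (L2) penalty the TLDR alludes to, can be written as below; the exact form and weighting of the regularizer in the paper may differ.

$$
\ell(w, c) = \log \sigma\!\left(u_w^\top v_c\right)
+ \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n}\!\left[\log \sigma\!\left(-u_w^\top v_{c_i}\right)\right]
- \lambda \left( \lVert u_w \rVert_2^2 + \lVert v_c \rVert_2^2 \right)
$$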

Dense Passage Retrieval for Open-Domain Question Answering

TLDR
This work shows that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.
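
The dual-encoder framework mentioned in the TLDR can be sketched with in-batch negatives as follows; linear layers stand in for the two BERT encoders and all data is synthetic, so this is only an illustration of the training signal, not DPR itself.

# Sketch of dual-encoder dense retrieval with in-batch negatives: the i-th
# question should score highest against the i-th passage in the batch.
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, dim = 4, 128
question_encoder = nn.Linear(300, dim)   # stand-in for a question encoder
passage_encoder = nn.Linear(300, dim)    # stand-in for a passage encoder

questions = torch.randn(batch, 300)      # toy question features
passages = torch.randn(batch, 300)       # toy gold-passage features

q = question_encoder(questions)
p = passage_encoder(passages)

scores = q @ p.t()                                    # dot-product similarity
loss = F.cross_entropy(scores, torch.arange(batch))   # in-batch negatives
loss.backward()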

Fantastic Embeddings and How to Align Them: Zero-Shot Inference in a Multi-Shop Scenario

TLDR
This paper addresses the challenge of leveraging multiple embedding spaces for multi-shop personalization, proving that zero-shot inference is possible by transferring shopping intent from one website to another without manual intervention, and proposes and benchmarks unsupervised and supervised methods to "travel" between embedding spaces.
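
As one concrete (hedged) example of a supervised way to "travel" between two embedding spaces, the sketch below fits an orthogonal Procrustes mapping from anchor products known in both shops; the paper benchmarks several supervised and unsupervised methods, of which this is only one standard instance, and the data here is synthetic.

# Sketch: align shop A's product embeddings onto shop B's space with an
# orthogonal Procrustes mapping learned from anchor pairs.
import numpy as np

rng = np.random.default_rng(0)
dim, n_anchors = 32, 200

X = rng.normal(size=(n_anchors, dim))                 # anchors in shop A's space
true_rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
Y = X @ true_rotation + 0.01 * rng.normal(size=(n_anchors, dim))  # shop B

# Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

aligned = X @ W                                       # shop A mapped into shop B
print(np.mean(np.linalg.norm(aligned - Y, axis=1)))   # small residual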

How to Grow a (Product) Tree: Personalized Category Suggestions for eCommerce Type-Ahead

TLDR
This work presents SessionPath, a novel neural network model that improves facet suggestions on two counts: first, the model is able to leverage session embeddings to provide scalable personalization; second, SessionPath predicts facets by explicitly producing a probability distribution at each node in the taxonomy path.

The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference

TLDR
This work shows how to inject product knowledge into behavior-based embeddings to provide the best accuracy with minimal engineering changes in existing infrastructure and without additional manual effort.
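
A hedged sketch of the content-based inference idea: regress the behavior-based vectors of popular products on their content features, then use the fitted map to produce vectors for new or rare products that lack behavioral data. The features, the regressor (ridge regression) and the sizes below are illustrative assumptions, not the paper's pipeline.

# Sketch: infer embeddings for cold-start products from content features by
# regressing behavior-based (prod2vec) vectors on those features for popular
# products. All data is synthetic.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_popular, n_cold, feat_dim, emb_dim = 500, 10, 64, 32

content_popular = rng.normal(size=(n_popular, feat_dim))   # e.g. text/category features
prod2vec_popular = rng.normal(size=(n_popular, emb_dim))   # behavior-based vectors

mapper = Ridge(alpha=1.0).fit(content_popular, prod2vec_popular)

content_cold = rng.normal(size=(n_cold, feat_dim))     # new/rare products
inferred_vectors = mapper.predict(content_cold)        # usable in the same space
print(inferred_vectors.shape)                          # (10, 32)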

“An Image is Worth a Thousand Features”: Scalable Product Representations for In-Session Type-Ahead Personalization

TLDR
It is shown how a shared vector space between similar shops can be used to improve the experience of users browsing across sites, opening up the possibility of applying zero-shot unsupervised personalization to increase conversions.

Shopping in the Multiverse: A Counterfactual Approach to In-Session Attribution

TLDR
This work proposes to learn a generative browsing model over a target shop, leveraging the latent space induced by prod2vec embeddings, and to approach counterfactuals in analogy with treatments in formal semantics, explicitly modeling possible outcomes through alternative shopper timelines.