Query2Prod2Vec: Grounded Word Embeddings for eCommerce

@inproceedings{Bianchi2021Query2Prod2VecGW,
  title={Query2Prod2Vec: Grounded Word Embeddings for eCommerce},
  author={Federico Bianchi and Jacopo Tagliabue and Bingqing Yu},
  booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
  year={2021}
}
We present Query2Prod2Vec, a model that grounds lexical representations for product search in product embeddings: in our model, meaning is a mapping between words and a latent space of products in a digital shop. We leverage shopping sessions to learn the underlying space and use merchandising annotations to build lexical analogies for evaluation: our experiments show that our model is more accurate than known techniques from the NLP and IR literature. Finally, we stress the importance of data… 
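The abstract describes meaning as a mapping from words to a latent product space learned from shopping sessions. As a minimal illustrative sketch (not the paper's actual implementation), the grounding idea can be shown by aggregating pre-trained product embeddings over click logs: each query's vector is here a simple average of the vectors of products clicked after that query. All product IDs, vectors, and logs below are made-up toy data; in the paper, product embeddings are learned prod2vec-style from real sessions.

```python
# Toy sketch of grounded query embeddings: a query's representation
# is an aggregate (here, a plain average) of the embeddings of the
# products associated with it in click logs.

from collections import defaultdict

# Hypothetical product embeddings (in practice, learned via a
# skip-gram model over shopping sessions).
product_vectors = {
    "sku_red_shoe":  [0.9, 0.1],
    "sku_blue_shoe": [0.8, 0.2],
    "sku_black_tv":  [0.1, 0.9],
}

# Hypothetical query logs: (query, product clicked after the query).
click_logs = [
    ("shoes", "sku_red_shoe"),
    ("shoes", "sku_blue_shoe"),
    ("tv", "sku_black_tv"),
]

def ground_queries(logs, vectors):
    """Average the product vectors clicked for each query."""
    clicks = defaultdict(list)
    for query, sku in logs:
        clicks[query].append(vectors[sku])
    return {
        q: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for q, vecs in clicks.items()
    }

query_vectors = ground_queries(click_logs, product_vectors)
print(query_vectors["shoes"])  # average of the two shoe vectors
```

With this grounding, lexical similarity falls out of product-space geometry: "shoes" lands near shoe products and far from the TV, without any text-only co-occurrence training.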

Citations

BERT Goes Shopping: Comparing Distributional Models for Product Representations

This work proposes to transfer BERT-like architectures to eCommerce: the model - Prod2BERT - is trained to generate representations of products through masked session modeling and provides guidelines to practitioners for training embeddings under a variety of computational and data constraints.

FashionCLIP: Connecting Language and Images for Product Representations

This work builds on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model for the fashion industry, and showcases its capabilities for retrieval, classification and grounding, and releases the model and code to the community.

Contrastive language and vision learning of general fashion concepts

This work builds on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model adapted for the fashion industry, and demonstrates the effectiveness of the representations learned by FashionCLIP with extensive tests across a variety of tasks, datasets and generalization probes.

“Does it come in black?” CLIP-like models are zero-shot recommenders

Leveraging a large model built for fashion, GradREC is introduced together with its industry potential, and a first rounded assessment of its strengths and weaknesses is offered.

Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction

This work investigates grounded language learning through real-world data, by modelling a teacher-learner dynamics through the natural interactions occurring between users and search engines, and shows how the resulting semantics for noun phrases exhibits compositional properties while being fully learnable without any explicit labelling.

Improving Text-based Similar Product Recommendation for Dynamic Product Advertising at Yahoo

This work proposes a novel product name generation model that fine tunes a pre-trained Transformer-based language model with a sequence to sequence objective that retrieves high quality similar products, leading to an increase of ad clicks and ad revenue.

"Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops

Preliminary results from building a comparison pipeline designed to scale in a multi-shop scenario are presented; the design choices are described, and extensive benchmarks on multiple shops are run to stress-test it.

EvalRS: a Rounded Evaluation of Recommender Systems

EvalRS is proposed as a new type of challenge, in order to foster this discussion among practitioners and build in the open new methodologies for testing RSs “in the wild”.

SIGIR 2021 E-Commerce Workshop Data Challenge

The need for efficient procedures for personalization is even clearer if the authors consider the e-commerce landscape more broadly, where the constraints of the problem are stricter due to smaller user bases and the realization that most users are not frequently returning customers.

First Workshop on Content Understanding and Generation for E-commerce

Shopping experience on any e-commerce website is largely driven by the content customers interact with. The large volume of diverse content on e-commerce platforms, and the advances in machine…

References


BERT Goes Shopping: Comparing Distributional Models for Product Representations

This work proposes to transfer BERT-like architectures to eCommerce: the model - Prod2BERT - is trained to generate representations of products through masked session modeling and provides guidelines to practitioners for training embeddings under a variety of computational and data constraints.

Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation

This work proposes Meta-Prod2vec, a novel method to compute item similarities for recommendation that leverages existing item metadata and shows that the new item representations lead to better performance on recommendation tasks on an open music dataset.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Revisiting Skip-Gram Negative Sampling Model with Regularization

This work revisits skip-gram negative sampling and rectifies the SGNS model with quadratic regularization, and shows that this simple modification suffices to structure the solution in the desired manner.

Dense Passage Retrieval for Open-Domain Question Answering

This work shows that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.

Fantastic Embeddings and How to Align Them: Zero-Shot Inference in a Multi-Shop Scenario

This paper addresses the challenge of leveraging multiple embedding spaces for multi-shop personalization by proving that zero-shot inference is possible by transferring shopping intent from one website to another without manual intervention, and proposes and benchmark unsupervised and supervised methods to "travel" between embedded spaces.

How to Grow a (Product) Tree: Personalized Category Suggestions for eCommerce Type-Ahead

This work presents SessionPath, a novel neural network model that improves facet suggestions on two counts: first, the model is able to leverage session embeddings to provide scalable personalization; second, SessionPath predicts facets by explicitly producing a probability distribution at each node in the taxonomy path.

The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference

This work shows how to inject product knowledge into behavior-based embeddings to provide the best accuracy with minimal engineering changes in existing infrastructure and without additional manual effort.

“An Image is Worth a Thousand Features”: Scalable Product Representations for In-Session Type-Ahead Personalization

It is shown how a shared vector space between similar shops can be used to improve the experience of users browsing across sites, opening up the possibility of applying zero-shot unsupervised personalization to increase conversions.

Scalable Query N-Gram Embedding for Improving Matching and Relevance in Sponsored Search

This work uses the similarity between a query and an ad derived from the query n-gram embeddings as an additional feature in the query-ad relevance model used in Yahoo Search, and proposes a novel online query to ads matching system built on an open-source big-data serving engine.