• Corpus ID: 221172717

Semantic Product Search for Matching Structured Product Catalogs in E-Commerce

Jason Ingyu Choi, Surya Kallumadi, Bhaskar Mitra, Eugene Agichtein, Faizan Javed
Retrieving all semantically relevant products from the product catalog is an important problem in e-commerce. Compared to web documents, product catalogs are more structured and sparse due to multi-instance fields that encode heterogeneous aspects of products (e.g., brand name and product dimensions). In this paper, we propose a new semantic product search algorithm that learns to represent and aggregate multi-instance fields into a document representation using state-of-the-art transformers as… 

WANDS: Dataset for Product Search Relevance Assessment

This work proposes a systematic and effective way to build a discriminative, reusable, and fair human-labeled dataset, Wayfair Annotation DataSet (WANDS), for e-commerce scenarios and introduces an important cross-referencing step to the annotation process which increases dataset completeness.

A Boring-yet-effective Approach for the Product Ranking Task of the Amazon KDD Cup 2022

This work describes its submission to the product ranking task of the Amazon KDD Cup 2022 and argues that more difficult e-commerce evaluation datasets are needed to discriminate between retrieval methods.

GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

This paper proposes the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder and is more robust in its training than previous methods.

Structured Fine-Tuning of Contextual Embeddings for Effective Biomedical Retrieval

This paper investigates the suitability of leveraging biomedical abstract sections for fine-tuning pretrained contextual language models at a finer granularity and shows that models fine-tuned on individual sections are able to capture potentially useful word contexts that may otherwise be ignored by structure-agnostic models.



Semantic Product Search

This paper trains a deep learning model for semantic matching using customer behavior data and presents compelling offline results that demonstrate at least 4.7% improvement in Recall@100 and 14.5% improvement in mean average precision (MAP) over baseline state-of-the-art semantic search methods using the same tokenization method.
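For readers unfamiliar with the Recall@100 metric mentioned above, the following is a minimal sketch of Recall@k under its usual definition (the fraction of relevant items that appear in the top k results). The function name, the product IDs, and the toy ranking are hypothetical and are not taken from the paper.

```python
def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant items that appear in the top-k of a ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        # No relevant items: recall is undefined; this sketch returns 0.0.
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical ranking and relevance judgments for a single query.
ranking = ["p3", "p7", "p1", "p9", "p4"]
relevant = {"p1", "p4", "p8"}

recall_top3 = recall_at_k(ranking, relevant, k=3)  # only "p1" is in the top 3
recall_top5 = recall_at_k(ranking, relevant, k=5)  # "p1" and "p4" are in the top 5
```

In practice the metric is averaged over all evaluation queries; the sketch scores a single query only.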

Learning deep structured semantic models for web search using clickthrough data

This paper develops a series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space, where the relevance of a document given a query is readily computed as the distance between them.
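The scoring step described above can be sketched as follows, assuming the query and documents have already been projected into the shared space and that cosine similarity serves as the closeness measure. The vectors are made-up toy values, and the deep projection network itself is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical low-dimensional representations (assumed already projected).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "d1": [1.0, 0.0, 1.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [1.0, 1.0, 0.0],
}
ranked = rank_documents(query_vec, doc_vecs)
```

Because documents can be projected offline, only the query projection and the similarity computation are needed at query time.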

Learning to Match using Local and Distributed Representations of Text for Web Search

This work proposes a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches them using learned distributed representations, and shows that matching with distributed representations complements matching with traditional local representations.

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
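The three components named above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: it assumes cosine similarity for the translation matrix, Gaussian (RBF) kernels for pooling, and a tanh-squashed linear layer for the final score. The embeddings, kernel means, kernel width, and weights are hypothetical toy values; in the real model the embeddings and layer weights are learned.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_matrix(query_embs, doc_embs):
    """Word-level similarities: one row per query word, one column per doc word."""
    return [[cosine(q, d) for d in doc_embs] for q in query_embs]


def kernel_pooling(matrix, mus, sigma=0.1):
    """One soft-match feature per kernel: sum over query words of log soft-TF."""
    features = []
    for mu in mus:
        total = 0.0
        for row in matrix:
            # Soft count of document words whose similarity is near mu.
            soft_tf = sum(math.exp(-((m - mu) ** 2) / (2 * sigma ** 2)) for m in row)
            total += math.log(max(soft_tf, 1e-10))  # clamp to avoid log(0)
        features.append(total)
    return features


def ranking_score(features, weights, bias=0.0):
    """Combine kernel features into one score with a linear layer and tanh."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)


# Hypothetical 2-d word embeddings for a 2-word query and a 3-word document.
query_embs = [[1.0, 0.0], [0.0, 1.0]]
doc_embs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

matrix = translation_matrix(query_embs, doc_embs)
features = kernel_pooling(matrix, mus=[1.0, 0.5, 0.0])
score = ranking_score(features, weights=[1.0, 0.5, -0.5])
```

Each kernel mean corresponds to a different match strength, so a kernel centered at 1.0 counts near-exact matches while kernels at lower means count progressively softer matches.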

Neural Vector Spaces for Unsupervised Information Retrieval

It is found that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model, and therefore NVSM can safely be used for ranking documents without supervised relevance judgments.

Simple BM25 extension to multiple weighted fields

This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents and proposes a much more intuitive alternative which weights term frequencies before the non-linear term frequency saturation function is applied.
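The idea of weighting term frequencies before saturation can be sketched as below. This is a simplified illustration under stated assumptions: per-field term counts and field lengths are combined with field weights first, and a standard BM25 saturation formula is then applied once to the combined values. The field names, weights, IDF values, and average length are hypothetical, and parameter defaults (`k1=1.2`, `b=0.75`) are common choices rather than values from the paper.

```python
def weighted_tf(term, doc_fields, field_weights):
    """Combine a term's per-field counts into one weighted term frequency."""
    return sum(field_weights[name] * tokens.count(term)
               for name, tokens in doc_fields.items())


def weighted_length(doc_fields, field_weights):
    """Document length computed with the same field weights."""
    return sum(field_weights[name] * len(tokens)
               for name, tokens in doc_fields.items())


def bm25_multifield(query_terms, doc_fields, field_weights, idf, avg_len,
                    k1=1.2, b=0.75):
    """BM25 applied once to field-weighted term frequencies."""
    doc_len = weighted_length(doc_fields, field_weights)
    length_norm = k1 * (1 - b + b * doc_len / avg_len)
    score = 0.0
    for term in query_terms:
        tf = weighted_tf(term, doc_fields, field_weights)
        if tf == 0:
            continue
        # Saturation happens here, after the fields have been combined.
        score += idf.get(term, 0.0) * tf * (k1 + 1) / (tf + length_norm)
    return score


# Hypothetical product document with two fields.
doc = {
    "title": ["oak", "table"],
    "description": ["solid", "oak", "dining", "table", "with", "oak", "legs"],
}
weights = {"title": 3.0, "description": 1.0}
idf = {"oak": 1.5, "table": 1.0}  # assumed precomputed IDF values

score = bm25_multifield(["oak", "table"], doc, weights, idf, avg_len=10.0)
```

Combining fields before saturation means a term repeated across several fields is saturated once, rather than contributing a separately saturated score per field.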

TU Wien @ TREC Deep Learning '19 - Simple Contextualization for Re-ranking

The TK (Transformer-Kernel) model is submitted: a neural re-ranking model for ad-hoc search using an efficient contextualization mechanism and a document-length enhanced kernel-pooling, which enables users to gain insight into the model.

Neural Ranking Models with Multiple Document Fields

This paper introduces a neural ranker that can take advantage of full document structure, including multiple-instance and missing-instance data of variable length, which significantly enhances ranking performance and outperforms a learning-to-rank baseline with hand-crafted features.

MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching

This paper presents MatchZoo, a novel system that facilitates the learning, practicing, and designing of neural text matching models and helps researchers train, test, and apply state-of-the-art models systematically and develop their own models with rich APIs and assistance.

Distributed Representations of Sentences and Documents

Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.