Graph-based Multilingual Product Retrieval in E-Commerce Search

Hanqing Lu, You-Heng Hu, Tong Zhao, Tony Wu, Yiwei Song, Bing Yin
Nowadays, with many e-commerce platforms conducting global business, e-commerce search systems are required to handle product retrieval in multilingual scenarios. Moreover, compared with maintaining per-country e-commerce search systems, a universal system across countries further reduces operational and computational costs and facilitates business expansion into new countries. In this paper, we introduce a universal end-to-end multilingual retrieval system, and…


Evaluating Machine Translation in Cross-lingual E-Commerce Search

A search ranking-based evaluation framework with an edit-distance-based search metric for assessing the impact of machine translation on cross-lingual e-commerce query translation; the proposed metric is strongly associated with both traditional machine translation metrics and traditional search relevance-based metrics.
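The summary does not spell out how the edit-distance metric is defined, so the following is only a minimal sketch of one plausible reading: compare the ranked result list returned for a source-language query with the list returned for its machine translation, using Levenshtein distance over product IDs. The function names `edit_distance` and `translation_search_score` are illustrative, not from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: ranked lists of product IDs)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,                       # deletion
                dp[j - 1] + 1,                   # insertion
                prev + (a[i - 1] != b[j - 1]),   # substitution
            )
    return dp[n]

def translation_search_score(results_src, results_mt, k=10):
    """Normalized similarity between the top-k results for a source-language
    query and for its machine translation: 1.0 means identical rankings."""
    a, b = results_src[:k], results_mt[:k]
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)
```

A score near 1.0 would indicate the translation barely perturbs the search results, which is the kind of search-side signal such a framework could correlate with MT quality metrics.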

Ask Me What You Need: Product Retrieval using Knowledge from GPT-3

A GPT-3-based product retrieval system that leverages the knowledge base of GPT-3 for question answering, so users do not need to know the specific illustrative keywords for a product when querying; the method shows consistent performance improvements on two real-world datasets and one public dataset.

Improve MT for Search with Selected Translation Memory using Search Signals

A method for improving MT query translation using translation memory (TM) entries even when those entries are only sub-strings of a customer search query, together with an approach to selecting TM entries using search signals that contribute to better search results.

ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest

This work introduces a transformer-based architecture capable of aggregating information from both text and image modalities and shows that it significantly outperforms single-modality baselines.

Combining semantic search and twin product classification for recognition of purchasable items in voice shopping

This paper focuses on detecting whether an utterance contains actual, purchasable products, and thus a shopping-related intent, within a typical Spoken Language Understanding architecture consisting of an intent classifier and a slot detector.

Multilingual Semantic Sourcing using Product Images for Cross-lingual Alignment

This work transfers relevance-classification knowledge from human-annotated data in established marketplaces to new and emerging marketplaces to address the data paucity problem, and learns semantic alignment across languages using product images as an anchor between them.

Language-Agnostic Representation Learning for Product Search on E-Commerce Platforms

This work proposes a novel multilingual multi-task learning framework that jointly trains product search models on multiple languages with limited amounts of training data from each language, improving performance over baseline search models in any given language.

Neural IR Meets Graph Embedding: A Ranking Model for Product Search

This work leverages recent advances in graph embedding techniques to let neural retrieval models exploit graph-structured data for automatic feature extraction, overcoming the long-tail problem of click-through data and incorporating external heterogeneous information to improve search results.

Semantic Product Search

This paper trains a deep learning model for semantic matching using customer behavior data and presents compelling offline results demonstrating at least a 4.7% improvement in Recall@100 and a 14.5% improvement in mean average precision (MAP) over baseline state-of-the-art semantic search methods using the same tokenization method.

Embedding-based Retrieval in Facebook Search

The unified embedding framework developed to model semantic embeddings for personalized search is introduced, along with the system that serves embedding-based retrieval within a typical search stack built on an inverted index.
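At its core, embedding-based retrieval of this kind scores candidates by similarity between a query embedding and precomputed document embeddings. The sketch below illustrates only that core idea with a brute-force cosine-similarity scan; the names `normalize` and `retrieve` are illustrative, and a production system like the one described would use an approximate nearest-neighbor index rather than a full scan.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length so the dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve(query_vec, doc_vecs, k=3):
    """Score every document against the query by cosine similarity and
    return the indices of the top-k (brute-force stand-in for an ANN index)."""
    q = normalize(query_vec)
    d = normalize(doc_vecs)
    scores = d @ q
    return np.argsort(-scores)[:k]
```

The two-tower layout implied here (query and documents embedded independently, compared only by a similarity function) is what makes precomputing and indexing the document side possible.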

Pre-training Tasks for Embedding-based Large-scale Retrieval

It is shown that the key ingredient in learning a strong embedding-based Transformer model is the set of pre-training tasks, and that with adequately designed paragraph-level pre-training tasks, Transformer models can markedly improve over the widely used BM25 as well as over embedding models without Transformers.

Learning deep structured semantic models for web search using clickthrough data

A series of new latent semantic models with a deep structure is developed; these project queries and documents into a common low-dimensional space in which the relevance of a document to a query is readily computed as the distance between their projections.
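The essential mechanics can be sketched in a few lines: each side is pushed through its own stack of nonlinear layers into a shared low-dimensional space, and relevance is the cosine similarity there. This is only a toy illustration with untrained random weights (the names `project` and `relevance`, the layer sizes, and the `tanh` nonlinearity are assumptions for the sketch, not the paper's exact architecture).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-tower "deep structure": two nonlinear layers per side, weights untrained.
W_q = [rng.standard_normal((16, 8)), rng.standard_normal((8, 4))]
W_d = [rng.standard_normal((16, 8)), rng.standard_normal((8, 4))]

def project(x, weights):
    """Map raw features into the shared low-dimensional semantic space."""
    for W in weights:
        x = np.tanh(x @ W)
    return x

def relevance(query_feats, doc_feats):
    """Relevance of a document to a query = cosine similarity of their projections."""
    q = project(query_feats, W_q)
    d = project(doc_feats, W_d)
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
```

In the actual approach the weights are learned from clickthrough data so that clicked documents score higher than unclicked ones for the same query.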

Cross-lingual Information Retrieval with BERT

This paper explores the use of the popular bidirectional language model BERT to model and learn the relevance between English queries and foreign-language documents in the task of cross-lingual information retrieval, and introduces a deep relevance matching model based on BERT.

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

ColBERT is presented: a novel ranking model that adapts deep language models (in particular, BERT) for efficient retrieval; it is competitive with existing BERT-based models (and outperforms every non-BERT baseline) while enabling the use of vector-similarity indexes for end-to-end retrieval directly over millions of documents.
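The late-interaction scoring that distinguishes ColBERT is simple to state: each query token embedding is matched against its best document token embedding, and these per-token maxima are summed (the "MaxSim" operator). A minimal numpy sketch of that scoring step, assuming token embeddings are already computed:

```python
import numpy as np

def colbert_score(query_tok_embs, doc_tok_embs):
    """Late interaction (MaxSim): for each query token embedding, take the
    maximum cosine similarity over all document token embeddings, then sum."""
    q = query_tok_embs / np.linalg.norm(query_tok_embs, axis=1, keepdims=True)
    d = doc_tok_embs / np.linalg.norm(doc_tok_embs, axis=1, keepdims=True)
    sim = q @ d.T                    # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())
```

Because document token embeddings are computed independently of the query, they can be indexed offline with a vector-similarity index, which is what makes the approach efficient at retrieval time.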

Query Rewriting using Automatic Synonym Extraction for E-commerce Search

One version of the query rewriting approaches taken in eBay search is described, using a machine-learned binary classifier to filter candidate synonyms so that only those truly useful as query expansions are kept, without compromising result-set precision.
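The pipeline shape this summary implies — extract candidate synonyms, then keep only those a classifier accepts as useful expansions — can be sketched as below. The function `rewrite_query` and the `is_useful` callback are hypothetical stand-ins; the real system's classifier features and rewrite syntax are not described in this summary.

```python
def rewrite_query(query, synonyms, is_useful):
    """Expand each query token with candidate synonyms, keeping only those
    the binary classifier `is_useful(token, synonym)` accepts."""
    expanded = []
    for tok in query.split():
        group = [tok] + [s for s in synonyms.get(tok, []) if is_useful(tok, s)]
        expanded.append("(" + " OR ".join(group) + ")" if len(group) > 1 else tok)
    return " ".join(expanded)
```

For example, with candidates `{"sneakers": ["trainers", "kicks"]}` and a classifier that accepts only `"trainers"`, the query `"red sneakers"` becomes `"red (sneakers OR trainers)"` — the rejected candidate never reaches the expansion, which is how the filter protects precision.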

A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval

A new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents is proposed.