Fast Offline Policy Optimization for Large Scale Recommendation

@article{Sakhi2022FastOP,
  title={Fast Offline Policy Optimization for Large Scale Recommendation},
  author={Otmane Sakhi and David Rohde and Alexandre Gilotte},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.05327}
}
Personalized interactive systems such as recommender systems require selecting relevant items depending on context. Production systems need to identify these items rapidly from very large catalogues, a problem that can be solved efficiently with maximum inner product search technology. Offline optimization of maximum inner product search can be achieved by relaxing the discrete problem, resulting in policy learning or REINFORCE-style learning algorithms. Unfortunately, this relaxation step requires…
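The abstract contrasts the discrete retrieval problem (an argmax over inner products) with its differentiable relaxation. A minimal sketch of both, assuming simple list-based embeddings (function names here are illustrative, not from the paper):

```python
import math

def mips(user_vec, item_vecs):
    """Exact maximum inner product search: index of the catalogue item
    whose embedding has the largest dot product with the user vector."""
    scores = [sum(u * v for u, v in zip(user_vec, item)) for item in item_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

def softmax_relaxation(user_vec, item_vecs, temperature=1.0):
    """Relax the discrete argmax into a softmax policy over items,
    giving sampling probabilities a REINFORCE-style learner can use."""
    scores = [sum(u * v for u, v in zip(user_vec, item)) / temperature
              for item in item_vecs]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

items = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
user = [0.9, 0.1]
best = mips(user, items)                 # item 0 scores highest here
probs = softmax_relaxation(user, items)  # differentiable surrogate
```

As the temperature shrinks, the softmax concentrates on the MIPS argmax, which is the intuition behind the relaxation the abstract refers to.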

References

Showing 1–10 of 44 references

The Thermodynamic Variational Objective

This work provides a computationally efficient gradient estimator for the thermodynamic variational objective that applies to continuous, discrete, and non-reparameterizable distributions, and shows that the objective functions used in variational inference, variational autoencoders, wake-sleep, and inference compilation are all special cases of the TVO.

Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs

The proposed general metric space search index is able to strongly outperform previous open-source state-of-the-art vector-only approaches, and the algorithm's similarity to the skip-list structure allows a straightforward balanced distributed implementation.
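The search primitive at the heart of HNSW is greedy routing on a proximity graph. A simplified single-layer sketch (real HNSW adds a hierarchy of layers with skip-list-like entry points; the toy graph below is made up for illustration):

```python
def greedy_search(graph, vectors, query, entry):
    """Greedy routing: repeatedly hop to the neighbour closest to the
    query; stop when no neighbour improves on the current node."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    current = entry
    while True:
        best = min(graph[current],
                   key=lambda n: dist(vectors[n], query),
                   default=current)
        if dist(vectors[best], query) >= dist(vectors[current], query):
            return current  # local minimum: no neighbour is closer
        current = best

# toy chain graph: nodes laid out on a line
vectors = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
found = greedy_search(graph, vectors, [3.0, 0.0], entry=0)
```

Greedy search can get stuck in local minima on a single layer; the hierarchical structure and long-range links of HNSW are what make the routing robust in practice.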

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
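The update the summary describes can be written compactly. A minimal sketch of one Adam step over a list of scalar parameters (state layout is my own choice, not from the paper):

```python
import math

def adam_step(params, grads, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its square (v), with bias correction for the early steps."""
    state["t"] += 1
    t = state["t"]
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        m_hat = state["m"][i] / (1 - b1 ** t)  # bias-corrected 1st moment
        v_hat = state["v"][i] / (1 - b2 ** t)  # bias-corrected 2nd moment
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

# minimize f(x) = x^2 starting from x = 1.0
params = [1.0]
state = {"t": 0, "m": [0.0], "v": [0.0]}
grads = [2 * params[0]]
params = adam_step(params, grads, state)  # first step moves by ~lr
for _ in range(1000):
    params = adam_step(params, [2 * params[0]], state, lr=0.05)
```

Note that on the very first step the bias-corrected ratio is close to the gradient's sign, so the parameter moves by roughly the learning rate regardless of gradient scale.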

Doubly Robust Policy Evaluation and Optimization

It is proved that the doubly robust estimation method uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies, and is expected to become common practice in policy evaluation and optimization.
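The doubly robust estimator combines a reward model (direct method) with an importance-weighted correction. A minimal sketch, assuming discrete actions and callables of my own naming (not the paper's code):

```python
def doubly_robust_value(logs, actions, reward_model, target_prob):
    """DR off-policy value estimate: model prediction under the target
    policy, plus an importance-weighted residual that corrects model bias."""
    total = 0.0
    for x, a, r, logging_prob in logs:
        # direct-method term: expected modelled reward under target policy
        dm = sum(target_prob(x, b) * reward_model(x, b) for b in actions)
        # importance weight corrects using the actually observed reward
        w = target_prob(x, a) / logging_prob
        total += dm + w * (r - reward_model(x, a))
    return total / len(logs)

# toy check: two actions, exact reward model, deterministic target policy
actions = [0, 1]
model = lambda x, a: float(a)               # reward 1 for action 1, else 0
target = lambda x, a: 1.0 if a == 1 else 0.0
logs = [(0, 0, 0.0, 0.5), (0, 1, 1.0, 0.5)]  # logged by a uniform policy
value = doubly_robust_value(logs, actions, model, target)
```

The estimator is unbiased if either the reward model or the logging propensities are correct, which is the source of the robustness the summary claims.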

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

This work develops a learning principle and an efficient algorithm for batch learning from logged bandit feedback, and shows how CRM can be used to derive a new learning method, called Policy Optimizer for Exponential Models (POEM), for learning stochastic linear rules for structured output prediction.
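The CRM principle penalizes the inverse-propensity-score (IPS) estimate by its empirical variance. A minimal sketch of the objective, assuming precomputed importance weights and rewards (the function name and penalty form shown here are an illustration of the idea, not POEM's exact formulation):

```python
import math

def crm_objective(weights_rewards, lam=0.1):
    """IPS estimate of expected reward, penalized by its empirical
    standard error -- a variance regularizer in the spirit of CRM."""
    n = len(weights_rewards)
    ips = [w * r for w, r in weights_rewards]          # per-sample IPS terms
    mean = sum(ips) / n
    var = sum((v - mean) ** 2 for v in ips) / (n - 1)  # sample variance
    return mean - lam * math.sqrt(var / n)

# (importance_weight, observed_reward) pairs from a log
data = [(1.0, 1.0), (1.0, 1.0), (2.0, 0.5), (0.5, 1.0)]
obj = crm_objective(data)
```

Maximizing this objective favours policies whose counterfactual estimate is not only high but also low-variance, which guards against overfitting to rare, heavily up-weighted log entries.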

Counterfactual reasoning and learning systems: the example of computational advertising

This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and to predict the consequences of changes to the system.

Monte Carlo theory, methods and examples (2013)

Scalable representation learning and retrieval for display advertising

This work shows that combining large-scale matrix factorization with lightweight embedding fine-tuning unlocks state-of-the-art performance at scale, and proposes an efficient model (LED, for Lightweight Encoder-Decoder) reaching a new trade-off between complexity, scale, and performance.

PyTorch: An Imperative Style, High-Performance Deep Learning Library

This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

Billion-Scale Similarity Search with GPUs

This paper proposes a novel design that enables the construction of high-accuracy brute-force, approximate, and compressed-domain search based on product quantization, and applies it in different similarity search scenarios.