Corpus ID: 51908691

RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising

@article{Rohde2018RecoGymAR,
  title={RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising},
  author={David Rohde and Stephen Bonner and Travis Dunlop and Flavian Vasile and Alexandros Karatzoglou},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.00720}
}
Recommender Systems are becoming ubiquitous in many settings and take many forms, from product recommendation in e-commerce stores, to query suggestions in search engines, to friend recommendation in social networks. Key Method: To this end we introduce RecoGym, an RL environment for recommendation, which is defined by a model of user traffic patterns on e-commerce sites and of users' responses to recommendations on publisher websites. We believe that this is an important step forward for the field of…
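RecoGym follows the standard OpenAI Gym interaction convention: the agent observes a user's organic browsing, recommends a product, and receives a click/no-click reward. As a rough illustration of that loop only — the toy environment, its parameters, and the click model below are all hypothetical stand-ins, not RecoGym's actual simulator or API — a minimal sketch might look like:

```python
import random

class ToyRecoEnv:
    """Illustrative stand-in for a RecoGym-style environment (not the real API).

    Observations are products the simulated user browsed organically;
    the action is the product ID to recommend; the reward is 1 if the
    user clicks the recommendation, else 0.
    """

    def __init__(self, num_products=10, episode_length=20, seed=0):
        self.num_products = num_products
        self.episode_length = episode_length
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        # Each simulated user has a latent favourite product driving clicks.
        self.favourite = self.rng.randrange(self.num_products)
        return self._observe()

    def _observe(self):
        # Organic browsing is biased towards the favourite product.
        return [self.favourite if self.rng.random() < 0.6
                else self.rng.randrange(self.num_products)
                for _ in range(3)]

    def step(self, action):
        self.t += 1
        # Clicks are far likelier when the recommendation matches the favourite.
        p_click = 0.5 if action == self.favourite else 0.05
        reward = 1 if self.rng.random() < p_click else 0
        done = self.t >= self.episode_length
        return self._observe(), reward, done, {}

def run_episode(env, policy):
    """Standard Gym-style reset/step interaction loop."""
    obs, total, done = env.reset(), 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

# A naive policy: recommend the most recently viewed product.
env = ToyRecoEnv(seed=42)
clicks = run_episode(env, policy=lambda obs: obs[-1])
print(clicks)
```

An RL agent plugged into this loop would learn the mapping from observed organic behaviour to recommendations that maximize cumulative clicks, which is the problem RecoGym is built to benchmark.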

Figures and Tables from this paper

Citations

Self-Supervised Reinforcement Learning for Recommender Systems

This paper proposes two frameworks for sequential recommendation tasks, Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC), which augment standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The frameworks are integrated with four state-of-the-art recommendation models, demonstrating the effectiveness of the approach on real-world datasets.

Deep Reinforcement Learning-Based Product Recommender for Online Advertising

A comparative study between value-based and policy-based deep RL algorithms for designing recommender systems for online advertising is provided, where long short-term memory (LSTM) networks are used to build the value and policy networks in the two approaches, respectively.

Pseudo Dyna-Q: A Reinforcement Learning Framework for Interactive Recommendation

The proposed PDQ not only avoids the convergence instability and high computational cost of existing approaches but also provides unlimited interactions without involving real customers; a proven upper bound on the empirical error of the reward function guarantees that the learned offline policy has lower bias and variance.

DEAR: Deep Reinforcement Learning for Online Advertising Impression in Recommender Systems

This paper develops an RL-based framework that can continuously update its advertising strategies and maximize reward in the long run and designs a novel Deep Q-network architecture that can determine three internally related tasks jointly.

Reinforcement Learning based Recommender Systems: A Survey

A survey of reinforcement learning based recommender systems (RLRSs) is presented; it recognizes and illustrates that RLRSs can be generally classified into RL- and DRL-based methods, and proposes an RLRS framework with four components, i.e., state representation, policy optimization, reward formulation, and environment building.

Deep Reinforcement Learning for Online Advertising in Recommender Systems

This paper develops a reinforcement learning based framework that can continuously update its advertising strategies and maximize reward in the long run and demonstrates the effectiveness of the proposed framework based on real-world data.

Supervised Advantage Actor-Critic for Recommender Systems

A negative-sampling strategy for training the RL component, combined with supervised sequential learning, is proposed; results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.

Reinforcement Learning for Long-term Reward Optimization in Recommender Systems

  • Anton Dorozhko
  • Computer Science
  • 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)
  • 2019

This work proposes an environment with a unified interface that makes it possible to compare different models of the recommendation process and different algorithms on the same underlying sequential data, and performs an extensive parameter study of deep deterministic policy gradient methods on the well-known MovieLens dataset.
...

References

Showing 1–10 of 19 references

Causal embeddings for recommendation

A new domain adaptation algorithm is proposed that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes under random exposure; this is shown to be equivalent to learning to predict outcomes under a fully random recommendation policy.

BPR: Bayesian Personalized Ranking from Implicit Feedback

This paper presents BPR-Opt, a generic optimization criterion for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem, and provides a generic learning algorithm for optimizing models with respect to BPR-Opt.

A contextual-bandit approach to personalized news article recommendation

This work models personalized recommendation of news articles as a contextual bandit problem: a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

Causal Inference for Recommendation

On real-world data, it is demonstrated that causal inference for recommender systems leads to improved generalization to new data.

E-commerce in Your Inbox: Product Recommendations at Scale

This paper proposes to use a novel neural language-based algorithm specifically tailored for delivering effective product recommendations to Yahoo Mail users, which was evaluated against baselines that included showing popular products and products predicted based on co-occurrence.

Efficient Optimal Learning for Contextual Bandits

This work provides the first efficient algorithm with optimal regret; it uses a cost-sensitive classification learner as an oracle and has a running time polylog(N), where N is the number of classification rules among which the oracle might choose.

Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

The results show experimental evidence that recent off-policy learning methods can improve upon state-of-the-art supervised learning techniques on a large-scale real-world data set.

Efficient Thompson Sampling for Online Matrix-Factorization Recommendation

A novel algorithm for online matrix-factorization (MF) recommendation is proposed that automatically combines finding the most relevant items with exploring new or less-recommended items, augmented with an efficient online Bayesian probabilistic matrix factorization method based on the Rao-Blackwellized particle filter.

Deep Learning with Logged Bandit Feedback

A Counterfactual Risk Minimization (CRM) approach for training deep networks, BanditNet, which uses an equivariant empirical risk estimator with variance regularization, is proposed, and it is shown how the resulting objective can be decomposed in a way that allows Stochastic Gradient Descent (SGD) training.

Contextual Gaussian Process Bandit Optimization

This work models the payoff function as a sample from a Gaussian process defined over the joint context-action space, and develops CGP-UCB, an intuitive upper-confidence-style algorithm; it shows that context-sensitive optimization outperforms no or naive use of context.