Variance Reduction in Gradient Exploration for Online Learning to Rank

@article{Wang2019VarianceRI,
  title={Variance Reduction in Gradient Exploration for Online Learning to Rank},
  author={Huazheng Wang and Sonwoo Kim and Eric McCord-Snook and Qingyun Wu and Hongning Wang},
  journal={Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2019}
}
  • Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, Hongning Wang
  • Published 10 June 2019
  • Computer Science
  • Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Online Learning to Rank (OL2R) algorithms learn from implicit user feedback on the fly. The key to such algorithms is an unbiased estimate of gradients, which is often (trivially) achieved by uniformly sampling from the entire parameter space. Unfortunately, this leads to high variance in gradient estimation, resulting in high regret during model updates, especially when the dimension of the parameter space is large. In this work, we aim at reducing the variance of gradient estimation in OL2R…
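
The uniform-sampling baseline the abstract alludes to is Dueling Bandit Gradient Descent (DBGD): propose a randomly perturbed ranker, interleave its results with the current one, and step toward the perturbation if users prefer it. A minimal sketch of that baseline follows, in Python; the interleaved_test callback and the hyperparameters are illustrative assumptions, not the authors' code.

    # Minimal sketch of DBGD-style exploration with uniform direction
    # sampling (the high-variance baseline this paper improves on).
    # interleaved_test, delta, and alpha are illustrative assumptions.
    import numpy as np

    def dbgd_step(w, interleaved_test, delta=1.0, alpha=0.01, rng=None):
        rng = rng or np.random.default_rng()
        u = rng.standard_normal(w.shape[0])
        u /= np.linalg.norm(u)            # uniform draw from the unit sphere
        w_prime = w + delta * u           # exploratory ranker
        if interleaved_test(w, w_prime):  # users preferred the exploration
            w = w + alpha * u             # step toward the winning direction
        return w

In high dimensions most such uniform draws are nearly orthogonal to the true gradient, which is exactly the variance problem the abstract describes.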

Citations

Learning Neural Ranking Models Online from Implicit User Feedback
TLDR
This work proposes to directly learn a neural ranking model from users’ implicit feedback, focusing on RankNet and LambdaRank, and proves that under standard assumptions the OL2R solution achieves a gap-dependent upper regret bound of O(log^2 T), in which the regret is defined as the total number of mis-ordered pairs over T rounds.
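
Written out, the regret in this bound has roughly the following shape; the notation is an illustrative reconstruction from the summary above, not the paper's exact statement.

    % Cumulative pairwise regret over T rounds, where N_t is the number of
    % document pairs mis-ordered by the ranker deployed at round t:
    R(T) = \sum_{t=1}^{T} \mathbb{E}[N_t] = O(\log^2 T)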
Unbiased Learning to Rank
TLDR
Eight state-of-the-art ULTR algorithms are evaluated and it is shown that many of them can be used in both offline settings and online environments with or without minor modifications.
Reinforcement Online Learning to Rank with Unbiased Reward Shaping
TLDR
A novel learning algorithm for OLTR that uses reinforcement learning to optimize rankers, Reinforcement Online Learning to Rank (ROLTR), is proposed, in which the gradients of the ranker are estimated based on the rewards assigned to both clicked and unclicked documents.
Beyond Relevance Ranking: A General Graph Matching Framework for Utility-Oriented Learning to Rank
TLDR
This work systematically analyzes the biases in user feedback, including examination bias and selection bias, and proposes a general framework U-rank+ for learning to rank with logged user feedback from the perspective of graph matching.
Counterfactual Online Learning to Rank
TLDR
A counterfactual online learning to rank algorithm that combines the key components of both CLTR and OLTR is proposed that significantly outperforms traditional OLTR methods and can evaluate a large number of candidate rankers in a more efficient manner.
Calibrating Explore-Exploit Trade-off for Fair Online Learning to Rank
TLDR
This work observes that different groups of items might receive differential treatment during the course of OL2R, and that existing fair ranking solutions usually require knowledge of result relevance or a well-performing ranker beforehand, which contradicts the setting of OL2R and thus cannot be directly applied to guarantee fairness.
Interactive Information Retrieval with Bandit Feedback
TLDR
This tutorial covers the online policy learning solutions for interactive IR with bandit feedback and addresses the new challenges that arose in such a solution paradigm, including sample complexity, costly and even outdated feedback, and ethical considerations in online learning (such as fairness and privacy) in interactive IR.
How do Online Learning to Rank Methods Adapt to Changes of Intent?
TLDR
The empirical experiments show that the adaptation to intent change does vary across OLTR methods, and is also dependent on the amount of noise in the implicit feedback signal, which highlights that intent change adaptation should be studied alongside online and offline performance.
Effective and Privacy-preserving Federated Online Learning to Rank
TLDR
This paper proposes a federated OLTR method, called FPDGD, which leverages the state-of-the-art Pairwise Differentiable Gradient Descent (PDGD), adapts it to the Federated Averaging framework, and introduces a noise-adding clipping technique based on the theory of differential privacy to be used in combination with it.
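
The noise-adding clipping mentioned here follows the standard differential-privacy recipe of bounding each update's norm before perturbing it. The sketch below assumes a Gaussian mechanism and hypothetical parameter names; it is not FPDGD's exact construction.

    # Clip a client's model update to bound its L2 sensitivity, then add
    # noise before it is sent to the federated-averaging server.
    # clip_norm, noise_std, and the Gaussian noise are assumptions.
    import numpy as np

    def privatize_update(delta_w, clip_norm=1.0, noise_std=0.1, rng=None):
        rng = rng or np.random.default_rng()
        norm = np.linalg.norm(delta_w)
        if norm > clip_norm:
            delta_w = delta_w * (clip_norm / norm)  # enforce the norm bound
        return delta_w + rng.normal(0.0, noise_std, size=delta_w.shape)

    # Server side (illustrative): average the privatized client updates.
    # w_global = w_global + np.mean([privatize_update(u) for u in updates], axis=0)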
Learning by Exploration: New Challenges in Real-World Environments
TLDR
This tutorial will introduce the learning by exploration paradigm, which is the key ingredient in many interactive online learning problems, including the multi-armed bandit and, more generally, reinforcement learning problems.

References

SHOWING 1-10 OF 30 REFERENCES
Efficient Exploration of Gradient Space for Online Learning to Rank
TLDR
The proposed algorithm, named Null Space Gradient Descent, reduces the exploration space to the null space of recent poorly performing gradients, preventing the algorithm from repeatedly exploring directions that have been discouraged by the most recent interactions with users.
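
The null-space restriction can be pictured as follows: collect recently rejected gradient directions, compute an orthonormal basis of their null space, and sample exploratory directions only inside that subspace. A sketch in Python, with illustrative names and assuming fewer bad directions than parameters (k < d):

    # Sample a unit exploratory direction orthogonal to all recently
    # rejected gradients, so discouraged directions are not re-explored.
    import numpy as np

    def null_space_direction(bad_grads, d, rng=None):
        rng = rng or np.random.default_rng()
        if len(bad_grads) == 0:                # nothing rejected yet:
            u = rng.standard_normal(d)         # fall back to uniform sampling
            return u / np.linalg.norm(u)
        G = np.asarray(bad_grads)              # k x d matrix, k < d assumed
        _, _, Vt = np.linalg.svd(G)            # rows of Vt span R^d
        basis = Vt[np.linalg.matrix_rank(G):]  # orthonormal basis of null(G)
        u = rng.standard_normal(basis.shape[0]) @ basis
        return u / np.linalg.norm(u)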
Constructing Reliable Gradient Exploration for Online Learning to Rank
TLDR
Two OLR algorithms are proposed that improve the reliability of exploration by constructing robust exploratory directions, along with a Multi-Point Deterministic Gradient Descent method that constructs a set of deterministic standard unit basis vectors for exploration.
Balancing Speed and Quality in Online Learning to Rank for Information Retrieval
TLDR
A fast OLTR model called Sim-MGD is introduced that addresses the speed aspect of the speed-quality tradeoff, and Cascading Multileave Gradient Descent is contributed for OLTR that directly addresses the speed-quality tradeoff.
Differentiable Unbiased Online Learning to Rank
TLDR
Pairwise Differentiable Gradient Descent is an efficient and unbiased OLTR approach that provides a better user experience than previously possible, and it is shown that using a neural network leads to even better performance at convergence than a linear model.
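
As a simplified illustration of the pairwise idea behind PDGD: clicks on a displayed ranking induce preferences of clicked over unclicked documents, and the model takes a pairwise logistic (RankNet-style) gradient step on those preferences. The sketch below deliberately omits two essential PDGD ingredients, the Plackett-Luce sampling of rankings and the per-pair debiasing weight that makes the gradient unbiased, so it conveys the flavor rather than the full method; all names are illustrative.

    # Pairwise logistic step on click-inferred preferences for a linear
    # ranker. PDGD additionally samples rankings from a Plackett-Luce
    # model and reweights each pair for unbiasedness (omitted here).
    import numpy as np

    def pairwise_click_step(w, X, ranking, clicked, alpha=0.01):
        """X: n x d features; ranking: displayed doc indices in order;
        clicked: set of clicked doc indices."""
        if not clicked:
            return w
        scores = X @ w
        last = max(i for i, doc in enumerate(ranking) if doc in clicked)
        grad = np.zeros_like(w)
        for doc in ranking[: last + 1]:
            if doc in clicked:
                continue
            for c in clicked:
                # gradient of log P(c preferred over doc), logistic model
                p = 1.0 / (1.0 + np.exp(scores[c] - scores[doc]))
                grad += p * (X[c] - X[doc])
        return w + alpha * grad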
Multileave Gradient Descent for Fast Online Learning to Rank
TLDR
An online learning to rank algorithm called multileave gradient descent (MGD) is proposed that extends DBGD to learn from so-called multileaved comparison methods that can compare a set of rankings instead of merely a pair.
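
A sketch of the MGD loop described above, assuming a multileave_test callback that returns the indices of the candidate rankers beating the current one on user clicks (names and hyperparameters are illustrative):

    # One MGD step: explore n directions at once via a multileaved
    # comparison and move toward the mean of the winning directions.
    import numpy as np

    def mgd_step(w, multileave_test, n=9, delta=1.0, alpha=0.01, rng=None):
        rng = rng or np.random.default_rng()
        U = rng.standard_normal((n, w.shape[0]))
        U /= np.linalg.norm(U, axis=1, keepdims=True)  # n unit directions
        candidates = w + delta * U                     # n exploratory rankers
        winners = multileave_test(w, candidates)       # indices beating w
        if len(winners) > 0:
            w = w + alpha * U[winners].mean(axis=0)
        return w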
DCM Bandits: Learning to Rank with Multiple Clicks
TLDR
This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model, proposing DCM bandits, an online learning variant of the dependent click model (DCM) in which the goal is to maximize the probability of recommending satisfactory items, such as web pages.
Online Learning to Rank in Stochastic Click Models
TLDR
BatchRank is proposed, the first online learning to rank algorithm for a broad class of click models that encompasses the two most fundamental click models, the cascade and position-based models; it is observed to outperform ranked bandits and is more robust than CascadeKL-UCB, an existing algorithm for the cascade model.
Reusing historical interaction data for faster online learning to rank for IR
TLDR
It is found that historical data can speed up learning, leading to substantially and significantly higher online performance, and the pre-selection method proves highly effective at compensating for noise in user feedback.
Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial
TLDR
This paper explains why the time is right for an intermediate-level tutorial on online learning to rank, and outlines the objectives of the proposed tutorial and its relevance, as well as more practical details such as format, schedule, and support materials.
Cascading Bandits: Learning to Rank in the Cascade Model
TLDR
This paper proposes cascading bandits, a learning variant of the cascade model where the objective is to identify the K most attractive items, and presents two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB.
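
The CascadeUCB1 idea can be sketched directly: rank items by a UCB index, display the top K, and, under the cascade assumption, update every item the user examined (everything above the first click, plus the clicked item itself; all K items when nothing is clicked). The environment callback and constants below are illustrative:

    # CascadeUCB1 sketch: UCB over per-item attraction probabilities.
    # first_click(shown) returns the position of the first click or None
    # (an assumed environment callback).
    import numpy as np

    def cascade_ucb1(n_items, K, T, first_click):
        means = np.zeros(n_items)   # empirical attraction estimates
        pulls = np.zeros(n_items)   # observation counts
        for t in range(1, T + 1):
            ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
            ucb[pulls == 0] = np.inf         # observe each item at least once
            shown = np.argsort(-ucb)[:K]     # top-K items by UCB index
            c = first_click(shown)
            examined = K if c is None else c + 1
            for pos in range(examined):      # cascade: update examined items
                item = shown[pos]
                reward = 1.0 if pos == c else 0.0
                pulls[item] += 1
                means[item] += (reward - means[item]) / pulls[item]
        return means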