Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions

@inproceedings{Oosterhuis2021UnifyingOA,
  title={Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions},
  author={Harrie Oosterhuis and Maarten de Rijke},
  booktitle={Proceedings of the 14th ACM International Conference on Web Search and Data Mining},
  year={2021}
}
  • Published 8 December 2020
Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods are divided into online approaches, which learn by directly interacting with users, and counterfactual approaches, which learn from historical interactions. Existing online methods are hindered without online interventions and thus should not be applied counterfactually. Conversely, counterfactual methods cannot directly benefit…

Citations

Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank
TLDR
MBC is a new correction method for position and trust bias in CLTR in which, unlike the existing methods, the correction does not rely on relevance estimation, and it is proved that the method is unbiased.
Reinforcement Online Learning to Rank with Unbiased Reward Shaping
TLDR
A novel learning algorithm for OLTR, Reinforcement Online Learning to Rank (ROLTR), uses reinforcement learning to optimize rankers; the gradients of the ranker are estimated based on the rewards assigned to clicked and unclicked documents.
Unbiased Top-k Learning to Rank with Causal Likelihood Decomposition
TLDR
Causal Likelihood Decomposition (CLD) is proposed, a unified approach to simultaneously mitigating the two biases that arise in top-k learning to rank, by decomposing the log-likelihood of the biased data into an unbiased term that is related only to relevance, plus other terms related to the biases.
Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling
TLDR
A novel approach to producing more sample-efficient estimators of expectations in the PL model is developed by combining the Gumbel top-k trick with quasi-Monte Carlo (QMC) sampling, a well-established technique for variance reduction.
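As a sketch of the sampling primitive involved (the Gumbel top-k trick alone, without the quasi-Monte Carlo refinement the paper adds on top), rankings can be drawn from a Plackett-Luce model like this; the function name and item weights are illustrative:

```python
import numpy as np

def sample_pl_ranking(log_scores, k, rng):
    """Sample a top-k ranking from a Plackett-Luce model via the Gumbel top-k
    trick: perturb each log-score with i.i.d. Gumbel(0, 1) noise and take the
    k largest, which is equivalent to drawing k items sequentially without
    replacement with probabilities proportional to exp(log_scores)."""
    perturbed = log_scores + rng.gumbel(size=log_scores.shape)
    return np.argsort(-perturbed)[:k]

rng = np.random.default_rng(0)
log_scores = np.log(np.array([0.5, 0.3, 0.15, 0.05]))  # illustrative item weights
ranking = sample_pl_ranking(log_scores, k=3, rng=rng)   # three distinct item indices
```

Quasi-Monte Carlo variants replace the i.i.d. Gumbel draws with low-discrepancy sequences to reduce the variance of expectations estimated over such sampled rankings.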
Interactive Information Retrieval with Bandit Feedback
TLDR
This tutorial covers the online policy learning solutions for interactive IR with bandit feedback and addresses the new challenges that arose in such a solution paradigm, including sample complexity, costly and even outdated feedback, and ethical considerations in online learning (such as fairness and privacy) in interactive IR.
Learning from User Interactions with Rankings: A Unification of the Field
TLDR
The second part of this thesis proposes a framework that bridges many gaps between the areas of online, counterfactual, and supervised learning to rank, taking approaches previously considered independent and unifying them into a single methodology for widely applicable and effective learning to rank from user clicks.
Is Non-IID Data a Threat in Federated Online Learning to Rank?
TLDR
The effect of non-independent and identically distributed (non-IID) data on federated online learning to rank (FOLTR) is studied, and directions for future work are charted in this new and largely unexplored research area of Information Retrieval.
Understanding and Mitigating the Effect of Outliers in Fair Ranking
TLDR
This work formalizes outlierness in a ranking, shows that outliers are present in realistic datasets, and presents the results of an eye-tracking study showing that users' scanning order and the exposure of items are influenced by the presence of outliers.
Learning from user interactions with rankings
TLDR
The second part of this thesis proposes a framework that bridges many gaps between areas of online, counterfactual, and supervised learning to rank, and introduces a novel pairwise method for learning from clicks that contrasts with the previous prevalent dueling-bandit methods.
Adaptive normalization for IPW estimation
TLDR
This work studies a family of IPW estimators, first proposed by Trotter and Tukey in the context of Monte Carlo problems, that are normalized by an affine combination of the Horvitz–Thompson and Hájek normalization terms, and proposes an adaptively normalized estimator whose asymptotic variance is never worse than that of the Horvitz–Thompson or Hájek estimators and is strictly smaller except in edge cases.
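For context, the two classical normalizations that the adaptive estimator interpolates between can be sketched in a few lines; this is a generic illustration of the estimators, not the paper's method:

```python
import numpy as np

def horvitz_thompson(y, p):
    """Horvitz-Thompson IPW estimate of a population mean: weight each observed
    outcome by 1/p and divide by the sample size n. Unbiased, but can have high
    variance when some propensities p are small."""
    return np.mean(y / p)

def hajek(y, p):
    """Hajek (self-normalized) IPW estimate: divide by the sum of the weights
    instead of n. Slightly biased in finite samples, often lower variance."""
    w = 1.0 / p
    return np.sum(w * y) / np.sum(w)

y = np.array([1.0, 0.0, 1.0, 1.0])  # observed outcomes (illustrative)
p = np.array([0.8, 0.5, 0.4, 0.9])  # propensity of observing each outcome
ht, hj = horvitz_thompson(y, p), hajek(y, p)
```

An adaptively normalized estimator divides by an affine combination of n and the sum of the weights, choosing the combination from the data.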

References

SHOWING 1-10 OF 31 REFERENCES
To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions
TLDR
The results show that the choice between the methodologies is consequential and depends on the presence of selection bias and the degree of position bias and interaction noise; in some circumstances counterfactual methods obtain the highest ranking performance, while in others their optimization can be detrimental to the user experience.
Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking
TLDR
The novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data so that the counterfactual estimate has minimal variance, is introduced, and it is proved that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
Counterfactual Online Learning to Rank
TLDR
A counterfactual online learning to rank algorithm that combines the key components of both CLTR and OLTR is proposed that significantly outperforms traditional OLTR methods and can evaluate a large number of candidate rankers in a more efficient manner.
Estimating Position Bias without Intrusive Interventions
TLDR
This paper shows how to harvest a specific type of intervention data from historic feedback logs of multiple different ranking functions, and proposes a new extremum estimator that makes effective use of this data and is robust to a wide range of settings in simulation studies.
Intervention Harvesting for Context-Dependent Examination-Bias Estimation
TLDR
A Contextual Position-Based Model (CPBM), where the examination bias may also depend on a context vector describing the query and the user, is proposed, along with an effective estimator for the CPBM based on intervention harvesting.
A General Framework for Counterfactual Learning-to-Rank
TLDR
This paper provides a general and theoretically rigorous framework for counterfactual learning-to-rank that enables unbiased training for a broad class of additive ranking metrics as well as a broad class of models (e.g., deep networks).
Unbiased Learning-to-Rank with Biased Feedback
TLDR
A counterfactual inference framework is presented that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data, and a Propensity-Weighted Ranking SVM is derived for discriminative learning from implicit feedback, where click models take the role of the propensity estimator.
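The propensity-weighted idea can be illustrated with a minimal inverse-propensity-scoring (IPS) estimate of an additive ranking metric from logged clicks; the function name, weights, and numbers below are illustrative, not the paper's implementation:

```python
import numpy as np

def ips_estimate(clicks, propensities, rank_weights):
    """IPS estimate of an additive rank metric from one logged impression.

    clicks[i]       -- 1.0 if document i was clicked in the logged ranking
    propensities[i] -- probability the user examined doc i's logged position
    rank_weights[i] -- metric weight of doc i under the ranking being evaluated

    Dividing each click by its examination propensity corrects for position
    bias: in expectation, clicks / propensities equals the true relevance.
    """
    clicks = np.asarray(clicks, dtype=float)
    return float(np.sum(rank_weights * clicks / propensities))
```

Averaged over many logged impressions, this recovers the metric value that would be computed from the (unobserved) true relevance labels, which is what makes empirical risk minimization on biased click data possible.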
Correcting for Selection Bias in Learning-to-rank Systems
TLDR
New counterfactual approaches are proposed that adapt Heckman's two-stage method and account for selection and position bias in LTR systems; they are much more robust to noise and more accurate than existing unbiased LTR algorithms.
Addressing Trust Bias for Unbiased Learning-to-Rank
TLDR
This paper models the noise as position-dependent trust bias and proposes a noise-aware Position-Based Model, named TrustPBM, to better capture user click behavior; the proposed model significantly outperforms existing unbiased learning-to-rank methods.
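As a hedged sketch of the kind of click model involved (a generic trust-biased position-based model, not necessarily TrustPBM's exact parameterization):

```python
def trust_pbm_click_prob(examination, eps_pos, eps_neg, relevance):
    """Click probability under a trust-biased position-based model: the user
    examines the position with probability `examination`, then clicks a
    relevant item with probability eps_pos and a non-relevant one with
    probability eps_neg. A nonzero eps_neg captures misplaced trust in
    highly ranked results, i.e. clicks on non-relevant items."""
    return examination * (eps_pos * relevance + eps_neg * (1.0 - relevance))
```

With eps_pos = 1 and eps_neg = 0 this reduces to the standard position-based model, where position bias alone determines the click noise.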
A probabilistic method for inferring preferences from clicks
TLDR
This paper derives an unbiased estimator of comparison outcomes and shows how marginalizing over possible comparison outcomes given the observed click data can make this estimator even more effective.