On Sampled Metrics for Item Recommendation

@article{Krichene2020OnSM,
  title={On Sampled Metrics for Item Recommendation},
  author={Walid Krichene and Steffen Rendle},
  journal={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  year={2020}
}
  • Walid Krichene, Steffen Rendle
  • Published 23 August 2020
  • Computer Science
  • Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
The task of item recommendation requires ranking a large catalogue of items given a context. Item recommendation algorithms are evaluated using ranking metrics that depend on the positions of relevant items. To speed up the computation of metrics, recent work often uses sampled metrics where only a smaller set of random items and the relevant items are ranked. This paper investigates sampled metrics in more detail and shows that they are inconsistent with their exact version, in the sense that… 
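
To make the distinction concrete, below is a minimal sketch (not the authors' code) contrasting an exact metric, computed over the full catalogue, with its sampled counterpart, computed against a small set of uniformly drawn negatives; the function names, catalogue size, and sample size are illustrative assumptions. With only 100 sampled negatives, even a randomly scored relevant item lands in the top 10 far more often than under the exact metric, which is the kind of distortion the paper analyzes.

import numpy as np

def exact_recall_at_k(scores, relevant_item, k=10):
    # Rank the relevant item against the full catalogue (1-based rank).
    rank = 1 + np.sum(scores > scores[relevant_item])
    return float(rank <= k)

def sampled_recall_at_k(scores, relevant_item, k=10, n_samples=100, rng=None):
    # Rank the relevant item against n_samples uniformly drawn negatives only.
    rng = rng or np.random.default_rng(0)
    candidates = np.delete(np.arange(len(scores)), relevant_item)
    negatives = rng.choice(candidates, size=n_samples, replace=False)
    rank = 1 + np.sum(scores[negatives] > scores[relevant_item])
    return float(rank <= k)

# Toy setup: one user, a 10,000-item catalogue, randomly scored items.
scores = np.random.default_rng(42).normal(size=10_000)
relevant = 123
print("exact Recall@10:  ", exact_recall_at_k(scores, relevant))
print("sampled Recall@10:", np.mean([sampled_recall_at_k(scores, relevant,
                                     rng=np.random.default_rng(s)) for s in range(200)]))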

Citations

On Sampled Metrics for Item Recommendation (Extended Abstract)
TLDR
This paper investigates sampled metrics and shows that they are inconsistent with their exact counterpart, in the sense that they do not persist relative statements, e.g., recommender A is better than B, not even in expectation, and suggests that sampling should be avoided for metric calculation.
A Case Study on Sampling Strategies for Evaluating Neural Sequential Item Recommendation Models
TLDR
Both sampling by popularity and uniform random sampling do not consistently produce the same ranking when compared over different sample sizes and therefore both should be avoided in favor of the full ranking when establishing state-of-the-art recommender models.
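
For reference, a hedged sketch of the two candidate-sampling strategies that the case study compares when building an evaluation set; the helper name and the add-one smoothing are assumptions, not the study's code.

import numpy as np

def sample_negatives(interactions, target_item, n_items, n_samples=100,
                     strategy="uniform", rng=None):
    # Draw candidate negatives for evaluating one positive item.
    #   strategy="uniform"    -- every non-target item is equally likely
    #   strategy="popularity" -- items are drawn proportionally to their
    #                            global interaction counts
    rng = rng or np.random.default_rng(0)
    candidates = np.setdiff1d(np.arange(n_items), [target_item])
    if strategy == "uniform":
        probs = None
    else:
        counts = np.bincount(interactions, minlength=n_items).astype(float)
        weights = counts[candidates] + 1e-9   # keep all probabilities positive
        probs = weights / weights.sum()
    return rng.choice(candidates, size=n_samples, replace=False, p=probs)

interactions = np.array([0, 0, 0, 1, 2, 2])   # observed item ids; item 0 is most popular
print(sample_negatives(interactions, target_item=3, n_items=6, n_samples=3,
                       strategy="popularity"))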
Popularity Bias in False-positive Metrics for Recommender Systems Evaluation
TLDR
This analysis is the first to show that false-positive metrics tend to penalise popular items, the opposite behavior of true-positive metrics, causing a disagreement trend between both types of metrics in the presence of popularity biases.
Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently?
TLDR
Quality metrics used for recommender systems evaluation are investigated and it is found that Precision is the only metric universally understood among papers and libraries, while other metrics may have different interpretations.
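
As an illustration of how such interpretation differences arise, here are two non-equivalent readings of Precision@k that appear in practice; the variant names are made up for this sketch, and the cited survey should be consulted for the actual discrepancies it documents.

def precision_at_k_fixed(recommended, relevant, k=10):
    # Denominator is always k, the most common definition.
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def precision_at_k_capped(recommended, relevant, k=10):
    # Variant used by some implementations: denominator min(k, |relevant|).
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / min(k, len(relevant))

recommended = [5, 9, 2, 7, 1, 3, 8, 4, 6, 0]
relevant = [9, 2]                                     # only two relevant items exist
print(precision_at_k_fixed(recommended, relevant))    # 0.2
print(precision_at_k_capped(recommended, relevant))   # 1.0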
Offline Retrieval Evaluation Without Evaluation Metrics
TLDR
This work proposes recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists, which substantially improves discriminative power while correlating well with existing metrics and being equally robust to incomplete data.
On Sampling Collaborative Filtering Datasets
TLDR
The main benefit of DATA-GENIE is that it will allow recommender system practitioners to quickly prototype and compare various approaches, while remaining confident that algorithm performance will be preserved, once the algorithm is retrained and deployed on the complete data.
Top-N Recommendation Algorithms: A Quest for the State-of-the-Art
TLDR
This work provides a set of fine-tuned baseline models for different datasets to establish a common understanding of the state-of-the-art for top-n recommendation tasks, and shall serve as a guideline for researchers regarding existing baselines to consider in future performance comparisons.
A sampling approach to Debiasing the offline evaluation of recommender systems
TLDR
This paper proposes and formulates a novel sampling approach, suggested to be a better estimator of the performance one would obtain on (unbiased) MAR test data, and compares it to SKEW and to two baselines that perform a random intervention on MNAR data.
Item Recommendation from Implicit Feedback
TLDR
The main body deals with learning algorithms, presenting sampling-based algorithms for general recommenders and more efficient algorithms for dot-product models; the application of item recommenders to retrieval tasks is also discussed.
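
As a pointer to what "sampling-based algorithms" means for dot-product models, here is a minimal BPR-style pairwise SGD step with a uniformly sampled negative; the learning rate, regularization, and factor sizes are illustrative assumptions, not values from the cited work.

import numpy as np

def bpr_sgd_step(U, V, user, pos_item, n_items, lr=0.05, reg=0.01, rng=None):
    # One pairwise update: push the positive item's score above a sampled negative's.
    rng = rng or np.random.default_rng(0)
    neg_item = rng.integers(n_items)
    while neg_item == pos_item:
        neg_item = rng.integers(n_items)
    u, vp, vn = U[user].copy(), V[pos_item].copy(), V[neg_item].copy()
    x = u @ (vp - vn)                 # score difference under the dot-product model
    g = 1.0 / (1.0 + np.exp(x))       # gradient weight, sigma(-x)
    U[user]     += lr * (g * (vp - vn) - reg * u)
    V[pos_item] += lr * (g * u - reg * vp)
    V[neg_item] += lr * (-g * u - reg * vn)

rng = np.random.default_rng(1)
U = rng.normal(scale=0.1, size=(100, 8))   # user factors
V = rng.normal(scale=0.1, size=(500, 8))   # item factors
bpr_sgd_step(U, V, user=7, pos_item=42, n_items=500, rng=rng)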
A Next Basket Recommendation Reality Check
TLDR
A novel angle is provided on the evaluation of next basket recommendation (NBR) methods, centered on the distinction between repetition and exploration: the next basket is typically composed of previously consumed items and new items, and a set of metrics is proposed that measures the repeat/explore ratio and the performance of NBR models.
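
A minimal sketch of the repetition/exploration split described above, assuming plain set semantics over item ids (the cited paper defines a fuller metric set):

def repeat_ratio(predicted_basket, history_items):
    # Fraction of the predicted basket made up of items the user already consumed.
    basket = set(predicted_basket)
    if not basket:
        return 0.0
    return len(basket & set(history_items)) / len(basket)

history = [1, 2, 3, 4]
predicted = [2, 3, 7]                      # two repeat items, one explore item
print(repeat_ratio(predicted, history))    # 0.666...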

References

SHOWING 1-10 OF 21 REFERENCES
Efficient top-n recommendation for very large scale binary rated datasets
We present a simple and scalable algorithm for top-N recommendation able to deal with very large datasets and (binary rated) implicit feedback. We focus on memory-based collaborative filtering…
Item-based collaborative filtering recommendation algorithms
TLDR
This paper analyzes item-based collaborative filtering techniques and suggests that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
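
A compact sketch of item-based neighbourhood scoring on a binary interaction matrix, in the spirit of the two memory-based collaborative filtering entries above; the cosine similarity, the dense matrix, and the function name are assumptions for illustration.

import numpy as np

def item_knn_topn(R, user, n=10):
    # Score items for `user` by summing cosine similarities to the items the
    # user has already interacted with (binary matrix R: rows=users, cols=items).
    norms = np.linalg.norm(R, axis=0) + 1e-9
    S = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
    np.fill_diagonal(S, 0.0)                 # ignore self-similarity
    scores = S @ R[user]                     # aggregate over the user's items
    scores[R[user] > 0] = -np.inf            # do not re-recommend seen items
    return np.argsort(-scores)[:n]

R = np.array([[1, 1, 0, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0]], dtype=float)
print(item_knn_topn(R, user=0, n=2))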
Collaborative Filtering for Implicit Feedback Datasets
TLDR
This work identifies unique properties of implicit feedback datasets and proposes treating the data as indication of positive and negative preference associated with vastly varying confidence levels, which leads to a factor model which is especially tailored for implicit feedback recommenders.
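
The preference-plus-confidence idea can be sketched as follows, with the commonly described linear confidence c = 1 + alpha * r; the alpha value and the toy counts are assumptions.

import numpy as np

# Implicit feedback counts (e.g., play counts), users x items.
R = np.array([[3., 0., 1.],
              [0., 5., 0.]])

alpha = 40.0
P = (R > 0).astype(float)   # binary preference: did the user interact at all?
C = 1.0 + alpha * R         # confidence: grows with the count; unobserved
                            # entries still get a small baseline weight of 1
# A factor model would then minimize sum over (u, i) of C[u,i] * (P[u,i] - u_u @ v_i)**2
# plus regularization, weighting observed entries far more than missing ones.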
Unbiased offline recommender evaluation for missing-not-at-random implicit feedback
TLDR
This paper investigates evaluation bias of AOA and develops an unbiased and practical offline evaluator for implicit MNAR datasets using the Inverse-Propensity-Scoring (IPS) technique, and shows that popularity bias is widely manifested in item presentation and interaction.
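
The IPS idea can be sketched as a propensity-weighted average; here the propensity is approximated by normalized item popularity and the estimator is self-normalized, both simplifying assumptions rather than the paper's exact evaluator.

import numpy as np

def ips_weighted_hit_rate(hits, item_ids, item_popularity):
    # Weight each test interaction by the inverse of its (approximate) propensity,
    # so that hits on rarely observed items count for more.
    propensity = item_popularity[item_ids] / item_popularity.sum()
    weights = 1.0 / np.clip(propensity, 1e-12, None)
    return np.sum(weights * hits) / np.sum(weights)

popularity = np.array([100., 10., 1.])   # global interaction counts per item
hits       = np.array([1., 0., 1.])      # did each test interaction hit the top-k?
items      = np.array([0, 1, 2])         # item id of each test interaction
print(ips_weighted_hit_rate(hits, items, popularity))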
Selection of Negative Samples for One-class Matrix Factorization
TLDR
This paper successfully develops efficient optimization techniques to solve the challenging problem of selecting negative entries in recommender systems and shows that the “full” approach of including many more missing entries as negatives yields better results.
A Generic Coordinate Descent Framework for Learning from Implicit Feedback
TLDR
It is shown that k-separability is a sufficient property to allow efficient optimization of implicit recommender problems with CD, and a new framework for deriving efficient CD algorithms for complex recommender models is provided.
Leveraging Meta-path based Context for Top- N Recommendation with A Neural Co-Attention Model
TLDR
A novel deep neural network with the co-attention mechanism for leveraging rich meta-path based context for top-N recommendation and performs well in the cold-start scenario and has potentially good interpretability for the recommendation results.
OpenRec: A Modular Framework for Extensible and Adaptable Recommendation Algorithms
TLDR
This work proposes OpenRec, an open and modular Python framework that supports extensible and adaptable research in recommender systems and demonstrates that OpenRec provides adaptability, modularity and reusability while maintaining training efficiency and recommendation accuracy.
Explainable Reasoning over Knowledge Graphs for Recommendation
TLDR
A new model named Knowledge-aware Path Recurrent Network (KPRN) is contributed to exploit knowledge graph for recommendation to allow effective reasoning on paths to infer the underlying rationale of a user-item interaction.
Efficient Training on Very Large Corpora via Gramian Estimation
TLDR
This work proposes new efficient methods to train neural network embedding models without having to sample unobserved pairs, and conducts large-scale experiments that show a significant improvement in training time and generalization quality compared to traditional sampling methods.
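
As I read it, the key trick is that the sum of squared scores over all user-item pairs can be computed from a small Gramian matrix instead of enumerating (or sampling) the pairs; the sketch below only checks that identity and is not the paper's training algorithm.

import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(1000, 16))   # user embeddings
V = rng.normal(size=(5000, 16))   # item embeddings

# Naive: sum of squared scores over ALL user-item pairs, O(n_users * n_items).
naive = np.sum((U @ V.T) ** 2)

# Gramian identity: the same quantity from the 16x16 item Gramian,
# touching each embedding once instead of every pair.
G_V = V.T @ V
gramian = np.sum((U @ G_V) * U)   # equals sum over users of u^T G_V u

print(np.allclose(naive, gramian))   # True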