What recommenders recommend: an analysis of recommendation biases and possible countermeasures

@article{Jannach2015WhatRR,
  title={What recommenders recommend: an analysis of recommendation biases and possible countermeasures},
  author={D. Jannach and Lukas Lerche and Iman Kamehkhosh and Michael Jugovac},
  journal={User Modeling and User-Adapted Interaction},
  year={2015},
  volume={25},
  pages={427--491}
}
Most real-world recommender systems are deployed in a commercial context or designed to represent a value-adding service, e.g., on shopping or Social Web platforms, and typical success indicators for such systems include conversion rates, customer loyalty or sales numbers. In academic research, in contrast, the evaluation and comparison of different recommendation algorithms is mostly based on offline experimental designs and accuracy or rank measures which are used as proxies to assess an… 
Should I Follow the Crowd?: A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems
TLDR
A crowdsourced dataset devoid of the usual biases displayed by common publicly available data is built, in which contradictions between the accuracy that would be measured in a common biased offline experimental setting, and the actual accuracy that can be measured with unbiased observations are illustrated.
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
TLDR
An empirical evaluation investigating how recommendation algorithms are affected by popularity bias is presented, based on two state-of-the-art datasets containing users’ preferences on movies and books and three different recommendation paradigms, i.e., collaborative filtering, content-based filtering and graph-based algorithms.
Measuring the Business Value of Recommender Systems
TLDR
A review of existing publications on field tests of recommender systems and which business-related performance measures were used in such real-world deployments indicates that various open questions remain regarding the realistic quantification of the business effects of recommenders and the performance assessment of recommendation algorithms in academia.
User-centered Evaluation of Popularity Bias in Recommender Systems
TLDR
The limitations of existing metrics for evaluating popularity bias mitigation from the users’ perspective are shown, and a new metric that can address these limitations is proposed.
The Unfairness of Popularity Bias in Recommendation
TLDR
The experimental results on a movie dataset show that, in many recommendation algorithms, the recommendations users get are extremely concentrated on popular items even if a user is interested in long-tail and non-popular items, showing an extreme bias disparity.
Addressing the Multistakeholder Impact of Popularity Bias in Recommendation Through Calibration
TLDR
This paper proposes the concept of popularity calibration which measures the match between the popularity distribution of items in a user's profile and that of the recommended items, and develops an algorithm that optimizes this metric and has a secondary effect of improving supplier fairness.
User Bias in Beyond-Accuracy Measurement of Recommendation Algorithms
TLDR
This work studies user biases of four algorithms in terms of those five measurements between user groups of the eight user characteristics, and looks into users’ behavior patterns like the preference of using more positive ratings, in order to interpret the observed biases.
Bias and Debias in Recommender System: A Survey and Future Directions
TLDR
This paper provides a taxonomy to position and organize the existing work on recommendation debiasing, identifies some open challenges, and envisions some future directions on this important yet less investigated topic.
Connecting User and Item Perspectives in Popularity Debiasing for Collaborative Recommendation
The Impact of Popularity Bias on Fairness and Calibration in Recommendation
TLDR
The experimental results show that there is a strong correlation between how different user groups are affected by algorithmic popularity bias and their level of interest in popular items, and algorithms with greater popularity bias amplification tend to have greater miscalibration.

References

What Recommenders Recommend - An Analysis of Accuracy, Popularity, and Sales Diversity Effects
TLDR
This first analysis on different data sets shows that some RS algorithms, while able to generate highly accurate predictions, concentrate their top 10 recommendations on a very small fraction of the product catalog or are more strongly biased than others toward recommending only relatively popular items.
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
TLDR
The results show that the proposed model can better capture the quality of a recommender system than traditional evaluation does, and is not affected by characteristics of the data (e.g. size).
Performance of recommender algorithms on top-n recommendation tasks
TLDR
An extensive evaluation of several state-of-the-art recommender algorithms suggests that algorithms optimized for minimizing RMSE do not necessarily perform as expected on the top-N recommendation task, and new variants of two collaborative filtering algorithms are offered.
Investigating the Persuasion Potential of Recommender Systems from a Quality Perspective: An Empirical Study
TLDR
The adoption of an RS can affect both the lift factor and the conversion rate, determining an increased volume of sales and influencing the user’s decision to actually buy one of the recommended products, and the perceived novelty of recommendations is likely to be more influential than their perceived accuracy.
Evaluating Recommendation Systems
TLDR
This paper discusses how to compare recommenders based on a set of properties that are relevant for the application, and focuses on comparative studies, where a few algorithms are compared using some evaluation metric, rather than absolute benchmarking of algorithms.
Looking for "Good" Recommendations: A Comparative Evaluation of Recommender Systems
TLDR
An empirical study is discussed that involved 210 users and considered seven RSs on the same dataset using different baseline and state-of-the-art recommendation algorithms, measuring the user's perceived quality of each of them, focusing on accuracy and novelty of recommended items, and on overall users' satisfaction.
Accurate and Novel Recommendations
TLDR
The proposed algorithms for providing novel and accurate recommendations to users are used to improve the performance of classic recommenders, including item-based collaborative filtering and Markov-based recommender systems.
User perception of differences in recommender algorithms
TLDR
It is found that satisfaction is negatively dependent on novelty and positively dependent on diversity in this setting, and that satisfaction predicts the user's final selection of a recommender that they would like to use in the future.
Comparative recommender system evaluation: benchmarking recommendation frameworks
TLDR
This work compares common recommendation algorithms as implemented in three popular recommendation frameworks and shows the necessity of clear guidelines when reporting evaluation of recommender systems to ensure reproducibility and comparison of results.
RF-Rec: Fast and Accurate Computation of Recommendations Based on Rating Frequencies
TLDR
Extensions to the novel recommendation scheme RF-Rec are proposed in order to further increase the predictive accuracy by introducing schemes to weight and parameterize the components of the predictor.