Evaluating Recommendation Systems

@inproceedings{Shani2011EvaluatingRS,
  title={Evaluating Recommendation Systems},
  author={Guy Shani and Asela Gunawardana},
  booktitle={Recommender Systems Handbook},
  year={2011}
}
Recommender systems are now popular both commercially and in the research community, where many approaches have been suggested for providing recommendations. […] In each of these cases we describe types of questions that can be answered, and suggest protocols for experimentation. We also discuss how to draw trustworthy conclusions from the conducted experiments. We then review a large set of properties, and explain how to evaluate systems given relevant properties. We also survey a large set of…
Towards reproducibility in recommender-systems research
TLDR
The recommender-system community needs to survey other research fields and learn from them, find a common understanding of reproducibility, identify and understand the determinants that affect reproducibility, conduct more comprehensive experiments, and establish best-practice guidelines for recommender-systems research.
Mix and Rank: A Framework for Benchmarking Recommender Systems
TLDR
This work proposes a novel benchmarking framework that mixes different evaluation measures in order to rank the recommender systems on each benchmark dataset, separately, and discovers sets of correlated measures as well as sets of evaluation measures that are least correlated.
How good your recommender system is? A survey on evaluations in recommendation
TLDR
This paper surveys and organizes the main research that presents definitions of concepts and proposes metrics or strategies to evaluate recommendations; it settles the relationship between the concepts, categorizes them according to their objectives, and suggests potential future topics on user satisfaction.
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
TLDR
The results show that the proposed model can better capture the quality of a recommender system than traditional evaluation does, and is not affected by characteristics of the data (e.g. size).
Evaluating the Accuracy and Utility of Recommender Systems
TLDR
It is concluded that current recommendation quality has outgrown the methods and metrics used for the evaluation of these systems, and qualitative approaches can be used, with minimal user interference, to correctly estimate the actual quality of recommendation systems.
Non-transparent recommender system evaluation leads to misleading results
TLDR
This work investigates the discrepancies between common open source recommender system frameworks and highlights the difference in evaluation protocols – even when the same evaluation metrics are employed, evidencing differences in their implementation.
Optimal Recommendation in the Presence of Comparison Shopping
TLDR
This work proposes to use choice probability to measure the overall quality of recommendation lists, which unifies the desire to achieve both relevancy and diversity in generating recommendations.
Research Paper Recommender System Evaluation Using Coverage
TLDR
A range of evaluation metrics and measures, as well as some approaches used for evaluating recommendation systems, are reviewed, showing large differences in recommendation accuracy across frameworks and strategies.
Collaborative Filtering Recommender Systems
TLDR
A wide variety of the choices available and their implications are discussed, aiming to provide both practitioners and researchers with an introduction to the important issues underlying recommenders and current best practices for addressing these issues.
Recommender system algorithms: A comparative analysis based on monotonicity
TLDR
Two other popular measures, precision and recall, are considered to provide an experimental analysis of the five most popular recommendation algorithms for evaluating the utility of recommendations.

References

Shilling recommender systems for fun and profit
TLDR
Four open questions are explored that may affect the effectiveness of shilling attacks on recommender systems: which recommender algorithm is being used, whether the application is producing recommendations or predictions, how detectable the attacks are by the operator of the system, and what the properties are of the items being attacked.
Avoiding monotony: improving the diversity of recommendation lists
TLDR
This work models the competing goals of maximizing the diversity of the retrieved list while maintaining adequate similarity to the user query as a binary optimization problem, leading to a parameterized eigenvalue problem whose solution is finally quantized to the required binary solution.
Beyond Algorithms: An HCI Perspective on Recommender Systems
TLDR
From a user’s perspective, an effective recommender system inspires trust in the system; has system logic that is at least somewhat transparent; points users towards new, not-yet-experienced items; provides details about recommended items, including pictures and community ratings; and finally, provides ways to refine recommendations by including or excluding particular genres.
Evaluation of Item-Based Top-N Recommendation Algorithms
TLDR
The experimental evaluation on five different datasets shows that the proposed item-based algorithms are up to 28 times faster than the traditional user-neighborhood based recommender systems and provide recommendations whose quality is up to 27% better.
Item-based top-N recommendation algorithms
TLDR
This article presents one class of model-based recommendation algorithms that first determines the similarities between the various items and then uses them to identify the set of items to be recommended, and shows that these item-based algorithms are up to two orders of magnitude faster than the traditional user-neighborhood based recommender systems and provide recommendations with comparable or better quality.
Accounting for taste: using profile similarity to improve recommender systems
TLDR
It is proposed that the usefulness of recommender systems can be improved by including more information about recommenders, to understand the decision-making processes in an online context and form the basis for user-centered social recommender system design.
Item-based collaborative filtering recommendation algorithms
TLDR
This paper analyzes item-based collaborative filtering techniques and suggests that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
Empirical Analysis of Predictive Algorithms for Collaborative Filtering
TLDR
Several algorithms designed for collaborative filtering or recommender systems are described, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods, to compare the predictive accuracy of the various methods in a set of representative problem domains.
Evaluating collaborative filtering recommender systems
TLDR
The key decisions in evaluating collaborative filtering recommender systems are reviewed: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole.
An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms
TLDR
An analysis framework is applied that divides the neighborhood-based prediction approach into three components and then examines variants of the key parameters in each component. The three components identified are similarity computation, neighbor selection, and rating combination.