Precision-oriented evaluation of recommender systems: an algorithmic comparison

@inproceedings{Bellogin2011PrecisionorientedEO,
  title={Precision-oriented evaluation of recommender systems: an algorithmic comparison},
  author={Alejandro Bellog{\'i}n and Pablo Castells and Iv{\'a}n Cantador},
  booktitle={Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys '11)},
  year={2011}
}
There is considerable methodological divergence in the way precision-oriented metrics are being applied in the Recommender Systems field, and as a consequence, the results reported in different studies are difficult to put in context and compare. We aim to identify the involved methodological design alternatives, and their effect on the resulting measurements, with a view to assessing their suitability, advantages, and potential shortcomings. We compare five experimental methodologies, broadly… 
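Much of the methodological divergence comes down to how the candidate item set is assembled before a precision-oriented metric is computed. A minimal sketch of that dependency, with illustrative policy names (the paper's five methodologies are not reproduced exactly here):

```python
import random

def precision_at_n(ranked_items, relevant_items, n=10):
    """Fraction of the top-n ranked items that are relevant (held-out positives)."""
    return sum(1 for item in ranked_items[:n] if item in relevant_items) / n

def evaluate_user(score, user, relevant, candidates, n=10):
    """Rank a candidate set by predicted score, then measure precision@n.

    The same scorer can produce very different absolute numbers depending
    on how `candidates` is built -- the methodological choice the paper
    examines.
    """
    ranked = sorted(candidates, key=lambda item: score(user, item), reverse=True)
    return precision_at_n(ranked, relevant, n)

# Two illustrative candidate-set policies (hypothetical names):
def all_unrated(all_items, train_items):
    """Rank every item the user has not rated in training."""
    return [i for i in all_items if i not in train_items]

def one_plus_random(target, all_items, train_items, k=100):
    """Rank one held-out relevant item against k randomly sampled unrated items."""
    pool = [i for i in all_items if i not in train_items and i != target]
    return [target] + random.sample(pool, k)
```

Under `one_plus_random`, precision is measured against a small sampled pool, so absolute scores come out far higher than under `all_unrated`; this is exactly why numbers reported under different protocols are hard to compare.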

Citations

Evaluating the Relative Performance of Collaborative Filtering Recommender Systems
TLDR
Presents an evaluation framework based on a set of accuracy and beyond-accuracy metrics, including a novel metric that captures the uniqueness of a recommendation list, and finds that the matrix factorisation approach leads to more accurate and diverse recommendations while being less biased toward popularity.
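For context, beyond-accuracy metrics of this kind are typically simple aggregates over the recommended list. A minimal sketch of standard intra-list diversity (illustrative only; not this paper's novel uniqueness metric):

```python
from itertools import combinations

def intra_list_diversity(items, dissim):
    """Average pairwise dissimilarity of a recommendation list.

    dissim(a, b) is assumed to return a value in [0, 1], e.g.
    1 - cosine similarity over content-based item attribute vectors.
    """
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(dissim(a, b) for a, b in pairs) / len(pairs)
```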
Evaluating Decision-Aware Recommender Systems
TLDR
Analyses how a recommender system can measure confidence in its own recommendations, so that it can decide whether an item should be recommended at all, and explores evaluation metrics that combine more than one evaluation dimension.
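A hedged sketch of the decision-aware idea: withhold recommendations the system is not confident about. The `predict` interface and threshold below are assumptions for illustration, not the paper's method:

```python
def recommend_with_confidence(candidates, predict, n=10, min_conf=0.7):
    """Decision-aware recommendation sketch.

    `predict(item)` is assumed to return a (score, confidence) pair;
    items below the confidence threshold are withheld, trading
    coverage for precision.
    """
    confident = []
    for item in candidates:
        score, conf = predict(item)
        if conf >= min_conf:
            confident.append((item, score))
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in confident[:n]]
```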
Comparative recommender system evaluation: benchmarking recommendation frameworks
TLDR
Compares common recommendation algorithms as implemented in three popular recommendation frameworks, and shows the need for clear guidelines when reporting recommender system evaluations, to ensure reproducibility and comparability of results.
Statistical biases in Information Retrieval metrics for recommender systems
TLDR
This paper lays out an experimental configuration framework upon which to identify and analyse specific statistical biases arising in the adaptation of Information Retrieval metrics to recommendation tasks, namely sparsity and popularity biases.
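One simple diagnostic in this spirit compares the training-set popularity of recommended items against the catalog average; a sketch (the paper's bias analysis is more formal):

```python
from collections import Counter

def popularity_lift(recommendations, train_interactions):
    """Mean training popularity of recommended items, relative to the catalog.

    `train_interactions` is an iterable of (user, item) pairs and
    `recommendations` a list of per-user recommendation lists. A value
    well above 1 suggests the protocol rewards recommending popular items.
    """
    pop = Counter(item for _, item in train_interactions)
    recommended = [item for rec_list in recommendations for item in rec_list]
    avg_rec = sum(pop[item] for item in recommended) / len(recommended)
    avg_catalog = sum(pop.values()) / len(pop)
    return avg_rec / avg_catalog
```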
Assessing ranking metrics in top-N recommendation
TLDR
A principled analysis of the robustness and the discriminative power of different ranking metrics for the offline evaluation of recommender systems is undertaken, drawing from previous studies in the information retrieval field.
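As a reference point, a minimal implementation of one ranking metric such studies typically examine, standard nDCG (not tied to the paper's specific setup):

```python
import math

def ndcg_at_n(ranked_items, gains, n=10):
    """Normalised discounted cumulative gain for one ranked list.

    `gains` maps item -> graded relevance (e.g. a held-out rating);
    items absent from `gains` contribute zero gain.
    """
    def dcg(items):
        return sum(gains.get(item, 0.0) / math.log2(rank + 2)
                   for rank, item in enumerate(items[:n]))
    ideal = sorted(gains, key=gains.get, reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(ranked_items) / ideal_dcg if ideal_dcg > 0 else 0.0
```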
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
TLDR
The results show that the proposed model can better capture the quality of a recommender system than traditional evaluation does, and that it is not affected by characteristics of the data (e.g. size).
Evaluating Recommender Systems: A Systemized Quantitative Survey
TLDR
Proposes the recommender evaluation guidelines (REval), a roadmap for recommender system evaluators that provides stepwise guidelines for offline evaluation settings.
Mix and Rank: A Framework for Benchmarking Recommender Systems
TLDR
Proposes a benchmarking framework that mixes different evaluation measures to rank recommender systems on each benchmark dataset separately, and discovers sets of highly correlated evaluation measures as well as sets that are least correlated.
Research Paper Recommender System Evaluation Using Coverage
TLDR
Reviews a range of evaluation metrics, measures, and approaches used for evaluating recommendation systems, showing large differences in recommendation accuracy across frameworks and strategies.
Adaptive Diversity in Recommender Systems
TLDR
Analyses users’ propensity to select diverse items, taking into account content-based item attributes, and uses it to re-rank the list of top-N items predicted by a recommendation algorithm, with the aim of fostering diversity in the final ranking.
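A common way to implement such re-ranking is a greedy, MMR-style blend of accuracy and diversity; the sketch below is an illustrative stand-in, not the paper's propensity-aware method:

```python
def rerank_for_diversity(scored, dissim, n=10, lam=0.5):
    """Greedy re-ranking of (item, score) pairs.

    At each step, pick the candidate maximising a blend of its original
    score and its average dissimilarity to the items already selected;
    lam = 1 reproduces the accuracy-only ranking.
    """
    pool = dict(scored)
    selected = []
    while pool and len(selected) < n:
        def blended(item):
            if not selected:
                return pool[item]
            div = sum(dissim(item, s) for s in selected) / len(selected)
            return lam * pool[item] + (1 - lam) * div
        best = max(pool, key=blended)
        selected.append(best)
        del pool[best]
    return selected
```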
...

References

Showing 1-10 of 14 references
Evaluating collaborative filtering recommender systems
TLDR
The key decisions in evaluating collaborative filtering recommender systems are reviewed: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole.
Being accurate is not enough: how accuracy metrics have hurt recommender systems
TLDR
Argues informally that the recommender community should move beyond conventional accuracy metrics and their associated experimental methodologies, and proposes new user-centric directions for evaluating recommender systems.
Goal-Driven Collaborative Filtering - A Directional Error Based Approach
TLDR
Proposes a flexible optimization framework that can adapt to individual recommendation goals, and introduces a Directional Error Function that captures the cost (risk) of each individual prediction and can be learned from the specified performance measures at hand.
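The core idea can be sketched as an asymmetric loss; the fixed costs below are illustrative, whereas in the paper the function is learned from the target performance measure:

```python
def directional_error(pred, actual, over_cost=1.0, under_cost=2.0):
    """Asymmetric error sketch in the spirit of a directional error function.

    Over- and under-prediction are penalised differently, e.g. when
    recommending a bad item is costlier than missing a good one.
    """
    diff = pred - actual
    return over_cost * diff if diff >= 0 else under_cost * (-diff)
```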
Performance of recommender algorithms on top-n recommendation tasks
TLDR
An extensive evaluation of several state-of-the-art recommender algorithms suggests that algorithms optimized for minimizing RMSE do not necessarily perform as expected on the top-N recommendation task, and new variants of two collaborative filtering algorithms are offered.
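The mismatch is easy to see in how the two metric families are computed: error metrics score pointwise rating predictions, while top-N metrics score only the induced ranking, so improving one need not improve the other. A sketch of one standard metric from each family (not the paper's exact protocol):

```python
import math

def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def recall_at_n(ranked_items, relevant_items, n=10):
    """Fraction of a user's held-out relevant items retrieved in the top n."""
    hits = sum(1 for item in ranked_items[:n] if item in relevant_items)
    return hits / len(relevant_items) if relevant_items else 0.0
```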
Evaluating Recommendation Systems
TLDR
This paper discusses how to compare recommenders based on a set of properties that are relevant for the application, and focuses on comparative studies, where a few algorithms are compared using some evaluation metric, rather than absolute benchmarking of algorithms.
Factorization meets the neighborhood: a multifaceted collaborative filtering model
TLDR
Shows that factor and neighborhood models can be smoothly merged into a more accurate combined model, and suggests a new evaluation metric that highlights the differences among methods based on their performance at a top-K recommendation task.
Optimizing multiple objectives in collaborative filtering
TLDR
Presents a general recommendation optimization framework that considers not only the predicted preference scores but also additional operational or resource-related recommendation goals, and demonstrates through realistic examples how to extend existing rating prediction algorithms by biasing recommendations according to external factors such as an item's availability, profitability, or usefulness.
kNN CF: a temporal social network
TLDR
In this work, user-user kNN graphs are analysed from a temporal perspective, retrieving characteristics such as dataset growth, the evolution of similarity between pairs of users, the volatility of user neighbourhoods over time, and emergent properties of the entire graph as the algorithm parameters change.
A collaborative filtering algorithm and evaluation metric that accurately model the user experience
TLDR
It is empirically demonstrated that two of the most acclaimed CF recommendation algorithms have flaws that result in a dramatically unacceptable user experience, and a new Belief Distribution Algorithm is introduced that overcomes these flaws and provides substantially richer user modeling.
Text Retrieval Methods for Item Ranking in Collaborative Filtering
TLDR
Proposes a common notational framework for IR and rating-based CF, together with a technique that gives CF data a structure on which any IR weighting function can be used.
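Roughly, the mapping treats CF scoring like query-document matching. The sketch below is one illustrative instantiation under assumed structure (`ratings` as user -> {item: rating}), not the paper's exact weighting scheme:

```python
import math
from collections import defaultdict

def ir_style_scores(target, ratings):
    """Rank items for `target` with an IR-flavoured weighting (illustrative).

    Other users act like index terms: the target's agreement with them on
    shared items plays the role of a query-term weight, and an idf-like
    factor discounts very active users, mirroring document-frequency
    normalisation.
    """
    n_items = len({item for user_ratings in ratings.values() for item in user_ratings})
    scores = defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        shared = ratings[target].keys() & other_ratings.keys()
        if not shared:
            continue
        query_weight = sum(ratings[target][i] * other_ratings[i] for i in shared)
        idf = math.log(n_items / len(other_ratings))  # discount heavy raters
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] += query_weight * idf * rating
    return sorted(scores, key=scores.get, reverse=True)
```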
...