A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation

@inproceedings{Beel2013ACA,
  title={A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation},
  author={Joeran Beel and Marcel Genzmehr and Stefan Langer and A. N{\"u}rnberger and Bela Gipp},
  booktitle={RepSys '13},
  year={2013}
}
Offline evaluations are the most common evaluation method for research paper recommender systems. However, no thorough discussion on the appropriateness of offline evaluations has taken place, despite some voiced criticism. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations. We found that results of offline and online evaluations often contradict each other. We discuss this finding in detail and conclude that offline evaluations…
A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems
It is concluded that in practice, offline evaluations are probably not suitable to evaluate recommender systems, particularly in the domain of research paper recommendations.
Multi-method Evaluation in Scientific Paper Recommender Systems
A scientific paper recommender system (SPRS) prototype which was subject to both offline and user evaluations is presented, and the lessons learnt from the evaluation studies are described.
Comparing Offline and Online Recommender System Evaluations on Long-tail Distributions
By focusing on recommendations of long-tail items, which are usually more interesting for users, it was possible to reduce the bias caused by extremely popular items and to observe a better alignment of accuracy results in offline and online evaluations.
Recommender Systems Evaluations: Offline, Online, Time and A/A Test
A comparison of recommender systems algorithms along four dimensions is presented, including the quantification of the effect of non-algorithmic factors on the performance of an online recommender system by using an A/A test.
Comparison of online and offline evaluation metrics in Recommender Systems
The goal of this work is to explore Recommender Systems and methods of evaluating them. The focus is on comparing online and offline approaches of evaluation, as their relationship is highly…
Research paper recommender system evaluation: a quantitative literature survey
It is currently not possible to determine which recommendation approaches for academic literature recommendation are the most promising, but there is little value in the existence of more than 80 approaches if the best performing approaches are unknown.
Random Performance Differences Between Online Recommender System Algorithms
The experiments aim to quantify the expected degree of variation in performance that cannot be attributed to differences between systems, and classify and discuss the non-algorithmic causes of performance differences observed.
The Comparability of Recommender System Evaluations and Characteristics of Docear's Users
Recommender systems are used in many fields, and many ideas have been proposed how to recommend useful items. In previous research, we showed that the effectiveness of recommendation approaches could…
Meta-analysis of evaluation methods and metrics used in context-aware scholarly recommender systems
Meta-analyses of the evaluation methods and metrics of 67 studies related to context-aware scholarly recommender systems from the years 2000 to 2014 show that offline evaluation methods are more commonly used compared to online and user studies, with the maximum rate of success.
Survey on Evaluation of Recommender Systems
Recommender Systems (RSs) can be found in many modern applications that expose the user to huge collections of items, helping the user decide on appropriate items and easing the task of finding…

References

Showing 1–10 of 27 references.
Research paper recommender system evaluation: a quantitative literature survey
It is currently not possible to determine which recommendation approaches for academic literature recommendation are the most promising, but there is little value in the existence of more than 80 approaches if the best performing approaches are unknown.
Evaluating collaborative filtering recommender systems
The key decisions in evaluating collaborative filtering recommender systems are reviewed: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole.
What Recommenders Recommend - An Analysis of Accuracy, Popularity, and Sales Diversity Effects
This first analysis on different data sets shows that some RS algorithms – while able to generate highly accurate predictions – concentrate their top 10 recommendations on a very small fraction of the product catalog, or have a stronger bias than others toward recommending only relatively popular items.
A Survey of Accuracy Evaluation Metrics of Recommendation Tasks
This paper reviews the proper construction of offline experiments for deciding on the most appropriate algorithm, discusses three important tasks of recommender systems, and classifies a set of appropriate well-known evaluation metrics for each task.
Beyond accuracy: evaluating recommender systems by coverage and serendipity
It is argued that the new ways of measuring coverage and serendipity reflect the quality impression perceived by the user in a better way than previous metrics, thus leading to enhanced user satisfaction.
Explaining the user experience of recommender systems
This paper proposes a framework that takes a user-centric approach to recommender system evaluation, linking objective system aspects to objective user behavior through a series of perceptual and evaluative constructs (called subjective system aspects and experience, respectively).
Evaluating Recommendation Systems
This paper discusses how to compare recommenders based on a set of properties that are relevant for the application, and focuses on comparative studies, where a few algorithms are compared using some evaluation metric, rather than absolute benchmarking of algorithms.
The Impact of Demographics (Age and Gender) and Other User-Characteristics on Evaluating Recommender Systems
It was found that elderly users clicked more often on recommendations than younger ones, and that future research articles on recommender systems should report detailed data on their users to make results better comparable.
Introducing Docear's research paper recommender system
This demo paper presents Docear's research paper recommender system, part of an academic literature suite to search, organize, and create research articles, which achieves click-through rates around 6%, in some scenarios even over 10%.
Recommender systems: from algorithms to user experience
It is argued that evaluating the user experience of a recommender requires a broader set of measures than have been commonly used, and additional measures that have proven effective are suggested.