Towards reproducibility in recommender-systems research

@article{Beel2016TowardsRI,
  title={Towards reproducibility in recommender-systems research},
  author={Joeran Beel and Corinna Breitinger and Stefan Langer and Andreas Lommatzsch and Bela Gipp},
  journal={User Modeling and User-Adapted Interaction},
  year={2016},
  volume={26},
  pages={69-101}
}
Numerous recommendation approaches are in use today. However, comparing their effectiveness is a challenging task because evaluation results are rarely reproducible. In this article, we examine the challenge of reproducibility in recommender-systems research. We conduct experiments using Plista's news recommender system and Docear's research-paper recommender system. The experiments show that there are large discrepancies in the effectiveness of identical recommendation approaches in only …

Citations
Reproducibility of Experiments in Recommender Systems Evaluation
This paper compares well-known recommendation algorithms using the same dataset, metrics, and overall settings; the results point to differences across frameworks even with the exact same settings.
Real-World Recommender Systems for Academia: The Pain and Gain in Building, Operating, and Researching them
This paper discusses the skills required to build recommender systems, why the literature provides little help in identifying promising recommendation approaches, and the challenge of creating a randomization engine to run A/B tests.
Collaborative Filtering: Matrix Completion and Session-Based Recommendation Tasks
This chapter provides a self-contained overview of the basics of collaborative filtering recommender systems, with a particular focus on neighborhood-based methods, which were proposed in the early days of collaborative filtering and are still relevant today.
User Experience and Recommender Systems
Despite recent attempts at UX evaluation of recommender systems, this area is still new and needs further investigation; the paper therefore discusses the definition of UX, especially in the field of recommender systems.
A Stream-based Resource for Multi-Dimensional Evaluation of Recommender Algorithms
This paper introduces the new dataset of stream recommendation interactions released for CLEF NewsREEL 2017 and the new Open Recommendation Platform (ORP), which allow researchers to study a stream recommendation problem closely by "replaying" it locally, and make it possible to take this evaluation "live" in a living-lab scenario.
It's Time to Consider "Time" when Evaluating Recommender-System Algorithms [Proposal]
This paper proposes that recommender-system researchers instead calculate metrics over time series such as weeks or months and plot the results, e.g. in a line chart, to show how an algorithm's effectiveness develops over time; such results allow more meaningful conclusions about how the algorithm will perform in the future.
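As a rough illustration of that idea, the sketch below computes a click-through rate per ISO week from a hypothetical interaction log; the (timestamp, clicked) record format is an assumption for the example, not the paper's actual data:

```python
# Illustrative sketch only: per-week metric computation as proposed
# above. The (timestamp, clicked) log format is a hypothetical example.
from collections import defaultdict
from datetime import datetime, timezone

def weekly_ctr(interactions):
    """interactions: iterable of (unix_seconds, clicked) pairs."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for ts, click in interactions:
        year, week, _ = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        key = f"{year}-W{week:02d}"
        shown[key] += 1
        clicked[key] += int(click)
    # One value per week; plotting these points as a line chart shows
    # how effectiveness develops over time instead of a single average.
    return {week: clicked[week] / shown[week] for week in sorted(shown)}

log = [(1700000000, True), (1700000100, False), (1700605000, True)]
print(weekly_ctr(log))  # e.g. {'2023-W46': 0.5, '2023-W47': 1.0}
```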
A Novel Approach to Recommendation Algorithm Selection using Meta-Learning
This paper proposes a meta-learning-based approach to recommendation that aims to select the best algorithm for each user-item pair, and develops a distinction between meta-learners that operate per instance, per data subset, and per dataset (global level).
Document Embeddings vs. Keyphrases vs. Terms for Recommender Systems: A Large-Scale Online Evaluation
A standard term-based recommendation approach is compared to two promising approaches for related-article recommendation in digital libraries, document embeddings and keyphrases, to evaluate the consistency of their performance across multiple scenarios.
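For context, a term-based baseline of the kind compared here can be as simple as TF-IDF vectors with cosine similarity. The sketch below, with a made-up three-document corpus, is an assumed illustration of such a baseline, not the system evaluated in the paper:

```python
# Hedged sketch of a generic term-based baseline (TF-IDF + cosine
# similarity); the corpus is made up and this is not the evaluated system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "reproducibility of recommender system evaluations",
    "news recommendation in a living lab setting",
    "evaluating recommender algorithms over time",
]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Related-article recommendation for docs[0]: rank the other documents
# by cosine similarity of their term vectors.
sims = cosine_similarity(tfidf[0], tfidf)[0]
ranking = sorted((i for i in range(len(docs)) if i != 0),
                 key=lambda i: sims[i], reverse=True)
print([docs[i] for i in ranking])
```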
Real World Evaluation of Approaches to Research Paper Recommendation
It is found that a term-based similarity search performs better than keyword-based approaches for research-paper recommender systems and is a good starting point for finding performance improvements for related-document searches.
One-at-a-time: A Meta-Learning Recommender-System for Recommendation-Algorithm Selection on Micro Level
This paper proposes a meta-learning-based approach that selects the best algorithm for each individual user-item pair, distinguishing meta-learners that operate per instance, per data subset, and per dataset (global level); a toy sketch of the per-instance idea follows below.
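To make the per-instance ("micro level") idea concrete, here is a hedged toy sketch in which a meta-model maps simple user/item features to whichever base recommender performed best for such cases. The two base recommenders, the features, and the lookup-table meta-model are all hypothetical stand-ins, not the papers' actual method:

```python
# Illustrative toy sketch, not the papers' actual method: per-instance
# algorithm selection via a meta-model. All names are hypothetical.

def rec_popularity(user, item):
    # Base recommender A (stub): score from global popularity.
    return 0.7

def rec_content(user, item):
    # Base recommender B (stub): score from content similarity.
    return 0.4

BASE_RECS = {"popularity": rec_popularity, "content": rec_content}

def meta_features(user, item):
    # In practice: user activity, item age, data sparsity, etc.
    return (user["n_ratings"] > 10, item["is_new"])

# Toy meta-model: maps a feature tuple to the algorithm that performed
# best on held-out data for instances with those features.
META_MODEL = {
    (True, False): "popularity",
    (True, True): "content",
    (False, False): "popularity",
    (False, True): "content",
}

def score(user, item):
    algo = META_MODEL[meta_features(user, item)]  # selection per user-item pair
    return algo, BASE_RECS[algo](user, item)

print(score({"n_ratings": 3}, {"is_new": True}))  # -> ('content', 0.4)
```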

References

Showing 1-10 of 124 references
The Comparability of Recommender System Evaluations and Characteristics of Docear's Users
Recommender systems are used in many fields, and many ideas for how to recommend useful items have been proposed. In previous research, we showed that the effectiveness of recommendation approaches could …
Evaluating Recommendation Systems
This paper discusses how to compare recommenders based on a set of properties relevant for the application, and focuses on comparative studies, where a few algorithms are compared using some evaluation metric, rather than on absolute benchmarking of algorithms.
Rival: a toolkit to foster reproducibility in recommender system evaluation
This paper presents some of the functionality of RiVal and shows, step by step, how RiVal can be used to evaluate the results of any recommendation framework and ensure that the results are comparable and reproducible.
Evaluating the Accuracy and Utility of Recommender Systems
It is concluded that current recommendation quality has outgrown the methods and metrics used to evaluate these systems, and that qualitative approaches can be used, with minimal user interference, to correctly estimate the actual quality of recommender systems.
Research-paper recommender systems: a literature survey
Several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
A Survey of Accuracy Evaluation Metrics of Recommendation Tasks
This paper reviews the proper construction of offline experiments for deciding on the most appropriate algorithm, discusses three important tasks of recommender systems, and classifies a set of appropriate, well-known evaluation metrics for each task.
What Recommenders Recommend - An Analysis of Accuracy, Popularity, and Sales Diversity Effects
This first analysis on different data sets shows that some RS algorithms, while able to generate highly accurate predictions, concentrate their top-10 recommendations on a very small fraction of the product catalog or have a stronger bias than others towards recommending only relatively popular items.
Layered Evaluation of Multi-Criteria Collaborative Filtering for Scientific Paper Recommendation
This paper studies how layered evaluation can be applied to a multi-criteria recommendation service planned for deployment for paper recommendation using the Mendeley dataset, and suggests two experiments that may help assess the components of the envisaged system separately.
Recommender systems: from algorithms to user experience
It is argued that evaluating the user experience of a recommender requires a broader set of measures than have been commonly used, and additional measures that have proven effective are suggested.
Do clicks measure recommendation relevancy?: an empirical user study
Experiments show that algorithms with a higher overall CTR may not produce more relevant recommendations, and that CTR may therefore not be the optimal metric for the online evaluation of recommender systems when producing relevant recommendations is what matters.
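A small assumed example of this divergence (synthetic data, not the study's): counting clicks and relevance separately for two hypothetical algorithms can flip their ranking:

```python
# Assumed synthetic example, not the study's data: CTR and relevance
# can rank two algorithms in opposite orders.

def ctr(impressions):
    """impressions: list of (clicked, relevant) booleans per shown item."""
    return sum(c for c, _ in impressions) / len(impressions)

def precision(impressions):
    return sum(r for _, r in impressions) / len(impressions)

# Algorithm A: clickbait-like; many clicks, mostly irrelevant items.
algo_a = [(True, False), (True, False), (True, True), (False, False)]
# Algorithm B: fewer clicks, but mostly relevant items.
algo_b = [(True, True), (False, True), (False, True), (False, False)]

print(f"A: CTR={ctr(algo_a):.2f} precision={precision(algo_a):.2f}")  # 0.75 / 0.25
print(f"B: CTR={ctr(algo_b):.2f} precision={precision(algo_b):.2f}")  # 0.25 / 0.75
# A wins on CTR but loses on relevance, so a higher CTR alone does not
# imply more relevant recommendations.
```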