Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit

@inproceedings{Ekstrand2011RethinkingTR,
  title={Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit},
  author={Michael D. Ekstrand and Michael Ludwig and Joseph A. Konstan and John Riedl},
  booktitle={RecSys '11},
  year={2011}
}
Recommender systems research is being slowed by the difficulty of replicating and comparing research results. Published research uses various experimental methodologies and metrics that are difficult to compare. It also often fails to sufficiently document the details of proposed algorithms or the evaluations employed. Researchers waste time reimplementing well-known algorithms, and the new implementations may miss key details from the original algorithm or its subsequent refinements. When…
Citations

Towards reproducibility in recommender-systems research
TLDR
The recommender-system community needs to survey other research fields and learn from them, find a common understanding of reproducibility, identify and understand the determinants that affect reproducibility, conduct more comprehensive experiments, and establish best-practice guidelines for recommender-systems research.
Improving Accountability in Recommender Systems Research Through Reproducibility
TLDR
This work argues that, by facilitating reproducibility of recommender systems experimentation, it indirectly addresses the issues of accountability and transparency in recommender systems research from the perspectives of practitioners, designers, and engineers aiming to assess the capabilities of published research works.
Replication and Reproduction in Recommender Systems Research - Evidence from a Case-Study with the rrecsys Library
Recommender systems (RS) are a real-world application domain for Artificial Intelligence standing at the core of massively used e-commerce and social-media platforms like Amazon, Netflix, Spotify and…
The LKPY Package for Recommender Systems Experiments: Next-Generation Tools and Lessons Learned from the LensKit Project
TLDR
A set of research tools is described that should significantly increase research velocity and provide much smoother integration with other software such as Keras, while maintaining the same level of reproducibility as a LensKit experiment.
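To make the kind of experiment LensKit and LKPY aim to standardize concrete, the following is a minimal sketch of a cross-validated prediction-accuracy run. It assumes the pre-1.0 API of the lenskit Python package (crossfold, batch, item_knn, metrics.predict) and a MovieLens-style ratings file; names and defaults may differ in current LKPY releases, so read it as an illustration rather than the definitive API.

import pandas as pd
from lenskit import batch, crossfold as xf
from lenskit.algorithms import item_knn as knn
from lenskit.metrics.predict import rmse

# MovieLens-100K style ratings file (user, item, rating, timestamp)
ratings = pd.read_csv('ml-100k/u.data', sep='\t',
                      names=['user', 'item', 'rating', 'timestamp'])

fold_rmse = []
# five user-based partitions, holding out 20% of each test user's ratings
for train, test in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)):
    algo = knn.ItemItem(20)              # item-item CF with up to 20 neighbors
    algo.fit(train)
    preds = batch.predict(algo, test)    # adds a 'prediction' column to the test pairs
    fold_rmse.append(rmse(preds['prediction'], preds['rating']))

print('mean RMSE:', sum(fold_rmse) / len(fold_rmse))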
Comparative recommender system evaluation: benchmarking recommendation frameworks
TLDR
This work compares common recommendation algorithms as implemented in three popular recommendation frameworks and shows the necessity of clear guidelines when reporting evaluations of recommender systems to ensure reproducibility and comparison of results.
Mix and Rank: A Framework for Benchmarking Recommender Systems
TLDR
This work proposes a novel benchmarking framework that mixes different evaluation measures in order to rank the recommender systems on each benchmark dataset, separately, and discovers sets of correlated measures as well as sets of evaluation measures that are least correlated.
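The core idea, ranking systems under a mix of measures while checking how the measures relate to one another, can be sketched in a few lines of pandas. This is a toy illustration of the concept, not the paper's actual framework; the system names and scores below are made up.

import pandas as pd

# hypothetical per-system scores on a single benchmark dataset
scores = pd.DataFrame({
    'ndcg@10':   {'sysA': 0.31, 'sysB': 0.28, 'sysC': 0.35},
    'recall@10': {'sysA': 0.22, 'sysB': 0.25, 'sysC': 0.27},
    'rmse':      {'sysA': 0.92, 'sysB': 0.88, 'sysC': 0.95},
})

# rank per metric (lower is better for RMSE, higher is better for the rest)
ranks = pd.DataFrame({
    m: scores[m].rank(ascending=(m == 'rmse')) for m in scores.columns
})
print('aggregate rank:\n', ranks.mean(axis=1).sort_values())

# correlation between the measures across systems (Spearman on the ranks)
print('metric correlations:\n', ranks.corr(method='spearman'))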
OpenRec: A Modular Framework for Extensible and Adaptable Recommendation Algorithms
TLDR
This work proposes OpenRec, an open and modular Python framework that supports extensible and adaptable research in recommender systems and demonstrates that OpenRec provides adaptability, modularity and reusability while maintaining training efficiency and recommendation accuracy.
Elliot: A Comprehensive and Rigorous Framework for Reproducible Recommender Systems Evaluation
TLDR
Elliot is a comprehensive recommendation framework that aims to run and reproduce an entire experimental pipeline by processing a simple configuration file, and it optimizes hyperparameters for several recommendation algorithms.
Toward identification and adoption of best practices in algorithmic recommender systems research
TLDR
This work aims to address a growing concern that the Recommender Systems research community is facing a crisis where a significant number of research papers lack the rigor and evaluation to be properly judged and, therefore, have little to contribute to collective knowledge.
Reproducibility of Experiments in Recommender Systems Evaluation
TLDR
This paper compares well-known recommendation algorithms using the same dataset, metrics, and overall settings; the results point to differences in results across frameworks with the exact same settings.

References

Showing 1-10 of 21 references
Evaluating Recommendation Systems
TLDR
This paper discusses how to compare recommenders based on a set of properties that are relevant for the application, and focuses on comparative studies, where a few algorithms are compared using some evaluation metric, rather than absolute benchmarking of algorithms.
RecBench: Benchmarks for Evaluating Performance of Recommender System Architectures
TLDR
This study is the first of its kind, and the findings reveal an interesting trade-off: “hand-built” recommenders exhibit superior performance in model-building and pure recommendation tasks, while DBMS-based recommenders are superior at more complex recommendation tasks such as providing filtered recommendations and blending text-search with recommendation prediction scores.
Evaluating collaborative filtering over time
TLDR
Investigating collaborative filtering from a temporal perspective is not only more suitable to the context in which recommender systems are deployed, but also opens a number of future research opportunities.
Evaluating collaborative filtering recommender systems
TLDR
The key decisions in evaluating collaborative filtering recommender systems are reviewed: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole.
Factorization meets the neighborhood: a multifaceted collaborative filtering model
TLDR
The factor and neighborhood models can now be smoothly merged, thereby building a more accurate combined model, and a new evaluation metric is suggested which highlights the differences among methods, based on their performance at a top-K recommendation task.
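For context, one common form of the merged prediction rule from that line of work (the SVD++ component; the paper's full integrated model adds an explicit item-neighborhood term on top of it) is:

\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top}\Bigl(p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j\Bigr)

where \mu is the global mean, b_u and b_i are user and item biases, p_u and q_i are latent factor vectors, and the y_j are implicit-feedback item factors summed over the set N(u) of items the user has interacted with.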
Improving regularized singular value decomposition for collaborative filtering
TLDR
Different efficient collaborative filtering techniques and a framework for combining them to obtain a good prediction are described, predicting users’ preferences for movies with an error rate 7.04% better on the Netflix Prize dataset than the reference algorithm, Netflix Cinematch.
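As a point of reference for this family of methods, here is a generic stochastic-gradient sketch of regularized matrix factorization (Funk-style). It illustrates the kind of model the paper refines rather than the paper's exact algorithm; the learning rate and regularization values are placeholders.

import numpy as np

def train_mf(ratings, n_users, n_items, k=20, lr=0.005, reg=0.02, epochs=20):
    """ratings: iterable of (user, item, rating) index triples."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            pu = P[u].copy()                      # cache before updating
            err = r - pu @ Q[i]                   # prediction error
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q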
Evaluating the dynamic properties of recommendation algorithms
TLDR
A new evaluation method for the dynamic aspects of collaborative algorithms, the "temporal leave-one-out" approach, can provide insight into both user-specific and system-level evolution of recommendation behavior.
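The general idea behind temporal evaluation, that only ratings from before a test rating may be used to predict it, can be sketched as a forward-walking split. This is a coarse windowed variant for illustration, not the paper's exact temporal leave-one-out protocol; the column names are assumptions.

import pandas as pd

def temporal_splits(ratings: pd.DataFrame, n_steps: int = 10):
    """Yield (train, test) pairs that walk forward through time."""
    ratings = ratings.sort_values('timestamp')
    cutpoints = ratings['timestamp'].quantile(
        [i / n_steps for i in range(1, n_steps)]).tolist()
    for start, end in zip(cutpoints, cutpoints[1:]):
        train = ratings[ratings['timestamp'] <= start]          # the past only
        test = ratings[(ratings['timestamp'] > start) &
                       (ratings['timestamp'] <= end)]            # the next window
        yield train, test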
Item-based collaborative filtering recommendation algorithms
TLDR
This paper analyzes item-based collaborative filtering techniques and suggests that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
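A compact sketch of the item-based approach the paper analyzes: cosine similarity between item rating vectors, followed by a weighted sum over the items the target user has rated. Dense matrices are used purely for clarity; real implementations work with sparse data and variants such as adjusted cosine.

import numpy as np

def item_similarities(R):
    """R: users x items rating matrix (0 = unrated). Returns item-item cosine."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)       # an item is not its own neighbor
    return S

def predict(R, S, user, item, k=20):
    """Weighted-sum prediction from the k most similar items the user rated."""
    rated = np.flatnonzero(R[user])
    if rated.size == 0:
        return 0.0
    neighbors = rated[np.argsort(S[item, rated])[::-1][:k]]
    weights = S[item, neighbors]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ R[user, neighbors] / weights.sum())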
Empirical Analysis of Predictive Algorithms for Collaborative Filtering
TLDR
Several algorithms designed for collaborative filtering or recommender systems are described, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods, to compare the predictive accuracy of the various methods in a set of representative problem domains.
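The correlation-based formulation evaluated there is the familiar mean-centered, similarity-weighted prediction:

\hat{r}_{a,i} = \bar{r}_a + \frac{\sum_{u \in U_i} w(a,u)\,(r_{u,i} - \bar{r}_u)}{\sum_{u \in U_i} |w(a,u)|},
\qquad
w(a,u) = \frac{\sum_{j}(r_{a,j}-\bar{r}_a)(r_{u,j}-\bar{r}_u)}{\sqrt{\sum_{j}(r_{a,j}-\bar{r}_a)^2}\,\sqrt{\sum_{j}(r_{u,j}-\bar{r}_u)^2}}

where U_i is the set of neighbors who rated item i and the sums in w(a,u) run over items j rated by both users; the vector-similarity variant replaces w(a,u) with cosine similarity over the rating vectors, while the Bayesian methods use different model forms entirely.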
An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms
TLDR
An analysis framework is applied that divides the neighborhood-based prediction approach into three components (similarity computation, neighbor selection, and rating combination) and then examines variants of the key parameters in each component.
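A hypothetical skeleton of that three-way decomposition, purely to illustrate how the components can be varied independently (this is not code from the paper or from LensKit):

from typing import Callable, Sequence

Similarity = Callable[[int, int], float]               # sim(a, b) between users or items
NeighborSelector = Callable[[int, Similarity], Sequence[int]]
Combiner = Callable[[int, Sequence[int], Similarity], float]

def make_predictor(sim: Similarity,
                   select: NeighborSelector,
                   combine: Combiner) -> Callable[[int], float]:
    """Compose one variant of a neighborhood-based predictor for a target."""
    def predict(target: int) -> float:
        neighbors = select(target, sim)       # neighbor selection
        return combine(target, neighbors, sim)  # rating combination
    return predict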