The LKPY Package for Recommender Systems Experiments: Next-Generation Tools and Lessons Learned from the LensKit Project

Michael D. Ekstrand
Since 2010, we have built and maintained LensKit, an open-source toolkit for building, researching, and learning about recommender systems. We have successfully used the software in a wide range of recommender systems experiments, to support education in traditional classroom and online settings, and as the algorithmic backend for user-facing recommendation services in movies and books. This experience, along with community feedback, has surfaced a number of challenges with LensKit's design and… 
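The kind of offline experiment LensKit supports can be illustrated with a minimal, dependency-free sketch: fit a baseline predictor on training ratings and score it with RMSE on held-out data. This is not the LKPY API — the data and the item-mean baseline here are purely illustrative stand-ins for a MovieLens-style workflow.

```python
import math
from collections import defaultdict

# Toy explicit-feedback ratings as (user, item, rating) triples —
# hypothetical data standing in for a MovieLens-style dataset.
train = [
    ("u1", "i1", 4.0), ("u1", "i2", 3.0),
    ("u2", "i1", 5.0), ("u2", "i3", 2.0),
    ("u3", "i2", 4.0), ("u3", "i3", 1.0),
]
test = [("u1", "i3", 2.0), ("u3", "i1", 5.0)]

# Item-mean baseline: predict each item's average training rating,
# falling back to the global mean for unseen items.
sums, counts = defaultdict(float), defaultdict(int)
for _, item, r in train:
    sums[item] += r
    counts[item] += 1
global_mean = sum(r for _, _, r in train) / len(train)
item_mean = {i: sums[i] / counts[i] for i in sums}

def predict(item):
    return item_mean.get(item, global_mean)

# RMSE over the held-out test set, the classic prediction-accuracy metric.
rmse = math.sqrt(sum((predict(i) - r) ** 2 for _, i, r in test) / len(test))
print(rmse)  # → 0.5
```

Real experiments replace the baseline with a learned model and the toy triples with a full dataset, but the fit/predict/measure loop stays the same.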
A Recommender Systems’ algorithm evaluation using the Lenskit library and MovieLens databases
  • Alejo Paullier, R. Sotelo
  • Computer Science
    2020 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)
  • 2020
This work compares nine different recommendation algorithms across seven different metrics, and provides a detailed, clear explanation of the methodology and experimental setup to ensure future reproducibility.
DRecPy: A Python Framework for Developing Deep Learning-Based Recommenders
This work introduces a new framework that provides several modules to avoid repetitive development work and assists practitioners with the existing challenges of building deep learning-based recommenders.
Using Research Literature to Generate Datasets of Implicit Feedback for Recommending Scientific Items
A methodology is proposed that explores scientific literature to generate utility matrices of implicit feedback for recommending scientific items; it is in principle applicable to recommender systems in any scientific field.
Algorithm Selection with Librec-auto
Due to the complexity of recommendation algorithms, experimentation on recommender systems has become a challenging task. Current recommendation algorithms, while powerful, involve large numbers of
Standing in Your Shoes: External Assessments for Personalized Recommender Systems
The findings show that external assessments can be used for assessing user preference labels and evaluating systems in personalized recommendation scenarios, even better than traditional history-based online evaluation.
Modeling uncertainty to improve personalized recommendations via Bayesian deep learning
An approach based on Bayesian deep learning to improve personalized recommendations by capturing the uncertainty associated with the model output and utilizing it to boost exploration in the context of Recommender Systems.
Estimating Error and Bias in Offline Evaluation Results
It is found that missing data in the rating or observation process causes the evaluation protocol to systematically mis-estimate metric values, and in some cases erroneously determine that a popularity-based recommender outperforms even a perfect personalized recommender.
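The effect described above — biased missing data flipping the apparent ranking of recommenders — can be demonstrated with a small deterministic simulation. The setup is hypothetical: each user truly prefers a personal niche item, but the logging process mostly records interactions with one popular item, so an oracle recommender looks worse than a popularity recommender when scored against the observed data.

```python
# Each user truly prefers a personal niche item over the popular item "pop",
# but the observation process is biased toward logging popular interactions.
users = list(range(10))
true_best = {u: f"niche{u}" for u in users}          # ground-truth top item
# Biased log: only user 0's niche interaction was recorded;
# every other user appears only with "pop".
observed = [(0, "niche0")] + [(u, "pop") for u in users[1:]]

def hit_rate(rec, pairs):
    """Fraction of (user, item) pairs where the recommender's top pick matches."""
    return sum(rec(u) == i for u, i in pairs) / len(pairs)

perfect = lambda u: true_best[u]     # oracle: always recommends the true favorite
popular = lambda u: "pop"            # always recommends the most popular item

truth = [(u, true_best[u]) for u in users]
print(hit_rate(perfect, truth), hit_rate(popular, truth))        # → 1.0 0.0
print(hit_rate(perfect, observed), hit_rate(popular, observed))  # → 0.1 0.9
```

Against true preferences the oracle is perfect and the popularity recommender useless, yet the biased observed data reverses that conclusion — exactly the systematic mis-estimation the paper reports.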
Fairness-aware Recommendation with librec-auto
Fairness-aware enhancements to the recommender systems experimentation tool librec-auto are described, including metrics for various classes of fairness definitions, an extension of the experimental model to support result re-ranking, a library of associated re-ranking algorithms, and additional support for experiment automation and reporting.
Bayesian Deep Learning Based Exploration-Exploitation for Personalized Recommendations
  • Xin Wang, Serdar Kadioglu
  • Computer Science
    2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI)
  • 2019
This paper presents an approach based on Bayesian Deep Learning to learn a compact representation of user and item attributes to guide exploitation and shows how to further boost exploration by incorporating model uncertainty with that of data uncertainty.
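The exploration-exploitation idea behind this line of work — let an item's posterior uncertainty drive how often it is tried — can be sketched without any deep learning using a Beta-Bernoulli Thompson sampler. The two items and their click rates here are invented for illustration; the paper's actual method models uncertainty with Bayesian deep networks over user and item attributes.

```python
import random

random.seed(0)
# Two items with hidden click probabilities the recommender must discover.
true_rate = {"A": 0.7, "B": 0.3}
# Beta(1, 1) prior per item, stored as [success, failure] pseudo-counts.
params = {i: [1, 1] for i in true_rate}

def choose():
    # Thompson sampling: draw a plausible rate from each posterior and pick
    # the best draw — uncertain items sometimes win, which drives exploration.
    samples = {i: random.betavariate(a, b) for i, (a, b) in params.items()}
    return max(samples, key=samples.get)

for _ in range(2000):
    item = choose()
    clicked = random.random() < true_rate[item]
    params[item][0 if clicked else 1] += 1

# After many rounds, pulls concentrate on the genuinely better item "A".
pulls = {i: a + b - 2 for i, (a, b) in params.items()}
print(pulls["A"] > pulls["B"])
```

As the posteriors sharpen, exploration of the weaker item naturally decays — the same exploitation/exploration balance the paper pursues with model and data uncertainty.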
Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit
The utility of LensKit is demonstrated by replicating and extending a set of prior comparative studies of recommender algorithms, and a question recently raised by a leader in the recommender systems community on problems with error-based prediction evaluation is investigated.
pyRecLab: A Software Library for Quick Prototyping of Recommender Systems
Details of pyRecLab are introduced, along with a performance analysis in terms of error metrics (MAE and RMSE) and train/test time; it is benchmarked against the popular Java-based library LibRec, showing similar results.
When recommenders fail: predicting recommender failure for algorithm selection and combination
This work presents an analysis of the predictions made by several well-known recommender algorithms on the MovieLens 10M data set, showing that for many cases in which one algorithm fails, there is another that will correctly predict the rating.
All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness
System evaluation protocols are proposed that explicitly quantify the degree to which a system meets the information needs of all its users, along with an argument that researchers and operators must move beyond evaluations that favor the needs of larger subsets of the user population while ignoring smaller ones.
Surprise: A Python library for recommender systems
Recommender systems aim at providing users with a list of recommendations of items that a service offers. For example, a video streaming service will typically rely on a recommender system to propose
The MovieLens Datasets: History and Context
The history of MovieLens and the MovieLens datasets is documented, including a discussion of lessons learned from running a long-standing, live research platform from the perspective of a research organization, along with best practices and limitations of using the MovieLens datasets in new research.
Hybrid group recommendations for a travel service
The results prove the usefulness of individual and group recommendations and show that users prefer the hybrid algorithm over each individual technique, as well as over an unpersonalized list of the most popular destinations.
Evaluating recommender behavior for new users
A methodology is described for comparing representatives of three common families of algorithms along eleven different metrics; it finds that for a user's first few ratings, a baseline algorithm performs better than three common collaborative filtering algorithms.
Confer: A Conference Recommendation and Meetup Tool
Confer, a tool designed to help conference attendees find interesting papers and talks, discover and meet people with shared interests, and manage their time using a personalized schedule for the conference, is presented.
API design for machine learning software: experiences from the scikit-learn project
The simple and elegant interface shared by all learning and processing units in the Scikit-learn library is described and its advantages in terms of composition and reusability are discussed.
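The interface convention that paper describes can be shown with a tiny estimator written from scratch, without importing scikit-learn itself: hyperparameters go in `__init__`, learned state is stored in attributes with a trailing underscore by `fit()`, and `fit()` returns `self` so calls compose. The `MeanRegressor` class is an illustrative toy, not part of any library.

```python
class MeanRegressor:
    """Minimal estimator following the scikit-learn convention:
    fit() learns state (trailing-underscore attributes) and returns self,
    predict() uses only that learned state."""

    def fit(self, X, y):
        # Learned attributes end in "_" to mark them as set by fitting.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the training mean for every input row.
        return [self.mean_ for _ in X]

# Because fit() returns self, construction, fitting, and prediction chain:
model = MeanRegressor().fit([[1], [2], [3]], [10.0, 20.0, 30.0])
print(model.predict([[4], [5]]))  # → [20.0, 20.0]
```

This uniformity is what lets scikit-learn compose estimators into pipelines and cross-validation loops without knowing anything about the model inside.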