A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research

@article{FerrariDacrema2019ATA,
  title={A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research},
  author={Maurizio Ferrari Dacrema and Simone Boglio and Paolo Cremonesi and D. Jannach},
  journal={ACM Transactions on Information Systems (TOIS)},
  year={2019},
  volume={39},
  pages={1--49}
}
The design of algorithms that generate personalized ranked item lists is a central topic of research in the field of recommender systems. In the past few years, in particular, approaches based on deep learning (neural) techniques have become dominant in the literature. For all of them, substantial progress over the state-of-the-art is claimed. However, indications exist of certain problems in today’s research practice, e.g., with respect to the choice and optimization of the baselines used for… 

Methodological Issues in Recommender Systems Research (Extended Abstract)

Analysis of research papers published recently at top-ranked conferences found only 7 were reproducible with reasonable effort, and 6 of them could often be outperformed by relatively simple heuristic methods, e.g., nearest neighbors.

Empirical analysis of session-based recommendation algorithms

Twelve algorithmic approaches to session-based recommendation are compared and it is found that the progress in terms of prediction accuracy that is achieved with neural methods is still limited and simple heuristic methods based on nearest-neighbors schemes are preferable over conceptually and computationally more complex methods.

Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems

It is shown through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers.

Neural Collaborative Filtering vs. Matrix Factorization Revisited

It is shown that, with proper hyperparameter selection, a simple dot product substantially outperforms the proposed learned similarities, that MLPs should be used with care as embedding combiners, and that dot products might be a better default choice.

Adversarial learning for product recommendation

This work proposes a conditional, coupled generative adversarial network (RecommenderGAN) that learns to produce samples from a joint distribution between (view, buy) behaviors found in extremely sparse implicit feedback training data.

Personality Bias of Music Recommendation Algorithms

This work focuses on the music domain, creates a dataset of Twitter users’ music consumption behavior and personality traits (the latter measured in terms of the OCEAN model), and finds several significant differences in performance between user groups scoring high and groups scoring low on several personality traits.

Random Walks with Erasure: Diversifying Personalized Recommendations on Social and Information Networks

A novel recommendation framework is proposed with the goal of improving information diversity using a modified random walk exploration of the user-item graph, together with a new model that estimates the ideological positions of both users and the content they share and is able to recover these positions with high accuracy.

ContentWise Impressions: An Industrial Dataset with Impressions Included

The ContentWise Impressions dataset is introduced, a collection of implicit interactions and impressions of movies and TV series from an Over-The-Top media service, which delivers its media contents over the Internet.

Automated problem setting selection in multi-target prediction with AutoMTP

AutoMTP, an automated framework that performs algorithm selection for Multi-Target Prediction (MTP), is proposed; it is realized by adopting a rule-based system for the algorithm selection step and a flexible neural network architecture that can be used for the several subfields of MTP.
...

References

Are we really making much progress? A worrying analysis of recent neural recommendation approaches

A systematic analysis of algorithmic proposals for top-n recommendation tasks that were presented at top-level research conferences in the last years sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.

Methodological Issues in Recommender Systems Research (Extended Abstract)

Analysis of research papers published recently at top-ranked conferences found only 7 were reproducible with reasonable effort, and 6 of them could often be outperformed by relatively simple heuristic methods, e.g., nearest neighbors.

Performance comparison of neural and non-neural approaches to session-based recommendation

An extensive set of experiments was conducted on a variety of datasets; simple techniques in most cases outperform recent neural approaches, and the results point to certain major limitations of today's research practice.

On the Difficulty of Evaluating Baselines: A Study on Recommender Systems

It is shown that running baselines properly is difficult and empirical findings in research papers are questionable unless they were obtained on standardized benchmarks where baselines have been tuned extensively by the research community.

Performance of recommender algorithms on top-n recommendation tasks

An extensive evaluation of several state-of-the-art recommender algorithms suggests that algorithms optimized for minimizing RMSE do not necessarily perform as expected on the top-N recommendation task, and new variants of two collaborative filtering algorithms are offered.

Evaluation of session-based recommendation algorithms

An in-depth performance comparison of a number of session-based recommendation algorithms based on recurrent neural networks, factorized Markov model approaches, as well as simpler methods based, e.g., on nearest neighbor schemes reveals that algorithms of this latter class often perform equally well or significantly better than today’s more complex approaches based on deep neural networks.

Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems

It is shown through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers.

Collaborative Denoising Auto-Encoders for Top-N Recommender Systems

It is demonstrated that the proposed model is a generalization of several well-known collaborative filtering models but with more flexible components, and that CDAE consistently outperforms state-of-the-art top-N recommendation methods on a variety of common evaluation metrics.

NPE: Neural Personalized Embedding for Collaborative Filtering

A neural personalized embedding model is proposed, which improves recommendation performance for cold users, learns effective representations of items, and outperforms competing methods for top-N recommendation, especially for cold-user recommendations.

Offline evaluation options for recommender systems

It is shown that varying the split between training and test data, changing the evaluation metric, changing how target items are selected, or changing how empty recommendations are dealt with can give rise to comparisons that are vulnerable to misinterpretation and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
...