Corpus ID: 9024470

Collaborative Filtering and the Missing at Random Assumption

Benjamin M. Marlin, Richard S. Zemel, Sam T. Roweis, Malcolm Slaney
Rating prediction is an important application and a popular research topic in collaborative filtering. However, the validity of both learning algorithms and standard testing procedures rests on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample…
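
To make the role of the MAR assumption concrete, here is a minimal sketch that is not taken from the paper. It assumes synthetic ratings drawn uniformly from 1 to 5 and two hypothetical observation mechanisms: one where every rating is observed with the same probability (missing completely at random, a special case of MAR), and one where higher ratings are more likely to be observed (missing not at random). The function name, probabilities, and sample size are illustrative assumptions.

```python
import random


def simulate_observed_means(n_ratings=100_000, seed=0):
    """Compare the mean of all ratings with the means of two observed subsets.

    All numbers here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    # Hypothetical "true" ratings on a 1-5 scale, one per user-item pair.
    true_ratings = [rng.randint(1, 5) for _ in range(n_ratings)]

    # Random mechanism: each rating is observed with probability 0.10,
    # regardless of its value.
    observed_random = [r for r in true_ratings if rng.random() < 0.10]

    # Non-random mechanism: the observation probability grows with the
    # rating value (0.04 for a 1, up to 0.20 for a 5).
    observed_nonrandom = [r for r in true_ratings if rng.random() < 0.04 * r]

    def mean(values):
        return sum(values) / len(values)

    return mean(true_ratings), mean(observed_random), mean(observed_nonrandom)


if __name__ == "__main__":
    true_mean, random_mean, nonrandom_mean = simulate_observed_means()
    print(f"mean of all ratings:             {true_mean:.3f}")
    print(f"mean observed, random mechanism: {random_mean:.3f}")
    print(f"mean observed, value-dependent:  {nonrandom_mean:.3f}")
```

Under these assumptions, the randomly observed subset should track the mean of all ratings closely, while the value-dependent subset should overstate it. An error metric computed only on observed ratings would be distorted in the same way.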


Collaborative prediction and ranking with non-random missing data

This paper presents the first study of the effect of non-random missing data on collaborative ranking, and extends the previous results regarding the impact of non-random missing data on collaborative prediction.

Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random

This work proposes an estimator that integrates the imputed errors and propensities in a doubly robust way to obtain unbiased performance estimation and alleviate the effect of the propensity variance.

Learning from missing data using selection bias in movie recommendation

  • Claire Vernade, O. Cappé
  • Computer Science
    2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
  • 2015
A computationally efficient variational approach is proposed that exploits this selection bias to improve the estimation of ratings from small populations of users.

Collaborative Filtering With Ranking-Based Priors on Unknown Ratings

A ranking-based prior is proposed, based on the hypothesis that each user's unknown ratings are close to each other, to avoid the use of prior ratings. The results show that the proposed algorithms consistently outperform the state-of-the-art baselines and that the ranking-based prior leads to superior recommendation accuracy.

Training and testing of recommender systems on data missing not at random

It is shown that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations, and that two performance measures can be estimated, under mild assumptions, without bias from data even when ratings are missing not at random (MNAR).

Bayesian binomial mixture model for collaborative prediction with non-random missing data

A Bayesian binomial mixture model for collaborative prediction is presented, in which the generative process for the data and the missing data mechanism are jointly modeled to handle non-random missing data, together with computationally efficient variational inference algorithms.

Probabilistic Matrix Factorization with Non-random Missing Data

A probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random (MNAR) to obtain improved performance over state-of-the-art methods when predicting the ratings and when modeling the data observation process.

Tripartite Collaborative Filtering with Observability and Selection for Debiasing Rating Estimation on Missing-Not-at-Random Data

Extensive experiments show that modeling item observability and user selection effectively debiases MNAR rating estimation, and that TPMF outperforms the state-of-the-art methods in estimating the MNAR ratings.

Semi-supervised Collaborative Ranking with Push at the Top

S2COR mitigates the sparsity issue by leveraging side information about both observed and missing ratings while collaboratively learning the ranking model, which enables it not only to deal with data missing not at random but also to effectively incorporate the available side information in transduction.

Debiased offline evaluation of recommender systems: a weighted-sampling approach

This paper empirically validates for the first time the effectiveness of SKEW and shows the approach to be a better estimator of the performance one would obtain on (unbiased) MAR test data.



Collaborative Filtering: A Machine Learning Perspective

This thesis is a comprehensive study of rating-based, pure, non-sequential collaborative filtering; it implements a total of nine prediction methods and conducts large-scale prediction accuracy experiments.

Empirical Analysis of Predictive Algorithms for Collaborative Filtering

Several algorithms designed for collaborative filtering or recommender systems are described, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods, to compare the predictive accuracy of the various methods in a set of representative problem domains.

Latent Class Models for Collaborative Filtering

This paper presents a statistical approach to collaborative filtering and investigates the use of latent class models for predicting individual choices and preferences based on observed preference behavior and presents EM algorithms for different variants of the aspect model.

Eigentaste: A Constant Time Collaborative Filtering Algorithm

This work compares Eigentaste to alternative algorithms using data from Jester, an online joke recommending system, and uses the Normalized Mean Absolute Error (NMAE) measure to compare performance of different algorithms.

Collaborative prediction using ensembles of Maximum Margin Matrix Factorizations

This paper investigates ways to further improve the performance of MMMF, by casting it within an ensemble approach, and explores and evaluates a variety of alternative ways to define such ensembles.

Unsupervised Learning with Non-Ignorable Missing Data

Empirical results using synthetic data show that unsupervised learning in the presence of nonignorable missing data with an unknown missing data mechanism can recover both the unknown selection model parameters and the underlying data model parameters to a high degree of accuracy.


Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the

An algorithmic framework for performing collaborative filtering


Statistical Analysis With Missing Data

  • N. Lazar
  • Computer Science
  • 2003

Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper
