Corpus ID: 10219819

Bayesian Boolean Matrix Factorisation

@article{Rukat2017BayesianBM,
  title={Bayesian Boolean Matrix Factorisation},
  author={Tammo Rukat and Christopher C. Holmes and Michalis K. Titsias and Christopher Yau},
  journal={ArXiv},
  year={2017},
  volume={abs/1702.06166}
}
Boolean matrix factorisation aims to decompose a binary data matrix into an approximate Boolean product of two low-rank, binary matrices: one containing meaningful patterns, the other quantifying how the observations can be expressed as a combination of these patterns. We introduce the OrMachine, a probabilistic generative model for Boolean matrix factorisation, and derive a Metropolised Gibbs sampler that facilitates efficient parallel posterior inference. On real-world and simulated data, our…
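The decomposition described in the abstract can be sketched in a few lines of plain Python. This illustrates the general Boolean matrix product being inverted, not the paper's OrMachine sampler; all function names here are hypothetical:

```python
# Sketch of the Boolean matrix product underlying Boolean matrix
# factorisation: Z[n, d] = OR over k of (U[n, k] AND V[k, d]),
# with every entry in {0, 1}.

def boolean_product(U, V):
    """Boolean product of an N x K matrix U and a K x D matrix V."""
    K = len(V)
    D = len(V[0])
    return [[int(any(U[n][k] and V[k][d] for k in range(K)))
             for d in range(D)]
            for n in range(len(U))]

def reconstruction_error(Z, U, V):
    """Count entries where the Boolean product disagrees with Z."""
    Zhat = boolean_product(U, V)
    return sum(Z[n][d] != Zhat[n][d]
               for n in range(len(Z)) for d in range(len(Z[0])))

# Toy example with K = 2 latent patterns combined by OR:
U = [[1, 0],
     [0, 1],
     [1, 1]]          # which patterns each observation uses
V = [[1, 1, 0, 0],    # pattern 1
     [0, 0, 1, 1]]    # pattern 2
Z = boolean_product(U, V)
# The third row uses both patterns, so it is their element-wise OR.
```

Factorisation methods, including the Bayesian approach of this paper, search for U and V that make `reconstruction_error` small for an observed Z.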


Bayesian Nonparametric Boolean Factor Models
This work lifts the restriction of a pre-specified number of latent dimensions by introducing an Indian Buffet Process prior over factor matrices, enabling posterior inference to scale to billions of observations.
Probabilistic Boolean Tensor Decomposition
This work facilitates scalable sampling-based posterior inference by exploitation of the combinatorial structure of the factor conditionals in Boolean tensor decomposition, and provides an entirely novel perspective on relational properties of continuous data and, in the present example, on the molecular heterogeneity of cancer.
TensOrMachine: Probabilistic Boolean Tensor Decomposition
This work facilitates scalable sampling-based posterior inference by exploitation of the combinatorial structure of the factor conditionals in Boolean tensor decomposition, and provides an entirely novel perspective on relational properties of continuous data and, in the present example, on the molecular heterogeneity of cancer.
Recent Developments in Boolean Matrix Factorization
A concise summary of the efforts of all the communities studying Boolean Matrix Factorization is given, and some open questions which, in the authors' opinion, require further investigation are raised.
MEBF: a fast and efficient Boolean matrix factorization method
MEBF demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate sparse patterns than popular methods such as ASSO, PANDA and MP, and revealed further potential in knowledge retrieval and data denoising.
Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation
MEBF (Median Expansion for Boolean Factorization) demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate density patterns than popular methods such as ASSO, PANDA and Message Passing.
Bayesian Mean-parameterized Nonnegative Binary Matrix Factorization
This work proposes a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF) and derives a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors.
Geometric All-Way Boolean Tensor Decomposition
This work presents a computationally efficient BTD algorithm, GETF, that sequentially identifies the rank-1 basis components of a tensor from a geometric perspective; it significantly improves performance in reconstruction accuracy and extraction of latent structures, and is an order of magnitude faster than other state-of-the-art methods.
Biclustering and Boolean Matrix Factorization in Data Streams
An algorithm is provided that, after one pass over the stream, recovers the set of clusters on the right side of the graph using sublinear space; to the best of the authors' knowledge, this is the first algorithm with this property.
Boolean matrix factorization meets consecutive ones property
This paper studies a variant of Boolean matrix factorization that additionally requires the factor matrices to have the consecutive ones property (OBMF), and develops a greedy algorithm in which, at each step, the authors look for the best rank-1 factorization.
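For illustration, a binary matrix has the row-wise consecutive ones property when the ones in every row form a single contiguous run under a fixed column order. A minimal check (a hypothetical helper, not from the cited paper; note that the general property allows column permutations, which this sketch does not search over):

```python
# Check the consecutive ones property row-wise, for a fixed column order:
# in every row, the positions of the ones must form one contiguous run.

def has_consecutive_ones(M):
    for row in M:
        ones = [j for j, v in enumerate(row) if v]
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False  # a zero gap splits the run of ones
    return True
```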

References

Showing 1-10 of 28 references
MDL4BMF: Minimum Description Length for Boolean Matrix Factorization
This work extends an existing BMF algorithm to use MDL to identify the best Boolean matrix factorization, analyzes the complexity of the problem, and performs an extensive experimental evaluation of its behavior.
Boolean Matrix Factorization and Noisy Completion via Message Passing
This empirical study demonstrates that message passing is able to recover low-rank Boolean matrices at the boundaries of theoretically possible recovery, and compares favorably with the state of the art in real-world applications such as collaborative filtering with large-scale Boolean data.
The Discrete Basis Problem
This paper describes a matrix decomposition formulation for Boolean data, the Discrete Basis Problem, and gives a simple greedy algorithm for solving it and shows how it can be solved using existing methods.
Multi-assignment clustering for Boolean data
A generative method for clustering vectorial data, where each object can be assigned to multiple clusters using a deterministic annealing scheme, which decomposes the observed data into the contributions of individual clusters and infers their parameters.
Modeling Dyadic Data with Binary Latent Factors
This work introduces binary matrix factorization, a novel model for unsupervised matrix decomposition, and shows how to extend it to an infinite model in which the number of features is not a priori fixed but is allowed to grow with the size of the data.
Why Does Deep and Cheap Learning Work So Well?
It is argued that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one.
Deep Exponential Families
This extensive study shows that going beyond one layer improves predictions for DEFs, and demonstrates that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.
Probabilistic topic models (D. Blei, Commun. ACM, 2010)
This survey covers a suite of algorithms that offer a solution to managing large document archives and are well suited to handling large amounts of data.
Hierarchical compositional feature learning
Using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest.
A Non-Parametric Bayesian Method for Inferring Hidden Causes
This work presents a non-parametric Bayesian approach to structure learning with hidden causes that assumes that the number of hidden causes is unbounded, but only a finite number influence observable variables.