• Corpus ID: 4320823

Trace your sources in large-scale data: one ring to find them all

Alexander Böttcher, Wieland Brendel, Bernhard Englitz, Matthias Bethge

An important preprocessing step in most data analysis pipelines aims to extract a small set of sources that explain most of the data. Currently used algorithms for blind source separation (BSS), however, often fail to extract the desired sources and need extensive cross-validation. In contrast, their rarely used probabilistic counterparts can get away with little cross-validation and are more accurate and reliable, but no simple and scalable implementations are available. Here we present a novel…

Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations

  • A. Cichocki, A. Phan
  • Computer Science
    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.
  • 2009
A class of optimized local algorithms, referred to as Hierarchical Alternating Least Squares (HALS) algorithms, that work well for NMF-based blind source separation (BSS) not only in the over-determined case but also in the under-determined (over-complete) case if the data are sufficiently sparse.

A Unified Joint Matrix Factorization Framework for Data Integration

This paper introduces a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration and presents four update algorithms to solve this framework.

Erasing the Milky Way: new cleaning technique applied to GBT intensity mapping data

We present the first application of a new foreground removal pipeline to the current leading HI intensity mapping dataset, obtained by the Green Bank Telescope (GBT). We study the 15hr and 1hr field data

The non-negative matrix factorization toolbox for biological data mining

  • Yifeng Li, A. Ngom
  • Computer Science, Biology
    Source Code for Biology and Medicine
  • 2012
A convenient MATLAB toolbox containing both the implementations of various NMF techniques and a variety of NMF-based data mining approaches for analyzing biological data is provided.

Cross-validation of component models: A critical look at current methods

In this paper, the most commonly used generic PCA cross-validation schemes are reviewed, and how well they work in various scenarios is assessed.

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

The TensorFlow interface and an implementation of that interface built at Google are described; the system has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

  • Ahn
  • Computer Science
  • 2015
This paper proposes a scalable distributed Bayesian matrix factorization algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, that can not only match the prediction accuracy of standard MCMC methods like Gibbs sampling, but at the same time is as fast and simple as stochastic gradient descent.

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.

A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors

A set of energy minimization benchmarks is described and used to compare the solution quality and runtime of several common energy minimization algorithms, and a general-purpose software interface is provided that allows vision researchers to easily switch between optimization methods.

On the Statistical Analysis of Dirty Pictures

May 7th, 1986, Professor A. F. M. Smith in the Chair] SUMMARY: A continuous two-dimensional region is partitioned into a fine rectangular array of sites or "pixels", each pixel having a particular