Artificial Neural Networks to Impute Rounded Zeros in Compositional Data

  • Matthias Templ
  • Published 18 December 2020
  • Computer Science
  • ArXiv
Methods of deep learning have become increasingly popular in recent years, but they have not arrived in compositional data analysis. Imputation methods for compositional data are typically applied on additive, centered, or isometric log-ratio representations of the data. Generally, methods for compositional data analysis can only be applied to observed positive entries in a data matrix. Therefore, one tries to impute missing values or measurements that were below a detection limit. In this… 
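The abstract's point that log-ratio methods need strictly positive entries can be illustrated with a minimal sketch of the centered log-ratio (clr) representation. This is not the paper's imputation method; the function name `clr` and the example compositions are assumptions chosen purely for illustration.

```python
import numpy as np

def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    x = np.asarray(x, dtype=float)
    # Suppress warnings for log(0) and inf - inf so the failure mode is visible in the output.
    with np.errstate(divide="ignore", invalid="ignore"):
        logx = np.log(x)
        return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical three-part compositions (each row sums to 1).
positive = np.array([[0.2, 0.3, 0.5]])   # all parts observed and positive
with_zero = np.array([[0.0, 0.4, 0.6]])  # first part recorded as a rounded zero

clr_ok = clr(positive)    # finite coordinates that sum to zero per row
clr_bad = clr(with_zero)  # no finite coordinates: log(0) contaminates the whole row
```

A single zero makes every clr coordinate of that row non-finite, which is why zeros below a detection limit are imputed before a log-ratio representation is computed.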
On a novel probability distribution for zero-laden compositional data
This paper reviews the key properties of the novel distribution, presents an application where it can be used for dimensionality reduction of compositional data, and highlights some underexplored connections between machine learning and compositional data analysis.
Learning sparse log-ratios for high-throughput sequencing data
This work presents CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers from HTS data by exploiting a continuous relaxation to approximate the underlying combinatorial optimization problem.
Statistical Analysis of Chemical Element Compositions in Food Science: Problems and Possibilities
This study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data and by the replacement strategies for missing values and non-detects, and showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out.


Regression imputation with Q-mode clustering for rounded zero replacement in high-dimensional compositional data
The results show that the proposed method based on regression imputation with Q-mode clustering can reduce the calculation time in higher dimensions and improve the quality of results.
missIWAE: Deep Generative Modelling and Imputation of Incomplete Data
The approach, called MIWAE, is based on the importance-weighted autoencoder (IWAE) and maximises a potentially tight lower bound of the log-likelihood of the observed data; it is highly competitive with state-of-the-art methods.
Learning Generative Models from Incomplete Data
This thesis introduces a deep generative model, the Variational Auto-decoder (VAD), a variant of the stochastic gradient variational Bayes (SGVB) estimator first introduced by Kingma and Welling in 2013, and shows that the VAD framework is more robust to different rates of missing data than previous generative models for incomplete data.
Dealing with Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation
Existing nonparametric imputation methods, for both the additive and the multiplicative approach, are revised; essential properties of the last method are given, and for missing values a generalization of the multiplicative approach is proposed.
GAIN: Missing Data Imputation using Generative Adversarial Nets
This work proposes a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework; the method, called GAIN, significantly outperforms state-of-the-art imputation methods.
Regression with compositional response having unobserved components or below detection limit values
The typical way to deal with zeros and missing values in compositional data sets is to impute them with a reasonable value, and then the desired statistical model is estimated with the imputed data