Missing values in gel‐based proteomics
@article{Albrecht2010MissingVI,
title={Missing values in gel‐based proteomics},
author={Daniela Albrecht and Olaf Kniemeyer and Axel A. Brakhage and Reinhard Guthke},
journal={PROTEOMICS},
year={2010},
volume={10}
}
Gel‐based proteomics is a widely applied technique to measure abundances of proteins in various biological systems. Comparison of two or more biological groups involves matching of 2‐D gels. Depending on the software, this can result in spots showing missing values on several gels. Most studies ignore this fact or substitute all missing data by zero. In recent years, scientists have realized that this is not the optimal way to analyze their data, and several studies were published…
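The effect of zero substitution mentioned in the abstract can be illustrated with a small sketch. The spot volumes and the helper names (`mean_zero_substituted`, `mean_observed_only`, `fold_change`) are hypothetical and chosen only for illustration; they are not taken from the paper, and neither strategy is presented here as the recommended one.

```python
import math
from statistics import mean

# Hypothetical spot volumes for one protein spot on four gels per group.
# None marks a gel on which the spot was not matched (a missing value).
control = [1200.0, 1150.0, None, 1300.0]
treated = [2400.0, None, None, 2600.0]

def mean_zero_substituted(volumes):
    """Average after replacing every missing value with zero."""
    return mean(0.0 if v is None else v for v in volumes)

def mean_observed_only(volumes):
    """Average only the spots that were actually detected."""
    observed = [v for v in volumes if v is not None]
    return mean(observed) if observed else math.nan

def fold_change(group_mean):
    """Treated/control ratio under a given way of averaging."""
    return group_mean(treated) / group_mean(control)

print("zero substitution:", round(fold_change(mean_zero_substituted), 3))
print("observed only:    ", round(fold_change(mean_observed_only), 3))
```

With these made-up numbers, the group with more missing spots is pulled further toward zero, so the apparent fold change depends strongly on how missing values are handled.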
63 Citations
Data visualization and feature selection methods in gel-based proteomics.
- Biology
- Current protein & peptide science
- 2014
This paper reviews and illustrates several different aspects of data analysis within the context of gel-based proteomics, summarizing the current state of research within this field and discussing the usefulness of available multivariate analysis tools both for data visualization and feature selection purposes.
Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline
- Computer Science
- Metabolomics
- 2011
The k-nearest neighbour imputation method (KNN) was identified as the optimal missing value estimation approach for direct infusion mass spectrometry datasets, based on direct infusion Fourier transform ion cyclotron resonance mass spectrometry data.
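As a rough illustration of what k-nearest neighbour imputation does, the following is a minimal pure-Python sketch. It is a generic version of the idea, not the implementation evaluated in the cited study; the function name `knn_impute`, the choice of a root-mean-square distance over shared observed columns, and the example matrix are all assumptions made for this example.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column in the k nearest rows.

    Distance between two rows is the root-mean-square difference over the
    columns observed in both rows (an assumption of this sketch).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = [v for _, v in candidates[:k]]
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical matrix: rows are features (e.g. spots or peaks), columns are samples.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.9],
]
imputed = knn_impute(data, k=2)
print(imputed[1])
```

Entries that have no usable neighbour are left as `None` in this sketch, so the caller can decide how to treat them.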
Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.
- Computer Science
- Journal of proteome research
- 2013
Combined usage of these methods is recommended as a novel and optimal way to detect significantly changing features in these data sets; the approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type.
Proper imputation of missing values in proteomics datasets for differential expression analysis
- Computer Science, Biology
- Briefings Bioinform.
- 2021
This study investigated public DDA datasets of various tissue/sample types to determine the composition of MVs and developed simulated datasets that imitate the MV profile of real-life datasets, and compared the impact of various popular imputation methods on the analysis of differentially expressed proteins.
A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Datasets
- Biology
- 2020
Overall, this study indicates that the most suitable imputation method depends on the overall structure and correlations of proteins within the data set and can be identified with the workflow presented here.
A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets.
- Biology
- Journal of proteome research
- 2021
Overall, this study indicates that the most suitable imputation method relies on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.
Quantitative plant proteomics
- Biology
- Proteomics
- 2011
Plant‐specific quantitative methods such as metabolic labeling methods that can take full advantage of plant metabolism and culture practices are described, and other potential advantages and challenges that may arise from the unique properties of plants are discussed.
Back to the basics: Maximizing the information obtained by quantitative two dimensional gel electrophoresis analyses by an appropriate experimental design and statistical analyses.
- Computer Science
- Journal of proteomics
- 2011
Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets.
- Biology
- Molecular bioSystems
- 2011
A non-linear data-driven stochastic gradient boosted trees (GBT) model is applied to impute missing proteomic values using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis to demonstrate that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.
References
Treatment of missing values for multivariate statistical analysis of gel‐based proteomics data
- Computer Science
- Proteomics
- 2008
Of the three tested methods for handling missing values in gel‐based proteomics data, BPCA imputation proved to be the most consistent.
Optimal replication and the importance of experimental design for gel-based quantitative proteomics.
- Biology
- Journal of proteome research
- 2005
Ways to improve the quality of protein expression data from 2-DE gels are explored, and an approach for defining the number of samples required and the number of gels per sample is described.
On the statistical analysis of the GS-NS0 cell proteome: imputation, clustering and variability testing.
- Biology
- Biochimica et biophysica acta
- 2006
A likelihood-based approach to defining statistical significance in proteomic analysis where missing data cannot be disregarded
- Computer Science
- Signal Process.
- 2004
Normalization and analysis of residual variation in two‐dimensional gel electrophoresis for quantitative differential proteomics
- Biology
- Proteomics
- 2005
The model described is being used to assign confidence values to observed variations in arbitrary 2‐DE gels in order to quantify the degree of over‐expression and under‐expression of protein spots.
A probabilistic treatment of the missing spot problem in 2D gel electrophoresis experiments.
- Biology
- Journal of proteome research
- 2007
This study shows that the probability for a spot to be missing can be modeled by a logistic regression function of the logarithm of the volume, and presents an algorithm that takes a set of gels with technical and biological replicates as input and estimates the average protein abundances in the biological groups from the number of missing spots and measured volumes of the present spots using a maximum likelihood approach.
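The logistic model of missingness described in this summary can be written out as a short formula. The symbols below are assumptions introduced for illustration (the summary does not give the notation): V is the spot volume, and α and β are the intercept and slope of the logistic regression.

```latex
% Sketch of a logistic missingness model; \alpha, \beta and V are assumed notation.
P(\text{spot missing} \mid V) = \frac{1}{1 + \exp\bigl(-(\alpha + \beta \log V)\bigr)}
```

If fainter spots are missing more often, one would expect β < 0, although the summary above states neither the sign nor the fitted values.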
Statistics for proteomics: experimental design and 2-DE differential analysis.
- Biology
- Journal of chromatography. B, Analytical technologies in the biomedical and life sciences
- 2007
Determining a significant change in protein expression with DeCyder™ during a pair‐wise comparison using two‐dimensional difference gel electrophoresis
- Biology
- Proteomics
- 2004
An alternative normalization method was applied which resulted in improved data distribution and allowed greater sensitivity in analysis and represents a method to greatly improve the success of DIGE data analysis.
Challenges related to analysis of protein spot volumes from two-dimensional gel electrophoresis as revealed by replicate gels.
- Biology
- Journal of proteome research
- 2006
Challenges with 2-DE protein spot volumes are viewed in light of multiple gel comparisons and multivariate data analysis, which implies both loss of information and problems for the subsequent statistical analysis.
ANALYSIS OF DYNAMIC PROTEIN EXPRESSION DATA
- Computer Science
- 2005
A method is proposed for the estimation of missing data points in DIGE experiments used for measuring the expression levels of proteins in different mixtures on the same two-dimensional electrophoresis (2-DE) gel.