What you see is not what you get: how sampling affects macroscopic features of biological networks

@article{Annibale2011WhatYS,
  title={What you see is not what you get: how sampling affects macroscopic features of biological networks},
  author={Alessia Annibale and Anthonius C. C. Coolen},
  journal={Interface Focus},
  year={2011},
  volume={1},
  pages={836--856}
}
We use mathematical methods from the theory of tailored random graphs to study systematically the effects of sampling on topological features of large biological signalling networks. Our aim in doing so is to increase our quantitative understanding of the relation between true biological networks and the imperfect and often biased samples of these networks that are reported in public data repositories and used by biomedical scientists. We derive exact explicit formulae for degree distributions…
Quantifying noise in mass spectrometry and yeast two-hybrid protein interaction detection experiments
TLDR
This paper aims at understanding the connection between true protein interactions and the protein interaction datasets that have been obtained using the most popular experimental techniques, i.e. mass spectrometry and yeast two-hybrid.
Fast Evaluation of Link Prediction by Random Sampling of Unobserved Links
TLDR
This paper uses a new evaluation scheme for link prediction algorithms, i.e., link prediction with random sampling, to evaluate the performance of twelve link predictors on ten real-world networks of different contexts and scales and shows that the proposed scheme is a fast and effective evaluation method.
Uncovering disease-disease relationships through the incomplete interactome
TLDR
A network-based framework to identify the location of disease modules within the interactome and use the overlap between the modules to predict disease-disease relationships is presented, and it is found that disease pairs with overlapping disease modules display significant molecular similarity, elevated coexpression of their associated genes, similar symptoms and high comorbidity.
Performance of Local Information Based Link Prediction: A Sampling Perspective
TLDR
This paper re-evaluates the performance of local information-based link predictions by using sampling methods to govern the division into training set and probe set, and finds that each prediction approach performs unevenly across the different sampling methods.
Mathematical Modeling of Avidity Distribution and Estimating General Binding Properties of Transcription Factors from Genome-Wide Binding Profiles.
  • V. Kuznetsov
  • Mathematics, Medicine
  • Methods in molecular biology
  • 2017
TLDR
An analytical framework for modeling, analysis, and prediction of transcription factor (TF) DNA binding properties detected at the genome scale is developed and a mixture probabilistic model of binding avidity function that includes nonspecific and specific binding events is introduced.
Proteomics: from single molecules to biological pathways
TLDR
A systems biology approach may advance the understanding of cardiovascular disease processes at a ‘biological pathway’ instead of a ‘single molecule’ level and accelerate progress towards disease-modifying interventions.
Structure and dynamics of molecular networks: A novel paradigm of drug discovery. A comprehensive review
TLDR
It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates and an optimized protocol of network-aided drug development is suggested, and a list of systems-level hallmarks of drug quality is provided.
Species interactions: estimating per‐individual interaction strength and covariates before simplifying data into per‐species ecological networks
Summary: Ecological network models based on aggregated data from species interactions are widely used to make inferences about species specialization, functionality and extinction risk. While…
Network epidemiology and plant trade networks
TLDR
This review focuses on the application of new developments in network epidemiology to the study and management of plant diseases through epidemic models in directed and hierarchical networks and spatial epidemic simulations integrating network data.
Network completion by leveraging similarity of nodes
TLDR
This paper investigates the network completion problem and demonstrates that by effectively leveraging the side information about the nodes (such as the pairwise similarity), it is possible to predict the unobserved part of the network with high accuracy, and proposes an efficient algorithm that decouples the completion from the transduction stage to effectively exploit the similarity information.

References

SHOWING 1-10 OF 17 REFERENCES
Statistical properties of sampled networks.
TLDR
It is found that the quantities related to those properties in sampled networks appear to be estimated quite differently for each sampling method, and it is explained why such a biased estimation of quantities would emerge from the sampling procedure.
Tailored graph ensembles as proxies or null models for real networks I: tools for quantifying structure
We study the tailoring of structured random graph ensembles to real networks, with the objective of generating precise and practical mathematical tools for quantifying and comparing network…
Subnets of scale-free networks are not scale-free: sampling properties of networks.
  • M. Stumpf, C. Wiuf, R. May
  • Computer Science, Medicine
  • Proceedings of the National Academy of Sciences of the United States of America
  • 2005
TLDR
The sampling properties of a network's degree distribution under the most parsimonious sampling scheme are discussed and it is shown that this condition is indeed satisfied for some important classes of networks, notably classical random graphs and exponential random graphs.
Protein Networks Reveal Detection Bias and Species Consistency When Analysed by Information-Theoretic Methods
TLDR
By quantifying the methodological biases of the experimental data, this work can define an information threshold above which networks may be deemed to comprise consistent macroscopic topological properties, despite their small microscopic overlaps.
The effects of incomplete protein interaction data on structural and evolutionary inferences
TLDR
The need to consider network sampling properties explicitly and from the outset in any analysis is demonstrated; when only small, partial network data sets are considered, bias is virtually inevitable.
Sampling properties of random graphs: the degree distribution.
  • M. Stumpf, C. Wiuf
  • Mathematics, Medicine
  • Physical review. E, Statistical, nonlinear, and soft matter physics
  • 2005
TLDR
A necessary and sufficient condition is derived that guarantees that the degree distributions of the subnet and the true network belong to the same family of probability distributions.
Effect of sampling on topology predictions of protein-protein interaction networks
TLDR
It is concluded that given the current limited coverage levels, the observed scale-free topology of existing interactome maps cannot be confidently extrapolated to complete interactomes.
Dynamic modularity in protein interaction networks predicts breast cancer outcome
TLDR
Analysis of two breast cancer patient cohorts revealed that altered modularity of the human interactome may be useful as an indicator of breast cancer prognosis.
What is the real size of a sampled network? The case of the Internet.
TLDR
It is argued that inference of some of the standard topological quantities is, in fact, a version of the so-called "species" problem in statistics, which is important in categorizing the problem and providing some indication of its inherent difficulties.
Classification of microarray data using gene networks
TLDR
This work proposes a method to integrate a priori the knowledge of a gene network in the analysis of gene expression data, based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph.