Emergent Statistical Laws in Single-Cell Transcriptomic Data

Silvia Lazzardi, Filippo Valle, Andrea Mazzolini, Antonio Scialdone, Michele Caselle, Matteo Osella
Large-scale data on single-cell gene expression have the potential to unravel the specific transcriptional programs of different cell types. The structure of these expression datasets suggests a similarity with several other complex systems that can be analogously described through the statistics of their basic building blocks. Transcriptomes of single cells are collections of messenger RNA abundances transcribed from a common set of genes just as books are different collections of words from a…

Comprehensive analysis of long non-coding RNAs in breast cancer using topic modeling

A topic modeling approach is proposed to investigate the transcriptional heterogeneity of luminal and triple-negative breast cancer cells, using patient-derived xenograft models of acquired resistance to chemotherapy and targeted therapy. Integrative clustering that combines the information from mRNAs and lncRNAs, treated as disjoint omic layers, greatly improves the accuracy of cell classification.

Multiomics Topic Modeling for Breast Cancer Classification

This work presents an application of topic modeling techniques for the identification of breast cancer subtypes, and shows how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of ’omics data.

Multi-omics Topic Modeling for Breast Cancer Classification

This work shows how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of ’omics data, and that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification.

A Planck Radiation and Quantization Scheme for Human Cognition and Language

It is shown that the lack of independence of the Bose-Einstein statistics compared to the Maxwell-Boltzmann statistics can be explained by the presence of a ‘meaning dynamics’, which causes words to be attracted to the same words.



Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells

System-wide analyses of protein and mRNA expression in individual cells with single-molecule sensitivity using a newly constructed yellow fluorescent protein fusion library for Escherichia coli found that almost all protein number distributions can be described by the gamma distribution with two fitting parameters which, at low expression levels, have clear physical interpretations as the transcription rate and protein burst size.
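
For reference, the two-parameter gamma distribution mentioned in this summary can be written as follows. The symbols a (shape) and b (scale) are notation chosen here, not taken from the summary, and reading a as the transcription-rate-like parameter and b as the protein burst size is an assumption based on the summary's wording.

```latex
% Gamma distribution for the protein number x, with shape a and scale b
% (a and b are illustrative symbols, not taken from the source text)
p(x) = \frac{x^{a-1}\, e^{-x/b}}{\Gamma(a)\, b^{a}}, \qquad x \ge 0
% Moments that follow directly from this form:
\langle x \rangle = a\,b, \qquad
\sigma_x^{2} = a\,b^{2}, \qquad
\frac{\sigma_x^{2}}{\langle x \rangle^{2}} = \frac{1}{a}, \qquad
\frac{\sigma_x^{2}}{\langle x \rangle} = b
```

Under this parameterization the squared coefficient of variation depends only on a and the Fano factor equals b, which is why the two fitted parameters can be read off separately from the mean and variance of a measured distribution.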

Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise

A strategy that pairs high-throughput flow cytometry with a library of GFP-tagged yeast strains to monitor protein levels rapidly and precisely at single-cell resolution is presented, revealing a remarkable structure to biological noise.

RNA sequencing reveals two major classes of gene expression levels in metazoan cells

RNA sequencing of mouse Th2 cells is used, coupled with a range of other techniques, to show that all genes can be separated, based on their expression abundance, into two distinct groups: one group composed of lowly expressed and putatively non‐functional mRNAs, and the other of highly expressed mRNAs with active chromatin marks at their promoters.

Are There Laws of Genome Evolution?

The observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles, which might qualify as “laws of evolutionary genomics” in the same sense “law” is understood in modern physics.

Comprehensive integration of single cell data

This work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets, and demonstrates how anchoring can harmonize in-situ gene expression and scRNA-seq datasets.

Bayesian inference of gene expression states from single-cell RNA-seq data.

A Bayesian normalization procedure called Sanity (SAmpling-Noise-corrected Inference of Transcription activitY) is derived from first principles and is shown to outperform other normalization methods on downstream tasks, such as finding nearest-neighbor cells and clustering cells into subtypes.

Universal features in the genome-level evolution of protein domains

A stochastic duplication/innovation model, in the class of the so-called 'Chinese restaurant processes', is presented that explains this observation with two universal parameters, representing a minimal number of domains and the relative weight of innovation to duplication, along with a model variant where new topologies are related to occurrence in genomic data, accounting for fold specificity.

Computational and analytical challenges in single-cell transcriptomics

The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the

Zipf's law in gene expression.

Using data from gene expression databases on various organisms and tissues, it is found that the abundances of expressed genes exhibit a power-law distribution with an exponent close to -1; i.e., they obey Zipf's law.
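
A minimal sketch of how such an exponent might be estimated, assuming it refers to the slope of the rank-abundance relation on log-log axes. The function name `zipf_exponent` and the synthetic input are hypothetical illustrations, and the ordinary least-squares fit used here is a simple stand-in rather than the procedure of the cited work.

```python
import math


def zipf_exponent(abundances):
    """Estimate the slope of log(abundance) versus log(rank).

    Abundances are sorted in decreasing order, ranked from 1, and fitted
    with ordinary least squares on log-log axes. Zero or negative entries
    are dropped because their logarithm is undefined.
    """
    values = sorted((a for a in abundances if a > 0), reverse=True)
    if len(values) < 2:
        raise ValueError("need at least two positive abundances")

    log_ranks = [math.log(rank) for rank in range(1, len(values) + 1)]
    log_values = [math.log(v) for v in values]

    mean_x = sum(log_ranks) / len(log_ranks)
    mean_y = sum(log_values) / len(log_values)

    # Least-squares slope: covariance(x, y) / variance(x)
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(log_ranks, log_values)
    )
    variance = sum((x - mean_x) ** 2 for x in log_ranks)
    return covariance / variance


# Synthetic abundances that follow 1/rank exactly (illustrative data only)
synthetic = [1000.0 / rank for rank in range(1, 501)]
print(zipf_exponent(synthetic))
```

On abundances that follow 1/rank exactly, the fitted slope is -1 up to floating-point error. Real expression data are noisier, and a least-squares fit on log-log axes is sensitive to the low-abundance tail, so the fitted range would need to be chosen with care.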