Corpus ID: 231749649

The Gene Mover's Distance: Single-cell similarity via Optimal Transport

Riccardo Bellazzi, Andrea Codegoni, Stefano Gualandi, Giovanna Nicora, Eleonora Vercesi
This paper introduces the Gene Mover’s Distance, a measure of similarity between a pair of cells based on their gene expression profiles obtained via single-cell RNA sequencing. The underlying idea of the proposed distance is to interpret the gene expression array of a single cell as a discrete probability measure. The distance between two cells is hence computed by solving an Optimal Transport problem between the two corresponding discrete measures. In the Optimal Transport model, we use two…
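As a rough illustration of the idea in the abstract, the sketch below normalizes two expression vectors into discrete probability measures and solves the resulting transport problem as a linear program with SciPy. The abstract is truncated before it describes the ground cost, so the cost used here (Euclidean distance between made-up 2-D gene coordinates) is purely an assumption, as are the function names `to_measure` and `transport_distance` and the toy data. It is a minimal sketch, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog


def to_measure(expression):
    """Normalize a non-negative expression vector so that it sums to 1."""
    x = np.asarray(expression, dtype=float)
    total = x.sum()
    if total <= 0:
        raise ValueError("expression vector must have positive total mass")
    return x / total


def transport_distance(expr_a, expr_b, cost):
    """Optimal transport cost between two cells, solved as a linear program."""
    a, b = to_measure(expr_a), to_measure(expr_b)
    n = a.size
    # The flow pi[i, j] is flattened row-major, so it sits at index i * n + j.
    # Row sums of pi must equal a; column sums of pi must equal b.
    row_sums = np.kron(np.eye(n), np.ones((1, n)))
    col_sums = np.kron(np.ones((1, n)), np.eye(n))
    result = linprog(
        c=np.asarray(cost, dtype=float).ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun


# Hypothetical 2-D coordinates for three genes (assumption: not from the paper).
gene_coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
# Ground cost: pairwise Euclidean distance between the gene coordinates.
cost = np.linalg.norm(gene_coords[:, None, :] - gene_coords[None, :, :], axis=-1)

cell_a = [4, 0, 0]  # toy counts: all expression on gene 0
cell_b = [0, 2, 2]  # toy counts: expression split between genes 1 and 2

distance_ab = transport_distance(cell_a, cell_b, cost)
print(round(distance_ab, 6))  # 1.5
```

Here all of cell A's mass sits on gene 0 and must be split evenly between genes 1 and 2, at ground costs 1 and 2, so the transport cost is 0.5 · 1 + 0.5 · 2 = 1.5. The dense LP has n² variables, which is workable only for a handful of genes; a dedicated optimal transport solver would be needed at realistic gene counts.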


Optimal Transport improves cell-cell similarity inference in single-cell omics data
In this in-depth evaluation, OT is found to improve cell-cell similarity inference and cell clustering in all simulated and real scRNA-seq data, while its performance is comparable to that of Pearson correlation in scATAC-seq and single-cell DNA methylation data.
Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance
An unbalanced graph earth mover’s distance is proposed that efficiently embeds the unbalanced EMD on an underlying graph into an L1 space, whose metric the authors call unbalanced diffusion earth mover’s distance (UDEMD), which leads to an efficient nearest-neighbors kernel over many signals defined on a large graph.


Gene2vec: distributed representation of genes based on co-expression
A machine learning method is proposed that utilizes transcriptome-wide gene co-expression to generate a distributed representation of genes, and the utility of this distribution is demonstrated by predicting gene-gene interaction based solely on gene names.
Gene expression cartography
A new computational framework, novoSpaRc, leverages single-cell data to reconstruct spatial context for cells and spatial expression across tissues and organisms, on the basis of an organization principle for gene expression.
Impact of similarity metrics on single-cell RNA-seq data clustering
A state-of-the-art kernel-based clustering algorithm (SIMLR) is modified to use Pearson's correlation as a similarity measure, which yields a significant performance improvement over Euclidean distance on scRNA-seq data clustering.
The Earth Mover's Distance as a Metric for Image Retrieval
This paper investigates the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval, and compares the retrieval performance of the EMD with that of other distances.
Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data.
HoneyBADGER is effective at identifying deletions, amplifications, and copy-neutral loss-of-heterozygosity events and is capable of robustly identifying subclonal focal alterations as small as 10 megabases. The work highlights the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer.
Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey
This work compares the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study and concludes which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis.
A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure.
A droplet-based, single-cell RNA-seq method is implemented to determine the transcriptomes of over 12,000 individual pancreatic cells from four human donors and two mouse strains and provides a resource for the discovery of novel cell type-specific transcription factors, signaling receptors, and medically relevant genes.
Computational Optimal Transport
This short book reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds light on the theoretical properties of OT that make it particularly useful for some of these applications.
From Word Embeddings To Document Distances
It is demonstrated on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedentedly low k-nearest neighbor document classification error rates.
A comparison of automatic cell identification methods for single-cell RNA-sequencing data
It is found that most classifiers perform well on a variety of datasets, with decreased accuracy for complex datasets with overlapping classes or deep annotations, and that the general-purpose SVM classifier has the best overall performance across the different experiments.