New Genome Similarity Measures Based on Conserved Gene Adjacencies

  • Luis Antonio Brasil Kowada, Daniel Doerr, Simone Dantas, Jens Stoye
  • Published 17 April 2016
  • Biology
  • Journal of computational biology : a journal of computational molecular cell biology, volume 24, issue 6
Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome… 

Gene family-free genome comparison

This work develops new methods for genome rearrangement studies that do not require prior knowledge of gene family assignments of genes, and accounts for differences between genes caused by point mutations while studying their order and composition in chromosomes.

The gene family-free median of three

This work presents a heuristic method, FFAdj-AM, which performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR, and proposes an appealing alternative to established tools for identifying higher confidence positional orthologs.

Family-Free Genome Comparison.

The family-free genome comparison tool FFGC is reviewed, which provides several methods for gene order analyses that do not require prior knowledge of evolutionary relationships between the genes across the studied genomes.

Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics

It is argued that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS), and that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, is essential for progress.

Comparative Genomics

  • P. Little
  • Biology
  • Methods in Molecular Biology
  • 2018
This chapter covers the theory and practice of ortholog gene set computation, and provides an overview of practical considerations intended for researchers who need to determine orthologous genes from a collection of annotated genomes.

Partitioned K-nearest neighbor local depth for scalable comparison-based learning

Partitioned Nearest Neighbors Local Depth (PaNNLD) is introduced, a computationally tractable variant of PaLD leveraging the K-nearest neighbors digraph on S, and it is shown that the probability of randomization-induced error δ in PaNNLD is no more than 2e−δ K.

Analysis of the Period Recovery Error Bound

This paper provides the first analysis of the relationship between the error bound and the number of candidates, as well as identification of the error parameters that still guarantee recovery, and provides a hierarchy of more restrictive upper error bounds that asymptotically reduces the size of the potential period candidate set.

Genfamilienfreier Genomvergleich

The genome is the entire genetic information of an organism, which is stored mainly on chromosomes. The computer-assisted comparison of the genomes of different species…



Gene family assignment-free comparative genomics

It is demonstrated that gene order studies can be improved by direct, gene family assignment-free comparisons, and it is shown that the exact algorithm is suitable for computations on small genomes.

Assignment of orthologous genes via genome rearrangement

A new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement is presented.

Efficient Tools for Computing the Number of Breakpoints and the Number of Adjacencies between Two Genomes with Duplicate Genes

This paper proposes to compute the minimum number of breakpoints and the maximum number of adjacencies between two genomes in the presence of duplications using two different approaches: an exact, generic 0-1 linear programming approach, and a collection of three heuristics.

On the family-free DCJ distance and similarity

This work proposes the problem of computing the DCJ distance of two given genomes without prior gene family assignment, directly using the pairwise similarities between genes, proves that this new family-free DCJ distance problem is APX-hard, and provides an integer linear program for its solution.

Computing the Summed Adjacency Disruption Number between Two Genomes with Duplicate Genes

New algorithms for computing the exact summed adjacency disruption number for two genomes with duplicate genes are presented, and experimental results on a γ-Proteobacteria data set illustrate the approach.

Genome rearrangement with gene families

Simulations show that in two random genomes, the expected exemplar distance/n is sensitive to the number and size of gene families, but approaches 1 as the number of singleton families increases, while, when basing exemplar distance on exemplar reversals distance (ERD), the expected computing cost depends on the configuration of genes but is not sensitive to n.

Identifying gene clusters by discovering common intervals in indeterminate strings

A new dynamic model and efficient computational approaches are presented that are suitable for scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure of the genes of different genomes.

Maximum Likelihood Phylogenetic Reconstruction from High-Resolution Whole-Genome Data and a Tree of 68 Eukaryotes

A maximum likelihood approach for phylogenetic analysis that takes into account genome rearrangements as well as duplications, insertions, and losses is described, which can handle high-resolution genomes (with 40,000 or more markers) and can use in the same analysis genomes with very different numbers of markers.

Edit Distances for Genome Comparisons Based on Non-Local Operations

A number of measures of gene order rearrangement are defined, algorithm design and software development for the calculation of some of these quantities in single-chromosome genomes are described, and the results of applying these tools to a database of mitochondrial gene orders inferred from genomic sequences are reported on.