Human-chimpanzee alignment: Ortholog exponentials and paralog power laws

@article{Gao2014HumanchimpanzeeAO,
  title={Human-chimpanzee alignment: Ortholog exponentials and paralog power laws},
  author={Kun Gao and Jonathan Miller},
  journal={Computational Biology and Chemistry},
  year={2014},
  volume={53 Pt A},
  pages={59-70}
}
Comparing the Statistical Fate of Paralogous and Orthologous Sequences
TLDR
A simple model of sequence evolution by substitutions and segmental duplications is developed, showing analytically and computationally that paralogous and orthologous gene pairs contribute differently to this distribution, providing a better understanding of statistical properties of genomic sequences and their evolution.
Orthologs from maxmer sequence context
TLDR
This work performs a genome "intersection" that in general consumes less than one thirtieth of the computation time required by commonly used methods for whole-genome alignment, and extracts "non-embedded maximal matches," maximal matches that are not embedded into other maximal matches, as potential orthologs.
The statistical fate of genomic DNA: modelling match statistics in different evolutionary scenarios
TLDR
This thesis develops mathematical frameworks that take complex mechanisms into account and reproduce the observed deviations of maximal exact matches within and between eukaryotic genomes, and implements in silico sequence evolution models that reproduce these behaviors.
Size distribution of function-based human gene sets and the split–merge model
TLDR
A simple mechanism to break a power-law size distribution by a combination of splitting and merging operations is proposed and a simulation shows that iteration of these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a rank beta function.
How Evolution of Genomes Is Reflected in Exact DNA Sequence Match Statistics
TLDR
This work shows that simple dynamical models consisting solely of duplication and mutation processes can already explain the characteristic features of MLDs observed in genomic sequences, and finds that these features are largely insensitive to details of the underlying mutational processes and do not necessarily rely on the action of natural selection.
Mappability and read length
TLDR
The slow decay (long tail) of the power-law function implies a diminishing return in converting unmappable regions/reads to become mappable with the increase of the read length, with the understanding that increasing read length will always move toward the direction of 100% mappability.
Editorial: Complexity in genomes
  • Biology
  • 2014
TLDR
This special issue aims to bring researchers who are comfortable with the theme of complexity in physical sciences to discuss genomes and to explore genomics in the framework of complex systems theory.
CUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters
TLDR
A new parallel traceback algorithm called Incremental Speculative Traceback (IST), which pipelines the traceback phase, speculating incrementally over the values calculated so far, producing results in advance, is proposed and evaluated.
Primary orthologs from local sequence context
TLDR
An intersection-based method that requires neither repeat-masking nor alignment to infer evolutionary history of sequences based on short-range genomic sequence context is described, and is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes.

References

Algebraic Distribution of Segmental Duplication Lengths in Whole-Genome Sequence Self-Alignments
TLDR
Distributions of duplicated sequences from genome self-alignment are characterized, including forward and backward alignments in bacteria and eukaryotes, suggesting a novel kind of long-distance correlation that must be non-local in origin.
Inferring orthology and paralogy.
TLDR
This chapter provides an overview of the methods used to infer orthology and paralogy, and surveys both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation.
Scale-free duplication dynamics: a model for ultraduplication.
TLDR
This work proposes and studies scale-free duplication dynamics, a class of model for genome sequence evolution that generates the observed shapes of this distribution of duplicated sequences, and accounts plausibly for the observed form of the algebraic tail.
Neutral evolution of duplicated DNA: an evolutionary stick-breaking process causes scale-invariant behavior.
TLDR
A simple and evolutionarily neutral model is introduced, which involves only point mutations and segmental duplications, and produces the same statistical features as observed for genomic data.
Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes
TLDR
New alignment techniques that can handle large gaps in a robust fashion and discriminate between orthologous and paralogous alignments are developed and provide evidence that ≈2% of the genes in the human/mouse common ancestor have been deleted or partially deleted in the mouse.
Computational methods for Gene Orthology inference
TLDR
Tree-based, sequence similarity-based, and synteny-based approaches can be combined into flexible hybrid methods; comparisons show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances.
Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model
TLDR
A new model and comparative method that uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection and identifies sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels.
Adaptive evolution of young gene duplicates in mammals.
TLDR
It is found that a high proportion of young gene duplicates in the human, macaque, mouse, and rat genomes have experienced adaptive natural selection, larger than any reported amount of selection among single-copy genes in these lineages using similar methods.
Fragmentation dynamics of DNA sequence duplications
TLDR
A class of simple discrete duplication/substitution models that generate steady-states sharing this property are formulated and solved and can be mapped directly onto certain fragmentation models that have been intensively studied by physicists in recent years.
Improved pairwise alignment of genomic DNA
TLDR
A program is introduced, INFERZ, which addresses part of the inference problem, inferring substitution and gap scores according to a mathematically sound model, and the usefulness of iterating inferred scores to convergence is explored, finding that converged scores were not a consistent improvement.