Miklós Csűrös

Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study ...
Chromalveolates are a large, diverse supergroup of unicellular eukaryotes that includes Apicomplexa, dinoflagellates, ciliates (three lineages that form the alveolate branch), heterokonts, haptophytes, and cryptomonads (three lineages comprising the chromist branch). All sequenced genomes of chromalveolates have relatively low intron density in ...
By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence follows primarily a Pareto-lognormal distribution, which ...
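To make "frequency of oligonucleotides" concrete, here is a minimal Python sketch that counts overlapping k-mers in a sequence and tabulates how many distinct k-mers occur a given number of times. The function names `kmer_counts` and `count_spectrum` are hypothetical, and the sketch only produces the empirical counts; it does not fit the Pareto-lognormal distribution mentioned in the abstract.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer (oligonucleotide of length k) in sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_spectrum(counts):
    """Map each occurrence count c to the number of distinct k-mers seen c times."""
    return Counter(counts.values())


if __name__ == "__main__":
    counts = kmer_counts("ACGTACGT", 4)
    print(dict(counts))
    print(dict(count_spectrum(counts)))
```

The spectrum returned by `count_spectrum` is the kind of empirical distribution that one would compare against a model of randomly evolving DNA.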
We examine the problem of finding maximal-scoring sets of disjoint regions in a sequence of scores. The problem arises in DNA and protein segmentation, and in post-processing of sequence alignments. Our key result states a simple recursive relationship between maximal-scoring segment sets. The statement leads to an algorithm that finds such a k-set of ...
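The problem statement can be illustrated with a short Python sketch. This is not the recursive algorithm the abstract refers to (the snippet is truncated before describing it); it is a plain O(nk) dynamic program, swapped in for illustration, that finds at most k disjoint segments with maximal total score. The name `best_k_segments` and the half-open `(start, end)` output convention are assumptions of this sketch.

```python
def best_k_segments(scores, k):
    """Return (total, segments): at most k disjoint segments of maximal total score.

    Segments are half-open index pairs (start, end) into scores.
    Illustrative O(n*k) dynamic program, not the method from the abstract.
    """
    n = len(scores)
    neg_inf = float("-inf")
    # best[j][i]: best total using at most j segments inside scores[:i].
    best = [[0] * (n + 1) for _ in range(k + 1)]
    # ending[j][i]: best total when the j-th segment ends at index i - 1.
    ending = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            extend = ending[j][i - 1]       # lengthen the current segment
            start = best[j - 1][i - 1]      # open a new segment at i - 1
            ending[j][i] = max(extend, start) + scores[i - 1]
            best[j][i] = max(best[j][i - 1], best[j - 1][i], ending[j][i])

    # Trace back through the tables to recover one optimal segment set.
    segments = []
    j, i = k, n
    while j > 0 and i > 0:
        if best[j][i] == best[j][i - 1]:
            i -= 1                          # position i - 1 is unused
        elif best[j][i] == best[j - 1][i]:
            j -= 1                          # the j-th segment is not needed
        else:
            end = i
            while ending[j][i] == ending[j][i - 1] + scores[i - 1]:
                i -= 1                      # segment extends further left
            segments.append((i - 1, end))
            i -= 1
            j -= 1
    segments.reverse()
    return best[k][n], segments


if __name__ == "__main__":
    print(best_k_segments([2, -3, 4, -1, 5, -10, 3], 2))
```

With integer scores the equality checks in the traceback are exact; for floating-point scores one might store explicit back-pointers instead.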
Shared genealogies introduce allele dependences in diploid genotypes, as alleles within an individual or between different individuals will likely match when they originate from a recent common ancestor. At a locus shared by a pair of diploid individuals, there are nine combinatorially distinct modes of identity-by-descent (IBD), capturing all possible ...
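The count of nine can be checked by brute force. The Python sketch below assumes that an IBD mode is a partition of the four alleles at the locus (two per individual) into identical-by-descent classes, and that the order of the two alleles within an individual does not matter; under that assumption the 15 set partitions collapse to nine. The allele labels and function names are hypothetical.

```python
from itertools import product

# Two alleles for individual a and two for individual b (labels are illustrative).
ALLELES = ["a1", "a2", "b1", "b2"]


def set_partitions(items):
    """Yield every partition of items as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for idx in range(len(partition)):
            yield partition[:idx] + [partition[idx] + [first]] + partition[idx + 1:]
        # ... or into a new block of its own.
        yield partition + [[first]]


def relabel(partition, mapping):
    """Apply an allele relabeling and return a hashable form of the partition."""
    return frozenset(frozenset(mapping[x] for x in block) for block in partition)


def condensed_states():
    """Group partitions that differ only by swapping alleles within an individual."""
    swaps_a = [{"a1": "a1", "a2": "a2"}, {"a1": "a2", "a2": "a1"}]
    swaps_b = [{"b1": "b1", "b2": "b2"}, {"b1": "b2", "b2": "b1"}]
    orbits = set()
    for partition in set_partitions(ALLELES):
        orbit = frozenset(
            relabel(partition, {**sa, **sb}) for sa, sb in product(swaps_a, swaps_b)
        )
        orbits.add(orbit)
    return orbits


if __name__ == "__main__":
    print(len(list(set_partitions(ALLELES))), "detailed states")
    print(len(condensed_states()), "modes after ignoring within-individual order")
```

Each element of `condensed_states()` is the set of detailed partitions that belong to one mode, so the orbit sizes also show how many detailed states each mode merges.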
We describe a model for the sequence evolution of a processed pseudogene and its paralog from a common protein-coding ancestor. The model accounts for substitutions, insertions, and deletions and combines nucleotide- and codon-level mutation models. We give a dynamic programming method for calculating the likelihood of homology between two sequences in the ...
We examine exon junctions near apparent amino acid insertions and deletions in alignments of orthologous protein-coding genes. In 1,917 ortholog families across nine oomycete genomes, 10-20% of introns are near an alignment gap, indicating at first sight that splice-site displacements are frequent. We designed a robust algorithmic procedure for the ...
This paper describes a novel algorithm that builds an evolutionary tree with n leaves in O(n log l) time from a distance matrix computed from sample sequences of length l. Our algorithm combines the computational and statistical efficiency of a distance-based algorithm, Fast Harmonic Greedy Triplets, with the biological insights of the minimum evolution ...