With the increasing amount of DNA sequence information deposited in public databases, searching for similarity to a query sequence has become a basic operation in molecular biology. But even today’s fast algorithms reach their limits when applied to all-versus-all comparisons of large databases. Here we present a new database searching algorithm called …
In metazoans, thousands of DNA replication origins (Oris) are activated at each cell cycle. Their genomic organization and their genetic nature remain elusive. Here, we characterized Oris by nascent strand (NS) purification and a genome-wide analysis in Drosophila and mouse cells. We show that in both species most CpG islands (CGI) contain Oris, although …
MOTIVATION: PacBio single-molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation …
Among the repeated sequences that occur in DNA, minisatellites have been found to be polymorphic and have become useful tools in genetic mapping and forensic studies. They consist of a heterogeneous tandem array of a short repeat unit. The slightly different units along the array are called variants. Minisatellites evolve mainly through tandem duplications and …
MOTIVATION: Evolution acts in several ways on DNA: either by mutating a base, or by inserting, deleting or copying a segment of the sequence (Ruddle, 1997; Russell, 1994; Li and Grauer, 1991). Classical alignment methods deal with point mutations (Waterman, 1995), whereas genome-level mutations are studied using genome rearrangement distances (Bafna and Pevzner, …
We consider the set Γ(n) of all period sets of strings of length n over a finite alphabet. We show that there is redundancy in period sets and introduce the notion of an irreducible period set. We prove that Γ(n) is a lattice under set inclusion and does not satisfy the Jordan-Dedekind condition. We propose the first enumeration algorithm for Γ(n) and …
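To make the notion of a period set concrete, here is a minimal Python sketch. It assumes the common definition that p is a period of a string s when s[i] = s[i + p] for every valid i, and it lists only the periods strictly between 0 and the length of s (conventions differ on whether the trivial periods are included). The names `period_set` and `all_period_sets` are hypothetical, and the enumeration is a naive brute force over all strings, exponential in n; it is not the enumeration algorithm the abstract refers to.

```python
from itertools import product


def period_set(s):
    """Return the periods p (0 < p < len(s)) of the string s.

    p is a period of s when s[i] == s[i + p] for every valid i,
    i.e. when the prefix and the suffix of length len(s) - p coincide.
    Assumption: the trivial periods 0 and len(s) are left out.
    """
    n = len(s)
    return frozenset(p for p in range(1, n) if s[: n - p] == s[p:])


def all_period_sets(n, alphabet="ab"):
    """Collect the distinct period sets of all strings of length n.

    Naive brute force: it visits len(alphabet) ** n strings, so it is
    only usable for small n.
    """
    return {period_set("".join(w)) for w in product(alphabet, repeat=n)}


if __name__ == "__main__":
    print(sorted(period_set("abab")))  # [2]
    # Print every distinct period set found for strings of length 4.
    for periods in sorted(all_period_sets(4), key=sorted):
        print(sorted(periods))
```

Many different strings share the same period set (for instance "abab" and "baba"), which is why the number of distinct sets collected by `all_period_sets` is far smaller than the number of strings it visits.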
In humans and mice, meiotic recombination events cluster into narrow hotspots whose genomic positions are defined by the PRDM9 protein via its DNA binding domain constituted of an array of zinc fingers (ZnFs). High polymorphism and rapid divergence of the Prdm9 gene ZnF domain appear to involve positive selection at DNA-recognition amino-acid positions, but …
A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in …