Since the work of Kolpakov and Kucherov [5, 6], it has been known that ρ(n), the maximal number of runs in a string of length n, is linear in n. A lower bound of 3/(1 + √5)·n ≈ 0.927n has been given by Franek et al. [3, 4], and upper bounds have recently been provided by Rytter, by Puglisi et al., and by Crochemore and Ilie (1.6n) [8, 7, 1]. However, …
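To make the notion of a run concrete, here is a minimal brute-force sketch (the helper name `count_runs` is hypothetical). It assumes the standard definition: a run is a maximal substring whose length is at least twice its smallest period. It runs in quadratic time and is only a check of the definition, not the linear-time algorithm of Kolpakov and Kucherov.

```python
def count_runs(s: str) -> int:
    """Count runs in s by brute force (hypothetical helper, O(n^2) time)."""
    n = len(s)
    runs = set()  # a run is identified by its interval (start, end)
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            # Extend a maximal block of positions k with s[k] == s[k + p].
            start = k
            while k < n - p and s[k] == s[k + p]:
                k += 1
            # s[start .. k-1+p] is maximal with period p; keep it if its
            # length (k - start + p) is at least 2p.
            if k - start >= p:
                runs.add((start, k - 1 + p))
    # The same interval can be found for multiples of its smallest period,
    # so deduplicating by interval counts each run exactly once.
    return len(runs)


print(count_runs("aabaab"))  # runs: "aa", "aa", "aabaab"
```

Deduplicating by interval is what keeps this correct. By the Fine and Wilf periodicity lemma, a maximal p-periodic interval of length at least 2p has a smallest period dividing p. It is therefore the same run that is reported again at p's multiples.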
BACKGROUND Dogs and rats have a highly developed capability to detect and identify odorant molecules, even at minute concentrations. Previous analyses have shown that the olfactory receptors (ORs) that specifically bind odorant molecules are encoded by the largest gene family sequenced in mammals so far. RESULTS We identified five amino acid patterns …
With the sharp increase in available DNA and protein sequence data, new precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based similarity search techniques (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity trade-off. We present an implementation of such a …
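The following minimal sketch illustrates the spaced-seed idea. It assumes a seed written as a string of `1` (position must match) and `0` (don't care). The function names `seed_key` and `spaced_seed_hits` are hypothetical and are not the API of the implementation described above. One sequence is indexed by the characters at its care positions, and the other is scanned for identical keys.

```python
from collections import defaultdict


def seed_key(seq: str, pos: int, care: list[int]) -> str:
    """Characters of seq at the seed's care positions, starting at pos."""
    return "".join(seq[pos + o] for o in care)


def spaced_seed_hits(a: str, b: str, seed: str) -> list[tuple[int, int]]:
    """Return (i, j) pairs where a[i:] and b[j:] agree on every '1' of seed."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    w = len(seed)
    # Index every window of b by its spaced-seed key.
    index = defaultdict(list)
    for j in range(len(b) - w + 1):
        index[seed_key(b, j, care)].append(j)
    # Scan a and look up each window's key (.get does not add entries).
    hits = []
    for i in range(len(a) - w + 1):
        for j in index.get(seed_key(a, i, care), []):
            hits.append((i, j))
    return hits


# A mismatch at offset 2 breaks a contiguous seed but not the spaced one.
print(spaced_seed_hits("ACGTA", "ACTTA", "11011"))
print(spaced_seed_hits("ACGTA", "ACTTA", "11111"))
```

The don't-care position lets the seed tolerate the G/T mismatch, so only the spaced seed reports a hit. Subset seeds generalize this further by letting a position match a class of characters instead of requiring exact identity.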
BACKGROUND Similarity inference, one of the main bioinformatics tasks, must cope with the exponential growth of biological data. A classical approach to handling this data flow relies on heuristics with large seed indexes. To speed up this technique, the index can be enhanced by storing additional information to limit the number of random …
V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify minimal residual disease during patient follow-up. However, lymphocyte diversity is not yet fully understood. We propose new algorithms that process high-throughput sequencing …
Genomic data are growing exponentially and are searched daily by thousands of biologists. To reduce search time, efficient parallelism can be exploited by dispatching the data among a cluster of processing units, each able to scan its own data locally and independently. While PC clusters are well suited to this type of parallelism, we propose to substitute …
MOTIVATION The analysis of repeated elements in genomes is a fascinating research domain that lacks relevant tools for transposable elements (TEs), the most complex of these elements. The dynamics of TEs, which provides the main mechanism of mutation in some genomes, is an essential component of genome evolution. In this study we introduce a new concept of …