Since the work of Kolpakov and Kucherov [5, 6], it has been known that ρ(n), the maximal number of runs in a string, is linear in the length n of the string. A lower bound of 3n/(1 + √5) ≈ 0.927n has been given by Franek et al. [3, 4], and upper bounds have recently been provided by Rytter, Puglisi et al., and Crochemore and Ilie (1.6n) [8, 7, 1]. However, …
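To make the notion of a run concrete, here is a minimal brute-force sketch that lists the runs of a short string. It is an illustration only, not the method of the cited papers: it assumes the usual definition (a maximal interval whose smallest period p satisfies 2p ≤ length), and the name `runs` and the half-open `(start, end)` output are choices made for this example. The time cost is quadratic, so it only suits small inputs.

```python
def runs(w):
    """Return the runs of w as sorted half-open intervals (start, end).

    Assumption for this sketch: a run is a maximal periodic interval
    w[start:end] with some period p such that end - start >= 2 * p.
    """
    n = len(w)
    found = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            # Extend the stretch of positions k with w[k] == w[k + p].
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            # w[i:j + p] has period p and cannot be extended on either side.
            if j - i >= p:
                # The same interval may be found again for a multiple of its
                # smallest period; the set keeps a single copy.
                found.add((i, j + p))
            i = j + 1
    return sorted(found)


if __name__ == "__main__":
    print(runs("mississippi"))
```

For "mississippi" the sketch reports four intervals: the three squares "ss", "ss", "pp" and the longer repetition "ississi" with period 3.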
BACKGROUND: Myasthenia gravis (MG) is an autoimmune disease of the neuromuscular junction. MG with thymus hyperplasia has been associated with, but not genetically linked to, the HLA-DR3 haplotype. OBJECTIVE: To re-evaluate the association of HLA with MG in 656 patients with generalized disease and to test linkage of HLA to MG with thymus hyperplasia. METHOD: Patients were …
With a sharp increase of available DNA and protein sequence data, new precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based techniques of similarity search (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity ratio. We present an implementation of such a …
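As a rough illustration of the spaced-seed idea mentioned above, the following sketch indexes one sequence under a seed pattern and reports the positions where a second sequence agrees on every "1" position of the seed. It is not the implementation the abstract refers to; the function name `spaced_seed_hits`, the "1"/"0" seed notation and the toy sequences are assumptions made for this example.

```python
from collections import defaultdict


def spaced_seed_hits(seed, text, query):
    """Return pairs (i, j) where text[i:] and query[j:] agree on every
    position marked '1' in the seed; positions marked '0' are ignored."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)

    # Index every window of the text by the letters at the "care" positions.
    index = defaultdict(list)
    for i in range(len(text) - span + 1):
        key = "".join(text[i + k] for k in care)
        index[key].append(i)

    # Look up every window of the query in the index.
    hits = []
    for j in range(len(query) - span + 1):
        key = "".join(query[j + k] for k in care)
        for i in index.get(key, ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # The seed 1101 tolerates a mismatch at its third position.
    print(spaced_seed_hits("1101", "ACGTACGA", "ACTT"))
    # The contiguous seed 1111 requires an exact 4-letter match.
    print(spaced_seed_hits("1111", "ACGTACGA", "ACTT"))
```

In this toy case the spaced seed finds the pair (0, 0) despite the mismatch, while the contiguous seed of the same span finds nothing.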
In this paper, we show that the linear encoding scheme efficiently implements weighted finite automata (WFA). WFA with t transitions can be hardwired with O(t) cells. They solve pattern matching problems in a pipelined way, parsing one character every clock cycle. With the massive parallelism of reconfigurable processors like FPGAs, a significant speed-up …
V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the breadth of lymphocyte diversity is not fully understood. We propose new algorithms that process high-throughput sequencing …
Genomic data are growing exponentially and are searched daily by thousands of biologists. To reduce the search time, efficient parallelism can be exploited by dispatching data among a cluster of processing units able to scan their own data locally and independently. While PC clusters are well suited to support this type of parallelism, we propose to substitute …
MOTIVATION: The analysis of repeated elements in genomes is a fascinating domain of research that lacks relevant tools for transposable elements (TEs), the most complex ones. The dynamics of TEs, which provides the main mechanism of mutation in some genomes, is an essential component of genome evolution. In this study we introduce a new concept of …
Position Weight Matrices (PWMs) are broadly used in computational biology. The basic problem, SCAN, aims to find the occurrences of a given PWM in large sequences. Some other PWM tasks share a common NP-hard subproblem, SCOREDISTRIBUTION. The existing algorithms rely on the enumeration of a large set of scores or words, and they are mostly not suitable for …
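The SCAN problem described above can be sketched naively as a sliding window that sums one matrix entry per position. This is a baseline for illustration, not the algorithm proposed in the paper; the function name `scan`, the representation of the PWM as one score dictionary per column, the integer scores and the threshold are all assumptions made for this example.

```python
def scan(pwm, sequence, threshold):
    """Return (position, score) for every window of the sequence whose
    PWM score reaches the threshold.

    pwm is a list of columns; each column maps a letter to its score.
    """
    m = len(pwm)
    hits = []
    for i in range(len(sequence) - m + 1):
        # Score of the window sequence[i:i + m]: one entry per column.
        score = sum(col[sequence[i + k]] for k, col in enumerate(pwm))
        if score >= threshold:
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    # Hypothetical 3-column matrix favouring the word ACG.
    toy_pwm = [
        {"A": 2, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 2, "G": -1, "T": -1},
        {"A": -1, "C": -1, "G": 2, "T": -1},
    ]
    print(scan(toy_pwm, "TACGACG", 6))
```

The cost is proportional to the sequence length times the matrix length, which is the baseline that faster SCAN methods aim to improve on.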