Kalle Karhu

Learn More
Many document collections consist largely of repeated material , and several indexes have been designed to take advantage of this. There has been only preliminary work, however, on document retrieval for repetitive collections. In this paper we show how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document(More)
Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string(More)
Alignment to a genomic sequence is a common task in modern bioinformatics. By improving the methods used, significant amount of time and resources can be saved. We have developed a new genomic alignment search tool, called GAST, for sequences of at least 160 nt. GAST is many times faster than commonly used alignment tools BLAT and Mega BLAST. As the sizes(More)
Self-indexes are largely studied and widely applied structures in string matching. However, the exact matching of multiple patterns using self-indexes is a topic that has not been the subject of concentrated study although it is an area that may have direct and indirect applications and uses in fields such as bioinformatics. This paper presents a method of(More)
  • 1