Many document collections consist largely of repeated material, and several indexes have been designed to take advantage of this. There has been only preliminary work, however, on document retrieval for repetitive collections. In this paper we show how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document…
Motif recognition is a challenging problem in bioinformatics due to the diversity of protein motifs. Many existing algorithms identify motifs of a given length and are therefore either inapplicable or inefficient when searching simultaneously for motifs of various lengths. Searching for gapped motifs, although very important, is a highly time-consuming task…
MicroRNAs (miRNAs) play an important role in eukaryotic gene regulation. Although thousands of miRNAs have been identified in laboratories around the world, most of their targets remain unknown. Different computational techniques exist to predict miRNA targets. In this article, we propose a new method for identifying human miRNA-mRNA interactions…
Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string…
Self-indexes are widely studied and widely applied structures in string matching. However, exact matching of multiple patterns using self-indexes has received little concentrated study, although it may have direct and indirect applications in fields such as bioinformatics. This paper presents a method of…
Many retroviral vectors for hematopoietic cell and other clinical gene therapy are derived from murine packaging cell lines. The exposure of these retroviruses and packaging cell lines to adult human serum (AS) inactivates them by complement and anti-alpha-galactosyl natural antibody-mediated mechanisms. We show that virus stability and infection efficiency…
Medical additive manufacturing requires standard tessellation language (STL) models. Such models are commonly derived from computed tomography (CT) images using thresholding. Threshold selection can be performed manually or automatically. The aim of this study was to assess the impact of manual and default threshold selection on the reliability and accuracy…
Alignment to a genomic sequence is a common task in modern bioinformatics. By improving the methods used, a significant amount of time and resources can be saved. We have developed a new genomic alignment search tool, called GAST, for sequences of at least 160 nt. GAST is many times faster than the commonly used alignment tools BLAT and Mega BLAST. As the sizes…