We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D / lg lg D) or 2|CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new …
Given a collection of documents and a query pattern, document retrieval is the problem of obtaining documents that are relevant to the query. The collection is available beforehand so that a data structure, called an index, can be built on it to speed up queries. While initially restricted to natural language text collections, document retrieval …
The pygmy raccoon Procyon pygmaeus and dwarf coati Nasua nelsoni, both endemic to Cozumel Island, Mexico, are two of the most endangered carnivores in the world, and their persistence requires active management. However, the taxonomic status of these populations remains unclear. Therefore, we investigated mitochondrial DNA variation using the control region …
Many disciplines, from human genetics and oncology to plant and animal breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines …
Traditionally, biological similarity search has been studied under the abstraction of a single string representing each genome. The more realistic representation of diploid genomes, with two strings defining the genome, has so far been largely omitted in this context. With the development of sequencing techniques and better phasing routines through haplotype …
Recent genetic and neuropathologic advances support the concept that frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) are overlapping multisystem disorders. While 10-15% of ALS patients fulfil criteria for FTD, features of motor neuron disease appear in approximately 15% of FTD patients during the evolution of the disease. This overlap …
Detection of genomic variants is commonly conducted by aligning a set of reads sequenced from an individual to the reference genome of the species and analyzing the resulting read pileup. Typically, this process finds a subset of variants already reported in databases and additional novel variants characteristic of the sequenced individual. Most of the …