- Djamal Belazzougui, Gonzalo Navarro, Daniel Valenzuela
- J. Discrete Algorithms
- 2011

We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D / lg lg D) or 2|CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new…

Recent research on document retrieval for general texts has established the virtues of explicitly representing the so-called document array, which stores the document each pointer of the suffix array belongs to. While it makes document retrieval faster, this array occupies a significant amount of redundant space and is not easily compressible. In this…

- Gonzalo Navarro, Daniel Valenzuela
- SEA
- 2012

Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various…

- Gonzalo Navarro, Simon J. Puglisi, Daniel Valenzuela
- ACM Journal of Experimental Algorithmics
- 2014

Given a collection of documents and a query pattern, document retrieval is the problem of obtaining documents that are relevant to the query. The collection is available beforehand so that a data structure, called an index, can be built on it to speed up queries. While initially restricted to natural language text collections, document retrieval…

- Veli Mäkinen, Valeria Staneva, Alexandru I. Tomescu, Daniel Valenzuela, Sebastian Wilzbach
- Discrete Applied Mathematics
- 2017

In classical interval scheduling problems, a set of n jobs, each characterized by its start and end time, needs to be executed by a set of machines under various constraints. In this paper we study a new variant in which the jobs need to be assigned to at most k identical machines, such that the minimum number of machines that are busy at the same…

- Travis Gagie, Giovanni Manzini, Daniel Valenzuela
- ICABD
- 2014

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data structure, either a hash table or a spaced suffix array (SSA). In this paper we show how to compress SSAs relative to…

- Djamal Belazzougui, Veli Mäkinen, Daniel Valenzuela
- Encyclopedia of Algorithms
- 2008

- Veli Mäkinen, Daniel Valenzuela
- BMC Genomics
- 2014

Traditionally, biological similarity search has been studied under the abstraction of a single string to represent each genome. The more realistic representation of diploid genomes, with two strings defining the genome, has so far been largely omitted in this context. With the development of sequencing techniques and better phasing routines through haplotype…

Detection of genomic variants is commonly conducted by aligning a set of reads sequenced from an individual to the reference genome of the species and analyzing the resulting read pileup. Typically, this process finds a subset of variants already reported in databases and additional novel variants characteristic to the sequenced individual. Most of the…

- Daniel Valenzuela
- SEA
- 2016