Cytosine methylation is a DNA modification that has a great impact on the regulation of gene expression and important implications for the biology and health of many organisms, including humans. Bisulfite conversion of DNA followed by next-generation sequencing (BS-seq) is the gold-standard technique for detecting DNA methylation at single-base…
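
The principle behind BS-seq can be illustrated with a toy model. Bisulfite treatment (followed by amplification) turns unmethylated cytosines into thymines while methylated cytosines remain C, so methylation can be read off by comparing a read with the reference at each C. The sketch below is a simplification under loudly stated assumptions: a single forward strand, complete conversion, and a read already aligned at reference position 0 with no sequencing errors. The function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Toy bisulfite model (forward strand, complete conversion assumed):
    unmethylated C reads as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and pos not in methylated else base
        for pos, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict:
    """Assumes the read is aligned at reference position 0 without errors.
    At each reference C: read C -> methylated (True), read T -> unmethylated (False)."""
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            if read_base == "C":
                calls[pos] = True
            elif read_base == "T":
                calls[pos] = False
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"
    read = bisulfite_convert(reference, methylated={1})
    print(read, call_methylation(reference, read))
```

The C-to-T mismatches at unmethylated sites are what make aligning BS-seq reads harder than aligning ordinary reads.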

- Nicola Prezza
- ArXiv
- 2016

Longest Common Extension (LCE) queries are a fundamental subroutine in many string-processing algorithms, including (but not limited to) suffix sorting, string matching, and the identification of palindromic factors and repeats. An LCE query takes as input two positions i, j in a text T ∈ Σⁿ and returns the length ℓ of the longest common prefix between T's…
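
To make the definition concrete, here is a naive LCE query that simply compares characters. It runs in O(ℓ) time and is only a reference point for the fast, small-space structures the abstract refers to; positions are assumed to be 0-based, and the name `lce` is illustrative.

```python
def lce(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of text[i:] and text[j:] (naive, O(l) time)."""
    length = 0
    n = len(text)
    # Extend the match one character at a time until a mismatch or the end of the text.
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


if __name__ == "__main__":
    # The suffixes "abracadabra" and "abra" share the prefix "abra".
    print(lce("abracadabra", 0, 7))
```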

- Alberto Policriti, Nicola Prezza
- 2016 Data Compression Conference (DCC)
- 2016

In this paper, we show that the LZ77 factorization of a text T ∈ Σⁿ can be computed in O(R log n) bits of working space and O(n log R) time, R being the number of runs in the Burrows-Wheeler transform of the reversed text T. For (extremely) repetitive inputs, the working space can be as low as O(log n) bits: exponentially smaller than the…
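
The two quantities in this abstract, the LZ77 factorization and the number R of runs in the BWT of the reversed text, can be computed naively as below for small inputs. This is a quadratic-time illustration, not the O(R log n)-bit algorithm of the paper. It assumes the LZ77 variant in which each phrase copies the longest previously occurring prefix of the unparsed text (the source may overlap the phrase) and then appends one explicit character, and it assumes a sentinel `$` smaller than every text character. All function names are hypothetical.

```python
def lz77(text: str):
    """Naive LZ77 parse into (source, length, next_char) triples.

    Each phrase copies the longest prefix of the unparsed suffix that also starts
    at an earlier position (overlap allowed), then appends one explicit character.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_src, best_len = 0, 0
        for src in range(i):
            length = 0
            # Stop one short of the end so that a literal character always follows.
            while i + length < n - 1 and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        phrases.append((best_src, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def bwt(text: str, sentinel: str = "$") -> str:
    """BWT by sorting all rotations of text + sentinel (O(n^2 log n), illustration only)."""
    s = text + sentinel
    rotations = sorted(s[k:] + s[:k] for k in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for k in range(len(s)) if k == 0 or s[k] != s[k - 1])


if __name__ == "__main__":
    t = "ab" * 8
    print("z =", len(lz77(t)), "R =", runs(bwt(t[::-1])))
```

On a highly repetitive input such as `"ab" * 8` the parse has only three phrases, which illustrates the gap between n and z (or R) that run-compressed algorithms exploit.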

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide…

- Alberto Policriti, Nicola Prezza
- SPIRE
- 2015

- Alberto Policriti, Nicola Prezza
- ArXiv
- 2016

In this paper we address the longest common extension (LCE) problem: to compute the length ℓ of the longest common prefix between any two suffixes of T ∈ Σⁿ with Σ = {0, ..., σ − 1}. We present two fast and space-efficient solutions based on (Karp-Rabin) fingerprinting and sampling. Our first data structure exploits properties of Mersenne prime numbers when…
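
A common way to answer LCE queries with Karp-Rabin fingerprinting is to binary-search on the length ℓ, comparing fingerprints of the two candidate substrings in O(1) time each. The sketch below assumes the Mersenne prime 2^61 − 1 as modulus (echoing the abstract's mention of Mersenne primes) but keeps all n + 1 prefix fingerprints, so it uses O(n) words rather than the sampled, space-efficient layout the paper proposes. Answers are correct with high probability; the class name `FingerprintLCE` is hypothetical.

```python
import random

MOD = (1 << 61) - 1  # Mersenne prime 2^61 - 1 (assumed modulus)


class FingerprintLCE:
    """LCE queries via Karp-Rabin prefix fingerprints and binary search.

    Stores n + 1 prefix fingerprints (O(n) words). A fingerprint collision can
    make an answer too large; with a random base this happens with small probability.
    """

    def __init__(self, text: str, seed=None):
        self.n = len(text)
        rng = random.Random(seed)
        self.base = rng.randrange(256, MOD - 1)  # random evaluation point
        self.prefix = [0] * (self.n + 1)  # prefix[k] = fingerprint of text[:k]
        self.power = [1] * (self.n + 1)   # power[k] = base^k mod MOD
        for k, ch in enumerate(text):
            self.prefix[k + 1] = (self.prefix[k] * self.base + ord(ch)) % MOD
            self.power[k + 1] = (self.power[k] * self.base) % MOD

    def fingerprint(self, start: int, length: int) -> int:
        """Fingerprint of text[start:start + length] in O(1) time."""
        end = start + length
        return (self.prefix[end] - self.prefix[start] * self.power[length]) % MOD

    def lce(self, i: int, j: int) -> int:
        """Largest l such that the length-l substrings at i and j have equal fingerprints."""
        lo, hi = 0, self.n - max(i, j)  # the answer lies in [lo, hi]
        # Equal substrings have equal prefixes, so the predicate is monotone in l
        # (barring collisions) and binary search applies.
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.fingerprint(i, mid) == self.fingerprint(j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo


if __name__ == "__main__":
    index = FingerprintLCE("abracadabra", seed=1)
    print(index.lce(0, 7))
```

Each query costs O(log n) fingerprint comparisons instead of O(ℓ) character comparisons.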

- Alberto Policriti, Nicola Prezza
- BMC Bioinformatics
- 2015

The high throughput of modern NGS sequencers, coupled with the huge sizes of the genomes currently analysed, poses ever-greater algorithmic challenges for aligning short reads quickly and accurately against a reference sequence. A crucial additional requirement is that the data structures used should be lightweight. The available modern solutions are usually a…

- Alberto Policriti, Nicola Gigante, Nicola Prezza
- LATA
- 2015

- Alberto Policriti, Nicola Prezza
- ISAAC
- 2014

We consider the problem of indexing a text T (of length n) with a light data structure that supports efficient search of patterns P (of length m) allowing errors under the Hamming distance. We propose a hash-based strategy that employs two classes of hash functions, dubbed Hamming-aware and de Bruijn, to drastically reduce search space and memory footprint of…
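
The search problem addressed here can be stated as a naive O(nm) scan, which is the baseline that hash-based indexes aim to beat. This sketch does not implement the Hamming-aware or de Bruijn hash functions; `hamming_occurrences` and the mismatch threshold `k` are illustrative names.

```python
def hamming_occurrences(text: str, pattern: str, k: int) -> list:
    """Start positions where pattern occurs in text with at most k mismatches (naive O(nm))."""
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment can no longer qualify
        if mismatches <= k:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(hamming_occurrences("ACGTACGA", "ACGA", 1))
```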

- Philip Bille, Inge Li Gørtz, Nicola Prezza
- 2017 Data Compression Conference (DCC)
- 2017

Re-Pair [5] is an effective grammar-based compression scheme achieving strong compression rates in practice. Let n, σ, and d be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and 5n + 4σ² + 4d +…
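
Re-Pair repeatedly replaces the most frequent pair of adjacent symbols with a fresh nonterminal until no pair occurs twice. Below is a naive, roughly quadratic-time rendering of that loop, using O(n) Python objects rather than the space-efficient construction the abstract refers to. It assumes terminals are single characters and nonterminals are integers, counts non-overlapping occurrences left to right (so `aaa` holds one copy of the pair `aa`), and breaks frequency ties by first occurrence; all function names are hypothetical.

```python
from collections import Counter


def pair_counts(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    last_start = {}
    for k in range(len(seq) - 1):
        pair = (seq[k], seq[k + 1])
        if last_start.get(pair) == k - 1:
            continue  # overlaps the previous occurrence of the same pair (e.g. inside "aaa")
        counts[pair] += 1
        last_start[pair] = k
    return counts


def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair, left to right, with symbol."""
    out, k = [], 0
    while k < len(seq):
        if k + 1 < len(seq) and (seq[k], seq[k + 1]) == pair:
            out.append(symbol)
            k += 2
        else:
            out.append(seq[k])
            k += 1
    return out


def repair(text):
    """Return (final sequence, rules) where rules[X] = (left, right) for nonterminal X."""
    seq, rules = list(text), {}
    while True:
        counts = pair_counts(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]  # ties: first-encountered pair wins
        if freq < 2:
            break  # no pair repeats: the grammar is final
        symbol = len(rules)  # fresh integer nonterminal
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the text it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


if __name__ == "__main__":
    seq, rules = repair("abababab")
    print(seq, rules)
```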