Optimal compressed representation of high throughput sequence data via light assembly

@article{Ginart2017OptimalCR,
  title={Optimal compressed representation of high throughput sequence data via light assembly},
  author={Antonio A. Ginart and Joseph Hui and Kun Zhu and Ibrahim Numanagi{\'c} and Thomas A. Courtade and S{\"u}leyman Cenk Sahinalp and David N. Tse},
  journal={Nature Communications},
  year={2017}
}
The most effective genomic data compression methods either assemble reads into contigs, or replace them with their alignment positions on a reference genome. Such methods require significant computational resources, but faster alternatives that avoid using explicit or de novo-constructed references fail to match their performance. Here, we introduce a new reference-free compressed representation for genomic data based on light de novo assembly of reads, where each read is represented as a node…
