Disentangled Long-Read De Bruijn Graphs via Optical Maps

Abstract

While long reads produced by third-generation sequencing technology from, e.g., Pacific Biosciences have been shown to increase the quality of draft genomes in repetitive regions, fundamental computational challenges remain in overcoming their high error rate and assembling them efficiently. In this paper we show that the de Bruijn graph built on the long reads can be efficiently and substantially disentangled using optical mapping data as auxiliary information. Fundamental to our approach is the use of the positional de Bruijn graph and a succinct data structure for constructing and traversing this graph. Our experimental results show that over 97.7% of directed cycles are removed from the resulting positional de Bruijn graph compared to its non-positional counterpart. Our results thus indicate that disentangling the de Bruijn graph using positional information is a promising direction for developing a simple and efficient assembly algorithm for long reads.

1998 ACM Subject Classification J.3 [Life and Medical Sciences] Biology and Genetics, G.2.2 Graph Theory
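To illustrate why positional information can remove directed cycles, the following is a minimal sketch and not the paper's implementation. It assumes each read comes with an exact start coordinate (in the paper, positional information is derived from optical mapping data; here it is simply given as input), and it uses plain dictionaries rather than a succinct data structure. The names `build_graph` and `has_directed_cycle`, and the toy genome, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_graph(reads, k, positional):
    """Build a k-mer graph from (start_position, sequence) reads.

    With positional=False, nodes are k-mers (a plain de Bruijn-style graph).
    With positional=True, nodes are (k-mer, position) pairs, so equal k-mers
    at different coordinates stay separate. Assumption: positions are exact;
    real data would need approximate positions merged within a tolerance.
    """
    graph = defaultdict(set)
    for start, seq in reads:
        prev = None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            node = (kmer, start + i) if positional else kmer
            graph[node]  # make sure every node appears as a key
            if prev is not None:
                graph[prev].add(node)
            prev = node
    return dict(graph)


def has_directed_cycle(graph):
    """Iterative three-color DFS; returns True if any directed cycle exists."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GRAY:  # back edge closes a cycle
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                color[node] = BLACK  # all successors explored
                stack.pop()
    return False


# Toy genome (hypothetical) in which "ACGT" occurs twice.
genome = "AACGTCCACGTGG"
# Two overlapping error-free reads with assumed known start positions.
reads = [(0, genome[0:9]), (4, genome[4:13])]

plain = build_graph(reads, k=3, positional=False)
positional = build_graph(reads, k=3, positional=True)

print("plain graph has a cycle:", has_directed_cycle(plain))
print("positional graph has a cycle:", has_directed_cycle(positional))
```

On this toy input the repeated k-mers collapse into shared nodes in the plain graph and create a cycle through the repeat, whereas in the positional graph every edge leads to a strictly larger coordinate, so no cycle can form. Error-prone reads and approximate positions, which the paper addresses, are outside the scope of this sketch.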

DOI: 10.4230/LIPIcs.WABI.2017.1

Cite this paper

@inproceedings{Alipanahi2017DisentangledLD,
  title={Disentangled Long-Read De Bruijn Graphs via Optical Maps},
  author={Bahar Alipanahi and Leena Salmela and Simon J. Puglisi and Martin D. Muggli and Christina Boucher},
  booktitle={WABI},
  year={2017}
}