Corpus ID: 16087181

Even better correction of genome sequencing data

@article{Dlugosz2017EvenBC,
  title={Even better correction of genome sequencing data},
  author={Maciej Dlugosz and Sebastian Deorowicz and Marek Kokot},
  journal={ArXiv},
  year={2017},
  volume={abs/1703.00690}
}
We introduce an improved version of RECKONER, an error corrector for Illumina whole genome sequencing data. By modifying its workflow we reduce the computation time by up to a factor of 10. We also propose a new method for determining the $k$-mer length, the key parameter of the $k$-spectrum-based family of correctors. The correction algorithms are examined on very large data sets, i.e., human and maize genomes sequenced on both Illumina HiSeq and MiSeq instruments.
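
As an illustration of what "$k$-spectrum-based" means in the abstract above, the following is a minimal sketch, not RECKONER's actual implementation. The function names `kmer_spectrum` and `weak_kmers`, the toy reads, and the fixed count threshold are all assumptions made for this example. It counts every $k$-mer in a set of reads and flags rarely seen $k$-mers as likely to contain sequencing errors, which is the general idea behind this family of correctors.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, threshold):
    """Return k-mers seen fewer than `threshold` times (hypothetical cutoff)."""
    return {kmer for kmer, count in counts.items() if count < threshold}


# Toy data (assumed): three identical reads plus one with a single substitution.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGAACGT"]
counts = kmer_spectrum(reads, 4)
suspect = weak_kmers(counts, 2)
```

In this toy example the single substituted base produces four low-count 4-mers, since one erroneous base can affect up to $k$ overlapping $k$-mers. This is one reason the choice of $k$ matters: a small $k$ makes erroneous $k$-mers more likely to coincide with genuine ones, while a large $k$ lowers the counts of correct $k$-mers.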

References

SHOWING 1-10 OF 24 REFERENCES
RECKONER: read error corrector based on KMC
TLDR
A new correction algorithm is introduced that can process high-error-rate eukaryotic data with a genome size close to 500 Mbp using less than 4 GB of RAM in about 35 min on a 16-core computer.
Correcting Illumina data
TLDR
A thorough comparison of the efficiency of the current state-of-the-art programs for correcting Illumina data and research directions for further improvement are provided.
Trowel: a fast and accurate error correction module for Illumina sequencing reads
TLDR
Trowel, a massively parallelized and highly efficient error correction module for Illumina read data that both corrects erroneous base calls and boosts base qualities based on the k-mer spectrum, achieves high accuracy for different short read sequencing applications.
RACER: Rapid and accurate correction of errors in reads
TLDR
This work proposes RACER (Rapid and Accurate Correction of Errors in Reads), a new software program for correcting errors in sequencing data that has better error-correcting performance than existing programs, is faster and requires less memory.
BFC: correcting Illumina sequencing errors
BFC is a free, fast and easy-to-use sequencing error corrector designed for Illumina short reads. It uses a non-greedy algorithm but still maintains a speed comparable to implementations
Mason – A Read Simulator for Second Generation Sequencing Data
TLDR
A read simulator software for Illumina, 454 and Sanger reads that has been written with performance in mind and can sample reads from large genomes.
ACE: accurate correction of errors using K-mer tries
TLDR
ACE, a tool based on K-mer tries, corrects substitution errors in Illumina archives and yields higher gains in terms of coverage depth, outperforming state-of-the-art competitors in the majority of cases.
Blue: correcting sequencing errors using consensus and context
TLDR
Blue is an error-correction algorithm based on k-mer consensus and context that can correct substitution, deletion and insertion errors, as well as uncalled bases, and is usable on large sequencing datasets.
Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data
TLDR
This article uses the k-mer spectrum approach and introduces three correction techniques in a multistage workflow: two-sided conservative correction, one-sided aggressive correction and voting-based refinement. The evaluation shows that Musket is consistently one of the top-performing correctors for Illumina short-read data.
Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data
TLDR
Karect is a novel error correction technique based on multiple alignment that supports substitution, insertion and deletion errors and can handle non-uniform coverage as well as moderately covered areas of the sequenced genome.