DNACompress: fast and effective DNA sequence compression

@article{Chen2002DNACompressFA,
  title={DNACompress: fast and effective DNA sequence compression},
  author={Xin Chen and Ming Li and Bin Ma and John Tromp},
  journal={Bioinformatics},
  year={2002},
  volume={18},
  number={12},
  pages={1696--1698}
}
While achieving the best compression ratios for DNA sequences, our new DNACompress program significantly improves the running time of all previous DNA compression programs. 

DNA Compression Challenge Revisited
Standard compression algorithms are not able to compress DNA sequences. Recently, new algorithms have been introduced specifically for this purpose, often using detection of long approximate repeats.
DNA Compression Challenge Revisited: A Dynamic Programming Approach
TLDR
This paper presents another algorithm, DNAPack, based on dynamic programming, which compresses DNA slightly better, while the cost of dynamic programming is almost negligible.
Dynamic Programming Based DNA Compression Algorithm through Substitution Method
In this paper, a DNA sequence compression algorithm based on a substitution mechanism is proposed. The field of bioinformatics research deals with huge DNA data, which needs to be compressed ...
A DNA sequence compression algorithm based on LUT and LZ77
  • Sheng Bao, Shi Chen, Z. Jing, Ran Ren
  • Computer Science, Mathematics
  • Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005.
  • 2005
TLDR
A new DNA sequence compression algorithm based on a LUT and LZ77 is introduced; it can reach a compression ratio of 1.9 bits/base or lower.
An Improvised DNA Sequence Compressor Using Pattern Recognition
TLDR
This paper presents an improvised version of the Pattern Recognition based DNA Sequence Compression (PRDNAC) algorithm, which compresses DNA sequences.
A Fixed-Length Coding Algorithm for DNA Sequence Compression
TLDR
A new algorithm codes non-N bases in fixed length, which dramatically reduces coding and decoding time compared with previous DNA compression algorithms and some universal compression programs.
A Lossless Compression Algorithm for DNA Sequences
TLDR
A Lossless Compression Algorithm (LCA) is proposed, providing a new encoding method that achieves a better compression ratio than that of existing DNA-oriented compression algorithms, when compared to GenCompress, DNACompress, and DNAPack.
A Novel Approach for Compressing DNA Sequences Using Semi-Statistical Compressor
TLDR
An algorithm for DNA sequence compression is presented; it uses a replacement method that is competent and useful for DNA chain compression and performs better than existing compressors on typical DNA sequence datasets.
Efficient Storage of Massive Biological Sequences in Compact Form
TLDR
A novel algorithm for DNA sequence compression makes use of a transformation and of statistical properties within the transformed sequence; it can search for patterns inside the compressed text, which is useful in knowledge discovery.
An efficient compressor for biological sequences
  • A. Gupta, K. K. Dubey
  • Computer Science
  • 2013 3rd IEEE International Advance Computing Conference (IACC)
  • 2013
TLDR
This paper introduces a state-of-the-art compressor for DNA sequences that makes use of a replacement method and is shown to outperform existing compressors on typical DNA sequence datasets.

References

A New Challenge for Compression Algorithms: Genetic Sequences
TLDR
A lossless algorithm is presented, biocompress-2, to compress the information contained in DNA and RNA sequences, based on the detection of regularities, such as the presence of palindromes, which leads to the highest compression of DNA.
Biological sequence compression algorithms.
TLDR
This paper improves the CTW (Context Tree Weighting) method so that it can exploit characteristic structures of DNA sequences, and achieves a slightly higher compression ratio than existing DNA-oriented compression algorithms.
Language trees and zipping.
TLDR
A very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series, based on data-compression techniques, featuring highly accurate results for language recognition, authorship attribution, and language classification.
PatternHunter: faster and more sensitive homology search
TLDR
A new homology search algorithm 'PatternHunter' is presented that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed.
On J. Goodman's comment to "Language Trees and Zipping"
Motivated by the recent submission to cond-mat archives by J. Goodman (cond-mat/0202383) whose results apparently discredit the approach we have proposed in a recent paper (Phys. Rev. Lett., 88, ...
An information-based sequence distance and its application to whole mitochondrial genome phylogeny
TLDR
A sequence distance that works on unaligned sequences using the information theoretical concept of Kolmogorov complexity and a program to estimate this distance is presented.
A theory of uncheatable program plagiarism detection and its practical implementation. SID website at http://dna.cs.ucsb.edu/SID
  • 2002
A compression algorithm for DNA sequences.
  • C. Xin, K. Sam, L. Ming
  • Medicine
  • IEEE engineering in medicine and biology magazine : the quarterly magazine of the Engineering in Medicine & Biology Society
  • 2001