Learn More
We report an improved draft nucleotide sequence of the 2.3-gigabase genome of maize, an important crop plant and model for biological research. Over 32,000 genes were predicted, of which 99.8% were placed on reference chromosomes. Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome.(More)
Phylogenetic inference is a <i>grand challenge</i> in Bioinformatics due to immense computational requirements. The increasing popularity of multi-gene alignments in biological studies, which typically provide a stable topological signal due to a more favorable ratio of the number of base pairs to the number of sequences, coupled with rapid accumulation of(More)
We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing(More)
MOTIVATION Error correction is critical to the success of next-generation sequencing applications, such as resequencing and de novo genome sequencing. It is especially important for high-throughput short-read sequencing, where reads are much shorter and more abundant, and errors more frequent than in traditional Sanger sequencing. Processing massive numbers(More)
Expressed sequence tags, abbreviated ESTs, are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and understanding important genetic variations such as those resulting in diseases. In this paper, we present the design and development of a parallel software system for EST clustering.(More)
Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large number of genes and gene expression datasets, more accurate models are compute intensive limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed(More)
We present practical parallel algorithms using prefix computations for various problems that arise in pairwise comparison of biological sequences. We consider both constant and affine gap penalty functions, full-sequence and subsequence matching, and space-saving algorithms. The best known sequential algorithms solve these problems in Omn time and Om + n(More)
UNLABELLED Error Correction is important for most next-generation sequencing applications because highly accurate sequenced reads will likely lead to higher quality results. Many techniques for error correction of sequencing data from next-gen platforms have been developed in the recent years. However, compared with the fast development of sequencing(More)