SUMMARY: SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed at an unprecedented rate. We used a Burrows-Wheeler Transformation (BWT) compression index to substitute the seed strategy for indexing the reference sequence in the main memory. We tested it...
MOTIVATION: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughputs and decrease in costs of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very...
The Constrained Multiple Sequence Alignment problem is to align a set of sequences subject to a given constrained sequence, which arises from some knowledge of the structure of the sequences. This paper presents new algorithms for this problem, which are more efficient in terms of time and space (memory) than the previous algorithms [14], and with a...
Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such...
Let G be a bipartite graph with positive integer weights on the edges and without isolated nodes. Let n, N and W be the node count, the largest edge weight and the total weight of G, respectively. Let k(x, y) be log x / log(x^2/y). We present a new decomposition theorem for maximum weight bipartite matchings and use it to design an O(√n W / k(n, W/N))-time algorithm for...
MOTIVATION: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings such as the human genome in the main memory. For example, a BWT index for the human genome (with about 3 billion characters) occupies just around 1 GB. However, these indexes are designed for exact pattern...
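As an illustration of the exact pattern matching that BWT-based indexes support, the following is a minimal Python sketch of backward search over a BWT. It is not the implementation from any of the papers listed here: the function names (`bwt`, `build_index`, `count_occurrences`) are hypothetical, the BWT is built naively by sorting rotations, and the occurrence table is stored uncompressed, so it shows the search logic only and none of the space savings of a real compressed index. It assumes the text and pattern do not contain the end marker `$`.

```python
from collections import Counter


def bwt(text):
    """Return the BWT of text, using '$' as a unique end marker.

    Naive construction by sorting all rotations; fine for short strings only.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def build_index(bwt_str):
    """Build the two tables needed for backward search.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    counts = Counter(bwt_str)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Count exact occurrences of pattern via backward search.

    n is the length of the BWT string (text length plus the end marker).
    """
    lo, hi = 0, n  # half-open range of matching rows in the sorted rotations
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("banana")
    C, occ = build_index(b)
    print(b, count_occurrences("ana", C, occ, len(b)))
```

The search processes the pattern from its last character to its first and performs a constant number of table lookups per character, so the matching time depends on the pattern length rather than on the text length.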
We consider online scheduling algorithms in the dynamic speed scaling model, where a processor can scale its speed between 0 and some maximum speed T. The processor uses energy at rate s^α when run at speed s, where α > 1 is a constant. Most modern processors use dynamic speed scaling to manage their energy usage. This leads to the problem of designing...
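The power function s^α can be made concrete with a small Python sketch. It computes the energy used to finish a fixed amount of work at a constant speed: the job takes work/s time at power s^α, so the energy is work · s^(α−1). The function name `energy` and the default α = 3 are assumptions chosen for illustration and are not taken from the paper, which only requires α > 1.

```python
def energy(work, speed, alpha=3.0):
    """Energy to process `work` units at constant `speed`.

    Power at speed s is s**alpha, and the job runs for work / speed time,
    so the total energy is work * speed**(alpha - 1).
    alpha=3.0 is an illustrative assumption; the model only needs alpha > 1.
    """
    if speed <= 0:
        raise ValueError("speed must be positive to finish any work")
    duration = work / speed
    return (speed ** alpha) * duration


if __name__ == "__main__":
    # Same work at two speeds: the faster run finishes sooner but costs more.
    print(energy(4, 1), energy(4, 2))
```

Because α > 1, running slower always saves energy for the same work, which is why a scheduler in this model has to trade energy against how quickly jobs complete.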
MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252 Gbps in 44.1 and 99.6 h on a single computing node with and without a graphics processing unit, respectively. MEGAHIT assembles the data as a whole, i.e. no pre-processing...