The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on…
SUMMARY: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ~10,000 times faster than BLASTX while achieving about one-third of its rate of assigning reads to KEGG orthology groups, and produces gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX. PAUDA…
BACKGROUND: DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. RESULTS: Here, we describe a method to detect copy number variation using shotgun sequencing…
This paper presents Salt, a distributed database that allows developers to improve the performance and scalability of their ACID applications through the incremental adoption of the BASE approach. Salt's motivation is rooted in the Pareto principle: for many applications, the transactions that actually test the performance limits of ACID are few. To…
Modeling peer-to-peer (P2P) networks is a challenge for P2P researchers. In this paper, we provide a detailed analysis of large-scale hybrid P2P overlay network topology, using Gnutella as a case study. First, we reexamine the power-law distributions of the Gnutella network discovered by previous researchers. Our results show that the current Gnutella…
This paper describes the design, implementation, and evaluation of Callas, a distributed database system that offers unmodified, transactional ACID applications the opportunity to achieve a level of performance that can currently be reached only by rewriting all or part of the application in a BASE/NoSQL style. The key to combining performance and ease…
A primary delay is a deviation from a scheduled process time caused by a disruption within the process. Delays are governed by the timetable yet occur randomly. A rail transit system is a complex system that is dynamic, nonlinear, self-adaptive, subject to random events, and controlled by a schedule. The multi-agent method enlarges the range…
During the past few years, unstructured peer-to-peer (P2P) file-sharing systems have seen a significant increase in popularity. However, there has been no systematic study of the graph properties of the overlay topology. In this paper, we use accurate snapshots of the Gnutella overlay that span roughly three years to explore changes in graph properties…