BACKGROUND: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several major challenges must still be overcome for such assembly to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity,…
An incremental updating technique is developed for maintaining the association rules discovered by database mining. There have been many studies on the efficient discovery of association rules in large databases. However, it is nontrivial to maintain such rules in large databases, because a database may allow frequent or occasional updates and such…
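The snippet does not describe the technique itself, so the following is only a minimal sketch of the general idea behind incremental maintenance, not the paper's algorithm. It relies on one property that follows directly from the support threshold: an itemset that was infrequent in the old database can only become frequent in the updated database if it is frequent within the newly added transactions. Stored counts therefore settle most candidates, and only the flagged ones need another pass over the old data. The function names (`count_support`, `incremental_update`) and the toy data are hypothetical.

```python
from collections import Counter


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for c in candidates:
            if c <= items:
                counts[c] += 1
    return counts


def incremental_update(old_frequent, old_size, increment, candidates, min_sup):
    """Re-evaluate candidates after appending `increment` (hypothetical sketch).

    old_frequent: {itemset: support count} for itemsets frequent in the old DB.
    Returns (frequent, needs_rescan):
      frequent: itemsets confirmed frequent from stored counts plus the increment
      needs_rescan: previously infrequent itemsets that might now be frequent;
                    only these require a pass over the old database.
    """
    inc_counts = count_support(increment, candidates)
    threshold = min_sup * (old_size + len(increment))
    frequent, needs_rescan = {}, set()
    for c in candidates:
        if c in old_frequent:
            total = old_frequent[c] + inc_counts[c]
            if total >= threshold:
                frequent[c] = total
        elif increment and inc_counts[c] >= min_sup * len(increment):
            # An itemset infrequent in the old DB can only become frequent
            # overall if it is frequent within the increment itself.
            needs_rescan.add(c)
        # Otherwise the itemset is pruned without touching the old DB.
    return frequent, needs_rescan


if __name__ == "__main__":
    old_db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
    increment = [{"b", "c"}, {"b", "c"}, {"c"}]
    cands = [frozenset(x) for x in ["a", "b", "c", "ab", "ac", "bc"]]
    old_counts = count_support(old_db, cands)
    old_frequent = {c: n for c, n in old_counts.items() if n >= 0.5 * len(old_db)}
    print(incremental_update(old_frequent, len(old_db), increment, cands, 0.5))
```

With a 50% threshold on this toy data, {b} and {c} are confirmed frequent from stored counts alone, while {b, c} (infrequent before the update but frequent in the increment) is the only itemset that needs the old database to be rescanned.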
In high-dimensional data, clusters can exist in subspaces that are hidden from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on user-supplied parameters to guide the clustering process. Clustering accuracy can be seriously degraded if incorrect parameter values are used. …
Many sequential algorithms have been proposed for mining association rules. However, very little work has been done on mining association rules in distributed databases. Directly applying sequential algorithms to distributed databases is not effective, because it incurs a large amount of communication overhead. In this study, an efficient algorithm,…
Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy in such a service model, we need to protect the data running on the platform. …
With many large transaction databases in existence, the huge amounts of data involved, the high scalability of distributed systems, and the ease of partitioning and distributing a centralized database, it is important to investigate efficient methods for distributed mining of association rules. This study discloses some interesting relationships between locally…
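The snippet is cut off before it states which relationships are meant, so the following is an assumption-labelled illustration rather than the paper's method. One relationship that always holds between local and global support is this: if an itemset falls below the support threshold at every site, its summed support falls below the threshold for the whole database, so every globally large itemset must be locally large at some site. The sketch simulates the sites in one process and uses this fact to limit which counts would need to be exchanged. All names (`local_counts`, `distributed_large_itemsets`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Support counts of every k-itemset occurring in one site's partition."""
    counts = Counter()
    for t in partition:
        for combo in combinations(sorted(t), k):
            counts[frozenset(combo)] += 1
    return counts


def distributed_large_itemsets(partitions, k, min_sup):
    """Find globally large k-itemsets across sites (simulated in one process)."""
    site_counts = [local_counts(p, k) for p in partitions]

    # Step 1: each site nominates only its locally large itemsets.
    candidates = set()
    for p, counts in zip(partitions, site_counts):
        candidates |= {x for x, n in counts.items() if n >= min_sup * len(p)}

    # Step 2: only the nominated candidates' counts are exchanged and summed.
    # Itemsets nominated by no site cannot be globally large, so they are skipped.
    total_size = sum(len(p) for p in partitions)
    large = {}
    for x in candidates:
        total = sum(counts[x] for counts in site_counts)
        if total >= min_sup * total_size:
            large[x] = total
    return large, candidates


if __name__ == "__main__":
    parts = [[{"a", "b"}, {"a", "b"}, {"c"}],
             [{"a", "c"}, {"b", "c"}, {"a", "b"}]]
    large, cands = distributed_large_itemsets(parts, 2, 0.4)
    print(large, cands)
```

In this toy run only {a, b} is nominated (by the first site), so the counts for {a, c} and {b, c} never need to be communicated, and the result still matches a centralized count over all six transactions.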
Data uncertainty is inherent in applications such as sensor monitoring systems, location-based services, and biological databases. To manage this vast amount of imprecise information, probabilistic databases have recently been developed. In this paper, we study the discovery of frequent patterns and association rules from probabilistic data under the…
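The snippet breaks off before naming its data model, so the following is a minimal sketch under an assumed one: each transaction maps items to existence probabilities, and items occur independently. Under that assumption the probability that a transaction contains an itemset is the product of its items' probabilities. Summing these gives the expected support, and a small dynamic program over transactions gives the full distribution of the support count, from which the probability that the itemset reaches a minimum count follows. The names and the toy database are hypothetical.

```python
import math


def containment_prob(transaction, itemset):
    """Probability that `itemset` appears in one uncertain transaction,
    assuming items occur independently with the given probabilities."""
    return math.prod(transaction.get(i, 0.0) for i in itemset)


def expected_support(db, itemset):
    """Expected number of transactions that contain `itemset`."""
    return sum(containment_prob(t, itemset) for t in db)


def support_distribution(db, itemset):
    """dist[k] = probability that exactly k transactions contain `itemset`."""
    dist = [1.0]
    for t in db:
        p = containment_prob(t, itemset)
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)      # this transaction lacks the itemset
            nxt[k + 1] += q * p        # this transaction contains it
        dist = nxt
    return dist


def frequent_probability(db, itemset, min_count):
    """Probability that the support count is at least `min_count`."""
    return sum(support_distribution(db, itemset)[min_count:])


if __name__ == "__main__":
    db = [{"a": 0.9, "b": 0.5}, {"a": 0.4}, {"a": 1.0, "b": 0.2}]
    print(expected_support(db, {"a", "b"}))
    print(frequent_probability(db, {"a", "b"}, 2))
```

For the toy database, {a, b} has containment probabilities 0.45, 0 and 0.2, so its expected support is 0.65 and the probability that it appears in at least two transactions is 0.09.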
What is an outlier? "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism." (Hawkins). One application of outlier detection is credit card fraud detection. Previous outlier detection schemes: • Clustering: outliers are generated as a by-product, and the outliers found are highly dependent on…
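The slide text stops before listing the schemes in full, so the following is not taken from it. As one common distance-based way to make Hawkins' definition concrete, a point can be scored by the distance to its k-th nearest neighbour, with the highest-scoring points reported as outliers. The brute-force search and the names `knn_outlier_scores` and `top_n_outliers` are illustrative assumptions; a real system would use an index for larger data.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour.

    Requires k < len(points). Brute force: O(n^2 log n).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points, k, n):
    """Indices of the n points whose k-NN distance is largest."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(pts, 2, 1))
```

Unlike clustering-based detection, this score does not depend on how clusters are formed, but it does depend on the choice of k.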
To cope with the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. By leveraging the computational power of both CPU and GPU with optimized algorithms, SOAP3-dp delivers high speed and high sensitivity simultaneously. Compared…
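The snippet does not describe SOAP3-dp's internals. As a generic illustration of the dynamic-programming alignment that short-read aligners use to score gapped alignments of a read against a reference, the following is a plain Smith-Waterman local alignment with a linear gap penalty. It is a single-threaded sketch, not SOAP3-dp's GPU implementation, and the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of `read` against `ref` and the (i, j)
    cell where it ends (1-based positions in read and ref)."""
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]   # H[i][j]: best score ending at (i, j)
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Local alignment: scores never drop below 0 (start afresh).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))
```

The table costs O(|read| × |ref|) time, which is why aligners typically run it only around candidate locations, and why offloading it to a GPU can pay off.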
With the standardization of XML as an information exchange language over the Internet, a huge amount of information is now formatted as XML documents. To analyze this information efficiently, it is common practice to decompose XML documents and store them in relational tables. However, query processing becomes expensive since, in many cases, an…
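The snippet is truncated before it explains the cost, so the following only illustrates one well-known decomposition and is not necessarily the one the paper studies. In an edge-style mapping, every element becomes a row holding its parent's id. A path query then needs one self-join per location step, which hints at why query processing over shredded XML can become expensive. The table name `edge`, its columns, and `PATH_QUERY` are hypothetical.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row: (id, parent id, tag, text content)."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, content TEXT)"
    )
    ids = itertools.count(1)

    def visit(elem, parent_id):
        node_id = next(ids)
        content = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?)",
            (node_id, parent_id, elem.tag, content),
        )
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)


# The path /library/book/title needs one self-join per location step.
PATH_QUERY = """
SELECT t.content
FROM edge AS l
JOIN edge AS b ON b.parent = l.id AND b.tag = 'book'
JOIN edge AS t ON t.parent = b.id AND t.tag = 'title'
WHERE l.parent IS NULL AND l.tag = 'library'
ORDER BY t.id
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred("<library><book><title>A</title></book>"
          "<book><title>B</title></book></library>", conn)
    print(conn.execute(PATH_QUERY).fetchall())
```

A three-step path already joins the table with itself twice. Longer paths, or descendant (`//`) steps that need recursive queries, make the generated SQL grow accordingly.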