BACKGROUND: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, …
An incremental updating technique is developed for maintenance of the association rules discovered by database mining. There have been many studies on efficient discovery of association rules in large databases. However, it is nontrivial to maintain such discovered rules in large databases because a database may allow frequent or occasional updates and such …
In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used.
With the existence of many large transaction databases, the huge amounts of data, the high scalability of distributed systems, and the easy partition and distribution of a centralized database, it is important to investigate efficient methods for distributed mining of association rules. This study discloses some interesting relationships between locally …
A more general incremental updating technique is developed for maintaining the association rules discovered in a database in the cases including insertion, deletion, and modification of transactions in the database. A previously proposed algorithm, FUP, can only handle the maintenance problem in the case of insertion. The proposed algorithm FUP2 makes use of …
Many sequential algorithms have been proposed for mining of association rules. However, very little work has been done in mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it requires a large amount of communication overhead. In this study, an efficient algorithm, …
Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy on such a service model, we need to protect the data running on the platform.
With the standardization of XML as an information exchange language over the Internet, a huge amount of information is formatted in XML documents. In order to analyze this information efficiently, decomposing the XML documents and storing them in relational tables is a popular practice. However, query processing becomes expensive since, in many cases, an …
Data uncertainty is inherent in applications such as sensor monitoring systems, location-based services, and biological databases. To manage this vast amount of imprecise information, probabilistic databases have been recently developed. In this paper, we study the discovery of frequent patterns and association rules from probabilistic data under the …
What is an outlier? "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism." (Hawkins) An application of outlier detection would be credit card fraud. Previous outlier detection schemes: clustering, which generates outliers as a by-product; outliers are highly dependent on …