The new interdisciplinary field of Data Mining emerged in the early 1990s as a response to the profusion of digital data generated in numerous fields such as biology, chemistry, astronomy, advertising, banking and finance, retail market, stock market, and the WWW. In this paper, I describe an undergraduate course in Data Mining offered at the College of …
Outlier detection can lead to the discovery of unexpected and interesting knowledge, which is critically important in areas such as monitoring criminal activities in electronic commerce, credit card fraud, and the like. In this work, we propose an outlier detection method, with clusters as a by-product, that works efficiently for large datasets. …
"One person's noise is another person's signal". Outlier detection is used to clean up datasets and also to discover useful anomalies, such as criminal activities in electronic commerce, computer intrusion attacks, terrorist threats, agricultural pest infestations, etc. Thus, outlier detection is critically important in the information-based society. This(More)
Data from genomic and proteomic experiments is accumulating rapidly, producing huge amounts of raw data. Consequently, the need to analyze such biological data, the understanding of which still lags far behind, has become prominent in the current post-genomic era. In this paper we attempt to analyze …
Data clustering has proven to be a promising data mining technique. Recently, there have been many attempts at clustering market-basket data. In this paper, we propose a parallelized hierarchical clustering approach on market-basket data (PH-Clustering), which is implemented using MPI. Based on the analysis of the major clustering steps, we adopt a …
In the early 1990s, considerable research was conducted in the area of multi-level secure database systems. Most of the work was directed towards the security aspect, with little attention to query acceleration. In this paper, a P-tree [1] based algorithm using the Sea View model [5] for multilevel relations is presented to accelerate queries in multilevel …
Association rule mining (ARM) finds all the association rules in data that satisfy some measures of interest, such as support and confidence. In certain situations where high support is not necessarily of interest, fixed-consequent association rule mining for confident rules might be favored over traditional ARM. The need for fixed-consequent ARM is becoming …
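To make the support and confidence measures named in the abstract above concrete, here is a minimal Python sketch. It is not the method of the paper, whose details are truncated here. The function names, the toy `baskets` data, and the restriction to single-item antecedents are all assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, both) / base


def fixed_consequent_rules(transactions, consequent, min_confidence):
    """Hypothetical helper: keep rules {item} -> {consequent} by confidence only.

    Only single-item antecedents are tried (an illustrative simplification),
    and no minimum support is applied, mirroring the case where high support
    is not of interest.
    """
    items = sorted({i for t in transactions for i in t} - {consequent})
    rules = []
    for item in items:
        c = confidence(transactions, [item], [consequent])
        if c >= min_confidence:
            rules.append((item, consequent, c))
    return rules


# Toy market-basket data (made up for this example).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    print(support(baskets, {"bread", "milk"}))
    print(confidence(baskets, {"bread"}, {"milk"}))
    print(fixed_consequent_rules(baskets, "milk", 0.6))
```

In this toy data, {bread, milk} appears in two of the four baskets and bread in three, so the rule bread -> milk has support 0.5 and confidence 2/3. Fixing the consequent to "milk" and filtering by confidence alone keeps that rule and drops butter -> milk, whose confidence is 0.5.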
Vast amounts of information available online make plagiarism increasingly easy to commit, and this is particularly true of source code. The traditional approach to detecting copied work in a course setting is manual inspection. This is not only tedious but also typically misses code plagiarized from outside sources or even from an earlier offering of the …