Learn More
Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover(More)
in a database. For many reasons—encoding errors, measurement errors, unrecorded causes of recorded features—the information in a database is almost always noisy; therefore, inference from databases invites applications of the theory of probability. From a statistical point of view, databases are usually uncontrolled convenience samples; therefore data(More)
A feature of data mining that distinguishes it from " classical " machine learning (ML) and statistical modeling (SM) is scale. The community seems to agree on this yet progress to this point has been limited. We present a methodology that addresses scale in a novel fashion that has the potential for revolutionizing the field. While the methodology applies(More)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data(More)
The quest to nd models usefully characterizing data is a process central to the scientiic method, and has been carried out on many fronts. Researchers from an expanding number of elds have designed algorithms to discover rules or equations that capture key relationships between variables in a database. The task of this chapter is to provide a perspective on(More)