Learn More
Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover(More)
A feature of data mining that distinguishes it from " classical " machine learning (ML) and statistical modeling (SM) is scale. The community seems to agree on this yet progress to this point has been limited. We present a methodology that addresses scale in a novel fashion that has the potential for revolutionizing the field. While the methodology applies(More)
in a database. For many reasons—encoding errors, measurement errors, unrecorded causes of recorded features—the information in a database is almost always noisy; therefore, inference from databases invites applications of the theory of probability. From a statistical point of view, databases are usually uncontrolled convenience samples; therefore data(More)
The goal of this thesis is to modify projection pursuit by trading accuracy for interpretability. The modification produces a more parsimonious and understandable model without sacrificing the structure which projection pursuit seeks. The method retains the nonlinear versatility of projection pursuit while clarifying the results. Following an introduction(More)
We propose a methodology for assessing how ad campaigns in offline media such as print, audio and TV affect online interest in the advertiser's brand. Online interest can be measured by daily counts of the number of search queries that contain brand related keywords, by the number of visitors to the advertiser's web pages, by the number of pageviews at the(More)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data(More)
The quest to nd models usefully characterizing data is a process central to the scientiic method, and has been carried out on many fronts. Researchers from an expanding number of elds have designed algorithms to discover rules or equations that capture key relationships between variables in a database. The task of this chapter is to provide a perspective on(More)