We consider problems that can be characterized by large dynamic graphs. Communication networks provide the prototypical example of such problems, where nodes in the graph are network IDs and the edges represent communication between pairs of network IDs. In such graphs, nodes and edges appear and disappear over time, so that methods that apply to static ...
Logistic regression-type models are used in many applications. Some examples include the classical dose-response experiment, prospective and retrospective studies of disease incidence (with and without matching), and the analysis of ordinal data. In most instances, the model is fitted by the method of maximum likelihood, which, like least squares, is ...
Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover ...
A feature of data mining that distinguishes it from "classical" machine learning (ML) and statistical modeling (SM) is scale. The community seems to agree on this, yet progress to this point has been limited. We present a methodology that addresses scale in a novel fashion that has the potential for revolutionizing the field. While the methodology applies ...
... in a database. For many reasons (encoding errors, measurement errors, unrecorded causes of recorded features) the information in a database is almost always noisy; therefore, inference from databases invites applications of the theory of probability. From a statistical point of view, databases are usually uncontrolled convenience samples; therefore data ...
We propose a methodology for assessing how ad campaigns in offline media such as print, audio and TV affect online interest in the advertiser's brand. Online interest can be measured by daily counts of the number of search queries that contain brand-related keywords, by the number of visitors to the advertiser's web pages, by the number of pageviews at the ...