David D. Jensen

Disruption-tolerant networks (DTNs) attempt to route network messages via intermittently connected nodes. Routing in such environments is difficult because peers have little information about the state of the partitioned network and transfer opportunities between peers are of limited duration. In this paper, we propose MaxProp, a protocol for effective …
We identify privacy risks associated with releasing network datasets and provide an algorithm that mitigates those risks. A network dataset is a graph representing entities connected by edges representing relations such as friendship, communication or shared activity. Maintaining privacy when publishing a network dataset is uniquely challenging because an …
Relational data offer a unique opportunity for improving the classification accuracy of statistical models. If two objects are related, inferring something about one object can aid inferences about the other. We present an iterative classification procedure that exploits this characteristic of relational data. This approach uses simple Bayesian classifiers …
Advances in technology have made it possible to collect data about individuals and the connections between them, such as email correspondence and friendships. Agencies and researchers who have collected such social network data often have a compelling interest in allowing others to analyze the data. However, in many cases the data describes relationships …
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive sampling, that is, using progressively larger samples as long as model accuracy improves. We …
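The idea of progressive sampling described above can be illustrated with a minimal Python sketch. It is not the procedure from the paper: the geometric schedule (`start`, `factor`), the stopping threshold `min_gain`, and the `train_and_score` callback are all assumptions made for illustration, and `toy_score` is a hypothetical stand-in for training and evaluating a real model.

```python
import random


def progressive_sample(data, train_and_score, start=100, factor=2,
                       min_gain=0.005, seed=0):
    """Grow the training sample until accuracy stops improving.

    train_and_score is a hypothetical callback: it takes a list of
    instances and returns an accuracy estimate in [0, 1].
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)  # each prefix is then a random sample

    n = min(start, len(shuffled))
    best_n, best_acc = n, train_and_score(shuffled[:n])

    while n < len(shuffled):
        # Assumed geometric schedule: multiply the sample size each round.
        n = min(n * factor, len(shuffled))
        acc = train_and_score(shuffled[:n])
        if acc - best_acc < min_gain:
            break  # the larger sample did not help enough; stop here
        best_n, best_acc = n, acc

    return best_n, best_acc


def toy_score(sample):
    # Stand-in learning curve that flattens as the sample grows.
    return 1.0 - 1.0 / (1 + len(sample)) ** 0.5


if __name__ == "__main__":
    size, accuracy = progressive_sample(range(100_000), toy_score)
    print(size, round(accuracy, 4))
```

With a real learner, `train_and_score` would fit a model on the sample and score it on held-out data; the loop itself stays the same.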
Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogeneous and independently distributed. Relational probability trees …
Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational datasets. For example, in citation data there are …
This paper evaluates several modifications of the Simple Bayesian Classifier to enable estimation and inference over relational data. The resulting Relational Bayesian Classifiers are evaluated on three real-world datasets and compared to a baseline SBC using no relational information. The approach we call INDEPVAL achieves the best results. We use …
Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could be used to simultaneously classify a set of hyperlinked documents or infer the legitimacy of a set of related financial transactions. Several recent studies indicate that …
We describe an efficient algorithm for releasing a provably private estimate of the degree distribution of a network. The algorithm satisfies a rigorous property of differential privacy, and is also extremely efficient, running on networks of 100 million nodes in a few seconds. Theoretical analysis shows that the error scales linearly with the number of …
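As a rough illustration of what a differentially private degree-distribution release can look like, the sketch below applies the standard Laplace mechanism to a degree histogram. This is a generic baseline, not the algorithm the abstract describes, whose details are truncated here. It assumes edge-level privacy, where neighboring graphs differ in one edge: adding or removing an edge moves two nodes to adjacent bins, so at most four counts change by one and the L1 sensitivity is 4. The function and parameter names are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_degree_histogram(degrees, epsilon, rng=None):
    """Return noisy counts for degrees 0 .. n-1 (n = number of nodes).

    Assumes edge-level differential privacy, under which the degree
    histogram has L1 sensitivity 4, so the Laplace scale is 4 / epsilon.
    """
    rng = rng or random.Random()
    n = len(degrees)
    counts = Counter(degrees)
    scale = 4.0 / epsilon
    # Fixed bin range 0 .. n-1 so the output shape does not depend on
    # the private degrees themselves.
    return [counts.get(d, 0) + laplace_noise(scale, rng) for d in range(n)]


if __name__ == "__main__":
    # Degrees of a small path graph on 5 nodes: 1, 2, 2, 2, 1.
    degrees = [1, 2, 2, 2, 1]
    print(noisy_degree_histogram(degrees, epsilon=1.0, rng=random.Random(0)))
```

Noise of this kind is added independently to every bin, so the raw output can contain negative or fractional counts; any post-processing to clean it up does not affect the privacy guarantee.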