Kamalika Chaudhuri

Learn More
Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under(More)
The contingency table is a work horse of official statistics, the format of reported data for the US Census, Bureau of Labor Statistics, and the Internal Revenue Service. In many settings such as these privacy is not only ethically mandated, but frequently legally as well. Consequently there is an extensive and diverse literature dedicated to the problems(More)
Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such(More)
This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [6] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized(More)
In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The(More)
We give improved approximation algorithms for a variety of latency minimization problems. In particular, we give a 3.591-approximation to the minimum latency problem, improving on previous algorithms by a multiplicative factor of 2. Our techniques also give similar improvements for related problems like k-traveling repairmen and its multiple depot variant.(More)
For a density f on R, a high-density cluster is any connected component of {x : f(x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the(More)
We analyze replication of resources by server nodes that act selfishly, using a game-theoretic approach. We refer to this as the <i>selfish caching problem</i>. In our model, nodes incur either cost for replicating resources or cost for access to a remote replica. We show the existence of pure strategy Nash equilibria and investigate the price of anarchy,(More)
<lb>This work considers the problem of learning the structure of multivariate linear tree models, which<lb>include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables<lb>such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolu-<lb>tionary trees. The setting is one where we(More)