Pradeep Ravikumar

Learn More
Using an open-source, Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators , token-based distance metrics, and hybrid methods. Overall,(More)
High-dimensional statistical inference deals with models in which the the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse and(More)
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on l1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an l1-constraint. The method is analyzed under high-dimensional scaling, in which both the(More)
Given i.i.d. observations of a random vector X ∈ R, we study the problem of estimating both its covariance matrix Σ, and its inverse covariance or concentration matrix Θ = (Σ). When X is multivariate Gaussian, the non-zero structure of Θ is specified by the graph of an associated Gaussian Markov random field; and a popular estimator for such sparse Θ is the(More)
The l1 regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a(More)
We consider multi-task learning in the setting of multiple linear regression, and where some relevant features could be shared across the tasks. Recent research has studied the use of l1/lq norm block-regularizations with q > 1 for such blocksparse structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the(More)
more complex examples of duplicate records that are not identical. Variations in representation across information sources can arise from differences in formats that store data, typographical and optical character recognition (OCR) errors, and abbreviations. Variations are particularly pronounced in data that’s automatically extracted from Web pages and(More)
We describe an open-source Java toolkit of methods for matching names and records. We summarize results obtained from using various string distance metrics on the task of matching entity names. These metrics include distance functions proposed by several different communities, such as edit-distance metrics, fast heuristic string comparators, token-based(More)
In this paper, we theoretically study the problem of binary classification in the presence of random classification noise — the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional — the flip probability depends on the class. We provide two(More)
Quadratic program relaxations are proposed as an alternative to linear program relaxations and tree reweighted belief propagation for the metric labeling or MAP estimation problem. An additional convex relaxation of the quadratic approximation is shown to have additive approximation guarantees that apply even when the graph weights have mixed sign or do not(More)