Publications
Optimal kernel choice for large-scale two-sample tests
TLDR
The new kernel selection approach yields a more powerful test than earlier kernel selection heuristics, and makes the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory.
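The statistic underlying this line of work is the maximum mean discrepancy (MMD). The paper's contribution is choosing the kernel to maximize test power in a streaming, linear-time setting; as a minimal sketch of the underlying quantity only, the following computes a biased quadratic-time MMD² estimate with a fixed Gaussian kernel (the bandwidth here is an arbitrary choice, not the paper's selection criterion):

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples X and Y."""
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as X
Y_diff = rng.normal(2.0, 1.0, size=(200, 2))   # mean-shifted distribution
# Samples from the same distribution yield a much smaller statistic.
mmd_same = mmd2_biased(X, Y_same)
mmd_diff = mmd2_biased(X, Y_diff)
```

A test based on this statistic compares it against a null distribution (e.g., from permutations) to decide whether the two samples differ.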
Statistical guarantees for the EM algorithm: From population to sample-based analysis
TLDR
A general framework is developed for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM, along with consequences of the general theory for three canonical examples of incomplete-data problems.
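The paper analyzes EM in the abstract; as a concrete reminder of the iteration being analyzed, here is a minimal sketch of EM for one canonical incomplete-data problem, a two-component 1D Gaussian mixture with unit variances (the model choice and initialization are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def em_gmm_1d(x, mu_init, n_iter=50):
    """EM for a two-component 1D Gaussian mixture with unit variances.
    Estimates the two component means and the mixing weight."""
    mu = np.array(mu_init, dtype=float)
    pi = 0.5
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 0 for each point.
        p0 = pi * np.exp(-0.5 * (x - mu[0])**2)
        p1 = (1 - pi) * np.exp(-0.5 * (x - mu[1])**2)
        r = p0 / (p0 + p1)
        # M-step: responsibility-weighted updates of the parameters.
        pi = r.mean()
        mu[0] = (r * x).sum() / r.sum()
        mu[1] = ((1 - r) * x).sum() / (1 - r).sum()
    return mu, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
mu, pi = em_gmm_1d(x, mu_init=[-1.0, 1.0])
```

The paper's question is when and how fast iterations like these converge to a neighborhood of the true parameters, first at the population level and then with finite samples.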
Learning generative models for protein fold families
TLDR
A new approach to learning statistical models from multiple sequence alignments (MSAs) of proteins, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA, encoding both position-specific conservation statistics and correlated mutation statistics between sequential and long-range pairs of residues.
Confidence sets for persistence diagrams
TLDR
This paper derives confidence sets that allow us to separate topological signal from topological noise, and brings some statistical ideas to persistent homology.
Robust estimation via robust gradient estimation
TLDR
The workhorse is a novel robust variant of gradient descent, and the conditions under which this gradient descent variant provides accurate estimators in a general convex risk minimization problem are provided.
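One simple instance of a robust gradient estimator is a coordinate-wise trimmed mean of the per-sample gradients; the sketch below plugs that into gradient descent for least squares with gross response outliers. The trimming fraction, step size, and the choice of trimmed mean are illustrative assumptions, not necessarily the estimator the paper analyzes:

```python
import numpy as np

def trimmed_mean(G, frac=0.1):
    """Coordinate-wise trimmed mean: drop the smallest and largest
    `frac` fraction of entries in each column, average the rest."""
    n = G.shape[0]
    k = int(frac * n)
    G_sorted = np.sort(G, axis=0)
    return G_sorted[k:n - k].mean(axis=0)

def robust_gd(X, y, step=0.1, n_iter=200, frac=0.1):
    """Gradient descent for least squares, with the sample mean of the
    per-example gradients replaced by a robust (trimmed-mean) estimate."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Per-example gradients of the squared loss, one row per example.
        G = (X @ theta - y)[:, None] * X
        theta -= step * trimmed_mean(G, frac)
    return theta

rng = np.random.default_rng(2)
n, d = 500, 3
theta_star = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_star + 0.1 * rng.normal(size=n)
y[:25] += 50.0                 # gross outliers in 5% of the responses
theta_hat = robust_gd(X, y)
```

The outliers produce extreme per-example gradient entries, which the trimming step discards before each descent update.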
Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues
TLDR
This paper studies a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity, and proposes and studies algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class.
Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence
TLDR
This work considers parametric ordinal models for pairwise comparison data involving a latent vector w* ∈ R^d that represents the "qualities" of the d items being compared; this class of models includes the two most widely used parametric models: the Bradley-Terry-Luce (BTL) and Thurstone models.
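Under the BTL model mentioned above, item i beats item j with probability exp(w_i) / (exp(w_i) + exp(w_j)), and the scores are identifiable only up to an additive constant. The following is a minimal sketch of maximum-likelihood estimation by gradient ascent on simulated comparison data; the step size and iteration count are illustrative assumptions, not the estimators studied in the paper:

```python
import numpy as np

def btl_mle(wins, step=0.5, n_iter=500):
    """Gradient ascent on the Bradley-Terry-Luce log-likelihood.
    wins[i, j] = number of times item i beat item j."""
    d = wins.shape[0]
    w = np.zeros(d)
    for _ in range(n_iter):
        # BTL win probabilities p[i, j] = P(i beats j) at the current w.
        p = np.exp(w)[:, None] / (np.exp(w)[:, None] + np.exp(w)[None, :])
        n_games = wins + wins.T
        grad = (wins - n_games * p).sum(axis=1)
        w += step * grad / n_games.sum()
        w -= w.mean()          # center: scores identifiable up to a shift
    return w

rng = np.random.default_rng(3)
w_star = np.array([1.0, 0.0, -1.0])
p_star = np.exp(w_star)[:, None] / (np.exp(w_star)[:, None] + np.exp(w_star)[None, :])
n_per_pair = 2000
wins = np.zeros((3, 3))
for i in range(3):
    for j in range(i + 1, 3):
        wins[i, j] = rng.binomial(n_per_pair, p_star[i, j])
        wins[j, i] = n_per_pair - wins[i, j]
w_hat = btl_mle(wins)
```

The point of the paper is that minimax estimation rates for such models depend on the comparison topology, i.e., which pairs are compared and how often.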
Minimax Localization of Structural Information in Large Noisy Matrices
TLDR
The SNR required by several computationally tractable procedures for biclustering including element-wise thresholding, column/row average thresholding and a convex relaxation approach to sparse singular vector decomposition is characterized.
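Element-wise thresholding, the simplest of the procedures analyzed, can be sketched in a few lines: keep the entries exceeding a threshold and return the rows and columns they touch. The planted-submatrix setup and threshold value below are illustrative assumptions:

```python
import numpy as np

def bicluster_threshold(M, tau):
    """Element-wise thresholding for biclustering: keep entries above tau,
    then return the rows/columns containing at least one surviving entry."""
    mask = M > tau
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return rows, cols

rng = np.random.default_rng(4)
n = 50
M = rng.normal(size=(n, n))
M[:10, :10] += 6.0             # planted 10x10 bicluster with strong signal
rows, cols = bicluster_threshold(M, tau=4.5)
```

The paper characterizes how large the signal must be, relative to the noise, for procedures like this one (and more sophisticated convex relaxations) to localize the planted block.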
Noise Thresholds for Spectral Clustering
TLDR
The performance of a spectral algorithm for hierarchical clustering is analyzed and it is shown that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability.
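The core spectral step behind such algorithms is splitting items by the sign of the second eigenvector (the Fiedler vector) of the graph Laplacian of the similarity matrix. The sketch below shows one two-way split on a noisy block similarity matrix; it is a minimal illustration, not the paper's full hierarchical algorithm, and the block construction is an assumed toy example:

```python
import numpy as np

def spectral_split(W):
    """Split items into two clusters using the sign of the second
    eigenvector of the unnormalized graph Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                # second-smallest eigenvalue's vector
    return (fiedler > 0).astype(int)

rng = np.random.default_rng(5)
n = 40
# Block similarity matrix: high within clusters, low across, plus noise.
W = 0.1 + 0.1 * rng.random((n, n))
W[:20, :20] += 0.8
W[20:, 20:] += 0.8
W = (W + W.T) / 2                          # symmetrize
np.fill_diagonal(W, 1.0)
labels = spectral_split(W)
```

Applying such splits recursively recovers a hierarchy; the paper's result bounds how much noise the similarity entries can carry while the recursion still recovers every level exactly.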
Computationally Efficient Robust Sparse Estimation in High Dimensions
TLDR
The theory identifies a unified set of deterministic conditions under which the algorithm guarantees accurate recovery of sparse functionals, and provides a novel algorithm, based on the same intuition, that exploits further structure of the problem to achieve nearly optimal rates.
...