Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters
TLDR
This paper employs approximation algorithms for the graph-partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities, and defines the network community profile plot, which characterizes the "best" possible community—according to the conductance measure—over a wide range of size scales.
On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
TLDR
An algorithm to compute an easily-interpretable low-rank approximation to an n x n Gram matrix G such that computations of interest may be performed more rapidly.
Empirical comparison of algorithms for network community detection
TLDR
Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
Statistical properties of community structure in large social and information networks
TLDR
It is found that a generative model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community structure similar to that observed in nearly every network dataset examined.
CUR matrix decompositions for improved data analysis
TLDR
An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.
Relative-Error CUR Matrix Decompositions
TLDR
These two algorithms are the first polynomial-time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.
Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication
TLDR
A model (the pass-efficient model) is presented in which the efficiency of these and other approximate matrix algorithms may be studied and which, it is argued, is well suited to many applications involving massive data sets.
A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions
The ability of simple potential functions to reproduce accurately the density of liquid water from −37 to 100 °C at 1 to 10 000 atm has been further explored. The result is the five-site TIP5P model,
Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix
TLDR
Two simple and intuitive algorithms are presented which compute a description of a low-rank approximation to an m x n matrix, of rank not greater than a specified rank, and which are qualitatively faster than the singular value decomposition (SVD).
An improved approximation algorithm for the column subset selection problem
TLDR
A novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m x k matrix C consisting of exactly k columns of A, and it is proved that the spectral norm bound improves upon the best previously-existing result and is roughly O(√(k!)) better than the best previous algorithmic result.
...