Variational perspective on local graph clustering

Authors: Kimon Fountoulakis, Farbod Roosta-Khorasani, Julian Shun, Xiang Cheng, Michael W. Mahoney
Journal: Mathematical Programming
Abstract: Modern graph clustering applications require the analysis of large graphs and this can be computationally expensive. In this regard, local spectral graph clustering methods aim to identify well-connected clusters around a given “seed set” of reference nodes without accessing the entire graph. The celebrated Approximate Personalized PageRank (APPR) algorithm in the seminal paper by Andersen et al. (in: FOCS ’06 proceedings of the 47th annual IEEE symposium on foundations of computer…

p-Norm Flow Diffusion for Local Graph Clustering

This work proposes a family of convex optimization formulations for local clustering based on the idea of diffusion with p-norm network flow, characterizes the optimal solutions of these optimization problems, and demonstrates their usefulness in finding low-conductance cuts around an input seed set.

Weighted flow diffusion for local graph clustering with node attributes: an algorithm and statistical guarantees

This work presents a simple local graph clustering algorithm for graphs with node attributes, based on the idea of diffusing mass locally in the graph while accounting for both structural and attribute proximities, and shows that incorporating node attributes leads to superior local clustering performance on real-world graph datasets.

Statistical guarantees for local graph clustering

It is shown that l1-regularized PageRank and approximate personalized PageRank (APPR), another very popular method for local graph clustering, are equivalent in the sense that one can lower and upper bound the output of one with the output of the other.

Multiway p-spectral graph cuts on Grassmann manifolds

This work presents a novel direct multiway spectral clustering algorithm in the p-norm, a nonlinear generalization of the standard graph Laplacian, recast as an unconstrained minimization problem on a Grassmann manifold, and demonstrates the effectiveness and accuracy of the algorithm in various artificial test cases.

Edge-based Local Push for Personalized PageRank

The proposed EdgePush algorithm is a novel method for computing single-source personalized PageRank (SSPPR) queries on weighted graphs. It decomposes push operations into edge-based pushes, allowing the algorithm to operate at edge-level granularity and to distribute probabilities flexibly according to edge weights.

Residual2Vec: Debiasing graph embedding with random graphs

This work investigates the impact of the random walks’ bias on graph embedding and proposes residual2vec, a general graph embedding method that can debias various structural biases in graphs by using random graphs, and demonstrates that this debiasing not only improves link prediction and clustering performance but also allows us to explicitly model salient structural properties in graphs.

Flow-based Algorithms for Improving Clusters: A Unifying Framework, Software, and Performance

A unifying fractional programming optimization framework is presented that permits us to distill out in a simple way the crucial components of all these cluster improvement algorithms, and makes apparent similarities and differences between related methods.

Semi-supervised Local Cluster Extraction by Compressive Sensing

This paper proposes a new semi-supervised local cluster extraction approach that applies compressive sensing, building on two pioneering works under the same framework. It improves on those works by taking the initial cut to be the entire graph, which overcomes one of their major limitations.

Towards Training Graph Neural Networks with Node-Level Differential Privacy

This work adopts a training framework that uses personalized PageRank to decouple the message-passing process from feature aggregation when training GNN models, and proposes differentially private PageRank algorithms to formally protect graph topology information.

Targeted pandemic containment through identifying local contact network bottlenecks

This paper proposes a new flow-based edge-betweenness centrality method for detecting bottleneck edges that connect communities in contact networks and demonstrates empirically that the proposed method is orders of magnitude faster than existing methods.



Exploiting Optimization for Local Graph Clustering

This work clarifies the relationship between the local spectral algorithm of (Andersen, Chung and Lang, FOCS '06) and a variant of a well-studied optimization objective and develops a local spectral graph clustering algorithm that has improved theoretical convergence properties.

A local spectral method for graphs: with applications to improving graph partitions and exploring data graphs locally

This paper introduces a locally-biased analogue of the second eigenvector of the Laplacian matrix, demonstrates its usefulness at highlighting local properties of data graphs in a semi-supervised manner, and shows how it can be applied to finding locally-biased sparse cuts around an input vertex seed set in social and information networks.

Flow-Based Algorithms for Local Graph Clustering

This work shows how to use LocalImprove to obtain a constant approximation O(OPT) as long as CONN/OPT = Omega(1), yielding the first flow-based algorithm. It shows that spectral methods are not the only viable approach to the construction of local graph partitioning algorithms, and opens the door to the study of algorithms with even better approximation and locality guarantees.

A Simple and Strongly-Local Flow-Based Method for Cut Improvement

This work introduces and analyzes a new method for locally-biased graph-based learning called SimpleLocal, which finds good conductance cuts near a set of seed vertices and achieves localization through an implicit L1-norm penalty term.

Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow

A case study of approximation algorithms for finding locally-biased partitions in data graphs, demonstrating connections between min-cut objectives, a personalized version of the popular PageRank vector, and the highly effective "push" procedure for computing an approximation to personalized PageRank.

A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly Linear Time Graph Partitioning

This work presents a local clustering algorithm, a useful primitive for handling massive graphs, such as social networks and web-graphs, that finds a good cluster---a subset of vertices whose internal connections are significantly richer than its external connections---near a given vertex.

Heat kernel based community detection

This work presents the first deterministic, local algorithm to compute the heat kernel graph diffusion and uses that algorithm to study the communities that it produces, indicating that the communities produced by this method have better conductance than those produced by PageRank, although they take slightly longer to compute.

Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters

This paper employs approximation algorithms for the graph-partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities, and defines the network community profile plot, which characterizes the "best" possible community—according to the conductance measure—over a wide range of size scales.
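
For reference, the conductance measure and the network community profile mentioned above are standardly defined as follows. The symbols (S, V, E, vol, φ, Φ) are notation chosen here for illustration and assume an undirected, unweighted graph; they are not taken from this listing.

```latex
\phi(S) = \frac{\bigl|\{(u,v) \in E : u \in S,\ v \notin S\}\bigr|}
               {\min\{\operatorname{vol}(S),\ \operatorname{vol}(V \setminus S)\}},
\qquad
\operatorname{vol}(S) = \sum_{u \in S} \deg(u),
\qquad
\Phi(k) = \min_{S \subset V,\ |S| = k} \phi(S).
```

The network community profile plot graphs Φ(k) against the size k, so a dip in the curve indicates a size scale at which a low-conductance set exists.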

Isoperimetric Partitioning: A New Algorithm for Graph Partitioning

It is shown empirically that this algorithm is competitive with other global partitioning algorithms in terms of partition quality, is easy to parallelize, does not require coordinate information, and handles nonplanar graphs, weighted graphs, and families of graphs which are known to cause problems for other methods.

Local Graph Partitioning using PageRank Vectors

An improved algorithm for computing approximate PageRank vectors, which allows us to find a cut with conductance at most φ and approximately optimal balance in time O(m log^4 m/φ) in time proportional to its size.
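
As a rough illustration of the idea summarized above, here is a minimal sketch of a push-style approximate personalized PageRank computation followed by a sweep cut. It assumes an undirected, unweighted simple graph stored as an adjacency dictionary and uses the lazy random walk variant. The function names `approximate_ppr` and `sweep_cut`, the parameters `alpha` and `eps`, and the toy graph are hypothetical choices for this sketch, not code from any of the listed papers.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-based approximate personalized PageRank (lazy walk variant).

    adj maps each node to a list of neighbours (undirected, no self-loops).
    Returns (p, r): the approximation p and the residual r, with
    r[u] < eps * deg(u) for every node on return.
    """
    p = {}
    r = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue  # stale queue entry, residual already small
        # Push: keep an alpha fraction at u, spread the rest lazily.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        share = (1 - alpha) * ru / (2 * deg_u)
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p, r


def sweep_cut(adj, p):
    """Return the prefix of nodes, ordered by p[u]/deg(u), with least conductance."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_set, best_phi = [], float("inf")
    for i, u in enumerate(order):
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += len(adj[u]) - 2 * inside  # edges to the set stop being cut
        vol += len(adj[u])
        in_set.add(u)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break  # prefix covers the whole graph
        phi = cut / denom
        if phi < best_phi:
            best_phi, best_set = phi, order[: i + 1]
    return best_set, best_phi


# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, r = approximate_ppr(adj, seed=0)
cluster, phi = sweep_cut(adj, p)
print(sorted(cluster), phi)
```

Each push moves at least `alpha * eps * deg(u)` of probability mass from the residual into `p`, so the loop terminates after a number of pushes bounded independently of the graph size, which is what makes the procedure local.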