Effective and Scalable Clustering on Massive Attributed Graphs

Renchi Yang, Jieming Shi, Yin David Yang, Keke Huang, Shiqi Zhang, Xiaokui Xiao
Proceedings of the Web Conference 2021
Given a graph G where each node is associated with a set of attributes, and a parameter k specifying the number of output clusters, k-attributed graph clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes within the same cluster share similar topological and attribute characteristics, while those in different clusters are dissimilar. This problem is challenging on massive graphs, e.g., with millions of nodes and billions of attribute values. For such graphs, existing… 

Co-clustering Interactions via Attentive Hypergraph Neural Network

An attentive hypergraph neural network that encodes entire interactions, using an attention mechanism to select important attributes for explanations, together with a novel co-clustering method that jointly clusters the representations of interactions and the corresponding distributions of attribute selection (cluster-based consistency).



Scaling attributed network embedding to massive graphs

This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification.

Attributed Graph Clustering: an Attribute-aware Graph Embedding Approach

A graph embedding approach to clustering content-enriched graphs by embedding each vertex of a graph into a continuous vector space where the localized structural and attributive information of vertices can be encoded in a unified, latent representation.

Efficient Estimation of Heat Kernel PageRank for Local Clustering

TEA and TEA+, two novel local graph clustering algorithms based on heat kernel PageRank (HKPR) that provide non-trivial theoretical guarantees on the relative error of HKPR values and on time complexity, and that outperform the state-of-the-art algorithm by more than four times on most benchmark datasets.

Graph Clustering Based on Structural/Attribute Similarities

This paper proposes a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure, which partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values.

Clustering attributed graphs: Models, measures and methods

This article characterizes the main existing clustering methods and highlights their conceptual differences, covers the important topic of clustering evaluation, and identifies current open problems.

Clustering Large Attributed Graphs: An Efficient Incremental Approach

An efficient algorithm, Inc-Cluster, is proposed to incrementally update the random walk distances given the edge weight increments, achieving significant speedup over SA-Cluster on large graphs while attaining exactly the same clustering quality in terms of intra-cluster structural cohesiveness and attribute value homogeneity.

Homogeneous network embedding for massive graphs via reweighted personalized PageRank

A simple and efficient baseline HNE method based on PPR that is capable of handling billion-edge graphs on commodity hardware, and an effective and efficient node reweighting algorithm, which augments PPR values with node degree information, and iteratively adjusts embedding vectors accordingly.

Attributed Graph Clustering via Adaptive Graph Convolution

This paper proposes an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs.

A model-based approach to attributed graph clustering

This paper develops a Bayesian probabilistic model for attributed graphs that provides a principled and natural framework for capturing both structural and attribute aspects of a graph, while avoiding the artificial design of a distance measure.

Community Detection in Attributed Graphs: An Embedding Approach

A novel embedding-based model, motivated by the observation of densely connected structures in communities, is developed, along with a novel community structure embedding method that encodes inherent community structures via underlying community memberships.