Corpus ID: 5763314

An Empirical Comparison of the Summarization Power of Graph Clustering Methods

@article{Liu2015AnEC,
  title={An Empirical Comparison of the Summarization Power of Graph Clustering Methods},
  author={Yike Liu and Neil Shah and Danai Koutra},
  journal={ArXiv},
  year={2015},
  volume={abs/1511.06820}
}
How do graph clustering techniques compare with respect to their summarization power? How well can they summarize a million-node graph with a few representative structures? Graph clustering or community detection algorithms can summarize a graph in terms of coherent and tightly connected clusters. In this paper, we compare and contrast different techniques: METIS, Louvain, spectral clustering, SlashBurn and KCBC, our proposed k-core-based clustering method. Unlike prior work that focuses on… 

Reducing large graphs to small supergraphs: a unified approach
TLDR
This paper proposes CONditional Diversified Network Summarization (CondeNSe), a Minimum Description Length-based method that summarizes a given graph with approximate “supergraphs” conditioned on a set of diverse, predefined structural patterns.
Graph Summarization Methods and Applications: A Survey
TLDR
This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. It categorizes summarization approaches by the type of graph taken as input and further organizes each category by core methodology.
A Graph Summarization: A Survey
TLDR
This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. It categorizes summarization approaches by the type of graph taken as input and further organizes each category by core methodology.
Direction Matters: On Influence-Preserving Graph Summarization and Max-Cut Principle for Directed Graphs
TLDR
This paper presents a model based on minimizing reconstruction error with nonnegative constraints, which relates to a Max-Cut criterion that simultaneously identifies the compressed nodes and the directed compressed relations between these nodes. It also proposes a multiplier update algorithm with column-wise normalization.
Graph Summarization Methods and Applications
TLDR
This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. It categorizes summarization approaches by the type of graph taken as input and further organizes each category by core methodology.
The k-peak Decomposition: Mapping the Global Structure of Graphs
TLDR
This work presents a novel graph decomposition, the k-peak decomposition, and a corresponding algorithm, performs a theoretical analysis of its properties, and describes a new visualization method, the "Mountain Plot", which can be used to better understand the global structure of a graph.
Modeling Graphs with Vertex Replacement Grammars
TLDR
This work revisits a different graph grammar formalism called Vertex Replacement Grammars (VRGs), and shows that a variant of the VRG called Clustering-based Node Replacement Grammar (CNRG) can be efficiently extracted from many hierarchical clusterings of a graph.
Graph-Partitioning-Based Diffusion Convolutional Recurrent Neural Network for Large-Scale Traffic Forecasting
TLDR
This approach uses a graph-partitioning method to decompose a large highway network into smaller networks and trains on them independently. It demonstrates that the DCRNN model can be used to forecast speed and flow simultaneously and that the forecasted values preserve fundamental traffic flow dynamics.
The Minimum Description Length Principle for Pattern Mining: A Survey
TLDR
The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact, high-quality sets of patterns. This survey reviews methods for mining various types of data and patterns.

References

Graph Clustering Based on Structural/Attribute Similarities
TLDR
This paper proposes a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure, which partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values.
VOG: Summarizing and Understanding Large Graphs
TLDR
The main ideas are to construct a "vocabulary" of subgraph types that often occur in real graphs, and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary.
Graph summarization with bounded error
TLDR
This is the first work to compute graph summaries using the MDL principle and to use the summaries (along with corrections) to compress graphs with bounded error.
Summarizing and understanding large graphs
TLDR
This work identifies the optimal summarization using the minimum description length (MDL) principle, picking only those subgraphs from the candidates that together yield the best lossless compression of the graph or, equivalently, that most succinctly describe its adjacency matrix.
A model-based approach to attributed graph clustering
TLDR
This paper develops a Bayesian probabilistic model for attributed graphs that provides a principled and natural framework for capturing both structural and attribute aspects of a graph, while avoiding the artificial design of a distance measure.
Evaluating Cooperation in Communities with the k-Core Structure
TLDR
The k-core concept, which essentially measures the robustness of a community under degeneracy, is extended to weighted graphs and applied to large real-world graphs such as DBLP, with interesting results reported.
Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining
  • U. Kang, C. Faloutsos
  • Computer Science
    2011 IEEE 11th International Conference on Data Mining
  • 2011
TLDR
This work proposes the SlashBurn method (burn the hubs, and slash the remaining graph into smaller connected components), which avoids the 'no good cuts' problem, gives better compression, and leads to faster execution times for matrix-vector operations, which are the backbone of most graph processing tools.
Empirical comparison of algorithms for network community detection
TLDR
Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
Overlapping community detection at scale: a nonnegative matrix factorization approach
TLDR
This paper presents BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges and builds on a novel observation that overlaps between communities are densely connected.
Community detection in graphs