Geographically Organized Small Communities and the Hardness of Clustering Social Networks


Spectral clustering, while perhaps the most efficient heuristic for graph partitioning, has recently gained a bad reputation for failing on large-scale power-law graphs. In this chapter we identify the abundance of small communities connected by long tentacles as the major obstacle for spectral clustering. These subgraphs hide the higher-level structure and result in a highly degenerate adjacency matrix with several hundred eigenvalues very close to 1. Our results on clustering social networks, telephone call graphs, and Web graphs are twofold. (1) We show that graphs generated by existing social network models are not as difficult to cluster as real-world graphs. To this end we give a new combined model that yields degenerate adjacency matrices and hard-to-partition graphs. (2) We give heuristics for spectral clustering of large-scale real-world social networks that handle tentacles and small dense communities. Our algorithm outperforms all previous methods for power-law graph partitioning in both speed and cluster quality. By combining heuristics for the contraction of tentacles and the removal of community cores, which involve the recent SCAN (Structural Clustering Algorithm for Networks) algorithm, we are able to efficiently find balanced partitions of power-law graphs with over 10 million edges. In particular, our heuristics promise similar or better performance than semidefinite relaxation with running times that are orders of magnitude lower.
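The link between tentacles and eigenvalues near 1 can be illustrated with a small sketch. It is not the chapter's model or algorithm: the toy graph (a clique core with paths that each end in a small clique), the function names, and all parameters are hypothetical, and it assumes that "adjacency matrix" refers to the degree-normalized matrix D^{-1/2} A D^{-1/2}. Under that assumption, a standard test-vector argument says that k disjoint vertex sets, each with conductance at most phi, force the k largest eigenvalues to be at least 1 - 2*phi. The code only computes the conductance of each tentacle and the resulting bound.

```python
def core_with_tentacles(core_size, n_tentacles, path_len, clique_size):
    """Hypothetical toy graph: a clique core with path-plus-small-clique tentacles.

    Requires n_tentacles <= core_size.
    Returns (adjacency dict, list of tentacle vertex sets).
    """
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Dense core: a complete graph on core_size vertices.
    for i in range(core_size):
        for j in range(i + 1, core_size):
            add_edge(i, j)

    n = core_size  # next unused vertex id
    tentacles = []
    for t in range(n_tentacles):
        members = set()
        prev = t  # tentacle t hangs off core vertex t
        for _ in range(path_len):
            add_edge(prev, n)
            members.add(n)
            prev = n
            n += 1
        # Small dense community at the tip of the path.
        clique = list(range(n, n + clique_size))
        n += clique_size
        add_edge(prev, clique[0])
        for a in range(clique_size):
            for b in range(a + 1, clique_size):
                add_edge(clique[a], clique[b])
        members.update(clique)
        tentacles.append(members)
    return adj, tentacles


def conductance(adj, subset):
    """Edges leaving the subset divided by the sum of degrees inside it."""
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    vol = sum(len(adj[u]) for u in subset)
    return cut / vol


adj, tentacles = core_with_tentacles(core_size=30, n_tentacles=5,
                                     path_len=10, clique_size=4)
phi = max(conductance(adj, s) for s in tentacles)
print(f"max tentacle conductance: {phi:.4f}")
print(f"lower bound on the {len(tentacles)} largest eigenvalues: {1 - 2 * phi:.4f}")
```

Each tentacle is joined to the rest of the graph by a single edge, so its conductance shrinks as the path gets longer, and every additional tentacle adds one more eigenvalue pinned close to 1. This is consistent with the degeneracy described above, although real graphs are far less regular than this toy construction.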

DOI: 10.1007/978-1-4419-6287-4_10


Cite this paper

@inproceedings{Kurucz2010GeographicallyOS, title={Geographically Organized Small Communities and the Hardness of Clustering Social Networks}, author={Mikl{\'o}s Kurucz and Andr{\'a}s A. Bencz{\'u}r}, booktitle={Data Mining for Social Network Data}, year={2010} }