Scalable and Memory-Efficient Clustering of Large-Scale Social Networks

Joyce Whang, Xin Sui, Inderjit Dhillon

Abstract: Clustering is an important task in the analysis of social networks; however, most existing algorithms do not scale to the massive size of today’s social networks. A popular class of graph clustering algorithms for large-scale networks, including PMetis, KMetis and Graclus, is based on a multilevel framework. These multilevel algorithms generally work reasonably well on networks with a few million vertices. However, when the network grows to 10 million vertices or more, their performance degrades rapidly. Furthermore, the power-law degree distribution, an inherent property of social networks, makes these algorithms infeasible to apply to large-scale social networks. In this paper, we propose a scalable and memory-efficient clustering algorithm for large-scale social networks. We name our algorithm GEM, after its two key concepts: Graph Extraction and weighted kernel k-Means. GEM efficiently extracts a good skeleton graph from the original graph, and propagates the clustering result of the extracted graph to the rest of the network. Experimental results show that GEM produces clusters of quality comparable to or better than existing state-of-the-art graph clustering algorithms, while being much faster and consuming much less memory. Furthermore, the parallel implementation of GEM, called PGEM, not only produces higher-quality clusters but also achieves much better scalability than most current parallel graph clustering algorithms.
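The abstract describes GEM's pipeline only at a high level (extract a skeleton graph, cluster it with weighted kernel k-means, propagate the result). The following is a minimal Python sketch of that three-stage idea under stated assumptions, not the authors' implementation. Assumed details: the skeleton is the induced subgraph on the highest-degree vertices, the initial clusters come from a simple seed-and-BFS heuristic, and non-skeleton vertices are labeled by a majority vote over already-labeled neighbors. The clustering step uses the standard weighted kernel k-means form of normalized cut (vertex weights equal to degree, kernel σD⁻¹ + D⁻¹AD⁻¹). All function names (`extract_skeleton`, `seed_labels`, `kernel_kmeans`, `propagate`, `gem_sketch`) are hypothetical.

```python
from collections import deque


def extract_skeleton(adj, fraction):
    """Assumed rule: keep the top `fraction` of vertices by degree."""
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    keep = set(order[: max(1, int(fraction * len(adj)))])
    # Induced subgraph on the kept vertices.
    return {v: [u for u in adj[v] if u in keep] for v in keep}


def seed_labels(skel, k):
    """Hypothetical init: greedy high-degree, mutually non-adjacent seeds,
    then grow clusters by multi-source BFS over the skeleton."""
    seeds = []
    for v in sorted(skel, key=lambda x: (-len(skel[x]), x)):
        if len(seeds) == k:
            break
        if all(v not in skel[s] for s in seeds):
            seeds.append(v)
    labels = {s: c for c, s in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        v = queue.popleft()
        for u in skel[v]:
            if u not in labels:
                labels[u] = labels[v]
                queue.append(u)
    for v in skel:  # vertices unreachable from every seed
        labels.setdefault(v, 0)
    return labels


def kernel_kmeans(skel, labels, k, sigma=0.0, max_iter=20):
    """Batch weighted kernel k-means for normalized cut: weights w = degree,
    kernel K = sigma*D^-1 + D^-1 A D^-1 (the constant K_ii term is dropped)."""
    deg = {v: len(skel[v]) for v in skel}
    for _ in range(max_iter):
        deg_c, links_cc = [0] * k, [0] * k
        for v in skel:
            deg_c[labels[v]] += deg[v]
            links_cc[labels[v]] += sum(labels[u] == labels[v] for u in skel[v])
        new = {}
        for v in skel:
            if deg[v] == 0:  # isolated vertex: keep its current label
                new[v] = labels[v]
                continue
            links_ic = [0] * k
            for u in skel[v]:
                links_ic[labels[u]] += 1
            best, best_dist = labels[v], float("inf")
            for c in range(k):
                if deg_c[c] == 0:  # skip empty clusters
                    continue
                own = sigma if labels[v] == c else 0.0
                dist = (-2 * (links_ic[c] / deg[v] + own) / deg_c[c]
                        + (links_cc[c] + sigma * deg_c[c]) / deg_c[c] ** 2)
                if dist < best_dist:
                    best, best_dist = c, dist
            new[v] = best
        if new == labels:  # converged
            break
        labels = new
    return labels


def propagate(adj, labels):
    """Assumed rule: sweep outward from the skeleton, giving each vertex the
    most common label among its labeled neighbors (smallest label on ties)."""
    labels = dict(labels)
    queue = deque(u for v in labels for u in adj[v] if u not in labels)
    while queue:
        v = queue.popleft()
        if v in labels:
            continue
        votes = {}
        for u in adj[v]:
            if u in labels:
                votes[labels[u]] = votes.get(labels[u], 0) + 1
        labels[v] = max(sorted(votes), key=votes.get)
        queue.extend(u for u in adj[v] if u not in labels)
    return labels  # vertices with no path to the skeleton stay unlabeled


def gem_sketch(adj, k, fraction=0.5):
    skel = extract_skeleton(adj, fraction)
    return propagate(adj, kernel_kmeans(skel, seed_labels(skel, k), k))


# Two 5-cliques joined by edge 4-5, plus pendant vertices 10 (on 0) and 11 (on 9).
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
edges += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges += [(4, 5), (0, 10), (9, 11)]
adj = {v: [] for v in range(12)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
labels = gem_sketch(adj, k=2, fraction=0.85)
print([labels[v] for v in range(12)])  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
```

In this toy run the skeleton is the ten clique vertices. The BFS initialization misplaces vertex 5, and kernel k-means moves it to the second clique. The two low-degree pendant vertices then inherit the labels of their only neighbors. The point of clustering only the skeleton is that the expensive iterative step touches a subgraph, while propagation is a single linear sweep over the remaining vertices.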



  • Scalable and Memory-Efficient Clustering of Large-Scale Social Networks (pdf, slides)
    J. Whang, X. Sui, I. Dhillon.
    In IEEE International Conference on Data Mining (ICDM), pp. 705-714, December 2012. (Oral)