News Release

The first scalable motif-based graph local clustering algorithm for real-world massive graphs

Peer-Reviewed Publication

Higher Education Press

The framework of the Index-free Triangle-based Graph Local Clustering (TGLC*)

image: 

The framework of the Index-free Triangle-based Graph Local Clustering (TGLC*).

view more 

Credit: Zhe YUAN, Zhewei WEI, Fangrui LV, Ji-Rong WEN

Motif-based graph local clustering is a popular method for graph mining tasks due to its various applications, such as community detection, network optimization and graph learning. However, the traditional two-phase approach of precomputing motif weights before performing graph local clustering loses locality and thus is impractical for real-world massive graphs.

To solve the problems, a research team led by Zhewei WEI published their new research on 15 June 2024 in Frontiers of Computer Science co-published by Higher Education Press and Springer Nature.

The team proposed an index-free and purely local motif-based graph local clustering algorithm with developing effectiveness and efficient Monte-Carlo sampling techniques. The correctness and efficiency of the algorithm are proved with a series of theoretical results and demonstrated by extensive experiments on seven widely used real-world graph benchmarks. Compared with the existing research results, the proposed algorithm is the first one to be applicable and scalable on massive graphs.

In the research, they focused on the scalability challenges for motif-based graph local clustering and attributed the core problem to the motif-based graph weighting process. In order to avoid the time-consuming pre-computing operation to improve the scalability while keep the algorithm correct, they followed the widely used graph local clustering framework Nibble and devised novel triangle-based adaptive rejection sampling technique to outcome the proximity score vector towards the seed node in Monte-Carlo manner. With the score vector, they conduct standard sweep process to get the final clustering result. The practicality, scalability and consistency of the proposed algorithm is demonstrated by extensive experiments on seven massive graphs under both topology-based metric (conductance) and information-based metric (F1-score). Besides, they devise a novel visualization layout to better demonstrate the empirical comparisons by defining the performance distribution and its density. The layout can reveal the characters of graph local clustering algorithms with their visual patterns and being more informative and convincing.

Future work can focus on developing methods to choosing the optimal proximity metrics rather than just the Personalized PageRank (PPR) and generalizing to various motifs or even arbitrary high-order structures rather than just the triangle.

DOI: 10.1007/s11704-023-2768-7


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.