
PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms


Abstract

Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: for example, spectral clustering requires computing the spectral decomposition of the Laplacian matrix, which is not tractable for large graphs with a large number of communities. This work introduces PASCO, an overlay that accelerates clustering algorithms. Our method consists of three steps: (1) we compute several independent small graphs representing the input graph by applying an efficient and structure-preserving coarsening algorithm; (2) a clustering algorithm is run in parallel on each small graph, providing several partitions of the initial graph; (3) these partitions are aligned and combined with an optimal transport method to output the final partition. The PASCO framework is based on two key contributions: a novel global algorithm structure designed to enable parallelization, and a fast, empirically validated graph coarsening algorithm that preserves structural properties. We demonstrate the strong performance of PASCO in terms of computational efficiency, structural preservation, and output partition quality, evaluated on both synthetic and real-world graph datasets.



Availability of data and materials

The code to reproduce the experiments is available at https://github.com/elasalle/PASCO

Notes

  1. For example, permuting the columns yields different representations of the same partition.

  2. The use of \(\alpha\) as a parameter is relevant for the experiments done afterwards, as we then keep the density fixed while varying the difficulty level \(\alpha\) to recover the blocks.

References

  • Ayad, H. G., & Kamel, M. S. (2010). On voting-based consensus of cluster ensembles. Pattern Recognition, 43(5).

  • Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10).

  • Blondel, M., Seguy, V., Rolet, A. (2018). Smooth and sparse optimal transport. In AISTATS

  • Boutsidis, C., Kambadur, P., Gittens, A. (2015). Spectral clustering via the power method-provably. In ICML

  • Bunimovich, L., Smith, D., & Webb, B. (2019). Finding hidden structures, hierarchies, and cores in networks via isospectral reduction. Applied Mathematics & Nonlinear Sciences, 4(1).

  • Chen, J., Saad, Y., & Zhang, Z. (2022). Graph coarsening: From scientific computing to machine learning. SeMA Journal, 79(1), 187–223.


  • Cuturi, M., & Doucet, A. (2014). Fast computation of Wasserstein barycenters. In ICML.

  • Dhillon, I.S., Guan, Y., Kulis, B. (2004). Kernel k-means: Spectral clustering and normalized cuts. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining

  • Dhillon, I. S., Guan, Y., & Kulis, B. (2007). Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11).

  • Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., et al. (2021). POT: Python optimal transport. JMLR, 22(78).

  • Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486.

  • Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659.

  • Fred, A. (2001). Finding consistent clusters in data partitions. In International Workshop on Multiple Classifier Systems

  • Ghosh, J., Acharya, A. (2018). Cluster ensembles: Theory and applications. Data Clustering

  • Gottesbüren, L., Heuer, T., Sanders, P., Schulz, C., Seemaier, D. (2021). Deep multilevel graph partitioning. arXiv preprint arXiv:2105.02022

  • Hendrickson, B., Leland, R. W., et al. (1995). A multi-level algorithm for partitioning graphs. SC, 95(28).

  • Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., Leskovec, J. (2020). Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687

  • Jin, Y., Loukas, A., JaJa, J. (2020). Graph coarsening with preserved spectral properties. In International Conference on Artificial Intelligence and Statistics, pp. 4452–4462. PMLR

  • Karataş, A., Şahin, S. (2018). Application areas of community detection: A review. In 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT)

  • Karypis, G., & Kumar, V. (1998). A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1).

  • Lasalle, E., Vaudaine, R., Vayer, T. PASCO. https://hal.science/hal-05086352

  • Li, Y., Nie, F., Huang, H., Huang, J. (2015). Large-scale multi-view spectral clustering via bipartite graph. In AAAI

  • Li, J., Seo, B., & Lin, L. (2019). Optimal transport, mean partition, and uncertainty assessment in cluster analysis. Statistical Analysis and Data Mining: The ASA Data Science Journal, 12(5).

  • Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1).

  • Loukas, A. (2019). Graph reduction with spectral and cut guarantees. Journal of Machine Learning Research, 20(116).

  • Loukas, A., Vandergheynst, P. (2018). Spectrally approximating large graphs with smaller graphs. In ICML

  • Newman, M. E. J. (2010). Networks: An introduction

  • Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69.

  • Peixoto, T. P. (2014). Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Physical Review E, 89(1).

  • Peyré, G., Cuturi, M., et al. (2019). Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6).

  • Pourkamali-Anaraki, F. (2020). Scalable spectral clustering with Nyström approximation: Practical and theoretical aspects. IEEE Open Journal of Signal Processing, 1.

  • Quemener, E., & Corvellec, M. (2013). Sidus-the solution for extreme deduplication of an operating system. Linux Journal, 2013(235), 3.


  • Rissanen, J. (2007). Information and complexity in statistical modeling

  • Rossi, R. A., Ahmed, N. K. (2015). The network data repository with interactive graph analytics and visualization. AAAI

  • Rosvall, M., & Bergstrom, C. T. (2011). Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLOS ONE, 6.

  • Rotta, R., & Noack, A. (2011). Multilevel local search algorithms for modularity clustering. Journal of Experimental Algorithmics (JEA), 16, 2–1.


  • Spielman, D. A., Srivastava, N. (2008). Graph sparsification by effective resistances. In Proceedings of the fortieth annual ACM symposium on theory of computing

  • Spielman, D. A., & Teng, S.-H. (2011). Spectral sparsification of graphs. SIAM Journal on Computing, 40(4).

  • Traag, V. A., Waltman, L., & Van Eck, N. J. (2019). From Louvain to Leiden: Guaranteeing well-connected communities. Scientific Reports, 9(1).

  • Tremblay, N., Loukas, A. (2020). Approximating spectral clustering via sampling: A review. Sampling techniques for supervised or unsupervised tasks

  • Tremblay, N., Puy, G., Gribonval, R., Vandergheynst, P. (2016). Compressive spectral clustering. ICML

  • Vinh, N. X., Epps, J., Bailey, J. (2009). Information theoretic measures for clusterings comparison: Is a correction for chance necessary? In ICML

  • Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17.

  • Waltman, L., & Van Eck, N. J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. The European Physical Journal B, 86, 1–14.


  • Wu, J., Liu, H., Xiong, H., Cao, J., & Chen, J. (2014). K-means-based consensus clustering: A unified view. IEEE Transactions on Knowledge and Data Engineering, 27(1).

  • Yan, D., Huang, L., Jordan, M. I. (2009). Fast approximate spectral clustering. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining

Download references

Acknowledgements

This work has been supported by the DATAREDUX project, ANR-19-CE46-0008, and the ANR DARLING project, ANR-19-CE48-0002. This project was supported in part by the ANR AllegroAssai project, ANR-19-CHIA0009.

Author information

Contributions

E.L., R.V. and T.V. contributed to the code and figures and wrote most of the manuscript. All authors participated in the decisions that led to the final version of the manuscript and reviewed it.

Corresponding author

Correspondence to Etienne Lasalle.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Editors: Maria Concepcion Bielza Lozoya, Ana Carolina Lorena, Tiago A. Almeida.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Finding consensus between partitions

In this section we describe the different approaches that we investigated for aligning different partitions onto a reference partition. We recall that the problem can be reformulated as finding the partition \(\overline{P}\) that best agrees with all \((P_r)\), \({r \in {[\![R]\!]}}\). Methods from the clustering ensemble literature already address such a problem. Mathematically, they amount to finding a Fréchet mean of the partitions \((P_r)\) for various notions of divergence \(\mathcal {D}\) between partitions

$$\begin{aligned} P\in \underset{\overline{P}\in \mathcal {P}_{N, \overline{k}} }{\arg \min } \frac{1}{R} \sum \limits _{r=1}^R \mathcal {D}(P_r, \overline{P}) . \end{aligned}$$
(A1)

Depending on the choice of divergence \(\mathcal {D}\), we can obtain different consensus methods, as detailed below.

Linear regression and many-to-one

These two methods are based on a similar notion of divergence between partitions. Given two partitions \(P\in \mathcal {P}_{N, k}, \overline{P}\in \mathcal {P}_{N, \overline{k}}\), it is defined as

$$\begin{aligned} \mathcal {D}( P, \overline{P}) = \min _{Q \in \mathcal {Q}(k, \overline{k})} \Vert PQ - \overline{P}\Vert _{F}^2 \end{aligned}$$
(A2)

Depending on the choice of the set \(\mathcal {Q}(k, \overline{k})\), we obtain different methods:

  • When we simply set \(\mathcal {Q}(k, \overline{k}) = \mathcal {Q}_{\operatorname {lin-reg}}(k, \overline{k}) = \mathbb {R}^{k \times \overline{k}}\), the optimal matrix Q is given by (see Lemma 3)

    $$\begin{aligned} Q_{\operatorname {lin-reg}} =(P^\top P)^{-1}P^\top \overline{P}= \operatorname {diag}(P^\top {\textbf{1}}_N)^{-1} P^\top \overline{P}, \end{aligned}$$
    (A3)

    where \(\operatorname {diag}(P^\top {\textbf{1}}_N)^{-1}\) is the diagonal matrix containing the inverses of the cluster sizes. This realignment is proposed in Ayad and Kamel (2010). Even though it allows one to compute the divergence \(\mathcal {D}( P, \overline{P})\), this matrix yields a “re-aligned partition” \(PQ_{\operatorname {lin-reg}}\) which is only a soft partition, with elements in [0, 1]. We refer to this solution as lin-reg.

  • If one wants to obtain a true partition (with elements in \(\{0,1\}\)), a solution is to restrict the matrices in \(\mathcal {Q}(k, \overline{k})\) to send each cluster of \(P\) to at most one cluster of \(\overline{P}\). For that we define \(\mathcal {Q}_{\operatorname {many-to-one}}(k, \overline{k}) = \{Q \in \{0,1\}^{k \times \overline{k}}, Q {\textbf{1}}_{\overline{k}} = {\textbf{1}}_{k} \}\). The solution of (A2) with \(\mathcal {Q}_{\operatorname {many-to-one}}(k, \overline{k})\) is given (see Lemma 4) by \(\operatorname {row-bin}(Q_{\operatorname {lin-reg}})\), where \(\operatorname {row-bin}\) is the operator that returns a binary matrix of the same shape, in which each row contains only zeros except at the position of the maximum of the corresponding row of the input matrix. We refer to this solution as many-to-one. (A small numerical sketch of both alignments follows this list.)
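
The following sketch illustrates the lin-reg and many-to-one alignments. It is our own illustration (not the PASCO code; the function names are ours) and assumes dense 0/1 partition matrices with no empty cluster in \(P\).

```python
import numpy as np

def linreg_alignment(P, Pbar):
    # Q_lin-reg = diag(P^T 1_N)^{-1} P^T Pbar  (Eq. A3); requires no empty cluster in P.
    sizes = P.sum(axis=0)                     # cluster sizes, i.e. P^T 1_N
    return (P.T @ Pbar) / sizes[:, None]      # soft alignment: each row sums to 1

def many_to_one_alignment(P, Pbar):
    # row-bin(Q_lin-reg): binary matrix with a single 1 per row, at the row-wise maximum.
    Q = linreg_alignment(P, Pbar)
    Q_bin = np.zeros_like(Q)
    Q_bin[np.arange(Q.shape[0]), Q.argmax(axis=1)] = 1.0
    return Q_bin

# Toy usage: realign a 2-cluster partition onto a 3-cluster reference.
P = np.eye(2)[np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])]          # partition matrix (10 x 2)
Pbar = np.eye(3)[np.array([0, 0, 1, 1, 0, 2, 0, 2, 1, 0])]       # reference partition (10 x 3)
print(many_to_one_alignment(P, Pbar))
```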

OT-based method To prevent empty clusters in the realigned partition, we also consider \(\mathcal {Q}(k,\overline{k}) = \mathcal {Q}_{\operatorname {ot}}(k,\overline{k})\) in (A2). With these constraints, the alignment problem becomes an OT problem, which can be related to a specific quadratically regularized OT problem (Blondel et al., 2018) that admits efficient convex solvers, as detailed in the following lemma (the proof can be found in Sect. D).

Lemma 1

Let \({P} \in \mathcal {P}_{N, k}\) and \(D = \operatorname {diag}(P^\top {\textbf{1}}_N)\). Then problem (A2) with \(\mathcal {Q}(k, \overline{k}) = \mathcal {Q}_{\operatorname {ot}}(k, \overline{k})\) is equivalent to the problem

$$\begin{aligned} \min _{Q \in \mathcal {Q}_{\operatorname {ot}}(k, \overline{k})} \Vert D^{\frac{1}{2}} Q\Vert _F^2 - 2 \langle C, Q \rangle , \end{aligned}$$
(A4)

where \(C = P^\top \overline{P}\). Assuming no empty cluster in the partition \(P\), the solution of Eq. (A4) is unique and can be computed by considering the dual problem

$$\begin{aligned} \max _{\mu \in \mathbb {R}^{k}, \ \nu \in \mathbb {R}^{\overline{k}}} \ \frac{1}{k} \sum _i \mu _i + \frac{1}{\overline{k}} \sum _j \nu _j -\frac{1}{4}\Vert D^{-\frac{1}{2}}[\mu \oplus \nu +2 C]_{+}\Vert _F^2, \end{aligned}$$
(A5)

where \(\mu \oplus \nu := (\mu _i + \nu _j)_{ij}\) and for any matrix \(A, \ [A]_{+}:= \left( \max \{A_{ij}, 0\}\right) _{ij}\). More precisely, the optimal solution \(Q^\star\) of Eq. (A4) can be written as \(Q^{\star } = \frac{1}{2} D^{-1}[\mu ^\star \oplus \nu ^\star + 2 C]_{+}\) where \((\mu ^\star , \nu ^\star )\) is the optimal solution of Eq. (A5).

Building upon this result, we can solve Eq. (A4) by tackling the dual Eq. (A5), which is an unconstrained convex problem in two variables (the maximization of a concave function). This expression allows us to use any convex solver; as suggested in Blondel et al. (2018), we rely on L-BFGS (Liu & Nocedal, 1989), which we find particularly effective in practice. We call this alignment procedure quad-ot; it has roughly \(\mathcal {O}(N K^2)\) complexity.
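
Below is a minimal sketch of such a quad-ot solver (our own illustration, not the PASCO implementation; the function name is ours and we use SciPy's L-BFGS routine). It minimizes the negative of the dual (A5) and recovers \(Q^\star\) as in Lemma 1.

```python
import numpy as np
from scipy.optimize import minimize

def quad_ot_alignment(P, Pbar):
    N, k = P.shape
    kbar = Pbar.shape[1]
    d = P.sum(axis=0)                 # cluster sizes, D = diag(d); assumes no empty cluster
    C = P.T @ Pbar                    # C = P^T Pbar

    def neg_dual(x):
        # negative of the dual objective (A5) and its gradient
        mu, nu = x[:k], x[k:]
        A = np.maximum(mu[:, None] + nu[None, :] + 2.0 * C, 0.0)   # [mu (+) nu + 2C]_+
        val = mu.sum() / k + nu.sum() / kbar - 0.25 * np.sum(A ** 2 / d[:, None])
        grad_mu = 1.0 / k - 0.5 * (A / d[:, None]).sum(axis=1)
        grad_nu = 1.0 / kbar - 0.5 * (A / d[:, None]).sum(axis=0)
        return -val, -np.concatenate([grad_mu, grad_nu])

    res = minimize(neg_dual, np.zeros(k + kbar), jac=True, method="L-BFGS-B")
    mu, nu = res.x[:k], res.x[k:]
    # primal solution of Eq. (A4): Q* = (1/2) D^{-1} [mu (+) nu + 2C]_+
    return 0.5 * np.maximum(mu[:, None] + nu[None, :] + 2.0 * C, 0.0) / d[:, None]
```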

Remark 1

The only difference between Eq. (A4) and a standard quadratic OT problem of the form \(\min _Q \langle M, Q\rangle + \frac{\gamma }{2} \Vert Q\Vert _F^2\) is that in our case the regularization term is \(\Vert D^{1/2}Q\Vert _F^2\). This amounts to considering a Mahalanobis-type regularization instead of an \(\ell ^2_2\) one.

Solving for the barycenter

To solve the barycenter problem in (A1) with a divergence of the form (A2), we alternate between finding the alignment matrices \(Q_r\) as explained above and updating the reference \(\overline{P}\) as in (3). This corresponds to an alternating minimization algorithm, where we alternate between (i) realigning the R partitions onto the reference and (ii) updating the reference. The reference update is based on Lemma 5, which states that the closest partition matrix to a set of (realigned) matrices is given by the majority-vote update, see Eq. (3).
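
A minimal sketch of this alternating scheme is given below (our own illustration with hypothetical names; `align` stands for any alignment routine returning a matrix \(Q_r\), e.g. the quad-ot sketch above).

```python
import numpy as np

def fuse_partitions(partitions, kbar, align, n_iter=10, seed=0):
    """partitions: list of R partition matrices P_r (N x k_r); returns a reference (N x kbar)."""
    N = partitions[0].shape[0]
    rng = np.random.default_rng(seed)
    Pbar = np.eye(kbar)[rng.integers(0, kbar, size=N)]        # random initial reference
    for _ in range(n_iter):
        realigned = [P @ align(P, Pbar) for P in partitions]  # (i) realign onto the reference
        votes = np.sum(realigned, axis=0)
        Pbar = np.eye(kbar)[votes.argmax(axis=1)]             # (ii) majority-vote update (Lemma 5)
    return Pbar
```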

(Additional theoretical results can be found in Sect. D.)

Appendix B: Other edge sampling rules

In this section, we present edge sampling rules that we explored while designing a fast coarsening procedure. They turned out to be unsatisfactory, but we present them here and explain their drawbacks, as we believe this helps to better understand the coarsening algorithm we ultimately propose. To the best of our knowledge, these strategies have not been considered systematically in previous graph coarsening works.

Uniform edge sampling This approach might be the most natural one. It consists in choosing edges uniformly at random in the graph and collapsing them. Doing so favors edges incident to high-degree nodes, which, after collapsing, result in hypernodes of even higher degree. This amplification phenomenon yields an unbalanced final coarsened graph that contains one huge hypernode, while all other hypernodes contain only a few nodes. This is problematic as it does not reflect the community structure of the initial graph.

Uniform edge sampling with marked neighboring edges This approach fixes the above issue. To avoid collapsing almost always onto the same hypernode, when an edge (i, j) is collapsed, we mark the edges incident to nodes i and j, and we sample edges uniformly at random among unmarked edges. This solves the unbalancedness issue. However, the algorithm quickly runs out of unmarked edges, which forces an early computation of the next intermediate coarsened graph.

While this can be done with sparse matrix products between the adjacency matrix and the matrix encoding the composition of the hypernodes, it remains costly and should be avoided as much as possible.
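
For concreteness, one possible way to carry out this reconstruction with sparse products is sketched below (our own illustration; the PASCO code may organize this step differently). If \(H\) is the node-to-hypernode assignment matrix built from the coarsening table \(h\), the coarsened adjacency matrix is \(H^\top A H\).

```python
import numpy as np
import scipy.sparse as sp

def coarsen_adjacency(A, h, n_coarse):
    """A: scipy sparse adjacency matrix (n x n); h: coarsening table, h[u] = hypernode of node u."""
    n = A.shape[0]
    H = sp.csr_matrix((np.ones(n), (np.arange(n), h)), shape=(n, n_coarse))
    # weighted coarsened adjacency; diagonal entries collect intra-hypernode edges
    return (H.T @ A @ H).tocsr()
```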

Uniform edge sampling with marked visited nodes Here, we want to relax the limit imposed by the previous approach with marked edges. First, recall that sampling an edge uniformly at random is equivalent to sampling a node i with probability proportional to its degree and then sampling a neighbor j uniformly, see, e.g., Newman (2010, Section 6.14). So instead of discarding the edge (i, j) whenever either i or j has been used in a previous collapse, a natural relaxation is to reject the edge (i, j) only when i has previously been involved in a collapse, irrespective of whether j was involved in such a collapse. Simulations showed that when using this sampling strategy (in step 6 of Algorithm 2) to generate the coarsening tables \(h^{(\ell )}\), the hypernode size distribution was similar to the one obtained with the edge-marking strategy, but required far fewer intermediate coarsened graph reconstructions.

Uniform node sampling with marked visited nodes A final improvement to speed up the sampling procedure is to sample the first node i uniformly at random among non-visited nodes instead of according to its degree (the second node j still being drawn uniformly among the neighbors of i). Computationally, this avoids updating the degrees after each collapse and further speeds up the coarsening procedure. The impact of sampling uniformly instead of according to the degrees remains limited thanks to the friendship paradox. This results in Algorithm 2.
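
Below is a simplified sketch of one such coarsening pass (our own illustration; it is not the exact Algorithm 2, whose bookkeeping, tie-breaking and stopping details may differ). It assumes a simple undirected graph given as adjacency lists.

```python
import numpy as np

def one_coarsening_pass(neighbors, target_size, seed=0):
    """neighbors: list of neighbor lists; returns a coarsening table h (h[u] = hypernode of u)."""
    rng = np.random.default_rng(seed)
    n = len(neighbors)
    h = np.arange(n)                       # every node starts as its own hypernode
    order = list(rng.permutation(n))       # candidate first nodes, in random order
    visited = np.zeros(n, dtype=bool)
    n_hyper = n
    while order and n_hyper > target_size:
        i = order.pop()
        if visited[i] or len(neighbors[i]) == 0:
            continue                       # i already took part in a collapse, or is isolated
        j = neighbors[i][rng.integers(len(neighbors[i]))]   # uniform neighbor of i
        visited[i] = visited[j] = True
        h[i] = h[j]                        # collapse i into the hypernode of j
        n_hyper -= 1
    _, h = np.unique(h, return_inverse=True)   # relabel hypernodes as 0, ..., n_hyper - 1
    return h
```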

Appendix C: Complexity of the coarsening phase in PASCO

Lemma 2

The complexity of Algorithm 2 is \(\mathcal {O}(n^{(\ell )} + N_{\operatorname {while}}^{(\ell )} + E^{(\ell )} )\), where \(N_{\operatorname {while}}^{(\ell )}\) is the number of times the while loop is executed and \(E^{(\ell )}\) is the number of non-zero coefficients in the adjacency matrix \(A^{(\ell )}\).

Proof

The initializations of \(V_a\) and \(h^{(\ell )}\) cost \(\mathcal {O}(n^{(\ell )})\). Drawing the nodes u and v costs \(\mathcal {O}(1)\) as they are sampled uniformly at random. Updating \(h^{(\ell )}\) (step 7) has complexity \(\mathcal {O}(1)\). Similarly, updating \(V_a\) can be performed in \(\mathcal {O}(1)\) by updating a boolean array that keeps track of which nodes are in \(V_a\). Moreover, relabeling \(h^{(\ell )}\) has complexity \(\mathcal {O}(n^{(\ell )})\). Finally, computing \(A^{(\ell +1)}\) can be performed in \(\mathcal {O}(E^{(\ell )})\) by iterating over the non-zero coefficients of \(A^{(\ell )}\). Summarizing, the complexity of Algorithm 2 is \(\mathcal {O}(n^{(\ell )} + N_{\operatorname {while}}^{(\ell )} + E^{(\ell )})\). \(\square\)

Now we prove Proposition 1.

Proof of Proposition 1

The initialization of Algorithm 1 (essentially creating h) has complexity \(\mathcal {O}(N)\).

Now let us denote by \(N_{rep}\) the number of times the while loop is repeated. According to Lemma 2, the overall complexity of the while loop in Algorithm 1 is \(\mathcal {O}( N_{rep} ( N + |E| ) )\). This is obtained by upper-bounding \(n^{(\ell )}\) and \(N_{\operatorname {while}}^{(\ell )}\) by N, and \(E^{(\ell )}\) by |E|, where |E| is the number of non-zero coefficients in the initial adjacency matrix \(\mathbb {A}\). Without loss of generality, we can always assume that the graph has no isolated node (the clustering problem being irrelevant for these nodes), therefore the complexity of the while loop reduces to \(\mathcal {O}( N_{rep} |E| )\).

Now let us prove that \(N_{rep} \le 1 + \log _2 \rho\). In Algorithm 2, remark that the while loop may stop for two reasons. First, the target size is reached, meaning that \(N_{\operatorname {while}}^{(\ell )} = n^{(\ell )}-n\). This only happens for \(\ell = N_{rep} -1\). Second, the set of available nodes \(V_a\) is empty. In this case, knowing that at each iteration we remove either 1 or 2 elements from \(V_a\) (except at the first iteration, where 2 are removed), we have \(n^{(\ell )} / 2 \le N_{\operatorname {while}}^{(\ell )} \le n^{(\ell )} -1\). This is the case for every iteration \(\ell \le N_{rep} -2\). Thus, for all \(\ell \le N_{rep} -2\), we have that \(n^{(\ell +1)} = n^{(\ell )} - N_{\operatorname {while}}^{(\ell )} \le n^{(\ell )} / 2\). Therefore, for all \(\ell \le N_{rep}-1\), we have \(n^{(\ell )} \le (1/2)^{\ell } N\) (as \(n^{(0)}=N\)). Now, remark that \(n^{(N_{rep}-1)} > n\), otherwise the coarsening would already have stopped. Recalling that \(n = \lfloor \rho ^{-1} N \rfloor\), we get \(\rho ^{-1} N \le (1/2)^{N_{rep}-1} N\), yielding \(N_{rep} \le 1 + \log _2 \rho\). \(\square\)

Appendix D: Relegated theoretical results

Lemma 3

Let \({P} \in \mathcal {P}_{N, k}\) be a partition matrix (Definition 1), and assume that \(\forall i \in {[\![k]\!]}, [{P}^\top {\textbf{1}}_N]_i \ne 0\). Then, for any integer \(\overline{k}\) and any matrix \(\overline{P}\in \mathbb {R}^{N \times \overline{k}}\), with \(\mathcal {Q}(k, \overline{k}) = \mathbb {R}^{k \times \overline{k}}\), \(Q^\star = \operatorname {diag}({P}^\top {\textbf{1}}_N)^{-1}{P}^\top \overline{P}\) is an optimal solution to problem (A2).

Proof

Denote \(f(Q) = \Vert {P} Q-\overline{P}\Vert _F^2\). Since \({P}\) is a partition matrix, its columns have pairwise disjoint supports and we have \(P^\top P=\operatorname {diag}({P^\top }{\textbf{1}}_N)\), hence

$$\begin{aligned} f(Q) = \Vert \overline{P}\Vert _F^2 -2 \langle Q, {P}^\top \overline{P}\rangle + \operatorname {tr}(Q^\top {P}^\top {P} Q) = \Vert \overline{P}\Vert _F^2 + \Vert \operatorname {diag}({P}^\top {\textbf{1}}_N)^{\frac{1}{2}}Q\Vert _F^2- 2 \langle Q, {P}^\top \overline{P}\rangle . \end{aligned}$$
(D6)

The optimization problem is convex; setting the gradient of f to zero gives the solution. \(\square\)

Lemma 4

Let \({P} \in \mathcal {P}_{N, k}\) be a partition matrix (Definition 1), \(\overline{P} \in \mathbb {R}^{N \times \overline{k}}\), and \(\mathcal {Q}(k, \overline{k}) = \{Q \in \{0,1\}^{k \times \overline{k}}: Q{\textbf{1}}_{\overline{k}} = {\textbf{1}}_k\}\). Then \(Q^{\star }\) defined by

$$\begin{aligned} \forall i \in {[\![k]\!]}, Q^{\star }_{ij} = {\left\{ \begin{array}{ll} 1 & j \in \operatorname {argmax}_{p \in {[\![\overline{k}]\!]}} [{P}^\top \overline{P}]_{i p} \\ 0 & \text {otherwise} \end{array}\right. }, \end{aligned}$$
(D7)

is an optimal solution to problem (A2).

Proof

Denoting \(f(Q) = \Vert PQ-\overline{P}\Vert _F^2\) and using that \({P}\) is a partition matrix, that \(Q_{ij} \in \{0,1\}\) (hence \(Q_{ij}^2=Q_{ij}\)) and \(Q {\textbf{1}}_{\overline{k}} = {\textbf{1}}_k\) we can rewrite f as

$$\begin{aligned} \begin{aligned} f(Q)&= \Vert \overline{P}\Vert _F^2 + \Vert \operatorname {diag}({P}^\top {\textbf{1}}_N)^{\frac{1}{2}}Q\Vert _F^2- 2 \langle Q, {P}^\top \overline{P}\rangle \\&= \Vert \overline{P}\Vert _F^2 + \sum _{ij} [{P}^\top {\textbf{1}}_N]_{i} Q_{ij}^2 - 2 \langle Q, {P}^\top \overline{P}\rangle \\&= \Vert \overline{P}\Vert _F^2 + \sum _{i=1}^{k} \sum _{j=1}^{\overline{k}} [{P}^\top {\textbf{1}}_N]_{i} Q_{ij} - 2 \langle Q, {P}^\top \overline{P}\rangle \\&= \Vert \overline{P}\Vert _F^2 + \sum _{i=1}^{k} [{P}^\top {\textbf{1}}_N]_{i} (\sum _{j=1}^{\overline{k}} Q_{ij}) - 2 \langle Q, {P}^\top \overline{P}\rangle \\&= \Vert \overline{P}\Vert _F^2 + \sum _{i=1}^{k} [{P}^\top {\textbf{1}}_N]_{i} - 2 \langle Q, {P}^\top \overline{P}\rangle \\&= \Vert \overline{P}\Vert _F^2 + N - 2 \langle Q, {P}^\top \overline{P}\rangle \end{aligned} \end{aligned}$$
(D8)

Denoting \(C = - {P}^\top \overline{P}\), a solution to problem (A2) can thus be found by solving

$$\begin{aligned} \min _{Q \in \{0,1\}^{k \times \overline{k}}: Q {\textbf{1}}_{\overline{k}} = {\textbf{1}}_k} \ \langle Q, C \rangle . \end{aligned}$$
(D9)

Now Eq. (D9) is an optimization problem that decouples with respect to the rows of Q, i.e., there are k independent problems, one per row of Q. For each row \(i \in {[\![k]\!]}\), a solution can be found by choosing any column index j such that \(j \in \operatorname {argmin}_{p \in {[\![\overline{k}]\!]}} C_{ip}\). This condition is equivalent to finding j such that \(j \in \operatorname {argmax}_{p \in {[\![\overline{k}]\!]}} [{P}^\top \overline{P}]_{i p}\). \(\square\)

Lemma 5

Let \(P^{(1)}, \cdots , P^{(R)}\) where each \(P^{(r)} \in \mathbb {R}^{N \times \overline{k}}\). Then a solution to

$$\begin{aligned} \underset{\overline{P}\in \mathcal {P}_{N, \overline{k}}}{\operatorname {min}} \ \frac{1}{R} \ \sum _{r=1}^{R} \Vert \overline{P}- P^{(r)}\Vert _F^2. \end{aligned}$$
(D10)

is given by

$$\begin{aligned} \forall i \in {[\![N]\!]}, \ [\overline{P}]_{ij} = {\left\{ \begin{array}{ll} 1 & j \in \underset{p \in {[\![\overline{k}]\!]}}{\operatorname {argmax}} \ [\sum _{r=1}^{R} P^{(r)}]_{ip} \\ 0 & \text {otherwise} \end{array}\right. } \end{aligned}$$
(D11)

Now let \(P^{(1)}, \cdots , P^{(R)}\) where each \(P^{(r)} \in \mathbb {R}^{N \times k_r}\) and \(Q^{(1)}, \cdots , Q^{(R)}\) be coupling matrices such that each \(Q^{(r)} \in \mathcal {Q}_{\operatorname {ot}}(k_r, \overline{k})\). Then a solution to

$$\begin{aligned} \underset{\overline{P}\in \mathcal {P}_{N, \overline{k}}}{\operatorname {min}} \ \frac{1}{R} \ \sum _{r=1}^{R} \sum _{i,j=1}^{k_r, \overline{k}} \Vert P^{(r)}_{:,i} - \overline{P}_{:,j}\Vert _2^2 Q^{(r)}_{i,j}. \end{aligned}$$
(D12)

is given by

$$\begin{aligned} \forall i \in {[\![N]\!]}, \ [\overline{P}]_{ij} = {\left\{ \begin{array}{ll} 1 & j \in \underset{p \in {[\![\overline{k}]\!]}}{\operatorname {argmax}} \ [\sum _{r=1}^{R} P^{(r)} Q^{(r)}]_{ip} \\ 0 & \text {otherwise} \end{array}\right. } \end{aligned}$$
(D13)

Proof

For the first point, take \(\overline{P}\) satisfying the constraints. Since it is a partition matrix, we have \(\Vert \overline{P}\Vert _F^2 = \sum _{ij} \overline{P}_{ij}^2 = \sum _{ij} \overline{P}_{ij} = {\textbf{1}}_N^\top \overline{P}{\textbf{1}}_{\overline{k}} = N\). Thus problem (D10) is equivalent to

$$\begin{aligned} \underset{\overline{P}\in \mathcal {P}_{N, \overline{k}}}{\operatorname {min}} \ \langle \overline{P}, - \sum _{r=1}^{R} P^{(r)}\rangle . \end{aligned}$$
(D14)

As detailed in the proof of Lemma 4 a solution can be found by choosing the index of the column j such that \(j \in \operatorname {argmin}_{p \in {[\![\overline{k}]\!]}} [- \sum _{r=1}^{R} P^{(r)}]_{ip}\) which concludes the proof for the first point. For the second point, we use that \(\overline{P}\) is a partition matrix and \(Q^{(r)}\) are coupling matrices so that

$$\begin{aligned} \begin{aligned} \sum _{r=1}^{R} \sum _{i,j=1}^{k_r, \overline{k}} \Vert P^{(r)}_{:,i} - \overline{P}_{:,j}\Vert _2^2 Q^{(r)}_{i,j}&= \sum _{r=1}^{R} \sum _{i,j=1}^{k_r, \overline{k}} (\Vert P^{(r)}_{:,i}\Vert _2^2 - 2 \langle P^{(r)}_{:,i}, \overline{P}_{:,j}\rangle + \Vert \overline{P}_{:,j}\Vert _2^2)Q^{(r)}_{i,j} \\&=\text {cte} - 2 \sum _{r=1}^{R} \sum _{i,j=1}^{k_r, \overline{k}} \langle P^{(r)}_{:,i}, \overline{P}_{:,j}\rangle Q^{(r)}_{i,j} + \sum _{r=1}^{R} \sum _{i,j=1}^{k_r, \overline{k}} \Vert \overline{P}_{:,j}\Vert _2^2 Q^{(r)}_{i,j} \\&=\text {cte} - 2 \sum _{r=1}^{R} \langle \overline{P}, P^{(r)} Q^{(r)} \rangle + \sum _{r=1}^{R} \sum _{j=1}^{\overline{k}} \Vert \overline{P}_{:,j}\Vert _2^2 \sum _{i=1}^{k_r} Q^{(r)}_{i,j} \\&=\text {cte} - 2 \sum _{r=1}^{R} \langle \overline{P}, P^{(r)} Q^{(r)} \rangle + \frac{R}{\overline{k}}\Vert \overline{P}\Vert _F^2. \end{aligned} \end{aligned}$$
(D15)

Using that \(\Vert \overline{P}\Vert _F^2 = N\) as previously proved, we get that the problem is equivalent to

$$\begin{aligned} \underset{\overline{P}\in \mathcal {P}_{N, \overline{k}}}{\operatorname {min}} \ \langle \overline{P}, - \sum _{r=1}^{R} P^{(r)} Q^{(r)}\rangle , \end{aligned}$$
(D16)

hence the result. \(\square\)

Lemma 1

Let \({P} \in \mathcal {P}_{N, k}\) and \(D = \operatorname {diag}(P^\top {\textbf{1}}_N)\). Then problem (A2) with \(\mathcal {Q}(k, \overline{k}) = \mathcal {Q}_{\operatorname {ot}}(k, \overline{k})\) is equivalent to the problem

$$\begin{aligned} \min _{Q \in \mathcal {Q}_{\operatorname {ot}}(k, \overline{k})} \Vert D^{\frac{1}{2}} Q\Vert _F^2 - 2 \langle C, Q \rangle , \end{aligned}$$
(A4)

where \(C = P^\top \overline{P}\). Assuming no empty cluster in the partition \(P\), the solution of Eq. (A4) is unique and can be computed by considering the dual problem

$$\begin{aligned} \max _{\mu \in \mathbb {R}^{k}, \ \nu \in \mathbb {R}^{\overline{k}}} \ \frac{1}{k} \sum _i \mu _i + \frac{1}{\overline{k}} \sum _j \nu _j -\frac{1}{4}\Vert D^{-\frac{1}{2}}[\mu \oplus \nu +2 C]_{+}\Vert _F^2, \end{aligned}$$
(A5)

where \(\mu \oplus \nu := (\mu _i + \nu _j)_{ij}\) and for any matrix \(A, \ [A]_{+}:= \left( \max \{A_{ij}, 0\}\right) _{ij}\). More precisely, the optimal solution \(Q^\star\) of Eq. (A4) can be written as \(Q^{\star } = \frac{1}{2} D^{-1}[\mu ^\star \oplus \nu ^\star + 2 C]_{+}\) where \((\mu ^\star , \nu ^\star )\) is the optimal solution of Eq. (A5).

Proof

We will prove a slightly more general result by considering the problem

$$\begin{aligned} \min _{Q \in \mathcal {Q}_{\operatorname {ot}}(k, \overline{k})} \ \langle M, Q \rangle + \frac{\gamma }{2}\Vert L^{\frac{1}{2}} Q\Vert _F^2, \end{aligned}$$
(D17)

where \(M \in \mathbb {R}^{k \times \overline{k}}\), \(\gamma > 0\) and L is a symmetric positive definite matrix. We write \(a = \frac{1}{k} {\textbf{1}}_k\) and \(b = \frac{1}{\overline{k}} {\textbf{1}}_{\overline{k}}\). We will then apply the result to \(M = -2C\), \(\gamma = 2\) and \(L = D\), which is symmetric positive definite when there is no empty cluster (since in this case \(\forall i \in {[\![k]\!]}, D_{ii} \ne 0\)). Most of our calculations are adapted from Blondel et al. (2018). First, since L is a symmetric positive definite matrix, problem Eq. (D17) is strongly convex, thus it admits a unique solution.

To look at the dual of Eq. (D17), we consider the Lagrangian

$$\begin{aligned} \begin{aligned} \mathcal {L}(Q, \mu , \nu , \Gamma )&= \langle M, Q \rangle + \frac{\gamma }{2}\Vert L^{\frac{1}{2}} Q\Vert _F^2 + \langle \mu , a - Q{\textbf{1}}_k \rangle + \langle \nu , b - Q^\top {\textbf{1}}_{\overline{k}} \rangle - \langle \Gamma , Q\rangle \\&= \langle M, Q \rangle + \frac{\gamma }{2}\Vert L^{\frac{1}{2}} Q\Vert _F^2 - \langle Q, \mu {\textbf{1}}_{\overline{k}}^\top + {\textbf{1}}_k \nu ^\top + \Gamma \rangle + \langle \mu , a \rangle + \langle \nu , b\rangle , \end{aligned} \end{aligned}$$
(D18)

where \(\Gamma\) is the variable accounting for the non-negativity constraints on Q. We have

$$\begin{aligned} \nabla _Q \mathcal {L}(Q, \mu , \nu , \Gamma ) = 0 \iff M + \gamma L Q - \mu \oplus \nu - \Gamma = 0. \end{aligned}$$
(D19)

This is satisfied when \(Q = Q^\star = \frac{1}{\gamma } L^{-1}(\Gamma + \mu \oplus \nu - M)\). Moreover,

$$\begin{aligned} \begin{aligned} \langle M, Q^\star \rangle - \langle Q^\star , \mu \oplus \nu + \Gamma \rangle&= -\langle Q^\star , \mu \oplus \nu + \Gamma -M\rangle \\&=-\langle \frac{1}{\gamma } L^{-1}(\Gamma + \mu \oplus \nu - M), \mu \oplus \nu + \Gamma -M\rangle \\&= -\frac{1}{\gamma } \Vert L^{-\frac{1}{2}}(\Gamma + \mu \oplus \nu - M)\Vert _F^2. \end{aligned} \end{aligned}$$
(D20)

Thus

$$\begin{aligned} \begin{aligned} \langle M, Q^\star \rangle - \langle Q^\star , \mu \oplus \nu + \Gamma \rangle + \frac{\gamma }{2} \Vert L^{\frac{1}{2}} Q^\star \Vert _F^2&= -\frac{1}{\gamma } \Vert L^{-\frac{1}{2}}(\Gamma + \mu \oplus \nu - M)\Vert _F^2 \\&+ \frac{\gamma }{2} \Vert L^{\frac{1}{2}} (\frac{1}{\gamma } L^{-1}(\Gamma + \mu \oplus \nu - M))\Vert _F^2\\&=-\frac{1}{\gamma } \Vert L^{-\frac{1}{2}}(\Gamma + \mu \oplus \nu - M)\Vert _F^2 \\&+ \frac{1}{2\gamma } \Vert L^{-\frac{1}{2}}(\Gamma + \mu \oplus \nu - M)\Vert _F^2 \\&=-\frac{1}{2\gamma } \Vert L^{-\frac{1}{2}}(\Gamma + \mu \oplus \nu - M)\Vert _F^2 \end{aligned} \end{aligned}$$
(D21)

Hence

$$\begin{aligned} \mathcal {L}(Q^\star , \mu , \nu , \Gamma ) = -\frac{1}{2\gamma } \Vert L^{-\frac{1}{2}}(\Gamma + \mu \oplus \nu - M)\Vert _F^2 + \langle \mu , a\rangle + \langle \nu , b\rangle . \end{aligned}$$
(D22)

Now we solve the problem over \(\Gamma\) that is we maximize the problem

$$\begin{aligned} \max _{\Gamma \ge 0} \ \mathcal {L}(Q^\star , \mu , \nu , \Gamma ), \end{aligned}$$
(D23)

where \(\ge 0\) should be understood pointwise. This is equivalent to

$$\begin{aligned} \min _{\Gamma \ge 0} \Vert L^{-\frac{1}{2}}(\Gamma -A)\Vert _F^2. \end{aligned}$$
(D24)

where \(A:= M - \mu \oplus \nu\). Writing \(L = U \Delta U^\top\) where \(\Delta = \operatorname {diag}(d_1, \cdots , d_n)\) with \(d_i > 0\) (in our application \(L = D\) is diagonal, so that U can be taken to be the identity), Eq. (D24) can be equivalently written as

$$\begin{aligned} \min _{\Gamma \ge 0} \Vert \Delta ^{-\frac{1}{2}}\Gamma -\Delta ^{-\frac{1}{2}}A\Vert _F^2. \end{aligned}$$
(D25)

With a change of variable \({\tilde{\Gamma }} = \Delta ^{-1/2} \Gamma \ge 0\) this is equivalent to

$$\begin{aligned} \min _{{\tilde{\Gamma }} \ge 0} \Vert {\tilde{\Gamma }}-\Delta ^{-\frac{1}{2}}A\Vert _F^2, \end{aligned}$$
(D26)

whose minimum is given by \({\tilde{\Gamma }} = [\Delta ^{-\frac{1}{2}}A]_+ = \Delta ^{-\frac{1}{2}}[A]_+\) since \(\Delta ^{-\frac{1}{2}}\) is a diagonal matrix with positive entries. Thus the solution of (D24) is given by \(\Gamma = \Delta ^{1/2} {\tilde{\Gamma }} = \Delta ^{1/2} [\Delta ^{-1/2}A]_+ = [A]_+\). Also \([A]_+ - A = [-A]_{+}\). Thus

$$\begin{aligned} \min _{\Gamma \ge 0} \Vert L^{-\frac{1}{2}}(\Gamma -A)\Vert _F^2 = \Vert L^{-\frac{1}{2}}[-A]_{+}\Vert _F^2 = \Vert L^{-\frac{1}{2}}[\mu \oplus \nu -M]_{+}\Vert _F^2 \end{aligned}$$
(D27)

Hence the dual problem of Eq. (D17) is given by \(\max _{\mu , \nu } \mathcal {L}(Q^\star , \mu , \nu , [-A]_{+})\) which is

$$\begin{aligned} \max _{\mu , \nu } \ \langle \mu , a \rangle + \langle \nu , b\rangle -\frac{1}{2 \gamma }\Vert L^{-\frac{1}{2}}(\mu \oplus \nu -M)_{+}\Vert _F^2. \end{aligned}$$
(D28)

Applying this to \(M = -2C, \gamma = 2\) and \(L = D\) concludes. \(\square\)

Appendix E: Experimental details and extra results

E.1: Details about the implementation

The PASCO implementation relies on the POT library (Flamary et al., 2021) for the fusion part. The heaviest experiments were performed with the support of the Centre Blaise Pascal's IT test platform at ENS de Lyon (Lyon, France), which operates the SIDUS solution (Quemener & Corvellec, 2013). We used an Intel Xeon Gold 5218 machine.

E.2: Details of the coarsening experiment

Table 1 provides some characteristics of the real graphs used in Fig. 3.

Table 1 Some characteristics of the real graphs used in Fig. 3, extracted from (Rossi & Ahmed, 2015)

The RSA experiments in Sect. 5.1 tested the conservation of spectral properties by coarsening algorithms, including PASCO. Figure 8 complements Fig. 3 with SBM graphs. Graphs were drawn from \(\operatorname {SSBM}(1000,k,2,\alpha )\), for \((k = 10, \alpha = 1/k)\), \((k = 100, \alpha = 1/k)\) and \((k = 10, \alpha = 1/(2k))\). Below, in Fig. 9, we display the computational times of the coarsening methods.

Fig. 8

We represent the RSA (the smaller, the better) of various coarsening schemes (including PASCO), as a function of the compression rate (the higher, the coarser the obtained graph). Shaded areas represent 0.2 upper- and lower-quantiles

Fig. 9

We represent the computational time of various coarsening schemes (including PASCO), as a function of the compression rate (the higher, the coarser the obtained graph). Shaded areas represent 0.2 upper- and lower-quantiles over the 10 repetitions of the experiment. Top row: we reproduce a part of the experiment of ((Loukas, 2019), Figure 2) with the same real graphs and added PASCO. Bottom row: same experiment but with random graphs drawn from \(\operatorname {SSBM}(1000, k, 2, \alpha )\) for \((k=10, \alpha =1/k)\), \((k=100, \alpha =1/k)\) and \((k=10, \alpha =1/(2k))\)

E.3: Effectiveness of the alignment/fusion phase

To demonstrate the alignment+fusion procedure, we now consider a synthetic dataset generated from a two-dimensional Gaussian Mixture Model (GMM) with three clusters. The clusters consist of 500, 400 and 200 points, respectively. Each cluster is sampled from isotropic Gaussian distributions with a standard deviation of 0.25 and centers located at (0, 1), (1, 0), and (0, 0). The resulting dataset is shown in the top left panel of Fig. 10.

We generate 15 different partitions of the dataset with the goal of recovering the true partition corresponding to the original GMM clusters. The partitions are constructed as follows: first, we randomly select a number of clusters k, drawn uniformly between 3 and 10. Then, we designate k centroids: the first three are randomly selected from each of the true GMM clusters (so that we have at least one starting centroid in each true cluster), while the remaining centroids are uniformly sampled from the remaining points. Each point in the dataset is assigned to the nearest centroid, thereby forming a partition. Forcing each true cluster to be initially represented by at least one centroid ensures that the resulting partition is related to the true partition. See two examples of these generated partitions in Fig. 10.
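
A minimal sketch of this data and partition generation is given below (our own illustration; random seeds and tie-breaking details are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# 2D GMM: three isotropic clusters of sizes 500, 400, 200, std 0.25.
sizes = [500, 400, 200]
centers = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
X = np.vstack([c + 0.25 * rng.standard_normal((s, 2)) for s, c in zip(sizes, centers)])
y_true = np.repeat(np.arange(3), sizes)

def random_partition(X, y_true, rng):
    k = rng.integers(3, 11)                                     # k uniform in {3, ..., 10}
    # one centroid drawn from each true cluster, the rest among the remaining points
    first = [rng.choice(np.where(y_true == c)[0]) for c in range(3)]
    rest = rng.choice(np.setdiff1d(np.arange(len(X)), first), size=k - 3, replace=False)
    centroids = X[np.concatenate([first, rest])]
    # assign every point to its nearest centroid
    return np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)

partitions = [random_partition(X, y_true, rng) for _ in range(15)]
```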

The effectiveness of the proposed alignment and fusion method (Algorithm 3) is evaluated by comparing how well several alignment techniques (lin-reg, many-to-one, and the proposed ot) recover the true partition. For each case, we consider a randomly initialized reference with \(\overline{k} = 3\), such that each point is assigned to a cluster chosen uniformly at random. As a baseline, we compare with a K-means clustering with \(K=3\). We emphasize that the K-means algorithm benefits from the spatial coordinates of the data points, whereas the alignment+fusion methods operate solely on the different partitions \(P_1, \cdots , P_R\) and ignore the positions. The experiment is repeated one hundred times, and the average Adjusted Mutual Information (AMI) (Vinh et al., 2009) (see Sect. E.5 for the definition) between the inferred and true partitions is plotted as a function of the number of partitions R in the bottom right panel of Fig. 10. The results indicate that the ot method achieves performance comparable to K-means (high AMI with small variance) when \(R \ge 6\), while lin-reg and many-to-one have high variance and struggle to retrieve the true partition for any value of R. This can be explained by the fact that the partitions are quite unbalanced, a setting that the coupling constraints handle better. Finally, a partition recovered using the ot method is shown in the bottom left panel of Fig. 10, illustrating that it is nearly a perfect permutation of the true partition.

Fig. 10

Experiment with a toy dataset drawn from a 2D GMM (top left). Colors indicate clusters. Two partitions (out of 15) are depicted (top center and right), as well as the recovered partition using the alignment+fusion procedure based on ot (bottom left). The bottom right panel presents the average AMI between the true partition and the fused partitions obtained by ot, many-to-one, and lin-reg. Shaded areas represent 0.2 upper- and lower-quantiles over the 100 runs of the alignment+fusion algorithms

E.4: Additional results on the influence of parameters

Here, we provide additional results on the experiments of Sect. 5.2. Figure 11 shows the performance of the various alignment methods for different difficulty levels in the SSBM. Parameters are the same as in Fig. 4.

In Fig. 5, we showed the computational gains of using PASCO. As a sanity check, in Fig. 12 we show that the quality of the output partition remains satisfactory: speed was not achieved at the cost of quality.

Fig. 11

Influence of the alignment method in PASCO. The AMI score is averaged over the 10 runs and shaded areas represent 0.2 upper- and lower-quantiles. Dashed lines correspond to the PASCO threshold

Fig. 12

Quality of the partitions w.r.t. \(\rho\) and R, when \(R=\rho\). The number of communities varies in \(\{20, 100, 1000\}\). AMI values are averaged over 10 runs and shaded areas represent 0.2 upper- and lower-quantiles

E.5: Definition of the scores used to evaluate clustering quality

Definition 4

(Adjusted Mutual Information) The Adjusted Mutual Information (AMI) between two partitions \(U = (U_1, \dots , U_k)\) and \(V = (V_1, \dots , V_{k'})\) of the node set \(\mathbb {V}\) of size N is given by

$$\begin{aligned} AMI(U,V) = \frac{MI(U,V) - \mathbb {E}[MI(U,V)]}{ \max (H(U), H(V)) - \mathbb {E}[MI(U,V)]} , \end{aligned}$$
(E29)

where

$$\begin{aligned} MI(U,V) = \sum \limits _{i=1}^k \sum \limits _{j=1}^{k'} P_{UV}(i,j) \log \frac{P_{UV}(i,j)}{P_U(i) P_V(j)} , \end{aligned}$$
(E30)

with \(P_U(i) = |U_i|/N\) (similarly for \(P_V(j)\)), \(P_{UV}(i,j) = |U_i \cap V_j|/N\), and \(H(U) = - \sum _i P_U(i) \log P_U(i)\) (similarly for H(V)). The expected mutual information \(\mathbb {E}[MI(U,V)]\) in (E29) is computed by assuming a hypergeometric model for U and V, with parameters estimated from the partitions. For the sake of conciseness and simplicity, we refer the reader to Vinh et al. (2009) for the precise formula.
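
In practice, the AMI can be computed with scikit-learn, which implements the hypergeometric adjustment of Vinh et al. (2009); a small sketch follows (passing average_method="max" matches the normalization of Eq. (E29), whereas recent scikit-learn versions default to the arithmetic mean).

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

labels_true = np.array([0, 0, 1, 1, 2, 2])
labels_pred = np.array([1, 1, 0, 0, 2, 2])   # same partition up to a relabeling of clusters
print(adjusted_mutual_info_score(labels_true, labels_pred, average_method="max"))   # 1.0
```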

Definition 5

(Generalized Normalized Cut) Given a graph \(\mathbb {G}\) represented by its adjacency or weight matrix \(\mathbb {A}\), consider a partition \((V_1, \dots , V_k)\) of the vertex set \(\mathbb {V}\) of \(\mathbb {G}\). The generalized normalized cut of the partition is defined as

$$\begin{aligned} \frac{1}{k} \sum \limits _{j=1}^k \frac{ \sum _{u \in V_j, v \notin V_j} \mathbb {A}_{u,v} }{ \sum _{u \in V_j, v \in \mathbb {V}} \mathbb {A}_{u,v}} . \end{aligned}$$
(E31)
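
A minimal sketch of this score is given below (our own illustration, for a dense adjacency matrix and integer labels in \(\{0, \dots, k-1\}\); it assumes every cluster has non-zero volume).

```python
import numpy as np

def generalized_ncut(A, labels):
    A = np.asarray(A, dtype=float)
    k = labels.max() + 1
    total = 0.0
    for j in range(k):
        inside = labels == j
        cut = A[np.ix_(inside, ~inside)].sum()   # sum over u in V_j, v not in V_j
        vol = A[inside, :].sum()                 # sum over u in V_j, v in V
        total += cut / vol
    return total / k
```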

Definition 6

(Modularity) Given a graph \(\mathbb {G}\) represented by its adjacency or weight matrix \(\mathbb {A}\), consider a partition \((V_1, \dots , V_k)\) of the vertex set \(\mathbb {V}\) of \(\mathbb {G}\). Let \(d_u\) denote the degree of node u and m the weight of all the edges. Then, the modularity is given by

$$\begin{aligned} \frac{1}{2m} \sum \limits _{j=1}^k \sum _{u \in V_j, v \in V_j} \left( \mathbb {A}_{u,v} - \frac{d_u d_v}{2m} \right) . \end{aligned}$$
(E32)
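
A minimal sketch computing Eq. (E32) for a dense, undirected adjacency matrix is given below (our own illustration; \(d_u\) are weighted degrees and \(2m\) is their sum).

```python
import numpy as np

def modularity(A, labels):
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)            # (weighted) degrees
    two_m = d.sum()              # equals 2m for an undirected graph
    Q = 0.0
    for j in np.unique(labels):
        idx = labels == j
        Q += A[np.ix_(idx, idx)].sum() - np.outer(d[idx], d[idx]).sum() / two_m
    return Q / two_m
```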

Definition 7

(Description Length) Let \(\mathbb {G}= (\mathbb {V}, \mathbb {E})\) be a graph and consider the partition \((V_1, \dots , V_k)\) of the vertex set \(\mathbb {V}\). We denote by \(e_{i,j}\) the number of edges between \(V_i\) and \(V_j\). In mathematical terms, the description length is defined by \(\texttt {dl} = \mathcal {S} + \mathcal {L}\), where \(\mathcal {S}\) is the entropy of the fitted stochastic block model and \(\mathcal {L}\) is the information required to describe the model. Their expressions are given by

$$\begin{aligned} \mathcal {S}&= \left| \mathbb {E} \right| - \frac{1}{2} \sum \limits _{i,j=1}^{k} e_{i,j} \log \left( \frac{e_{i,j}}{\left| V_i \right| \left| V_j \right| } \right) \\ \mathcal {L}&= \left| \mathbb {E} \right| \cdot h \left( \frac{k (k+1)}{2\left| \mathbb {E} \right| } \right) + \left| \mathbb {V} \right| \log (k), \end{aligned}$$

with \(h(x) = (1+x) \log (1+x) - x \log x\).
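
A minimal sketch of this quantity is given below (our own illustration; the convention used to count the diagonal block terms \(e_{i,i}\) and self-loops is an assumption and may differ from the one used in the experiments).

```python
import numpy as np

def description_length(A, labels):
    A = np.asarray(A, dtype=float)
    k = labels.max() + 1
    sizes = np.bincount(labels, minlength=k).astype(float)   # |V_i|
    H = np.eye(k)[labels]                                    # block indicator matrix (N x k)
    e = H.T @ A @ H                                          # e[i, j]: edge weight between blocks
    n_edges = A.sum() / 2.0                                  # |E| for an undirected graph
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(e > 0, e * np.log(e / np.outer(sizes, sizes)), 0.0)
    S = n_edges - 0.5 * log_term.sum()
    h = lambda x: (1 + x) * np.log(1 + x) - x * np.log(x)
    L = n_edges * h(k * (k + 1) / (2 * n_edges)) + len(labels) * np.log(k)
    return S + L
```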

E.6: Real data experiments

Table 2 provides a few characteristics of the real graphs used in Sect. 5.3.

Table 2 Name, number of nodes, number of edges and number of communities of the large real datasets used in the experiments

Here, we present the results of the experiments on real graphs in tables. Bold figures indicate the best result for each criterion for a given clustering algorithm (SC, CSC, louvain, leiden, MDL, infomap) (Tables 3, 4, 5).

Table 3 Results for the arxiv dataset
Table 4 Results for the mag dataset
Table 5 Results for the products dataset

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lasalle, E., Vaudaine, R., Vayer, T. et al. PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms. Mach Learn 114, 212 (2025). https://doi.org/10.1007/s10994-025-06837-7
