Abstract
Spectral clustering is one of the most common unsupervised learning algorithms in machine learning and plays an important role in data science. Fair spectral clustering has also become a hot topic with the extensive research on fair machine learning in recent years. Current iterations of fair spectral clustering methods are based on the concepts of group and individual fairness. These concepts act as mechanisms to mitigate decision bias, particularly for individuals with analogous characteristics and groups that are considered to be sensitive. Existing algorithms in fair spectral clustering have made progress in redistributing resources during clustering to mitigate inequities for certain individuals or subgroups. However, these algorithms still suffer from an unresolved problem at the global level: the resulting clusters tend to be oversized and undersized. To this end, the first original research on scale fairness is presented, aiming to explore how to enhance scale fairness in spectral clustering. We define it as a cluster attribution problem for uncertain data points and introduce entropy to enhance scale fairness. We measure the scale fairness of clustering by designing two statistical metrics. In addition, two scale fair spectral clustering algorithms are proposed, the entropy weighted spectral clustering (EWSC) and the scale fair spectral clustering (SFSC). We have experimentally verified on several publicly available real datasets of different sizes that EWSC and SFSC have excellent scale fairness performance, along with comparable clustering effects.
Similar content being viewed by others
Data availability
No datasets were generated or analyzed during the current study.
Notes
Code is available on https://github.com/wsyzj1025/SFSC-AND-EWSC.
References
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems, vol 14, MIT Press
Bo D, Wang X, Shi C, Zhu M, Lu E, Cui P (2020) Structural deep clustering network. In: Proceedings of the web conference 2020, pp 1400–1410
White S, Smyth P (2005) A spectral clustering approach to finding communities in graphs. In: Proceedings of the 2005 SIAM international conference on data mining. SIAM, pp 274–285
Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. In: Saul L, Weiss Y, Bottou L (eds) Advances in neural information processing systems, vol 17, MIT Press
Zhou D, Bousquet O, Lal T, Weston J, Schölkopf B (2003) Learning with local and global consistency. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems, vol 16, MIT Press
Ning H, Xu W, Chi Y, Gong Y, Huang TS (2010) Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recognit 43(1):113–127
Tung F, Wong A, Clausi DA (2010) Enabling scalable spectral clustering for image segmentation. Pattern Recognit 43(12):4069–4076
Li Z, Chen J (2015) Superpixel segmentation using linear spectral clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1356–1363
Liu H, Zhao F, Jiao L (2012) Fuzzy spectral clustering with robust spatial information for image segmentation. Appl Soft Comput 12(11):3636–3647
Higham DJ, Kalna G, Kibble M (2007) Spectral clustering and its use in bioinformatics. J Comput Appl Math 204(1):25–37
Nouri N, Kleinstein SH (2018) A spectral clustering-based method for identifying clones from high-throughput b cell repertoire sequencing data. Bioinformatics 34(13):341–349
Tola V, Lillo F, Gallegati M, Mantegna RN (2008) Cluster analysis for portfolio optimization. J Econ Dyn Control 32(1):235–258
Mansano RE, Allem LE, Del-Vecchio RR, Hoppen C (2022) Balanced portfolio via signed graphs and spectral clustering in the Brazilian stock market. Qual Quant 56(4):2325–2340
Kleindessner M, Samadi S, Awasthi P, Morgenstern J (2019) Guarantees for spectral clustering with fairness constraints. In: International conference on machine learning. PMLR, pp 3458–3467
Du X, Pei Y, Duivesteijn W, Pechenizkiy M (2020) Fairness in network representation by latent structural heterogeneity in observational data. In: National conference on artificial intelligence
Xia X, Hui Z, Chunming Y, Xujian Z, Bo L (2021) Fairness constraint of fuzzy c-means clustering improves clustering fairness. In: Asian conference on machine learning. PMLR, pp 113–128
Dai E, Wang S (2021) Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In: WSDM ’21: the fourteenth ACM international conference on web search and data mining
Dong Y, Ma J, Wang S, Chen C, Li J (2023) Fairness in graph mining: a survey, IEEE Trans Knowl Data Eng 35(10):10583–10602. https://doi.org/10.1109/TKDE.2023.3265598
Kang J, He J, Maciejewski R, Tong H (2020) Inform: Individual fairness on graph mining. In: KDD ’20: the 26th ACM SIGKDD conference on knowledge discovery and data mining
Zhu S, Wang D, Li T (2010) Data clustering with size constraints. Knowl Based Syst 23(8):883–889
Venkatasubbu S, Krishnamoorthy G (2022) Ethical considerations in AI addressing bias and fairness in machine learning models. J Knowl Learn Sci Technol 1(1):130–138
Luxburg UV (2004) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Yu SX, Shi J (2004) Segmentation given partial grouping constraints. IEEE Trans Pattern Anal Mach Intell 26(2):173–183
He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: Weiss Y, Schölkopf B, Platt J (eds) Advances in neural information processing systems, vol 18, MIT Press
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2011) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference
Xia X, Hui Z, Chunming Y, Xujian Z, Bo L (2023) Fair method for spectral clustering to improve intra-cluster fairness. Comput Sci 50(2):8
Wang J, Lu D, Davidson I, Bai Z (2023) Scalable spectral clustering with group fairness constraints. In: International conference on artificial intelligence and statistics. PMLR, pp 6613–6629
Fleisher W (2021) What’s fair about individual fairness? In: Proceedings of the 2021 AAAI/ACM conference on AI, ethics, and society, pp 480–490
Gupta S, Dukkipati A (2021) Protecting individual interests across clusters: spectral clustering with guarantees. arXiv:2105.03714
Nanda SJ, Gulati I, Chauhan R, Modi R, Dhaked U (2019) A k-means-galactic swarm optimization-based clustering algorithm with Otsu’s entropy for brain tumor detection. Appl Artif Intell 33(2):152–170
Liu M, Zhang B, Li X, Tang W, Zhang G (2021) An optimized k-means algorithm based on information entropy. Comput J 64(7):1130–1143
Meng G, Dan L, Ni-hong W, Li-chen L (2014) A network intrusion detection model based on k-means algorithm and information entropy. Int J Secur Appl 8(6):285–294
Khan I, ALghafri M, Abdessalem A (2023) Entropy in fuzzy k-means algorithm for multi-view data. In: International conference on advances in computing research. Springer, pp 120–133
Jenssen R, Eltoft T, Girolami M, Erdogmus D (2006) Kernel maximum entropy data transformation and an enhanced spectral clustering algorithm. In: Schölkopf B, Platt J, Hoffman T (eds) Advances in neural information processing systems, vol 19, MIT Press
Zhao F, Jiao L, Liu H, Gao X, Gong M (2010) Spectral clustering with eigenvector selection based on entropy ranking. Neurocomputing 73(10–12):1704–1717
Jia H, Ding S, Zhu H, Wu F, Bao L (2013) A feature weighted spectral clustering algorithm based on knowledge entropy. J Softw 8(5):1101–1108
Hu X, Zhang H, Yang C, Zhao X, Li B (2020) Regularized spectral clustering with entropy perturbation. IEEE Trans Big Data 7(6):967–972
Kumar D, Padhy BP (2022) Entropy based spectral clustering for distribution network with high penetration of DGS. In: 2022 22nd National power systems conference (NPSC). IEEE, pp 53–58
Jure Leskovec AK (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data
Weeks MR, Clair S, Borgatti SP, Radda K, Schensul JJ (2002) Social networks of drug users in high-risk sites: finding the connections. AIDS Behav 6:193–206
Fink CG, Omodt N, Zinnecker S, Sprint G (2023) A congressional twitter network dataset quantifying pairwise probability of influence. Data Brief 50:109521
Rossana M, Julie F, Alain B, Cecile V (2015) Contact patterns in a high school: A comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE 10(9):0136497
Acknowledgements
This work is partially supported by Postgraduate Innovation Fund Project by Southwest University of Science and Technology (24ycx2064). This work is also supported by provincial scientific research institutes’ achievement transformation project of the science and technology. This work is also supported by provincial scientific research institutes’ achievement transformation project of the science and technology department of Sichuan Province, China (2023JDZH0011).
Author information
Authors and Affiliations
Contributions
Zhijing Yang and Hui Zhang designed the algorithmic flow and wrote this paper. Chunming Yang and Bo Li were responsible for the code and experimental parts. Xujian Zhao and Yin Long drew the pictures and polished this paper. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Z., Zhang, H., Yang, C. et al. Spectral clustering with scale fairness constraints. Knowl Inf Syst 67, 273–300 (2025). https://doi.org/10.1007/s10115-024-02183-7
Received:
Revised:
Accepted:
Published:
Version of record:
Issue date:
DOI: https://doi.org/10.1007/s10115-024-02183-7