这是indexloc提供的服务,不要输入任何密码
Skip to main content
Log in

Spectral clustering with scale fairness constraints

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Spectral clustering is one of the most common unsupervised learning algorithms in machine learning and plays an important role in data science. Fair spectral clustering has also become a hot topic with the extensive research on fair machine learning in recent years. Current iterations of fair spectral clustering methods are based on the concepts of group and individual fairness. These concepts act as mechanisms to mitigate decision bias, particularly for individuals with analogous characteristics and groups that are considered to be sensitive. Existing algorithms in fair spectral clustering have made progress in redistributing resources during clustering to mitigate inequities for certain individuals or subgroups. However, these algorithms still suffer from an unresolved problem at the global level: the resulting clusters tend to be oversized and undersized. To this end, the first original research on scale fairness is presented, aiming to explore how to enhance scale fairness in spectral clustering. We define it as a cluster attribution problem for uncertain data points and introduce entropy to enhance scale fairness. We measure the scale fairness of clustering by designing two statistical metrics. In addition, two scale fair spectral clustering algorithms are proposed, the entropy weighted spectral clustering (EWSC) and the scale fair spectral clustering (SFSC). We have experimentally verified on several publicly available real datasets of different sizes that EWSC and SFSC have excellent scale fairness performance, along with comparable clustering effects.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+
from $39.99 /Month
  • Starting from 10 chapters or articles per month
  • Access and download chapters and articles from more than 300k books and 2,500 journals
  • Cancel anytime
View plans

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Algorithm 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Algorithm 2
Algorithm 3
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Data availability

No datasets were generated or analyzed during the current study.

Notes

  1. Code is available on https://github.com/wsyzj1025/SFSC-AND-EWSC.

References

  1. Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems, vol 14, MIT Press

  2. Bo D, Wang X, Shi C, Zhu M, Lu E, Cui P (2020) Structural deep clustering network. In: Proceedings of the web conference 2020, pp 1400–1410

  3. White S, Smyth P (2005) A spectral clustering approach to finding communities in graphs. In: Proceedings of the 2005 SIAM international conference on data mining. SIAM, pp 274–285

  4. Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. In: Saul L, Weiss Y, Bottou L (eds) Advances in neural information processing systems, vol 17, MIT Press

  5. Zhou D, Bousquet O, Lal T, Weston J, Schölkopf B (2003) Learning with local and global consistency. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems, vol 16, MIT Press

  6. Ning H, Xu W, Chi Y, Gong Y, Huang TS (2010) Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recognit 43(1):113–127

    Article  MATH  Google Scholar 

  7. Tung F, Wong A, Clausi DA (2010) Enabling scalable spectral clustering for image segmentation. Pattern Recognit 43(12):4069–4076

    Article  MATH  Google Scholar 

  8. Li Z, Chen J (2015) Superpixel segmentation using linear spectral clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1356–1363

  9. Liu H, Zhao F, Jiao L (2012) Fuzzy spectral clustering with robust spatial information for image segmentation. Appl Soft Comput 12(11):3636–3647

    Article  MATH  Google Scholar 

  10. Higham DJ, Kalna G, Kibble M (2007) Spectral clustering and its use in bioinformatics. J Comput Appl Math 204(1):25–37

    Article  MathSciNet  MATH  Google Scholar 

  11. Nouri N, Kleinstein SH (2018) A spectral clustering-based method for identifying clones from high-throughput b cell repertoire sequencing data. Bioinformatics 34(13):341–349

    Article  MATH  Google Scholar 

  12. Tola V, Lillo F, Gallegati M, Mantegna RN (2008) Cluster analysis for portfolio optimization. J Econ Dyn Control 32(1):235–258

    Article  MathSciNet  MATH  Google Scholar 

  13. Mansano RE, Allem LE, Del-Vecchio RR, Hoppen C (2022) Balanced portfolio via signed graphs and spectral clustering in the Brazilian stock market. Qual Quant 56(4):2325–2340

    Article  MATH  Google Scholar 

  14. Kleindessner M, Samadi S, Awasthi P, Morgenstern J (2019) Guarantees for spectral clustering with fairness constraints. In: International conference on machine learning. PMLR, pp 3458–3467

  15. Du X, Pei Y, Duivesteijn W, Pechenizkiy M (2020) Fairness in network representation by latent structural heterogeneity in observational data. In: National conference on artificial intelligence

  16. Xia X, Hui Z, Chunming Y, Xujian Z, Bo L (2021) Fairness constraint of fuzzy c-means clustering improves clustering fairness. In: Asian conference on machine learning. PMLR, pp 113–128

  17. Dai E, Wang S (2021) Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In: WSDM ’21: the fourteenth ACM international conference on web search and data mining

  18. Dong Y, Ma J, Wang S, Chen C, Li J (2023) Fairness in graph mining: a survey, IEEE Trans Knowl Data Eng 35(10):10583–10602. https://doi.org/10.1109/TKDE.2023.3265598

  19. Kang J, He J, Maciejewski R, Tong H (2020) Inform: Individual fairness on graph mining. In: KDD ’20: the 26th ACM SIGKDD conference on knowledge discovery and data mining

  20. Zhu S, Wang D, Li T (2010) Data clustering with size constraints. Knowl Based Syst 23(8):883–889

    Article  MATH  Google Scholar 

  21. Venkatasubbu S, Krishnamoorthy G (2022) Ethical considerations in AI addressing bias and fairness in machine learning models. J Knowl Learn Sci Technol 1(1):130–138

    Article  MATH  Google Scholar 

  22. Luxburg UV (2004) A tutorial on spectral clustering. Stat Comput 17(4):395–416

    Article  MathSciNet  MATH  Google Scholar 

  23. Yu SX, Shi J (2004) Segmentation given partial grouping constraints. IEEE Trans Pattern Anal Mach Intell 26(2):173–183

    Article  MATH  Google Scholar 

  24. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: Weiss Y, Schölkopf B, Platt J (eds) Advances in neural information processing systems, vol 18, MIT Press

  25. Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2011) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference

  26. Xia X, Hui Z, Chunming Y, Xujian Z, Bo L (2023) Fair method for spectral clustering to improve intra-cluster fairness. Comput Sci 50(2):8

    Google Scholar 

  27. Wang J, Lu D, Davidson I, Bai Z (2023) Scalable spectral clustering with group fairness constraints. In: International conference on artificial intelligence and statistics. PMLR, pp 6613–6629

  28. Fleisher W (2021) What’s fair about individual fairness? In: Proceedings of the 2021 AAAI/ACM conference on AI, ethics, and society, pp 480–490

  29. Gupta S, Dukkipati A (2021) Protecting individual interests across clusters: spectral clustering with guarantees. arXiv:2105.03714

  30. Nanda SJ, Gulati I, Chauhan R, Modi R, Dhaked U (2019) A k-means-galactic swarm optimization-based clustering algorithm with Otsu’s entropy for brain tumor detection. Appl Artif Intell 33(2):152–170

    Article  Google Scholar 

  31. Liu M, Zhang B, Li X, Tang W, Zhang G (2021) An optimized k-means algorithm based on information entropy. Comput J 64(7):1130–1143

    Article  MathSciNet  MATH  Google Scholar 

  32. Meng G, Dan L, Ni-hong W, Li-chen L (2014) A network intrusion detection model based on k-means algorithm and information entropy. Int J Secur Appl 8(6):285–294

    MATH  Google Scholar 

  33. Khan I, ALghafri M, Abdessalem A (2023) Entropy in fuzzy k-means algorithm for multi-view data. In: International conference on advances in computing research. Springer, pp 120–133

  34. Jenssen R, Eltoft T, Girolami M, Erdogmus D (2006) Kernel maximum entropy data transformation and an enhanced spectral clustering algorithm. In: Schölkopf B, Platt J, Hoffman T (eds) Advances in neural information processing systems, vol 19, MIT Press

  35. Zhao F, Jiao L, Liu H, Gao X, Gong M (2010) Spectral clustering with eigenvector selection based on entropy ranking. Neurocomputing 73(10–12):1704–1717

    Article  MATH  Google Scholar 

  36. Jia H, Ding S, Zhu H, Wu F, Bao L (2013) A feature weighted spectral clustering algorithm based on knowledge entropy. J Softw 8(5):1101–1108

    Article  MATH  Google Scholar 

  37. Hu X, Zhang H, Yang C, Zhao X, Li B (2020) Regularized spectral clustering with entropy perturbation. IEEE Trans Big Data 7(6):967–972

    Article  MATH  Google Scholar 

  38. Kumar D, Padhy BP (2022) Entropy based spectral clustering for distribution network with high penetration of DGS. In: 2022 22nd National power systems conference (NPSC). IEEE, pp 53–58

  39. Jure Leskovec AK (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

  40. Weeks MR, Clair S, Borgatti SP, Radda K, Schensul JJ (2002) Social networks of drug users in high-risk sites: finding the connections. AIDS Behav 6:193–206

    Article  Google Scholar 

  41. Fink CG, Omodt N, Zinnecker S, Sprint G (2023) A congressional twitter network dataset quantifying pairwise probability of influence. Data Brief 50:109521

    Article  Google Scholar 

  42. Rossana M, Julie F, Alain B, Cecile V (2015) Contact patterns in a high school: A comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE 10(9):0136497

    Google Scholar 

Download references

Acknowledgements

This work is partially supported by Postgraduate Innovation Fund Project by Southwest University of Science and Technology (24ycx2064). This work is also supported by provincial scientific research institutes’ achievement transformation project of the science and technology. This work is also supported by provincial scientific research institutes’ achievement transformation project of the science and technology department of Sichuan Province, China (2023JDZH0011).

Author information

Authors and Affiliations

Authors

Contributions

Zhijing Yang and Hui Zhang designed the algorithmic flow and wrote this paper. Chunming Yang and Bo Li were responsible for the code and experimental parts. Xujian Zhao and Yin Long drew the pictures and polished this paper. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hui Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yang, Z., Zhang, H., Yang, C. et al. Spectral clustering with scale fairness constraints. Knowl Inf Syst 67, 273–300 (2025). https://doi.org/10.1007/s10115-024-02183-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Version of record:

  • Issue date:

  • DOI: https://doi.org/10.1007/s10115-024-02183-7

Keywords

Profiles

  1. Zhijing Yang