Abstract
Incomplete multi-modal clustering (IMmC) is challenging because some modalities may be unexpectedly missing from the data. A key to this problem is exploiting the complementary information among different samples when the data are incomplete and unpaired. Despite preliminary progress, existing methods (1) rely heavily on paired data and (2) struggle to mine complementarity in data with high missing rates. To address these problems, we propose a novel method for IMmC, the Integrated Heterogeneous Graph ATtention (IHGAT) network. To fully exploit the complementarity among different samples and modalities, we first construct a set of integrated heterogeneous graphs from the similarity graph learned on unified latent representations and the modality-specific availability graphs formed by the existing relations among samples. The attention mechanism is then applied to the constructed integrated heterogeneous graphs to aggregate the embedded content of heterogeneous neighbors for each node. In this way, the representations of missing modalities can be learned from the complementary information of other samples and their other modalities. Finally, the consistency of probability distributions is embedded into the network for clustering. Consequently, the proposed method forms a complete latent space in which incomplete information can be supplemented by other related samples via the learned intrinsic structure. Extensive experiments on eight public datasets show that the proposed IHGAT outperforms existing methods under various settings and is typically more robust at high missing rates.
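The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the IHGAT implementation: it assumes plain dot-product attention scores in place of the learned attention of a graph attention network, and the function names (`softmax`, `dot`, `impute_missing`) and the toy vectors are hypothetical. It only shows how a representation for a missing modality might be formed as an attention-weighted average over neighboring samples that do have that modality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def impute_missing(query, keys, values):
    """Attention-weighted average of neighbor embeddings.

    query  : latent vector of the sample whose modality is missing
    keys   : latent vectors of neighbors that have the modality
    values : those neighbors' embeddings in the missing modality
    Assumption: dot-product scoring stands in for learned attention.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    imputed = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return imputed, weights


# Toy example (hypothetical numbers): the query is closer to the first neighbor.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 2.0]]
imputed, weights = impute_missing(query, keys, values)
```

In this toy case the first neighbor receives the larger weight, so the imputed vector lies closer to that neighbor's embedding. A learned attention function would replace the fixed dot-product score.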
Data Availability
The CUB dataset (Wah et al., 2011) can be obtained from https://www.vision.caltech.edu/datasets/cub_200_2011/. The Football dataset can be obtained from http://mlg.ucd.ie/aggregation/index.html. The ORL dataset can be obtained from https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html. The PIE dataset can be obtained from http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html. The Politics dataset can be obtained from http://mlg.ucd.ie/aggregation/index.html. The 3Sources dataset can be obtained from http://mlg.ucd.ie/datasets/3sources.html.
References
Baltrušaitis, T., Ahuja, C., & Morency, L.-P. (2018). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443.
Bothorel, C., Cruz, J. D., Magnani, M., & Micenkova, B. (2015). Clustering attributed graphs: Models, measures and methods. Network Science, 3(3), 408–444.
Brasó, G., Cetintas, O., & Leal-Taixé, L. (2022). Multi-object tracking and segmentation via neural message passing. International Journal of Computer Vision, 130(12), 3035–3053.
Brissman, E., Johnander, J., Danelljan, M., & Felsberg, M. (2023). Recurrent graph neural networks for video instance segmentation. International Journal of Computer Vision, 131(2), 471–495.
Cao, Y., Luo, X., Yang, J., Cao, Y., & Yang, M. Y. (2022). Locality guided cross-modal feature aggregation and pixel-level fusion for multispectral pedestrian detection. Information Fusion, 88, 1–11.
Chang, S., Han, W., Tang, J., Qi, G.-J., Aggarwal, C. C., & Huang, T. S. (2015). Heterogeneous network embedding via deep architectures. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 119–128.
Chen, L., Gao, Y., Huang, X., Jensen, C. S., & Zheng, B. (2020). Efficiently distributed clustering algorithms on star-schema heterogeneous graphs. IEEE Transactions on Knowledge and Data Engineering, pp. 1–15.
Chen, Y., Mancini, M., Zhu, X., & Akata, Z. (2022). Semi-supervised and unsupervised deep visual learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–23.
Chen, Y., Xiao, X., & Zhou, Y. (2019). Jointly learning kernel representation tensor and affinity matrix for multi-view clustering. IEEE Transactions on Multimedia, 22(8), 1985–1997.
Cheng, J., Wang, Q., Tao, Z., Xie, D.-Y., & Gao, Q. (2020). Multi-view attribute graph convolution networks for clustering. In IJCAI, pp. 2973–2979.
Deng, S., Wen, J., Liu, C., Yan, K., Xu, G., & Xu, Y. (2023). Projective incomplete multi-view clustering. IEEE Transactions on Neural Networks and Learning Systems.
Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
Fang, U., Li, M., Li, J., Gao, L., Jia, T., & Zhang, Y. (2023). A comprehensive survey on multi-view clustering. IEEE Transactions on Knowledge and Data Engineering, 35(12), 12350–12368.
Hamilton, W., Ying, Z., & Leskovec, J. (2017). Inductive representation learning on large graphs. Advances in Neural Information Processing Systems.
Han, R., Gan, Y., Wang, L., Li, N., Feng, W., & Wang, S. (2023). Relating view directions of complementary-view mobile cameras via the human shadow. International Journal of Computer Vision, pp. 1–16.
Hotelling, H. (1992). Relations between two sets of variates. In Breakthroughs in Statistics, pp. 162–190. Springer.
Kumar, R., Chen, T., Hardt, M., Beymer, D., Brannon, K., & Syeda-Mahmood, T. (2013). Multiple kernel completion and its application to cardiac disease discrimination. In 2013 IEEE 10th International Symposium on Biomedical Imaging, pp. 764–767. IEEE.
Le, Q. & Mikolov, T. (2014). Distributed representations of sentences and documents. In International Conference on Machine Learning, pp. 1188–1196. PMLR.
Li, L., Wan, Z., & He, H. (2021). Incomplete multi-view clustering with joint partition and graph learning. IEEE Transactions on Knowledge and Data Engineering, pp. 1–15.
Li, X., Wu, Y., Ester, M., Kao, B., Wang, X., & Zheng, Y. (2017). Semi-supervised clustering in attributed heterogeneous information networks. In Proceedings of the 26th International Conference on World Wide Web, pp. 1621–1629.
Lin, Y., Gou, Y., Liu, X., Bai, J., Lv, J., & Peng, X. (2023). Dual contrastive prediction for incomplete multi-view representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4), 4447–4461.
Lin, Y., Gou, Y., Liu, Z., Li, B., Lv, J., & Peng, X. (2021). Completer: Incomplete multi-view clustering via contrastive prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11174–11183.
Michieli, U., & Zanuttigh, P. (2022). Edge-aware graph matching network for part-based semantic segmentation. International Journal of Computer Vision, 130(11), 2797–2821.
Qi, G.-J., Aggarwal, C. C., & Huang, T. S. (2012). On clustering heterogeneous social media objects with outlier links. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining, pp. 553–562.
Shao, W., He, L., & Yu, P. S. (2015). Multiple incomplete views clustering via weighted nonnegative matrix factorization with \(l_{2,1}\) regularization. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 318–334. Springer.
Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2016). A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 29(1), 17–37.
Tao, Z., Liu, H., Li, J., Wang, Z., & Fu, Y. (2019). Adversarial graph embedding for ensemble clustering. In International Joint Conferences on Artificial Intelligence Organization, pp. 3562–3568.
Tran, L., Liu, X., Zhou, J., & Jin, R. (2017). Missing modalities imputation via cascaded residual autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1405–1414.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). Graph Attention Networks. International Conference on Learning Representations.
Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset.
Wang, Q., Ding, Z., Tao, Z., Gao, Q., & Fu, Y. (2018). Partial multi-view clustering via consistent GAN. In 2018 IEEE International Conference on Data Mining (ICDM), pp. 1290–1295. IEEE.
Wang, Q., Ding, Z., Tao, Z., Gao, Q., & Fu, Y. (2021). Generative partial multi-view clustering with adaptive fusion and cycle consistency. IEEE Transactions on Image Processing, 30, 1771–1783.
Wang, Q., Lian, H., Sun, G., Gao, Q., & Jiao, L. (2020). iCmSC: Incomplete cross-modal subspace clustering. IEEE Transactions on Image Processing, 30, 305–317.
Wang, Q., Zhan, L., Thompson, P., & Zhou, J. (2020b). Multimodal learning with incomplete modalities by knowledge distillation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1828–1838.
Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P., & Yu, P. S. (2019). Heterogeneous graph attention network. In The World Wide Web Conference, pp. 2022–2032.
Wen, J., Xu, G., Tang, Z., Wang, W., Fei, L., & Xu, Y. (2023a). Graph regularized and feature aware matrix factorization for robust incomplete multi-view clustering. IEEE Transactions on Circuits and Systems for Video Technology.
Wen, J., Yan, K., Zhang, Z., Xu, Y., Wang, J., Fei, L., & Zhang, B. (2020). Adaptive graph completion based incomplete multi-view clustering. IEEE Transactions on Multimedia, 23, 2493–2504.
Wen, J., Zhang, Z., Fei, L., Zhang, B., Xu, Y., Zhang, Z., & Li, J. (2023). A survey on incomplete multiview clustering. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(2), 1136–1149.
Wen, J., Zhang, Z., Zhang, Z., Zhu, L., Fei, L., Zhang, B., & Xu, Y. (2021). Unified tensor framework for incomplete multi-view clustering and missing-view inferring. In Proceedings of the AAAI Conference on Artificial Intelligence, 35, 10273–10281.
Xiang, S., Yuan, L., Fan, W., Wang, Y., Thompson, P. M., & Ye, J. (2013). Multi-source learning with block-wise missing data for Alzheimer’s disease prediction. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 185–193.
Xie, D., Zhang, X., Gao, Q., Han, J., Xiao, S., & Gao, X. (2019). Multiview clustering by joint latent representation and similarity learning. IEEE Transactions on Cybernetics, 50(11), 4848–4854.
Xu, C., Tao, D., & Xu, C. (2015). Multi-view learning with incomplete views. IEEE Transactions on Image Processing, 24(12), 5812–5825.
Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How powerful are graph neural networks? International Conference on Learning Representations.
Yang, L., Shen, C., Hu, Q., Jing, L., & Li, Y. (2019). Adaptive sample-level graph combination for partial multiview clustering. IEEE Transactions on Image Processing, 29, 2780–2794.
Yang, S., Li, L., Wang, S., Zhang, W., Huang, Q., & Tian, Q. (2019). SkeletonNet: A hybrid network with a skeleton-embedding process for multi-view image representation learning. IEEE Transactions on Multimedia, 21(11), 2916–2929.
Yuan, L., Wang, Y., Thompson, P. M., Narayan, V. A., & Ye, J. (2012). Multi-source learning for joint analysis of incomplete multi-modality neuroimaging data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1149–1157.
Zhan, K., Nie, F., Wang, J., & Yang, Y. (2018). Multiview consensus graph clustering. IEEE Transactions on Image Processing, 28(3), 1261–1270.
Zhang, C., Cui, Y., Han, Z., Zhou, J. T., Fu, H., & Hu, Q. (2022). Deep partial multi-view learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 2402–2415.
Zhang, C., Fu, H., Hu, Q., Cao, X., Xie, Y., Tao, D., & Xu, D. (2018). Generalized latent multi-view subspace clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1), 86–99.
Zhang, C., Fu, H., Wang, J., Li, W., Cao, X., & Hu, Q. (2020). Tensorized multi-view subspace representation learning. International Journal of Computer Vision, 128(8–9), 2344–2361.
Zhang, C., Song, D., Huang, C., Swami, A., & Chawla, N. V. (2019). Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 793–803.
Zhang, L., Zhao, Y., Zhu, Z., Shen, D., & Ji, S. (2018). Multi-view missing data completion. IEEE Transactions on Knowledge and Data Engineering, 30(7), 1296–1309.
Zhang, Y., Xiong, Y., Kong, X., Li, S., Mi, J., & Zhu, Y. (2018c). Deep collective classification in heterogeneous information networks. In Proceedings of the 2018 World Wide Web Conference, pp. 399–408.
Zhao, J., Wang, X., Shi, C., Hu, B., Song, G., & Ye, Y. (2021). Heterogeneous graph structure learning for graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4697–4705.
Acknowledgements
This work was supported in part by the National Science and Technology Major Project under Grant 2022ZD0116500, in part by the National Natural Science Foundation of China under Grants 62106174, 62222608, 62266035, and 61925602, and in part by Tianjin Natural Science Funds for Distinguished Young Scholar under Grant 23JCJQJC00270.
Funding
National Science and Technology Major Project under Grant 2022ZD0116500; National Natural Science Foundation of China under Grants 62106174, 62222608, 62266035, and 61925602; Tianjin Natural Science Funds for Distinguished Young Scholar under Grant 23JCJQJC00270.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Communicated by Massimiliano Mancini.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Yao, X., Zhu, P. et al. Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering. Int J Comput Vis 132, 3847–3866 (2024). https://doi.org/10.1007/s11263-024-02066-y
DOI: https://doi.org/10.1007/s11263-024-02066-y