Abstract
In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples. Unlike previous methods that consider only ID samples useful and aim to filter out OOD samples entirely during training, we argue that exploring and exploiting both ID and OOD samples can benefit SSL. To support this claim, (i) we propose a prototype-based clustering and identification algorithm that explores the inherent similarities and differences among samples at the feature level and effectively clusters them around several predefined ID and OOD prototypes, thereby enhancing feature learning and facilitating ID/OOD identification; and (ii) we propose an importance-based sampling method that exploits the differing importance of each ID and OOD sample to SSL, thereby reducing sampling bias and improving training. Our method achieves state-of-the-art performance on several challenging benchmarks and improves upon existing SSL methods even when ID samples are entirely absent from the unlabeled data.
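The two components sketched in the abstract can be illustrated with a minimal toy example. This is not the authors' implementation: the prototype construction, the cosine-similarity assignment, and the exponential importance score below are all simplifying assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign_to_prototypes(features, prototypes):
    """Assign each feature to its nearest prototype by cosine similarity.
    ID/OOD identity of a sample follows the prototype it is assigned to."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = feats @ protos.T                 # (N, K) cosine similarities
    return sims.argmax(axis=1), sims.max(axis=1)

def importance_sample(indices, importance, batch_size):
    """Draw a batch with probability proportional to per-sample importance,
    instead of sampling uniformly, to reduce sampling bias."""
    p = importance / importance.sum()
    return rng.choice(indices, size=batch_size, replace=False, p=p)

# Toy demo: 2 "ID" prototypes plus 1 "OOD" prototype in a 4-d feature space.
prototypes = rng.normal(size=(3, 4))
features = rng.normal(size=(10, 4))

assignment, similarity = assign_to_prototypes(features, prototypes)
importance = np.exp(similarity)             # assumed positive importance score
batch = importance_sample(np.arange(10), importance, batch_size=4)
```

In an actual open-set SSL pipeline the prototypes would be learned alongside the encoder and the importance score derived from the training objective; the sketch only shows the two mechanisms (prototype assignment, importance-weighted batch selection) in isolation.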
Availability of data and materials
The data used in this paper are publicly available online.
Notes
We did not use SVHN because it has only 10 classes and thus cannot fit into this experiment.
In this case, “Clean” degenerates to “Labeled Only”.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 62322608), in part by the Shenzhen Science and Technology Program (No. JCYJ20220530141211024), in part by the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2024A1515010255, and in part by the Open Project Program of the Key Laboratory of Artificial Intelligence for Perception and Understanding, Liaoning Province (AIPU, No. 20230003).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical statements
The datasets used in our work are officially released by reliable research agencies, which guarantee that the collection, processing, release, and use of the data received the formal consent of participants. To protect privacy, all individuals are anonymized with simple identity numbers.
Additional information
Communicated by ZHUN ZHONG.
About this article
Cite this article
Zhao, G., Li, G., Qin, Y. et al. Exploration and Exploitation of Unlabeled Data for Open-Set Semi-supervised Learning. Int J Comput Vis 132, 5888–5904 (2024). https://doi.org/10.1007/s11263-024-02155-y