Abstract
Adversarial examples have become a major threat to the reliable deployment of deep learning models, and this threat has in turn driven the development of adversarial defenses. Adversarial noise contains well-generalizing yet misleading features that can maliciously flip predicted labels. Motivated by this, we study modeling adversarial noise to defend against adversarial examples by learning the transition relationship between adversarial labels (i.e., the flipped labels caused by adversarial noise) and natural labels (i.e., the ground-truth labels of natural samples). In this work, we propose an adversarial defense method from the perspective of modeling adversarial noise. Specifically, we construct an instance-dependent label transition matrix that represents this transition relationship and thereby explicitly models adversarial noise. The transition matrix is predicted from the input sample by a label transition network. By exploiting the transition matrix, we can infer the natural label from the adversarial label and thus correct predictions misled by adversarial noise. In addition, to enhance the robustness of the label transition network, we design an adversarial robustness constraint at the transition-matrix level. Experimental results demonstrate that our method effectively improves robust accuracy against multiple attacks and performs well at detecting adversarial input samples.
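To make the mechanism described above concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of how an instance-dependent label transition matrix could be combined with a classifier: a transition network maps an input to a row-stochastic matrix T(x), the natural-label posterior is composed with T(x) to fit the adversarial (flipped) labels during training, and the corrected prediction is read directly from the natural-label posterior. All names (TransitionNet, classifier, adversarial_posterior) and architectural details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransitionNet(nn.Module):
    """Maps an input sample x to a row-stochastic C x C transition matrix T(x),
    where T(x)[i, j] approximates P(adversarial label = j | natural label = i, x)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_classes * num_classes)
        self.num_classes = num_classes

    def forward(self, x):
        T = self.head(self.features(x)).view(-1, self.num_classes, self.num_classes)
        return torch.softmax(T, dim=2)  # each row sums to 1

def adversarial_posterior(natural_probs, T):
    """Compose the natural-label posterior with T(x): p_adv = p_nat @ T(x)."""
    return torch.bmm(natural_probs.unsqueeze(1), T).squeeze(1)

# Illustrative usage: fit the composed posterior to the flipped (adversarial) labels,
# then read the corrected prediction off the natural-label posterior.
num_classes = 10
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
transition_net = TransitionNet(num_classes)

x_adv = torch.randn(4, 3, 32, 32)                # stand-in adversarial batch
y_adv = torch.randint(0, num_classes, (4,))      # adversarial (flipped) labels

natural_probs = torch.softmax(classifier(x_adv), dim=1)
T = transition_net(x_adv)
loss = nn.functional.nll_loss(
    torch.log(adversarial_posterior(natural_probs, T) + 1e-12), y_adv)
corrected_pred = natural_probs.argmax(dim=1)     # inferred natural labels
loss.backward()                                  # trains both networks jointly
```

Under these assumptions, the adversarial robustness constraint mentioned in the abstract would be an additional loss term on T(x) (e.g., encouraging consistent transition matrices for natural and adversarial versions of the same input); the sketch omits it for brevity.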
Data Availability
The datasets used in our paper are available at: CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz; Tiny-ImageNet: https://www.kaggle.com/c/tiny-imagenet/data; Mini-ImageNet: https://www.kaggle.com/datasets/arjunashok33/miniimagenet.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants U22A2096 and 62036007, in part by the Scientific and Technological Innovation Teams in Shaanxi Province, in part by the Shaanxi Province Core Technology Research and Development Project under Grant 2024QY2-GJHX-11, and in part by the Fundamental Research Funds for the Central Universities under Grant QTZX23042.