Abstract
Adversarial examples have become a major threat to the reliable deployment of deep learning models, and this threat has in turn driven the development of adversarial defenses. Adversarial noise contains well-generalizing yet misleading features that can maliciously flip predicted labels. Motivated by this, we study modeling adversarial noise to defend against adversarial examples by learning the transition relationship between adversarial labels (i.e., the flipped labels caused by adversarial noise) and natural labels (i.e., the ground-truth labels of natural samples). In this work, we propose an adversarial defense method from the perspective of modeling adversarial noise. Specifically, we construct an instance-dependent label transition matrix that represents this transition relationship and thereby explicitly models adversarial noise. The transition matrix is predicted from the input sample by a label transition network. By exploiting the transition matrix, we can infer the natural label from the adversarial label and thus correct predictions misled by adversarial noise. In addition, to enhance the robustness of the label transition network, we design an adversarial robustness constraint at the transition-matrix level. Experimental results demonstrate that our method effectively improves robust accuracy against multiple attacks and performs well at detecting adversarial input samples.
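To make the mechanism described above concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of how an instance-dependent label transition matrix could be combined with a classifier: a transition network maps an input to a row-stochastic matrix T(x), the natural-label posterior is composed with T(x) to fit the adversarial (flipped) labels during training, and the corrected prediction is read directly from the natural-label posterior. All names (TransitionNet, classifier, adversarial_posterior) and architectural details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransitionNet(nn.Module):
    """Maps an input sample x to a row-stochastic C x C transition matrix T(x),
    where T(x)[i, j] approximates P(adversarial label = j | natural label = i, x)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_classes * num_classes)
        self.num_classes = num_classes

    def forward(self, x):
        T = self.head(self.features(x)).view(-1, self.num_classes, self.num_classes)
        return torch.softmax(T, dim=2)  # each row sums to 1

def adversarial_posterior(natural_probs, T):
    """Compose the natural-label posterior with T(x): p_adv = p_nat @ T(x)."""
    return torch.bmm(natural_probs.unsqueeze(1), T).squeeze(1)

# Illustrative usage: fit the composed posterior to the flipped (adversarial) labels,
# then read the corrected prediction off the natural-label posterior.
num_classes = 10
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
transition_net = TransitionNet(num_classes)

x_adv = torch.randn(4, 3, 32, 32)                # stand-in adversarial batch
y_adv = torch.randint(0, num_classes, (4,))      # adversarial (flipped) labels

natural_probs = torch.softmax(classifier(x_adv), dim=1)
T = transition_net(x_adv)
loss = nn.functional.nll_loss(
    torch.log(adversarial_posterior(natural_probs, T) + 1e-12), y_adv)
corrected_pred = natural_probs.argmax(dim=1)     # inferred natural labels
loss.backward()                                  # trains both networks jointly
```

Under these assumptions, the adversarial robustness constraint mentioned in the abstract would be an additional loss term on T(x) (e.g., encouraging consistent transition matrices for natural and adversarial versions of the same input); the sketch omits it for brevity.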
Data Availability
The datasets used in our paper are available at: CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz; Tiny-ImageNet: https://www.kaggle.com/c/tiny-imagenet/data; Mini-ImageNet: https://www.kaggle.com/datasets/arjunashok33/miniimagenet.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants U22A2096 and 62036007, in part by the Scientific and Technological Innovation Teams in Shaanxi Province, in part by the Shaanxi Province Core Technology Research and Development Project under Grant 2024QY2-GJHX-11, and in part by the Fundamental Research Funds for the Central Universities under Grant QTZX23042.