Abstract
Adversarial training (AT) has been demonstrated to be one of the most promising defenses against various adversarial attacks. To our knowledge, existing AT-based methods usually train with the locally most adversarial perturbed points and treat all perturbed points equally, which may lead to considerably weaker adversarial robust generalization on test data. In this work, we introduce a new adversarial training framework that considers both the diversity and the characteristics of the perturbed points in the vicinity of benign samples. To realize the framework, we propose a Regional Adversarial Training (RAT) defense method that first utilizes the attack path generated by the typical iterative attack method of projected gradient descent (PGD) and constructs an adversarial region based on this path. RAT then efficiently samples diverse perturbed training points inside the region and applies a distance-aware label smoothing mechanism to capture our intuition that perturbed points at different locations should have different impacts on model performance. Extensive experiments on several benchmark datasets show that RAT consistently and significantly improves upon standard adversarial training (SAT) and exhibits better robust generalization.
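To make the procedure concrete, the following PyTorch-style sketch illustrates the three ingredients described above: recording a PGD attack path, sampling a perturbed training point inside the region built from that path, and applying distance-aware label smoothing. The sampling scheme (a random convex combination with a random path point) and the linear smoothing schedule are illustrative assumptions, not the exact formulation of RAT; all function and parameter names are our own.

```python
import torch
import torch.nn.functional as F

def rat_step_sketch(model, x, y, num_classes, eps=8/255, alpha=2/255,
                    steps=10, max_smooth=0.3):
    """One illustrative training step of the idea sketched above (not the exact RAT)."""
    # 1) Run PGD and record the attack path (the intermediate perturbed points).
    x_adv, path = x.clone().detach(), []
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        path.append(x_adv.detach())

    # 2) Sample a perturbed training point inside the region built from the path:
    #    a random convex combination of the benign input and a random path point.
    anchor = path[torch.randint(len(path), (1,)).item()]
    t = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    x_sample = x + t * (anchor - x)

    # 3) Distance-aware label smoothing: samples farther from the benign input
    #    receive a more heavily smoothed label (linear schedule, an assumption).
    dist = (x_sample - x).flatten(1).norm(p=float('inf'), dim=1, keepdim=True)
    smooth = max_smooth * dist / eps                      # in [0, max_smooth]
    y_soft = (1 - smooth) * F.one_hot(y, num_classes).float() + smooth / num_classes

    # Cross-entropy against the softened labels on the sampled point.
    logits = model(x_sample)
    return -(y_soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```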
Acknowledgements
This work is supported by the National Natural Science Foundation of China (U22B2017, 62076105) and the International Cooperation Foundation of Hubei Province, China (2021EHB011). Baoyuan Wu is supported by the National Natural Science Foundation of China (62076213) and the Shenzhen Science and Technology Program (GXWD20201231105722002-20200901175001001).
Special thanks to Yichen Yang and Xin Liu for their valuable suggestions that have greatly improved this work.
Additional information
Communicated by Oliver Zendel.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
Below we report supplemental empirical results on CIFAR-10, including evaluations under other threat models and performance against AutoAttack.
On Other Threat Models. To verify the generality of our method, we compare the robustness of RAT and TRADES against PGD20 under different perturbation norms, using the smaller ResNet-18 network. As shown in Table 7, RAT outperforms the TRADES baseline by a clear margin of 6.5% under the PGD attack with an \(l_\infty \) perturbation bound of 8/255. Under an \(l_2 \) perturbation of magnitude 128/255, RAT achieves 2.49% higher accuracy. When RAT is combined with TRADES, model robustness also improves slightly.
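For clarity, the following PyTorch-style sketch illustrates this evaluation protocol: a 20-step PGD attack under either an \(l_\infty \) bound of 8/255 or an \(l_2 \) bound of 128/255, followed by robust-accuracy computation. The step size and other details are illustrative assumptions rather than the exact settings of our experiments.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, steps=20, norm='linf'):
    """20-step PGD inside an l_inf or l_2 ball of radius eps (illustrative settings)."""
    alpha = 2.5 * eps / steps                      # common heuristic step size
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        if norm == 'linf':
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        else:                                      # l_2
            g = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
            x_adv = x_adv.detach() + alpha * g
            delta = x_adv - x
            factor = (eps / (delta.flatten(1).norm(dim=1) + 1e-12)).clamp(max=1.0)
            x_adv = x + delta * factor.view(-1, 1, 1, 1)
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv

def robust_accuracy(model, loader, eps, norm, device='cuda'):
    """Accuracy on PGD-perturbed test data."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y, eps=eps, norm=norm)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total

# e.g. robust_accuracy(model, test_loader, eps=8/255,   norm='linf')
#      robust_accuracy(model, test_loader, eps=128/255, norm='l2')
```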
Performance on AutoAttack. We also evaluate our proposed method against AutoAttack (AA), and the results fall short of our expectations. As shown in Table 8, the effectiveness of AA is primarily attributable to APGD\(_{DLR}\), against which RAT provides poor defense. A possible explanation is that RAT prioritizes the diversity and characteristics of adversarial examples so heavily that it overlooks the difference between benign points and sampled points. Adding a regularization term to the loss function, as demonstrated in Sect. 4.2, can mitigate this issue.
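For reference, AutoAttack evaluations of this kind are typically run with the publicly available autoattack package; the snippet below is a sketch under that assumption, where model, x_test, and y_test denote the trained classifier and the CIFAR-10 test tensors, and the perturbation bound and batch size are chosen for illustration rather than taken from Table 8.

```python
import torch
from autoattack import AutoAttack  # https://github.com/fra31/auto-attack

model.eval()

# Standard AA suite (APGD-CE, targeted APGD-DLR, targeted FAB, Square).
adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)

# Isolate the APGD-DLR component discussed above.
adversary_dlr = AutoAttack(model, norm='Linf', eps=8/255, version='custom',
                           attacks_to_run=['apgd-dlr'])
x_adv_dlr = adversary_dlr.run_standard_evaluation(x_test, y_test, bs=128)

# Robust accuracy on the crafted examples.
with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
```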
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, C., Fan, Y., Zhou, A. et al. Regional Adversarial Training for Better Robust Generalization. Int J Comput Vis 132, 4510–4520 (2024). https://doi.org/10.1007/s11263-024-02103-w