
RNAS-CL: Robust Neural Architecture Search by Cross-Layer Knowledge Distillation

Published in: International Journal of Computer Vision

Abstract

Deep Neural Networks are often vulnerable to adversarial attacks. Neural Architecture Search (NAS), one of the tools for developing novel deep neural architectures, demonstrates superior performance in prediction accuracy in various machine learning applications. However, the performance of a neural architecture discovered by NAS against adversarial attacks has not been sufficiently studied, especially under the regime of knowledge distillation. Given the presence of a robust teacher, we investigate if NAS would produce a robust neural architecture by inheriting robustness from the teacher. In this paper, we propose Robust Neural Architecture Search by Cross-Layer knowledge distillation (RNAS-CL), a novel NAS algorithm that improves the robustness of NAS by learning from a robust teacher through cross-layer knowledge distillation. Unlike previous knowledge distillation methods that encourage close student-teacher output only in the last layer, RNAS-CL automatically searches for the best teacher layer to supervise each student layer. Experimental results demonstrate the effectiveness of RNAS-CL and show that RNAS-CL produces compact and adversarially robust neural architectures. Our results point to new approaches for finding compact and robust neural architecture for many applications. The code of RNAS-CL is available at https://github.com/Statistical-Deep-Learning/RNAS-CL.
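The cross-layer supervision described in the abstract can be sketched as follows. This is an illustrative sketch in PyTorch under our own naming (`cross_layer_kd_loss` and `alpha` are our assumptions, not identifiers from the paper's code): each student layer holds learnable logits over all teacher layers, and a softmax over those logits weights per-teacher-layer feature-matching losses, so optimization can discover which teacher layer best supervises each student layer.

```python
import torch
import torch.nn.functional as F

def cross_layer_kd_loss(student_feats, teacher_feats, alpha):
    """Hypothetical cross-layer distillation loss.

    student_feats: list of (B, D) tensors, one per student layer.
    teacher_feats: list of (B, D) tensors, one per teacher layer.
    alpha: (num_student_layers, num_teacher_layers) learnable logits
           that select, softly, a supervising teacher layer per student layer.
    """
    loss = 0.0
    for i, s in enumerate(student_feats):
        weights = F.softmax(alpha[i], dim=0)  # soft choice over teacher layers
        for j, t in enumerate(teacher_feats):
            # Teacher features are detached: gradients flow only to the
            # student and to the layer-selection logits.
            loss = loss + weights[j] * F.mse_loss(s, t.detach())
    return loss
```

In RNAS-CL the layer assignment is learned jointly with the architecture search; here the soft weights merely stand in for that selection mechanism.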


Data Availability

The following datasets are employed in the experiments of this paper. (1) CIFAR-10, a collection of 60k images in 10 classes (Krizhevsky, 2009), which is available at https://www.cs.toronto.edu/~kriz/cifar.html; (2) ImageNet (ILSVRC-12), an image classification dataset (Russakovsky et al., 2015) with 1,000 classes and about 1.2M images, which is available at https://www.image-net.org/challenges/LSVRC/; (3) ImageNet-100, a subset of the ImageNet-1k dataset (Russakovsky et al., 2015) with 100 classes and about 130k images (Tian et al., 2020), which is available at https://github.com/HobbitLong/CMC.

References

  • Ahn, S., Hu, S. X., Damianou, A., Lawrence, N. D., & Dai, Z. (2019). Variational information distillation for knowledge transfer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9163–9171).

  • Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., & Raffel, C. A. (2019). Mixmatch: A holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems, 32.

  • Cai, H., Zhu, L., & Han, S. (2019). Proxylessnas: Direct neural architecture search on target task and hardware. In 7th international conference on learning representations, ICLR.

  • Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE symposium on security and privacy (pp. 39–57). IEEE

  • Chen, M., Peng, H., Fu, J., & Ling, H. (2021) Autoformer: Searching transformers for visual recognition. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 12270–12280).

  • Cissé, M., Bojanowski, P., Grave, E., Dauphin, Y. N., & Usunier, N. (2017). Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th international conference on machine learning, ICML. Proceedings of machine learning research (Vol. 70, pp. 854–863). PMLR.

  • Croce, F., & Hein, M. (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International conference on machine learning (pp. 2206–2216). PMLR

  • Devaguptapu, C., Agarwal, D., Mittal, G., Gopalani, P., & Balasubramanian, V. N. (2021). On adversarial robustness: A neural architecture search perspective. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 152–161).

  • Dong, M., Li, Y., Wang, Y., & Xu, C. (2020). Adversarially robust neural architectures. arXiv preprint arXiv:2009.00902

  • Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., & Li, J. (2018). Boosting adversarial attacks with momentum. In CVPR (pp. 9185–9193). Computer Vision Foundation/IEEE Computer Society.

  • Dziugaite, G. K., Ghahramani, Z., & Roy, D. M. (2016). A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853

  • Engstrom, L., Ilyas, A., Salman, H., Santurkar, S., & Tsipras, D. (2019). Robustness (python library). https://github.com/MadryLab/robustness

  • Goldblum, M., Fowl, L., Feizi, S., & Goldstein, T. (2020). Adversarially robust distillation. In Proceedings of the AAAI conference on artificial intelligence (pp. 3996–4003).

  • Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In 3rd international conference on learning representations, ICLR.

  • Gui, S., Wang, H., Yang, H., Yu, C., Wang, Z., & Liu, J. (2019). Model compression with adversarial robustness: A unified optimization framework. In Annual conference on neural information processing systems (pp. 1283–1294).

  • Guo, C., Rana, M., Cissé, M., & Maaten, L. (2018). Countering adversarial images using input transformations. In 6th international conference on learning representations, ICLR.

  • Guo, M., Yang, Y., Xu, R., Liu, Z., & Lin, D. (2020). When NAS meets robustness: In search of robust architectures against adversarial attacks. In IEEE/CVF conference on computer vision and pattern recognition, CVPR (pp. 628–637).

  • Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., & Dally, W. J. (2016). EIE: Efficient inference engine on compressed deep neural network. In ISCA (pp. 243–254). IEEE Computer Society.

  • Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems, 28.

  • Hein, M., & Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In Annual conference on neural information processing systems (pp. 2266–2276).

  • Hinton, G. E., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. CoRR arXiv:1503.02531

  • Huang, H., Wang, Y., Erfani, S. M., Gu, Q., Bailey, J., & Ma, X. (2021). Exploring architectural ingredients of adversarially robust deep neural networks. In Advances in neural information processing systems (pp. 5545–5559).

  • Jang, E., Gu, S., & Poole, B. (2017). Categorical reparameterization with Gumbel–Softmax. In 5th international conference on learning representations, ICLR.

  • Kannan, H., Kurakin, A., & Goodfellow, I. (2018). Adversarial logit pairing. arXiv preprint arXiv:1803.06373

  • Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Technical report, Univ. Toronto.

  • Kurakin, A., Goodfellow, I., Bengio, S., Dong, Y., Liao, F., Liang, M., Pang, T., Zhu, J., Hu, X., & Xie, C., et al. (2018). Adversarial attacks and defences competition. In The NIPS’17 competition: Building intelligent systems (pp. 195–231). Springer.

  • Li, H.-T., Lin, S.-C., Chen, C.-Y., & Chiang, C.-K. (2019). Layer-level knowledge distillation for deep neural network learning. Applied Sciences, 9(10).

  • Li, C., Peng, J., Yuan, L., Wang, G., Liang, X., Lin, L., & Chang, X. (2020). Block-wisely supervised neural architecture search with knowledge distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1989–1998).

  • Li, Y., Yang, Z., Wang, Y., & Xu, C. (2021). Neural architecture dilation for adversarial robustness. Advances in Neural Information Processing Systems, 34.

  • Lian, D., Zheng, Y., Xu, Y., Lu, Y., Lin, L., Zhao, P., Huang, J., & Gao, S. (2019). Towards fast adaptation of neural architectures with meta learning. In International conference on learning representations.

  • Liu, X., Cheng, M., Zhang, H., & Hsieh, C. (2018). Towards robust neural networks via random self-ensemble. In European conference on computer vision, ECCV (pp. 381–397).

  • Liu, H., Simonyan, K., & Yang, Y. (2019). Darts: Differentiable architecture search. In 7th international conference on learning representations, ICLR.

  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In 6th international conference on learning representations, ICLR.

  • Mok, J., Na, B., Choe, H., & Yoon, S. (2021). Advrush: Searching for adversarially robust neural architectures. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 12322–12332).

  • Mo, Y., Wu, D., Wang, Y., Guo, Y., & Wang, Y. (2022). When adversarial training meets vision transformers: Recipes from training to architecture. Advances in Neural Information Processing Systems, 35, 18599–18611.


  • Nath, U., Kushagra, S., & Yang, Y. (2020). Adjoined networks: A training paradigm with applications to network compression. arXiv preprint arXiv:2006.05624

  • Ning, X., Zhao, J., Li, W., Zhao, T., Zheng, Y., Yang, H., & Wang, Y. (2020). Discovering robust convolutional architecture at targeted capacity: A multi-shot approach. arXiv preprint arXiv:2012.11835

  • Pang, T., Xu, K., Dong, Y., Du, C., Chen, N., & Zhu, J. (2020). Rethinking SoftMax cross-entropy loss for adversarial robustness. In 8th international conference on learning representations, ICLR.

  • Park, W., Kim, D., Lu, Y., & Cho, M. (2019). Relational knowledge distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3967–3976).

  • Park, J., Li, S. R., Wen, W., Tang, P. T. P., Li, H., Chen, Y., & Dubey, P. (2017). Faster CNNs with direct sparse convolutions and guided pruning. In 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings. OpenReview.net.

  • Passalis, N., & Tefas, A. (2018). Learning deep representations with probabilistic knowledge transfer. In Proceedings of the European conference on computer vision (ECCV) (pp. 268–284).

  • Peng, H., Du, H., Yu, H., Li, Q., Liao, J., & Fu, J. (2020). Cream of the crop: Distilling prioritized paths for one-shot neural architecture search. Advances in neural information processing systems, 33, 17955–17964.


  • Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019). Regularized evolution for image classifier architecture search. In Proceedings of the AAAI conference on artificial intelligence (pp. 4780–4789).

  • Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y. L., Tan, J., Le, Q. V., & Kurakin, A. (2017). Large-scale evolution of image classifiers. In: Precup, D., Teh, Y. W. (Eds.), Proceedings of the 34th international conference on machine learning, ICML. Proceedings of machine learning research (Vol. 70, pp. 2902–2911). PMLR.

  • Rice, L., Wong, E., & Kolter, Z. (2020). Overfitting in adversarially robust deep learning. In International conference on machine learning (pp. 8093–8104). PMLR

  • Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., & Bengio, Y. (2015). Fitnets: Hints for thin deep nets. In 3rd international conference on learning representations (ICLR).

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.


  • Sehwag, V., Mahloujifar, S., Handina, T., Dai, S., Xiang, C., Chiang, M., & Mittal, P. (2021). Robust learning meets generative models: Can proxy distributions improve adversarial robustness? In International conference on learning representations.

  • Sehwag, V., Wang, S., Mittal, P., & Jana, S. (2020). HYDRA: pruning adversarially robust neural networks. In Annual conference on neural information processing systems.

  • Shafahi, A., Najibi, M., Ghiasi, M. A., Xu, Z., Dickerson, J., Studer, C., Davis, L. S., Taylor, G., & Goldstein, T. (2019). Adversarial training for free! Advances in Neural Information Processing Systems, 32.

  • Su, D., Zhang, H., Chen, H., Yi, J., Chen, P., & Gao, Y. (2018). Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In European Conference on Computer Vision, ECCV (pp. 644–661).

  • Sun, D., Yao, A., Zhou, A., & Zhao, H. (2019). Deeply-supervised knowledge synergy. In IEEE conference on computer vision and pattern recognition, CVPR (pp. 6997–7006).

  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., & Fergus, R. (2014). Intriguing properties of neural networks. In 2nd international conference on learning representations, ICLR.

  • Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). MnasNet: Platform-aware neural architecture search for mobile. In IEEE conference on computer vision and pattern recognition, CVPR (pp. 2820–2828).

  • Tian, Y., Krishnan, D., & Isola, P. (2020). Contrastive multiview coding. In European conference on computer vision (pp. 776–794). Springer

  • Tian, Y., Krishnan, D., & Isola, P. (2020). Contrastive representation distillation. In 8th international conference on learning representations, ICLR.


  • Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I. J., Boneh, D., & McDaniel, P. D. (2018). Ensemble adversarial training: Attacks and defenses. In 6th international conference on learning representations, ICLR.

  • Tung, F., & Mori, G. (2019). Similarity-preserving knowledge distillation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 1365–1374).

  • Wan, A., Dai, X., Zhang, P., He, Z., Tian, Y., Xie, S., Wu, B., Yu, M., Xu, T., Chen, K., Vajda, P., & Gonzalez, J.E. (2020). Fbnetv2: Differentiable neural architecture search for spatial and channel dimensions. In IEEE/CVF conference on computer vision and pattern recognition, CVPR (pp. 12962–12971).

  • Wong, E., Rice, L., & Kolter, J. Z. (2020). Fast is better than free: Revisiting adversarial training. In International conference on learning representations (ICLR).

  • Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., Tian, Y., Vajda, P., Jia, Y., & Keutzer, K. (2019). Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In IEEE conference on computer vision and pattern recognition, CVPR (pp. 10734–10742).

  • Xie, C., & Yuille, A. L. (2020). Intriguing properties of adversarial training at scale. In 8th international conference on learning representations, ICLR.

  • Xie, G., Wang, J., Yu, G., Lyu, J., Zheng, F., & Jin, Y. (2023). Tiny adversarial multi-objective one-shot neural architecture search. Complex and Intelligent Systems.

  • Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., & Yuille, A. L. (2019). Improving transferability of adversarial examples with input diversity. In IEEE conference on computer vision and pattern recognition, CVPR (pp. 2730–2739).

  • Xu, Y., Xie, L., Zhang, X., Chen, X., Qi, G., Tian, Q., & Xiong, H. (2020). Pc-darts: Partial channel connections for memory-efficient architecture search. In ICLR. OpenReview.net.

  • Yan, Z., Guo, Y., & Zhang, C. (2018). Deep defense: Training DNNs with improved adversarial robustness. In Annual conference on neural information processing systems (pp. 417–426).

  • Yang, Y., You, S., Li, H., Wang, F., Qian, C., & Lin, Z. (2021). Towards improving the consistency, efficiency, and flexibility of differentiable neural architecture search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6667–6676).

  • Yang, Y., Li, H., You, S., Wang, F., Qian, C., & Lin, Z. (2020). ISTA-NAS: Efficient and consistent neural architecture search by sparse coding. Advances in Neural Information Processing Systems, 33, 10503–10513.


  • Ye, S., Lin, X., Xu, K., Liu, S., Cheng, H., Lambrechts, J., Zhang, H., Zhou, A., Ma, K., & Wang, Y. (2019). Adversarial robustness vs. model compression, or both? In IEEE/CVF international conference on computer vision, ICCV (pp. 111–120).

  • Yim, J., Joo, D., Bae, J., & Kim, J. (2017). A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In IEEE conference on computer vision and pattern recognition, CVPR (pp. 7130–7138).

  • Yue, Z., Lin, B., Zhang, Y., & Liang, C. (2022). Effective, efficient and robust neural architecture search. In 2022 international joint conference on neural networks (IJCNN) (pp. 1–8). IEEE.

  • Zagoruyko, S., & Komodakis, N. (2017). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In 5th international conference on learning representations, ICLR.

  • Zhai, X., Oliver, A., Kolesnikov, A., & Beyer, L. (2019). S4l: Self-supervised semi-supervised learning. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 1476–1485).

  • Zhang, H., Yu, Y., Jiao, J., Xing, E. P., El Ghaoui, L., & Jordan, M. I. (2019). Theoretically principled trade-off between robustness and accuracy. In Proceedings of the 36th International Conference on Machine Learning, ICML. Proceedings of Machine Learning Research (Vol. 97, pp. 7472–7482). PMLR.


  • Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. In 5th international conference on learning representations, ICLR.

  • Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8697–8710).


Acknowledgements

This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 17STQAC00001-07-00. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security. This work was also supported by the 2023 Mayo Clinic and Arizona State University Alliance for Health Care Collaborative Research Seed Grant Program, and in part by NSF grant 2323086.

Author information

Corresponding author

Correspondence to Yingzhen Yang.

Additional information

Communicated by Zhouchen Lin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Robust Teacher Models

In this section, Table 15 reports the CIFAR-10 robustness of the adversarially trained teacher models used throughout the paper.

Table 15 Robustness results for various teacher models on the CIFAR-10 dataset

Architecture

In this section, we describe the architectures of the supernets used in RNAS-CL for the CIFAR-10, ImageNet-100, and ImageNet datasets. Table 13 describes the supernets used for CIFAR-10, each of which has three blocks. The supernets used for ImageNet-100 and ImageNet are described in Table 14; for ImageNet-100, the number of blocks varies from 3 to 5.

Fig. 12: Illustration of searching for the neural architecture of a convolution layer of a student model using the searching mechanism in FBNetV2. \(\left\{ g_{w}^{(i)}\right\} \) represents the Gumbel weights associated with different filter choices

Architecture Search by FBNetV2

RNAS-CL builds efficient and adversarially robust deep learning models. In this work, we adopt the training paradigm of FBNetV2 to search for efficient neural architectures. Figure 12 illustrates the search process for the neural architecture of a single convolution layer. Each filter choice is associated with a Gumbel weight, and these Gumbel weights are optimized to decide the best filter choice for the convolution layer.
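As a rough illustration of this mechanism, the following PyTorch sketch (function name, shapes, and candidate choices are our assumptions, not the paper's implementation) uses `gumbel_softmax` to mix binary channel masks, one per candidate filter count, into a single soft mask, in the spirit of FBNetV2's channel search:

```python
import torch
import torch.nn.functional as F

def soft_channel_mask(gumbel_logits, channel_choices, max_channels, tau=1.0):
    """Return a (max_channels,) soft mask over output channels.

    gumbel_logits: (num_choices,) learnable logits, one per filter choice.
    channel_choices: candidate channel counts, e.g. [8, 16, 32].
    """
    # Sample a relaxed one-hot selection over the filter choices.
    probs = F.gumbel_softmax(gumbel_logits, tau=tau, hard=False)
    # One 0/1 mask per candidate: the first c channels are kept.
    masks = torch.stack([
        (torch.arange(max_channels) < c).float() for c in channel_choices
    ])
    # Convex combination of binary masks, differentiable w.r.t. the logits.
    return probs @ masks
```

During search, such a mask would multiply the output of a shared convolution, so that optimizing the Gumbel logits effectively selects the number of filters for the layer.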

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nath, U., Wang, Y., Turaga, P. et al. RNAS-CL: Robust Neural Architecture Search by Cross-Layer Knowledge Distillation. Int J Comput Vis 132, 5698–5717 (2024). https://doi.org/10.1007/s11263-024-02133-4

