Abstract
Image Quality Assessment (IQA) models predict the quality score of input images. Based on the availability of reference images, they can be categorized into Full-Reference (FR-) and No-Reference (NR-) IQA models. These models are essential for performance evaluation and optimization guidance in the media industry. However, researchers have observed that introducing imperceptible perturbations to input images can notably alter the predicted scores of both FR- and NR-IQA models, resulting in inaccurate assessments of image quality. This phenomenon is known as an adversarial attack. In this paper, we first define attacks targeting both FR- and NR-IQA models. We then introduce a defense approach applicable to both types of models, aimed at stabilizing predicted scores and boosting the adversarial robustness of IQA models. Specifically, we present theoretical evidence that the magnitude of score changes is related to the \(\ell _1\) norm of the model's gradient with respect to the input image. Building on this theoretical foundation, we propose a norm regularization training strategy that reduces the \(\ell _1\) norm of the gradient, thereby boosting the robustness of IQA models. Experiments on three FR-IQA and four NR-IQA models demonstrate the effectiveness of our strategy in reducing score changes under adversarial attacks. To the best of our knowledge, this work marks the first attempt to defend against adversarial attacks on both FR- and NR-IQA models. Our study offers valuable insights into the adversarial robustness of IQA models and provides a foundation for future research in this area.
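The connection between score stability and the gradient's \(\ell _1\) norm follows from a first-order expansion with Hölder's inequality: for an \(\ell _\infty\)-bounded perturbation \(\delta\), \(|f(x+\delta)-f(x)| \approx |\nabla f(x)^\top \delta| \le \|\nabla f(x)\|_1 \, \|\delta\|_\infty\). A minimal NumPy sketch illustrates this on a toy linear "model", for which the bound is exact (all names here are illustrative, not taken from the paper):

```python
import numpy as np

# Toy linear "IQA model" f(x) = w @ x, whose gradient w.r.t. the
# input is simply w (w, x, eps are illustrative placeholders).
rng = np.random.default_rng(0)
w = rng.normal(size=16)      # plays the role of the model gradient
x = rng.normal(size=16)      # plays the role of an input image
eps = 0.01                   # l_inf attack budget

# Worst-case l_inf-bounded perturbation for a linear model: eps * sign(w)
delta = eps * np.sign(w)
score_change = abs(w @ (x + delta) - w @ x)
bound = eps * np.abs(w).sum()   # eps * ||gradient||_1

assert score_change <= bound + 1e-12
assert np.isclose(score_change, bound)  # tight for a linear model
```

Since the worst-case score change scales with \(\|\nabla f(x)\|_1\), penalizing this norm during training, as the proposed strategy does, directly shrinks the attack's reach.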
Data Availability
This paper uses public datasets to conduct experiments. These datasets are available at the following URLs. BAPPS (Zhang et al., 2018): https://github.com/richzhang/PerceptualSimilarity. LIVE (Sheikh et al., 2006): http://live.ece.utexas.edu/index.php. LIVEC (Ghadiyaram and Bovik, 2016): https://live.ece.utexas.edu/research/ChallengeDB/index.html.
References
Bakurov, I., Buzzelli, M., Schettini, R., et al. (2023). Full-reference image quality expression via genetic programming. IEEE TIP, 32, 1458–1473. https://doi.org/10.1109/TIP.2023.3244662
Bosse, S., Maniry, D., Müller, K. R., et al. (2018). Deep neural networks for no-reference and full-reference image quality assessment. IEEE TIP, 27(1), 206–219. https://doi.org/10.1109/TIP.2017.2760518
Ding, K., Ma, K., Wang, S., et al. (2021). Comparison of full-reference image quality models for optimization of image processing systems. IJCV, 129(4), 1258–1281. https://doi.org/10.1007/s11263-020-01419-7
Ding, K., Ma, K., Wang, S., et al. (2022). Image quality assessment: Unifying structure and texture similarity. IEEE TPAMI, 44(5), 2567–2581. https://doi.org/10.1109/TPAMI.2020.3045810
Dosovitskiy A, Beyer L, Kolesnikov A, et al (2020) An image is worth \(16\times 16\) words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Finlay, C., & Oberman, A. M. (2021). Scaleable input gradient regularization for adversarial robustness. Machine Learning with Applications, 3, Article 100017. https://doi.org/10.1016/j.mlwa.2020.100017
Fu, H., Liang, F., Liang, J., et al. (2023). Asymmetric learned image compression with multi-scale residual block, importance scaling, and post-quantization filtering. IEEE TCSVT, 33(8), 4309–4321. https://doi.org/10.1109/TCSVT.2023.3237274
Ghadiyaram, D., & Bovik, A. C. (2016). Massive online crowdsourced study of subjective and objective picture quality. IEEE TIP, 25, 372–387. https://doi.org/10.1109/TIP.2015.2500021
Ghazanfari S, Garg S, Krishnamurthy P, et al (2023) R-LPIPS: An adversarially robust perceptual similarity metric. arXiv preprint arXiv:2307.15157
Ghildyal A, Liu F (2023) Attacking perceptual similarity metrics. Transactions on Machine Learning Research pp 1–23
Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: ICLR, pp 1–11
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: CVPR, pp 770–778, https://doi.org/10.1109/CVPR.2016.90
Hemachandran K, Rodriguez RV (2023) Artificial Intelligence for Business: An Implementation Guide Containing Practical and Industry-Specific Case Studies (1st ed.). CRC Press
Ilyas A, Engstrom L, Madry A (2019) Prior convictions: Black-box adversarial attacks with bandits and priors. In: ICLR, pp 1–23
Isogawa, M., Mikami, D., Takahashi, K., et al. (2019). Which is the better inpainted image? Training data generation without any manual operations. IJCV, 127(11–12), 1751–1766. https://doi.org/10.1007/s11263-018-1132-0
Ke J, Wang Q, Wang Y, et al (2021) MUSIQ: Multi-scale image quality transformer. In: ICCV, pp 5148–5157
Kettunen M, Härkönen E, Lehtinen J (2019) E-LPIPS: Robust perceptual image similarity via random transformation ensembles. arXiv preprint arXiv:1906.03973
Kim J, Lee S (2017) Deep learning of human visual sensitivity in image quality assessment framework. In: CVPR, pp 1969–1977
Korhonen J, You J (2022) Adversarial attacks against blind image quality assessment models. In: QoMEX Workshops, pp 3–11
Lao S, Gong Y, Shi S, et al (2022) Attentions help CNNs see better: Attention-based hybrid image quality assessment network. In: CVPR Workshops. IEEE, pp 1139–1148
Li D, Jiang T, Jiang M (2020) Norm-in-norm loss with faster convergence and better performance for image quality assessment. In: ACM MM, pp 789–797
Li, D., Jiang, T., & Jiang, M. (2021). Unified quality assessment of in-the-wild videos with mixed datasets training. IJCV, 129(4), 1238–1257. https://doi.org/10.1007/s11263-020-01408-w
Li, Y., Liu, K., Liu, S., et al. (2024). Involving distinguished temporal graph convolutional networks for skeleton-based temporal action segmentation. IEEE TCSVT, 34(1), 647–660. https://doi.org/10.1109/TCSVT.2023.3285416
Liu Y, Yang C, Li D, et al (2024) Defense against adversarial attacks on no-reference image quality models with gradient norm regularization. arXiv preprint arXiv:2403.11397
Madry A, Makelov A, Schmidt L, et al (2018) Towards deep learning models resistant to adversarial attacks. In: ICLR, pp 1–28
Mishra S, Dash A, Jena L (2021) Use of deep learning for disease detection and diagnosis. In: Bio-inspired Neurocomputing. Springer, p 181–201, https://doi.org/10.1007/978-981-15-5495-7_10
Mittal, A., Moorthy, A. K., & Bovik, A. C. (2012). No-reference image quality assessment in the spatial domain. IEEE TIP, 21(12), 4695–4708. https://doi.org/10.1109/TIP.2012.2214050
Mittal, A., Soundararajan, R., & Bovik, A. C. (2012). Making a "completely blind" image quality analyzer. IEEE SPL, 20(3), 209–212. https://doi.org/10.1109/LSP.2012.2227726
Naeem MF, Xian Y, Gool LV, et al (2024) I2DFormer+: Learning image to document summary attention for zero-shot image classification. IJCV pp 1–17. https://doi.org/10.1007/s11263-024-02053-3
Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI, pp 234–241
Sampat, M. P., Wang, Z., Gupta, S., et al. (2009). Complex wavelet structural similarity: A new image similarity index. IEEE TIP, 18(11), 2385–2401. https://doi.org/10.1109/TIP.2009.2025923
Sheikh, H. R., Sabir, M. F., & Bovik, A. C. (2006). A statistical evaluation of recent full reference image quality assessment algorithms. IEEE TIP, 15(11), 3440–3451. https://doi.org/10.1109/TIP.2006.881959
Shumitskaya E, Antsiferova A, Vatolin DS (2022) Universal perturbation attack on differentiable no-reference image- and video-quality metrics. In: BMVC, pp 1–12
Shumitskaya E, Antsiferova A, Vatolin DS (2023) Fast adversarial CNN-based perturbation attack on No-Reference image-and video-quality metrics. In: ICLR Workshops, pp 1–4
Su S, Yan Q, Zhu Y, et al (2020) Blindly assess image quality in the wild guided by a self-adaptive hyper network. In: CVPR, pp 3664–3673
Szegedy C, Zaremba W, Sutskever I, et al (2014) Intriguing properties of neural networks. In: ICLR, pp 1–10
Tramèr F, Kurakin A, Papernot N, et al (2018) Ensemble adversarial training: Attacks and defenses. In: ICLR, pp 1–22
Wang, J., Chen, Z., Yuan, C., et al. (2023). Hierarchical curriculum learning for no-reference image quality assessment. IJCV, 131(11), 3074–3093. https://doi.org/10.1007/s11263-023-01851-5
Wang, Y., Mao, Q., Zhu, H., et al. (2023). Multi-modal 3d object detection in autonomous driving: a survey. IJCV, 131(8), 2122–2152. https://doi.org/10.1007/s11263-023-01784-z
Wang, Z., Bovik, A. C., Sheikh, H. R., et al. (2004). Image quality assessment: From error visibility to structural similarity. IEEE TIP, 13(4), 600–612. https://doi.org/10.1109/TIP.2003.819861
Wilmott P, Howison S, Dewynne J (1995) The Mathematics of Financial Derivatives: A Student Introduction. Cambridge University Press
Xiao C, Zhu J, Li B, et al (2018) Spatially transformed adversarial examples. In: ICLR, pp 1–30
Yang S, Wu T, Shi S, et al (2022) MANIQA: Multi-dimension attention network for no-reference image quality assessment. In: CVPR Workshops, pp 1190–1199
Ye P, Kumar J, Kang L, et al (2012) Unsupervised feature learning framework for no-reference image quality assessment. In: CVPR, pp 1098–1105
Zhang D, Zhang T, Lu Y, et al (2019) You only propagate once: Accelerating adversarial training via maximal principle. In: NeurIPS, pp 227–238
Zhang, L., Zhang, L., Mou, X., et al. (2011). FSIM: A feature similarity index for image quality assessment. IEEE TIP, 20(8), 2378–2386. https://doi.org/10.1109/TIP.2011.2109730
Zhang R, Isola P, Efros AA, et al (2018) The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR, pp 586–595
Zhang, W., Ma, K., Yan, J., et al. (2020). Blind image quality assessment using a deep bilinear convolutional neural network. IEEE TCSVT, 30(1), 36–47. https://doi.org/10.1109/TCSVT.2018.2886771
Zhang W, Li D, Min X, et al (2022) Perceptual attacks of no-reference image quality models with human-in-the-loop. In: NeurIPS, pp 2916–2929
Zhou, Z., Ding, C., Li, J., et al. (2023). Sequential order-aware coding-based robust subspace clustering for human action recognition in untrimmed videos. IEEE TIP, 32, 13–28. https://doi.org/10.1109/TIP.2022.3224877
Zhu C, Cheng Y, Gan Z, et al (2020) FreeLB: Enhanced adversarial training for natural language understanding. In: ICLR, pp 1–14
Acknowledgements
This work is partially supported by the Sino-German Center (M 0187) and the NSFC under contract 62088102. We thank Zhaofei Yu for the valuable suggestions on the writing and illustrations.
Author information
Authors and Affiliations
Contributions
Yujia Liu designed the proposed method, developed the theoretical analysis, and prepared the manuscript drafts. Chenxi Yang conducted the main experiments in Sec. 5, drew Fig. 5 and Fig. 6, and polished the manuscript. Dingquan Li helped write Sec. 4, designed the ablation studies, and polished the manuscript. Tingting Jiang helped analyze the main experimental results (Sec. 5.2, 5.3), provided computational resources, and polished the manuscript. Tiejun Huang supervised the overall research direction, collated the experimental results, and revised the manuscript drafts.
Corresponding authors
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Code Availability
The code is available at https://github.com/YangiD/DefenseIQA-NT.
Additional information
Communicated by Gang Hua.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, Y., Yang, C., Li, D. et al. A Norm Regularization Training Strategy for Robust Image Quality Assessment Models. Int J Comput Vis 133, 5883–5897 (2025). https://doi.org/10.1007/s11263-025-02458-8