Abstract
Blind image quality assessment (BIQA) aims to automatically and accurately predict objective quality scores for visual signals, and has been widely used to monitor product and service quality in low-light applications such as smartphone photography, video surveillance, and autonomous driving. Recent developments in this field are dominated by unimodal solutions, which are inconsistent with human subjective rating patterns, where visual perception is simultaneously shaped by multiple sources of sensory information. In this article, we present a blind multimodal quality assessment (BMQA) framework for low-light images, spanning subjective evaluation to objective scoring. To investigate the multimodal mechanism, we first establish a multimodal low-light image quality (MLIQ) database with authentic low-light distortions, containing paired image and text modalities. We then design the key modules of BMQA, addressing multimodal quality representation, latent feature alignment and fusion, and hybrid self-supervised and supervised learning. Extensive experiments show that BMQA achieves state-of-the-art accuracy on the proposed MLIQ benchmark database. We also build an independent, image-only Dark-4K database to verify the applicability and generalization of BMQA in mainstream unimodal applications. Qualitative and quantitative results on Dark-4K show that BMQA outperforms existing BIQA approaches, provided that a pre-trained model is available to generate text descriptions. The proposed framework, the two databases, and the collected BIQA methods and evaluation metrics are publicly available at https://charwill.github.io/bmqa.html.
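To make the described pipeline concrete, below is a minimal PyTorch sketch of a blind multimodal quality model of the kind outlined in the abstract: image and text features are projected into a shared latent space, aligned, fused, and regressed to a quality score under a hybrid supervised plus self-supervised objective. This is an illustration only, not the authors' BMQA implementation; the encoder feature dimensions, fusion MLP, and loss weight `alpha` are assumptions made for exposition.

```python
# Illustrative sketch of a blind multimodal quality assessment pipeline.
# NOT the authors' BMQA code: encoder choices, fusion scheme, and loss
# weighting are assumptions for exposition only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalQualityModel(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, shared_dim=256):
        super().__init__()
        # Stand-ins for features from pre-trained image/text encoders
        # (e.g., CLIP-style towers); projections align both modalities
        # into a shared latent space.
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.txt_proj = nn.Linear(txt_dim, shared_dim)
        self.fusion = nn.Sequential(                 # fuses aligned features
            nn.Linear(2 * shared_dim, shared_dim), nn.ReLU())
        self.head = nn.Linear(shared_dim, 1)         # regresses a quality score

    def forward(self, img_feat, txt_feat):
        zi = F.normalize(self.img_proj(img_feat), dim=-1)
        zt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        fused = self.fusion(torch.cat([zi, zt], dim=-1))
        return self.head(fused).squeeze(-1), zi, zt

def hybrid_loss(score_pred, score_gt, zi, zt, alpha=0.1):
    # Supervised regression on subjective scores (MOS) plus a
    # self-supervised alignment term pulling matched image/text
    # embeddings together via cosine similarity.
    reg = F.mse_loss(score_pred, score_gt)
    align = 1.0 - F.cosine_similarity(zi, zt, dim=-1).mean()
    return reg + alpha * align

# Toy usage with random tensors standing in for encoder outputs.
model = MultimodalQualityModel()
img_feat, txt_feat = torch.randn(4, 512), torch.randn(4, 512)
mos = torch.rand(4) * 5
pred, zi, zt = model(img_feat, txt_feat)
loss = hybrid_loss(pred, mos, zi, zt)
loss.backward()
print(pred.detach(), loss.item())
```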
Data Availability
The datasets that support the findings of this work are available from the corresponding author upon reasonable request, and are also available at https://charwill.github.io/bmqa.html.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62472290 and Grant 62372306, in part by the Natural Science Foundation of Guangdong Province under Grant 2024A1515011972, Grant 2023A1515011197, and Grant 2022A1515011245, and in part by the Natural Science Foundation of Shenzhen City under Grant 20220809160139001.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Jean-François Lalonde.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, M., Xu, Z., Xu, M. et al. Blind Multimodal Quality Assessment of Low-Light Images. Int J Comput Vis 133, 1665–1688 (2025). https://doi.org/10.1007/s11263-024-02239-9