Abstract
Single image super-resolution (SISR) methods often suffer severe performance drops on severely degraded low-resolution (LR) images. Recently, reference-based super-resolution (RefSR) methods have offered a promising solution by introducing high-quality reference (Ref) images as priors for reconstruction. However, existing RefSR methods often struggle to effectively explore informative textures from Ref images, and their performance is further restricted by the quality of the available training datasets. To address these challenges, we propose a multi-scale texture fusion method for reference-based super-resolution via a state-space model, which enables efficient multi-scale feature fusion and long-range dependency modeling for better texture restoration. Specifically, our method consists of a series of texture matching fusion groups (TMFG), each comprising a multi-scale matching module (MSMM) and a state-space fusion module (SSFM). MSMM matches multi-scale Ref features at different stages of the restoration process to accurately locate the key similar textures contained in Ref images. SSFM then fuses the retrieved multi-scale textures by modeling long-range dependencies at linear complexity. This design encourages in-depth exploration of the multi-scale texture correspondence between LR and Ref images, thereby better exploiting Ref textures to assist restoration. Notably, we introduce a new large-scale dataset, dubbed DRefSR, designed specifically for the RefSR task. DRefSR offers a wider variety of scenes, more accurately matched image pairs, and a larger volume of samples: with 47,653 image pairs, it substantially exceeds existing datasets (13,761 pairs) in scale. Experiments verify our dataset's superiority and demonstrate that our method outperforms state-of-the-art (SOTA) methods both quantitatively and qualitatively.
DRefSR dataset and code are available at: https://github.com/edbca/SSMTF.
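The linear-complexity long-range modeling claimed above rests on the discrete state-space recurrence popularized by Mamba-style models. The following is a minimal illustrative sketch of that recurrence only, with assumed scalar parameters (`a`, `b`, `c`); it is not the paper's actual SSFM, which uses selective, multi-channel state spaces over 2-D feature maps:

```python
def ssm_scan(x, a=0.9, b=0.5, c=1.0):
    """Discrete state-space model: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    A single pass over the sequence costs O(L) for length L, which is the
    linear-complexity long-range dependency modeling referred to above
    (in contrast to the O(L^2) cost of full self-attention).
    """
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt   # state update: carries context from all past steps
        ys.append(c * h)     # readout from the hidden state
    return ys

# An impulse input decays geometrically through the state,
# so each output still depends on the full input history.
out = ssm_scan([1.0, 0.0, 0.0, 0.0])
# out[t] = 0.5 * 0.9**t
```

In the real module the scalars become learned, input-dependent matrices and the scan runs over serialized image features, but the linear cost per step is the same property exploited here.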
Data Availability
The data supporting the results of this study are partially derived from the open-source CUFED dataset (Wang et al., 2016); we also captured additional data using cameras and mobile phones to create our DRefSR dataset. The dataset has been made publicly available at https://github.com/edbca/SSMTF or https://pan.baidu.com/s/1vrrM56n5xHKRrs3f3kf0-w?pwd=gnt9.
References
Aslahishahri, M., Ubbens, J., & Stavness, I. (2024). Hitsr: A hierarchical transformer for reference-based super-resolution. arXiv preprint arXiv:2408.16959
Cao, J., Liang, J., Zhang, K., Li, Y., Zhang, Y., Wang, W., & Gool, L.V. (2022). Reference-based image super-resolution with deformable attention transformer. European conference on computer vision (pp. 325–342).
Chen, H., Wang, Y., Guo, T., Xu, C., Deng, Y., Liu, Z., & Gao, W. (2021). Pre-trained image processing transformer. Ieee/cvf conference on computer vision and pattern recognition (pp. 12299–12310).
Chen, X., Wang, X., Zhou, J., Qiao, Y., & Dong, C. (2023). Activating more pixels in image super-resolution transformer. Ieee/cvf conference on computer vision and pattern recognition (pp. 22367–22377).
Chen, Z., Zhang, Y., Gu, J., Kong, L., Yang, X., & Yu, F. (2023). Dual aggregation transformer for image super-resolution. International conference on computer vision (pp. 12312–12321).
Chung, H., Sim, B., & Ye, J.C. (2022). Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 12413–12422).
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., & Wei, Y. (2017). Deformable convolutional networks. Proceedings of the ieee/cvf international conference on computer vision (pp. 764–773).
Dai, T., Cai, J., Zhang, Y., Xia, S-T., & Zhang, L. (2019). Second-order attention network for single image super-resolution. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 11065–11074).
Dai, T., Zha, H., Jiang, Y., & Xia, S-T. (2019). Image super-resolution via residual block attention networks. Ieee/cvf international conference on computer vision workshops.
Dong, C., Loy, C.C., He, K., & Tang, X. (2014). Learning a deep convolutional network for image super-resolution. European conference on computer vision (pp. 184–199).
Dong, R., Yuan, S., Luo, B., Chen, M., Zhang, J., Zhang, L., & Fu, H. (2024). Building bridges across spatial and temporal resolutions: Reference-based super-resolution via change priors and conditional diffusion model. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 27684–27694).
Fang, F., Li, J., & Zeng, T. (2020). Soft-edge assisted network for single image super-resolution. IEEE Transactions on Image Processing, 29, 4656–4668.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27
Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752
Gu, J., & Dong, C. (2021). Interpreting super-resolution networks with local attribution maps. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 9199–9208).
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A.C. (2017). Improved training of wasserstein gans. Advances in Neural Information Processing Systems, 30.
Guo, H., Li, J., Dai, T., Ouyang, Z., Ren, X., & Xia, S-T. (2024). Mambair: A simple baseline for image restoration with state-space model. arXiv preprint arXiv:2402.15648
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 9729–9738).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Ieee conference on computer vision and pattern recognition (pp. 770–778).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30, 25–34.
Hinton, G. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 7132–7141).
Huang, J-B., Singh, A., & Ahuja, N. (2015). Single image super-resolution from transformed self-exemplars. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 5197–5206).
Huang, Y., Li, J., Gao, X., Hu, Y., & Lu, W. (2021). Interpretable detail-fidelity attention network for single image super-resolution. IEEE Transactions on Image Processing, 30, 2325–2339.
Jiang, Y., Chan, K.C., Wang, X., Loy, C.C., & Liu, Z. (2021). Robust reference-based super-resolution via c2-matching. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 2103–2112).
Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. European conference on computer vision (pp. 694–711).
Kawar, B., Elad, M., Ermon, S., & Song, J. (2022). Denoising diffusion restoration models. Advances in Neural Information Processing Systems, 35, 23593–23606.
Ke, J., Wang, Q., Wang, Y., Milanfar, P., & Yang, F. (2021). Musiq: Multi-scale image quality transformer. Proceedings of the ieee/cvf international conference on computer vision (pp. 5148–5157).
Kim, J., Kwon Lee, J., & Mu Lee, K. (2016). Accurate image super-resolution using very deep convolutional networks. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 1646–1654).
Kong, X., Zhao, H., Qiao, Y., & Dong, C. (2021). Classsr: A general framework to accelerate super-resolution networks by data characteristic. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 12016–12025).
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., and others (2017). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 4681–4690).
Li, F., Cong, R., Wu, J., Bai, H., Wang, M., & Zhao, Y. (2024). Srconvnet: A transformer-style convnet for lightweight image super-resolution. International Journal of Computer Vision, 133, 1–17.
Li, G., Rao, C., Mo, J., Zhang, Z., Xing, W., & Zhao, L. (2024). Rethinking diffusion model for multi-contrast mri super-resolution. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 11365–11374).
Li, Y., Fan, Y., Xiang, X., Demandolx, D., Ranjan, R., Timofte, R., & Van Gool, L. (2023). Efficient and explicit modelling of image hierarchies for image restoration. Ieee/cvf conference on computer vision and pattern recognition (pp. 18278–18289).
Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., & Timofte, R. (2021). Swinir: Image restoration using swin transformer. Ieee/cvf international conference on computer vision (pp. 1833–1844).
Lim, B., Son, S., Kim, H., Nah, S., & Mu Lee, K. (2017a). Enhanced deep residual networks for single image super-resolution. Proceedings of the ieee conference on computer vision and pattern recognition workshops (pp. 136–144).
Lim, B., Son, S., Kim, H., Nah, S., & Mu Lee, K. (2017b). Enhanced deep residual networks for single image super-resolution. Proceedings of the ieee conference on computer vision and pattern recognition workshops (pp. 136–144).
Liu, X., Zhai, D., Chen, R., Ji, X., Zhao, D., & Gao, W. (2018). Depth super-resolution via joint color-guided internal and external regularizations. IEEE Transactions on Image Processing, 28(4), 1636–1645.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. Proceedings of the ieee international conference on computer vision, 2, 1150–1157.
Lu, L., Li, W., Tao, X., Lu, J., & Jia, J. (2021). Masa-sr: Matching acceleration and spatial adaptation for reference-based image super-resolution. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 6368–6377).
Luo, Z., Huang, Y., Li, S., Wang, L., & Tan, T. (2023). End-to-end alternating optimization for real-world blind super resolution. International Journal of Computer Vision, 131(12), 3152–3169.
Ma, C., Rao, Y., Cheng, Y., Chen, C., Lu, J., & Zhou, J. (2020). Structure-preserving super resolution with gradient guidance. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 7769–7778).
Matsui, Y., Ito, K., Aramaki, Y., Fujimoto, A., Ogawa, T., Yamasaki, T., & Aizawa, K. (2017). Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications, 76, 21811–21838.
Niu, B., Wen, W., Ren, W., Zhang, X., Yang, L., Wang, S., & Shen, H. (2020). Single image super-resolution via a holistic attention network. European conference on computer vision (pp. 191–207).
Pesavento, M., Volino, M., & Hilton, A. (2021). Attention-based multi-reference learning for image super-resolution. Proceedings of the ieee/cvf international conference on computer vision (pp. 14697–14706).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 10684–10695).
Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D. J., & Norouzi, M. (2022). Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4), 4713–4726.
Sajjadi, M.S., Scholkopf, B., & Hirsch, M. (2017). Enhancenet: Single image super-resolution through automated texture synthesis. Proceedings of the ieee international conference on computer vision (pp. 4491–4500).
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., & Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 1874–1883).
Shi, Y., Xia, B., Jin, X., Wang, X., Zhao, T., Xia, X., & Yang, W. (2025). Vmambair: Visual state space model for image restoration. IEEE Transactions on Circuits and Systems for Video Technology.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. International conference on learning representations.
Smith, J.T., Warrington, A., & Linderman, S. (2022). Simplified state space layers for sequence modeling. The international conference on learning representations.
Sun, H., Li, W., Liu, J., Chen, H., Pei, R., Zou, X., & Yang, Y. (2024). Coser: Bridging image and language for cognitive super-resolution. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 25868–25878).
Sun, L., & Hays, J. (2012). Super-resolution from internet-scale scene matching. Ieee international conference on computational photography (pp. 1–12).
Timofte, R., Agustsson, E., Van Gool, L., Yang, M-H., & Zhang, L. (2017). Ntire 2017 challenge on single image super-resolution: Methods and results. Proceedings of the ieee conference on computer vision and pattern recognition workshops (pp. 114–125).
Wang, J., Yue, Z., Zhou, S., Chan, K. C., & Loy, C. C. (2024). Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision, 5929–5949.
Wang, L., Wang, Y., Dong, X., Xu, Q., Yang, J., An, W., & Guo, Y. (2021). Unsupervised degradation representation learning for blind super-resolution. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 10581–10590).
Wang, X., Xie, L., Dong, C., & Shan, Y. (2021). Real-esrgan: Training real-world blind super-resolution with pure synthetic data. Proceedings of the ieee/cvf international conference on computer vision (pp. 1905–1914).
Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., & Change Loy, C. (2018). Esrgan: Enhanced super-resolution generative adversarial networks. European conference on computer vision workshop (pp. 1–16).
Wang, Y., Lin, Z., Shen, X., Mech, R., Miller, G., & Cottrell, G.W. (2016). Event-specific image importance. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 4810–4819).
Wang, Y., Liu, Y., Heidrich, W., & Dai, Q. (2016). The light field attachment: Turning a dslr into a light field camera using a low budget camera ring. IEEE Transactions on Visualization and Computer Graphics, 23(10), 2357–2364.
Wang, Y., Yang, W., Chen, X., Wang, Y., Guo, L., Chau, L-P., & Wen, B. (2024). Sinsr: diffusion-based image super-resolution in a single step. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 25796–25805).
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
Wu, H., Zhang, Z., Zhang, W., Chen, C., Liao, L., Li, C., Gao, Y., Wang, A., Zhang, E., Sun, W., and others (2023). Q-align: Teaching lmms for visual scoring via discrete text-defined levels. arXiv preprint arXiv:2312.17090
Xia, B., Tian, Y., Hang, Y., Yang, W., Liao, Q., & Zhou, J. (2022). Coarse-to-fine embedded patchmatch and multi-scale dynamic aggregation for reference-based super-resolution. Proceedings of the aaai conference on artificial intelligence, 36, 2768–2776.
Xiao, Y., Yuan, Q., Jiang, K., Chen, Y., Zhang, Q., & Lin, C-W. (2024). Frequency-assisted mamba for remote sensing image super-resolution. arXiv preprint arXiv:2405.04964
Xie, L., Wang, X., Chen, X., Li, G., Shan, Y., Zhou, J., & Dong, C. (2023). Desra: Detect and delete the artifacts of gan-based real-world super-resolution models. International conference on machine learning (pp. 38204–38226).
Xie, Y., Xiao, J., Sun, M., Yao, C., & Huang, K. (2020a). Feature representation matters: End-to-end learning for reference-based image super-resolution. European conference on computer vision (pp. 230–245).
Xie, Y., Xiao, J., Sun, M., Yao, C., & Huang, K. (2020b). Feature representation matters: End-to-end learning for reference-based image super-resolution. European conference on computer vision (pp. 230–245).
Yan, X., Zhao, W., Yuan, K., Zhang, R., Li, Z., & Cui, S. (2020a). Towards content-independent multi-reference super-resolution: Adaptive pattern matching and feature aggregation. European conference on computer vision (pp. 52–68).
Yan, X., Zhao, W., Yuan, K., Zhang, R., Li, Z., & Cui, S. (2020b). Towards content-independent multi-reference super-resolution: Adaptive pattern matching and feature aggregation. European conference on computer vision (pp. 52–68).
Yang, F., Yang, H., Fu, J., Lu, H., & Guo, B. (2020). Learning texture transformer network for image super-resolution. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 5791–5800).
Yang, S., Wu, T., Shi, S., Lao, S., Gong, Y., Cao, M., & Yang, Y. (2022). Maniqa: Multi-dimension attention network for no-reference image quality assessment. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 1191–1200).
Zhang, K., Liang, J., Van Gool, L., & Timofte, R. (2021). Designing a practical degradation model for deep blind image super-resolution. Proceedings of the ieee/cvf international conference on computer vision (pp. 4791–4800).
Zhang, L., Li, X., He, D., Li, F., Wang, Y., & Zhang, Z. (2022). Rrsr: Reciprocal reference-based image super-resolution with progressive feature alignment and selection. European conference on computer vision (pp. 648–664).
Zhang, L., Li, Y., Zhou, X., Zhao, X., & Gu, S. (2024). Transcending the limit of local window: Advanced super-resolution transformer with adaptive token dictionary. Ieee/cvf conference on computer vision and pattern recognition (pp. 2856–2865).
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 586–595).
Zhang, W., Liu, Y., Dong, C., & Qiao, Y. (2019). Ranksrgan: Generative adversarial networks with ranker for image super-resolution. Proceedings of the ieee/cvf international conference on computer vision (pp. 3096–3105).
Zhang, X., Dong, H., Hu, Z., Lai, W.-S., Wang, F., & Yang, M.-H. (2020). Gated fusion network for degraded image super resolution. International Journal of Computer Vision, 128, 1699–1721.
Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., & Fu, Y. (2018). Image super-resolution using very deep residual channel attention networks. European conference on computer vision (pp. 286–301).
Zhang, Y., Tian, Y., Kong, Y., Zhong, B., & Fu, Y. (2018). Residual dense network for image super-resolution. Proceedings of the ieee conference on computer vision and pattern recognition (pp. 2472–2481).
Zhang, Y., Yang, Q., Chandler, D.M., & Mou, X. (2024). Reference-based multi-stage progressive restoration for multi-degraded images. IEEE Transactions on Image Processing.
Zhang, Z., Wang, Z., Lin, Z., & Qi, H. (2019). Image super-resolution by neural texture transfer. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 7982–7991).
Zheng, H., Ji, M., Wang, H., Liu, Y., & Fang, L. (2018a). Crossnet: An end-to-end reference-based super resolution network using cross-scale warping. Proceedings of the european conference on computer vision (pp. 88–104).
Zheng, H., Ji, M., Wang, H., Liu, Y., & Fang, L. (2018b). Crossnet: An end-to-end reference-based super resolution network using cross-scale warping. Proceedings of the european conference on computer vision (pp. 88–104).
Zhou, H., Zhu, X., Han, Z., & Yin, X-C. (2021). Real-world image super-resolution via spatio-temporal correlation network. 2021 ieee international conference on multimedia and expo (pp. 1–6).
Zhou, H., Zhu, X., Zhu, J., Han, Z., Zhang, S-X., Qin, J., & Yin, X-C. (2023). Learning correction filter via degradation-adaptive regression for blind single image super-resolution. Proceedings of the ieee/cvf international conference on computer vision (pp. 12365–12375).
Zhou, M., Yan, K., Pan, J., Ren, W., Xie, Q., & Cao, X. (2023). Memory-augmented deep unfolding network for guided image super-resolution. International Journal of Computer Vision, 131(1), 215–242.
Zhu, L., Liao, B., Zhang, Q., Wang, X., Liu, W., & Wang, X. (2024). Vision mamba: Efficient visual representation learning with bidirectional state space model. International conference on machine learning.
Zhu, X., Hu, H., Lin, S., & Dai, J. (2019). Deformable convnets v2: More deformable, better results. Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 9308–9316).
Acknowledgements
This research is supported by the National Science and Technology Major Project (2022ZD0119204), the National Science Fund for Distinguished Young Scholars (62125601), the National Natural Science Foundation of China (62172035, 62076024, 62006018), and the Youth Teacher International Exchange and Growth Program (No. QNXM20250001).
Additional information
Communicated by Limin Wang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, H., Zhu, X., Qin, J. et al. Multi-Scale Texture Fusion for Reference-Based Image Super-Resolution: New Dataset and Solution. Int J Comput Vis 133, 6971–6992 (2025). https://doi.org/10.1007/s11263-025-02514-3