
Implicit Diffusion Models for Continuous Super-Resolution

Published in: International Journal of Computer Vision

Abstract

Image super-resolution (SR) has attracted increasing attention due to its widespread applications. However, current SR methods generally suffer from over-smoothing and artifacts, and most of them work only with fixed magnifications. To address these problems, this paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution. IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework, where the implicit neural representation is adopted in the decoding process to learn a continuous-resolution representation. Moreover, we design a scale-adaptive conditioning mechanism that consists of a low-resolution (LR) conditioning network and a scaling factor. The LR conditioning network adopts a parallel architecture to provide multi-resolution LR conditions for the denoising model. The scaling factor further regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output, which enables the model to accommodate the continuous-resolution requirement. Furthermore, we accelerate the inference process by adjusting the denoising equation and employing post-training quantization to compress the learned denoising network in a training-free manner. Extensive experiments on six benchmark datasets validate the effectiveness of our IDM and demonstrate its superior performance over prior art. The source code is available at https://github.com/Ree1s/IDM.
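The scale-adaptive conditioning described in the abstract can be sketched in miniature: a scaling factor regulates how much LR-conditioned information versus generated detail enters the output. The snippet below is an illustrative toy, not the authors' implementation; `scale_weight`, `fuse`, and the linear schedule with an assumed maximum magnification `s_max` are hypothetical choices for demonstration only (in IDM the modulation happens inside the denoising network's feature maps).

```python
def scale_weight(s, s_max=30.0):
    """Hypothetical modulation: as the magnification s grows, rely less on
    LR detail. Maps s in [1, s_max] linearly to a weight in [1, 0]."""
    s = max(1.0, min(float(s), s_max))
    return 1.0 - (s - 1.0) / (s_max - 1.0)

def fuse(lr_feats, gen_feats, s):
    """Blend LR-conditioned features with generated features elementwise:
    out = a * lr + (1 - a) * gen, with a set by the scaling factor s."""
    a = scale_weight(s)
    return [a * l + (1.0 - a) * g for l, g in zip(lr_feats, gen_feats)]

# At s = 1 (no magnification) the output is dominated by LR information;
# at large s the generated features dominate.
print(fuse([1.0, 1.0], [0.0, 0.0], 1.0))
print(fuse([1.0, 1.0], [0.0, 0.0], 30.0))
```

The key design point mirrored here is that the blend weight is a continuous function of the scale, so the same model can serve any magnification in the supported range rather than a fixed set of factors.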


[Figures 1–16 and Algorithms 1–2 appear in the full article; captions are not included in this preview.]


Data Availability

The datasets analyzed during the current study are publicly available: FFHQ (Karras et al., 2019), CelebA-HQ (Karras et al., 2017), DIV2K (Agustsson & Timofte, 2017), Flickr2K (Timofte et al., 2017), CAT (Zhang et al., 2008), and LSUN (Yu et al., 2015).

References

  • Agustsson, E., & Timofte, R. (2017). Ntire 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 126–135

  • Anciukevičius, T., Xu, Z., Fisher, M., et al. (2023). Renderdiffusion: Image diffusion for 3d reconstruction, inpainting and generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12608–12618

  • Bao, F., Li, C., Zhu, J., et al. (2022). Analytic-dpm: An analytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv preprint arXiv:2201.06503

  • Bar-Tal, O., Ofri-Amar, D., Fridman, R., et al. (2022). Text2live: Text-driven layered image and video editing. In: European conference on computer vision, Springer, pp 707–723

  • Baranchuk, D., Rubachev, I., Voynov, A., et al. (2021). Label-efficient semantic segmentation with diffusion models. arXiv preprint arXiv:2112.03126

  • Barron, J. T., Mildenhall, B., Tancik, M., et al. (2021). Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5855–5864

  • Blattmann, A., Rombach, R., Ling, H., et al. (2023). Align your latents: High-resolution video synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 22563–22575

  • Chabra, R., Lenssen, J. E., Ilg, E., et al. (2020). Deep local shapes: Learning local sdf priors for detailed 3d reconstruction. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIX, Springer, pp 608–625

  • Chan, K. C., Wang, X., Xu, X., et al. (2021). Glean: Generative latent bank for large-factor image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14245–14254

  • Chen, H. W., Xu, Y. S., Hong, M. F., et al. (2023). Cascaded local implicit transformer for arbitrary-scale super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 18257–18267

  • Chen, N., Zhang, Y., Zen, H., et al. (2020). Wavegrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713

  • Chen, Y., Tai, Y., Liu, X., et al. (2018). Fsrnet: End-to-end learning face super-resolution with facial priors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2492–2501

  • Chen, Y., Liu, S., & Wang, X. (2021). Learning continuous image representation with local implicit image function. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8628–8638

  • Dhariwal, P., & Nichol, A. (2021). Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34, 8780–8794.

  • Esser, P., Rombach, R., & Ommer, B. (2021). Taming transformers for high-resolution image synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12873–12883

  • Gao, S., Liu, X., Zeng, B., et al. (2023). Implicit diffusion models for continuous super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10021–10030

  • Guo, B., Zhang, X., Wu, H., et al. (2022). Lar-sr: A local autoregressive model for image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1909–1918

  • He, J., Shi, W., Chen, K., et al. (2022). Gcfsr: A generative and controllable face super resolution method without facial and gan priors. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1889–1898

  • He, K., Fan, H., Wu, Y., et al. (2020). Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9729–9738

  • Heusel, M., Ramsauer, H., Unterthiner, T., et al. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30

  • Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.

  • Ho, J., Saharia, C., Chan, W., et al. (2022). Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(47), 1–33.

  • Hu, X., Mu, H., Zhang, X., et al. (2019). Meta-sr: A magnification-arbitrary network for super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1575–1584

  • Karras, T., Aila, T., Laine, S., et al. (2017). Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196

  • Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4401–4410

  • Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114

  • Kulikov, V., Yadin, S., Kleiner, M., et al. (2023). Sinddm: A single image denoising diffusion model. In: International conference on machine learning, PMLR, pp 17920–17930

  • Ledig, C., Theis, L., Huszár, F., et al. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690

  • Lee, J., & Jin, K. H. (2022). Local texture estimator for implicit representation function. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1929–1938

  • Li, H., Yang, Y., Chang, M., et al. (2022). Srdiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing, 479, 47–59.

  • Li, H., Feng, Y., Xue, S., et al. (2024). Uv-idm: Identity-conditioned latent diffusion model for face uv-texture generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10585–10595

  • Li, M., Duan, Y., Zhou, J., et al. (2023a). Diffusion-sdf: Text-to-shape via voxelized diffusion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12642–12651

  • Li, X., Liu, Y., Lian, L., et al. (2023b). Q-diffusion: Quantizing diffusion models. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 17535–17545

  • Liang, J., Cao, J., Sun, G., et al. (2021a). Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1833–1844

  • Liang, J., Lugmayr, A., Zhang, K., et al. (2021b). Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4076–4085

  • Lim, B., Son, S., Kim, H., et al. (2017). Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 136–144

  • Lin, C. H., Gao, J., Tang, L., et al. (2023). Magic3d: High-resolution text-to-3d content creation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 300–309

  • Liu, J., Li, C., Ren, Y., et al. (2021). Diffsinger: Diffusion acoustic model for singing voice synthesis. arXiv preprint arXiv:2105.02446

  • Liu, L., Ren, Y., Lin, Z., et al. (2022). Pseudo numerical methods for diffusion models on manifolds. arXiv preprint arXiv:2202.09778

  • Liu, X., Zeng, B., Gao, S., et al. (2024). Ladiffgan: Training gans with diffusion supervision in latent spaces. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1115–1125

  • Liu, Z., Luo, P., Wang, X., et al. (2015). Deep learning face attributes in the wild. In: Proceedings of the IEEE international conference on computer vision, pp 3730–3738

  • Lu, C., Zhou, Y., Bao, F., et al. (2022). Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35, 5775–5787.

  • Lugmayr, A., Danelljan, M., Van Gool, L., et al. (2020). Srflow: Learning the super-resolution space with normalizing flow. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, Springer, pp 715–732

  • Ma, C., Yu, P., Lu, J., et al. (2022). Recovering realistic details for magnification-arbitrary image super-resolution. IEEE Transactions on Image Processing, 31, 3669–3683.

  • Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the international conference on machine learning, Atlanta, GA

  • Menon, S., Damian, A., Hu, S., et al. (2020). Pulse: Self-supervised photo upsampling via latent space exploration of generative models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2437–2445

  • Mescheder, L., Oechsle, M., Niemeyer, M., et al. (2019). Occupancy networks: Learning 3d reconstruction in function space. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4460–4470

  • Mildenhall, B., Srinivasan, P. P., Tancik, M., et al. (2021). Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1), 99–106.

  • Nagel, M., Amjad, R. A., Van Baalen, M., et al. (2020). Up or down? Adaptive rounding for post-training quantization. In: International conference on machine learning, PMLR, pp 7197–7206

  • Nichol, A., Dhariwal, P., Ramesh, A., et al. (2021). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741

  • Nichol, A. Q., & Dhariwal, P. (2021). Improved denoising diffusion probabilistic models. In: International conference on machine learning, PMLR, pp 8162–8171

  • Niemeyer, M., & Geiger, A. (2021). Giraffe: Representing scenes as compositional generative neural feature fields. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11453–11464

  • Niemeyer, M., Mescheder, L., Oechsle, M., et al. (2020). Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3504–3515

  • Van den Oord, A., Kalchbrenner, N., Espeholt, L., et al. (2016). Conditional image generation with pixelcnn decoders. Advances in Neural Information Processing Systems, 29

  • Park, J. J., Florence, P., Straub, J., et al. (2019). Deepsdf: Learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 165–174

  • Parmar, G., Kumar Singh, K., Zhang, R., et al. (2023). Zero-shot image-to-image translation. In: ACM SIGGRAPH 2023 conference proceedings, pp 1–11

  • Poole, B., Jain, A., Barron, J. T., et al. (2022). Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988

  • Rahaman, N., Baratin, A., Arpit, D., et al. (2019). On the spectral bias of neural networks. In: International conference on machine learning, PMLR, pp 5301–5310

  • Ramesh, A., Dhariwal, P., Nichol, A., et al. (2022). Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125

  • Rasul, K., Seward, C., Schuster, I., et al. (2021). Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In: International conference on machine learning, PMLR, pp 8857–8868

  • Razavi, A., Van den Oord, A., & Vinyals, O. (2019). Generating diverse high-fidelity images with vq-vae-2. Advances in Neural Information Processing Systems, 32

  • Rombach, R., Blattmann, A., Lorenz, D., et al. (2022). High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10684–10695

  • Saharia, C., Chan, W., Chang, H., et al. (2022a). Palette: Image-to-image diffusion models. In: ACM SIGGRAPH 2022 conference proceedings, pp 1–10

  • Saharia, C., Chan, W., Saxena, S., et al. (2022b). Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, 36479–36494.

  • Saharia, C., Ho, J., Chan, W., et al. (2022c). Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4), 4713–4726.

  • Saito, S., Huang, Z., Natsume, R., et al. (2019). Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2304–2314

  • Shang, Y., Yuan, Z., Xie, B., et al. (2023). Post-training quantization on diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1972–1981

  • Sitzmann, V., Zollhöfer, M., & Wetzstein, G. (2019). Scene representation networks: Continuous 3d-structure-aware neural scene representations. Advances in Neural Information Processing Systems, 32

  • Sitzmann, V., Martel, J., Bergman, A., et al. (2020). Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, 33, 7462–7473.

  • Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., et al. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In: International conference on machine learning, PMLR, pp 2256–2265

  • Song, J., Meng, C., & Ermon, S. (2020). Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502

  • Szegedy, C., Vanhoucke, V., Ioffe, S., et al. (2016). Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  • Tancik, M., Srinivasan, P., Mildenhall, B., et al. (2020). Fourier features let networks learn high frequency functions in low dimensional domains. Advances in Neural Information Processing Systems, 33, 7537–7547.

  • Timofte, R., Agustsson, E., Van Gool, L., et al. (2017). Ntire 2017 challenge on single image super-resolution: Methods and results. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 114–125

  • Vahdat, A., & Kautz, J. (2020). Nvae: A deep hierarchical variational autoencoder. Advances in Neural Information Processing Systems, 33, 19667–19679.

  • Van den Oord, A., Dieleman, S., Zen, H., et al. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499

  • Wang, L., Wang, Y., Lin, Z., et al. (2021a). Learning a single network for scale-arbitrary super-resolution. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4801–4810

  • Wang, W., Bao, J., Zhou, W., et al. (2025). Sindiffusion: Learning a diffusion model from a single natural image. IEEE Transactions on Pattern Analysis and Machine Intelligence

  • Wang, X., Yu, K., Wu, S., et al. (2018). Esrgan: Enhanced super-resolution generative adversarial networks. In: Proceedings of the European conference on computer vision (ECCV) workshops

  • Wang, X., Li, Y., Zhang, H., et al. (2021b). Towards real-world blind face restoration with generative facial prior. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9168–9178

  • Wang, Z., Bovik, A. C., Sheikh, H. R., et al. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.

  • Wiesner, D., Suk, J., Dummer, S., et al. (2022). Implicit neural representations for generative modeling of living cell shapes. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 58–67

  • Wu, J. Z., Ge, Y., Wang, X., et al. (2023). Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 7623–7633

  • Yu, F., Seff, A., Zhang, Y., et al. (2015). Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365

  • Yu, S., Sohn, K., Kim, S., et al. (2023). Video probabilistic diffusion models in projected latent space. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 18456–18466

  • Zeng, B., Liu, B., Li, H., et al. (2022). Fnevr: Neural volume rendering for face animation. Advances in Neural Information Processing Systems, 35, 22451–22462.

  • Zhang, R., Isola, P., Efros, A. A., et al. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 586–595

  • Zhang, W., Sun, J., & Tang, X. (2008). Cat head detection: How to effectively exploit shape and texture features. In: Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12–18, 2008, Proceedings, Part IV, Springer, pp 802–816

  • Zhang, W., Liu, Y., Dong, C., et al. (2019a). Ranksrgan: Generative adversarial networks with ranker for image super-resolution. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3096–3105

  • Zhang, Y., Li, X., & Zhou, J. (2019b). Sftgan: A generative adversarial network for pan-sharpening equipped with spatial feature transform layers. Journal of Applied Remote Sensing, 13(2), 026507.


Acknowledgements

This work was supported by the National Key Research and Development Program of China (Grant No. 2023YFC3306401). It was also supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No. LD24F020007), the Beijing Natural Science Foundation (Grant No. L223024), the National Natural Science Foundation of China (Grant Nos. 62076016, 62176068, 92467108, and 62141604), the "One Thousand Plan" projects of Jiangxi Province (Jxsg2023102268), the Beijing Municipal Science & Technology Commission and Administrative Commission of Zhongguancun Science Park (Grant No. Z231100005923035), and the Taiyuan City "Double Hundred Research Action" (2024TYJB0127).

Author information


Corresponding authors

Correspondence to Tian Wang or Baochang Zhang.

Additional information

Communicated by João F. Henriques.



About this article


Cite this article

Liu, X., Gao, S., Zeng, B. et al. Implicit Diffusion Models for Continuous Super-Resolution. Int J Comput Vis 133, 6535–6557 (2025). https://doi.org/10.1007/s11263-025-02462-y
