Abstract
Image super-resolution (SR) has attracted increasing attention owing to its widespread applications. However, current SR methods generally suffer from over-smoothing and artifacts, and most work only at fixed magnifications. To address these problems, this paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution. IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework, where the implicit neural representation is adopted in the decoding process to learn a continuous-resolution representation. Moreover, we design a scale-adaptive conditioning mechanism consisting of a low-resolution (LR) conditioning network and a scaling factor. The LR conditioning network adopts a parallel architecture to provide multi-resolution LR conditions for the denoising model. The scaling factor regulates the output resolution and accordingly modulates the proportion of LR information and generated features in the final output, enabling the model to accommodate continuous-resolution requirements. Furthermore, we accelerate inference by adjusting the denoising equation and employing post-training quantization to compress the learned denoising network in a training-free manner. Extensive experiments on six benchmark datasets validate the effectiveness of IDM and demonstrate its superior performance over prior art. The source code is available at https://github.com/Ree1s/IDM.
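The scale-adaptive modulation described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name `scale_adaptive_mix`, the `max_scale` bound, and the linear weighting scheme are all assumptions introduced here to show how a scaling factor might trade off LR conditioning information against generated detail.

```python
import numpy as np

def scale_adaptive_mix(lr_feat, gen_feat, scale, max_scale=8.0):
    """Blend LR-conditioned features with features from the denoising branch.

    Hypothetical sketch: the blending weight grows with the target
    magnification, so small upscaling factors lean on the LR condition
    while large factors rely more on generated high-frequency detail.
    The linear schedule below is an illustrative choice.
    """
    w = np.clip(scale / max_scale, 0.0, 1.0)  # weight in [0, 1]
    return (1.0 - w) * lr_feat + w * gen_feat

# At scale 2 of a maximum 8, the output keeps 75% of the LR condition;
# at the maximum scale it consists entirely of generated features.
lr = np.ones((4, 4))
gen = np.zeros((4, 4))
mixed = scale_adaptive_mix(lr, gen, scale=2.0)
```

In the actual model the two feature streams come from the parallel LR conditioning network and the denoising U-Net at multiple resolutions; the sketch collapses that into a single per-scale convex combination.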
Acknowledgements
The work was supported by the National Key Research and Development Program of China (Grant No. 2023YFC3306401). This research was also supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LD24F020007, the Beijing Natural Science Foundation under Grant No. L223024, the National Natural Science Foundation of China under Grants No. 62076016, 62176068, 92467108, and 62141604, the "One Thousand Plan" projects in Jiangxi Province (Jxsg2023102268), the Beijing Municipal Science & Technology Commission and Administrative Commission of Zhongguancun Science Park under Grant No. Z231100005923035, and the Taiyuan City "Double Hundred Research Action" (2024TYJB0127).
Additional information
Communicated by João F. Henriques.
About this article
Cite this article
Liu, X., Gao, S., Zeng, B. et al. Implicit Diffusion Models for Continuous Super-Resolution. Int J Comput Vis 133, 6535–6557 (2025). https://doi.org/10.1007/s11263-025-02462-y