Abstract
Computed tomography (CT) is a cornerstone of clinical imaging, yet its accessibility in certain scenarios is constrained by radiation exposure concerns and by operational limitations within surgical environments. CT reconstruction from incomplete views has therefore attracted increasing research attention owing to its great potential in medical applications. However, it is an inherently ill-posed problem, which, coupled with the complex, high-dimensional characteristics of 3D medical data, poses major challenges: artifact contamination, global structural incoherence, and high computational cost. To tackle these challenges, this paper introduces D3T, a new 3D conditional diffusion transformer that models 3D CT distributions in a low-dimensional 2D latent space for incomplete-view CT reconstruction. Our approach comprises two primary components: a triplanar vector-quantized auto-encoder (TriVQAE) and a latent dual-domain diffusion transformer (LD3T). TriVQAE encodes high-resolution 3D CT images into compact 2D latent triplane codes that effectively factorize the intricate CT structures, in turn enabling a compute-friendly diffusion model architecture. Operating in the latent triplane space, LD3T significantly reduces the complexity of capturing the intricate structures in CT images, and its improved diffusion transformer architecture efficiently captures the global correlations across the three planes, ensuring high-fidelity 3D reconstruction. LD3T further presents a new dual-domain conditional generation pipeline that incorporates both image and projection conditions, facilitating controllable reconstruction that produces 3D structures consistent with the given conditions. Moreover, LD3T introduces a new Dual-Space Consistency Loss that adds image-level supervision to the standard latent-space supervision, enhancing consistency in the 3D image space. Extensive experiments on four datasets under three inverse-problem settings demonstrate the effectiveness of our method.
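The two ideas the abstract leans on can be illustrated with a toy NumPy sketch. This is not the paper's learned TriVQAE or LD3T: simple axis-mean projections stand in for the learned triplane encoder, a broadcast sum stands in for the decoder, and the function names, shapes, and weighting factor are all hypothetical. The sketch only shows the shape bookkeeping of a triplane factorization and the structure of a loss that supervises in both the latent (plane) space and the image (volume) space.

```python
import numpy as np

def to_triplanes(volume):
    """Toy "encoder": project a (D, H, W) volume onto three orthogonal 2D planes.
    The real TriVQAE uses a learned vector-quantized auto-encoder instead."""
    xy = volume.mean(axis=0)  # (H, W)
    xz = volume.mean(axis=1)  # (D, W)
    yz = volume.mean(axis=2)  # (D, H)
    return xy, xz, yz

def from_triplanes(xy, xz, yz):
    """Toy "decoder": broadcast the three planes back into a (D, H, W) volume."""
    D, W = xz.shape
    H = xy.shape[0]
    recon = (xy[None, :, :] + xz[:, None, :] + yz[:, :, None]) / 3.0
    assert recon.shape == (D, H, W)
    return recon

def dual_space_loss(pred_planes, gt_planes, gt_volume, lam=0.5):
    """Illustrative dual-space supervision: a latent-space term on the planes
    plus an image-space term on the decoded volume (lam is a made-up weight)."""
    latent = sum(np.mean((p - q) ** 2) for p, q in zip(pred_planes, gt_planes))
    image = np.mean((from_triplanes(*pred_planes) - gt_volume) ** 2)
    return latent + lam * image

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 32), dtype=np.float32)
planes = to_triplanes(vol)
recon = from_triplanes(*planes)

# The compute advantage: three 2D planes hold far fewer values than one 3D volume.
n3d = vol.size                      # 32 * 32 * 32 = 32768 voxels
n2d = sum(p.size for p in planes)   # 3 * 32 * 32  = 3072 values

# A slightly perturbed prediction still yields a finite, positive loss.
noisy = tuple(p + 0.01 * rng.standard_normal(p.shape) for p in planes)
loss = dual_space_loss(noisy, planes, vol)
```

The roughly tenfold reduction from `n3d` to `n2d` is the point of operating in the triplane space: the diffusion model denoises 2D feature maps rather than a full 3D grid, while the image-space term of the loss ties those planes back to consistency in the reconstructed volume.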
Data Availability
The LIDC-IDRI (Armato III et al., 2011), CTSpine1K (Deng et al., 2021), and CTPelvic1K (Liu et al., 2021) datasets analyzed during the current study are all publicly available; links to these datasets can be found in the original papers. The LumbaV dataset was proposed by Liu et al. (2024) but is not yet publicly accessible; one can contact its authors for permission to download the data.
Code Availability
The codes are open-sourced at https://github.com/clannadcl/Official-codes-of-D3T.
References
Abdollahi, A., Pradhan, B., & Alamri, A. (2020). Vnet: An end-to-end fully convolutional neural network for road extraction from high-resolution remote sensing data. IEEE Access, 8, 179424–179436.
Agarwal, S., Furukawa, Y., Snavely, N., et al. (2011). Building Rome in a day. Communications of the ACM, 54(10), 105–112.
Armato, S. G., III., McLennan, G., Bidaut, L., et al. (2011). The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915–931.
Barron, J. T., Mildenhall, B., Tancik, M., et al. (2021). Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 5855–5864).
Blattmann, A., Rombach, R., Ling, H., et al. (2023). Align your latents: High-resolution video synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 22563–22575).
Cao, Z., Hong, F., Wu, T., et al. (2025). Difftf++: 3d-aware diffusion transformer for large-vocabulary 3D generation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Chan, E. R., Lin, C. Z., Chan, M. A., et al. (2022). Efficient geometry-aware 3D generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16123–16133).
Chang, A. X., Funkhouser, T., Guibas, L., et al. (2015). Shapenet: An information-rich 3D model repository. arXiv:1512.03012
Chen, A., Xu, Z., Zhao, F., et al. (2021). Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14124–14133).
Chen, A., Xu, Z., Geiger, A., et al. (2022). Tensorf: Tensorial radiance fields. In: European conference on computer vision (pp. 333–350). Springer.
Chen, D. Z., Siddiqui, Y., Lee, H. Y., et al. (2023a). Text2tex: Text-driven texture synthesis via diffusion models. In: Proceedings of the IEEE/CVF international conference on computer vision (pp. 18558–18568).
Chen, H., Gu, J., Chen, A., et al. (2023b). Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2416–2425).
Chen, J., Yu, J., Ge, C., et al. (2023c). PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis. arXiv:2310.00426
Chen, R., Chen, Y., Jiao, N., et al. (2023d). Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22246–22256).
Chung, H., Sim, B., Ryu, D., et al. (2022). Improving diffusion models for inverse problems using manifold constraints. Advances in Neural Information Processing Systems, 35, 25683–25696.
Chung, H., Lee, S., Ye, J. C. (2023a). Decomposed diffusion sampler for accelerating large-scale inverse problems. arXiv:2303.05754
Chung, H., Ryu, D., McCann, M. T., et al. (2023b). Solving 3D inverse problems using pre-trained 2D diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 22542–22551).
Deng, Y., Wang, C., Hui, Y., et al. (2021). Ctspine1k: A large-scale dataset for spinal vertebrae segmentation in computed tomography. arXiv:2105.14711
Du, C., Lin, X., Wu, Q., et al. (2024). Dper: Diffusion prior driven neural representation for limited angle and sparse view CT reconstruction. arXiv:2404.17890
Esser, P., Rombach, R., Ommer, B. (2021). Taming transformers for high-resolution image synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12873–12883).
Gao, P., Zhuo, L., Liu, D., et al. (2024). Lumina-t2x: Transforming text into any modality, resolution, and duration via flow-based large diffusion transformers. arXiv:2405.05945
Gao, S., Liu, X., Zeng, B., et al. (2023). Implicit diffusion models for continuous super-resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10021–10030).
Ge, R., He, Y., Xia, C., et al. (2022). X-CTRSNET: 3D cervical vertebra CT reconstruction and segmentation directly from 2D X-ray images. Knowledge-Based Systems, 236, 107680.
Gieruc, T., Kästingschäfer, M., Bernhard, S., et al. (2024). 6img-to-3d: Few-image large-scale outdoor driving scene reconstruction. arXiv:2404.12378
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., et al. (2014). Generative adversarial nets. In Advances in neural information processing systems (Vol. 27).
Gu, S., Chen, D., Bao, J., et al. (2022). Vector quantized diffusion model for text-to-image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10696–10706).
Harmon, S. A., Sanford, T. H., Xu, S., et al. (2020). Artificial intelligence for the detection of covid-19 pneumonia on chest CT using multinational datasets. Nature Communications, 11(1), 4080.
Hatamizadeh, A., Nath, V., Tang, Y., et al. (2021). Swin unetr: Swin transformers for semantic segmentation of brain tumors in MRI images. In International MICCAI brainlesion workshop (pp. 272–284). Springer.
He, J., Li, B., Yang, G., et al. (2024). Blaze3dm: Marry triplane representation with diffusion for 3D medical inverse problem solving. arXiv:2405.15241
Henry, A., Dachapally, P. R., Pawar, S., et al. (2020). Query-key normalization for transformers. arXiv:2010.04245
Henzler, P., Rasche, V., Ropinski, T., et al. (2018). Single-image tomography: 3D volumes from 2D cranial x-rays. In Computer graphics forum (pp. 377–388). Wiley Online Library.
Herman, G. T. (2009). Fundamentals of computerized tomography: Image reconstruction from projections. Springer.
Heusel, M., Ramsauer, H., Unterthiner, T., et al. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems (Vol. 30).
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
Ho, J., Chan, W., Saharia, C., et al. (2022a). Imagen video: High definition video generation with diffusion models. arXiv:2210.02303
Ho, J., Salimans, T., Gritsenko, A., et al. (2022b). Video diffusion models. Advances in Neural Information Processing Systems, 35, 8633–8646.
Hong, Y., Zhang, K., Gu, J., et al. (2023). LRM: Large reconstruction model for single image to 3D. arXiv:2311.04400
Hu, Z., Zhao, M., Zhao, C., et al. (2024). Efficientdreamer: high-fidelity and robust 3d creation via orthogonal-view diffusion priors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4949–4958).
Huang, Y., Taubmann, O., Huang, X., et al. (2018). Scale-space anisotropic total variation for limited angle tomography. IEEE Transactions on Radiation and Plasma Medical Sciences, 2(4), 307–314.
Huang, Y., Preuhs, A., Lauritsch, G., et al. (2019). Data consistent artifact reduction for limited angle tomography with deep learning prior. In International workshop on machine learning for medical image reconstruction (pp. 101–112). Springer.
Jiang, L., Zhang, M., Wei, R., et al. (2021). Reconstruction of 3D CT from a single x-ray projection view using CVAE-GAN. In 2021 IEEE international conference on medical imaging physics and engineering (ICMIPE) (pp. 1–6). IEEE.
Jin, K. H., McCann, M. T., Froustey, E., et al. (2017). Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9), 4509–4522.
Johnson, A. E., Pollard, T. J., Berkowitz, S. J., et al. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1), 317.
Johnson, C. D., Chen, M. H., Toledano, A. Y., et al. (2008). Accuracy of CT colonography for detection of large adenomas and cancers. New England Journal of Medicine, 359(12), 1207–1217.
Jun, H., Nichol, A. (2023). Shap-E: Generating conditional 3D implicit functions. arXiv:2305.02463
Kasten, Y., Doktofsky, D., Kovler, I. (2020). End-to-end convolutional neural network for 3D reconstruction of knee bones from bi-planar X-ray images. In Machine learning for medical image reconstruction: third international workshop, MLMIR 2020, held in conjunction with MICCAI 2020, Lima, Peru, October 8, 2020, Proceedings 3 (pp. 123–133). Springer.
Katsura, M., Matsuda, I., Akahane, M., et al. (2012). Model-based iterative reconstruction technique for radiation dose reduction in chest CT: Comparison with the adaptive statistical iterative reconstruction technique. European Radiology, 22, 1613–1623.
Kong, Z., Ping, W., Huang, J., et al. (2020). Diffwave: A versatile diffusion model for audio synthesis. arXiv:2009.09761
Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (Vol. 25).
Lee, S., Chung, H., Park, M., et al. (2023). Improving 3D imaging with pre-trained perpendicular 2D diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10710–10720).
Li, J., Tan, H., Zhang, K., et al. (2023). Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv:2311.06214
Lin, C. H., Gao, J., Tang, L., et al. (2023a). Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 300–309).
Lin, K. E., Lin, Y. C., Lai, W. S., et al. (2023b). Vision transformer for nerf-based view synthesis from a single input image. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 806–815).
Lin, W. A., Liao, H., Peng, C., et al. (2019). Dudonet: Dual domain network for CT metal artifact reduction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10512–10521).
Liu, J., Li, C., Ren, Y., et al. (2022a). Diffsinger: Singing voice synthesis via shallow diffusion mechanism. In Proceedings of the AAAI conference on artificial intelligence (pp. 11020–11028).
Liu, J., Anirudh, R., Thiagarajan, J. J., et al. (2023a). Dolce: A model-based probabilistic diffusion framework for limited-angle CT reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10498–10508).
Liu, L. (2014). Model-based iterative reconstruction: A promising algorithm for today’s computed tomography imaging. Journal of Medical Imaging and Radiation Sciences, 45(2), 131–136.
Liu, M., Xu, C., Jin, H., et al. (2023b). One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems, 36, 22226–22246.
Liu, P., Han, H., Du, Y., et al. (2021). Deep learning to segment pelvic bones: Large-scale CT datasets and baseline models. International Journal of Computer Assisted Radiology and Surgery, 16, 749–756.
Liu, R., Wu, R., Van Hoorick, B., et al. (2023c). Zero-1-to-3: Zero-shot one image to 3D object. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9298–9309).
Liu, X., Qiao, Z., Liu, R., et al. (2024). Diffux2ct: Diffusion learning to reconstruct CT images from biplanar x-rays. In European conference on computer vision (pp. 458–476). Springer.
Liu, Y., Ma, J., Fan, Y., et al. (2012). Adaptive-weighted total variation minimization for sparse data toward low-dose X-ray computed tomography image reconstruction. Physics in Medicine & Biology, 57(23), 7923.
Liu, Z., Ning, J., Cao, Y., et al. (2022b). Video swin transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3202–3211).
Long, X., Guo, Y. C., Lin, C., et al. (2024). Wonder3d: Single image to 3d using cross-domain diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9970–9980).
Ma, C., Li, Z., He, J., et al. (2023a). Universal incomplete-view CT reconstruction with prompted contextual transformer. arXiv:2312.07846
Ma, C., Li, Z., Zhang, J., et al. (2023b). Freeseed: Frequency-band-aware and self-guided network for sparse-view CT reconstruction. In International conference on medical image computing and computer-assisted intervention (pp. 250–259). Springer.
Mildenhall, B., Srinivasan, P. P., Tancik, M., et al. (2021). Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1), 99–106.
Mohan, K. A., Venkatakrishnan, S., Gibbs, J. W., et al. (2015). Timbir: A method for time-space reconstruction from interlaced views. IEEE Transactions on Computational Imaging, 1(2), 96–111.
Müller, T., Evans, A., Schied, C., et al. (2022). Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4), 1–15.
Nichol, A., Dhariwal, P., Ramesh, A., et al. (2021). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741
Nolan, T. (2022). Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (HNSCC-3DCT-RT). Cancer Imaging Archive.
Peebles, W., Xie, S. (2023). Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 4195–4205).
Peng, W., Adeli, E., Bosschieter, T., et al. (2023). Generating realistic brain MRIs via a conditional diffusion probabilistic model. In: International conference on medical image computing and computer-assisted intervention (pp. 14–24). Springer.
Perez, E., Strub, F., De Vries, H., et al, (2018). Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence.
Poole, B., Jain, A., Barron, J. T., et al. (2022). Dreamfusion: Text-to-3d using 2d diffusion. arXiv:2209.14988
Ramesh, A., Dhariwal, P., Nichol, A., et al. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv:2204.06125
Ren, Y., Wang, F., Zhang, T., et al. (2023). Volrecon: Volume rendering of signed ray distance functions for generalizable multi-view reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16685–16695).
Rombach, R., Blattmann, A., Lorenz, D., et al. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684–10695).
Ronchetti, M. (2020). Torchradon: Fast differentiable routines for computed tomography. arXiv:2009.14788
Saharia, C., Chan, W., Saxena, S., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, 36479–36494.
Saharia, C., Ho, J., Chan, W., et al. (2022). Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4), 4713–4726.
Schönberger, J. L., Zheng, E., Frahm, J. M., et al. (2016). Pixelwise view selection for unstructured multi-view stereo. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part III 14 (pp. 501–518). Springer.
Shen, L., Zhao, W., & Xing, L. (2019). Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning. Nature Biomedical Engineering, 3(11), 880–888.
Shi, Y., Wang, P., Ye, J., et al. (2023). Mvdream: Multi-view diffusion for 3d generation. arXiv:2308.16512
Shiode, R., Kabashima, M., Hiasa, Y., et al. (2021). 2d–3d reconstruction of distal forearm bone from actual X-ray images of the wrist using convolutional neural networks. Scientific Reports, 11(1), 15249.
Shuang, W., Lin, Y., Zeng, Y., et al. (2024). Direct3d: Scalable image-to-3D generation via 3D latent diffusion transformer. Advances in Neural Information Processing Systems, 37, 121859–121881.
Siasios, I. D., Pollina, J., Khan, A., et al. (2017). Percutaneous screw placement in the lumbar spine with a modified guidance technique based on 3D CT navigation system. Journal of Spine Surgery, 3(4), 657.
Simpson, A. L., Antonelli, M., Bakas, S., et al. (2019). A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv:1902.09063
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., et al. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning (pp. 2256–2265). PMLR.
Song, J., Meng, C., Ermon, S. (2020). Denoising diffusion implicit models. arXiv:2010.02502
Song, Y., Shen, L., Xing, L., et al. (2021). Solving inverse problems in medical imaging with score-based generative models. arXiv:2111.08005
Su, J., Ahmed, M., Lu, Y., et al. (2024). Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568, 127063.
Tang, J., Wang, T., Zhang, B., et al. (2023). Make-it-3D: High-fidelity 3D creation from a single image with diffusion prior. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22819–22829).
Trevithick, A., Yang, B. (2021). GRF: Learning a general radiance field for 3d representation and rendering. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 15182–15192).
Venkatakrishnan, S. V., Drummy, L. F., Jackson, M. A., et al. (2013). A model based iterative reconstruction algorithm for high angle annular dark field-scanning transmission electron microscope (HAADF-STEM) tomography. IEEE Transactions on Image Processing, 22(11), 4532–4544.
Venkatakrishnan, S. V., Mohan, K. A., Ziabari, A. K., et al. (2021). Algorithm-driven advances for scientific CT instruments: From model-based to deep learning-based approaches. IEEE Signal Processing Magazine, 39(1), 32–43.
Wang, C., Shang, K., Zhang, H., et al. (2022a). Dudotrans: dual-domain transformer for sparse-view CT reconstruction. In International workshop on machine learning for medical image reconstruction (pp. 84–94). Springer.
Wang, G., Ye, J. C., Mueller, K., et al. (2018). Image reconstruction is a new frontier of machine learning. IEEE Transactions on Medical Imaging, 37(6), 1289–1296.
Wang, G., Ye, J. C., & De Man, B. (2020). Deep learning for tomographic image reconstruction. Nature Machine Intelligence, 2(12), 737–748.
Wang, H., Du, X., Li, J., et al. (2023a). Score Jacobian chaining: Lifting pretrained 2D diffusion models for 3d generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12619–12629).
Wang, Q., Wang, Z., Genova, K., et al. (2021). IBRNet: Learning multi-view image-based rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4690–4699).
Wang, T., Zhang, B., Zhang, T., et al. (2023b). Rodin: A generative model for sculpting 3d digital avatars using diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4563–4573).
Wang, Z., Bovik, A. C., Sheikh, H. R., et al. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
Wang, Z., Wu, Z., Agarwal, D., et al. (2022b). Medclip: Contrastive learning from unpaired medical images and text. In Proceedings of the conference on empirical methods in natural language processing. Conference on empirical methods in natural language processing (p. 3876).
Wang, Z., Lu, C., Wang, Y., et al. (2023). ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. Advances in Neural Information Processing Systems, 36, 8406–8441.
Wang, Z., Wang, Y., Chen, Y., et al. (2024). CRM: Single image to 3D textured mesh with convolutional reconstruction model. In European conference on computer vision (pp. 57–74). Springer.
Wu, J., & Mahfouz, M. R. (2021). Reconstruction of knee anatomy from single-plane fluoroscopic X-ray based on a nonlinear statistical shape model. Journal of Medical Imaging, 8(1), 016001–016001.
Wu, J. Z., Ge, Y., Wang, X., et al. (2023). Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 7623–7633).
Wu, W., Hu, D., Niu, C., et al. (2021). Drone: Dual-domain residual-based optimization network for sparse-view CT reconstruction. IEEE Transactions on Medical Imaging, 40(11), 3002–3014.
Wu, Z., Li, Y., Yan, H., et al. (2024). Blockfusion: Expandable 3D scene generation using latent tri-plane extrapolation. ACM Transactions on Graphics (TOG), 43(4), 1–17.
Xu, J., Cheng, W., Gao, Y., et al. (2024). Instantmesh: Efficient 3D mesh generation from a single image with sparse-view large reconstruction models. arXiv:2404.07191
Ying, X., Guo, H., Ma, K., et al. (2019). X2CT-GAN: Reconstructing CT from biplanar X-rays with generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10619–10628).
Yu, A., Ye, V., Tancik, M., et al. (2021). pixelnerf: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4578–4587).
Yu, S., Sohn, K., Kim, S., et al. (2023). Video probabilistic diffusion models in projected latent space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 18456–18466).
Zhang, B., & Sennrich, R. (2019). Root mean square layer normalization. In Advances in neural information processing systems (Vol. 32).
Zhang, C., Liu, L., Dai, J., et al. (2024). XTransCT: Ultra-fast volumetric CT reconstruction using two orthogonal X-ray projections for image-guided radiation therapy via a transformer network. Physics in Medicine & Biology, 69(8), 085010.
Zhang, R., Isola, P., Efros, A. A., et al. (2018a). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 586–595).
Zhang, Z., Liang, X., Dong, X., et al. (2018). A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Transactions on Medical Imaging, 37(6), 1407–1417.
Zhang, Z., Yu, L., Liang, X., et al. (2021). TransCT: Dual-path transformer for low dose computed tomography. In Medical image computing and computer assisted intervention–MICCAI 2021: 24th international conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VI 24 (pp. 55–64). Springer.
Zheng, X. Y., Pan, H., Guo, Y. X., et al. (2024). MVD²: Efficient multiview 3D reconstruction for multiview diffusion. In ACM SIGGRAPH 2024 conference papers (pp. 1–11).
Zhu, J., Zhuang, P., & Koyejo, S. (2023). HiFA: High-fidelity text-to-3D generation with advanced diffusion guidance. arXiv:2305.18766
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants No. 62176068 and 12201024, and by the National Key Research and Development Program of China (Grant No. 2023YFC3306401). This research was also supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LD24F020007, by Beijing Natural Science Foundation Grants L223024, L244043, and Z241100001324017, and by the "One Thousand Plan" project of Jiangxi Province (Jxsq2023102268).
Author information
Contributions
All authors contributed to the study conception. Material preparation, data collection, and analysis were performed by Xuhui Liu, Hong Li, and Zhi Qiao. The first draft of the manuscript was written by Xuhui Liu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This work proposes a CT reconstruction method for incomplete-view data. It does not have a direct negative social impact, yet its misuse for malicious purposes should be prevented.
Additional information
Communicated by Svetlana Lazebnik.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, X., Li, H., Qiao, Z. et al. D3T: Dual-Domain Diffusion Transformer in Triplanar Latent Space for 3D Incomplete-View CT Reconstruction. Int J Comput Vis 133, 5238–5261 (2025). https://doi.org/10.1007/s11263-025-02426-2