Abstract
Computed tomography (CT) is a cornerstone of clinical imaging, yet its accessibility in certain scenarios is constrained by radiation exposure concerns and by operational limitations within surgical environments. CT reconstruction from incomplete views has therefore attracted increasing research attention owing to its great potential in medical applications. However, it is an inherently ill-posed problem, which, coupled with the complex, high-dimensional characteristics of 3D medical data, poses major challenges: artifact contamination, global structural incoherence, and high computational cost. To tackle these challenges, this paper introduces D3T, a new 3D conditional diffusion transformer that models 3D CT distributions in a low-dimensional 2D latent space for incomplete-view CT reconstruction. Our approach comprises two primary components: a triplanar vector-quantized auto-encoder (TriVQAE) and a latent dual-domain diffusion transformer (LD3T). TriVQAE encodes high-resolution 3D CT images into compact 2D latent triplane codes that effectively factorize the intricate CT structures, in turn enabling a compute-friendly diffusion model architecture. Operating in the latent triplane space, LD3T significantly reduces the complexity of capturing the intricate structures in CT images, and its improved diffusion transformer architecture efficiently captures the global correlations across the three planes, ensuring high-fidelity 3D reconstruction. LD3T further presents a new dual-domain conditional generation pipeline that incorporates both image and projection conditions, facilitating controllable reconstruction that produces 3D structures consistent with the given conditions. Moreover, LD3T introduces a new Dual-Space Consistency Loss that adds image-level supervision to the standard latent-space supervision, enhancing consistency in the 3D image space. Extensive experiments on four datasets under three inverse-problem settings demonstrate the effectiveness of our method.
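The two ideas the abstract leans on can be illustrated with a toy NumPy sketch. This is not the paper's learned TriVQAE or LD3T: simple axis-mean projections stand in for the learned triplane encoder, a broadcast sum stands in for the decoder, and the function names, shapes, and weighting factor are all hypothetical. The sketch only shows the shape bookkeeping of a triplane factorization and the structure of a loss that supervises in both the latent (plane) space and the image (volume) space.

```python
import numpy as np

def to_triplanes(volume):
    """Toy "encoder": project a (D, H, W) volume onto three orthogonal 2D planes.
    The real TriVQAE uses a learned vector-quantized auto-encoder instead."""
    xy = volume.mean(axis=0)  # (H, W)
    xz = volume.mean(axis=1)  # (D, W)
    yz = volume.mean(axis=2)  # (D, H)
    return xy, xz, yz

def from_triplanes(xy, xz, yz):
    """Toy "decoder": broadcast the three planes back into a (D, H, W) volume."""
    D, W = xz.shape
    H = xy.shape[0]
    recon = (xy[None, :, :] + xz[:, None, :] + yz[:, :, None]) / 3.0
    assert recon.shape == (D, H, W)
    return recon

def dual_space_loss(pred_planes, gt_planes, gt_volume, lam=0.5):
    """Illustrative dual-space supervision: a latent-space term on the planes
    plus an image-space term on the decoded volume (lam is a made-up weight)."""
    latent = sum(np.mean((p - q) ** 2) for p, q in zip(pred_planes, gt_planes))
    image = np.mean((from_triplanes(*pred_planes) - gt_volume) ** 2)
    return latent + lam * image

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 32), dtype=np.float32)
planes = to_triplanes(vol)
recon = from_triplanes(*planes)

# The compute advantage: three 2D planes hold far fewer values than one 3D volume.
n3d = vol.size                      # 32 * 32 * 32 = 32768 voxels
n2d = sum(p.size for p in planes)   # 3 * 32 * 32  = 3072 values

# A slightly perturbed prediction still yields a finite, positive loss.
noisy = tuple(p + 0.01 * rng.standard_normal(p.shape) for p in planes)
loss = dual_space_loss(noisy, planes, vol)
```

The roughly tenfold reduction from `n3d` to `n2d` is the point of operating in the triplane space: the diffusion model denoises 2D feature maps rather than a full 3D grid, while the image-space term of the loss ties those planes back to consistency in the reconstructed volume.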
Data Availability
The LIDC-IDRI (Armato III et al., 2011), CTSpine1K (Deng et al., 2021), and CTPelvic1K (Liu et al., 2021) datasets analyzed during the current study are all publicly available; links to these datasets can be found in the original papers. The LumbaV dataset was proposed by Liu et al. (2024) but is not yet publicly accessible; one can contact its authors for permission to download the data.
Code Availability
The codes are open-sourced at https://github.com/clannadcl/Official-codes-of-D3T.
References
Abdollahi, A., Pradhan, B., & Alamri, A. (2020). Vnet: An end-to-end fully convolutional neural network for road extraction from high-resolution remote sensing data. IEEE Access, 8, 179424–179436.
Agarwal, S., Furukawa, Y., Snavely, N., et al. (2011). Building Rome in a day. Communications of the ACM, 54(10), 105–112.
Armato, S. G., III., McLennan, G., Bidaut, L., et al. (2011). The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915–931.
Barron, J. T., Mildenhall, B., Tancik, M., et al. (2021). Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 5855–5864).
Blattmann, A., Rombach, R., Ling, H., et al. (2023). Align your latents: High-resolution video synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 22563–22575).
Cao, Z., Hong, F., Wu, T., et al. (2025). Difftf++: 3d-aware diffusion transformer for large-vocabulary 3D generation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Chan, E. R., Lin, C. Z., Chan, M. A., et al. (2022). Efficient geometry-aware 3D generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16123–16133).
Chang, A. X., Funkhouser, T., Guibas, L., et al. (2015). Shapenet: An information-rich 3D model repository. arXiv:1512.03012
Chen, A., Xu, Z., Zhao, F., et al. (2021). Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14124–14133).
Chen, A., Xu, Z., Geiger, A., et al. (2022). Tensorf: Tensorial radiance fields. In: European conference on computer vision (pp. 333–350). Springer.
Chen, D. Z., Siddiqui, Y., Lee, H. Y., et al. (2023a). Text2tex: Text-driven texture synthesis via diffusion models. In: Proceedings of the IEEE/CVF international conference on computer vision (pp. 18558–18568).
Chen, H., Gu, J., Chen, A., et al. (2023b). Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2416–2425).
Chen, J., Yu, J., Ge, C., et al. (2023c). PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis. arXiv:2310.00426
Chen, R., Chen, Y., Jiao, N., et al. (2023d). Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22246–22256).
Chung, H., Sim, B., Ryu, D., et al. (2022). Improving diffusion models for inverse problems using manifold constraints. Advances in Neural Information Processing Systems, 35, 25683–25696.
Chung, H., Lee, S., Ye, J. C. (2023a). Decomposed diffusion sampler for accelerating large-scale inverse problems. arXiv:2303.05754
Chung, H., Ryu, D., McCann, M. T., et al. (2023b). Solving 3D inverse problems using pre-trained 2D diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 22542–22551).
Deng, Y., Wang, C., Hui, Y., et al. (2021). Ctspine1k: A large-scale dataset for spinal vertebrae segmentation in computed tomography. arXiv:2105.14711
Du, C., Lin, X., Wu, Q., et al. (2024). Dper: Diffusion prior driven neural representation for limited angle and sparse view CT reconstruction. arXiv:2404.17890
Esser, P., Rombach, R., Ommer, B. (2021). Taming transformers for high-resolution image synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12873–12883).
Gao, P., Zhuo, L., Liu, D., et al. (2024). Lumina-t2x: Transforming text into any modality, resolution, and duration via flow-based large diffusion transformers. arXiv:2405.05945
Gao, S., Liu, X., Zeng, B., et al. (2023). Implicit diffusion models for continuous super-resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10021–10030).
Ge, R., He, Y., Xia, C., et al. (2022). X-CTRSNET: 3D cervical vertebra CT reconstruction and segmentation directly from 2D X-ray images. Knowledge-Based Systems, 236, 107680.
Gieruc, T., Kästingschäfer, M., Bernhard, S., et al. (2024). 6img-to-3d: Few-image large-scale outdoor driving scene reconstruction. arXiv:2404.12378
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., et al. (2014). Generative adversarial nets. In Advances in neural information processing systems (Vol. 27).
Gu, S., Chen, D., Bao, J., et al. (2022). Vector quantized diffusion model for text-to-image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10696–10706).
Harmon, S. A., Sanford, T. H., Xu, S., et al. (2020). Artificial intelligence for the detection of covid-19 pneumonia on chest CT using multinational datasets. Nature Communications, 11(1), 4080.
Hatamizadeh, A., Nath, V., Tang, Y., et al. (2021). Swin unetr: Swin transformers for semantic segmentation of brain tumors in MRI images. In International MICCAI brainlesion workshop (pp. 272–284). Springer.
He, J., Li, B., Yang, G., et al. (2024). Blaze3dm: Marry triplane representation with diffusion for 3D medical inverse problem solving. arXiv:2405.15241
Henry, A., Dachapally, P. R., Pawar, S., et al. (2020). Query-key normalization for transformers. arXiv:2010.04245
Henzler, P., Rasche, V., Ropinski, T., et al. (2018). Single-image tomography: 3D volumes from 2D cranial x-rays. In Computer graphics forum (pp. 377–388). Wiley Online Library.
Herman, G. T. (2009). Fundamentals of computerized tomography: Image reconstruction from projections. Springer.
Heusel, M., Ramsauer, H., Unterthiner, T., et al. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems (Vol. 30).
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
Ho, J., Chan, W., Saharia, C., et al. (2022a). Imagen video: High definition video generation with diffusion models. arXiv:2210.02303
Ho, J., Salimans, T., Gritsenko, A., et al. (2022b). Video diffusion models. Advances in Neural Information Processing Systems, 35, 8633–8646.
Hong, Y., Zhang, K., Gu, J., et al. (2023). LRM: Large reconstruction model for single image to 3D. arXiv:2311.04400
Hu, Z., Zhao, M., Zhao, C., et al. (2024). Efficientdreamer: high-fidelity and robust 3d creation via orthogonal-view diffusion priors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4949–4958).
Huang, Y., Taubmann, O., Huang, X., et al. (2018). Scale-space anisotropic total variation for limited angle tomography. IEEE Transactions on Radiation and Plasma Medical Sciences, 2(4), 307–314.
Huang, Y., Preuhs, A., Lauritsch, G., et al. (2019). Data consistent artifact reduction for limited angle tomography with deep learning prior. In International workshop on machine learning for medical image reconstruction (pp. 101–112). Springer.
Jiang, L., Zhang, M., Wei, R., et al. (2021). Reconstruction of 3D CT from a single x-ray projection view using CVAE-GAN. In 2021 IEEE international conference on medical imaging physics and engineering (ICMIPE) (pp. 1–6). IEEE.
Jin, K. H., McCann, M. T., Froustey, E., et al. (2017). Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9), 4509–4522.
Johnson, A. E., Pollard, T. J., Berkowitz, S. J., et al. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1), 317.
Johnson, C. D., Chen, M. H., Toledano, A. Y., et al. (2008). Accuracy of CT colonography for detection of large adenomas and cancers. New England Journal of Medicine, 359(12), 1207–1217.
Jun, H., Nichol, A. (2023). Shap-E: Generating conditional 3D implicit functions. arXiv:2305.02463
Kasten, Y., Doktofsky, D., Kovler, I. (2020). End-to-end convolutional neural network for 3D reconstruction of knee bones from bi-planar X-ray images. In Machine learning for medical image reconstruction: third international workshop, MLMIR 2020, held in conjunction with MICCAI 2020, Lima, Peru, October 8, 2020, Proceedings 3 (pp. 123–133). Springer.
Katsura, M., Matsuda, I., Akahane, M., et al. (2012). Model-based iterative reconstruction technique for radiation dose reduction in chest CT: Comparison with the adaptive statistical iterative reconstruction technique. European Radiology, 22, 1613–1623.
Kong, Z., Ping, W., Huang, J., et al. (2020). Diffwave: A versatile diffusion model for audio synthesis. arXiv:2009.09761
Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (Vol. 25).
Lee, S., Chung, H., Park, M., et al. (2023). Improving 3D imaging with pre-trained perpendicular 2D diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10710–10720).
Li, J., Tan, H., Zhang, K., et al. (2023). Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv:2311.06214
Lin, C. H., Gao, J., Tang, L., et al. (2023a). Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 300–309).
Lin, K. E., Lin, Y. C., Lai, W. S., et al. (2023b). Vision transformer for nerf-based view synthesis from a single input image. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 806–815).
Lin, W. A., Liao, H., Peng, C., et al. (2019). Dudonet: Dual domain network for CT metal artifact reduction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10512–10521).
Liu, J., Li, C., Ren, Y., et al. (2022a). Diffsinger: Singing voice synthesis via shallow diffusion mechanism. In Proceedings of the AAAI conference on artificial intelligence (pp. 11020–11028).
Liu, J., Anirudh, R., Thiagarajan, J. J., et al. (2023a). Dolce: A model-based probabilistic diffusion framework for limited-angle CT reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10498–10508).
Liu, L. (2014). Model-based iterative reconstruction: A promising algorithm for today’s computed tomography imaging. Journal of Medical Imaging and Radiation Sciences, 45(2), 131–136.
Liu, M., Xu, C., Jin, H., et al. (2023b). One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems, 36, 22226–22246.
Liu, P., Han, H., Du, Y., et al. (2021). Deep learning to segment pelvic bones: Large-scale CT datasets and baseline models. International Journal of Computer Assisted Radiology and Surgery, 16, 749–756.
Liu, R., Wu, R., Van Hoorick, B., et al. (2023c). Zero-1-to-3: Zero-shot one image to 3D object. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9298–9309).
Liu, X., Qiao, Z., Liu, R., et al. (2024). Diffux2ct: Diffusion learning to reconstruct CT images from biplanar x-rays. In European conference on computer vision (pp. 458–476). Springer.
Liu, Y., Ma, J., Fan, Y., et al. (2012). Adaptive-weighted total variation minimization for sparse data toward low-dose X-ray computed tomography image reconstruction. Physics in Medicine & Biology, 57(23), 7923.
Liu, Z., Ning, J., Cao, Y., et al. (2022b). Video swin transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3202–3211).
Long, X., Guo, Y. C., Lin, C., et al. (2024). Wonder3d: Single image to 3d using cross-domain diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9970–9980).
Ma, C., Li, Z., He, J., et al. (2023a). Universal incomplete-view CT reconstruction with prompted contextual transformer. arXiv:2312.07846
Ma, C., Li, Z., Zhang, J., et al. (2023b). Freeseed: Frequency-band-aware and self-guided network for sparse-view CT reconstruction. In International conference on medical image computing and computer-assisted intervention (pp. 250–259). Springer.
Mildenhall, B., Srinivasan, P. P., Tancik, M., et al. (2021). Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1), 99–106.
Mohan, K. A., Venkatakrishnan, S., Gibbs, J. W., et al. (2015). Timbir: A method for time-space reconstruction from interlaced views. IEEE Transactions on Computational Imaging, 1(2), 96–111.
Müller, T., Evans, A., Schied, C., et al. (2022). Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4), 1–15.
Nichol, A., Dhariwal, P., Ramesh, A., et al. (2021). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741
Nolan, T. (2022). Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (HNSCC-3DCT-RT). Cancer Imaging Archive.
Peebles, W., Xie, S. (2023). Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 4195–4205).
Peng, W., Adeli, E., Bosschieter, T., et al. (2023). Generating realistic brain MRIs via a conditional diffusion probabilistic model. In: International conference on medical image computing and computer-assisted intervention (pp. 14–24). Springer.
Perez, E., Strub, F., De Vries, H., et al, (2018). Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence.
Poole, B., Jain, A., Barron, J. T., et al. (2022). Dreamfusion: Text-to-3d using 2d diffusion. arXiv:2209.14988
Ramesh, A., Dhariwal, P., Nichol, A., et al. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv:2204.06125
Ren, Y., Wang, F., Zhang, T., et al. (2023). Volrecon: Volume rendering of signed ray distance functions for generalizable multi-view reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16685–16695).
Rombach, R., Blattmann, A., Lorenz, D., et al. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684–10695).
Ronchetti, M. (2020). Torchradon: Fast differentiable routines for computed tomography. arXiv:2009.14788
Saharia, C., Chan, W., Saxena, S., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, 36479–36494.
Saharia, C., Ho, J., Chan, W., et al. (2022). Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4), 4713–4726.
Schönberger, J. L., Zheng, E., Frahm, J. M., et al. (2016). Pixelwise view selection for unstructured multi-view stereo. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part III 14 (pp. 501–518). Springer.
Shen, L., Zhao, W., & Xing, L. (2019). Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning. Nature Biomedical Engineering, 3(11), 880–888.
Shi, Y., Wang, P., Ye, J., et al. (2023). Mvdream: Multi-view diffusion for 3d generation. arXiv:2308.16512
Shiode, R., Kabashima, M., Hiasa, Y., et al. (2021). 2d–3d reconstruction of distal forearm bone from actual X-ray images of the wrist using convolutional neural networks. Scientific Reports, 11(1), 15249.
Shuang, W., Lin, Y., Zeng, Y., et al. (2024). Direct3d: Scalable image-to-3D generation via 3D latent diffusion transformer. Advances in Neural Information Processing Systems, 37, 121859–121881.
Siasios, I. D., Pollina, J., Khan, A., et al. (2017). Percutaneous screw placement in the lumbar spine with a modified guidance technique based on 3D CT navigation system. Journal of Spine Surgery, 3(4), 657.
Simpson, A. L., Antonelli, M., Bakas, S., et al. (2019). A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv:1902.09063
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., et al. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning (pp. 2256–2265). PMLR.
Song, J., Meng, C., Ermon, S. (2020). Denoising diffusion implicit models. arXiv:2010.02502
Song, Y., Shen, L., Xing, L., et al. (2021). Solving inverse problems in medical imaging with score-based generative models. arXiv:2111.08005
Su, J., Ahmed, M., Lu, Y., et al. (2024). Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568, 127063.
Tang, J., Wang, T., Zhang, B., et al. (2023). Make-it-3D: High-fidelity 3D creation from a single image with diffusion prior. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22819–22829).
Trevithick, A., Yang, B. (2021). GRF: Learning a general radiance field for 3d representation and rendering. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 15182–15192).
Venkatakrishnan, S. V., Drummy, L. F., Jackson, M. A., et al. (2013). A model based iterative reconstruction algorithm for high angle annular dark field-scanning transmission electron microscope (HAADF-STEM) tomography. IEEE Transactions on Image Processing, 22(11), 4532–4544.
Venkatakrishnan, S. V., Mohan, K. A., Ziabari, A. K., et al. (2021). Algorithm-driven advances for scientific CT instruments: From model-based to deep learning-based approaches. IEEE Signal Processing Magazine, 39(1), 32–43.
Wang, C., Shang, K., Zhang, H., et al. (2022a). Dudotrans: dual-domain transformer for sparse-view CT reconstruction. In International workshop on machine learning for medical image reconstruction (pp. 84–94). Springer.
Wang, G., Ye, J. C., Mueller, K., et al. (2018). Image reconstruction is a new frontier of machine learning. IEEE Transactions on Medical Imaging, 37(6), 1289–1296.
Wang, G., Ye, J. C., & De Man, B. (2020). Deep learning for tomographic image reconstruction. Nature Machine Intelligence, 2(12), 737–748.
Wang, H., Du, X., Li, J., et al. (2023a). Score Jacobian chaining: Lifting pretrained 2D diffusion models for 3d generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12619–12629).
Wang, Q., Wang, Z., Genova, K., et al. (2021). IBRNet: Learning multi-view image-based rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4690–4699).
Wang, T., Zhang, B., Zhang, T., et al. (2023b). Rodin: A generative model for sculpting 3d digital avatars using diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4563–4573).
Wang, Z., Bovik, A. C., Sheikh, H. R., et al. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
Wang, Z., Wu, Z., Agarwal, D., et al. (2022b). Medclip: Contrastive learning from unpaired medical images and text. In Proceedings of the conference on empirical methods in natural language processing. Conference on empirical methods in natural language processing (p. 3876).
Wang, Z., Lu, C., Wang, Y., et al. (2023). ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. Advances in Neural Information Processing Systems, 36, 8406–8441.
Wang, Z., Wang, Y., Chen, Y., et al. (2024). CRM: Single image to 3D textured mesh with convolutional reconstruction model. In European conference on computer vision (pp. 57–74). Springer.
Wu, J., & Mahfouz, M. R. (2021). Reconstruction of knee anatomy from single-plane fluoroscopic X-ray based on a nonlinear statistical shape model. Journal of Medical Imaging, 8(1), 016001–016001.
Wu, J. Z., Ge, Y., Wang, X., et al. (2023). Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 7623–7633).
Wu, W., Hu, D., Niu, C., et al. (2021). Drone: Dual-domain residual-based optimization network for sparse-view CT reconstruction. IEEE Transactions on Medical Imaging, 40(11), 3002–3014.
Wu, Z., Li, Y., Yan, H., et al. (2024). Blockfusion: Expandable 3D scene generation using latent tri-plane extrapolation. ACM Transactions on Graphics (TOG), 43(4), 1–17.
Xu, J., Cheng, W., Gao, Y., et al. (2024). Instantmesh: Efficient 3D mesh generation from a single image with sparse-view large reconstruction models. arXiv:2404.07191
Ying, X., Guo, H., Ma, K., et al. (2019). X2CT-GAN: Reconstructing CT from biplanar X-rays with generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10619–10628).
Yu, A., Ye, V., Tancik, M., et al. (2021). pixelnerf: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4578–4587).
Yu, S., Sohn, K., Kim, S., et al. (2023). Video probabilistic diffusion models in projected latent space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 18456–18466).
Zhang, B., & Sennrich, R. (2019). Root mean square layer normalization. In Advances in neural information processing systems (Vol. 32).
Zhang, C., Liu, L., Dai, J., et al. (2024). XTransCT: Ultra-fast volumetric CT reconstruction using two orthogonal X-ray projections for image-guided radiation therapy via a transformer network. Physics in Medicine & Biology, 69(8), 085010.
Zhang, R., Isola, P., Efros, A. A., et al. (2018a). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 586–595).
Zhang, Z., Liang, X., Dong, X., et al. (2018). A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Transactions on Medical Imaging, 37(6), 1407–1417.
Zhang, Z., Yu, L., Liang, X., et al. (2021). TransCT: Dual-path transformer for low dose computed tomography. In Medical image computing and computer assisted intervention–MICCAI 2021: 24th international conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VI 24 (pp. 55–64). Springer.
Zheng, X. Y., Pan, H., Guo, Y. X., et al. (2024). MVD²: Efficient multiview 3D reconstruction for multiview diffusion. In ACM SIGGRAPH 2024 conference papers (pp. 1–11).
Zhu, J., Zhuang, P., & Koyejo, S. (2023). HiFA: High-fidelity text-to-3D generation with advanced diffusion guidance. arXiv:2305.18766
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants No. 62176068 and 12201024, and by the National Key Research and Development Program of China (Grant No. 2023YFC3306401). This research was also supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LD24F020007, by Beijing Natural Science Foundation Grants L223024, L244043, and Z241100001324017, and by the "One Thousand Plan" project of Jiangxi Province (Jxsq2023102268).
Author information
Contributions
All authors contributed to the study conception. Material preparation, data collection, and analysis were performed by Xuhui Liu, Hong Li, and Zhi Qiao. The first draft of the manuscript was written by Xuhui Liu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This work proposes a CT reconstruction method for incomplete-view data. It does not have a direct negative social impact, yet its misuse for malicious purposes should be prevented.
Additional information
Communicated by Svetlana Lazebnik.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, X., Li, H., Qiao, Z. et al. D3T: Dual-Domain Diffusion Transformer in Triplanar Latent Space for 3D Incomplete-View CT Reconstruction. Int J Comput Vis 133, 5238–5261 (2025). https://doi.org/10.1007/s11263-025-02426-2