Abstract
High-fidelity 3D garment synthesis from text is desirable yet challenging for digital avatar creation. Recent diffusion-based approaches built on Score Distillation Sampling (SDS) have opened new possibilities, but they are either intricately coupled with the human body or difficult to reuse. We introduce ClotheDreamer, a 3D Gaussian-based method for generating wearable, production-ready 3D garment assets from text prompts. We propose a novel representation, Disentangled Clothe Gaussian Splatting (DCGS), to enable separate optimization: DCGS represents the clothed avatar as a single Gaussian model while freezing the body Gaussian splats. To enhance quality and completeness, we incorporate bidirectional SDS, which supervises the clothed-avatar and garment RGBD renderings respectively under pose conditions, and we propose a new pruning strategy for loose clothing. Our approach also supports custom clothing templates as input. Owing to this design, the synthesized 3D garments can be readily applied to virtual try-on and support physically accurate animation. Extensive experiments demonstrate our method's superior performance against competitive baselines. Our project page is at https://ggxxii.github.io/clothedreamer.
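To make the SDS supervision mentioned above concrete, the following is a minimal, purely illustrative sketch of the Score Distillation Sampling gradient that such methods use to optimize renderable 3D parameters against a 2D diffusion prior. This is not the authors' implementation: the function names, the scalar weighting, and the toy denoiser values are hypothetical, and the real pipeline operates on rendered images and a pretrained diffusion model.

```python
# Toy sketch of the SDS gradient w(t) * (eps_theta(x_t) - eps).
# Hypothetical names and values for exposition only; not ClotheDreamer's code.

def sds_gradient(noise_pred, noise, weight):
    """Per-element SDS gradient.

    The rendered pixels receive this gradient directly, skipping the
    diffusion model's Jacobian -- the key simplification behind SDS.
    """
    return [weight * (p - n) for p, n in zip(noise_pred, noise)]

def sgd_step(params, grad, lr=0.1):
    """One gradient-descent step on the (toy) renderable parameters."""
    return [x - lr * g for x, g in zip(params, grad)]

# Toy usage: the (pretend) denoiser's noise estimate differs from the
# noise actually added, so the update nudges the rendered pixels toward
# what the diffusion prior considers more plausible.
pixels = [0.5, 0.2, 0.9]
noise = [0.1, -0.3, 0.05]       # noise actually added at timestep t
noise_pred = [0.2, -0.1, 0.0]   # the diffusion model's estimate
grad = sds_gradient(noise_pred, noise, weight=1.0)
pixels = sgd_step(pixels, grad)
```

In ClotheDreamer this supervision is applied bidirectionally, on renderings of both the full clothed avatar and the isolated garment, which is what the frozen body splats in DCGS make possible.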
Data Availability
The code and the generated garment resources used in this study will be made available at the following public repository: https://github.com/ggxxii/clothedreamer.
Funding
This work was supported by the Shanghai Talent Development Funding of China (Grant No. 2021016).
Author information
Contributions
All authors contributed to the study conception and design. Yufei Liu and Junshu Tang were responsible for the overall direction and planning. Yufei Liu, Junshu Tang and Chu Zheng designed the generative framework and resolved technical details. Yufei Liu and Chu Zheng wrote the first draft of the manuscript and collaborated with Junwei Zhu and Chengjie Wang on the model implementation. Chu Zheng and Yufei Liu conducted preliminary research on related work and performed three qualitative comparison experiments. Yufei Liu and Dongjin Huang designed the user study and carried out data analysis. Dongjin Huang also assisted in supervising the project and, together with Yufei Liu and Chu Zheng, discussed the experimental results. All authors discussed the results and provided comments on the manuscript.
Ethics declarations
Competing interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethics approval and consent to participate
Informed consent was obtained from all individual participants included in the study. All participants consented to the submission of this work to the journal.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, Y., Tang, J., Zheng, C. et al. ClotheDreamer: Text-guided garment generation with 3D gaussians. Appl Intell 55, 767 (2025). https://doi.org/10.1007/s10489-025-06596-x