Abstract
Face reenactment and face swapping share a common pattern of identity and attribute manipulation. Our previous work, UniFace, made a preliminary attempt to unify the two tasks at the feature level, but it relies heavily on accurate feature disentanglement, and its GAN-based training is unstable. In this work, we examine the intrinsic connections between the two tasks from a more general training-paradigm perspective and introduce UniFace++, a novel diffusion-based unified method. Specifically, UniFace++ combines the advantages of each task, i.e., the stability of reconstruction training from reenactment and the simplicity and effectiveness of target-oriented processing from swapping, redefining both as target-oriented reconstruction tasks. In this way, face reenactment avoids complex deformation of source features, and face swapping mitigates unstable seesaw-style optimization. The core of our approach is a rendered face obtained from reassembled 3D facial priors, which serves as the target pivot and contains precise geometry and coarse identity textures. We feed this pivot into the proposed Texture-Geometry-aware Diffusion Model (TGDM), which performs texture transfer under reconstruction supervision for high-fidelity face synthesis. Extensive quantitative and qualitative experiments demonstrate the superiority of our method on both tasks.
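To make the paradigm concrete, the following is a minimal sketch of the target-oriented reconstruction idea described above, under our own assumptions. Every name here (estimate_3dmm, reassemble, render_pivot, diffusion_recon_loss, the toy cosine noise schedule) is an illustrative stand-in, not the authors' released API: the actual system uses a real 3DMM estimator and renderer and a trained TGDM network, whereas this sketch uses random stubs so it runs end to end.

```python
# Sketch only: hypothetical names, random stand-ins for the 3DMM fit,
# renderer, and denoiser. Illustrates the unified target-oriented
# reconstruction paradigm, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def estimate_3dmm(img_path):
    """Stand-in 3DMM estimator: identity, expression, and pose coefficients."""
    return {"id": rng.normal(size=80), "exp": rng.normal(size=64), "pose": rng.normal(size=6)}

def reassemble(id_coeffs, motion_coeffs):
    """Reassembled priors: identity from one face, geometry (exp/pose) from the other.

    Reenactment keeps the source identity and takes motion from the driving frame;
    swapping injects the source identity into the target's exp/pose. Either way the
    rendered result acts as the 'target pivot' and the model is trained to
    reconstruct the target frame."""
    return {"id": id_coeffs["id"],
            "exp": motion_coeffs["exp"],
            "pose": motion_coeffs["pose"]}

def render_pivot(coeffs):
    """Stand-in renderer: a coarse face image with precise geometry, rough texture."""
    return rng.normal(size=(256, 256, 3))

def diffusion_recon_loss(model, target, pivot, t):
    """Standard denoising-diffusion noise-prediction loss, conditioned on the
    rendered pivot; this is the 'reconstruction supervision' of the paradigm."""
    noise = rng.normal(size=target.shape)
    alpha_bar = np.cos(t * np.pi / 2) ** 2          # toy noise schedule
    x_t = np.sqrt(alpha_bar) * target + np.sqrt(1 - alpha_bar) * noise
    return np.mean((model(x_t, pivot, t) - noise) ** 2)

# Usage: both tasks collapse into reconstructing the target frame from the pivot.
src, tgt = estimate_3dmm("source.png"), estimate_3dmm("target.png")
pivot = render_pivot(reassemble(src, tgt))
target_img = rng.normal(size=(256, 256, 3))          # ground-truth target frame
dummy_model = lambda x_t, cond, t: np.zeros_like(x_t)
print(diffusion_recon_loss(dummy_model, target_img, pivot, t=0.3))
```

The design point this sketch captures is that reenactment and swapping differ only in which input supplies the identity coefficients; once the pivot is rendered, both reduce to the same conditional denoising objective.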
Acknowledgements
This work was supported by the Key R&D Project of Zhejiang Province under Grant 2024C01172 and the National Natural Science Foundation of China under Grant 62476224.
Additional information
Communicated by Svetlana Lazebnik.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, C., Qian, Y., Zhu, S. et al. UniFace++: Revisiting a Unified Framework for Face Reenactment and Swapping via 3D Priors. Int J Comput Vis 133, 4538–4554 (2025). https://doi.org/10.1007/s11263-025-02395-6
DOI: https://doi.org/10.1007/s11263-025-02395-6