Abstract
The challenge of attributing the source of forged faces has attracted widespread attention owing to the rapid development of generative techniques. While many recent works have taken essential steps toward attributing GAN-generated faces, more threatening attacks based on identity swapping or diffusion models are still overlooked, and the forgery traces hidden in unknown attacks among open-world unlabeled faces remain under-explored. To push this frontier, we introduce a novel task named Open-World DeepFake Attribution, together with the corresponding benchmark OW-DFA++, which evaluates attribution performance against various types of fake faces in open-world scenarios. We further propose a Multi-Perspective Sensory Learning (MPSL) framework to address the challenges of OW-DFA++. Since different forgery methods leave different tampering regions and frequency artifacts, we introduce a Multi-Perception Voting (MPV) module that aligns inter-sample features based on global, multi-scale local, and frequency relations, effectively filtering and grouping samples belonging to the same attack type. Pseudo-labeling is another common and effective strategy in semi-supervised learning, so we propose a Confidence-Adaptive Pseudo-labeling (CAP) module that uses soft pseudo-labels to enhance class compactness and mitigate the pseudo-label noise induced by similar novel attack methods; the CAP module imposes strong constraints and adaptively filters out samples with high uncertainty to improve pseudo-labeling accuracy. In addition, we extend the MPSL framework with a multi-stage paradigm that leverages pre-training and iterative learning to further enhance traceability performance. Extensive experiments and visualizations verify the superiority of our method on OW-DFA++ and demonstrate the interpretability of the deepfake attribution task as well as its impact on improving the security of the deepfake detection field.
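The core filtering idea behind confidence-adaptive pseudo-labeling, as described above, can be sketched in a few lines: compute soft (probabilistic) labels for unlabeled samples and mask out those whose prediction confidence is too low to be trusted. The sketch below is a minimal NumPy illustration of this general strategy, not the authors' implementation; the `threshold` and `temperature` hyperparameters are assumed values for demonstration.

```python
import numpy as np

def confidence_adaptive_pseudo_labels(logits, threshold=0.9, temperature=1.0):
    """Soft pseudo-labeling with confidence-based filtering (illustrative).

    Returns soft labels (softmax probabilities) for every sample and a
    boolean mask selecting only the samples whose maximum class
    probability exceeds `threshold`; high-uncertainty samples are
    excluded from the pseudo-label loss.
    """
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)                    # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    confidence = probs.max(axis=1)                          # peak class probability
    mask = confidence >= threshold                          # drop uncertain samples
    return probs, mask

# A confident sample (sharp logits) passes the filter; a near-uniform
# sample (two similar novel attack types) is masked out.
logits = np.array([[5.0, 0.0, 0.0],
                   [1.0, 0.9, 1.1]])
probs, mask = confidence_adaptive_pseudo_labels(logits)
```

In a training loop, `mask` would gate which unlabeled samples contribute a cross-entropy term against their soft labels, so the model is only supervised by pseudo-labels it is confident about.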
Funding
This work was supported by National Natural Science Foundation of China (No. 62302297, 72192821, 62272447), Shanghai Sailing Program (22YF1420300), Young Elite Scientists Sponsorship Program by CAST (2022QNRC001), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), Shanghai Science and Technology Commission (21511101200), the Fundamental Research Funds for the Central Universities (YG2023QNB17, YG2024QNA44), Beijing Natural Science Foundation (L222117).
Additional information
Communicated by Zhun Zhong.
About this article
Cite this article
Sun, Z., Chen, S., Yao, T. et al. Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning. Int J Comput Vis 133, 628–651 (2025). https://doi.org/10.1007/s11263-024-02184-7