Abstract
Since photorealistic faces can now be readily generated by facial manipulation technologies, the potential for malicious abuse of these technologies has raised serious concerns, and numerous deepfake detection methods have been proposed. However, existing methods focus only on detecting one-step facial manipulation. With the emergence of easily accessible facial editing applications, people can manipulate facial components through multi-step operations in a sequential manner. This new threat requires detecting a sequence of facial manipulations, which is vital both for detecting deepfake media and for recovering the original faces afterwards. Motivated by this observation, we emphasize the need for, and propose, a novel research problem called Detecting Sequential DeepFake Manipulation (Seq-DeepFake). Unlike the existing deepfake detection task, which demands only a binary label prediction, detecting Seq-DeepFake manipulation requires correctly predicting a sequential vector of facial manipulation operations. To support a large-scale investigation, we construct the first Seq-DeepFake dataset, where face images are manipulated sequentially and annotated with the corresponding vectors of sequential facial manipulations. Based on this new dataset, we cast detecting Seq-DeepFake manipulation as a specific image-to-sequence task (akin to image captioning) and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer). To better reflect real-world deepfake data distributions, we further apply various perturbations to the original Seq-DeepFake dataset and construct a more challenging Sequential DeepFake dataset with perturbations (Seq-DeepFake-P). To exploit deeper correlations between images and sequences on Seq-DeepFake-P, a dedicated Seq-DeepFake Transformer with Image-Sequence Reasoning (SeqFakeFormer++) is devised, which builds stronger correspondence between image-sequence pairs for more robust Seq-DeepFake detection. Moreover, we build a comprehensive benchmark and set up rigorous evaluation protocols and metrics for this new research problem. Extensive quantitative and qualitative experiments demonstrate the effectiveness of SeqFakeFormer and SeqFakeFormer++, and several valuable observations are revealed to facilitate future research on broader deepfake detection problems. The code has been released at https://github.com/rshaojimmy/SeqDeepFake/.
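To make the image-to-sequence formulation concrete, the following is a minimal sketch of what such a detector could look like: a CNN backbone encodes the face image, and a transformer decoder autoregressively emits manipulation-operation tokens, exactly as in image captioning. This is not the authors' released SeqFakeFormer implementation; the vocabulary, module sizes, and all names below (e.g., `ImageToSeqDetector`, `MAX_STEPS`) are illustrative assumptions.

```python
# Minimal sketch of casting Seq-DeepFake detection as image-to-sequence
# prediction. Hypothetical configuration, not the official SeqFakeFormer.
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Hypothetical manipulation vocabulary: special tokens + facial components.
VOCAB = ["<pad>", "<sos>", "<eos>", "eyebrow", "eye", "nose", "lip", "hair"]
PAD, SOS, EOS = 0, 1, 2
MAX_STEPS = 5  # assumed maximum length of a manipulation sequence

class ImageToSeqDetector(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep the spatial feature map (drop avgpool/fc), project to d_model.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)
        self.embed = nn.Embedding(len(VOCAB), d_model)
        self.pos = nn.Parameter(torch.randn(MAX_STEPS + 1, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.head = nn.Linear(d_model, len(VOCAB))

    def forward(self, images, tgt_tokens):
        # images: (B, 3, H, W); tgt_tokens: (B, T) shifted-right sequence.
        feat = self.proj(self.encoder(images))            # (B, C, h, w)
        memory = feat.flatten(2).transpose(1, 2)          # (B, h*w, C)
        T = tgt_tokens.size(1)
        tgt = self.embed(tgt_tokens) + self.pos[:T]
        # Causal mask: -inf strictly above the diagonal blocks future steps.
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)                             # (B, T, |V|)

model = ImageToSeqDetector()
imgs = torch.randn(2, 3, 224, 224)
# A ground-truth sequence like ["nose", "eye"] becomes <sos> nose eye <eos>.
tgt = torch.tensor([[SOS, 5, 4, EOS, PAD], [SOS, EOS, PAD, PAD, PAD]])
logits = model(imgs, tgt[:, :-1])  # predict the next token at each step
loss = nn.functional.cross_entropy(
    logits.reshape(-1, len(VOCAB)), tgt[:, 1:].reshape(-1), ignore_index=PAD)
```

At inference time, decoding would start from `<sos>` and emit tokens step by step until `<eos>`, yielding the predicted vector of manipulation operations; an unmanipulated face would simply decode to an empty sequence.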
Data Availability
The Seq-DeepFake dataset analysed during this study is publicly available for research purposes.
Acknowledgements
This study is supported by the National Natural Science Foundation of China (Grant No. 62306090) and the Natural Science Foundation of Guangdong Province of China (Grant No. 2024A1515010147). It is also supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 (MOET2EP20221-0012), NTU NAP, and the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s).
Additional information
Communicated by Gang Hua.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shao, R., Wu, T. & Liu, Z. Robust Sequential DeepFake Detection. Int J Comput Vis 133, 3278–3295 (2025). https://doi.org/10.1007/s11263-024-02339-6