
Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection

Published in: International Journal of Computer Vision

Abstract

The threats deepfakes pose to society and cybersecurity have provoked significant public apprehension, driving intensified efforts in deepfake video detection. Current video-level methods are mostly based on 3D CNNs, which achieve good performance but incur high computational demands. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout that preserves both spatial and temporal dependencies. The transformation sequentially masks frames at the same positions within each frame; the frames are then resized into sub-frames and rearranged into the predetermined layout, forming a thumbnail. TALL is model-agnostic and remarkably simple, requiring only minimal code modifications. Furthermore, we introduce a graph reasoning block (GRB) and a semantic consistency (SC) loss to strengthen TALL, culminating in TALL++. The GRB enhances interactions between different semantic regions to capture semantic-level inconsistency clues, while the SC loss imposes consistency constraints on semantic features to improve generalization. Extensive experiments on intra-dataset and cross-dataset deepfake detection, diffusion-generated image detection, and deepfake generation method recognition show that TALL++ achieves results surpassing or comparable to the state of the art, demonstrating the effectiveness of our approaches across deepfake detection problems. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
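To make the layout step concrete, the following is a minimal NumPy sketch of the thumbnail construction described above: T frames are downsampled into sub-frames and tiled into a grid in temporal order, so one 2D image retains both spatial appearance and temporal order. The function name `thumbnail_layout`, the 2x2 grid, and the strided downsampling (a stand-in for proper bilinear resizing) are illustrative assumptions, and the frame-masking augmentation mentioned in the abstract is omitted; see the official repository for the authors' implementation.

```python
import numpy as np

def thumbnail_layout(clip, grid=(2, 2)):
    """Arrange a short clip of frames into a single thumbnail image.

    clip: array of shape (T, H, W, C), with T == grid[0] * grid[1].
    Each frame is downsampled to sub-frame size (here by simple
    striding) and placed into the grid in temporal raster order, so
    spatial and temporal dependencies coexist in one 2D image that
    any image backbone can consume.
    """
    rows, cols = grid
    t, h, w, c = clip.shape
    assert t == rows * cols, "clip length must exactly fill the grid"
    # Downsample every frame by striding (stand-in for bilinear resize).
    sub = clip[:, ::rows, ::cols, :]            # (T, H//rows, W//cols, C)
    sh, sw = sub.shape[1], sub.shape[2]
    thumb = np.zeros((rows * sh, cols * sw, c), dtype=clip.dtype)
    for i in range(t):                          # temporal order -> raster order
        r, q = divmod(i, cols)
        thumb[r * sh:(r + 1) * sh, q * sw:(q + 1) * sw] = sub[i]
    return thumb
```

Because the output is an ordinary image, the approach is model-agnostic: the thumbnail can be fed to any 2D backbone with no architectural changes, which is the source of TALL's low computational cost relative to 3D CNNs.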



Data Availability Statement

All the datasets used in this paper are available online. FaceForensics++ (FF++) (https://github.com/ondyari/FaceForensics), Celeb-DF (CDF) (https://github.com/yuezunli/celeb-deepfakeforensics), DFDC (https://ai.meta.com/datasets/dfdc/), DeeperForensics (DFo) (https://liming-jiang.com/projects/DrF1/DrF1.html), FaceShifter (FSh) (https://github.com/ondyari/FaceForensics), Wild-Deepfake (Wild-DF) (https://github.com/deepfakeinthewild/deepfake-in-the-wild), KoDF (https://deepbrainai-research.github.io/kodf/), and Deepfakes LSUN-Bedroom (DBL) (https://github.com/jonasricker/diffusion-model-deepfake-detection#dataset) can be downloaded from their respective official websites.

Notes

  1. The model is trained on the FaceForensics++ dataset and evaluated on Celeb-DF; the same convention applies throughout the text.


Acknowledgements

The authors would like to thank the reviewers and the associate editor for their valuable comments. The authors also thank Ziming Yang and Gengyun Jia for their help in improving the technical writing of this paper. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62376265, 62276256, and U21B2045, the Beijing Nova Program under Grant Z211100002121108, and the Young Elite Scientists Sponsorship Program by CAST.

Author information


Corresponding authors

Correspondence to Jian Liang or Xiao-Yu Zhang.

Additional information

Communicated by Sergio Escalera.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xu, Y., Liang, J., Sheng, L. et al. Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection. Int J Comput Vis 132, 5663–5680 (2024). https://doi.org/10.1007/s11263-024-02054-2

