
Consistency-Aware Fake Videos Detection on Short Video Platforms

  • Conference paper
  • First Online:
Advanced Intelligent Computing Technology and Applications (ICIC 2025)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15859))


Abstract

This paper focuses on detecting fake news on short video platforms. Although significant research effort has been devoted to this task, with notable progress in recent years, detection accuracy remains suboptimal due to the rapid evolution of content manipulation and generation technologies. Existing approaches typically employ a cross-modal fusion strategy that directly combines raw video data with metadata inputs before applying a classification layer. However, our empirical observations reveal a critical oversight: manipulated content frequently exhibits inter-modal inconsistencies that could serve as valuable discriminative features, yet these remain underutilized in contemporary detection frameworks. Motivated by this insight, we propose a novel detection paradigm that explicitly identifies and leverages cross-modal contradictions as discriminative cues. Our approach consists of two core modules: Cross-modal Consistency Learning (CMCL) and Multi-modal Collaborative Diagnosis (MMCD). CMCL comprises Pseudo-label Generation (PLG) and Cross-modal Consistency Diagnosis (CMCD). In PLG, a Multimodal Large Language Model generates pseudo-labels for evaluating cross-modal semantic consistency. CMCD then extracts [CLS] tokens and computes a cosine loss to quantify cross-modal inconsistencies. MMCD further integrates multimodal features through Multimodal Feature Fusion (MFF) and Probability Scores Fusion (PSF). MFF employs a co-attention mechanism to enhance semantic interactions across modalities, and a Transformer performs comprehensive feature fusion; PSF then integrates the fake-news probability scores obtained in the previous step. Extensive experiments on established benchmarks (FakeSV and FakeTT) demonstrate that our model achieves outstanding performance in fake video detection. Our code is available at https://github.com/Sakura-not-sleep/CA_FVD.
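The consistency-diagnosis and score-fusion steps described in the abstract can be sketched in outline. The following is a minimal, illustrative Python sketch, not the authors' implementation: the exact loss form, the {0, 1} encoding of the MLLM pseudo-labels, and the fusion weight `alpha` are all assumptions for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (e.g. [CLS] tokens)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consistency_loss(cls_text, cls_video, pseudo_label):
    """CMCD-style cosine loss between text and video [CLS] embeddings.

    pseudo_label: 1 if the MLLM judged the two modalities semantically
    consistent, 0 otherwise (the paper's PLG step produces these labels;
    this particular squared-error form is an assumption).
    """
    cos = cosine_similarity(cls_text, cls_video)
    target = 1.0 if pseudo_label == 1 else -1.0  # pull together / push apart
    return (cos - target) ** 2

def fuse_probability_scores(p_fused, p_consistency, alpha=0.5):
    """PSF-style late fusion of two fake-news probability scores.

    `alpha` is a hypothetical weighting hyperparameter, not taken from
    the paper.
    """
    return alpha * p_fused + (1 - alpha) * p_consistency
```

A consistent text/video pair with pseudo-label 1 yields a near-zero loss, while a mismatched pair is penalized in proportion to how far its cosine similarity falls from the target, which is the sense in which cross-modal contradictions become a training signal.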



Author information

Corresponding author

Correspondence to Yaxiong Wang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, J., Liu, J., Zhang, N., Wang, Y. (2025). Consistency-Aware Fake Videos Detection on Short Video Platforms. In: Huang, DS., Zhang, Q., Zhang, C., Chen, W. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2025. Lecture Notes in Computer Science, vol 15859. Springer, Singapore. https://doi.org/10.1007/978-981-96-9812-7_17

  • DOI: https://doi.org/10.1007/978-981-96-9812-7_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-9811-0

  • Online ISBN: 978-981-96-9812-7

  • eBook Packages: Computer Science (R0)
