Abstract
This paper addresses fake news detection on short video platforms. Although significant research effort has been devoted to this task in recent years, detection accuracy remains suboptimal because content manipulation and generation technologies evolve rapidly. Existing approaches typically adopt a cross-modal fusion strategy that directly combines raw video data with metadata before applying a classification layer. However, our empirical observations reveal a critical oversight: manipulated content frequently exhibits inter-modal inconsistencies that could serve as valuable discriminative features, yet these remain underutilized in contemporary detection frameworks. Motivated by this insight, we propose a novel detection paradigm that explicitly identifies and leverages cross-modal contradictions as discriminative cues. Our approach consists of two core modules: Cross-modal Consistency Learning (CMCL) and Multi-modal Collaborative Diagnosis (MMCD). CMCL comprises Pseudo-label Generation (PLG) and Cross-modal Consistency Diagnosis (CMCD). In PLG, a multimodal large language model generates pseudo-labels for evaluating cross-modal semantic consistency. CMCD then extracts [CLS] tokens and computes a cosine loss to quantify cross-modal inconsistencies. MMCD further integrates multimodal features through Multimodal Feature Fusion (MFF) and Probability Scores Fusion (PSF). MFF employs a co-attention mechanism to enhance semantic interaction across modalities, while a Transformer performs comprehensive feature fusion; PSF then integrates the fake-news probability scores obtained in the previous step. Extensive experiments on the established benchmarks FakeSV and FakeTT demonstrate that our model achieves outstanding performance in fake video detection. Our code is available at https://github.com/Sakura-not-sleep/CA_FVD.
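The two quantitative ingredients named above, a cosine loss over [CLS] embeddings (CMCD) and late fusion of fake-news probability scores (PSF), can be sketched as follows. This is a minimal illustration only: the function names, the binary pseudo-label convention, and the fusion weight `alpha` are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two [CLS] embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def consistency_loss(cls_a, cls_b, pseudo_label):
    """Cosine-style consistency loss between two modalities' [CLS] tokens.

    pseudo_label is assumed binary: 1 if the MLLM judged the modalities
    semantically consistent, 0 otherwise. Consistent pairs are pulled
    together (loss = 1 - sim); inconsistent pairs are pushed apart.
    """
    sim = cosine_similarity(cls_a, cls_b)
    return (1.0 - sim) if pseudo_label == 1 else max(0.0, sim)

def fuse_probabilities(p_fused, p_consistency, alpha=0.5):
    """Late fusion (PSF-style) of two fake-news probability scores
    as a convex combination; alpha is an illustrative weight."""
    return alpha * p_fused + (1.0 - alpha) * p_consistency
```

For example, two identical [CLS] vectors with a "consistent" pseudo-label incur zero loss, while the same pair labeled "inconsistent" is penalized by its full cosine similarity; the fused score is simply a weighted average of the two branch probabilities.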
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, J., Liu, J., Zhang, N., Wang, Y. (2025). Consistency-Aware Fake Videos Detection on Short Video Platforms. In: Huang, DS., Zhang, Q., Zhang, C., Chen, W. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2025. Lecture Notes in Computer Science, vol 15859. Springer, Singapore. https://doi.org/10.1007/978-981-96-9812-7_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-9811-0
Online ISBN: 978-981-96-9812-7