Abstract
Although the generalization of face anti-spoofing (FAS) has attracted increasing attention, approaches based on the Vision Transformer (ViT) are still at an early stage. In this paper, we present a cross-domain FAS framework, dubbed the Transformer with dual Cross-Attention and semi-fixed Mixture-of-Expert (CA-MoEiT), which improves the generalization of FAS from three aspects: (1) Feature augmentation. We insert a MixStyle module after the PatchEmbed layer to synthesize diverse patch embeddings from novel domains and enhance the generalizability of the trained model. (2) Feature alignment. We design a dual cross-attention mechanism that extends self-attention to align the common representation across multiple domains. (3) Feature complement. We design a semi-fixed MoE (SFMoE) that selectively replaces the MLP and introduces a fixed super expert. Through the gate mechanism in SFMoE, professional experts are adaptively activated and independently learn domain-specific information, which supplements the domain-invariant features learned by the super expert and further improves generalization. Importantly, all three techniques are compatible with any ViT variant as plug-and-play modules. Extensive experiments show that the proposed CA-MoEiT is effective and outperforms state-of-the-art methods on several public datasets.
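To make the feature-augmentation step concrete, the following is a minimal sketch of MixStyle-style statistic mixing applied to patch embeddings. It is not the authors' implementation. It follows the general MixStyle recipe (interpolating per-instance feature statistics with a Beta-distributed weight) and assumes, for illustration only, that the statistics are computed per channel over the token axis of each sample. The names `token_stats` and `mixstyle`, the value `alpha=0.1`, and the use of plain Python lists in place of tensors are all hypothetical choices.

```python
import math
import random


def token_stats(tokens, eps=1e-6):
    """Per-channel mean and std over the token axis of one sample (N x D).

    Assumption: statistics are taken across tokens; the paper may differ.
    """
    n, d = len(tokens), len(tokens[0])
    mean = [sum(t[c] for t in tokens) / n for c in range(d)]
    std = [
        math.sqrt(sum((t[c] - mean[c]) ** 2 for t in tokens) / n + eps)
        for c in range(d)
    ]
    return mean, std


def mixstyle(batch, alpha=0.1, rng=random):
    """Mix each sample's channel statistics with those of a random partner.

    batch: list of samples, each a list of N tokens with D channels.
    Returns a new batch of the same shape with re-styled embeddings.
    """
    stats = [token_stats(x) for x in batch]
    perm = list(range(len(batch)))
    rng.shuffle(perm)  # random partner for every sample
    out = []
    for i, x in enumerate(batch):
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        (mu, sig), (mu2, sig2) = stats[i], stats[perm[i]]
        d = len(mu)
        mu_mix = [lam * mu[c] + (1 - lam) * mu2[c] for c in range(d)]
        sig_mix = [lam * sig[c] + (1 - lam) * sig2[c] for c in range(d)]
        # Normalize with the sample's own statistics, then re-style
        # with the mixed statistics.
        out.append(
            [
                [sig_mix[c] * (t[c] - mu[c]) / sig[c] + mu_mix[c] for c in range(d)]
                for t in x
            ]
        )
    return out
```

In a training pipeline such a step would typically be applied only during training and with some probability, so that the model also sees unmodified embeddings; whether CA-MoEiT does so is not stated in the abstract.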
Additional information
Communicated by Sergio Escalera.
About this article
Cite this article
Liu, A. CA-MoEiT: Generalizable Face Anti-spoofing via Dual Cross-Attention and Semi-fixed Mixture-of-Expert. Int J Comput Vis 132, 5439–5452 (2024). https://doi.org/10.1007/s11263-024-02135-2
DOI: https://doi.org/10.1007/s11263-024-02135-2