Abstract
Gaze, as a pivotal indicator of human emotion, plays a crucial role in various computer vision tasks. However, the accuracy of gaze estimation often deteriorates significantly in unseen environments, limiting its practical value. Enhancing the generalizability of gaze estimators to new domains is therefore a critical challenge. A common limitation of existing domain adaptation research is its inability to identify and leverage truly influential factors during adaptation, which often leads to limited accuracy and unstable adaptation. To address this issue, this article identifies a truly influential factor in the cross-domain problem: high-frequency components (HFC). This discovery stems from an analysis of gaze jitter, a frequently overlooked but impactful issue in which predictions can deviate drastically even for visually similar input images. Inspired by this finding, we propose an "embed-then-suppress" HFC manipulation strategy to adapt gaze estimation to new domains. Our method first embeds additive HFC into the input images and then performs domain adaptation by suppressing the impact of the HFC. Specifically, the suppression is carried out in a contrastive manner: each original image is paired with its HFC-embedded version, enabling our method to suppress the HFC impact by contrasting the representations within each pair. The proposed method is evaluated on four cross-domain gaze estimation tasks. The experimental results show that it not only improves gaze estimation accuracy but also significantly reduces gaze jitter in the target domain. Compared with previous studies, our method offers higher accuracy, reduced gaze jitter, and improved adaptation stability, demonstrating its potential for practical deployment.
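To make the "embed-then-suppress" idea concrete, here is a minimal sketch, not the authors' released implementation: it isolates HFC with an FFT high-pass filter, embeds them additively into each image, and suppresses their influence with a contrastive-style consistency loss between each original/perturbed pair. The cutoff radius, the mixing weight `alpha`, the hypothetical `model.features` hook, and the cosine-similarity loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def extract_hfc(images: torch.Tensor, radius: int = 8) -> torch.Tensor:
    """Isolate high-frequency components via an FFT high-pass filter.

    images: (B, C, H, W) tensor. `radius` is an illustrative choice for the
    low-frequency region zeroed out around the spectrum center.
    """
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    _, _, h, w = images.shape
    cy, cx = h // 2, w // 2
    mask = torch.ones_like(spec.real)
    mask[..., cy - radius:cy + radius, cx - radius:cx + radius] = 0  # drop low freqs
    hfc_spec = spec * mask
    return torch.fft.ifft2(torch.fft.ifftshift(hfc_spec, dim=(-2, -1))).real

def embed_then_suppress_loss(model, images: torch.Tensor, alpha: float = 1.0):
    """Pair each image with its HFC-embedded version and contrast features.

    `model.features` is a hypothetical hook returning a (B, D) backbone
    embedding; pulling the pair's embeddings together is one plausible way
    to suppress the influence of HFC on the learned representation.
    """
    hfc = extract_hfc(images)
    perturbed = (images + alpha * hfc).clamp(0, 1)  # embed additive HFC (images assumed in [0, 1])
    z_orig = model.features(images)
    z_pert = model.features(perturbed)
    # Suppress HFC impact: maximize similarity within each original/perturbed pair.
    return 1 - F.cosine_similarity(z_orig, z_pert, dim=-1).mean()
```

In practice such a consistency term would presumably be combined with the gaze regression objective during adaptation; the sketch only shows the HFC manipulation itself.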
Data Availability
The ETH-XGaze dataset is available at https://ait.ethz.ch/projects/2020/ETH-XGaze. The Gaze360 dataset is available at http://gaze360.csail.mit.edu. The MPIIGaze dataset is available at https://www.mpi-inf.mpg.de/MPIIGaze. The EyeDiap dataset is available at https://www.idiap.ch/dataset/eyediap. The source code will be released.
Funding
This work was supported by Beijing Natural Science Foundation (L242019), and by the National Natural Science Foundation of China (NSFC) under Grant 62372019.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Communicated by Ming-Hsuan Yang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, R., Wang, H. & Lu, F. From Gaze Jitter to Domain Adaptation: Generalizing Gaze Estimation by Manipulating High-Frequency Components. Int J Comput Vis 133, 1290–1305 (2025). https://doi.org/10.1007/s11263-024-02233-1
DOI: https://doi.org/10.1007/s11263-024-02233-1