Abstract
Gaze, as a pivotal indicator of human emotion, plays a crucial role in various computer vision tasks. However, the accuracy of gaze estimation often deteriorates significantly in unseen environments, limiting its practical value. Enhancing the generalizability of gaze estimators to new domains is therefore a critical challenge. A common limitation of existing domain adaptation research is its inability to identify and leverage truly influential factors during adaptation, which often leads to limited accuracy and unstable adaptation. To address this issue, this article identifies a truly influential factor in the cross-domain problem: high-frequency components (HFC). This discovery stems from an analysis of gaze jitter, a frequently overlooked but impactful issue in which predictions can deviate drastically even for visually similar input images. Inspired by this finding, we propose an "embed-then-suppress" HFC manipulation strategy to adapt gaze estimation to new domains. Our method first embeds additive HFC into the input images and then performs domain adaptation by suppressing the impact of the HFC. Specifically, the suppression is carried out in a contrastive manner: each original image is paired with its HFC-embedded version, enabling our method to suppress the HFC impact by contrasting the representations within each pair. The proposed method is evaluated on four cross-domain gaze estimation tasks. The experimental results show that it not only improves gaze estimation accuracy but also significantly reduces gaze jitter in the target domain. Compared with previous studies, our method offers higher accuracy, reduced gaze jitter, and improved adaptation stability, demonstrating its potential for practical deployment.
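To make the "embed-then-suppress" idea concrete, here is a minimal sketch, not the authors' released implementation: it isolates HFC with an FFT high-pass filter, embeds them additively into each image, and suppresses their influence with a contrastive-style consistency loss between each original/perturbed pair. The cutoff radius, the mixing weight `alpha`, the hypothetical `model.features` hook, and the cosine-similarity loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def extract_hfc(images: torch.Tensor, radius: int = 8) -> torch.Tensor:
    """Isolate high-frequency components via an FFT high-pass filter.

    images: (B, C, H, W) tensor. `radius` is an illustrative choice for the
    low-frequency region zeroed out around the spectrum center.
    """
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    _, _, h, w = images.shape
    cy, cx = h // 2, w // 2
    mask = torch.ones_like(spec.real)
    mask[..., cy - radius:cy + radius, cx - radius:cx + radius] = 0  # drop low freqs
    hfc_spec = spec * mask
    return torch.fft.ifft2(torch.fft.ifftshift(hfc_spec, dim=(-2, -1))).real

def embed_then_suppress_loss(model, images: torch.Tensor, alpha: float = 1.0):
    """Pair each image with its HFC-embedded version and contrast features.

    `model.features` is a hypothetical hook returning a (B, D) backbone
    embedding; pulling the pair's embeddings together is one plausible way
    to suppress the influence of HFC on the learned representation.
    """
    hfc = extract_hfc(images)
    perturbed = (images + alpha * hfc).clamp(0, 1)  # embed additive HFC (images assumed in [0, 1])
    z_orig = model.features(images)
    z_pert = model.features(perturbed)
    # Suppress HFC impact: maximize similarity within each original/perturbed pair.
    return 1 - F.cosine_similarity(z_orig, z_pert, dim=-1).mean()
```

In practice such a consistency term would presumably be combined with the gaze regression objective during adaptation; the sketch only shows the HFC manipulation itself.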
Data Availability
The ETH-XGaze dataset is available at https://ait.ethz.ch/projects/2020/ETH-XGaze. The Gaze360 dataset is available at http://gaze360.csail.mit.edu. The MPIIGaze dataset is available at https://www.mpi-inf.mpg.de/MPIIGaze. The EyeDiap dataset is available at https://www.idiap.ch/dataset/eyediap. The source code will be released.
Funding
This work was supported by Beijing Natural Science Foundation (L242019), and by the National Natural Science Foundation of China (NSFC) under Grant 62372019.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Communicated by Ming-Hsuan Yang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, R., Wang, H. & Lu, F. From Gaze Jitter to Domain Adaptation: Generalizing Gaze Estimation by Manipulating High-Frequency Components. Int J Comput Vis 133, 1290–1305 (2025). https://doi.org/10.1007/s11263-024-02233-1
DOI: https://doi.org/10.1007/s11263-024-02233-1