
From Gaze Jitter to Domain Adaptation: Generalizing Gaze Estimation by Manipulating High-Frequency Components

Published in: International Journal of Computer Vision

Abstract

Gaze, as a pivotal indicator of human emotion, plays a crucial role in various computer vision tasks. However, the accuracy of gaze estimation often deteriorates significantly when applied to unseen environments, limiting its practical value. Enhancing the generalizability of gaze estimators to new domains therefore emerges as a critical challenge. A common limitation of existing domain adaptation research is the inability to identify and leverage truly influential factors during the adaptation process, which often results in issues such as limited accuracy and unstable adaptation. To address this issue, this article identifies a truly influential factor in the cross-domain problem: high-frequency components (HFC). This discovery stems from an analysis of gaze jitter, a frequently overlooked but impactful issue in which predictions can deviate drastically even for visually similar input images. Inspired by this discovery, we propose an "embed-then-suppress" HFC manipulation strategy to adapt gaze estimation to new domains. Our method first embeds additive HFC into the input images, then performs domain adaptation by suppressing the impact of HFC. Specifically, the suppression is carried out in a contrastive manner: each original image is paired with its HFC-embedded version, enabling our method to suppress the HFC impact by contrasting the representations within each pair. The proposed method is evaluated on four cross-domain gaze estimation tasks. The experimental results show that it not only enhances gaze estimation accuracy but also significantly reduces gaze jitter in the target domain. Compared with previous studies, our method offers higher accuracy, reduced gaze jitter, and improved adaptation stability, demonstrating its potential for practical deployment.
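
To make the pipeline described above concrete, the following is a minimal sketch of the "embed-then-suppress" idea. It assumes an FFT-based high-pass filter for extracting HFC and a cosine-similarity consistency term standing in for the contrastive suppression; the function names (extract_hfc, embed_hfc, hfc_suppression_loss), the filter radius, and the mixing weight alpha are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of the "embed-then-suppress" strategy from the abstract.
# Names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F


def extract_hfc(images: torch.Tensor, radius: int = 8) -> torch.Tensor:
    """High-pass filter in the frequency domain: zero out a centered
    low-frequency square of half-size `radius` and keep the rest (the HFC)."""
    freq = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    h, w = images.shape[-2:]
    cy, cx = h // 2, w // 2
    mask = torch.ones_like(freq)
    mask[..., cy - radius:cy + radius, cx - radius:cx + radius] = 0  # drop low frequencies
    hfc = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    return hfc


def embed_hfc(images: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """'Embed' step: additively inject HFC (here, each image's own HFC,
    scaled by alpha) to form the augmented counterpart of the image."""
    return images + alpha * extract_hfc(images)


def hfc_suppression_loss(feat_orig: torch.Tensor, feat_aug: torch.Tensor) -> torch.Tensor:
    """'Suppress' step: a contrastive-style consistency term that pulls the
    representation of each image toward that of its HFC-embedded pair,
    encouraging the encoder to become insensitive to HFC."""
    return 1.0 - F.cosine_similarity(feat_orig, feat_aug, dim=-1).mean()


if __name__ == "__main__":
    # Toy encoder and batch of face/eye crops, just to exercise the loss.
    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
    x = torch.rand(4, 3, 64, 64)
    loss = hfc_suppression_loss(encoder(x), encoder(embed_hfc(x)))
    print(loss.item())
```

In the full method, the resulting representations would also feed a gaze regression head trained on source-domain labels; the sketch isolates only the HFC manipulation and the pairwise consistency objective.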



Data Availability

The ETH-XGaze dataset is available at https://ait.ethz.ch/projects/2020/ETH-XGaze. The Gaze360 dataset is available at http://gaze360.csail.mit.edu. The MPIIGaze dataset is available at https://www.mpi-inf.mpg.de/MPIIGaze. The EyeDiap dataset is available at https://www.idiap.ch/dataset/eyediap. The source code will be released.


Funding

This work was supported by Beijing Natural Science Foundation (L242019), and by the National Natural Science Foundation of China (NSFC) under Grant 62372019.

Author information

Corresponding author

Correspondence to Feng Lu.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Communicated by Ming-Hsuan Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, R., Wang, H. & Lu, F. From Gaze Jitter to Domain Adaptation: Generalizing Gaze Estimation by Manipulating High-Frequency Components. Int J Comput Vis 133, 1290–1305 (2025). https://doi.org/10.1007/s11263-024-02233-1

