Recurrent multi-view 6DoF pose estimation for marker-less surgical tool tracking

Agethen, Niklas; Rosskamp, Janis; Koller, Tom L.; Klein, Jan; Zachmann, Gabriel

doi:10.1007/s11548-025-03436-8

Original Article
Published: 17 June 2025

International Journal of Computer Assisted Radiology and Surgery, Volume 20, pages 1589–1599 (2025)

Abstract

Purpose

Marker-based tracking of surgical instruments enables high-precision surgical navigation, but it requires time-consuming preparation and is vulnerable to stained or occluded markers. Deep learning promises marker-less tracking based solely on RGB video to address these challenges. In this paper, we apply object pose estimation to surgical instrument tracking using a novel deep learning architecture.

Methods

We combine pose estimation from multiple views with recurrent neural networks to better exploit temporal coherence for improved tracking. We also investigate performance under conditions where the instrument is obscured. We extend an existing pose (distribution) estimation pipeline with a spatio-temporal feature extractor that incorporates features along an entire sequence of frames.
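To make the idea of carrying features along a sequence of frames concrete, the following is a minimal sketch of a generic GRU-style recurrent update over per-frame feature vectors. It is not the authors' architecture: the class `GRUCell`, the helper `aggregate_sequence`, the feature dimensions, and the random (untrained) weights are all assumptions made for illustration, and each frame vector merely stands in for features already fused from the camera views. The actual spatio-temporal extractor presumably operates on image feature maps rather than short vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class GRUCell:
    """Generic GRU cell without biases; weights are random, not trained (illustrative only)."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        self.Wz, self.Uz = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)  # update gate
        self.Wr, self.Ur = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)  # reset gate
        self.Wh, self.Uh = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)  # candidate state

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, gated_h))]
        # Blend the previous state with the candidate computed from the current frame.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def aggregate_sequence(cell, frames):
    """Run the cell over all frames and return the hidden state after each one."""
    h = [0.0] * cell.hid_dim
    states = []
    for x in frames:
        h = cell.step(x, h)
        states.append(h)
    return states


# Hypothetical 4-dimensional per-frame features; the middle frame is zeroed
# as a crude stand-in for a frame in which the instrument is occluded.
cell = GRUCell(in_dim=4, hid_dim=3)
frames = [
    [0.9, 0.1, 0.4, 0.7],
    [0.0, 0.0, 0.0, 0.0],
    [0.8, 0.2, 0.5, 0.6],
]
states = aggregate_sequence(cell, frames)
```

Because the hidden state is a gated blend of the previous state and the current input, the state after the zeroed frame still carries information from the first frame. This is the general mechanism by which a recurrent model can bridge short occlusions.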

Results

On a synthetic dataset, we achieve a mean tip error below 1.0 mm and an angle error below 0.2° using a four-camera setup. On a real dataset with four cameras, we achieve an error below 3.0 mm. Under limited instrument visibility, our recurrent approach can predict the tip position approximately 3 mm more precisely than the non-recurrent approach.
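For readers unfamiliar with these metrics, the sketch below shows one common way to compute a tip error and an angle error between an estimated and a ground-truth 6DoF pose. The abstract does not state the exact definitions used, so both are assumptions: the tip error is taken as the Euclidean distance between the tool tip transformed by each pose, and the angle error as the geodesic angle between the two rotations. The function names and the 150 mm tip offset are hypothetical.

```python
import math


def transform(R, t, p):
    """Apply the rigid transform (R, t) to the 3D point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def tip_error_mm(R_est, t_est, R_gt, t_gt, tip_local):
    """Distance between the tool tip under the estimated and ground-truth poses."""
    return math.dist(transform(R_est, t_est, tip_local),
                     transform(R_gt, t_gt, tip_local))


def rotation_angle_deg(R_est, R_gt):
    """Geodesic angle between two rotation matrices, in degrees."""
    # trace(R_est^T R_gt) equals the element-wise product sum of the two matrices.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_angle))


def rot_x(deg):
    """Rotation about the x-axis by the given angle in degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin = [0.0, 0.0, 0.0]
tip_local = [0.0, 0.0, 150.0]  # hypothetical tip offset in the tool frame, in mm

R_est = rot_x(0.2)  # pure rotation error of 0.2 degrees, no translation error
tip_err = tip_error_mm(R_est, origin, identity, origin, tip_local)
ang_err = rotation_angle_deg(R_est, identity)
```

Under these assumptions, a rotation error of 0.2° alone displaces a tip located 150 mm from the pose origin by about 0.52 mm (the chord length 2 · 150 · sin(0.1°)), which illustrates how the two error figures relate.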

Conclusion

Our findings on a synthetic dataset of surgical instruments demonstrate that deep-learning-based tracking using multiple cameras simultaneously can be competitive with marker-based systems. Additionally, the temporal information obtained through the architecture’s recurrent nature is advantageous when the instrument is occluded. The synthesis of multi-view and recurrence has thus been shown to enhance the reliability and usability of high-precision surgical pose estimation.

Notes

https://cgvr.informatik.uni-bremen.de/research/ai_surgical_navigation/.

Funding

The project was funded by the University of Bremen Research Alliance (UBRA).

Author information

Niklas Agethen and Janis Rosskamp have contributed equally to this work.

Authors and Affiliations

Fraunhofer MEVIS, Max-von-Laue-Str. 2, 28359, Bremen, Germany
Niklas Agethen, Tom L. Koller & Jan Klein
University of Bremen, Bibliothekstraße 1, 28359, Bremen, Germany
Janis Rosskamp, Tom L. Koller & Gabriel Zachmann

Corresponding author

Correspondence to Niklas Agethen.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Agethen, N., Rosskamp, J., Koller, T.L. et al. Recurrent multi-view 6DoF pose estimation for marker-less surgical tool tracking. Int J CARS 20, 1589–1599 (2025). https://doi.org/10.1007/s11548-025-03436-8

Received: 20 January 2025
Accepted: 19 May 2025
Published: 17 June 2025
Version of record: 17 June 2025
Issue date: August 2025
DOI: https://doi.org/10.1007/s11548-025-03436-8
