Abstract
Purpose
Automatic segmentation of surgical instruments is a fundamental task in robot-assisted minimally invasive surgery, as it greatly improves surgeons' context awareness during the operation. This paper proposes a novel method based on Mask R-CNN to achieve accurate instance segmentation of surgical instruments.
Methods
A novel feature-extraction backbone is built that extracts local features through a convolutional neural network branch and global representations through a Swin-Transformer branch. Moreover, skip fusions are applied in the backbone to merge the two kinds of features and improve the generalization ability of the network.
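To make the two-branch design concrete, the following is a minimal PyTorch sketch of a parallel feature extractor with stage-wise fusion. The attention branch is a simplified stand-in for a Swin-Transformer stage, and the channel widths, depths, and additive fusion are illustrative assumptions rather than the configuration reported in this paper.

```python
# Minimal PyTorch sketch of a two-branch backbone with skip fusion.
# The attention branch is a toy stand-in for a Swin-Transformer stage;
# widths, depths, and fusion points are illustrative assumptions only.
import torch
import torch.nn as nn


class ConvStage(nn.Module):
    """Local-feature branch: a plain conv block that halves the spatial size."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class AttnStage(nn.Module):
    """Global-representation branch: plain self-attention standing in for a
    Swin stage (patch merging approximated by a strided convolution)."""
    def __init__(self, in_ch, out_ch, heads=4):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 2, stride=2)
        self.norm = nn.LayerNorm(out_ch)
        self.attn = nn.MultiheadAttention(out_ch, heads, batch_first=True)

    def forward(self, x):
        x = self.down(x)                       # B, C, H, W
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # B, H*W, C
        tokens = self.norm(tokens)
        tokens, _ = self.attn(tokens, tokens, tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class ParallelBackbone(nn.Module):
    """Runs both branches in parallel and fuses them by element-wise addition,
    one possible form of the skip fusion described in the abstract."""
    def __init__(self, channels=(3, 64, 128, 256)):
        super().__init__()
        pairs = list(zip(channels[:-1], channels[1:]))
        self.conv_stages = nn.ModuleList([ConvStage(i, o) for i, o in pairs])
        self.attn_stages = nn.ModuleList([AttnStage(i, o) for i, o in pairs])

    def forward(self, x):
        feats = []
        c, t = x, x
        for conv, attn in zip(self.conv_stages, self.attn_stages):
            c, t = conv(c), attn(t)
            fused = c + t          # skip fusion: merge local and global features
            c, t = fused, fused    # feed the fused maps to the next stage
            feats.append(fused)
        return feats               # multi-scale features for an FPN / Mask R-CNN head


if __name__ == "__main__":
    fmaps = ParallelBackbone()(torch.randn(1, 3, 64, 64))
    print([f.shape for f in fmaps])
```

In this sketch the fused map feeds both branches of the next stage; other fusion choices (concatenation, one-way skips, fusion only at selected stages) are equally plausible readings of the abstract.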
Results
The proposed method is evaluated on the MICCAI 2017 EndoVis Challenge dataset across three segmentation tasks and achieves state-of-the-art performance, with an mIoU of 0.5873 in type segmentation and 0.7408 in part segmentation. Furthermore, ablation studies show that the proposed backbone contributes at least a 17% improvement in mIoU.
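For reference, the reported metric is the mean intersection over union (mIoU). The snippet below is a minimal sketch of a per-class IoU average over integer label masks; the EndoVis 2017 challenge's exact averaging protocol over frames, classes, and sequences may differ.

```python
# Minimal sketch of per-class IoU and mean IoU for integer label masks.
# This simple per-class mean is an assumption; the challenge's official
# averaging over frames and sequences may differ.
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_empty=True):
    ious = []
    for c in range(1, num_classes + 1):       # class 0 assumed to be background
        p, g = (pred == c), (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:
            if not ignore_empty:
                ious.append(1.0)              # class absent in both masks
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0


if __name__ == "__main__":
    gt = np.zeros((4, 4), dtype=int)
    gt[:2, :2] = 1                            # 4 ground-truth pixels of class 1
    pred = np.zeros((4, 4), dtype=int)
    pred[:2, :3] = 1                          # 6 predicted pixels of class 1
    print(mean_iou(pred, gt, num_classes=1))  # intersection 4 / union 6 ≈ 0.667
```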
Conclusion
The promising results demonstrate that our method effectively extracts both global representations and local features when segmenting surgical instruments, improving segmentation accuracy. With the proposed backbone, the network segments the contours of instrument end tips more precisely. The method can therefore provide more accurate data for the localization and pose estimation of surgical instruments, contributing further to the automation of robot-assisted minimally invasive surgery.
Availability of data and material
The public dataset used in this study is available from the MICCAI 2017 EndoVis Challenge (https://endovissub2017-roboticinstrumentsegmentation.grand-challenge.org/).
Code availability
Code will be publicly available with the publication of this work.
Acknowledgements
We thank Tao Liang and Mengjie Chen for assistance with the experiments, Ziqi Liu for valuable discussion, and Yifei Li for help in polishing the manuscript.
Funding
This study was funded by the National Natural Science Foundation of China (Grant No. 52175028).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sun, X., Zou, Y., Wang, S. et al. A parallel network utilizing local features and global representations for segmentation of surgical instruments. Int J CARS 17, 1903–1913 (2022). https://doi.org/10.1007/s11548-022-02687-z
DOI: https://doi.org/10.1007/s11548-022-02687-z