Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation

Mojtahedi, Ramtin; Hamghalam, Mohammad; Do, Richard K. G.; Simpson, Amber L.

doi:10.1007/978-3-031-18814-5_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13594)

Included in the following conference series:

International Workshop on Multiscale Multimodal Medical Imaging


Abstract

Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models built on fully convolutional neural networks (FCNNs) have become the dominant approach for segmenting 3D computed tomography (CT) scans. However, because their convolutional layers have a limited kernel size, they cannot capture long-range dependencies and global context. To overcome this restriction, vision transformers have been introduced to address the local receptive fields of FCNNs. Although transformers can capture long-range features, their segmentation performance varies with tumor size because the model is sensitive to the input patch size. Finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, but the search is time-consuming and challenging. This paper proposes a technique for selecting the optimal multi-resolution input patch size of a vision transformer based on the average volume of metastatic lesions. We further validated the proposed framework using transfer learning, demonstrating that the highest Dice similarity coefficient (DSC) was obtained by pre-training with the suggested optimal patch size on training data with a larger average tumor volume and then training on data with a smaller one. We experimentally evaluated this idea by pre-training our model on a multi-resolution public dataset. Our model showed consistent and improved results when applied to our private multi-resolution mCRC dataset, which has a smaller average tumor volume. This study lays the groundwork for optimizing the semantic segmentation of small objects using vision transformers. The implementation source code is available at: https://github.com/Ramtin-Mojtahedi/OVTPS.
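
As a rough illustration of the two quantities the abstract relies on, the block below computes the average lesion volume of a binary tumor mask and the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|). It is a minimal pure-Python sketch, assuming masks are given as sets of (z, y, x) voxel indices, lesions are 6-connected components, and voxel spacing is known in millimeters. The helper `nearest_patch_edge` is hypothetical: it simply picks the candidate patch edge closest to the edge of a cube with the mean lesion volume. It is not the selection rule from the paper; the authors' implementation is in the linked repository.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index of a foreground voxel

# Six face-adjacent neighbor offsets, i.e. 6-connectivity (an assumption).
_NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) between two masks."""
    if not pred and not truth:
        return 1.0  # convention: two empty masks count as perfect agreement
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def lesion_volumes(mask: Set[Voxel],
                   spacing_mm: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[float]:
    """Split a foreground mask into 6-connected lesions; return each volume in mm^3."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    unvisited = set(mask)
    volumes = []
    while unvisited:
        # Breadth-first flood fill starting from an arbitrary remaining voxel.
        queue = deque([unvisited.pop()])
        size = 1
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in _NEIGHBORS:
                nb = (z + dz, y + dy, x + dx)
                if nb in unvisited:
                    unvisited.remove(nb)
                    queue.append(nb)
                    size += 1
        volumes.append(size * voxel_mm3)
    return volumes


def mean_lesion_volume(mask: Set[Voxel],
                       spacing_mm: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Average lesion volume in mm^3 (0.0 if the mask is empty)."""
    vols = lesion_volumes(mask, spacing_mm)
    return sum(vols) / len(vols) if vols else 0.0


def nearest_patch_edge(mean_volume_mm3: float, candidates: Iterable[int],
                       voxel_edge_mm: float = 1.0) -> int:
    """HYPOTHETICAL heuristic, not the paper's rule: choose the candidate patch
    edge (in voxels) closest to the edge of a cube whose volume equals the mean
    lesion volume. Assumes isotropic voxels of side voxel_edge_mm."""
    target_edge_vox = mean_volume_mm3 ** (1.0 / 3.0) / voxel_edge_mm
    return min(candidates, key=lambda p: abs(p - target_edge_vox))


if __name__ == "__main__":
    truth = {(0, 0, x) for x in range(4)}
    pred = {(0, 0, x) for x in range(1, 5)}
    print(dice(pred, truth))  # 0.75

    # One 2x2x2 lesion plus one single-voxel lesion.
    cube = {(z, y, x) for z in range(2) for y in range(2) for x in range(2)}
    mask = cube | {(10, 10, 10)}
    print(sorted(lesion_volumes(mask)))  # [1.0, 8.0]
    print(mean_lesion_volume(mask))  # 4.5
```

The connected-component step is a plain breadth-first flood fill so the sketch runs without third-party packages; for full CT volumes, a library routine such as `scipy.ndimage.label` on a NumPy array would be much faster.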

Acknowledgement

This work was funded in part by the National Institutes of Health (grant R01CA233888).

Author information

Authors and Affiliations

School of Computing, Queen’s University, Kingston, ON, Canada
Ramtin Mojtahedi, Mohammad Hamghalam & Amber L. Simpson
Department of Electrical Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
Mohammad Hamghalam
Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Richard K. G. Do
Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, ON, Canada
Amber L. Simpson

Corresponding author

Correspondence to Amber L. Simpson.

Editor information

Editors and Affiliations

Massachusetts General Hospital, Boston, MA, USA
Xiang Li
University of Sydney, Sydney, Australia
Jinglei Lv
Vanderbilt University, Nashville, TN, USA
Yuankai Huo
Peking University, Beijing, China
Bin Dong
University of Southern California, Los Angeles, CA, USA
Richard M. Leahy
Massachusetts General Hospital, Boston, MA, USA
Quanzheng Li

About this paper

Cite this paper

Mojtahedi, R., Hamghalam, M., Do, R.K.G., Simpson, A.L. (2022). Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation. In: Li, X., Lv, J., Huo, Y., Dong, B., Leahy, R.M., Li, Q. (eds) Multiscale Multimodal Medical Imaging. MMMI 2022. Lecture Notes in Computer Science, vol 13594. Springer, Cham. https://doi.org/10.1007/978-3-031-18814-5_11

DOI: https://doi.org/10.1007/978-3-031-18814-5_11
Published: 12 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18813-8
Online ISBN: 978-3-031-18814-5
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society
