Abstract
Detecting tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models built on fully convolutional neural networks (FCNNs) have become the dominant approach for segmenting 3D computed tomography (CT) scans. However, because their convolution layers have a limited kernel size, they cannot capture long-range dependencies and global context. Vision transformers have been introduced to overcome the locality of FCNNs' receptive fields. Although transformers can capture long-range features, their segmentation performance degrades when tumor sizes vary, because the model is sensitive to the input patch size. Finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, but it is a time-consuming and challenging procedure. This paper proposes a technique for selecting the optimal input patch size of a vision transformer for multi-resolution images based on the average volume of the metastatic lesions. We further validated the proposed framework using transfer learning, demonstrating that the highest Dice similarity coefficient (DSC) was obtained by pre-training on data with a larger tumor volume using the suggested ideal patch size and then training on data with a smaller tumor volume. We experimentally evaluated this idea by pre-training our model on a multi-resolution public dataset. Our model showed consistent and improved results when applied to our private multi-resolution mCRC dataset, which has a smaller average tumor volume. This study lays the groundwork for optimizing semantic segmentation of small objects using vision transformers. The implementation source code is available at: https://github.com/Ramtin-Mojtahedi/OVTPS.
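To make two quantities from the abstract concrete, the sketch below computes the Dice similarity coefficient for a pair of binary masks and counts how many non-overlapping cubic patches (tokens) a vision transformer receives for a given patch size. It is an illustration only and is not taken from the authors' repository: the function names `dice_coefficient` and `num_patches`, the 96 × 96 × 96 input volume, and the convention that two empty masks score 1.0 are all assumptions made for this example. The abstract does not state the paper's rule for mapping average lesion volume to a patch size, so that rule is not reproduced here.

```python
from typing import Sequence, Tuple


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    DSC = 2 * |pred AND truth| / (|pred| + |truth|)
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def num_patches(volume_shape: Tuple[int, int, int], patch_size: int) -> int:
    """Number of non-overlapping cubic patches in a 3D volume."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    if any(dim % patch_size for dim in volume_shape):
        raise ValueError("each dimension must be divisible by patch_size")
    d, h, w = volume_shape
    return (d // patch_size) * (h // patch_size) * (w // patch_size)


if __name__ == "__main__":
    # Toy masks: 3 predicted voxels, 3 true voxels, 2 of them shared.
    pred = [0, 1, 1, 1, 0, 0]
    truth = [0, 1, 1, 0, 1, 0]
    print(f"DSC = {dice_coefficient(pred, truth):.4f}")

    # Hypothetical 96^3 input: smaller patches give many more tokens.
    for p in (8, 16, 32):
        print(f"patch {p:>2}: {num_patches((96, 96, 96), p)} tokens")
```

Halving the patch edge multiplies the token count by eight in 3D, which is one reason an exhaustive search over patch sizes is costly and a volume-based selection rule is attractive.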
Acknowledgement
This work was funded in part by the National Institutes of Health (R01CA233888).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mojtahedi, R., Hamghalam, M., Do, R.K.G., Simpson, A.L. (2022). Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation. In: Li, X., Lv, J., Huo, Y., Dong, B., Leahy, R.M., Li, Q. (eds) Multiscale Multimodal Medical Imaging. MMMI 2022. Lecture Notes in Computer Science, vol 13594. Springer, Cham. https://doi.org/10.1007/978-3-031-18814-5_11
DOI: https://doi.org/10.1007/978-3-031-18814-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18813-8
Online ISBN: 978-3-031-18814-5
eBook Packages: Computer Science, Computer Science (R0)