Introduction

Intestinal obstruction [1,2,3] is a serious disease often resulting from tumors and intestinal twisting. Computed tomography (CT) is a powerful technology offering detailed intestinal information, enabling clinicians to diagnose diseases by reviewing CT volumes. However, the process is time-consuming, given the hundreds of slices in a CT volume. Intestine segmentation helps clinicians diagnose intestinal diseases and facilitates the development of treatment plans.

Fig. 1

The flowchart of our method. For training, in step 1, we train a 2D Swin U-Net using labeled slices and then use the trained model to generate pseudo-labels for unlabeled data. In step 2, cropped patches from both labeled and unlabeled data are used to train the 3D U-Net. For testing, we crop patches from the testing dataset and employ the trained 3D U-Net to infer these patches. Finally, we merge the inferred patches to form the model’s output

The intestine’s complex structure and its contact with neighboring organs pose challenges for intestine segmentation. Currently, there are some thresholding-based methods [4,5,6] for organ segmentation, which mainly utilize image intensity. Fully supervised learning [7,8,9,10] has also been applied to intestine segmentation. An obvious drawback of fully supervised methods is that they require a substantial amount of pixel-level labeled data to achieve satisfactory results. However, labeling medical images is time-consuming because clinicians must annotate them slice by slice.

To address the limited labeled data problem, semi-supervised learning [11] has captured researchers’ attention in organ segmentation. Pseudo-labeling [12, 13] and consistency learning [14, 15] are primary strategies in semi-supervised learning. We introduce these strategies to intestine segmentation. The proposed method utilizes a 2D transformer to generate pseudo-labels for unlabeled data, and then a 3D convolutional neural network (CNN) is trained using the limited labeled data and ample unlabeled data with pseudo-labels. The 2D Swin U-Net [16] is built on the vision transformer; its self-attention mechanism captures long-range dependencies and enhances global contextual information, improving the segmentation of complex structures in medical images. The 3D U-Net [17] is a classical network for medical image segmentation that can effectively exploit both intra-slice and inter-slice features.

Qin et al. [18] employed bidirectional teaching with two improved 3D U-Nets generating pseudo-labels for intestine segmentation. However, those pseudo-labels are unreliable because the networks perform poorly in the early stage of training, having been trained with only limited labeled data. In contrast to that method [18], we train a 2D Swin U-Net on large-scale 2D slices extracted from 3D CT volumes to generate pseudo-labels, avoiding unreliable pseudo-labels in the early stage of training, and we leverage consistency learning between the transformer and the CNN.

Our method trains a two-stage network and combines it with multi-dimensional consistency learning to segment intestines from CT volumes. The contributions of this paper are summarized as follows:

  • We propose a novel two-stage network. Large-scale labeled slices are used to train a 2D Swin U-Net for generating pseudo-labels, avoiding the unreliable pseudo-labels produced by 3D networks trained with limited labeled data, and a 3D U-Net is then trained using both labeled and unlabeled data, so that inter-slice features, which a purely 2D network would neglect, are still exploited.

  • We use multi-dimensional consistency learning as a new semi-supervision strategy, which not only utilizes unlabeled data effectively through pseudo-labels but also improves the model’s robustness by enforcing consistency between the segmentation results of the 3D U-Net and the pseudo-labels from the 2D Swin U-Net.

Method

Overview

Our method segments the intestine from CT volumes by training two networks in two steps. In step 1, we utilize labeled slices to train the 2D Swin U-Net [16]. In step 2, we employ a limited amount of labeled data and large-scale unlabeled data to train the 3D U-Net [17]. For the labeled data, we use a supervised loss function to update the model’s parameters. For the unlabeled data, the trained 2D Swin U-Net is first used to generate pseudo-labels; then, an unsupervised loss function keeps the predictions of the 3D U-Net on unlabeled data consistent with the corresponding pseudo-labels from the 2D Swin U-Net. For testing, we use the trained 3D U-Net to infer the patches cropped from the testing data and merge the patches back into CT volumes as the final output. The flowchart of our method is shown in Fig. 1.

Fig. 2

Structure of our method. Step 1 contains training of a 2D Swin U-Net and then using the trained 2D Swin U-Net to generate pseudo-labels for unlabeled data. Step 2 contains training of a 3D U-Net with the labeled and unlabeled data

Fig. 3

Slice extraction from patches and the process of generating pseudo-labels. We extract axial slices from the patch and infer them with the 2D Swin U-Net. Then, the pseudo-label for the patch is obtained by merging these slices. For example, one patch of size 256\(\times \)256\(\times \)16 can be divided into 16 slices of size 256\(\times \)256

Fig. 4

The process of calculating the loss value. a and b show the loss calculation when training the 2D Swin U-Net and the 3D U-Net, respectively

Two-step network with multi-dimensional consistency learning

Two-step network

To improve the accuracy of intestine segmentation, we have developed a novel multi-dimensional consistency learning approach. In general, segmentation networks require an ample amount of labeled data to achieve good performance. If only limited labeled data are used to train the network that generates pseudo-labels, the resulting network may produce low-quality pseudo-labels due to its poor performance. A CT volume is a 3D image containing many 2D slices. Therefore, the proposed method utilizes 2D CT slices in the first step and 3D patches in the second step. The structure of the two-step network is shown in Fig. 2.

The two-step network contains two networks: the 2D Swin U-Net \(\left( f^{s}(\cdot ) \right) \) and the 3D U-Net \(\left( f^{c}(\cdot ) \right) \). The 2D Swin U-Net is the first symmetrical U-shaped network based on the transformer, implementing self-attention in the encoder. The 3D U-Net is a classical medical image segmentation model for organs with relatively simple spatial structures; however, it segments the intestine inadequately due to the intestine’s complex structure and the limited labeled data.

The proposed network uses the slices from labeled data \(({\textbf {D}}^l_{slice})\) and the corresponding ground truth to train the 2D Swin U-Net. The slice extraction operation is shown in Fig. 3. Then, the trained model generates the pseudo-labels for the slices from unlabeled data \(({\textbf {D}}^u_{slice})\)

$$\begin{aligned} {\textbf {{P}}}^{s}_{u} = f^{s}({\textbf {D}}^u_{slice}), \end{aligned}$$
(1)

where \({\textbf {{P}}}^{s}_{u}\) represents the 2D Swin U-Net’s \(\left( f^{s}(\cdot ) \right) \) prediction result for the unlabeled data. Note that the trained 2D Swin U-Net takes slices as input, and we combine its outputs into a patch as the final output. Based on the prediction \({\textbf {{P}}}^{s}_{u}\), the pseudo-labels \(({\textbf {P}}^{*}_{u})\) for the unlabeled data are generated by the argmax operation, which converts probabilities into discrete class labels.
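To make this slice-and-merge operation concrete, the following is a minimal sketch of pseudo-label generation for a single 256 \(\times \) 256 \(\times \) 16 patch; `model2d` is a placeholder convolution standing in for the trained 2D Swin U-Net, not the actual network.

```python
import torch
import torch.nn as nn

# Placeholder for the trained 2D Swin U-Net: any 2D network mapping a
# (B, 1, H, W) slice to (B, num_classes, H, W) logits fits this role.
num_classes = 2
model2d = nn.Conv2d(1, num_classes, kernel_size=3, padding=1)

def pseudo_label_for_patch(patch: torch.Tensor) -> torch.Tensor:
    """patch: (H, W, D) intensities, e.g. 256 x 256 x 16.
    Returns an (H, W, D) tensor of discrete class labels (the pseudo-label P*_u)."""
    model2d.eval()
    labels = []
    with torch.no_grad():
        for d in range(patch.shape[-1]):                  # iterate over axial slices
            sl = patch[..., d].unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
            logits = model2d(sl)                          # (1, num_classes, H, W)
            labels.append(logits.argmax(dim=1)[0])        # argmax -> (H, W) class map
    return torch.stack(labels, dim=-1)                    # merge slices back into a patch

# Usage with a random patch, purely to exercise the shapes.
patch = torch.randn(256, 256, 16)
print(pseudo_label_for_patch(patch).shape)  # torch.Size([256, 256, 16])
```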

For the 3D U-Net, we directly use patches from labeled and unlabeled data as the input for training. The predictions of the 3D U-Net for the labeled and unlabeled data, \({\textbf {{P}}}^{c}_{l}\) and \({\textbf {{P}}}^{c}_{u}\), are represented by

$$\begin{aligned} {\textbf {{P}}}^{c}_{l} = f^{c}({\textbf {D}}^l_{patch}),~ {\textbf {{P}}}^{c}_{u} = f^{c}({\textbf {D}}^u_{patch}), \end{aligned}$$
(2)

where \({\textbf {D}}^l_{patch}\) and \(~{\textbf {D}}^u_{patch}\) represent the 3D patches cropped from labeled and unlabeled data, respectively.

In multi-dimensional consistency learning, the two networks collaborate to enable the model to leverage the strengths of two different architectures, effectively improving the model’s learning ability and achieving better segmentation performance.

Multi-dimensional consistency learning

In the proposed method, the unsupervised loss is calculated using the predictions from 3D U-Net and the pseudo-labels from 2D Swin U-Net. Multi-dimensional consistency learning is used to maintain consistency between them. The process is represented by the green dashed lines in Fig. 2.

Loss function

The proposed method involves training two networks, each with a different loss function. The 2D Swin U-Net is trained using a supervised loss function, while the 3D U-Net is trained using both supervised and unsupervised loss functions. An overview of the loss calculation is shown in Fig. 4.

We use only the supervised loss \(L_{sup}\) to train the 2D Swin U-Net. The supervised loss consists of the cross-entropy (CE) loss \(L_{ce}\) and the Dice loss \(L_{dice}\)

$$\begin{aligned} L_{sup} ({\textbf {{P}}}^{s}_{l},{\textbf {G}}) = \alpha L_{ce}({\textbf {{P}}}^{s}_{l},{\textbf {G}})+ (1-\alpha ) L_{dice}({\textbf {{P}}}^{s}_{l},{\textbf {G}}), \end{aligned}$$
(3)

where \({\textbf {{P}}}^{s}_{l}\) denotes the 2D Swin U-Net’s prediction result, and \({\textbf {{G}}}\) denotes the ground truth. We experimentally set the weight \(\alpha \) to 0.3.
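For illustration, a minimal PyTorch sketch of this supervised loss is given below. The soft Dice formulation over softmax probabilities is our assumption, since the exact Dice variant is not specified here.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """logits: (B, C, ...) raw scores; target: (B, ...) integer class labels."""
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes=logits.shape[1]).movedim(-1, 1).float()  # (B, C, ...)
    dims = tuple(range(2, logits.dim()))                   # spatial dimensions
    intersection = (probs * onehot).sum(dims)
    cardinality = probs.sum(dims) + onehot.sum(dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()

def supervised_loss(logits: torch.Tensor, target: torch.Tensor, alpha: float = 0.3) -> torch.Tensor:
    """L_sup = alpha * L_ce + (1 - alpha) * L_dice, with alpha = 0.3 as in Eq. (3)."""
    return alpha * F.cross_entropy(logits, target) + (1.0 - alpha) * soft_dice_loss(logits, target)

# Usage with dummy 3D logits (B, C, H, W, D) and labels (B, H, W, D).
logits = torch.randn(2, 2, 64, 64, 16)
labels = torch.randint(0, 2, (2, 64, 64, 16))
print(supervised_loss(logits, labels).item())
```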

To train the 3D U-Net, we use the supervised loss \(L_{sup}\) for labeled data and the unsupervised loss \(L_{un}\) for unlabeled data. The supervised loss is the same as that used for training the 2D Swin U-Net. We use only the Dice loss as the unsupervised loss for the unlabeled data to avoid an unstable training process caused by severe class imbalance.

$$\begin{aligned} L_{sup} ({\textbf {{P}}}^{c}_{l},{\textbf {G}}) = \alpha L_{ce}({\textbf {{P}}}^{c}_{l},{\textbf {G}})+ (1-\alpha ) L_{dice}({\textbf {{P}}}^{c}_{l},{\textbf {G}}), \end{aligned}$$
(4)
$$\begin{aligned} L_{un}({\textbf {{P}}}^{c}_{u},{\textbf {{P}}}^{*}_{u}) = L_{dice}({\textbf {{P}}}^{c}_{u},{\textbf {{P}}}^{*}_{u}), \end{aligned}$$
(5)

where \({\textbf {{P}}}^{c}_{l}\) and \({\textbf {{P}}}^{c}_{u}\) represent the 3D U-Net’s prediction results for labeled and unlabeled data, and \({\textbf {{P}}}^{*}_{u}\) represents the pseudo-labels obtained from the 2D Swin U-Net for the unlabeled data. The total loss for the 3D U-Net is defined as

$$\begin{aligned} L_{total} \left( {\textbf {{P}}}^{c}_{l},{\textbf {G}}, {\textbf {{P}}}^{c}_{u},{\textbf {{P}}}^{*}_{u} \right) = L_{sup} \left( {\textbf {{P}}}^{c}_{l},{\textbf {G}} \right) + L_{un} \left( {\textbf {{P}}}^{c}_{u},{\textbf {{P}}}^{*}_{u} \right) . \end{aligned}$$
(6)
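Reusing `soft_dice_loss` and `supervised_loss` from the previous sketch, the following hedged example shows how one training step of the 3D U-Net could combine Eqs. (4)–(6); `model3d` is a stand-in convolution and the tensors are random placeholders, not data from the paper.

```python
import torch
import torch.nn as nn

# Stand-in for the 3D U-Net: maps (B, 1, H, W, D) patches to (B, C, H, W, D) logits.
# soft_dice_loss and supervised_loss are defined in the previous sketch.
model3d = nn.Conv3d(1, 2, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model3d.parameters(), lr=0.01, momentum=0.9)

# Dummy batch: a labeled patch with ground truth and an unlabeled patch with its pseudo-label.
labeled_patch = torch.randn(1, 1, 64, 64, 16)
ground_truth = torch.randint(0, 2, (1, 64, 64, 16))
unlabeled_patch = torch.randn(1, 1, 64, 64, 16)
pseudo_label = torch.randint(0, 2, (1, 64, 64, 16))   # P*_u from the 2D Swin U-Net

logits_l = model3d(labeled_patch)                     # P^c_l
logits_u = model3d(unlabeled_patch)                   # P^c_u

loss_sup = supervised_loss(logits_l, ground_truth)    # Eq. (4)
loss_un = soft_dice_loss(logits_u, pseudo_label)      # Eq. (5)
loss_total = loss_sup + loss_un                       # Eq. (6)

optimizer.zero_grad()
loss_total.backward()
optimizer.step()
```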

Experiments and results

Dataset and experimental setup

We used an intestine dataset consisting of 171 cases of ileus patients’ CT volumes with sizes of 512 \(\times \) 512 \(\times \) (198–546) voxels and resolutions of (0.549–0.904 mm/voxel) \(\times \) (0.549–0.904 mm/voxel) \(\times \) (1.0–2.0 mm/voxel). These CT volumes were interpolated to an isotropic voxel resolution of \(\hbox {1 mm}^3\)/voxel. The interpolated volume sizes were (281\(\times \)281)–(463\(\times \)463) \(\times \) (396–762) voxels. The training dataset with 85 CT volumes includes 13 densely labeled cases and 72 unlabeled cases. 27 sparsely labeled CT volumes were used for validation. The testing dataset with 59 CT volumes includes 58 sparsely labeled cases and one densely labeled case used for 3D visualization of a result. CT volumes that have labels of the intestine in some discontinuous slices are called sparsely labeled data; for one sparsely labeled case, the percentage of labeled slices per CT volume ranges from 1.00% to 5.31%, and the number of labeled slices ranges from 6 to 29. CT volumes that have labels of the intestine in hundreds of continuous slices, but not in every slice, are called densely labeled data; for one densely labeled case, the percentage of labeled slices per CT volume ranges from 35.73% to 64.43%, and the number of labeled slices ranges from 154 to 319.
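As a hedged illustration of the isotropic interpolation, one common implementation resamples each volume with `scipy.ndimage.zoom` using the original per-axis spacing; the spacing values in the usage example are placeholders within the ranges reported above, and label volumes would use nearest-neighbor interpolation (order 0) instead.

```python
import numpy as np
from scipy.ndimage import zoom

def resample_to_isotropic(volume: np.ndarray, spacing_mm, order: int = 1) -> np.ndarray:
    """Resample a (H, W, D) CT volume to 1 mm^3 voxels.
    spacing_mm: original (x, y, z) voxel spacing in millimeters."""
    zoom_factors = tuple(s / 1.0 for s in spacing_mm)  # target spacing is 1 mm per axis
    return zoom(volume, zoom_factors, order=order)      # order=1: linear interpolation

# Example: a 512 x 512 x 300 volume with 0.7 x 0.7 x 1.5 mm spacing (placeholder values).
vol = np.zeros((512, 512, 300), dtype=np.float32)
iso = resample_to_isotropic(vol, spacing_mm=(0.7, 0.7, 1.5))
print(iso.shape)  # roughly (358, 358, 450)
```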

For training, we utilized a sliding window of size 256 \(\times \) 256 \(\times \) 16 with a stride of 128 \(\times \) 128 \(\times \) 8 to crop patches from the training dataset after the isotropic interpolation. We divided labeled patches (cropped from labeled data) into slices and applied flipping as data augmentation to generate training data for the 2D Swin U-Net. Labeled patches and unlabeled patches (cropped from unlabeled data) were used for training the 3D U-Net, with flipping and cut-out applied as data augmentation. We quantitatively evaluated the segmentation results using three metrics: the Dice, recall, and precision rates.
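The sliding-window cropping can be sketched as follows; clamping the last window to the volume border is our assumption and may differ from the authors' implementation.

```python
import numpy as np

def _starts(length: int, window: int, step: int):
    """Start indices of a 1D sliding window, keeping the last window flush with the border."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, step))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts

def crop_patches(volume: np.ndarray, size=(256, 256, 16), stride=(128, 128, 8)):
    """Yield (patch, origin) pairs from an (H, W, D) volume with a sliding window."""
    H, W, D = volume.shape
    for x in _starts(H, size[0], stride[0]):
        for y in _starts(W, size[1], stride[1]):
            for z in _starts(D, size[2], stride[2]):
                yield volume[x:x + size[0], y:y + size[1], z:z + size[2]], (x, y, z)

# Usage: count the patches cropped from a 320 x 320 x 400 dummy volume.
vol = np.zeros((320, 320, 400), dtype=np.float32)
print(sum(1 for _ in crop_patches(vol)))  # 2 * 2 * 49 = 196 patches
```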

We conducted a series of experiments to validate the performance of our method, including a comparison with previous methods (Ex 1), an ablation study of the supervised loss (Ex 2), an experiment varying the parameter of the supervised loss (Ex 3), and an ablation study on selecting the first- and second-step models (Ex 4). All experiments were repeated three times with different random seeds for training to demonstrate the robustness of our model and show that it performs well under different initializations. The average over the three runs was taken as the final result for each testing case, and we calculated the average and standard deviation (SD) of these final results over all 59 testing cases.

The p value from the Wilcoxon signed-rank test on the Dice score was calculated to verify the validity of our method. For the sparsely labeled data, these metrics were calculated only on the labeled slices.
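For reference, the paired Wilcoxon signed-rank test on per-case Dice scores can be computed with SciPy as in the sketch below; the score arrays are dummy values, not results from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

# Per-case Dice scores (dummy values) for the proposed method and a baseline,
# paired over the same 59 testing cases.
rng = np.random.default_rng(0)
dice_proposed = rng.uniform(0.70, 0.90, size=59)
dice_baseline = dice_proposed - rng.uniform(0.01, 0.05, size=59)

statistic, p_value = wilcoxon(dice_proposed, dice_baseline)
print(f"p = {p_value:.4f}")  # p < 0.05 would indicate a significant difference
```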

The proposed method was implemented in PyTorch and executed on an NVIDIA A100 80 GB GPU. We trained the model for up to 500 epochs and applied early stopping when the best validation Dice score remained unchanged for 30 epochs. The SGD optimizer was employed, and the poly learning-rate strategy was used to adjust the learning rate from an initial value of 0.01.
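The poly learning-rate strategy is commonly implemented as \(lr = lr_{0} \left( 1 - \frac{epoch}{epoch_{max}} \right) ^{power}\); the sketch below assumes power = 0.9 (a typical choice that is not stated above) and pairs it with SGD and the early-stopping rule just described.

```python
import torch
import torch.nn as nn

model = nn.Conv3d(1, 2, kernel_size=3, padding=1)        # stand-in for the 3D U-Net
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

max_epochs, base_lr, power = 500, 0.01, 0.9               # power = 0.9 is an assumption

def poly_lr(epoch: int) -> float:
    """Poly decay: lr = base_lr * (1 - epoch / max_epochs) ** power."""
    return base_lr * (1.0 - epoch / max_epochs) ** power

best_dice, patience, bad_epochs = 0.0, 30, 0
for epoch in range(max_epochs):
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(epoch)                      # adjust the learning rate
    # ... one training epoch and one validation pass would run here ...
    val_dice = 0.0                                        # placeholder validation Dice score
    if val_dice > best_dice:
        best_dice, bad_epochs = val_dice, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:                            # early stopping after 30 stagnant epochs
        break
```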

Results

The quantitative results of Ex 1 are presented in Table 1, showing that the proposed method achieved the best performance, with a Dice score of 81.75% and an SD of 7.65%. We conducted the Wilcoxon signed-rank test when the models were trained using 13 labeled cases, where \(\star \) denotes that the p values were \(<0.05\) among those methods. The segmentation results of Ex 1 are shown in Figs. 5 and 6. The results of training the proposed method using different numbers of labeled cases are shown in Fig. 10. Figure 5 presents the 3D segmentation results, where red, green, and blue represent true positives, false positives, and false negatives, respectively. Since we utilize one densely labeled case to illustrate the 3D result, certain intestine regions lack labels in some slices. However, these methods can segment unlabeled intestine regions, depicted in gray. The 2D segmentation results are shown in Fig. 6. We can see from the zoomed regions in the yellow boxes that the proposed method improved the accuracy around the boundary. Figure 7 shows the distribution of Dice scores for each method on the testing dataset; we calculated the p value when training with 13 labeled cases, and \(\star \) means the p values were \(<0.05\) among those methods.

Fig. 5

3D segmentation results from various methods. a is the ground truth; b–h are the results of different methods. The red, green, blue, and gray regions represent true positives, false positives, false negatives, and unlabeled regions, respectively

The results of Ex 2 are shown in Table 2, revealing that the proposed method with the CE+Dice loss as the supervised loss function achieved the best result. The results of Ex 3 are shown in Fig. 8. We show the change in the Dice score, precision, and recall rates in blue, orange, and green, respectively. We can see that the best results are achieved when \(\alpha =0.3\). Furthermore, the result of our method on three different planes is shown in Fig. 9. The results of Ex 4 are shown in Tables 3 and 4, revealing that the proposed method, which uses the 2D Swin U-Net as the first-step model and the 3D U-Net as the second-step model, achieved the best result in our intestine segmentation task.

Fig. 6

The 2D segmentation results of the different methods are displayed on three planes. The green color indicates false positives, and the blue color denotes false negatives. We can see that most mis-segmentation exists at the boundary part

Fig. 7

Violin plot of Dice score for different methods trained with 6 and 13 labeled cases. \(\star \) denotes the p value based on the Wilcoxon signed-rank test < 0.05. Swin denotes 2D Swin U-Net

Fig. 8

Line chart of qualitative results for different \(\alpha \) in Eqs. (3) and (4). The horizontal axis represents the different parameters in the supervised loss. The vertical axis represents the results, and the blue, orange, and green lines denote Dice, precision, and recall rates, respectively

Fig. 9

The 2D segmentation results of the proposed method are displayed on three planes. The red, green, and blue colors indicate true positives, false positives, and false negatives, respectively. Yellow boxes show zoomed images of the intestines

Fig. 10

Bar chart of Dice score when there are different numbers of labeled cases in the training dataset to train 2D Swin U-Net and the proposed method. \(\star \) denotes the p value based on the Wilcoxon signed-rank test < 0.05

Table 1 We compared the quantitative results of our proposed method with previous methods, including two fully supervised methods (3D U-Net, 2D Swin U-Net) and three semi-supervised methods (EM, MT, and CPS)
Table 2 To validate the effectiveness of the loss function, we used different loss functions in the proposed method

Discussion

Our proposed method introduces multi-dimensional consistency learning for intestine segmentation. First, the 2D Swin U-Net was trained to generate pseudo-labels for unlabeled data, addressing the limited labeled data problem. Subsequently, we used the limited labeled data and large-scale unlabeled data to train the 3D U-Net. For the unlabeled data, an unsupervised loss maintains consistency between the pseudo-labels from the 2D Swin U-Net and the 3D U-Net's predictions. A series of experiments showed that our proposed method achieved competitive results.

The 3D and 2D segmentation results in Figs. 5 and 6 show that our method segmented more intestine regions. Because the proposed method exploits unlabeled data through pseudo-labeling, consistency learning can effectively improve the segmentation results by reducing the effect of limited labeled data.

Table 1 indicates that the proposed method exhibits stable and competitive performance, characterized by a high Dice score and a low SD value. The 2D Swin U-Net showed higher quantitative results than the 3D U-Net, indicating that the 2D method outperformed the 3D network when trained with limited labeled data. The 3D U-Net had the lowest Dice score because it was trained using only 13 labeled CT volumes, leading to underfitting, while the 2D Swin U-Net was trained using 3144 slices from those 13 CT volumes. Trained on sufficient data, the 2D Swin U-Net generated more reliable pseudo-labels. Then, limited labeled data and large-scale unlabeled data with reliable pseudo-labels were used to train the 3D U-Net, which exploits the advantages of the two architectures and improves the network's performance. The bar chart in Fig. 10 shows the Dice scores when the 2D Swin U-Net and the proposed method were trained using different numbers of labeled cases in the training dataset; although our method only slightly outperforms the 2D Swin U-Net as the amount of labeled data increases, the result highlights our method's suitability for tasks with few labeled cases. We also calculated the p value between the two methods based on the Wilcoxon signed-rank test, and it was < 0.05. Notably, our approach outperforms stand-alone 2D Swin U-Net and 3D U-Net models, underscoring the benefits of the extra dimension and pseudo-labels in enhancing model performance. Additionally, we compared our method with three classical semi-supervised methods (EM [19], MT [20], and CPS [13]), all using the 3D U-Net as their backbone.

EM makes the model more confident by reducing uncertainty in the predicted class probabilities, encouraging definitive outputs. MT guides a student model with a teacher model to ensure consistent learning from labeled and unlabeled data. CPS trains two models together, each generating pseudo-labels for the other, leveraging consistency in predictions on unlabeled data. Our proposed method achieved the best results compared with them. In Fig. 7, \(\star \) means the p value was \(<0.05\) when the models were trained using 13 labeled cases, which indicates the validity of the proposed method.

In Table 2, the ablation study of the loss function shows that the combination of the CE and Dice losses as the supervised loss achieved the best result, which combines the benefits of each loss function. In Fig. 8, we explored the effect of the parameter in the supervised loss; the best result was obtained with \(\alpha = 0.3\). The CE loss assigns higher likelihoods to the correct class, and the Dice loss evaluates both false positives and false negatives in the segmentation results. Combining them into one loss function and experimentally setting an appropriate ratio between them was conducive to improving segmentation accuracy.

We propose a two-step semi-supervised method based on two frameworks, the transformer and the CNN. In our method, the first-step model is trained on labeled slices and generates pseudo-labels. Therefore, accuracy should be the primary concern. We chose three 2D transformer-based models (2D Swin U-Net, TransUNet, and UTNet) as candidates and trained them using 3144 labeled slices. The results in Table 3 show that the 2D Swin U-Net achieved the best Dice score and has a relatively small model size. Although UTNet is the lightest model, it has the worst accuracy. TransUNet is the largest model but not the most accurate. Therefore, the 2D Swin U-Net is the best model for the first step.

Table 3 Ablation study of different models as the first step
Table 4 Ablation study of different models as the second step

For the second step, we selected three 3D transformer-based models (3D Swin U-Net, Swin UNETR, and UNETR), two 2D models (2D Swin U-Net and 2D U-Net), and the 3D U-Net. We compared the accuracy and size of the models to select the best one. Table 4 shows that the best performance is achieved using the 3D U-Net as the second-step model. We argue that the other three 3D models have complex structures and require more labeled data to perform well in fully supervised learning tasks. In our approach, the second-step network is trained with a small amount of labeled data and unlabeled data with pseudo-labels, a situation that does not take good advantage of these networks. Therefore, the 3D U-Net, with its simple structure, is more suitable as the second-step model. For the 2D models as the second step, when the 2D Swin U-Net is used as the second-step model, the Dice score even decreases slightly compared with using the 2D Swin U-Net alone. Although the 2D U-Net model is lightweight, it achieved low accuracy. Therefore, using 2D models as the second step is insufficient compared with the proposed method for the intestine segmentation task.

In Fig. 9, we can see that some mis-segmentation still exists at the boundary, which may be caused by the intestine contacting neighboring organs there. A fine-tuning strategy may solve this problem.

Conclusion

We propose multi-dimensional consistency learning between the 2D Swin U-Net and the 3D U-Net to segment the intestine from CT volumes. The limited amount of labeled data, the intestine's complex structure, and its contact with neighboring organs are great challenges for intestine segmentation. We design a two-stage network: first, we train a 2D Swin U-Net to generate pseudo-labels for unlabeled data, reducing the effect of the limited labeled data; second, labeled and unlabeled data are used to train a 3D U-Net. The experimental results demonstrated good performance.

In the comparison experiments, our method achieved the best performance in intestine segmentation. Although the proposed method has achieved promising results, there is still some mis-segmentation at the boundary. In the future, we will focus on reducing this mis-segmentation by using a fine-tuning strategy.