Abstract
Medical image analysis tasks are characterized by high noise, volumetric data, and multiple modalities, posing challenges for models that attempt to learn robust features from the input images. Over the last decade, deep neural networks (DNNs) have achieved enormous success in medical image analysis tasks, which can be attributed to their powerful feature representation capability. Despite the promising results reported in the literature, DNNs are also criticized for several pivotal limitations, one of which is the lack of safety. Safety plays an important role in the application of DNNs in clinical practice, helping the model defend against potential attacks and preventing silent prediction failures. The recently proposed neural ordinary differential equation (NODE), a continuous model bridging the gap between DNNs and ODEs, provides a significant advantage in ensuring the model’s safety. Among the variants of NODE, the neural memory ordinary differential equation (nmODE) theoretically possesses a global attractor, exhibiting superiority in improving the model’s performance and robustness in applications. While NODE and its variants have been widely used in medical image analysis tasks, a comprehensive review of their applications is lacking, hindering an in-depth understanding of NODE’s working principle and its potential applications. To mitigate this limitation, this paper thoroughly reviews the literature on the applications of NODE in medical image analysis from the following five aspects: segmentation, reconstruction, registration, disease prediction, and data generation. We also summarize both the strengths and downsides of the applications of NODE, followed by possible research directions. To the best of our knowledge, this is the first review of the applications of NODE in the field of medical image analysis. We hope this review can draw researchers’ attention to the great potential of NODE and its variants in medical image analysis.
1 Introduction
Automated medical image analysis with fast, accurate, and insightful results holds substantial significance for clinicians in practice. These analysis results have a wide range of clinical applications, including disease recognition (Hu et al. 2018), quantitative analysis (Hu et al. 2019), treatment planning (Tian et al. 2022), and prognostic prediction (Zhou et al. 2022). Numerous computer-aided diagnosis (CAD) systems have been proposed to assist clinicians in accomplishing medical image analysis tasks. Conventional methods used in these analysis tasks are mainly based on handcrafted low-level features (e.g., SIFT or HOG) that have limited representation capability, resulting in unsatisfactory performance.
During the last decade, the thriving of deep neural networks (DNNs) has changed the paradigm of image analysis from low-level hand-crafted features to high-level learning-based ones. DNNs have achieved enormous success in numerous applications (LeCun et al. 2015), including object recognition (Krizhevsky et al. 2012), natural language processing (NLP) (Bahdanau et al. 2014), and speech recognition (Graves et al. 2013). The essence of DNNs is the implementation of a highly non-linear function F through layer-wise transformations. F accomplishes the mapping \(x \rightarrow y\) with learnable parameters \(W^{l}\), i.e., \(y=F(x;W^l)\), where \(l \in \{1,2,...,L\}\). One of the simplest connection schemes is the fully connected (FC) layer, where each neuron between two layers is connected through a connection weight. The computational principle of the FC layer can be formulated as \(a^{l+1}=f( W^{l}a^{l})\), with \(W^{l} \in {\mathbb {R}}^{n_{l+1} \times n_{l}}\). The symbol \(a^l\) denotes the feature representation of layer l with respect to the input, and f can be composed of a non-linear activation function and a normalization operator (Ioffe and Szegedy 2015).
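As a minimal numerical illustration of the FC computation \(a^{l+1}=f(W^{l}a^{l})\), the following sketch (with hypothetical layer sizes and ReLU chosen as f) performs one forward pass:

```python
import numpy as np

def fc_forward(a_l, W_l):
    """One fully connected layer: a^{l+1} = f(W^l a^l), with f chosen as ReLU."""
    z = W_l @ a_l                 # linear transform, shape (n_{l+1},)
    return np.maximum(z, 0.0)     # element-wise non-linear activation

# Hypothetical sizes: n_l = 4 input features, n_{l+1} = 3 output neurons.
rng = np.random.default_rng(0)
a_l = rng.normal(size=4)
W_l = rng.normal(size=(3, 4))
a_next = fc_forward(a_l, W_l)     # feature representation of the next layer
```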
The FC layer establishes fundamental interconnections between two layers, inspiring diverse novel connection schemes, including the convolutional (LeCun et al. 1998), recurrent (Williams and Zipser 1989), and self-attention (Vaswani et al. 2017) approaches. The shared local connections inherent in the convolutional layer extract local features efficiently, making it an ideal choice for image-related tasks (Krizhevsky et al. 2012). Besides the convolutional layer, the deliberately designed recurrent connection (Hochreiter and Schmidhuber 1997) can describe long-term relationships among time steps, which is appropriate for sequence-related tasks. Moreover, the self-attention mechanism embedded in the transformer architecture (Vaswani et al. 2017) shows superiority in modeling contextual information, breaking state-of-the-art records in various tasks. First applied in NLP-related tasks and later leveraged in image-related analysis with the proposal of the vision transformer (ViT) (Dosovitskiy et al. 2020), the transformer has unified the analysis tasks of different modalities.
While various novel DNNs have been proposed to address these analysis tasks, DNNs are criticized for their weak safety, poor generalization, and lack of explainability. Regarding safety, studies have reported that DNNs are vulnerable to adversarial examples. In other words, a small perturbation to the input can lead to a false output with high probability (Goodfellow et al. 2014). Formally, let x be a given sample and y its prediction generated by the DNN, and let \(y^{\prime }\) denote the prediction for a perturbed sample \(x^{\prime }\). The model is considered safe if, for every \(\epsilon > 0\), there exists \(\delta > 0\) such that \(\left\| x - x^{\prime } \right\| < \delta\) implies \(\left\| y - y^{\prime }\right\| < \epsilon\). Nevertheless, the literature has observed that current layer-wise DNNs are unsafe (Zheng et al. 2016). This lack of safety significantly increases the deployment risk of DNNs, despite the promising performance reported on in-house test datasets. Regarding generalization, DNNs often fail on out-of-distribution (OOD) data, which is attributed to the independent and identically distributed (IID) assumption of the datasets used in the training and test phases. The limited generalization ability of DNNs may restrict their practical application in medical image analysis tasks, as input data may originate from different centers, vendors, and diseases (Campello et al. 2021). Regarding explainability, DNNs have long been criticized for their inherent black-box computational principle. It is difficult to explain the knowledge contained in DNNs during their internal layer-wise forward computation. Limitations in these three aspects impede the performance of DNNs in real-world scenarios.
Fortunately, the neural ordinary differential equation (NODE) (Chen et al. 2018), a specialized model that describes DNNs from the viewpoint of dynamical systems, provides a potential solution to the above limitations. The NODE bridges the gap between DNNs and ODEs, changing the paradigm of DNNs from a discrete, limited number of layers to a continuous, unlimited one. Unlike well-known DNNs with explicit architecture, the NODE implicitly maps the input to the output. The NODE has been observed to have higher non-linearity, clearer dynamical behavior, and stronger fitting capacity compared with discrete models. Regarding safety, the NODE evolves the features along the trajectory embedded in the ODE, alleviating the impact of noise or attacks contained in the input. Figure 1 provides an example of the trajectory of a disrupted sample in the NODE. For the disrupted sample \(x + \delta\), a vanilla DNN may output a false prediction. However, the NODE can evolve the input’s representation along the trajectory defined in the dynamical system, resulting in a correct representation. Regarding generalization, the advantage of NODE in safety also contributes to improving the generalization capability of DNNs. It has been reported that DNNs embedded with NODE have better robustness than conventional discrete DNNs (Anumasa and Srijith 2021). Regarding explainability, a variant of NODE termed the neural memory ordinary differential equation (nmODE) shows that the ODE offers explainable internal representations. By visualizing the internal representation of the ODE, Yi (2023) shows that samples belonging to the same class display high similarity, indicating that the status of a sample in the ODE is explainable.
Recognizing the strength of NODE mentioned above, literature has leveraged NODE and its variants to tackle the medical image analysis tasks, which are characterized by high-noise, volumetric, and multi-modality properties. Numerous encouraging results have been reported in the literature. However, there is a lack of a comprehensive review of these applications, impeding an in-depth understanding of NODE’s working principles and potential applications. To address this limitation, we systematically collected and summarized NODE-related papers in the field of medical image analysis. To the best of our knowledge, this is the first review of NODE in medical image analysis. To cover the field as broadly as possible, the applications are divided into five categories: segmentation, reconstruction, registration, disease prediction, and data generation. For each application, we review the primary principles of the work and summarize the improvements of NODE compared to control methods. Moreover, we also highlight the potential limitations of NODE and give possible research directions in the discussion section.
This paper is organized as follows. Section 2 illustrates the working principle of NODE and its variant nmODE, aiming to explain the fundamental mechanism of NODE and clarify the relationship between NODE and DNNs. Section 3 presents the applications of NODE in medical image analysis tasks, including segmentation, reconstruction, registration, disease prediction, and data generation. Section 4 discusses the strengths and limitations of NODE in these applications. Finally, Sect. 5 concludes the review.
2 Deep neural networks and ordinary differential equation
Although the NODE and DNNs are continuous and discrete models, respectively, the two are closely related. This section begins by deriving the NODE from the residual module, followed by an introduction to the recently proposed nmODE, which exhibits superior theoretical and practical properties compared to the vanilla NODE.
2.1 Neural ordinary differential equation
We begin with the residual module that forms the basis of NODE. The residual module proposed in ResNet (He et al. 2016) is one of the most prevalent modules in DNNs. It is motivated by the degradation problem during the training of DNNs, i.e., deeper DNNs have higher training and test errors compared with shallow ones. The degradation problem limits the depth of DNNs and poses challenges for very deep DNNs, even though depth contributes to non-linearity. To tackle this limitation, ResNet (He et al. 2016) proposes to incorporate the shortcut connection into the DNNs, resulting in the residual paradigm that has been widely used in building DNN architectures. The residual module can be formulated as follows:

$$a^{l+1} = a^{l} + f\left(a^{l}; W^{l}\right), \quad (1)$$
where the \(a^l\) on the right-hand side of Eq. (1) denotes the shortcut connection that facilitates gradient backpropagation during training. The introduced shortcut connections make it possible to construct DNNs with significant depth, allowing for models with 1000 or more layers.
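In code, the residual update of Eq. (1) is simply an identity addition around the layer transformation. The following PyTorch sketch (an illustrative block, not the original ResNet design) makes the shortcut connection explicit:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Implements Eq. (1): a^{l+1} = a^l + f(a^l; W^l)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                               nn.Linear(dim, dim))

    def forward(self, a_l):
        return a_l + self.f(a_l)   # shortcut connection + residual branch

a_l = torch.randn(8, 64)           # a batch of 8 feature vectors of width 64
a_next = ResidualBlock(64)(a_l)
```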
Considering the superscript layer l in Eq. (1) as the time step t and interpreting the feature \(a^l\) as the representation y(t), we arrive at the following equation:

$$y(t+1) - y(t) = f\left(y(t); W\right). \quad (2)$$
Taking the time step on the left-hand side of Eq. (2) to be infinitesimally small, we obtain the NODE formulated as:

$$\frac{dy(t)}{dt} = f\left(y(t); W\right). \quad (3)$$
So far, we have completed the transformation between the discrete residual module and the autonomous NODE. The right-hand side of Eq. (3) is implemented by a DNN, which gives rise to the name neural ODE. The solution to Eq. (3) can be obtained by computing the following integral:

$$y(t_1) = y(t_0) + \int _{t_0}^{t_1} f\left(y(t); W\right)\, dt. \quad (4)$$
Given the initial value \(y(t_0)\), we can then obtain the solution \(y(t_1)\) by using an ODE solver to solve Eq. (4). Figure 2 demonstrates the working principle of NODE, which can be regarded as a specialized layer inserted into the DNNs. Suppose the representation in layer l of a DNN is denoted as \(a^l\); the vanilla NODE then uses \(a^l\) as the initial point \(y(t_0)\). By feeding \(y(t_0)\) into the NODE described in Eq. (3) and using an ODE solver to compute the integral, we obtain the solution \(y(t_1)\), which can be regarded as the representation in layer \(l+1\).
Modern DNNs are typically optimized through automatic differentiation. For the NODE, there are two ways to accomplish the optimization, namely the backpropagation and the adjoint (Chen et al. 2018) approaches, which are known as the discretize-then-optimize and optimize-then-discretize (Kidger 2021) methods, respectively. Suppose there are T iteration steps in the ODE solver. The backpropagation approach computes precise gradients and has the advantage of speed, but it consumes \({\mathcal {O}}(T)\) memory since it needs to store all the intermediate variables for backpropagation. In contrast, the adjoint approach has a constant memory cost of \({\mathcal {O}}(1)\), but it is slower than backpropagation and introduces numerical discretization errors.
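The following sketch illustrates how a NODE layer implementing Eqs. (3)-(4) can be inserted between ordinary layers. It assumes the third-party torchdiffeq package, whose odeint and odeint_adjoint routines correspond to the discretize-then-optimize and optimize-then-discretize strategies discussed above; the small network used for f is an illustrative choice.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint, odeint_adjoint   # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Right-hand side f(y; W) of Eq. (3), implemented by a small network."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                 nn.Linear(dim, dim))

    def forward(self, t, y):                      # the solver calls f(t, y)
        return self.net(y)

class NODELayer(nn.Module):
    """Maps a^l (used as y(t_0)) to y(t_1) by integrating Eq. (4)."""
    def __init__(self, dim, use_adjoint=False):
        super().__init__()
        self.func = ODEFunc(dim)
        self.register_buffer("t", torch.tensor([0.0, 1.0]))   # interval [t_0, t_1]
        self.solve = odeint_adjoint if use_adjoint else odeint

    def forward(self, a_l):
        states = self.solve(self.func, a_l, self.t, method="dopri5")
        return states[-1]                          # y(t_1), i.e., the next representation

a_l = torch.randn(8, 32)
a_next = NODELayer(32)(a_l)                        # used in place of a discrete layer
```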
One significant characteristic of ODEs is that the integral curves described in Eq. (4) do not intersect. Formally, suppose \(y_1(t)\) and \(y_2(t)\) are two solutions of the same ODE with different initial values; then \(y_1(t) \ne y_2(t)\) for all \(t \in [t_0, t_1]\). The non-intersecting property implies that the integral curves starting from a point and from its neighborhood constrain one another. If a sample is correctly classified by a NODE, then the prediction of its perturbed version remains bounded. This intrinsic regularization in NODE contributes to increased robustness, which is absent from discrete layer-wise DNNs.
Many variants have been proposed since the introduction of NODE. ANODE (Dupont et al. 2019) augments the dimensions of NODE to learn complex mappings. Gholami et al. (2019) observe that a NODE trained with the adjoint method (Chen et al. 2018) may not converge due to inconsistent gradients for small time step sizes. The authors address the underlying problem by incorporating a checkpointing method while keeping the same computational cost as the NODE. Solving the NODE requires evaluating the differential equation multiple times, which is referred to as the number of function evaluations (NFE). It is desirable to decrease the NFE without compromising accuracy, which improves the computational efficiency of NODE in real-world applications. To reduce the NFE, Kelly et al. (2020) simplify the trajectories of NODE by introducing a differentiable regularization term composed of the Kth-order derivatives of the internal state with respect to time. Kidger et al. (2021) use a seminorm to replace the frequently used \(L^2\) norm to reduce the NFE.
2.2 Neural memory ordinary differential equation
One critical aspect of NODE is its input, which is frequently initialized by the output of the previous layer, e.g., the \(y(t_0)\) shown in Fig. 2. Dupont et al. (2019) point out that the NODE cannot represent some specialized functions, for example, a function g with \(g(-1)=1\) and \(g(1)=-1\). The reason lies in the fact that the ODE’s trajectories cannot cross each other. Nevertheless, this restriction can be lifted by reorganizing the inputs of NODE, i.e., regarding the feature as an external input (\(\gamma\)) while fixing the initial internal input (\(y(t_0)\)) (Yi 2023). For example, the one-dimensional ODE
can represent the function g. The solution of the above ODE is
where the representation of g can be verified by treating the input x as the external input.
The architecture proposed by Yi (2023) is named the neural memory ordinary differential equation (nmODE), which contains two types of neurons: learning neurons and memory neurons. The separation of external and internal inputs indicates that learning and memory are divided, where learning happens only in the learning neurons, while the memory neurons capture the feature’s characteristics through the ODE. The architecture of nmODE is illustrated in Fig. 3. Formally, the framework of nmODE can be formulated by the following equations:
Similar to the NODE described in Eq. (3), the nmODE outputs the solution \(y(t_1)\) given the initial internal input \(y(t_0)\) and the external input \(\gamma\). Both the NODE and the nmODE can be integrated into DNNs as specialized layers. The most significant difference between NODE and nmODE lies in the location of learning. In NODE, learning is integrated into the ODE, where the internal initial value y(0) is assigned the output of the previous layer \(a^l\). In contrast, the nmODE separates learning from the ODE, specifically through the transformation \(g(a^l; W)\) that generates the external input \(\gamma\) for the ODE. The nmODE provides a versatile architecture for implementing implicit non-linear mappings. Yi (2023) introduces a novel implementation of Eq. (7) as follows:
It is widely acknowledged that attractors in a dynamical system are associated with memory. Theoretically, the nmODE shown in Eq. (8) has been proved to have one and only one global attractor (Yi 2023), indicating that the model possesses enhanced memory properties. Experiments on natural and medical image recognition tasks show that the nmODE helps to improve the model’s accuracy.
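As a sketch of the nmODE layer described above: the learning neurons produce the external input \(\gamma\) from the previous layer’s feature, and the memory neurons integrate an ODE from a fixed initial state. The decay term and the \(\sin^2\) activation follow the instance reported by Yi (2023); the dimensions, solver settings, and the use of torchdiffeq are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint   # pip install torchdiffeq

class NmODELayer(nn.Module):
    """Learning neurons: gamma = W a^l.  Memory neurons: dy/dt = -y + sin^2(y + gamma)."""
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.learning = nn.Linear(in_dim, mem_dim)          # the only trainable part
        self.register_buffer("t", torch.tensor([0.0, 1.0]))

    def forward(self, a_l):
        gamma = self.learning(a_l)                          # external input, fixed during integration
        y0 = torch.zeros_like(gamma)                        # fixed initial internal state y(t_0)
        func = lambda t, y: -y + torch.sin(y + gamma) ** 2  # memory-neuron dynamics
        return odeint(func, y0, self.t)[-1]                 # y(t_1)

a_l = torch.randn(8, 32)
y_t1 = NmODELayer(32, 64)(a_l)
```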
3 Applications in the medical image analysis
3.1 Applications review
Given the strength of NODE and its variants in nonlinear mapping, they have been widely utilized for various analysis tasks, including medical image analysis. To summarize their applications in the medical image analysis tasks, we have thoroughly searched the Web of Science and Google Scholar by using the keywords neural ordinary differential equation and medical image. Based on the search results, we organize the related works into five categories, including medical image segmentation, medical image reconstruction, medical image registration, disease prediction, and medical data generation. Illustrations of related works within each category are presented in the following five subsections.
3.2 Applications in medical image segmentation
Medical image segmentation plays an important role in clinical practice, contributing to disease diagnosis, treatment planning, and prognostic analysis. Starting from the breakthrough introduced by DNNs (Krizhevsky et al. 2012), especially deep convolutional neural networks (DCNNs), the learning-based paradigm that extracts high-level abstract features has replaced the conventional hand-crafted low-level ones. DNNs and their variants have become the de facto choice for accomplishing medical image segmentation tasks. The fully convolutional network (FCN) (Long et al. 2015) is the first attempt to build an end-to-end pixel-level semantic segmentation network. Among the variants of FCN, the U-Net (Ronneberger et al. 2015), a powerful segmentation network with succinct architecture, has been widely utilized to tackle various medical image segmentation tasks. The U-Net consists of three parts: an encoder, a decoder, and multiple shortcut connections, as shown in Fig. 4. The encoder is designed to extract abstract features through multiple stages, where each stage consists of a convolutional layer, a batch normalization operator, a non-linear activation function, and a pooling layer. The decoder has a similar architecture to the encoder, except that the pooling layer is replaced by an upsampling or interpolation one. The deliberately designed shortcut connections contribute to the information flow between the encoder and decoder. Numerous variants of U-Net have been proposed, such as the Attention U-Net (AttU-Net) (Oktay et al. 2018), the Recurrent Residual U-Net (R2U-Net) (Alom et al. 2018), and the no new U-Net (nnU-Net) (Isensee et al. 2021).
Since the introduction of NODE, numerous works have incorporated it into the U-Net, aiming to increase segmentation accuracy. Figure 5 summarizes the various medical image modalities used in segmentation tasks within the literature. The first work using NODE to accomplish a medical image segmentation task can be found in Pinckaers and Litjens (2019), which is termed U-NODE, a combination of U-Net and NODE. The U-NODE applies the NODE to the encoder and decoder parts of U-Net to segment whole-slide histology images of the colon (Sirinukunwattana et al. 2017). The motivation of U-NODE is to leverage the continuous-depth property of NODE to dynamically adjust the receptive field of U-Net, for better accommodation of targets with varied sizes. Experimental results reveal that U-NODE exhibits superiority in segmenting large targets compared with the vanilla U-Net. Following the U-NODE, Li et al. (2021) use NODE with U-Net to segment blood cells from microscopy images. Considering the computational cost associated with the NODE, Li et al. tested the location of NODE in the U-Net to balance computational efficiency and segmentation accuracy. A similar work can be found in Sadique et al. (2022), where Sadique et al. employ the NODE in a U-Net with context encoding (Rahman et al. 2021) to segment multiple tumor tissues in the BraTS 2021 dataset (Baid et al. 2021), which contains multi-parametric magnetic resonance imaging (MRI) data, including T1, T1 contrast enhancement (T1 CE), T2, and fluid-attenuated inversion recovery (FLAIR). Evaluation results on an out-of-sample MRI dataset different from BraTS show that the NODE helps improve the generalization capability of the segmentation network.
Unlike Sadique et al. (2022), who stack the multi-parametric MRI along the channel dimension as input, Yang et al. (2023) utilize a shared U-Net with NODE to process the different modalities independently. The final tumor segmentation result is obtained through a weighted summation of the four branches corresponding to the four MRI modalities. Moreover, the authors deliberately design a metric termed the accumulative contribution curve to inspect the contribution of each modality to the segmentation of different tumor types, including the enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Experimental results reveal that not all modalities have a significant impact on the segmentation performance, indicating that a reduced number of MRI modalities may suffice for automatic segmentation. This reduction is meaningful for both clinical practice and computational modeling, contributing to efficient clinical workflows and simplified model architectures. Besides brain tumors, Ru et al. (2023) tackle breast tumor segmentation with a proposed model named Att-U-NODE that integrates the attention mechanism (Oktay et al. 2018; Woo et al. 2018) with a U-Net applied with NODE. Two datasets are used in the experiments, comprising breast ultrasound and MRI images. The attention module is applied to the shortcut connections of the U-NODE, aiming to enhance the representation ability of the abstract features.
While the fully supervised U-Net with NODE delivers promising performance on various segmentation tasks, acquiring large-scale dense pixel-level segmentation datasets is quite expensive, especially in the context of medical imaging. Few-shot segmentation (FSS) (Wang et al. 2019) provides a potential solution that accomplishes segmentation of novel classes in a query image after training on a few annotated support images. However, the robustness of FSS against adversarial attacks is unclear, which motivates Pandey et al. (2023) to combine FSS with NODE to build a data-efficient segmentation model that is both accurate and safe. The method proposed by Pandey et al. (2023) is named the regularized prototypical neural ordinary differential equation (R-PNODE), designed to leverage the intrinsic properties of NODE for FSS of organs in computed tomography (CT). For the paired query and support images denoted as \((x_q, x_s)\), the authors first obtain their corrupted versions by applying Gaussian noise, denoted as \((x_q^G, x_s^G)\). Three losses are proposed in R-PNODE, with two designed for the query image and the third for the support image. For the query image, suppose the predictions of the clean and Gaussian versions from the NODE are denoted as \(y(x_q, t_1)\) and \(y(x_q^G, t_1)\); R-PNODE then minimizes their distances to the ground truth, which are named consistency losses. Regarding the support image, R-PNODE maximizes the cosine similarity between the prediction of the clean support image and that of its Gaussian counterpart, which is named the cluster loss. The cluster loss contributes to a robust representation of the input, aiding the FSS model in defending against adversarial attacks.
Recently, Hu et al. (2023) empirically observed that conventional DCNN-based segmentation models are sensitive to noise in the training dataset, revealing inherent limitations in the robustness of DCNNs. Nevertheless, the U-Net with nmODE (Yi 2023), a novel variant of NODE, helps to alleviate the impact of the noise. The continuous nmODE and the discrete DCNNs belong to two different types of DNNs, implying that the networks contain distinct knowledge. The discrete DCNNs have an advantage in speed but are vulnerable to noise in the training dataset; conversely, the NODE is superior in robustness, but at the cost of increased computational complexity. To balance speed and robustness, Hu et al. (2023) leverage the knowledge distillation (KD) (Hinton et al. 2015) framework to transfer knowledge from the continuous nmODE to the discrete DCNNs, simultaneously enhancing the model’s robustness and efficiency. The main idea of the proposed method can be formulated as the following equation:
where the two terms on the right-hand side represent the cross entropy and the KL divergence, balanced by the hyper-parameter \(\lambda\). Superscripts s and t denote the student (discrete U-Net) and teacher (continuous nmODE) models, respectively. The \(p_k^s\) represents the prediction of the discrete model for the target class k. The \(p_c^{t}\) and \(p_c^{s}\) represent the predictions of the nmODE and the U-Net model for class c, respectively. Experimental results on 18 organs-at-risk (OAR) segmentation tasks demonstrate that nmODE with knowledge distillation exhibits improved robustness compared to ODE-based models while also mitigating the computational efficiency limitation.
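A sketch of the distillation objective described above, assuming the common form in which the ground-truth cross entropy and the teacher-to-student KL divergence are balanced by \(\lambda\); the exact weighting and the per-voxel formulation used by Hu et al. (2023) may differ.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, target, lam=0.5):
    """Cross entropy to the ground truth plus KL(p^t || p^s), balanced by lam."""
    ce = F.cross_entropy(student_logits, target)              # -log p_k^s for the target class k
    p_t = F.softmax(teacher_logits, dim=1)                    # p_c^t from the nmODE teacher
    log_p_s = F.log_softmax(student_logits, dim=1)            # log p_c^s from the U-Net student
    kl = F.kl_div(log_p_s, p_t, reduction="batchmean")        # sum_c p_c^t log(p_c^t / p_c^s)
    return (1.0 - lam) * ce + lam * kl

student_logits = torch.randn(4, 19)        # e.g., 18 OARs plus background
teacher_logits = torch.randn(4, 19)
target = torch.randint(0, 19, (4,))
loss = kd_loss(student_logits, teacher_logits, target)
```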
The NODE and its variants presented above are first-order methods, while Cheng et al. (2023) propose a second-order approach. They theoretically prove and empirically verify that the second-order method has faster convergence, higher robustness, and less sensitivity to noise compared to first-order ones. The second-order NODE can be formulated as follows:
where \(\theta _g\) and \(\theta _f\) represent the parameters of the first-order and second-order NODE, respectively. Cheng et al. (2023) compare characteristics of different ODE solvers, and show that the RK4 method is appropriate for the second-order segmentation model. Extensive experiments on six segmentation datasets verify the effectiveness of the second-order NODE model.
Besides static 2D or 3D medical images, longitudinal imaging, which captures multiple scans of a patient over time, helps clinicians observe the progression of a disease. Computational methods that analyze target changes using longitudinal imaging are named disease progression models (DPMs) (Lachinov et al. 2023). While recurrent neural networks (RNNs) (Zhang 2013) show strength in modeling time-related datasets, they are designed for forecasting at fixed intervals only, which fails for datasets with varying time intervals. The NODE, which models continuous dynamical processes, provides a potential solution for DPMs. Lachinov et al. (2023) leverage the NODE to forecast the segmentation map at arbitrary time points. The first-visit image is processed by a CNN to extract the feature representation \(\xi\), which is used as the initial condition \(y(t_0)\) for the NODE that evolves over time to form the trajectory of the segmentation map. The trained NODE can then be sampled at arbitrary time points to obtain the segmentation map. Moreover, the authors propose a temporal Dice loss designed for the entire stack of visits. Experimental results on optical coherence tomography (OCT) of geographic atrophy and MRI of Alzheimer’s disease demonstrate the advantage of NODE in modeling datasets with varying time intervals. Figure 6 presents the segmentation maps at varied time points predicted by the NODE. Table 1 summarizes the comparison between the NODE and control methods in various segmentation tasks.
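The key mechanism, evolving the first-visit feature \(\xi\) and sampling the trajectory at arbitrary (possibly irregular) visit times, can be sketched as follows, again assuming torchdiffeq; the CNN encoder, decoder, and temporal Dice loss of Lachinov et al. (2023) are omitted.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ProgressionODE(nn.Module):
    """dy/dt = f(y; W): evolves the first-visit feature over follow-up time."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, t, y):
        return self.f(y)

xi = torch.randn(1, 128)                            # first-visit feature, used as y(t_0)
visit_times = torch.tensor([0.0, 0.4, 1.1, 2.7])    # irregular follow-up times (e.g., years)
trajectory = odeint(ProgressionODE(128), xi, visit_times)   # one latent state per visit
# Each trajectory[i] would then be decoded into the segmentation map of visit i.
```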
3.3 Applications in medical image reconstruction
Target reconstruction from medical images is another important task in medical image analysis. For example, reconstructing cortical surfaces from MRI contributes to quantitative analysis of brain status at different ages. The reconstructed cortical surface mesh is required to be a closed manifold that is topologically equivalent to a 2-sphere. This means that the reconstruction result should be free of self-intersections, holes, and handles. Despite the progress achieved by classical neuroimage analysis software, such as FreeSurfer (Fischl 2012), BrainSuite (Shattuck and Leahy 2002), and the HCP pipeline (Glasser et al. 2013), two notable limitations restrict their application (Ma et al. 2022). The first is time consumption. These conventional methods are time-consuming since several computationally intensive geometric or image processing algorithms are involved. For example, FreeSurfer requires more than 4 h to reconstruct the cortical surface of a single patient. The second limitation is generalization capability. Brain images from neonates and adults vary significantly in intensity values, shapes, and sizes, which poses challenges to the robustness of reconstruction methods.
Recently, end-to-end deep learning methods have been widely adopted to solve the cortical surface reconstruction task. For instance, FastSurfer (Henschel et al. 2020) first leverages DCNNs to segment the brain tissue, which is later reconstructed into the cortical surface by adopting a fast spherical mapping. DeepCSR (Cruz et al. 2021) uses occupancy fields and signed distance functions (SDFs) to predict the implicit cortical surface. These implicit methods require topology correction to guarantee the non-intersection property of the reconstructed surface. Besides the implicit methods, Voxel2Mesh (Wickramasinghe et al. 2020) and PialNN (Ma et al. 2021) are explicit approaches that directly reconstruct the cortical surface without post-processing. These explicit approaches show superiority in speed but lack theoretical guarantees against self-intersections of the surface.
Faced with the aforementioned limitations of both classical and DCNNs-based methods, Lebrat et al. (2021) propose the CorticalFlow, a NODE-based approach that reconstructs the cortical surface from input MRI images. Three steps are included in the overall process. The first step involves generating the template mesh from the input MRI volume using the third-party libraries, including the JIGSAW (Engwirda and Ivers 2016) and MeshLab (Cignoni et al. 2008). In the second step, 3D U-Net (Ronneberger et al. 2015) is employed to generate a dense 3D flow field from the MRI volume, with moderate GPU memory consumption. The third step involves feeding both the predicted flow field and template mesh input into an ODE-based model named diffeomorphic mesh deformation (DMD) to compute the diffeomorphic mapping for each vertex. Moreover, a multi-stage framework with an increased number of vertices is proposed, progressively refining the predicted cortical surface from low to high resolution.
The most important aspect of CorticalFlow is the continuous DMD module, which theoretically guarantees that the predicted cortical surface is non-intersecting. The DMD module is solved using the forward Euler method. Figure 7 compares the cortical surfaces reconstructed by the U-Net (Ronneberger et al. 2015), the neural mesh flow (NMF) (Gupta 2020), and CorticalFlow (Lebrat et al. 2021). It is clear that the U-Net produces an irregular mesh, as shown in Fig. 7a, which can be attributed to the absence of geometric regularization in the U-Net. The NMF is another NODE-based method designed for 3D mesh generation (Gupta 2020). However, Lebrat et al. (2021) found that the NMF, conditioned on the global feature descriptor of the input image, fails to recover the details of the cortical surface, as observed in Fig. 7b. In contrast, CorticalFlow uses the DMD module to optimize the local feature descriptor of each vertex and employs three stages to gradually increase the number of processed vertices. Figure 7c shows that CorticalFlow generates anatomically plausible cortical surfaces with clear boundaries.
Motivated by the promising reconstruction results reported by CorticalFlow, Ma et al. (2022) further propose a NODE-based cortical surface reconstruction framework named CortexODE. The CortexODE comprises three steps. In contrast to CorticalFlow, which uses a 3D U-Net to predict the dense flow field, CortexODE initially segments the raw white matter from the input MRI volume. In the second step, the raw white matter is transformed into a signed distance function (SDF), subsequently processed by a topology correction algorithm to obtain an initial closed manifold surface. In the third and final step, the initial surface is deformed into the white matter and pial surfaces through two NODEs, respectively. Figure 8 illustrates the diffeomorphic flow modeled by the CortexODE from the white matter surface (leftmost) to the pial surface (rightmost).
Besides the cortical surface reconstruction task modeled by the NODE, reconstructing fully sampled medical images (e.g., MRI) from undersampled ones contributes to balancing the efficiency and effectiveness of the imaging process. It is acknowledged that a major challenge of MRI lies in its slow data acquisition process, which is attributed to hardware restrictions. To expedite MRI acquisition, it is common to acquire an undersampled k-space, leading to aliasing artifacts in the resultant image domain. The reconstruction of high-fidelity images from the undersampled k-space data holds great significance for the clinical utilization of MRI.
The MRI acquisition can be formulated by the following equation:

$$x = E y + \epsilon, \quad (11)$$
where y denotes the fully sampled MRI image to be reconstructed, and x represents the observed undersampled k-space. \(\epsilon\) is the noise during the imaging process. E represents the Fourier transform and undersampling that convert the fully sampled MRI into the undersampled k-space. Thus, the MRI reconstruction task can be described as the following optimization problem:

$$\hat{y} = \mathop {\arg \min }\limits _{y}\ \frac{1}{2}\left\| E y - x \right\| _2^2 + {\mathcal {R}}(y), \quad (12)$$
where the left term measures the distance between the reconstructed and observed MRI data, and the right one is the regularization term. The problem shown in Eq. (12) can be solved using the gradient descent algorithm:

$$y_{n+1} = y_n - \alpha \nabla _{y} \left( \frac{1}{2}\left\| E y_n - x \right\| _2^2 + {\mathcal {R}}(y_n) \right), \quad (13)$$
where the variable n denotes the iteration step and \(\alpha\) is the learning rate. Eq. (13) can also be abbreviated as:

$$y_{n+1} = y_n + f\left(y_n; W\right), \quad (14)$$
where W represents the learnable parameters introduced during the optimization. Similar to Eq. (2), by considering the step on the left-hand side of Eq. (14) to be infinitesimally small, we arrive at the NODE in Eq. (3). We can thus see that the MRI reconstruction task is closely related to the NODE and can be solved by off-the-shelf solvers.
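To make the connection concrete, the sketch below writes the data-consistency gradient of Eq. (12) (ignoring the regularizer) as the right-hand side of a dynamical system in the image domain and integrates it with explicit Euler steps. The Cartesian undersampling mask and the step size are illustrative assumptions; learned reconstruction models replace or augment this right-hand side with a network.

```python
import torch

def E(y, mask):
    """Forward operator of Eq. (11): orthonormal 2D Fourier transform plus undersampling."""
    return mask * torch.fft.fft2(y, norm="ortho")

def E_adjoint(k, mask):
    """Adjoint operator: zero-filling plus inverse Fourier transform."""
    return torch.fft.ifft2(mask * k, norm="ortho")

def rhs(y, x_obs, mask):
    """dy/dt = -E^H (E y - x): the negative data-consistency gradient of Eq. (12)."""
    return -E_adjoint(E(y, mask) - x_obs, mask)

# Hypothetical setting: a 64x64 image with every other k-space column retained.
mask = torch.zeros(64, 64); mask[:, ::2] = 1.0
full_image = torch.randn(64, 64, dtype=torch.cfloat)   # stands in for the true image y
x_obs = E(full_image, mask)                            # observed undersampled k-space
y = E_adjoint(x_obs, mask)                             # zero-filled initial estimate
for n in range(50):                                    # explicit Euler integration, cf. Eq. (15)
    y = y + 0.5 * rhs(y, x_obs, mask)
```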
Among the variants of ODE solvers, the Euler solver is a first-order one, which iteratively approximates the integral of the ODE with a fixed step size h:

$$y_{n+1} = y_n + h\, f\left(t_n, y_n\right). \quad (15)$$
Advanced solvers, such as the higher-order Runge–Kutta (RK) methods, can achieve better precision than the Euler approach. The computational principle of a general RK method with s stages can be formulated by the following equations:

$$y_{n+1} = y_n + h \sum _{i=1}^{s} a_i k_i, \qquad k_i = f\Big(t_n + c_i h,\; y_n + h \sum _{j=1}^{i-1} b_{ij} k_j\Big), \quad (16)$$
where \(a_i, b_{ij},\) and \(c_i\) are fixed coefficients. Instead of the fixed step size and coefficients, Chen et al. (2020) propose integrating knowledge of the solvers into the network, letting the network learn to adjust the step size and coefficients according to the requirements of tasks. The gradient checkpoint method (Chen et al. 2016) is used to reduce the GPU memory consumption. Experimental results show that the learned ODE solver achieves better reconstruction accuracy compared with the fixed one. Figure 9 compares the reconstruction results from U-Net and ReconODE, where it is clear that the error of ReconODE is smaller than that of U-Net. Similar work for the MRI and cone-beam CT reconstruction can be found in Yazdanpanah et al. (2019); Thies et al. (2022). Table 2 summarizes the comparison between the NODE and control methods in various reconstruction tasks.
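For reference, a fixed-step Euler update (Eq. (15)) and the classical four-stage RK4 update can be written in a few lines; adaptive and learned solvers such as that of Chen et al. (2020) build on these updates by adjusting the step size and coefficients.

```python
def euler_step(f, t, y, h):
    """One explicit Euler step: y_{n+1} = y_n + h * f(t_n, y_n)."""
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step with s = 4 stages."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# Toy check: integrate dy/dt = -y from y(0) = 1 to t = 1 (exact value exp(-1) ≈ 0.368).
f = lambda t, y: -y
y_euler = y_rk4 = 1.0
h = 0.1
for n in range(10):
    y_euler = euler_step(f, n * h, y_euler, h)   # ≈ 0.349 (first-order error)
    y_rk4 = rk4_step(f, n * h, y_rk4, h)         # ≈ 0.368 (fourth-order accuracy)
```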
3.4 Applications in medical image registration
Deformable registration computes a dense correspondence between two images, playing an important role in various medical image analysis tasks, e.g., surgical planning and radiogenomics analysis. The term deformable refers to the nonlinear and dense nature of the transformation. Let \(y_{\textrm{mov}}\) and \(y_{\textrm{fix}}\) denote the moving and fixed images; the registration task aims to find the deformation field \(\phi _{W}\), parameterized by W, that warps the moving image to resemble the fixed one (Dalca et al. 2018). The registration problem can be formulated as:

$$\hat{W} = \mathop {\arg \min }\limits _{W}\ {\mathcal {L}}\left( y_{\textrm{fix}},\, y_{\textrm{mov}} \circ \phi _{W}\right) + {\mathcal {R}}(W), \quad (17)$$
where the term \(y_{\textrm{mov}} \circ \phi _{W}\) denotes the transformed moving image. Function \({\mathcal {L}}\) measures the similarity between the transformed moving and fixed images. \({\mathcal {R}}(W)\) denotes the regularization term to ensure the smoothness of the transformation. The architecture of the registration method between the moving and fixed images can be found in Fig. 10.
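A minimal sketch of the unsupervised objective in Eq. (17): the moving image is warped by a predicted displacement field via grid sampling, and a similarity term plus a smoothness regularizer are minimized. The mean-squared-error similarity, the gradient-based regularizer, and the 2D setting are illustrative choices; published methods typically use, e.g., normalized cross-correlation and diffeomorphic parameterizations of \(\phi _{W}\).

```python
import torch
import torch.nn.functional as F

def warp(moving, disp):
    """Warp a moving image (N,1,H,W) with a displacement field (N,2,H,W) in normalized coords."""
    n, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    identity = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)      # base sampling grid
    grid = identity + disp.permute(0, 2, 3, 1)                        # phi_W = identity + displacement
    return F.grid_sample(moving, grid, align_corners=True)           # y_mov o phi_W

def registration_loss(moving, fixed, disp, reg_weight=0.01):
    """Eq. (17): similarity L(y_fix, y_mov o phi_W) plus a smoothness penalty R."""
    warped = warp(moving, disp)
    similarity = F.mse_loss(warped, fixed)
    smooth = (disp[:, :, 1:, :] - disp[:, :, :-1, :]).pow(2).mean() + \
             (disp[:, :, :, 1:] - disp[:, :, :, :-1]).pow(2).mean()   # spatial gradient of the field
    return similarity + reg_weight * smooth

moving, fixed = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
disp = torch.zeros(1, 2, 64, 64, requires_grad=True)                  # would be predicted by a network
loss = registration_loss(moving, fixed, disp)
loss.backward()                                                        # gradients w.r.t. the field
```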
Although conventional registration methods, such as elastic-type models (Bajcsy and Kovačič 1989), B-splines (Rueckert et al. 1999), and dense vector fields (Thirion 1998), have achieved impressive registration performance, they are criticized for their slow registration process due to time-consuming iterative non-linear optimization algorithms. This computational burden hinders their clinical translation. Faced with this challenge, both supervised and unsupervised learning-based methods have been proposed to accomplish the registration task. A learning-based approach requires only one forward pass of the model, which is intrinsically faster than the conventional methods. In the supervised approach, the registration model maps the moving and fixed images to an output deformation. The ground-truth deformation field is often obtained through conventional registration tools, which introduces biases. The unsupervised registration approach, which does not require a ground-truth deformation field, is more practical than the supervised one (Dalca et al. 2018).
Xu et al. (2021) propose an ODE-based registration method named the Multi-Scale ODE Network (MS-ODENet), marking the pioneering application of NODE to medical image registration. The authors model the registration optimization problem in Eq. (17) as a continuous dynamical process using NODE. Throughout the experiments, the authors observed that solving the NODE for the image registration problem demands a large NFE, significantly increasing the training and inference time costs. To address this computational efficiency problem, they propose a multi-scale approach that transforms the input image into L different resolutions. The \((l-1)\)-th version is generated by down-sampling the l-th one by a factor of 2 along the x, y, and z axes. The integration interval of the NODE, say [0, T], is divided into L segments corresponding to the number of resolutions. In each segment, the NODE is solved at the corresponding resolution, forming the basis of the multi-scale ODE. The MS-ODENet offers two advantages over the vanilla approach. The first is the time cost: the time required to evaluate the function at low resolution is much smaller than at high resolution. The second advantage is convergence: the search space of the low-resolution input for image registration is greatly reduced, leading to a faster convergence speed.
In addition to the multi-scale approach, the authors propose pretraining the feature extractor. This is motivated by the fact that the anatomical structures of the same patient remain consistent across different contrast images. The authors utilize the image-to-image translation framework (Lee et al. 2020) to separate the content from the style features. Subsequently, the robust pretrained content encoder is transferred to the registration network.
Wu et al. (2022) propose a NODE-based Optimization (NODEO) framework that regards the voxels in the moving image as a high-dimensional dynamical system whose trajectory represents the deformation field. An important contribution of the NODEO lies in the implementation of the regularization term in Eq. (17), which is composed of three parts. The first is the negative Jacobian determinants of the transformed voxels, which aim to reduce the folds in the transformation. The second is the regularization of the magnitude of the velocity field, which is equivalent to penalizing the energy of the transformation. The third is the spatial gradient of the voxels after the deformation, which contributes to the spatial smoothness of the transformed voxels.
Joshi and Hong (2023) propose a residual registration network named R2Net, an unsupervised registration framework that leverages residual connections to generate the desired deformation field. The framework is composed of three components. The first involves generating the initial velocity field (\(v_0\)) using a U-Net with the moving and fixed images as input. In the second step, the generated initial velocity field is integrated into the deformation field (\(\phi\)) with a module termed the Lipschitz Continuous ResNet (LC-ResNet), contributing to a diffeomorphic deformation. During implementation, spectral normalization (Miyato et al. 2018) is applied to each convolutional layer in the residual blocks to guarantee Lipschitz continuity. Moreover, shared or unshared convolutional weights are considered, resulting in a stationary or time-varying velocity field, respectively. In the third step, chunking and downsampling, two frequently used data-reduction strategies, are adopted, leading to local and global branches. For the chunking branch, multiple velocity fields are merged to keep the size the same as the input. For the downsampling branch, the downsampled velocity field is upsampled back to the original resolution. The two velocity fields are then concatenated to generate the deformation field using the LC-ResNet. Figure 11 compares the registration results from VoxelMorph (Balakrishnan et al. 2019) and R2Net (Joshi and Hong 2023), clearly showing that R2Net achieves a smaller error than VoxelMorph. Table 3 summarizes the comparison between the NODE and control methods in various registration tasks.
3.5 Applications in disease prediction
The disease prediction task is a crucial aspect of medical image analysis, encompassing two primary domains. The first domain involves identifying the presence of a disease by analyzing medical images, often using sequences or static images in clinical scenarios. The second one focuses on predicting the progression of the disease by utilizing various medical imaging modalities to model morphological or pathobiological changes over time.
Recent advances in DNNs offer substantial promise in addressing these disease prediction tasks. Notably, RNNs have gained popularity for their ability to handle time-dependent patterns in patients’ progression. The recurrent connections allow the network to process input data at each time step, continuously updating its hidden state, which acts as a memory mechanism to model the patterns in sequential data. However, original RNNs face the vanishing gradient problem, where gradients become exceedingly small when modeling long sequences. To overcome this challenge, long short-term memory (LSTM) networks (Hochreiter and Schmidhuber 1997) and gated recurrent units (GRUs) (Cho et al. 2014) have emerged as prominent architectures. These architectures incorporate effective gating mechanisms, efficiently mitigating the vanishing gradient problem.
RNNs and their variants are effective in modeling discrete-time dynamical systems. However, clinical data often introduces uncertainty in the time interval between data points, e.g., incomplete data or irregular acquisition times. To address this issue, previous studies either removed several data points to maintain evenly spaced data or employed missing value imputation techniques to generate possible values for intermediate data. However, these methods frequently introduce errors between the generated data and the real values, impacting the prediction results.
In contrast, NODE, a powerful continuous dynamical system, can effectively model data with varied time intervals. Unlike RNNs, NODE is insensitive to inconsistencies in the spacing of data points, making it an ideal choice for addressing irregular time intervals in clinical data. Figure 12 illustrates the architecture of the NODE-based model for disease prediction tasks. Qiu et al. (2023) propose a gram-based attentive neural ordinary differential equations network (Gram-AODE). This approach aims to model the underlying dynamics of eyeball trajectory points and learn the relations of continuous changes. Specifically, the authors first transform the video into a time series of pupil coordinates. The Gram matrix is then employed to convert the time-series data into feature images. After downsampling, the feature images are further augmented by the NODE. The features dynamically modeled by the NODE are then processed through a multi-head attention block to output the categories.
The prediction of disease progression is a common requirement in analyzing brain diseases, such as Alzheimer’s disease (AD). Recently, Jeong et al. (2022) have proposed a NODE-based method to tackle challenges in modeling the progression of AD using longitudinal data characterized by irregular and incomplete observations. The workflow involves four steps: transforming input data into the Cholesky space, leveraging recurrent gated recurrent units (RGRUs) for dynamic modeling, employing neural manifold ODEs for continuous modeling, and estimating missing values through a proposed missing value estimation module. The NODE is utilized in their method to model time-series data and estimate the hidden state of missing values.
Hao et al. (2020) aim to utilize NODE to infer the propagation dynamics of amyloid pathology, which is considered the primary pathological event in AD. The method involves constructing a brain network graph where each node represents a cortical or subcortical gray matter region, and edges denote connectivity strength. The authors formulate the NODE to describe the propagation of pathology, incorporating a diffusion constant and a Laplacian-based heat diffusion equation. Additionally, they introduce a function learned through DNNs to modulate the diffusion process, enabling the prediction of pathology burden at multiple time points. The proposed model is evaluated on a neuroimaging dataset, demonstrating promising results.
Cai et al. (2023) propose a NODE-based brain state recognition neural network (OSRNet) to explore NODE’s performance on neuroimaging data. The network comprises three main components: feature elucidation, a NODE block, and a fully connected classifier. Feature elucidation reduces the dimensionality of high-dimensional data by mapping it to a Riemannian manifold; the method employs a generic mapping layer and feature clarification operations to obtain low-dimensional functional connectivity matrix signatures. The NODE block is used to model brain dynamics. A Log-Euclidean operator is applied to project symmetric positive definite matrices onto a vector space, and an ODE solver is used to simulate continuous-time dynamics. Moreover, a hybrid Euler method is proposed for numerical ODE solving, which at each time step calculates the system’s state update using the explicit Euler method to capture the slope of the ODE.
In light of the abundance of unlabeled data, Zeghlache et al. (2023) propose an innovative fusion of longitudinal self-supervised learning (LSSL) and NODEs to enhance the modeling of disease progression. The authors introduce a Siamese-like LSSL variant designed to assess the role and significance of the reconstruction term within the LSSL framework. Furthermore, they contextualize the application of NODEs in disease progression modeling, drawing parallels with instances such as predicting Alzheimer’s disease progression and understanding patient dynamics under COVID-19 medication. The introduced LSSL-NODE model extends the applicability of disease progression prediction by formulating an initial value problem through the latent representation of the first image in a consecutive pair, allowing for a more comprehensive exploration of disease dynamics.
Wen (2020) introduces a pioneering approach to modeling the temporal dynamics of resting-state functional magnetic resonance imaging (rsfMRI) data by leveraging NODEs. The key innovation lies in using NODEs as the foundation for video prediction, allowing the compression of spatial-temporal trajectories into latent representations. Specifically, Wen et al. employ NODE to model the spatial-temporal characteristics of rsfMRI data. They fit the observed rsfMRI data into the network to predict future fMRI data based on input trajectories. The authors claim this framework facilitates the prediction of future brain maps and enables data interpolation between two given trajectories. Notably, the utilization of NODE addresses the challenge of degradation in prediction quality over time points, presenting a promising solution to the inherent uncertainty in video prediction problems. Table 4 summarizes the comparison between the NODE and control methods in various disease prediction tasks.
3.6 Applications in medical data generation
Medical data generation refers to the simulation of medical data using synthetic methods to replace or complement real ones. Research in this field involves using generated data to simulate various scenarios in the medical domain, including medical images, clinical data, electronic health records, etc. With the advent of DNNs, learning-based methods have emerged as a prominent tool for data generation.
Medical data involves complex biological and medical dynamic systems, such as disease progression and treatment response. Accurately modeling these systems is challenging, especially when multiple variables and influencing factors are involved. Existing methods attempting to simulate such dynamic systems require a substantial number of learnable parameters and computational resources. To address this issue, NODE has been applied in the field of data generation for the following reasons. First is continuous dynamics modeling. NODE is designed to model continuous dynamics. This makes it well-suited for tasks where the data evolves continuously over time. Second is implicit function representation. NODE implicitly represents the dynamics through trajectory. This implicit representation can capture long-term dependencies and intricate relationships within the data, enabling NODE to generate coherent and realistic sequences. Third is interpolation and extrapolation. Through its continuous representation, NODE can both interpolate between observed data points and extrapolate to generate data at unobserved time points. This is valuable for tasks where generating data at arbitrary time steps is necessary.
Recent endeavors (Hong et al. 2023; Salvador et al. 2023; Wendland et al. 2022) have attempted to incorporate NODE into the medical data generation domain. Hong et al. (2023) present a NODE-based method for forecasting pharmacokinetic parameters in dynamic brain positron emission tomography (PET) imaging. Traditional methods, such as interpolation or extrapolation, require sufficient time-activity curve (TAC) samples throughout the entire acquisition, which is not always practical due to patient motion and noise. In contrast, the proposed NODE method predicts TAC in extended time frames by mimicking the analytical modeling in a data-driven manner. The method predicts complete dynamic images based on limited early-frame images, demonstrating promising results in both simulated and clinical \(^{18}\)F-PI-2620 brain PET data. The potential applications of this method include reducing scan time, minimizing radiation dose, improving imaging quality, and better evaluating drug efficacy.
Wendland et al. (2022) introduced a hybrid, multimodal approach called Multimodal Neural Ordinary Differential Equations (MultiNODEs), which enables the generation of lifelike synthetic patient trajectories over time. This feature facilitates the seamless interpolation and extrapolation of clinical studies. MultiNODEs is proficient in incorporating both static and longitudinal data, implicitly handling missing values. By using MultiNODEs, organizations can generate synthetic patient trajectories that capture the characteristics of real patient data, enabling them to conduct analyses and make decisions based on realistic data without compromising individual privacy or data-sharing restrictions. The authors illustrate the capabilities of MultiNODEs by applying them to actual patient-level data from two independent clinical studies. Simultaneously, they demonstrate its effectiveness with simulated epidemiological data of infectious diseases, emphasizing its potential to address the issue of restricted data collection and sharing among diverse organizations. Table 4 compares the generation performance between the MultiNODEs and control method.
Salvador et al. (2023) have proposed a scientific machine-learning methodology for creating a digital twin model of the heart. The digital twin model is a virtual model based on mathematical models and real-time data. It can simulate and predict the behavior and performance of a physical system. Utilizing latent neural ordinary differential equations (LNODEs), the authors capture the pressure-volume dynamics of a heart failure patient by training a system with 43 model parameters. These model parameters are used to describe cardiac electrophysiology, active and passive mechanics, and cardiovascular fluid dynamics. LNODEs efficiently represent the 3D-0D model in a latent space via a feedforward fully connected network. Remarkably, the model achieves 300x real-time numerical simulations of the cardiac function on a standard laptop processor (Table 5).
4 Discussion
From the above review of NODE in medical image analysis tasks, it can be seen that NODE offers a powerful tool to address the challenges faced by conventional DNNs. The strengths of NODE can be categorized into practical and theoretical aspects. Practically, the NODE exhibits higher nonlinearity than conventional discrete DNNs, which is a key factor contributing to the successful applications of DNNs. The NODE is closely related to the residual connections in ResNet (He et al. 2016), where the latter is a milestone among the numerous architectures of DNNs that makes it possible to construct networks of very deep depth. Considering the layer index in the residual connection as the time step and letting this step become infinitesimally small, we arrive at the NODE. Therefore, NODE can be regarded as an advanced ResNet with infinite depth, characterized by higher nonlinearity and stronger representation ability. Theoretically, NODE derives benefits from the theoretical characteristics of ODEs, particularly the non-intersecting theorem that ensures the robustness of NODE. Moreover, this theoretical advantage over discrete DNNs contributes to the diffeomorphic transformations used in the registration and reconstruction of medical images.
4.1 Computational limitation of NODE
While NODE exhibits notable advantages over conventional DNNs, it is essential to acknowledge its inherent weaknesses. The most prominent one is its computational cost. Models embedded with NODE have been observed to be considerably slower than conventional DNNs during both training and inference (Hu et al. 2023). This slowness stems from the iterative computation performed by the ODE solver during integration, quantified by the number of function evaluations (NFE). The additional computational cost complicates the clinical deployment of NODE-based models, particularly in scenarios that require timely analysis results.
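The cost is easy to make visible by counting calls to the ODE right-hand side during a single forward pass; the snippet below is an illustrative measurement assuming the torchdiffeq package, with arbitrary layer sizes and tolerances.

```python
# Count the number of function evaluations (NFE) of an adaptive solver during
# one forward pass. Illustrative only; assumes torchdiffeq.
import torch
import torch.nn as nn
from torchdiffeq import odeint


class CountingODEFunc(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.nfe = 0  # incremented on every solver call

    def forward(self, t, h):
        self.nfe += 1
        return self.net(h)


func = CountingODEFunc()
h0 = torch.randn(8, 64)
t = torch.tensor([0.0, 1.0])

out = odeint(func, h0, t, method="dopri5", rtol=1e-5, atol=1e-7)
print("NFE for one forward pass:", func.nfe)  # typically far more than one layer's work
```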
4.2 NODE-based foundation model
In terms of future work, several research directions appear promising. The first is a NODE-based foundation model. Foundation models (Moor et al. 2023) have driven the exceptionally rapid development of generative large language models (LLMs): trained on massive and diverse datasets in a self-supervised manner, they exhibit impressive performance across a variety of generative tasks. Since NODE can be regarded as a specialized layer with higher nonlinearity and infinite depth, equipping foundation models with NODE may further enhance their capacity and is worth exploring.
4.3 Pre-training and fine-tuning of NODE
The second one is NODE-based pre-training. The pre-training and fine-tuning paradigm has become the de facto standard for utilizing DNNs, particularly in medical image analysis tasks that often suffer from limited annotated datasets. Due to the superior memory properties of NODEs compared to discrete DNNs, transferring pretrained NODE-based medical image analysis models to downstream tasks may effectively reduce the need for annotated datasets and ease the difficulty of fine-tuning.
4.4 Balance of accuracy and efficiency
The third is balancing the computational cost and accuracy of NODE-based models. Relaxing the solver's error tolerance during integration speeds up computation at the cost of accuracy, and vice versa. A principled way to trade off efficiency against accuracy would therefore significantly broaden the applications of NODE-based models.
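A small experiment along the following lines illustrates the trade-off: sweeping the relative tolerance of an adaptive solver changes both the NFE and the deviation from a tight-tolerance reference solution. The dynamics, tolerances, and sizes are arbitrary placeholders, and the snippet assumes the torchdiffeq package.

```python
# Tolerance vs. accuracy/efficiency trade-off for an adaptive ODE solver.
# Illustrative only; assumes torchdiffeq.
import torch
from torchdiffeq import odeint


def rhs(t, y):
    # Simple fixed dynamics used purely for demonstration.
    return torch.sin(3.0 * y) - 0.5 * y


class CountingFunc:
    """Wrap a right-hand side and count how often the solver evaluates it."""
    def __init__(self, f):
        self.f = f
        self.nfe = 0

    def __call__(self, t, y):
        self.nfe += 1
        return self.f(t, y)


y0 = torch.randn(16)
t = torch.tensor([0.0, 2.0])
reference = odeint(rhs, y0, t, rtol=1e-9, atol=1e-10)[-1]  # tight-tolerance reference

for rtol in (1e-7, 1e-5, 1e-3, 1e-1):
    counted = CountingFunc(rhs)
    y_end = odeint(counted, y0, t, rtol=rtol, atol=rtol * 1e-2)[-1]
    err = (y_end - reference).abs().max().item()
    print(f"rtol={rtol:.0e}  NFE={counted.nfe}  max abs deviation={err:.2e}")
```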
5 Conclusion
This paper comprehensively reviews the applications of NODE-based models in medical image analysis tasks. The applications are broadly categorized into five types: segmentation, reconstruction, registration, disease prediction, and data generation. Various novel NODE-based models have been proposed to address the challenges in these tasks, delivering impressive performance compared with conventional discrete DNNs. We also summarize the strengths and weaknesses inherent in NODE-based models and suggest future research directions in light of the evolution of modern artificial intelligence. We hope this review provides insights into solutions for medical image analysis tasks and draws researchers' attention to the significant potential of NODE.
Data availability
No datasets were generated or analysed during the current study.
References
Alom MZ, Hasan M, Yakopcic C, Taha TM, Asari VK (2018) Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation. Preprint at http://arxiv.org/abs/1802.06955
Anumasa S, Srijith P (2021) Improving robustness and uncertainty modelling in neural ordinary differential equations. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 4053–4061
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. Preprint at http://arxiv.org/abs/1409.0473
Baid U, Ghodasara S, Mohan S, Bilello M, Calabrese E, Colak E, Farahani K, Kalpathy-Cramer J, Kitamura FC, Pati S, et al (2021) The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. Preprint at http://arxiv.org/abs/2107.02314
Bajcsy R, Kovačič S (1989) Multiresolution elastic matching. Comput Vision Gr Image Process 46(1):1–21
Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV (2019) Voxelmorph: a learning framework for deformable medical image registration. IEEE Trans Med Imaging 38(8):1788–1800
Barch DM, Burgess GC, Harms MP, Petersen SE, Schlaggar BL, Corbetta M, Glasser MF, Curtiss S, Dixit S, Feldt C et al (2013) Function in the human connectome: task-fmri and individual differences in behavior. Neuroimage 80:169–189
Bernard O, Lalande A, Zotti C, Cervenansky F, Yang X, Heng P-A, Cetin I, Lekadir K, Camara O, Ballester MAG et al (2018) Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans Med Imaging 37(11):2514–2525
Bui PT, Reiter GS, Fabianska M, Waldstein SM, Grechenig C, Bogunovic H, Arikan M, Schmidt-Erfurth U (2022) Fundus autofluorescence and optical coherence tomography biomarkers associated with the progression of geographic atrophy secondary to age-related macular degeneration. Eye 36(10):2013–2019
Cai H, Dan T, Huang Z, Wu G (2023) Osr-net: Ordinary differential equation-based brain state recognition neural network. In: 2023 IEEE 20th international symposium on biomedical imaging (ISBI). IEEE, pp 1–5
Campello VM, Gkontra P, Izquierdo C, Martin-Isla C, Sojoudi A, Full PM, Maier-Hein K, Zhang Y, He Z, Ma J et al (2021) Multi-centre, multi-vendor and multi-disease cardiac segmentation: the M&Ms challenge. IEEE Trans Med Imaging 40(12):3543–3554
Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/
Chen T, Xu B, Zhang C, Guestrin C (2016) Training deep nets with sublinear memory cost. Preprint at http://arxiv.org/abs/1604.06174
Chen RT, Rubanova Y, Bettencourt J, Duvenaud DK (2018) Neural ordinary differential equations. Adv Neural Info Process Syst 31:3
Chen EZ, Chen T, Sun S (2020) Mri image reconstruction via learning optimization using neural odes. In: Medical image computing and computer assisted intervention–MICCAI 2020: 23rd international conference, Lima, Peru, October 4–8, 2020, proceedings, Part II 23. Springer, pp 83–93
Cheng CW, Runkel C, Liu L, Chan RH, Schönlieb CB, Aviles-Rivero AI (2023) Continuous u-net: faster, greater and noiseless. Preprint at http://arxiv.org/abs/2302.00626
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. Preprint at http://arxiv.org/abs/1406.1078
Cignoni P, Callieri M, Corsini M, Dellepiane M, Ganovelli F, Ranzuglia G, et al (2008) Meshlab: an open-source mesh processing tool. In: Eurographics Italian Chapter Conference, vol. 2008. Salerno, Italy, pp 129–136
Cruz RS, Lebrat L, Bourgeat P, Fookes C, Fripp J, Salvado O (2021) Deepcsr: A 3d deep learning approach for cortical surface reconstruction. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 806–815
Dalca AV, Balakrishnan G, Guttag J, Sabuncu MR (2018) Unsupervised learning for fast probabilistic diffeomorphic registration. In: Medical image computing and computer assisted intervention–MICCAI 2018: 21st international conference, Granada, Spain, September 16-20, 2018, Proceedings, Part I. Springer, pp 729–738
Dempster A, Petitjean F, Webb GI (2020) Rocket: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min Knowl Disc 34(5):1454–1495
Der Sarkissian H, Lucka F, Eijnatten M, Colacicco G, Coban SB, Batenburg KJ (2019) A cone-beam x-ray computed tomography data collection designed for machine learning. Sci Data 6(1):215
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. Preprint at http://arxiv.org/abs/2010.11929
Dupont E, Doucet A, Teh YW (2019) Augmented neural odes. Adv Neural Info Process Syst 32:2
Engwirda D, Ivers D (2016) Off-centre steiner points for delaunay-refinement on curved surfaces. Comput Aided Des 72:157–171
Fischl B (2012) Freesurfer. Neuroimage 62(2):774–781
Gholami A, Keutzer K, Biros G (2019) Anode: unconditionally accurate memory-efficient gradients for neural odes. Preprint at http://arxiv.org/abs/1902.10298
Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, Andersson JL, Xu J, Jbabdi S, Webster M, Polimeni JR et al (2013) The minimal preprocessing pipelines for the human connectome project. Neuroimage 80:105–124
Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. Preprint at http://arxiv.org/abs/1412.6572
Gootjes-Dreesbach L, Sood M, Sahay A, Hofmann-Apitius M, Fröhlich H (2020) Variational autoencoder modular Bayesian networks for simulation of heterogeneous clinical study data. Front Big Data 3:16
Graves A, Mohamed A-r, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 6645–6649
Gupta K (2020) Neural mesh flow: 3d manifold mesh generation via diffeomorphic flows. In: 34th conference on neural information processing systems (NeurIPS 2020)
Hao W, Vogt NM, Meng Z, Hwang SJ, Koscik RL, Johnson SC, Bendlin BB, Singh V (2020) Learning amyloid pathology progression from longitudinal pib-pet images in preclinical Alzheimer’s disease. In: 2020 IEEE 17th international symposium on biomedical imaging (ISBI). IEEE, pp 572–576
Hasani R, Lechner M, Amini A, Rus D, Grosu R (2021) Liquid time-constant networks. Proc AAAI Conf Artif Intell 35:7657–7666
Henschel L, Conjeti S, Estrada S, Diers K, Fischl B, Reuter M (2020) Fastsurfer-a fast and accurate deep learning based neuroimaging pipeline. Neuroimage 219:117012
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. Preprint at http://arxiv.org/abs/1503.02531
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Hong J, Brendel M, Erlandsson K, Sari H, Lu J, Clement C, Bui NV, Meindl M, Ziegler S, Barthel H et al (2023) Forecasting the pharmacokinetics with limited early frames in dynamic brain pet imaging using neural ordinary differential equation. IEEE transactions on radiation and plasma medical sciences
Hoover A, Kouznetsova V, Goldbaum M (2000) Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging 19(3):203–210
Hu J, Chen Y, Zhong J, Ju R, Yi Z (2018) Automated analysis for retinopathy of prematurity by deep neural networks. IEEE Trans Med Imaging 38(1):269–279
Hu J, Chen Y, Yi Z (2019) Automated segmentation of macular edema in oct using deep neural networks. Med Image Anal 55:216–227
Hu J, Yu C, Yi Z, Zhang H (2023) Enhancing robustness of medical image segmentation model with neural memory ordinary differential equation. Int J Neural Syst 2023:2350060
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. PMLR, pp 448–456
Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH (2021) nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18(2):203–211
Jack CR Jr, Bernstein MA, Fox NC, Thompson P, Alexander G, Harvey D, Borowski B, Britson PJ, Whitwell L et al (2008) The Alzheimer’s disease neuroimaging initiative (adni): Mri methods. J Magn Reson Imaging 27(4):685–691
Jeong S, Jung W, Sohn J, Suk HI (2022) Deep geometrical learning for Alzheimer’s disease progression modeling. In: 2022 IEEE international conference on data mining (ICDM). IEEE, pp 211–220
Joshi A, Hong Y (2023) R2net: efficient and flexible diffeomorphic image registration using lipschitz continuous residual networks. Med Image Anal 89:102917
Jung W, Jun E, Suk H-I, Initiative ADN et al (2021) Deep recurrent model for individualized prediction of Alzheimer’s disease progression. Neuroimage 237:118143
Kelly J, Bettencourt J, Johnson MJ, Duvenaud DK (2020) Learning differential equations that are easy to solve. Adv Neural Inf Process Syst 33:4370–4380
Kidger P (2021) On neural differential equations. University of Oxford, Oxford
Kidger P, Chen RT, Lyons TJ (2021) “hey, that’s not an ode”: Faster ode adjoints via seminorms. In: ICML, pp 5443–5452
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Info Process Syst 25:10
Lachinov D, Chakravarty A, Grechenig C, Schmidt-Erfurth U, Bogunović H (2023) Learning spatio-temporal model of disease progression with neuralodes from longitudinal volumetric data. IEEE Trans Med Imaging 2023:14
Landman B, Xu Z, Igelsias J, Styner M, Langerak T, Klein A (2015) Miccai multi-atlas labeling beyond the cranial vault–workshop and challenge. In: Proc MICCAI multi-atlas labeling beyond cranial vault-workshop challenge, vol. 5, p. 12
Lebrat L, Santa Cruz R, Gournay F, Fu D, Bourgeat P, Fripp J, Fookes C, Salvado O (2021) Corticalflow: a diffeomorphic mesh deformation module for cortical surface reconstruction. In: Advances in neural information processing systems (NeurIPS 2021): 35th conference on neural information processing systems. Neural Information Processing Systems Foundation, Inc
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Lee HY, Tseng HY, Mao Q, Huang JB, Lu YD, Singh M, Yang MH (2020) Drit++: diverse image-to-image translation via disentangled representations. Int J Comput Vision 128:2402–2417
Li D, Tang P, Zhang R, Sun C, Li Y, Qian J, Liang Y, Yang J, Zhang L (2021) Robust blood cell image segmentation method based on neural ordinary differential equations. Comput Math Methods Med 2021:12
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Ma Q, Li L, Robinson EC, Kainz B, Rueckert D, Alansary A (2022) Cortexode: Learning cortical surface reconstruction by neural odes. IEEE Trans Med Imaging 42(2):430–443
Marcus DS, Wang TH, Parker J, Csernansky JG, Morris JC, Buckner RL (2007) Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. J Cogn Neurosci 19(9):1498–1507
Marinescu RV, Oxtoby NP, Young AL, Bron EE, Toga AW, Weiner MW, Barkhof F, Fox NC, Klein S, Alexander DC et al (2018) Tadpole challenge: prediction of longitudinal evolution in alzheimer’s disease. Preprint at http://arxiv.org/abs/1805.03909
Ma Q, Robinson EC, Kainz B, Rueckert D, Alansary A (2021) Pialnn: a fast deep learning framework for cortical pial surface reconstruction. In: Machine learning in clinical neuroimaging: 4th international workshop, MLCN 2021, held in conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 4. Springer, pp 73–81
Massin P, Chabouis A, Erginay A, Viens-Bitker C, Lecleire-Collet A, Meas T, Guillausseau P-J, Choupot G, André B, Denormandie P (2008) Ophdiat: a telemedical network screening system for diabetic retinopathy in the île-de-France. Diabetes Metab 34(3):227–234
Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, Burren Y, Porz N, Slotboom J, Wiest R et al (2014) The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans Med Imaging 34(10):1993–2024
Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. In: International conference on learning representations
Mok TC, Chung A (2020) Fast symmetric diffeomorphic image registration with convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4644–4653
Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, Rajpurkar P (2023) Foundation models for generalist medical artificial intelligence. Nature 616(7956):259–265
Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B, et al (2018) Attention u-net: Learning where to look for the pancreas. Preprint at http://arxiv.org/abs/1804.03999
Pandey P, Chasmai M, Sur T, Lall B (2023) Robust prototypical few-shot organ segmentation with regularized neural-odes. IEEE Trans Med Imaging 2023:4
Pinckaers H, Litjens G (2019) Neural ordinary differential equations for semantic segmentation of individual colon glands. Preprint at http://arxiv.org/abs/1910.10470
Qiu X, Shi S, Tan X, Qu C, Fang Z, Wang H, Gao Y, Wu P, Li H (2023) Gram-based attentive neural ordinary differential equations network for video nystagmography classification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 21339–21348
Rahman MM, Sadique MS, Temtam AG, Farzana W, Vidyaratne L, Iftekharuddin KM (2021) Brain tumor segmentation using unet-context encoding network. In: International MICCAI Brainlesion workshop. Springer, pp 463–472
Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer, pp 234–241
Ru J, Lu B, Chen B, Shi J, Chen G, Wang M, Pan Z, Lin Y, Gao Z, Zhou J et al (2023) Attention guided neural ode network for breast tumor segmentation in medical images. Comput Biol Med 159:106884
Rueckert D, Sonoda LI, Hayes C, Hill DL, Leach MO, Hawkes DJ (1999) Nonrigid registration using free-form deformations: application to breast mr images. IEEE Trans Med Imaging 18(8):712–721
Sadique M, Rahman M, Farzana W, Temtam A, Iftekharuddin K (2022) Brain tumor segmentation using neural ordinary differential equations with unet-context encoding network. In: International MICCAI Brainlesion Workshop. Springer, pp 205–215
Salvador M, Strocchi M, Regazzoni F, Dede L, Niederer S, Quarteroni A (2023) Real-time whole-heart electromechanical simulations using latent neural ordinary differential equations. Preprint at http://arxiv.org/abs/2306.05321
Sarrafzadeh O, Rabbani H, Talebi A, Banaem HU (2014) Selection of the best features for leukocytes classification in blood smear microscopic images. In: Medical imaging 2014: digital pathology, vol. 9041. SPIE, pp 159–166
Shattuck DW, Leahy RM (2002) Brainsuite: an automated cortical surface identification tool. Med Image Anal 6(2):129–142
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Preprint at http://arxiv.org/abs/1409.1556
Sirinukunwattana K, Pluim JP, Chen H, Qi X, Heng P-A, Guo YB, Wang LY, Matuszewski BJ, Bruni E, Sanchez U et al (2017) Gland segmentation in colon histology images: the glas challenge contest. Med Image Anal 35:489–502
Tang H, Chen X, Liu Y, Lu Z, You J, Yang M, Yao S, Zhao G, Xu Y, Chen T et al (2019) Clinically applicable deep learning framework for organs at risk delineation in ct images. Nat Machine Intell 1(10):480–491
Thies M, Wagner F, Gu M, Folle L, Felsner L, Maier A (2022) Learned cone-beam ct reconstruction using neural ordinary differential equations. In: 7th international conference on image formation in X-ray computed tomography, vol. 12304. SPIE, pp 48–54
Thirion J-P (1998) Image matching as a diffusion process: an analogy with Maxwell’s demons. Med Image Anal 2(3):243–260
Tian Y, Feng Y, Wang C, Cao R, Zhang X, Pei X, Tan KC, Jin Y (2022) A large-scale combinatorial many-objective evolutionary algorithm for intensity-modulated radiotherapy planning. IEEE Trans Evol Comput 26(6):1511–1525
Van Aarle W, Palenstijn WJ, Cant J, Janssens E, Bleichrodt F, Dabravolski A, De Beenhouwer J, Batenburg KJ, Sijbers J (2016) Fast and flexible x-ray tomography using the astra toolbox. Opt Express 24(22):25129–25147
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Info Process Syst 30:10
Wang K, Liew JH, Zou Y, Zhou D, Feng J (2019) Panet: Few-shot image semantic segmentation with prototype alignment. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9197–9206
Wen Z (2020) Temporal dynamic model for resting state fmri data: a neural ordinary differential equation approach. Preprint at http://arxiv.org/abs/2011.08146
Wendland P, Birkenbihl C, Gomez-Freixa M, Sood M, Kschischo M, Fröhlich H (2022) Generation of realistic synthetic data using multimodal neural ordinary differential equations. NPJ Digital Med 5(1):122
Wickramasinghe U, Remelli E, Knott G, Fua P (2020) Voxel2mesh: 3d mesh model generation from volumetric data. In: Medical image computing and computer assisted intervention–MICCAI 2020: 23rd international conference, Lima, Peru, October 4–8, 2020, Proceedings, Part IV 23. Springer, pp 299–308
Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280
Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19
Wu Y, Jiahao TZ, Wang J, Yushkevich PA, Hsieh MA, Gee JC (2022) Nodeo: a neural ordinary differential equation based optimization framework for deformable image registration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 20804–20813
Xu J, Chen EZ, Chen X, Chen T, Sun S (2021) Multi-scale neural odes for 3d medical image registration. In: Medical image computing and computer assisted intervention–MICCAI 2021: 24th international conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part IV 24. Springer, pp 213–223
Yang Z, Hu Z, Ji H, Lafata K, Vaios E, Floyd S, Yin F-F, Wang C (2023) A neural ordinary differential equation model for visualizing deep neural network behaviors in multi-parametric mri-based glioma segmentation. Med Phys 2023:1
Yap MH, Pons G, Marti J, Ganau S, Sentis M, Zwiggelaar R, Davison AK, Marti R (2017) Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J Biomed Health Inform 22(4):1218–1226
Yazdanpanah AP, Afacan O, Warfield SK (2019) Ode-based deep network for mri reconstruction. Preprint at http://arxiv.org/abs/1912.12325
Yi Z (2023) nmODE: neural memory ordinary differential equation. Artif Intell Rev 56:14403–14438
Zbontar J, Knoll F, Sriram A, Murrell T, Huang Z, Muckley M, Defazio A, Stern R, Johnson P, Bruno M et al (2018) fastmri: an open dataset and benchmarks for accelerated MRI. Preprint at http://arxiv.org/abs/1811.08839
Zeghlache R, Conze P-H, Daho MEH, Li Y, Boité HL, Tadayoni R, Massin P, Cochener B, Brahim I, Quellec G, et al (2023) Longitudinal self-supervised learning using neural ordinary differential equation. In: International workshop on predictive intelligence in medicine. Springer, pp 1–13
Zhang Y (2013) Convergence analysis of recurrent neural networks, vol. 13
Zhang Y, Ji Z, Niu S, Leng T, Rubin DL, Chen Q (2019) A multi-scale deep convolutional neural network for joint segmentation and prediction of geographic atrophy in sd-oct images. In: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019). IEEE, pp 565–568
Zhao Q, Liu Z, Adeli E, Pohl KM (2021) Longitudinal self-supervised learning. Med Image Anal 71:102051
Zheng S, Song Y, Leung T, Goodfellow I (2016) Improving the robustness of deep neural networks via stability training. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4480–4488
Zhou J, Cao W, Wang L, Pan Z, Fu Y (2022) Application of artificial intelligence in the diagnosis and prognostic prediction of ovarian cancer. Comput Biol Med 146:105608
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62106162, China Postdoctoral Science Foundation under Grant 2021M692269, Sichuan University Postdoctoral Science Foundation under Grant 2022SCU12080, National Natural Science Foundation of China Regional Project under Grant 62262074, and Science and Technology Plan Project of Yunnan Province under Grant 202405AC350083.
Author information
Contributions
Hao Niu, Yuxiang Zhou, Xiaohao Yan, Jun Wu, and Yuncheng Shen wrote the main manuscript text. Zhang Yi and Junjie Hu revised it critically for important intellectual content.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Niu, H., Zhou, Y., Yan, X. et al. On the applications of neural ordinary differential equations in medical image analysis. Artif Intell Rev 57, 236 (2024). https://doi.org/10.1007/s10462-024-10894-0