1 Introduction

Renal cell carcinoma (RCC) is one of the most common tumors of the urinary system, accounting for about 3–4% of all malignancies and about 3% of cancer-related deaths [1, 2]. Clear cell renal cell carcinoma (ccRCC) accounts for approximately 70–80% of all RCCs, making it the most prevalent subtype [3]. It is characterized by tumor cells with clear cytoplasm and is associated with genetic alterations, including mutations in the von Hippel-Lindau (VHL) tumor suppressor gene, which occur in up to 85% of cases [4, 5]. Despite advances in diagnostic imaging techniques such as computed tomography (CT) and magnetic resonance imaging (MRI) [6, 7], accurate staging and prognostication of ccRCC remain challenging. Conventional ccRCC staging and prognosis rely on histopathological examination of tumor tissue obtained by biopsy or surgical resection. Pathological staging, based on tumor size, extent of invasion, and lymph node involvement, provides important prognostic information but may be subject to interobserver variability and has limited ability to predict individual patient outcomes [8]. This has prompted researchers to seek more reliable approaches.

In recent years, the integration of artificial intelligence (AI) models into cancer research and clinical practice has opened new ways to improve the accuracy and precision of ccRCC pathological staging and prognosis prediction. AI algorithms, particularly those based on machine learning and deep learning (DL), can analyze large datasets, including histopathological images, molecular profiles, and clinical data, to identify subtle patterns and associations that may elude human observers [9, 10]. For example, a multimodal DL model incorporating histopathological images, radiological scans, and genomic data predicted ccRCC prognosis with a C-index of 0.7791 and an accuracy of 83.43%, supporting risk stratification and treatment decision-making [11]. Xu et al. proposed grading ccRCC from its appearance on CT images and constructed a DL method incorporating self-supervised learning to identify patients with high-grade ccRCC, providing a valuable tool for ccRCC grading, risk stratification, and individualized treatment [12]. Wessels et al. showed that convolutional neural network (CNN)-based overall survival prediction from hematoxylin and eosin (H&E)-stained slides of ccRCC performed well, was versatile, and could be combined with existing clinicopathological parameters [13]. This widely applicable approach illustrates the potential of AI in image-based outcome prediction.

The application of AI models to pathological staging and prognostic prediction of ccRCC is therefore a promising frontier in cancer research and clinical care. By analyzing complex datasets and discovering patterns, these models have the potential to increase diagnostic accuracy, improve prognostic assessment, and ultimately improve outcomes for patients with ccRCC. In this study, we collected pathological slides of ccRCC patients from The Cancer Genome Atlas (TCGA) database, extracted pathological features using a ResNet50 network pretrained on ImageNet together with CellProfiler, and trained a DL model to predict ccRCC pathological stage more accurately. In addition, we constructed a risk model from DL features to predict patient survival across risk strata.

2 Materials and methods

2.1 Patient cohorts

We collected pathological sections and clinical data, including survival time, survival status, ethnicity, and stage, for 513 ccRCC patients with formalin-fixed, paraffin-embedded (FFPE) samples in The Cancer Genome Atlas (TCGA) database. Scanned slides with excessive labeling over the tissue area, damaged slides, and slides without tumor were excluded, and only one slide was selected per patient. Additionally, we obtained 144 renal cancer paraffin samples from Shanghai Outdo Biotech Company (Shanghai, China) for external validation. As the TCGA database is publicly available for research purposes, no ethical approval was required.

2.2 Image pre-processing

Color is a critical concern in whole-slide images (WSIs). Standardizing and verifying color on digital slide displays is crucial for implementing digital pathology effectively. Color variation arises primarily from differences in histology laboratory protocols and practices; in addition, capture parameters such as illumination and filters, the image processing inherent to digital systems, and display characteristics can all influence the displayed color [14, 15]. Therefore, after the H&E images were acquired, they were cropped into small patches, which were color-normalized using the Macenko method with appropriate modifications [16]. Z-score normalization was then applied to the RGB channels to standardize the distribution of image intensities.
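The normalization step above can be sketched as follows. This is a minimal NumPy version of the standard Macenko procedure (optical-density conversion, stain-vector estimation from the plane of the two largest eigenvectors, concentration rescaling), followed by per-channel z-scoring. The reference stain matrix `HE_REF`, `MAX_C_REF`, and the defaults `io=240`, `alpha=1`, `beta=0.15` are values commonly used in public implementations and are assumptions here; the paper's "appropriate modifications" are not described, and computing z-score statistics per patch is likewise an assumption.

```python
import numpy as np

# Reference H&E stain vectors (columns: hematoxylin, eosin) and 99th-percentile
# concentrations as commonly used in public Macenko code (assumed values).
HE_REF = np.array([[0.5626, 0.2159],
                   [0.7201, 0.8012],
                   [0.4062, 0.5581]])
MAX_C_REF = np.array([1.9705, 1.0308])

def macenko_normalize(img, io=240, alpha=1, beta=0.15):
    """Map an RGB uint8 patch of shape (H, W, 3) onto the reference stain appearance."""
    h, w, _ = img.shape
    rgb = img.reshape(-1, 3).astype(float)
    # 1. Optical density (OD); the +1 avoids log(0).
    od = -np.log((rgb + 1) / io)
    # 2. Drop near-transparent (background) pixels before estimating stains.
    od_hat = od[np.all(od > beta, axis=1)]
    # 3. Plane spanned by the two largest eigenvectors of the OD covariance
    #    (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(np.cov(od_hat.T))
    v = eigvecs[:, 1:3].copy()
    if v[:, 1].sum() < 0:  # fix the sign so projections on the main axis are positive
        v[:, 1] = -v[:, 1]
    proj = od_hat @ v
    # 4. Robust extreme angles in that plane define the two stain vectors.
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    min_phi, max_phi = np.percentile(phi, [alpha, 100 - alpha])
    v_min = v @ np.array([np.cos(min_phi), np.sin(min_phi)])
    v_max = v @ np.array([np.cos(max_phi), np.sin(max_phi)])
    # Hematoxylin absorbs more red light, so put the vector with larger red OD first.
    if v_min[0] > v_max[0]:
        he = np.stack([v_min, v_max], axis=1)
    else:
        he = np.stack([v_max, v_min], axis=1)
    # 5. Per-pixel stain concentrations by least squares, rescaled to the reference.
    conc, *_ = np.linalg.lstsq(he, od.T, rcond=None)
    max_c = np.percentile(conc, 99, axis=1)
    conc *= (MAX_C_REF / max_c)[:, None]
    # 6. Recompose the image with the reference stain matrix.
    out = np.clip(io * np.exp(-HE_REF @ conc), 0, 255)
    return out.T.reshape(h, w, 3).astype(np.uint8)

def zscore_rgb(img):
    """Standardize each RGB channel to zero mean and unit variance."""
    x = img.astype(float)
    mean = x.mean(axis=(0, 1))
    std = x.std(axis=(0, 1)) + 1e-8  # guard against a constant channel
    return (x - mean) / std
```

In a real pipeline the z-score mean and standard deviation might instead come from the whole training set or from ImageNet statistics, to match the pretrained ResNet50.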

2.3 DL feature extraction and selection

Because WSIs are very large, typically on the order of 100,000 × 80,000 pixels, they were segmented into numerous patches. Tissue regions were exhaustively partitioned into non-overlapping 256 × 256-pixel patches at 20× magnification using the OpenSlide library in Python. Feature vectors were then extracted from these 256 × 256-pixel patches using a modified ResNet50 model pretrained on ImageNet.
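The non-overlapping tiling described above can be sketched as a grid walk over a tissue mask. This is an illustrative version only: the boolean `tissue_mask` and the `min_tissue` threshold are hypothetical inputs, since the paper does not say how tissue was detected. In practice each returned `(x, y)` corner would be read from the slide with OpenSlide, for example `openslide.OpenSlide(path).read_region((x, y), level, (256, 256))`, which returns an RGBA image that must be converted to RGB before normalization and feature extraction.

```python
import numpy as np

def tile_coordinates(tissue_mask, patch=256, min_tissue=0.5):
    """Return top-left (x, y) corners of non-overlapping patch x patch tiles.

    tissue_mask: 2-D boolean array (True = tissue) at the target 20x resolution.
    min_tissue: hypothetical minimum fraction of tissue pixels for a tile to be kept.
    """
    h, w = tissue_mask.shape
    coords = []
    # Step by exactly one patch width so tiles never overlap; partial tiles
    # at the right and bottom borders are dropped.
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            if tissue_mask[y:y + patch, x:x + patch].mean() >= min_tissue:
                coords.append((x, y))
    return coords
```

For a 600 × 600 mask whose left 300 columns are tissue, only the two tiles in the first column pass the 50% threshold.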

2.4 Deep learning training

The 513 ccRCC pathology slides were randomly divided into a training set (80%) and a validation set (20%) for DL. A histopathology-based ccRCC stage classifier was established, and its robustness was further assessed in an external validation set. Training used a 10-fold Monte Carlo cross-validation strategy. To validate the accuracy of the pathology model in identifying regions, we evaluated it with receiver operating characteristic (ROC) curves at the patch level.
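Monte Carlo cross-validation, as named above, repeats an independent random 80/20 split rather than rotating through fixed folds. A minimal sketch of the split generator, assuming a fixed seed and splitting at the slide level (equivalent to the patient level here, since one slide was kept per patient); the function name is hypothetical:

```python
import numpy as np

def monte_carlo_splits(n_samples, n_repeats=10, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx) pairs from repeated random splits.

    Unlike k-fold cross-validation, each repeat draws a fresh partition,
    so a sample may land in several validation sets or in none.
    """
    rng = np.random.default_rng(seed)
    n_val = int(round(n_samples * val_fraction))
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)  # one slide per patient: slide split = patient split
        yield perm[n_val:], perm[:n_val]
```

With 513 slides, each repeat gives 410 training and 103 validation slides; the reported metric would typically be averaged over the 10 repeats.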

2.5 Attention map generation

CLAM (clustering-constrained attention multiple instance learning) produces interpretable heatmaps that allow an intuitive analysis of how each tissue region contributes to the model's prediction for each WSI [17]. These heatmaps give pathologists insight into the histological and cytological features most closely linked to high predictive value. Because different regions of a pathological image contribute differently to the model's prediction, we calculated and saved unnormalized attention scores for all patches extracted from each image, using the attention branch corresponding to the model's predicted class. The attention score learned by CLAM for each patch was converted into a percentile, and the percentiles for each WSI were normalized to the range [0, 1], where 1 indicates the most informative patches and 0 the least informative. The normalized scores were then mapped to RGB colors with a colormap and overlaid on their corresponding spatial positions in the pathological image, highlighting areas of high attention in red and areas of low attention in blue.
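The score-to-color conversion described above can be sketched in a few lines: rank the raw attention scores within one slide, rescale the ranks to [0, 1], and interpolate between blue and red. The linear blue-to-red ramp is an illustrative stand-in, not necessarily the colormap used by CLAM or by the authors, and the function name is hypothetical.

```python
import numpy as np

def attention_to_colors(scores):
    """Convert raw attention scores of one WSI into [0, 1] values and RGB colors.

    Each score is replaced by its percentile rank within the slide (ties broken
    by position), so 1 = most attended patch and 0 = least attended patch.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    ranks = np.argsort(np.argsort(scores))  # rank of each patch, 0 .. n-1
    norm = ranks / (n - 1) if n > 1 else np.zeros(n)
    blue = np.array([0.0, 0.0, 255.0])
    red = np.array([255.0, 0.0, 0.0])
    # Linear interpolation: low attention -> blue, high attention -> red.
    colors = (1 - norm)[:, None] * blue + norm[:, None] * red
    return norm, np.round(colors).astype(np.uint8)
```

Because percentile ranks ignore the scale of the raw scores, heatmaps from different slides share the same color range even when their attention logits differ widely.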

2.6 The least absolute shrinkage and selection operator (LASSO)

LASSO regression is an extension of linear regression that performs variable selection and regularization while fitting a generalized linear model. An L1 penalty on the model coefficients shrinks some of them exactly to zero, thereby achieving feature selection. The strength of the penalty is controlled by the parameter λ: larger values of λ retain fewer features [18]. The R package 'glmnet' was used for LASSO analysis to select the most relevant DL features, and the optimal λ was determined by tenfold cross-validation.

2.7 Identification and validation of histopathologic DL‑signature

We used LASSO-Cox modeling to construct a DL signature, and a DL signature-associated risk score was calculated for each patient as the sum of each selected DL feature multiplied by its regression coefficient. This risk score was independently evaluated in the external validation set of ccRCC patients. Patients were categorized into high-risk and low-risk groups using the median risk score as the cut-off. Kaplan-Meier curves were drawn, and differences in survival between the groups were analyzed using the log-rank test. The prognostic performance of the histopathological DL signature was assessed by area under the curve (AUC) values derived from time-dependent ROC curves. Furthermore, the predictive power of the DL features was examined using multivariable Cox analysis that included clinical factors.
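The risk-score and group-comparison steps above can be sketched as follows: a linear risk score (features times coefficients), a median split, and a two-group log-rank test. The function names are hypothetical, patients exactly at the median are assigned to the low-risk group (an assumption), and the p value uses the chi-square distribution with one degree of freedom, whose survival function equals erfc(√(x/2)).

```python
import math
import numpy as np

def risk_score(features, coefs):
    """Risk score per patient: sum_i coef(i) * x(i) over the selected features."""
    return np.asarray(features, dtype=float) @ np.asarray(coefs, dtype=float)

def median_split(scores):
    """True = high risk (strictly above the median), False = low risk."""
    scores = np.asarray(scores, dtype=float)
    return scores > np.median(scores)

def log_rank_test(time, event, high):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    high = np.asarray(high, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & high).sum()
        died = (time == t) & event
        d, d1 = died.sum(), (died & high).sum()
        # Observed minus expected deaths in the high-risk group at time t.
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # 1-df chi-square tail
```

For six patients where the three high-risk patients die at months 1, 2, and 3 and the three low-risk patients die at months 10, 11, and 12, the statistic is 1.85²/0.6775 ≈ 5.05 (p ≈ 0.025).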

2.8 Statistical analysis

All analyses were performed with R (version 4.3.1) or Python (version 3.7.12). The Wilcoxon test was used to compare two groups. Overall survival was estimated with the Kaplan-Meier method, and Kaplan-Meier curves were compared with the log-rank test. A p value < 0.05 was considered statistically significant.
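The Kaplan-Meier (product-limit) estimator named above multiplies, at each distinct death time, the fraction of at-risk patients who survive that time. A minimal sketch with a hypothetical function name; censored patients leave the risk set without contributing a death:

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate S(t) at each distinct event time.

    time: follow-up times; event: 1 = death observed, 0 = censored.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)            # still under observation just before t
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

For follow-up times 1, 2, 3, 4 with the patient at time 2 censored, the estimate is 0.75 at t = 1, 0.375 at t = 3, and 0 at t = 4: the censored patient reduces the risk set at t = 3 to two patients without counting as a death.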

3 Results

3.1 Description of entry data

In this study, FFPE slides of 513 ccRCC patients were downloaded from the TCGA database. In the training set, 46% of ccRCC patients were younger than 60 years, and 61.7% had stage I–II disease. In the external validation set, 58% of patients were younger than 60 years, and 92% had stage I–II disease. The clinical characteristics of the enrolled patients are shown in Table 1. Each pathological slide was cut into multiple 256 × 256-pixel tiles, features were extracted with ResNet50 using ImageNet pretrained weights, and the attention-based multiple instance learning algorithm integrated in CLAM was used for training. In addition, we applied the Macenko method to normalize the color of the tiles. Results before and after normalization are shown in Fig. 1.

Table 1 Summary of the general clinical information of ccRCC patients
Fig. 1

The randomly extracted images were normalized. A Randomly extracted image regions. B TCGA-ccRCC pathological sections before normalization. C TCGA-ccRCC pathological sections after normalization. D Pathological sections of validation data before normalization. E Pathological sections of validation data after normalization

3.2 Performance of the histopathological classifier

A histopathology-based DL model was developed in the training set of the TCGA-ccRCC cohort with a training-to-validation ratio of 8:2, and its classification performance was assessed using 10-fold Monte Carlo cross-validation. Because DL systems are susceptible to overfitting their training data, we also performed external validation using H&E-stained slides from 144 ccRCC cases. Notably, the external validation dataset and the TCGA-ccRCC cohort differ substantially in patient ethnicity and slide preparation. In external validation, the model achieved an AUC of 0.875 on the ROC curve (Fig. 2A) and an average precision of 0.809 on the precision-recall curve (Fig. 2C), suggesting good generalizability. Figure 2B shows the confusion matrix of the predictive model. To visualize and interpret the relative importance of each region in the WSIs, the attention scores associated with the model's predicted class were converted into percentiles, normalized, and mapped onto the original slides to generate attention heatmaps. Figure 3 displays representative heatmaps for stages I–IV, showing patch-level predictions.
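The two summary metrics reported above can be computed from per-sample labels and scores as sketched below. ROC AUC is taken as the fraction of positive-negative pairs ranked correctly (ties count half), and average precision as the mean precision at the rank of each positive. This simple average-precision form assumes no tied scores, and for the four-stage task each stage would have to be scored one-vs-rest and then averaged; the paper does not state which averaging was used, so both points are assumptions, as are the function names.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(y_score, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def average_precision(y_true, y_score):
    """Mean of precision@k taken at the rank k of each positive (assumes no tied scores)."""
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    y = np.asarray(y_true, dtype=bool)[order]
    precision_at_k = np.cumsum(y) / np.arange(1, y.size + 1)
    return precision_at_k[y].mean()
```

For labels [1, 0, 1, 0] with scores [0.9, 0.8, 0.7, 0.1], three of the four positive-negative pairs are ordered correctly (AUC = 0.75), and the positives sit at ranks 1 and 3 with precisions 1 and 2/3 (average precision ≈ 0.833).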

Fig. 2

DL algorithm predicts pathologic staging in ccRCC patients. A The ROC curve and AUC value of DL model in validation data. B The confusion matrix of DL model. C The precision-recall curve of DL model

Fig. 3

Attention heat map of pathological tissue sections of ccRCC

3.3 Construction of histopathologic DL-signature

We used CellProfiler to extract 611 image features from the patches, including grayscale, shape, intensity, and texture features, and then selected prognostic image features by LASSO regression. The LASSO penalty model identified seven DL features (Fig. 4A, B). A multivariable model was used to construct the signature, and a DL signature-associated risk score was calculated for each patient as \(\text{Risk score} = \sum_{i=1}^{n} \mathrm{Coef}(i) \times x(i)\), where \(n\) is the number of selected features, \(\mathrm{Coef}(i)\) is the regression coefficient of feature \(i\), and \(x(i)\) is its value. The median risk score was used as the cut-off to divide patients into high-risk and low-risk groups, and survival outcomes were compared between the two groups with the log-rank test. Overall survival was significantly lower in the high-risk group than in the low-risk group (P = 0.003, Fig. 4C), and risk scores differed significantly between groups with different survival outcomes (Fig. 4D). The predictive performance of the histopathological DL signature was evaluated by AUC values from time-dependent ROC curves. The AUC values for predicting 1-, 3-, and 5-year overall survival were 0.68, 0.69, and 0.69, respectively (Fig. 4E), indicating moderate discriminative ability. Multivariable Cox regression showed that age, stage, and risk score were significantly associated with overall survival of ccRCC patients (Fig. 4F).
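A time-dependent AUC at horizon t can be illustrated with the cumulative/dynamic definition: cases are patients who died by t, controls are patients still event-free after t, and the AUC is the fraction of case-control pairs in which the case has the higher risk score. The sketch below is a naive version that simply drops patients censored before t; dedicated R tools typically add inverse-probability-of-censoring weights, so this is an illustration of the definition, not the estimator used in the paper. The function name is hypothetical.

```python
import numpy as np

def cumulative_dynamic_auc(time, event, scores, t):
    """Naive time-dependent AUC at horizon t (no censoring weights).

    Cases: died at or before t. Controls: still event-free after t.
    Patients censored before t are dropped, which can bias the estimate
    when censoring is heavy.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    cases = scores[(time <= t) & event]
    controls = scores[time > t]
    diff = cases[:, None] - controls[None, :]
    # Fraction of case-control pairs ranked correctly (ties count half).
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```

Evaluating the same scores at t = 1, 3, and 5 years gives the three AUC values per cohort; an AUC near 0.7 means roughly 70% of such case-control pairs are ordered correctly.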

Fig. 4

LASSO analysis based on DL features and established risk model. A Cross-validation to select the optimal tuning parameter log (λ) in LASSO regression analysis. B LASSO coefficient profiles of candidate features. C Kaplan-Meier curve of survival probability between High risk group and Low risk group. D Boxplot of risk score differences in different groups. E Time dependent ROC curves and corresponding AUC values. F Forest plots of multivariate Cox regression

In addition, we separately evaluated risk scores in ccRCC patients in the external cohort. Kaplan-Meier curves showed that overall survival was significantly lower in the high-risk group than in the low-risk group (Fig. 5A), and risk scores were significantly higher in the death group than in the survival group (Fig. 5B). The AUC values for predicting 1-, 3-, and 5-year overall survival (OS) were 0.60, 0.60, and 0.65, respectively (Fig. 5C), somewhat lower than but broadly consistent with those in the training set. Multivariable Cox regression showed that age and stage were significantly associated with overall survival of ccRCC patients in this cohort (Fig. 5D).

Fig. 5

External data validation of risk models. A Kaplan-Meier curve of survival probability between High risk group and Low risk group. B Boxplot of risk score differences in different groups. C Time dependent ROC curves and corresponding AUC values. D Forest plots of multivariate Cox regression

4 Discussion

Traditional pathological diagnosis relies on experienced pathologists and is limited by subjective judgment and individual differences, which may lead to inconsistent diagnoses [19,20,21]. However, the rise of AI and DL technology offers new ways to improve diagnostic accuracy and efficiency through automated image analysis. DL, especially convolutional neural networks (CNNs), has significant potential in medical image analysis because it automatically learns complex features from large datasets [22,23,24]. It has been used effectively in chest X-ray analysis, skin cancer detection, and histopathological image analysis [25, 26]. In the pathological analysis of ccRCC, these models can identify subtle differences between tumors and accurately assess grade, aggressiveness, and genetic variation, helping clinicians develop personalized treatment plans [27, 28]. By analyzing large amounts of pathological section data, DL models can not only provide information about the current stage of a tumor but also predict its likely progression, helping doctors make better-informed decisions before treatment.

In this study, we demonstrated the effectiveness of DL models for predicting the stage and prognosis of ccRCC from histopathological images. The DL classifier, developed with 10-fold Monte Carlo cross-validation in the TCGA-ccRCC cohort, showed strong predictive performance, with an AUC of 0.875 and an average precision of 0.809. These metrics indicate that the model could reliably distinguish different stages of ccRCC from histopathological images, and external validation further supported its robustness. We chose a locked model, that is, one whose structure and parameters are fixed after being tuned for a specific type of data, which gives high stability. However, compared with an adaptive model, a locked model is less flexible, and its performance may degrade substantially on new tasks that differ markedly from the training task. This study focuses on the histopathology of ccRCC, and its results have been validated by pathologists as highly reliable and practical. Moreover, the DL model enabled detailed visualization of tumor heterogeneity through attention heatmaps, which map the relevance of different regions in WSIs to the model's predictions. This not only enhances the interpretability of the DL system but also helps pathologists focus on critical areas within the slides.

Subsequently, LASSO regression was performed to identify key image features related to patient prognosis. These features were used to calculate a risk score whose prognostic significance was confirmed by survival analysis, with high-risk patients having a significantly worse prognosis. Evaluation in an external cohort showed broadly consistent predictive performance, with similar time-dependent AUC values, supporting the potential clinical usefulness of the model. In addition, multivariable Cox regression identified age and stage as important prognostic factors, consistent with the clinical understanding that these factors are key determinants of ccRCC prognosis. Although morphological analysis remains more time-consuming than direct histological examination by pathologists, risk models built from screened pathological features may improve diagnostic accuracy and reduce interobserver variability. Technical advances, particularly specific cell membrane staining techniques that accurately distinguish individual cells from their surroundings, together with automated morphohistological analysis systems, may provide a powerful aid to pathologists [29].

However, this study has some limitations. First, the generalizability of the findings may be constrained by the limited diversity of the dataset, particularly with respect to ethnic and geographic variation. Second, reliance on retrospective data from the TCGA-ccRCC and external cohorts may introduce selection bias, as these samples may not accurately reflect the broader patient population or current clinical practice. Third, DL models typically require large sample sizes to learn complex features and patterns effectively, so the size of our cohorts may limit model performance. Finally, the interpretability of DL models is often limited, which could hinder their acceptance by clinicians who need to understand the basis of algorithmic diagnostic or prognostic decisions. Addressing these limitations in future research will be crucial for improving the practicality and acceptance of DL-based diagnostic tools in routine clinical practice.

5 Conclusions

In summary, we established a DL model that predicts patient stage and prognosis from histopathology and demonstrated its predictive performance. Such a model may assist pathologists in clinical diagnosis. As these techniques continue to evolve, their integration into clinical practice has the potential to significantly improve the prognosis of ccRCC patients.