Introduction

Colorectal adenocarcinoma (COAD), a subtype of colorectal cancer (CRC), accounts for a significant portion of new cases and deaths. It ranked fifth among all cancer types in terms of both incidence and mortality rates1. COAD incidence in China consistently exhibits high morbidity and mortality rates2. This complex disease is influenced by both environmental and genetic factors, indicating its multifactorial nature3. Advances in understanding COAD’s pathophysiology have improved screening and diversified therapies, enhancing 5 year survival rates4. However, around 40% of COAD patients experience relapse, often with advanced metastasis5. Investigating reliable predictive biomarkers is crucial to identify high-risk patients and guide effective treatment for improved prognosis.

Mitochondria play crucial roles in cellular functions and significantly contribute to tumor occurrence and development6. They have a dual impact on tumor development, facilitating ATP production essential for growth and generating reactive oxygen species (ROS) that contribute to oncogenic DNA accumulation. The pivotal role of mitochondrial outer membrane permeability or mitochondrial permeability transition is vital for cancer cell survival during malignant transformation7,8. In tumor immunity, mitochondria have a vital function in activating the inflammasome, influencing memory immune cell and macrophage differentiation, and promoting antitumor responses9,10.

Studies have shown a strong connection between abnormal mitochondrial function and COAD11,12,13. Researching and identifying reliable mitochondrial biomarkers to assess COAD patient prognosis is crucial. Many studies have created survival prediction models, highlighting the field’s importance14,15,16. However, few studies specifically establish prognostic models for COAD considering the role of mitochondria.

The tumor microenvironment (TME) consists mainly of immune cells, cytokines, and stromal cells17. The TME significantly impacts immune cell evasion, inhibition, and drug resistance in cancer18. The effectiveness of immune checkpoint blockade (ICB) in cancer treatment is notably influenced by the composition of the TME19. ICB therapies have been successful in reactivating and enhancing the body’s natural anti-tumor immune response20. Several studies reported the significant therapeutic impact of immunotherapy inhibitors in the management of patients with advanced colorectal malignancies21,22,23,24.

Our research focuses on the crucial role of mitochondrial characteristics in COAD progression and immune evasion. We developed a prognostic risk model by analyzing nine mitochondria-associated genes, showing promising predictive capabilities for COAD patient prognosis. Additionally, we explored the relationship between the risk score, cellular interactions, TME, immune cell infiltration, immune checkpoint expression, tumor mutation burden (TMB), and immunotherapy response. Furthermore, drug sensitivity to 198 drugs was assessed, offering valuable insights for clinical treatment decisions in COAD.

Methods

Data collection

RNA-seq data for COAD samples were obtained from the TCGA database (41 Normal and 469 Tumor). Clinical information for samples was extracted using the UCSC Xena database (453 Tumor samples and 92 Normal samples with complete clinical data). After intersecting these datasets and excluding samples with incomplete clinical information, we retained 471 samples for analysis (430 Tumor and 41 Normal). Two validation cohorts, GSE39582 (556 samples) and GSE29623 (65 samples) were obtained (https://www.ncbi.nlm.nih.gov/geo/). We utilized a list of mitochondrial-related genes from the MitoCarta 3.0 database (https://www.broadinstitute.org/mitocarta/mitocarta30-inventory-mammalian-mitochondrial-proteins-and-pathways) (Table S1) and GSEA (http://www.gsea-msigdb.org/gsea/index.jsp).

Acquisition and processing of scRNA-seq data

Three CRC samples from GSE178318 were selected. scRNA-seq data, including cell and gene filtering, normalization, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE), were processed using the “Seurat” package. We filtered out cells with mitochondrial gene expression exceeding 5% and retained cells with 200 to 6000 detected genes. After cell filtering, high-quality scRNA-seq data were normalized to identify highly variable genes. PCA was conducted on these genes to identify the top 20 principal components (PCs) for clustering cells using t-SNE. Marker genes specific to each cell cluster were identified with the “FindAllMarkers” function. To determine cell types, the “SingleR” package was used for annotation.

Identification of differentially expressed genes (DEGs)

DEGs were detected with the “limma” package in R (v3.5.1), comparing normal and tumor samples, as well as high-risk and low-risk groups from the training set. Criteria for DEGs were |Log (2) fold change|> 2 and adjusted P < 0.01. Volcano plots (via “GdcVolcanoPlot” in R) illustrated DEGs. A Venn plot identified shared DEGs between sets and those linked to mitochondrial.

Construction and validation of prognostic mitochondrial-related risk signature

For a new prognostic gene signature, mitochondrial-related DEGs were selected through univariate Cox regression, Lasso regression, and multivariable Cox regression analyses (P < 0.05). Risk scores for samples were computed using the following formula:

$$Risk \, score \, = \, \Sigma \exp genei * \beta i$$

Gene expression is indicated as expgene_i, with βi as the coefficient index for each gene, summed across all signature genes. Survival analysis used R packages “survival” and “survminer.” An optimized cutoff determined risk groups for Kaplan–Meier (K–M) survival curves (430 Tumor samples). The Cox models’ proportional hazards assumptions were tested using the Schoenfeld Residuals test, and variables having a P-value > 0.05 were consideredas fulfilling the assumption. Signature efficacy was evaluated using ROC curves, risk plots, and the concordance index (C-index).

WGCNA

We used the WGCNA R package to establish a gene co-expression network, including genes with an upper 25 median deviation while excluding those with no significant expression. Pearson correlation values were calculated to construct the gene matrix. A soft threshold of β = 7 was chosen to create a scale-free network with high connectivity. We produced a topological overlap matrix (TOM) to estimate network connectivity and established a hierarchical clustering dendrogram with a minimum module size of 50. Modules with a similarity cutoff < 0.25 were merged. Finally, key genes were screened based on gene significance (GS) and module membership (MM), integrating quanTIseq results with module eigengenes (MEs).

Construction and valuation of nomogram

Univariate Cox regression analyzed the link between risk scores and clinical factors, identifying influential factors (P < 0.1) associated with the risk score. Multivariate Cox regression (P < 0.05) pinpointed significant survival predictors. Nomograms were constructed using these predictors, assigning scores to each variable. Total scores for patients were calculated by summing these assigned scores. Using total scores and survival probability, patient outcomes at 1, 3, and 5 years were determined. Nomogram effectiveness was evaluated via ROC curve analysis, calibration curves, and decision curve analysis (DCA) to measure its discriminative ability, predictive accuracy, and clinical utility.

The analyzing of Gene Ontology (GO) and Kyoto Encyclopedia of Genes

DEGs functions associated with mitochondria and those between diverse risk groups were scrutinized using R packages “clusterProfiler,” “org.Hs.eg.db,” “enrichplot,” and “ggplot2.” A significance threshold of adjusted P < 0.05 was applied during analysis to identify statistically meaningful functional candidates.

Cell–cell communication

Cell–cell communication networks from gene expression data were analyzed using the R package CellChat, integrating gene expression data with established interactions between signaling ligands, receptors, and cofactors from the Secreted Signaling and Cell–Cell Contact human databases. Circle and signaling role network plots visually represented and assessed the strength of these communication networks.

Gene set enrichment analyses (GSEA)

GSEA was performed using curated sets v7.4 collections acquired from the Molecular Signatures Database. GSEA 4.2.1 software was utilized for the analysis, with the complete transcriptome data from tumor samples serving as the input. Gene sets were deemed statistically significant if they demonstrated a P < 0.001 and met the criterion of FDR.

Tumor microenvironment

Stromal scores were computed using the ESTIMATE algorithm via the R package “estimate” (v3.5.1). TME-related biomarkers were extracted from GSEA results available at http://www.gsea-msigdb.org/gsea/index.jsp (Table S3–S6).

Quantization of relative abundance of 22 immune cell subtypes

The CIBERSORT algorithm in the R package determined tumor-infiltrating immune cell (TIIC) abundance in COAD samples by analyzing expression levels of specific markers. TIIC proportions were compared between high-risk and low-risk groups using the Wilcoxon test. A published study provided the list of immune checkpoints used. The R package “estimate” calculated immune score, ESTIMATE score, and tumor purity. The analysis also incorporated immune cell signatures from TISIDB.

Mutation analysis

Somatic mutation data was acquired from the cBioPortal database. To visualize the mutation landscape in patients of COAD belonging to low- and high-risk groups, a waterfall plot was generated using the “maftools” package in R. Furthermore, the TMB score was calculated for each sample.

Prediction of therapeutic sensitivity in patients with different risk scores

Risk scores’ predictive capability for immunotherapy response and 198 drugs (chemotherapies and targeted therapies) was evaluated using the “oncoPredict” package in R. IC50 values for the drugs were calculated and normalized. Drug details were sourced from the Genomics of Drug Sensitivity in Cancer (GDSC) database. The TIDE algorithm assessed the potential response to immunotherapy.

Statistical analysis

Statistical analysis, conducted using R and GraphPad Prism 8, utilized the independent student’s t-test or Mann–Whitney U test for group comparisons. The chi-square test assessed differences in immunotherapy response and clinical factors. Spearman correlation analysis was applied. The C-index evaluated variables (neoadjuvant treatment history, age, M stage, and risk score) for OS prediction. Significance was set at P < 0.05.

Results

Identification of DEGs related to mitochondrion and functional enrichment analysis in COAD

Figure 1 outlines the study’s workflow. In Fig. S1A, 2954 DEGs between normal and tumor COAD groups are identified and visualized. Subsequently, a combined analysis integrates GSEA with these DEGs, revealing 274 DEGs specifically associated with mitochondria in COAD (Fig. 2A).

Fig. 1
figure 1

Flowchart of this study. (ns no significance, *P < 0.05, **P < 0.01, ***P < 0.001).

Fig. 2
figure 2

Identification and analysis of mitochondrial-related prognostic genes in COAD. (A) Venn diagram of the intersection of 2954 mRNA DEGs and 2031 mitochondrial genes. (B) The GO analysis of 274 mitochondrial-related DEGs, including biological process (BP), cellular component (CC), and molecular function (MF). (C) The KEGG analysis of 274 mitochondrial-related DEGs. (D) Univariate cox regression analysis of OS for each mitochondrial-related DEGs, (E) LASSO regression of the 17 OS-related genes. Tuning parameter (λ) selection in LASSO regression with tenfold cross-validation, using minimum criteria. The partial likelihood deviance is plotted against log(λ). Dotted vertical lines mark the optimal log(λ) values using the minimum criteria and the one standard error rule, and the LASSO regression selected seventeen variables. (F) Multivariable Cox regression of the nine OS-related genes.

GO enrichment analysis of DEGs linked to COAD mitochondria revealed their involvement in processes like fatty acid metabolism and small molecule catabolism. Cellular component associations were mainly with the mitochondrial matrix and inner membrane. Molecular functions included diverse roles like coenzyme binding and transferase activity (Fig. 2B). KEGG pathway analysis identified enriched pathways such as PPAR signaling, lipid metabolism, and atherosclerosis (Fig. 2C).

Construction of mitochondrial-related risk signature

A univariate Cox regression analysis of the 274 mitochondria-associated DEGs identified 29 potential prognostic risk factors for COAD patients (P < 0.05, Fig. 2D). We used LASSO regression for feature selection in the training cohort, identifying the optimal tuning parameter λ as 0.015 when the partial likelihood binomial deviance was minimized. Seventeen variables with nonzero coefficients were retained (Fig. 2E). Refinement through multivariable Cox regression resulted in nine genes (Fig. 2F). Eventually, the nine selected mitochondrial-related DEGs (PPARGC1A, HSPA1A, MTUS1, MSRA, GLYATL1, TRAP1, OSBPL1A, MMP1, and KCNJ11) were used to construct a prognostic model for COAD patients. Details about these genes are available in Table 1.

Table 1 The information of 9 prognosis-related genes.

In tumor samples, PPARGC1A, HSPA1A, MTUS1, and MSRA were downregulated, while GLYATL1, TRAP1, MMP1, and KCNJ11 were upregulated compared to normal tissues (P < 0.05, Fig. S1B). K-M curve analysis revealed that lower expression of PPARGC1A, MTUS1, and MSRA was associated with shorter OS (P < 0.05, Fig. S1C).

To compute the risk score for each COAD patient, the following formula was utilized:

$$\begin{aligned} {\text{Risk score }} = \, & - 0.{1}0{7} \times {\text{PPARGC1A}} + 0.{261} \times {\text{HSPA1A}} - 0.{3}0{2} \times {\text{MTUS1}} - 0.{3}0{9} \times {\text{MSRA}} \\ & - 0.{121} \times {\text{GLYATL1}} - 0.{571} \times {\text{TRAP1}} + 0.{239} \times {\text{OSBPL1A }}0.0{92} \times {\text{MMP1}} + 0.{3}0{1} \times {\text{KCNJ11}} \end{aligned}$$

Patients, categorized into high- and low-risk groups based on their median risk score, exhibited significantly shorter OS in the high-risk group (P < 0.001, Fig. 3A). The results of the Schoenfeld Residuals test showed that none of the covariates were statistically significant (P > 0.05), and the global test was also not statistically significant. Therefore, we can conclude that the Cox model meets the proportional hazards assumption (Fig. S2). ROC curves, assessing the risk model’s accuracy for 1 year, 3 year, and 5 year OS, yielded AUC values of 0.740, 0.723, and 0.763, respectively (Fig. 3B). Figure 3C illustrated the correlation between the risk score and survival of the nine gene expressions, affirming the risk model’s reliability in predicting COAD patient prognosis.

Fig. 3
figure 3

Prognostic analysis and functional enrichment in COAD: risk stratification, WGCNA, and pathway enrichment. (A) Kaplan–Meier curves of the OS of patients in the high- and low-risk groups in the TCGA-COAD training cohort. (B) ROC curves for predicting 1-, 3-, and 5-year OS in the TCGA-COAD training cohort. (C) Distribution of risk score, survival status (red dots indicate dead, blue dots indicate alive) and the gene expression of nine model genes in the TCGA-COAD training cohort. (D) Gene dendrogram and module colors. The dendrogram groups genes by co-expression, with modules indicated by different colors. (E) Module-trait relationships. The heatmap displays correlations between gene modules and clinical traits, with red for positive and blue for negative correlations. P-values are shown in parentheses. (E,F) GO and KEGG enrichment analysis of the turquoise module.

The prognostic risk model was validated in GSE29623 and GSE39582 datasets, confirming consistent results with the training cohort. Like the training cohort, the high-risk group in the validation cohorts exhibited shorter OS than the low-risk group (Fig. S3A and Fig. S3B). In the GSE29623 dataset, AUC values were 0.682, 0.663, and 0.732 for the 1, 3, and 5 year survival ROC curve, respectively (Fig. S3A). For GSE39582, AUC values were 0.617, 0.635, and 0.616 (Fig. S3B). Survival declined with increasing risk score (Fig. S3A and Fig. S3B). Expressions of GLYATL1, TRAP1, MMP1, and KCNJ11 significantly increased, while PPARGC1A, HSPA1A, MTUS1, MSRA, and OSBPL1A significantly decreased in COAD in GSE39582, consistent with the training cohort results (Fig. S3C). Additionally, chi-squared testing revealed a higher proportion of terminal cancer patients in the high-risk group compared to the low-risk group (P < 0.05, Table 2).

Table 2 Clinical characteristics between low- and high-risk groups.

Identification of mitochondrial-related gene signatures modules using WGCNA and construction of nomogram

We chose β = 7 as an appropriate soft threshold to construct a scale-free network (R2 = 0.87). The dynamic cut tree was made after merging similar gene modules (Fig. 3D). Among the four gene module, we found that the turquoise module had a close relationship with high-risk group fraction features (Fig. 3E). Additionally, we identified the key genes in the turquoise gene module under the condition of GS > 0.2 and MM > 0.8 (Table S2). Furthermore, we performed a GO enrichment and KEGG signaling pathway analysis for the turquoise module. The GO enrichment analysis revealed that these genes were predominantly involved in extracellular matrix organization, collagen-containing extracellular matrix, and extracellular matrix structural constituent (Fig. 3F). KEGG pathway analysis further implicated these genes in pathways such as PI3K-Akt signaling pathway, Calcium signaling pathway and Wnt signaling pathway (Fig. 3G).

Univariate and multivariate Cox regression analysis identified prognostic indicators in COAD, including age, M stage, neoadjuvant treatment history, and risk score (P < 0.05, Table 3, Fig. S4A). These indicators were used to develop a nomogram. ROC curves for the nomogram yielded AUC values of 0.771, 0.768, and 0.789 for 1-year, 3 year, and 5 year OS, respectively (Fig. S4B). The calibration curve showed good agreement between predicted and actual survival probabilities at 1 year, 3 year, and 5 year time points (Fig. S4D). Decision curves indicated the nomogram’s superior predictive ability compared to other factors in forecasting COAD prognosis (Fig. S4C).

Table 3 Univariate and multivariate Cox regression analysis of various prognostic parameters in COAD patients.

Functional enrichment analysis of the DEGs in high-risk and low-risk groups and intercellular communication in COAD

Functional enrichment analysis of the 387 DEGs in high-risk and low-risk groups (Fig. S5A) revealed associations primarily with processes related to the organization and structure of the extracellular matrix (ECM). At the cellular level, these DEGs were linked to the collagen-containing ECM and endoplasmic reticulum lumen. Molecular function implications included ECM structural constituent and receptor-ligand activity (Fig. 4A). KEGG pathway analysis highlighted involvement in crucial pathways such as cytokine-cytokine receptor interaction, IL-17 signaling pathway, and TNF signaling pathway (Fig. 4B).

Fig. 4
figure 4

Functional enrichment, single-cell analysis, and cell communication in COAD and CRC. (A) The top 10 enriched terms in GO analysis belonged to biological process (BP), cellular component (CC), and molecular function (MF) for DEGs between high- and low- risk group. (B) The top 20 enriched terms in KEGG analysis for DEGs between high- and low- risk group. (C) t-SNE plot of 111,748 cells from 46 CRC samples. (D) Overall communication condition of all cell clusters and communication condition between each cell type and other cell clusters. Circle sizes are proportional to the number of cells in each cell group and edge width represents the communication probability. (EG) Heatmaps showing the relative importance of each cell subtype based on four network centrality measures and ligand-receptor gene expression in CXCL, TNF, and CCL signaling pathway networks, respectively.

scRNA-seq data was employed to explore the roles of specific cell types in CRC’s cytokine-cytokine receptor interaction and TNF signaling pathways. Following quality control, 25,120 cells were retained. t-SNE dimensionality reduction and cell-type annotation identified eight major clusters: B cells, endothelial cells, epithelial cells, macrophages, monocytes, NK cells, T cells, and tissue stem cells (Fig. 4C). Fig S6 reflects the expression of signature genes in these eight types immune cells.

Cell communication analysis using CellChat identified signal networks among the eight cell types. Figure 4D and Fig. S7A illustrated comprehensive communication conditions, highlighting strong interactions between tissue stem cells, Epithelial cells, and macrophage cells. Cytokine-related signaling pathways were delineated (Fig. 5E–G). In the CXCL signaling network, tissue stem cells were senders, and B cells were receivers, while in the TNF signaling network, monocytes were senders, and Tissue stem cells were receivers (Fig. 5E, F). In the CCL signaling network, macrophages exhibited immune function in an autocrine manner (Fig. 5G). Ligand and receptor gene expressions across different cells supported these findings (Fig. S7B). Overall, results suggest strong cytokine-cytokine receptor interaction among macrophages, tissue stem cells, and epithelial cells.

Fig. 5
figure 5

Risk score was associated with TME signatures in COAD. (A) Association between stromal score (top) and tumor purity (down) and risk score and its distribution in the low- and high-risk groups. (B) Correlation analysis for risk score and the expressions of ECM and collagen signatures. (C) Correlation analysis for risk score and the expressions of matrisome signatures. (D) Correlation analysis for risk score and the expressions of CAF up signatures. (E) GSEA recognized different gene sets in the high-risk groups. (F) Cell composition of cell types in the low- and high-risk groups. P values were showed as: *P < 0.05, **P < 0.01, ***P: < 0.001, ****P < 0.0001.

Mitochondrial-related risk score was associated with TME signatures in COAD

After identifying TME-related signaling pathway enrichment, we investigated its connection with the risk score.

In Fig. 5A, a positive correlation was observed between the risk score and stromal score in COAD, while a negative correlation existed with tumor purity. Further exploration revealed positive associations between the risk score and expressions of ECM, matrisome signatures, and carcinoma-associated fibroblast (CAF) signatures (Fig. 5B–D). GSEA results indicated a significant link between the risk score and core matrisome, ECM organization, ECM glycoproteins, and ECM proteoglycans in the high-risk group (Fig. 5E). Deconvolution analysis showed that the high-risk group had increased tissue stem cells and decreased epithelial cells compared to the low-risk group (Fig. 5F). These findings underscore a robust association between the mitochondria-related risk score and TME signatures in COAD.

Additionally, deconvolution analysis of COAD samples showed that the low-risk group had fewer tissue stem cells and more epithelial cells compared to the high-risk group (Fig. 5F). These results highlight a significant connection between the mitochondria-associated risk score and TME signatures in COAD.

Mitochondrial‑related risk score was associated with immune signatures and immunotherapy responses in COAD

Immune cell infiltration analysis revealed increased resting CD4+ T cells and resting NK cells in the low-risk group, while the high-risk group had higher levels of naive B cells, Tregs, and macrophages (Fig. 6A). Negative correlations were found between the risk score and activated CD4+ T cell signatures, while positive correlations existed with Treg and macrophage signatures (Fig. 6B). Similar associations were observed with other tumor-infiltrating immune cell signatures in Fig. S8A. Despite higher immune scores in the high-risk group, statistical significance was not reached (P > 0.05, Fig. S8B).

Fig. 6
figure 6

Risk score was associated with immune infiltration and immune-associated signatures in COAD. (A) Box plot displays the differential abundance of 22 infiltrative immune cells by CIBERSORT database between high-risk and low-risk groups of COAD samples. (B) Correlation analysis for risk score and the expressions of activated CD4+ T cell, regulatory T cell and macrophages signatures. (C) Expression variation of immune checkpoint. P values were showed as: ns not significant, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

The analysis of 48 immune checkpoint molecules indicated a significant upregulation of PD-1 and TIM-3 expressions in the high-risk group, suggesting substantial modulation (Fig. 6C). This prompted exploration of the risk score’s predictive ability for immunotherapy responses. Compared to the low-risk group, the high-risk group showed a notably lower immunotherapy response rate in the TCGA-COAD cohort (Fig. 7A). Immune scores analysis revealed a significantly lower response rate in the low-immune subgroup compared to the high-immune subgroup (Fig. 7B, left). Notably, Fig. 7B (right) presents a stratified analysis of immunotherapy response rates based on combinations of risk and immune scores. The Low Risk + Low Immune Score group shows a higher percentage of responders (P < 0.05). These findings suggest that combining risk score and immune score assessments could accurately predict immunotherapy response in COAD.

Fig. 7
figure 7

Risk score is a potential biomarker to predict benefits from immune therapies in COAD. (A) TIDE predicted the proportion of patients with response to immunotherapy in low-risk and high-risk groups. (B) TIDE predicted the proportion of patients responding to immunotherapy in both low- and high-immune score groups, as well as across four groups based on combined risk and immune scores. (C) TIDE predicted the proportion of patients responding to immunotherapy across MSS, MSI-L, and MSI-H groups, as well as in six groups defined by combined risk score and microsatellite status. (D) TIDE predicted the proportion of patients responding to immunotherapy across low, intermediate, and high TMB groups, as well as in six groups based on combined risk score and TMB. (E,F) Kaplan–Meier curves of the OS of patients in the low-TMB, intermediate TMB and high-TMB groups in the TCGA-COAD training cohort. MSS microsatellite stability, MSI-L microsatellite instability-low, MSI-H microsatellite instability-high. TMB tumor mutation burden. P values were showed as: ns not significant, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

The microsatellite instability-high (MSI-H) phenotype responds well to immunotherapy. Immunotherapy response rates in the MSS subgroup, shown in Fig. 7C (left), were significantly lower compared to MSI-H subgroup (P < 0.05). Rates in the MSI-H subgroup were higher than in the MSI-L subgroup, though not significantly different. These results affirm the benefits of immunotherapy for MSI-H patients. Additionally, within the MSS subgroup, the high-risk group showed a significantly lower immunotherapy response rate compared to the low-risk group (Fig. 7C, right). These findings endorse the effectiveness of combined risk-score and microsatellite status analysis in predicting immunotherapy response in COAD. TMB was identified as an indicator of immunotherapy response, with a lower response rate in the low-TMB subgroup compared to the high-risk subgroup (Fig. 7D, left). Moreover, within the low TMB subgroup, the high-risk group exhibited a significantly lower response rate to immunotherapy than the low-risk group (Fig. 7D, right). Combining the risk score and TMB score seems to be a more accurate predictor of immunotherapy response in COAD patients.

The relationship between mitochondrial-related risk score and mutation profile in COAD

TMB is an established biomarker for predicting the efficacy of immune-checkpoint inhibitors (ICIs). We analyzed simple nucleotide variations from TCGA, revealing APC, TP53, TTN, KRAS, and PIK3CA as the top five genes with the highest mutation frequencies in the high-risk group (Fig. S9A). Similarly, the low-risk group had the highest mutation frequencies in APC, TTN, KRAS, TP53, and MUC16 (Fig. S9B). However, no significant distinction was found in the mutation analysis between the two groups (Fig. S9C). The bar plot presented a summary of the gene mutation information (Fig. S9D, E). Besides, there was no significant disparity in TMB observed between the high-risk and low-risk groups in patients with COAD (Fig. S9F). Interestingly, as depicted in Fig. 7E, the high-TMB subgroup demonstrated poorer OS. However, in all the low, intermediate, and high TMB groups, the low-risk subgroups generally exhibited superior OS when compared to the high (Fig. 7F). Therefore, combining the risk score and TMB could potentially offer a more effective means of predicting the prognosis in COAD.

Risk score predicts therapeutic benefits in COAD

We analyzed IC50 values for COAD patients treated with 198 specific drugs (Fig. S10). Low-risk group patients might benefit from drugs like OSI-027, Dasatinib, Doramapimod, while high-risk group patients could respond well to BMS-754807, Trametinib, Axitinib, and others. Detailed information on the top 10 responsive drugs for each subgroup is provided in Fig. 8A–10C, as well as Tables 4, 5.

Fig. 8
figure 8

Risk score predicts drug therapeutic benefits in COAD. (A) Proportion of normalized IC50 value of the 60 drugs between the low-risk and high-risk groups. (B) Statistical box plot of IC50 predictions of 10 drugs in high-risk groups. (C) Statistical box plot of IC50 predictions of 10 drugs in low-risk groups. P values were showed as: ****P < 0.0001.

Table 4 Detailed information of the top 10 sensitivity drugs in high-risk groups.
Table 5 Detailed information of the top 10 sensitivity drugs in low-risk groups.

Discussion

In China, COAD is a prevalent cancer and a leading cause of cancer-related mortality, ranking among the top five. It is characterized by high incidence, frequent post-surgery recurrence, and significant mortality2. Around 50% of COAD patients experience metastasis and recurrence, emphasizing the need for prognostic markers to enhance our understanding and prediction of disease outcomes25. Identifying reliable prognostic biomarkers for COAD survival is crucial. While the accuracy of several gene-based prognostic indicators remains uncertain26,27, this study identified nine mitochondrial genes with potential as prognostic biomarkers for COAD, offering new possibilities for clinical diagnosis and treatment strategies.

Mitochondria, crucial for cell proliferation and programmed cell death, are potential targets for pharmacological intervention28. Highly invasive tumors reprogram mitochondrial energy metabolism to meet increased energy demands for malignant growth and migration, optimizing fuel utilization29. We combined analysis resulted in the identification of 274 DEGs that are specifically associated with mitochondria in COAD. Enrichment analysis highlighted pathways related to mitochondria and tumor development. Through Univariate Cox, Lasso, and Multivariable Cox regressions, a risk score model with nine genes was developed. Validation in GSE39582 and GSE29623 datasets confirmed the model’s effectiveness and stability. However, the AUC values in these validation sets are lower than in the training set. This discrepancy may be due to factors such as dataset differences, feature distribution variations, model overfitting, or data quality issues. While the validation results demonstrate the model’s generalizability, the lower AUC highlights its limitations. Additionally, A nomogram combining the risk score with clinical characteristics improves survival prediction, serving as a valuable tool for accurate prognosis forecasting in COAD patients.

PPARGC1A, or PGC-1α, produces peroxisome proliferator-activated receptor gamma coactivator-1, a critical transcriptional coactivator. This coactivator regulates mitochondrial biogenesis and oxidative phosphorylation, impacting tumor cell metastasis30. MTUS1, acting as a tumor suppressor gene, encodes the potential tumor suppressor protein, Microtubule-associated tumor suppressor 131. TRAP1, predominantly located in mitochondria, regulates cellular metabolic reprogramming and mitochondrial apoptosis32. Matrix metalloproteinases (MMPs) contribute to the invasiveness of CRC33. Assessing plasma MMP1 levels holds promise as a valuable prognostic factor for COAD patients, guiding personalized postoperative treatments and surveillance strategies34. We hypothesized that the differential expression of these genes is linked to adverse outcomes in COAD. However, limited research on other genes in the risk score model and their relation to COAD emphasizes the need for further studies in this area.

Our study expands on previous research by developing a multi-biomarker model, considering biomarkers like PKN2, SLC25, and MTUS1, known for predicting COAD prognosis11,35,36. Unlike singular biomarker studies, we offer a comprehensive approach. Zuo et al. focused on metabolic signatures, highlighting altered metabolic regulation in COAD37. Qiu et al. incorporated inflammation-related genes into their prognostic model38. In our model, emphasis is on mitochondria-related genes, indicating potential targets for COAD prognosis and targeted therapy.

WGCNA is a method that identifies clusters of highly correlated genes and links them to external traits or conditions. Notably, the study by Zhang et al. on oral squamous cell carcinoma provided a comparative framework that enriches our understanding of the role of co-expression networks in tumorigenesis39. Our WGCNA analysis identified several gene modules that are significantly associated with the traits of interest. The turquoise module, in particular, exhibited a strong correlation with the risk score, indicating that genes within this module may play a protective role in the disease context. The functional enrichment analysis of this module revealed crucial biological processes and pathways that could be involved in tumorigenesis. For instance, genes in the turquoise module were significantly enriched in extracellular matrix organization, which are critical for maintaining cellular integrity and function. Dysregulation of these processes is often linked to cancer progression, particularly in promoting an invasive phenotype through extracellular matrix degradation40. Moreover, the pathway analysis highlighted significant involvement of cytoskeleton-related pathways and the PI3K-Akt signaling pathway in the turquoise module. The PI3K-Akt pathway is well-known for its role in promoting cell survival, proliferation, and migration, which are hallmarks of cancer41. The enrichment of this pathway suggests that the turquoise module may drive oncogenic processes by activating signaling cascades that enhance cell survival and metastatic potential.

GO and KEGG analysis revealed significant gene function and pathway differences between high-risk and low-risk groups. ECM and cytokine-cytokine receptor interactions were notably enriched. Elevated ECM, a tumor hallmark, correlates with poor prognoses in various cancers42. Inflamed tissues of the colon and rectum release cytokines, initiating complex cell interactions and inflammation, which, along with genetic and epigenetic changes, contribute to COAD development43. Various cell types present in the colon possess receptors for inflammatory cytokines, like TNF-α, IL-1β, and IL-6, pivotal in COAD progression44.

Using scRNA-seq data, we analyzed diverse cell types, and providing a comprehensive understanding of their functional characteristics and interactions. Notably, tissue stem cells play a unique role as both senders in the CXCL signaling network and receivers in the TNF signaling network. This suggests they may act as sources of signaling in the CXCL pathway, transmitting signals to cells expressing receptors like CXCR4, and eliciting targeted cellular responses. The CXCL pathway is crucial in governing tumor growth and metastasis45. The CXCL12-CXCR4 interaction recruits CXCR4-positive cells to the tumor microenvironment, fostering aggressive tumor growth and stemness through the secretion of cytokines, chemokines, and growth factors46,47. TNF, an inflammatory cytokine, signals pro-survival or pro-death depending on the context48,49. Our findings suggest tissue stem cells may be targeted by the TNF signaling pathway, potentially expressing TNF receptors like TNFRSF1A, enabling specific responses. Interest in TNF in immunotherapy has revealed components of the TNF signaling pathway protective against CD8+ T cell cytotoxicity. These findings could reroute tumor TNF signaling towards pro-death, offering a new strategy to exploit T cell-derived TNF and enhance immunotherapy response50.

Research suggests tissues with high normal stem cell proliferation rates have a higher cancer incidence51,52,53, indicating resident normal stem cells may be a source for cancer stem cells (CSCs). Our study supports these concepts, revealing a greater presence of tissue stem cells in the high-risk group, emphasizing the link between stem cells and cancer development.

In COAD, inflammation is common, and disease progression activates oncogenes while losing tumor suppressor genes. This contributes to the deterioration of healthy cells and disruption of the intestinal epithelial barrier. As a result, lots of inflammatory factors and chemokines were producted in affected tissues54,55. The TME consists of various elements crucial for promoting tumor survival, growth, and invasiveness, while concurrently suppressing the body’s immune response against the tumor56,57. Post-therapy, TME cells release growth factors and cytokines supporting the survival of remaining tumor cells, leading to treatment resistance58. Stromal fibroblasts can transform into cancer-associated fibroblasts (CAFs) within the TME, actively contributing to tumor development and progression through factor secretion and direct interactions with other cells. These components also shape the TME by suppressing immune responses against the tumor and recruiting immune-suppressive cells59. Our analysis aligns with previous studies, showing higher expression of CAF and matrisome signatures in the high-risk group. Furthermore, ESTIMATE analysis revealed a positive correlation between the risk score and stromal score, and a negative correlation with tumor purity. These findings suggest increased infiltration of stromal cells into the TME in the high-risk group.

In the TME, immune cells are crucial. The cancer immunoediting hypothesis suggests that, in individuals with a functional immune system, immune selection favors cancer cells with lower immunogenicity during cancer development. Immunosuppressive networks are established, enabling the tumor to evade immune escape60. Clinically significant cancers often employ diverse mechanisms of immunosuppression, such as increasing the presence of immunosuppressive cells like regulatory T cells (Treg cells) and tumor-associated macrophages, upregulating immunosuppressive molecules, and downregulating cancer antigens. These mechanisms lead to impaired recognition of cancer cells by CD8+ T cells. Tie et al. propose that targeting immunosuppressive cells and mechanisms holds the potential to unleash against tumor immune responses61. In our study, we explored immune mechanisms differentiating low- and high-risk groups and investigated the potential of cancer immunotherapy to enhance the anti-tumor immune response. High-risk patients showed increased levels of naive B cells, Tregs, and macrophages, while low-risk patients had elevated resting CD4+ T cells and NK cells. The presence of high levels of Treg cells in the TME was associated with a poor prognosis in many patients with different types of cancer62. Macrophage M2 cells and Tregs can hinder the immune response mediated by CD8+ T cells against tumors63. Prior studies outlined diverse roles of CD4+ T cells, including suppressing detrimental immune reactions and aiding CD8+ T cells in eliminating tumors, but also hindering CD8+ T cell activation and NK cell cytotoxicity64,65. In the high-risk group, elevated levels of immunosuppressive cells, like Tregs and M2 macrophages, hindered CD8+ T cell-mediated tumor eradication, indicating significant immunosuppression compared to the low-risk group.

In cancer immunotherapy, immune checkpoint blockade is a pivotal strategy66. Our study specifically analyzed immune checkpoint expression in low- and high-risk groups. The high-risk group, per our model, not only displayed intensified immunosuppression but also increased immune checkpoint expression. These factors in the high-risk group worsened prognosis by promoting COAD growth, progression, invasion, and angiogenesis. Conversely, immune checkpoint inhibitors may benefit high-risk patients, potentially improving prognosis. The low-risk group showed a higher immunotherapy response rate than the high-risk group, correlating with PD-1 and TIM-3 expression levels. Combining the risk score with the low immune score, MSS, or low TMB improved the accuracy of predicting immunotherapy sensitivity in COAD patients. The risk score proves valid as a predictive biomarker for immunotherapy response. When analyzing the differences in immunotherapy response rates between high-risk and low-risk groups, several potential confounding factors should be considered: tumor microenvironment, genetic and molecular variability, pre-existing immunity, treatment history, comorbidities, patient demographics, tumor burden and stage, biomarker levels, and adverse events and tolerability. Addressing these potential confounders is crucial for accurately interpreting the differences in immunotherapy response rates between different risk groups67. TMB, reflecting cancer cell mutations and neo-antigen generation, enhances T-cell recognition, correlating with improved outcomes from immune checkpoint inhibitors68. Thus, TMB serves as a dependable biomarker for predicting the response to immune checkpoint therapy TMB serves as a reliable biomarker for predicting immune checkpoint therapy response69. Findings from our study demonstrated that low-risk patients with low TMB showed a higher immunotherapy response rate, while high-risk patients had worse prognoses regardless of TMB levels. The risk score could aid in drug selection, as responses to Trametinib, Dasatinib, and Axitinib varied between risk groups. Overall, these results support the risk score’s reliability as a predictor for chemotherapy and targeted therapy response.

In conclusion, our tailored risk score model for COAD, incorporating factors like the TME and immune cell infiltration, can be combined with immune score, microsatellite stability, or TMB to improve immunotherapy response prediction accuracy. We also identified specific drugs more effective for the two risk groups. Overall, our mitochondrial-related risk model holds promise as a reliable prognostic biomarker, enabling personalized treatment strategies for COAD patients.