Prospective validation of ORACLE, a clonal expression biomarker associated with survival of patients with lung adenocarcinoma

Biswas, Dhruva; Liu, Yun-Hsin; Herrero, Javier; Wu, Yin; Moore, David A.; Karasaki, Takahiro; Grigoriadis, Kristiana; Lu, Wei-Ting; Veeriah, Selvaraju; Naceur-Lombardelli, Cristina; Magno, Neil; Ward, Sophia; Frankell, Alexander M.; Hill, Mark S.; Colliver, Emma; de Carné Trécesson, Sophie; East, Philip; Malhi, Aman; Snell, Daniel M.; O’Neill, Olga; Leonce, Daniel; Mattsson, Johanna; Lindberg, Amanda; Micke, Patrick; Moldvay, Judit; Megyesfalvi, Zsolt; Dome, Balazs; Fillinger, János; Nicod, Jerome; Downward, Julian; Szallasi, Zoltan; Hackshaw, Allan; Jamal-Hanjani, Mariam; Kanu, Nnennaya; Birkbak, Nicolai J.; Swanton, Charles

doi:10.1038/s43018-024-00883-1

Published: 09 January 2025

Nature Cancer volume 6, pages 86–101 (2025)

Abstract

Human tumors are diverse in their natural history and response to treatment, which in part results from genetic and transcriptomic heterogeneity. In clinical practice, single-site needle biopsies are used to sample this diversity, but cancer biomarkers may be confounded by spatiogenomic heterogeneity within individual tumors. Here we investigate clonally expressed genes as a solution to the sampling bias problem by analyzing multiregion whole-exome and RNA sequencing data for 450 tumor regions from 184 patients with lung adenocarcinoma in the TRACERx study. We prospectively validate the survival association of a clonal expression biomarker, Outcome Risk Associated Clonal Lung Expression (ORACLE), in combination with clinicopathological risk factors, and in stage I disease. We expand our mechanistic understanding, discovering that clonal transcriptional signals are detectable before tissue invasion, act as a molecular fingerprint for lethal metastatic clones and predict chemotherapy sensitivity. Lastly, we find that ORACLE summarizes the prognostic information encoded by genetic evolutionary measures, including chromosomal instability, as a concise 23-transcript assay.

Main

Lung cancer is the leading cause of global cancer-related death¹. Non-small cell lung cancer (NSCLC) accounts for 85% of cases, of which 50% are lung adenocarcinoma (LUAD)². For patients with NSCLC, tumor–node–metastasis (TNM) staging is the gold standard for clinical prognostication and therapeutic decision-making. Although TNM staging is clearly associated with survival, its prognostic accuracy leaves room for improvement. For example, surgical resection is performed with curative intent in patients with stage I disease, yet the 5-year mortality rate in this population is 15%³. This indicates a need to address undertreatment by identifying high-risk stage I tumors that may benefit from adjuvant therapy⁴. Moreover, as computed tomography lung-cancer screening programs are adopted, the proportion of stage I diagnoses increases from around 15% to nearly 60% (ref. ⁵). Therefore, improving prognostic accuracy in early-stage LUAD is an urgent and growing clinical need.

Transcriptomic biomarkers hold the translational potential of capturing features of cancer cell aggressiveness to add a molecular dimension to prognostication. Yet, despite two decades of research, developing reliable expression biomarkers for LUAD remains a difficult task. Previously suggested biomarkers have failed to refine risk prediction beyond established clinicopathological risk factors, particularly in stage I disease⁶, and have exhibited poor reproducibility in independent validation cohorts. This was showcased by the Director’s Challenge Consortium study in which nine top research teams failed to achieve these benchmarks⁷.

Previously, we quantified tumor sampling bias in the TRACERx (TRAcking non–small cell lung Cancer Evolution through therapy (Rx)) lung study (NCT01888601). We observed that pervasive intratumor heterogeneity (ITH) in lung cancer confounded prognostic signatures, with 30–40% of tumors yielding disparate prognostic scores depending upon where the biopsy needle was placed⁸. Proposed solutions to the sampling bias issue for molecular biomarkers (Fig. 1a) include: (1) bypassing sampling, by resecting the whole tumor then testing every part^9,10; (2) sampling and pooling biopsies from different areas of a tumor to minimize artifacts from tumor heterogeneity (previous authors have suggested that four biopsies would be sufficient for lung tumors¹¹ or two biopsies for glioma¹²); (3) homogenizing the entire tumor, then performing one test on the resulting mixture¹³; and (4) our previously developed strategy, identifying homogeneously (clonally) expressed markers to sample and test one biopsy per tumor⁸.

**Fig. 1: Prospective validation of tumor sampling bias.**

Clonal expression biomarkers may be straightforward to implement clinically, as they are compatible with existing pathology workflows and cost-effective. Accordingly, we previously designed the Outcome Risk Associated Clonal Lung Expression (ORACLE) signature using TRACERx, a multiregion research cohort⁸. In retrospective validation analyses of more than 900 patients with LUAD, this biomarker maintained prognostic significance and was associated with survival independent of clinicopathological risk factors in a multivariable analysis⁸.

Here, we expand on our previous work by developing three lines of analysis related to clonal expression biomarkers in LUAD. First, we perform prospective validation of a molecular test based on cancer evolutionary principles for patients with lung cancer. Second, we expand our mechanistic understanding of clonal transcriptional signals by charting them from tumor initiation to metastasis and evaluating their association with chemosensitivity. Third, we examine the relationship between clonal RNA alterations and previously described genetic metrics of lung cancer evolution^14,15,16,17.

Results

Multiregion RNA-seq data from LUAD

Previously, we utilized data from the first 100 patients recruited into the TRACERx study (TRACERx100 cohort, including 28 patients with stage I–III LUAD, 89 tumor regions) to quantify the RNA ITH of prognostic biomarkers in LUAD⁸. In this work, we leverage multiregion RNA sequencing (RNA-seq) data from an expanded cohort of patients with stage I–III LUAD recruited prospectively in the TRACERx study (Extended Data Fig. 1a). For the validation of ORACLE in an independent patient cohort, we exclude patients profiled in our previous study to yield the TRACERx validation cohort, consisting of 369 tumor regions from 158 patients. Separately, for additional exploratory analyses, we utilize the full combined set of patients, termed the TRACERx exploratory cohort, comprising 450 tumor regions from 184 patients. All primary tumor regions were sampled from treatment-naive patients. ORACLE risk scores were determined as described in the original publication⁸, applying predefined model coefficients and risk-score cutoff (Methods and Extended Data Fig. 1b).

Benchmarking tumor sampling bias

We prospectively assessed the tumor sampling bias of ORACLE, benchmarking against comparable prognostic signatures. Tumor sampling bias was quantified using four metrics in the TRACERx validation cohort, restricting analysis to patients with multiregion RNA-seq data available (333 tumor regions from 122 patients with stage I–III LUAD; Extended Data Fig. 1a). To benchmark ORACLE, six RNA-seq-based prognostic signatures for LUAD were identified from a literature search and applied as described in their original publications (Methods and Supplementary Table 1): three signatures based on immune-related genes (Li et al.¹⁸, Song et al.¹⁹ and Jin et al.²⁰), one N⁶-methyladenosine-related signature (Wang et al.²¹), one ER-stress signature (Li et al.²²) and one signature derived from aberrantly expressed protein-coding genes (Zhao et al.²³).

First, the ORACLE signature was used to classify tumor regions as either high or low risk according to the predefined thresholds from Biswas et al.⁸. Each tumor could then be classified as concordant-low risk, concordant-high risk or discordant risk (Fig. 1a). For ORACLE, discordant risk classification was observed in 19% (23/122) of tumors compared with 25–44% across the other six signatures (Fig. 1b,c and Extended Data Fig. 2a). We also assessed whether this observation was affected by tumor stage (TNM 8th edition), finding that the discordant risk frequency for ORACLE was not significantly associated with tumor stage (chi-squared test, P = 0.09; Extended Data Fig. 2b).
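The tumor-level classification can be sketched as follows. This is a minimal illustration rather than the study's code: it assumes each region has already been labeled high or low risk with the predefined cutoff, and the function names and toy cohort are hypothetical. A tumor with a single sampled region is concordant by construction.

```python
from collections import Counter

def classify_tumor(region_labels):
    """Collapse region-level labels ('high'/'low') into a tumor-level class."""
    labels = set(region_labels)
    if not labels or not labels <= {"high", "low"}:
        raise ValueError("expected a non-empty list of 'high'/'low' labels")
    if labels == {"low"}:
        return "concordant-low"
    if labels == {"high"}:
        return "concordant-high"
    return "discordant"  # regions of the same tumor received different labels

def discordance_rate(tumors):
    """Fraction of tumors classified as discordant."""
    counts = Counter(classify_tumor(labels) for labels in tumors.values())
    return counts["discordant"] / len(tumors)

# Hypothetical cohort: tumor ID -> risk labels of its sampled regions
toy = {
    "T1": ["low", "low", "low"],
    "T2": ["high", "high"],
    "T3": ["low", "high", "low"],
    "T4": ["low", "low"],
}
print({tumor: classify_tumor(labels) for tumor, labels in toy.items()})
print(discordance_rate(toy))  # 0.25
```

With 23 discordant tumors among the 122 analyzed, the same ratio is 23/122 ≈ 0.19.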

Second, we applied a hierarchical clustering method previously used by us and others to quantify tumor sampling bias^8,24 (Extended Data Fig. 3). In this analysis, a larger area under the curve (AUC) value suggests more concordant classification of regions at the patient level. ORACLE exhibited an AUC value of 0.76, ranking second highest out of the seven signatures (AUC values ranging from 0.22 to 0.77; Extended Data Figs. 3 and 4a,b), with the Li et al.¹⁸ signature demonstrating a marginally higher AUC value (0.77).

Third, we applied a method developed by Househam et al.²⁵ for capturing the intratumor expression variability of individual genes, with lower values indicating homogeneous expression (Extended Data Fig. 4c). By this metric, the genes comprising ORACLE exhibited the lowest median value at 0.36 compared with values ranging from 0.49 to 1.3 for the other signatures (Extended Data Fig. 4d), indicating greater stability in expression across tumor regions.

Lastly, motivated by the reliance on single tumor biopsies in current clinical practice, we applied a metric previously used to quantify how many biopsies would be required to obtain a stable risk-score estimate²⁶ (Extended Data Fig. 4e). Using a threshold prespecified by the authors of the original study²⁶, the ORACLE signature reached a stable risk-score estimate at 1.3 biopsies compared with 1.6–2.8 for the other signatures (Extended Data Fig. 4f). This suggests that ORACLE yields a more stable risk-score estimate from a single tumor biopsy.

In this prospective validation of tumor sampling bias, ORACLE achieved the best mean rank (1.25) out of seven RNA-seq-based prognostic signatures for LUAD (range 4–6.25) across four metrics for tumor sampling bias (Fig. 1d).
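How a mean rank combines the four metrics can be illustrated with a small sketch. It assumes that signatures are ranked within each metric (1 = least sampling bias; lower is better for discordance, expression variability and biopsies required, higher is better for the clustering AUC) and that the ranks are then averaged with equal weight. Signature names and values are hypothetical and ties are not handled. A signature ranked first on three metrics and second on the fourth, as ORACLE was, obtains (1 + 2 + 1 + 1)/4 = 1.25.

```python
def rank_within_metric(values, higher_is_better):
    """Rank signatures on one metric, 1 = least sampling bias (no ties assumed)."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {name: position + 1 for position, name in enumerate(ordered)}

def mean_ranks(metrics):
    """metrics: metric name -> (higher_is_better, {signature: value})."""
    totals = {}
    for higher_is_better, values in metrics.values():
        for name, rank in rank_within_metric(values, higher_is_better).items():
            totals[name] = totals.get(name, 0) + rank
    return {name: totals[name] / len(metrics) for name in sorted(totals)}

# Hypothetical values for three signatures (not the study's results)
toy_metrics = {
    "discordant_fraction": (False, {"A": 0.10, "B": 0.30, "C": 0.20}),
    "clustering_auc": (True, {"A": 0.70, "B": 0.80, "C": 0.40}),
    "expression_variability": (False, {"A": 0.30, "B": 0.50, "C": 0.90}),
    "biopsies_for_stable_score": (False, {"A": 1.2, "B": 1.5, "C": 2.0}),
}
print(mean_ranks(toy_metrics))  # {'A': 1.25, 'B': 2.0, 'C': 2.75}
```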

Prospective validation

Next, we focused on prospective assessment of the survival association of ORACLE in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD; Extended Data Fig. 1a).

We calculated hazard ratio (HR) values to compare ORACLE risk classes: concordant-high versus concordant-low, and discordant versus concordant-low. There was a clear association between ORACLE risk class and overall survival (OS) (Fig. 2a; concordant-high versus concordant-low HR 2.2 (95% confidence interval (CI) 1.2–3.9), discordant versus concordant-low HR 2.5 (95% CI 1.3–4.9), P = 0.0034).

**Fig. 2: Prospective validation of survival association.**

We next examined whether the association between ORACLE and survival was independent of known clinicopathological risk factors (sex, age, smoking pack-years, adjuvant treatment status, tumor stage (TNM 8th edition) and histologic grade). Adjusted HR (HR-adj) values were calculated using a multivariable analysis in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD; Extended Data Fig. 1a). ORACLE was used as a continuous risk measure, by calculating the mean score across regions per tumor. The ORACLE risk score was significantly associated with OS (HR-adj 2.27 (95% CI 1.3–3.9), P = 0.004; Fig. 2b) when adjusted for sex, age, smoking pack-years, adjuvant treatment status, tumor stage (TNM 8th edition) and histologic grade.

In clinical practice, typically only one biopsy is available per tumor to determine molecular risk scores. We generated a pseudo-single biopsy cohort to evaluate ORACLE in this context, by randomly sampling one region per tumor, calculating the risk score for that region, then testing the survival association. Running this simulation 1,000 times, the ORACLE risk score remained significantly associated with OS across every iteration (Fig. 2c, bootstrapped HR 2.2, bootstrapped CI 1.42–3.42).
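The resampling scheme can be sketched as below. The survival analysis is left as a placeholder callable (`statistic`, hypothetical) that, in practice, would fit a Cox model to the sampled scores and return, for example, its hazard ratio; how the 1,000 iterations were summarized into the bootstrapped HR and CI is not specified here. The region scores are invented for illustration.

```python
import random
import statistics

def pseudo_single_biopsy_cohort(region_scores, rng):
    """Draw one region at random per tumor and keep that region's risk score."""
    return {tumor: rng.choice(scores) for tumor, scores in region_scores.items()}

def simulate(region_scores, statistic, n_iter=1000, seed=0):
    """Repeat the single-biopsy draw n_iter times, collecting statistic(cohort).

    `statistic` is a hypothetical placeholder for the survival analysis, e.g.
    fitting a Cox model to the sampled scores and returning its hazard ratio.
    """
    rng = random.Random(seed)
    return [statistic(pseudo_single_biopsy_cohort(region_scores, rng))
            for _ in range(n_iter)]

# Hypothetical region-level scores; the statistic here is only the cohort mean
toy_scores = {"T1": [10.1, 10.3], "T2": [10.4], "T3": [9.9, 10.0, 10.2]}
results = simulate(toy_scores, lambda cohort: statistics.mean(list(cohort.values())))
print(len(results), min(results), max(results))
```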

We also evaluated ORACLE specifically in patients with stage I LUAD in the TRACERx validation cohort (n = 70 patients with stage I LUAD), where a prognostic biomarker might have the greatest utility for guiding adjuvant therapy⁵. When these patients were classified according to the current clinical standard (TNM 8th edition, n = 38 in stage IA, n = 32 in stage IB), tumor substaging was not prognostically informative (log-rank P = 0.43; Fig. 2d). By contrast, stratifying these patients into ORACLE risk classes (concordant-low n = 56, discordant n = 5, concordant-high n = 9) showed a significant association with OS (log-rank P = 0.003; Fig. 2d). The association between ORACLE risk score and OS in the stage I subgroup remained significant (HR-adj 5.48 (95% CI 1.6–18.8), P = 0.007; Extended Data Fig. 5a) when adjusted for sex, age, smoking pack-years, adjuvant treatment status, tumor stage (TNM 8th edition) and histologic grade. We further compared substaging classification with ORACLE risk class, finding that 8% (3/38) of patients with stage IA and 19% (6/32) of patients with stage IB were classified as ORACLE high risk (Extended Data Fig. 5b). To compare the predictive utility of ORACLE with other prognostic signatures, we calculated area under the receiver operating characteristic curve (AUROC) values, finding that the ORACLE risk score exhibited higher concordance with OS in stage I disease (AUROC 0.73) than the other six signatures (AUROC 0.59–0.72; Table 1). Lastly, a meta-analysis of four microarray datasets^7,27,28,29 from other institutions revealed that ORACLE risk score was significantly associated with survival outcome in the stage I subgroup (HR 3.4 (95% CI 2.2–5.4), P = 2.8 × 10⁻⁵; Extended Data Fig. 5c), providing additional validation in external cohorts.
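Table 1 also reports a concordance (C) index. A minimal sketch of Harrell's C index for right-censored data is shown below, assuming higher risk scores should correspond to shorter survival. For simplicity it skips pairs with tied follow-up times, and it does not reproduce the time-dependent AUROC values, whose estimator is not described in this section. The follow-up data are hypothetical.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Harrell's C index for right-censored survival data.

    times: follow-up times; events: 1 if the event (e.g. death) was observed,
    0 if censored; risk_scores: higher means a predicted worse outcome.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: skip pairs with tied follow-up times
        # Let `a` be the patient with the shorter follow-up
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored before the other patient: ordering unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: years of follow-up, event indicator, risk score
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.2]))  # 1.0
print(harrell_c_index(times, events, [0.9, 0.2, 0.5, 0.7]))  # 0.6
```

A value of 0.5 corresponds to random ordering of patients and 1.0 to a perfect ranking.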

Table 1 AUROC and C index calculated for patients with stage I LUAD (n = 70) using survival endpoints for LUAD RNA-seq prognostic signatures

ORACLE as a biomarker of invasive and metastatic potential

Previously we had observed that ORACLE risk scores were significantly higher in metastatic samples from patients with LUAD, suggesting that ORACLE may serve as a signature for metastatic potential⁸. We wished to extend this finding by investigating whether high-risk clonal expression changes are present before tissue invasion and whether the lethal disseminating clone is detectable in the transcriptome of the primary tumor.

First, we tested whether ORACLE, as a lung cancer marker, predicted lung-cancer-specific survival in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). A significant association was found between ORACLE risk class and lung-cancer-specific survival (concordant-high versus concordant-low HR 2.1 (95% CI 0.9–4.6), discordant versus concordant-low HR 3.1 (95% CI 1.4–7.0), P = 0.011; Fig. 3a). The association between ORACLE risk score and lung-cancer-specific survival remained significant in a subgroup analysis of patients with stage I disease (log-rank P = 0.0028; Fig. 3b) and when controlling for clinicopathological risk factors (HR-adj 2.15 (95% CI 1.1–4.3), P = 0.03; Extended Data Fig. 5d). ORACLE risk score was also a better predictor of lung-cancer-specific survival in stage I LUAD (AUROC 0.71) compared with the other six prognostic signatures (AUROC 0.55–0.69; Table 1).

**Fig. 3: ORACLE as a marker of invasive and metastatic potential.**

Next, to track the transition from normal tissue to cancer, we examined ORACLE risk scores across eight histological stages (n = 77 patients, including 27 normal tissues, 15 hyperplasia, 15 metaplasia, 13 mild dysplasia, 13 moderate dysplasia, 12 severe dysplasia, 13 carcinoma in situ (CIS) and 14 squamous cell carcinoma (SCC))³⁰. Charting ORACLE risk scores by developmental stages revealed an increase in expression from normal to metaplasia (linear mixed-effects model P = 0.0083; Fig. 3c).

We evaluated whether a lethal disseminating phenotype could be detected in the transcriptome of primary tumor regions harboring a metastatic subclone. Leveraging paired primary-metastasis phylogenies³¹ within the TRACERx exploratory cohort, we superimposed ORACLE risk scores onto metastatic competence at the level of tumor regions (53 tumor regions from n = 17 patients with stage I–III LUAD, comprising 22 metastasis-seeding and 31 non-metastasis-seeding regions). In this analysis, seeding regions displayed significantly higher ORACLE risk scores than nonseeding regions (linear mixed-effects model P = 0.03; Fig. 3d). To examine whether ORACLE risk was informative for predicting early systemic dissemination, we assessed the time to relapse or death using disease-free survival (DFS) in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). A significant association was found between ORACLE risk class and DFS (concordant-high versus concordant-low HR 2.3 (95% CI 1.2–4.2), discordant versus concordant-low HR 1.7 (95% CI 1.0–2.9), P = 0.015; Fig. 3e). A subgroup analysis showed that ORACLE risk class was also significantly associated with DFS in patients with stage I disease (P = 0.025, Fig. 3f; ORACLE AUROC 0.59, other signatures AUROC values 0.55–0.66; Table 1). The association between ORACLE risk score and DFS was not significant when adjusted for clinicopathological risk factors (HR-adj 1.3 (95% CI 0.8–2.0), P = 0.3; Extended Data Fig. 5e). Relapse rates at 5-year follow-up were higher for the concordant-high (37%, 13/35) and discordant (52%, 12/23) risk classes than for the concordant-low group (29%, 29/100) (Fig. 3e). Notably, progression was more rapid in the high-risk (median DFS 1.8 years) and discordant-risk groups (median DFS 0.99 years) than in the low-risk group (median DFS not reached).

Overall, these data indicate that high-risk clonal expression changes are present in preinvasive lesions, remain detectable in primary tumors that achieve early systemic dissemination and can serve as a molecular fingerprint for the lethal metastasizing subclone.

ORACLE delineates chemosensitive cells

Predicting patient benefit from adjuvant chemotherapy is a major challenge in early-stage NSCLC^32,33. We therefore investigated the utility of ORACLE for identifying chemosensitivity in treatment-naive patients.

First, we examined the relationship between ORACLE risk score and sensitivity to cytotoxic or targeted chemotherapies by leveraging drug sensitivity screening data in the Genomics of Drug Sensitivity in Cancer (GDSC) database³⁴, which are linked to transcriptomic profiles for LUAD cell lines in the Cancer Cell Line Encyclopedia³⁵. Cell lines and compounds with missing data were filtered out (Methods and Extended Data Fig. 6a). For each compound, we ranked LUAD cell lines according to ORACLE risk score, then examined the correlation with drug response determined by half-maximal inhibitory concentration (IC₅₀) (Extended Data Fig. 6b); multiple-testing correction was not applied for this exploratory analysis. Focusing on the 17 US Food and Drug Administration (FDA)-approved drugs for NSCLC, only cisplatin was significantly correlated with efficacy in ORACLE high-risk cell lines (Fig. 4a, P = 0.045, Spearman coefficient 0.33). Furthermore, across all compounds screened, responses to 23 drugs positively correlated with ORACLE risk score. GSK1904529A, a small-molecule inhibitor of insulin-like growth factor-1 receptor (IGF-1R), showed the strongest association with ORACLE risk score (P = 0.0089, Spearman coefficient 0.42). Notably, the main mechanism of GSK1904529A is cell cycle arrest³⁶, and we have previously observed cell cycle genes to be enriched among clonal transcriptional signals⁸. Only one drug, the B-Raf serine-threonine kinase (BRAF) inhibitor KIN001-206, was negatively correlated with ORACLE risk score (P = 0.0045, Spearman coefficient −0.46; Fig. 4a and Extended Data Fig. 6b). By categorizing therapeutic compounds on the basis of targeted pathways, we identified four pathways (hormone-related, chromatin histone methylation, DNA replication and genome integrity) in which all compounds exhibited positive correlation with ORACLE risk. By contrast, compounds involved in inhibition of epidermal growth factor receptor (EGFR) signaling tended to display a negative correlation with ORACLE risk (Fig. 4b).
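The per-compound analysis is a Spearman rank correlation between ORACLE risk scores and IC₅₀ values across cell lines. A self-contained sketch is given below (Pearson correlation of average ranks, so ties are handled). The inputs are hypothetical, the sketch does not encode which direction of IC₅₀ counts as a better response, and it computes no P value; in practice, `scipy.stats.spearmanr` returns both the coefficient and a P value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)

# Hypothetical ORACLE scores and IC50 values for five cell lines
risk = [10.0, 10.5, 9.8, 10.2, 10.7]
ic50 = [2.1, 3.0, 1.5, 2.4, 2.9]
print(spearman_rho(risk, ic50))
```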

**Fig. 4: ORACLE delineates chemosensitive cells.**

To test whether adjuvant chemotherapy modulates the prognostic information captured by ORACLE, we divided patients from the TRACERx validation cohort into two subgroups according to their adjuvant treatment status (n = 102 non-adjuvant-treated, n = 56 adjuvant-treated; patients with stage I–III LUAD) and then stratified by ORACLE risk class (Fig. 4c). In the non-adjuvant-treated subgroup, a significant difference in OS rates was observed between ORACLE concordant-high risk patients (5-year OS rate 36%) and concordant-low risk patients (5-year OS rate 70%) (Cox regression P = 0.0001, HR 4.0 (95% CI 1.9–8.3)). By contrast, in the adjuvant-treated subgroup, there was no difference in OS rates between ORACLE concordant-high risk patients (5-year OS rate 69%) and concordant-low risk patients (5-year OS rate 60%) (Cox regression P = 0.8, HR 0.9 (95% CI 0.3–2.5)). This result, wherein ORACLE high-risk classification was more discriminatory among patients who did not receive adjuvant therapy, remained consistent when controlling for nodal status in this cohort of patients (Extended Data Fig. 7).

Taken together, these in vitro drug screen data and exploratory clinical data suggest that ORACLE high-risk LUAD tumors may be sensitive to platinum chemotherapy agents.

ORACLE as a summary metric of lung cancer evolution

To explore the underpinnings of clonal expression signals, we evaluated clinicopathological correlates in the TRACERx exploratory cohort (n = 184 patients with stage I–III LUAD, Extended Data Fig. 1a; Methods). The mean ORACLE risk score was calculated as a summary measure per tumor, for use in multiple linear regression analyses. We identified two clinicopathological features that were significantly associated with ORACLE risk scores: tumor stage III (P = 0.002), as shown previously⁸, and Ki67 (P = 0.0009; Fig. 5a).

We next examined genetic features defined in the TRACERx study¹⁴: whole-genome doubling (WGD) events, chromosomal complexity (fraction of loss of heterozygosity, FLOH), somatic copy-number alteration (SCNA)-ITH, and clonal and subclonal mutations in driver genes. The mean ORACLE risk score per tumor significantly correlated with SCNA-ITH (P = 0.02), FLOH (P = 0.01) and the number of clonal driver mutations (P = 0.009; Fig. 5a and Extended Data Fig. 8).

To contextualize ORACLE-associated somatic alterations to specific driver genes, we compared frequencies of each driver at gene level between low-risk (n = 308) and high-risk (n = 142) tumor regions in the TRACERx exploratory cohort (n = 184 patients with stage I–III LUAD). ORACLE high-risk tumor regions were enriched (P < 0.05, odds ratio (OR) >1) in clonal mutations occurring in eight driver genes (PTPRB, TP53, MGA, KEAP1, SETD2, NOTCH2, ARID1A and NRAS) and depleted (P < 0.05, OR <1) in tumor regions with clonal mutations of EGFR or STK11 genes (Extended Data Fig. 9a,b). Performing the same analysis for subclonal SNVs in driver genes revealed FAT1 gene enrichment in ORACLE high-risk regions (P = 0.03, OR 5.6), possibly due to this gene’s putative role in maintaining genome integrity³⁷.
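The enrichment test is not named in this section. A common choice for comparing mutation frequencies between two groups is a two-sided Fisher's exact test on a 2 × 2 table (high-risk versus low-risk regions, mutated versus not mutated), which the sketch below assumes; an OR > 1 then means the mutation is more frequent in high-risk regions. `scipy.stats.fisher_exact` provides a tested implementation, and the counts in the example are hypothetical.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: high-risk and low-risk regions; columns: mutated and not mutated.
    """
    if b * c == 0:
        return float("inf") if a * d else float("nan")
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the P value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables exactly as likely as the observed one count
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Hypothetical counts: 3 of 4 high-risk and 1 of 4 low-risk regions mutated
print(odds_ratio(3, 1, 1, 3), fisher_exact_two_sided(3, 1, 1, 3))
```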

As ORACLE risk score reflected chromosomal instability and complexity, we used GISTIC2.0³⁸ to identify recurrent SCNA events and to compare positive-selection scores (G scores) between ORACLE concordant high-risk and low-risk patients in the TRACERx exploratory cohort (n = 158 patients with stage I–III LUAD with concordant high- or low-risk classification, Extended Data Fig. 1a; Methods). Among cytobands associated with ORACLE high risk (G-score difference >0, false discovery rate q < 0.05), significant enrichment was observed for 14 amplifications (Extended Data Fig. 9c): 1q22, 8q22.3, 8q24.11-13, 8q24.21-23, 8q24.3, 14q12, 19q12 and 19q13.11-13. These amplified regions include the NKX2-1 gene (which encodes thyroid transcription factor 1 (TTF1), an established histopathology marker for LUAD) as well as MDM4, MYC, CCNE1 and AKT2. Significant enrichment was also observed for ten cytoband deletions (8p23.1, 8p22, 8p21.3-1, 8p12, 9p24.3 and 20p12.3-1), including FGFR1, CDKN2A and PAX5 genes (Extended Data Fig. 9c).

Six biomarkers have been identified as associated with survival in the TRACERx study: recent subclonal expansion¹⁴, subclonal WGD¹⁴, preoperative circulating tumor DNA (ctDNA)¹⁵, SCNA-ITH¹⁶, spread through airway spaces (STAS)¹⁷, and ORACLE⁸. We performed multivariable analysis to quantify the comparative prognostic information between these biomarkers, including clinical risk factors in the TRACERx exploratory cohort (n = 111 patients with stage I–III LUAD with all biomarker data available). Three biomarkers remained significantly associated with OS (Fig. 5b): ORACLE (P = 0.008, HR 2.06), STAS (P = 0.023, HR 2.2) and preoperative ctDNA (P = 0.025, HR 2.27). We also calculated the percentage variance explained (PVE) encoded by each of these six biomarkers to examine the dynamics of their prognostic association (Fig. 5c). This analysis showed that ORACLE risk score was responsible for the greatest variance in OS outcomes in the first year after LUAD diagnosis (PVE 16.7%) and remained informative (PVE range 6.1–9.7%) alongside ctDNA and STAS over a 5-year follow-up period.

Overall, these results suggest that clonal expression signals correspond to single-nucleotide variants (SNVs) and SCNAs occurring early in tumor evolution. Further, genetic evolutionary metrics previously identified in the TRACERx study (SCNA-ITH, FLOH and clonal drivers) were captured by ORACLE as a simple 23-transcript assay. Lastly, ORACLE, preoperative ctDNA and STAS encoded complementary forms of prognostic information.

Discussion

Tissue biopsy is the gold standard for cancer diagnosis. The typical single-site needle biopsy samples less than 1% of the primary tumor mass¹³, failing to capture the full extent of genetic and transcriptomic ITH within individual tumors^14,39. To address this sampling bias problem, we previously reported the development of a clonal expression biomarker (ORACLE), which is associated with OS outcomes in retrospective cohorts⁸.

Here, we prospectively evaluated ORACLE, recognizing cancer as an evolutionary disease to refine molecular prognostication in patients with NSCLC. In a comparison against existing LUAD RNA-seq prognostic signatures, ORACLE was prospectively validated as the top-ranked signature across four metrics for tumor sampling bias. Importantly, the association between ORACLE and OS was prospectively validated, remaining significant in multivariable analysis with known clinicopathological risk factors and in a subgroup analysis of stage I disease.

We wished to gain a deeper understanding of the clinical utility of ORACLE. Simulation of a pseudo-single biopsy cohort suggested that ORACLE remains informative in the clinical setting, where tissue samples for molecular tests are usually limited⁴⁰. The association between ORACLE and clinical outcomes extended to lung-cancer-specific survival and, in univariable analysis, to DFS. As an RNA marker, ORACLE complemented liquid biopsy (ctDNA) and pathology (STAS) markers in predicting 5-year survival outcomes.

Lastly, we uncovered mechanism-based insights into ORACLE. Clonal transcriptional signals were ‘hard-wired’ through the acquisition of SNVs and SCNAs occurring early in tumor evolution and also delineated metastatic seeding from nonseeding primary tumor regions. These data suggest that clonal expression biomarkers could be further developed to stratify preinvasive lesions for early intervention before systemic dissemination^41,42. ORACLE also correlated with genetic measures of chromosomal instability and complexity. This may explain the observed relationship between ORACLE and sensitivity to chemotherapy agents (in particular, cisplatin), as chromosomally unstable tumors are hypothesized to be prone to genomic catastrophe and, hence, optimal candidates for cytotoxic therapy⁴³. Indeed, recent data support the utility of chromosomal instability signatures for predicting chemotherapy treatment response⁴⁴.

Future work in larger cohorts will test whether ORACLE can be integrated with substaging criteria to refine risk stratification within stage I disease, and will seek to validate a link between ORACLE and chemosensitivity. Breast cancer trials have prospectively evaluated the use of RNA markers to refine risk stratification for chemotherapy, thereby reducing overtreatment^45,46. A comparable approach in lung cancer, such as a randomized phase III trial comparing observation versus chemotherapy or closer surveillance for ORACLE high-risk tumors, may likewise advance precision diagnostics (Extended Data Fig. 10). Moreover, the future development of a clinical-grade RNA assay^45,46,47 may bypass the limitations of RNA-seq as a research-grade technology to enable real-time clinical implementation⁴⁸.

Future work might also extend the utility of clonal expression biomarkers beyond prognostication in LUAD. We note that the method reported in our original study to derive clonally expressed genes⁸ has successfully transferred to other cancer types^49,50,51,52,53. In addition, multiregion analyses suggest that existing expression-based predictive biomarkers for checkpoint immunotherapy are subject to tumor sampling bias⁵⁴. This suggests that deriving a clonal expression biomarker capturing the immuno-oncological status of a patient with NSCLC could help refine prediction of immune checkpoint blockade efficacy⁵⁵.

ORACLE has been designed as a pragmatic solution to the sampling bias problem, applied to ‘bulk’ RNA extracted from single-site needle samples in the clinical setting. It has been suggested that, for a subset of tumors, prognosis is inherently difficult to predict due to low-penetrant subclones that are undetectable in bulk profiling⁵⁶. For accurate diagnostic classification in these cases, identifying the lethal subclone may require multiregion^57,58,59 or single-cell⁶⁰ sampling strategies.

Methods

TRACERx cohort, sample collection and sequencing

The TRACERx study (NCT01888601) is a prospective observational cohort study aiming to transform our understanding of NSCLC; it has been approved by an independent research ethics committee (NRES Committee London) (13/LO/1546). Written informed consent was mandatory and obtained from all participants. The cohort used in this study consists of the first 421 patients who had multiple regions sampled from the same tumor to obtain DNA and RNA profiles for subsequent analyses. Sex and gender were not considered in the study design, the cohort comprised 233 (55%) men and 188 (45%) women, and all available individuals were included in each analysis. The TRACERx421 cohort (1,644 tumor regions from n = 421 patients), as previously reported¹⁴, was accessed for this study, with cohort selection as follows (Extended Data Fig. 1a). Including patients with NSCLC with RNA-seq data available yielded the TRACERx NSCLC RNA-seq cohort (745 tumor regions from n = 299 patients). Excluding LUSC tumors (295 regions from n = 117 patients) and synchronous primary tumors (n = 4 patients, ‘tumor 1’ IDs were included and ‘tumor 2’ IDs were excluded¹⁴) yielded the TRACERx LUAD exploratory cohort (450 tumor regions from n = 184 patients). To obtain an independent validation cohort, patients that were analyzed in the previous training cohort⁸ (81 tumor regions from n = 26 patients with stage I–III LUAD; the number diverges from the original study (n = 28 patients, 89 regions)⁸ due to sample dropout with updated TRACERx421 pipeline and cohort criteria) were excluded, yielding the TRACERx LUAD validation cohort (369 tumor regions from n = 158 patients). DNA and RNA was extracted using AllPrep DNA/RNA Mini Kit (Qiagen). Extracted DNA and RNA was assessed for integrity by TapeStation (Agilent Technologies). Whole-exome sequencing was performed on Illumina HiSeq 4000 or HiSeq 2500 platforms. Whole-RNA (RiboZero-depleted) paired-end sequencing was performed using an Illumina HiSeq 4000 platform. 
The RSEM package (version 1.3.3) was used to quantify transcript counts and transcripts per million (TPM) values^14,17,31,39. Genes with an expression value of less than 1 TPM in at least 20% of samples were filtered out. Counts were normalized using the variance-stabilizing transformation implemented in the DESeq2 package (version 1.42.0)⁶¹.
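The following minimal Python sketch shows the low-expression filter described above. It reads the rule literally: a gene is removed when its TPM is below 1 in 20% or more of samples. The function name, the dict-of-lists input and the gene names are hypothetical. The published pipeline processed RSEM output in R, and this sketch does not reproduce the subsequent DESeq2 variance-stabilizing transformation.

```python
def filter_low_expression(tpm, min_tpm=1.0, max_low_fraction=0.2):
    """Drop genes whose TPM is below `min_tpm` in at least `max_low_fraction` of samples.

    tpm: dict mapping gene name -> list of TPM values, one value per sample.
    Returns a new dict containing only the retained genes.
    """
    kept = {}
    for gene, values in tpm.items():
        n_low = sum(1 for v in values if v < min_tpm)
        if n_low / len(values) >= max_low_fraction:
            continue  # lowly expressed in >= 20% of samples: filtered out
        kept[gene] = values
    return kept


# Hypothetical TPM values for three genes across five samples
tpm = {
    "GENE_A": [5.0, 3.0, 2.0, 8.0, 4.0],  # never below 1 TPM -> kept
    "GENE_B": [0.5, 3.0, 2.0, 8.0, 4.0],  # below 1 TPM in 1/5 = 20% -> removed
    "GENE_C": [0.1, 0.2, 0.3, 5.0, 6.0],  # below 1 TPM in 3/5 = 60% -> removed
}
print(sorted(filter_low_expression(tpm)))  # ['GENE_A']
```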

Calculating ORACLE risk scores

ORACLE risk scores were calculated as described in the original publication⁸. For each sample, the expression value of each of the 23 signature genes was multiplied by the corresponding model coefficient derived in the training cohort, and these weighted values were summed to obtain a risk score. ORACLE risk scores were then dichotomized using the previously defined risk-score threshold (10.199) to classify samples into low- or high-risk groups. The model coefficients are listed in Supplementary Table 5 of the original publication⁸.
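The short Python sketch below illustrates the weighted-sum scoring and the dichotomization. The gene names and coefficients are placeholders, not the published ORACLE model; the real 23 coefficients are in Supplementary Table 5 of the original publication. The sketch also assumes that scores strictly above 10.199 count as high risk, since the text does not say how ties at the cutoff are handled.

```python
ORACLE_THRESHOLD = 10.199  # previously defined risk-score cutoff


def oracle_risk_score(expression, coefficients):
    """Sum of signature-gene expression values weighted by model coefficients.

    expression: dict gene -> normalized expression value for one sample.
    coefficients: dict gene -> model coefficient (placeholders in this sketch).
    """
    missing = sorted(set(coefficients) - set(expression))
    if missing:
        raise KeyError(f"missing signature genes: {missing}")
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def classify_risk(score, threshold=ORACLE_THRESHOLD):
    """Dichotomize a score; scores above the threshold are treated as high risk (assumption)."""
    return "high" if score > threshold else "low"


# Hypothetical three-gene example (the real signature has 23 genes)
coefficients = {"GENE_1": 0.5, "GENE_2": -0.25, "GENE_3": 1.0}
sample = {"GENE_1": 10.0, "GENE_2": 4.0, "GENE_3": 6.0}
score = oracle_risk_score(sample, coefficients)  # 5.0 - 1.0 + 6.0 = 10.0
print(score, classify_risk(score))  # 10.0 low
```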

Batch correction for RNA-seq preprocessing pipeline versions

The computational pipeline for generating TRACERx RNA-seq data has been updated from the original pipeline used in the previous study⁸ to the Nextflow pipeline³⁹. Consequently, the count values for the same samples differ technically between the two pipelines. To ensure a common baseline and the compatibility of the predefined ORACLE risk-score cutoff with the current cohort, we performed a batch correction. A linear regression model was fitted to the ORACLE risk scores of shared samples generated by the original and current pipelines (85 tumor regions from 27 patients). This yielded a conversion formula, with which the ORACLE risk score was corrected as shown below (Extended Data Fig. 1b).

$${{\mathrm{Corrected}}\; {\mathrm{risk}}\; {\mathrm{scores}}}={{\mathrm{risk}}\; {\mathrm{scores}}}\times 1.04-0.081$$
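The Python snippet below sketches how such a conversion formula could be derived and applied: it fits an ordinary least-squares line, then applies the reported coefficients. For illustration, it assumes that the current-pipeline score is the predictor and the original-pipeline score is the response. `fit_linear` and `correct_score` are hypothetical helper names, and the toy input is not TRACERx data.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct_score(score, slope=1.04, intercept=-0.081):
    """Map a current-pipeline risk score onto the original-pipeline scale."""
    return score * slope + intercept


# Toy shared-sample scores lying exactly on y = 2x + 1 recover slope 2, intercept 1
print(fit_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0)
print(round(correct_score(10.0), 3))  # 10.319
```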

Identification of LUAD RNA-seq prognostic signatures

Two RNA-seq prognostic signatures were identified in the previous study⁸; of these, the TPM-based signature of Li et al.¹⁸ was selected for the present analysis. Here, we used the same method as in the previous study to identify five further RNA-seq signatures^18–23. In brief, articles describing RNA-seq prognostic signatures for LUAD were identified by a literature search of PubMed and were manually reviewed. Only signatures whose full gene lists and model coefficients were specified in the articles were included in subsequent analyses.

Tumor sampling bias metrics

Four metrics were used to measure tumor sampling bias across RNA-seq prognostic signatures:

(1)
The discordant rate was calculated as the percentage of patients who had regions classified as both high risk and low risk within a tumor.
(2)
The clustering concordance was calculated as described by Gyanchandani et al.²⁴. Tumor regions were clustered on the basis of the gene expression of a given prognostic signature using Manhattan distance and the Ward.D2 method. The concordant rate was quantified as the percentage of patients with all regions falling in the same cluster. This analysis was repeated for 1 to 122 clusters (the maximum number of clusters was set to the total number of patients in the multiregion TRACERx validation cohort).
(3)
For a given signature gene, the expression variability was quantified as the standard deviation of expression among tumor regions from each patient. The mean variability per signature was calculated as the average expression variability across patients in the TRACERx validation cohort.
(4)
Bachtiary et al.²⁶ previously developed a method to quantify total expression heterogeneity. In brief, the expression variance within each individual tumor ($\sigma^2_w$) was calculated and then averaged across all n tumors in the cohort. This mean within-tumor variance scales inversely with the number of biopsies (k) taken per tumor, denoted as $W = \frac{\frac{1}{n}\sum_{i=1}^{n}\sigma^2_{w,i}}{k}$. The total variance (T) per gene expression signature was summarized as the sum of the mean within-tumor variance (W) and the between-tumor variance ($B = \sigma^2_b$), that is, $T = W + B$. The W-to-T ratio (W/T) measures the ITH per signature; values of k from one to ten biopsies were investigated in this analysis.
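The four sampling-bias metrics can be sketched in Python using only the standard library. The input layouts (dicts keyed by patient) and all values are hypothetical. The clustering itself (Manhattan distance, Ward.D2 linkage) is assumed to have been run beforehand, so `concordant_rate` only scores the resulting labels. For W/T, the sketch approximates the between-tumor variance B by the variance of per-tumor means. This is a simplification that may differ from the variance-component estimate of Bachtiary et al.

```python
from statistics import mean, stdev, variance


def discordant_rate(region_risk):
    """% of patients whose regions include both 'high' and 'low' risk calls (metric 1)."""
    discordant = sum(1 for labels in region_risk.values() if len(set(labels)) > 1)
    return 100 * discordant / len(region_risk)


def concordant_rate(region_clusters):
    """% of patients with all regions in the same cluster (metric 2).

    Cluster labels are assumed to come from a separate hierarchical clustering step.
    """
    concordant = sum(1 for ids in region_clusters.values() if len(set(ids)) == 1)
    return 100 * concordant / len(region_clusters)


def mean_variability(region_expr):
    """Mean across patients of the per-patient SD of one gene's expression (metric 3)."""
    return mean(stdev(values) for values in region_expr.values())


def within_to_total_ratio(region_expr, k):
    """W/T for k biopsies per tumor (metric 4).

    W = mean within-tumor variance / k; B is approximated by the variance of
    per-tumor means (simplifying assumption).
    """
    w = mean(variance(values) for values in region_expr.values()) / k
    b = variance([mean(values) for values in region_expr.values()])
    return w / (w + b)


# Hypothetical data for two patients with two regions each
risk = {"P1": ["high", "high"], "P2": ["high", "low"]}
clusters = {"P1": [1, 1], "P2": [1, 2]}
expr = {"P1": [1.0, 3.0], "P2": [5.0, 7.0]}
print(discordant_rate(risk), concordant_rate(clusters))  # 50.0 50.0
print(round(mean_variability(expr), 4))  # 1.4142
print([round(within_to_total_ratio(expr, k), 3) for k in (1, 2)])  # [0.2, 0.111]
```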

Survival analyses

OS was used as the primary outcome for prospective validation of the survival association and was defined as the time from registration to death or censoring. Lung-cancer-specific survival was defined as the time from registration to death caused by lung cancer. DFS was defined as the time from registration to radiologically confirmed recurrence of the primary tumor, death or censoring. Intrathoracic relapses (n = 24), extrathoracic relapses (n = 14) or both (n = 16) were included in our dataset. Two patients with LUAD (CRUK0511 and CRUK0512) included in the time-to-relapse analysis were censored at the time of diagnosis of a new primary cancer, owing to uncertainty over whether the subsequent recurrence arose from the first or the new primary cancer. For patients with multiple synchronous primary LUAD tumors, the average value of genetic metrics was calculated. HRs and P values adjusted for age, sex, smoking pack-years, adjuvant treatment, tumor stage (TNM 8th edition) and histologic grade in multivariable Cox regression analyses, and log-rank P values for between-group comparisons, were calculated using the survival R package (version 3.5). Kaplan–Meier curves were plotted using the survminer R package (version 0.4.9), and the results of multivariable Cox regression analyses were plotted using the forestplot R package (version 3.1.3). All survival analyses were performed on patients with complete data.
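The Kaplan–Meier curves were produced with the survival and survminer R packages. The Python sketch below only illustrates the product-limit estimator that underlies them. It is a from-scratch example on hypothetical follow-up data, not the authors' code, and it omits confidence intervals, Cox models and log-rank tests.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 for death (event), 0 for censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    Subjects censored at an event time stay in the risk set for that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count all subjects (events and censorings) sharing this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up (years) for five patients
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 2)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```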

Meta-analysis of ORACLE prognostic values in microarray cohorts of patients with stage I LUAD

Microarray and clinical data for a total of 580 patients with stage I LUAD in the Shedden et al.⁷, Der et al.²⁷, Okayama et al.²⁸ and Rousseaux et al.²⁹ cohorts were downloaded from the Gene Expression Omnibus (GSE50081, GSE31210, GSE30219 and GSE68465). The prognostic value of the ORACLE risk score was tested in each of the four cohorts using the coxph function in the survival package (version 3.5). In the Der et al., Okayama et al. and Rousseaux et al. cohorts, 22 of the 23 signature genes were available, and in the Shedden et al. cohort, 19 of the 23 genes were available for analysis. The meta-analysis was performed using the rmeta R package (version 3.0).
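The text does not state whether a fixed-effect or a random-effects model was used with rmeta. As an illustration only, the Python sketch below pools per-cohort log hazard ratios with inverse-variance (fixed-effect) weights. The input estimates are hypothetical, not the published cohort results.

```python
import math


def fixed_effect_meta(log_hrs, ses):
    """Inverse-variance fixed-effect pooling of per-cohort log hazard ratios.

    log_hrs: per-cohort log(HR) estimates (for example from Cox models).
    ses: their standard errors.
    Returns the pooled HR, its 95% CI and a two-sided normal-approximation P value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    ci = (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se))
    return math.exp(pooled), ci, p


# Hypothetical estimates from four cohorts (not the published values)
hr, ci, p = fixed_effect_meta([0.4, 0.6, 0.5, 0.3], [0.2, 0.25, 0.3, 0.35])
print(round(hr, 2), [round(c, 2) for c in ci], p < 0.05)
```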

Preinvasive lung squamous cell carcinogenesis dataset

Gene expression data published by Mascaux et al.³⁰ for 77 patients with lung squamous carcinogenesis were downloaded from the Gene Expression Omnibus (GSE33479). Eight histological stages were identified by the authors: 27 normal tissues, 15 hyperplasia, 15 metaplasia, 13 mild dysplasia, 13 moderate dysplasia, 12 severe dysplasia, 13 CIS and 14 SCC samples. Following the authors, these were further summarized as four molecular steps of progression: (1) normal and hyperplasia tissues, (2) low-grade lesions, spanning metaplasia to moderate dysplasia, (3) high-grade lesions comprising severe dysplasia and CIS, and (4) SCC formation. A linear mixed-effects model was fitted with the ORACLE risk score as the response variable, sample stage as the fixed effect and patient as the random effect. No correction was made for multiple comparisons among developmental stages.
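The small Python sketch below checks how the eight histological stages group into the four molecular steps, along with the resulting sample counts. The stage labels and counts come from the text; the dictionary and function names are hypothetical.

```python
# Mapping of the eight histological stages to four molecular steps, as described above
HISTOLOGY_TO_STEP = {
    "normal": 1,
    "hyperplasia": 1,
    "metaplasia": 2,
    "mild dysplasia": 2,
    "moderate dysplasia": 2,
    "severe dysplasia": 3,
    "CIS": 3,
    "SCC": 4,
}

# Sample counts per histological stage reported above
STAGE_COUNTS = {
    "normal": 27, "hyperplasia": 15, "metaplasia": 15, "mild dysplasia": 13,
    "moderate dysplasia": 13, "severe dysplasia": 12, "CIS": 13, "SCC": 14,
}


def counts_per_step(stage_counts):
    """Aggregate per-stage sample counts into the four molecular steps."""
    totals = {}
    for stage, n in stage_counts.items():
        step = HISTOLOGY_TO_STEP[stage]
        totals[step] = totals.get(step, 0) + n
    return dict(sorted(totals.items()))


print(counts_per_step(STAGE_COUNTS))  # {1: 42, 2: 41, 3: 25, 4: 14}
print(sum(STAGE_COUNTS.values()))  # 122
```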

ORACLE risk score compared between seeding and nonseeding regions

The ORACLE risk score was calculated for each primary tumor region and compared between seeding and nonseeding regions using a linear mixed-effects model with tumor as a random effect. Seeding regions were defined as primary tumor regions containing a most recent shared clone between the primary tumor and a metastasis³¹.

In vitro drug sensitivity screening

The ORACLE risk score was calculated using expression data for cancer cell lines provided in DepMap (version 22Q1), subsetting to LUAD cell lines for subsequent analyses. Drug sensitivity (IC₅₀) data for 396 compounds and 54 LUAD cell lines (Cancer Cell Line Encyclopedia) were obtained from the GDSC database^34,35. We filtered out cell lines with data for fewer than 50 compounds and removed compounds with data missing for more than 5 cell lines, leaving 37 cell lines and 359 compounds for subsequent analysis (Extended Data Fig. 6a). Because the IC₅₀ values were not normally distributed, a Spearman correlation test between IC₅₀ and ORACLE risk score was applied to each drug across the cell lines, with significance defined as P < 0.05. No correction was made for multiple comparisons. A list of drugs approved by the FDA for NSCLC was obtained from the National Cancer Institute (https://www.cancer.gov/about-cancer/treatment/drugs/lung). The target pathway of each drug was derived from the GDSC annotation.
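The Python sketch below illustrates the two-step filter and the rank-based correlation. The dictionary layout, names and toy thresholds are hypothetical. The published analysis was done in R, where cor.test with method = "spearman" also returns the P value; this sketch computes only rho. Tied values receive average ranks, following the usual definition of Spearman's rho.

```python
def filter_screen(ic50, min_compounds=50, max_missing_lines=5):
    """Two-step filter of a sparse IC50 table.

    ic50: dict cell line -> dict compound -> IC50 (missing measurements absent).
    1) drop cell lines with data for fewer than `min_compounds` compounds;
    2) drop compounds missing in more than `max_missing_lines` remaining lines.
    """
    lines = {cl: d for cl, d in ic50.items() if len(d) >= min_compounds}
    compounds = set()
    for d in lines.values():
        compounds.update(d)
    kept = {
        c for c in compounds
        if sum(1 for d in lines.values() if c not in d) <= max_missing_lines
    }
    return {cl: {c: v for c, v in d.items() if c in kept} for cl, d in lines.items()}


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy screen with relaxed thresholds
screen = {
    "LINE_A": {"drug1": 1.0, "drug2": 2.0},
    "LINE_B": {"drug1": 1.5},                              # too few compounds -> dropped
    "LINE_C": {"drug1": 0.5, "drug2": 3.0, "drug3": 1.0},  # drug3 missing in LINE_A -> dropped
}
print(filter_screen(screen, min_compounds=2, max_missing_lines=0))
print(round(spearman_rho([8.1, 9.4, 10.2, 11.0], [0.3, 0.9, 1.7, 4.2]), 6))  # 1.0
```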

Determinants for ORACLE magnitude

ORACLE magnitude was defined as the mean risk score across the regions of a given tumor. To identify its determinants, multiple linear regression models were applied separately to clinicopathological and genetic features in the TRACERx exploratory cohort. Clinicopathological features included patient age, sex, the number of tumor biopsies, tumor stage (TNM 8th edition), smoking status, tumor volume and Ki67 score. Genetic features, including WGD events, FLOH and tumor evolutionary metrics (SCNA-ITH, clonal and subclonal mutations in driver genes, and recent subclonal expansion), were as defined in the TRACERx study¹⁴.
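The NumPy sketch below illustrates ORACLE magnitude and the regression step. The feature matrix is a toy example generated from known coefficients. Real inputs would be the clinicopathological or genetic features listed above, and the published models were fitted in R. The function names are hypothetical.

```python
import numpy as np


def oracle_magnitude(region_scores):
    """Mean ORACLE risk score across the regions of each tumor."""
    return {tumor: float(np.mean(scores)) for tumor, scores in region_scores.items()}


def fit_multiple_regression(features, response):
    """Ordinary least-squares coefficients (intercept first) for response ~ features."""
    X = np.asarray(features, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(response, dtype=float), rcond=None)
    return coef


magnitudes = oracle_magnitude({"T1": [10.0, 11.0], "T2": [9.0], "T3": [10.5, 11.5, 12.5]})
print(magnitudes)  # {'T1': 10.5, 'T2': 9.0, 'T3': 11.5}

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (hypothetical features)
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [3, 4, 6, 8]
print(np.round(fit_multiple_regression(X, y), 6))  # [1. 2. 3.]
```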

Clinical outcome variance explained by TRACERx biomarkers

To investigate how much variance in clinical outcome was explained by TRACERx biomarkers, including SCNA-ITH, WGD, recent subclonal expansion, detection of preoperative ctDNA, STAS and ORACLE, we applied a generalized linear model treating survival status at a given follow-up year as the response variable. Patients censored before the chosen follow-up time were removed, retaining patients who either had a death event within that time or had no event. The variance explained was calculated using the PseudoR2 function in the DescTools R package (version 0.99.51).
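The Python sketch below illustrates the landmark-style outcome definition and the variance-explained measure. It assumes that patients alive at the chosen follow-up time, including those who died later, count as having no event. It uses McFadden's pseudo-R², assumed here to match the DescTools PseudoR2 default. The fitted probabilities would come from a separately fitted logistic GLM. All data and names are hypothetical.

```python
import math


def landmark_outcome(time, died, landmark):
    """Binary outcome at a follow-up landmark (years).

    Returns 1 if the patient died by the landmark, 0 if still alive at the
    landmark, and None if censored before it (excluded from the model).
    """
    if died and time <= landmark:
        return 1
    if time >= landmark:
        return 0
    return None


def binary_log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under fitted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p_fitted):
    """McFadden pseudo-R2: 1 - logL(model) / logL(intercept-only model)."""
    base_rate = sum(y) / len(y)
    ll_null = binary_log_likelihood(y, [base_rate] * len(y))
    return 1 - binary_log_likelihood(y, p_fitted) / ll_null


patients = [(1.5, True), (4.0, False), (2.0, False), (6.0, True)]  # (years, died)
print([landmark_outcome(t, d, landmark=3) for t, d in patients])  # [1, 0, None, 0]
print(round(mcfadden_r2([1, 0, 1, 0], [0.9, 0.1, 0.9, 0.1]), 3))  # 0.848
```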

Enrichment of somatic mutation in NSCLC driver genes

A list of SNVs in NSCLC driver genes was collated in the TRACERx study¹⁴. For each driver gene (with SNVs aggregated at the gene level), enrichment was calculated from mutation frequencies at the regional level and compared using a two-sided Fisher’s exact test. ORs are reported on the natural log scale. No correction was made for multiple comparisons in this analysis.
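The Python sketch below shows, for a single gene, the two-sided Fisher's exact test and the log odds ratio on a 2×2 table of mutated versus wild-type regions in two groups. The table layout, the example counts and the 0.5 zero-cell correction are assumptions. Note also that R's fisher.test reports a conditional maximum-likelihood odds ratio, which differs from the sample odds ratio computed here.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def log_odds_ratio(a, b, c, d):
    """Natural-log sample odds ratio, adding 0.5 to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Hypothetical counts: mutated / wild-type regions in two groups
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
print(round(log_odds_ratio(3, 1, 1, 3), 4))  # 2.1972
```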

Identification of recurrent SCNAs

Genomic regions harboring recurrent SCNAs were identified using GISTIC2.0 (version 2.0.23)³⁸. The copy number of each chromosomal segment was normalized against the sample mean ploidy and used as the input for GISTIC2.0 to identify genomic regions with recurrent amplification or deletion. Amplification and deletion were defined as normalized copy number > log₂(2.5/2) and < log₂(1.5/2), respectively. For a given genomic region, the SCNA positive-selection score (G score) was obtained separately for the cohorts of patients with ORACLE low-risk and high-risk tumors, and the G-score difference between the cohorts was then calculated. A positive G-score difference (>0) with q < 0.05 indicated statistically significant positive selection at the locus.
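The Python sketch below illustrates the amplification and deletion thresholds. It makes two assumptions: that the ploidy-normalized copy number is log₂(segment copy number / sample mean ploidy), and that the G-score difference is taken as high-risk minus low-risk. The sketch does not reproduce the GISTIC2.0 G-score and q-value computation itself.

```python
import math

AMP_THRESHOLD = math.log2(2.5 / 2)  # ~0.32
DEL_THRESHOLD = math.log2(1.5 / 2)  # ~-0.42


def call_segment(copy_number, mean_ploidy):
    """Classify a segment from its ploidy-normalized log2 copy number.

    Assumes normalization is log2(copy number / sample mean ploidy).
    """
    if copy_number <= 0:
        return "deletion"  # log2 undefined; treat complete loss as a deletion
    normalized = math.log2(copy_number / mean_ploidy)
    if normalized > AMP_THRESHOLD:
        return "amplification"
    if normalized < DEL_THRESHOLD:
        return "deletion"
    return "neutral"


def positively_selected(g_high, g_low, q_value, alpha=0.05):
    """Flag a locus whose G score is higher in the high-risk cohort with q < alpha."""
    return (g_high - g_low) > 0 and q_value < alpha


print([call_segment(cn, 3.0) for cn in (4, 3, 2, 0)])
# ['amplification', 'neutral', 'deletion', 'deletion']
print(positively_selected(0.45, 0.20, q_value=0.01))  # True
```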

Statistical analysis

All statistical tests were performed using R (version 4.3.2). Tests involving correlation were performed using cor.test with the Pearson or Spearman method. Comparisons of distributions were performed using wilcox.test with a two-sided Wilcoxon rank-sum test or using the lme function in the nlme R package (version 3.1) with a linear mixed-effects regression analysis. Fisher’s exact tests using fisher.test or chi-squared tests using chisq.test were applied to count data to compare frequencies. HRs and P values for ORACLE adjusted for clinicopathological factors were calculated using multivariable Cox proportional hazards models. Two-sided log-rank tests were performed for comparisons between groups in the Kaplan–Meier curves. For all analyses, the number of data points included was plotted or annotated in the corresponding figures, and all statistical tests were two-sided unless otherwise specified. P < 0.05 was considered statistically significant unless otherwise specified. The R packages tidyverse (version 2.0.0) and readxl (version 1.4.3) were used for data handling. Plotting was performed using ggplot2 (version 3.5.1), ggalluvial (version 0.12.5), ggrepel (version 0.9.4), ComplexHeatmap (version 2.18.0), pheatmap (version 1.0.12), cowplot (version 1.1.1), gridExtra (version 2.3), scales (version 1.3.0), RColorBrewer (version 1.1), viridis (version 0.6.4), circlize (version 0.4.15), wesanderson (version 0.3.7) and colorspace (version 2.1).

Statistics and reproducibility

No statistical method was used to predetermine sample sizes of the validation and exploratory cohorts. All available samples that passed the quality-check filters of sequencing data were included in our analyses. Data collection and analysis were not performed blind to the conditions of the study. Our study did not include group assignments and, thus, randomization is not applicable. Data distribution was assumed to be normal, but this was not formally tested.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The RNA-seq data (in each case from the TRACERx study) used during this study have been deposited at the European Genome–phenome Archive, which is hosted by the European Bioinformatics Institute and the Centre for Genomic Regulation, under accession code EGAS00001006517. Access is controlled by the TRACERx data access committee. Details on how to apply for access are available at the linked page. Previously published preinvasive lesion data are available under accession code GSE33479. The four microarray cohorts used for survival validation of ORACLE are available under accession codes GSE68465, GSE50081, GSE31210 and GSE30219. Source data are provided with this paper.

Code availability

No new code was developed in this study. Code for processing data and generating figures is available via GitHub at https://github.com/dhruvabiswas/tracerx-oracle2.

References

1. Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
2. Chen, Z., Fillmore, C. M., Hammerman, P. S., Kim, C. F. & Wong, K.-K. Non-small-cell lung cancers: a heterogeneous set of diseases. Nat. Rev. Cancer 14, 535–546 (2014).
3. Goldstraw, P. et al. The IASLC Lung Cancer Staging Project: proposals for revision of the TNM stage groupings in the forthcoming (eighth) edition of the TNM classification for lung cancer. J. Thorac. Oncol. 11, 39–51 (2016).
4. Vargas, A. J. & Harris, C. C. Biomarker development in the precision medicine era: lung cancer as a case study. Nat. Rev. Cancer 16, 525–537 (2016).
5. de Koning, H. J. et al. Reduced lung-cancer mortality with volume CT screening in a randomized trial. N. Engl. J. Med. 382, 503–513 (2020).
6. Subramanian, J. & Simon, R. Gene expression-based prognostic signatures in lung cancer: ready for clinical use? J. Natl Cancer Inst. 102, 464–474 (2010).
7. Director’s Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma. Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat. Med. 14, 822–827 (2008).
8. Biswas, D. et al. A clonal expression biomarker associates with lung cancer mortality. Nat. Med. 25, 1540–1548 (2019).
9. Breslow, A. Thickness, cross-sectional areas and depth of invasion in the prognosis of cutaneous melanoma. Ann. Surg. 172, 902–908 (1970).
10. Lehman, J. A., Cross, F. S. & Richey, D. G. Clinical study of forty-nine patients with malignant melanoma. Cancer 19, 611–619 (1966).
11. Blackhall, F. H. et al. Stability and heterogeneity of expression profiles in lung cancer specimens harvested following surgical resection. Neoplasia 6, 761–767 (2004).
12. Karschnia, P. et al. A framework for standardised tissue sampling and processing during resection of diffuse intracranial glioma: joint recommendations from four RANO groups. Lancet Oncol. 24, e438–e450 (2023).
13. Litchfield, K. et al. Representative sequencing: unbiased sampling of solid tumor tissue. Cell Rep. 31, 107550 (2020).
14. Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525–533 (2023).
15. Abbosh, C. et al. Tracking early lung cancer metastatic dissemination in TRACERx using ctDNA. Nature 616, 553–562 (2023).
16. Jamal-Hanjani, M. et al. Tracking the evolution of non–small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
17. Karasaki, T. et al. Evolutionary characterization of lung adenocarcinoma morphology in TRACERx. Nat. Med. 29, 833–845 (2023).
18. Li, B., Cui, Y., Diehn, M. & Li, R. Development and validation of an individualized immune prognostic signature in early-stage nonsquamous non–small cell lung cancer. JAMA Oncol. 3, 1529–1537 (2017).
19. Song, C. et al. Identification of an inflammatory response signature associated with prognostic stratification and drug sensitivity in lung adenocarcinoma. Sci. Rep. 12, 10110 (2022).
20. Jin, X. et al. A novel prognostic signature revealed the interaction of immune cells in tumor microenvironment based on single-cell RNA sequencing for lung adenocarcinoma. J. Immunol. Res. 2022, 6555810 (2022).
21. Wang, X. et al. A novel M6A-related genes signature can impact the immune status and predict the prognosis and drug sensitivity of lung adenocarcinoma. Front. Immunol. 13, 923533 (2022).
22. Li, F., Niu, Y., Zhao, W., Yan, C. & Qi, Y. Construction and validation of a prognostic model for lung adenocarcinoma based on endoplasmic reticulum stress-related genes. Sci. Rep. 12, 19857 (2022).
23. Zhao, J. et al. Identification of a novel gene expression signature associated with overall survival in patients with lung adenocarcinoma: a comprehensive analysis based on TCGA and GEO databases. Lung Cancer 149, 90–96 (2020).
24. Gyanchandani, R. et al. Intratumor heterogeneity affects gene expression profile test prognostic risk stratification in early breast cancer. Clin. Cancer Res. 22, 5362–5369 (2016).
25. Househam, J. et al. Phenotypic plasticity and genetic control in colorectal cancer evolution. Nature 611, 744–753 (2022).
26. Bachtiary, B. et al. Gene expression profiling in cervical cancer: an exploration of intratumor heterogeneity. Clin. Cancer Res. 12, 5632–5640 (2006).
27. Der, S. D. et al. Validation of a histology-independent prognostic gene signature for early-stage, non–small-cell lung cancer including stage IA patients. J. Thorac. Oncol. 9, 59–64 (2014).
28. Okayama, H. et al. Identification of genes upregulated in ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas. Cancer Res. 72, 100–111 (2012).
29. Rousseaux, S. et al. Ectopic activation of germline and placental genes identifies aggressive metastasis-prone lung cancers. Sci. Transl. Med. 5, 186ra66 (2013).
30. Mascaux, C. et al. Immune evasion before tumour invasion in early lung squamous carcinogenesis. Nature 571, 570–575 (2019).
31. Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534–542 (2023).
32. Strauss, G. M. et al. Adjuvant paclitaxel plus carboplatin compared with observation in stage IB non–small-cell lung cancer: CALGB 9633 with the Cancer and Leukemia Group B, Radiation Therapy Oncology Group, and North Central Cancer Treatment Group Study Groups. J. Clin. Oncol. 26, 5043–5051 (2008).
33. Butts, C. A. et al. Randomized phase III trial of vinorelbine plus cisplatin compared with observation in completely resected stage IB and II non–small-cell lung cancer: updated survival analysis of JBR-10. J. Clin. Oncol. 28, 29–34 (2010).
34. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
35. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
36. Sabbatini, P. et al. Antitumor activity of GSK1904529A, a small-molecule inhibitor of the insulin-like growth factor-I receptor tyrosine kinase. Clin. Cancer Res. 15, 3058–3067 (2009).
37. Lu, W.-T. et al. TRACERx analysis identifies a role for FAT1 in regulating chromosomal instability and whole-genome doubling via Hippo signaling. Nat. Cell Biol. https://doi.org/10.1038/s41556-024-01558-w (2024).
38. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
39. Martínez-Ruiz, C. et al. Genomic–transcriptomic evolution in lung cancer and metastasis. Nature 616, 543–552 (2023).
40. McCall, S. J. & Dry, S. M. Precision pathology as part of precision medicine: are we optimizing patients’ interests in prioritizing use of limited tissue samples? JCO Precis. Oncol. 3, 1–6 (2019).
41. Devarakonda, S. & Govindan, R. Untangling the evolutionary roots of lung cancer. Nat. Commun. 10, 2979 (2019).
42. Thakrar, R. M., Pennycuick, A., Borg, E. & Janes, S. M. Preinvasive disease of the airway. Cancer Treat. Rev. 58, 77–90 (2017).
43. Bakhoum, S. F. & Landau, D. A. Chromosomal instability as a driver of tumor heterogeneity and evolution. Cold Spring Harb. Perspect. Med. 7, a029611 (2017).
44. Thompson, J. S. et al. Predicting response to cytotoxic chemotherapy. Preprint at bioRxiv https://doi.org/10.1101/2023.01.28.525988 (2023).
45. Sparano, J. A. et al. Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer. N. Engl. J. Med. 379, 111–121 (2018).
46. Cardoso, F. et al. 70-Gene signature as an aid to treatment decisions in early-stage breast cancer. N. Engl. J. Med. 375, 717–729 (2016).
47. Cristescu, R. et al. Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based immunotherapy. Science 362, eaar3593 (2018).
48. Uguen, A. & Troncone, G. A review on the Idylla platform: towards the assessment of actionable genomic alterations in one day. J. Clin. Pathol. 71, 757–762 (2018).
49. Lin, Y. et al. Clonal gene signatures predict prognosis in mesothelioma and lung adenocarcinoma. npj Precis. Oncol. 8, 47 (2024).
50. Cui, S. et al. Tracking the evolution of esophageal squamous cell carcinoma under dynamic immune selection by multi-omics sequencing. Nat. Commun. 14, 892 (2023).
51. Luo, S., Jia, Y., Zhang, Y. & Zhang, X. A transcriptomic intratumour heterogeneity-free signature overcomes sampling bias in prognostic risk classification for hepatocellular carcinoma. JHEP Rep. 5, 100754 (2023).
52. Yang, C. et al. Multi-region sequencing with spatial information enables accurate heterogeneity estimation and risk stratification in liver cancer. Genome Med. 14, 142 (2022).
53. Zhang, W., Huang, F., Tang, X. & Ran, L. The clonal expression genes associated with poor prognosis of liver cancer. Front. Genet. 13, 808273 (2022).
54. Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).
55. Suda, K. & Mitsudomi, T. Inter-tumor heterogeneity of PD-L1 status: is it important in clinical decision making? J. Thorac. Dis. 12, 1770–1775 (2020).
56. Tofigh, A. et al. The prognostic ease and difficulty of invasive breast carcinoma. Cell Rep. 9, 129–142 (2014).
57. Mlecnik, B. et al. Comprehensive intrametastatic immune quantification and major impact of immunoscore on survival. J. Natl Cancer Inst. 110, 97–108 (2018).
58. Yachida, S. et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467, 1114–1117 (2010).
59. Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).
60. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
61. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

Acknowledgements

The TRACERx study (ClinicalTrials.gov identifier NCT01888601) is sponsored by University College London (UCL/12/0279) and has been approved by an independent Research Ethics Committee (13/LO/1546). TRACERx is funded by Cancer Research UK (CRUK) (C11496/A17786) and coordinated through the CRUK and UCL Cancer Trials Centre, which has a core grant from CRUK (C444/A15953). We acknowledge the patients and relatives who participated in the TRACERx study; all site personnel, investigators, funders and industry partners who supported the generation of the data within this study; and the support of Scientific Computing, the Advanced Sequencing Facility and Experimental Histopathology Science Technology Platforms at the Francis Crick Institute. This work is also supported by the CRUK Lung Cancer Centre of Excellence and the CRUK City of London Centre Award (C7893/A26233) as well as the UCL Experimental Cancer Medicine Centre. D.B. is supported by funding from a Cancer Research UK (CRUK) Early Detection and Diagnosis Project award (EDDCPJT\100008), the Idea to Innovation (i2i) Crick translation scheme supported by the Medical Research Council, the National Institute for Health Research (NIHR) Biomedical Research Centre and the Breast Cancer Research Foundation (BCRF). Fig. 1a was created in BioRender.com (Biswas, D. (2024) BioRender.com/t56b560). Y.-H.L. is supported by funding from a Cancer Research UK (CRUK) Early Detection and Diagnosis Project award (EDDCPJT\100008). Y.W. is supported by funding from the Wellcome Trust (220589/Z/20/Z). T.K. is supported by the JSPS Overseas Research Fellowships Program (202060447). J.M. is supported by the Hungarian National Research, Development and Innovation Office (K129065). B.D. is supported by the Austrian Science Fund (FWF I3522, FWF I3977, and I4677) and the ‘BIOSMALL’ EU HORIZON-MSCA-2022-SE-01 project. Z.M. is supported by the New National Excellence Program of the Ministry for Innovation and Technology of Hungary (UNKP‐20‐3, UNKP‐21‐3 and UNKP-23-5), and by the Bolyai Research Scholarship of the Hungarian Academy of Sciences. Z.M. is also the recipient of an International Association for the Study of Lung Cancer/International Lung Cancer Foundation Young Investigator Grant (2022). B.D. and Z.M. are supported by funding from the Hungarian National Research, Development, and Innovation Office (2020‐1.1.6‐JÖVŐ, TKP2021‐EGA‐33, FK‐143751 and FK-147045). M.J.-H. has received funding from CRUK, NIH National Cancer Institute, IASLC International Lung Cancer Foundation, Lung Cancer Research Foundation, Rosetrees Trust, UKI NETs and NIHR. C.S. is a Royal Society Napier Research Professor (RSRP\R\210001). C.S. is supported by the Francis Crick Institute, which receives its core funding from CRUK (CC2041), the UK Medical Research Council (CC2041) and the Wellcome Trust (CC2041). C.S. is funded by CRUK (TRACERx (C11496/A17786), PEACE (C416/A21999) and CRUK Cancer Immunotherapy Catalyst Network), CRUK Lung Cancer Centre of Excellence (C11496/A30025), the Rosetrees Trust, Butterfield and Stoneygate Trusts, the NovoNordisk Foundation (ID16584), a Royal Society Professorship Enhancement Award (RP/EA/180007), the National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre, the CRUK–University College London Centre, the Experimental Cancer Medicine Centre, the Breast Cancer Research Foundation (US) and The Mark Foundation for Cancer Research Aspire Award (grant 21-029-ASP).

Funding

Open Access funding provided by The Francis Crick Institute.

Author information

These authors contributed equally: Dhruva Biswas, Yun-Hsin Liu.

Authors and Affiliations

Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
Dhruva Biswas, Yun-Hsin Liu, David A. Moore, Takahiro Karasaki, Kristiana Grigoriadis, Selvaraju Veeriah, Cristina Naceur-Lombardelli, Neil Magno, Sophia Ward, Alexander M. Frankell, Ariana Huebner, Corentin Richard, Crispin T. Hiley, Emilia L. Lim, Francisco Gimeno-Valiente, Krupa Thakkar, Maise Al Bakir, Monica Sivakumar, Ieva Usaite, Sadegh Saghafinia, Sharon Vanloo, Sian Harries, Antonia Toncheva, Paulina Prymas, Bushra Mussa, Michalina Magala, Elizabeth Keene, Abigail Bunkum, Carlos Martínez-Ruiz, Clare Puttick, Despoina Karagianni, James R. M. Black, Kerstin Thol, Nicholas McGranahan, Olivia Lucas, Robert Bentham, Roberto Vendramin, Sergio A. Quezada, Simone Zaccaria, Sonya Hessey, Supreet Kaur Bola, Wing Kin Liu, Rija Zaidi, Lucrezia Patruno, Martin D. Forster, Siow Ming Lee, Mariam Jamal-Hanjani, Nnennaya Kanu, Nicolai J. Birkbak & Charles Swanton
Bill Lyons Informatics Centre, University College London Cancer Institute, London, UK
Dhruva Biswas & Javier Herrero
Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
Dhruva Biswas, David A. Moore, Takahiro Karasaki, Kristiana Grigoriadis, Wei-Ting Lu, Sophia Ward, Alexander M. Frankell, Mark S. Hill, Emma Colliver, Ariana Huebner, Crispin T. Hiley, Emilia L. Lim, Maise Al Bakir, Sian Harries, Clare Puttick, James R. M. Black, Olivia Lucas, Roberto Vendramin, Gareth A. Wilson, Rachel Rosenthal, Andrew Rowan, Chris Bailey, Claudia Lee, Katey S. S. Enfield, Mihaela Angelova, Oriol Pich, Cian Murphy, Maria Zagorulya, Michelle M. Leung, Nicolai J. Birkbak & Charles Swanton
Centre for Inflammation Biology and Cancer Immunology, King’s College London, London, UK
Yin Wu
Department of Medical Oncology, Guy’s Hospital, London, UK
Yin Wu
Department of Cellular Pathology, University College London Hospitals, London, UK
David A. Moore, Teresa Marafioti, Elaine Borg, Mary Falzon & Reena Khiroya
Cancer Metastasis Lab, University College London Cancer Institute, London, UK
Takahiro Karasaki, Abigail Bunkum, Sonya Hessey, Wing Kin Liu & Mariam Jamal-Hanjani
Department of Thoracic Surgery, Respiratory Center, Toranomon Hospital, Tokyo, Japan
Takahiro Karasaki
Cancer Genome Evolution Research Group, Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
Kristiana Grigoriadis, Ariana Huebner, Carlos Martínez-Ruiz, Clare Puttick, Kerstin Thol, Robert Bentham, Michelle M. Leung & Thomas Patrick Jones
Genomics Science Technology Platform, The Francis Crick Institute, London, UK
Sophia Ward, Daniel M. Snell, Olga O’Neill, Daniel Leonce, Jerome Nicod & Sian Harries
Oncogene Biology Laboratory, The Francis Crick Institute, London, UK
Sophie de Carné Trécesson & Julian Downward
Bioinformatics and Biostatistics, The Francis Crick Institute, London, UK
Philip East
Cancer Research UK and University College London Cancer Trials Centre, University College London, London, UK
Aman Malhi & Allan Hackshaw
Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
Johanna Mattsson, Amanda Lindberg & Patrick Micke
1st Department of Pulmonology, National Koranyi Institute of Pulmonology, Budapest, Hungary
Judit Moldvay
Department of Pulmonology, University of Szeged Albert Szent-Gyorgyi Medical School, Szeged, Hungary
Judit Moldvay
National Koranyi Institute of Pulmonology, Budapest, Hungary
Zsolt Megyesfalvi, Balazs Dome & János Fillinger
Department of Thoracic Surgery, Semmelweis University and National Institute of Oncology, Budapest, Hungary
Zsolt Megyesfalvi & Balazs Dome
Department of Thoracic Surgery, Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria
Zsolt Megyesfalvi & Balazs Dome
Department of Translational Medicine, Lund University, Lund, Sweden
Balazs Dome
Computational Health Informatics Program, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
Zoltan Szallasi
Department of Oncology, University College London Hospitals, London, UK
Martin D. Forster, Siow Ming Lee, Sarah Benafif, Dionysis Papadatos-Pastos, James Wilson, Tanya Ahmad, Mariam Jamal-Hanjani & Charles Swanton
Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
Nicolai J. Birkbak
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Nicolai J. Birkbak
Computational Cancer Genomics Research Group, University College London Cancer Institute, London, UK
Abigail Bunkum, Olivia Lucas, Simone Zaccaria, Sonya Hessey, Rija Zaidi & Lucrezia Patruno
Immune Regulation and Tumour Immunotherapy Group, Cancer Immunology Unit, Research Department of Haematology, University College London Cancer Institute, London, UK
Despoina Karagianni, Sergio A. Quezada & Supreet Kaur Bola
Cancer Genome Evolution Research Group, University College London Cancer Institute, London, UK
Nicholas McGranahan
University College London Hospitals, London, UK
Olivia Lucas, Emilie Martinoni Hoogenboom, Fleur Monk, James W. Holding, Junaid Choudhary, Kunal Bhakhri, Pat Gorman, Robert C. M. Stephens, Yien Ning Sophia Wong, Maria Chiara Pisciella & Steve Bandula
Tumour Immunogenomics and Immunosurveillance Laboratory, University College London Cancer Institute, London, UK
Roberto Vendramin
Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
Michelle M. Leung
The Whittington Hospital NHS Trust, London, UK
Sarah Benafif, Jack French & Kayleigh Gilbert
University College London Cancer Institute, London, UK
Angela Dwornik, Angeliki Karamani, Benny Chain, David R. Pearce, Georgia Stavrou, Gerasimos-Theodoros Mastrokalos, Helen L. Lowe, James L. Reading, John A. Hartley, Kayalvizhi Selvaraju, Leah Ensell, Mansi Shah, Maria Litovchenko, Piotr Pawlik, Samuel Gamble, Seng Kuong Anakin Ung & Victoria Spanswick
The Francis Crick Institute, London, UK
Clare E. Weeden, Eva Grönroos, Jacki Goldman, Mickael Escudero, Philip Hobson, Stefan Boeing, Tamara Denner, Vittorio Barbè, William Hill, Yutaka Naito, Erik Sahai, Zoe Ramsden, George Kassiotis & Imran Noorani
Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, UK
George Kassiotis
Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK
Imran Noorani
University College London, London, UK
Imran Noorani
Singleton Hospital, Swansea Bay University Health Board, Swansea, UK
Jason F. Lester
University Hospitals of Leicester NHS Trust, Leicester, UK
Amrita Bajaj, Apostolos Nakas, Azmina Sodha-Ramdeen, Mohamad Tufail, Molly Scotland, Rebecca Boyles, Sridhar Rathinam, Sean Dulloo & Dean A. Fennell
University of Leicester, Leicester, UK
Sean Dulloo & Dean A. Fennell
Leicester Medical School, University of Leicester, Leicester, UK
Claire Wilson
Cancer Research Centre, University of Leicester, Leicester, UK
Gurdeep Matharu & Jacqui A. Shaw
Royal Free London NHS Foundation Trust, London, UK
Ekaterini Boleti
Aberdeen Royal Infirmary NHS Grampian, Aberdeen, UK
Heather Cheyne, Mohammed Khalil, Shirley Richardson & Tracey Cruickshank
Department of Medical Oncology, Aberdeen Royal Infirmary NHS Grampian, Aberdeen, UK
Gillian Price
University of Aberdeen, Aberdeen, UK
Gillian Price & Keith M. Kerr
Department of Pathology, Aberdeen Royal Infirmary NHS Grampian, Aberdeen, UK
Keith M. Kerr
Birmingham Acute Care Research Group, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
Babu Naidu
Guy’s and St Thomas’ NHS Foundation Trust, London, UK
Akshay J. Patel
University Hospital Birmingham NHS Foundation Trust, Birmingham, UK
Aya Osman, Mandeesh Sangha, Gerald Langman, Helen Shackleford, Madava Djearaman & Gary Middleton
Institute of Immunology and Immunotherapy, University of Birmingham, Birmingham, UK
Gary Middleton
Manchester Cancer Research Centre Biobank, Manchester, UK
Angela Leek, Jack Davies Hodgkinson & Nicola Totton
Wythenshawe Hospital, Manchester University NHS Foundation Trust, Wythenshawe, UK
Eustace Fontaine, Felice Granato, Juliette Novasio, Kendadai Rammohan, Leena Joseph, Paul Bishop, Vijay Joshi, Sara Waplington & Adam Atkin
Manchester University NHS Foundation Trust, Manchester, UK
Antonio Paiva-Correia
Wythenshawe Hospital, Manchester University NHS Foundation Trust, Manchester, UK
Philip Crosbie
Division of Infection, Immunity and Respiratory Medicine, University of Manchester, Manchester, UK
Philip Crosbie
Cancer Research UK Lung Cancer Centre of Excellence, University of Manchester, Manchester, UK
Philip Crosbie, Katherine D. Brown, Mathew Carter, Anshuman Chaturvedi, Pedro Oliveira, Colin R. Lindsay, Fiona H. Blackhall & Yvonne Summers
The Christie NHS Foundation Trust, Manchester, UK
Katherine D. Brown, Mathew Carter, Anshuman Chaturvedi & Pedro Oliveira
Division of Cancer Sciences, The University of Manchester and The Christie NHS Foundation Trust, Manchester, UK
Colin R. Lindsay, Fiona H. Blackhall, Yvonne Summers & Matthew G. Krebs
CRUK Manchester Institute Cancer Biomarker Centre, University of Manchester, Manchester, UK
Jonathan Tugwood & Caroline Dive
CRUK Lung Cancer Centre of Excellence, University of Manchester, Manchester, UK
Jonathan Tugwood & Caroline Dive
Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
Hugo J. W. L. Aerts
Department of Radiation Oncology, Brigham and Women’s Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
Hugo J. W. L. Aerts
Radiology and Nuclear Medicine, CARIM and GROW, Maastricht University, Maastricht, The Netherlands
Hugo J. W. L. Aerts
Institute for Computational Cancer Biology, Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
Roland F. Schwarz
Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
Roland F. Schwarz & Tom L. Kaufmann
Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
Tom L. Kaufmann
Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Peter Van Loo
Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Peter Van Loo
Cancer Genomics Laboratory, The Francis Crick Institute, London, UK
Peter Van Loo & Carla Castignani
Medical Genomics, University College London Cancer Institute, London, UK
Carla Castignani & Stephan Beck
Department of Pathology, ZAS Hospitals, Antwerp, Belgium
Roberto Salgado
Division of Research, Peter MacCallum Cancer Centre, Melbourne, Australia
Roberto Salgado
Danish Cancer Institute, Copenhagen, Denmark
Miklos Diossy
Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
Miklos Diossy
Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
Miklos Diossy
Integrative Cancer Genomics Laboratory, VIB Center for Cancer Biology, Leuven, Belgium
Jonas Demeulemeester
VIB Center for AI & Computational Biology, Leuven, Belgium
Jonas Demeulemeester
Department of Oncology, KU Leuven, Leuven, Belgium
Jonas Demeulemeester
Experimental Histopathology, The Francis Crick Institute, London, UK
Emma Nye & Richard Kevin Stone
University College London Cancer Institute, London, UK and Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
Jayant K. Rane
Cancer Metastasis Laboratory, University College London Cancer Institute, London, UK
Jeanette Kittel, Kerstin Haase, Kexin Koh & Rachel Scott
Cancer Research UK Lung Cancer Centre of Excellence, UCL Cancer Institute, London, UK
Jeanette Kittel, Kerstin Haase, Kexin Koh & Rachel Scott
Department of Haematology, University College London Hospitals, London, UK
Karl S. Peggs
Cancer Immunology Unit, Research Department of Haematology, University College London Cancer Institute, London, UK
Karl S. Peggs
National Cancer Centre, Singapore, Singapore
Yien Ning Sophia Wong
Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
Thomas B. K. Watkins
Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, London, UK
Catarina Veiga
Department of Medical Physics and Bioengineering, University College London Cancer Institute, London, UK
Gary Royle
Department of Medical Physics and Biomedical Engineering, University College London, London, UK
Charles-Antoine Collins-Fekete
Institute of Nuclear Medicine, Division of Medicine, University College London, London, UK
Francesco Fraioli
Institute of Structural and Molecular Biology, University College London, London, UK
Paul Ashford
Department of Radiology, University College London Hospitals, London, UK
Alexander James Procter, Asia Ahmed, Magali N. Taylor & Arjun Nair
UCL Respiratory, Department of Medicine, University College London, London, UK
Arjun Nair
Department of Thoracic Surgery, University College London Hospital NHS Trust, London, UK
David Lawrence & Davide Patrini
Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
Neal Navani & Ricky M. Thakrar
Department of Thoracic Medicine, University College London Hospitals, London, UK
Neal Navani & Ricky M. Thakrar
Lungs for Living Research Centre, UCL Respiratory, Department of Medicine, University College London, London, UK
Sam M. Janes
Integrated Radiology Department, North-Buda St John’s Central Hospital, Budapest, Hungary
Zoltan Kaplar
Institute of Nuclear Medicine, University College London Hospitals, London, UK
Zoltan Kaplar
Cancer Research UK and UCL Cancer Trials Centre, London, UK
Allan Hackshaw, Camilla Pilotti, Rachel Leslie, Anne-Marie Hacker, Sean Smith & Aoife Walker
The Institute of Cancer Research, London, UK
Anca Grapa
Garvan Institute of Medical Research, Sydney, New South Wales, Australia
Hanyun Zhang
Case45, London, UK
Khalid AbdulJabbar
The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Xiaoxi Pan
The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Yinyin Yuan
Independent Cancer Patients’ Voice, London, UK
David Chuter & Mairead MacKenzie
University Hospital Southampton NHS Foundation Trust, Southampton, UK
Serena Chee & Patricia Georg
The NIHR Southampton Biomedical Research Centre, University Hospital Southampton NHS Foundation Trust, Southampton, UK
Aiman Alzetani
Department of Oncology, University Hospital Southampton NHS Foundation Trust, Southampton, UK
Judith Cave
Academic Division of Thoracic Surgery, Imperial College London, London, UK
Eric Lim
Royal Brompton and Harefield Hospitals, part of Guy’s and St Thomas’ NHS Foundation Trust, London, UK
Eric Lim, Paulo De Sousa, Simon Jordan, Alexandra Rice, Hilgardt Raubenheimer, Harshil Bhayani, Lyn Ambrose, Anand Devaraj, Hemangi Chavan, Sofina Begum, Silviu I. Buderi, Daniel Kaniu, Mpho Malima, Sarah Booth, Andrew G. Nicholson, Nadia Fernandes, Pratibha Shah & Chiara Proli
National Heart and Lung Institute, Imperial College, London, UK
Andrew G. Nicholson
Royal Surrey Hospital, Royal Surrey Hospitals NHS Foundation Trust, Guildford, UK
Madeleine Hewish
University of Surrey, Guildford, UK
Madeleine Hewish
University of Sheffield, Sheffield, UK
Sarah Danson
Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
Sarah Danson
Liverpool Heart and Chest Hospital, Liverpool, UK
Michael J. Shackcloth
Princess Alexandra Hospital, The Princess Alexandra Hospital NHS Trust, Harlow, UK
Lily Robinson & Peter Russell
School of Cancer Sciences, University of Glasgow, Glasgow, UK
Kevin G. Blyth
Beatson Institute for Cancer Research, University of Glasgow, Glasgow, UK
Kevin G. Blyth
Queen Elizabeth University Hospital, Glasgow, UK
Kevin G. Blyth
Institute of Infection, Immunity and Inflammation, University of Glasgow, Glasgow, UK
Andrew Kidd
NHS Greater Glasgow and Clyde, Glasgow, UK
Craig Dick
Cancer Research UK Scotland Institute, Glasgow, UK
John Le Quesne
Institute of Cancer Sciences, University of Glasgow, Glasgow, UK
John Le Quesne
NHS Greater Glasgow and Clyde Pathology Department, Queen Elizabeth University Hospital, Glasgow, UK
John Le Quesne
Golden Jubilee National Hospital, Clydebank, UK
Alan Kirk, Mo Asif, Rocco Bilancia, Nikos Kostoulas, Jennifer Whiteley & Mathew Thomas

Authors

Dhruva Biswas
Yun-Hsin Liu
Javier Herrero
Yin Wu
David A. Moore
Takahiro Karasaki
Kristiana Grigoriadis
Wei-Ting Lu
Selvaraju Veeriah
Cristina Naceur-Lombardelli
Neil Magno
Sophia Ward
Alexander M. Frankell
Mark S. Hill
Emma Colliver
Sophie de Carné Trécesson
Philip East
Aman Malhi
Daniel M. Snell
Olga O’Neill
Daniel Leonce
Johanna Mattsson
Amanda Lindberg
Patrick Micke
Judit Moldvay
Zsolt Megyesfalvi
Balazs Dome
János Fillinger
Jerome Nicod
Julian Downward
Zoltan Szallasi
Allan Hackshaw
Mariam Jamal-Hanjani
Nnennaya Kanu
Nicolai J. Birkbak
Charles Swanton

Consortia

TRACERx Consortium

Charles Swanton, Mariam Jamal-Hanjani, Dhruva Biswas, Yin Wu, David A. Moore, Takahiro Karasaki, Kristiana Grigoriadis, Wei-Ting Lu, Selvaraju Veeriah, Cristina Naceur-Lombardelli, Sophia Ward, Alexander M. Frankell, Emma Colliver, Jerome Nicod, Zoltan Szallasi, Nnennaya Kanu, Nicolai J. Birkbak, Ariana Huebner, Corentin Richard, Crispin T. Hiley, Emilia L. Lim, Francisco Gimeno-Valiente, Krupa Thakkar, Maise Al Bakir, Monica Sivakumar, Ieva Usaite, Sadegh Saghafinia, Sharon Vanloo, Sian Harries, Antonia Toncheva, Paulina Prymas, Bushra Mussa, Michalina Magala, Elizabeth Keene, Abigail Bunkum, Carlos Martínez-Ruiz, Clare Puttick, Despoina Karagianni, James R. M. Black, Kerstin Thol, Nicholas McGranahan, Olivia Lucas, Robert Bentham, Roberto Vendramin, Sergio A. Quezada, Simone Zaccaria, Sonya Hessey, Supreet Kaur Bola, Wing Kin Liu, Rija Zaidi, Lucrezia Patruno, Martin D. Forster, Siow Ming Lee, Gareth A. Wilson, Rachel Rosenthal, Andrew Rowan, Chris Bailey, Claudia Lee, Katey S. S. Enfield, Mihaela Angelova, Oriol Pich, Cian Murphy, Maria Zagorulya, Michelle M. Leung, Teresa Marafioti, Elaine Borg, Mary Falzon, Reena Khiroya, Thomas Patrick Jones, Sarah Benafif, Dionysis Papadatos-Pastos, James Wilson, Tanya Ahmad, Angela Dwornik, Angeliki Karamani, Benny Chain, David R. Pearce, Georgia Stavrou, Gerasimos-Theodoros Mastrokalos, Helen L. Lowe, James L. Reading, John A. Hartley, Kayalvizhi Selvaraju, Leah Ensell, Mansi Shah, Maria Litovchenko, Piotr Pawlik, Samuel Gamble, Seng Kuong Anakin Ung, Victoria Spanswick, Clare E. Weeden, Eva Grönroos, Jacki Goldman, Mickael Escudero, Philip Hobson, Stefan Boeing, Tamara Denner, Vittorio Barbè, William Hill, Yutaka Naito, Erik Sahai, Zoe Ramsden, George Kassiotis, Imran Noorani, Jason F. Lester, Amrita Bajaj, Apostolos Nakas, Azmina Sodha-Ramdeen, Mohamad Tufail, Molly Scotland, Rebecca Boyles, Sridhar Rathinam, Sean Dulloo, Dean A. Fennell, Claire Wilson, Gurdeep Matharu, Jacqui A. Shaw, Ekaterini Boleti, Heather Cheyne, Mohammed Khalil, Shirley Richardson, Tracey Cruickshank, Gillian Price, Keith M. Kerr, Jack French, Kayleigh Gilbert, Babu Naidu, Akshay J. Patel, Aya Osman, Mandeesh Sangha, Gerald Langman, Helen Shackleford, Madava Djearaman, Gary Middleton, Angela Leek, Jack Davies Hodgkinson, Nicola Totton, Eustace Fontaine, Felice Granato, Juliette Novasio, Kendadai Rammohan, Leena Joseph, Paul Bishop, Vijay Joshi, Sara Waplington, Adam Atkin, Antonio Paiva-Correia, Philip Crosbie, Katherine D. Brown, Mathew Carter, Anshuman Chaturvedi, Pedro Oliveira, Colin R. Lindsay, Fiona H. Blackhall, Yvonne Summers, Matthew G. Krebs, Jonathan Tugwood, Caroline Dive, Hugo J. W. L. Aerts, Roland F. Schwarz, Tom L. Kaufmann, Peter Van Loo, Carla Castignani, Roberto Salgado, Miklos Diossy, Jonas Demeulemeester, Stephan Beck, Emma Nye, Richard Kevin Stone, Jayant K. Rane, Jeanette Kittel, Kerstin Haase, Kexin Koh, Rachel Scott, Karl S. Peggs, Emilie Martinoni Hoogenboom, Fleur Monk, James W. Holding, Junaid Choudhary, Kunal Bhakhri, Pat Gorman, Robert C. M. Stephens, Yien Ning Sophia Wong, Maria Chiara Pisciella, Steve Bandula, Thomas B. K. Watkins, Catarina Veiga, Gary Royle, Charles-Antoine Collins-Fekete, Francesco Fraioli, Paul Ashford, Alexander James Procter, Asia Ahmed, Magali N. Taylor, Arjun Nair, David Lawrence, Davide Patrini, Neal Navani, Ricky M. Thakrar, Sam M. Janes, Zoltan Kaplar, Allan Hackshaw, Camilla Pilotti, Rachel Leslie, Anne-Marie Hacker, Sean Smith, Aoife Walker, Anca Grapa, Hanyun Zhang, Khalid AbdulJabbar, Xiaoxi Pan, Yinyin Yuan, David Chuter, Mairead MacKenzie, Serena Chee, Patricia Georg, Aiman Alzetani, Judith Cave, Eric Lim, Paulo De Sousa, Simon Jordan, Alexandra Rice, Hilgardt Raubenheimer, Harshil Bhayani, Lyn Ambrose, Anand Devaraj, Hemangi Chavan, Sofina Begum, Silviu I. Buderi, Daniel Kaniu, Mpho Malima, Sarah Booth, Andrew G. Nicholson, Nadia Fernandes, Pratibha Shah, Chiara Proli, Madeleine Hewish, Sarah Danson, Michael J. Shackcloth, Lily Robinson, Peter Russell, Kevin G. Blyth, Andrew Kidd, Craig Dick, John Le Quesne, Alan Kirk, Mo Asif, Rocco Bilancia, Nikos Kostoulas, Jennifer Whiteley & Mathew Thomas

Contributions

The contact address for the TRACERx consortium is ctc.tracerx@ucl.ac.uk. D.B. and Y.-H.L. designed the experiments, performed the bioinformatics analyses and wrote the manuscript. N.J.B., J.H., Y.W., D.A.M., T.K., K.G. and W.L. provided early feedback and helped to direct the bioinformatics analyses. S.V., C.N.-L., N.M. and S.W. performed sample collection and RNA extraction and helped with data interpretation. A.M.F., M.H. and E.C. performed data processing and helped with data interpretation. S.d.C.T., P.E., A.M., D.M.S., O.O., D.L., J. Mattsson, A.L., P.M., J. Moldvay, Z.M., B.D., J.F., J.N., J.D., Z.S. and N.K. provided access to additional datasets and helped with data interpretation. A.H. provided statistical advice. M.J.-H. designed the TRACERx study protocols and helped to analyze the clinical characteristics of the patients. D.B. and C.S. conceived the project, acquired funding, supervised the study and edited the manuscript. All authors reviewed and approved the manuscript.

Corresponding authors

Correspondence to Dhruva Biswas, Nicolai J. Birkbak or Charles Swanton.

Ethics declarations

Competing interests

D.B. reports personal fees from NanoString and AstraZeneca and has a patent PCT/GB2020/050221 issued on methods for cancer prognostication. Y.W. consults for E15 VC and Prokarium. D.A.M. reports speaker fees from AstraZeneca, Eli Lilly, BMS and Takeda, consultancy fees from AstraZeneca, Thermo Fisher, Takeda, Amgen, Janssen, MIM Software, Bristol Myers Squibb and Eli Lilly and has received educational support from Takeda and Amgen. S.C.T. has acted as a consultant for Revolution Medicines. J.D. has acted as a consultant for AstraZeneca, Jubilant, Theras, Roche and Vividion and has funded research agreements with Bristol Myers Squibb, Revolution Medicines, Novartis, Vividion and AstraZeneca. M.J.-H. has consulted for Astex Pharmaceuticals and Achilles Therapeutics, is a member of the Achilles Therapeutics Scientific Advisory Board and Steering Committee, and has received speaker honoraria from Pfizer, Astex Pharmaceuticals, Oslo Cancer Cluster, Bristol Myers Squibb and Genentech. M.J.-H. is listed as a co-inventor on a European patent application relating to methods to detect lung cancer (PCT/US2017/028013); this patent has been licensed to commercial entities and, under terms of employment, M.J.-H. is due a share of any revenue generated from such license(s). M.J.-H. is also listed as a co-inventor on the GB priority patent application (GB2400424.4) titled "Treatment and Prevention of Lung Cancer". N.J.B. is listed as a co-inventor on a patent to identify responders to cancer treatment (PCT/GB2018/051912), has a patent application (PCT/GB2020/050221) on methods for cancer prognostication and a patent on methods for predicting anti-cancer response (US14/466,208). C.S. acknowledges grant support from AstraZeneca, Boehringer-Ingelheim, BMS, Pfizer, Roche-Ventana, Invitae (previously Archer Dx; collaboration in minimal residual disease sequencing technologies) and Ono Pharmaceutical. C.S. is an AstraZeneca Advisory Board member and Chief Investigator for the AZ MeRmaiD 1 and 2 clinical trials and is also Co-Chief Investigator of the NHS Galleri trial funded by GRAIL and a paid member of GRAIL’s SAB. He receives consultant fees from Achilles Therapeutics (also a SAB member), Bicycle Therapeutics (also a SAB member), Genentech, Medicxi, Roche Innovation Centre–Shanghai, Metabomed (until July 2022) and the Sarah Cannon Research Institute. C.S. had stock options in Apogen Biotechnologies and GRAIL until June 2021, currently has stock options in Epic Bioscience and Bicycle Therapeutics, and has stock options in and is co-founder of Achilles Therapeutics. C.S. is an inventor on a European patent application relating to an assay technology to detect tumor recurrence (PCT/GB2017/053289); the patent has been licensed to commercial entities and, under his terms of employment, C.S. is due a share of any revenue generated from such license(s). C.S. holds patents relating to targeting neoantigens (PCT/EP2016/059401), identifying patient responses to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221), identifying patients who respond to cancer treatment (PCT/GB2018/051912), a US patent relating to detecting tumor mutations (PCT/US2017/28013), methods for lung cancer detection (US20190106751A1) and both a European and US patent related to identifying indel mutation targets (PCT/GB2018/051892), and is a co-inventor on a patent application to determine methods and systems for tumor monitoring (PCT/EP2022/077987). C.S. is a named inventor on a provisional patent related to a ctDNA detection algorithm. The other authors declare no competing interests.

Peer review

Peer review information

Nature Cancer thanks Roy Herbst, David Santamaría and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 An overview of the TRACERx study.

a, Overview of the cohorts used in this study. A total of 421 patients with NSCLC were enrolled in the TRACERx study (NCT01888601); we focused on patients with LUAD to analyze LUAD prognostic signatures. Patients included in the previously published training dataset⁸ were removed, yielding the prospective validation cohort (n = 158). Other discovery analyses were performed on the exploratory cohort of 184 patients with LUAD. Patients with multiple tumor regions available were included in specific analyses, as indicated in the text. b, Batch correction of the ORACLE risk score using samples shared (85 regions from 27 patients) between the previously published data and the current data generated with an updated computational pipeline. The dot plot compares risk scores between the two data versions; risk scores were corrected using the fitted linear regression. The P value (P = 1.6 × 10⁻⁹⁷) was derived from a linear regression model, and the coefficient of determination (R²) is shown in the graph.
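The batch correction in panel b amounts to fitting a straight line between paired risk scores and then applying that line to rescale scores. The sketch below illustrates this with an ordinary least-squares fit and R². It assumes, since the caption does not say, that scores from the updated pipeline are the predictor and the previously published scores are the response. The function names and toy values are hypothetical and are not the TRACERx data or code.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


def correct_scores(scores, intercept, slope):
    """Map risk scores from the updated pipeline onto the published scale (assumed direction)."""
    return [intercept + slope * s for s in scores]


# Toy paired scores for shared regions (made-up values):
# x = score from the updated pipeline, y = previously published score.
new_pipeline = [0.0, 1.0, 2.0, 3.0]
published = [1.0, 3.0, 5.0, 7.0]
a, b, r2 = fit_linear(new_pipeline, published)
print(a, b, r2, correct_scores([4.0], a, b))
```

Fitting on the shared regions only and then applying the line to every region keeps the correction anchored to samples measured by both pipelines.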

Source data

Extended Data Fig. 2 Discordance percentages of published RNA-seq prognostic signatures.

a, Dot plots showing the distribution of risk scores for six published RNA prognostic signatures¹⁸⁻²³ in the TRACERx validation cohort (n = 122 patients with stage I–III LUAD with multiregion RNA-seq data available). Patients were classified by each signature into concordant low- (blue), concordant high- (red) and discordant-risk (gray) groups, using the median risk score as the cutoff. b, Bar plots showing the percentages of patients in each risk group, as classified by the ORACLE risk class and the six signatures, across stages I to III. Differences in the frequency of discordant risk classification among tumor stages were examined using a chi-squared goodness-of-fit test.
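A patient is concordant when all of their tumor regions fall on the same side of the cutoff and discordant otherwise. The sketch below shows this rule. It assumes a single cohort-wide median taken over all regions, and it counts a score equal to the cutoff as low; neither detail is stated in the caption. The function names and scores are hypothetical.

```python
from statistics import median


def classify_patients(region_scores, cutoff=None):
    """Assign each patient to a concordant-low, concordant-high or discordant group.

    region_scores maps a patient ID to the risk scores of that patient's tumor regions.
    A region is 'high' if its score is strictly above the cutoff (ties count as low).
    """
    if cutoff is None:
        # Assumption: one cohort-wide median taken over all regions.
        cutoff = median(s for scores in region_scores.values() for s in scores)
    groups = {}
    for patient, scores in region_scores.items():
        high = [s > cutoff for s in scores]
        if all(high):
            groups[patient] = "concordant high"
        elif not any(high):
            groups[patient] = "concordant low"
        else:
            groups[patient] = "discordant"
    return groups


def discordant_percentage(groups):
    """Percentage of patients whose regions were assigned to different risk groups."""
    return 100 * sum(g == "discordant" for g in groups.values()) / len(groups)


# Toy multiregion scores (made-up values).
scores = {"P1": [0.1, 0.2], "P2": [0.8, 0.9], "P3": [0.1, 0.9]}
groups = classify_patients(scores)
print(groups, discordant_percentage(groups))
```

Tabulating these groups by tumor stage gives counts of the kind summarized in panel b.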

Source data

Extended Data Fig. 3 Clustering concordance of published RNA-seq prognostic signatures.

A previously described hierarchical clustering method⁸,²⁴ applied to the six published prognostic signatures is illustrated. The dendrogram and heatmap show the clustering of tumor regions. The discordant rate (gray) was calculated as the percentage of patients whose tumor regions fell into different clusters. The analysis was iterated from 1 to 122 clusters, 122 being the number of patients in this cohort. The percentage of discordant clustering is shown for dendrograms cut into 2, 10 and 60 clusters. a, Li et al.’s signature; b, Wang et al.’s signature; c, Zhao et al.’s signature; d, Song et al.’s signature; e, Jin et al.’s signature; f, Li, Feng et al.’s signature.
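The discordant rate can be computed by clustering all tumor regions, cutting the tree into k clusters and counting patients whose regions land in more than one cluster. The sketch below uses a naive agglomerative clustering with Euclidean distance and average linkage. That choice is an assumption, because the caption does not give the distance metric or linkage used in the cited method. The functions and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(points, k):
    """Naive agglomerative clustering (Euclidean distance, average linkage).

    Merges the closest pair of clusters until k clusters remain and returns
    one integer cluster label per point.
    """
    dist = {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        ia, ib = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[ia].extend(clusters[ib])  # ia < ib, so deleting ib keeps ia valid
        del clusters[ib]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def discordant_rate(patient_ids, labels):
    """Percentage of patients whose regions fall into more than one cluster."""
    seen = {}
    for pid, label in zip(patient_ids, labels):
        seen.setdefault(pid, set()).add(label)
    return 100 * sum(len(c) > 1 for c in seen.values()) / len(seen)


# Toy two-gene expression profiles, two regions per patient (made-up values).
patients = ["A", "A", "B", "B", "C", "C"]
profiles = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2], [5.0, 5.2]]
for k in (1, 2, 6):
    print(k, discordant_rate(patients, average_linkage(profiles, k)))
```

Sweeping k from 1 up to the number of patients traces the discordance curve. The area under the corresponding concordance curve (100 minus the discordant rate) is one way to summarize it across all cuts.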

Source data

Extended Data Fig. 4 Established metrics for quantifying tumor sampling bias.

a, Hierarchical clustering of ORACLE genes using the method described in Extended Data Fig. 3. b, The area under the curve (AUC) was calculated to represent the concordance rate derived from the hierarchical clustering method for ORACLE and the six published prognostic signatures. This analysis was run from 1 cluster (100% concordance) to 122 clusters (the maximum number of clusters that could be obtained for the cohort). Dashed lines indicate the numbers of clusters used in Extended Data Fig. 3. c, A method developed by Househam et al.²⁵ for examining expression variability. The heatmap shows the gene-wise standard deviation of expression across tumor regions per patient. The average expression variability is annotated on the left. d, Box plot showing the distribution of mean expression variability across the signature genes for ORACLE and the six other RNA signatures in the TRACERx validation cohort (n = 122 patients with 333 tumor regions). Colors for each signature are as in panel c. Statistical significance was tested using a two-sided Wilcoxon rank-sum test. The center line of each box plot indicates the median, and the box spans the 25th to 75th percentiles. The lower and upper whiskers mark the 5th and 95th percentiles, respectively. Compared with ORACLE: Jin et al., 2022, P = 0.045; Li et al., 2017, P = 0.00012; Wang et al., 2022, P = 0.4; Song et al., 2022, P = 0.56; Li et al., 2022, P = 3.9 × 10⁻⁶; Zhao et al., 2020, P = 4.5 × 10⁻¹². e, Estimation of the minimum number of biopsies needed to obtain a stable risk score, using an algorithm developed by Bachtiary et al.²⁶. The vertical lollipop plot represents the variance of the ORACLE risk score within each individual tumor. The average within-tumor variance divided by the number of biopsies (k) was summarized as W. The horizontal dashed line shows the between-tumor variance in this cohort, denoted B. The ratio of W to the total variance (T) measures the stability of the risk score for a given signature. This method was also applied to the other six signatures. f, Line plot showing W/T per signature for one to ten biopsies. The threshold of 0.15 (horizontal dashed line), predefined in the original publication²⁶, was intersected with the best-fit line to give the minimum number of biopsies required to obtain a stable risk score.
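The W/T criterion in panels e and f can be computed from per-tumor regional scores. The sketch below makes two simplifying assumptions. First, it takes B as the sample variance of the per-tumor mean scores and sets T = B + W. Second, it returns the first k whose W/T falls at or below 0.15 directly, rather than intersecting a fitted line with the threshold. The original algorithm may estimate the variance components differently, and the function name and toy scores are hypothetical.

```python
from statistics import mean, variance


def biopsies_for_stable_score(tumor_scores, threshold=0.15, max_k=10):
    """Estimate the minimum number of biopsies k for a stable risk score.

    tumor_scores: one list of regional risk scores per tumor (>= 2 regions each).
    Returns (minimum k with W/T <= threshold, or None; dict of W/T by k).
    """
    within = mean(variance(s) for s in tumor_scores)    # average within-tumor variance
    between = variance([mean(s) for s in tumor_scores])  # B: variance of tumor means
    ratios = {}
    for k in range(1, max_k + 1):
        w = within / k                 # W: within-tumor variance of a k-biopsy average
        ratios[k] = w / (between + w)  # W / T, assuming T = B + W
    stable = [k for k, r in ratios.items() if r <= threshold]
    return (min(stable) if stable else None), ratios


# Toy regional scores for three tumors (made-up values).
k_min, ratios = biopsies_for_stable_score([[0.0, 2.0], [3.0, 5.0], [6.0, 8.0]])
print(k_min, round(ratios[1], 3), round(ratios[2], 3))
```

A signature with low within-tumor variance relative to between-tumor variance reaches the threshold with fewer biopsies, which is the property the panel compares across signatures.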

Source data

Extended Data Fig. 5 Prospective validation of survival association in stage I LUAD and using lung-cancer-specific survival and DFS.

a, Prognostic value of ORACLE for overall survival (OS) in the stage I subgroup (n = 70 patients with stage I LUAD), adjusted for known clinicopathological risk factors. Multivariable Cox analysis was performed incorporating the ORACLE mean risk score, patient sex, patient age, pack years (smoking packs and duration), adjuvant treatment status, tumor stage (TNM 8th edition) and histologic grade. Center boxes indicate hazard ratios and error bars indicate 95% confidence intervals for each predictor, plotted on a natural log scale. b, Percentages of stage I patients transitioning from standard clinical substaging (TNM 8th edition) to ORACLE risk classification. Patients in the TRACERx validation cohort (n = 70 patients with stage I LUAD) were stratified by tumor stage into stage IA (n = 38) and stage IB (n = 32) (left) and classified by ORACLE into concordant low-risk (n = 56), concordant high-risk (n = 9) and discordant-risk (n = 5) groups (right). Colors show the transition from stage I to the ORACLE low-risk (blue), high-risk (red) and discordant-risk (gray) groups. c, Prognostic value of ORACLE in a meta-analysis across four independent cohorts of patients with LUAD (n = 580 patients with stage I LUAD). Univariate Cox analysis was performed in four microarray datasets (Shedden et al.⁷, Der et al.²⁷, Okayama et al.²⁸ and Rousseaux et al.²⁹). Center boxes indicate hazard ratios and error bars indicate 95% confidence intervals for each cohort, plotted on a natural log scale. The diamond indicates the hazard ratio from the meta-analysis of the four microarray cohorts. d, Prognostic value of ORACLE for lung-cancer-specific death in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD), adjusted for known clinicopathological risk factors. Multivariable Cox analysis was performed incorporating the ORACLE mean risk score, patient sex, patient age, pack years (smoking packs and duration), adjuvant treatment status and tumor stage (TNM 8th edition). Center boxes indicate hazard ratios and error bars indicate 95% confidence intervals for each predictor, plotted on a natural log scale. e, Prognostic value of ORACLE for DFS in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD), adjusted for known clinicopathological risk factors. Multivariable Cox analysis was performed incorporating the ORACLE mean risk score, patient sex, patient age, pack years (smoking packs and duration), adjuvant treatment status and tumor stage (TNM 8th edition). Center boxes indicate hazard ratios and error bars indicate 95% confidence intervals for each predictor, plotted on a natural log scale.
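The forest plots report each hazard ratio (HR) with its 95% confidence interval on a natural log scale, and panel c pools four cohorts into a single diamond. The legend does not state the pooling model, so the sketch below assumes inverse-variance fixed-effect pooling of log HRs, with each standard error recovered from the reported interval. A random-effects model (for example DerSimonian–Laird) would give a wider interval when cohorts disagree. The function names are hypothetical, and the sketch does not reproduce the authors' analysis.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def log_hr_and_se(hr, lower, upper):
    """Recover the log hazard ratio and its standard error from an HR and 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)

def fixed_effect_pool(cohorts):
    """Inverse-variance fixed-effect pooling of (hr, lower, upper) tuples.

    Returns the pooled (hr, lower, upper) on the natural (non-log) scale.
    """
    num = den = 0.0
    for hr, lo, hi in cohorts:
        beta, se = log_hr_and_se(hr, lo, hi)
        w = 1.0 / se ** 2  # weight = inverse variance of the log HR
        num += w * beta
        den += w
    beta_pooled = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(beta_pooled),
            math.exp(beta_pooled - Z95 * se_pooled),
            math.exp(beta_pooled + Z95 * se_pooled))
```

Pooling two hypothetical cohorts that each report HR = 2.0 (95% CI 1.0–4.0) returns HR = 2.0 with a narrower interval, because the pooled standard error is the single-cohort value divided by √2.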

Source data

Extended Data Fig. 6 Anticancer drug screening in vitro.

a, Flow diagram of the steps used to filter cell lines and compounds with missing data, obtained from the GDSC and CCLE databases³⁴,³⁵ (n = 54 LUAD cell lines; 396 compounds). Cell lines with missing data for more than 50 compounds were removed first, yielding 37 cell lines. Compounds with missing data for more than 5 cell lines were then removed, yielding 359 compounds. b, Association between the ORACLE risk score and anticancer drug response, measured by the half-maximal inhibitory concentration (IC₅₀). Drugs with a significant association (see Fig. 4a) are shown. Spearman correlation coefficients and P values are shown for each compound.
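The two-pass filter in panel a and the rank correlation in panel b can be sketched as follows. The sketch assumes a hypothetical layout in which each cell line maps compound names to IC₅₀ values, with `None` marking missing data. Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles ties. P values are omitted; in practice a library routine such as `scipy.stats.spearmanr` would supply both statistics.

```python
from math import sqrt

def filter_missing(ic50, compounds, max_missing_per_line=50, max_missing_per_compound=5):
    """Remove cell lines first, then compounds, with more missing values than allowed.

    ic50: {cell_line: {compound: IC50 or None}} (hypothetical layout).
    Returns (kept cell lines as a dict, kept compound names as a list).
    """
    lines = {cl: d for cl, d in ic50.items()
             if sum(d.get(c) is None for c in compounds) <= max_missing_per_line}
    kept = [c for c in compounds
            if sum(d.get(c) is None for d in lines.values()) <= max_missing_per_compound]
    return lines, kept

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")
```

Filtering cell lines before compounds matters: removing sparsely profiled cell lines first reduces the missing counts per compound, so fewer compounds are discarded in the second pass.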

Source data

Extended Data Fig. 7 Prediction of adjuvant therapy response.

ORACLE as a predictive marker of response to adjuvant therapy, stratified by nodal status, in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). Statistical significance was tested using a two-sided log-rank test. Node-negative, no adjuvant therapy, P = 0.03; node-negative with adjuvant therapy, P = 0.051; node-positive, no adjuvant therapy, P = 0.35; node-positive with adjuvant therapy, P = 0.19.
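Each P value above comes from a two-sided log-rank test within one nodal-status and adjuvant-therapy stratum, presumably comparing ORACLE risk groups. The sketch below is a from-scratch two-group log-rank test that accumulates observed-minus-expected events and the hypergeometric variance at each event time, with tied event times allowed. The input layout and group labels are assumptions. Real analyses would use an established routine such as `survdiff` from the R survival package.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sided two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event occurred, 0 if censored.
    Returns (chi-square statistic on 1 df, P value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined with one subject at risk
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

The final line uses the identity P(χ²₁ > x) = erfc(√(x/2)), which avoids needing a chi-square distribution function.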

Source data

Extended Data Fig. 8 The association of ORACLE with genetic evolutionary metrics.

Scatter plots and box plots show the mean ORACLE risk score per tumor in the TRACERx exploratory cohort (n = 184 patients with stage I–III LUAD) and its correlation with seven clinicopathological and seven genetic features. The center line of each box plot indicates the median and the box spans the 25th to 75th percentiles. The lower and upper whiskers define the 5th and 95th percentiles, respectively.
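These box plots place whiskers at the 5th and 95th percentiles instead of the common Tukey convention of 1.5 × IQR. A short sketch of the five plotted statistics follows. It assumes linear interpolation between order statistics (R's type 7, also NumPy's default); the authors' plotting software may use a different percentile convention.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (R type 7 convention) of pre-sorted values."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac

def box_stats(values):
    """Median, quartiles and 5th/95th-percentile whiskers for one box."""
    s = sorted(values)
    return {name: percentile(s, q) for name, q in
            [("whisker_low", 5), ("q1", 25), ("median", 50),
             ("q3", 75), ("whisker_high", 95)]}
```

For the integers 0 to 100, the whiskers fall at 5 and 95 and the box spans 25 to 75.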

Source data

Extended Data Fig. 9 Somatic mutations and copy number alterations underlying clonal expression magnitude.

a, Frequencies of clonal (left) and subclonal (right) driver mutations at the gene level, compared between high- and low-risk tumor regions in the TRACERx exploratory cohort (n = 142 high-risk and n = 308 low-risk tumor regions from 184 patients with stage I–III LUAD). The scatter plot shows the odds ratio from a two-sided Fisher’s exact test for each gene. The horizontal dashed line indicates P = 0.05. b, Oncoprint showing the frequencies of clonal mutations in 10 driver genes enriched in the ORACLE low-risk and high-risk groups. Each column represents a tumor region in the TRACERx exploratory cohort (n = 184 patients with stage I–III LUAD; 450 tumor regions). c, Genome-wide SCNAs identified using GISTIC2.0 (Methods). For each genomic region, the G-score difference between the ORACLE low-risk and high-risk cohorts was calculated to identify loci under positive selection. The plot shows the false-discovery rate (q value) of the G score in the high-risk cohort. Chromosome segments with significant positive selection (G-score difference > 0 and q < 0.05) are shown in red for amplifications and blue for deletions. Vertical dashed lines indicate a false-discovery rate (q value) of 0.05. Driver SCNAs listed in our previous study¹⁴ that lie on chromosome arms harboring the detected cytobands are highlighted.
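Panel a tests each gene with a 2 × 2 table of tumor regions: risk group (high or low) against mutation status. A minimal sketch of the two-sided Fisher's exact test follows. It sums the probabilities of every table with the same margins that is no more likely than the observed one. The cell encoding below is an assumption (a = high-risk mutated, b = high-risk wild-type, c = low-risk mutated, d = low-risk wild-type). The sketch reports the sample odds ratio ad/bc, which is SciPy's convention; R's `fisher.test` reports a conditional maximum-likelihood estimate instead, so the two can differ slightly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum tables no more likely than the observed one (small relative tolerance)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    if b * c == 0:
        odds = float("inf") if a * d else float("nan")
    else:
        odds = (a * d) / (b * c)
    return odds, min(p, 1.0)
```

Python's exact integer `comb` keeps the hypergeometric terms accurate even for margins of several hundred regions, such as the 142 high-risk and 308 low-risk regions here.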

Source data

Extended Data Fig. 10 Future applicability of ORACLE in clinical practice.

Possible design of prospective clinical trials to evaluate ORACLE for guiding adjuvant chemotherapy in high-risk stage I patients and for monitoring outcomes in low-risk stage II patients. LUAD, lung adenocarcinoma.

Supplementary information

Reporting Summary

REMARK checklist.

Supplementary Table 1

Published RNA prognostic signatures. Gene lists from six published signatures for LUAD.

Source data

Source Data Fig. 1

Statistical source data for Fig. 1.

Source Data Fig. 2

Statistical source data for Fig. 2.

Source Data Fig. 3

Statistical source data for Fig. 3.

Source Data Fig. 4

Statistical source data for Fig. 4.

Source Data Fig. 5

Statistical source data for Fig. 5.

Source Data Extended Data Fig. 1

Statistical source data for Extended Data Fig. 1.

Source Data Extended Data Fig. 2

Statistical source data for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Statistical source data for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Statistical source data for Extended Data Fig. 4.

Source Data Extended Data Fig. 5

Statistical source data for Extended Data Fig. 5.

Source Data Extended Data Fig. 6

Statistical source data for Extended Data Fig. 6.

Source Data Extended Data Fig. 7

Statistical source data for Extended Data Fig. 7.

Source Data Extended Data Fig. 8

Statistical source data for Extended Data Fig. 8.

Source Data Extended Data Fig. 9

Statistical source data for Extended Data Fig. 9.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Biswas, D., Liu, YH., Herrero, J. et al. Prospective validation of ORACLE, a clonal expression biomarker associated with survival of patients with lung adenocarcinoma. Nat Cancer 6, 86–101 (2025). https://doi.org/10.1038/s43018-024-00883-1


Received: 31 January 2024
Accepted: 15 November 2024
Published: 09 January 2025
Version of record: 09 January 2025
Issue date: January 2025
DOI: https://doi.org/10.1038/s43018-024-00883-1

