Abstract
Proteomics offers an unprecedented opportunity to characterize and predict diabetic retinopathy (DR) with minimal invasiveness. Here we examine this in 10,873 individuals with (pre)diabetes from two ethnically distinct cohorts. By simultaneous profiling of ~3000 proteins, we identify 668 associations with mechanistically plausible directionality that constitute a comprehensive DR proteomic landscape with linkages to retinal tomographic structure and genetic predisposition, pointing to established and novel biological pathways conferring DR risk. Integrating DR proteomic profile markedly improves predictive performance beyond clinical and genetic predictors, with plexin B2, growth differentiation factor 15, and renin emerging as top proteins validated across cohorts and linked to retinal microvascular degeneration in Guangzhou Diabetic Eye Study (GDES) based on SS-OCTA. A parsimonious panel of these three proteins alone achieves comparable performance in predicting DR development and progression, while renin is confirmed as a causal promoter through genetic analyses. Our findings highlight the potential of large-scale proteomics in elucidating DR pathogenesis and advancing biomarker discovery, with broad implications for early detection and intervention.
Introduction
As the global population ages and urbanization, diabetes has become the fastest-growing disease worldwide, projected to affect 1.3 billion people by 20501,2,3. Among its debilitating complications, diabetic retinopathy (DR) stands out as the most prevalent and a leading cause of blindness in developed countries4,5,6. Current therapeutic options, such as laser coagulation or intravitreal medication, are either inherently destructive or of limited efficacy, with over 30%–50% of patients responding poorly5,7,8. In stark contrast, up to 90% of DR-related blindness is preventable through early detection and intervention, yet identifying high-risk individuals has remained an unmet challenge9,10. Despite extensive efforts in exploring the pathogenesis of DR and discovering its biomarkers, existing knowledge of risk factors has proven far from being sufficient, and genetic studies, while promising, have yielded limited success6,11,12. Expanding the biomarker landscape is therefore imperative, as such knowledge is paramount to unravelling pathogenesis and identifying robust predictors of disease onset and progression, some of which may serve as targets for therapeutic development.
Recent advances in proteomics offer a promising avenue for directly linking systemic biochemical disturbances to disease risk13,14,15,16,17. While pioneering attempts have highlighted aqueous humour-proteomics’ potential in predicting retinal aging and diseases16, the invasive nature of intraocular sampling limits broader application of any biomarkers identified. In contrast, blood-based biomarkers offer a minimally invasive alternative. Although some blood proteins have been associated with DR18,19,20, these studies were limited by small sample sizes, cross-sectional designs, and focus on few selected proteins, which introduced reverse causations bias and lacked a comprehensive proteomic perspective. Whether blood proteomics can predict DR, and if so, which proteins possess the most robust predictive power, has remained unanswered. To bridge this knowledge gap, large-scale prospective studies with proteomic data are therefore essential.
In this study, we employed a data-driven proteomic approach within the world’s largest population-based biobank, with extensive longitudinal follow-up, to map the DR proteomic landscape and assess its potential for predicting DR over a span of up to 15 years21. We also explored the relationships between DR-associated proteins and retinal structure, as well as their links to genetic susceptibility, and characterized the underlying biological pathways. These findings were largely validated in an external, ethnically distinct cohort22 of diabetic patients with standardized DR assessment following the ETDRS grading system and DRCR.net criteria23. Finally, we conducted mendelian randomization (MR) to establish causal links and identify potential therapeutic targets. To our knowledge, this represents the largest and most comprehensive human population-based proteomic study of DR, with broad functional and clinical implications that may transform DR prediction and diabetic management.
Results
Participants’ characteristics
A total of 10,521 UK Biobank (UKB) participants free of DR at baseline were included in this study for identifying DR-associated proteins, characterizing biological pathways, and modelling machine learning-based proteomic state models for prediction of DR development (Fig. 1). The study population had a mean (SD) age of 59.6 (7.3) years, with 49.9% female and 87.9% being of Caucasian ethnicity (Supplementary Table S1). Over 118.7 thousand person-years of follow-up, 425 (4.0%) cases of incident DR were identified. Those who developed DR during follow-up were more likely to be male sex (P = 6.52 × 10−7), non-Caucasian (P = 0.003), have a higher body mass index (P = 2.47 × 10−10), higher Haemoglobin A1c level (P = 6.98 × 10−42), and a longer duration of diabetes (P = 0.007), consistent with prior epidemiological findings24,25. Participants were randomly allocated in a ratio of 8:2 to a discovery set (n = 8451) for protein discovery, pathway profiling, and model training, and a fully withheld test set (n = 2070) for all model evaluations, with balanced baseline characteristics across both sets (all P > 0.05) (Supplementary Table S2). The data of an additional 352 participants from the Guangzhou Diabetic Eye Study (GDES) served for external replication (Methods).
1, 2 A total of 10,521 UK Biobank (UKB) participants without DR at baseline were included to identify DR-associated proteins, characterize biological pathways, and develop machine learning-based proteomic state models for predicting incident DR. An additional 352 participants from the Guangzhou Diabetic Eye Study (GDES) were included for independent replication. 3 Dysregulated protein profiles across 2923 blood proteins were assessed, with subgroup analyses exploring differences by genetic predisposition and sex. 4 Associations between DR-associated proteins, retinal tomographic structure, and genetic predisposition were examined. 5 The identified proteins revealed both established and novel biological pathways underlying DR risk. 6 Machine learning models were trained to predict incident DR, while (7) protein importance ranking identified the top contributors to prediction. 8 The utility of integrating proteomic profiles and top-ranked proteins into clinical and polygenic risk models was evaluated. 9 External validation of the proteomic profiles, their associations with retinal structure, and predictive improvements were conducted in GDES. Leveraging annual retinal microvascular phenotyping data from GDES participants, 10 we investigated the impact of DR-associated proteins on the dynamics of microvascular degeneration. 11 To infer causality, we conducted two-sample Mendelian randomization (MR) for the top-ranked proteins and their association with incident DR. Created in BioRender. Wang, W. (https://BioRender.com/30gbrhl). Parts of panel (1) was created from Flaticon (https://flaticon.com). DR diabetic retinopathy; SCP superficial capillary plexus; DCP deep capillary plexus; VD vessel density; GWAS genome-wide association study; pQTL protein quantitative trait loci; SNP single nucleotide polymorphism; MR Mendelian randomization.
Identifying proteins associated with incident DR
Out of 2923 plasma proteins tested in the discovery set, 814 were significantly associated with the risk of incident DR risk after applying multiple testing correction and adjusting for age, sex, and ethnicity in model 1 (Fig. 2a and Supplementary Table S3). Sensitivity analyses in model 2, which further adjusted for established DR risk factors (HbA1c, duration of diabetes, BMI, and SBP)24,25, replicated 673 of these associations (Fig. 2b and Supplementary Table S4). These included 463 positive and 210 negative associations, most with mechanistically plausible directionality. Among the strongest associations was the growth differentiation factor 15 (GDF15) (hazard ratio [HR] = 1.734, 95% CI: 1.620, 1.856; PBH = 2.06 × 10−53), a senescence-associated secretory phenotype which has been linked to hypoxia and mitochondrial dysfunction recently implicated in DR pathogenesis26,27,28. Other prominent proteins replicated in model 2 included renin (REN) (HR = 1.460, 95% CI: 1.354, 1.574; PBH = 1.17 × 10−20), angiopoietin-like 4 (ANGPTL4) (HR = 1.490, 95% CI: 1.349, 1.647; PBH = 2.33 × 10−13), and fatty acid-binding protein 4 (FABP4) (HR = 1.425, 95% CI: 1.299, 1.564; PBH = 3.12 × 10−12), all with known effects in promoting microvascular and metabolic dysfunction29,30,31,32. Further sensitivity analyses, (1) additionally adjusting for hypoglycaemic medications (Pearson’s r = 0.924; P = 1.31 × 10−1223); and (2) excluding participants who developed DR within the initial 2 years of follow-up (Pearson’s r = 0.981; P = 4.47 × 10−2071), yielded consistent results (Fig. 2c, d and Supplementary Tables S5–S6). Hypoglycaemic medication use explained only a small proportion (<3%) of the variance in DR-associated proteins (Methods and Supplementary Table S7).
(a–d), Associations of 2923 proteins with incident DR. Hazard ratios for incident DR per 1-SD change in protein levels were estimated using Cox proportional hazards models (n = 10,521). Model 1 (a) was adjusted for age, sex, and ethnicity, while model 2 (b) included additional adjustments for HbA1c, duration of diabetes, BMI, SBP, and status of diabetes (prediabetes/diabetes). Sensitivity analyses were conducted by additionally adjusted for medication use (c) and excluding DR within the initial two years of follow-up (d). P values were calculated using two-sided Wald tests after controlling false discovery rate (FDR) for multiple testing. e Subgroup analyses stratified by polygenic risk and sex using model 2. Participants who completed proteomic profiling and genotyping were dichotomized at the PRS median high-risk (n = 5261) and low-risk (n = 5260) groups. Sex-specific analyses included 5267 male and 5254 female participants. P values were calculated using two-sided Wald tests, and asterisks denote significant associations after controlling FDR for multiple testing independently within each stratum (***P < 0.001, **P < 0.01, *P < 0.05). To enhance visual contrast and facilitate cross-protein comparison, hazard ratios were log-transformed and linearly rescaled to span the full colour scale. Original (unscaled) results are reported in Supplementary Tables S7–S8 and S10–S11. f Associations of DR-associated proteins with retinal structure. Coefficients for retinal tomographic measurements per 1-SD change in protein levels were estimated using linear models adjusted for age, sex, HbA1c, duration of diabetes, BMI, and SBP (n = 3397). P values were calculated using two-sided Student’s t-tests after controlling FDR for multiple testing. Asterisks denote significance as above. g Subgroup analyses by polygenic risk and sex for the associations of DR-associated proteins with PL thickness. Coefficients were linearly rescaled as above, and original (unscaled) results are reported in Supplementary Tables S17–S20. Source data are provided as a Source Data file. DR diabetic retinopathy, RPE retinal pigment epithelium, PL photoreceptor layer, GC-IPL ganglion cell-inner plexus layer, RNFL retinal nerve fibre layer.
Based on the established link between genetic susceptibility and DR risk33, we performed subgroup analyses stratified by polygenic risk scores (PRSs) derived from an independent dataset34 (Methods), revealing both shared and distinct patterns of protein dysregulation across varying genetic risk strata (Fig. 2e and Supplementary Tables S8–S10). To ensure balanced statistical power, participants were dichotomized at the PRS median into high- and low-risk groups. While the dysregulation of most identified proteins was consistent (ρ ≈ 1), such as REN (ρ = 1.006, 95% CI: 0.860, 1.177; Pheterogeneity = 0.950) and GDF15 (ρ = 1.109, 95% CI: 0.952, 1.292; Pheterogeneity = 0.187), the total number of dysregulated proteins was markedly higher in the high-risk stratum (n = 539) compared to the low-risk stratum (n = 175). Certain proteins including vascular endothelial growth factor-D (VEGF-D) (ρ = 1.531, 95% CI: 1.251, 1.873; Pheterogeneity = 6.24 × 10−5) were significantly dysregulated exclusively among those individuals with a high genetic risk (HR = 1.310, 95% CI: 1.623, 1.476; PBH = 1.36 × 10−4; PBonferroni = 0.006), with evidence of interaction effect (Pinteraction = 0.003). In subgroup analyses stratified by sex, the primary results from the total population remained largely unchanged (Fig. 2e and Supplementary Tables S11–S13), suggesting similar patterns of protein dysregulation between men and women. However, sex-specific differences were observed for certain proteins. For instance, plexin B2 (PLXNB2), implicated in neural inflammation35,36, were predominantly associated with incident DR in women (HR = 1.465, 95% CI: 1.288, 1.667; PBH = 4.89×10−7; PBonferroni = 4.17×10−6) rather than men (ρ = 1.288, 95% CI: 1.076, 1.542; Pheterogeneity = 0.006; Pinteraction = 0.026).
DR-associated proteins and retinal structures
To elucidate the relationship between blood proteins and incident DR, we explored associations between these proteins and retinal structures in 3397 participants free of DR at baseline and with both proteomic and retinal imaging data (Methods). After adjusting for confounders as described in model 2218 proteins were associated with at least one of the six retinal morphometric parameters after multiple testing correction (Fig. 2f and Supplementary Tables S14–S17). In general, higher levels of proteins promoting DR were associated with a reduced thickness of the neuroretina and retinal pigment epithelium (RPE), whereas protective proteins demonstrated opposing patterns. Among retinal structures, the photoreceptor layer (PL), home to specialized neurons for visual processing at the core of the retina’s functionality37,38, was most markedly associated with DR-associated proteins (Supplementary Table S15). These proteins included GDF15 (β = −0.080, 95% CI: −0.127, −0.034; PBH = 0.007), REN (β = −0.079, 95% CI: −0.122, −0.036; PBH = 0.004), and FABP4 (β = −0.090, 95% CI: −0.131, −0.049; PBH = 0.001), in line with earlier findings linking them to metabolic and microvascular dysfunctions compromising the photoreceptor functionality26,29,32,39. Subgroup analyses revealed that these associations were more pronounced in women and individuals with high genetic risk, while most associations did not reach significance in men and those with a low genetic risk (Fig. 2g and Supplementary Tables S18–S21).
In addition, 3 and 5 proteins were associated with retinal nerve fibre layer (RNFL) and ganglion cell-inner plexiform layer (GC-IPL) thickness (Fig. 2f and Supplementary Table S16), respectively, representing unmyelinated axons and cytosol/dendrites of retinal ganglion cells40. These associations included reductions in RNFL thickness with hyaluronidase-1 (HYAL1) (β = −0.074, 95% CI: −0.111, −0.037; PBH = 0.028) and C-C motif chemokine 21 (β = −0.080, 95% CI: −0.116, −0.044; PBH = 0.008), in line with earlier findings suggesting their roles in diabetic inflammatory and endothelial dysfunctions that compromise axonal integrity41,42. In addition, proteins such as adhesion G-protein coupled receptor G1 (β = −0.079, 95% CI: −0.121, −0.038; PBH = 0.044) and fibroblast growth factor binding protein 1 (β = 0.072, 95% CI: 0.035, 0.109; PBH = 0.044) were associated with a reduced GC-IPL thickness (Fig. 2f and Supplementary Table S16), supporting their roles in cellular adhesion and neurotrophic modulation of dendritic and synaptic processes in the retina43,44. The RPE, whose alterations have recently been implicated in early DR pathogenesis37,45, also held associations with several blood proteins including liver carboxylesterase 1 (CES1) (β = −0.066, 95% CI: −0.105, −0.027; PBH = 0.034). Subgroup analyses stratified by sex and genetic risk produced mostly consistent results (Fig. 2g and Supplementary Tables S22–S28).
DR-associated proteins and DR polygenic susceptibility
To investigate whether the association between blood proteins and incident DR is driven by genetic factors, we examined the associations between DR-associated proteins and polygenic risk (Methods). After multiple testing correction, 151 proteins were significantly associated with polygenic risk, exhibiting consistent directionality with their association to incident DR (Fig. 3a and Supplementary Table S29). These proteins included GDF15 (β = 0.106, 95% CI: 0.075, 0.137; PBH = 1.08 × 10−9), REN (β = 0.028, 95% CI: 0.009, 0.047; PBH = 0.016), and FABP4 (β = 0.056, 95% CI: 0.031, 0.081; PBH = 1.46 × 10−4), suggesting a direct modulation by the genetic architecture that links downstream DR pathophysiology. Conversely, other 53 proteins, such as leptin receptor (β = −0.156, 95% CI: −0.206, −0.105; PBH = 1.84 × 10−9) and CES1 (β = 0.033, 95% CI: 0.016, 0.050; PBH = 0.001), exhibited opposing directionality with respect to their association with incident DR (Fig. 3a and Supplementary Table S30), suggesting a regulation by environmental factors or secondary metabolic changes rather than direct genetic predisposition46,47.
a Patterns of blood protein alignments and disalignments with polygenic risk and DR incidence. Coefficients for protein levels per 1-SD change in polygenic risk scores were estimated using linear models adjusted for age, sex, HbA1c, duration of diabetes, BMI, and SBP (n = 10,521). Data are presented as estimated coefficients (squares) with 95% confidence intervals (CIs) indicated by error bars. Solid blocks and asterisks denote significant associations based on two-sided Student’s t-tests with false discovery rate (FDR) correction for multiple testing. b Enrichment analysis for Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Reactome pathways. Proteins significantly associated with DR after multiple testing correction in model 2 were analysed using the full panel of Olink proteins as the reference. P values were calculated using two-sided Fisher’s exact tests after controlling FDR for multiple testing. The numbers above each bar represent the count of proteins observed in each pathway. c Protein–protein interaction network of DR-associated proteins. Nodes represent individual proteins, with darker shades indicating higher degrees. The width and colour of edges connecting the nodes represent the combined interaction scores, with thicker and darker lines signifying stronger evidence of interaction. To improve readability and highlight the most robust connections, only edges with a standardized interaction score > 0.85 are displayed. Source data are provided as a Source Data file. DR diabetic retinopathy.
To complement the polygenic findings, we further assessed the SNP–protein associations using 58 established DR risk variants curated from a systematic review of prior genome-wide association studies (GWASs) and meta-analyses (Methods)48. As expected, all proteins associated with DR polygenic risk were also significantly associated with at least one of these validated DR risk variants, providing orthogonal support for the genetic influence on these proteins (Supplementary Table S31). Further sensitivity analysis excluding SNPs in strong linkage disequilibrium (LD, r2 > 0.8 within ±1 Mb) with cis-pQTLs of these proteins prior to PRS construction yielded consistent results (Methods and Supplementary Table S32), confirming that the observed associations reflect true polygenic signals rather than LD confounding. Overall, the contrast between proteins consistently aligned with both the polygenic risk and DR incidence, and those without such alignment, highlighted the multifactorial nature of DR pathogenesis.
Biological function of DR-associated proteins
To establish the biological relevance of DR-associated proteins, we queried the Gene Ontology (GO)49, Kyoto Encyclopaedia of Genes and Genomes (KEGG)50, and Reactome51 databases to measure protein enrichment in various pathways (Methods), highlighting established and novel pathways canonically implicated in DR pathogenesis (Fig. 3b). These pathways included proteolytic cascades (enrichment score [ES] = 204.0; PBH = 2.03 × 10−14) that promote oxidative stress52,53, positive regulation of the MAPK cascade (ES = 190.0; PBH = 9.34 × 10−14), and cell population proliferation (ES = 150.4; PBH = 9.34 × 10−14), which disrupt pericyte and endothelial responses to oxidative stress and growth signals seen in DR, in line with earlier findings54,55. Other enriched pathways involved receptor-ligand interactions (ES = 558.7; PBH = 5.85 × 10−29) and cytokine activity (ES = 259.2; PBH = 2.70 × 10−14), both of which have been implicated in promoting retinal inflammation56,57. From a cellular component perspective, the enrichment of proteins in the collagen-containing extracellular matrix (ES = 463.3; PBH = 1.22 × 10−27) and endoplasmic reticulum lumen (ES = 265.5; PBH = 9.57 × 10−18) supports prior evidence of extracellular matrix remodelling58 and endoplasmic reticulum stress59 in DR pathogenesis.
We further modelled a protein–protein interaction (PPI) network using four publicly available databases60,61,62,63 to elucidate the intricate connections among the 684 DR-associated proteins identified (Methods). The generated network, comprising 664 protein nodes and 6,475 interaction edges (Fig. 3c), exhibited a highly centralized topology with short path lengths (Lenaver. = 2.289) and high radiality (Radaver. = 0.994), suggesting efficient signal propagation within a tightly coordinated molecular landscape. Interleukin (IL)-6 emerged as the most central protein (Deg = 225), followed by CD4 (Deg = 170) and IL-10 (Deg = 162), aligning with their established roles in chronic inflammation, a key driver of DR pathogenesis57,64. In addition, Molecular COmplex DEtection (MCODE)65 identified four densely connected subnetworks, each enriched for distinct yet interconnected functional modules (Supplementary Figs. S1–S3 and Supplementary Tables S33–S36). These subnetworks functioned in immune regulation, chronic inflammation, extracellular matrix remodelling, and angiogenesis, collectively illustrating how diverse biological processes converge in DR pathogenesis. IL-1R1, CD274, and CD28 were identified as integrative hubs among modules, suggesting their dual role as key mediators bridging immune checkpoint regulation and inflammatory immune cell dynamics.
Proteomic profile stratifies and predicts future DR risk
We next assessed the ability of DR-associated proteins to stratify and predict future DR risk (Methods). The proteomic state model, trained using eXtreme Gradient Boosting (XGBoost), yielded significant separations in cumulative hazards across DR risk trajectories in the fully withheld test set (Fig. 4a), conferring a more than 32-fold increased risk for top tertile individuals compared to those in the bottom (HR = 32.947, 95% CI: 8.037, 135.072; P = 1.61 × 10−22). In terms of predictive performance, the proteomic states alone achieved a C-statistic of 0.824 (95% CI: 0.774, 0.874), outperforming both the polygenic risk (P = 9.27 × 10−5) and clinical risk factor models (P = 0.010) (Fig. 4b). Integrating DR proteomic states into the polygenic, clinical, and polygenic + clinical models further improved predictive performance, resulting in ΔC-statistics of 0.149 (95% CI: 0.084, 0.214; P = 6.86 × 10−6), 0.088 (95% CI: 0.034, 0.143; P = 0.002), and 0.076 (95% CI: 0.772, 0.846; P = 0.003), with overall C-statistics of 0.832 (95% CI: 0.783, 0.880), 0.840 (95% CI: 0.796, 0.885), and 0.847 (95% CI: 0.803, 0.890) (Fig. 4b). Improvements in net reclassification (NRIpolygenic = 0.999, 95% CI: 0.782, 1.216; P = 6.22 × 10−11, NRIclinical = 0.407, 95% CI: 0.181, 0.632; P = 4.16 × 10−4, and NRIclinical + polygenic = 0.371, 95% CI: 0.145, 0.596; P = 0.001) and integrated discrimination (IDIpolygenic = 0.076, 95% CI: 0.042, 0.110; P = 1.00 × 10−5, IDIclinical = 0.031, 95% CI: 0.007, 0.055; P = 0.010, and IDIclinical + polygenic = 0.030, 95% CI: 0.006, 0.054; P = 0.012) were consistently observed. Goodness of model fit was confirmed (PHosmer-Lemeshow test > 0.99), and decision curve analyses revealed evident improvements in clinical utility (Fig. 4c). In addition, integrating proteomic state into the clinical model significantly outperformed the one that integrated polygenic risk (ΔC-statistic = 0.070, 95% CI: 0.017, 0.122; P = 0.009).
a, d Cumulative DR hazard over the follow-up period (n = 10,521), stratified by proteomic state tertiles constructed using the full protein panels (a) and the cardiometabolic panel only (d). Data are presented as observed event frequencies with 95% confidence intervals (CIs) shown as shading derived from survival proportions. High risk corresponds to the top tertile, middle risk to the middle tertile, and low risk to the bottom tertile. b, e Receiver operating characteristic curves comparing the clinical risk factor model, polygenic risk model, clinical risk factors + polygenic risk model, clinical risk factors + proteomic states, polygenic risk + proteomic states, and clinical risk factors + polygenic risk + proteomic states for predicting future DR (n = 10,521). Proteomic states were constructed using the full protein panels (b) and the cardiometabolic panel only (e). The dashed grey line indicates random classifier. Stratified performance across polygenic risk and sex are also presented. (c and f), Standardized net benefit curves for the same models predicting future DR (n = 10,521). Proteomic states were constructed using the full protein panels (c) and the cardiometabolic panel only (f). The horizontal dashed grey line indicates “treat none,” while the vertical grey line indicates “treat all.” Stratified performances across polygenic risk and sex are also shown. Source data are provided as a Source Data file. DR diabetic retinopathy.
Given that the cardiometabolic panel captured most of the identified proteins (n = 158, Supplementary Table S4), aligning with the cardiovascular and metabolic nature of DR4, and that each Olink panel operates independently, we tested whether this single panel alone could provide stratification and predictive power. Encouragingly, the cardiometabolic proteomic state model exhibited comparable stratification performance to the full panel (Fig. 4d), achieving a C-statistic of 0.817 (95% CI: 0.722, 0.862) that surpassed both polygenic (P = 1.24 × 10−6) and clinical risk models (P = 0.009) (Fig. 4e). When integrated with these benchmarks, the cardiometabolic protein panel maintained improvements in both predictability and clinical utility over the polygenic (ΔC-statisticpolygenic = 0.176, 95% CI: 0.114, 0.239 [P = 3.65 × 10−8]; NRIpolygenic = 0.841, 95% CI: 0.625, 1.058 [P = 4.33 × 10−9]; IDIpolygenic = 0.057, 95% CI: 0.032, 0.081 [P = 1.07 × 10−5]), clinical models (ΔC-statisticclinical = 0.087, 95% CI: 0.036, 0.138 [P = 0.001]; NRIclinical = 0.389, 95% CI: 0.172, 0.606 [P = 4.50 × 10−4]; IDIclinical = 0.028, 95% CI: 0.005, 0.051 [P = 0.016]), and clinical + polygenic models (ΔC-statisticclinical + polygenic = 0.049, 95% CI: 0.005, 0.094 [P = 0.030]; NRIclinical + polygenic = 0.376, 95% CI: 0.157, 0.594 [P = 7.80 × 10−4]; IDIclinical + polygenic = 0.026, 95% CI: 0.002, 0.050 [P = 0.033]) (Fig. 4e). Integrating proteomic state into the clinical model constantly outperformed the one that integrated polygenic risk (ΔC-statistic = 0.052, 95% CI: 0.007, 0.097; P = 0.025). Subgroup analyses stratified by sex and genetic risk revealed consistent improvements across all strata, while the most pronounced benefits were generally observed in women and high-genetic-risk individuals (Fig. 4b, c, e, and f).
Protein importance ranking and parsimonious protein model
To assess the contribution of individual proteins to DR prediction, we next ranked the identified proteins based on their importance to the prediction task (Methods). This strategy introduces subtle perturbations to the model inputs, allowing the marginal contribution of each protein feature to be assessed66. Consistent with its suggested role in neural inflammation35,36, PLXNB2 emerged as the strongest predictor among all proteins tested (Fig. 5a and Supplementary Table S37), while holding the widest range of Shapley additive explanations (SHAP) for predicting DR (Fig. 5b). Even when considered independently, without any additional information, PLXNB2 achieved a C-statistic of 0.722 (95% CI: 0.657, 0.788), comparably to the clinical risk factor model (P = 0.268) and outperforming the polygenic risk model (P = 0.014). Other highly informative proteins included GDF15 and REN (Fig. 5a, b), both of which have been implicated in diabetes-related inflammation and microvascular dysfunction26,32,39. When evaluated individually, they achieved C-statistics of 0.712 (95% CI: 0.644, 0.780) and 0.669 (95% CI: 0.600, 0.739) (Fig. 5a). Subgroup analyses based on sex and polygenic risk again revealed consistent performance across all strata, with PLXNB2 generally being more informative in women and those with high genetic susceptibility while GDF15 and REN exhibited quite the opposite patterns (Supplementary Table S38).
a, b Importance ranking and SHapley Additive exPlanations (SHAP) visualization of key proteins. The bar plot (a) indicates the ranked importance of proteins based on their contributions to DR prediction as determined by the SHAP values, and the line with 95% confidence intervals (CIs) shown as shading represents each protein when assessed individually (n = 10,521). The three top-ranked proteins are highlighted in red. Gradient colours in the SHAP summary plot (b) indicate the magnitude of individual protein contributions to prediction. c Cumulative DR hazard over the follow-up period (n = 10,521), stratified by tertiles of the three-protein states. Data are presented as observed event frequencies with 95% CIs shown as shading derived from survival proportions. High risk corresponds to the top tertile, middle risk to the middle tertile, and low risk to the bottom tertile. d Distribution of PLXNB2, GDF15, and REN levels in baseline blood samples from individuals who developed DR (n = 425) versus those who did not (n = 10,096) over the follow-up. Data are presented as density clouds showing the full distribution, with overlaid box plots indicating the median (centre line), interquartile range (bounds of the box), and whiskers extending to the minimum and maximum within 1.5× the interquartile range. Individual data points, including minima and maxima, are shown as dots. P values were calculated using two-sided Student’s t-tests without multiple comparison adjustments. e Receiver operating characteristic curves comparing the clinical risk factor model, polygenic risk model, clinical risk factors + polygenic risk model, clinical risk factors + 3-protein states, polygenic risk + 3-protein states, and clinical risk factors + polygenic risk + 3-protein states for predicting future DR (n = 10,521). The dashed grey line represents random classifier. f Standardized net benefit curves of the same models for predicting future DR (n = 10,521). The horizontal dashed grey line indicates “treat none,” and the vertical grey line indicates “treat all.” Source data are provided as a Source Data file. DR diabetic retinopathy.
We further developed a parsimonious protein model comprising only three top-ranked proteins (i.e., PLXNB2, GDF15, and REN), all of which were significantly enriched in baseline blood samples from individuals who developed DR over the 15-year follow-up (Fig. 5d). Compared to individuals in the bottom tertile of the three-protein states, those in the top tertile exhibited over 25-fold higher risk (HR = 26.245, 95% CI: 6.367, 108.190; P = 1.11 × 10−16) (Fig. 5e). When integrated into the clinical risk factor model, the three-protein states resulted in a ΔC-statisticclinical of 0.049 (95% CI: 0.005, 0.093; P = 0.039) and an overall C-statisticclinical of 0.802 (95% CI: 0.750, 0.855), with an NRIclinical of 0.383 (95% CI: 0.137, 0.630; P = 0.002) and IDIclinical of 0.017 (95% CI: 0.005, 0.029; P = 0.007), producing comparable performance while offering superior translational potential for clinical applications (Fig. 5e). When compared to the polygenic risk model and clinical + polygenic model, respectively, integrating the three-protein states yielded a ΔC-statisticpolygenic of 0.191 (95% CI: 0.099, 0.285; P = 4.73 × 10−5) and a ΔC-statisticclinical + polygenic of 0.051 (95% CI: 0.005, 0.097; P = 0.030), an NRIpolygenic of 0.615 (95% CI: 0.366, 0.863; P = 8.71 × 10−6) and an NRIclinical + polygenic of 0.333 (95% CI: 0.083, 0.584; P = 0.009), and an IDIpolygenic of 0.013 (95% CI: 0.004, 0.021; P = 0.004) and IDIclinical + polygenic of 0.016 (95% CI: 0.004, 0.027; P = 0.009). Subgroup performances were comparable across sex and genetic susceptibility (Fig. 5e, f).
Replication in an independent cohort
Since the cardiometabolic panel captured most of the identified proteins (n = 158, Supplementary Table S4), we replicated the Olink cardiometabolic proximity extension assay (PEA) on baseline blood samples from 352 GDES participants67 (Methods). Over 1.9 thousand person-years of follow-up, 36 GDES participants developed DR (Supplementary Table S39). After adjusting for confounders as in UKB model 2128 of the 158 proteins identified in the UKB cardiometabolic panel demonstrated concordant directionality in the GDES, with 97 proteins achieving statistical significance for their association with incident DR risk (Fig. 6a and Supplementary Table S40). Among these, the three top-ranked proteins identified in the UKB, i.e., PLXNB2 (HR = 2.321, 95% CI: 1.805, 2.986; PBH = 6.82 × 10−9), GDF15 (HR = 2.248, 95% CI: 1.710, 2.954; PBH = 1.19 × 10−7), and REN (HR = 1.720, 95% CI: 1.338, 2.212; PBH = 8.87 × 10−5) were all successfully replicated in the GDES. When linking DR-associated proteins to retinal structures, 34 of 42 associations with the PL identified in the UKB cardiometabolic panel were replicated in the GDES, of which 21 were statistically significant (Fig. 6b and Supplementary Table S41). These associations included REN (HR = −0.362, 95% CI: −0.533, −0.190; PBH = 4.40 × 10−5) and FABP4 (HR = −0.267, 95% CI: −0.434, −0.101; PBH = 0.002). Other replicable associations were observed for HYAL1 with RNFL thickness (HR = −0.849, 95% CI: −1.474, −0.224; PBH = 0.008) and CES1 with RPE thickness (HR = −0.484, 95% CI: −0.718, −0.250; PBH = 6.04 × 10−5).
a Associations of 367 proteins with incident DR, estimated using Cox proportional hazards (CPH) models adjusted for age, sex, ethnicity, HbA1c, duration of diabetes, BMI, and SBP (n = 352). P values were calculated using two-sided Wald tests without multiple comparison adjustments. b Associations of DR-associated proteins with retinal structure (n = 352), estimated using linear models adjusted for the same covariates. P values were calculated using two-sided Student’s t-tests after controlling false discovery rate (FDR) for multiple testing. Significant associations are marked with asterisks (***P < 0.001, **P < 0.01, *P < 0.05). c, e Receiver operating characteristic curves comparing the clinical risk factor model with models incorporating proteomic or 3-protein states for predicting future DR (n = 352). Shaded areas indicate performance gaps. d, h Distribution of PLXNB2, GDF15, and REN levels in baseline blood samples among individuals who developed DR (n = 36) and those who did not (n = 316) d and among those who experienced DR progression (n = 15) and those who did not (n = 337) (h). Data are presented as density clouds showing the full distribution, with overlaid box plots indicating the median (centre line), interquartile range (bounds of the box), and whiskers extending to the minimum and maximum within 1.5× the interquartile range. Individual data points, including minima and maxima, are shown as dots. P values were calculated using two-sided Student’s t-tests without multiple comparison adjustments. f Associations of 367 proteins with DR progression, estimated using CPH models adjusted for the same covariates. P values were calculated using two-sided Wald tests without multiple comparison adjustments. g, i Receiver operating characteristic curves comparing the clinical risk factor model with models incorporating proteomic or 3-protein states for predicting DR progression (n = 352). j Causal relationship between REN and incident DR. GWAS summary statistics for DR included 10,413 DR cases and 308,633 controls. Data are presented as estimated SNP effects (dots) with 95% confidence intervals (CIs) indicated by error bars. Source data are provided as a Source Data file. DR diabetic retinopathy, pQTL protein quantitative trait loci.
The integration of cardiometabolic proteomic profiles further replicated improvement in DR prediction beyond the clinical risk factor model, demonstrating a ΔC-statistic of 0.146 (95% CI: 0.061, 0.212; P = 1.60 × 10−5) and achieving an overall C-statistic of 0.853 (95% CI: 0.775, 0.931) in the GDES (Fig. 6c). Corresponding NRI and IDI were 0.847 (95% CI: 0.493, 1.201; P = 3.87 × 10−6) and 0.151 (95% CI: 0.087, 0.215; P = 5.40 × 10−6). Among the panel, PLXNB2, GDF15, and REN were validated as enriched in baseline blood samples from individuals who developed DR during the four-year follow-up period (all P < 0.05) (Fig. 6d). The parsimonious three-protein model comprising PLXNB2, GDF15, and REN achieved a C-statistic of 0.752 (95% CI: 0.649–0.854), replicating the performance comparable to the clinical risk factor model (C-statistic = 0.707, 95% CI: 0.588, 0.826; P = 0.357) as observed in the UKB internal assessment. Integration of this three-protein panel into the clinical risk model resulted in a ΔC-statistic of 0.126 (95% CI: 0.026, 0.225; P = 0.005) and an overall C-statistic of 0.833 (95% CI: 0.749, 0.916), with significant improvements in reclassification (NRI = 0.965, 95% CI: 0.285, 1.645; P = 4.43 × 10⁻⁸) and discrimination (IDI = 0.248, 95% CI: 0.033, 0.463; P = 3.68 × 10⁻⁸) (Fig. 6e). Subgroup analyses stratified by sex further validated the consistent improvements achieved by integrating proteomic states into clinical models, with either the full cardiometabolic panel or the parsimonious three-protein panel (Fig. 6c and e).
Extrapolation to DR progression and retinal microvascular degeneration
Using the revised ETDRS criteria23, GDES participants were graded for DR status at baseline and each follow-up visit (Methods) and subjected to an extrapolation task for predicting DR progression. After adjusting for the same confounders, 156 proteins were associated with future DR progression (Fig. 6f and Supplementary Table S42), with 128 overlapping those associated with incident DR. These included top-ranked proteins validated across cohorts, including PLXNB2 (HR = 2.246, 95% CI: 1.674, 3.015; PBH = 3.60 × 10−6), GDF15 (HR = 1.805, 95% CI: 1.377, 2.365; PBH = 2.25 × 10−4), and REN (HR = 2.324, 95% CI: 1.648, 3.277; PBH = 3.72 × 10−5) (Fig. 6h), alongside 28 novel observations, most with established roles in promoting or responding to oxidative stress and neural inflammation68,69,70. Enrichment analyses characterized networks of inflammatory processes and tissue remodelling underpinning DR progression, including lymphocyte proliferation, arylesterase activity, and heparan sulphate metabolism (Supplementary Fig. S4). Incorporating the cardiometabolic proteomic states into the clinical model significantly enhanced predictability for DR progression (ΔC-statistic = 0.238, 95% CI: 0.171, 0.305 [P = 2.85 × 10⁻¹²]; NRI = 1.138, 95% CI: 0.899, 1.377 [P = 6.69 × 10⁻¹¹]; IDI = 0.331, 95% CI: 0.257, 0.404 [P = 3.37 × 10⁻¹⁰]), as did the parsimonious three-protein states (ΔC-statistic = 0.175, 95% CI: 0.100, 0.251 [P = 5.50 × 10⁻⁶]; NRI = 0.488, 95% CI: 0.217, 0.759 [P = 4.22 × 10⁻⁴]; IDI = 0.120, 95% CI: 0.066, 0.173 [P = 1.08 × 10⁻⁵]) (Fig. 6g and i).
Leveraging data from annual retinal microvascular phenotyping in 352 GDES participants, we further investigated the impact of DR-associated proteins on the dynamics of microvascular degeneration (Methods). Adjusting for the same confounders, these proteins were significantly associated with impaired microvascular perfusion across various retinal subregions, showing largely consistent directionality with their associations to incident DR risk (Table 1). PLXNB2, consistent with its strong predictive power, exhibited the most prominent association with microvascular degeneration in the superficial macula (β = −0.480, 95% CI: −0.794, −0.166; P = 0.003), where the highest concentration of photoreceptor neurons for visual processing resides71. GDF15 demonstrated the strongest association within the optic nerve head (β = −0.861, 95% CI: −1.129, −0.594; P = 1.14 × 10−9), the convergence zone for all retinal nerve fibres71. REN, another top-ranked protein replicated across cohorts, was significantly associated with deep peripapillary microvascular degeneration (β = −0.249, 95% CI: −0.447, −0.050; P = 0.015). Interestingly, in line with earlier findings72,73,74, deep macular perfusion in early DR appeared either unaffected or exhibited unexpected associations, such as with PLXNB2 (β = 0.575, 95% CI: 0.394, 0.755; P = 1.53 × 10−9).
Genetic evidence linking blood proteins to DR and druggability profile
To infer causality between the top-ranked proteins and incident DR, we conducted two-sample MR (Methods). Protein quantitative trait loci (pQTLs) for the top-ranked proteins were derived from a GWAS conducted by the UKB75, while GWAS summary statistics for DR were obtained from the FinnGen R9 release34, which included 10,413 DR cases and 308,633 controls. Among the top-ranked proteins, REN was confirmed as a causal promoter of DR based on inverse-variance weighted (IVW) MR (ORIVW = 1.116, 95% CI: 1.032, 1.207; P = 0.011) (Fig. 6j), consistent with its observed association with incident DR in the UKB and GDES. Sensitivity MR methods (including weighted median, MR-Egger, MR pleiotropy Residual Sum and Outlier [MR-PRESSO]), as well as tests for pleiotropy (MR-Egger intercept and MR-PRESSO global), heterogeneity (Cochran’s Q), and bidirectional MR (Methods), reaffirmed the robustness of REN’s causal role and precluded reverse causality (Table 2). While PLXNB2 (ORIVW = 1.010, 95% CI: 0.952,1.071; P = 0.720) and GDF15 (ORIVW = 1.041, 95% CI: 0.890, 1.218; P = 0.661) exhibited concordant effect directions (ORIVW > 1), neither reached statistical significance. Sensitivity analyses for the two proteins yielded consistent null results (Table 2).
We next consulted druggable genome76, ChEMBL77, and DrugBank78 to assess the therapeutic potential of REN. Several approved drugs targeting REN, such as aliskiren and lisinopril, were identified and are already in use for conditions overlapping with DR pathophysiology, including diabetes mellitus and diabetic nephropathy (Supplementary Table S43).
Discussion
By performing a proteome-wide association study in the world’s largest population-based biobank, this study provides the inaugural revelation of the proteomic landscape of DR and identified a broad array of blood-based proteins with linkages to retinal structure, microvasculature, and genetic predisposition. These proteins characterize both established and novel biological pathways, offering enhanced risk prediction for both incident DR and its progression. PLXNB2, GDF15, and REN were the top-ranked proteins validated across cohorts, wherein a parsimonious panel designed based on these three proteins demonstrated an encouraging performance for clinical application. These findings were largely replicated in an independent diabetic cohort of distinct ethnicity, where standardized seven-field fundus photographs and optical coherence tomography (OCT) imaging ensured accurate DR diagnosis and grading. Furthermore, by integrating advanced retinal microvascular phenotyping, we provided additional evidence linking blood proteins to the dynamics of retinal degeneration. Finally, two sample MR confirmed REN as a causal driver of DR. Taken together, our findings highlight where blood-based proteomics may unravel biological revelations on DR pathogenesis and inform the need for clinical practice in diabetic management.
This study represents a pioneering exploration into the blood-based proteomic landscape of DR within a large, population-based biobank setting. Though few studies permit direct comparison, our findings parallel earlier observations in diabetic patients and animal-based experimental models16,31,79,80,81,82,83,84. In line with the landmark framework of intraocular “liquid-biopsy”16, we found replication of several intraocularly enriched proteins in peripheral blood, including ANGPTL4 and VEGF-D, each with known or mechanistically plausible roles in the microangiopathic pathogenesis of DR16,31,79,80,81. Among the previously reported blood-based evidence, we identified FABP4 as a key promoter of DR, supporting its proposed role in pro-inflammatory signalling through the NF-κB pathway82,83. IL-6, another pro-inflammatory cytokine, has been shown to exacerbate DR by upregulation endothelium-specific leukocyte adhesion molecules that trigger reactive oxygen species-induced damage, mitochondrial dysfunction, and inflammation84,85,86, corroborating our finding of its central role underlying DR protein network. Other proteins identified, such as collectin-12 and roundabout homolog 4, also exhibited a potential involvement in DR, though their specific mechanisms remain less understood87,88.
To elucidate the proteomic landscape across distinct retinal structures, we explored the relationships between DR-associated proteins and retinal morphology. REN, a key component of the renin-angiotensin system whose inhibition has been shown to mitigate diabetic microvascular complications32,89,90, emerged as a pivotal protein linked to retinal photoreceptor degeneration. This finding was replicated across independent cohorts of distinct ethnicity and confirmed by two sample MR as a causal DR driver and adopted as a druggable target for conditions overlapping with DR pathophysiology, highlighting its role in both DR aetiology and the development of potential targeted therapies. HYAL1 is another cross-cohort validated protein associated with RNFL thickness, reinforcing earlier studies implicating its role in microvascular pathogenesis through the degradation of the endothelial glycocalyx42,91,92. This adds novel evidence supporting the neurovascular interactions recently proposed in DR pathogenesis93. CES1, a regulator of lipid metabolism47, was also validated across cohorts in its association with RPE degeneration, aligning with the RPE’s sentinel role in maintaining retinal homeostasis and regulating metabolic processes37. Collectively, these findings offer valuable insights into the mechanisms linking proteomic alterations to DR pathology.
We identified and validated highly predictive proteins for DR across cohorts, pointing to underpinning pathways conferring DR risk. PLXNB2 emerged as the strongest predictor of all proteins, with performance already comparable to the combination of all traditional risk factors. While previous studies have implicated PLXNB2 in diabetic neuropathy35,36, our findings provide first evidence for its relevance and longitudinal predictive power for DR across cohorts, aligning with its established role in diabetes-related proinflammatory responses and microvascular impairment35,36. Activation of PLXNB2 by its ligands—such as soluble SEMA4D, which was also identified in this study as a DR-associated protein, which modulates leukocyte migration and antigen presentation, and has been reported to inhibit the growth and repair of axons94. Leveraging standardized DR grading based on the modified ETDRS system and DRCR.net criteria in the GDES, we were able to assess the severity of DR and provide preliminary revelation for its novel role in DR progression. However, MR analysis revealed no evidence for a causal effect, suggesting that PLXNB2 elevation likely reflects a heightened proinflammatory state or retinal neurovascular stress rather than a direct causal driver. The observed sex-specific effect may reflect differential immune responses or hormonal regulation as previously reported94. While these observations are consistent with emerging literature, further mechanistic studies are warranted to clarify the regulatory context of the PLXNB2 pathway in DR.
While GDF15 and REN have been primarily implicated in diabetic nephropathy26,39,95, their relationship with DR has not been elucidated yet. Here we present evidence of their longitudinal association with both DR development and progression and identified them as one of the strongest predictors for DR across cohorts. REN reflects activation of neurohormonal pathways that are known to exacerbate vascular permeability, inflammation, and endothelial dysfunction in both the kidney and retina39. Consistent with these roles, MR analysis supported its causal effect on DR, reinforcing potential as a pharmacological target with established benefits for retinal microvascular protection in diabetes. In contrast, GDF15 is a stress-responsive cytokine induced by mitochondrial dysfunction and cellular injury to maintain tissue homeostasis. Given that endothelial cells are a major source of circulating GDF1595,96, its elevation likely reflects subclinical endothelial damage, positioning GDF15 as a sensitive biomarker of early microvascular stress preceding DR onset. In diabetic nephropathy, GDF15’s association with microvascular dysfunction is exemplified by its correlation with microalbuminuria, an established marker of renal microvascular damage95. These mechanisms were supported by recent studies that linked GDF15 and REN to DR pathologies32,97,98, highlighting their shared involvement in diabetic microvascular complications across organs.
Adding proteomic profile to traditional and genetic risk models significantly improved DR prediction, reflecting their complementary information holding over known risk factors and genetic predisposition. Notably, the proteomic profile alone outperformed the combined traditional risk factors, contributing an additional 12% improvement in predictive accuracy, far surpassing the ~5% improvement achieved in our prior efforts using metabolomics to modify DR prediction99. This performance gap is plausible, since proteins are the direct executor of biological functions100. Furthermore, protein importance ranking identified key markers that could streamline clinical application. The parsimonious three-protein panel (PLXNB2, GDF15, and REN) demonstrated robust performance across both internal and external cohort evaluations (C-statistics > 0.8), underscoring its practical clinical utility for guiding DR targeted screening in diabetic patients. In resource-limited settings, measuring just these three proteins may suffice for DR risk stratification and prediction. Conversely, for contexts requiring higher accuracy, the full cardiometabolic panel or the complete proteomics assay may be prioritized. These findings suggest that proteomic profiles could serve as a powerful tool to refine diabetic management, with significant implications for clinical practice moving forward.
By employing advanced optical angiography71, we established the linkage of circulating proteins to dynamic microvasculature degeneration in DR. In line with recent studies highlighting the role of microvascular impairment in the development and progression of DR101,102, our findings revealed involvement of the cross-cohort validated proteins with region-specific impacts on the retinal microvasculature. Among the top proteins identified, PLXNB2 and GDF15 exhibited the most widespread impact on the retina, affecting microvasculature from the macular to the peripapillary regions. This pattern is consistent with their roles in mediating broad inflammatory responses and neurovascular stress, signalling early retinal neuroinflammation and microvascular dysfunction that precede overt DR pathologies. In contrast, REN predominantly impacted the optic nerve head, a region highly sensitive to perfusion fluctuations. The unique vascular anatomy and limited autoregulatory reserve likely heighten its susceptibility to RAS-mediated microcirculatory instability32. Moreover, segmentation of the retinal microvasculature into superficial and deep plexuses provided more refined insights into the dynamics of microvascular impairment. In line with earlier findings72,73, our observations indicate that the deep macula remains largely unaffected during early stages of DR, suggesting a variably layered response of the retina to DR-related stress.
The strengths of our study include the high-throughput proteomic analysis of a large sample size, an extensive follow-up period, and external validation with standardized DR grading and microvascular phenotyping, enabling both replication of prior findings and identification of novel blood markers for biological revelations and disease prediction. Certain limitations should be acknowledged. First, the diagnosis of DR in the UKB relied on medical records, which may underestimate the prevalence of DR, particularly asymptomatic early stages. Nevertheless, any resulting bias would likely attenuate associations toward the null rather than inflate them. Moreover, validation in an external cohort with stringent DR grading supported the robustness of our findings. Second, while the Olink platform provides a comprehensive assessment of circulating proteins, it does not encompass the entire human proteome. Lastly, the Caucasian-skewed population limits the generalizability of the current findings to other ethnic groups. Although key findings were replicated in an independent East Asian cohort, the relatively small sample size and the lack of representation from other ancestry groups such as Black and Oceanian indigenous, warrant caution in interpreting cross-population robustness. Future studies should evaluate the applicability of these findings in more ethnically and geographically diverse populations.
Another consideration pertains to the ascertainment of prediabetes. In both the UKB and GDES cohorts, prediabetes was defined based on fasting glucose and HbA1c levels, consistent with standard clinical practice in the absence of oral glucose tolerance testing (OGTT), as recommended by the American Diabetes Association103. However, this approach excludes individuals with isolated impaired glucose tolerance (IGT), who present with normal fasting glucose and HbA1c but exhibit post-challenge hyperglycaemia detectable only by OGTT104. Their absence represents selection bias rather than misclassification, as they were entirely excluded from all analyses, and thus no contamination of the analytic population occurred. Moreover, since individuals with isolated IGT are likely to carry higher proteomic and genetic risk burdens for diabetes and DR104, their exclusion likely biased our findings toward conservative estimates rather than inflated ones. Nevertheless, we acknowledge that our findings do not generalize to this unmeasured subgroup at this stage and that interpretation should therefore be approached with caution. Future studies incorporating OGTT-based classification would be valuable in assessing the broader applicability and generalizability of our findings across the full glycaemic spectrum.
In conclusion, utilizing an innovative data-driven strategy, we mapped the blood-based proteomic landscape for DR in the world’s largest population-based resource to date, with validation in an external, ethnically distinct cohort featuring standardized DR assessment and extensive ocular phenotyping. Our findings strongly support the association of peripheral circulating PLXNB2, GDF15, and REN with the risk of future DR, and link these proteins to retinal structure, genetic predisposition, and microvascular degeneration dynamics in the retina. The proteomic profiles substantially improved risk prediction and clinical utility for DR, conferring a 32-fold enrichment of risk among identified high-risk individuals. For streamlined clinical application, a parsimonious three-protein panel designed based on the top-ranked proteins (PLXNB2, GDF15, and REN) achieved comparable performance in predicting DR development and progression, while REN was confirmed as a causal promoter of DR through two sample MR. These findings may largely translate into the clinical utility of proteomic profiles as an additional tool to refine and optimize diabetic management strategies.
Methods
Study population
The UKB study is a prospective, multicentre cohort study that recruited over 500,000 participants aged 40–69 years from 22 assessment centres across England, Scotland, and Wales between 2006 and 201021. Participants were excluded if they lacked hospital records or proteomic data, did not meet diagnostic criteria for diabetes or prediabetes, or had a known diagnosis of DR at baseline. This resulted in an analytic cohort of 10,521 participants with DR-free prediabetes or diabetes in the current analysis. An additional 352 diabetic participants with proteomic and imaging data from the GDES were included for external validation and study extrapolation22. The study adhered to the Declaration of Helsinki and was approved by the Northwest Multicentre Research Ethics Committee (11/NW/0382) and the Ethics Committee of Zhongshan Ophthalmic Centre (2017KYPJ094). Written informed consent was obtained from all participants before enrolment. The study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) and Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines for reporting.
Blood-based proteomic profiling
Baseline blood samples were collected from UKB participants at 22 assessment centres across the UK between 2007 and 2010 and from GDES participants at Zhongshan Ophthalmic Centre between 2017 and 2021. Details of the Olink proteomics assay, data processing, and quality control procedures have been previously described75,105,106. In brief, EDTA plasma samples from the UKB underwent proteomic profiling using the antibody-based Olink Explore 3072 PEA, capturing 2923 unique proteins across eight protein panels. In the GDES, plasma samples were profiling with the Olink Explore 384 PEA, capturing 361 unique proteins in the cardiometabolic panel. All clinical and sample characteristics were blinded to the investigators conducting the assays. Sample controls ensured precision within and between plates, while plate controls standardized protein levels across plates. For both the UKB and GDES, interplate coefficients of variation were below 20% and intraplate coefficients below 10%. Raw counts were normalized using extension controls and log-transformed to produce Olink’s Normalized Protein eXpression (NPX) values, minimizing technical variability and enabling comparability across samples (Supplementary Methods).
Ascertainment of prediabetes, diabetes, DR, and covariates
As diabetic metabolic disorders affect the retina before clinically evident diabetes107,108,109, participants with both diabetes and pre-diabetes were considered at risk for DR and were included in identifying DR-associated proteins. Prediabetes was defined based on the presence of impaired fasting glucose and/or an HbA1c ranging from 5.7–6.4% (39–47 mmol/mol), in accordance with the 2021 American Diabetes Association guideline103. Diabetes was defined by either self-reported diagnosis by a physician during the touchscreen questionnaire, the use of insulin or hypoglycaemic medication, or an HbA1c ≥ 6.5% (48 mmol/mol), as previously described110. In the UKB, the determination of DR was based on ICD-10 code H360111. In the GDES, stereoscopic colour fundus photographs of seven standard fields and macular OCT imaging were obtained after pupil dilation for DR ascertainment67. Stages of DR were graded according to the modified ETDRS criteria, with severity scores assigned accordingly (Supplementary Methods)23,112. Incident DR was defined as the absence of DR progressing to any level, whereas progression was defined as a worsening of two or more levels during the follow-up period.
For covariates assessment, face-to-face interviews and comprehensive questionnaires were administered to all participants from the UKB and the GDES collecting information on age, sex, ethnicity, smoking status, and medical history. Baseline physical and biochemical measurements, including body mass index, systolic blood pressure, levels of high-density lipoprotein cholesterol, total cholesterol, and HbA1c, were also taken. For participants with pre-diabetes, the duration of diabetes was considered zero99.
Retinal tomographic structure assessment
Retinal structure assessment was performed in both cohorts using OCT, an optical biopsy technique that provides real-time morphology of the retinal tomographic structure with histological resolution113,114. OCT scanning in UKB participants was performed using a Topcon 3D OCT-1000 Mk II (Topcon, Japan), with a scan density of 512 A-scans × 128 B-scans, centred at the fovea, in an enclosed darkroom115,116. The Topcon Advanced Boundary Segmentation algorithm (TABS) was employed to automatically delineate boundaries and segment the retinal layers based on gradient information116, with accuracy and reproducibility previously validated116,117,118. Following rigorous quality control processes (Supplementary Methods), we considered five measurements for subsequent analyses: overall retinal, RNFL, GC-IPL, PL, and RPE thickness, all of which have established relevance to DR development37,45,119,120. In GDES participants, the updated generation of OCT devices (DRI OCT Triton; Topcon, Japan) that offers greater penetration (with a 1050 nm wavelength), faster scanning speed (within 1.3 seconds), and higher scanning density (of 512 A-scans × 512 B-scans) were employed121. This advanced scanning further reduces motion artifacts and provides more accurate retinal measurements71.
Genotyping and polygenic risk
Genotyping was conducted using the UK BiLEVE Axiom Array or the UKB Axiom Array. Quality control processes and imputation were conducted based on the Haplotype Reference Consortium and UK10K haplotype resources, as previously described122. GWAS summary statistics from the FinnGen R9 release34 were used to construct polygenic risk specific to DR. The use of an external, independent dataset was intentional to minimize overfitting and reduce the risk of bias that may arise from employing the same dataset for both PRS construction and application. Per-individual PRSs were calculated by summing the per-variant effect sizes weighted by allele dosages. To account for population structure and enable cross-ancestry comparability, the raw PRSs were subsequently centred and variance-standardized using ancestry-inferred reference distributions. Detailed procedures for PRS centring and standardization are provided in the Supplementary Methods. This standardized DR-specific PRS was then used for (1) subgroup and interaction analyses for the protein-DR association; (2) evaluating protein-PRS associations; and (3) prediction modelling of DR risk throughout the study. In addition, associations between protein levels and 58 individual DR risk variants, as curated from a systematic review of prior GWASs (P < 10−4) and meta-analyses (P < 0.05), were assessed48. The Benjamini-Hochberg (BH) method was employed to control for false discovery rate (FDR).
To evaluate the potential confounding of LD between PRS variants and cis-pQTLs for the target proteins, a sensitivity analysis excluding SNPs in strong LD with any known cis-pQTLs prior to PRS construction was conducted. The list of cis-pQTLs was curated from a proteomic study by Sun et al.75, which was conducted using the same platform and population. Pairwise LD estimation was performed using the 1000 Genomes Phase 3 European reference panel. SNPs with r2 > 0.8 within ±1 Mb of any cis-pQTL were removed, and a filtered PRS was reconstructed using the same approach based on the FinnGen summary statistics. Associations between the LD-filtered PRS and plasma protein levels were reassessed using the same models as in the primary analysis. Comparison of results before and after LD filtering allowed evaluation of the extent to which observed associations were attributable to true polygenic signal rather than confounding by LD.
Functional enrichment and PPI network
Enrichment analysis was performed on the identified DR-associated proteins after multiple testing correction in model 2. We queried three of the most comprehensive biological annotation and pathway databases: GO49, KEGG50, and Reactome51, using the full panel of Olink proteins as the reference. The enrichment scores were computed by taking the log of the P value from the Fisher’s exact test and multiplying that by the z-score of the deviation from the expected rank123,124. The BH method was employed to control for FDR. Next, we generated a DR-associated PPI network using data integrated from four publicly available databases: STRING60, BioGrid61, OmniPath62, and InWeb_InBioMap63. Densely connected subnetworks were identified using the MCODE algorithm65, with a κ-core of 2 and a node score cutoff of 0.2. For each MCODE component, pathway and process enrichment were performed as above. Summary of topology statistics were calculated using Cytoscape (version 3.10.2). The Benjamini-Hochberg method was employed to control for false discovery rate.
Modelling and evaluating DR proteomic states
The proteomic state models were trained on DR-associated proteins from: (1) the full eight Olink panels; and (2) solely the cardiometabolic panel in the discovery set using Extreme Gradient Boosting (XGBoost), an efficient ensemble machine learning algorithm widely used for disease prediction66,125,126. SHapley Additive exPlanations (SHAP) were calculated to estimate the marginal contribution of each protein feature in the proteomic state models66,125,126. The parsimonious three-protein models were trained using Cox model. Model evaluations were performed on the fully withheld test set as internal validation and on the GDES as external validation. Model features and parameters derived from the UKB training set were applied in both the UKB test set and the GDES. To assess the predictive performance of DR proteomic states for DR development and progression, C-statistics were calculated. The added predictability of DR proteomic states was evaluated compared to (1) polygenic risk model; (2) clinical risk model; and (3) clinical + polygenic risk model. Clinical predictors were derived from three established models: the Aspelund model127, the Hippisley-Cox and Coupland model128, and the Rhee model129,130. These models included age, sex, HbA1c, duration of diabetes, SBP, BMI, TC/HDL-c, and smoking status. NRIs and IDIs were computed to quantify potential finer increments in reclassification. Goodness of model fit was assessed using the Hosmer-Lemeshow test, and decision curve analyses were performed to estimate the benefits in clinical utility across decision thresholds.
Retinal microvascular assessment
GDES participants underwent retinal optical angiography utilizing Triton swept-source OCT angiography imaging system (DRI OCT Triton; Topcon, Japan) at each follow-up visit to visualize the retinal microvascular system101,130,131,132. Two scan modes were employed: the Angio Macula 6 × 6 mm scan mode centred on the fovea and the Angio Disc 6 × 6 mm scan mode centred on the optic nerve head, allowing for longitudinal assessment of microvascular dynamics in the macula and optic disc, respectively. Automatic image segmentation was performed using a built-in software (IMAGEnet 6, version 1.22) to segment the macular and the peripapillary region into the superficial and deep capillary plexus132,133,134. After predefined quality control135 and magnification effect correction136 processes (Supplementary Methods), the parafoveal vessel density (VD) was determined by averaging the VD of the four ETDRS inner-ring quadrants, while the perifoveal VD was determined by averaging the VD of the four ETDRS outer-ring quadrants. Dynamics of the microvascular perfusion were defined as the longitudinal change in VD within the corresponding subregion99.
Establishing causal links through two-sample MR
Two-sample MR was conducted to investigate whether a DR-associated protein causally promotes or protects against the risk of DR. Genetic instruments were derived from a GWAS of Olink proteomics conducted in the UKB75, while summary statistics for DR were sourced from the FinnGen R9 release34. Cis-pQTLs were used as genetic instruments to minimize pleiotropy and were defined as variants within ±1 Mb of the transcription start site of the corresponding protein-coding gene. LD pruning was performed using the 1000 Genomes Project Phase 3 (European) reference panel137, with an r² threshold of <0.001 to ensure independence among genetic instruments138,139. The IVW method was used as the primary MR analysis, with reverse causality tested through bidirectional analysis. To assess the robustness of causal estimates and account for potential pleiotropic bias, a suite of sensitivity analyses was performed, including weighted median, MR-Egger, and MR-PRESSO. The weighted median method provides valid causal estimates even if up to 50% of the instruments are invalid. MR-Egger provides pleiotropy-robust estimates and intercept tests for directional pleiotropy. MR-PRESSO identifies and corrects for outlier SNPs, with a global test for horizontal pleiotropy. In addition, Cochran’s Q test was used to examine heterogeneity among SNP-specific estimates.
Druggability profile assessment
Blood proteome is the major source of therapeutic targets139. To evaluate the druggability profile of DR-driven proteins confirmed through two sample MR, an upgraded list of druggable genes was adopted. Data on the clinical development status of drug candidates were extracted from the ChEMBL database, including information on drug molecule classifications, approved indications, and clinical trial outcomes. Information regarding drug-target interactions was obtained from DrugBank and supplemented by manual curation to identify candidate proteins with known or potential therapeutic applications. Compounds were classified into three categories based on their development stage: (1) approved (already approved for one or more clinical indications); (2) in development (currently undergoing clinical trials for specific conditions); and (3) druggable (potential targets identified through preclinical studies or computational predictions but not yet in clinical trials). This enabled a comprehensive assessment of the therapeutic landscape for DR-driven proteins, highlighting both existing treatments and opportunities for repurposing or developing new drug candidates.
Statistical analysis
Data analyses were performed using R (version 4.2.2). No statistical methods were used to predetermine sample size; however, our sample size is comparable to or larger than those in prior studies99,140,141. Continuous variables were reported as mean (SD) and categorical variables were reported as number (percentage). Student’s t-test and Pearson’s χ2-test were used to compare continuous and categorical variables, respectively. To ensure comparability across proteins, Olink’s NPX transformation was applied to all raw counts and data distribution was assumed to be normal75,105,106.
Participants were randomly assigned to discovery and test sets at an 8:2 ratio, with the discovery set for identifying DR-associated proteins, characterizing underlying biological pathways, and training the proteomic state model, and the fully withheld test set was reserved for internal model evaluation. The associations of protein levels with incident DR, DR progression, retinal tomographic structure, polygenic risk, individual genetic loci, and microvascular dynamics were assessed using multivariable Cox proportional hazards models and linear models in the UKB and GDES, where applicable, with increasing numbers of covariates adjusted in two sequential models. Model 1 was adjusted for age, sex, and ethnicity, and model 2 was additionally adjusted for HbA1c, duration of diabetes, BMI, SBP, and status of diabetes (prediabetes/diabetes). These were known factors associated with DR risk24,25. The associations of protein levels with polygenic risk and individual genetic loci were additionally adjusted for the first ten genetic principal components. To account for multiple testing, the BH method was employed to control for FDR. Proteins reaching significance were considered candidate proteomic fingerprints for DR. The proportional hazards assumption was tested using the Schoenfeld residual method and was satisfied for each model.
Sensitivity analyses were conducted by (1) additionally adjusting for hypoglycaemic medication use; (2) excluding those who experienced DR within the first 2 years of follow-up; and (3) stratifying analyses by polygenic risk and sex. The extent to which hypoglycaemic medication use contributed to variability in protein levels was assessed by fitting two models for each DR-associated protein: a base model including the aforementioned covariates, and an integrated model combining hypoglycaemic medication use. The marginal proportion of variance in protein abundance attributable to hypoglycaemic medication was quantified as the difference in adjusted R2 between the two models. An HR ratio (ρ) significantly deviating from 1 was considered disparities across subgroups. The interaction effects were tested by including an interaction term (protein × subgroup) in the models. A two-sided P < 0.05 was considered statistically significant, with exceptions where specified.
Ethics approval
The study was approved by the Northwest Multicentre Research Ethics Committee (11/NW/0382) and the Ethics Committee of Zhongshan Ophthalmic Center (2017KYPJ094).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data used in this study are available from UKB via data access procedures (http://www.ukbiobank.ac.uk). Permission to use the UK Biobank Resource was obtained via material transfer agreement as part of Application 105658. Raw data from the GDES are not publicly available due to data privacy law and were used for the purposes of this project with institutional permission from the Zhongshan Ophthalmic Center. All requests for access to in-house data should be addressed to the corresponding authors, Dr Wei Wang (email: wangwei@gzzoc.com), and will be processed in accordance with Zhongshan Ophthalmic Center policies. The GDES group will evaluate all requests based on the purpose of the data request, and it may take ~90 days to process the request. A material-transfer and data-usage agreement will be required between Zhongshan Ophthalmic Center and the receiving organization, and the requesting organization must state the intended purpose of the data transfer and provide assurances that the transferred data will only be used for non-commercial academic and educational purposes in compliance with Zhongshan Ophthalmic Center institutional policies. Source data are provided with this paper.
Code availability
All software used in this study is publicly available. The code used in this study can be access at (https://github.com/zocskl/DRPro142).
References
Prince, M. J. et al. The burden of disease in older people and implications for health policy and practice. Lancet 385, 549–562 (2015).
The Lancet. Beat diabetes: an urgent call for global action. Lancet 387, 1483 (2016).
GBD 2021 Diabetes Collaborators Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet 402, 203–234 (2023).
Cheung, N., Mitchell, P. & Wong, T. Y. Diabetic retinopathy. Lancet 376, 124–136 (2010).
Stitt, A. W. et al. The progress in understanding and treatment of diabetic retinopathy. Prog. Retin. Eye Res. 51, 156–186 (2016).
Simo, R. & Hernandez, C. Novel approaches for treating diabetic retinopathy based on recent pathogenic evidence. Prog. Retin. Eye Res. 48, 160–180 (2015).
Little, H. L. Treatment of proliferative diabetic retinopathy. Long. -term. results argon. laser Photocoagul. Ophthalmol. 92, 279–283 (1985).
Bressler, N. M. et al. Persistent macular thickening following intravitreous aflibercept, bevacizumab, or ranibizumab for central-involved diabetic macular edema with vision impairment: a secondary analysis of a randomized clinical trial. Jama Ophthalmol. 136, 257–269 (2018).
Vujosevic, S. et al. Screening for diabetic retinopathy: new perspectives and challenges. Lancet Diab. Endocrinol. 8, 337–347 (2020).
Solomon, S. D. et al. Diabetic retinopathy: a position statement by the american diabetes association. Diab. Care 40, 412–418 (2017).
Jampol, L. M., Glassman, A. R. & Sun, J. Evaluation and care of patients with diabetic retinopathy. N. Engl. J. Med. 382, 1629–1637 (2020).
Hirsch, I. B. & Brownlee, M. Beyond hemoglobin A1c–need for additional markers of risk for diabetic microvascular complications. Jama 303, 2291–2292 (2010).
Oh, H. S. et al. Organ aging signatures in the plasma proteome track health and disease. Nature 624, 164–172 (2023).
Lehallier, B. et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat. Med. 25, 1843–1850 (2019).
Carrasco-Zanini, J. et al. Proteomic signatures improve risk prediction for common and rare diseases. Nat. Med. 30, 2489–2498 (2024).
Wolf, J. et al. Liquid-biopsy proteomics combined with AI identifies cellular drivers of eye aging and disease in vivo. Cell 186, 4868–4884 (2023).
Carrasco-Zanini, J. et al. Proteomic prediction of diverse incident diseases: a machine learning-guided biomarker discovery study using data from a prospective cohort study. Lancet Digit. Health 6, e470–e479 (2024).
Li, Q. et al. alpha-Klotho prevents diabetic retinopathy by reversing the senescence of macrophages. Cell Commun. Signal. 22, 449 (2024).
Li, T. et al. Cellular communication network factor 1 promotes retinal leakage in diabetic retinopathy via inducing neutrophil stasis and neutrophil extracellular traps extrusion. Cell Commun. Signal. 22, 275 (2024).
Qiu, J. et al. NOD1 deficiency ameliorates the progression of diabetic retinopathy by modulating bone marrow-retina crosstalk. Stem Cell Res. Ther. 15, 38 (2024).
UK Biobank data on 500,000 people paves way to precision medicine. Nature 562, 163–164 (2018).
Yang, S. et al. Photoreceptor metabolic window unveils eye-body interactions. Nat. Commun. 16, 697 (2025).
Grading diabetic retinopathy from stereoscopic color fundus photographs - an extension of the modified Airlie House classification: ETDRS Report Number 10. Ophthalmology 127, S99–S119 (2020).
Zhang, X. et al. Prevalence of diabetic retinopathy in the United States, 2005–2008. Jama 304, 649–656 (2010).
Yau, J. W. et al. Global prevalence and major risk factors of diabetic retinopathy. Diab. Care 35, 556–564 (2012).
Wang, D. et al. GDF15: emerging biology and therapeutic applications for obesity and cardiometabolic disease. Nat. Rev. Endocrinol. 17, 592–607 (2021).
St, S. J. et al. Biomarkers of cellular senescence and risk of death in humans. Aging Cell 22, e14006 (2023).
Jimenez-Loygorri, J. I. et al. Mitophagy in the retina: viewing mitochondrial homeostasis through a new lens. Prog. Retin. Eye Res. 96, 101205 (2023).
Prentice, K. J. et al. A hormone complex of FABP4 and nucleoside kinases regulates islet function. Nature 600, 720–726 (2021).
Aryal, B., Price, N. L., Suarez, Y. & Fernandez-Hernando, C. ANGPTL4 in metabolic and cardiovascular disease. Trends Mol. Med. 25, 723–734 (2019).
Babapoor-Farrokhran, S. et al. Angiopoietin-like 4 is a potent angiogenic factor and a novel therapeutic target for patients with proliferative diabetic retinopathy. Proc. Natl. Acad. Sci. USA 112, E3030–E3039 (2015).
Fletcher, E. L., Phipps, J. A., Ward, M. M., Vessey, K. A. & Wilkinson-Berka, J. L. The renin-angiotensin system in retinal health and disease: Its influence on neurons, glia and the vasculature. Prog. Retin. Eye Res. 29, 284–311 (2010).
Vujkovic, M. et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 52, 680–691 (2020).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
Casden, N., Belzer, V., El, K. A., El, F. R. & Behar, O. Astrocyte-to-microglia communication via Sema4B-Plexin-B2 modulates injury-induced reactivity of microglia. Proc. Natl. Acad. Sci. USA 121, e1894319175 (2024).
Rajabinejad, M. et al. The MALAT1-H19/miR-19b-3p axis can be a fingerprint for diabetic neuropathy. Immunol. Lett. 245, 69–78 (2022).
Tonade, D. & Kern, T. S. Photoreceptor cells and RPE contribute to the development of diabetic retinopathy. Prog. Retin. Eye Res. 83, 100919 (2021).
Lagnado, L. & Baylor, D. Signal flow in visual transduction. Neuron 8, 995–1002 (1992).
Perkins, B. A., Aiello, L. P. & Krolewski, A. S. Diabetes complications and the renin-angiotensin system. N. Engl. J. Med. 361, 83–85 (2009).
Saidha, S. et al. Relationships between retinal axonal and neuronal measures and global central nervous system pathology in multiple sclerosis. Jama Neurol. 70, 34–43 (2013).
Savinov, A. Y., Wong, F. S., Stonebraker, A. C. & Chervonsky, A. V. Presentation of antigen by endothelial cells and chemoattraction are required for homing of insulin-specific CD8+ T cells. J. Exp. Med. 197, 643–656 (2003).
Dogne, S. et al. Hyaluronidase 1 Deficiency Preserves Endothelial Function and Glycocalyx Integrity in Early Streptozotocin-Induced Diabetes. Diabetes 65, 2742–2753 (2016).
Olaniru, O. E. et al. SNAP-tag-enabled super-resolution imaging reveals constitutive and agonist-dependent trafficking of GPR56 in pancreatic beta-cells. Mol. Metab. 53, 101285 (2021).
Cottarelli, A. et al. Fgfbp1 promotes blood-brain barrier development by regulating collagen IV deposition and maintaining Wnt/beta-catenin signaling. Development 147, 185140 (2020).
Yang, S. et al. Metabolic fingerprinting on retinal pigment epithelium thickness for individualized risk stratification of type 2 diabetes mellitus. Nat. Commun. 14, 6573 (2023).
Frodermann, V. et al. Exercise reduces inflammatory cell production and cardiovascular inflammation via instruction of hematopoietic progenitor cells. Nat. Med. 25, 1761–1771 (2019).
Xu, J. et al. Hepatic carboxylesterase 1 is essential for both normal and farnesoid X receptor-controlled lipid homeostasis. Hepatology 59, 1761–1771 (2014).
Liao, W. L. et al. Multilocus genetic risk score for diabetic retinopathy in the Han Chinese population of Taiwan. Sci. Rep. 8, 14535 (2018).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. the gene ontology consortium. Nat. Genet. 25, 25–29 (2000).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Milacic, M. et al. The reactome pathway knowledgebase 2024. Nucleic Acids Res. 52, D672–D678 (2024).
Yousri, N. A. et al. Metabolic and metabo-clinical signatures of type 2 diabetes, obesity, retinopathy, and dyslipidemia. Diabetes 71, 184–205 (2022).
Kahl, S. & Roden, M. Amino acids - lifesaver or killer in patients with diabetes?. Nat. Rev. Endocrinol. 14, 449–451 (2018).
Geraldes, P. et al. Activation of PKC-delta and SHP-1 by hyperglycemia causes vascular cell apoptosis and diabetic retinopathy. Nat. Med. 15, 1298–1306 (2009).
Penn, J. S. et al. Vascular endothelial growth factor in eye disease. Prog. Retin. Eye Res. 27, 331–371 (2008).
Giacco, F. & Brownlee, M. Oxidative stress and diabetic complications. Circ. Res. 107, 1058–1070 (2010).
Tang, J. & Kern, T. S. Inflammation in diabetic retinopathy. Prog. Retin. Eye Res. 30, 343–358 (2011).
Roy, S. & Kim, D. Retinal capillary basement membrane thickening: role in the pathogenesis of diabetic retinopathy. Prog. Retin. Eye Res. 82, 100903 (2021).
Zhang, S. X. et al. The endoplasmic reticulum: homeostasis and crosstalk in retinal health and disease. Prog. Retin. Eye Res. 98, 101231 (2024).
Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).
Turei, D., Korcsmaros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967 (2016).
Li, T. et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods 14, 61–64 (2017).
Chen, M., Luo, C., Zhao, J., Devarajan, G. & Xu, H. Immune regulation in the aging retina. Prog. Retin. Eye Res. 69, 159–172 (2019).
Bader, G. D. & Hogue, C. W. An automated method for finding molecular complexes in large protein interaction networks. Bmc Bioinforma. 4, 2 (2003).
Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).
Zhang, S. et al. Design and baseline data of the diabetes registration study: guangzhou diabetic eye study. Curr. Eye Res. 48, 591–599 (2023).
Ebert, J. et al. Paraoxonase-2 regulates coagulation activation through endothelial tissue factor. Blood 131, 2161–2172 (2018).
Chung, H. L. et al. Loss- or gain-of-function mutations in ACOX1 cause axonal loss via different mechanisms. Neuron 106, 589–606 (2020).
Homey, B. et al. CCL27-CCR10 interactions regulate T cell-mediated skin inflammation. Nat. Med. 8, 157–165 (2002).
Laíns, I. et al. Retinal applications of swept source optical coherence tomography (OCT) and optical coherence tomography angiography (OCTA). Prog. Retin. Eye Res. 84, 100951 (2021).
Aschauer, J. et al. Longitudinal analysis of microvascular perfusion and neurodegenerative changes in early type 2 diabetic retinal disease. Br. J. Ophthalmol. 106, 528 (2022).
Nesper, P. L. et al. Quantifying microvascular abnormalities with increasing severity of diabetic retinopathy using optical coherence tomography angiography. Invest. Ophthalmol. Vis. Sci. 58, BIO307–BIO315 (2017).
Fragiotta, S. et al. Progression biomarkers of microvascular and photoreceptor changes upon long-term evaluation in type 1 diabetes. Invest. Ophthalmol. Vis. Sci. 64, 23 (2023).
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
Finan, C. et al. The druggable genome and support for target identification and validation in drug development. Sci. Transl. Med. 9, eaag1166 (2017).
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).
Knox, C. et al. DrugBank 6.0: the DrugBank Knowledgebase for 2024. Nucleic Acids Res. 52, D1265–D1275 (2024).
Loporchio, D. F. et al. Cytokine levels in human vitreous in proliferative diabetic retinopathy. Cells 10, 1069 (2021).
Kovacs, K. et al. Angiogenic and inflammatory vitreous biomarkers associated with increasing levels of retinal ischemia. Invest. Ophthalmol. Vis. Sci. 56, 6523–6530 (2015).
Sodhi, A. et al. Angiopoietin-like 4 binds neuropilins and cooperates with VEGF to induce diabetic macular edema. J. Clin. Invest. 129, 4593–4608 (2019).
Garin-Shkolnik, T., Rudich, A., Hotamisligil, G. S. & Rubinstein, M. FABP4 attenuates PPARgamma and adipogenesis and is inversely correlated with PPARgamma in adipose tissues. Diabetes 63, 900–911 (2014).
Huang, P. et al. Fatty acid-binding protein 4 in patients with and without diabetic retinopathy. Diab. Metab. J. 46, 640–649 (2022).
Fickweiler, W. et al. Elevated retinol binding protein 3 concentrations are associated with decreased vitreous inflammatory cytokines, VEGF, and progression of diabetic retinopathy. Diab. Care 45, 2159–2162 (2022).
Mesquida, M., Drawnel, F. & Fauser, S. The role of inflammation in diabetic eye disease. Semin. Immunopathol. 41, 427–445 (2019).
Robinson, R. et al. Interleukin-6 trans-signaling inhibition prevents oxidative stress in a mouse model of early diabetic retinopathy. Redox Biol. 34, 101574 (2020).
Zhao, L., Xu, H., Liu, X., Cheng, Y. & Xie, J. The role of TET2-mediated ROBO4 hypomethylation in the development of diabetic retinopathy. J. Transl. Med. 21, 455 (2023).
Peng, D. et al. Common variants in or near ZNRF1, COLEC12, SCYL1BP1 and API5 are associated with diabetic retinopathy in Chinese patients with type 2 diabetes. Diabetologia 58, 1231–1238 (2015).
Estacio, R. O. et al. The effect of nisoldipine as compared with enalapril on cardiovascular outcomes in patients with non-insulin-dependent diabetes and hypertension. N. Engl. J. Med. 338, 645–652 (1998).
Lewis, E. J., Hunsicker, L. G., Bain, R. P. & Rohde, R. D. The effect of angiotensin-converting-enzyme inhibition on diabetic nephropathy. The Collaborative Study Group. N. Engl. J. Med. 329, 1456–1462 (1993).
Dogne, S., Flamion, B. & Caron, N. Endothelial Glycocalyx as a Shield Against Diabetic Vascular Complications: Involvement of Hyaluronan and Hyaluronidases. Arterioscler. Thromb. Vasc. Biol. 38, 1427–1439 (2018).
Wang, G., Tiemeier, G. L., van den Berg, B. M. & Rabelink, T. J. Endothelial glycocalyx hyaluronan: regulation and role in prevention of diabetic complications. Am. J. Pathol. 190, 781–790 (2020).
Antonetti, D. A., Silva, P. S. & Stitt, A. W. Current understanding of the molecular and cellular pathology of diabetic retinopathy. Nat. Rev. Endocrinol. 17, 195–206 (2021).
Rajabinejad, M. et al. Semaphorin 4A, 4C, and 4D: function comparison in the autoimmunity, allergy, and cancer. Gene 746, 144637 (2020).
Hellemons, M. E. et al. Growth-differentiation factor 15 predicts worsening of albuminuria in patients with type 2 diabetes. Diab. Care 35, 2340–2346 (2012).
Kempf, T. et al. GDF-15 is an inhibitor of leukocyte integrin activation required for survival after myocardial infarction in mice. Nat. Med. 17, 581–588 (2011).
Li, Y., Yan, Z., Chaudhry, K. & Kazlauskas, A. The renin-angiotensin-aldosterone system (RAAS) is one of the effectors by which vascular endothelial growth factor (VEGF)/Anti-VEGF controls the endothelial cell barrier. Am. J. Pathol. 190, 1971–1981 (2020).
Chung, J. O., Park, S. Y., Cho, D. H., Chung, D. J. & Chung, M. Y. Relationship between plasma growth differentiation factor-15 levels and diabetic retinopathy in individuals with type 2 diabetes. Sci. Rep. 10, 20568 (2020).
Yang, S. et al. Plasma metabolomics identifies key metabolites and improves prediction of diabetic retinopathy: development and validation across multinational cohorts. Ophthalmology 131, 1436–1446 (2024).
Tyers, M. & Mann, M. From genomics to proteomics. Nature 422, 193–197 (2003).
Waheed, N. K. et al. Optical coherence tomography angiography in diabetic retinopathy. Prog. Retin. Eye Res. 97, 101206 (2023).
Durbin, M. K. et al. Quantification of retinal microvascular density in optical coherence tomographic angiography images in diabetic retinopathy. Jama Ophthalmol. 135, 370–376 (2017).
2. Classification and diagnosis of diabetes: standards of medical care in diabetes-2021. Diabetes Care. 44, S15-S33 (2021).
Carrasco-Zanini, J. et al. Proteomic signatures for identification of impaired glucose tolerance. Nat. Med. 28, 2293–2300 (2022).
Eldjarn, G. H. et al. Large-scale plasma proteomics comparisons through genetics and disease associations. Nature 622, 348–358 (2023).
Dhindsa, R. S. et al. Rare variant associations with plasma protein levels in the UK Biobank. Nature 622, 339–347 (2023).
Tabak, A. G., Herder, C., Rathmann, W., Brunner, E. J. & Kivimaki, M. Prediabetes: a high-risk state for diabetes development. Lancet 379, 2279–2290 (2012).
The prevalence of retinopathy in impaired glucose tolerance and recent-onset diabetes in the Diabetes Prevention Program. Diabet. Med. 24, 137-144 (2007).
Nguyen, T. T., Wang, J. J. & Wong, T. Y. Retinal vascular changes in pre-diabetes and prehypertension: new findings and their research and clinical implications. Diab. Care 30, 2708–2715 (2007).
Zhang, P. et al. Association of serum 25-hydroxyvitamin D with cardiovascular outcomes and all-cause mortality in individuals with prediabetes and diabetes: results from the UK biobank prospective cohort study. Diab. Care 45, 1219–1229 (2022).
Chen, X. et al. Vitamin D status, Vitamin D receptor polymorphisms, and risk of microvascular complications among individuals with type 2 diabetes: a prospective study. Diab. Care 46, 270–277 (2023).
Klein, R., Knudtson, M. D., Lee, K. E., Gangnon, R. & Klein, B. E. K. The wisconsin epidemiologic study of diabetic retinopathy XXII: the twenty-five-year progression of retinopathy in persons with type 1 diabetes. Ophthalmology 115, 1859–1868 (2008).
Aumann, S., Donner, S., Fischer, J. & Müller, F. Optical coherence tomography (OCT): principle and technical realization. https://doi.org/10.1007/978-3-030-16638-0_3 (2019).
Drexler, W. & Fujimoto, J. G. State-of-the-art retinal optical coherence tomography. Prog. Retin. Eye Res. 27, 45–88 (2008).
Patel, P. J. et al. Spectral-domain optical coherence tomography imaging in 67321 adults. Ophthalmology 123, 829–840 (2016).
Yang, Q. et al. Automated layer segmentation of macular OCT images using dual-scale gradient information. Opt. Express 18, 21293–21307 (2010).
Keane, P. A. et al. Optical coherence tomography in the UK biobank study – rapid automated analysis of retinal thickness for large population-based studies. Plos One 11, e0164095 (2016).
Zebardast, N. et al. Characteristics of p.Gln368Ter myocilin variant and influence of polygenic risk on glaucoma penetrance in the UK biobank. Ophthalmology 128, 1300–1311 (2021).
Zerbini, G. et al. Progressive thinning of retinal nerve fiber layer/ganglion cell layer (RNFL/GCL) as biomarker and pharmacological target of diabetic retinopathy. Int. J. Mol. Sci. 24, 12672 (2023).
Ng, D. S. et al. Retinal ganglion cell neuronal damage in diabetes and diabetic retinopathy. Clin. Exp. Ophthalmol. 44, 243–250 (2016).
Yang, S. et al. Analysis of plasma metabolic profile on ganglion cell-inner plexiform layer thickness with mortality and common diseases. Jama Netw. Open. 6, e2313220 (2023).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Guo, Y. et al. Plasma proteomic profiles predict future dementia in healthy adults. Nat. Aging 4, 247–260 (2024).
Liu, X. et al. Illness severity assessment of older adults in critical illness using machine learning (ELDER-ICU): an international multicentre study with subgroup bias evaluation. Lancet Digit. Health 5, e657–e667 (2023).
Al’Aref, S. J. et al. Machine learning of clinical variables and coronary artery calcium scoring for the prediction of obstructive coronary artery disease on coronary computed tomography angiography: analysis from the CONFIRM registry. Eur. Heart J. 41, 359–367 (2020).
Aspelund, T. et al. Individual risk assessment and information technology to optimise screening frequency for diabetic retinopathy. Diabetologia 54, 2525–2532 (2011).
Hippisley-Cox, J. & Coupland, C. Development and validation of risk prediction equations to estimate future risk of blindness and lower limb amputation in patients with diabetes: cohort study. Bmj 351, h5441 (2015).
Rhee, S. Y. et al. Plasma glutamine and glutamic acid are potential biomarkers for predicting diabetic retinopathy. Metabolomics 14, 89 (2018).
Huang, Y. et al. Optic nerve head capillary network quantified by optical coherence tomography angiography and decline of renal function in type 2 diabetes: a three-year prospective study. Am. J. Ophthalmol. 253, 96–105 (2023).
Wang, W. et al. Choriocapillaris flow deficit and the risk of referable diabetic retinopathy: a longitudinal SS-OCTA study. Br. J. Ophthalmol. 107, 1319–1323 (2023).
Yang, S. et al. Genome-wide DNA methylation analysis of extreme phenotypes in the identification of novel epigenetic modifications in diabetic retinopathy. Clin. Epigenetics. 14, 137 (2022).
Sun, Z. et al. OCT Angiography metrics predict progression of diabetic retinopathy and development of diabetic macular edema: a prospective study. Ophthalmology 126, 1675–1684 (2019).
Park, Y. J., Kim, J., Lee, E. J. & Park, K. H. Peripapillary microvasculature of the retina and choriocapillaris in uninvolved fellow eyes of unilateral retinal vein occlusion patients. Retin. -J. Retin. Vitr. Dis. 42, 159–167 (2022).
Wu, Y., He, M., Huang, W. & Wang, W. Associations between retinal microvascular flow, geometry, and progression of diabetic retinopathy in type 2 diabetes: a 2-year longitudinal study. Acta Diabetol. 61, 195–204 (2024).
Bennett, A. G., Rudnicka, A. R. & Edgar, D. F. Improvements on Littmann’s method of determining the size of retinal features by fundus photography. Graefes. Arch. Clin. Exp. Ophthalmol. 232, 361–367 (1994).
Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440 (2022).
Mavromatis, L. A. et al. Multi-omic underpinnings of epigenetic aging and human longevity. Nat. Commun. 14, 2236 (2023).
Zheng, J. et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat. Genet. 52, 1122–1131 (2020).
Islam, M. M. et al. Predicting the risk of diabetic retinopathy using explainable machine learning algorithms. Diab. Metab. Syndr. -Clin. Res. Rev. 17, 102919 (2023).
van der Heijden, A. A. et al. Prediction models for development of retinopathy in people with type 2 diabetes: systematic review and external validation in a Dutch primary care setting. Diabetologia 63, 1110–1119 (2020).
zocskl. zocskl/DRPro: Proteomic atlas of diabetic retinopathy (v1.1.1). Zenodo. https://doi.org/10.5281/zenodo.17096304 (2025).
Acknowledgements
This study was funded by the GBRCE for Major Blinding Eye Diseases Prevention and Treatment, the Hainan Province Clinical Medical Center, the National Natural Science Foundation of China (82371086, W.W.; 82401298, R.X; 82322016, C.H.), the Chinese Postdoctoral Science Foundation (2024M763790, R.X), the Projects of Research Center for Sharp Vision at The Hong Kong Polytechnic University (P0057931, S.T.), the Health and Medical Research Fund-Research Fellowship Scheme, Health Bureau, Hong Kong (07210207, S.T.), and the Lumitin Vision to Brightness Research Funding for the Young and middle-aged Ophthalmologists (BCF-KH-YK-20230803-03, S.C.). We thank all the participants and staff of the UKB and the GDES.
Author information
Authors and Affiliations
Contributions
Study concept and design: W.W.; Acquisition, analyses, or interpretation: S.Y., Z.X., R.X., ZY.Z, H.L., Y.C., ZH.Z., L.D.; Drafting of the manuscript: S.Y., W.W.; Critical revision of the manuscript for important intellectual content: W.W., CY.C., YC.T, JB.J., M.H., N.C., LZ.Z., X.S., W.H., L.Z., S.C., C.H, S.T.; Statistical analyses: S.Y., Z.X., W.W.; Obtained funding: W.W.; Administrative, technical, or material support: S.Y., LZ.Z., W.H.; Study supervision: W.W., JB.J., CY.C., YC.T.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Weiping Jia, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, S., Xin, Z., Xiong, R. et al. Proteome atlas for mechanistic discovery and risk prediction of diabetic retinopathy. Nat Commun 16, 9636 (2025). https://doi.org/10.1038/s41467-025-64634-1
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41467-025-64634-1