Background

Alzheimer’sdisease (AD) is a genetically complex disorder to which various pathophysiological processes are thought to contribute. Amyloid and tau pathology are the most well-known, but other processes, such as inflammation and cholesterol metabolism, among many others, play important roles in disease development as well [1]. Different risk factors may affect AD development by different mechanisms; therefore, patients may develop AD due to different combinations of causes and pathways. Accurately identifying and distinguishing which molecular mechanisms play the lead role on an individual basis is therefore crucial for etiological research, but also for clinical diagnosis, prognosis, and future therapeutic approaches.

Cerebrospinal fluid (CSF) biomarkers can provide insights into disease mechanisms, often before symptoms fully develop [2]. We have previously demonstrated the utility of linearly combining different AD CSF biomarkers into five statistically independent components, which likely represent different disease processes and which may be more informative than analyzing each CSF trait separately [3]. Specifically, we had applied principal component analysis (PCA) to data for six CSF biomarkers collected in two independent cohorts: the European Medical Information Framework for Alzheimer’s Disease Multimodal Biomarker Discovery (EMIF-AD MBD) study [4] and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [5].

A very similar structure representing five principal components (PC) was found in both cohorts and can be summarized as follows: [3] the first PC loaded strongly on tau and phosphorylated tau (pTau), and moderately on neurogranin (Ng) and YKL-40. Tau is a marker of neurodegeneration, with pTau being a component of neurofibrillary tangles [2, 6], Ng is a marker of synaptic functioning [7], while YKL-40 is associated with neuronal inflammation and astroglial reaction [2, 8,9,10]. Thus, this PC likely represents tau pathology and associated degenerative processes, such as deficits in synaptic functioning and elevated inflammation (henceforth referred to as “tau pathology/degeneration” (PC1)). The second PC loads specifically on Aβ42 only (“Aβ Pathology” PC2), a very early and important marker of amyloid deposition in the brain [2]. The third PC loads strongly on neurofilament light chain (NfL), but also moderately on YKL-40, and can be interpreted as representing neuronal injury and the accompanying inflammatory response (“injury/inflammation” PC3), as NfL is a component of axons and its presence in CSF is a non-specific marker of neuronal damage [6]. The fourth PC loads on YKL-40 and only weakly on tau and NfL, and therefore can be regarded as representing neuronal inflammation and astroglial reaction, not related to AD symptoms (“non-AD inflammation” PC4). Similarly, the fifth PC loads strongly on Ng and weakly on tau, representing synaptic functioning mostly independent of the other biomarkers and AD symptoms (“non-AD synaptic functioning” PC5). Please note, that the PC “names” are used for increased legibility, but are by necessity reductionist. We do not wish to imply that the named biological processes are unique or exhaustively describe any given PC.

After establishing the component structure, we applied these to search for rare variant associations using whole-exome sequencing in our previous study [3]. This work led to the identification of six genes, in which rare variants were associated with the CSF PCs. Specifically, we identified associations between the injury/inflammation component (PC3) and rare variants in IFFO1, DTNB, NLRC3, and SLC22A10, as well as between the non-AD synaptic functioning component (PC5) and rare variants in GABBR2 and CASZ [3]. Interestingly, rare variant associations with AD risk were simultaneously reported for the DTNBlocus in an independent project utilizing whole-genome sequencing in AD families and case-control datasets [11].

In this study, we aimed to extend the previous, rare-variant analyses to investigate the role of common variants on the PCA-defined CSF biomarker profiles. While previous GWAS in the field have screened for common-variant associations with single biomarkers [9, 12,13,14,15], to our knowledge no GWAS combining these CSF biomarkers in a multivariate framework has been performed to date. Multivariate analyses have the advantages of (i) allowing a more robust (compared to univariate analyses) quantification of different disease pathways, resulting in increased statistical power [16, 17] and (ii) enabling to differentiate various possible mechanisms of action more precisely.

Secondary aims of our study included the identification of sex-specific effects and AD mediation pathways. AD is more prevalent in women, and CSF biomarkers differentially predict brain and cognitive changes depending on sex [18, 19]. Furthermore, genetic effects on CSF biomarkers may depend on sex as well, e.g., rs34331204 on chromosome 7p21 was found to have a male-specific association with neurofibrillary tangles [20]. It is therefore prudent to investigate whether the component structure differs between sexes and whether associations of PCs with AD or with genetic predictors is sex-dependent. Finally, we performed mediation analyses to gauge whether potential SNP effects on CSF biomarker profiles also affect AD risk.

Methods

Participants

The presented work is part of the EMIF-AD project, a consortium of European studies investigating the etiology of AD and AD biomarkers with the aim to improve prognosis and diagnosis [4]. Participants included elderly individuals with cognitively unimpaired individuals, mild cognitive impairment (MCI), and AD type dementia. Both deep phenotyping (such as brain imaging and determination of CSF biomarkers) and genotyping (SNP arrays and WES) were performed on a large number of EMIF-AD participants [21,22,23]. The current study utilizes the existing CSF biomarker and SNP array data and combines them with a range of statistical methods not previously employed on these data. Written informed consent was obtained for all assessment before the start of the study [4]. The study was conducted in accordance to the Declaration of Helsinki and ethical approval was obtained from the Ethical Committee of the University of Lübeck, as well as local committees of consortium members [4]. More details on the recruitment and phenotype ascertainment protocols used in the EMIF-AD dataset can be found in Bos et al. [4].

To increase the generalizability of effect estimates and to increase power to detect new associations, we performed all analyses jointly with equivalent CSF biomarker and SNP genotype data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [5]. Data used in the preparation of this article were obtained from adni.loni.usc.edu. ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.

The current study utilized two participant selection paradigms for analysis: first, we selected participants for whom observations for at least 4 out of the 5 biomarkers were available. In total, this yielded 1158 participants to construct and examine the biomarker PC scores (Additional file 1: Table S1). Second, we only included participants with available SNP array data, who were unrelated and of European ancestry. This reduced the sample size to 973 participants (Additional file 1: Table S2, see also Hong et al. for detailed selection methods [13]). Overall, both EMIF-AD and ADNI were comparable datasets of elderly participants, with a mean age at ascertainment of 69 and 75 years, respectively (Additional file 1: Table S1). The distributions of diagnostic status were similar in both datasets as well, with approximately half of the sample diagnosed with MCI, while 25% presented either no cognitive impairment or with a diagnosis of AD (Additional file 1: Table S1).

We further assessed novel findings in the independent Knight Alzheimer Disease Research Center (Knight-ADRC) cohort. Applying the same inclusion criteria, we selected 786 participants with sufficient genotype and phenotype/CSF data. These were on average 69 years old and 46.95% male, 21.12% had MCI, and 4.71% AD (Additional file 1: Table S3, Additional file 2: Supplementary Methods).

Measures

Genotyping, imputation, and quality control (QC)

SNP genotypes were determined using the Infinium Global Screening Array (GSA; Illumina, Inc., USA) at the Institute of Clinical Molecular Biology (UKSH, Campus-Kiel) in EMIF-AD and using Illumina’s Omni 2.5 M or Human610-Quad arrays in ADNI. Autosomal SNPs in both GWAS datasets were processed with the same computational workflow [13], including the imputation of untyped variants with MiniMac 3 using the HRC 1.1 reference panel [24]. Here, we only analyzed common SNPs (MAF ≥ 0.01 per study) with sufficient imputation quality (R2 > 0.30) and SNPs within HWE (p < 5 × 10−6). To check for cryptic relatedness, we used the KING-robust kinship estimator implemented in PLINK (v2.0) with a cutoff of 0.025, which excludes family relationships down to the fourth degree. Please see Hong et al. for a detailed description of the GWAS methods, QC criteria, and processing pipeline [13]. Within Knight-ADRC, participants were genotyped using GSA, CoreExome, Illumina 660 K, NeuroX2, and OmniExpress arrays. All samples were imputed using the TOPMed imputation server and only SNPs with an imputation quality of R2 > 0.30 and HWE (P ≥ 1*10−6) were kept. Related individuals were identified through identity by descent (IBD; PLINK (v1.9)) with PI_HAT > 0.20 cutoff. For analysis, only samples of European ancestry, as determined by genotype PCs, and SNPs with MAF ≥ 0.0001 (0.01%) and 98% call rate were kept. For X-chromosome-specific methods, see Additional file 2: Supplementary Methods.

CSF biomarkers

Biomarkers were derived from CSF, as obtained via lumbar puncture [3, 5, 23]. For EMIF, the V-PLEX Plus AbPeptidePanel 1 Kit was used to measure Aβ, and in the case of tau, the INNOTEST ELISA was applied [23]. In ADNI, the Elecsys CSF immunoassay and a cobas e 601 analyzer assessed Aβ and tau concentration [25]. For both cohorts, ELISA was applied to assess NfL levels [23, 26]. Ng concentration was measured by ELISA in EMIF [23] and by electrochemiluminescence in ADNI [27]. ELISA was used to measure YKL-40 levels in EMIF [23] and LC/MRM‐MS proteomics were applied in ADNI [28]. For the YKL-40 proteomics data, we z-score standardized two ion frequencies with two peptide sequences each and averaged the values. In Knight-ADRC, Aβ, tau, and ptau levels were measured using Lumipulse G1200 automated assay system. NfL, Ng, and YKL-40 levels were obtained using the aptamer-based SOMAscan (v4) platform.

Statistical analysis

CSF biomarker PCA, sex differences, and AD associations

The analysis work-flow is summarized in Additional file 2: Fig. S1. First, we computed five PCs across all participants with sufficient biomarker information. PCs were defined as described previously [3] and assigned to specific functional domains, as described in the introduction. The PCA was performed on the phenotype level and PCs were constructed independent of genotype information. Briefly, we first applied a rank-based inverse normal transformation within both studies to decrease extreme skewness of the observed biomarker levels and to z-score standardize their scale across studies [29]. We used the missMDA package to determine the optimum number of components and account for missing data [30]. Specifically, we applied leave-one-out cross-validation, removing one observation at a time and predicting it by a PCA model fitted to the rest of the dataset. The model resulting in the smallest mean square error of prediction contained five components (PC1-PC5). We then imputed missing values to avoid excessive sample size losses and potential participation biases. This was achieved using a regularized iterative PCA method with five components, as implemented in missMDA. Finally, we performed a PCA with varimax rotation (an orthogonal rotation) and extracted five PC scores with the psych package [31]. All analyses were performed in R 4.0.3 [32] We applied the same PC loadings to the Knight-ADRC for replication purposes of results related to PC4 and PC5.

We have not previously explored to which degree the component structure differs between sexes, or whether the resulting PCs show sex-dependent associations with dementia symptoms. We first repeated PCA in both males and females and compared loadings. We then tested whether mean PC levels differed between sexes. This was achieved by regressing PC scores on sex, adjusted for age, five genetic ancestry components, diagnostic status (dummy coding MCI and AD), and study (ADNI vs EMIF-AD).

In a last step, we used the PC scores as predictor of latent AD. Here, latent AD is defined as an underlying continuous normally distributed variable, representing a range of probability to either have no cognitive impairment, MCI, or AD. Latent AD was estimated by item factor analysis [33]. Accordingly, participants with low scores (below −1.47SD) are unlikely to display cognitive impairment, above −1.47SD and below 0.40SD are most likely to suffer from MCI and above 0.40SD have a high probability to be affected by AD. To account for potential sex differences, we also added a product term between PC scores and sex, coded as male = 0 and female = 1. The biomarker PC term can therefore be interpreted as the association of biomarker PC scores on latent AD in males and the interaction term as the female-specific effect, i.e., the difference between sexes. These analyses were adjusted for the same covariates as in the main analyses, i.e., age, genetic ancestry, and study. We applied a structural equation model (SEM) with a weighted least square mean and variance adjusted (WLSMV) estimator using Lavaan 0.6–9 [34] to estimate latent AD and regress it onto biomarker PCs, sex, their interaction, and covariates.

GWAS and meta-analyses

We performed four sets of GWAS: main-effect GWAS analysis (both sexes), male-only GWAS, female-only GWAS, and sex interaction analyses. Within each analysis group, we performed five GWAS, one for each biomarker PC, separately for EMIF-AD and ADNI. For all GWAS, separate linear regression models were run in PLINK [35] to all autosomal and X-linked SNPs passing QC. For the X-chromosomal analyses genotype, dosage for hemizygous males was coded as 2, to reflect the same dosage as homozygous females [36]. The biomarker PC scores were treated as outcome, imputed SNP dosage (0–2 numbers of effect allele) as main predictor. In addition, we included sex and five PCs reflecting genetic ancestry as additional covariates in the regression models. Analyses of the ADNI dataset were additionally corrected for genotyping array. Lastly, GWAS results for the EMIF-AD and ADNI were meta-analyzed using the inverse variance weighting (i.e., fixed-effect) model implemented in METAL [37].

In secondary analyses, we aimed to discover SNP effects exclusively found in one sex by running sex-specific GWAS. In the final model, we added a sex interaction term, representing the difference between the SNP effect in females vs males. In addition to single-variant analyses, we also estimated the aggregate effect of all SNPs within a protein-coding gene. These analyses were performed with MAGMA 1.08 [38] on the FUMA 1.3.7 platform with default settings [39]. FUMA was also used to select independent (R2 ≤ 0.6) genome-wide significant (p < 5*10−8) SNPs for further mediation analyses and to perform gene-set enrichment analyses on the GWAS top hits emerging for each analyzed PC. While the focus was on multivariate analyses using PCs as outcome, we also performed the same analyses with single biomarkers to compare both approaches.

We further estimated the variance explained by loci found to be genome-wide significant in the main-effect analyses. We computed the difference in R2 between a full model including the tested SNP and a baseline model with covariates only. We additionally estimated the local heritability of the top loci using LAVA 0.1.0 [40]. We defined the region of interest as 10 KB down or upstream of the gene start/end or SNPs, or in the case of APOE, as previously defined [41] (Additional file 1: Table S4). Single-SNP, gene-based and local heritability analyses were repeated in Knight-ADRC for the novel PC4 and PC5 loci showing genome-wide significance in the main discovery analyses to assess replication.

Mediation analyses

Independent SNPs, which showed genome-wide significant association in any of the GWAS were further tested for mediation effects. Specifically, we examined whether these SNPs would affect latent AD via their influence on biomarker levels. To test this hypothesis, we applied a SEM to each SNP. In this SEM, the genetic variant predicts all biomarker PCs, as well as latent AD directly. The biomarker PCs in turn also predict latent AD. See Fig. 1 for a path diagram. Sex, age, five genetic ancestry components, and study were predictors of both CSF biomarkers and latent AD, thus all mediation and direct pathways were statistically adjusted for these potential confounders. Paths from SNPs to a biomarker were multiplied with the path from biomarker to latent AD to obtain the mediation effect for that particular PC. We also summed all mediation pathways to obtain a total mediation estimate elicited by any biomarker. This biomarker mediation estimate was further summed with the direct effect to estimate the total effect. To aid the interpretation of the mediation magnitude, we also provide an estimate of the proportion of the mediated effect (i.e., biomarker mediation/total effect). However, this was only possible, when mediated and total effects pointed in the same direction. Additional context to the total effect is afforded by providing the variants’ effects on AD based on a large and independent previous case-control GWAS [42].

Fig. 1
figure 1

Path model of main findings. This path model summarizes the main-effect mediation model. Circles indicate principal components or latent variables, rectangles represent observed variables. Arrows either indicate PC loadings or structural regression paths. Thicker lines correspond to stronger loadings, solid structural paths are genome-wide significant (p < 5 × 10−8), and dashed lines are suggestive (p < 0.008). Coefficients indicate either the effect of one effect allele on a biomarker PC in SD, or the effect of one SD higher biomarker PC score on latent AD in SD. Note: all paths are adjusted for assessment age, sex, genetic ancestry, and study but are omitted from figure

As some SNPs showed sex-dependent effects, we also ran a moderated mediation model to account for sex-specific mediation. This was achieved by adding a product term between SNP dosage and dummy variable for sex (female = 1, male = 0), and adding this interaction term as predictor of biomarker PCs and latent AD. If no mediation pathway differed nominally (p≥ 0.05) between sexes, main mediation model results are presented. Otherwise, male- and female-specific mediation estimates are provided, as estimated by the moderated mediation model. All mediation analyses were estimated with WLSMV in lavaan [34].

Comparison to rare variant results

As outlined in the introduction, we previously identified several rare variant associations using the same biomarker PCA approach in a subset of the EMIF-AD individuals analyzed here. Specifically, this pertains to associations between IFFO1, DTNB, NLRC3, and SLC22A10 and the injury/inflammation component (PC3), as well as between GABBR2 and CASZand the non-AD synaptic functioning component (PC5) [3]. Here we examined, whether common variants—in addition to the rare variants already identified—in these genes also show associations with the CSF biomarker PCs, using both single-variant and gene-based tests as outlined above.

Functional characterization and related phenotypes

SNPs were annotated to genes using default settings of FUMA, which relies on positional and functional information. Given the broad association signal between the chromosomal region encompassing GRIN2D and non-AD synaptic functioning (PC5), we additionally performed fine-mapping analyses to identify the most likely causal genes at this locus. To this end, we used a TWAS fine-mapping approach [43] as implemented in FOCUS 0.802 (https://github.com/bogdanlab/focus).

Briefly, we examined the association between predicted gene expression at the GRIN2D locus with non-AD synaptic functioning (PC5). The GRIN2D locus was defined as encompassing chr19:48382575–48466980 (GRCh38/hg38), starting 10 kb downstream of KDELR1 and 10 kb upstream of KCNJ14to cover all lead variants within the locus. Prediction weights were previously computed based on expression in the dorsolateral prefrontal cortex of ROS/MAP participants with AD, MCI, or healthy controls as obtained from the AMP-AD RNAseq Harmonization Study [44, 45], and processed by Bellenguez et al. [46] We chose this dataset as it includes a relatively large number of samples of a relevant tissue in a population similar to the current study. Prediction quality of GRIN2D was sufficient in this dataset, based on significant variance explained as estimated with cross-validation (5.2%, p = 3.6 × 10−8). In total, weights for 13,420 transcripts with nominally significant heritability estimates were available. We next associated the predicted transcripts with non-AD synaptic functioning (PC5) and consequently computed the posterior inclusion probability (PIP), resulting in a credible gene set. We only considered genes as likely causal candidates, if they showed an association with the outcome on a transcriptome-wide significance level (p < 0.05/13,420 < 3.7 × 10−6; |Z|> 4.6).

To further study the impact of GRIN2Don related phenotypes, we tested the predicted gene expression for association with risk for AD [42, 46], low educational attainment [47], low cognitive ability [48], and risk for major depressive disorder [49] using summary statistics of independent, recent GWAS. Finally, we also attempted to compute local genetic correlations between PCs and related phenotypes. However, due to low sample sizes, this was only possible for TMEM106B and the genetic correlation between injury/inflammation (PC3) and AD. Similarly, we were not able to investigate genetic correlations between sexes given the even lower sample sizes in these subsets.

Multiple testing adjustment

To strike a balance between reliable inference and power, we present our findings as primary, secondary, and tertiary results. The primary analyses in this study were the GWAS in the full dataset independent of sex. For SNP-based tests, we apply the conventional genome-wide association threshold of p < 5 × 10−8, and for gene-based tests, we used Bonferroni’s method to adjust for 19,511 genes resulting in a threshold of p < 2.3 × 10−6, as recommended by FUMA. The sex-specific analyses present additional tests of related (and non-independent) hypotheses, and, thus should be regarded as secondary and more exploratory analyses. For the mediation analyses, we applied an alpha of 0.05/6 = p < 0.0083, adjusting for six potential mediation or direct pathways. See Additional file 2: Supplementary Methods for full description of the multiple testing adjustment strategy.

Results

CSF biomarker PCs

Analogous to our previous study [3], the six AD CSF biomarkers tested here could be combined into five consistent components across datasets and analytical subsamples. The PC structure was very similar across the EMIF-AD and ADNI datasets (Additional file 1: Table S5); therefore, all subsequent PC analyses were based on a combined discovery sample to maximize sample size and reduce study heterogeneity. In this study, we extended our analyses to examine whether the CSF biomarker PCs’ loadings, their mean levels, or associations with latent AD differ by sex. Generally, PC loadings were consistent across males and females (Table 1). NfL loaded 0.06 higher on tau pathology/degeneration in females when compared to males (0.25 vs 0.19). A similar pattern was observed for pTau, which loaded 0.07 higher on Injury/Inflammation (0.18 vs 0.11) in women vs men. All other loading differences were below 0.04 and therefore classified as “indifferent” between sexes. Based on these observations, we used the common loadings as estimated across both sexes for further analyses.

Table 1 PCA loadings. Component loadings of each biomarker (first column) on five principal components (column groups two to six) are displayed

While the component structure was very similar between sexes in general, we observed differences in mean levels. When adjusting for age, diagnostic status, and study, females showed 0.21SD (SE = 0.06, p = 6.3 × 10−4) higher scores on non-AD synaptic functioning. In contrast, injury/inflammation was −0.40SD (SE = 0.05, p = 2.7 × 10−13) lower in females. Tau pathology/degeneration, Aβ pathology, and non-AD inflammation did not show differences in mean levels across sexes when accounting for multiple testing (p < 0.05/5 = 0.01).

Finally, we examined the association of the biomarker PCs with latent AD, including potential sex interactions (Additional file 1: Table S6). 1SD higher levels in tau pathology/degeneration or injury/inflammation were associated with 0.41SD and 0.40SD higher latent AD levels in males. Females had a stronger association with 0.43SD and 0.44SD, respectively, but the difference was not significant (p > 0.5). Higher brain Aβ accumulation is reflected in lower CSF Aβ values; therefore, higher Aβ pathology scores were associated with lower AD occurrence in both males and females (β =  − 0.34SD and β =  − 0.45), although the difference between sexes was not significant (p = 0.13). Non-AD inflammation and non-AD synaptic functioning did not significantly associate with latent AD, when adjusting for five tests (i.e., all p > 0.01). Due to lack of evidence for sex-differential associations, all subsequent analyses are performed under the assumption that associations between PCs and latent AD are invariant across sexes.

GWAS

In our GWAS analyses, we tested 7,433,949 autosomal and X-linked SNPs. For gene-based tests, we assessed 19,511 protein-coding genes. For all outcomes and analyses, lambda was below 1.05 and QQ-plots showed no evidence of noteworthy genome-wide inflation (Additional file 2: Fig. S2). GWAS results are visualized as Manhattan plots for main (Fig. 2), sex stratified (Fig. 3), sex interaction analyses (Additional file 2: Fig. S3), and gene-based tests (Additional file 2: Fig. S4). Results of independent SNPs showing genome-wide significance are summarized in Table 2. Their effect on AD risk by mediation analysis using CSF biomarker PCs are displayed in Table 3. Finally, gene-based results are depicted in Table 4. We had good to excellent power to detect SNPs with a βof 0.30 and a minor allele frequency (MAF) of at least 10% [50]. For less common SNPs (< 5%), we only had power to detect SNPs with a β of 0.65 or higher (Additional file 2: Fig. S5).

Fig. 2
figure 2

Manhattan plot (main-effect model). Results from GWAS of five CSF biomarker PC across both sexes. Each row represents a different PC as outcome. X-axis represents each chromosome and the y-axis the p-value of the SNP association with the outcome on a −log10 scale. All analyses were adjusted for sex, genetic ancestry, and SNP array. Red line indicates genome-wide significance threshold (p = 5 × 10−8). Yellow line indicates suggestive threshold (p = 1 × 10−6). Vertical lines point towards genome-wide significant loci based on any model. P-values below 1 × 10−10 were winsorized to 1 × 10−10

Fig. 3
figure 3

Manhattan plot (sex stratified). Results from GWAS of five CSF biomarker PC for males and females separately. Each row represents a different PC as outcome. Per outcome, results for males are depicted at the bottom and for females at the top. X-axis represents each chromosome and the y-axis the p-value of the SNP association with the outcome on a −log10 scale. All analyses were adjusted for genetic ancestry and SNP array. Red line indicates genome-wide significance threshold (p = 5 × 10−8). Yellow line indicates suggestive threshold (p = 1 × 10−6). Vertical lines point towards genome-wide significant loci based on any model. P-values below 1 × 10−10 were winsorized to 1 × 10−10

Fig. 4
figure 4

Regional plots for TMEM106B, CHI3L1, and GRIN2D. Each plot displays the p-values of SNP associations in either TMEM106B, CHI3L1, or GRIN2D loci. Statistics are derived from two studies. Orange dots represent p-values of association with biomarker PCs estimated in this study and back dots represent p-values of association with AD, as estimated in a separate GWAS on 1 million participants (Wightman et al. [42]). Regional plots were created with snpxplorer.net

Table 2 Independent genome-wide significant loci. Independent SNPs, which had genome-wide significant associations with a Biomarker PC in either main effect, sex-stratified or sex interaction models. See also Supplementary Tables for fine-mapping and replication analyses of the PC5 signal near GRIN2D
Table 3 Mediation models. Results from structural equation models testing the hypothesis, that SNP effects on CSF biomarker PCs would also affect AD risk
Table 4 Results of gene-based tests. Results from gene-based tests with MAGMA representing the joint effects of common SNPs within the named gene (NB: for PC1 no genome-wide significant gene-based results were observed)

Main analyses

In the main analyses, none of the SNPs showed genome-wide significant association with the tau pathology/degeneration component (PC1). In contrast, seven independent SNPs in the APOE locus showed genome-wide significant associations with the Aβ pathology component (PC2). Specifically, the C allele of the lead SNP rs429358 (which is also known as the ε4-allele) was associated with −0.50SD lower PC scores (SE = 0.04, p = 1.3 × 10−29, MAF = 0.30).

Regarding the injury/inflammation component (PC3), transmembrane protein 106B (TMEM106B), tagged by lead intronic SNP rs2302634, showed strong and genome-wide significant associations: the T allele at this variant predicted +0.26SD (SE = 0.04, p = 1.3 × 10−9, MAF = 0.42) higher injury/inflammation scores.

In case of the non-AD inflammation component (PC4), two loci reached genome-wide significance: chitinase 3 like 1 (CHI3L1) on chromosome 1 and a new region on chromosome 9p21.3 with lead SNP rs145791381 (located in an intergenic region). CHI3L1 encodes the YKL-40 protein and two independent SNPs in or near this gene showed genome-wide significant associations with the non-AD inflammation PC: the strongest effect was observed for intronic variant rs7551263 (T allele: β =  − 0.39SD, SE = 0.05, p = 5.7 × 10−17, MAF = 0.16; Fig. 4). The second signal in CHI3L1 was elicited by SNP rs10399931 located < 160 bp upstream of CHI3L1 (T allele: β =  − 0.33SD, SE = 0.06, p = 6.0 × 10−9, MAF = 0.24). Both SNPs are independent (r2 = 0.04 in our study and D’ = 0.71 in European population [39]) and therefore probably represent two separate signals. In a previous work [13], both SNPs were directly associated with CSF YKL-40 levels and can therefore be considered cis pQTLs (i.e., protein quantitative trait loci) of this protein.

Another SNP reaching genome-wide significance in the main-effect analyses was the intergenic SNP rs145791381 on chr. 9p21.3. The T allele of this variant was associated with lower scores on the non-AD inflammation component (PC4, β =  − 1.01SD, SE = 0.18, p = 6.0 × 10−9, MAF = 0.02). Despite the strength of the association in the primary GWAS, this potential association signal was elicited by a “singleton variant” (i.e., there was no “trail” of correlated variants showing similar association evidence). The missing trail of correlated variants may be the result of this variant’s low minor allele frequency which substantially reduces the number of variants in strong LD in this particular region. Given the fact that this variant was not associated with the non-AD inflammation PC4 in Knight-ADRC (β = 0.07SD, SE = 0.35, p = 0.84), we have put no further emphasis on this finding throughout the remainder of the manuscript.

Lastly, no single SNP showed genome-wide significant association with the non-AD synaptic functioning component (PC5). However, this PC showed evidence for genome-wide significant association in the gene-based GWAS analyses highlighting the glutamate receptor gene GRIN2D, based on aggregated test statistics across 110 SNPs located between 19:48,898,132 and 19:48,948,188 (Z =  + 4.63, p = 1.8 × 10−6) (Table 4). The lead SNP within this gene was rs8111684 in the 3’ UTR region of the GRIN2D gene (β =  + 0.19SD, SE = 0.05, p = 9.8 × 10−5, MAF = 0.26). Interestingly, the genomic regions flanking the gene both p-ter and q-ter showed more significant single SNP associations. As these variants are located outside of the gene, they were not considered in the gene-based tests (Fig. 4). Notwithstanding, we included two additional SNPs from this adjacent region for further characterization: rs275844 and rs3107911. The strongest single-SNP association in the locus was elicited by rs3107911, located in the intron of the gene KCNJ14 (β =  − 0.31SD, SE = 0.06, p = 5 × 10−7, MAF = 0.15) and approx. 16 kb q-ter of GRIN2D (Fig. 4). The lead variant explained 2.2% of the variance and the GRIN2D locus as a whole explained 3% (p = 0.009) of the variance in non-AD synaptic functioning (PC5). See Additional file 1: Table S4 for local heritability and estimates of the variance explained for all loci.

Comparison of the PCA-based GWAS results with GWAS analyses run on individual biomarkers revealed that the PCA phenotype led to a higher statistical support (i.e., smaller P-values) in about half (i.e., 10 out of 21) of the top hits of the main-effect analyses (Additional file 1: Table S7). This pattern was observed for 4 of the 5 computed PCs, i.e., all but PC2 (Aβ pathology). Only for this latter PC all (n = 8) single biomarker results consistently showed stronger statistical support than the PCA-based GWAS. In contrast, comparing results for non-AD inflammation (PC4) vs. single biomarker analyses using YKL-40 levels, the associations tended to be much stronger for the PCA-based analyses. These analyses support our general hypothesis that combining single biomarkers by PCA can increase power (and perhaps specificity) in the context of genetic association analyses, but this gain in power appears to be outcome-dependent.

Lastly, we compared the results of our primary GWAS meta-analyses which were computed using fixed-effect models to analyses using random effects models (Additional file 1: Table S8). This comparison revealed that in all but one (i.e., rs12670437) analyses the statistical support of our top GWAS findings was highly comparable, suggesting that the choice of statistical model for the meta-analyses does not appreciably change our top GWAS findings.

Post-GWAS analyses on main-effect results

Our main efforts in the post-GWAS analyses concentrated on the further characterization and replication of the novel association between non-AD synaptic functioning and markers in the GRIN2D region. These analyses yielded independent support of the involvement of GRIN2D in AD in many but not all analyses. First, our original GWAS was performed in two independent datasets (EMIF-AD and ADNI) and we note that the GRIN2D signal is quite consistent in both datasets (both in terms of effect size and statistical support; Additional file 1: Table S9a). Second, TWAS analyses combining large eQTL (i.e., brain from ROS/MAP) and GWAS summary statistics on AD risk and related brain phenotypes, support our conclusion of a relevant association signal in the GRIN2D region and suggest that the molecular effects may be mediated by affecting the expression on GRIN2D (see section “ Functional characterization and related phenotypes”, Additional file 1: Table S10). Third, while analyses in the Knight-ADRC dataset revealed no strong association signals between markers in GRIN2D and PC5 (Additional file 1: Table S9b), we note that three out of seven top GRIN2D SNPs from our primary analyses (i.e., all showing P < 1 × 10−5 in our GWAS) were not available in the Knight-ADRC data and could hence not be tested. Although these (and other) top markers were lacking, we were able to assess local heritability metrics in this region and observed evidence that the remaining markers explain a significant portion of PC5 variance (6.3%, p = 0.0002) in the Knight-ADRC data (Additional file 1: Table S4). Furthermore, inspection of all available 228 SNPs in the Knight-ADRC dataset in the general GRIN2D region (i.e., within the 100-kb interval from chr19: 48,365,921–48,465,818) revealed the strongest association between exons 7 and 12 of the GRIN2D gene, but no other gene in the region (best SNP: rs74459994; P-value = 1.6 × 10−3; Additional file 1: Table S9c). Furthermore, genetic effects in the Knight-ADRC individuals showed a trend in the same direction as in EMIF/ADNI with a genetic correlation of 0.42 (95% CI [− 0.46; 1.00], p = 0.30) between discovery and replication. While the wide confidence interval does not allow for generalized statements about the consistency of effects, the positive genetic correlation provides descriptively more context to local heritability estimates being the consequence of genetic effects in the same direction in both discovery and replication.

In summary, our extensive replication analyses provide considerable—but not unequivocal—independent support for a significant association with markers in the GRIN2D region and several AD-relevant phenotypes. Notwithstanding the supporting evidence and its compelling functional candidacy, the GRIN2D association should be considered preliminary until it is more fully characterized in future work.

Sex-specific effects

Several additional SNPs showed genome-wide significance only in the male or female subsamples (genome-wide significant sex-stratified p-value (p)), or a sex interaction effect (genome-wide significant interaction term p-value (pint)). For instance, rs114211800, an intronic variant of the non-coding RNA gene LOC105377684 on chromosome 5q33.3 showed strong association with the injury/inflammation component (PC3) in males (β =  − 1.23SD, SE = 0.22, p = 1.7 × 10−8), but not in females (β =  − 0.11SD, SE = 0.26, p = 0.69, pint = 0.002, MAF = 0.02). Similarly, the intergenic SNP rs12670437 (chromosome 7q11.23) was strongly associated with non-AD inflammation in male participants (β =  − 1.20SD, SE = 0.22, p = 2.4 × 10−8), but not in females (β =  + 0.20SD, SE = 0.21, p = 0.33, pint = 0.0001, MAF = 0.02). Vice versa, rs140169162, located in an intron of the MYO1D gene on chromosome 17q11.2, showed highly specific effects on the component capturing tau pathology/degeneration (PC1) in females (β =  + 1.61SD, SE = 0.30, p = 4.9 × 10−8), but not in males (β =  − 0.02SD, SE = 0.26, p = 0.94, pint = 0.0003, MAF = 0.02). Likewise, for the component tagging non-AD inflammation (PC4), rs16974493 (intergenic, chr. 13q33.3) and rs150326618 (intergenic, chr. 14q22.1) were only genome-wide significant in females (rs16974493: β =  − 0.97SD, SE = 0.17, p = 2.2 × 10−8; rs150326618: β =  + 1.39SD, SE = 0.25, p = 2.6 × 10−8), but not in males (rs16974493: β =  + 0.18SD, SE = 0.22, p = 0.42, pint = 0.005, MAF = 0.03; rs150326618: β =  + 0.09SD, SE = 0.24, p = 0.72, pint = 0.001, MAF = 0.02).

In addition to these sex-specific results, the sex interaction models revealed two SNPs eliciting significant evidence for sex interaction reflecting their opposite effects in males vs. females: rs1638675 (intergenic, chr. 10q25.3) and rs56194026 (approx. 2 kb upstream of ARID3B on chr. 15q24.1). For SNP rs1638675, the A allele showed a negative association with Aβ pathology in males (β =  − 0.33SD, SE = 0.07, p = 1.0 × 10−6), but positive in females (β =  + 0.18SD, SE = 0.06, p = 4.2 × 10−3, pint = 3.5 × 10−8, MAF = 0.43). In case of rs56194026, the association was sex-dependent for the injury/inflammation component (PC3) where the C allele showed a negative effect in males (β =  − 0.50SD, SE = 0.12, p = 1.6 × 10−5), but a positive effect in females (β =  + 0.41SD, SE = 0.12, p = 5.3 × 10−4, pint = 4.7 × 10−8, MAF = 0.08).

Finally, we highlight a suggestive sex difference for rs2302634 in TMEM106B. The effect of this SNP on the component capturing injury/inflammation (PC3) was approximately twice as large in males compared to females (βmale =  + 0.34SD vs βfemale =  + 0.18SD), although this difference did not attain statistical significance (p = 0.06).

Mediation analyses

Our main GWAS main analyses identified several loci showing highly significant association with the biomarker PCs defined for this study. As three of the five biomarker PCs are independently associated with diagnostic status, this raises the questions as to whether the SNP effects on PC levels also significantly impact AD development. Overall, we observed two distinct mediation patterns: (1) SNPs that affect AD either via alteration in both the Aβ pathology and tau pathology/degeneration components (APOE) or (2) SNPs that affect AD via the injury/inflammation PC only (TMEM106B and CHI3L1).

In the case of APOE, the rs429358 ε4 allele was associated with a 0.39SD higher latent AD score (SE = 0.06, p = 4.6 × 10−12). The mediation model suggests that this adverse effect can be partitioned into three pathways: (1) Approximately one third is attributable to mediation via the Aβ pathology component (PC2), (2) nearly one quarter of the SNP effects were due to mediation via the tau pathology/neurodegeneration component (PC1), while (3) the remaining almost 50% were due to pathways not represented by any of the measured biomarker PCs (Table 3).

The mediation analyses also suggested that increases in injury/inflammation scores (PC3) due to the T allele in SNP rs2302634 (TMEM106B) resulted in a significant increase of latent AD (β =  + 0.07SD, SE = 0.02, p = 1.2 × 10−5). Furthermore, we did not find evidence that TMEM106B affects AD by any other pathway, either measured or unmeasured. The positive mediation effect is consistent with the strong positive total AD risk effect for rs2302634 (+ Z = 5.38, p = 7.3 × 10−8), and overall genome-wide significant association with AD risk recently described by two GWAS [42, 46]. The local genetic correlation between injury/inflammation scores (PC3) and AD was 0.92 (95% CI 0.50–1.00, p = 0.001) and 0.98 (95% CI [0.60–1.00], p = 0.0004) based on summary statistics from the same GWASs. See Fig. 4 for a regional plot visualizing associations between TMEM106B with the biomarker PC and previously reported associations with AD.

The SNP rs7551263 in the intron of CHI3L1 was primarily associated with the component capturing non-AD inflammation (PC4). As this PC did not correlate with latent AD, we also found no evidence for mediation of AD risk via this pathway. However, rs7551263 was also nominally associated with the injury/inflammation component (PC3) (T allele: β =  + 0.21SD, SE = 0.06, p = 2.3 × 10−4) and showed evidence for mediation through this pathway (β =  + 0.05SD, SE = 0.02, p = 4.0 × 10−4). Interestingly, the T allele was negatively associated with non-AD inflammation (PC4), so our results suggest that while the T allele decreases levels of non-AD inflammation biomarker profiles this has no measurable protective effect on latent AD. At the same time, this allele significantly increases injury/inflammation profiles, which results in a significantly higher AD risk.

Comparison to rare variant results

In contrast to our previous work based on WES-derived rare variants in a subset of the EMIF-AD dataset analyzed here [3], we found no evidence for an association between the analogous CSF biomarker components and common variants in the genes previously highlighted (i.e., IFFO1, DTNB, NLRC3, SLC22A10, GABBR2, and CASZ). Similarly, combining common SNP effects in gene-based tests did not reveal any significant associations at these loci either (Table 4). Together, these results suggest that common variants (MAF ≥ 0.01) do not appreciably contribute to the rare variant association signals identified earlier by our group.

Functional characterization and related phenotypes

The fine-mapping analyses on the lead PC5 association identified three genes through which SNPs in the GRIN2D locus may affect non-AD synaptic functioning (PC5). Specifically, results of the TWAS analyses suggest that GRIN2D SNPs lower synaptic functioning (i.e., lead to higher non-AD synaptic functioning PC5 scores) by decreasing expression of ZSWIM9 (Z =  − 5.57, p = 1.3 × 10−8) and GRIN2D (Z =  − 5.03, p = 1.5 × 10−7), as well as by increasing the expression of SLC17A7 (Z =  + 4.74, p = 1.1 × 10−6). The same expression profile, i.e., a predicted decrease of GRIN2D and ZSWIM9 levels but increased SLC17A7 levels, was also associated with increased risk for AD, lower educational attainment, lower cognitive ability, and higher risk of major depressive disorder across several independent studies (Additional file 1: Table S10). The statistically most robust associations were observed for decreased levels of GRIN2D and AD, as well as increased ZSWIM9 / decreased SLC17A7 levels and educational attainment (Additional file 1: Table S10).

Finally, we used the PCA-based GWAS results as input for gene-set enrichment analyses as implemented in FUMA. While these revealed a few interesting pathways and biological processes of potential relevance to AD (e.g., “regulation of macrophage chemotaxis” for PC4 [non-AD inflammation], “negative regulation of glial cell proliferation” & “neuronal differentiation” for PC3 [injury/inflammation], and “regulation of lipid transport” for PC2 [Aβ Pathology]), we note that only one (“protein localization to ciliary transition zone” [PC2]) remained significant after multiple testing correction by FDR (Additional file 1: Table S11).

Discussion

In this work, we comprehensively explored the influence of common variants on multivariate combinations of AD CSF biomarkers representing different disease processes. In addition to confirming several previously reported GWAS loci, we identified one new region (containing GRIN2D and other plausible candidate genes) showing strong association with synaptic functioning in an elderly population. Furthermore, our results provide evidence for the presence of numerous loci with sex-specific effects.

Arguably the most interesting finding of our main GWAS analyses is the discovery of genome-wide significant gene-based association with variants in GRIN2D on chromosome 19q13.33 and the non-AD synaptic functioning component (PC5), mainly driven by Ng levels. Interestingly, SNPs in the chromosomal regions immediately flanking GRIN2D showed an even stronger association with non-AD synaptic functioning than variants within GRIN2D itself, possibly suggesting that gene expression rather than gene (dys)function may be the lead mechanism underlying this potential association, a notion that is also supported by our TWAS findings. Based on predicted gene expression by TWAS, the results suggest that the PC5-associated SNPs may lower synaptic functioning by decreasing levels of GRIN2D, but they may also affect other nearby genes ZSWIM9 (decreased expression) and SLC17A7 (increased expression).

GRIN2Dencodes the GluN2D subunit of the glutamate receptor NMDAR, which plays an important role in learning and memory [51]. While mutations in GRIN2Dhave been reported to cause epileptic encephalopathy [51], this gene’s role in AD and other traits is less clear. While some recent data suggest that GRIN2DmRNA expression is lower in the temporal cortex of AD cases according to the AMP-AD project [52], there are no strong GWAS-based association signals reported in this region of chromosome 19 and relevant cognitive traits in the GWAS catalog (https://www.ebi.ac.uk/gwas/regions/chr19:48361628-48476971), except for an association with self-reported mathematical ability. Look-up of our two lead variants in the GRIN2D region (i.e., rs275844 and rs3107911) in summary statistics of two recent AD GWAS suggest a weak association for rs275844 in only one of the GWAS (p = 0.03) [42]. Our mediation analyses did not identify significant effects on latent AD, either, suggesting no or weak association of the locus with AD risk. In contrast, the association evidence between predicted gene expression and brain-related outcomes by TWAS was very consistent. Risk for AD, low educational attainment, low cognitive ability and risk for major depressive disorder were all related to lower GRIN2D expression (and the expression of two other nearby genes).

Besides implying the GRIN2D locus in synaptic functioning, the current study also provides further insights into how other known AD loci may affect disease risk. This was made possible by our multivariate approach which allowed for the quantification and disentanglement of different mediator mechanisms. Specifically, in the mediation analyses, we observed two CSF biomarker profiles associated with AD, which are determined by two different gene sets. The first profile (PC1 and PC2) is characterized by decreased amyloid and increased tau, as well as increased Ng and YKL-40 levels, but not NfL. This component was most strongly associated with SNPs in the APOEregion, in particular the well-known AD risk variant ε4. Our data suggest that the association with this variant may increase AD risk by being a catalyst for amyloid deposition or as inhibitor of amyloid clearance, represented here by the Aβ pathology PC. The resulting amyloid aggregation is thought to cascade into several neurodegenerative processes, involving formation of tau tangles, loss of synaptic functioning and inflammation [53]. Astonishingly, our results suggest that combinations of Aβ, tau, Ng, and YKL-40 assessments are able to capture most of these neurodegenerative processes triggered by the APOE locus, as they mediated 54–79% of the APOESNP effects. However, it is important to note that the four CSF biomarkers captured by PC1 and PC2 are not sufficient to explain all genetic risk effects on AD. The second most relevant CSF biomarker pattern in that regard was the injury/inflammation component (PC3) represented by increased NfL and YKL-40 levels. PC3 levels are statistically independent of the changes in amyloid and tau levels (captured by PC1 and PC2), typically observed in AD, but associated with AD diagnostic status to a similar degree. Given prior knowledge of the non-specificity of NfL with respect to AD pathogenesis [6], we interpret this pattern to represent an independent non-AD-specific neurodegenerative pathway for dementias in general. Genetically, our results suggest that this pathway is not determined by variants in the APOE locus, but instead by variants in TMEM106B and potentially CHI3L1. TMEM106B affects neuronal loss [54] and has been convincingly associated with risk for at least two forms of dementias, i.e., fronto-temporal dementia (FTD) [54] and AD [13, 42, 55]. Here, the lead variant for TMEM106B is rs2302634, which is in perfect LD (r2= 1) with the lead variant from our recently published NfL-specific GWAS in the same two datasets (i.e., rs7797705) [13]. Both variants are in either perfect (r2 = 1) or near-perfect (r2 = 0.98) LD with the lead variant reported to be a risk factor for FTD, i.e., rs1990622, suggesting that the signal underlying PC3 (this study) and risk for FTD (previous work) are strongly correlated and may be elicited by the same underlying causal variant. CHI3L1 encodes the YKL-40 protein and we and others have previously demonstrated that common and possibly rare variants at this locus represent cispQTLs of CSF YKL-40 [3, 13]. The novel results from our current study suggest that CHI3L1 variants may be associated with increased neuronal injury and inflammation leading to a heightened AD risk. We note, however, that the associations between CHI3L1 and injury/inflammation (PC3) showed opposite effect directions from those between CHI3L1 and non-AD inflammation (PC4). Comparisons with univariate analyses of YKL-40 suggest, that the single biomarker analyses do not allow to distinguish between clinically relevant YKL-40 levels co-occurring with NfL and non-clinically relevant variation, and thus may mask these complex relationships. This may result in underestimated effect sizes and emphasizes the advantages of applying a multivariate approach. In addition, it is important to emphasize that the association signals elicited by the CHI3L1 variant for the mediation effect of PC3 on AD risk (i.e., rs7551263; Table 3) are not correlated (r2 = 0.04) with the lead CHI3L1 variants of the PC4 main effect (e.g., rs10399931; Table 2). Interestingly, both variants were highlighted as pQTL of YKL-40 in previous work from our group [13].

Additional insights resulted from the sex-stratified analyses which revealed several SNPs either showing associations in only one sex stratum, or opposite effects in males and females. As examples, we highlight two such SNPs: First, rs140169162 is located in an intron of MYO1D and showed strong association with tau pathology/neurodegeneration (PC1) with evidence for a mediation effect on latent AD, but only in the female subsample. Interestingly, SNPs in MYO1Dhas been found to have a female-specific effect on hernias, as well [56]. Despite its apparent sex-specificity previous work has nominated MYOD1as a potential drug target for AD according to predictive network analysis [57, 58]. The encoded protein, myogenic differentiation 1, is involved in myelin sheath formation [59] and both common [60] and rare [61] variants have been associated with autism, supporting MYO1D’s role in neural development and functioning.

Second, rs56194026 is located near ARID3Band was associated with injury/inflammation (PC3), with strong but opposite effects in males vs. females. The gene encodes AT-rich interaction domain 3B, a DNA-binding protein from the ARID family of proteins which are involved in embryonic patterning, cell lineage gene regulation, cell cycle control, transcriptional regulation, and possibly in chromatin structure modification [62]. Samyesudhas et al. [63] recently suggested a relevant role of this protein in AD development, as ARID3Bis expressed in response to the amyloid precursor protein intracellular domain and neuronal injury. However, it remains unclear why SNPs near this gene would have opposite effects in males and females. Possibly, this is related to the higher mean NfL levels in males, or the genes’ proposed function as regulator of sex-biased expression [64].

A major strength of our study is the application of multivariate analyses based on five CSF biomarker profiles and the estimation of mediation effects. Studying component patterns of different biomarker combinations allows to shed new light and provide new insights on how common genetic variants affect biomarkers and AD risk beyond their effects on the levels of single biomarkers. In the context of our study, PCA-based GWAS analyses showed stronger associations when compared to single biomarkers for all PC phenotypes except PC2. Further support for the multivariate approach stems from our results with YKL-40 and Ng which show different association patterns, depending on whether or not they co-vary with the levels of other biomarkers. The inclusion of the X-chromosome and examination of sex differences are additional strengths of our study. While no SNPs on the X-chromosome attained genome-wide significance, we identified several SNPs showing sex-specific effects. This highlights the importance of modeling sex interactions, especially for biomarkers with pronounced differences in mean levels.

In addition to these strengths, we note the following potential limitations. First, while our sample size is generally large for a CSF biomarker study, it is small compared to GWAS of other complex traits, including recent GWAS in the AD field [42, 46]. Second, the sample size limitation is aggravated in the sex-specific analyses and SNPs with low MAF, which need to be interpreted with caution and require further replication. Third, we only studied individuals of European descent. It remains unclear whether and to which degree our results are relevant also in non-European ancestries. It is well-known, for instance, that the APOErisk effects on AD are ancestry-dependent [65, 66]. Fourth, it is important to emphasize, that our mediation analyses are based on the assumption that the analyzed CSF biomarkers reflect pathological processes that precede and cause AD symptoms. An alternative—and often times equally plausible—interpretation is that the uncovered SNPs affect AD symptoms independently of biomarker levels and that the component associations observed here actually reflect a consequence of AD pathogenesis. While the specificity of the mediation results using certain PCs but not others generally supports the assumed causal directions, longitudinal studies, e.g., on MCI conversion, are needed to confirm the findings of this arm of our project. Fifth, we note that one of our lead signals, i.e., the association between the non-AD synaptic functioning PC (mainly driven by Ng levels) and variants in GRIN2D, only showed strong evidence for association in two out of three CSF biomarker datasets. While additional analyses (e.g., TWAS using several AD-relevant phenotypes, local heritability estimates) generally support the GRIN2D association, this signal should be considered preliminary until assessed in additional datasets of sufficient size. Sixth, our primary GWAS analyses were performed with five independent CSF biomarker phenotypes. In agreement with common practice in the GWAS field [67,68,69], we did not adjust for testing multiple traits in our study, but define genome-wide significance at the conventional threshold of 5 × 10−8. Accounting for all five traits would theoretically lead to a study-wide alpha level of 1 × 10−8, which is surpassed by all identified GWAS main effects except those at GRIN2D (for which the smallest SNP-based P-value is 5 × 10−7). Thus, the decision to refrain from adjusting for trait multiplicity does not change any of the main conclusions of our study.

Conclusions

In summary, in this first multivariate CSF biomarker GWAS, we observed at least one novel locus showing strong and convincing association with non-AD specific biomarker patterns. The results also suggest the presence of two distinct mediation pathways, by which common SNPs may affect AD risk. One pathway is related to amyloid and tau pathology and is mostly determined by APOE SNPs. The second pathway is related to increased neuronal injury and inflammation, captured by NfL and YKL-40. Genetically, this latter pathway is mostly driven by variants in TMEM106B and CHI3L1. Pathway-aware genetic studies with larger sample sizes and in more diverse ancestries are needed to further understand the complex etiology of AD and to translate genetic information to personalized medicine approaches.