Main

Genome-wide association studies (GWAS) have elucidated loci underpinning human complex traits, including autoimmune1,2 and infection-related traits3,4. However, the mechanisms by which risk alleles act remain unclear. One hypothesis is that they influence gene regulation. Mapping expression quantitative trait loci (eQTLs) links genetic variants to gene expression changes, aiding GWAS interpretation. Efforts such as GTEx5 and DICE6 have identified thousands of eQTLs using aggregated expression profiles (bulk RNA) from healthy people. However, eQTLs can have different effects in specific contexts, for example, in different cell states or physiologic conditions. Less than 10% of eQTLs found at homeostasis colocalize with GWAS loci7, suggesting that disease-associated regulation may emerge only in specific cellular contexts such as those critical for disease initiation8.

One way to understand context-dependent gene regulation is by studying reQTLs—eQTLs for which the magnitude of the effect on gene expression changes after environmental perturbation (P). reQTL studies induce a perturbation (for example, external stimulation) in a subset of samples and compare the genotype–expression associations between conditions to identify reQTLs. This approach is critical to understand how genetic variation may act in disease-specific conditions often missed in typical eQTL studies9,10,11,12. However, in practice, ex vivo perturbations affect cells heterogeneously13,14,15,16,17. Current bulk RNA-based reQTL models cannot account for heterogeneous responses across cells. Accounting for this heterogeneity in statistical models may improve power to detect reQTLs.

We and others have recently demonstrated that mapping eQTLs at single-cell resolution can uncover heterogeneous eQTL effects along continuous cell states18,19,20. However, reQTL models typically only offer a single format to represent perturbation response21,22, which neglects their potential diversity. We hypothesize that effective modeling of per-cell perturbation state using linear models may enhance power to detect response eQTLs.

Here we show that modeling eQTL effects in single cells using a continuous perturbation score increases power to detect response eQTLs compared to binary-perturbation-state models. Our goal is not to define new eQTLs, but rather to identify those that change in response to an experimental condition. It can be applied to a previously identified set of eQTLs or to discover new reQTLs. Furthermore, we show that it can help to prioritize genetic effects in disease-relevant contexts. This builds on recent advances demonstrating the potential for single-cell data in understanding eQTLs18,19,20,23.

Results

General framework

Our reQTL mapping framework is depicted in Fig. 1a and explained in detail in Methods. To define a continuous perturbation score, we used a penalized logistic regression with corrected expression principal components (PCs) (hPCs) as independent variables to predict the log odds of being in the pool of perturbed (P) cells, which serves as a surrogate of the cell’s degree of response to the experimental perturbation. Then, to find response eQTLs, we used a Poisson mixed effects model (PME) of gene expression in single cells as a function of genotype and genotype interactions with a discrete perturbation term and the independent effect of a continuous perturbation score term (Methods).

Fig. 1: Framework to identify response eQTLs at single-cell resolution.
figure 1

a, Cells from each donor are collected and divided into the NP and P conditions. The transcriptional profile of each cell is then ascertained via scRNA-seq. To identify reQTLs, we used a Poisson mixed effects model of gene expression in single cells (Methods). b, The continuous perturbation score is based on penalized logistic regression that predicts discrete perturbation state based on the corrected expression PCs. This perturbation score can be used as a surrogate of the continuous response state of each cell. The plot shows the regression coefficients of the penalized logistic regression in the IAV perturbation. c, Histogram showing the distribution of the IAV perturbation score color by the experimental condition. df, Scatter plots showing correlation between the IAV perturbation score and the expression of top three correlated genes colored by cell density. Genes (encoded proteins): ISG15 (ISG15 ubiquitin-like modifier) (d), IFI6 (interferon-alpha-inducible protein 6) (e); IFIT3 (interferon-induced protein with tetratricopeptide repeats 3) (f).

Quantifying perturbation states

We applied our framework to study the effect of ex vivo stimulation with influenza A virus (IAV), Candida albicans (CA), Pseudomonas aeruginosa (PA) and Mycobacterium tuberculosis (MTB) on gene regulation. We used transcriptional profiles from 299,879 peripheral blood mononuclear cells (PBMCs) from 89 donors24 and 475,333 PBMCs from 120 donors22 (Methods). We expected that each perturbation may have different effects on cell states, so we defined perturbation scores independently for each stimulation condition (Fig. 1b,c and Extended Data Fig. 1a–g). We discuss the effect of the number of cells on the estimation of the perturbation score and cell-type-specific perturbation scores in the Supplementary Note and Supplementary Figs. 15. Based on those analyses, we decided to use the PBMC-based perturbation score for all downstream analyses.

We found that the perturbation score reflects the transcriptional heterogeneity of the response to perturbation. For example, higher IAV score correlates with the increased expression of genes such as ISG15 (Pearson r = 0.81; Fig. 1d), IFI6 (Pearson r = 0.81; Fig. 1e), and IFIT3 (Pearson r = 0.79; Fig. 1f), and these top correlated genes were enriched in the ‘interferon-alpha response’ Molecular Signatures database (MSigDB) gene set25,26 (gene set enrichment analysis27 (GSEA) P adjusted = 0.003; Supplementary Table 1 and Extended Data Fig. 2; Methods). This is consistent with the well-characterized role of the type I interferon response during RNA–virus infection28. Details on other perturbation scores are in Fig. 2, Supplementary Table 1 and Extended Data Fig. 2. In addition, the ability of the score to capture the cellular heterogeneity of the response to perturbation is described in the Supplementary Note and Supplementary Fig. 6. We also explore the impact of low sequencing coverage and the similarity between a data-driven IAV perturbation score and a predefined interferon-alpha score in the Supplementary Note and Supplementary Figs. 7 and 8.

Mapping reQTLs

We prioritized 1,892 eQTLs for IAV, 1,863 eQTLs for CA, 1,794 eQTLs for PA and 1,355 eQTLs for MTB datasets (Extended Data Fig. 3; Methods). Next, we tested which of these eQTL effects varied in effect size after perturbation by modeling gene expression in single cells as a function of genotype (G), and the interaction with the discrete perturbation state (GxDiscrete) and the independent effect of the continuous perturbation state (GxScore) accounting for confounders and batch (Methods) using the PME model described in ref. 18. Our model also includes the top five expression PCs to control unwanted variation in gene expression such as cell-type heterogeneity; we found that the additional random effect for cell-type did not improve the results of the model (Supplementary Fig. 9). We assessed the significance of both GxDiscrete and GxScore terms compared to a null model with no interactions using a two d.f. likelihood ratio test (LRT), hereafter referred to as the 2df-model.

Fig. 2: Schematic of the perturbation score in experiments from the LLDEEP DAG2+ Project.
figure 2

a, We obtained genotype and scRNA-seq data from 120 participants in the LLDEEP DAG2+ project. A total of 112,362 cells from 104 donors that were subject to NP conditions were included in the presented analysis. Similarly, we included cells perturbed with CA, PA and MTB for 24 h. bm, Histograms showing the distribution of the perturbation score in the CA (b), PA (f) and MTB (j) perturbation experiments, and scatter plots showing the correlation of the perturbation score and the expression of top correlated genes colored by cell density in the CA (ce), PA (gi) and MTB (km) perturbation experiments. Genes (encoded proteins): STAT1 (signal transducer and activator of transcription 1) (c), GBP1 (guanylate binding protein 1) (d), GBP5 (guanylate binding protein 1) (e), SDC2 (syndecan 2) (g), MMP14 (matrix metallopeptidase 14) (h), CXCL1 (C-X-C motif chemokine ligand 1) (i), CXCL5 (C-X-C motif chemokine ligand 5) (k), CXCL1 (C-X-C motif chemokine ligand 1) (l), CXCL3 (C-X-C motif chemokine ligand 3) (m).

Using the 2df-model, we found 166 reQTLs in IAV, 770 in CA, 646 in MTB and 594 in PA (LRT Q value < 0.05; Fig. 3a–d and Supplementary Tables 25). We performed exhaustive quality control to minimize false positives (Extended Data Fig. 4 and Supplementary Fig. 10). Furthermore, we evaluated the degree of replication of previous results (Methods) and observed enrichment of significant interactions for CA (χ2 P = 2.2 × 10−178), PA (χ2 P = 1.96 × 10−137) and MTB (χ2 P = 4.62 × 10−143) (Extended Data Fig. 5). These results show that our findings are driven by the interaction (GxDiscrete and GxScore) effects.

Fig. 3: Mapping reQTLs in single cells under different perturbations.
figure 3

ad, Modeling perturbation as discrete and continuous cell states (2df-model) found more reQTLs compared to an alternative model that considers only the interaction with the discrete perturbation state (1df-discrete) in IAV (a), CA (b), MTB (c) and PA (d) perturbation experiments. Each bar represents the number of reQTLs that were significant (LRT Q < 0.05, permutation P < 0.05) in each response eQTL model. Q values correspond the following number of tests: 1,892 in IAV; 1,863 in CA; 1,794 in PA and 1,355 in MTB dataset. eh, Venn diagrams showing the overlapping of reQTLs found by each model in the full dataset. The 2df-model identified on average 89.6% of the reQTLs discovered by the 1df-discrete across the four perturbations. In addition, on average 36.9% of the reQTLs were found only by modeling the discrete and continuous perturbation state in the 2df-model. ip, Plots show the mean and s.d. from three replicates. il, Plots show the number of reQTLs that remained significant (LRT Q < 0.05) after randomly removing cells within each donor from the full dataset for each perturbation model. Median number of cells per donor in the full dataset: IAV, 2,229; CA, 2,107; PA, 1,874; MTB, 1,868. mp, Plots show the number of reQTLs that remained significant (LRT Q < 0.05) after randomly removing donors from the full dataset for each perturbation model. Median number of cells per donor in the full dataset: IAV, 2,229; CA, 2,107; PA, 1,874; MTB, 1,868. Total number of donors in the full dataset: IAV, 89; CA, 120; PA, 120; MTB, 120.

We found that reQTLs identified in PBMCs captured most reQTLs present in principal cell types (Supplementary Fig. 11). Furthermore, our framework can be applied to a cell-type specific analysis (Methods), enabling discovery of very specific reQTLs (Supplementary Tables 69). However, a limited number of cells per cell-type in these datasets compromises power. Some examples include the increase in the MX1 eQTL effect of rs461981 in CD4+ T cells after IAV perturbation and the increase in the SAR1A eQTL effect of rs15801 in CD8+ T cells after CA perturbation.

Comparison of power to detect response eQTL effects

Our approach to discover reQTLs had two main features: we (1) use a single-cell model and (2) represent the perturbation state as a continuous variable. The 2df-model detected more reQTLs compared to a traditional approach that aggregates single-cell gene expression by condition and donor into a pseudobulk profile (Extended Data Fig. 6; Methods). To address the effect of the perturbation score on our power to detect reQTLs, we compared the 2df-model to an alternative single-cell PME model that includes only the binary GxDiscrete interaction term (1df-discrete) (Methods). We found that the 2df-model still identified, on average, 89.6% of the reQTLs detected by the 1df-discrete model (Fig. 3e–h), suggesting that little power was lost despite the statistical penalty of estimating an additional interaction term (GxScore) in comparison with the 1df-discrete model. Furthermore, on average, 36.9% of the reQTLs detected with the 2df-model were missed by the 1df-discrete model (Fig. 3e–h), and the 2df-model outperformed it as response heterogeneity increased (Supplementary Fig. 12).

Next, we tested whether these differences in power remained consistent under varying sample sizes (Methods) by randomly downsampling cells per donor (Fig. 4i–l) and donors (Fig. 4m–p). We compared the power difference between the 2df-model and the 1df-discrete. In all scenarios, the 2df-model consistently identified more reQTLs. Also, we observed a linear power loss by reducing donors and cells, suggesting that future studies with larger sample sizes may be able to detect new reQTL effects using the 2df-model. We found that the 2df-model has similar performance in detecting reQTLs as ‘CellRegMap’19 (CRM) (Methods), but the results from the 2df-model are more interpretable and it is substantially more computationally efficient (Supplementary Fig. 13).

Fig. 4: Estimating the change in the eQTL effect after perturbation in single cells.
figure 4

ah, Substantial changes in the magnitude of the eQTL effect of rs11867191 for SLFN5 (ad) and the eQTL rs12819511 for KRR1 after IAV perturbation (eh). ip, Subtle changes in the eQTL effects of rs1464264 for ARL4C (il) and the eQTL rs10194534 for SESTD1 (mp) after IAV perturbation. reQTL effects for SLFN5, KRR1, ARL4C and SESTD1 were significant in the 2df-model (LRT Q < 0.05). Forest plots (a,e,i,m) show the effect size and 95% confidence interval (CI) of each of the terms that contribute to the composite eQTL (total eQTL effect). The effect sizes were calculated using the 2df-model applied to PBMCs. Scatter plots (b,f,j,n) show the total eQTL (Total β) in a cell. Each dot represents a cell (n = 199,879). The color corresponds to NP and P conditions. The dashed lines on the x axis indicate the perturbation score at the 50th percentile. The dashed lines on the y axis indicate the total eQTL effect at the 50th percentile. The histogram below the scatter plot shows the density of the perturbation score color by NP and P conditions. For all boxplots (c,d,g,h,k,l,o,p), each point represents the average log2(UMI count + 1) across all cells around the 50th percentiles on the NP and P in a donor (n = 89), grouped by genotype. Boxplot subheadings show the average Total β and the 95% CI for cells around the 50th percentiles. Boxplots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the interquartile range (IQR) (or minimum and maximum values if they fall within that range; end of whiskers). LRT P corresponds to the ANOVA LRT P value. G, genotype term; GxDiscrete, interaction term between G and perturbation; GxScore, interaction term between G and the perturbation score.

reQTLs with strong and subtle perturbation effects

We observed eQTLs with a substantial change in the magnitude of effect upon perturbation. For example, we found that the main genotype effect of rs11867191 on SLFN5 and the additional interaction effects (Fig. 4a) resulted in a substantial change in the total eQTL effect (Methods) after IAV perturbation (Fig. 4b; 2df-modelPBMC LRT P = 1.15 × 10−173). To confirm the change in the SLFN5 eQTL at the donor level, we selected cells between the 45th and 55th percentile (50p) of the perturbation score distribution from within P and nonperturbed (NP) states, respectively (Methods). The eQTL effect in the NP state (NP-50p eQTLPBMC = −0.07, LRT P = 4.34 × 10−3; Fig. 4c) was magnified upon perturbation (P-50p eQTLPBMC = −0.32, LRT P = 1.39 × 10−21; Fig. 4d). A similar example that replicated across the four perturbation models was the reQTL for KRR1 (Fig. 4e–h and Extended Data Fig. 7a–i).

In addition, our model also detected subtle changes in reQTL effects, such as the reQTL rs1464264 for the gene ARL4C (IAV 2df-modelPBMC LRT P = 1.85 × 10−8) with both GxDiscrete and GxScore contributing to the attenuation of the total eQTL effect (Fig. 4i,j). This reQTL was still significant in the 1df-discrete model (1df-discretePBMC LRT P = 1.11 × 10−4); however, it was not detected by the Pseudobulk-linear mixed effect (LME) model (Pseudobulk LMEPBMC LRT P = 2.07 × 10−1). A similar example was found in the reQTL rs10194534 for the gene SESTD1, where the GxScorePBMC term was the most significant interaction effect in the 2df-model (LRT P = 2.66 × 10−3; Fig. 4m), whereas this reQTL was not significant in the 1df-discrete model (1df-discretePBMC LRT P = 6.36 × 10−2). We confirmed these results by looking at the eQTL effect of these genes in the NP-50p and P-50p state (ARLC4, Fig. 4k,l; SESTD1, Fig. 4o,p). Our findings for KRR1, ARL4C and SESTD1 are consistent with previous studies that identified reQTL effects in perturbation experiments that induced an interferon response11,12.

To highlight the significance of our results in terms of the potential to discover new reQTLs, we compared our results from the IAV dataset with a previous bulk study that detected reQTLs after IAV stimulation29 (Methods). Despite having a smaller sample size, a total of 33 reQTLs were detected only in our analysis (Supplementary Table 10, Supplementary Note and Supplementary Figs. 14 and 15)

reQTLs identify genes with perturbation-specific regulation

An eQTL may not colocalize with a trait-associated locus in part because the eQTL effects are typically not ascertained in the relevant context, for example, perturbed states reQTLs may be more likely to colocalize compared to an eQTL that has no change upon perturbation. We performed a colocalization analysis with nine immune and four nonimmune traits (coloc.abf, Pr > 50%; Methods). As expected, reQTLs were enriched for colocalizations in all perturbation experiments except MTB, and the 2df-model detected more reQTLs without decreasing the enrichment for colocalization (Extended Data Fig. 8). Additional examples of reQTLs detected only by the 2df-model and previously implicated with traits such as lymphocyte counts30 and hematocrit levels31 are shown in Extended Data Fig. 9.

In some cases, the reQTLs detected only in the 2df-model pointed to perturbation-specific disease relevance. One such example was the reQTL effect of rs11721168 for PXK in the IAV experiment (Fig. 5a). This eQTL effect decreases after perturbation (Fig. 5b). We confirmed that the eQTL effect was detected only in cells that score lower in the NP state (Fig. 5c,d). However, the cell-type specific analysis did not detect this reQTL (Supplementary Tables 1114). In contrast, Alasoo et al. reported that the eQTL effect of rs11721168 for PXK decreased from −0.14 in NP macrophages to −0.09 in macrophages after interferon gamma perturbation32. This discrepancy may reflect differences in stimulation or cell states, or the limited power in our cell-type-specific analyses because of fewer cells.

Fig. 5: reQTLs can have state-specific colocalization.
figure 5

The reQTL rs11721168 for PXK in the IAV perturbation experiment had perturbation-specific colocalization with SLE. a, Forest plot showing the effect size and 95% CI of the main genotype (G) and interaction effects (GxScore, GxDiscrete) from the 2df-model in 199,879 PBMCs. reQTL effects for PXK were significant in the 2df-model (LRT Q < 0.05). b, Scatter plots showing the total eQTL (Total β) in a cell. Each dot represents a cell (n = 199,879). The color corresponds to the NP and P conditions. The dashed lines on the x axis indicate the perturbation score at the 50th percentile. The dashed line on the y axis indicates the total eQTL effect at the 50th percentile. For all boxplots, each point represents the average log2(UMI count + 1) across all cells in a donor (n = 89), grouped by genotype. c, Boxplots show the average Total β and the 95% CI for cells between the 20th to 30th percentiles within the NP and the 95% CI. d,e, Boxplots subheadings show the average Total β and the 95% CI for cells around the 50 percentiles on the NP (d) and P (e) and the 95% CI. Boxplots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the IQR (or minimum and maximum values if they fall within that range; end of whiskers). f, Probability of colocalization (coloc.abf PPH4) of eQTL signal in PXK with GWAS for SLE at NP and P discrete state. g, Probability of colocalization of eQTL signal in PXK with GWAS for SLE at tertiles of the perturbation score in the NP (NPT1–NPT3) and P (PT1–PT3) cells.

We also found that the eQTL signal for PXK in the NP cells colocalized with a GWAS locus for systemic lupus erythematosus (SLE)33 (Fig. 5f). When we performed the colocalization at increasing tertiles of the perturbation score within the NP state and P state (Methods), we found that the eQTL signal for PXK colocalized in the lowest perturbation tertile of the NP cells (Fig. 5f). PXK encodes a sorting nexin protein important for receptor internalization and organelle trafficking, including endosomal trafficking34, and it has been implicated previously in SLE risk2,35. Our results show that genetic regulation of PXK is particularly strong in cells with the lowest perturbation scores. It also suggests that these may be the most relevant cell states through which PXK may be mediating risk for SLE. A similar perturbation-specific reQTL rs3807865 for TMEM106B (Extended Data Fig. 10) showed consistent reQTL effects in refs. 12,29, both in monocytes. However, those studies were able to detect eQTL effects in resting cells, perhaps related to increased sensitivity to detect eQTLs in these studies due to the large number of cells in bulk data or other technical factors.

Response eQTLs can have cell-type-specific effects

Previous studies demonstrated that eQTLs can have cell-type-dependent effects36. Since our reQTL analysis was performed on heterogeneous PBMC samples, we explored whether reQTL effects differed between cell types (for example, the magnitude of the interaction effects in monocytes versus the magnitude of interaction effects in CD4+ T cells) exposed to the same perturbation, such as IAV. To evaluate this, we applied the 2df-model separately for each cell type and tested for heterogeneity using Cochran’s Q test (Methods). Most reQTLs have similar effects across cell types; however, around 25% of reQTLs showed evidence of heterogeneous response eQTL effects (Q value < 0.10) across the four perturbation models (Supplementary Figs. 16 and 17).

For example, we observed that the reQTL rs10774671 for the gene OAS1 had a cell-type-specific effect in the IAV experiment (Supplementary Table 15). This eQTL has no effect in NP PBMCs but has a significant effect after IAV perturbation (Supplementary Table 3). In our analysis of the IAV dataset, we observed stronger interaction effects for this eQTL among CD4+ T (Fig. 6a) and B cells (Supplementary Fig. 18a–c). We also observed these differences at the donor level, where we found that the eQTL effect in the P-50p of CD4+ T (Fig. 6b,c) and B cells were stronger than the eQTL effect at P-50p in monocytes (Fig. 6e,f). Furthermore, we found that this OAS1 eQTL had state-specific colocalization with the GWAS locus for severe COVID-19 (Supplementary Fig. 19a,b). This result is consistent with previous studies that report association of rs10774671 with levels of OAS1 and risk of severe COVID-19 (ref. 13). Previous studies did not detect this reQTL after IAV perturbation in monocytes29 or dendritic cells11. More recently, Kumasaka et al. identified the eQTL rs10774671 for OAS1 in PBMCs post-COVID-19 infection, with a stronger reQTL effect in CD16+ monocytes20. The differing cell-type-specific effects may arise from study design differences—ex vivo IAV perturbation versus natural infection. It is worth mentioning that detecting reQTLs with heterogeneous effects may be underpowered for lowly abundant cell types. Also, our framework does not distinguish whether the perturbation effect in each main cell type is driven by direct viral infection or by bystander activation. In contrast to previous studies29,32,37,38, we did not detect any reQTLs of rs10774671 for the nearby OAS3 gene in any of the datasets analyzed, possibly related to different experimental designs and the low expression of this gene.

Fig. 6: Response eQTLs with cell-type-specific effects after IAV perturbation.
figure 6

af, The eQTL rs10774671 for OAS1 was magnified after perturbation in all cell types; however, the magnitude of the effect was stronger in CD4 T cells (ac) and more subtle in other cell types like monocytes (df). gl, Furthermore, we found that the reQTL rs1117139 for RPS26 had cell-type-specific effects. After perturbation, the eQTL effect was magnified in monocytes (gi); however, the eQTL effect decreased in NK cells (jl). Scatter plots show the total eQTL effect in a cell for CD4 T cells (n = 121,935) (a), monocytes (n = 20,634) (g) and NK cells (n = 10,255) (j). Each dot represents a cell (n = 199,879). The color corresponds to the NP and P conditions. The dashed lines on the x axis indicate the perturbation score at the 50th percentile. The dashed line on the y axis indicates the total eQTL effect at the 50th percentile. For all boxplots, each point represents the average log2(UMI count + 1) across all cells around the 50 percentiles on the NP and P in a donor (n = 89), grouped by genotype. Boxplot subheadings show the average Total β and the 95% CI for cells around the 50th percentile. Boxplots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the IQR (or minimum and maximum values if they fall within that range; end of whiskers). In c and f, the area delimited by the dashed box was magnified.

We also observed the reQTL rs11171739 for the gene RPS26 in the IAV experiment (Supplementary Table 2). This reQTL has cell-type-specific interaction effects (Supplementary Table 15). More specifically, we found that there was a significant magnification in the total eQTL effect after perturbation in monocytes (Fig. 6g) and B cells (Supplementary Fig. 19j–l); however, the change in the eQTL effect was not only smaller in natural killer (NK) (Fig. 6j) and CD4+ T cells (Supplementary Fig. 19m–o) but strikingly also in the opposite direction. Furthermore, there was no significant reQTL effect in CD8+ T cells (Supplementary Fig. 19p–r). We also observed these differences in gene regulation in the context of perturbation when we estimated the eQTL effect on NP-50p and P-50p states on monocytes (Fig. 6h,i), NK cells (Fig. 6k,l) and the other cell types (Supplementary Fig. 19). Similar results for the reQTL effect in RPS26 were found in the CA, PA and MTB perturbation experiments (Supplementary Figs. 2022). Previous studies suggest that rs11171739 has pleiotropic effects on autoimmune diseases39 and increases rheumatoid arthritis (RA) risk1. Our results show that genetic regulation of RPS26 depends on cell perturbation and cell-type, both of which may influence the causal relationship with diseases such as RA.

Discussion

Recent studies have begun to explore reQTLs in single-cell data, but current linear models are limited to comparing discrete P versus NP conditions, reducing power. Here we introduce a new framework that incorporates the level at which each cell responds to the perturbation to model reQTL effects across continuous perturbation states such as interferon-alpha, complement activation and inflammation. We demonstrate that this framework substantially improves reQTL detection and is applicable to heterogeneous samples such as PBMCs, enabling comparisons across cell types.

Limited colocalization between eQTLs and GWAS loci suggested that disease alleles may affect gene regulation in specific cellular contexts not captured by aggregated eQTL profiles. We observed an enrichment of the colocalization of reQTLs with traits, which highlights the importance of improving the identification of reQTLs. Consistent with previous studies9,12, reQTLs can have perturbation-dependent colocalization. Notably, we found that, even among cells with the same discrete perturbation state, the strength of colocalization of the reQTL and a trait can depend on the continuous perturbation state of the cell. This highlights the value of identifying highly responsive cellular states to uncover genetic mechanisms of disease.

Current single-cell datasets are limited in the number of cells and people assayed compared to the much larger bulk RNA sequencing (RNA-seq) datasets. However, as genotyped single-cell datasets become larger, single-cell eQTL methods—such as those we present in this study—have the potential to better elucidate context-dependent eQTL effects, including reQTLs. Although we focus on PBMCs, the methods we propose are in no way restricted to this specific tissue. In fact, we envision that this approach could be extended easily to sorted populations from PBMCs. Indeed, we think that it may be particularly useful in exploring single-cell organoid data40,41,42,43 or solid tissue data from patients43, where cell sorting and bulk RNA-seq may confound important signals44,45,46,47.

Our work has important limitations. First, our PCA-derived perturbation scores may not generalize across datasets, even with the same perturbation condition. Second, the perturbation score presented here is based on a log-linear model, which may miss nonlinear, cell-type-specific programs. Third, we previously demonstrated the benefits of using protein expression in addition to gene expression to model continuous cell states48,49. Here we model the continuous perturbation state based only on single-cell RNA-seq (scRNA-seq) data. This limitation is due to the lack of publicly available multiomic (scRNA-seq, CITE-seq (cellular indexing of transcriptomes and epitopes)) data for reQTL analysis. However, our framework should be applicable to the use of multiomic data.

Overall, our work provides a generalizable approach for identifying perturbation-dependent eQTLs and underscores the importance of modeling cell-level variation. Although our framework uses the perturbation score based on corrected expression PCs, our reQTL model could incorporate alternative scores such as those presented in ref. 50 and ref. 51. Combining statistical models such as the one presented here with biologically relevant perturbations will ultimately advance our understanding of the biological mechanisms that underlie disease heritability.

Methods

Cohorts

We used published data (GEO accession no. GSE162632) to study the effect of 12 h of ex vivo stimulation with either IAV or mock (control) in 90 samples of heterogeneous cell composition (PBMCs) using scRNA-seq. Similarly, we obtained data (EGA accession no. EGAS00001005376) of 120 PBMC samples infected for 24 h with CA, PA, MTB or mock (control) from the European Genome-Phenome Archive under a Lifelines DEEP DAG2+ Project (LLDEEP DAG2+) data access agreement.

Genotype quality control

The original Randolph et al. study24 (IAV perturbation) used 4× low-pass whole-genome sequencing to call genotypes of 89 people—a total of 78,111,311 genome-wide variants. Genotypes with a posterior probability < 0.9 were considered as missing data. We removed variants with a missing call rate >10% and only included variants with a minor allele frequency ≥5% and Hardy-Weinberg equilibrium P > 1 × 10−5. In total, for the IAV perturbation dataset we had 5,194,816 genetic variants genome-wide that passed quality control (QC), and these variants were used in the eQTL mapping. Furthermore, we selected 508,883 independent variants using PLINK52 (–indep-pairwise 50 5 0.2), and then we performed PC analysis (PCA) using the R package SNPRelate53.

The original Oelen et al. study22 (CA, PA and MTB perturbations) used genotype data from the HumanCytoSNP-12 BeadChip and imputed with the Haplotype Reference Consortium reference panel of 120 people at a total of 7,249,882 genome-wide variants mapped on the Hg19 reference genome. For the main analysis we included variants with missing call rate <10%, minor allele frequency ≥5% and Hardy-Weinberg equilibrium P > 1 × 10−5. The variants that passed the filtering step were then lifted over to the hg38 reference genome using Picard tools. In total, for this dataset, we had 3,962,615 variants after QC, which were used in the eQTL mapping. We selected 292,681 independent variants using PLINK52 (–indep-pairwise 50 5 0.2), and then we performed PCA using SNPRelate53. For the secondary analysis, we selected the 3,554 variants included in the Supplementary Table 9 of Oelen et al.22 and lifted over 2,347 variants from the hg19 to hg38 reference genome.

Single-cell mRNA quality control

We performed quality control of the 236,993 PBMCs from the 12-h perturbation experiment with IAV. These cells had <20% reads that mapped to mitochondrial genes. We removed ten influenza IAV genes from the original 19,248 genes in the count matrix. We selected a total of 208,223 cells with >500 expressed genes for the analysis. For the 24-h perturbations with CA, PA and MTB, there were, in total, 493,061 cells and 33,538 genes measured. As in the IAV perturbation study, we selected 475,333 cells with <20% reads mapping to mitochondrial genes and >500 detected genes.

Then, for each perturbation condition (NP or P) in each study, we normalized the counts for each cell, scaled each gene’s expression, and cosine normalized. We selected the 3,000 most variable genes based on the variance stabilizing transformation and conducted PCA using an implicitly restarted Lanczos method implemented in the irlba R package. Then, we controlled the effects of batch and sample donor using Harmony54 (v.0.1.0) (parameters: θdonor = 1, θbatch = 1). The top 20 hPCs were used for downstream analysis. For cell-type annotations, we used the originally published cluster labels.

Estimating the continuous perturbation state of a cell

In contrast to traditional models that consider only the discrete perturbation state imposed by the experimental design, we define a continuous score to quantify the degree of perturbation response of each cell. The underlying assumption is that perturbation may induce changes in the transcriptional profile of a cell that are proportional to the degree of response of the cell. Similarly, experimental conditions may induce changes in the transcriptional profile of NP cells55,56,57,58,59. To systematically identify axes of variation in the transcriptional profile of P and NP cells, we use PCA and correct technical or biological effects with Harmony to obtain hPCs. We assume that one or several hPCs may capture the effect of perturbation in a cell.

To define a continuous perturbation score, we used a penalized logistic regression with hPCs as independent variables to predict the log odds of being in the pool of P cells, which serves as a surrogate of the cell’s degree of response to the experimental perturbation. Thus, for each perturbation model, we developed a continuous perturbation score of a cell by modeling the discrete perturbation state of a cell (i) (NP = 0, P = 1) as a function of the top 20 hPCs in a logistic Lasso regression model implemented in the R package glmnet60. We fitted the model and used the LASSO penalty term with a tuning parameter (λ) obtained from cross validation (cv.glmnet parameters: alpha = 1, lambda.1se)

$$\log \left(\frac{{\pi }_{i}}{1-{\pi }_{i}}\right)={\theta }_{i}+{\beta }_{\rm{PC}_{1}}{X}_{i,\rm{PC}_{1}}+\cdots +{\beta }_{\rm{PC}_{\textit{K}}}{X}_{i,\rm{PC}_{\textit{K}}}$$
(1)

where \(K\) is the number of PCs and πi is the probability of the cell (i) to be in the pool of P cells.

We use this approach instead of focusing on predefined gene sets such as the interferon-stimulated gene pathway for viral perturbations, which may not accurately represent the changes on the transcriptome in specific tissues, cells or perturbation experiments.

Cell-type-specific perturbation score

For each perturbation experiment, cell-type-specific perturbation scores for each main cell-type presented in PBMCs were generated by subsetting the dataset by cell-type and then performing the same steps as described before: gene normalization, scale, cosine normalization, selection of the 3,000 most variable genes (variance stabilizing transformation), PCA, harmony and Lasso regression using the top 20 corrected expression hPCs as predictors in the logistic Lasso regression model.

Correspondence of the perturbation score with canonical protein marker of response to perturbation

We evaluate the ability of the perturbation score generated using the penalized logistic regression to capture the response to perturbation. We re-analyzed a dataset of PBMCs perturbed with either anti CD3 and anti CD28 and LPS, which profiled cells using scRNA-seq as well as surface protein markers (CITE-seq)61. Thus, for each condition we calculated the perturbation score using the scRNA-seq data as described before. Then, we evaluate the Pearson correlation between the perturbation score of a cell and the expression of surface protein markers of activation.

Gene pathway enrichment analysis

For each perturbation experiment, we calculated the Pearson correlation between the 3,000 most variable genes and the perturbation score of a cell. We selected genes with a correlation Q < 0.05 (3,000 tests). Then, we used these correlation values to rank the genes and performed a gene pathway enrichment analysis using the fgsea R package27 with a maximum gene set size of 500, a minimum size of 15, and 100,000 permutations. We include the following gene sets from the Pathway Interaction Database stored in MSigDB: Hallmark gene set and large collection of immunological conditions (C7).

Prioritizing eQTLs in PBMCs bulk expression profiles

For each individual participant and perturbation condition, we summed the gene expression counts for each gene across all cells to generate bulk profiles, then we averaged the gene expression from each perturbation condition by individual participant, resulting in the average bulk expression profiles. This allows us to use a traditional eQTL linear regression model with one sample per donor to prioritize eQTLs. For each perturbation experiment, we generated bulk expression profiles for the NP, P and average conditions. We normalized the expression profiles (log2 count per million + 1) and applied an inverse rank normal transformation (qnorm((rank(gene, na.last = ‘keep’) − 0.5)/sum(!is.na(gene))). Then, we removed the effect of age, sex if available, top five genotype PCs and 15 latent factors using PEER factor analysis62. We only included transcripts expressed in at least 50% of the samples. The corrected gene expression was used as the dependent variable in our bulk cis-eQTL mapping. We performed the cis-eQTL mapping by testing each gene with genetic variants within 1 Mb of each gene’s transcription starting site using a linear regression implemented in FastQTL63. We use 1,000 permutations per locus and select the gene–SNP pairs (eGenes) with a Q value < 0.05. (Number of tests in IAV, 48,016,811; CA, 46,012,957; PA, 46,012,957; MTB, 44,896,173). In most cases, we prioritized the eQTL from the averaged NP and P profile as this produced the largest number of eQTLs (Q value < 0.10) (Supplementary Fig. 3). Standard approaches to fit the PME model are computationally expensive and impractical for the systematic genome-wide identification of eGenes in single-cell experiments.

Identifying response eQTL effects

For the single-cell approaches for both PBMCs or cell-type-specific analysis, we modeled the raw expression (E) counts in a cell (i) as a function of genotype allele dosage and an interaction with cell state accounting for confounders and technical batch using a described PME model18. For all the analyses, we included age and sex as fixed effects where available. We also controlled for genetic population stratification by including the top five genotype PCs (gPC) as covariates. Similarly, we controlled for unwanted variation in gene expression by including the top five gene expression PCs (ePC). In addition, we included the number of unique molecular identifier (UMIs) and percentage of reads mapping to mitochondrial genes (MT%) as fixed effects, and donor (d) and technical batch (b) as random effects. We include two cell state variables, the discrete perturbation state (Discrete: where P = 1 for perturbation or 0 for no perturbation) and the residual perturbation score (rScore) (3) that avoid the collinearity between the Discrete variable and the perturbation score (Score) from Eq. 1. To estimate the reQTL effect, we fit a model that included the genotype main effect (G) and the interaction terms of G with the Discrete (GxDiscrete) and rScore (GxrScore) variables. In contrast, we only included the interaction term GxDiscrete in the simplified 1df-discrete model. For the cell-type-specific reQTL analysis, we subset the dataset by each principal cell-type and used the cell-type-specific score and applied the same model (2) as described above.

$$\begin{array}{l}\log \left({E}_{i}\right)\;=\;\theta +{\beta }_{G}{X}_{d,{\rm{geno}}}+{\beta }_{{\rm{age}}}{X}_{d,{\rm{age}}}+{\beta }_{{\rm{nUMI}}}{X}_{i,{\rm{nUMI}}}+{\beta }_{{\rm{MT}}}{X}_{i,{\rm{MT}}}\\\qquad\qquad\;+{\beta }_{{\rm{Discrete}}}{X}_{i,{\rm{Discrete}}}+{\beta }_{{\rm{rScore}}}{X}_{i,{\rm{rScore}}}+{\beta }_{\rm{Gx}{\rm{rScore}}}{X}_{d,{\rm{geno}}}{X}_{i,{\rm{rScore}}}\\\qquad\quad\quad\;\,+{\beta }_{\rm{Gx}{\rm{Discrete}}}{X}_{d,{\rm{geno}}}{X}_{i,{\rm{Discrete}}}+\mathop{\sum}\limits_{k=1}^{5}{\beta }_{{g\rm{PC}}_{k}}{X}_{d,{g\rm{PC}}_{k}}\\\qquad\qquad\;+\mathop{\sum}\limits_{k=1}^{5}{\beta }_{{e\rm{PC}}_{k}}{X}_{i,{e\rm{PC}}_{k}}+\left(d\right)+({k}_{b}|b),\end{array}$$
(2)
$${\rm{Score}}=\theta +{\beta }_{{\rm{Discrete}}}{X}_{i,{\rm{Discrete}}}+{r\;\rm{Score}}.$$
(3)

To map reQTLs in bulk profiles (i) (Pseudobulk LME) we modeled the residuals from the PEER factor analysis (rPFA) as a function of the genotype and the GxDiscrete term accounting for donor’s age, sex and both the total numbers of UMIs and mean MT% across all cells per donor using an LME model. We included donor as a random effect in the LME model (equation (4)).

$$\begin{array}{l}{{\rm{rPFA}}}_{i}\;=\;\theta +{\beta }_{G}{X}_{d,{\rm{geno}}}+{\beta }_{{\rm{age}}}{X}_{d,{\rm{age}}}+{\beta}_{{\rm{nUMI}}}{X}_{i,{\rm{nUMI}}}+{\beta }_{{\rm{MT}}}{X}_{i,{\rm{MT}}}\\\qquad\qquad+{\beta }_{{\rm{Discrete}}}{X}_{i,{\rm{Discrete}}}+{\beta}_{{\rm{GxDiscrete}}}{X}_{d,{\rm{geno}}}{X}_{i,{\rm{Discrete}}}+\left(d\right)+\varepsilon .\end{array}$$
(4)

We compared the significance of each of these full models against a null model with no interaction terms using the likelihood ratio test and considered significant reQTLs to be those with an LRT Q value < 0.05. Furthermore, to rule out the possibility of spurious significant results, we performed 500 permutations per each discovered reQTL. For the permutation in the single-cell models, we shuffled values of the perturbation score and the discrete perturbation state randomly within each donor. For the pseudobulk LME model, we shuffled values of the perturbation state randomly across samples.

To exclude spurious associations, for each gene we counted the proportion of LRT P values from the permutation experiments that were smaller than the discovery LRT P (permutation P) and removed genes with permutation P < 0.05. To exclude genes where the PME model may not be appropriate due to high differential gene expression across cell states, we fit a PME model where we simulated null conditions by shuffling the perturbation score value randomly within each donor. We tested the significance of the interaction term under this null by comparing this model with a model with no GxScore term using the LRT ANOVA. For each eGene, we performed 100 permutations and removed eGenes where we observed a significant P value (LRT P < 0.01) in more than five out of 100 permutations.

In addition, we performed a secondary analysis for the CA, PA and MTB perturbation experiments based on the 1,825 eGenes prioritized by ref. 22 and listed in Supplementary Table 9 and passing QC in our preprocessing. For each perturbation experiment, we estimated reQTL effects for this set of 1,825 eQTLs using each of the models described above (equations (1) and (3)) and compared the significance against a null model with no interactions terms using the LRT and considered significant those with LRT Q value < 0.05. Furthermore, for each perturbation experiment, we generated P values under the uniform distribution based on the number of eQTLs included in our secondary analysis. Then, we calculated the enrichment of significant reQTLs in the 2df-model by comparing the number of tests where we found 2df-model LRT P < 0.001, >0.001 and ≤0.01, >0.01 and ≤0.05, >0.05 and ≤0.1, and >0.1 with those found under the uniform distribution using a χ2 test.

Estimating the eQTL effect at levels of the perturbation score

For any given reQTL, we selected cells either in the NP or P condition and calculated the 45th and 55th percentile of the perturbation score in that group. Then, we selected the cells with perturbation score values within the 45th to 55th percentile (50p) and estimated the eQTL effect in those cells by fitting a single-cell PME, where we modeled the raw gene counts (E) in a cell (i) as a function of the genotype (G) and the same cell and donor level fix and random covariates as described above (equation (2)) but without interaction terms (equation (5)). We consider the significance of the G term by comparing it with a null model with no G term using the LRT with 1 d.f.

$$\begin{array}{l}\log \left({E}_{i}\right)\;=\;\theta +{\beta }_{G}{X}_{d,{\rm{geno}}}+{\beta }_{{\rm{age}}}{X}_{d,{\rm{age}}}+{\beta }_{{\rm{nUMI}}}{X}_{i,{\rm{nUMI}}}+{\beta }_{{\rm{MT}}}{X}_{i,{\rm{MT}}}\\\qquad\qquad\quad+\mathop{\sum}\limits_{k=1}^{5}{\beta}_{{g\rm{PC}}_{k}}{X}_{d,{g\rm{PC}}_{k}}+\mathop{\sum}\limits_{k=1}^{5}{\beta }_{{e\rm{PC}}_{k}}{X}_{i,{e\rm{PC}}_{k}}+\left(d\right)+({k}_{b}|b).\end{array}$$
(5)

To visualize the eQTL effect at the donor level, we aggregated the gene counts in the cells in the 50p by donor and then normalized the gene expression (log2 counts per million + 1). Then, we plotted the gene expression as a function of the eQTL’s allele dosage.

Delta betas between PBMC and cell-type-specific reQTL analysis

First, based on the PBMCs analysis and for each main cell-type (that is, TCD4+, monocytes), we calculated the difference in the total eQTL between the 50th percentile in the P state versus the 50th percentile in the NP state (Delta 1). Furthermore, based on the cell-type-specific analysis (TCD4+ and TCD4+ perturbation score), we calculated the difference in the total eQTL between the 50th percentile in the P state versus the 50th percentile in the NP state (Delta 2). We only included eGenes with LRT Q value < 0.05 in the cell-type-specific analysis. Then, for each main cell-type we calculated the Pearson correlation between Delta 1 and Delta 2.

Composite eQTL effect in a cell

We defined the total eQTL in a cell (i) as the sum of the genotype main effect (G) with the discrete interaction effect (GxDiscrete) and the interaction effect with the residual of the perturbation score (GxrScore), weighted by its corresponding value (Discrete, NP = 0, P = 1; Score, continuous perturbation score) (Eq. 6).

$${{\beta }_{{\rm{total}}}=\beta }_{G}+{\beta }_{{\rm{GxDiscrete}}}{X}_{i}+{\beta }_{{\rm{GxrScore}}}{X}_{i}.$$
(6)

Comparison of the IAV perturbation score and an interferon-alpha score

As an alternative approach to quantify the response to perturbation, we developed a per cell interferon-alpha score based on a predefined gene set (MSigDB: Hallmark interferon-alpha response (INF-alpha))64. The selected genes (INF-alpha) were normalized, then we removed genes expressed in <1% of the cells and then scaled the expression of the gene (mean = 0, variance = 1). Then, for each cell, we averaged the expression across genes. To compare reQTL detected by each score, we ran a reduced single-cell reQTL model where we tested only for the significance of the interaction effect between the genotype and the score (GxScore). We then evaluated the number of significant reQTL (LRT Q value < 0.05) as well as the standardized effect size of the main eQTL effect (G) and interaction effects (GxScore) between the two approaches.

Comparison of Poisson mixed effect and negative binomial models

To confirm that the single-cell PME model was well calibrated, we fit a single-cell reQTL model where we modeled the raw expression counts in single cells as a function of genotype allele dosage and interaction terms equivalent to the PME 2df-model described in Eq. 1, using a negative binomial model implemented in the glmmTMB65 R package. We calculated the correlation between the LRT P value from the original PME and the negative binomial model.

Downsampling donors and cells

To test the effect of sample size on the power to detect reQTLs in the single-cell models, we generated datasets where we removed cells or removed donors randomly. We generated five datasets and, in each, we removed either 10, 20, 30, 40 or 50% of cells randomly within each donor. For each dataset, we generated three replicates by using different random seeds. Similarly, we generated five datasets with three replicates each where for each set we randomly removed either 10, 20, 30, 40 or 50% of donors. Then, in each downsampled dataset, we estimated reQTL effects with the PME model by testing the originally prioritized eGenes and counting the number of reQTLs that reached an LRT Q value < 0.05.

Power to detect reQTL at different degrees of perturbation

For each of the perturbation experiments (IAV, CA, PA and MTB), we generated three new datasets with different degrees of perturbation by sampling cells from the P condition (that is, cells exposed to IAV) at different ranges of the score (30th to 70th, 20th to 80th, and full distribution). We constrained the sampling so that each dataset contained the same number of P cells, and we kept all the NP cells. Then, we used the same set of prioritized eQTLs as the original analysis (IAV, 1,892; CA, 1,863; PA, 1,794; MTB, 1,355) and tested for reQTLs in single cells using both the 2df-model and the reduced 1df-discrete model. We compared the number of reQTLs that reached LRT Q value < 0.05 between the two models.

Similarly, for each perturbation experiment, we generated three additional new datasets by sampling P cells at different levels of the perturbation score (<33rd, >66th and the full distribution), again constraining the number of cells to be fixed between the datasets, and keeping all NP cells. We performed a reQTL analysis using both 2df-model and 1df-discrete models and compared the number of reQTLs that reached LRT Q value < 0.05 between the two models.

Comparison of the 2df-model with CellRegMap to find reQTLs in single cells

We compared our reQTL framework with CellRegMap19 (CRM), a statistical framework that can identify eQTLs with dynamic effects across cell-states in single cells. CRM can use PCs as cell states; however, in our comparison, we applied CRM using the residual effect of the perturbation score and the discrete perturbation state as these are the same cell-states used in our 2df-model. In the CA perturbation experiment, we defined a set of ground-truth reQTLs. These were reQTLs detected by the original analysis by Oelen et al.22 and detected in our pseudobulk reQTL approach. Similarly, we defined a set of negative controls as those reQTLs not reported by Oelen et al.22 or our pseudobulk analysis (LRT Q value > 0.2). In addition, for every ground-true reQTL, we randomly selected three negative controls with mean gene expression within the range of the mean gene expression of positive control ± (mean positive/2). Based on these criteria, we had 35 positive reQTL and 105 negative controls, and we tested for reQTLs using our 2df-model and CRM with the default parameters. We corrected for multiple comparisons using a Bonferroni correction and compared the sensitivity and specificity of the two models.

reQTLs with heterogeneous effects across cell types

We tested for heterogeneous effects by comparing the effect size of each of the interaction terms between cell types. For each perturbation experiment, we selected cells from each main cell type present in the experiment (IAV: B, CD4+ T, CD8+ T, Monocytes, NK; CA-PA-MTB: B, CD4+ T, CD8+ T, Monocytes, NK, DC) and fitted the single-cell 2df-model with the GxDiscrete and GxScore terms. Then, for each reQTL, we used the summary statistics to compare the effect sizes of each of the interaction’s terms across cell types. We tested whether the variability in the observed effect sizes is larger than the expected based on sampling variability alone using the Cochran’s Q test (Q) implemented in the metafor R package (rma, method = ‘FE,’ weighted = TRUE). We defined response eQTLs with heterogeneous effects as those with a Cochrane’s Q-test Q value < 0.10 in at least one of the interaction terms.

$$\begin{array}{ccc}{\beta }_{{\rm{meta}}}=\left(\mathop{\sum }\limits_{k=1}^{K}\frac{{\beta }_{k}}{s{{e}_{k}}^{2}}\right)\left/\left(\mathop{\sum }\limits_{k=1}^{K}\frac{1}{s{{e}_{k}}^{2}}\right)\right.&{\rm{and}}& Q=\mathop{\sum }\limits_{k=1}^{K}\frac{1}{s{{e}_{k}}^{2}}{\left({\beta }_{k}-{\beta}_{{\rm{meta}}}\right)}^{2}\end{array},$$

where \(K\) is the number of cell types in the perturbation experiment, \({\beta }_{k}\) is the effect size for each cell-type, and \({\beta }_{{\rm{meta}}}\) is the average effect size weighted by the inverse of the variance of the estimate on each cell type.

Overlap with previous response eQTLs in response to IAV

We compared the overlap between the reQTLs found in the IAV dataset using the 2df-model (all PBMCs and monocyte-specific) and the reQTLs found after IAV perturbation in primary monocytes reported originally in Table S2 by Quach et al.29. There were 286 eGenes present in both analyses. We compared the overlap of eGenes with significant reQTLs (LRT Q value < 0.05) found in our PBMC analysis and the monocyte-specific analysis of the IAV dataset. Similarly, we selected eGenes with reported reQTLs P value from Quach et al.29. In the case of several reQTLs per gene reported in ref. 29, we chose the result with the lowest reQTL P value.

Colocalization analysis

To find instances where a perturbation-dependent reQTL shared its causal variant with a GWAS locus, we performed a colocalization analysis using coloc66 (coloc.abf), a Bayesian approach that estimates the posterior probability of having a shared true causal variant between the eQTL and the GWAS locus. We used summary statistics from nine GWAS on immune traits (type 1 diabetes67, multiple sclerosis68, asthma69, RA1, SLE33, inflammatory bowel disease70, Crohn’s disease70, ulcerative colitis70, psoriasis71) and three GWAS on nonimmune traits (coronary artery disease72, type 2 diabetes73, depression74).

We defined eQTLs as those with a nominal Q value < 0.05 in the analysis with FastQTL. We ran coloc with the default parameters and defined colocalization as those loci with a posterior probability H4 (PPH4) > 0.5. Furthermore, for those colocalized loci, we generated the 95% credible set and obtained the posterior probability of each variant within the credible set (snpH4) under the single causal variant assumption. We calculated the enrichment of colocalization in eGenes with reQTL effect compared to eGenes with no reQTL effect using Fisher’s exact test.

In addition, to find colocalization at different levels of the perturbation score, we defined tertiles of perturbation within the NP (T1–T3) and P (T4–T6) conditions. For example, in the IAV experiment, T1 was defined as NP cells with perturbation score values <33rd percentile (−0.90), T2 was defined as NP cells with the perturbation score ≥33rd and below 66th percentile (−0.75) and T3 was defined as NP cells with a perturbation score ≥ 66th percentile. Then, for each tertile, we generated pseudobulk gene expression profiles, tested for cis-eQTL and ran coloc as described before.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.