Modeling heterogeneity in single-cell perturbation states enhances detection of response eQTLs

Valencia, Cristian; Nathan, Aparna; Kang, Joyce B.; Rumker, Laurie; Lee, Hyunsun; Raychaudhuri, Soumya

doi:10.1038/s41588-025-02344-6

Download PDF

Analysis
Open access
Published: 20 October 2025

Modeling heterogeneity in single-cell perturbation states enhances detection of response eQTLs

Cristian Valencia^1,2,3,4,5,
Aparna Nathan ORCID: orcid.org/0000-0002-5975-2851^1,2,3,4,5,
Joyce B. Kang^1,2,3,4,5,
Laurie Rumker^1,2,3,4,5,
Hyunsun Lee^2,5,6,7 &
…
Soumya Raychaudhuri ORCID: orcid.org/0000-0002-1901-8265^1,2,3,4,5

Nature Genetics volume 57, pages 2882–2890 (2025)Cite this article

27k Accesses
6 Altmetric
Metrics details

Subjects

Abstract

Identifying response expression quantitative trait loci (reQTLs) can help to elucidate mechanisms of disease associations. Typically, such studies model the effect of perturbation as discrete conditions. However, perturbation experiments usually affect perturbed cells heterogeneously. Here we show that modeling of per-cell perturbation state enhances detection of reQTLs. We use single-cell data to study the effect of perturbations with influenza A virus, Candida albicans, Pseudomonas aeruginosa and Mycobacterium tuberculosis on gene regulation. We found on average 36.9% more reQTLs by accounting for single-cell heterogeneity compared to the standard discrete reQTL model. For example, we detected a decrease in the expression quantitative trait loci effect for PXK with influenza A virus. Furthermore, we found that, on average, 25% of reQTLs have cell-type-specific effects. For example, the reQTL effect for RPS26 was stronger in B cells. Our work provides a general model for more accurate reQTL identification and underscores the value of modeling cell-level variation.

Single-cell eQTL models reveal dynamic T cell state dependence of disease loci

Article 11 May 2022

Pathogen-specific innate immune response patterns are distinctly affected by genetic diversity

Article Open access 05 June 2023

A computational method for cell type-specific expression quantitative trait loci mapping using bulk RNA-seq data

Article Open access 25 May 2023

Main

Genome-wide association studies (GWAS) have elucidated loci underpinning human complex traits, including autoimmune^1,2 and infection-related traits^3,4. However, the mechanisms by which risk alleles act remain unclear. One hypothesis is that they influence gene regulation. Mapping expression quantitative trait loci (eQTLs) links genetic variants to gene expression changes, aiding GWAS interpretation. Efforts such as GTEx⁵ and DICE⁶ have identified thousands of eQTLs using aggregated expression profiles (bulk RNA) from healthy people. However, eQTLs can have different effects in specific contexts, for example, in different cell states or physiologic conditions. Less than 10% of eQTLs found at homeostasis colocalize with GWAS loci⁷, suggesting that disease-associated regulation may emerge only in specific cellular contexts such as those critical for disease initiation⁸.

One way to understand context-dependent gene regulation is by studying reQTLs—eQTLs for which the magnitude of the effect on gene expression changes after environmental perturbation (P). reQTL studies induce a perturbation (for example, external stimulation) in a subset of samples and compare the genotype–expression associations between conditions to identify reQTLs. This approach is critical to understand how genetic variation may act in disease-specific conditions often missed in typical eQTL studies^9,10,11,12. However, in practice, ex vivo perturbations affect cells heterogeneously^{13,14,15,16,17}. Current bulk RNA-based reQTL models cannot account for heterogeneous responses across cells. Accounting for this heterogeneity in statistical models may improve power to detect reQTLs.

We and others have recently demonstrated that mapping eQTLs at single-cell resolution can uncover heterogeneous eQTL effects along continuous cell states^18,19,20. However, reQTL models typically only offer a single format to represent perturbation response^21,22, which neglects their potential diversity. We hypothesize that effective modeling of per-cell perturbation state using linear models may enhance power to detect response eQTLs.

Here we show that modeling eQTL effects in single cells using a continuous perturbation score increases power to detect response eQTLs compared to binary-perturbation-state models. Our goal is not to define new eQTLs, but rather to identify those that change in response to an experimental condition. It can be applied to a previously identified set of eQTLs or to discover new reQTLs. Furthermore, we show that it can help to prioritize genetic effects in disease-relevant contexts. This builds on recent advances demonstrating the potential for single-cell data in understanding eQTLs^18,19,20,23.

Results

General framework

Our reQTL mapping framework is depicted in Fig. 1a and explained in detail in Methods. To define a continuous perturbation score, we used a penalized logistic regression with corrected expression principal components (PCs) (hPCs) as independent variables to predict the log odds of being in the pool of perturbed (P) cells, which serves as a surrogate of the cell’s degree of response to the experimental perturbation. Then, to find response eQTLs, we used a Poisson mixed effects model (PME) of gene expression in single cells as a function of genotype and genotype interactions with a discrete perturbation term and the independent effect of a continuous perturbation score term (Methods).

**Fig. 1: Framework to identify response eQTLs at single-cell resolution.**

Quantifying perturbation states

We applied our framework to study the effect of ex vivo stimulation with influenza A virus (IAV), Candida albicans (CA), Pseudomonas aeruginosa (PA) and Mycobacterium tuberculosis (MTB) on gene regulation. We used transcriptional profiles from 299,879 peripheral blood mononuclear cells (PBMCs) from 89 donors²⁴ and 475,333 PBMCs from 120 donors²² (Methods). We expected that each perturbation may have different effects on cell states, so we defined perturbation scores independently for each stimulation condition (Fig. 1b,c and Extended Data Fig. 1a–g). We discuss the effect of the number of cells on the estimation of the perturbation score and cell-type-specific perturbation scores in the Supplementary Note and Supplementary Figs. 1–5. Based on those analyses, we decided to use the PBMC-based perturbation score for all downstream analyses.

We found that the perturbation score reflects the transcriptional heterogeneity of the response to perturbation. For example, higher IAV score correlates with the increased expression of genes such as ISG15 (Pearson r = 0.81; Fig. 1d), IFI6 (Pearson r = 0.81; Fig. 1e), and IFIT3 (Pearson r = 0.79; Fig. 1f), and these top correlated genes were enriched in the ‘interferon-alpha response’ Molecular Signatures database (MSigDB) gene set^25,26 (gene set enrichment analysis²⁷ (GSEA) P adjusted = 0.003; Supplementary Table 1 and Extended Data Fig. 2; Methods). This is consistent with the well-characterized role of the type I interferon response during RNA–virus infection²⁸. Details on other perturbation scores are in Fig. 2, Supplementary Table 1 and Extended Data Fig. 2. In addition, the ability of the score to capture the cellular heterogeneity of the response to perturbation is described in the Supplementary Note and Supplementary Fig. 6. We also explore the impact of low sequencing coverage and the similarity between a data-driven IAV perturbation score and a predefined interferon-alpha score in the Supplementary Note and Supplementary Figs. 7 and 8.

Mapping reQTLs

We prioritized 1,892 eQTLs for IAV, 1,863 eQTLs for CA, 1,794 eQTLs for PA and 1,355 eQTLs for MTB datasets (Extended Data Fig. 3; Methods). Next, we tested which of these eQTL effects varied in effect size after perturbation by modeling gene expression in single cells as a function of genotype (G), and the interaction with the discrete perturbation state (GxDiscrete) and the independent effect of the continuous perturbation state (GxScore) accounting for confounders and batch (Methods) using the PME model described in ref. ¹⁸. Our model also includes the top five expression PCs to control unwanted variation in gene expression such as cell-type heterogeneity; we found that the additional random effect for cell-type did not improve the results of the model (Supplementary Fig. 9). We assessed the significance of both GxDiscrete and GxScore terms compared to a null model with no interactions using a two d.f. likelihood ratio test (LRT), hereafter referred to as the 2df-model.

**Fig. 2: Schematic of the perturbation score in experiments from the LLDEEP DAG2+ Project.**

Using the 2df-model, we found 166 reQTLs in IAV, 770 in CA, 646 in MTB and 594 in PA (LRT Q value < 0.05; Fig. 3a–d and Supplementary Tables 2–5). We performed exhaustive quality control to minimize false positives (Extended Data Fig. 4 and Supplementary Fig. 10). Furthermore, we evaluated the degree of replication of previous results (Methods) and observed enrichment of significant interactions for CA (χ² P = 2.2 × 10⁻¹⁷⁸), PA (χ² P = 1.96 × 10⁻¹³⁷) and MTB (χ² P = 4.62 × 10⁻¹⁴³) (Extended Data Fig. 5). These results show that our findings are driven by the interaction (GxDiscrete and GxScore) effects.

**Fig. 3: Mapping reQTLs in single cells under different perturbations.**

We found that reQTLs identified in PBMCs captured most reQTLs present in principal cell types (Supplementary Fig. 11). Furthermore, our framework can be applied to a cell-type specific analysis (Methods), enabling discovery of very specific reQTLs (Supplementary Tables 6–9). However, a limited number of cells per cell-type in these datasets compromises power. Some examples include the increase in the MX1 eQTL effect of rs461981 in CD4⁺ T cells after IAV perturbation and the increase in the SAR1A eQTL effect of rs15801 in CD8⁺ T cells after CA perturbation.

Comparison of power to detect response eQTL effects

Our approach to discover reQTLs had two main features: we (1) use a single-cell model and (2) represent the perturbation state as a continuous variable. The 2df-model detected more reQTLs compared to a traditional approach that aggregates single-cell gene expression by condition and donor into a pseudobulk profile (Extended Data Fig. 6; Methods). To address the effect of the perturbation score on our power to detect reQTLs, we compared the 2df-model to an alternative single-cell PME model that includes only the binary GxDiscrete interaction term (1df-discrete) (Methods). We found that the 2df-model still identified, on average, 89.6% of the reQTLs detected by the 1df-discrete model (Fig. 3e–h), suggesting that little power was lost despite the statistical penalty of estimating an additional interaction term (GxScore) in comparison with the 1df-discrete model. Furthermore, on average, 36.9% of the reQTLs detected with the 2df-model were missed by the 1df-discrete model (Fig. 3e–h), and the 2df-model outperformed it as response heterogeneity increased (Supplementary Fig. 12).

Next, we tested whether these differences in power remained consistent under varying sample sizes (Methods) by randomly downsampling cells per donor (Fig. 4i–l) and donors (Fig. 4m–p). We compared the power difference between the 2df-model and the 1df-discrete. In all scenarios, the 2df-model consistently identified more reQTLs. Also, we observed a linear power loss by reducing donors and cells, suggesting that future studies with larger sample sizes may be able to detect new reQTL effects using the 2df-model. We found that the 2df-model has similar performance in detecting reQTLs as ‘CellRegMap’¹⁹ (CRM) (Methods), but the results from the 2df-model are more interpretable and it is substantially more computationally efficient (Supplementary Fig. 13).

**Fig. 4: Estimating the change in the eQTL effect after perturbation in single cells.**

reQTLs with strong and subtle perturbation effects

We observed eQTLs with a substantial change in the magnitude of effect upon perturbation. For example, we found that the main genotype effect of rs11867191 on SLFN5 and the additional interaction effects (Fig. 4a) resulted in a substantial change in the total eQTL effect (Methods) after IAV perturbation (Fig. 4b; 2df-model_PBMC LRT P = 1.15 × 10⁻¹⁷³). To confirm the change in the SLFN5 eQTL at the donor level, we selected cells between the 45th and 55th percentile (50p) of the perturbation score distribution from within P and nonperturbed (NP) states, respectively (Methods). The eQTL effect in the NP state (NP-50p eQTL_PBMC = −0.07, LRT P = 4.34 × 10⁻³; Fig. 4c) was magnified upon perturbation (P-50p eQTL_PBMC = −0.32, LRT P = 1.39 × 10⁻²¹; Fig. 4d). A similar example that replicated across the four perturbation models was the reQTL for KRR1 (Fig. 4e–h and Extended Data Fig. 7a–i).

In addition, our model also detected subtle changes in reQTL effects, such as the reQTL rs1464264 for the gene ARL4C (IAV 2df-model_PBMC LRT P = 1.85 × 10⁻⁸) with both GxDiscrete and GxScore contributing to the attenuation of the total eQTL effect (Fig. 4i,j). This reQTL was still significant in the 1df-discrete model (1df-discrete_PBMC LRT P = 1.11 × 10⁻⁴); however, it was not detected by the Pseudobulk-linear mixed effect (LME) model (Pseudobulk LME_PBMC LRT P = 2.07 × 10⁻¹). A similar example was found in the reQTL rs10194534 for the gene SESTD1, where the GxScore_PBMC term was the most significant interaction effect in the 2df-model (LRT P = 2.66 × 10⁻³; Fig. 4m), whereas this reQTL was not significant in the 1df-discrete model (1df-discrete_PBMC LRT P = 6.36 × 10⁻²). We confirmed these results by looking at the eQTL effect of these genes in the NP-50p and P-50p state (ARLC4, Fig. 4k,l; SESTD1, Fig. 4o,p). Our findings for KRR1, ARL4C and SESTD1 are consistent with previous studies that identified reQTL effects in perturbation experiments that induced an interferon response^11,12.

To highlight the significance of our results in terms of the potential to discover new reQTLs, we compared our results from the IAV dataset with a previous bulk study that detected reQTLs after IAV stimulation²⁹ (Methods). Despite having a smaller sample size, a total of 33 reQTLs were detected only in our analysis (Supplementary Table 10, Supplementary Note and Supplementary Figs. 14 and 15)

reQTLs identify genes with perturbation-specific regulation

An eQTL may not colocalize with a trait-associated locus in part because the eQTL effects are typically not ascertained in the relevant context, for example, perturbed states reQTLs may be more likely to colocalize compared to an eQTL that has no change upon perturbation. We performed a colocalization analysis with nine immune and four nonimmune traits (coloc.abf, Pr > 50%; Methods). As expected, reQTLs were enriched for colocalizations in all perturbation experiments except MTB, and the 2df-model detected more reQTLs without decreasing the enrichment for colocalization (Extended Data Fig. 8). Additional examples of reQTLs detected only by the 2df-model and previously implicated with traits such as lymphocyte counts³⁰ and hematocrit levels³¹ are shown in Extended Data Fig. 9.

In some cases, the reQTLs detected only in the 2df-model pointed to perturbation-specific disease relevance. One such example was the reQTL effect of rs11721168 for PXK in the IAV experiment (Fig. 5a). This eQTL effect decreases after perturbation (Fig. 5b). We confirmed that the eQTL effect was detected only in cells that score lower in the NP state (Fig. 5c,d). However, the cell-type specific analysis did not detect this reQTL (Supplementary Tables 11–14). In contrast, Alasoo et al. reported that the eQTL effect of rs11721168 for PXK decreased from −0.14 in NP macrophages to −0.09 in macrophages after interferon gamma perturbation³². This discrepancy may reflect differences in stimulation or cell states, or the limited power in our cell-type-specific analyses because of fewer cells.

**Fig. 5: reQTLs can have state-specific colocalization.**

We also found that the eQTL signal for PXK in the NP cells colocalized with a GWAS locus for systemic lupus erythematosus (SLE)³³ (Fig. 5f). When we performed the colocalization at increasing tertiles of the perturbation score within the NP state and P state (Methods), we found that the eQTL signal for PXK colocalized in the lowest perturbation tertile of the NP cells (Fig. 5f). PXK encodes a sorting nexin protein important for receptor internalization and organelle trafficking, including endosomal trafficking³⁴, and it has been implicated previously in SLE risk^2,35. Our results show that genetic regulation of PXK is particularly strong in cells with the lowest perturbation scores. It also suggests that these may be the most relevant cell states through which PXK may be mediating risk for SLE. A similar perturbation-specific reQTL rs3807865 for TMEM106B (Extended Data Fig. 10) showed consistent reQTL effects in refs. ^12,29, both in monocytes. However, those studies were able to detect eQTL effects in resting cells, perhaps related to increased sensitivity to detect eQTLs in these studies due to the large number of cells in bulk data or other technical factors.

Response eQTLs can have cell-type-specific effects

Previous studies demonstrated that eQTLs can have cell-type-dependent effects³⁶. Since our reQTL analysis was performed on heterogeneous PBMC samples, we explored whether reQTL effects differed between cell types (for example, the magnitude of the interaction effects in monocytes versus the magnitude of interaction effects in CD4⁺ T cells) exposed to the same perturbation, such as IAV. To evaluate this, we applied the 2df-model separately for each cell type and tested for heterogeneity using Cochran’s Q test (Methods). Most reQTLs have similar effects across cell types; however, around 25% of reQTLs showed evidence of heterogeneous response eQTL effects (Q value < 0.10) across the four perturbation models (Supplementary Figs. 16 and 17).

For example, we observed that the reQTL rs10774671 for the gene OAS1 had a cell-type-specific effect in the IAV experiment (Supplementary Table 15). This eQTL has no effect in NP PBMCs but has a significant effect after IAV perturbation (Supplementary Table 3). In our analysis of the IAV dataset, we observed stronger interaction effects for this eQTL among CD4⁺ T (Fig. 6a) and B cells (Supplementary Fig. 18a–c). We also observed these differences at the donor level, where we found that the eQTL effect in the P-50p of CD4⁺ T (Fig. 6b,c) and B cells were stronger than the eQTL effect at P-50p in monocytes (Fig. 6e,f). Furthermore, we found that this OAS1 eQTL had state-specific colocalization with the GWAS locus for severe COVID-19 (Supplementary Fig. 19a,b). This result is consistent with previous studies that report association of rs10774671 with levels of OAS1 and risk of severe COVID-19 (ref. ¹³). Previous studies did not detect this reQTL after IAV perturbation in monocytes²⁹ or dendritic cells¹¹. More recently, Kumasaka et al. identified the eQTL rs10774671 for OAS1 in PBMCs post-COVID-19 infection, with a stronger reQTL effect in CD16⁺ monocytes²⁰. The differing cell-type-specific effects may arise from study design differences—ex vivo IAV perturbation versus natural infection. It is worth mentioning that detecting reQTLs with heterogeneous effects may be underpowered for lowly abundant cell types. Also, our framework does not distinguish whether the perturbation effect in each main cell type is driven by direct viral infection or by bystander activation. In contrast to previous studies^29,32,37,38, we did not detect any reQTLs of rs10774671 for the nearby OAS3 gene in any of the datasets analyzed, possibly related to different experimental designs and the low expression of this gene.

**Fig. 6: Response eQTLs with cell-type-specific effects after IAV perturbation.**

We also observed the reQTL rs11171739 for the gene RPS26 in the IAV experiment (Supplementary Table 2). This reQTL has cell-type-specific interaction effects (Supplementary Table 15). More specifically, we found that there was a significant magnification in the total eQTL effect after perturbation in monocytes (Fig. 6g) and B cells (Supplementary Fig. 19j–l); however, the change in the eQTL effect was not only smaller in natural killer (NK) (Fig. 6j) and CD4⁺ T cells (Supplementary Fig. 19m–o) but strikingly also in the opposite direction. Furthermore, there was no significant reQTL effect in CD8⁺ T cells (Supplementary Fig. 19p–r). We also observed these differences in gene regulation in the context of perturbation when we estimated the eQTL effect on NP-50p and P-50p states on monocytes (Fig. 6h,i), NK cells (Fig. 6k,l) and the other cell types (Supplementary Fig. 19). Similar results for the reQTL effect in RPS26 were found in the CA, PA and MTB perturbation experiments (Supplementary Figs. 20–22). Previous studies suggest that rs11171739 has pleiotropic effects on autoimmune diseases³⁹ and increases rheumatoid arthritis (RA) risk¹. Our results show that genetic regulation of RPS26 depends on cell perturbation and cell-type, both of which may influence the causal relationship with diseases such as RA.

Discussion

Recent studies have begun to explore reQTLs in single-cell data, but current linear models are limited to comparing discrete P versus NP conditions, reducing power. Here we introduce a new framework that incorporates the level at which each cell responds to the perturbation to model reQTL effects across continuous perturbation states such as interferon-alpha, complement activation and inflammation. We demonstrate that this framework substantially improves reQTL detection and is applicable to heterogeneous samples such as PBMCs, enabling comparisons across cell types.

Limited colocalization between eQTLs and GWAS loci suggested that disease alleles may affect gene regulation in specific cellular contexts not captured by aggregated eQTL profiles. We observed an enrichment of the colocalization of reQTLs with traits, which highlights the importance of improving the identification of reQTLs. Consistent with previous studies^9,12, reQTLs can have perturbation-dependent colocalization. Notably, we found that, even among cells with the same discrete perturbation state, the strength of colocalization of the reQTL and a trait can depend on the continuous perturbation state of the cell. This highlights the value of identifying highly responsive cellular states to uncover genetic mechanisms of disease.

Current single-cell datasets are limited in the number of cells and people assayed compared to the much larger bulk RNA sequencing (RNA-seq) datasets. However, as genotyped single-cell datasets become larger, single-cell eQTL methods—such as those we present in this study—have the potential to better elucidate context-dependent eQTL effects, including reQTLs. Although we focus on PBMCs, the methods we propose are in no way restricted to this specific tissue. In fact, we envision that this approach could be extended easily to sorted populations from PBMCs. Indeed, we think that it may be particularly useful in exploring single-cell organoid data^40,41,42,43 or solid tissue data from patients⁴³, where cell sorting and bulk RNA-seq may confound important signals^44,45,46,47.

Our work has important limitations. First, our PCA-derived perturbation scores may not generalize across datasets, even with the same perturbation condition. Second, the perturbation score presented here is based on a log-linear model, which may miss nonlinear, cell-type-specific programs. Third, we previously demonstrated the benefits of using protein expression in addition to gene expression to model continuous cell states^48,49. Here we model the continuous perturbation state based only on single-cell RNA-seq (scRNA-seq) data. This limitation is due to the lack of publicly available multiomic (scRNA-seq, CITE-seq (cellular indexing of transcriptomes and epitopes)) data for reQTL analysis. However, our framework should be applicable to the use of multiomic data.

Overall, our work provides a generalizable approach for identifying perturbation-dependent eQTLs and underscores the importance of modeling cell-level variation. Although our framework uses the perturbation score based on corrected expression PCs, our reQTL model could incorporate alternative scores such as those presented in ref. ⁵⁰ and ref. ⁵¹. Combining statistical models such as the one presented here with biologically relevant perturbations will ultimately advance our understanding of the biological mechanisms that underlie disease heritability.

Methods

Cohorts

We used published data (GEO accession no. GSE162632) to study the effect of 12 h of ex vivo stimulation with either IAV or mock (control) in 90 samples of heterogeneous cell composition (PBMCs) using scRNA-seq. Similarly, we obtained data (EGA accession no. EGAS00001005376) of 120 PBMC samples infected for 24 h with CA, PA, MTB or mock (control) from the European Genome-Phenome Archive under a Lifelines DEEP DAG2⁺ Project (LLDEEP DAG2⁺) data access agreement.

Genotype quality control

The original Randolph et al. study²⁴ (IAV perturbation) used 4× low-pass whole-genome sequencing to call genotypes of 89 people—a total of 78,111,311 genome-wide variants. Genotypes with a posterior probability < 0.9 were considered as missing data. We removed variants with a missing call rate >10% and only included variants with a minor allele frequency ≥5% and Hardy-Weinberg equilibrium P > 1 × 10⁻⁵. In total, for the IAV perturbation dataset we had 5,194,816 genetic variants genome-wide that passed quality control (QC), and these variants were used in the eQTL mapping. Furthermore, we selected 508,883 independent variants using PLINK⁵² (–indep-pairwise 50 5 0.2), and then we performed PC analysis (PCA) using the R package SNPRelate⁵³.

The original Oelen et al. study²² (CA, PA and MTB perturbations) used genotype data from the HumanCytoSNP-12 BeadChip and imputed with the Haplotype Reference Consortium reference panel of 120 people at a total of 7,249,882 genome-wide variants mapped on the Hg19 reference genome. For the main analysis we included variants with missing call rate <10%, minor allele frequency ≥5% and Hardy-Weinberg equilibrium P > 1 × 10⁻⁵. The variants that passed the filtering step were then lifted over to the hg38 reference genome using Picard tools. In total, for this dataset, we had 3,962,615 variants after QC, which were used in the eQTL mapping. We selected 292,681 independent variants using PLINK⁵² (–indep-pairwise 50 5 0.2), and then we performed PCA using SNPRelate⁵³. For the secondary analysis, we selected the 3,554 variants included in the Supplementary Table 9 of Oelen et al.²² and lifted over 2,347 variants from the hg19 to hg38 reference genome.

Single-cell mRNA quality control

We performed quality control of the 236,993 PBMCs from the 12-h perturbation experiment with IAV. These cells had <20% reads that mapped to mitochondrial genes. We removed ten influenza IAV genes from the original 19,248 genes in the count matrix. We selected a total of 208,223 cells with >500 expressed genes for the analysis. For the 24-h perturbations with CA, PA and MTB, there were, in total, 493,061 cells and 33,538 genes measured. As in the IAV perturbation study, we selected 475,333 cells with <20% reads mapping to mitochondrial genes and >500 detected genes.

Then, for each perturbation condition (NP or P) in each study, we normalized the counts for each cell, scaled each gene’s expression, and cosine normalized. We selected the 3,000 most variable genes based on the variance stabilizing transformation and conducted PCA using an implicitly restarted Lanczos method implemented in the irlba R package. Then, we controlled the effects of batch and sample donor using Harmony⁵⁴ (v.0.1.0) (parameters: θ_donor = 1, θ_batch = 1). The top 20 hPCs were used for downstream analysis. For cell-type annotations, we used the originally published cluster labels.

Estimating the continuous perturbation state of a cell

In contrast to traditional models that consider only the discrete perturbation state imposed by the experimental design, we define a continuous score to quantify the degree of perturbation response of each cell. The underlying assumption is that perturbation may induce changes in the transcriptional profile of a cell that are proportional to the degree of response of the cell. Similarly, experimental conditions may induce changes in the transcriptional profile of NP cells^{55,56,57,58,59}. To systematically identify axes of variation in the transcriptional profile of P and NP cells, we use PCA and correct technical or biological effects with Harmony to obtain hPCs. We assume that one or several hPCs may capture the effect of perturbation in a cell.

To define a continuous perturbation score, we used a penalized logistic regression with hPCs as independent variables to predict the log odds of being in the pool of P cells, which serves as a surrogate of the cell’s degree of response to the experimental perturbation. Thus, for each perturbation model, we developed a continuous perturbation score of a cell by modeling the discrete perturbation state of a cell (i) (NP = 0, P = 1) as a function of the top 20 hPCs in a logistic Lasso regression model implemented in the R package glmnet⁶⁰. We fitted the model and used the LASSO penalty term with a tuning parameter (λ) obtained from cross validation (cv.glmnet parameters: alpha = 1, lambda.1se)

$$\log \left(\frac{{\pi }_{i}}{1-{\pi }_{i}}\right)={\theta }_{i}+{\beta }_{\rm{PC}_{1}}{X}_{i,\rm{PC}_{1}}+\cdots +{\beta }_{\rm{PC}_{\textit{K}}}{X}_{i,\rm{PC}_{\textit{K}}}$$

(1)

where $K$ is the number of PCs and π_i is the probability of the cell (i) to be in the pool of P cells.

We use this approach instead of focusing on predefined gene sets such as the interferon-stimulated gene pathway for viral perturbations, which may not accurately represent the changes on the transcriptome in specific tissues, cells or perturbation experiments.

Cell-type-specific perturbation score

For each perturbation experiment, cell-type-specific perturbation scores for each main cell-type presented in PBMCs were generated by subsetting the dataset by cell-type and then performing the same steps as described before: gene normalization, scale, cosine normalization, selection of the 3,000 most variable genes (variance stabilizing transformation), PCA, harmony and Lasso regression using the top 20 corrected expression hPCs as predictors in the logistic Lasso regression model.

Correspondence of the perturbation score with canonical protein marker of response to perturbation

We evaluate the ability of the perturbation score generated using the penalized logistic regression to capture the response to perturbation. We re-analyzed a dataset of PBMCs perturbed with either anti CD3 and anti CD28 and LPS, which profiled cells using scRNA-seq as well as surface protein markers (CITE-seq)⁶¹. Thus, for each condition we calculated the perturbation score using the scRNA-seq data as described before. Then, we evaluate the Pearson correlation between the perturbation score of a cell and the expression of surface protein markers of activation.

Gene pathway enrichment analysis

For each perturbation experiment, we calculated the Pearson correlation between the 3,000 most variable genes and the perturbation score of a cell. We selected genes with a correlation Q < 0.05 (3,000 tests). Then, we used these correlation values to rank the genes and performed a gene pathway enrichment analysis using the fgsea R package²⁷ with a maximum gene set size of 500, a minimum size of 15, and 100,000 permutations. We include the following gene sets from the Pathway Interaction Database stored in MSigDB: Hallmark gene set and large collection of immunological conditions (C7).

Prioritizing eQTLs in PBMCs bulk expression profiles

For each individual participant and perturbation condition, we summed the gene expression counts for each gene across all cells to generate bulk profiles, then we averaged the gene expression from each perturbation condition by individual participant, resulting in the average bulk expression profiles. This allows us to use a traditional eQTL linear regression model with one sample per donor to prioritize eQTLs. For each perturbation experiment, we generated bulk expression profiles for the NP, P and average conditions. We normalized the expression profiles (log₂ count per million + 1) and applied an inverse rank normal transformation (qnorm((rank(gene, na.last = ‘keep’) − 0.5)/sum(!is.na(gene))). Then, we removed the effect of age, sex if available, top five genotype PCs and 15 latent factors using PEER factor analysis⁶². We only included transcripts expressed in at least 50% of the samples. The corrected gene expression was used as the dependent variable in our bulk cis-eQTL mapping. We performed the cis-eQTL mapping by testing each gene with genetic variants within 1 Mb of each gene’s transcription starting site using a linear regression implemented in FastQTL⁶³. We use 1,000 permutations per locus and select the gene–SNP pairs (eGenes) with a Q value < 0.05. (Number of tests in IAV, 48,016,811; CA, 46,012,957; PA, 46,012,957; MTB, 44,896,173). In most cases, we prioritized the eQTL from the averaged NP and P profile as this produced the largest number of eQTLs (Q value < 0.10) (Supplementary Fig. 3). Standard approaches to fit the PME model are computationally expensive and impractical for the systematic genome-wide identification of eGenes in single-cell experiments.

Identifying response eQTL effects

For the single-cell approaches for both PBMCs or cell-type-specific analysis, we modeled the raw expression (E) counts in a cell (i) as a function of genotype allele dosage and an interaction with cell state accounting for confounders and technical batch using a described PME model¹⁸. For all the analyses, we included age and sex as fixed effects where available. We also controlled for genetic population stratification by including the top five genotype PCs (gPC) as covariates. Similarly, we controlled for unwanted variation in gene expression by including the top five gene expression PCs (ePC). In addition, we included the number of unique molecular identifier (UMIs) and percentage of reads mapping to mitochondrial genes (MT%) as fixed effects, and donor (d) and technical batch (b) as random effects. We include two cell state variables, the discrete perturbation state (Discrete: where P = 1 for perturbation or 0 for no perturbation) and the residual perturbation score (rScore) (3) that avoid the collinearity between the Discrete variable and the perturbation score (Score) from Eq. 1. To estimate the reQTL effect, we fit a model that included the genotype main effect (G) and the interaction terms of G with the Discrete (GxDiscrete) and rScore (GxrScore) variables. In contrast, we only included the interaction term GxDiscrete in the simplified 1df-discrete model. For the cell-type-specific reQTL analysis, we subset the dataset by each principal cell-type and used the cell-type-specific score and applied the same model (2) as described above.

$$\begin{array}{l}\log \left({E}_{i}\right)\;=\;\theta +{\beta }_{G}{X}_{d,{\rm{geno}}}+{\beta }_{{\rm{age}}}{X}_{d,{\rm{age}}}+{\beta }_{{\rm{nUMI}}}{X}_{i,{\rm{nUMI}}}+{\beta }_{{\rm{MT}}}{X}_{i,{\rm{MT}}}\\\qquad\qquad\;+{\beta }_{{\rm{Discrete}}}{X}_{i,{\rm{Discrete}}}+{\beta }_{{\rm{rScore}}}{X}_{i,{\rm{rScore}}}+{\beta }_{\rm{Gx}{\rm{rScore}}}{X}_{d,{\rm{geno}}}{X}_{i,{\rm{rScore}}}\\\qquad\quad\quad\;\,+{\beta }_{\rm{Gx}{\rm{Discrete}}}{X}_{d,{\rm{geno}}}{X}_{i,{\rm{Discrete}}}+\mathop{\sum}\limits_{k=1}^{5}{\beta }_{{g\rm{PC}}_{k}}{X}_{d,{g\rm{PC}}_{k}}\\\qquad\qquad\;+\mathop{\sum}\limits_{k=1}^{5}{\beta }_{{e\rm{PC}}_{k}}{X}_{i,{e\rm{PC}}_{k}}+\left(d\right)+({k}_{b}|b),\end{array}$$

(2)

$${\rm{Score}}=\theta +{\beta }_{{\rm{Discrete}}}{X}_{i,{\rm{Discrete}}}+{r\;\rm{Score}}.$$

(3)

To map reQTLs in bulk profiles (i) (Pseudobulk LME) we modeled the residuals from the PEER factor analysis (rPFA) as a function of the genotype and the GxDiscrete term accounting for donor’s age, sex and both the total numbers of UMIs and mean MT% across all cells per donor using an LME model. We included donor as a random effect in the LME model (equation (4)).

$$\begin{array}{l}{{\rm{rPFA}}}_{i}\;=\;\theta +{\beta }_{G}{X}_{d,{\rm{geno}}}+{\beta }_{{\rm{age}}}{X}_{d,{\rm{age}}}+{\beta}_{{\rm{nUMI}}}{X}_{i,{\rm{nUMI}}}+{\beta }_{{\rm{MT}}}{X}_{i,{\rm{MT}}}\\\qquad\qquad+{\beta }_{{\rm{Discrete}}}{X}_{i,{\rm{Discrete}}}+{\beta}_{{\rm{GxDiscrete}}}{X}_{d,{\rm{geno}}}{X}_{i,{\rm{Discrete}}}+\left(d\right)+\varepsilon .\end{array}$$

(4)

We compared the significance of each of these full models against a null model with no interaction terms using the likelihood ratio test and considered significant reQTLs to be those with an LRT Q value < 0.05. Furthermore, to rule out the possibility of spurious significant results, we performed 500 permutations per each discovered reQTL. For the permutation in the single-cell models, we shuffled values of the perturbation score and the discrete perturbation state randomly within each donor. For the pseudobulk LME model, we shuffled values of the perturbation state randomly across samples.

To exclude spurious associations, for each gene we counted the proportion of LRT P values from the permutation experiments that were smaller than the discovery LRT P (permutation P) and removed genes with permutation P < 0.05. To exclude genes where the PME model may not be appropriate due to high differential gene expression across cell states, we fit a PME model where we simulated null conditions by shuffling the perturbation score value randomly within each donor. We tested the significance of the interaction term under this null by comparing this model with a model with no GxScore term using the LRT ANOVA. For each eGene, we performed 100 permutations and removed eGenes where we observed a significant P value (LRT P < 0.01) in more than five out of 100 permutations.

In addition, we performed a secondary analysis for the CA, PA and MTB perturbation experiments based on the 1,825 eGenes prioritized by ref. ²² and listed in Supplementary Table 9 and passing QC in our preprocessing. For each perturbation experiment, we estimated reQTL effects for this set of 1,825 eQTLs using each of the models described above (equations (1) and (3)) and compared the significance against a null model with no interactions terms using the LRT and considered significant those with LRT Q value < 0.05. Furthermore, for each perturbation experiment, we generated P values under the uniform distribution based on the number of eQTLs included in our secondary analysis. Then, we calculated the enrichment of significant reQTLs in the 2df-model by comparing the number of tests where we found 2df-model LRT P < 0.001, >0.001 and ≤0.01, >0.01 and ≤0.05, >0.05 and ≤0.1, and >0.1 with those found under the uniform distribution using a χ² test.

Estimating the eQTL effect at levels of the perturbation score

For any given reQTL, we selected cells either in the NP or P condition and calculated the 45th and 55th percentile of the perturbation score in that group. Then, we selected the cells with perturbation score values within the 45th to 55th percentile (50p) and estimated the eQTL effect in those cells by fitting a single-cell PME, where we modeled the raw gene counts (E) in a cell (i) as a function of the genotype (G) and the same cell and donor level fix and random covariates as described above (equation (2)) but without interaction terms (equation (5)). We consider the significance of the G term by comparing it with a null model with no G term using the LRT with 1 d.f.

$$\begin{array}{l}\log \left({E}_{i}\right)\;=\;\theta +{\beta }_{G}{X}_{d,{\rm{geno}}}+{\beta }_{{\rm{age}}}{X}_{d,{\rm{age}}}+{\beta }_{{\rm{nUMI}}}{X}_{i,{\rm{nUMI}}}+{\beta }_{{\rm{MT}}}{X}_{i,{\rm{MT}}}\\\qquad\qquad\quad+\mathop{\sum}\limits_{k=1}^{5}{\beta}_{{g\rm{PC}}_{k}}{X}_{d,{g\rm{PC}}_{k}}+\mathop{\sum}\limits_{k=1}^{5}{\beta }_{{e\rm{PC}}_{k}}{X}_{i,{e\rm{PC}}_{k}}+\left(d\right)+({k}_{b}|b).\end{array}$$

(5)

To visualize the eQTL effect at the donor level, we aggregated the gene counts in the cells in the 50p by donor and then normalized the gene expression (log₂ counts per million + 1). Then, we plotted the gene expression as a function of the eQTL’s allele dosage.

Delta betas between PBMC and cell-type-specific reQTL analysis

First, based on the PBMCs analysis and for each main cell-type (that is, TCD4⁺, monocytes), we calculated the difference in the total eQTL between the 50th percentile in the P state versus the 50th percentile in the NP state (Delta 1). Furthermore, based on the cell-type-specific analysis (TCD4⁺ and TCD4⁺ perturbation score), we calculated the difference in the total eQTL between the 50th percentile in the P state versus the 50th percentile in the NP state (Delta 2). We only included eGenes with LRT Q value < 0.05 in the cell-type-specific analysis. Then, for each main cell-type we calculated the Pearson correlation between Delta 1 and Delta 2.

Composite eQTL effect in a cell

We defined the total eQTL in a cell (i) as the sum of the genotype main effect (G) with the discrete interaction effect (GxDiscrete) and the interaction effect with the residual of the perturbation score (GxrScore), weighted by its corresponding value (Discrete, NP = 0, P = 1; Score, continuous perturbation score) (Eq. 6).

$${{\beta }_{{\rm{total}}}=\beta }_{G}+{\beta }_{{\rm{GxDiscrete}}}{X}_{i}+{\beta }_{{\rm{GxrScore}}}{X}_{i}.$$

(6)

Comparison of the IAV perturbation score and an interferon-alpha score

As an alternative approach to quantify the response to perturbation, we developed a per cell interferon-alpha score based on a predefined gene set (MSigDB: Hallmark interferon-alpha response (INF-alpha))⁶⁴. The selected genes (INF-alpha) were normalized, then we removed genes expressed in <1% of the cells and then scaled the expression of the gene (mean = 0, variance = 1). Then, for each cell, we averaged the expression across genes. To compare reQTL detected by each score, we ran a reduced single-cell reQTL model where we tested only for the significance of the interaction effect between the genotype and the score (GxScore). We then evaluated the number of significant reQTL (LRT Q value < 0.05) as well as the standardized effect size of the main eQTL effect (G) and interaction effects (GxScore) between the two approaches.

Comparison of Poisson mixed effect and negative binomial models

To confirm that the single-cell PME model was well calibrated, we fit a single-cell reQTL model where we modeled the raw expression counts in single cells as a function of genotype allele dosage and interaction terms equivalent to the PME 2df-model described in Eq. 1, using a negative binomial model implemented in the glmmTMB⁶⁵ R package. We calculated the correlation between the LRT P value from the original PME and the negative binomial model.

Downsampling donors and cells

To test the effect of sample size on the power to detect reQTLs in the single-cell models, we generated datasets where we removed cells or removed donors randomly. We generated five datasets and, in each, we removed either 10, 20, 30, 40 or 50% of cells randomly within each donor. For each dataset, we generated three replicates by using different random seeds. Similarly, we generated five datasets with three replicates each where for each set we randomly removed either 10, 20, 30, 40 or 50% of donors. Then, in each downsampled dataset, we estimated reQTL effects with the PME model by testing the originally prioritized eGenes and counting the number of reQTLs that reached an LRT Q value < 0.05.

Power to detect reQTL at different degrees of perturbation

For each of the perturbation experiments (IAV, CA, PA and MTB), we generated three new datasets with different degrees of perturbation by sampling cells from the P condition (that is, cells exposed to IAV) at different ranges of the score (30th to 70th, 20th to 80th, and full distribution). We constrained the sampling so that each dataset contained the same number of P cells, and we kept all the NP cells. Then, we used the same set of prioritized eQTLs as the original analysis (IAV, 1,892; CA, 1,863; PA, 1,794; MTB, 1,355) and tested for reQTLs in single cells using both the 2df-model and the reduced 1df-discrete model. We compared the number of reQTLs that reached LRT Q value < 0.05 between the two models.

Similarly, for each perturbation experiment, we generated three additional new datasets by sampling P cells at different levels of the perturbation score (<33rd, >66th and the full distribution), again constraining the number of cells to be fixed between the datasets, and keeping all NP cells. We performed a reQTL analysis using both 2df-model and 1df-discrete models and compared the number of reQTLs that reached LRT Q value < 0.05 between the two models.

Comparison of the 2df-model with CellRegMap to find reQTLs in single cells

We compared our reQTL framework with CellRegMap¹⁹ (CRM), a statistical framework that can identify eQTLs with dynamic effects across cell-states in single cells. CRM can use PCs as cell states; however, in our comparison, we applied CRM using the residual effect of the perturbation score and the discrete perturbation state as these are the same cell-states used in our 2df-model. In the CA perturbation experiment, we defined a set of ground-truth reQTLs. These were reQTLs detected by the original analysis by Oelen et al.²² and detected in our pseudobulk reQTL approach. Similarly, we defined a set of negative controls as those reQTLs not reported by Oelen et al.²² or our pseudobulk analysis (LRT Q value > 0.2). In addition, for every ground-true reQTL, we randomly selected three negative controls with mean gene expression within the range of the mean gene expression of positive control ± (mean positive/2). Based on these criteria, we had 35 positive reQTL and 105 negative controls, and we tested for reQTLs using our 2df-model and CRM with the default parameters. We corrected for multiple comparisons using a Bonferroni correction and compared the sensitivity and specificity of the two models.

reQTLs with heterogeneous effects across cell types

We tested for heterogeneous effects by comparing the effect size of each of the interaction terms between cell types. For each perturbation experiment, we selected cells from each main cell type present in the experiment (IAV: B, CD4⁺ T, CD8⁺ T, Monocytes, NK; CA-PA-MTB: B, CD4⁺ T, CD8⁺ T, Monocytes, NK, DC) and fitted the single-cell 2df-model with the GxDiscrete and GxScore terms. Then, for each reQTL, we used the summary statistics to compare the effect sizes of each of the interaction’s terms across cell types. We tested whether the variability in the observed effect sizes is larger than the expected based on sampling variability alone using the Cochran’s Q test (Q) implemented in the metafor R package (rma, method = ‘FE,’ weighted = TRUE). We defined response eQTLs with heterogeneous effects as those with a Cochrane’s Q-test Q value < 0.10 in at least one of the interaction terms.

$$\begin{array}{ccc}{\beta }_{{\rm{meta}}}=\left(\mathop{\sum }\limits_{k=1}^{K}\frac{{\beta }_{k}}{s{{e}_{k}}^{2}}\right)\left/\left(\mathop{\sum }\limits_{k=1}^{K}\frac{1}{s{{e}_{k}}^{2}}\right)\right.&{\rm{and}}& Q=\mathop{\sum }\limits_{k=1}^{K}\frac{1}{s{{e}_{k}}^{2}}{\left({\beta }_{k}-{\beta}_{{\rm{meta}}}\right)}^{2}\end{array},$$

where $K$ is the number of cell types in the perturbation experiment, ${\beta }_{k}$ is the effect size for each cell-type, and ${\beta }_{{\rm{meta}}}$ is the average effect size weighted by the inverse of the variance of the estimate on each cell type.

Overlap with previous response eQTLs in response to IAV

We compared the overlap between the reQTLs found in the IAV dataset using the 2df-model (all PBMCs and monocyte-specific) and the reQTLs found after IAV perturbation in primary monocytes reported originally in Table S2 by Quach et al.²⁹. There were 286 eGenes present in both analyses. We compared the overlap of eGenes with significant reQTLs (LRT Q value < 0.05) found in our PBMC analysis and the monocyte-specific analysis of the IAV dataset. Similarly, we selected eGenes with reported reQTLs P value from Quach et al.²⁹. In the case of several reQTLs per gene reported in ref. ²⁹, we chose the result with the lowest reQTL P value.

Colocalization analysis

To find instances where a perturbation-dependent reQTL shared its causal variant with a GWAS locus, we performed a colocalization analysis using coloc⁶⁶ (coloc.abf), a Bayesian approach that estimates the posterior probability of having a shared true causal variant between the eQTL and the GWAS locus. We used summary statistics from nine GWAS on immune traits (type 1 diabetes⁶⁷, multiple sclerosis⁶⁸, asthma⁶⁹, RA¹, SLE³³, inflammatory bowel disease⁷⁰, Crohn’s disease⁷⁰, ulcerative colitis⁷⁰, psoriasis⁷¹) and three GWAS on nonimmune traits (coronary artery disease⁷², type 2 diabetes⁷³, depression⁷⁴).

We defined eQTLs as those with a nominal Q value < 0.05 in the analysis with FastQTL. We ran coloc with the default parameters and defined colocalization as those loci with a posterior probability H₄ (PPH4) > 0.5. Furthermore, for those colocalized loci, we generated the 95% credible set and obtained the posterior probability of each variant within the credible set (snpH₄) under the single causal variant assumption. We calculated the enrichment of colocalization in eGenes with reQTL effect compared to eGenes with no reQTL effect using Fisher’s exact test.

In addition, to find colocalization at different levels of the perturbation score, we defined tertiles of perturbation within the NP (T1–T3) and P (T4–T6) conditions. For example, in the IAV experiment, T1 was defined as NP cells with perturbation score values <33rd percentile (−0.90), T2 was defined as NP cells with the perturbation score ≥33rd and below 66th percentile (−0.75) and T3 was defined as NP cells with a perturbation score ≥ 66th percentile. Then, for each tertile, we generated pseudobulk gene expression profiles, tested for cis-eQTL and ran coloc as described before.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

For the IAV perturbation experiment²⁴, the scRNA-seq data and the imputed low-pass WGS data are publicly available via SRA (PRJNA736483) and via Zenodo at https://doi.org/10.5281/zenodo.4273999 (ref. ⁷⁵). For the CA, PA and MTB perturbation experiments²², the scRNA-seq data and the genotype data are publicly available at https://eqtlgen.org/sc/datasets/1m-scbloodnl.html.

Code availability

Scripts to reproduce analyses are available via GitHub at https://github.com/immunogenomics/reQTL and via Code Ocean at https://doi.org/10.24433/CO.4460698.v1.

References

Ishigaki, K. et al. Multi-ancestry genome-wide association analyses identify novel genetic mechanisms in rheumatoid arthritis. Nat. Genet. 54, 1640–1651 (2022).
CAS PubMed PubMed Central Google Scholar
Khunsriraksakul, C. et al. Multi-ancestry and multi-trait genome-wide association meta-analyses inform clinical risk prediction for systemic lupus erythematosus. Nat. Commun. 14, 668 (2023).
CAS PubMed PubMed Central Google Scholar
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
Google Scholar
Kwok, A. J., Mentzer, A. & Knight, J. C. Host genetics and infectious disease: new tools, insights and translational opportunities. Nat. Rev. Genet. 22, 137–153 (2021).
CAS PubMed Google Scholar
Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
CAS Google Scholar
Schmiedel, B. J. et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell 175, 1701–1715 (2018).
CAS PubMed PubMed Central Google Scholar
Connally, N. J. et al. The missing link between genetic association and regulatory function. eLife 11, e74970 (2022).
CAS PubMed PubMed Central Google Scholar
Kang, J. B., Raveane, A., Nathan, A., Soranzo, N. & Raychaudhuri, S. Methods and insights from single-cell expression quantitative trait loci. Annu. Rev. Genomics Hum. Genet. 24, 277–303 (2023).
CAS PubMed PubMed Central Google Scholar
Kim-Hellmuth, S. et al. Genetic regulatory effects modified by immune activation contribute to autoimmune disease associations. Nat. Commun. 8, 266 (2017).
PubMed PubMed Central Google Scholar
Barreiro, L. B. et al. Deciphering the genetic architecture of variation in the immune response to Mycobacterium tuberculosis infection. Proc. Natl Acad. Sci. USA 109, 1204–1209 (2012).
CAS PubMed PubMed Central Google Scholar
Lee, M. N. et al. Common genetic variants modulate pathogen-sensing responses in human dendritic cells. Science 343, 1246980 (2014).
PubMed PubMed Central Google Scholar
Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).
PubMed PubMed Central Google Scholar
Savage, A. K. et al. Multimodal analysis for human ex vivo studies shows extensive molecular changes from delays in blood processing. iScience 24, 102404 (2021).
CAS PubMed PubMed Central Google Scholar
Niepel, M., Spencer, S. L. & Sorger, P. K. Non-genetic cell-to-cell variability and the consequences for pharmacology. Curr. Opin. Chem. Biol. 13, 556–561 (2009).
CAS PubMed PubMed Central Google Scholar
Gascoigne, K. E. & Taylor, S. S. Cancer cells display profound intra- and interline variation following prolonged exposure to antimitotic drugs. Cancer Cell 14, 111–122 (2008).
CAS PubMed Google Scholar
Feinerman, O., Veiga, J., Dorfman, J. R., Germain, R. N. & Altan-Bonnet, G. Variability and robustness in T cell activation from regulated heterogeneity in protein levels. Science 321, 1081–1084 (2008).
CAS PubMed PubMed Central Google Scholar
Ma, Q., Wang, Y., Lo, A. S.-Y., Gomes, E. M. & Junghans, R. P. Cell density plays a critical role in ex vivo expansion of T cells for adoptive immunotherapy. J. Biomed. Biotechnol. 2010, 386545 (2010).
PubMed PubMed Central Google Scholar
Nathan, A. et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128 (2022).
CAS PubMed PubMed Central Google Scholar
Cuomo, A. S. E. et al. CellRegMap: a statistical framework for mapping context‐specific regulatory variants using scRNA‐seq. Mol. Syst. Biol. 18, e10663 (2022).
CAS PubMed PubMed Central Google Scholar
Kumasaka, N. et al. Mapping interindividual dynamics of innate immune response at single-cell resolution. Nat. Genet. 55, 1066–1075 (2023).
CAS PubMed PubMed Central Google Scholar
Schmiedel, B. J. et al. Single-cell eQTL analysis of activated T cell subsets reveals activation and cell type-dependent effects of disease-risk variants. Sci. Immunol. 7, eabm2508 (2022).
CAS PubMed PubMed Central Google Scholar
Oelen, R. et al. Single-cell RNA-sequencing of peripheral blood mononuclear cells reveals widespread, context-specific gene expression regulation upon pathogenic exposure. Nat. Commun. 13, 3267 (2022).
CAS PubMed PubMed Central Google Scholar
Kang, J. B. et al. Mapping the dynamic genetic regulatory architecture of HLA genes at single-cell resolution. Nat. Genet. 55, 2255–2268 (2023).
CAS PubMed PubMed Central Google Scholar
Randolph, H. E. et al. Genetic ancestry effects on the response to viral infection are pervasive but cell type specific. Science 374, 1127–1133 (2021).
CAS PubMed PubMed Central Google Scholar
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
CAS PubMed PubMed Central Google Scholar
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
CAS PubMed PubMed Central Google Scholar
Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2021).
Levy, D. E., Marié, I. J. & Durbin, J. E. Induction and function of type I and III interferon in response to viral infection. Curr. Opin. Virol. 1, 476–486 (2011).
CAS PubMed PubMed Central Google Scholar
Quach, H. et al. Genetic adaptation and Neandertal admixture shaped the immune system of human populations. Cell 167, 643–656 (2016).
CAS PubMed PubMed Central Google Scholar
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
CAS PubMed PubMed Central Google Scholar
Chen, M.-H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182, 1198–1213 (2020).
CAS PubMed PubMed Central Google Scholar
Alasoo, K. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424–431 (2018).
CAS PubMed PubMed Central Google Scholar
Bentham, J. et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464 (2015).
CAS PubMed PubMed Central Google Scholar
Takeuchi, H., Takeuchi, T., Gao, J., Cantley, L. C. & Hirata, M. Characterization of PXK as a protein involved in epidermal growth factor receptor trafficking. Mol. Cell. Biol. 30, 1689–1702 (2010).
CAS PubMed PubMed Central Google Scholar
Langefeld, C. D. et al. Transancestral mapping and genetic load in systemic lupus erythematosus. Nat. Commun. 8, 16021 (2017).
CAS PubMed PubMed Central Google Scholar
Fairfax, B. P. et al. Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles. Nat. Genet. 44, 502–510 (2012).
CAS PubMed PubMed Central Google Scholar
Nédélec, Y. et al. Genetic ancestry and natural selection drive population differences in immune responses to pathogens. Cell 167, 657–669 (2016).
PubMed Google Scholar
Kerimov, N. et al. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet. 53, 1290–1299 (2021).
CAS PubMed PubMed Central Google Scholar
Márquez, A. et al. Meta-analysis of Immunochip data of four autoimmune diseases reveals novel single-disease and cross-phenotype associations. Genome Med. 10, 97 (2018).
PubMed PubMed Central Google Scholar
Uzquiano, A. et al. Proper acquisition of cell class identity in organoids allows definition of fate specification programs of the human cerebral cortex. Cell 185, 3770–3788 (2022).
CAS PubMed PubMed Central Google Scholar
Tresenrider, A. et al. Single-cell sequencing of individual retinal organoids reveals determinants of cell-fate heterogeneity. Cell Rep. Methods 3, 100548 (2023).
CAS PubMed PubMed Central Google Scholar
Lim, E. T. et al. Orgo-Seq integrates single-cell and bulk transcriptomic data to identify cell type specific-driver genes associated with autism spectrum disorder. Nat. Commun. 13, 3243 (2022).
CAS PubMed PubMed Central Google Scholar
Cook, C. P. et al. A single-cell transcriptional gradient in human cutaneous memory T cells restricts Th17/Tc17 identity. Cell Rep. Med. 3, 100715 (2022).
CAS PubMed PubMed Central Google Scholar
Denisenko, E. et al. Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows. Genome Biol. 21, 130 (2020).
CAS PubMed PubMed Central Google Scholar
O’Flanagan, C. H. et al. Dissociation of solid tumor tissues with cold active protease for single-cell RNA-seq minimizes conserved collagenase-associated stress responses. Genome Biol. 20, 210 (2019).
PubMed PubMed Central Google Scholar
Hu, P., Zhang, W., Xin, H. & Deng, G. Single cell isolation and analysis. Front. Cell Dev. Biol. 4, 116 (2016).
PubMed PubMed Central Google Scholar
Costa, A., Silvestrini, R., Del Bino, G. & Motta, R. Implications of disaggregation procedures on biological representation of human solid tumours. Cell Prolif. 20, 171–180 (1987).
CAS Google Scholar
Nathan, A. et al. Multimodally profiling memory T cells from a tuberculosis cohort identifies cell state associations with demographics, environment and disease. Nat. Immunol. 22, 781–793 (2021).
CAS PubMed PubMed Central Google Scholar
Zhang, F. et al. Deconstruction of rheumatoid arthritis synovium defines inflammatory subtypes. Nature 623, 616–624 (2023).
CAS PubMed PubMed Central Google Scholar
Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
CAS PubMed PubMed Central Google Scholar
Weinberger, E., Lin, C. & Lee, S.-I. Isolating salient variations of interest in single-cell data with contrastiveVI. Nat. Methods 20, 1336–1345 (2023).
CAS PubMed Google Scholar
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
CAS PubMed PubMed Central Google Scholar
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
CAS PubMed PubMed Central Google Scholar
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
CAS PubMed PubMed Central Google Scholar
van den Brink, S. C. et al. Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat. Methods 14, 935–936 (2017).
PubMed Google Scholar
Mattei, D. et al. Enzymatic dissociation induces transcriptional and proteotype bias in brain cell populations. Int. J. Mol. Sci. 21, 7944 (2020).
CAS PubMed PubMed Central Google Scholar
Machado, L., Relaix, F. & Mourikis, P. Stress relief: emerging methods to mitigate dissociation-induced artefacts. Trends Cell Biol. 31, 888–897 (2021).
CAS PubMed Google Scholar
Barnes, M. G., Grom, A. A., Griffin, T. A., Colbert, R. A. & Thompson, S. D. Gene expression profiles from peripheral blood mononuclear cells are sensitive to short processing delays. Biopreserv. Biobank 8, 153–162 (2010).
CAS PubMed PubMed Central Google Scholar
Andriamboavonjy, L. et al. Comparative analysis of methods to reduce activation signature gene expression in PBMCs. Sci. Rep. 13, 23086 (2023).
CAS PubMed PubMed Central Google Scholar
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
PubMed PubMed Central Google Scholar
Lawlor, N. et al. Single cell analysis of blood mononuclear cells stimulated through either LPS or anti-CD3 and anti-CD28. Front. Immunol. 12, 636720 (2021).
CAS PubMed PubMed Central Google Scholar
Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 6, e1000770 (2010).
PubMed PubMed Central Google Scholar
Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
CAS PubMed Google Scholar
Liberzon, A. et al. The molecular signatures database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
CAS PubMed PubMed Central Google Scholar
Brooks, M. E. et al. glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R. J. 9, 378 (2017).
Google Scholar
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
PubMed PubMed Central Google Scholar
Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 594, 398–402 (2021).
CAS PubMed PubMed Central Google Scholar
International Multiple Sclerosis Genetics Consortium (IMSGC). Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. 45, 1353–1360 (2013).
Demenais, F. et al. Multiancestry association study identifies new asthma risk loci that colocalize with immune-cell enhancer marks. Nat. Genet. 50, 42–53 (2018).
CAS PubMed Google Scholar
Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
CAS PubMed PubMed Central Google Scholar
Tsoi, L. C. et al. Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nat. Genet. 44, 1341–1348 (2012).
CAS PubMed PubMed Central Google Scholar
van der Harst, P. & Verweij, N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ. Res. 122, 433–443 (2018).
PubMed PubMed Central Google Scholar
Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat. Commun. 9, 2941 (2018).
PubMed PubMed Central Google Scholar
Howard, D. M. et al. Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nat. Commun. 9, 1470 (2018).
PubMed PubMed Central Google Scholar
Randolph, H. E. Influenza A response variation scripts. Zenodo https://doi.org/10.5281/zenodo.4273998 (2021).

Download references

Acknowledgements

This work is supported in part by funding from the National Institutes of Health (R01AR063759, P01AI148102, U01HG012009, UC2AR081023, R56HG013083), T32DK110919 and T32HG10464-5 (C.V.), F30AI172238 (J.B.K.), T32GM007753 and T32GM144273 (J.B.K. and L.R.).

Author information

Authors and Affiliations

Center for Data Sciences, Brigham and Women’s Hospital, Boston, MA, USA
Cristian Valencia, Aparna Nathan, Joyce B. Kang, Laurie Rumker & Soumya Raychaudhuri
Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Cristian Valencia, Aparna Nathan, Joyce B. Kang, Laurie Rumker, Hyunsun Lee & Soumya Raychaudhuri
Division of Rheumatology, Inflammation and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Cristian Valencia, Aparna Nathan, Joyce B. Kang, Laurie Rumker & Soumya Raychaudhuri
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Cristian Valencia, Aparna Nathan, Joyce B. Kang, Laurie Rumker & Soumya Raychaudhuri
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Cristian Valencia, Aparna Nathan, Joyce B. Kang, Laurie Rumker, Hyunsun Lee & Soumya Raychaudhuri
Immune Tolerance Network, Boston, MA, USA
Hyunsun Lee
Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA, USA
Hyunsun Lee

Authors

Cristian Valencia
View author publications
Search author on:PubMed Google Scholar
Aparna Nathan
View author publications
Search author on:PubMed Google Scholar
Joyce B. Kang
View author publications
Search author on:PubMed Google Scholar
Laurie Rumker
View author publications
Search author on:PubMed Google Scholar
Hyunsun Lee
View author publications
Search author on:PubMed Google Scholar
Soumya Raychaudhuri
View author publications
Search author on:PubMed Google Scholar

Contributions

C.V. and S.R. conceptualized the study. C.V., S.R. and A.N. designed the statistical and computational strategy. C.V. analyzed the data. J.B.K., L.R. and H.L. gave feedback and advice on the interpretation of results. All authors contributed to writing and editing the final manuscript.

Corresponding author

Correspondence to Soumya Raychaudhuri.

Ethics declarations

Competing interests

S.R. is an advisor for Janssen, BMS, Pfizer, Nimbus and Bonito. He is also a founder for Mestag Therapeutics. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Maxime Rotival and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Modeling the perturbation score.

We generated the continuous perturbation score in the CA, PA and MTB experiments. a–c, Plots showing the regression coefficients of the penalized logistic regression in each of the perturbation experiments. d–g, Plots showing the cross-validation results for each of the perturbation scores. Dotted lines show the penalizations factor (λ) that resulted in the most regularized model and that gives the model with minimum mean cross validation error.

Extended Data Fig. 2 Top genes correlated with the perturbation score.

Scatter plots showing the Pearson correlation (r) of each perturbation score and the expression of the top correlated genes colored by cell density. Each row corresponds to the perturbation experiment IAV, CA, PA and MTB, respectively.

Extended Data Fig. 3 Set of eQTLs prioritized on each perturbation experiment.

Set of eQTLs identified on each perturbation experiment. a, IAV. b, CA. c, PA. d, MTB. Bar plots shows the number of significant (q-value < 0.10) eQTLs detected on bulk expression profiles for the non-perturbed (NP), perturbed (P) and the average of the NP and P conditions.

Extended Data Fig. 4 Comparison of the P values between PME and Negative binomial reQTL models.

a, IAV. b, CA. c, PA. d, MTB. Scatter plot shows good agreement (Pearson correlation r) between the likelihood ratio test P-value (LRT P) from the single cell Poisson Mix Effect (PME) model (2df-model) and the LRT P-value from an equivalent negative binomial model. Each dot represents an reQTL that was significant in the PME 2df-model.

Extended Data Fig. 5 Enrichment of significant interaction among previously reported eQTLs.

Results from our secondary analysis where we examined reQTL effects with the 2df-model on the set of 1,825 previously prioritized eQTLs from the original study by Oelen et al. a, CA. b, MTB. c, PA. The x-axis indicates the number of reQTLs with the specified LRT P-value obtained on the 2df-model or under the uniform distribution (1,825 P values). χ², chi-square test.

Extended Data Fig. 6 Mapping reQTLs in single cells with the 2df-model has more power compared to a pseudobulk approach.

a–d, Bar plots showing the number of reQTLs detected by the single cell 2df-model and the LME-pseudobulk. e–h, Venn diagrams showing the number of eGenes with reQTLs specifically detected by each model and the number of overlapping eGenes with reQTL effects. reQTLs were defined as those discovered with LRT q-value < 0.05 and confirmed with 500 permutations (permutation P-value < 0.05).

Extended Data Fig. 7 Decreased of the eQTL effect after perturbations in the KRR1 gene.

a, d, g, Scatter plots showing the total eQTL effect for KRR1 in CA (a), MTB (d), and PA (g). Each dot represents the total eQTL (β) in a cell. The color corresponds to the NP and P conditions. The dashed lines on the x-axis indicate the perturbation score at the 50th percentile. The dashed lines on the y-axis indicate the total beta at the 50th percentile. The histogram below the scatter plot shows the density of the perturbation score color by NP and P conditions. b, c, CA. e, f, MTB. h, i, PA. Boxplots represent the average total eQTL (β) and 95% CI for cells between the 45th to 55th percentiles within each, the NP and P state after aggregating the single cell gene count by donor. Box plots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the interquartile range (IQR) (or minimum and maximum values if they fall within that range; end of whiskers).

Extended Data Fig. 8 reQTLs are enriched for colocalizations.

Forest plot shows that reQTLs are more likely to colocalize compared to eQTLs in the IAV, PA, and CA perturbation experiments. Plot shows the estimated odds ratio (OR), and the error bars represented the 95% confidence interval. 2df-model, reQTLs detected by the single cell PME 2df-model; Bulk, reQTLs detected by the pseudobulk model.

Extended Data Fig. 9 Examples of reQTLs detected only by the 2df-model.

a, Scatter plot showing the total eQTL (β) of rs3184122 for COX14 on each cell in the CA perturbation experiment. b, c, Average eQTL effect of rs3184122 on COX14 in the CA experiment. d, Total eQTL (β) of rs2238641 for FARSA on each cell in the CA perturbation experiment. e, f, Average eQTL effect of rs2238641 on FARSA in the NP and P states of the CA experiment. g, Total eQTL (β) of rs3108680 for PRMT6 on each cell in the PA perturbation experiment. h, i, Average eQTL effect of rs3108680 on PRMT6 the NP and P states of the PA experiment. For the scatter plots, the color corresponds to the NP and P conditions. The dashed lines on the x-axis indicate the perturbation score at the 50th percentile. The dashed lines on the y-axis indicate the total beta at the 50th percentile. Boxplots represent the average total eQTL (β) and 95% CI for cells between the 45th to 55th percentiles within each, the NP and P state after aggregating the single cell gene count by donor. Box plots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the interquartile range (IQR) (or minimum and maximum values if they fall within that range; end of whiskers).

Extended Data Fig. 10 Increase of eQTL effect for TMEM106B in the IAV experiment.

reQTL rs3807865 for TMEM106B in the IAV perturbation experiment had perturbation-specific colocalization with depression. a, Plot shows the estimates for the genotype (G) and interactions (GxScore, GxDiscrete) from the 2df-model in PBMC. The error bars correspond to the 95% confidence interval. b, Scatter plot showing the total eQTL (β) effect of rs3807865 for TMEM106B on each cell. The color corresponds to the NP and P conditions. The dashed lines on the x-axis indicate the perturbation score at the 50th percentile. The dashed lines on the y-axis indicate the total beta at the 50th percentile. c, d, Boxplot represents the average total eQTL (β) and 95% CI for cells between the 45th to 55th percentiles within each, the NP and P state after aggregating the single cell gene count by donor. Box plots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the interquartile range (IQR) (or minimum and maximum values if they fall within that range; end of whiskers). e, Probability of colocalization of eQTL in TMEM106B with GWAS for depression at NP and P conditions. f, Probability of colocalization of eQTL signal in TMEM106B with GWAS for depression at tertiles of the perturbation score in the NP (NP_T1-NP_T3) and P (P_T1-P_T3) cells.

Supplementary information

Supplementary Information

Supplementary Note, Figs. 1–27 and legends for Tables 1–17.

Reporting Summary

Peer Review File

Supplementary Table 1

Gene set enriched for each of the perturbation scores. MSigDB gene set (GSEA P adjusted < 0.05).

Supplementary Table 2

Results from the 2df-model IAV (tab 1), 1 df-discrete (tab 2) and pseudobulk LME model (tab 3).

Supplementary Table 3

Results from the 2df-model CA (tab 1), 1df-discrete (tab 2) and pseudobulk LME model (tab 3).

Supplementary Table 4

Results from the 2df-model PA (tab 1), 1df-discrete (tab 2) and pseudobulk LME model (tab 3).

Supplementary Table 5

Results from the 2df-model MTB (tab 1), 1df-discrete (tab 2) and pseudobulk LME model (tab 3).

Supplementary Table 6

Results from cell-type-specific reQTL analysis using the 2df-model in IAV.

Supplementary Table 7

Results from cell-type-specific reQTL analysis using the 2df-model in 24 h CA.

Supplementary Table 8

Results from cell-type-specific reQTL analysis using the 2df-model in 24 h PA.

Supplementary Table 9

Results from cell-type-specific reQTL analysis using the 2df-model in 24 h MTB.

Supplementary Table 10

Comparison of reQTLs detected in the IAV dataset using the 2df-model and previous reQTL study²⁹.

Supplementary Table 11

Results from reQTL analysis in IAV by cell type.

Supplementary Table 12

Results from reQTL analysis in CA by cell type.

Supplementary Table 13

Results from reQTL analysis in PA by cell type.

Supplementary Table 14

Results from reQTL analysis in MTB by cell type.

Supplementary Table 15

Test of heterogeneous effect for the reQTL effect in IAV.

Supplementary Table 16

Colocalization analysis on the NP and P states in IAV, CA, MTB and PA.

Supplementary Table 17

Colocalization analysis at different levels of the perturbation score in IAV, CA, MTB and PA.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Valencia, C., Nathan, A., Kang, J.B. et al. Modeling heterogeneity in single-cell perturbation states enhances detection of response eQTLs. Nat Genet 57, 2882–2890 (2025). https://doi.org/10.1038/s41588-025-02344-6

Download citation

Received: 30 January 2024
Accepted: 25 August 2025
Published: 20 October 2025
Version of record: 20 October 2025
Issue date: November 2025
DOI: https://doi.org/10.1038/s41588-025-02344-6