Introduction

Exposure to traumatic events is common, estimated to affect 70% of people during their lifetime [1], with approximately 10% of trauma-exposed individuals experiencing post-traumatic stress disorder (PTSD) [2]. There are known associations between PTSD and future substance use behaviors [3,4,5], and 40.1% of people with PTSD are estimated to have a substance use disorder [6]. Substance use, including tobacco smoking and drinking alcohol, is of particular concern because of its associated morbidity and mortality: tobacco and alcohol consumption both increase the risk of heart disease, stroke, and cancers [7,8,9], and substance use accounts for approximately 14.7% of disability-adjusted life years globally [10]. PTSD is a heterogeneous condition, shaped by both biological risks and environmental factors, with four major symptom dimensions (avoidance, negative mood/cognition, hyperarousal, and re-experiencing), but the extent to which PTSD and its components contribute to substance use risk is unclear. Understanding this interplay could allow for more accurate prediction of substance use risk in trauma-exposed populations and help tailor effective interventions to the most salient constructs of PTSD.

Substance use itself is a complex phenomenon and, like PTSD, is affected by both biological underpinnings and environmental factors, such as availability of substances, social support, and socioeconomic status. Family-based studies estimate that tobacco and alcohol dependency are both about 50% heritable [11,12,13,14], though more recent meta-analyses estimated the single nucleotide polymorphism (SNP)-based heritability at between 5 and 15% for both tobacco [15] and alcohol use disorders [16]. The effect of PTSD symptoms on substance use behaviors has been described as synergistic, and PTSD can interact with other factors, such as sleep problems [17], overall health [18], and adverse childhood experiences [19], to increase the risk of substance use; genetic interactions with PTSD, however, are less well documented.

There are three major gaps in the prior literature that we seek to address. First, most prior studies that have explored gene-environment interactions on substance use behaviors have been candidate gene studies, which do not capture the whole burden of genomic risk [20] and are subject to bias because they only investigated loci with prior documented or hypothesized evidence [21]. Second, no study to date has examined interactions with the four components of PTSD, leaving gaps in our understanding of whether PTSD globally interacts with the genetic risk of substance use to increase behaviors, or whether there are more salient subscales that affect this relationship. Beyond genetics, there is an array of contradictory findings as to which subscale of PTSD symptoms is most associated with substance use behaviors. There has been some evidence that negative cognition and mood symptoms have stronger associations with substance use behaviors (regardless of genetic influence) [22, 23], while other studies have demonstrated stronger associations with re-experiencing symptoms [24] and hyperarousal [25] than with other clusters. By adjusting for the genetic risk and potential interactions in one of the largest PTSD cohorts to date, we seek to clarify this relationship. Finally, while polygenic risk scores and interactions with environmental exposures have been studied for substance use outcomes, only two studies to date have examined polygenic risk interactions with trauma [26], and these have been in single-ancestry cohorts [27, 28], limiting their generalizability to more diverse populations.

Although substance use genome-wide association studies (GWAS) have historically been conducted among predominantly European populations [29,30,31], more recent studies have successfully identified genetic loci associated with substance use in large cohorts representing multiple ancestries [15, 32,33,34]. Polygenic risk scores (PRS) have yet to fully leverage these multi-ancestry GWAS, and PRS studies for substance use behaviors have primarily been conducted in European-ancestry cohorts [35,36,37]. Several novel methods have been proposed to leverage shared genetic effects while properly modeling ancestry-associated differences in linkage disequilibrium (LD) and avoiding bias due to population stratification [38,39,40]. With the development of these statistical methods, we are now better positioned to estimate the genetic burden for substance use in diverse populations, including after trauma. One powerful Bayesian approach, PRS-CSx [41], applies a continuous shrinkage prior to GWAS summary statistics and LD reference panels from multiple ancestry populations to estimate posterior weights incorporating both shared and ancestry-specific variant effects. PRS-CSx uses a relatively straightforward implementation process to model the joint distribution without the need for a large validation dataset or additional sequencing data, as is necessary for most other multi-ancestry methods [39].

Now, with cross-ancestry statistical methods to grapple with such diverse genetic data, we leverage prospective data from the first civilian cohort of this scale, with sufficient statistical power to examine separate symptom subscales of PTSD: re-experiencing, avoidance, negative cognition/mood, and hyperarousal symptoms. We hypothesize that the polygenic risk for tobacco and alcohol use synergistically interacts with PTSD symptoms to increase substance use behaviors after trauma. We also hypothesize that the strength of this association differs by symptom cluster subscale, and that negative cognition/mood has the largest association and synergistic interaction with the underlying genetic risk to increase substance use. This expectation follows from prior research into the self-medication hypothesis, under which participants who experience negative mood use the euphoric and dissociative effects of substances to cope [22].

Methods

Study sample

The Advancing Understanding of RecOvery afteR traumA (AURORA) study has been previously described, and the data are available through the National Institute of Mental Health (NIMH) Data Archive (NDA) [42]. In brief, participants were recruited within 72 h of presenting to an emergency department (ED) after a traumatic event. Qualifying traumatic events included motor vehicle collisions, physical or sexual assault, falls >10 feet, mass casualty incidents, and other events that involved experiencing, witnessing, or learning about actual or threatened serious injury, sexual violence, or death and that a trained research assistant agreed were likely traumatic. In our sample, only 3.5% (n = 107) of participants were enrolled under this “Other qualifying event” category. Participants were followed at one of 29 study sites for one year across 6 time points: ED baseline, week 2, week 8, month 3, month 6, and month 12 post-trauma. At ED baseline, blood biospecimens were collected for testing (as described in the next section). Participants completed self-report study surveys at each follow-up time, covering sociodemographic characteristics, trauma-related questionnaires, general mental health questionnaires, and substance use behaviors. The target sample size for AURORA was determined via power analyses in the original protocol, and 3050 total participants were recruited in the parent study [42].

Biospecimen collection and quality control

Blood biospecimens were collected using PAXgene tubes at ED sites and frozen at −20 °C until shipping to the National Institute of Mental Health (NIMH) Repository and Genomics Resource (NRGR). DNA was then isolated using the magnetic bead-based Chemagic 360 before testing. Biospecimens were processed on Fluidigm SNP trace as a quality check. DNA genotyping was conducted using the Illumina Global Screening Array-24 1.0 at the Stanley Center for Psychiatric Research of the Broad Institute. Genotyping quality control included removing rare variants with minor allele frequency (MAF) < 0.005, samples with karyotypic abnormalities determined via B-allele frequency, samples showing cryptic relatedness (kinship coefficient >0.18, corresponding to first-degree relatives), SNPs with Hardy-Weinberg equilibrium p-value < 1e-8, SNPs with call rates <98%, and participants whose sex chromosomes were discrepant from reported sex assigned at birth.
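The variant-level filters listed above (MAF, call rate, and Hardy-Weinberg equilibrium) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names and the genotype-count input layout are hypothetical, and the Hardy-Weinberg test here is a simple 1-degree-of-freedom chi-square approximation, whereas genotyping software commonly uses an exact test. Only the three thresholds are taken from the text.

```python
import math

# Thresholds taken from the text; everything else is an illustrative assumption.
MAF_MIN = 0.005        # remove variants with MAF below this
CALL_RATE_MIN = 0.98   # remove variants with call rate below this
HWE_P_MIN = 1e-8       # remove variants with HWE p-value below this


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) Hardy-Weinberg p-value from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic variant: no departure from HWE can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def passes_variant_qc(n_aa, n_ab, n_bb, n_missing):
    """Return True if a variant passes the MAF, call-rate, and HWE filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (
        call_rate >= CALL_RATE_MIN
        and maf >= MAF_MIN
        and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= HWE_P_MIN
    )
```

Sample-level filters (relatedness, karyotypic abnormalities, sex discrepancies) require additional inputs and are not covered by this sketch.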

Ancestry principal components were estimated using ADMIXTURE [43] within the sample to determine likely ancestry groups. We generated 10 components for global diversity in our sample. The 3 primary ancestry groups were determined via cross-validation within ADMIXTURE, and likely ancestry was assigned based on >90% of the variance explained in the first three respective principal components, corresponding to likely European (EUR), likely African (AFR), and likely admixed American (AMR) ancestry, respectively. A further 10 principal components (PCs) were estimated within these groups. In models with all ancestry groups, 5 PCs were used to allow the model to converge given the small sample of the admixed American population, leading to 5 global PCs, 5 PCs for European ancestry, 5 PCs for African ancestry, and 5 PCs for American Admixed ancestry (20 PCs total). When stratifying European and African ancestry groups, 10 PCs were used.
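As a rough illustration of the group assignment, the sketch below applies a >0.90 cut-off to per-participant ancestry fractions. It assumes the threshold is applied to the estimated ancestry fractions for the three groups; the function name and the dictionary input are hypothetical and do not reflect the actual ADMIXTURE output format.

```python
def assign_likely_ancestry(fractions, threshold=0.90):
    """Return the group whose estimated fraction exceeds the threshold, else None.

    fractions: dict mapping a group label (e.g. "EUR", "AFR", "AMR") to the
    estimated ancestry fraction for one participant (hypothetical layout).
    """
    label = max(fractions, key=fractions.get)
    return label if fractions[label] > threshold else None
```

Lowering `threshold` to 0.50 would mimic a more permissive assignment that admits more admixed individuals.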

Polygenic risk scores

We generated polygenic risk scores (PRS) using two methods. First, we conducted PRS for cross-ancestry estimation (PRS-CSx) [38, 41] using all participants. After stratifying by likely ancestry groups, we used PRSice-2 [44] in each group.

Cross-ancestry scores

For PRS-CSx, we used linkage disequilibrium (LD) reference panels from the 1000 Genomes Project Phase 3 [45] for European and African ancestry. We used discovery genome-wide association study (GWAS) summary statistics from the largest and most current version of the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN) for tobacco and alcohol [32]. GSCAN included 119,589 AFR ancestry participants, 2,669,029 EUR ancestry participants, and 286,026 AMR ancestry participants. The summary statistics we used for tobacco were calculated in GSCAN as the average number of cigarettes smoked per week, and for alcohol, the summary statistics from GSCAN were based on the average number of drinks consumed per week. PRS-CSx multi-ancestry meta-analysis scores were used for all subsequent analyses. The global shrinkage parameter phi was derived using a fully Bayesian approach given the large population sizes of the discovery GWAS. PRS-CSx was run using Python 3.9 [46].

Ancestry-specific scores

We also stratified by ancestry group and generated ancestry-specific scores using PRSice-2 applied to the corresponding ancestry’s summary statistics. A sensitivity analysis was also performed with >50% ancestry individuals, allowing for greater admixture and increasing sample size. Clumping was performed with a window size of 250 kb, an R2 threshold for clumping of 0.1, and a p-value threshold for clumping of 1.0. The optimal p-value cut-off was determined using thresholding from 5e-08 to 0.5 that maximized R2, with a step-size of 0.00005. PRSice-2 was run using R 4.3 [47]. PRS-CSx and PRSice-2 scores were standardized and converted into deciles to more easily compare between low and high genetic risk groups.
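The final standardization and decile conversion can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical, and the rank-based decile rule (with ties broken by input order) is an assumption, since the text does not specify how ties or uneven group sizes were handled.

```python
import statistics


def standardize(scores):
    """Center scores to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


def to_deciles(scores):
    """Assign each score a decile from 1 (lowest) to 10 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # stable sort
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles
```

Deciles are invariant to standardization, so the order of the two steps does not change the decile assignment.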

Post-traumatic stress

We conducted further analyses adjusting for post-traumatic stress (PTS) symptoms at baseline, measured by the PTSD Checklist for the DSM-5 (PCL-5) [48], to identify the potential influence of traumatic stress on the relationship between biological risk for substance use and substance use behaviors after trauma. We also calculated subscale scores by summing items from the relevant symptom cluster subscales: avoidance, hyperarousal, negative cognition and mood, and re-experiencing symptoms [48]. We report the standardized PCL-5 scores in our models. We report multiplicative interactions in the main text and additive interactions in the supplement. For the additive interaction terms, we generated the relative excess risk due to interaction (RERI), the proportion of the overall effect attributable to interaction (AP), and the synergy index (SI) [49].
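For reference, the three additive interaction measures have standard closed forms in terms of the relative risks for each exposure alone and for both together. The sketch below uses those textbook definitions; the function and argument names are hypothetical, and how the relative risks were derived from the fitted models in this study is not shown here.

```python
def additive_interaction(rr10, rr01, rr11):
    """Compute RERI, AP, and SI from three relative risks.

    rr10: relative risk with only the first exposure present
    rr01: relative risk with only the second exposure present
    rr11: relative risk with both exposures present
    All are relative to the doubly unexposed group.
    """
    reri = rr11 - rr10 - rr01 + 1
    ap = reri / rr11
    denom = (rr10 - 1) + (rr01 - 1)
    si = (rr11 - 1) / denom if denom != 0 else float("nan")
    return reri, ap, si
```

Under exact additivity, RERI and AP equal 0 and SI equals 1; positive RERI and AP, or SI above 1, indicate more-than-additive joint effects.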

Substance use outcomes

Substance use was reported for tobacco and alcohol frequency and quantity via self-report. Tobacco consumption included cigarettes, pipes, cigars, chewing tobacco, dip, snuff, hookah, nicotine gum or patches, or e-cigarettes. This information was collected via the PhenX toolkit [50], which asked participants to report the frequency of use in the past 30 days and the average quantity of use in the past 30 days. We then calculated quantity-frequency by multiplying frequency and quantity counts, resulting in the total number of instances of use in the past 30 days. The primary endpoint of interest was 6 months post-trauma, as roughly half of all participants who undergo substance use disorder treatment complete their treatment within 90 days [51, 52], and this was the first endpoint following the traumatic event beyond that time period. Given the skip pattern of the questionnaire and high rates of missingness of healthcare utilization variables available, multiple imputation was not possible to sufficiently estimate and adjust for prior substance use treatment with confidence. Therefore, we used this approach as a means to capture a timepoint where it was unlikely that participants were currently in a treatment program, even if this was not captured in self-report.
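The quantity-frequency outcome is a simple product, sketched below. The function and argument names are hypothetical, and treating a missing component as a missing outcome is an assumption made for illustration only.

```python
def quantity_frequency(times_used, avg_quantity):
    """Instances of use in the past 30 days: frequency times average quantity.

    Returns None when either component is missing (illustrative assumption).
    """
    if times_used is None or avg_quantity is None:
        return None
    if times_used < 0 or avg_quantity < 0:
        raise ValueError("frequency and quantity must be non-negative")
    return times_used * avg_quantity
```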

Covariates

The primary covariates in these analyses were the 3 major ancestry-specific PCs, an additional top 5 PCs in the combined ancestry models, and sex assigned at birth (male, female, other). Although the question was asked, no participants reported “other” sex assigned at birth; responses were therefore binarized as male or female.

Anxiety symptoms were measured using the Patient Reported Outcomes Measurement Information System (PROMIS) Anxiety Bank items [53], and depression symptoms were measured using the PROMIS Depression Short Form [54]. These were included in our initial assessment of covariates and are reported in the Results; however, they did not show evidence of confounding and were not used in the moderation analyses.

In these models, we expected further confounding of the relationship between PTSD symptoms and substance use behaviors based on previous literature [3, 55,56,57,58], and adjusted for additional covariates. These included marital status (never married as the reference, married, and separated, divorced, or widowed grouped as “other status”), race/ethnicity (non-Hispanic White as the reference, non-Hispanic Black, Hispanic, and non-Hispanic other) to capture factors of race/ethnicity beyond genetic ancestry, income (<$35,000/year as the reference, $35,000 to $75,000, >$75,000, and “did not report”), education (did not complete high school as the reference, completed high school or received GED, completed bachelor’s degree, and completed post-graduate degree), age in years and age^2 to identify potential generational effects, and region of data collection (Northeast as the reference, Southeast, Southwest, Midwest, and West coast). All categorical covariates were dummy coded. All continuous variables (age, depression symptoms, anxiety symptoms, and PTSD symptoms) were standardized by centering to the mean and scaling by the standard deviation.
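Dummy coding against a reference level can be sketched as below. The helper name, the level labels, and the `is_` prefix are hypothetical; the study's actual variable names are not given in the text.

```python
def dummy_code(value, levels, reference):
    """Return 0/1 indicators for every non-reference level of a categorical variable."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {f"is_{lvl}": int(value == lvl) for lvl in levels if lvl != reference}


# Illustrative level labels for marital status (reference: never married)
MARITAL_LEVELS = ["never_married", "married", "other_status"]
```

A participant in the reference category receives zeros for all indicators, so each coefficient is interpreted relative to that category.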

Missingness

Missingness was assessed descriptively and graphically with the naniar package [59]. To assess whether data were missing at random, we created a yes/no variable indicating whether the PCL-5 score or substance quantity-frequency was missing, and used univariate tests (t-tests for age, depression, anxiety, and PTSD; Chi-squared tests for all categorical variables) to identify whether a given variable was associated with missingness of either the exposure or the outcome. No variable was associated with missingness of both the exposure and the outcome; therefore, we determined that collider stratification bias was unlikely based on our measured variables, and that missingness was most likely to be at random. The most notable missing value was self-reported income; age, race/ethnicity, gender, and marital status otherwise had high rates of response (>90%). Income was missing for 335 participants (12.2% of the sample overall), including participants who declined to specify their income and participants who skipped the question entirely. We conducted multiple imputation by chained equations to account for missing values, generating 20 imputed datasets with 30 iterations each. We pooled results using Rubin’s rules [60].
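Rubin's rules combine the per-dataset estimates into a single estimate and standard error. The sketch below shows the standard formulas for one coefficient; the function name is hypothetical, and for count models the pooling would typically be done on the log scale before exponentiating to an IRR (an assumption about practice, not a statement of the study's code).

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-dataset point estimates (e.g. log-IRR coefficients)
    variances: per-dataset squared standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```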

Statistical analyses

Our study sample included 2973 participants with initial genotype data available, and 2747 samples passed quality control. We generated summary statistics of our sample for major covariates and tested for differences of demographic characteristics in substance use with univariate methods. These methods included Pearson correlation coefficients for tobacco and alcohol frequency and age in years, as well as analysis of variance (ANOVA) tests for ancestry, race/ethnicity, income, marital status, and sex assigned at birth categories and substance use frequency outcomes. These were followed with Kruskal-Wallis tests.

We regressed tobacco frequency and quantity on the PRS-CSx score for average cigarette use, and alcohol frequency and quantity on the PRS-CSx score for average drinks, at ED baseline and month 6 post-traumatic event, controlling for 5 ancestry-specific PCs for each of the 3 major ancestry groups and for sex, using quasipoisson regression. Quasipoisson regression was used because substance use frequency and quantity were both counts, and there was evidence of potential overdispersion when fitting an initial Poisson model and testing the dispersion statistic; the quasipoisson distribution relaxes the Poisson assumption that the variance equals the mean. We report the regression coefficient and R2 for the crude and adjusted models. We performed sensitivity analyses independently for each likely ancestry group and repeated this modeling procedure, controlling for 5 global ancestry PCs and 5 ancestry-specific PCs in each of the 3 major ancestry groups (Equation 1). We compared the PRS-CSx ancestry-stratified scores to the PRSice-2 ancestry-stratified scores to determine which performed best.

$$\mathrm{IRR}\left(\text{Substance use quantity-frequency}\right)=\beta\,\mathrm{PRS}+\beta\,\mathrm{PTSD}+\beta\,\mathrm{COV}$$
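The overdispersion check that motivates the quasipoisson model can be sketched with the Pearson dispersion statistic. This is a generic illustration: the function names are hypothetical, the fitted means are assumed to come from an already-fitted Poisson model, and the exact dispersion test used in the study is not specified in the text.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values well above 1 suggest overdispersion relative to the Poisson model.
    """
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / df


def quasipoisson_se(poisson_se, dispersion):
    """Quasipoisson standard error: Poisson standard error scaled by sqrt(dispersion)."""
    return poisson_se * math.sqrt(dispersion)
```

The quasipoisson point estimates match the Poisson fit; only the standard errors, and therefore the confidence intervals and p-values, are inflated by the estimated dispersion.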

We then adjusted for PTSD symptoms using the PCL-5 scores, controlling for additional sociodemographic covariates that may confound the relationship between traumatic stress and substance use, and conducted interaction analyses between the tobacco and alcohol polygenic risk scores and PTSD symptoms at 8 weeks following the traumatic event. As per recommendations from Keller [61], we further adjusted for all potential interactions between the PRS and the other covariates included in the model (Equation 2). We did not include PTSD-covariate interaction terms because we experienced model convergence issues with that number of interaction terms, and in the models that did converge, none of these terms were significant.

$$\mathrm{IRR}\left(\text{Substance use quantity-frequency}\right)=\beta\,\mathrm{PRS}+\beta\,\mathrm{PTSD}+\beta\,\mathrm{PRS}\times\mathrm{PTSD}+\beta\,\mathrm{PRS}\times\mathrm{COV}$$
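Because count models of this kind are fit on the log scale, a multiplicative interaction term rescales the exposure's IRR for each unit of the moderator. The sketch below shows that arithmetic for a PTSD-by-PRS interaction; the function name and the numeric values in the usage note are hypothetical and are not estimates from this study.

```python
import math


def conditional_irr(irr_main, irr_interaction, moderator_z):
    """IRR per 1-SD increase in the exposure at a given standardized moderator value.

    irr_main: exposure IRR when the moderator is at its mean (z = 0)
    irr_interaction: IRR for the exposure-by-moderator product term
    moderator_z: standardized moderator value (e.g. a PRS z-score)
    """
    return math.exp(math.log(irr_main) + math.log(irr_interaction) * moderator_z)
```

For example, with a hypothetical PTSD IRR of 1.20 and an interaction IRR of 0.90, the PTSD IRR would be about 1.08 at one SD above the mean PRS and about 1.33 at one SD below it, which is the pattern described as a weaker association at higher genetic risk.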

Because PTSD can only be diagnosed after 1 month following a traumatic event per the DSM-5 [62], we selected the earliest timepoint in the AURORA cohort follow-up that was at least 30 days after the index event. We also performed a sensitivity analysis using the highest PCL-5 score in any follow-up period before month 6, when our outcome was ascertained. This was to determine whether PTSD after trauma compounded, or demonstrated a threshold effect with, pre-existing genetic risk for substance use in influencing post-trauma substance use. We then repeated these analyses for the respective subscales of the PCL-5 to identify salient components driving a potential interaction (Equation 3).

$$\mathrm{IRR}\left(\text{Substance use quantity-frequency}\right)=\beta\,\mathrm{PRS}+\beta\,\mathrm{PTSD}_{\text{subscale}}+\beta\,\mathrm{PRS}\times\mathrm{PTSD}_{\text{subscale}}+\beta\,\mathrm{PRS}\times\mathrm{COV}$$

Because we tested four subscales across two PRS scores, we used the Benjamini-Hochberg procedure to control the false discovery rate at 0.05, mitigating inflation of type I error from multiple testing while retaining more power than more conservative approaches such as the Bonferroni correction. We also generated 95% confidence intervals for point estimates.
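The Benjamini-Hochberg adjustment can be sketched in a few lines. This is the standard step-up procedure written with the Python standard library only; the function name is hypothetical and the p-values in the usage note are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, the made-up p-values 0.01, 0.04, 0.03, and 0.005 are adjusted to 0.02, 0.04, 0.04, and 0.02; a test is declared significant when its adjusted value falls below the chosen false discovery rate.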

Sensitivity analysis

As previously mentioned, we generated additive interaction metrics, which are reported in the supplement. We also fit generalized estimating equation (GEE) models for the repeated measures. We compared model fit across the first-order autoregressive, exchangeable, independent, and unstructured working correlation structures, and determined that the exchangeable structure had the best fit based on the Akaike Information Criterion, the Bayesian Information Criterion, and convergence. However, the scores were generated from summary statistics that used cross-sectional ascertainment of tobacco and alcohol use, whereas GEEs estimate the slope over the repeated measures; because the outcome in the GEE model ultimately differs from the phenotype underlying the summary statistics used to generate the PRS-CSx scores, we report these models in the supplement. Finally, because we did not have a replication sample for our analyses, we generated E-Values to describe the effect size that a potential unmeasured confounder would need in order to negate our results.
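The E-value for a ratio estimate has a closed form, sketched below for a point estimate. The function name is hypothetical, and the sketch applies the basic risk-ratio formula directly; any approximations used in the study for rate ratios or for confidence limits are not described in the text and are not reproduced here.

```python
import math


def e_value(ratio):
    """E-value for a ratio estimate (risk or rate ratio).

    Protective estimates (ratio < 1) are inverted first, so the result is
    always >= 1.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio < 1:
        ratio = 1 / ratio
    return ratio + math.sqrt(ratio * (ratio - 1))
```

The result is the minimum strength of association, on the ratio scale, that an unmeasured confounder would need with both the exposure and the outcome to fully explain away the observed estimate.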

Ethical approval

This study was determined to be non-human subjects research, as a secondary data analysis of de-identified pre-existing data, by the Harvard Longwood Institutional Review Board. All data from the parent study were collected following written informed consent of participants, and in accordance with the Declaration of Helsinki of 1975 and all relevant updates.

Results

Sample descriptive statistics

In our final analytic sample, 2747 participants had high-quality genotyping data available (Table 1). Our sample predominantly self-identified their race/ethnicity as non-Hispanic Black (50.1%, n = 1376), and 1461 were identified genetically as belonging to the African ancestry group (55.2%). Further, 35.6% (n = 977) self-identified as non-Hispanic White, and 1110 (41.9%) were considered part of the European ancestry group. An additional 78 (2.9%) participants were categorized as likely belonging to the American admixed group. Most participants had experienced a motor vehicle collision as the index traumatic event (76.4%, n = 2247), and fewer reported physical or sexual assaults (9.8%, n = 288). Participants who smoked tobacco had completed a bachelor’s degree less often than those who neither smoked nor drank alcohol (19.1 vs. 28.1%). There was no statistically significant association between education and alcohol use at baseline (p = 0.48); however, there were differences in income, with a higher proportion of those drinking alcohol at baseline having an income >$75,000/year (n = 252, 14.8%) compared to those who did not consume alcohol or tobacco (n = 75, 10.1%) (p = 0.003).

Table 1 Descriptive statistics and univariate associations of study participants and tobacco and alcohol frequency of use.

PRS performance

The best predictive performance of the tobacco PRS-CSx scores was consistently observed in the European sample when stratifying by ancestry (Fig. 1). This finding was similar for alcohol scores (Fig. 2), with the best performance in the European ancestry group and lower performance in the African and American ancestry groups. We present the deciles of PRS-CSx scores in Supplemental Figs. S12 and S13. The tobacco and alcohol scores were significantly associated with their respective outcomes when controlling for ancestry-specific PCs, age, age^2, sex, and region (Supplemental Tables S1 and S2). For example, tobacco PRS-CSx was associated with greater tobacco quantity-frequency (IRR: 1.17, 95% CI: 1.05, 1.31, p = 0.005); the R2 for tobacco quantity-frequency was <0.01 overall, but 0.04 in the European ancestry subcohort (IRR: 1.36, 95% CI: 1.15, 1.61; p < 0.001). Findings were similar for alcohol, with the alcohol PRS-CSx score associated with greater alcohol quantity-frequency (IRR: 1.13, 95% CI: 1.01, 1.26; p = 0.03). The R2 for alcohol quantity-frequency was 0.06 for the European ancestry group, but <0.01 for all ancestry groups analyzed together, with a stronger effect in the European ancestry group as well (IRR: 1.24, 95% CI: 1.07, 1.44, p = 0.005). We graph the raw PRS-CSx scores in Supplemental Fig. S11 for reference. PRSice-2 score performance is reported in the supplement (Figs. S1 to S8) and was reduced compared to PRS-CSx. Therefore, we conducted primary analyses using PRS-CSx scores.

Fig. 1: Percentile of tobacco-based PRS-CSx score and average tobacco use frequency at baseline, for all ancestries and stratified by estimated ancestry group.

PRS-CSx scores were first centered and standardized, then converted to deciles for plotting. Ancestry groups were determined using ADMIXTURE software, and thresholds for likely ancestry were set at >0.90 in the factor loadings. The frequency of tobacco use was ascertained for the 30 days before the emergency department (ED) baseline.

Fig. 2: Percentile of alcohol-based PRS-CSx score and average alcohol use frequency at baseline, for all ancestries and stratified by estimated ancestry group.

PRS-CSx scores were first centered and standardized, then converted to deciles for plotting. Ancestry groups were determined using ADMIXTURE software, and thresholds for likely ancestry were set at >0.90 in the factor loadings. The frequency of alcohol use was ascertained for the 30 days before the emergency department (ED) baseline.

Associations between PRS and substance use outcomes

We found statistically significant associations between the PRS and greater tobacco use when adjusting for other mental health conditions and for sociodemographic covariates (Fig. 3). For instance, for tobacco quantity-frequency, the initial PRS effect estimate controlling only for sex, age, and ancestry-specific principal components was 1.17 (95% CI: 1.03, 1.33, p = 0.02), while controlling for PTSD, anxiety, and depression at baseline and for experiencing a prior traumatic event, the PRS effect estimate was 1.16 (95% CI: 1.02, 1.32, p = 0.03). After further adjusting for all additional sociodemographic variables, the estimated PRS effect of 1.14 (95% CI: 1.01, 1.29, p = 0.03) for tobacco outcomes did not appreciably deviate from prior estimates. There was no consistent association between the alcohol PRS-CSx score and alcohol quantity-frequency outcomes at baseline, and in the final model adjusting for all potential covariates, the association was not significant (IRR: 1.08, 95% CI: 0.97, 1.19, p = 0.16).

Fig. 3: Associations of polygenic risk scores (PRS) with tobacco frequency and quantity, and alcohol frequency and quantity 6 months after a traumatic event, adjusting for post-traumatic stress and lifetime traumatic events for all ancestries.

IRR: Incidence rate ratio as estimated using the quasipoisson model to relax dispersion parameter assumptions. 95% CI: 95% confidence interval. The outcome was defined as the number of times one consumed the substance in the past 30 days, multiplied by the average quantity consumed on each occasion in the past 30 days. PRS-CSx: Polygenic risk score using continuous shrinkage and cross-ancestry methods; generated with summary statistics from the GSCAN database for average cigarettes per week and average alcoholic drinks per week. PTS symptoms: Post-traumatic stress (PTS) was ascertained with the PTSD Checklist for the DSM-5 (PCL-5). Anxiety was ascertained with the PROMIS Anxiety Bank of state and trait anxiety. Depression was ascertained with the PROMIS 8-item short form for Depression. Ancestry-specific principal components were controlled for in all models.

PRS and Post-traumatic stress interaction analyses

In our interaction analyses (Fig. 4), we examined associations between post-traumatic stress symptoms at week 8 after the traumatic event and substance use at month 6. These analyses aimed to identify the prospective relationship between pre-existing genetic risk, the addition of PTSD after a traumatic event, and future substance use behaviors. We found evidence of an interaction between the tobacco PRS and PTSD symptoms for tobacco use quantity-frequency (IRR: 0.91, 95% CI: 0.82, 1.00, p = 0.05), such that the association between PTSD and tobacco use was weaker among those with high tobacco use PRS. For alcohol quantity-frequency outcomes, there was no significant main effect of the PRS score (IRR: 1.05, 95% CI: 0.76, 1.44, p = 0.79) or joint interactive effect of week 8 PTSD symptoms and the PRS score (IRR: 1.04, 95% CI: 0.94, 1.16, p = 0.42). We also graphed the predicted tobacco use quantity-frequency at month 6 when stratifying by the 10th and 90th percentile of the tobacco PRS-CSx score in Supplemental Fig. S14.

Fig. 4: Main and joint effects of polygenic risk score (PRS) and post-traumatic stress (PTS) symptoms and subscales at 8 weeks post-trauma on tobacco use in the past 30 days at 6 months post-trauma for all ancestries.

(1) PRS-CSx: Polygenic risk score using continuous shrinkage and cross-ancestry methods; generated with summary statistics from the GSCAN database for average cigarettes per week and average alcoholic drinks per week. (2) IRR: Incidence rate ratio as estimated using the quasipoisson model to relax dispersion parameter assumptions; 95% CI: 95% confidence interval. (3) PTSD: Post-traumatic stress symptoms were assessed using the post-traumatic stress disorder (PTSD) symptom checklist for the DSM-5 (PCL-5), with relevant subscales also reported (re-experiencing, negative cognition and mood, hyperarousal, and avoidance). Models were also adjusted with 5 principal components for global ancestry, and 5 principal components calculated within the top 3 ancestry groups, determined by >90% of the genetic variance related to ancestry being explained by that component.

PRS and Post-traumatic stress subscale interaction analyses

When examining subscale-specific effects, there was a statistically significant interaction between re-experiencing items and the PRS-CSx score for all types of tobacco use. This interaction was in the same direction as that for the total PCL-5 score with all items. For example, for tobacco quantity-frequency, the interaction effect between re-experiencing items and the PRS was 0.90 (95% CI: 0.82, 0.99, p = 0.03) when exponentiated to the incidence rate ratio scale, suggesting that the average effect of re-experiencing symptoms is smaller among those with high genetic risk, while among those with lower genetic risk for substance use behaviors, re-experiencing symptoms played a larger role in their future substance use. There was an interaction between the negative cognition and mood subscale and the PRS as well, with an IRR of 0.90 (95% CI: 0.82, 0.99; p = 0.02). This suggests a similar direction of effect, whereby the average effect of negative cognition and mood symptoms is smaller among those with high genetic risk (Supplemental Fig. S14). The main effects of the PRS and of all other subscales were significantly associated with increased tobacco use, though no additional interaction was significant (Fig. 4). Alcohol outcomes showed no main effect of the PCL-5 overall or of its respective subscales in our models (Figs. 3 and 4).

We report the interactions on the additive scale as well; these did not demonstrate significant additive interaction effects for any of the RERI, the AP, or the SI (Supplemental Tables S4 and S5).

Sensitivity analyses

We report generalized estimating equation (GEE) models for tobacco (Supplemental Table S6) and alcohol outcomes (Supplemental Table S7) using the exchangeable correlation structure. There was a significant interaction between the re-experiencing subscale and PRS-CSx (IRR: 0.94, 95% CI: 0.90, 0.99). No interactions between PTSD subscales and the standardized alcohol PRS were significant. When using the maximum PTSD score across the ED baseline, week 2, week 8, and month 3 assessments, re-experiencing (IRR: 0.1, 95% CI: 0.82, 1.00, p = 0.047) and negative cognition and mood (IRR: 0.91, 95% CI: 0.83, 1.00, p = 0.049) were significant (Supplemental Table S8). There was no significant interaction using the GEE for alcohol outcomes (Supplemental Table S9). In our E-value analysis (Supplemental Table S10), an unmeasured confounder would need an effect size of 1.50 (lower 95% CI limit: 1.21) to negate the significant interaction terms.
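As a brief illustration of the E-value calculation, the sketch below applies the commonly used closed-form expression for a ratio estimate: estimates below 1 are inverted first, and the E-value is then RR + sqrt(RR × (RR − 1)). For a confidence interval, the limit closest to the null is used, and the E-value is 1 when the interval includes 1. The function names are hypothetical, and treating an incidence rate ratio like a risk ratio is an assumption made here for illustration.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a ratio estimate (point estimate or a single CI limit)."""
    if rr <= 0:
        raise ValueError("ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimates are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_for_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null value of 1."""
    if lower <= 1.0 <= upper:
        return 1.0  # interval already includes the null
    closest = upper if upper < 1.0 else lower
    return e_value(closest)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(round(e_value(2.0), 2))
    print(round(e_value_for_ci(1.2, 1.8), 2))
```

The exact inputs behind the values in Supplemental Table S10 are not restated in this section, so the sketch is not expected to reproduce them.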

We did not have adequate statistical power to identify interaction effects within ancestry group strata; however, we did identify similar main effects. Among European participants, an increased PRS-CSx score was associated with increased tobacco quantity-frequency when controlling for the PCL-5 main and interaction effects (Supplemental Table S11), though there was no significant relationship between the alcohol risk score and alcohol quantity-frequency in the European subsample (Supplemental Table S12). When examining subscales of the PCL-5, there were consistent associations in the European ancestry stratum, with the tobacco scores increasing tobacco quantity-frequency (Supplemental Table S12). None were statistically significant with the adjusted p-values; however, the hyperarousal subscale indicated a suggestive increase (IRR: 1.17, 95% CI: 0.999, 1.36, p = 0.051). For the African ancestry stratum, the tobacco outcomes are reported in Supplemental Table S13 and were significantly associated with negative cognition/mood symptoms (IRR: 1.20, 95% CI: 1.01, 1.2), and all subscales were associated with alcohol outcomes (for example, hyperarousal had an IRR of 1.28 [95% CI: 1.11, 1.48, p < 0.001]), but the standardized PRS-CSx score for alcohol use was not significantly associated with quantity-frequency of use (Supplemental Table S14).

Discussion

This study is one of the first to investigate the interaction of cross-ancestry polygenic risk scores with PTSD symptoms in relation to substance use behaviors after trauma. To our knowledge, it is also one of the first to examine the interaction between specific PTSD subscales and genetic risk on substance use outcomes. Given the high level of co-occurrence between PTSD and substance use, using longitudinal cohorts to delineate temporal ordering ensures that PTSD symptoms are ascertained before future substance use behaviors. This is paramount for guiding resources and treatment decisions. We demonstrated a statistically significant interaction between PTSD symptoms and genetic risk for daily cigarette consumption on future post-trauma daily nicotine product consumption, but did not find evidence of an interaction between genetic risk and PTSD symptoms on future alcohol use. Our findings suggest there may be a threshold effect of risk occurring among groups with high genetic risk for tobacco. However, we note that overall prediction of substance use traits in non-European ancestry samples in this investigation was relatively poor. The use of multi-ancestry methods, however, allowed us to examine these associations in a diverse cohort using all participants when controlling for additional principal components and covariates.

While there have been major strides in cross-ancestry methods for PRS, notable challenges remain [39]. Generating and assessing the accuracy of these scores in multiple populations and communities remains a frontier of the field. These challenges are particularly pronounced in complex conditions with a notable environmental component, such as substance use, with twin-based heritability estimates between 30 and 70% for nicotine use disorder [63] and between 50 and 64% for alcohol use disorder [11]. SNP-based heritability analyses from recent GWAS report lower estimates: a 2024 meta-analysis estimated roughly 5 to 15% heritability of tobacco use disorder phenotypes [15], while another meta-analysis estimated the SNP-based heritability of alcohol use disorder phenotypes as 7 to 13% [16]. This is further exacerbated by the relatively low predictive ability of current polygenic risk scores: even in large, more ancestrally homogeneous cohorts such as the FinnTwin12 study, the PRS explained between 2.5 and 3.5% of the variance for alcohol use disorder [64]. Using multiple ancestry reference groups may boost power for diverse cohorts [40]; however, we found that performance remained best in the ancestry group most represented in the discovery GWAS, the European ancestry subcohort. The lack of consistent association in the other groups demonstrates the ongoing limitations of statistical genetics work to leverage multi-ancestry cohorts for PRS estimation. Even a different prediction method, PRSice-2, continued to demonstrate low predictive accuracy in the ancestry-specific estimates. This suggests that, despite significant progress in the field, challenges remain that stem from the limitations of the initial GWAS. This work further highlights the need for greater diversity in GWAS discovery cohorts. In the future, using reference GWAS with large, diverse cohorts, such as All of Us, could improve PRS prediction.

Our interaction analyses suggest that compounding genetic risk with PTSD symptoms for future tobacco use leads to a smaller increase in substance use than expected from the main effects alone. While further replication of this interaction is necessary, this observation may be due to a thresholding effect in our sample, which was entirely trauma-exposed and reported high rates of overall substance use (e.g., alcohol use was reported by more than 60% and tobacco use by more than 30% of the sample). The risk of substance use related to genetic liability may mask the effects of PTSD symptoms; in other words, individuals with high genetic liability may already be over a threshold of risk factors, so the additional insult of PTSD symptoms does not lead to the same increase as it does for those with low underlying risk. This would be consistent with the diathesis-stress model, whereby individuals with low genetic vulnerability may have increased sensitivity to environmental factors that increase their risk for substance use behaviors [65, 66]. Under such a model, certain participants respond to post-traumatic stress with increases in substance use behaviors, while others do not. In this case, the group vulnerable to PTSD effects is the group with low genomic risk for substance use. Alternatively, the PRS themselves may capture some PTSD risk; those with high PRS would then already be at risk of high PTSD symptoms after trauma, with less of the measured PTSD explaining changes in tobacco and alcohol consumption. A recent review found only three studies to date directly addressing this effect, with inconclusive evidence and no consistent direction of the interaction effect [26]. Therefore, risk for PTSD symptoms may differ from exposure to prior traumatic events, and future research should investigate how the timing of traumatic events, the number of prior traumatic events, and PTSD symptoms may compound and influence this risk.

The lack of association between PTSD symptoms and alcohol outcomes in our study could be related to the high rate of alcohol consumption at baseline in our sample. Alcohol use disorder has a high prevalence among PTSD-diagnosed patients, ranging from 9.8 to 61.3% [67]. The alcohol PRS was not consistently associated with outcomes when controlling for prior mental health conditions, suggesting that the effect of genetic prediction on alcohol use outcomes may not be independent of these factors. Prior studies, including a systematic review in 2020 [22], have provided some evidence for the “self-medication” hypothesis, whereby PTSD symptoms are associated with increased alcohol use. These studies, however, have methodological limitations, and only a handful reported longitudinal assessments of prior exposure to traumatic events and future alcohol use [4, 68, 69, 70, 71]. These studies generally found an increase in alcohol use after trauma, though associations with PTSD symptoms were mixed at times; for example, a study examining alcohol consumption associated with PTSD symptoms after September 11, 2001 found no association with increased use [71]. Similarly, a more recent study with 8 years of follow-up found that substance use was independent of PTSD symptoms over time when accounting for other co-morbidities and confounders [72]. Therefore, although we did not demonstrate associations with increased alcohol use, unmeasured confounders might have affected the observed relationships. In addition, the various types of trauma experienced by those in the cohort could have been confounders associated with different PTSD symptomatology that we were unable to disentangle, given that our cohort had primarily experienced motor vehicle collisions.

When separating by symptom subscales, we found that the re-experiencing and negative cognition/mood subscales demonstrated interactions with the tobacco PRS-CSx score similar to the overall PTSD-PRS interactions. However, hyperarousal and avoidance symptoms did not interact with genetic risk, though the tobacco PRS-CSx score remained consistently associated with tobacco use when accounting for them. The interactions with re-experiencing and negative cognition/mood remained statistically significant when adjusting for co-occurring anxiety and depression symptoms and were negative, in contrast to the positive associations in the main effects. This finding suggests that these two subscales may be independent of other affective symptoms. In prior research, subscale studies conducted using the DSM-IV version of the PTSD Checklist (which included a re-experiencing subscale with slightly different question wording, as well as an “emotional numbing” subscale) found that emotional numbing was often associated with increased nicotine use and dependence [73, 74], whereas re-experiencing symptoms were associated with positive and negative reinforcement of smoking behaviors [75]. In one recent study among former smokers exposed to a traumatic event, the investigators found that emotional numbing/avoidance, but not re-experiencing symptoms, was related to a cross-sectional assessment of smoking relapse [74]. Our findings in a longitudinal cohort may relate to future changes in smoking behaviors, rather than prevalence of smoking, and suggest that re-experiencing symptoms indicate some other underlying biological vulnerability that interacts with the genetic risk for tobacco consumption.

Our study was limited in several ways. We relied on self-reporting of substance use behaviors and may therefore be subject to misclassification and reporting biases based on recall or social desirability [76]. We also used previously described methods for polygenic risk score estimation, which are limited for cross-ancestry populations [39]. While PRS-CSx showed statistically significant and consistent associations with our outcome of interest, suggesting minimal contamination of the score, it underperformed in African ancestry samples compared to European ancestry samples. There also were not enough American Indigenous ancestry samples to accurately estimate risk when stratifying for either PRS-CSx or PRSice-2. Future methods that improve prediction in underrepresented populations may reveal further insights we cannot explore with current methods. We used GSCAN data to build the PRS, which specifically quantified cigarette use because this was the variable closest to the tobacco product frequency and quantity measure available in AURORA. Our AURORA outcome variable, however, reflected other products beyond cigarettes, which may bias findings. Although we did find interactions for tobacco outcomes, we may have been underpowered to detect them in alcohol outcomes. Due to the high rates of alcohol use in our sample, there were fewer contrasts available, which may have reduced statistical power. Additionally, our sample had primarily experienced motor vehicle collisions as their index trauma, and other cohorts, such as veteran cohorts or those experiencing physical or sexual assault, may yield different findings. PTSD symptomatology may differ based on the type of trauma experienced, and therefore certain symptom presentations may not be represented in our study. The sample was also skewed relative to the general US population, with participants primarily self-identifying as female and as Black/African-American. We did not have access to a comparable replication population with longitudinal data of similar timing or subscale information with similar measures of tobacco and alcohol quantity-frequency. Future replication is needed to validate the findings, as gene-environment research often faces challenges with replication in additional cohorts. Although we did not split our data into training and testing datasets, in order to preserve power, we did generate E-values for our primary findings to quantify the influence an unmeasured confounder would need to negate our results. Finally, there might be residual confounders in our study that we did not consider which could affect our findings.

Our study has several strengths, particularly the longitudinal nature of our data, which allowed us to assess future substance use behaviors and trajectories over time and to establish the temporal ordering of PTSD symptoms and future substance use behaviors. We also used a cross-ancestry method for polygenic risk scores that showed consistent associations when controlling for co-occurring mental health conditions and for socioenvironmental factors often related to both substance use scores and substance use behaviors. This observation suggests that the PRS largely captured genetic variation rather than an aggregated environmental effect, which may be due to the accounting for ancestry differences. Education was one of the covariates most strongly associated with substance use, but it did not diminish the association with the genetic scores in the European-only and all-ancestry cohorts. We also report PTS subscale associations, which revealed important differences between facets of traumatic stress and highlighted particular symptom domains that may affect the risk for substance use. Finally, our sample was a diverse civilian cohort who had all experienced a traumatic event, which likely makes our findings more generalizable to general populations.

Conclusion

This study demonstrates evidence of genetic and post-traumatic stress interactions on tobacco outcomes and the persistent effect of genetic risk on alcohol outcomes when controlling for other environmental conditions. Our work highlights that the re-experiencing and negative cognition/mood constructs of PTSD symptoms may be particularly salient in interactions with genetic risk for tobacco consumption and future tobacco outcomes after trauma. These findings suggest that a threshold effect may occur, whereby increased genetic risk and increased PTSD symptoms do not simply aggregate; rather, high-risk individuals may not have further risk conferred after a certain point. Future research may consider different substance use trajectories after traumatic stress, such as differences in remitting or relapsing patterns of use versus incident use, and expand this work to other substances, such as cannabis and opioids.