- Research
- Open access
Multi-omics mediation pipeline reveals differential pathways of maternal SNPs affecting newborn adiposity outcomes
BMC Genomic Data volume 26, Article number: 66 (2025)
Abstract
Background
A great deal of previous research describes the impact of the maternal metabolic and genetic milieu on newborn adiposity outcomes. However, much of this research does not focus on all aspects of the problem simultaneously. Studies focusing on metabolic factors may not distinguish between maternal and fetal genetic pathways, while studies that do focus on these different genetic pathways may not incorporate metabolic information into effect estimates or variant classifications. In this paper, we introduce a novel multi-omics pipeline for maternal genetic variant selection and mediation effect testing that can handle all these pathways, and use it to investigate broad patterns in the effects of maternal genetic variants on newborn adiposity outcomes.
Results
A Bayesian network model is used to incorporate both metabolomic and genomic data into an initial filter for maternal variants likely to affect newborn adiposity outcomes through a direct maternal genetic effect, an indirect fetal genetic effect, a maternal metabolic effect, or some combination of these pathways. A mediation model is then fit to these candidate variants and associated outcomes to identify which of these pathways, if any, mediate the total effect. We then group maternal genetic variants according to the relative magnitudes of these three effect pathways. In an application to existing mother-newborn data from the HAPO study, we find that of 78 candidate variants, the majority influence newborn birthweight solely through either a direct maternal or indirect fetal genetic effect (37% and 40%, respectively), a smaller number through both of these (14%), relatively few exclusively through the maternal metabolic pathway (6%), and almost none through a combination of the maternal metabolic pathway with either of the two genetic pathways (3%). We also find that these overall patterns of mediation effects are similar across outcomes.
Conclusions
Our results reveal broad patterns in the effects of maternal genetic variants on newborn adiposity, and identify both new genetic loci and loci known from previous literature to influence newborn adiposity. These results demonstrate the potential for scientific discovery enabled by our multi-omics mediation pipeline, and the approach is broadly applicable for untangling path-specific contributions in the modern integrated multi-omics landscape.
Background
A great deal of previous research describes the impact of the maternal metabolic and genetic milieu on newborn adiposity outcomes. Early work in this area, including the observational Hyperglycemia and Adverse Pregnancy Outcome (HAPO) Study [1, 2], established a strong link between maternal metabolic phenotypes and newborn adiposity, and more recent work has focused on elucidating the mechanisms underlying these associations through high-dimensional omics analyses and joint analyses of genomic and metabolic data [3,4,5,6].
A second, related line of research addresses the role of genetics in newborn adiposity in greater depth. Early studies demonstrated that maternal and fetal genetics both impact newborn birth weight [7,8,9]. Later work focused on determining the relative contributions of these two sources of genetic information, accounting explicitly for the fact that maternal genetics can influence newborn outcomes both directly and through inheritance. Reference [10] used structural equation modeling to distinguish the maternal and fetal genetic contributions of individual SNPs to newborn birthweight, and then grouped these SNPs according to whether they influence birth weight through a maternal contribution, a fetal contribution, or both. Reference [11] also used structural equation modeling to distinguish maternal and fetal contributions, though their interest was in identifying variants with predominantly maternal genetic contributions. Reference [12] grouped genetic variants in a similar way to [10] using maternal and fetal genetic effects estimated from phased haplotype data.
Though these studies are an important step forward in our understanding of the factors affecting newborn adiposity outcomes and the pathways by which they act, most do not address the full multi-omic nature of the problem. Studies focusing on metabolic factors may not distinguish between maternal and fetal genetic pathways. On the other hand, studies that do focus on these different genetic pathways may not incorporate metabolic information into effect estimates or variant classifications.
Multi-omics Bayesian network analysis [6, 13, 14], a technique in which all variables involved can be analyzed simultaneously in a network setting, is a promising methodology to address these shortcomings. In a Bayesian network, individual “child” nodes (variables) are modeled as functions of their “parent” nodes, the set of all nodes with edges that point toward the child. An advantage of this framework for multi-omics data is that it can incorporate pre-specified edge directions that reflect biological constraints (or lack thereof) between or within different types of data. For example, genetic variables can affect metabolic variables, but not vice versa; this constraint can be accommodated by requiring edges between genetic nodes and metabolic nodes to be directed from the gene to the metabolite. Edges between certain metabolic variables, on the other hand, may not be expected to have a particular direction. In these cases, no direction is pre-specified, and edges can exist in either or both directions between a pair of nodes.
In this paper, we use a Bayesian network approach combined with a mediation analysis to provide a more complete picture of the different pathways by which maternal genetic variants influence newborn adiposity outcomes. We focus on three effect pathways: the effect mediated through the maternal metabolome, the effect mediated through non-metabolic maternal factors, and the effect mediated through fetal genetics. Following [10, 12], we then group maternal genetic variants according to the relative magnitudes of these three effect pathways, and investigate patterns within the groupings.
Methods
HAPO data
The data used in this paper come from the HAPO study [1], a multi-site observational study of over 25,000 pregnant women and their newborns to determine maternal glycemic factors associated with pregnancy outcomes. Full details of the HAPO protocol have been published elsewhere [15, 16]. Briefly, a 2-hr 75 g oral glucose tolerance test (OGTT) at approximately 28 weeks’ gestation was used to measure maternal glycemia. Initial fasting blood samples at the start of the OGTT were drawn after an overnight fast, with additional samples drawn 1-hr and 2-hrs after a 75 g glucose load. At delivery, a cord blood sample was taken and used to measure cord C-peptide. Newborn birthweight and newborn sum of skin folds (SSF) were also measured.
In addition to glucose measurements, blood samples from the HAPO OGTT underwent additional analyses, either as part of the initial study or one of its ancillaries. These additional analyses included metabolic profiling, in which conventional clinical and targeted metabolite levels were measured, and genotyping. Details of the metabolomic and genetic data preparation are given in the next two sections. For this paper, we used data from 1385 maternal/offspring pairs for which all of the following data types were available: 1) maternal metabolite levels from the fasting blood samples, 2) maternal and newborn genotype, 3) cord C-peptide, newborn birthweight, and newborn SSF (all measured at delivery), and 4) maternal confounding variables, specifically: field center, maternal age, maternal height, maternal BMI, maternal fasting and 1-hr OGTT glucose levels, gestational age, gestational age at delivery, offspring sex, parity (0 vs 1+), mean arterial pressure, and metabolomics sample storage time.
Prior to downstream data analysis, each maternal metabolic variable and each newborn outcome variable was regressed on the full set of maternal confounding variables. The residuals from these regressions were retained for downstream network construction and mediation model fitting.
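The residualization step can be illustrated with a minimal sketch. It assumes ordinary least squares with an intercept and numeric confounders (a categorical confounder such as field center would first need dummy coding); the function name `residualize` and the simulated data are hypothetical and are not taken from the HAPO analysis.

```python
import numpy as np


def residualize(values, confounders):
    """Residuals of `values` after OLS regression on `confounders` plus an intercept."""
    X = np.column_stack([np.ones(len(values)), confounders])
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    return values - X @ coef


# Simulated stand-ins for three confounders and one metabolite (illustrative only).
rng = np.random.default_rng(0)
confounders = rng.normal(size=(200, 3))
metabolite = 1.5 + confounders @ np.array([0.4, -0.2, 0.1]) + rng.normal(size=200)

adjusted = residualize(metabolite, confounders)
```

By construction, the residuals have zero mean and are uncorrelated with every confounder in the sample, which is the property the downstream network and mediation models rely on.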
Genetic data
Genotyping of maternal and newborn HAPO DNA samples encompassing a wide range of ancestry groups has been described. Briefly, 2,581 Afro-Caribbean and 1,615 Mexican American samples were run on an Illumina HumanOmni1M-Duo v3B SNP array, 5,832 Northern European samples on an Illumina Human610-Quad v1B SNP array, 2,466 Thai samples on an Illumina HumanOmni-Quad v1-0B SNP array, and 5,564 transethnic samples on an Illumina Global Screening Array-24 v2.0 A1 [3, 6, 17]. Genotype imputation was performed separately on each dataset (maternal and fetal from each of the ancestry groups) on the TOPMed Imputation Server using Minimac4 (version 1.5.7) [18] and the TOPMed reference panel for the Afro-Caribbean, Mexican American, Northern European, Thai, and transethnic samples [19].
Exclusion criteria for DNA samples included inconsistent self-report of sex at birth with available chromosomal data, chromosomal anomalies, unintended sample duplicates, sample relatedness based on genetic relatedness, and low call rate (samples with missing call rate >1% and SNPs with missing call rate >2%). Exclusion criteria for individual variants included SNPs with >3 Mendelian errors, departures from Hardy-Weinberg equilibrium, duplicate discordance, sex differences in heterozygosity, low minor allele frequencies (<0.01) and/or low imputation quality score (<0.75).
A total of 9,419,653 overlapping SNPs were analyzed in a mega-analysis across the maternal imputed datasets. Principal components (PCs) were estimated separately for maternal data using the package SNPRelate version 1.26.0 in R [20].
Metabolomics data
A Beckman-Coulter Unicell DxC 600 clinical analyzer was used to measure clinical metabolites (triglycerides, non-esterified fatty acids, lactate, glycerol, 3-hydroxybutyrate). Targeted metabolite panels (acylcarnitines and amino acids) used flow injection electrospray ionization mass spectrometry, and were quantified using isotope dilution with a Waters TQ triple quadrupole mass spectrometer and Acquity liquid chromatography. Data were processed using MassLynx Version 4.1 (Waters Corporation, Milford, MA, USA). Any targeted or clinical metabolite with greater than 10% missingness across subjects was excluded from further analyses.
Non-targeted metabolite assays used gas-chromatography mass-spectrometry (GC-MS). Specifically, batches of HAPO samples, plus quality control and blank samples, were run using a 7890B GC/5977B MS (Agilent Technologies, Santa Clara, CA, USA). AMDIS freeware was used for peak deconvolution [21, 22], and data from the QC samples were used to control for technical variability. The R package metabomxtr (Version 1.26.0) [23, 24] was used for batch correction and normalization. Any non-targeted metabolite with greater than 20% missingness across subjects was excluded, with the remaining missing metabolite values imputed with minimum values specific to that metabolite.
Following the above preprocessing, 62 targeted and 53 non-targeted metabolites were retained for our analysis, for a total of 115.
Genetic variant selection
Because the goal of this paper is to elucidate the mediation pathways by which maternal genetic variants influence newborn outcomes, genetic variants that showed no association with any of the three outcomes (cord C-peptide, newborn birthweight, and newborn SSF) or any of the maternal metabolites in genome wide association studies (GWAS) were not included in the Bayesian Network Models (BNMs) or mediation analyses [25, 26]. This filtering process reduces the number of variants under consideration, making the network size more manageable while also restricting attention to variants that could have significant mediation effects. Separate GWAS were conducted for associations with cord C-peptide, newborn birthweight, newborn SSF, and each of the metabolites. Each GWAS was performed using mega-analysis linear regression in SNPTest v2.5 against an additive genotype variable. Following the methods of Kuang et al. [27], who analyzed the same dataset, each GWAS adjusted for the first three principal components of genetic ancestry (results were not changed by adding additional PCs beyond this number) as well as the full set of maternal confounding variables. SNPs with nominal p\(<5\times 10^{-6}\) were selected for further analysis. This selection threshold is less stringent than the usual GWAS threshold of nominal p\(<5\times 10^{-8}\), since the additional BNM filter (see below) aids in selecting the most informative SNPs [6]. The total set of variants selected in at least one of these GWAS (7,043) were then LD trimmed using the package SNPRelate version 1.26.0 in R [20]. Specifically, SNPs in the lactase region (2q21) and major histocompatibility complex (MHC) and inversion regions 8p23 and 17q21 were excluded. LD pruning was performed to only include pairs of SNPs with minor allele frequency \(\ge\) 0.02, missing call rate \(\le\) 0.02 and \(r^2 \le 0.1\) in a sliding 10Mb window.
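The selection rule (retain a SNP if it reaches nominal \(p<5\times 10^{-6}\) in at least one GWAS) amounts to a union across traits. A minimal sketch, assuming per-trait results are available as a mapping from SNP identifier to p-value; the function name, trait labels, SNP names, and p-values below are hypothetical placeholders.

```python
P_THRESHOLD = 5e-6  # selection threshold stated in the text


def select_snps(gwas_results, threshold=P_THRESHOLD):
    """Return sorted SNP ids with p < threshold in at least one GWAS.

    gwas_results maps trait name -> {snp_id: p_value}.
    """
    selected = set()
    for pvalues in gwas_results.values():
        selected.update(snp for snp, p in pvalues.items() if p < threshold)
    return sorted(selected)


# Toy input with made-up values.
toy_results = {
    "birthweight": {"snp_a": 2e-7, "snp_b": 0.03},
    "cord_c_peptide": {"snp_b": 4e-6, "snp_c": 1e-4},
}
chosen = select_snps(toy_results)
```

LD pruning would then be applied to the selected set; that step is not reproduced here.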
Bayesian network model (BNM)
In order to reduce the number of variants eventually tested for mediation effects, BNMs were constructed with: 1) the maternal genetic variants selected via the GWAS filtering procedure described above, 2) the corresponding newborn genetic variants, 3) maternal metabolites, and 4) the three outcome variables (cord C-peptide, newborn birthweight, and newborn SSF) as nodes. Separate BNMs were constructed with each of the newborn outcomes. A major advantage of BNMs is the ability to specify direction of edges via a “blocklist”, a list of edge types that are not allowed to exist (i.e. are set to zero). In our networks, these blocked edges are used to enforce biological constraints on the existence and direction of edges. Specifically, only the following edge types were permitted: (1) edges from maternal SNP to any maternal metabolite, any newborn outcome, and maternal and newborn SNPs within 250,000 bp of the maternal SNP; (2) edges in either direction between any two maternal metabolites; (3) edges from newborn SNP to newborn outcome. All other edge types were included in the blocklist and hence set to zero. All BNMs were estimated using the CCDr algorithm [28]. Once fitted, the BNMs were used to select pairs of maternal SNPs and newborn outcomes for additional mediation analyses. Specifically, if a maternal SNP was connected to a newborn outcome through a network path involving at least one maternal metabolite and/or fetal genetic variant, that pair was analyzed further using the mediation model described below.
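The edge constraints and the path-based selection rule can be sketched as follows. This is an illustration under stated assumptions, not the CCDr implementation: the `Node` class and function names are hypothetical, "within 250,000 bp" is read as same chromosome and distance at most 250,000, and the sketch additionally allows metabolite-to-outcome edges, which the list above does not spell out but which the metabolite-mediated paths used for selection presuppose.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

MAX_SNP_DISTANCE_BP = 250_000  # window stated in the text; "<=" is an assumption


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "maternal_snp", "newborn_snp", "metabolite", or "outcome"
    chrom: Optional[str] = None
    pos: Optional[int] = None


def edge_allowed(src, dst, metabolite_to_outcome=True):
    """Return True if a directed edge src -> dst is permitted."""
    if src == dst:
        return False
    if src.kind == "maternal_snp":
        if dst.kind in ("metabolite", "outcome"):
            return True
        if dst.kind in ("maternal_snp", "newborn_snp"):
            # Same-chromosome requirement is an assumption of this sketch.
            return (src.chrom == dst.chrom
                    and abs(src.pos - dst.pos) <= MAX_SNP_DISTANCE_BP)
        return False
    if src.kind == "metabolite":
        if dst.kind == "metabolite":
            return True  # either direction between two metabolites
        # Assumed edge type: needed for metabolite-mediated paths to exist.
        return metabolite_to_outcome and dst.kind == "outcome"
    if src.kind == "newborn_snp":
        return dst.kind == "outcome"
    return False  # outcomes have no outgoing edges


def build_blocklist(nodes, **kwargs):
    """All ordered node pairs whose edge is forbidden."""
    return [(s.name, d.name) for s in nodes for d in nodes
            if s != d and not edge_allowed(s, d, **kwargs)]


def has_mediated_path(edges, kinds, snp, outcome):
    """True if a directed path snp -> outcome passes through at least one
    metabolite or newborn SNP. `edges` is a list of (src, dst) names."""
    children = {}
    for s, d in edges:
        children.setdefault(s, []).append(d)
    start = (snp, False)  # state: (current node, mediator seen so far)
    queue, seen = deque([start]), {start}
    while queue:
        node, mediated = queue.popleft()
        for nxt in children.get(node, []):
            if nxt == outcome:
                if mediated:
                    return True
                continue
            state = (nxt, mediated or kinds[nxt] in ("metabolite", "newborn_snp"))
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Toy network with hypothetical node names and positions.
nodes = [
    Node("rs_m", "maternal_snp", "9", 4_000_000),
    Node("rs_n", "newborn_snp", "9", 4_000_000),
    Node("met_1", "metabolite"),
    Node("met_2", "metabolite"),
    Node("birthweight", "outcome"),
]
kinds = {n.name: n.kind for n in nodes}
blocklist = build_blocklist(nodes)
fitted_edges = [("rs_m", "met_1"), ("met_1", "met_2"), ("met_2", "birthweight")]
selected = has_mediated_path(fitted_edges, kinds, "rs_m", "birthweight")
```

In this toy network the maternal SNP reaches the outcome only through two metabolites, so the pair would be passed on to the mediation model; a SNP connected to the outcome by a direct edge alone would not.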
Choice of penalty parameter
BNMs include a penalty parameter \(\lambda\) that governs the overall sparsity level of the network, with larger \(\lambda\) values inducing greater levels of overall sparsity and thus more stringent filtering of SNPs as described above. Moreover, for any fixed pair of nodes, there is a value of \(\lambda\) above which the edge between them is never in the network, but below which it is. A direct consequence of this is that for every SNP, there is a value of \(\lambda\) below which that SNP remains in the set of variants tested in subsequent mediation analyses, but above which it does not. We used the genetic variant rs7034200, a variant in the GLIS3 gene known to be associated with glucose metabolism and newborn birthweight through both maternal and fetal pathways [29], as a benchmark against which to base our choice of \(\lambda\). Specifically, we chose \(\lambda\) to have the largest possible value while still ensuring that this SNP remained in the set of variants retained for mediation analysis with newborn birthweight. This resulted in \(\lambda = 2.95\). This procedure promotes a high overall level of sparsity and variant filtering, while still ensuring that an important a priori true positive is retained. It is worth noting that this procedure for choosing \(\lambda\) is only possible because of this a priori biological information. However, in a different setting, other approaches such as k-fold cross validation could be used without change to other parts of our method.
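The benchmark-based choice of \(\lambda\) reduces to finding the largest candidate value at which the benchmark SNP is still retained. A minimal sketch, assuming a candidate grid and a callback `is_retained` that would, in a real analysis, refit the network at a given \(\lambda\) and check whether the benchmark SNP still has a qualifying path to the outcome. Both the grid and the callback are hypothetical; the toy callback below simply encodes the cutoff reported in the text.

```python
def largest_lambda_retaining(lambdas, is_retained):
    """Return the largest value in `lambdas` for which `is_retained(value)`
    is True, or None if no candidate retains the benchmark SNP."""
    for lam in sorted(lambdas, reverse=True):
        if is_retained(lam):
            return lam
    return None


# Candidate grid 0.05, 0.10, ..., 5.00 (an arbitrary illustrative choice).
grid = [k / 100 for k in range(5, 505, 5)]


# Stand-in for "refit the BNM and check the benchmark SNP".
def toy_is_retained(lam):
    return lam <= 2.95


best = largest_lambda_retaining(grid, toy_is_retained)
```

Because retention is monotone in \(\lambda\) (the SNP is retained below some value and dropped above it), a bisection search over the grid would give the same answer with fewer network fits.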
Mediation analysis
For each maternal SNP/newborn outcome pair identified using the BNM as described above, a mediation model was fit to determine the breakdown of the total effect of the maternal SNP on that newborn outcome. Following the techniques of VanderWeele et al. [30] for mediation analysis using multiple dependent mediators, the following regressions were fit, where a represents the maternal SNP of interest, y represents the outcome of interest, \(\textbf{m} = \{m_1, m_2, \ldots , m_k\}\) represents the full set of maternal metabolites, and \(g_a\) represents the newborn SNP at the same location as a:
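One set of linear regressions consistent with the coefficient names used in the effect definitions is sketched here; the intercepts \(\beta_0^{(i)}\), \(\gamma_0\), \(\theta_0\) and the error terms \(\varepsilon\) are notation assumed for this sketch rather than taken from the source:

\[
\begin{aligned}
m_i &= \beta_0^{(i)} + \beta_a^{(i)}\, a + \varepsilon_{m_i}, \qquad i = 1, \ldots, k, \\
g_a &= \gamma_0 + \gamma_a\, a + \varepsilon_{g}, \\
y &= \theta_0 + \theta_a\, a + \sum_{i=1}^{k} \theta_m^{(i)}\, m_i + \theta_{g_a}\, g_a + \varepsilon_{y}.
\end{aligned}
\]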
The effect of the maternal SNP mediated through maternal metabolites is then given by \(e_{met} = \sum _{i=1}^k \beta _a^{(i)}\theta _m^{(i)}\), the effect mediated through fetal genetics by \(e_{fg} = \theta _{g_a}\gamma _a\), and direct maternal genetic effect by \(e_{mg} = \theta _a\) [30]. The full set of maternal metabolites was used in the mediation model for each variant, rather than a variant-specific set determined by the BNM, to maximize both computational feasibility and the comparability of effect magnitudes across variants.
This model and the corresponding effect size equations assume that, conditional on the exposure a, \(\textbf{m}\) is independent of \(g_a\), but does not attempt to parse the complicated dependence structure within \(\textbf{m}\). Metabolite networks often have feedback loops, bi-directional edges, and other elements that are not compatible within the directed acyclic graph framework necessary for mapping conditional independence relationships and identifiable pathway effects. Thus, a total effect through \(\textbf{m}\) is estimated rather than a set of metabolite specific effects. An added benefit of this approach is that it removes a potential multiple testing concern associated with testing hundreds of metabolite-specific pathways for each variant. See Fig. 1 for a graphic depiction of the mediation model and the pathways of interest.
In addition to the mediation effects themselves, proportion mediated was also calculated. The proportion mediated by each pathway is given by
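the ratio of that pathway's effect to the total effect. Written out under the usual definition, in which the total effect is the sum of the three pathway effects (an assumption of this sketch; the symbol \(PM_j\) is also introduced here for illustration):

\[
PM_j = \frac{e_j}{e_{met} + e_{fg} + e_{mg}}, \qquad j \in \{met,\ fg,\ mg\}.
\]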
All models were fit and mediation effects estimated using the lavaan R package [31].
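The product-of-coefficients calculation behind \(e_{met}\), \(e_{fg}\), and \(e_{mg}\) can be illustrated with a minimal sketch. It is not the lavaan fit used in the paper: it assumes equation-by-equation ordinary least squares with intercepts, computes point estimates only (no standard errors or p-values), and defines the proportion mediated as each effect divided by their sum. All function names and the simulated data are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X, intercept first (index 0)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def pathway_effects(a, M, g, y):
    """a: maternal genotype (n,); M: metabolites (n, k);
    g: newborn genotype (n,); y: outcome (n,)."""
    # Slope of each metabolite on the maternal SNP (beta_a^(i)).
    beta_a = np.array([ols(a, M[:, i])[1] for i in range(M.shape[1])])
    # Slope of the newborn SNP on the maternal SNP (gamma_a).
    gamma_a = ols(a, g)[1]
    # Outcome model: columns are [a, metabolites..., g] after the intercept.
    theta = ols(np.column_stack([a, M, g]), y)
    theta_a, theta_m, theta_g = theta[1], theta[2:-1], theta[-1]
    return {
        "e_met": float(beta_a @ theta_m),
        "e_fg": float(theta_g * gamma_a),
        "e_mg": float(theta_a),
    }


def proportion_mediated(effects):
    """Each pathway effect divided by the sum of all three (assumed definition)."""
    total = sum(effects.values())
    return {name: value / total for name, value in effects.items()}


# Simulated data with known pathway effects (illustrative values only).
rng = np.random.default_rng(1)
n = 20_000
a = rng.binomial(2, 0.3, size=n).astype(float)        # maternal allele count
g = rng.binomial(1, a / 2) + rng.binomial(1, 0.3, n)  # transmitted + paternal allele
g = g.astype(float)
M = np.outer(a, [0.3, -0.2]) + rng.normal(size=(n, 2))
y = 0.10 * a + M @ np.array([0.5, 0.25]) - 0.08 * g + rng.normal(scale=0.5, size=n)

effects = pathway_effects(a, M, g, y)
total_effect = ols(a, y)[1]  # slope from regressing y on a alone
proportions = proportion_mediated(effects)
```

In this simulation the true values are \(e_{mg} = 0.10\), \(e_{met} = 0.3 \times 0.5 - 0.2 \times 0.25 = 0.10\), and \(e_{fg} = -0.08 \times 0.5 = -0.04\). With least-squares fits, the three estimated effects sum exactly to the slope from regressing y on a alone. When maternal and fetal effects have opposite signs and similar size, that sum can be close to zero, and the proportions become unstable.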
Results
Genetic variant selection
A total of 9,419,653 genetic variants were included in the initial round of GWAS (see Methods). Of these, 7,043 were associated with maternal metabolites or at least one of the newborn outcomes (cord C-peptide, newborn birthweight, and newborn SSF). After LD trimming (for details see “Genetic variant selection” section), this set of 7,043 was reduced to 902. After selecting variant/outcome pairs for mediation analysis based on a path existing in the BNM between variant and outcome (see “Bayesian network model (BNM)” section), 397 variants remained for mediation testing with cord C-peptide as outcome, 631 remained with newborn birthweight as outcome, and 300 remained with newborn SSF as outcome.
Example Bayesian network corresponding to a single maternal variant, rs3789183. The network shown is a subnetwork of the null newborn birthweight Bayesian network. The orange node is the maternal variant, the blue node is the analogous newborn variant, and the gray node is the outcome. The small nodes at the top of the figure are maternal metabolites. Network paths connect the maternal variant directly to the outcome and indirectly through the metabolites.
Figure 2 shows an example of a subnetwork corresponding to a single maternal variant. This variant is connected to newborn birthweight through a direct maternal genetic pathway as well as a maternal metabolic pathway, and is thus included in the set of variants subsequently tested for mediation effects. Note that for this particular variant, only the direct maternal genetic pathway is significant in the mediation analysis (Fig. 3).
Mediation analysis
Variant/outcome pairs were classified into one of 7 groups based on which pathways showed significant effects (nominal \(p < 0.05\)) in the mediation analysis. Table 1 summarizes the group membership criteria, and Table 2 lists the number of variants falling into each group.
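The grouping rule can be sketched as a lookup on the set of pathways with nominal \(p < 0.05\). Only the group numbers that the text pins down are encoded (1 = maternal genetic only, 2 = metabolic only, 3 = fetal genetic only, 5 = maternal and fetal genetic); the numbering of the three combinations involving the metabolic pathway (groups 4, 6, and 7) is defined in Table 1 and is deliberately left unresolved here. The function names, pathway keys, and p-values are hypothetical.

```python
ALPHA = 0.05  # nominal significance level stated in the text

# Group numbers inferred from the Results text; combinations involving the
# metabolic pathway are numbered 4, 6, and 7 in Table 1 (order not encoded).
KNOWN_GROUPS = {
    frozenset({"mg"}): 1,
    frozenset({"met"}): 2,
    frozenset({"fg"}): 3,
    frozenset({"mg", "fg"}): 5,
}


def significant_pathways(pvalues, alpha=ALPHA):
    """Set of pathway names ("mg", "fg", "met") with p < alpha."""
    return frozenset(name for name, p in pvalues.items() if p < alpha)


def classify(pvalues, alpha=ALPHA):
    """Return a group number, the label "4, 6, or 7", or None if no
    pathway is significant."""
    sig = significant_pathways(pvalues, alpha)
    if not sig:
        return None
    return KNOWN_GROUPS.get(sig, "4, 6, or 7")


# Made-up p-values for three variant/outcome pairs.
fetal_only = classify({"mg": 0.10, "fg": 0.002, "met": 0.30})
both_genetic = classify({"mg": 0.01, "fg": 0.03, "met": 0.40})
metabolic_combo = classify({"mg": 0.01, "fg": 0.20, "met": 0.02})
```

With three pathways there are \(2^3 = 8\) possible significance patterns; the pattern with no significant pathway is the one excluded from the 7 groups.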
Table 2 gives a broad sense of the effect patterns that drive maternal variant and newborn outcome association patterns. The majority of variant-outcome associations that were detected fell into groups 1 and 3, indicating that variants with an effect through either direct maternal genetic factors or through fetal inheritance, but not both, are relatively common. A smaller number of variants fell into group 5, indicating both a maternal and fetal genetic effect. A smaller number of variants also fell into group 2, indicating an indirect metabolic effect only. Interestingly, these metabolic effects tended to occur alone, i.e. it is relatively uncommon for a variant to have an indirect metabolic effect as well as another active pathway, as evidenced by the small number of variants in groups 4, 6, and 7.
These overall patterns were similar for all three newborn outcomes studied, as can be seen from comparing across columns in Table 2. However, not all the individual variants involved were the same. Of the 155 total variants associated with at least one of the three outcomes, 133 (86%) were associated with only one, 19 (12%) were associated with two, and 3 (2%) were associated with all three. These results suggest that, although overall patterns seem to be shared across birthweight, cord C-peptide, and newborn SSF, there is somewhat limited overlap in individual variants affecting each of these outcomes. Interestingly, however, among variants that were associated with at least two of the outcomes, group membership was often conserved (see appendix Table 3), including one variant, rs792402 in the MSI2 gene, that acted exclusively through the metabolic pathway for all three outcomes.
Magnitudes of mediation effects for the newborn birthweight (kg) outcome. A variant is shown on the y-axis if it fell into one of the 7 effect pattern groups. White spaces indicate an insignificant effect for that pathway (nominal \(p \ge 0.05\)). Variants that were not connected to newborn birthweight in the BNM are not shown. Abbreviations: Mat. Gen. DE = Maternal Genetic Direct Effect, Fet. Gen. IE = Fetal Genetic Indirect Effect, Met. IE = Metabolomic Indirect Effect, TE = Total Effect
Figure 3 shows effect magnitudes for each of the three mediation pathways (i.e. \(e_{mg}\), \(e_{fg}\), \(e_{met}\)) across genetic variants for the newborn birthweight outcome (see supplement for similar figures for the cord C-peptide and newborn SSF outcomes). A particular genetic variant is included in the heatmap for an outcome if any of the three mediation pathways have significant effects (i.e. the variant falls into one of the 7 groups described above for that outcome). Among the group 5 variants, the maternal genetic effect and fetal genetic effect tend to have opposite directions, a phenomenon that has been observed previously in the maternal/fetal genomics literature [12, 32, 33]. For some of these variants, the maternal and fetal effects have similar magnitudes, so that the total effect is not statistically significant despite significant pathway effects (see rs7551077, rs60580953, and several variants immediately below these in Fig. 3). Not all of these opposite effects are understood mechanistically, but one potential hypothesis in the case of variants affecting insulin and glucose uptake (which are themselves often associated with Type 2 diabetes) involves the amount of glucose made available by the mother and the amount of glucose absorbed into cells by the fetus. A SNP that reduces insulin sensitivity in the mother will lead to higher intrauterine glucose. This will tend to have a positive effect on birthweight: with more glucose available to the fetus, more will be absorbed into the cells and eventually stored as glycogen or fat. However, when part of the DNA of the developing infant, this same SNP might also reduce the infant’s insulin sensitivity and therefore reduce glucose absorption into the cells. This will tend to have a negative effect on birthweight. Together, these mechanisms could produce a positive maternal genetic effect but a negative fetal genetic effect. However, direct experimental evidence would be needed to confirm this or any other mechanistic hypothesis.
Our analysis identified several variants associated with newborn birthweight from genes that match previous analyses. These include RORA, a gene known to be associated with adipose tissue inflammation [34]. Both our analysis and Juliusdottir et al. [12] identified variants with direct maternal genetic effect of positive sign (i.e. presence of variant associated with higher newborn birthweight). Our analysis, Juliusdottir et al., Beaumont et al. [11], and Warrington et al. [10] all found variants in LMO1 with a positive maternal genetic effect. Our analysis and Beaumont et al. both found variants with positive maternal genetic effects in L3MBTL1, and our study, Juliusdottir et al., and Warrington et al. all found such variants in GLIS3, a risk factor for diabetes [35].
Our analysis also uncovered biologically plausible genes with mediation pathways that, to the best of our knowledge, have not been examined in previous literature. Among these are two genes with metabolic mediation pathways, LRP1B and TTC7A. LRP1B encodes a known low-density lipoprotein (LDL) receptor. A mutation in this gene could affect the receptor’s ability to bind LDL, possibly influencing downstream metabolic processes. Mutations in TTC7A are known to cause problems in intestinal development, including early-onset inflammatory bowel disease, and our results suggest it may be implicated in other metabolic disruptions, although mechanistic studies would be needed to confirm any relationship to newborn adiposity. The strongest total effect of any variant we found was a SNP in the gene DSCAML1, which had significant negative effects through non-metabolic maternal and fetal pathways. This gene is known to be involved in neuronal differentiation; our analyses suggest it may also contribute to birthweight regulation.
As mentioned in Methods, we specifically calibrated our newborn birthweight network to include the variant rs7034200 in GLIS3, which previous studies identified as having opposite maternal and fetal effects on birthweight. In our network, this variant fell into Group 3, with a significant fetal effect only. However, the direct maternal genetic effect had a point estimate of the opposite sign, with p = 0.1, as well as a maternal metabolic indirect effect point estimate of the same sign as the maternal genetic effect. Because GLIS3 is known to affect glucose metabolism, we fit the mediation model again using only glucose, rather than all metabolites together, as the mediator. In this analysis, there were significant fetal (p = 0.002) and maternal metabolic (p = 0.02) indirect effects of opposite sign, consistent with previous findings [29]. The difference in findings for a glucose-only model versus a model including all metabolites may be due to other metabolites unrelated to GLIS3 contributing noise to the estimation in the full mediation model, whereas in the glucose-only model, it was possible to detect an effect. This example highlights the fact that if a particular metabolite is known a priori to be a mechanism of action for a variant, focusing on that metabolite alone could be more powerful than taking all the metabolites together.
Discussion and conclusions
In this paper, we introduced a novel multi-omics pipeline for maternal genetic variant selection and mediation effect testing, and used it to test for metabolic and genetic effects on three different newborn outcomes. Unlike previous work in this area, our method explicitly integrates metabolomic and genomic data and groups maternal genetic variants into categories based on both data types. This integration is made possible by the use of a multi-omic Bayesian network model to identify mediation candidates, and a mediation framework that estimates joint indirect effects through sets of multiple mediators.
This latter technique is especially important. Previous studies have used Bayesian networks to identify mediation model candidates in a similar way, but were limited by focusing on one or two individual mediators at a time. Our approach estimates mediation effects through the entire metabolome and directly compares them to effects through genetic pathways. It may seem like estimating a joint effect through many mediators sacrifices specificity. However, the metabolome is a large network with a complicated dependency structure that would be difficult if not impossible to map out in a directed acyclic graph or structural equation model, and exactly how mediation effects through individual metabolites would be estimated and interpreted is a challenging problem in its own right. Estimation of joint mediation effects through large groups of mediators allows for identifiable mediation effects even when the dependence structure of the collection of mediators is complicated or unknown. In most cases, this is an advantage because we may not have any a priori information about possible mechanisms of action for a particular genetic variant, and the joint approach does not require us to. However, as we saw in the GLIS3 example, when a more specific mechanism is known, focusing on that mechanism is a more powerful approach. Fortunately, our framework can handle both these approaches. As in the GLIS3 example, where we replaced the full set of metabolites with only glucose, one could replace any larger set of mediators with a smaller subset known to be affected by a particular variant.
Applying our method to the HAPO data suggests that maternal genetic variants affecting newborn adiposity through metabolic pathways (i.e., those with a statistically significant \(e_{met}\)) tend to be specific to that pathway, and do not act through other maternal mechanisms or through newborn inheritance (i.e., they have non-significant \(e_{mg}\) and \(e_{fg}\)). This is evidenced by the relatively small number of variants with both a metabolic mediation effect and either a direct maternal genetic effect or a fetal genetic effect (groups 4, 6, and 7 in the Results). The magnitude of the metabolic effects also tended to be smaller than that of the other two genetic pathways, though this is consistent with the broader range of physiological processes encompassed by these other pathways. The direct maternal genetic effect, \(e_{mg}\), includes all maternal pathways downstream from the maternal SNP that are not explicitly accounted for in our model, i.e., all non-metabolic pathways, such as proteomic, lipidomic, or microbiomic pathways. Likewise, the fetal genetic effect includes all fetal pathways downstream from the fetal SNP. Although the only other pathway we considered explicitly in this paper was the maternal metabolic pathway, our model framework is flexible enough to incorporate any maternal or fetal pathway, as long as the Bayesian network and mediation models are specified correctly for the particular context. For example, if maternal and newborn genotype data, as well as maternal metabolomic and proteomic data, are available, an analysis similar to ours could be conducted with an additional mediation pathway for proteomics. Effect patterns could then be sorted into 16 groups instead of 8 (\(2^4\) rather than \(2^3\) combinations of significant and non-significant pathways). The approach is thus broadly applicable for untangling path-specific contributions in the modern integrated multi-omics landscape. Our method also identified several genes that matched the findings of previous studies on the maternal vs. fetal genetic effects of maternal variants on newborn birthweight. The other two newborn outcomes included in this paper, cord C-peptide and SSF, are less well studied than newborn birthweight and, to the best of our knowledge, do not have corresponding analyses in the literature.
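The group counts mentioned above (8 patterns with three pathways, 16 with four) follow from treating each pathway as either significant or not. A minimal Python sketch of that enumeration is given below; the label `e_prot` for a proteomic pathway is a hypothetical name, and the ordering produced here is not the group numbering used in the Results.

```python
from itertools import product


def effect_pattern_groups(pathways):
    """List every significant / non-significant pattern across the pathways.
    Each pattern maps a pathway label to True (significant) or False."""
    return [
        dict(zip(pathways, flags))
        for flags in product([False, True], repeat=len(pathways))
    ]


# Three pathways as in this paper; the fourth label is hypothetical.
three = effect_pattern_groups(["e_mg", "e_fg", "e_met"])
four = effect_pattern_groups(["e_mg", "e_fg", "e_met", "e_prot"])

# Patterns with at least one significant pathway (the all-null pattern excluded).
non_null_three = [g for g in three if any(g.values())]

print(len(three), len(non_null_three), len(four))
```

With three pathways, one of the 8 patterns has no significant effect at all, which leaves 7 patterns with at least one significant pathway.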
These results demonstrate the potential of a multi-omics mediation method for scientific discovery. Nevertheless, our study has a number of limitations. Perhaps the most significant is the relatively small sample size of our HAPO application compared with that of other studies examining maternal vs. fetal genetic pathways. This is particularly important for interpreting GWAS results, and is part of the reason we emphasize overall effect patterns rather than specific genetic variants in the presentation of our results. While the BNM filter helps to limit the number of variants under consideration in a biologically principled way, it cannot replace the stringent significance thresholds typically used in GWAS. Future studies applying our method to larger datasets could provide detailed information on both effect patterns and specific variants. A second limitation is that our model does not consider potential interactions between maternal metabolites and fetal genetics. Such interactions could also contribute to determining newborn adiposity outcomes, and elucidating them will be an important step in our understanding of the factors affecting this inheritance pathway. Although such interactions were beyond the scope of this particular study given our small sample size, future applications of our method could modify the BNM and mediation model to include them. Finally, application of our methodology requires specifying a set of metabolites through which to compute mediation effects. Our primary interest was in the total effect mediated through a large group of metabolites, but we also examined a particular case where the effect of variants in a particular gene, GLIS3, was known a priori to be mediated largely by the metabolite glucose. This example illustrates a tradeoff inherent to application of the method. The larger the set of metabolites considered as mediators, the higher the power to detect the metabolic pathway effects of variants that act diffusely, but in a consistent effect direction, through many metabolites. On the other hand, variants that act primarily through a single metabolite could become false negatives in the presence of noise from many other metabolites. This tradeoff is somewhat unavoidable without knowing the mechanism of every SNP ahead of time, though larger sample sizes than ours would alleviate some of this difficulty. Additional research is needed to clarify the differences in power for SNPs that operate in these distinct ways.
Data availability
The maternal and newborn genotype data for participants of European, Mexican American, Thai and Afro-Caribbean ancestry and the accompanying phenotype data are currently available in dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/variable.cgi?study_id=phs000096.v4.p1&phv=163664&phd=2831&pha=&pht=2446&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1). An additional subset of 2680 Northern European and transethnic samples, the metabolomic data, and the code used for analyses will be made available by the authors upon request.
Acknowledgements
We thank William L. Lowe, Jr. and Marie-France Hivert for helpful discussions on the interpretation of biological findings.
Funding
The HAPO Study was funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant Nos. R01HD34242 and R01HD34243), with additional HAPO ancillary study data obtained through the National Institute of Diabetes and Digestive and Kidney Diseases (grant Nos. R01DK095963 and R01DK117491). The data analyses described in this work were funded by the National Library of Medicine (grant No. R01LM013444).
Author information
Contributions
NPG and DMS contributed to the conceptualization, methodology, interpretation of results, and writing of the manuscript. AK analyzed the data and contributed to the writing of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The study was approved by the Northwestern University Institutional Review Board and adhered to the Declaration of Helsinki for research in human subjects. Additionally, the original HAPO study protocol was approved by the institutional review boards (IRBs) at each study field center (protocol code 353-003, date of approval 28 June 2000). All HAPO participants gave written informed consent.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A Additional tables and figures
Magnitudes of mediation effects for the sum of skin folds (SSF) outcome. A variant is shown on the y-axis if it fell into one of the 7 effect pattern groups. White spaces indicate a non-significant effect for that pathway (nominal \(p \ge 0.05\)). Variants that were not connected to SSF in the BNM are not shown. Abbreviations: Mat. Gen. DE = Maternal Genetic Direct Effect, Fet. Gen. IE = Fetal Genetic Indirect Effect, Met. IE = Metabolomic Indirect Effect, TE = Total Effect
Magnitudes of mediation effects for the cord C-peptide outcome. A variant is shown on the y-axis if it fell into one of the 7 effect pattern groups. White spaces indicate a non-significant effect for that pathway (nominal \(p \ge 0.05\)). Variants that were not connected to cord C-peptide in the BNM are not shown. Abbreviations: Mat. Gen. DE = Maternal Genetic Direct Effect, Fet. Gen. IE = Fetal Genetic Indirect Effect, Met. IE = Metabolomic Indirect Effect, TE = Total Effect
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gill, N.P., Kuang, A. & Scholtens, D.M. Multi-omics mediation pipeline reveals differential pathways of maternal SNPs affecting newborn adiposity outcomes. BMC Genom Data 26, 66 (2025). https://doi.org/10.1186/s12863-025-01355-w
DOI: https://doi.org/10.1186/s12863-025-01355-w