  • Primer

Genome-wide association studies

Abstract

Genome-wide association studies (GWAS) test hundreds of thousands of genetic variants across many genomes to find those statistically associated with a specific trait or disease. This methodology has generated many robust associations for a range of traits and diseases, and the number of associated variants is expected to grow steadily as GWAS sample sizes increase. GWAS results have a range of applications, such as gaining insight into a phenotype’s underlying biology, estimating its heritability, calculating genetic correlations, making clinical risk predictions, informing drug development programmes and inferring potential causal relationships between risk factors and health outcomes. In this Primer, we introduce GWAS, explain their statistical basis and how they are conducted, describe state-of-the-art approaches, and discuss limitations and challenges. We conclude with an overview of the current and future applications of GWAS results.
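
To make the idea of testing each variant for association concrete, the following is a minimal sketch rather than the procedure used in any particular study. It assumes a quantitative trait, unrelated individuals, no covariates and simulated genotypes. The function name `association_test`, the simulation parameters and the Bonferroni threshold are illustrative assumptions that do not come from the Primer. Real analyses typically use regression models with covariates, run in dedicated software.

```python
import math
import random


def association_test(genotypes, phenotypes):
    """Score-type test for one variant; returns (chi-square statistic, p-value).

    genotypes: allele counts (0, 1 or 2), one per individual.
    phenotypes: quantitative trait values in the same order.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_yy = sum((y - mean_y) ** 2 for y in phenotypes)
    if s_gg == 0 or s_yy == 0:
        return 0.0, 1.0  # monomorphic variant or constant trait: nothing to test
    r2 = s_gy ** 2 / (s_gg * s_yy)  # squared genotype-trait correlation
    chi2 = n * r2  # approximately chi-square with 1 degree of freedom under the null
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value


# Simulated data; every value here is an illustrative assumption.
rng = random.Random(0)
n_individuals, n_variants, maf, effect = 1000, 100, 0.3, 0.5

# genotype_matrix[j][i] is the allele count of individual i at variant j.
genotype_matrix = [
    [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_individuals)]
    for _ in range(n_variants)
]
# Only variant 0 influences the trait; all other variants are null.
trait = [effect * g + rng.gauss(0, 1) for g in genotype_matrix[0]]

# Test every variant, then correct for the number of tests performed.
results = [association_test(g, trait) for g in genotype_matrix]
threshold = 0.05 / n_variants  # Bonferroni correction (assumed for this sketch)
hits = [j for j, (_, p) in enumerate(results) if p < threshold]
print("variants passing threshold:", hits)
```

The sketch tests variants one at a time and corrects for multiple testing afterwards, which is why the significance threshold tightens as the number of tested variants grows.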

Fig. 1: Overview of steps for conducting GWAS.
Fig. 2: Manhattan plot and quantile–quantile plot to visualize GWAS results.
Fig. 3: Illustration of functional follow-up of GWAS.
Fig. 4: Overview of the steps necessary for calculating PRSs.


    Google Scholar 

  167. Cavazos, T. B. & Witte, J. S. Inclusion of variants discovered from diverse populations improves polygenic risk score transferability. HGG Adv. 2, 100017 (2021).

    Google Scholar 

  168. Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).

    Google Scholar 

  169. Wand, H. et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 591, 211–219 (2021).

    ADS  Google Scholar 

  170. Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).

    Google Scholar 

  171. Fisher, R. A. XV. — The correlation between relatives on the supposition of Mendelian inheritance. Earth Environ. Sci. Trans. R. Soc. Edinb. 52, 399–433 (1919).

    Google Scholar 

  172. Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics (Pearson, Prentice Hall, 2009).

  173. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

    Google Scholar 

  174. Schizophrenia Working Group of the Psychiatric Genomics Consortium. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

    Google Scholar 

  175. Wainschtein, P. et al. Recovery of trait heritability from whole genome sequence data. Preprint at bioRxiv https://doi.org/10.1101/588020 (2019).

    Article  Google Scholar 

  176. Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).

    ADS  Google Scholar 

  177. Bomba, L., Walter, K. & Soranzo, N. The impact of rare and low-frequency genetic variants in common disease. Genome Biol. 18, 77 (2017).

    Google Scholar 

  178. Bergen, S. E., Gardner, C. O. & Kendler, K. S. Age-related changes in heritability of behavioral phenotypes over adolescence and young adulthood: a meta-analysis. Twin Res. Hum. Genet. 10, 423–433 (2007).

    Google Scholar 

  179. Bernabeu, E. et al. Sexual differences in genetic architecture in UK Biobank. Preprint at bioRxiv https://doi.org/10.1101/2020.07.20.211813v1 (2020).

    Article  Google Scholar 

  180. Heath, A. C. et al. Education policy and the heritability of educational attainment. Nature 314, 734–736 (1985).

    ADS  Google Scholar 

  181. Browning, S. R. & Browning, B. L. Population structure can inflate SNP-based heritability estimates. Am. J. Hum. Genet. 89, 191–193; author reply 193–195 (2011).

    Google Scholar 

  182. Verbanck, M., Chen, C.-Y., Neale, B. & Do, R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 50, 693–698 (2018).

    Google Scholar 

  183. Zhang, Y. et al. Local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits. Preprint at bioRxiv https://doi.org/10.1101/2020.05.08.084475v1 (2020).

    Article  Google Scholar 

  184. Shi, H., Mancuso, N., Spendlove, S. & Pasaniuc, B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. Am. J. Hum. Genet. 101, 737–751 (2017).

    Google Scholar 

  185. Werme, J., Sluis, Svander, Posthuma, D. & de Leeuw, C. A. LAVA: an integrated framework for local genetic correlation analysis. Preprint at bioRxiv https://doi.org/10.1101/2020.12.31.424652v1 (2021).

    Article  Google Scholar 

  186. Jordan, D. M., Verbanck, M. & Do, R. HOPS: a quantitative score reveals pervasive horizontal pleiotropy in human genetic variation is driven by extreme polygenicity of human traits and diseases. Genome Biol. 20, 222 (2019).

    Google Scholar 

  187. Smith, G. D. & Ebrahim, S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).

    Google Scholar 

  188. Evans, D. M. & Smiths, G. D. Mendelian randomization: new applications in the coming age of hypothesis-free causality. Annu. Rev. Genomics Hum. Genet. 16, 327–350 (2015).

    Google Scholar 

  189. Wellcome Trust. Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility Vol. 6 (Wellcome Trust, 2003).

  190. COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. 28, 715–718 (2020). This paper presents the recently established COVID-19 Host Genetics Initiative as a prime example of collaboration and team science, forming within a few months, rapidly aggregating data into a massive resource, rapidly crystallizing results and making it all freely available to academics.

    Google Scholar 

  191. Knoppers, B. M. Framework for responsible sharing of genomic and health-related data. HUGO J. 8, 3 (2014).

    Google Scholar 

  192. Peloquin, D., DiMaio, M., Bierer, B. & Barnes, M. Disruptive and avoidable: GDPR challenges to secondary research uses of data. Eur. J. Hum. Genet. 28, 697–705 (2020).

    Google Scholar 

  193. Staunton, C. et al. Protection of Personal Information Act 2013 and data protection for health research in South Africa. Int. Data Priv. Law 10, 160–179 (2020).

    Google Scholar 

  194. Molnár-Gábor, F. & Korbel, J. O. Genomic data sharing in Europe is stumbling — could a code of conduct prevent its fall? EMBO Mol. Med. 12, e11421 (2020).

    Google Scholar 

  195. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

    Google Scholar 

  196. Bezuidenhout, L. & Chakauya, E. Hidden concerns of sharing research data by low/middle-income country scientists. Glob. Bioeth. Probl. Bioet. 29, 39–54 (2018).

    Google Scholar 

  197. Bull, S. Review: Ensuring global equity in open research. Wellcome Trust https://doi.org/10.6084/M9.FIGSHARE.4055181.V1 (2016).

    Article  Google Scholar 

  198. de Vries, J. et al. The H3Africa policy framework: negotiating fairness in genomics. Trends Genet. 31, 117–119 (2015).

    Google Scholar 

  199. Yakubu, A. et al. Model framework for governance of genomic research and biobanking in Africa — a content description. AAS Open Res. 1, 13 (2018).

    Google Scholar 

  200. O’Doherty, K. C. et al. Toward better governance of human genomic data. Nat. Genet. 53, 2–8 (2021).

    Google Scholar 

  201. Lyon, M. S. et al. The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol. 22, 32 (2021).

    Google Scholar 

  202. Nosek, B. A., Ebersole, C. R., DeHaven, A. C. & Mellor, D. T. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018).

    Google Scholar 

  203. Bosco, F. A., Aguinis, H., Field, J. G., Pierce, C. A. & Dalton, D. R. HARKing’s threat to organizational research: evidence from primary and meta-analytic sources. Pers. Psychol. 69, 709–750 (2016).

    Google Scholar 

  204. Kerr, N. L. HARKing: hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2, 196–217 (1998).

    Google Scholar 

  205. Colhoun, H. M., McKeigue, P. M. & Smith, G. D. Problems of reporting genetic associations with complex outcomes. Lancet 361, 865–872 (2003).

    Google Scholar 

  206. John, L. K., Loewenstein, G. & Prelec, D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23, 524–532 (2012).

    Google Scholar 

  207. Chambers, C. D., Feredoes, E., Muthukumaraswamy, S. D. & Etchells, P. J. Instead of ‘playing the game’ it is time to change the rules: Registered Reports at AIMS Neuroscience and beyond. AIMS Neurosci. 1, 4 (2014). This paper introduces the Registered Reports concept, a publishing format in which peer review occurs before data collection and analysis.

    Google Scholar 

  208. Song, F., Hooper & Loke, Y. Publication bias: what is it? How do we measure it? How do we avoid it? Open Access J. Clin. Trials https://doi.org/10.2147/OAJCT.S34419 (2013).

    Article  Google Scholar 

  209. Syed, M. & Donnellan, M. B. Registered reports with developmental and secondary data: some brief observations and introduction to the special issue. Emerg. Adulthood 8, 255–258 (2020).

    Google Scholar 

  210. Van den Akker, O. et al. Preregistration of secondary data analysis: a template and tutorial. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/hvfmr (2019).

    Article  Google Scholar 

  211. Berg, J. J. et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, e39725 (2019). This paper shows that the polygenic selection signal of height in European-ancestry individuals is strongly attenuated when using GWAS summary statistics generated from the UK Biobank rather than the largest GWAS meta-analysis (GIANT consortium).

    Google Scholar 

  212. Refoyo-Martínez, A. et al. How robust are cross-population signatures of polygenic adaptation in humans? Preprint at medRxiv https://doi.org/10.1101/2020.07.13.200030v2 (2020).

    Article  Google Scholar 

  213. Sohail, M. et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8, e39702 (2019).

    Google Scholar 

  214. Abdellaoui, A. et al. Genetic correlates of social stratification in Great Britain. Nat. Hum. Behav. 3, 1332–1342 (2019).

    Google Scholar 

  215. Haworth, S. et al. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nat. Commun. 10, 333 (2019).

    ADS  Google Scholar 

  216. Selzam, S. et al. Comparing within- and between-family polygenic score prediction. Am. J. Hum. Genet. 105, 351–363 (2019).

    Google Scholar 

  217. Turchin, M. C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).

    Google Scholar 

  218. O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).

    Google Scholar 

  219. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).

    Google Scholar 

  220. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

    Google Scholar 

  221. Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).

    Google Scholar 

  222. Flannick, J. et al. Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls. Nature 570, 71–76 (2019).

    ADS  Google Scholar 

  223. Singh, T. et al. The contribution of rare variants to risk of schizophrenia in individuals with and without intellectual disability. Nat. Genet. 49, 1167–1173 (2017).

    Google Scholar 

  224. Luo, Y. et al. Exploring the genetic architecture of inflammatory bowel disease by whole-genome sequencing identifies association at ADCY7. Nat. Genet. 49, 186–192 (2017).

    Google Scholar 

  225. Tindana, P., Molyneux, S., Bull, S. & Parker, M. ‘It is an entrustment’: broad consent for genomic research and biobanks in sub-Saharan Africa. Dev. World Bioeth. 19, 9–17 (2019).

    Google Scholar 

  226. Fisher, C. B. & Layman, D. M. Genomics, big data, and broad consent: a new ethics frontier for prevention science. Prev. Sci. 19, 871–879 (2018).

    Google Scholar 

  227. Nembaware, V. et al. A framework for tiered informed consent for health genomic research in Africa. Nat. Genet. 51, 1566–1571 (2019).

    Google Scholar 

  228. Weiner, C. Anticipate and communicate: ethical management of incidental and secondary findings in the clinical, research, and direct-to-consumer contexts (December 2013 Report of the Presidential Commission for the Study of Bioethical Issues). Am. J. Epidemiol. 180, 562–564 (2014).

    Google Scholar 

  229. Eckstein, L., Garrett, J. R. & Berkman, B. E. A framework for analyzing the ethics of disclosing genetic research findings. J. Law Med. Ethics 42, 190–207 (2014).

    Google Scholar 

  230. Wonkam, A. & de Vries, J. Returning incidental findings in African genomics research. Nat. Genet. 52, 17–20 (2020).

    Google Scholar 

  231. McGuire, A. L. et al. The road ahead in genetics and genomics. Nat. Rev. Genet. 21, 581–596 (2020).

    Google Scholar 

  232. Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).

    ADS  Google Scholar 

  233. Hudson, M. et al. Rights, interests and expectations: Indigenous perspectives on unrestricted access to genomic data. Nat. Rev. Genet. 21, 377–384 (2020).

    Google Scholar 

  234. Claw, K. G. et al. A framework for enhancing ethical genomic research with Indigenous communities. Nat. Commun. 9, 2957 (2018).

    ADS  Google Scholar 

  235. Mills, M. C. & Rahal, C. The GWAS Diversity Monitor tracks diversity by disease in real time. Nat. Genet. 52, 242–243 (2020).

    Google Scholar 

  236. Lautenbach, D. M., Christensen, K. D., Sparks, J. A. & Green, R. C. Communicating genetic risk information for common disorders in the era of genomic medicine. Annu. Rev. Genomics Hum. Genet. 14, 491–513 (2013).

    Google Scholar 

  237. Palk, A. C., Dalvie, S., de Vries, J., Martin, A. R. & Stein, D. J. Potential use of clinical polygenic risk scores in psychiatry — ethical implications and communicating high polygenic risk. Philos. Ethics Humanit. Med. 14, 4 (2019).

    Google Scholar 

  238. Regalado, A. Eugenics 2.0: we’re at the dawn of choosing embryos by health, height, and more. MIT Technology Review https://www.technologyreview.com/2017/11/01/105176/eugenics-20-were-at-the-dawn-of-choosing-embryos-by-health-height-and-more/ (2017).

  239. Kong, C., Dunn, M. & Parker, M. Psychiatric genomics and mental health treatment: setting the ethical agenda. Am. J. Bioeth. 17, 3–12 (2017).

    Google Scholar 

  240. de Vries, J., Landouré, G. & Wonkam, A. Stigma in African genomics research: gendered blame, polygamy, ancestry and disease causal beliefs impact on the risk of harm. Soc. Sci. Med. 258, 113091 (2020).

    Google Scholar 

  241. Merriman, T. & Cameron, V. Risk-taking: behind the warrior gene story. N. Z. Med. J. 120, U2440 (2007).

    Google Scholar 

  242. Gronowski, A. M. & Budelier, M. M. The ethics of direct-to-consumer testing. Clin. Lab. Med. 40, 93–103 (2020).

    Google Scholar 

  243. Blell, M. & Hunter, M. A. Direct-to-consumer genetic testing’s red herring: ‘genetic ancestry’ and personalized medicine. Front. Med. 6, 48 (2019).

    Google Scholar 

  244. Rothstein, M. A. et al. Legal and ethical challenges of international direct-to-participant genomic research: conclusions and recommendations. J. Law Med. Ethics. 47, 705–731 (2019).

    Google Scholar 

  245. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009). This paper describes the concept of ‘missing heritability’, the observation that heritability estimates from GWAS are much lower than those from twin studies.

    ADS  Google Scholar 

  246. Young, A. I. Solving the missing heritability problem. PLoS Genet. 15, e1008222 (2019).

    Google Scholar 

  247. Cai, N. et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat. Genet. 52, 437–447 (2020).

    Google Scholar 

  248. Nagel, M., Watanabe, K., Stringer, S., Posthuma, D. & van der Sluis, S. Item-level analyses reveal genetic heterogeneity in neuroticism. Nat. Commun. 9, 1–10 (2018).

    Google Scholar 

  249. Plenge, R. M., Scolnick, E. M. & Altshuler, D. Validating therapeutic targets through human genetics. Nat. Rev. Drug Discov. 12, 581–594 (2013).

    Google Scholar 

  250. Cook, D. et al. Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional framework. Nat. Rev. Drug Discov. 13, 419–431 (2014).

    Google Scholar 

  251. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).

    ADS  Google Scholar 

  252. Peat, G. et al. The Open Targets post-GWAS analysis pipeline. Bioinforma. Oxf. Engl. 36, 2936–2937 (2020).

    Google Scholar 

  253. Sakaue, S. & Okada, Y. GREP: genome for REPositioning drugs. Bioinforma. Oxf. Engl. 35, 3821–3823 (2019).

    Google Scholar 

  254. Schork, N. J. Personalized medicine: time for one-person trials. Nature 520, 609–611 (2015).

    ADS  Google Scholar 

  255. Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics 33, 2776–2778 (2017).

    Google Scholar 

  256. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).

    Google Scholar 

  257. Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).

    Google Scholar 

  258. Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).

    Google Scholar 

  259. Scott, L. J. et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316, 1341–1345 (2007).

    ADS  Google Scholar 

  260. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).

    Google Scholar 

  261. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

    Google Scholar 

  262. Mägi, R. & Morris, A. P. GWAMA: software for genome-wide association meta-analysis. BMC Bioinforma. 11, 288 (2010).

    Google Scholar 

  263. Delaneau, O. et al. A complete tool set for molecular QTL discovery and analysis. Nat. Commun. 8, 15452 (2017).

    ADS  Google Scholar 

  264. Speed, D. & Balding, D. J. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat. Genet. 51, 277–284 (2019).

    Google Scholar 

  265. Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).

    Google Scholar 

  266. Burgess, S. et al. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur. J. Epidemiol. 30, 543–552 (2015).

    Google Scholar 

  267. Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).

    Google Scholar 

  268. Chen, Z. et al. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int. J. Epidemiol. 40, 1652–1666 (2011).

    Google Scholar 

  269. Finer, S. et al. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. Int. J. Epidemiol. 49, 20–21i (2020).

    Google Scholar 

  270. The H3Africa Consortium. Enabling the genomic revolution in Africa. Science 344, 1346–1348 (2014).

    Google Scholar 

  271. Giri, A. et al. Trans-ethnic association study of blood pressure determinants in over 750,000 individuals. Nat. Genet. 51, 51–62 (2019).

    Google Scholar 

  272. All of Us Research Program Investigators. The ‘All of Us’ Research Program. N. Engl. J. Med. 381, 668–676 (2019).

    Google Scholar 

  273. Canela-Xandri, O., Rawlik, K. & Tenesa, A. An atlas of genetic associations in UK Biobank. Nat. Genet. 50, 1593–1599 (2018).

    Google Scholar 

Download references

Acknowledgements

D.P. is supported by Netherlands Organization for Scientific Research (NWO) grant VICI 435-14-005, the NWO Gravitation project BRAINSCAPES: A Roadmap from Neurogenetics to Neurobiology (024.004.012) and European Research Council advanced grant ERC-2018-ADG 834057. N.S.M. is supported by National Institutes of Health (NIH) grant U24HL135600. J.d.V. is supported by NIH grant U54HG009790 and Wellcome Trust grant 219600/Z/19/Z. Y.O. is supported by Japan Society for the Promotion of Science (JSPS) KAKENHI grants 19H01021 and 20K21834 and Japan Agency for Medical Research and Development (AMED) grants JP20km0405211, JP20ek0109413, JP20ek0410075, JP20gm4010006 and JP20km0405217. T.L. is supported by NIH grants R01GM122924, R01HL142028, 1R01AG057422, 1UM1HG008901 and R01MH106842. H.C.M. is supported by a Wellcome Trust core grant to the Sanger Institute (098051).

Author information

Contributions

Introduction (D.P. and E.U.); Experimentation (A.R.M. and H.C.M.); Results (D.P., Y.O. and T.L.); Applications (A.R.M.); Reproducibility and data deposition (D.P., E.U., J.d.V. and N.S.M.); Limitations and optimizations (D.P., E.U., J.d.V. and Q.Q.H.); Outlook (D.P., E.U., Q.Q.H., Y.O. and T.L.); Overview of the Primer (D.P.).

Corresponding author

Correspondence to Danielle Posthuma.

Ethics declarations

Competing interests

T.L. is an adviser to Goldfinch Bio, Variant Bio and GSK, and has equity in Variant Bio. The other authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Methods Primers thanks T. Edwards, J. Hostetler, D. Paltoo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

23andMe: https://research.23andme.com/

ρ-HESS: https://huwenboshi.github.io/hess/

ANNOVAR: https://annovar.openbioinformatics.org/en/latest/

BEAGLE: http://faculty.washington.edu/browning/beagle/beagle.html

Bermuda principles: https://web.ornl.gov/sci/techresources/Human_Genome/research/bermuda.shtml

BGENIE: https://jmarchini.org/bgenie/

BOLT-LMM: https://alkesgroup.broadinstitute.org/BOLT-LMM/

BRAINSCAPES: https://brainscapes.nl

CAVIAR: http://zarlab.cs.ucla.edu/tag/caviar/

dbGaP: https://www.ncbi.nlm.nih.gov/gap/

DEPICT: https://data.broadinstitute.org/mpg/depict/

Ebola Data Platform: https://www.iddo.org/research-themes/ebola

fastGWA: https://cnsgenomics.com/software/gcta/#fastGWA

FINEMAP: http://www.christianbenner.com/

FinnGen results: https://www.finngen.fi/en/access_results

FlashPCA: https://github.com/gabraham/flashpca

FUMA: https://fuma.ctglab.nl/

FUSION: http://gusevlab.org/projects/fusion/

GCTA-COJO: https://cnsgenomics.com/software/gcta/#Overview

GCTA: https://cnsgenomics.com/software/gcta/#Overview

GEMMA: https://github.com/genetics-statistics/GEMMA

GeneAtlas: http://geneatlas.roslin.ed.ac.uk/

General Data Protection Regulation: https://gdpr-info.eu/

Genetic Investigation of Anthropometric Traits (GIANT) consortium: https://portals.broadinstitute.org/collaboration/giant

GenomicSEM: https://github.com/MichelNivard/GenomicSEM/

Genotype–Tissue Expression (GTEx) resource: https://gtexportal.org/home/

Global Alliance for Genomics and Health: https://www.ga4gh.org/

Global Lipids Genetics Consortium: http://lipidgenetics.org

GWAS Atlas: https://atlas.ctglab.nl/

GWAS Catalog: https://www.ebi.ac.uk/gwas/

GWAMA: https://genomics.ut.ee/en/tools/gwama

HIBAG: https://bioconductor.org/packages/release/bioc/html/HIBAG.html

HLA*IMP: https://bioinformaticshome.com/tools/imputation/descriptions/HLA-IMP.html

H3Africa Consortium: https://h3africa.org/

IMPUTE2: http://mathgen.stats.ox.ac.uk/impute/impute_v2.html

International Common Disease Alliance: https://www.icda.bio

KIR*IMP: http://imp.science.unimelb.edu.au/kir/

LAVA: https://github.com/josefin-werme/lava

LDpred: https://github.com/bvilhjal/ldpred

LDpred2: https://github.com/privefl/bigsnpr

LD-hub: http://ldsc.broadinstitute.org/

LDSC: https://github.com/bulik/ldsc

METAL: http://csg.sph.umich.edu/abecasis/Metal/

LocusZoom: http://locuszoom.org/

MACH: http://csg.sph.umich.edu/abecasis/mach/tour/imputation.html

MAGMA: https://ctg.cncr.nl/software/magma

Mendelian Randomization: https://cran.r-project.org/web/packages/MendelianRandomization/index.html

Michigan Imputation Server: https://imputationserver.sph.umich.edu/index.html#!

Minimac: https://genome.sph.umich.edu/wiki/Minimac

OpenGWAS database: https://gwas.mrcieu.ac.uk/

Open Science Framework: https://osf.io/

Open Science Framework Registered Reports resource: https://osf.io/rr/

PAINTOR: https://github.com/gkichaev/PAINTOR_V3.0

Pan UKBB: https://pan.ukbb.broadinstitute.org/

Pheweb.jp: https://pheweb.jp/

PLINK: http://zzz.bwh.harvard.edu/plink

PLINK2: https://www.cog-genomics.org/plink/2.0/

Polygenic Score Catalog: https://www.pgscatalog.org

PRSice: https://www.prsice.info/

PrediXcan: https://github.com/gamazonlab/PrediXcan

PRScs: https://github.com/getian107/PRScs

Psychiatric Genomics Consortium: https://www.med.unc.edu/pgc

QTLTools: https://qtltools.github.io/qtltools/

REGENIE: https://rgcgithub.github.io/regenie/

RICOPILI: https://sites.google.com/a/broadinstitute.org/ricopili/

SAIGE: https://github.com/weizhouUMICH/SAIGE

SBayesR: https://cnsgenomics.com/software/gctb/#Overview

SMARTPCA: https://github.com/chrchang/eigensoft/wiki/smartpca

SMR: https://cnsgenomics.com/software/smr/#Overview

SNP2HLA: http://software.broadinstitute.org/mpg/snp2hla/

SNPTEST: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html#introduction

SumHer: http://dougspeed.com/sumher/

superGNOVA: https://github.com/qlu-lab/SUPERGNOVA

SuSiE: https://github.com/stephenslab/susieR

Templates for data privacy documents: https://www.jyu.fi/en/university/data-privacy/data-privacy-templates#autotoc-item-autotoc-1

TOPMed Imputation Server: https://imputation.biodatacatalyst.nhlbi.nih.gov/#!

VEP: https://www.ensembl.org/info/docs/tools/vep/index.html

Glossary

Polygenic risk scores

(PRSs). Scores that indicate an individual’s genetic liability to a trait or disease, calculated as the sum of the individual’s risk alleles weighted by effect sizes obtained from genome-wide association studies (GWAS).
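
The weighted sum behind a PRS can be sketched as follows. This is a minimal illustration, assuming allele dosages coded as 0, 1 or 2 copies of the effect allele and per-allele effect sizes taken from GWAS summary statistics; the function name and the toy numbers are hypothetical and not taken from this Primer.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of effect-allele dosages weighted by GWAS effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical individual with three variants (dosages 2, 0 and 1)
# and made-up per-allele effect sizes.
score = polygenic_risk_score([2, 0, 1], [0.10, -0.05, 0.20])
```

In practice the set of variants and their weights are usually adjusted for linkage disequilibrium before summing; that step is omitted here.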

Linkage disequilibrium

The non-independent association of alleles at two different loci in a population.
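
Two common summaries of linkage disequilibrium are the coefficient D and the squared correlation r². The sketch below assumes two biallelic loci with known haplotype and allele frequencies; the function and argument names are illustrative only.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of alleles A and B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # deviation from independence
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Perfectly correlated alleles: A and B always occur together.
d_full, r2_full = ld_stats(0.5, 0.5, 0.5)
# Independent alleles: the haplotype frequency equals the product.
d_none, r2_none = ld_stats(0.25, 0.5, 0.5)
```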

Collider bias

A bias that occurs when two variables (A and B) both influence a third variable (C) and the analysis conditions on that third variable. This can induce spurious correlations between variables A and B.

Hardy–Weinberg equilibrium

If the frequency of observed genotypes of a variant in a population can be derived from the observed allele frequencies, the genetic variant is said to be in Hardy–Weinberg equilibrium. A test for Hardy–Weinberg equilibrium is often used in quality control of genome-wide association studies (GWAS) to filter out variants with possible genotype calling errors.
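
As a rough sketch of such a test, the function below compares observed genotype counts with the counts expected from the observed allele frequencies using a chi-square goodness-of-fit statistic (1 degree of freedom). It assumes a biallelic, polymorphic variant; the function name is hypothetical, and quality-control software often uses an exact test instead.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # observed frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


in_equilibrium = hwe_chi_square(25, 50, 25)    # matches expectation exactly
no_heterozygotes = hwe_chi_square(50, 0, 50)   # strong deviation
```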

Phasing

The process of estimating whether genotyped alleles derive from the maternal or paternal chromosome.

Population stratification

The presence of multiple genetically distinct subpopulations that differ in their mean phenotypic values. When not accounted for, this can lead to spurious genetic associations.

Random effect term

Random effects are effects that have so many levels (including more than the number of observations) that they are not individually estimable. By assigning these effects as random it is assumed that they are drawn from a population with a known variance and covariance structure. The effect for an individual can be predicted given the data and the distributional assumptions.

Logit link function

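A function that links a linear combination of covariate values to probabilities: the logit maps a probability to the log-odds scale, and its inverse converts the linear combination into a probability.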
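
The sketch below shows the logit and its inverse, and how a logistic regression model would turn a linear combination of covariates into a probability. The helper names are illustrative and the coefficients in the example are made up.

```python
import math


def logit(p):
    """Map a probability (0 < p < 1) to the log-odds scale."""
    return math.log(p / (1 - p))


def inverse_logit(eta):
    """Map a value on the log-odds scale back to a probability."""
    return 1 / (1 + math.exp(-eta))


def predicted_probability(intercept, coefficients, covariates):
    """Probability implied by a linear predictor under a logit link."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return inverse_logit(eta)


# A linear predictor of zero corresponds to a probability of one half.
p_half = predicted_probability(0.0, [1.0], [0.0])
```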

Bonferroni testing threshold

A correction for multiple testing that is typically applied by dividing the significance threshold by the number of independent tests that are carried out.
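
A minimal sketch of this correction follows; the function name is hypothetical. With a significance level of 0.05 and roughly one million independent tests, the division gives the conventional genome-wide significance threshold of 5 × 10⁻⁸.

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_independent_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_independent_tests


genome_wide = bonferroni_threshold(0.05, 1_000_000)
```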

Winner’s curse

The phenomenon that the effect sizes of newly discovered alleles tend to be overestimated.

Odds ratio

An effect size estimate of a risk factor that quantifies the increased odds of having the disease per risk allele count in genome-wide association studies (GWAS) or one standard deviation increase of the polygenic risk score (PRS).
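
Two simple routes to an odds ratio are sketched below: directly from a 2 × 2 table of allele counts in cases and controls, and by exponentiating a logistic regression coefficient (a log odds ratio). The function names and counts are illustrative assumptions.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio from allele counts in cases and controls."""
    return (case_risk * control_other) / (case_other * control_risk)


def odds_ratio_from_log_odds(beta):
    """Odds ratio from a logistic regression coefficient."""
    return math.exp(beta)


# Made-up counts: 60 risk and 40 other alleles in cases,
# 50 risk and 50 other alleles in controls.
or_counts = allelic_odds_ratio(60, 40, 50, 50)
```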

Summary statistics

The primary outcome of genome-wide association studies (GWAS), including a list of all tested single-nucleotide polymorphisms (SNPs) and effect sizes. The minimum required information is SNP IDs, SNP locations and genomic build, alleles, strand, effect size and standard error, P value, test statistic, minor allele frequency and sample size.

Rare variant burden testing

A statistical technique in which the number of rare alleles per gene is used to determine genetic association with a trait.

Transmission disequilibrium test

A family-based genetic association test in which alleles transmitted to affected offspring are contrasted with alleles not transmitted.
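
In its basic form, the test statistic contrasts how often heterozygous parents transmit an allele to affected offspring with how often they do not, and is compared with a chi-square distribution with 1 degree of freedom. The sketch below assumes those two counts have already been tallied; the function name and numbers are hypothetical.

```python
def tdt_statistic(transmitted, untransmitted):
    """Transmission disequilibrium test statistic (McNemar-type chi-square).

    transmitted / untransmitted: number of times heterozygous parents did or
    did not transmit the allele of interest to an affected offspring.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


chi2 = tdt_statistic(60, 40)
```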

Gene flow

The transfer of genetic material between populations.

Genetic bottlenecks

Reductions in effective population size, for example, due to a migration followed by geographical isolation, or due to cultural endogamy, which leads to a reduction in diversity.

Conditional association analysis

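A genetic association analysis that includes the genotypes of one or more other genetic variants as fixed-effect covariates, typically to test whether an association signal is independent of those variants.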

Prior probability distribution

A term used in Bayesian statistics to describe the probability distribution of an unknown quantity based on beliefs an investigator has about the model parameters.

Posterior probability distribution

A term used in Bayesian statistics to describe the probability distribution of an unknown quantity based on observed data.

Expression quantitative trait loci

(eQTLs). Dosage effects of genetic variants on gene expression profiles, including expression levels and mRNA splicing patterns.

Pseudo-R²

A statistical measure that indicates how well a model fits the data for binary traits and that can be used to compare models.

Liability scale

The assumed underlying normal distribution of dichotomous traits.

Net reclassification index

A metric that measures how much a new model improves in terms of reclassification. It is calculated as the proportion of individuals who are correctly reclassified minus the proportion of individuals who are incorrectly reclassified.
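
One common way to compute this index sums the net proportion of correct reclassifications among individuals with the event (moved to a higher risk category) and among those without it (moved to a lower risk category). The sketch below follows that convention; the function name, argument names and counts are illustrative assumptions.

```python
def net_reclassification_index(events_up, events_down, n_events,
                               nonevents_up, nonevents_down, n_nonevents):
    """Net reclassification index when moving from an old to a new model."""
    if n_events <= 0 or n_nonevents <= 0:
        raise ValueError("need at least one event and one non-event")
    # Among events, moving up is correct and moving down is incorrect.
    nri_events = (events_up - events_down) / n_events
    # Among non-events, moving down is correct and moving up is incorrect.
    nri_nonevents = (nonevents_down - nonevents_up) / n_nonevents
    return nri_events + nri_nonevents


# Made-up example: 100 events (20 moved up, 5 down) and
# 200 non-events (10 moved up, 30 down).
nri = net_reclassification_index(20, 5, 100, 10, 30, 200)
```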

Identity by descent

The property of two identical segments of DNA having been inherited from a common ancestor without recombination.

Cite this article

Uffelmann, E., Huang, Q.Q., Munung, N.S. et al. Genome-wide association studies. Nat Rev Methods Primers 1, 59 (2021). https://doi.org/10.1038/s43586-021-00056-9

  • DOI: https://doi.org/10.1038/s43586-021-00056-9
