  • Review Article

Genome-wide association testing beyond SNPs

Abstract

Decades of genetic association testing in human cohorts have provided important insights into the genetic architecture and biological underpinnings of complex traits and diseases. However, for certain traits, genome-wide association studies (GWAS) for common SNPs are approaching signal saturation, which underscores the need to explore other types of genetic variation to understand the genetic basis of traits and diseases. Copy number variation (CNV) is an important source of heritability that is well known to functionally affect human traits. Recent technological and computational advances enable the large-scale, genome-wide evaluation of CNVs, with implications for downstream applications such as polygenic risk scoring and drug target identification. Here, we review the current state of CNV-GWAS, discuss current limitations in resource infrastructure that need to be overcome to enable the wider uptake of CNV-GWAS results, highlight emerging opportunities and suggest guidelines and standards for future GWAS for genetic variation beyond SNPs at scale.

Fig. 1: The cumulative total of association studies added to the GWAS Catalog between 2021 and 2023 for SNP- and CNV-based tests.
Fig. 2: Biological and clinical impact of CNV.
Fig. 3: Data flow depicting the key steps from cohort genetic data to CNV-GWAS results.
Fig. 4: Differences between array-based and sequence-based genome resolution for CNV association testing.
Fig. 5: Emerging opportunities.

References

  1. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).

    Article  Google Scholar 

  2. Barrett, J. C. & Cardon, L. R. Evaluating coverage of genome-wide association studies. Nat. Genet. 38, 659–662 (2006).

    Article  CAS  PubMed  Google Scholar 

  3. LaFramboise, T. Single nucleotide polymorphism arrays: a decade of biological, computational and technological advances. Nucleic Acids Res. 37, 4181–4193 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Hofker, M. H., Fu, J. & Wijmenga, C. The genome revolution and its role in understanding complex diseases. Biochim. Biophys. Acta 1842, 1889–1895 (2014).

    Article  CAS  PubMed  Google Scholar 

  5. Sollis, E. et al. The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).

    Article  CAS  PubMed  Google Scholar 

  6. Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  7. Wand, H. et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 591, 211–219 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Ochoa, D. et al. The next-generation open targets platform: reimagined, redesigned, rebuilt. Nucleic Acids Res. 51, D1353–D1359 (2023).

    Article  PubMed  Google Scholar 

  9. Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700 000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Zhu, H. & Zhou, X. Statistical methods for SNP heritability estimation and partition: a review. Comput. Struct. Biotechnol. J. 18, 1557–1568 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015).

    Article  Google Scholar 

  13. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Yang, L. A practical guide for structural variation detection in the human genome. Curr. Protoc. Hum. Genet. 107, e103 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Taghizadeh, S. et al. Genome-wide identification of copy number variation and association with fat deposition in thin and fat-tailed sheep breeds. Sci. Rep. 12, 8834 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Delledonne, A. et al. Copy number variant scan in more than four thousand Holstein cows bred in Lombardy, Italy. PLoS ONE 19, e0303044 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Wellcome Trust Case Control Consortium. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464, 713–720 (2010).

    Article  Google Scholar 

  18. Verlouw, J. A. M. et al. A comparison of genotyping arrays. Eur. J. Hum. Genet. 29, 1611–1624 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Rapti, M. et al. CoverageMaster: comprehensive CNV detection and visualization from NGS short reads for genetic medicine applications. Brief. Bioinform. 23, bbac049 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  20. Tanjo, T., Kawai, Y., Tokunaga, K., Ogasawara, O. & Nagasaki, M. Practical guide for managing large-scale human genome data in research. J. Hum. Genet. 66, 39–52 (2021).

    Article  PubMed  Google Scholar 

  21. Vacic, V. et al. Duplications of the neuropeptide receptor gene VIPR2 confer significant risk for schizophrenia. Nature 471, 499–503 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Fitzgerald, T. & Birney, E. CNest: a novel copy number association discovery method uncovers 862 new associations from 200,629 whole-exome sequence datasets in the UK Biobank. Cell Genom. 2, 100167 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Montavon, T., Thevenet, L. & Duboule, D. Impact of copy number variations (CNVs) on long-range gene regulation at the HoxD locus. Proc. Natl Acad. Sci. USA 109, 20204–20211 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Conrad, D. F. & Hurles, M. E. The population genetics of structural variation. Nat. Genet. 39, S30–S36 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).

    Article  CAS  PubMed  Google Scholar 

  26. Lee, C. & Scherer, S. W. The clinical context of copy number variation in the human genome. Expert Rev. Mol. Med. 12, e8 (2010).

    Article  PubMed  Google Scholar 

  27. Lupski, J. R. Genomic rearrangements and sporadic disease. Nat. Genet. 39, S43–S47 (2007).

    Article  CAS  PubMed  Google Scholar 

  28. Campbell, C. D. & Eichler, E. E. Properties and rates of germline mutations in humans. Trends Genet. 29, 575–584 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Belyeu, J. R. et al. De novo structural mutation rates and gamete-of-origin biases revealed through genome sequencing of 2,396 families. Am. J. Hum. Genet. 108, 597–607 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Gudmundsson, S. et al. Variant interpretation using population databases: lessons from gnomAD. Hum. Mutat. 43, 1012–1030 (2022).

    Article  PubMed  Google Scholar 

  31. Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).

    Article  CAS  PubMed  Google Scholar 

  32. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Zhang, F., Gu, W., Hurles, M. E. & Lupski, J. R. Copy number variation in human health, disease, and evolution. Annu. Rev. Genomics Hum. Genet. 10, 451–481 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Wright, C. F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  36. Coutelier, M. et al. Combining callers improves the detection of copy number variants from whole-genome sequencing. Eur. J. Hum. Genet. 30, 178–186 (2022).

    Article  CAS  PubMed  Google Scholar 

  37. Hollox, E. J., Zuccherato, L. W. & Tucci, S. Genome structural variation in human evolution. Trends Genet. 38, 45–58 (2022).

    Article  CAS  PubMed  Google Scholar 

  38. Rossi, N. et al. Ethnic-specific association of amylase gene copy number with adiposity traits in a large Middle Eastern biobank. NPJ Genom. Med. 6, 8 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  39. Perry, G. H. et al. Diet and the evolution of human amylase gene copy number variation. Nat. Genet. 39, 1256–1260 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Higuchi, R., Iwane, T., Iida, A. & Nakajima, K. Copy number variation of the salivary amylase gene and glucose metabolism in healthy young Japanese women. J. Clin. Med. Res. 12, 184–189 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  41. Rouleau, M. et al. Extensive metabolic consequences of human glycosyltransferase gene knockouts in prostate cancer. Br. J. Cancer 128, 285–296 (2023).

    Article  CAS  PubMed  Google Scholar 

  42. Mafune, A. et al. Homozygous deletions of UGT2B17 modifies effects of smoking on TP53-mutations and relapse of head and neck carcinoma. BMC Cancer 15, 205 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  43. Collins, R. L. et al. A cross-disorder dosage sensitivity map of the human genome. Cell 185, 3041–3055.e25 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Barra, V. & Fachinetti, D. The dark side of centromeres: types, causes and consequences of structural abnormalities implicating centromeric DNA. Nat. Commun. 9, 4340 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Cook, C. B. et al. Somatic mosaicism detected by genome-wide sequencing in 500 parent–child trios with suspected genetic disease: clinical and genetic counseling implications. Cold Spring Harb. Mol. Case Stud. 7, a006125 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  46. Elrick, H. et al. SAVANA: reliable analysis of somatic structural variants and copy number aberrations in clinical samples using long-read sequencing. Preprint at bioRxiv https://doi.org/10.1101/2024.07.25.604944 (2024) .

  47. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Thaxton, C. et al. Utilizing ClinGen gene-disease validity and dosage sensitivity curations to inform variant classification. Hum. Mutat. 43, 1031–1040 (2022).

    Article  CAS  PubMed  Google Scholar 

  49. Huang, N., Lee, I., Marcotte, E. M. & Hurles, M. E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  50. Rice, A. M. & McLysaght, A. Dosage-sensitive genes in evolution and disease. BMC Biol. 15, 78 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  51. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

    Article  Google Scholar 

  52. Auwerx, C. et al. Rare copy-number variants as modulators of common disease susceptibility. Genome Med. 16, 5 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Kirschner, R. et al. RPGR transcription studies in mouse and human tissues reveal a retina-specific isoform that is disrupted in a patient with X-linked retinitis pigmentosa. Hum. Mol. Genet. 8, 1571–1578 (1999).

    Article  CAS  PubMed  Google Scholar 

  54. Shaikh, T. H. Copy number variation disorders. Curr. Genet. Med. Rep. 5, 183–190 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  55. Xu, H. H. et al. Familial 5.29 Mb deletion in chromosome Xq22.1-q22.3 with a normal phenotype: a rare pedigree and literature review. BMC Med. Genomics 16, 111 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Naseer, M. I. et al. Copy number variations in Saudi family with intellectual disability and epilepsy. BMC Genomics 17, 757 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  57. Wolstencroft, J. et al. Neuropsychiatric risk in children with intellectual disability of genetic origin: IMAGINE, a UK national cohort study. Lancet Psychiatry 9, 715–724 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  58. Zarrei, M. et al. Gene copy number variation and pediatric mental health/neurodevelopment in a general population. Hum. Mol. Genet. 32, 2411–2421 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Auwerx, C. et al. The individual and global impact of copy-number variants on complex human traits. Am. J. Hum. Genet 109, 647–668 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Ceyhan-Birsoy, O. et al. Next generation sequencing-based copy number analysis reveals low prevalence of deletions and duplications in 46 genes associated with genetic cardiomyopathies. Mol. Genet. Genom. Med. 4, 143–151 (2016).

    Article  CAS  Google Scholar 

  61. Singer, E. S. et al. Characterization of clinically relevant copy-number variants from exomes of patients with inherited heart disease and unexplained sudden cardiac death. Genet. Med. 23, 86–93 (2021).

    Article  CAS  PubMed  Google Scholar 

  62. Nfonsam, L. et al. ALU transposition induces familial hypertrophic cardiomyopathy. Mol. Genet. Genom. Med. 8, e951 (2020).

    Article  Google Scholar 

  63. Wilfert, A. B., Sulovari, A., Turner, T. N., Coe, B. P. & Eichler, E. E. Recurrent de novo mutations in neurodevelopmental disorders: properties and clinical implications. Genome Med 9, 101 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  64. Malhotra, D. & Sebat, J. CNVs: harbingers of a rare variant revolution in psychiatric genetics. Cell 148, 1223–1241 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Marshall, C. R. et al. Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects. Nat. Genet. 49, 27–35 (2017).

    Article  CAS  PubMed  Google Scholar 

  66. Davies, R. W. et al. Using common genetic variation to examine phenotypic expression and risk prediction in 22q11.2 deletion syndrome. Nat. Med. 26, 1912–1918 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Maury, E. A. et al. Schizophrenia-associated somatic copy-number variants from 12,834 cases reveal recurrent NRXN1 and ABCB11 disruptions. Cell Genom. 3, 100356 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Trost, B. et al. Genomic architecture of autism from comprehensive whole-genome sequence annotation. Cell 185, 4409–4427.e18 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Riggs, E. R. et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genet. Med. 22, 245–257 (2020).

    Article  PubMed  Google Scholar 

  70. Hippman, C. & Nislow, C. Pharmacogenomic testing: clinical evidence and implementation challenges. J. Pers. Med. 9, 10 (2019).

    Article  Google Scholar 

  71. Crews, K. R. et al. Clinical pharmacogenetics implementation consortium guideline for CYP2D6, OPRM1, and COMT genotypes and select opioid therapy. Clin. Pharmacol. Ther. 110, 888–896 (2021).

    Article  CAS  PubMed  Google Scholar 

  72. Twesigomwe, D. et al. Characterization of CYP2D6 pharmacogenetic variation in sub-Saharan African populations. Clin. Pharmacol. Ther. 113, 643–659 (2023).

    Article  CAS  PubMed  Google Scholar 

  73. Twist, G. P. et al. Constellation: a tool for rapid, automated phenotype assignment of a highly polymorphic pharmacogene, CYP2D6, from whole-genome sequences. NPJ Genom. Med. 1, 15007 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Lee, S. B. et al. Stargazer: a software tool for calling star alleles from next-generation sequencing data using CYP2D6 as a model. Genet. Med. 21, 361–372 (2019).

    Article  CAS  PubMed  Google Scholar 

  75. Chen, X. et al. Cyrius: accurate CYP2D6 genotyping using whole-genome sequencing data. Pharmacogenomics J. 21, 251–261 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  76. Twesigomwe, D. et al. StellarPGx: a nextflow pipeline for calling star alleles in cytochrome P450 genes. Clin. Pharmacol. Ther. 110, 741–749 (2021).

    Article  CAS  PubMed  Google Scholar 

  77. Cavallari, L. H. & Johnson, J. A. A case for genotype-guided pain management. Pharmacogenomics 20, 705–708 (2019).

    Article  CAS  PubMed  Google Scholar 

  78. Tayeh, M. K. et al. Clinical pharmacogenomic testing and reporting: a technical standard of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 24, 759–768 (2022).

    Article  CAS  PubMed  Google Scholar 

  79. Singh, A. K. et al. Detecting copy number variation in next generation sequencing data from diagnostic gene panels. BMC Med. Genomics 14, 214 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Behera, S. et al. Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms. Preprint at bioRxiv https://doi.org/10.1101/2024.01.02.573821 (2024).

  82. Hujoel, M. L. A. et al. Influences of rare copy-number variation on human complex traits. Cell 185, 4233–4248.e27 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Gabrielaite, M. et al. A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. Cancers 13, 6283 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Prim. 1, 60 (2021).

    Article  Google Scholar 

  85. Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Gross, A. M. et al. Copy-number variants in clinical genome sequencing: deployment and interpretation for rare and undiagnosed disease. Genet. Med. 21, 1121–1130 (2019).

    Article  CAS  PubMed  Google Scholar 

  87. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet 53, 1097–1103 (2021).

    Article  CAS  PubMed  Google Scholar 

  88. Romdhane, L. et al. Ethnic and functional differentiation of copy number polymorphisms in Tunisian and HapMap population unveils insights on genome organizational plasticity. Sci. Rep. 14, 4654 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Fadista, J., Manning, A. K., Florez, J. C. & Groop, L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur. J. Hum. Genet. 24, 1202–1205 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  90. Kaler, A. S. & Purcell, L. C. Estimation of a significance threshold for genome-wide association studies. BMC Genomics 20, 618 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Null, M. et al. Genome-wide analysis of copy number variants and normal facial variation in a large cohort of Bantu Africans. HGG Adv. 3, 100082 (2022).

    CAS  PubMed  Google Scholar 

  92. Hujoel, M. L. A. et al. Hidden protein-altering variants influence diverse human phenotypes. Preprint at bioRxiv https://doi.org/10.1101/2023.06.07.544066 (2023).

  93. Li, S., Carss, K. J., Halldorsson, B. V. & Cortes, A. UK biobank whole-genome sequencing consortium. whole-genome sequencing of half-a-million UK biobank participants. Preprint at bioRxiv https://doi.org/10.1101/2023.12.06.23299426 (2023).

  94. Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Beyter, D. et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat. Genet. 53, 779–786 (2021).

    Article  CAS  PubMed  Google Scholar 

  96. Eggertsson, H. P. et al. GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs. Nat. Commun. 10, 5402 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  97. Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Wang, Q. et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 597, 527–532 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Li, Y. R. et al. Rare copy number variants in over 100,000 European ancestry subjects reveal multiple disease associations. Nat. Commun. 11, 255 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Aguirre, M., Rivas, M. A. & Priest, J. Phenome-wide burden of copy-number variation in the UK biobank. Am. J. Hum. Genet. 105, 373–383 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Babadi, M. et al. GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nat. Genet. 55, 1589–1597 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  103. Zhan, X., Girirajan, S., Zhao, N., Wu, M. C. & Ghosh, D. A novel copy number variants kernel association test with application to autism spectrum disorders studies. Bioinformatics 32, 3603–3610 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Dougherty, M. L. et al. Transcriptional fates of human-specific segmental duplications in brain. Genome Res. 28, 1566–1576 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. Egorova, T. V. et al. In-frame deletion of dystrophin exons 8–50 results in DMD phenotype. Int. J. Mol. Sci. 24, 9117 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Schmitz, D. et al. Copy number variations and their effect on the plasma proteome. Genetics 225, iyad179 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  107. de Los Campos, G., Grueneberg, A., Funkhouser, S., Pérez-Rodríguez, P. & Samaddar, A. Fine mapping and accurate prediction of complex traits using Bayesian Variable Selection models applied to biobank-size data. Eur. J. Hum. Genet. 31, 313–320 (2023).

    Article  Google Scholar 

  108. Broekema, R. V., Bakker, O. B. & Jonkers, I. H. A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol. 10, 190221 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Zhang, C., Cerveira, E., Rens, W. & Lee, C. Multicolor fluorescence in situ hybridization (FISH) approaches for simultaneous analysis of the entire human genome. Curr. Protoc. Hum. Genet. 99, e70 (2018).

    Article  PubMed  Google Scholar 

  110. Gribble, S. M., Ng, B. L., Prigmore, E., Fitzgerald, T. & Carter, N. P. Array painting: a protocol for the rapid analysis of aberrant chromosomes using DNA microarrays. Nat. Protoc. 4, 1722–1736 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Mantere, T. et al. Optical genome mapping enables constitutional chromosomal aberration detection. Am. J. Hum. Genet. 108, 1409–1422 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Schrauwen, I. et al. Optical genome mapping unveils hidden structural variants in neurodevelopmental disorders. Sci. Rep. 14, 11239 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  113. Louzada, S. & Yang, F. in Cancer Cytogenetics and Cytogenomics (eds. Ye, J. C. & Heng, H. H.) 185–203. Methods in Molecular Biology series vol. 2825 (Springer, 2024).

  114. Choi, J. et al. A whole-genome reference panel of 14,393 individuals for East Asian populations accelerates discovery of rare functional variants. Sci. Adv. 9, eadg6319 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Lepamets, M. et al. Omics-informed CNV calls reduce false-positive rates and improve power for CNV-trait associations. HGG Adv. 3, 100133 (2022).

    CAS  PubMed  PubMed Central  Google Scholar 

  116. Hujoel, M. L. A. et al. Protein-altering variants at copy number-variable regions influence diverse human phenotypes. Nat. Genet. 56, 569–578 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Gordeeva, V. et al. Benchmarking germline CNV calling tools from exome sequencing data. Sci. Rep. 11, 14416 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Zhou, Z., Wang, W., Wang, L. S. & Zhang, N. R. Integrative DNA copy number detection and genotyping from sequencing and array-based platforms. Bioinformatics 34, 2349–2355 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  119. Montanucci, L. et al. Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals. Nat. Commun. 14, 4392 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Owen, D. et al. Effects of pathogenic CNVs on physical traits in participants of the UK Biobank. BMC Genomics 19, 867 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  121. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Fawcett, K. A. et al. Exome-wide analysis of copy number variation shows association of the human leukocyte antigen region with asthma in UK Biobank. BMC Med. Genomics 15, 119 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Liu, J. et al. The coexistence of copy number variations (CNVs) and single nucleotide polymorphisms (SNPs) at a locus can result in distorted calculations of the significance in associating SNPs to disease. Hum. Genet. 137, 553–567 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Wineinger, N. E., Pajewski, N. M. & Tiwari, H. K. A method to assess linkage disequilibrium between CNVs and SNPs inside copy number variable regions. Front. Genet. 2, 17 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  125. Estivill, X. & Armengol, L. Copy number variants and common disorders: filling the gaps and exploring complexity in genome-wide association studies. PLoS Genet. 3, 1787–1799 (2007).

    Article  CAS  PubMed  Google Scholar 

  126. Morales, J. et al. A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog. Genome Biol. 19, 21 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  127. Hayhurst, J. et al. A community driven GWAS summary statistics standard. Preprint at bioRxiv https://doi.org/10.1101/2022.07.15.500230 (2022).

  128. Magno, R. & Maia, A. T. gwasrapidd: an R package to query, download and wrangle GWAS catalog data. Bioinformatics 36, 649–650 (2020).

    Article  CAS  PubMed  Google Scholar 

  129. Cao, T., Li, A. & Huang, Y. pandasGWAS: a Python package for easy retrieval of GWAS catalog data. BMC Genomics 24, 238 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Elsworth, B. et al. The MRC IEU OpenGWAS data infrastructure. Preprint at bioRxiv https://doi.org/10.1101/2020.08.10.244293 (2020).

  131. Costanzo, M. C. et al. Cardiovascular disease knowledge portal: a community resource for cardiovascular disease research. Circ. Genom. Precis. Med. 16, e004181 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  132. Lambert, S. A. et al. The polygenic score catalog: new functionality and tools to enable FAIR research. Preprint at medRxiv https://doi.org/10.1101/2024.05.29.24307783 (2024).

  133. Chen, Y. et al. Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak. Nat. Commun. 14, 283 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  134. Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02024-y (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  135. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  136. Dierckxsens, N., Li, T., Vermeesch, J. R. & Xie, Z. A benchmark of structural variation detection by long reads through a realistic simulated model. Genome Biol. 22, 342 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  137. Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  138. Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  139. De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  140. Gustafson, J. A. et al. Nanopore sequencing of 1000 genomes project samples to build a comprehensive catalog of human genetic variation. Preprint at medRxiv https://doi.org/10.1101/2024.03.05.24303792 (2024).

  141. Schloissnig, S. et al. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 genomes project. Preprint at bioRxiv https://doi.org/10.1101/2024.04.18.590093 (2024).

  142. Groza, C. et al. Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nat. Commun. 15, 657 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  143. Ebler, J. et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat. Genet. 54, 518–525 (2022).

  144. Noyvert, B. et al. Imputation of structural variants using a multi-ancestry long-read sequencing panel enables identification of disease associations. Preprint at medRxiv https://doi.org/10.1101/2023.12.20.23300308 (2023).

  145. Lambert, S. A. et al. The polygenic score catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).

  146. Xiang, R. et al. Recent advances in polygenic scores: translation, equitability, methods and FAIR tools. Genome Med. 16, 33 (2024).

  147. Hao, L. et al. Development of a clinical polygenic risk score assay and reporting workflow. Nat. Med. 28, 1006–1013 (2022).

  148. Lennon, N. J. et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat. Med. 30, 480–487 (2024).

  149. Bergen, S. E. et al. Joint contributions of rare copy number variants and common SNPs to risk for schizophrenia. Am. J. Psychiatry 176, 29–35 (2019).

  150. Taniguchi, S. et al. Polygenic risk scores in schizophrenia with clinically significant copy number variants. Psychiatry Clin. Neurosci. 74, 35–39 (2020).

  151. Mollon, J. et al. Impact of copy number variants and polygenic risk scores on psychopathology in the UK Biobank. Biol. Psychiatry 94, 591–600 (2023).

  152. Alexander-Bloch, A. et al. Copy number variant risk scores associated with cognition, psychopathology, and brain structure in youths in the Philadelphia Neurodevelopmental Cohort. JAMA Psychiatry 79, 699–709 (2022).

  153. Saarentaus, E. C. et al. Polygenic burden has broader impact on health, cognition, and socioeconomic outcomes than most rare and high-risk copy number variants. Mol. Psychiatry 26, 4884–4895 (2021).

  154. Kachuri, L. et al. Principles and methods for transferring polygenic risk scores across global populations. Nat. Rev. Genet. 25, 8–25 (2024).

  155. Hu, S. et al. Leveraging fine-scale population structure reveals conservation in genetic effect sizes between human populations across a range of human phenotypes. Preprint at bioRxiv https://doi.org/10.1101/2023.08.08.552281 (2023).

  156. Hou, K. et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 55, 549–558 (2023).

  157. Heyne, H. O. et al. Mono- and biallelic variant effects on disease at biobank scale. Nature 613, 519–525 (2023).

  158. Song, P. et al. Data resource profile: understanding the patterns and determinants of health in South Asians-the South Asia biobank. Int. J. Epidemiol. 50, 717–718e (2021).

  159. Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).

  160. Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).

  161. Delaneau, O., Zagury, J. F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).

  162. Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).

  163. Browning, B. L. & Browning, S. R. Statistical phasing of 150,119 sequenced genomes in the UK Biobank. Am. J. Hum. Genet. 110, 161–165 (2023).

  164. Lassen, F. H. et al. Exome-wide evidence of compound heterozygous effects across common phenotypes in the UK Biobank. Preprint at medRxiv https://doi.org/10.1101/2023.06.29.23291992 (2023).

  165. Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533 (2021).

  166. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  167. Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).

  168. Namba, S. et al. A practical guideline of genomics-driven drug discovery in the era of global biobank meta-analysis. Cell Genom. 2, 100190 (2022).

  169. Klein, R. J. et al. Complement factor H polymorphism in age-related macular degeneration. Science 308, 385–389 (2005).

  170. Arruda, A. L., Morris, A. P. & Zeggini, E. Advancing equity in human genomics through tissue-specific multi-ancestry molecular data. Cell Genom. 4, 100485 (2024).

Acknowledgements

The authors thank all members of the CNV-GWAS Consortium for contributions and discussions and the EMBL-EBI for funding of the authors’ work.

Author information

Contributions

L.H., E.M.M., X.Z., K.F., P.I.S., F.M., M.I. and T.F. wrote the article. All authors researched the literature, provided substantial contributions to discussions of the content, and reviewed and/or edited the manuscript.

Corresponding author

Correspondence to Tomas Fitzgerald.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Genetics thanks Bjarni V. Halldórsson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

All of Us: https://allofus.nih.gov/

ClinGen: https://clinicalgenome.org/

FinnGen: https://www.finngen.fi/en

Genomics England: https://www.genomicsengland.co.uk/

gnomAD: https://gnomad.broadinstitute.org/

GWAS Catalog: https://www.ebi.ac.uk/gwas/

Linkage disequilibrium: https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/gene-linkage-disequilibrium

Open Targets Platform: https://platform.opentargets.org/

PGS Catalog: https://www.pgscatalog.org/

PharmGKB: https://www.pharmgkb.org/gene/PA128/labelAnnotation

PRECISE: https://npm.sg/

UK Biobank: https://www.ukbiobank.ac.uk/

Glossary

Cryptic relatedness

Confounding relatedness within a population, which can occur when individuals in the study cohort are more closely related to one another than assumed by the investigators.

Diplotype

The combination of two alleles for a gene, one inherited from each of the individual’s parents.

Dosage sensitivity

The property whereby a change in the number of copies of a gene or genetic element leads to a change in phenotype.

Epistasis

A phenomenon whereby nonlinear interactions between different genes or variants affect the same trait.

FAIR recommendations

A set of guidelines to increase the value of data by making it findable, accessible, interoperable and reusable.

Imputation

A method for inferring genotypes (or genetic variants) at locations that were not included in the assay.

Lead associations

The genetic variant with the strongest association signal (lowest P value) in an association test. This variant may or may not be causal.

LOEUF score

A continuous metric designed to assess how intolerant a gene is to loss-of-function variation.

Mendelian randomization

An epidemiological method that uses genetic variants as instruments to study the causal effect of a risk factor on health, social or economic outcomes.

Optical mapping

An imaging method that analyses fluorescently labelled DNA molecules to provide a high-resolution map of structural variation.

Polygenic score

(PGS). A weighted estimate of how genetic variants affect a phenotype, often used to estimate a person’s risk of developing a disease or complex trait.

Single-nucleotide polymorphism

(SNP). A germline genetic variant that is present in more than 1% of the population, in which a single nucleotide base differs from the reference genome.
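
As a rough illustration of the "Polygenic score" glossary entry, a score can be sketched as a weighted sum of an individual's allele or copy-number dosages. The function name, variant identifiers and weights below are hypothetical and chosen only for illustration; they are not taken from the article or from any published score.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Return the weighted sum of dosages over the variants in `weights`.

    Variants that have a weight but no observed dosage are skipped
    (an assumption of this sketch; real pipelines may impute instead).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical per-variant effect weights (for example, from a GWAS).
weights = {"variant_a": 0.5, "variant_b": -0.25, "cnv_del_1": 1.0}

# Hypothetical dosages for one individual (0, 1 or 2 copies of the effect allele).
dosages = {"variant_a": 2, "variant_b": 1, "cnv_del_1": 0}

score = polygenic_score(dosages, weights)
print(score)
```

The same weighted-sum form applies whether the variants are SNPs or CNVs; only the definition of dosage changes.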

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Harris, L., McDonagh, E.M., Zhang, X. et al. Genome-wide association testing beyond SNPs. Nat Rev Genet 26, 156–170 (2025). https://doi.org/10.1038/s41576-024-00778-y

  • DOI: https://doi.org/10.1038/s41576-024-00778-y
