Accurate human genome analysis with element avidity sequencing

Carroll, Andrew; Kolesnikov, Alexey; Cook, Daniel E.; Brambrink, Lucas; Wiseman, Kelly N.; Billings, Sophie M.; Kruglyak, Semyon; Lajoie, Bryan R.; Zhao, Junhua; Levy, Shawn E.; McLean, Cory Y.; Shafin, Kishwar; Nattestad, Maria; Chang, Pi-Chuan

doi:10.1186/s12859-025-06191-4

Research
Open access
Published: 25 July 2025

BMC Bioinformatics volume 26, Article number: 194 (2025)

Abstract

Background

New sequencing technologies provide options for the scientific community to design studies and build clinical workflows. These options expand user choice and can enable more accurate, scalable, or affordable workflows, depending on the fit between scientist needs and platform capability. However, it is essential to understand the performance of these new technologies for different tasks, especially for capabilities that were not possible or tractable in prior technologies. We investigate avidity sequencing, a new technology from Element Biosciences, to help the scientific community understand the performance of the options available for generating sequencing data.

Results

We show that Element whole genome sequencing achieves higher mapping and variant calling accuracy compared to Illumina sequencing at the same coverage, with larger differences at lower coverages (20–30x). We quantify base error rates of Element reads, finding lower error rates, especially in homopolymer and tandem repeat regions. We use Element’s ability to generate paired-end sequencing with longer insert sizes than typical short-read sequencing. We show that longer insert sizes result in even higher accuracy, with long insert Element sequencing giving more accurate genome analyses at all coverages.

Conclusions

New options for sequencing technologies can analyze genomes comparably or better than prior standard methods.

Introduction

Sequencing the genomes and transcriptomes of organisms enables diagnosis of genetic diseases [1,2,3], discovery of gene-trait associations [4] for drug discovery [5] and agriculture [6], creation of reference genomes [7], resources for genetic variant annotation [8], and imputation methods [9].

Initially, efforts to assess sequencing accuracy used indirect factors, such as the ratio of transitions to transversions in variant calls or concordance with Mendelian inheritance [10]. The ability to assess accuracy was transformed by the Genome in a Bottle standards, a set of 7 human cell lines whose genomes were extensively characterized with multiple technologies, analysis methods, and manual curation [11,12,13,14,15]. This resource, combined with community competitions [16] and comparisons [17], expanded the ability to detect accuracy improvements beyond the accuracy of current individual methods, which allowed a burst of innovation in both sequencing [18] and analysis [19,20,21] methods to demonstrate the validity of their innovations. This innovation has in turn been shown to increase diagnostic rates and to identify previously missed disease-causing variants [3].

A new sequencing method developed by Element Biosciences, based on sequencing by avidity rather than sequencing by synthesis, can generate short-read sequencing at high yield, with more than six 30x genomes in a sequencing run, along with per-base accuracies that reportedly exceed Illumina sequencing [22]. Because the reported metrics focus on the accuracy of individual reads, this work assesses how the read-level accuracies of Element correspond to the accuracy of full genome sequencing, including both mapping and variant calling of the Genome in a Bottle samples.

We observe that Element sequencing enables higher accuracy across a range of coverages from 20–50x. The increase in accuracy was most notable at lower coverages (20–30x). We identified genome contexts where Element had improved accuracy, specifically tandem repeats and homopolymers, with a reduction in read soft-clipping due to loss of quality later in the read at these contexts.

One new property of Element’s AVITI platform is the ability to generate paired-end sequencing data with longer insert sizes (the distance between the paired reads) than is typical with Illumina preparations. By investigating Element sequencing runs performed with libraries that had longer insert size distribution (with a template length of > 1000 base pairs as opposed to 350–500 base pairs), we identified a strong positive effect with longer insert sizes for Element sequencing. The long insert Element sequencing outperformed both Illumina and standard insert Element sequencing at each coverage threshold.

Results

Comparing variant calling accuracy

We compared genome analysis accuracy between Illumina and Element sequencing in typical use cases. Individual high-coverage sequencing runs from Illumina [23] and Element were downsampled to an equal number of starting reads for 20x, 30x, 40x, and 50x coverage. These reads were mapped with BWA MEM [24] to the GRCh38 reference [25]. Sequencing runs from HG001, HG002, HG003, and HG005 were analyzed from both technologies. Variants were called with DeepVariant v1.5 [26], whose single release model has been jointly trained with both Illumina and Element data. All comparisons between technologies use this same DeepVariant model. HG003 is withheld from training all DeepVariant models, and that sample is used for whole genome holdout test datasets. Chromosome 20 is withheld from training in all samples and is used for comparison on other samples.
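The arithmetic behind downsampling to a target coverage can be sketched as follows. This is a minimal illustration rather than the command used in the study: the function name, the genome size of 3.1 Gb, and the example read count are all assumptions chosen for the sketch.

```python
def downsample_fraction(total_reads, read_length, target_coverage,
                        genome_size=3_100_000_000):
    """Fraction of reads to keep so that mean coverage is about target_coverage.

    genome_size is an assumed approximate human genome length in bases.
    """
    full_coverage = total_reads * read_length / genome_size
    if target_coverage > full_coverage:
        raise ValueError("run does not have enough reads for the target coverage")
    return target_coverage / full_coverage


# Hypothetical run: 1.24 billion reads of 150 bp is about 60x mean coverage,
# so half of the reads are kept to reach about 30x.
fraction_30x = downsample_fraction(1_240_000_000, 150, 30)
```

Applying the same target to runs from both technologies gives coverage-matched inputs, which is the property the comparison relies on.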

To assess accuracy, we used hap.py [13] to compare the resulting VCF against the v4.2.1 Genome in a Bottle truth set [15] used in the PrecisionFDA v2 Truth Challenge [16]. Element sequencing had higher accuracy (both precision and recall) compared to Illumina at the 20x coverage point, but the difference narrowed at higher coverage (Fig. 1A, Supplementary Fig. 1).
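For readers less familiar with these metrics, precision and recall follow directly from the counts of true positives, false positives, and false negatives. The sketch below uses a single true-positive count for simplicity, which is an assumption; benchmarking tools may count true positives separately on the truth and query sides. The function name and the example counts are hypothetical.

```python
def accuracy_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from variant comparison counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not taken from the paper.
precision, recall, f1 = accuracy_metrics(tp=90, fp=10, fn=30)
```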

A sample with 30x average sequencing coverage will have a distribution of coverages across the genome; for example, some positions will be covered at 20x and others at 40x. To look more directly at the effect of coverage on accuracy for Illumina and Element, we downsampled at 1x intervals from 50x to 10x (a total of 40 variant calling runs per sample) across chromosome 20, which is always withheld from DeepVariant training in all samples. We collected the hap.py results for all variant call files, aggregated all calls, and stratified these calls by the sequencing depth at a given position. This allowed us to assess performance on coverage-matched positions across a large coverage range. The analysis revealed larger differences in accuracy between Element and Illumina at lower coverages (Fig. 1B). Element had higher accuracy in the 30–40x coverage range as well.

The first step of DeepVariant’s variant calling method uses a heuristic process, conceptually similar to that of Samtools/bcftools [27] or GATK [28], which uses observed allele frequencies to propose positions as candidate variants. In the second stage, a convolutional neural network either rejects these candidates as false, or determines they are true and assigns their genotype. For a false candidate to be generated, at least two reads must support the candidate, and those reads must make up at least 12% of total reads for SNPs or 6% for indels. At the candidate variant level, we noticed larger differences between Element and Illumina runs. Element runs had fewer rejected candidates (Fig. 2A). The observation of lower false candidate generation was consistent with the reported higher overall accuracy of Element reads [22]. However, the magnitude of the difference seen here would require concentration of errors in certain contexts, as a 12% support rate is much higher than overall Illumina sequencing error rates. In the case of DeepVariant, the reduction in the number of candidates results in a corresponding decrease in runtime for Element relative to Illumina, reducing the DeepVariant runtime by roughly 20%.
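The thresholds described above can be expressed as a small check. This is a simplified sketch of the stated rule only, not DeepVariant's actual candidate generation code; the function and constant names are hypothetical.

```python
# Thresholds as stated in the text: at least two supporting reads, and at
# least 12% (SNPs) or 6% (indels) of the total reads at the position.
MIN_SUPPORTING_READS = 2
MIN_SUPPORT_FRACTION = {"snp": 0.12, "indel": 0.06}


def is_candidate(alt_reads, total_reads, variant_type):
    """Return True if the allele support passes both thresholds."""
    if total_reads <= 0:
        return False
    fraction = alt_reads / total_reads
    return (alt_reads >= MIN_SUPPORTING_READS
            and fraction >= MIN_SUPPORT_FRACTION[variant_type])


# At 30x depth, 3 supporting reads (10%) pass for an indel but not for a SNP.
snp_at_10_percent = is_candidate(3, 30, "snp")
indel_at_10_percent = is_candidate(3, 30, "indel")
```

Under this rule, a position at 30x depth needs at least four erroneous reads carrying the same SNP allele to produce a false candidate, which is why random, independent base errors alone are unlikely to explain large differences in candidate counts.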

The large differences in candidate generation rates between Illumina and Element seemed unlikely to be fully explained by random error rates. Instead, reads going out of sequencing phase in regions difficult to resolve by sequencing by synthesis could generate the error rates required to make candidates. In sequencing by synthesis, reads go out of phase when they hit certain contexts (e.g. homopolymers and tandem repeat runs) that break up the synchronous replication of the cluster, so individual molecules are replicating at different parts of their template [29]. This degrades the sequencing quality and produces errors.

The read mapping step can occasionally identify that a read has unusable sequence after a certain point and soft-mask (soft-clip) that part of the read. We observed a higher proportion of soft-masked bases in Illumina, which was much more pronounced in repeat regions and homopolymers from the Genome in a Bottle stratifications [30] (Fig. 2B). This is consistent with Element having an improvement in read phasing over difficult contexts.
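As an illustration of how a soft-masked proportion could be measured, the sketch below parses CIGAR strings from aligned reads and reports the share of read bases marked with the soft-clip operation (S). It is an assumed, simplified approach rather than the analysis used for Fig. 2B; in practice the CIGAR strings would come from a BAM file restricted to a stratification region.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases present in the read sequence.
READ_BASE_OPS = set("MIS=X")


def soft_clip_fraction(cigars):
    """Fraction of read bases that are soft-clipped across the given CIGARs."""
    clipped = 0
    total = 0
    for cigar in cigars:
        for length, op in CIGAR_OP.findall(cigar):
            n = int(length)
            if op in READ_BASE_OPS:
                total += n
            if op == "S":
                clipped += n
    return clipped / total if total else 0.0


# Hypothetical example: one fully aligned read, one with 50 clipped bases.
example_fraction = soft_clip_fraction(["150M", "100M50S"])
```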

Investigating base-level concordance through T2T assemblies

Recently, a highly accurate telomere-to-telomere (T2T) assembly of chromosome Y was completed for HG002 [31]. This allows us to investigate the empirical accuracy of Element sequencing at the base level by mapping reads from HG002 and HG003 to the complete Y-chromosome sequence, keeping the high-quality mappings, and counting every mismatch from the assembly toward the base-level error rate of the reads. Because HG003 is the father of HG002 and transmitted the same Y chromosome that is represented in the T2T assembly, we can analyze both samples in this way.

We used the Bam Error Stats Tool (BEST) [32], which was developed to quantify errors in sequencing technology at the read level by comparing reads to a reliable assembly. Reads at MAPQ60 were used to greatly reduce mapping bias. Consistent with the observations from variant calling, we observe that empirical concordance with the assembly for HG002 and HG003 is higher with Element samples than with Illumina samples. Mismatch rates were 2.4 to 3.3 fold higher in Illumina reads compared to Element (Fig. 3A). We also compare the predicted base quality values and find their calibration consistent between Element samples and well-calibrated beyond predicted Q40 (99.99% accuracy) (Fig. 3B).
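The relationship between a Phred quality value and an error probability, and the comparison of predicted to empirical quality, can be sketched as below. The function names and the example counts are hypothetical; this is not the BEST implementation.

```python
import math


def phred_to_error(q):
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)


def empirical_quality(mismatches, aligned_bases):
    """Phred-scaled quality implied by an observed mismatch rate."""
    if mismatches == 0:
        return float("inf")
    return -10 * math.log10(mismatches / aligned_bases)


# Q40 corresponds to an error probability of 1e-4, i.e. 99.99% accuracy.
q40_error = phred_to_error(40)
# Hypothetical bin: 1 mismatch in 10,000 bases predicted at Q40 is well calibrated.
observed_q = empirical_quality(1, 10_000)
```

A quality bin is well calibrated when the empirical quality computed from mismatches against the assembly is close to the quality the instrument predicted for those bases.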

Long insert sequencing improves genome analysis

Element has developed methods that allow libraries with insert sizes of > 1000 base pairs, as opposed to 350–500 base pairs, to be sequenced efficiently. To test whether longer inserts could improve Element sequencing accuracy, we received long insert sequencing runs with a median length of more than 1000 bp (Fig. 4A). The same mapping and variant calling pipeline was used, resulting in large improvements in recall (Fig. 4B, Supplementary Fig. 2), suggesting that increasing insert length is a promising mechanism to increase variant calling comprehensiveness and accuracy in general.

In addition to running with DeepVariant, we performed analyses with GATK4.5 on nine additional sequencing runs: 3 Illumina NovaSeq runs, 3 Element runs with the Cloudbreak chemistry and 500 bp insert sizes (standard inserts), and 3 Element runs with the Cloudbreak chemistry and 1000 bp insert sizes (long insert) (Supplementary Fig. 3). Additionally, we express the results for one whole genome sample in total errors (sum of false positives and false negatives for both SNP and indel errors) (Supplementary Fig. 4). For DeepVariant, we observed the same patterns described previously: Element has higher accuracy than Illumina regardless of coverage, with a more pronounced difference at 20–30x, and long insert Element has higher overall accuracy, especially recall. For GATK analyses, we observed that Element had higher accuracy than Illumina at 20x and 30x, but as coverage increases, GATK gains recall but loses precision. We also observed a trade-off with long insert sequencing, which had higher recall for GATK but lower precision. This observation is not present with Illumina. This may suggest that some aspect of the way GATK models the sequencing data was designed around a property of Illumina data that is not present in the same way in Element data. DeepVariant at 20x coverage in every sample, Illumina or Element, achieved a higher accuracy than GATK with any sample or coverage (including 40x). These results suggest that, to realize the potential of the long insert Element data, it is preferable to use DeepVariant or a variant caller other than GATK. As with all other analyses, these accuracy measures use regions of the genome never used to train any DeepVariant model (chromosome 20 for all samples) and a full sample held out from training (HG003).

Discussion

We have characterized the accuracy profile for analysis of human genomes with a new sequencing technology, Element AVITI, which uses a sequencing by avidity approach rather than sequencing by synthesis. Element data achieves greater variant calling accuracy over a range of coverages, with especially improved accuracy in the 20–30x coverage range. We identify certain sequence contexts in which Element outperforms Illumina reads, including tandem repeats and homopolymers, as measured by soft-clipping rates. Finally, we show a positive effect on the accuracy of whole genome sequencing pipelines when using longer inserts for sequencing.

Although these investigations focus on whole genome analysis for germline variation at coverage ranges typically used for variant discovery, there are several other applications for which base-level accuracy is of greater importance. These applications include somatic sequencing for detection of subclonal acquired variants, deep sequencing of cancers, and analysis of cell-free DNA. For these applications, only a few sequence reads may contain a variant at low allele fraction, and the ability to determine whether those bases reflect a real variant or an error depends highly on sequence quality. Similarly, low-pass sequencing of samples followed by imputation could benefit from higher base-level accuracy; this application has recently been investigated with Element sequencing [33].

The high accuracy of Element in homopolymers and repeats could provide a unique ability to improve genome assemblies and reference resources by polishing remaining errors in these contexts, which are difficult for both Illumina and long-read methods like Pacific Biosciences and Oxford Nanopore.

One caveat in analyses of accuracy is that the current (v4.2.1) Genome in a Bottle benchmarks do not cover the entirety of the genome, due to the difficulty in mapping certain parts of it. Accuracy across the full genome, including the parts not covered by Genome in a Bottle, is likely lower. The longer insert size Element runs might be able to better access parts of the genome that cannot be measured by these benchmarks, and could be another method to help expand the confident regions in future releases.

Conclusions

We demonstrate that data from the new short-read sequencing instrument Element AVITI can achieve comparable or better performance compared to Illumina NovaSeq. This will expand the available options for the research community for sequencing technology choice.

We identify areas of better performance, including higher accuracy especially at lower coverage, and a reduced amount of read soft-clipping in repetitive regions and homopolymers. This will allow the research community to better resolve certain challenging genomic regions.

We demonstrate that the use of longer inserts between read pairs when sequencing can improve accuracy, especially recall, which we hope will shift the research community and technology providers toward using longer inserts to improve mappability and analysis of genomes.

Methods

Protocol for long insert Element data

Covaris-sheared, PCR-free long insert libraries were prepared using the Kapa HyperPrep workflow. 1 µg of HG002 and HG003 gDNA was mechanically sheared using the following Covaris program:

Duration	Temp	Peak power	Duty factor (%)	Cycles per burst	Average power
10 s	12 °C	50	20	200	10

A narrow double-sided SPRI selection ratio of 0.3X/0.42X was used to select the long fragments. The Adept Rapid protocol was used for circularization. The libraries were sequenced on the Element AVITI system, 2 × 150 paired end reads with indexing, using a custom recipe for long inserts. The primary changes to the recipe involved increasing the amplification time to account for the increased insert length.

Reference genome used

GRCh38 with masking of certain false segmental duplications as recommended by Genome in a Bottle [34] (GRCh38_masked_v2_decoy_excludes_GPRIN2_DUSP22_FANCD2.fasta.gz) was used for all germline variant calling pipelines. For BEST analysis, ChrY of T2T-CHM13v2.0 [31] was used.

Read mapping

Mapping was performed with BWA v0.7.17 (r1188) [24]. Duplicate marking was performed with GATK v4.1.2 [28].

Variant Calling

Variant calling was performed with DeepVariant v1.5 [26] using the WGS model.

Read and base level assessment

Assessment of base and read level accuracy was performed with the BAM Error Stats Tool (BEST) [32].

Assessment used only MAPQ 60 reads in the T2T-XY v2.7 confident regions. The confident BED file used is at: https://storage.googleapis.com/brain-genomics-public/research/element/chry/T2T_chrY_confident.bed

Variant accuracy evaluation

Accuracy evaluation was performed with hap.py [13] using the v4.2.1 [13, 15] truth sets from Genome in a Bottle.

Chromosome 20 downsampling

To assess accuracy at coverage-matched variant positions for Fig. 1B, downsamples at 1x intervals were conducted from 50x to 10x, and hap.py was used to annotate variant calls as true positives, false positives, and false negatives. All calls for a given sample were aggregated over the files, and the sequence depth at each variant position was used to calculate total precision and recall for each depth.
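The aggregation step can be sketched as follows, assuming each annotated call has already been reduced to a pair of sequencing depth and label. The input layout and the function name are assumptions made for illustration; parsing the annotated VCF files is outside the scope of this sketch.

```python
from collections import defaultdict


def stratify_by_depth(calls):
    """Compute (precision, recall) per sequencing depth.

    calls: iterable of (depth, label) pairs, label in {"TP", "FP", "FN"},
    pooled over all downsampled call sets for one sample.
    """
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for depth, label in calls:
        counts[depth][label] += 1

    results = {}
    for depth in sorted(counts):
        c = counts[depth]
        called = c["TP"] + c["FP"]   # calls made by the pipeline
        truth = c["TP"] + c["FN"]    # variants present in the truth set
        precision = c["TP"] / called if called else None
        recall = c["TP"] / truth if truth else None
        results[depth] = (precision, recall)
    return results


# Hypothetical pooled calls, not data from the paper.
example = stratify_by_depth(
    [(20, "TP"), (20, "TP"), (20, "FP"), (20, "FN"), (30, "TP")]
)
```

Pooling over all downsampled files before stratifying means that a position observed at 20x depth in a 30x run contributes to the same bin as a position at 20x depth in a 15x run.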

Read count and downsampling fraction for sequence data

The sequencing data used for this paper is available as FASTQ files at: https://console.cloud.google.com/storage/browser/brain-genomics-public/research/element/sequencing_files/

The number of reads for each sample (including both paired files) and the downsample fraction required to reach 30 × coverage are:

Downsampling is performed by generating a new BAM with the command:

Availability of data and materials

Illumina sequencing was taken from PCR-Free NovaSeq6000 data generated as described in Baid et al. [23]. Element sequencing data for whole genome comparison on HG003 was taken from the Cloudbreak release. Sequencing data from other samples was taken from earlier Element chemistries and made available by Element from: https://www.elementbiosciences.com/resources. FASTQ, BAM, VCF, and analysis files are hosted publicly and available with no egress charge at: https://console.cloud.google.com/storage/browser/brain-genomics-public/research/element/. They are accessible from the GCP console at: gs://brain-genomics-public/research/element/. All contents of this folder are available via direct https links; an index of file URLs can be downloaded at: https://storage.googleapis.com/brain-genomics-public/research/element/element_urls.txt. Within this folder there are five subfolders:

candidates/: VCF files for 30x sequencing of multiple Illumina and Element samples, used to identify filtered candidates across samples (Fig. 2A).
chr20/: Chromosome 20 VCF files for 1x downsamples from 10x to 50x of Illumina and Element samples. Used for Fig. 1B.
chry/: Chromosome Y BAM, VCF, and BEST analysis files used to assess read concordance with the T2T assembly. Used for Fig. 3A–B.
sequencing_files/: Whole genome sequencing FASTQ files analyzed in this paper.
wgs/: FASTQ, BAM, VCF, and hap.py files for HG003 Illumina and Element Cloudbreak, including standard (500 bp) and long (1000 bp) insert sizes. Used for Figs. 1A, 2B, 4A–B.

Abbreviations

BAM: Binary alignment map
VCF: Variant call format
T2T: Telomere-to-telomere
GRCh38: Genome Reference Consortium human build 38
BEST: Bam Error Stats Tool
WGS: Whole genome sequencing
GCP: Google Cloud Platform
NIST: National Institute of Standards and Technology

References

1. Gorzynski JE, et al. Ultrarapid nanopore genome sequencing in a critical care setting. N Engl J Med. 2022;386:700–2.
2. Saunders CJ, et al. Rapid whole-genome sequencing for genetic disease diagnosis in neonatal intensive care units. Sci Transl Med. 2012;4:154ra135.
3. AlDubayan SH, et al. Detection of pathogenic variants with germline genetic testing using deep learning vs standard methods in patients with prostate cancer and melanoma. JAMA. 2020;324:1957–69.
4. Manolio TA. Genomewide association studies and assessment of the risk of disease. N Engl J Med. 2010;363:166–76.
5. Peloso GM, et al. Rare protein-truncating variants in APOB, lower low-density lipoprotein cholesterol, and protection against coronary heart disease. Circ Genom Precis Med. 2019;12:e002376.
6. Snelling WM, et al. Assessment of imputation from low-pass sequencing to predict merit of beef steers. Genes. 2020;11:1312.
7. Liao W-W, et al. A draft human pangenome reference. Nature. 2023;617:312–24.
8. Karczewski KJ, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–43.
9. Li Y, Willer C, Sanna S, Abecasis G. Genotype imputation. Annu Rev Genomics Hum Genet. 2009;10:387–406.
10. Eberle MA, et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 2017;27:157–64.
11. Zook JM, et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol. 2014;32:246–51.
12. Zook JM, et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data. 2016;3:160025.
13. Krusche P, et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol. 2019;37:555–60.
14. Zook JM, et al. An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol. 2019;37:561–6.
15. Wagner J, et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2022;2:100128. https://doi.org/10.1016/j.xgen.2022.100128.
16. Olson ND, et al. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. Cell Genom. 2022;2:100129. https://doi.org/10.1016/j.xgen.2022.100129.
17. Foox J, et al. Performance assessment of DNA sequencing platforms in the ABRF next-generation sequencing study. Nat Biotechnol. 2021;39:1129–40.
18. Wenger AM, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37:1155–62.
19. Sirén J, et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science. 2021;374:abg8871.
20. Tetikol HS, et al. Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis. Nat Commun. 2022;13:4384.
21. Scheffler K, et al. Somatic small-variant calling methods in Illumina DRAGEN™ secondary analysis. bioRxiv 2023–2003 (2023). https://doi.org/10.1101/2023.03.23.53401
22. Arslan S, et al. Sequencing by avidity enables high accuracy with low reagent consumption. Nat Biotechnol. 2023. https://doi.org/10.1038/s41587-023-01750-7.
23. Baid G, et al. An extensive sequence dataset of gold-standard samples for benchmarking and development. Cold Spring Harbor Lab. 2020. https://doi.org/10.1101/2020.12.11.422022.
24. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN] (2013).
25. Schneider VA, et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 2017;27:849–64.
26. Poplin R, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36:983–7.
27. Li H, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
28. DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–8.
29. Pfeiffer F, et al. Systematic evaluation of error rates and causes in short samples in next-generation sequencing. Sci Rep. 2018;8:10950.
30. Dwarshuis NJ, et al. StratoMod: Predicting sequencing and variant calling errors with interpretable machine learning. bioRxiv 2023–2001 (2023).
31. Rhie A, et al. The complete sequence of a human Y chromosome. bioRxiv 2022–2012 (2022).
32. Liu D, et al. Best: A tool for characterizing sequencing errors. bioRxiv 2022–2012 (2022).
33. Li JH, et al. Low-pass sequencing plus imputation using avidity sequencing displays comparable imputation accuracy to sequencing by synthesis while reducing duplicates. bioRxiv 2022–2012 (2022).
34. Behera S, et al. FixItFelix: improving genomic analysis by fixing reference errors. Genome Biol. 2023;24:31.

Acknowledgements

We thank Ted Yun for feedback and comments for the manuscript. We thank Justin Zook, Nathan Dwarshuis, Nancy Hansen, Mira Mastoras, Konstantinos Kyriakidis, and Benedict Paten for analysis and insights into Element reads for work with NIST Genome in a Bottle, Telomere-2-Telomere consortium, and Human Pangenome Project.

Funding

This study was funded by Google LLC. No grants were received for this study. Element Biosciences produced the sequencing data used in this study, but did not conduct analysis or perform informatics steps for this study. Google conducted analysis, trained DeepVariant models, and performed informatics steps, but did not produce any sequencing for this study.

Author information

Authors and Affiliations

Google LLC, Mountain View, CA, USA
Andrew Carroll, Alexey Kolesnikov, Daniel E. Cook, Lucas Brambrink, Cory Y. McLean, Kishwar Shafin, Maria Nattestad & Pi-Chuan Chang
Element Biosciences, San Diego, CA, USA
Kelly N. Wiseman, Sophie M. Billings, Semyon Kruglyak, Bryan R. Lajoie, Junhua Zhao & Shawn E. Levy

Contributions

AC, PC-C, SK, and BRL conceived of the experimental design. AK, DEC, LB, KS, MN, and PC-C trained DeepVariant models and modified DeepVariant code. KNW, SMB, SK, BRL, JZ, and SEL generated Element data, software, and analysis. AC, CYM, PC-C, and SEL supervised team efforts at their respective organizations. AC, CYM, PC-C, DEC, SK, BRL, and SEL wrote the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Andrew Carroll.

Ethics declarations

Ethics approval and consent to participate

All sequencing was performed on cell lines previously consented, published and made available by NIST. There are no research participants or human subjects in this study.

Consent for publication

Not applicable.

Competing interests

AC, AK, DEC, LB, CYM, KS, MN, and PC-C are employees of Google LLC and own Alphabet stock as part of the standard compensation package. KNW, SMB, SK, BRL, JZ, and SEL are employees of Element Biosciences and hold stock options in the company. Google and Element do not have a commercial relationship involving the sale of sequencing capabilities or informatics tools.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Carroll, A., Kolesnikov, A., Cook, D.E. et al. Accurate human genome analysis with element avidity sequencing. BMC Bioinformatics 26, 194 (2025). https://doi.org/10.1186/s12859-025-06191-4

Received: 26 March 2024
Accepted: 13 June 2025
Published: 25 July 2025
Version of record: 25 July 2025
DOI: https://doi.org/10.1186/s12859-025-06191-4
