Extended Data Fig. 5: Molecular features of truncating variants at homopolymeric loci.
From: Respiratory complex and tissue lineage drive recurrent mutations in tumour mtDNA
a,b, Enrichment of truncating variants in complex I (CI) and of non-truncating variants in complex III (CIII) when the analysis is restricted to mutations with at least 20 reads supporting the alternate allele. Error bars are 95% exact Poisson confidence intervals; P values are from two-sided Poisson tests. c, Comparison of frameshift indel homopolymer hotspots detected among indels supported by at least 20 alternate-allele reads (y axis) with those detected among indels supported by at least 5 alternate-allele reads (x axis). d, Percentage of cases per cancer type with truncating frameshift indels at any of the 6 indel hotspot loci. Only cancer types with at least 20 well-covered samples are plotted (n = 4,432 paired tumor and matched-normal samples in total). Bar height indicates the fraction of well-covered samples of the given cancer type that carry any indel at a homopolymer hotspot; numbers above bars indicate the total number of cases. e, Validation of homopolymeric indel hotspot loci. Proportion of samples in TCGA (x axis) or PCAWG (excluding samples also in TCGA; y axis) with frameshift indels at each of 73 homopolymeric regions. The 6 indel hotspot loci are labeled and colored red. The dashed line shows y = x. The Pearson correlation coefficient r is indicated. f, Breakdown of homopolymer loci and their hotspot incidence rates by mitochondrial complex. Heatmap tile shading indicates the overall mutation rate (total number of mutants across homopolymer loci divided by the total number of samples with sufficient sequencing coverage). Tile labels give the number of homopolymer hotspots over the total number of homopolymer loci. Right, histogram of the number of loci at each homopolymer length. g, Percentage of all truncating variants that arose at the 6 homopolymer hotspot loci in TCGA tumor samples and in saliva-derived normal samples from HelixMTdb. Error bars are 95% binomial confidence intervals.
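Panels a and b report 95% exact Poisson confidence intervals. As an illustration only, the sketch below computes the exact (Garwood) interval for an observed count by bisection on the Poisson CDF, using just the Python standard library. It assumes, hypothetically, that enrichment is expressed as an observed count divided by a fixed expected count. The caption does not say how expected counts were derived, and the function names (`poisson_cdf`, `exact_poisson_ci`, `enrichment`) are illustrative, not the authors' code.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), accumulated term by term in log space."""
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0
    total = 0.0
    log_term = -lam  # log P(X = 0)
    for i in range(k + 1):
        if i > 0:
            log_term += math.log(lam) - math.log(i)  # log P(X = i)
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect_decreasing(f, target: float, lo: float, hi: float) -> float:
    """Return x in [lo, hi] with f(x) ~= target, assuming f is decreasing in x."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid  # f is still above target, so the root lies to the right
        else:
            hi = mid
        if hi - lo < 1e-10:
            break
    return (lo + hi) / 2.0


def exact_poisson_ci(k: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided exact (Garwood) confidence interval for a Poisson mean given count k."""
    bracket = k + 10.0 * math.sqrt(k + 1) + 10.0  # safely above the upper limit
    # Lower limit solves P(X >= k; lam) = alpha/2, i.e. P(X <= k-1; lam) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda lam: poisson_cdf(k - 1, lam), 1.0 - alpha / 2.0, 0.0, bracket)
    # Upper limit solves P(X <= k; lam) = alpha/2.
    upper = _bisect_decreasing(
        lambda lam: poisson_cdf(k, lam), alpha / 2.0, 0.0, bracket)
    return lower, upper


def enrichment(observed: int, expected: float, alpha: float = 0.05):
    """Hypothetical observed/expected ratio with an exact Poisson interval.

    Treats `expected` as a fixed constant, so only the observed count is random.
    """
    lo, hi = exact_poisson_ci(observed, alpha)
    return observed / expected, (lo / expected, hi / expected)
```

For example, `enrichment(10, 5.0)` gives a ratio of 2.0 with an interval of roughly 0.96 to 3.68. Bisection is used here only to avoid a SciPy dependency; the same limits follow in closed form from chi-square quantiles, lower = χ²(α/2; 2k)/2 and upper = χ²(1 − α/2; 2k + 2)/2.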