Fig. 4: In-depth analysis of clonal relationships in the global H. pylori dataset.
a Pairwise core genome MLST (cgMLST) distances of the HpGP dataset. Bins show the distribution of core genome allele sharing between pairs of samples. The x-axis ranges from 0.1 to 0.99, with lower values indicating a higher number of shared alleles. Each pair is assigned to a single comparison category (color bar). Only a small fraction of all possible pairs shares more than 1% of alleles, and most of these involve samples from the same country of origin. Notably, a group of strains from different regions of the US shares between 6% and 17% of alleles, corresponding to between 62 and 176 identical genes, suggesting the presence of a deep clone. Other pairs share larger proportions of alleles (distances <0.5), representing recent transmissions between closely related strains. b Dated ClonalFrameML tree of the final set of strains considered to belong to the US deep clone Hp_Clone_US-1, including five publicly available genomes. Node ages are given in years, based on a previously estimated mutation rate of 1.38 × 10⁻⁵ per site per year. The colored dots represent the geographical origin of each strain.
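The distance plotted in panel a can be read as the proportion of core genome loci at which two strains carry different alleles, so a pair sharing 6% of alleles has a distance of 0.94. The following is a minimal sketch of that calculation, not the pipeline used for the figure. The names `cgmlst_distance` and `pairwise_distances`, the toy allele profiles, and the choice to skip loci with a missing call in either strain are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

# One allele ID per core genome locus; None marks a missing allele call.
Profile = Sequence[Optional[int]]


def cgmlst_distance(a: Profile, b: Profile) -> float:
    """Proportion of loci with different alleles, over loci called in both profiles.

    Assumption: loci missing in either profile are ignored rather than
    counted as differences.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no loci called in both profiles")
    differing = sum(1 for x, y in compared if x != y)
    return differing / len(compared)


def pairwise_distances(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of strains, keyed by sorted name pair."""
    return {
        (s, t): cgmlst_distance(profiles[s], profiles[t])
        for s, t in combinations(sorted(profiles), 2)
    }


# Hypothetical 10-locus profiles; real cgMLST schemes have many more loci.
toy = {
    "strain_A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "strain_B": [1, 2, 30, 40, 50, 60, 70, 80, 90, 100],
    "strain_C": [11, 12, 13, 14, 15, 16, 17, 18, 19, None],
}
distances = pairwise_distances(toy)
for pair, d in distances.items():
    print(pair, round(d, 2))
```

In this toy example, strain_A and strain_B share 2 of 10 alleles, giving a distance of 0.8, while strain_C shares no alleles with either and sits at a distance of 1.0, beyond the upper end of the plotted x-axis range.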