Clonal dynamics and somatic evolution of haematopoiesis in mouse

Kapadia, Chiraag D.; Williams, Nicholas; Dawson, Kevin J.; Watson, Caroline; Yousefzadeh, Matthew J.; Le, Duy; Nyamondo, Kudzai; Kodavali, Sreeya; Cagan, Alex; Waldvogel, Sarah; Zhang, Xiaoyan; De La Fuente, Josephine; Leongamornlert, Daniel; Mitchell, Emily; Florez, Marcus A.; Sosnowski, Krzysztof; Aguilar, Rogelio; Martell, Alejandra; Guzman, Anna; Harrison, David; Niedernhofer, Laura J.; King, Katherine Y.; Campbell, Peter J.; Blundell, Jamie; Goodell, Margaret A.; Nangalia, Jyoti

doi:10.1038/s41586-025-08625-8

Download PDF

Article
Open access
Published: 05 March 2025

Clonal dynamics and somatic evolution of haematopoiesis in mouse

Nature volume 641, pages 681–689 (2025)Cite this article

37k Accesses
11 Citations
159 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 19 September 2025

This article has been updated

Abstract

Haematopoietic stem cells maintain blood production throughout life¹. Although extensively characterized using the laboratory mouse, little is known about clonal selection and population dynamics of the haematopoietic stem cell pool during murine ageing. We isolated stem cells and progenitors from young and old mice, identifying 221,890 somatic mutations genome-wide in 1,845 single-cell-derived colonies. Mouse stem cells and progenitors accrue approximately 45 somatic mutations per year, a rate only approximately threefold greater than human progenitors despite the vastly different organismal sizes and lifespans. Phylogenetic patterns show that stem and multipotent progenitor cell pools are established during embryogenesis, after which they independently self-renew in parallel over life, evenly contributing to differentiated progenitors and peripheral blood. The stem cell pool grows steadily over the mouse lifespan to about 70,000 cells, self-renewing about every 6 weeks. Aged mice did not display the profound loss of clonal diversity characteristic of human haematopoietic ageing. However, targeted sequencing showed small, expanded clones in the context of murine ageing, which were larger and more numerous following haematological perturbations, exhibiting a selection landscape similar to humans. Our data illustrate both conserved features of population dynamics of blood and distinct patterns of age-associated somatic evolution in the short-lived mouse.

Clonal dynamics of haematopoiesis across the human lifespan

Article Open access 01 June 2022

The longitudinal dynamics and natural history of clonal haematopoiesis

Article Open access 01 June 2022

Clonal tracing with somatic epimutations reveals dynamics of blood ageing

Article Open access 21 May 2025

Main

The haematopoietic system sustains mammalian life through the generation of oxygenating red blood cells, immune cells and platelets. In humans, blood production accounts for 86% of daily cellular turnover, generating approximately 280 billion cells per day¹. This process relies on a hierarchy of progenitors that successively amplify cellular output towards differentiated blood cells, in turn derived from a heterogeneous pool of haematopoietic stem cells (HSCs)^2,3,4.

Like all somatic cells, HSCs accumulate somatic mutations with age^5,6. In humans, some mutations promote cellular fitness, driving clonal outgrowth⁵. Such ‘clonal haematopoiesis’, although at very low levels in younger individuals, drives ubiquitous and dramatic loss of clonal diversity in older people⁵. Clonal haematopoiesis is a known risk factor for blood cancers and age-associated non-cancerous disease^7,8.

Whether these patterns of somatic evolution are features of haematopoietic ageing in other species is unknown. In Mammalia, the rate of somatic mutation accrual in colonic epithelium inversely scales with lifespan such that species acquire a similar magnitude of mutations by the end of life⁹. It is unclear if this pattern extends beyond the colon and whether the consequences of somatic evolution over human life scale to shorter-lived species.

The inbred laboratory mouse has been extensively used to study haematopoiesis and establish fundamental tenets of stem cell biology. The most common strain, C57BL/6J, has a median lifespan of 28 months¹⁰, 35-fold shorter than that of humans, and recapitulates many phenotypes of human ageing, with initial data showing a lower rate of clonal haematopoiesis¹¹. Here, we study the ontogeny, clonal dynamics and selection landscapes of murine HSCs in vivo to understand the evolutionary processes shaping the maintenance and ageing of blood production.

Whole-genome sequencing of stem cells

We purified HSCs from three young (3 months) and three aged (30 months) healthy C57BL/6J female mice (Fig. 1a and Extended Data Fig. 1a), ages estimated to be the human lifespan equivalent of approximately 20 and 85–90 years, respectively (Supplementary Note 1). Longstanding views have held that haematopoiesis is supported by long-term HSCs, which give rise to multipotent progenitors (MPPs, or short-term HSCs). Both HSCs and MPPs, distinguished by cell-surface markers, support haematopoiesis and produce all differentiated blood cell types, but only HSCs engraft following serial transplantation^{12,13,14,15,16}. To examine both populations in vivo, single HSC- and MPP-derived colonies from 3-month and 30-month animals (Fig. 1a and Extended Data Fig. 1b) underwent whole-genome sequencing (WGS) to an average depth of 14× (61–235 HSCs, 70–191 MPPs per animal; Fig. 1b). We purified HSCs (n = 242) from 17 additional mice aged 1 day to 30 months (9–24 colonies per animal). Following exclusion of 139 colonies (Extended Data Fig. 1c), 1,547 whole genomes (908 HSCs, 639 MPPs) went forward for somatic mutation identification and phylogenetic reconstruction.

**Fig. 1: Somatic mutations in murine stem cell-derived haematopoietic colonies.**

Somatic mutation accumulation

We observed a constant rate of somatic mutation accumulation with age (Fig. 1c). At 3 months, HSC/MPP had an average of 59.5 single-base substitutions (SBSs) (95% confidence interval (CI) 57.3–61.7), with 161.4 SBSs (CI 155.1–167.8) by 30 months. Annually, 45.3 SBSs (CI 42.2–48.4) were acquired, with a somatic mutation occurring every 8–9 days, corresponding to a mutation rate of 8.3 × 10⁻⁹ per base pair per year (CI 7.7–8.9 × 10⁻⁹ per base pair per year). Few DNA insertions or deletions were captured with no chromosomal changes observed. Previous studies indicate that MPPs are a more rapidly cycling population^3,17,18 thought to amplify cell production from HSCs, which could result in a greater mutation burden. We observed no difference in mutation burden between HSCs and MPPs (Fig. 1d), consistent with the lack of differences in mutation burdens between human HSCs and differentiated blood cells¹⁹.

The murine HSC SBS rate is approximately three times that of humans (14–17 SBSs per year)^5,6,20,21, consistent with the concept that somatic mutation rates negatively correlate with lifespan⁹. However, the tenfold difference in ultimate mutation burden (approximately 150 in aged murine HSCs versus more than 1,500 in older human HSCs) is greater than the end-of-life somatic mutation burden variation observed across mammalian intestinal crypts⁹. Thus, we wished to validate the lower-than-expected somatic mutation burden observed in aged murine stem cells.

First, we compared genome-wide mutation burdens in HSCs to matched microdissected intestinal crypts from the same aged mice (n = 16, range 5–6 per animal; Extended Data Fig. 1d). Colonic epithelium had a higher mutation burden (Fig. 1e), as previously reported⁹, confirming that we were not underestimating mutations in HSCs. Next, we undertook nano-error rate whole-genome duplex sequencing¹⁹ of matched blood to orthogonally validate HSC mutation burden. Mutation burden was not statistically different from that of haematopoietic colonies (Fig. 1e) but showed a non-significant trend towards higher mutation burden in whole blood, probably due to the presence of lymphoid cells that have more mutations²². Together, these data confirm that somatic mutation rates in blood do not inversely scale with lifespan to the same degree as observed in colon.

Aetiology of mutational processes in mouse

The pattern of sharing of somatic mutations across individual colonies can be used to reconstruct a phylogenetic tree that depicts their ancestral lineage relationships. We use ‘lineage’ here to represent the direct line of descent rather than different blood cell types. Figure 2 shows the phylogenetic trees for a 3-month-old mouse and a 30-month-old mouse, with more phylogenies in Extended Data Fig. 2. Tips of the trees are individual HSC- (blue) and MPP-derived colonies (red); branches that trace upwards from tip to root reflect the somatic mutations present in an individual colony and how these mutations are shared across other colonies. Individual branchpoints (‘coalescences’) represent ancestral cell divisions wherein descendants of both daughter cells were captured at sampling. Colonies that share a common ancestor on the phylogeny represent a clade.

**Fig. 2: Phylogenetic trees of HSCs and MPPs from a young mouse and an old mouse.**

We explored the aetiology of the higher rate of mutation accumulation in murine HSC/MPPs compared to human. DNA replication during cell division is one source of somatic mutations reflecting DNA polymerase base incorporation errors. Nodes on the phylogenies with more than two descendant lineages (polytomies) are evidence of ancestral cell divisions not associated with a somatic mutation and can be used to infer the number of mutations acquired per cell division (Extended Data Fig. 3). We observed 266 lineages by 12 mutations of molecular time in 5 donors that had adequate (more than 10 lineages) diversity. Of the 265 symmetrical self-renewing cell divisions that would have been required, 44 were mutationally silent, providing an estimate of 1.80 (CI 1.46–2.19) mutations per cell division during early life (Extended Data Fig. 3). This is not significantly different from that observed in humans (1.84 mutations per cell division, P = 0.5) (ref. ²³), indicating that excess mutation accumulation in murine HSCs does not reflect poorer fidelity during DNA replication.

Mutagenic biological processes yield distinguishable patterns of base substitutions at trinucleotide sequence contexts, termed mutation signatures. We identified three mutational processes (Extended Data Fig. 4a and Methods): SBS1, reflecting the spontaneous deamination of methylated cytosines; SBS5, attributed to cell-intrinsic damage and repair; and SBS18, characterized by C>A transversions potentially linked to oxidative damage. As in human HSCs, SBS1 and SBS5 increased with age in keeping with their clock-like nature (Extended Data Fig. 4b). Mutations attributed to SBS18, previously identified in murine colorectal crypts⁹, preferentially appeared early in life (Extended Data Fig. 4c,d) reminiscent of their presence in human placenta and human fetal HSCs^23,24. Overall, the higher relative somatic mutation accumulation rate in mice seems underlaid by context-specific mutational processes (SBS18), together with a higher rate of endogenous DNA damage and/or reduced repair (SBS1/SBS5).

Parallel establishment of HSCs and MPPs

The classic view of haematopoietic differentiation, based largely on transplantation studies, held that MPPs derived from HSCs^25,26. Recent barcoding and single-cell approaches in non-perturbed haematopoiesis indicate a more nuanced picture with several self-renewing progenitor populations^2,4. Using the phylogenies, we retraced stem cell ontogeny in vivo during unperturbed haematopoiesis. We inferred the identity of ancestral branches using the identity of their nearest sibling cell (Supplementary Note 2). Clear vertical bands of HSC-only (blue) and MPP-only (red) ancestral lineages were observed across the trees, representing independent clades (Fig. 2a,b and Extended Data Fig. 2), with only a minority of HSCs arising from MPPs and vice versa. The phylogenetic separation of MPPs and HSCs indicates that most HSCs are derived from HSC self-renewing divisions, and most MPPs are derived from MPP self-renewing divisions, with the two populations independently contributing to blood production in parallel over life (Fig. 2c).

The lack of intermixing between HSCs and MPPs indicates long-term inheritance of an HSC or MPP state. Such stable heritable identity appears by 25 mutations of molecular time, indicating that a substantial proportion of HSC and MPP populations appear and diverge early in life. Around 50 somatic mutations seem to be acquired before birth (mixed effects model intercept 48.2, CI 45.61–50.8). Thus, HSCs and MPPs are established in parallel during fetal development.

To explore when in utero identity was established, we next evaluated somatic mutations present in both HSCs and matched colonic crypts. Blood is mesoderm-derived and colonic epithelium from endoderm; thus, any shared mutations occurred in embryonic cells prior to gastrulation. Mutations on the haematopoietic phylogeny were observed in sampled colonic crypts down to 9–11 mutations of molecular time (Extended Data Fig. 3), timing these shared mutations to have occurred during gastrulation. Indeed, branches with an inferred HSC or MPP identity did not share mutations with the colon, consistent with these lineages being established after germ layer specification.

Given the probable embryonic establishment of distinct HSC and MPP pools, we considered the simplest series of cell state changes (such as HSC to MPP) to capture the observed cell identities. We first considered the prevailing view that MPPs are generated from HSCs. We counted the number of cell identity changes required to reach the sampled cell identity. Surprisingly, the HSC-to-MPP model was equivalently parsimonious (that is, requiring a similar number of cell state changes) to a model where all cells start as MPP (with HSCs arising from MPPs), an ontogeny not generally considered probable (Fig. 2d and Supplementary Note 2). Overall, our data indicate that many long-term HSC and MPP lineages are established independently and in parallel during early development and that MPPs do not always arise from HSCs, contrary to classical views but consistent with recent barcoding evidence^2,4.

Modelling HSC and MPP transitions

To estimate the rate of HSC–MPP transitions through time, we developed a hidden Markov tree model. We defined three unobservable ancestral states—embryonic precursor cell (EMB), HSC and MPP—and inferred the sequence and transition rates between these states during life. We considered all cells prior to gastrulation (fewer than ten mutations) as EMB and assumed a fixed probability of transitioning out of this state to either HSC or MPP (with subsequent fixed probabilities of further transitions) per unit of molecular time.

By fitting the model to each donor, each age group or the whole cohort, we found that a model fit to young and old mice separately best fit the data (Supplementary Note 2 and Supplementary Table 1). A model in which EMB transitions to either HSC or MPP was significantly better than an ‘HSC-first’ model, where EMB must transition to HSC prior to MPP specification. However, we were only able to reject an ‘HSC-first’ ontogeny model in older mice and not younger mice, as our data indicated more frequent HSC-to-MPP transitions earlier in life (Fig. 2e). This difference between young and old mice could be explained if the HSCs that produce MPPs early in life are extinguished by old age and thus are not sampled in the phylogeny. Alternatively, the rate of HSCs that transition to MPPs may be greater earlier in life. Our model indicates that 50% of all HSC and MPP lineages in young and aged mice had committed to their cell state before 50 mutations of molecular time, probably before birth, confirming our earlier qualitative observations. As might be expected, HSC-to-MPP transitions were more frequent than MPP-to-HSC transitions, which were extremely rare (1 in 1,000 transitions) and within the plausible limits of cell-sorting accuracies.

Stem cell population dynamics over life

The pattern of coalescences in the phylogenetic tree reflects the ratio (N/λ) of population size (N) and the rate of HSC self-renewing cell divisions (λ)—both smaller populations and more frequent cell divisions shorten the interval between coalescences. Human haematopoietic phylogenies have a profusion of early coalescences, reflecting rapid cell division during embryonic growth. Coalescences are then infrequently observed during adulthood because of a large stable HSC population, reappearing in older age in clonal expansions as clonal diversity collapses⁵.

By contrast, young and old murine haematopoietic phylogenies display coalescences continuing down the tree (Fig. 2a,b and Extended Data Figs. 2 and 5). Using a phylodynamic Bayesian framework to infer the HSC population trajectory (N/λ), we infer an early period of exponential HSC growth followed by progressively increasing N/λ over life (Fig. 3a), consistent with increased HSC numbers with age by flow cytometry (Fig. 3b) and other studies^27,28. Our findings contrast with haematopoietic progenitor population trajectories in humans⁵, which exhibit a population growth plateau during adulthood. Interestingly, we infer entirely overlapping N/λ trajectories for HSCs and MPPs. Together with their similar mutation burdens and lineage independence, HSC and MPP clonal dynamics during steady state in vivo haematopoiesis seem indistinguishable.

**Fig. 3: Population dynamics and selection in the murine stem cells.**

We developed an HSC/MPP population dynamics model in which stem cells grow towards the target population size, taking into account loss of cells through cell death or differentiation. Using approximate Bayesian computation (ABC)²⁹, which generates simulations of phylogenetic trees to estimate the most likely posterior distributions of population size and symmetrical self-renewing division rates (Fig. 3 and Extended Data Fig. 6), we estimate that the murine HSC–MPP population grows to around 70,000 cells (median 72,414, CI 25,510–98,540). Symmetric cell divisions occur roughly every 6 weeks (median 6.4 weeks, CI 1.8–13.2 weeks). Stem cells exit the population by either death or differentiation about once every 18 weeks (CI 2.3–357 weeks).

Contribution to progenitors and blood

Given the observed similarities between HSC and MPP, we studied if HSCs and MPPs might differentially contribute to downstream blood cell production. Using further WGS of colonies (n = 298) derived from a mixed progenitor compartment including granulocyte/macrophage-biased MPPs (MPP^GM) and lymphoid-biased MPPs (MPP^Ly), we expanded the phylogenetic trees of the three aged animals. We observed no preferential bias in MPP^GM or MPP^Ly emerging from HSC versus MPP (Extended Data Fig. 7a–e), indicating that HSCs and MPPs produce downstream progenitors at similar proportions. However, this analysis is limited by relatively low numbers of sampled MPP^GM and MPP^Ly.

To assess lineage contribution to mature blood production, we used deep sequencing of matched blood to measure the fraction of cells harbouring mutations from the phylogenetic trees. Mutations corresponding to both ancestral HSCs and MPPs were recaptured in blood, indicating that both cell types actively contribute to mature blood production. Mutations private to single cells on the phylogeny were highly subclonal (less than 0.1% variant allele fraction (VAF); Extended Data Fig. 8a), in line with each HSC/MPP contributing only a small amount of overall blood production. We observed a slight bias towards increased representation of ancestral MPP lineages compared to HSCs, although this difference was subtle (Extended Data Figs. 8b and 9) and may be due to increased proliferation of MPP descendants or differences in compartment population size earlier in life.

Absence of large clonal expansions

The absence of expanded clades in murine phylogenetic trees (Fig. 3c) indicates maintenance of clonal diversity, in contrast to the oligoclonality observed in older humans⁵ (Extended Data Fig. 5). Population dynamics simulations confidently recapitulated observed phylogenies under a model of neutral growth without positive selection. No colonies (n = 1,305) displayed mutations in murine orthologues of human clonal haematopoiesis genes. Among all 88,053 SNVs, the relative rate of non-synonymous mutation acquisition also did not significantly depart from neutrality (Fig. 3d), with no new genes identified as being under selection (Supplementary Data 1).

Given that mutation entry, which furnishes a population with phenotypic variation and substrate for selection, is occurring at a higher rate in mice, we considered reasons for the lack of observable clonal expansions and absence of positive selection on non-synonymous mutations, both of which manifest ubiquitously in human haematopoiesis⁵. One possibility is that there are insufficient HSC and MPP divisions in the short lifespan of mice to facilitate detectable clonal expansions of cells with fitness-inferring mutations. Alternatively, as both population size and the frequency of self-renewing cell divisions (captured in N/λ) determine the rate of random drift and hence the drift threshold that selection must overcome³⁰, the fitness (s) of newly arising mutations may also be insufficient for their carrier subclones to exceed the genetic drift threshold in a mouse lifespan (s = λ/N representing the drift threshold³⁰).

In the first scenario, clones under positive selection (that is, with necessary driver mutations) will still be present but would be too small to detect using a phylogenetic approach that only readily identifies larger clones (greater than 5% clonal fraction). In the second scenario, the fitness landscape of any detectable clones would reflect the specific murine haematopoietic drift threshold. We explored both possibilities.

Native and modulable murine clonal haematopoiesis

We examined murine blood (3–37 months) for very small, expanded clones and the presence of clonal haematopoiesis using targeted duplex-consensus sequencing of murine orthologues of 24 genes associated with human clonal haematopoiesis. Expanded clonal haematopoiesis clones clearly increased in prevalence with age (Fig. 4a,b). Average clone size was very small at 0.017% of nucleated blood cells (range 0.0036–0.27%) representing clonal fractions between 1 in 500 to 1 in 30,000 cells. Clonal expansions harboured mutations in Dnmt3a and Tet2, genes frequently mutated in human clonal haematopoiesis, but also Bcor and Bcorl1, observed in humans following bone marrow immune insult^7,8. These data are consistent with a previous report identifying expanded clones in mice following bone marrow transplantation¹¹.

**Fig. 4: Clonal haematopoiesis during normal ageing in mouse.**

Increased clonal prevalence with age was observed across different laboratory strains, including the genetically heterogeneous HET3 strain and at similar clonal fractions (Fig. 4c,d), confirming that small clones with clonal haematopoiesis drivers are not specific to the C57BL/6J strain. Clones were present in biological replicates and persistent in mice sampled longitudinally over four months, although individual clonal dynamics varied (Fig. 4e). Variants displayed enrichment for non-synonymous mutations across these genes with positive selection evident for Dnmt3a, Bcor and Bcorl1 (normalized ratio of non-synonymous to synonymous somatic mutations (dN/dS) > 1, q < 0.1) (Fig. 4f). These data indicate that such small clonal expansions in murine blood are under positive selection and not the result of genetic drift.

Laboratory mice are maintained in exceptionally clean and controlled conditions. By contrast, microbial exposures and systemic insults in humans shape the landscape of clonal haematopoiesis³¹. To examine whether the murine haematopoietic selective landscape can be similarly altered, we applied a series of infectious or myeloablative exposures.

We subjected mice to a normalized microbial experience (NME) in which laboratory mice experience several bacterial, viral and parasitic pathogens³² through exposure to fomite (pet store) bedding. Aged NME-exposed mice displayed an increased burden of somatic clones, especially driven by Trp53 (Extended Data Fig. 10a). Targeted infection with the single pathogen Mycobacterium avium, which has been shown to activate HSCs and lead to chronic inflammation³³, led to increased frequency of Bcor, Tet2 and Asxl1 mutant clones (Extended Data Fig. 10b). To observe the effect of myeloablation, aged mice were also treated with chemotherapeutic agents 5-fluorouracil and cisplatin. When treated with cisplatin, we observed globally increased somatic clonal burden (Extended Data Fig. 10c, P = 0.027) with increased Trp53, Tet2 and Asxl1 and statistically significant positive selection for Trp53 (Extended Data Fig. 10c), analogous to humans^34,35. Similarly, 5-fluorouracil exposure led to clones at magnitudes-greater proportions than age-matched controls (Extended Data Fig. 10d). These data illustrate that haematopoietic mutation accrual and selection are sufficient to drive native clonal haematopoiesis in mice, with modulable selection landscapes.

Fitness landscape of murine clonal haematopoiesis

We wished to understand if the fitness landscape of clonal haematopoiesis drivers in mice influenced the small observed clone sizes (median 0.017%) compared to humans at equivalent lifespan ages. We estimated the fitness landscape of the driver mutations using an established continuous time branching evolutionary framework for HSC dynamics³⁶.

By analysing the distribution of neutral mutation VAFs (clones at low VAF bearing synonymous or intronic mutations), we can derive independent orthogonal estimates for the underlying effective population size (N/λ) and mutation rates (μ). We estimated N/λ at 16,500 HSC-years (CI 11,122–21,836; Fig. 5a), consistent with estimates from phylogenetic trees (N/λ 7,918 HSC-years, CI 2,277–20,309). Differences in these estimates are probably influenced by the following: (1) ABC allows for the inferred population growth, whereas the branching evolutionary framework assumes a stable population size; and (2) the branching evolutionary framework model uses the intronic/synonymous mutation rate as the background genome-wide mutation rate, unlike ABC. In the targeted gene panel, the synonymous/intronic mutation rate was estimated at 1.8 × 10⁻⁴ base pairs (bp) per year (CI 1.2 × 10⁻⁴–2.7 × 10⁻⁴) and the non-synonymous mutation rate at 3.4 × 10⁻⁴ bp per year (CI 2.9 × 10⁻⁴–3.9 × 10⁻⁴). The total mutation rate, when scaled to total genome size, corresponds to a global mutation rate of 11.77 × 10⁻⁹ bp per year (CI 9.28 × 10⁻⁹–14.94 × 10⁻⁹). Encouragingly, this is similar to the mutation rate directly observed from the phylogenetic trees (8.29 × 10⁻⁹ bp per year, CI 7.73 × 10⁻⁹–8.85 × 10⁻⁹).

**Fig. 5: The fitness landscape of known drivers of clonal haematopoiesis.**

Finally, we estimated the distribution of fitness effects driven by non-synonymous mutations. Our analysis indicates that approximately 7% (CI 5–21%) have strong fitness effects (50–200% growth per year) (Fig. 5b). Considering that we infer mouse stem cells to be self-renewing roughly every 6 weeks (CI 2.3–12.5 weeks), an annual growth rate of 200% translates to a per-symmetrical-self-renewing division selective advantage of approximately 15% (5–30%), in line with reported selection coefficients of mutated genes associated with clonal haematopoiesis in humans^5,21,36. Indeed, in the short-lived mouse, variants with weaker fitness (less than 50%) might have insufficient time to reach exponential, deterministic growth in the population, given that clones are not established until ${t}_{{\rm{years}}} > \frac{1}{s}$ (ref. ³⁷), although any background growth in population size could circumvent this, allowing for weaker variants to fix in the population. These data may also explain why some of the low-VAF clones identified by duplex consensus sequencing did not increase in clone size over time (Fig. 4e).

Discussion

Here, we study the ontogeny, population dynamics and somatic evolution of HSCs in the most widely used mammalian model organism, the laboratory mouse. Our data indicate that HSC and MPP populations are established during embryogenesis, following which they independently self-renew throughout murine life. That MPPs are generated from HSCs in a transplant setting^{12,13,14,15,16} may underscore their potential in an experimental setting versus in vivo function. Both MPPs and HSCs contribute to differentiated progenitors and peripheral blood production, with a possible slight bias of contribution from ancestral MPP lineages. These data align with lineage tracing data^2,3 and reports of embryonic MPPs^2,38. HSCs and MPPs then grow in lockstep over life with indistinguishable clonal dynamics, reaching 25,000–100,000 cells, similar to the human HSC pool size (20,000–200,000 stem cells)^5,20 and reminiscent of the indicated conservation of stem cell numbers across mammalian species³⁹. Considering the log-fold difference in body mass and consequent demands on blood production, this similarity may be surprising. In both organisms, but especially the mouse, the number of stem cells exceeds apparent lifetime need; the stem cell compartment of a single mouse can fully reconstitute the blood of approximately 50 transplant recipients⁴⁰. A large stem cell pool may confer an advantage in the face of naturally occurring exposures to environmental pathogens⁴¹ and tissue injury, through increased tolerance to stem cell losses and improved adaptation afforded by somatic cell diversity.

Somatic mutation rates scale inversely with mammalian lifespan in colonic epithelium—mice accumulate mutations 20 times faster than humans, aligned with the difference between their lifespans⁹. Our data show this pattern does not extend to blood; the murine HSC mutation rate is only 3-fold higher than human^5,6,20,21 despite a 35-fold shorter lifespan, indicating tissue-specific evolutionary constraints on mutation rates across species. Indeed, somatic mutation rates in germline cells are lower in mouse than in human⁴² and under the influence of distinct factors such as effective population size and age of reproductive maturity⁴³. In blood, a low somatic mutation rate may minimize entry of detrimental disease-causing mutations, which, when combined with a large stem cell pool, may also reduce the probability that such mutations become established in the pool. Alternatively, the low HSC mutation rate may simply be a feature of phylogenetic legacy⁴⁴.

Patterns of somatic evolution in humans provide one plausible mechanism by which ageing phenotypes occur: the presence of clonal expansions in older human blood driven by somatic mutations is associated with diseases of ageing⁸. However, we only observe small mutation-driven clonal expansions in murine blood by the end of life, indicating that any role age-associated haematopoietic oligoclonality has in human ageing is not shared by the laboratory mouse. The size of clonal expansions is constrained in mice partly because of infrequent HSC self-renewing divisions. Our data fit with mouse stem cells self-renewing every 6 weeks (1.8–13.2 weeks), in the range of previous estimates (every 4–24 weeks^45,46,47). While this is more frequent than human HSCs, which divide 1–2 times a year, for patterns of oligoclonality in humans to be observed in the shorter-lived mouse, stem cells would need to self-renew much more frequently. Long-lived species, such as non-human primates, may more closely resemble human haematopoietic population dynamics^48,49. Alternatively, mouse strains with higher HSC turnover⁵⁰ or exposed to more sustained systemic insults may better mimic human patterns of clonal haematopoiesis. Murine haematopoietic clones expand on systemic exposures, recapitulating patterns observed in human studies^34,35 and following murine transplant^8,31. Our data indicate a conserved selection landscape in mouse during homoeostatic haematopoiesis and under stress that is driven by mutations in some of the same genes as seen in humans and observable with highly sensitive sequencing. With the evolutionarily conserved constraints on stem cell dynamics in blood, together these drive a distinct pattern of somatic evolution over the murine lifespan. Our data provide a framework for the interpretation of future studies of HSC biology and ageing using the laboratory mouse.

Methods

Cohort

Wild-type C57BL/6 mice were bred at Baylor College of Medicine or received from the Aged Rodent Colony at the National Institute of Aging (Baltimore, MD). C57BL/6J:FVB/NJ F₁ hybrid mice were bred in the Niedernhofer laboratory at the University of Minnesota as previously described⁵². HET3 mice were bred at the Jackson Laboratories as described⁵³. C57BL/6 were housed at the AAALAC-approved Center for Comparative Medicine in BSL-2 suites. Experimental procedures were approved by the Baylor College of Medicine or University of Minnesota Institutional Animal Care and Use Committees and performed following the Office of Laboratory Animal Welfare guidelines and PHS Policy on Use of Laboratory Animals. Mice were housed with the same sex in ventilated cages, under a 14/10-h light–dark cycle with temperature and humidity control and enrichment material and fed ad libitum with a standard chow diet. Female mice were chosen for internal consistency and because of their slightly longer lifespan. All mice were nulliparous.

Haematopoietic progenitor purification

Whole bone marrow cells were isolated from murine hindlimbs and enriched for c-Kit+ haematopoietic progenitors prior to fluorescence-activated cell sorting using a BD Aria II and BD FACSDiva software (v.9.0). Whole bone marrow was incubated with anti-CD117 microbeads (Miltenyi Biotec) for 30 min at 4 °C following magnetic column enrichment (LS Columns, Miltenyi Biotec). Progenitor-enriched cells were stained with an antibody cocktail to identify specific progenitor populations using a recent consensus definition⁵¹. LSKs containing a mixture of stem and progenitor cells were defined as Lineage^–c-Kit⁺Sca-1⁺ (Lineage^– refers to being negative for expression of a set of lineage-defining markers indicated below). HSCs were defined as LSK⁺FLT-3^–CD48^–CD150⁺; MPPs were defined as LSK⁺FLT-3^–CD48^–CD150^–. MPP^GM was defined as LSK⁺FLT-3^–CD48⁺CD150^–, and MPP^Ly was defined as LSK⁺FLT-3⁺CD150⁻. The gating strategy is illustrated in Extended Data Fig. 1a. This immunophenotypic HSC population includes long-term stem cells with serial repopulating ability, whereas the MPP population is limited to short-term repopulation, as demonstrated in transplantation assays^{12,13,14,15,16}. For sorting HSCs from newborn pups, the lineage marker Mac1 was excluded because it is known to be highly expressed on fetal HSCs⁵⁴. Antibodies were c-Kit/APC, Sca-1/Pe-Cy7, FLT-3/PE, CD48/FITC, CD150/BV711, Lineage (CD4, CD8, Gr1, Mac1, Ter119)/eFlour-450 and purchased from BD Biosciences or eBioscience. All antibodies were used at 1:100 dilution for staining, except FLT-3/PE and CD150/BV71, which were used at 1:50 dilution. Flow cytometry data was analysed using FlowJo (v.10.8.1).

Single-cell haematopoietic colony expansion in vitro

Cell sorting was performed on a BD Arial II in two stages. First, HSCs and MPPs were sorted into separate tubes containing ice-cold fetal bovine serum using the ‘yield’ sort purity setting to maximize positive cells. Second, the cell populations from stage one were single-cell index-sorted into individual wells of 96-well flat-bottom tissue culture plates containing 100 µl of Methocult M3434 medium (Stem Cell) supplemented with 1% penicillin–streptomycin (ThermoFisher). No cytokine supplements were added to the base methylcellulose medium. Cells were incubated at 37 °C and 5% CO₂ for 14 ± 2 days, followed by manual assessment of colony growth. Colonies (more than 200 cells) were transferred to a fresh 96-well plate, washed once with ice-cold PBS and then centrifuged at 800g for 10 min. Supernatant was removed to 10–15 µl prior to DNA extraction on the fresh pellet. The Arcturus Picopure DNA Extraction kit (ThermoFisher) was used to purify DNA from individual colonies according to the manufacturer’s instructions; 62–88% of HSCs and MPPs produced colonies, indicating that we are sampling from representative populations in each individual compartment (Extended Data Fig. 1b). Extracted DNA from each colony was topped off with 50 µl Buffer RLT (Qiagen) and stored at −80 °C.

Laser capture microdissection

Matched colonic tissue from the three 30-month-old mice used in this study was dissected and snap-frozen at the time of bone marrow collection. Colon tissue sectioning and laser capture microdissection were performed as previously described⁵⁵. In brief, previously snap-frozen colon tissue was fixed in PAXgene FIX (Qiagen) at room temperature for 24 h and subsequently transferred into a PAXgene Stabilizer for storage until further processing at −20 °C. The fixed tissue was then paraffin-embedded, cut into 10-μm sections and mounted on polyethylene naphthalate membrane slides. Staining of histology sections was done using haematoxylin and eosin as previously described⁹, with scans of each section captured thereafter. Individual colonic crypts were identified, demarcated and isolated by laser capture microdissection using a Leica Microsystems LMD 7000 microscope (Extended Data Fig. 1d) followed by lysis using the Arcturus Picopure DNA Extraction kit (ThermoFisher).

Whole-genome sequencing

For low-DNA-input WGS of haematopoietic colonies (from young and aged mice) and colonic crypts (from aged mice), enzymatic fragmentation-based library preparation was performed on 1–10 ng of colony DNA, as previously described⁵⁵. WGS sequencing (2 × 150 bp) was performed at a median sequencing depth of 14× for haematopoietic colonies and 17× for colonic crypts on the NovaSeq platform. Reads were aligned to the GRCm38 mouse reference genome using bwa-mem. For whole-genome single-molecule (nanorate) sequencing, we used matched whole-blood genomic DNA collected from the three aged mice during tissue collection. Nanorate sequencing library preparation was performed as previously described¹⁹, followed by sequencing to 146–153× coverage on the Illumina Novaseq platform.

Somatic mutation identification and quality control in haematopoietic colonies

Single-nucleotide variants (SNVs) in each colony were identified using CaVEMan⁵⁶, including an unmatched normal mouse control sample that had previously undergone WGS (MDGRCm38is). Insertions and deletions were identified using cpgPindel⁵⁷. Filters specific to low-input sequencing artefacts were applied⁵⁵. As variant calling used an unmatched control, both somatic and germline variants were initially called. Germline variants and recurrent sequencing artefacts were then identified using pooled information across mouse-specific colonies and filters as follows: (1) Homopolymer run filter. To reduce artefacts due to mapping errors or introduced by polymerase slippage, SNVs and indels adjacent to a single nucleotide repeat of length 5 or more were excluded. (2) Strand bias filter. Variants supported by reads only in positive or negative directions are probably artefacts. For SNVs, a two-sided binomial test was used to assess if the proportion of forward reads among mutant allele-supporting reads differed from 0.5. Any variant with significantly uneven mutant read support (cutoff of P < 0.001) and with over 80% of unidirectional mutant reads was excluded. For each indel, if the Pindel call in the originally supporting colony lacked bidirectional support, the indel was excluded. (3) Beta-binomial filter. Variants were filtered on the basis of a beta-binomial distribution across all colonies, as previously described⁹. The beta-binomial distribution assesses the variance in mutant read support at all colonies for a given mutation. True somatic variants are expected to be present at high VAF (approximately 0.50) in some colonies and absent in others, yielding a high beta-binomial overdispersion parameter (ρ). By contrast, artefactual calls are likely to be present at low VAF across many colonies, which corresponds to low overdispersion. The maximum-likelihood estimate of the overdispersion parameter ρ was calculated for each loci. For samples with more than 25 colonies, SNVs with ρ < 0.1 and indels with ρ < 0.15 were discarded. For samples with fewer than 25 colonies, SNVs and indels with ρ < 0.20 were discarded. (4) VAF filters. Variants with VAF significantly lower than the expected VAF for clonal samples across all mutant genotyped colonies, as assessed with a binomial test with P threshold less than 0.001, were discarded. Additionally, variants with VAF less than half the median VAF of variants that pass the beta-binomial filter were discarded. (5) Germline filter. All sites at which the aggregate VAF is not significantly less than 0.45 are assumed to be germline and discarded. The aggregate VAF is derived from the mutant read count across all colonies for a sample. The binomial test with a confidence threshold less than 0.001 was used to assess departure from germline VAF. (6) Indel proximity filter. SNVs were discarded if they occurred in ten bps of a neighbouring indel. (7) Missing site filter. Loci at which genotype information is unavailable because of poor sequencing coverage will interfere with accurate phylogeny construction. Variants that have no genotype or coverage less than 6× in over one-third of samples were discarded. (8) Clustered site filter. SNVs and indels within 10 bp of a neighbouring SNV or indel, respectively, were filtered. (9) Non-variable site filter. Sites genotyped as mutant or wild type in all colonies do not inform phylogeny relationships and are probably recurrent artefacts or germline variants, and thus were discarded.

Some colonies were excluded on the basis of low coverage or evidence of non-clonality or contamination. Visual inspection of filtered variant VAF distributions per colony was used to identify colonies with mean VAF < 0.4 or with evidence of non-clonality (Extended Data Fig. 1c).

Mutation burden estimation

Total SNV burden from WGS of individual colonies was corrected for differing depths of sequencing using a per-sample asymptomatic regression fit⁵⁵ (Extended Data Fig. 1e). A linear mixed-effect model was used to estimate the rate of mutation acquisition with age, taking into account individual animals as a random effect as follows: ${\rm{burden}} \sim {\rm{age}}+(0+{\rm{age}}| {\rm{sampleID}})$.

We filtered nanorate sequencing calls as previously described¹⁹, with the following modifications: we excluded variants (1) mapped to the mitochondrial genome, (2) located within 15 bp of sequencing read ends or (3) observed in all duplex consensus reads, as these are probably germline events. Matched colony WGS data were used as a normal control. Mutation burdens were normalized to diploid genome size to determine the global SNV burdens.

Phylogeny construction and quality control

Phylogenetic trees were constructed on the basis of shared mutations between colonies for each mouse, as extensively described previously^5,21. The steps, in brief, were as follows: (1) Create genotype matrix. Every colony has high sequencing coverage (median 14×) distributed evenly across the genome, allowing the determination of a genotype for nearly every mutated site observed across colonies. Each locus was annotated as Present, Absent or Unknown in a read-depth-specific manner. The number of unattributable sites was low, allowing precise inferences of colony interrelatedness. (2) Infer phylogenetic tree from genotype matrix. We applied the maximum parsimony algorithm MPBoot to construct phylogenetic trees from the genotype matrix. Only SNVs were used to infer tree topology, but both SNVs and indels (if any) were assigned to inferred branches using treemut. Loci with unknown genotypes in at least one-third of colonies were annotated as missing sites and not used in phylogeny inference. (3) Normalize branch lengths for differing sequencing depth and sensitivity. Branch lengths at this stage are defined by the number of mutations supporting each branch (molecular time). However, each colony has slightly different sequencing coverage, which correlates with differences in mutation detection sensitivity. Thus, we normalized branch lengths on the basis of genome coverage to correct for sensitivity differences across colonies with varying depth, as described in ref. ²¹ (Extended Data Fig. 1e). (4) Annotate trees with phenotype and genotype information. Each terminal branch (tip) of a tree represents a specific colony. Thus, we annotated each branch of the tree with the sampled cell phenotype (HSC versus MPP).

Tree-level checks were used to identify any discordant branch assignments and assess the validity of tree topology. Any branches supported by variants with mean VAF < 0.4 likely contained contamination by non-clonal variants and indicated that the filtering strategy (see above) was insufficient. Similarly, the branch-level VAF distributions of every colony (tip) in the tree were manually inspected to confirm supporting variants were not present in unrelated portions of the tree (topology discordance). Finally, the trinucleotide spectra of individual somatic mutations were compared between those mutations located on shared branches (that is, mutations supported by more than two colonies) and mutations only observed once and thus present on terminal branches. Mutation spectra were highly similar, indicating that mutations not shared by more than one colony were not populated by a relative excess of artefacts (Extended Data Fig. 1f).

Population size trajectories

We use the phylodyn package, which uses the density of coalescent events (bifurcations) in a phylogenetic tree to estimate the trajectory of N(t)/λ(t) over time^5,20. Ultrametric lifespan-scaled trees were used to infer chronological timing. Under a neutral model of population dynamics, the phylogeny of a sample is a realization of the coalescent process. In the coalescent process, the rate of coalescent events at time t is proportional to the ratio of population size, N$(t)$, to the birth rate, λ(t) (which in the context of stem cell dynamics is the symmetric cell division rate). The sequence of intercoalescent intervals across any time interval [$t$₁, $t$₂] is informative about the value of the parameter ratio N(t)/λ(t) across the same time interval. We note that only with a constant cell division rate λ over time can the trajectory parameter be interpreted as a scalar multiple of the trajectory of population size N(t). Phylodyn assumes isochronous sampling and a neutrally evolving population. We overlaid separate population size trajectories for HSCs and MPPs in Fig. 3a.

Approximate Bayesian computation

We used inference from phylodynamic trajectories to inform the development of an HSC population dynamics model. Population size trajectories from phylodyn indicated two successive ‘epochs’ of exponential growth, with some variation in growth rate between epochs and a steady increase in population size over time (Fig. 3a). Given the constraint of tissue volumes, it may be implausible that the HSC population grows constantly. We reconcile this discrepancy by noting that there are very few late-in-life coalescences in our phylogenies and, as a consequence, the estimated phylodyn trajectory in late adulthood is associated with very wide credible intervals. We used a population growth model based on a linear birth–death process⁵⁸ (in which a population tends to grow exponentially, subject to stochastic fluctuations), together with a fixed upper bound N on population size. The model assumed a constant birth rate λ and constant death rate ν, with the population trajectory growing at a rate λ − ν in an epoch. The shape of the trajectory of N(t)/λ(t) depends on the cell division rate parameter $\lambda $, not only through the denominator in the ratio N(t)/λ(t), but also on λ − ν, through the tendency of the population size N(t) to grow exponentially at a rate λ − ν. In particular, if we increase the fixed upper limit N and at the same time increase the cell division rate λ, so that their ratio remains constant, the shape of the trajectory of N(t)/λ(t) will change as a consequence of the changes in the value of the parameter λ. This indicates that the parameters λ, ν, (in each epoch) and N, are all identifiable and so can be estimated separately. The identifiability of λ, ν and N are expanded on in Supplementary Note 4.

We applied Bayesian inference procedures²⁹ to estimate the parameters (λ, ν and N) of the bounded birth–death process. We used ABC. This method first generates simulations of population trajectories and (sample) phylogenetic trees across a lifespan. Each population simulation is run with specific values for the population dynamic parameters drawn from a previous distribution over biologically plausible ranges of parameter values. The ABC method includes a rejection step that retains only those parameter values that generated simulated phylogenies resembling the observed phylogeny (as measured by an appropriate Euclidean distance). The accepted simulations constitute a sample from the (approximate) posterior distribution. Population trajectories and sample phylogenies were simulated using the rsimpop R package. Approximate posterior distributions were computed using the R package abc. We specified uniform joint prior densities for λ, ν and N that encompassed published estimates for N (population size) and λ (symmetric division rate)^{16,45,46,47,59}: N ranged from 10² to 10⁵ cells, λ ranged from 0.01 to 0.15 cell divisions per day, and ν ranged from 0 to $\lambda $, such that the growth rate (λ − ν) is always positive (as observed in the phylodyn trajectories).

Our population dynamics model was a birth–death process incorporating two separate growth epochs. The first (early) epoch lasted until 10 weeks post-conception, and the second (later) epoch lasted from 10 weeks onwards and corresponded to murine adulthood. Inferences were weak for the early epoch; thus, the later epoch was used for parameter inferences. Posterior densities from the three older mice were computed using the ‘rejection’ method (Extended Data Fig. 6) and pooled to yield parameter estimates and credible intervals.

Early life polytomy analysis

The polytomies were used to estimate lower and upper bounds for the mutation rate per symmetric division during embryogenesis. The method detailed in ref. ²⁰ was used, whereby the number of edges with zero mutation counts at the top of the tree (up to the first 12 mutations) is inferred from the number and degree of polytomies assuming an underlying tree with binary bifurcations. The mutations per division are assumed to be Poisson distributed. A maximum-likelihood range is then calculated in two steps: first, using the 95% CI of the proportion of zero length edges, with this next leading to a maximum-likelihood estimate for the Poisson rate. Sample M7183 lacked sufficient early life diversity (fewer than 10 unique lineages in 12 mutations molecular time) and was excluded.

Shared variants between blood and colonic crypts

Mutation genotype matrices (described above) were generated for colonic crypt samples at loci observed in truncal (shared) branches in the matched HSC tree. Every variant was annotated as present or absent for each colonic crypt. We applied two stages to crypt annotation. First, a crypt sample was marked positive if the given variant exceeded a per-sample minimum VAF threshold. The minimum VAF threshold was defined as half the median VAF for all pass-filter colonic crypt variants (as described above). Next, for each variant represented in at least one crypt, any remaining crypt with greater than 2 mutant allele read support was marked positive. This tiered definition allowed for shared variant capture despite differences in coverage among crypt samples. The proportion of a shared variant present among crypt samples was illustrated as a pie chart and annotated to the respective branch of the matched HSC tree (Extended Data Fig. 3).

Mutational signature analysis

We used the Hierarchical Dirichlet Process (HDP) algorithm to extract mutation signatures across aged and young HSC and MPP colony samples, following the process detailed in ref. ⁶⁰. Previous work in humans has applied mutation signature extraction to SNVs found only on terminal branches of phylogenetic trees: such terminal branches displayed mutation burdens in excess of 1,000 mutations, depending on the organ. Given the low mutation burden in mouse haematopoietic colonies (terminal branch lengths spanning 30–150 mutations) and thus reduced mutational information, we used all branches with length greater than or equal to 30 mutations as input. To circumvent any bias against shared variants, branches with fewer than 30 SNVs were collapsed to a single ‘shared branch’ sample. We generated mutation count matrices for each branch, using the 96 possible trinucleotide mutational contexts as input to the R package hdp. HDP was run (1) without priors (de novo), (2) with the reference catalogue of all 79 signatures derived from the PanCancer Analysis of Whole Genomes study (COSMIC v.3.3.1) as priors or (3) with the signatures previously defined as active in mouse colon⁹, SBS1, SBS5, SBS18, as priors. Trinucleotide signature definitions were adjusted to mouse genome mutation opportunities before use as priors, and all prior signatures were weighted equally. Signature extraction parameters (1) and (2) produced profiles that did not resemble any existing signatures (cosine similarity < 0.9), probably because of relatively limited SNV burden in mouse colony data. Use of mouse colon signatures as prior information (3) yielded four signature components. Two signature components demonstrated high similarity to SBS1 and SBS5 (cosine > 0.9). The remaining two unknown components were deconvoluted to reattribute their composition to known signatures using the fit_signatures function from sigfit. This yielded three components with a reconstruction cosine similarity metric exceeding 0.99 for similarity to SBS1, SBS5 and SBS18, indicating these three signatures explain the majority of our data (Extended Data Fig. 4a). We surmise the final reattribution step was necessary because of the log-fold lower SNV burdens in mouse blood colonies (30–200 mutations) relative to other tissues examined in previous work (more than 1,000 mutations).

Branch signature assignment and analyses

For each mouse, we pooled the assigned SNVs into a ‘private’ or ‘shared’ category depending on whether the variant maps to a shared branch. Signature attribution to signatures SBS1, SBS5 and SBS18 was then carried out for each of these per mouse category using sigfit::fit_to_signature with the default ‘multinomial’ model. The per-branch attributions were then carried out by (1) assigning a per-mutation signature membership probability and then (2) summing these signature membership probabilities over all SNVs assigned to a branch to obtain a branch-level signature attribution proportion. The per-mutation signature probability was calculated using

$$P({\rm{mutation}}\in {\rm{Sig}})=\frac{P({\rm{mutation}}\in {\rm{Sig}}){P}_{0}({\rm{mutation}}\in {\rm{Sig}})}{{\varSigma }_{{{\rm{Sig}}}^{{\prime} }\in \{{\rm{SBS}}1,{\rm{SBS}}5,{\rm{SBS}}18\}}P({\rm{mutation}}\in {\rm{Sig}}{\prime} ){P}_{0}({\rm{mutation}}\in {\rm{Sig}}{\prime} )}$$

where the prior probability, ${P}_{0}({\rm{mutation}}\in {\rm{Sig}})$, is given by the mean Sigfit attribution probability of the specified signature, Sig, for the category that the mutation belongs to.

A linear mixed-effect model was used to assess the relationship between age and the signature-specific substitution burden for each colony while accounting for repeated measures. The signature-specific burdens per colony were estimated using a linear mixed model (R package lme4) with age as a random effect and mouse ID as grouping variable:

$${{\rm{burden}}}_{{\rm{signature}}} \sim {\rm{age}}+(0+{\rm{age}}| {\rm{mouseID}}).$$

Hidden Markov tree approach

Modelling the ancestral unobserved MPP and HSC states with a hidden Markov tree

We defined three unobservable (‘hidden’) ancestral states, EMB, HSC and MPP, and used the observed outcomes (HSC or MPP tip states) to infer the transition probabilities between these identities and the most likely sequence of cell identity transitions during life. The transitions between states are modelled by a discrete time Markov chain with one step in time representing one mutation in molecular time. We require the root of the tree, presumably the zygote, to start in the ‘EMB’ state and to stay in that state until ten mutations in molecular time. After ten mutations, the cell then has a non-zero probability of transitioning to another state given by the transition matrix M:

$$M=\left(\begin{array}{ccc}1-{p}_{{\rm{HSC}}- > {\rm{MPP}}} & {p}_{{\rm{HSC}}- > {\rm{MPP}}} & 0\\ {p}_{{\rm{MPP}}- > {\rm{HSC}}} & 1-{p}_{{\rm{MPP}}- > {\rm{HSC}}} & 0\\ {p}_{{\rm{EMB}}- > {\rm{HSC}}} & {p}_{{\rm{EMB}}- > {\rm{MPP}}} & 1-{p}_{{\rm{EMB}}- > {\rm{HSC}}}-{p}_{{\rm{EMB}}- > {\rm{MPP}}}\end{array}\right)$$

This then implies the following transition probabilities for branch u having length l(u) (excluding any overlap with molecular time less than ten mutations), starting in state i and ending in state k:

$${P}_{i,k}(u)={({M}^{l(u)})}_{i,k}$$

Now, for a node that is in a specified state, the probability of descendent states is independent of the rest of the tree. This conditional independence property facilitates recursive calculation of a best path (‘Viterbi path’), the likelihood of the Viterbi path and the full likelihood of the observed phenotypes given the model. The approach is essentially an inhomogeneous special case of the approach previously described⁶¹.

Upward algorithm for determining likelihood of the observed states given M and a prior probability of root state π

The probability of the observed data descendant from a node u whose end of branch state is i is given by

$${P}_{u}({D}_{u}| i)=\prod _{v\in {\rm{children}}(u)}\left(\mathop{\sum }\limits_{k=1}^{S}{P}_{i,k}(v){P}_{v}({D}_{v}| k)\right)$$

where S is the number of hidden states (S = 3 in our use), and D_u denotes the observed data descendant of u: that is, the observed tip phenotypes of the clade defined by u.

Initialization of terminal branches

The probability of observing a matching phenotype is assumed to be

$${P}_{u}({\rm{Observed}}\;{\rm{Phenotype}}\;{\rm{of}}\;u=i| i)=1-{\epsilon }$$

The probability of observing a mismatching phenotype, $j\ne i$, is

$${P}_{u}({\rm{Observed}}\;{\rm{Phenotype}}\;{\rm{of}}\;u=j| i)=0.5{\epsilon }$$

The root probability ${P}_{{\rm{root}}}({D}_{{\rm{root}}}| i)$ is calculated recursively from the above, and the model likelihood is given by

$$P=\mathop{\sum }\limits_{i=1}^{S}{{\rm{\pi }}}_{i}{P}_{{\rm{root}}}({D}_{{\rm{root}}}| i)$$

Given the two-stage cell sorting approach described above, we assume nearly error-free phenotyping and set ϵ = 10⁻¹².

Determining the most likely sequence of hidden end-of-branch states

This Viterbi-like algorithm can be run in conjunction with the upward algorithm. Here, instead of summing over all possible states, we keep track of the most likely descendant states for each possible state of the current node u.

The quantity δ_u(i) is the probability of the most likely sequence of descendant states given that node u ends in state i:

$${\delta }_{u}(i)=\prod _{v\in {\rm{children}}(u)}(\mathop{\max }\limits_{k}\{{\delta }_{v}(k){P}_{i,k}(v)\})$$

Additionally, for each node, we store the most probable child states given that u is in state i:

$${\varPsi }_{u,v}(i)={{\rm{argmax}}}_{k}\{{\delta }_{v}(k){P}_{i,k}(v)\}$$

The tip deltas are initialized using the emission probabilities:

$${\delta }_{u}(i)={P}_{u}({\rm{Observed}}\;{\rm{Phenotype}}\;{\rm{of}}\;u| i)$$

The above provides a recipe for recursively finding δ_root(i) and is combined with prior root probability π to give the most likely root state, max_k{δ_root(i)}; in our case, we set the prior probability of ‘EMB’ to unity, so EMB is the starting state. The child node states are then directly populated using Ψ.

Targeted duplex-consensus sequencing

Genomic DNA from freshly collected peripheral blood was purified using the Zymo Quick-DNA Miniprep Plus kit according to the manufacturer’s instructions. 1,650 ng of high-molecular-weight DNA was ultrasonically sheared to an average 300 bp fragment size using a Covaris M220 and ligated to duplex identifier sequencing adaptors⁶² using the Twinstrand Biosciences DuplexSeq library prep kit. A large input of gDNA was used to ensure that the number of input genomic equivalents (about 275,000–330,000 genomes) did not limit the achievable duplex sensitivity. A custom baitset of biotinylated probes was used to enrich sequences targeting mouse orthologues of common human clonal haematopoiesis driver genes over two overnight hybridization reactions. Our target panel spanned 61.8 kb and captured homologous regions from the entire coding region of the following genes: Dnmt3a, Tet2, Asxl1, Trp53, Rad21, Cux1, Runx1, Bcor and Bcorl1 and specific exons with hotspot mutations (as observed in COSMIC) for the following genes: Ppm1d, Sf3b1, Srsf2, U2af1, Zrsr2, Idh1, Idh2, Gnas, Gnb1, Cbl, Jak2, Ptpn11, Brcc3, Nras and Kras. Targeted loci encompass more than 95% of human clonal haematopoiesis events⁶³ and are described in Supplementary Data 2. Libraries were sequenced on the NovaSeq platform to a raw depth between 1 and 3 million reads, corresponding to duplex-consensus depths between 30,000 and 50,000× that vary across targeted exons. Quality control of duplex sequencing is discussed in Supplementary Note 3.

Variant identification in targeted gene duplex-consensus sequencing

Duplex-consensus and single-strand consensus reads were generated using the fgbio suite of tools according the fgbio Best Practices FASTQ to Consensus Pipeline Guidelines (https://github.com/fulcrumgenomics/fgbio/blob/main/docs/best-practice-consensus-pipeline.md). To build a duplex-consensus read, we required at least three reads in each supporting read family (that is, at least three sequenced polymerase chain reaction duplicates of matched top and bottom strands from an original DNA molecule). The ‘DuplexSeq Fastq to VCF’ (v.3.19.1) workflow hosted on DNANexus was also used to generate duplex-consensus reads. Next, VarDict⁶⁴ was used to identify all putative variants, followed by functional annotation using Ensembl Variant Effect Predictor⁶⁵. Finally, numerous post-processing filters were applied to remove false positives and artefactual variants: (1) Quality flag filter. VarDict annotates all variants using a series of quality flags that assess mapping and read-level fidelity⁶⁴. Any variant with a quality flag other than ‘PASS’ was discarded. (2) Read support filter. Duplex sequencing enables detection of somatic variants even from a single read⁶²; however, variants supported by a consensus read (singletons) were found to be highly enriched for spurious calls. Thus, any variant supported by a single read was discarded. (3) Mismatches per read filter. Variants were excluded if the mean number of mismatches per supporting read exceeded 3.0. (4) End repair and A-tailing artefact filter. Library preparation enzymatic steps may introduce false-positive SNVs near read ends due to misincorporation of adenine bases during A-tailing or mistemplating during blunting of fragmented 3′ ends. The fgbio FilterSomaticVcf tool was used to assess the probability that any variant within 20 bp of read ends was due to such enzymatic errors; probable end-repair artefacts were discarded. (5) Read position filter. Variants in positions less than or equal to 15 bp from the 5′ or 3′ end of a consensus read were observed to be enriched for spurious variants based on trinucleotide signature and were discarded. (6) Oxidative damage filter. Mechanical fragmentation (prior to duplex adaptor attachment) creates oxidative DNA damage, often in the form of 8-oxoguanine^66,67, which mispairs with thymine and is fixed after polymerase chain reaction amplification. Any variants fitting the previously described oxidative artefact signature (SBS45) were discarded. (7) Sequencing coverage filter. Variants at loci with duplex depth of less than or equal to 20,000× were considered undersequenced and discarded. (8) Strand bias filter. We used a Fisher’s exact test to assess for forward or reverse strand bias between wild-type and mutant reads. Any variant enriched for unidirectional read support was discarded. (9) Recurrent variant filter. Variants present in 5% or more of samples per duplex-sequencing batch or in five or more independent samples were discarded. (10) Indel length filter. Long insertions or deletions could be attributed to poor mapping, erroneous fragment ligation or false-positive calls by VarDict. Any indels greater than or equal to 15 bp were excluded. (11) High-VAF filter. Germline variants display a VAF of 0.5 or 1.0. Any variants with VAF 0.4 or more were excluded as putative germline variants. (12) Impact filter. Clonal haematopoiesis is driven by functional coding sequence changes in driver genes. Thus, synonymous mutations were excluded during generation of the dot-plots in Fig. 4 and Extended Data Fig. 10c. This filter was not used for analyses that require synonymous variant information (dN/dS, fitness effect estimation). (13) Homologous position filter. Residues conserved with humans are likely to be functional in mice. Variants at loci without a matching reference allele at homologous position in humans were discarded. This filter primarily eliminated intronic variants and was not used for analyses incorporating synonymous variant information. Variants identified are detailed in Supplementary Data 2.

Murine perturbation experiments

Perturbation experiments were initiated in aged (21-month) male and female mice unless otherwise described. Mice were randomly allocated to control or experimental groups. Investigators were not blinded to the group assignment during experiments. For Mycobacterium avium infection, mice were infected with 2 × 10⁶ colony-forming units of M. avium delivered intravenously as previously described⁶⁸. Infected and uninfected control mice were housed in a BSL-2 biohazard animal suite following infection. Mice were infected once every 8 weeks (twice in total) to ensure chronic infection. For cisplatin exposure, mice were exposed to 3 mg kg⁻¹ cisplatin delivered intraperitoneally every 4 weeks, as indicated. Dose spacing was selected to allow for sufficient recovery following myeloablation and blood counts were not altered in cisplatin-treated mice (Fig. 4c), indicating recovery of haematopoiesis. For 5-fluorouracil exposure, 150 mg kg⁻¹ 5-FU was delivered intraperitoneally every 4 weeks two times; this 5-FU dose has previously been shown to drive temporary activation of HSCs in mice^69,70. Exposure to an NME of murine transmissible pathogens was performed as previously described⁷¹. In brief, immune-experienced ‘pet store’ mice were purchased from pet stores around Minneapolis, MN. Aged (24-month) C57BL/6J:FVB/NJ laboratory mice were either directly cohoused with pet store mice or housed on soiled (fomite) bedding collected from cages of pet store mice. Mice were exposed to continuous fomite bedding for 1 month, followed by 5 months recovery on specific pathogen-free bedding before tissue collection. All NME work was performed in the Dirty Mouse Colony Core Facility at the University of Minnesota, a BSL-3 facility. Age-matched C57BL/6J:FVB/NJ F1 laboratory mice maintained in specific pathogen-free conditions were used as controls. For monitoring, peripheral blood (about 50 µl) was collected in EDTA-coated tubes and analysed on an OX-360 automated hemocytometer (Balio Diagnostics). For all aforementioned mouse cohorts, peripheral blood genomic DNA was purified and converted to duplex sequencing libraries as described above.

Differences in clone burden between control and treated cohorts was quantified using a Mann–Whitney test on cumulative VAFs per sample. Gene-level enrichment was measured using a Mann–Whitney test on the maximum VAF for a given gene per sample, normalized for coverage differences between samples. Gene-level dN/dS estimates were generated as described below.

dN/dS analysis

dN/dS can be used to assess for selection within somatic mutations by comparing the observed dN/dS to that expected under neutral selection. We use the R package dNdScv⁷² to estimate dN/dS ratios of somatic mutations derived from whole-genome and targeted-gene duplex-consensus sequencing. To incorporate mouse-specific differences in trinucleotide context composition and background mutation rates, we generated a murine reference CDS dataset using the buildref function and genome annotations in Ensembl (v.102). For the phylogenetic trees, we input all tree variants to the dndscv function. dN/dS output and all coding variants detected in trees are listed in Supplementary Data 1. To examine dN/dS in targeted duplex-consensus sequencing data, we pooled all variants observed in cross-sectionally sampled mice across ages and ran dndscv limited to exons only included on our targeted panel (Supplementary Data 2).

Targeted capture of tree variants

We designed a custom targeted DNA baitset (Agilent SureSelect) targeting mutations on the phylogenetic trees of the aged mice and then queried genomic DNA purified from matched peripheral blood for tree-specific mutations using high-depth targeted sequencing. The baitset was designed to capture mutations on the phylogenetic trees of all three aged mice (MD7180, MD7181 and MD7182) and to cover mutations found in HSCs, MPPs and LSKs. The baitset was designed as follows: (1) All variants on shared branches that pass the SureDesign tool’s ‘moderately stringent filters’. (2) All variants on a random subset of private branches that pass SureDesign’s ‘most stringent filters’. Approximately 25% of the private branches of each mouse were randomly selected. (3) The exons and 3′ and 5′ UTRs for all clonal haematopoiesis driver genes used in our duplex sequencing panel (listed above). Target-enriched libraries were generated according to the manufacturer’s protocol and sequenced using the Illumina Novaseq platform. Baits were sequenced to median depths of 2,616×, 2,549× and 2,628× for MD1780, MD7181 and MD7182, respectively.

To quantify the degree of HSC and MPP contribution to peripheral blood, we estimated the posterior distribution of true VAF for every mutation captured with our targeted baitset. This was done using the Gibbs sampling method previously developed⁷³. Then, for each molecular time t and for each branch that overlaps t, we estimate the VAF of a hypothetical mutation at time t. This is done by arranging our baitset variants in descending estimated VAF order at equally spaced intervals down the branch and then linearly interpolating the VAF at time t based on the estimated VAF of the neighbouring mutations. The aggregate VAF at time t for a tree or lineage is then calculated as the sum of the estimated VAFs of the overlapping branches at time t.

Maximum-likelihood estimates of fitness effects

Evolutionary framework

To generate estimates of fitness effects, mutation rates and population size, we applied an evolutionary framework based on continuous time branching for HSCs, as previously reported³⁶. The framework is based on a stochastic branching model of HSC dynamics, where variants with a variant-specific fitness effect, s, are acquired stochastically at a constant rate μ. Synonymous and non-synonymous mutations detected with duplex sequencing in untreated 24- to 25-month-old mice were used in the analysis. Synonymous and non-synonymous mutations were considered independently. Synonymous mutations are assumed to have no fitness effect and reflect behaviour under neutral drift, whereas non-synonymous mutations were hypothesized to reflect behaviour under a positive selective advantage. The density of variants declined at VAF 5 × 10⁻⁵, so to only include VAF ranges supported by informative variants, only variants above this threshold were included in maximum-likelihood estimations described below.

How the distribution of VAFs predicted by our evolutionary framework changes with age (t), the variant’s fitness effect (s), the variant’s mutation rate (μ), the population size of HSCs (N) and the time in years between successive symmetric cell differentiation divisions (τ) is given by the following expression for the probability density as a function of l = log(VAF):

$$\rho (l)=\frac{\theta }{(1-2{e}^{l})}e-\frac{{e}^{l}}{\varphi (1-2{e}^{l})}\,{\rm{\text{where}}}\,{\theta }=N\tau {\mu }\,\mathrm{and}\,{\varphi }=\frac{{e}^{{st}}-1}{2N\tau s}$$

The value of $\varphi =\frac{{e}^{{st}}-1}{2N\tau s}$ is the typical maximum VAF a variant can reach and this increases with fitness effect (s) and age (t). To reach VAFs greater than φ requires a variant to both occur early in life and stochastically drift to high frequencies, which is unlikely. Therefore, the density of variants falls off exponentially for VAFs greater than φ. For neutral mutations (s = 0),

$$\varphi =\frac{t}{2N\tau }$$

Because the mouse age t is known and the neutral φ is measurable from the data, the ratio φ/t allows us to infer Nτ from the distribution of neutral mutation VAFs. Because the neutral θ is measurable from the data and θ = Nτμ, we can also infer the neutral mutation rate (μ).

Probability density histograms, as a function of log-transformed VAFs, were generated using Doane’s method for log(VAF) bin size calculation. Densities were normalized by the product of bin sample size and width. Estimates for Nτ and μ were inferred using a maximum-likelihood approach, minimizing the L2 norm between the cumulative log densities and the predicted densities. For synonymous mutations, maximum-likelihood estimates were optimized for Nτ and μ. For non-synonymous mutations, variants with VAFs below the observed maximum synonymous VAF (1.99 × 10⁻⁴) were used—these variants are in the ‘neutral’ range—and estimates were optimized for with the Nτ estimated from synonymous mutations.

Differential fitness effects

We estimated the distribution of fitness effects across non-synonymous variants using our derived estimates of Nτ and non-synonymous μ. We parameterized the distribution of fitness effects using an exponential power distribution, which captures a strongly decreasing prevalence of mutations with high fitness:

$${\mu }_{{\rm{non}} \mbox{-} {\rm{n}}{\rm{e}}{\rm{u}}{\rm{t}}{\rm{r}}{\rm{a}}{\rm{l}}}(s)\propto \exp \left[-{\left(\frac{s}{d}\right)}^{\beta }\right]$$

The shape of the distribution was fixed to β = 3 (ref. ⁷⁴). Using the VAF density histograms from non-synonymous variants, we estimated the scale of the distribution and non-neutral mutation rate: ${\int }_{s=0}^{\infty }{\mu }_{{\rm{n}}{\rm{o}}{\rm{n}}\text{-}{\rm{n}}{\rm{e}}{\rm{u}}{\rm{t}}{\rm{r}}{\rm{a}}{\rm{l}}}(s)ds$. The maximum-likelihood fit predicted a scale of about d = 2 and the proportion of non-neutral non-synonymous mutations to be about 12% (Fig. 5b).

Statistics and reproducibility

For all presented data, the sample size n represents the number of biologically independent samples. Individual data points are displayed and represent independent biological replicates. For whole-genome-sequence experiments, colonies from at least three independent biologically replicate mice were queried at each age timepoint. For duplex experiments, each individual mouse queried is an independent biological replicate. In perturbation experiments, displayed data are aggregated from at least two independent replicate treatment cohorts. Mice were randomly allocated to control or experimental groups. Investigators were not blinded to the group assignment during experiments. Replicate experiments were designed identically. All replicate experiments were successful. All statistical analysis was performed using GraphPad Prism (GraphPad Software, v.10) or R (v.4.3) unless otherwise specified. The statistical tests used are detailed in the figure legends. Measure of centre and error bars are described in the figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

WGS data has been deposited at the European Nucleotide Archive at accession numbers ERP138320, ERP152795 and ERP144323. Targeted duplex sequencing data has been deposited at NCBI BioProject PRJNA1033340.

Code availability

SNVs and indels were detected using CaVEMan (v.1.14.0, https://github.com/cancerit/CaVEMan), cgpPindel (v.3.9.0, https://github.com/cancerit/cgpPindel) and VarDict (v.1.8.3, https://github.com/AstraZeneca-NGS/VarDictJava). Variants were annotated using VAGrENT (v.3.7.0, https://github.com/cancerit/VAGrENT) and Ensemble VEP (v.107-110.0, https://github.com/Ensembl/ensembl-vep). Phylogenies were constructed using MPBoot (v.1.1.0, https://github.com/diepthihoang/mpboot). Variants were assigned to phylogenies using Rtreemut (https://github.com/nangalialab/treemut). Population trajectories were inferred using phylodyn (https://github.com/mdkarcher/phylodyn). Bayesian inferences used the packages rsimpop (https://github.com/nangalialab/rsimpop) for simulations and abc (v.2.2.1, https://CRAN.R-project.org/package=abc) for ABC. Mutation signatures were inferred using hdp (https://github.com/nicolaroberts/hdp) and sigfit (v.2.2.0, https://github.com/kgori/sigfit). Duplex consensus reads were generated using the fgbio suite of tools (v.1.5.1–2.1.0, http://fulcrumgenomics.github.io/fgbio/). dN/dS ratios were calculated using dNdScv (v.0.1.0, https://github.com/im3sanger/dndscv). Population genetic analyses of clone sizes and parameter inferences were based on code available at https://github.com/blundelllab/ClonalHematopoiesis/. Other analyses were carried out using custom R scripts available at https://github.com/CDKapadia/somatic-mouse.

Change history

19 September 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41586-025-09635-2

References

Sender, R. & Milo, R. The distribution of cellular turnover in the human body. Nat. Med. 27, 45–48 (2021).
Article CAS PubMed Google Scholar
Patel, S. H. et al. Lifelong multilineage contribution by embryonic-born blood progenitors. Nature 606, 747–753 (2022).
Article ADS CAS PubMed Google Scholar
Sun, J. et al. Clonal dynamics of native haematopoiesis. Nature 514, 322–327 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Kucinski, I. et al. A time- and single-cell-resolved model of murine bone marrow hematopoiesis. Cell Stem Cell 31, 244–259.e10 (2024).
Article CAS PubMed PubMed Central Google Scholar
Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606, 343–350 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316.e4 (2018).
Article CAS PubMed PubMed Central Google Scholar
Jaiswal, S. & Ebert, B. L. Clonal hematopoiesis in human aging and disease. Science 366, eaan4673 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kapadia, C. D. & Goodell, M. A. Tissue mosaicism following stem cell aging: blood as an exemplar. Nat. Aging 4, 295–308 (2024).
Article PubMed Google Scholar
Cagan, A. et al. Somatic mutation rates scale with lifespan across mammals. Nature 604, 517–524 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Yuan, R. et al. Genetic coregulation of age of female sexual maturation and lifespan through circulating IGF1 among inbred mouse strains. Proc. Natl Acad. Sci. USA 109, 8224–8229 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Chin, D. W. L. et al. Aged healthy mice acquire clonal hematopoiesis mutations. Blood 139, 629–634 (2022).
Article CAS PubMed PubMed Central Google Scholar
Adolfsson, J. et al. Upregulation of Flt3 expression within the bone marrow Lin−Sca1+c-kit+ stem cell compartment is accompanied by loss of self-renewal capacity. Immunity 15, 659–669 (2001).
Article CAS PubMed Google Scholar
Kiel, M. J. et al. SLAM family receptors distinguish hematopoietic stem and progenitor cells and reveal endothelial niches for stem cells. Cell 121, 1109–1121 (2005).
Article CAS PubMed Google Scholar
Challen, G. A., Boles, N. C., Chambers, S. M. & Goodell, M. A. Distinct hematopoietic stem cell subtypes are differentially regulated by TGFβ1. Cell Stem Cell 6, 265–278 (2010).
Article CAS PubMed PubMed Central Google Scholar
Cabezas-Wallscheid, N. et al. Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Cell Stem Cell 15, 507–522 (2014).
Article CAS PubMed Google Scholar
Pietras, E. M. et al. Functionally distinct subsets of lineage-biased multipotent progenitors control blood production in normal and regenerative conditions. Cell Stem Cell 17, 35–46 (2015).
Article CAS PubMed PubMed Central Google Scholar
Takizawa, H., Regoes, R. R., Boddupalli, C. S., Bonhoeffer, S. & Manz, M. G. Dynamic variation in cycling of hematopoietic stem cells in steady state and inflammation. J. Exp. Med. 208, 273–284 (2011).
Article CAS PubMed PubMed Central Google Scholar
Busch, K. et al. Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546 (2015).
Article ADS CAS PubMed Google Scholar
Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).
Article ADS CAS PubMed Google Scholar
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Williams, N. et al. Life histories of myeloproliferative neoplasms inferred from phylogenies. Nature 602, 162–168 (2022).
Article ADS CAS PubMed Google Scholar
Machado, H. E. et al. Diverse mutational landscapes in human lymphocytes. Nature 608, 724–732 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).
Article ADS CAS PubMed Google Scholar
Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Bryder, D., Rossi, D. J. & Weissman, I. L. Hematopoietic stem cells: the paradigmatic tissue-specific stem cell. Am. J. Pathol. 169, 338–346 (2006).
Article CAS PubMed PubMed Central Google Scholar
Qin, P. et al. Integrated decoding hematopoiesis and leukemogenesis using single-cell sequencing and its medical implication. Cell Discov. 7, 2 (2021).
Article CAS PubMed PubMed Central Google Scholar
de Haan, G. & Van Zant, G. Dynamic changes in mouse hematopoietic stem cell numbers during aging. Blood 93, 3294–3301 (1999).
Article PubMed Google Scholar
Morrison, S. J., Wandycz, A. M., Akashi, K., Globerson, A. & Weissman, I. L. The aging of hematopoietic stem cells. Nat. Med. 2, 1011–1016 (1996).
Article CAS PubMed Google Scholar
Beaumont, M. A., Zhang, W. & Balding, D. J. Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035 (2002).
Article PubMed PubMed Central Google Scholar
Koonin, E. V. Splendor and misery of adaptation, or the importance of neutral null for understanding evolution. BMC Biol. 14, 114 (2016).
Article PubMed PubMed Central Google Scholar
Florez, M. A. et al. Clonal hematopoiesis: mutation-specific adaptation to environmental change. Cell Stem Cell 29, 882–904 (2022).
Article CAS PubMed PubMed Central Google Scholar
Beura, L. K. et al. Normalizing the environment recapitulates adult human immune traits in laboratory mice. Nature 532, 512–516 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Matatall, K. A. et al. Chronic infection depletes hematopoietic stem cells through stress-induced terminal differentiation. Cell Rep. 17, 2584–2595 (2016).
Article CAS PubMed PubMed Central Google Scholar
Coombs, C. C. et al. Therapy-related clonal hematopoiesis in patients with non-hematologic cancers is common and associated with adverse clinical outcomes. Cell Stem Cell 21, 374–382.e4 (2017).
Article CAS PubMed PubMed Central Google Scholar
Bolton, K. L. et al. Cancer therapy shapes the fitness landscape of clonal hematopoiesis. Nat. Genet. 52, 1219–1226 (2020).
Article CAS PubMed PubMed Central Google Scholar
Watson, C. J. et al. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science 367, 1449–1454 (2020).
Article ADS CAS PubMed Google Scholar
Kimura, M. & Ohta, T. The average number of generations until fixation of a mutant gene in a finite population. Genetics 61, 763–771 (1969).
Article CAS PubMed PubMed Central Google Scholar
Kobayashi, M. et al. HSC-independent definitive hematopoiesis persists into adult life. Cell Rep. 42, 112239 (2023).
Article CAS PubMed PubMed Central Google Scholar
Abkowitz, J. L., Catlin, S. N., McCallie, M. T. & Guttorp, P. Evidence that the number of hematopoietic stem cells per animal is conserved in mammals. Blood 100, 2665–2667 (2002).
Article CAS PubMed Google Scholar
Dykstra, B. et al. Long-term propagation of distinct hematopoietic differentiation programs in vivo. Cell Stem Cell 1, 218–229 (2007).
Article CAS PubMed Google Scholar
Gros, P. & Casanova, J.-L. Reconciling mouse and human immunology at the altar of genetics. Annu. Rev. Immunol. 41, 39–71 (2023).
Article CAS PubMed Google Scholar
Lindsay, S. J., Rahbari, R., Kaplanis, J., Keane, T. & Hurles, M. E. Similarities and differences in patterns of germline mutation between mice and humans. Nat. Commun. 10, 4053 (2019).
Article ADS PubMed PubMed Central Google Scholar
Bergeron, L. A. et al. Evolution of the germline mutation rate across vertebrates. Nature 615, 285–291 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Gould, S. J., Lewontin, R. C., Maynard Smith, J. & Holliday, R. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. Lond. B Biol. Sci. 205, 581–598 (1997).
Google Scholar
Cheshier, S. H., Morrison, S. J., Liao, X. & Weissman, I. L. In vivo proliferation and cell cycle kinetics of long-term self-renewing hematopoietic stem cells. Proc. Natl Acad. Sci. USA 96, 3120–3125 (1999).
Article ADS CAS PubMed PubMed Central Google Scholar
Wilson, A. et al. Hematopoietic stem cells reversibly switch from dormancy to self-renewal during homeostasis and repair. Cell 135, 1118–1129 (2008).
Article CAS PubMed Google Scholar
Bernitz, J. M., Kim, H. S., MacArthur, B., Sieburg, H. & Moore, K. Hematopoietic stem cells count and remember self-renewal divisions. Cell 167, 1296–1309.e10 (2016).
Article CAS PubMed PubMed Central Google Scholar
Shepherd, B. E. et al. Hematopoietic stem-cell behavior in nonhuman primates. Blood 110, 1806–1813 (2007).
Article CAS PubMed PubMed Central Google Scholar
Yu, K.-R. et al. The impact of aging on primate hematopoiesis as interrogated by clonal tracking. Blood 131, 1195–1205 (2018).
Article CAS PubMed PubMed Central Google Scholar
Chen, J., Astle, C. M. & Harrison, D. E. Genetic regulation of primitive hematopoietic stem cell senescence. Exp. Hematol. 28, 442–450 (2000).
Article CAS PubMed Google Scholar
Challen, G. A., Pietras, E. M., Wallscheid, N. C. & Signer, R. A. J. Simplified murine multipotent progenitor isolation scheme: establishing a consensus approach for multipotent progenitor identification. Exp. Hematol. 104, 55–63 (2021).
Article PubMed Google Scholar
Ahmad, A. et al. ERCC1-XPF endonuclease facilitates DNA double-strand break repair. Mol. Cell. Biol. 28, 5082–5092 (2008).
Article CAS PubMed PubMed Central Google Scholar
Nadon, N. L., Strong, R., Miller, R. A. & Harrison, D. E. NIA interventions testing program: investigating putative aging intervention agents in a genetically heterogeneous mouse model. EBioMedicine 21, 3–4 (2017).
Article PubMed Google Scholar
Bowie, M. B. et al. Hematopoietic stem cells proliferate until after birth and show a reversible phase-specific engraftment defect. J. Clin. Invest. 116, 2808–2816 (2006).
Article CAS PubMed PubMed Central Google Scholar
Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2021).
Article CAS PubMed Google Scholar
Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
Article PubMed Google Scholar
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Article CAS PubMed PubMed Central Google Scholar
Kendall, D. G. Stochastic processes and population growth. J. R. Stat. 11, 230–282 (1949).
Article MathSciNet Google Scholar
Challen, G. A., Boles, N., Lin, K. K.-Y. & Goodell, M. A. Mouse hematopoietic stem cell identification and analysis. Cytometry A 75, 14–24 (2009).
Article PubMed PubMed Central Google Scholar
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Article ADS CAS PubMed Google Scholar
Durand, J.-B., Goncalves, P. & Guedon, Y. Computational methods for hidden Markov tree models-an application to wavelet trees. IEEE Trans. Signal Process. 52, 2551–2560 (2004).
Article ADS MathSciNet Google Scholar
Kennedy, S. R. et al. Detecting ultralow-frequency mutations by duplex sequencing. Nat. Protoc. 9, 2586–2606 (2014).
Article CAS PubMed PubMed Central Google Scholar
Challen, G. A. & Goodell, M. A. Clonal hematopoiesis: mechanisms driving dominance of stem cell clones. Blood 136, 1590–1598 (2020).
CAS PubMed PubMed Central Google Scholar
Lai, Z. et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, e108 (2016).
Article PubMed PubMed Central Google Scholar
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 7, 122 (2016).
Article Google Scholar
Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013).
Article CAS PubMed PubMed Central Google Scholar
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Feng, C. G., Weksberg, D. C., Taylor, G. A., Sher, A. & Goodell, M. A. The p47 GTPase Lrg-47 (Irgm1) links host defense and hematopoietic stem cell proliferation. Cell Stem Cell 2, 83–89 (2008).
Article CAS PubMed PubMed Central Google Scholar
Lerner, C. & Harrison, D. E. 5-Fluorouracil spares hemopoietic stem cells responsible for long-term repopulation. Exp. Hematol. 18, 114–118 (1990).
CAS PubMed Google Scholar
Dong, S. et al. Chaperone-mediated autophagy sustains haematopoietic stem-cell function. Nature 591, 117–123 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Camell, C. D. et al. Senolytics reduce coronavirus-related mortality in old mice. Science 373, eabe4832 (2021).
Article CAS PubMed PubMed Central Google Scholar
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).
Article CAS PubMed PubMed Central Google Scholar
Chapman, M. S. et al. Clonal dynamics after allogeneic haematopoietic cell transplantation. Nature 635, 926–934 (2024).
Poon, G. Y. P., Watson, C. J., Fisher, D. S. & Blundell, J. R. Synonymous mutations reveal genome-wide levels of positive selection in healthy tissues. Nat. Genet. 53, 1597–1605 (2021).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

C.D.K. is supported by grant no. F30DK131638 and is a visiting scholar at the Wellcome Sanger Institute. The Goodell lab is supported by grants from the National Institutes of Health, including grant nos. AG036695, CA183252, CA237291, DK092883, 1P01CA265748, F30HD111129 (S.W.) and the Milky Way Research Foundation. J.N. is supported by a Cancer Research UK Advanced Clinical Fellowship, and work in the Nangalia lab is supported by Wellcome, Cancer Research UK, Alborada Trust, Blood Cancer UK, Leukemia and Lymphoma Society, and the WBH Foundation. L.J.N. is supported by NIH grant nos. AG063543 and AG056278. The Niedernhofer lab is supported by grant nos. AG063543 and AG063543-02S1. The Harrison lab is supported by grant no. 5U01AG022308. D. Le, M.A.F. and K.Y.K. were supported by grant nos. R35 HL155672 (K.Y.K.), R01 AI141716 (K.Y.K.), F31 HL154661 (D. Le), F31 HL156500 (M.A.F.) and a minority graduate fellowship from the American Society of Hematology (M.A.F.). K.N. is supported by the Wellcome Trust and CRUK. Adobe Stock library images used in illustrations were covered under a Baylor College of Medicine education license. We are grateful for the assistance of R. D. O’Kelly and M. Pierson in conducting NME experiments and the CASM Support team at the Wellcome Sanger Institute. We thank E. Laurenti, S. Loughran and A. R. Green for constructive discussions and J. T. Gebert and H. L. Chan for the critical feedback.

Author information

Authors and Affiliations

Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
Chiraag D. Kapadia, Sarah Waldvogel, Xiaoyan Zhang, Josephine De La Fuente, Rogelio Aguilar, Alejandra Martell, Anna Guzman & Margaret A. Goodell
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Nicholas Williams, Kevin J. Dawson, Kudzai Nyamondo, Sreeya Kodavali, Alex Cagan, Daniel Leongamornlert, Emily Mitchell, Krzysztof Sosnowski, Peter J. Campbell & Jyoti Nangalia
Early Cancer Institute, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Caroline Watson & Jamie Blundell
Institute on the Biology of Aging and Metabolism, Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
Matthew J. Yousefzadeh & Laura J. Niedernhofer
Columbia Center for Translational Immunology, Columbia Center for Human Longevity, Department of Medicine, Columbia University Medical Center, New York, NY, USA
Matthew J. Yousefzadeh
Department of Pediatrics, Division of Infectious Diseases, Baylor College of Medicine, Houston, TX, USA
Duy Le, Marcus A. Florez & Katherine Y. King
Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
Kudzai Nyamondo, Sreeya Kodavali, Emily Mitchell, Krzysztof Sosnowski, Peter J. Campbell & Jyoti Nangalia
Departments of Genetics, Pathology & Veterinary Medicine, University of Cambridge, Cambridge, UK
Alex Cagan
Department of Haematology, University of Cambridge, Cambridge, UK
Emily Mitchell & Jyoti Nangalia
The Jackson Laboratory, Bar Harbor, ME, USA
David Harrison

Authors

Chiraag D. Kapadia
View author publications
Search author on:PubMed Google Scholar
Nicholas Williams
View author publications
Search author on:PubMed Google Scholar
Kevin J. Dawson
View author publications
Search author on:PubMed Google Scholar
Caroline Watson
View author publications
Search author on:PubMed Google Scholar
Matthew J. Yousefzadeh
View author publications
Search author on:PubMed Google Scholar
Duy Le
View author publications
Search author on:PubMed Google Scholar
Kudzai Nyamondo
View author publications
Search author on:PubMed Google Scholar
Sreeya Kodavali
View author publications
Search author on:PubMed Google Scholar
Alex Cagan
View author publications
Search author on:PubMed Google Scholar
Sarah Waldvogel
View author publications
Search author on:PubMed Google Scholar
Xiaoyan Zhang
View author publications
Search author on:PubMed Google Scholar
Josephine De La Fuente
View author publications
Search author on:PubMed Google Scholar
Daniel Leongamornlert
View author publications
Search author on:PubMed Google Scholar
Emily Mitchell
View author publications
Search author on:PubMed Google Scholar
Marcus A. Florez
View author publications
Search author on:PubMed Google Scholar
Krzysztof Sosnowski
View author publications
Search author on:PubMed Google Scholar
Rogelio Aguilar
View author publications
Search author on:PubMed Google Scholar
Alejandra Martell
View author publications
Search author on:PubMed Google Scholar
Anna Guzman
View author publications
Search author on:PubMed Google Scholar
David Harrison
View author publications
Search author on:PubMed Google Scholar
Laura J. Niedernhofer
View author publications
Search author on:PubMed Google Scholar
Katherine Y. King
View author publications
Search author on:PubMed Google Scholar
Peter J. Campbell
View author publications
Search author on:PubMed Google Scholar
Jamie Blundell
View author publications
Search author on:PubMed Google Scholar
Margaret A. Goodell
View author publications
Search author on:PubMed Google Scholar
Jyoti Nangalia
View author publications
Search author on:PubMed Google Scholar

Contributions

C.D.K., J.N. and M.A.G. designed the experiments. J.N. and M.A.G. supervised the project. C.D.K. performed cell sorting and in vitro culture with support from S.W., X.Z., R.A., A.M., A.G., E.M., S.K. and K.S. C.D.K. performed genomic, phylogenetic, signature and population dynamics analyses with support from N.W., K.J.D., D. Leongamornlert, J.D.L.F., E.M., P.J.C., M.A.G. and J.N. N.W. developed hidden Markov modelling, and K.J.D. performed population dynamic inferences. A.C. and K.N. prepared colonic crypt microdissections. S.K. and K.S. assisted with colony culture. C.D.K. performed mouse experiments with advice and assistance from M.J.Y., S.W., K.N., A.C., D. Le, M.A.F., R.A., A.M., A.G., D.H., K.Y.K. and L.J.N. C.W. and J.B. developed the population genetic analyses of clone sizes and parameter inferences in Fig. 5. C.D.K., J.N. and M.A.G. wrote and edited the manuscript. All authors reviewed and edited the manuscript.

Corresponding authors

Correspondence to Margaret A. Goodell or Jyoti Nangalia.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Cynthia Dunbar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Cell isolation strategy and quality control.

a) Sorting strategy for single HSCs and MPPs from young and aged mice. Progenitor-enriched bone marrow was stained as described in the Methods, and then single cells were sorted into individual wells for in vitro expansion. b) Colony-forming efficiency of sorted HSCs and MPPs for each sample. Each bar represents the listed cell type and underlying sample ID. c) Variant allele fraction (VAF) distribution of all variants within a colony that pass filtering, shown for a representative clonal colony that passed sample QC (left) and a non-clonal colony that passed sample QC (right). After variant filtration, the VAF distribution of a colony’s variants is centred around 50% in clonal colonies, but less than 50% in non-clonal colonies. d) Representative image of two colonic crypts isolated by laser capture microdissection. e) Correlation between total single base substitution burden and depth, for all colonies from sample M7180, shown before (left) and after (right) sequencing depth correction. Shaded area denotes 95% confidence interval. f) Trinucleotide spectra from aggregated somatic mutations mapped to shared (truncal) or private branches of phylogenetic trees. Signatures are highly similar, suggesting artefacts are not relatively enriched in either portion of reconstructed trees.

Extended Data Fig. 2 Additional phylogenetic trees from young and aged mice.

Phylogenies for a-b) 2 additional young (3-month) mice and c-d) 2 additional aged mice (30-month), presented as described in Fig. 2.

Extended Data Fig. 3 Early-in-life phylogenetic patterns and cross-tissue mutations.

Phylogenies from aged (left) and young (right) HSCs zoomed into the first 12 mutations molecular time. Polytomies in the branching structure, which represent cell division without mutation acquisition, are enriched among early-in-life cell divisions at the tops of the phylogenies. Variants shared with matched colonic crypts are layered onto the trees as pie charts. Pie chart fullness represents the proportion of colonic crypts in which the mutation present on the haematopoietic phylogeny was observed. Sample M7183 lacked sufficient early life diversity (<10 unique lineages within 12 mutations molecular time) and was excluded.

Extended Data Fig. 4 Mutational processes in murine stem cells.

a) Signature extraction overview. Trinucleotide spectra from all single-base substitutions (SBS) (top), were used for signature extraction as described in the Methods. Three signatures identified as SBS1, SBS5, and SBS18 best described the catalogue of mutations observed (cosine similarity=0.997). b) Linear mixed-effect regression of signature-specific mutation burdens observed in colonies. Shaded areas indicate the 95% confidence interval. c) Signature attribution in phylogenies. Individual branches of HSC phylogenies are overlaid with signature contribution proportions. SBSs assigned to each branch were fit to SBS1, SBS5 or SBS18. d) Signature-specific mutation accumulation in all branches across phylogenies. Early-life branchpoints, located at the top of a given phylogeny, and shown as an inset.

Extended Data Fig. 5 Phylogeny comparison between aged human and mouse.

a) Representative ultrametric phylogenies from the three oldest humans described in Mitchell et al.⁵. The published trees have been randomly downsampled to 100 colonies (tips). b) Aged mouse phylogenies, also downsampled to 100 colonies, to allow comparison of topological structure. The median lifespan for human and mouse species are labelled and were derived as described in Supplementary Note 1. Full murine phylogenetic trees are shown in Fig. 2a,b and Extended Data Fig. 2.

Extended Data Fig. 6 Approximate Bayesian inferences.

Results from approximate Bayesian computation (ABC) inference of a) population size ($N$), b) symmetric division rate per week ($\lambda $), and c) death rate per week ($\nu $) for the three 30-month-old mice. Blue lines represent the prior density of parameters; red lines represent the posterior densities. Median posterior density estimates and 95% credibility intervals are displayed for each parameter per sample. The prior density for the death rate was bounded to ensure the growth rate ($\lambda -\nu $) remained positive, as observed in phylodyn trajectories in Fig. 3. d) Joint density distributions indicating optimal parameters of population size and division rates that explain observed phylogenetic trees. The estimated N/λ, in HSC-years, is shown with 95% credibility intervals. Data from the three aged mice are shown.

Extended Data Fig. 7 Extended phylogenetic trees including early progenitors.

a-c) Extended phylogenies for three 30-month mice using the pattern of sharing of somatic mutations among HSCs (blue), MPPs (red), and the mixed LSK (Lineage-, Sca1+, c-kit+) haematopoietic progenitor compartment. The LSK compartment contains HSCs and MPP, and additionally contains the myeloid-biased MPP^GM (orange) and lymphoid-biased MPP^Ly populations (green). LSK subcompartments were identified at time of single cell sorting using a consensus definition⁵¹. Each tip represents a single colony. Branch lengths represent mutation numbers. d-e) Clade mixing metrics for MPP^GM and MPP^Ly colonies used to evaluate interrelatedness with HSC and MPP. HSC, MPP and MPP^GM or MPP^Ly were designated as being in the same clade if they shared a most recent common ancestor after 25 mutations, corresponding to early foetal development. Only clades with more than 3 colonies are considered. The vertical bar reflects the average clade mixing metric observed in the constructed phylogenies, while distributions reflect the average clade mixing metric expected random chance, estimated by reshuffling the tip states. If the observed value (vertical bar) significantly deviated from random chance (filled distribution), then there would be minimal overlap between the observed data and the random reshuffling distribution. The average clade mixing metric for MPP^GM compared to HSCs (blue) and MPPs (red) is shown in d). The similar measure of interrelatedness of MPP^Ly to HSCs and MPPs is shown in e). All one tailed significance values in d) and e) were p > 0.05 and were derived from the rank of the observed metric in the corresponding reshuffled distribution.

Extended Data Fig. 8 Mutation overlap between phylogenies and peripheral blood.

a) Phylogenies for three aged mice (as described in Extended Data Fig. 7a–c) constructed to only include private branches targeted with the peripheral blood baitset. Branch shading indicates the maximum VAF among branch-specific variants captured in peripheral blood. The sampled cell immunophenotype is indicated by dot colour at the bottom of each private branch. b) VAF trajectories of HSC and MPP variants shared in peripheral blood. The aggregate VAF across molecular time is calculated using Gibbs sampling (Methods). Earlier molecular time corresponds to further in the ancestral past. Shaded regions denote 95% confidence intervals of VAF estimates.

Extended Data Fig. 9 Peripheral blood VAF of variants shared with HSCs and MPPs.

Baitset mutation-specific HSC and MPP phylogenies are shown for each 30-month mouse. Each branch shows mutations that were detected in peripheral blood in descending VAF order. On each branch, a row denotes a single variant mapped to that specific branch. Red fill denotes the peripheral blood VAF for the variant. VAF is denoted on a log scale from 10-5 to 1; internal divisions are marked from left to right at VAF 0.0001, 0.001, 0.01, and 0.1. HSC trees are shown on the left with blue dots at terminal branches; MPP trees are shown on the right with red dots. Trees are downsampled to allow equivalent comparison between HSC and MPP branches. Only variants seen in peripheral blood with a depth > 100X are shown.

Extended Data Fig. 10 Haematopoietic perturbation modulates selection landscapes.

Clonal haematopoiesis prevalence in aged mice following a) normalized microbial experience (NME), b) M. avium infection, c) cisplatin treatment, and d) 5-FU myeloablation. At final sampling, aged mice were 30 months old for the NME experiments in panel a), and were 25 months old for the perturbation experiments in panels b), c), and d). Enrichment of clonal prevalence (measured using a two-sided Mann-Whitney test) and dN/dS ratios departing from parity (q < 0.1) following treatment are shown for each gene. Within b), significance values for Tet2, Bcor, and Asxl1 are 0.047, 0.046, 0.046, respectively. Within c) significance values for Tet2, Asxl1, and Trp53 are 0.025, 0.049, 0.037, respectively. Within d), significance values for Cux1 and Bcor are 0.022 and 0.007, respectively. Survival curves and experimental endpoint blood counts are displayed for b) and c), using log-rank and two-sided t tests, respectively. For blood count data, bar height denotes the mean among samples (individual dots), error bars denote SEM, and each data point is a biologically independent animal aggregated from 2 independent treatment cohorts. * denotes p-value < 0.05 and ** denotes p-value < 0.01, with no correction for multiple hypothesis testing. Treatment schedules are as displayed or described in Methods.

Supplementary information

Supplementary Information

Supplementary Notes 1–4, Figs. 1–14, Tables 1–3 and references.

Reporting Summary

Peer Review file

Supplementary Data 1

List of coding variants detected in phylogenetic trees and associated dn/ds output table.

Supplementary Data 2

Duplex target panel and all variants detected by duplex sequencing.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Kapadia, C.D., Williams, N., Dawson, K.J. et al. Clonal dynamics and somatic evolution of haematopoiesis in mouse. Nature 641, 681–689 (2025). https://doi.org/10.1038/s41586-025-08625-8

Download citation

Received: 02 November 2023
Accepted: 10 January 2025
Published: 05 March 2025
Version of record: 05 March 2025
Issue date: 15 May 2025
DOI: https://doi.org/10.1038/s41586-025-08625-8

This article is cited by

Clonal hematopoiesis in myeloid malignancies and solid tumors
- Xiurong Cai
- Robert L. Bowman
- Jennifer J. Trowbridge
Nature Cancer (2025)
Haematopoietic ageing in health and lifespan
- Rebecca Andersson
- Eva Mejia-Ramirez
- Maria Carolina Florian
Nature Cell Biology (2025)
Age-related blood condition counteracted with a common diabetes drug
- Elmira Khabusheva
- Margaret A. Goodell
Nature (2025)
How Faithful Flt3-Cre Mouse is to Hematopoietic Lineage Tracing?
- Jinglei Zhai
- Xinying Li
- Hideo Ema
Stem Cell Reviews and Reports (2025)