Abstract
Robertsonian chromosomes are a type of variant chromosome that is commonly found in nature. Present in 1 in 800 humans, these chromosomes can underlie infertility, trisomies and increased cancer incidence1,2,3,4,5. They have been recognized cytogenetically for more than a century6, yet their origins have remained unknown. Here we describe complete assemblies of three human Robertsonian chromosomes. We identified a common breakpoint in SST1, a macrosatellite DNA located on chromosomes 13, 14 and 21, which commonly undergo Robertsonian translocation. SST1 is contained within a larger shared homology domain7 that is inverted on chromosome 14, which enables a meiotic crossover event that fuses the long arms of two chromosomes. Robertsonian chromosomes have two centromeric DNA arrays and have lost all ribosomal DNA. In two cases, we find that only one of the two centromeric arrays is active. In the third case, both arrays can be active but owing to their proximity, they are often encompassed by a single outer kinetochore. Thus a combination of array proximity and epigenetic changes in centromeres facilitates the stable propagation of Robertsonian chromosomes. Investigation of the assembled genomes of chimpanzee and bonobo highlights that the inversion on chromosome 14 is unique to the human genome. Resolving the structural and epigenetic features of human Robertsonian chromosomes at a molecular level provides a foundation for a broader understanding of the molecular mechanisms of structural variation and chromosome evolution.
Main
Robertsonian chromosomes (ROBs) are structurally variant chromosomes created by the fusion of two telocentric or acrocentric chromosomes to make a single metacentric chromosome. First recognized in 1916 by Robertson in grasshopper karyotypes6, these fusions are a common occurrence in nature, having since been recognized in many branches of life including plants, vertebrates and invertebrates. Robertsonian fusions (or translocations) are the most common karyotypic change in mammals8. ROBs create challenges for meiosis, potentially leading to subfertility, reproductive isolation and speciation9,10. Human ROB carriers are often asymptomatic, but ROBs contribute to trisomies such as Down and Patau syndromes5 and are associated with increased rates of certain cancers4 and uniparental disomy11. Despite their frequent occurrence and important effects on fertility, speciation and human health, the underlying mechanisms and evolutionary origins that explain why these chromosomes form so frequently in nature remain unknown.
In humans, ROBs occur in 1 out of 800 births1,2,3 and arise most commonly in female meiosis12,13. The most common ROBs are fusions of acrocentric chromosome 14 with chromosome 13 or chromosome 21, which are suspected to form by a similar specific mechanism14. In these fusions, the long arms of the chromosomes are joined and parts of the short arms are lost. ROBs can occur de novo or be inherited15. A population-level analysis7 of the recent complete assembly of the human genome (CHM13)16, which includes the short arms of the acrocentric chromosomes, revealed megabase-sized homology blocks called pseudo-homologue regions (PHRs) that are shared between acrocentric chromosomes. The existence of PHRs implies frequent interhomologue recombination and suggests a working model for how ROBs form17.
Here we fully assemble three common ROBs: two chromosome 13–chromosome 14 (13;14) fusions and one 14;21 fusion. In all three cases, the ROBs have two centromere arrays, have lost all ribosomal 45S DNA repeats and are fused near a macrosatellite array composed of SST1 repeats. The repeats are named for the restriction enzyme that cuts them, isolated from the eponymous Streptomyces stanford. We demonstrate that the SST1 arrays on ribosomal DNA (rDNA)-containing chromosomes (also known as NBL2)18 bear hallmarks of exchange between chromosomes, including enrichment for PRDM9 DNA binding motifs that are associated with recombination hotspot activity19. Further analysis of SST1 repeats and segmental duplications in human and nonhuman primate genomes suggests that SST1 may have a broader role in genome rearrangements and evolution beyond Robertsonian fusions. Additionally, analysis of the two centromeric arrays on each ROB indicates epigenetic adaptations to support stable mitotic transmission, consistent with previous observations20,21. In conclusion, we provide, to our knowledge, the first complete assemblies of common ROBs, precise mapping of their fusion sites, and insight into their formation and transmission mechanisms.
A working model for the formation of common ROBs postulated that they occur owing to: (1) sequence homology between nonhomologous chromosomes, provided by the PHRs; (2) proximity of the PHRs, provided by co-location of rDNA arrays from different chromosomes in the nucleolus; (3) recombination initiation in meiosis to form a crossover; and (4) an inversion on chromosome 14 (refs. 7,17,22) (Fig. 1a). The asymmetric structure and chromosome-unique features around the SST1 arrays suggested that we would be able to map the translocation site (Fig. 1b). To further test this working model, we sequenced and assembled the complete, telomere-to-telomere sequence of three ROBs.
a, Working model for the dependence of ROB formation on recombination between SST1 repeats (pink triangles) located in PHRs (coloured bands) on the short arms of human chromosomes 13, 14 and 21. The adjacent 45S rDNA arrays facilitate 3D proximity by co-locating in nucleoli. b, Schematic representation of the main SST1 arrays and flanking sequences in acrocentric chromosomes from the CHM13 genome. This region is similar on chromosomes 13 and 21 and is inverted on chromosome 14. c–e, Representative images of ROBs from GM03786 (c), GM04890 (d) and GM03417 (e) cell lines. Left image, chromosome labelled with an SST1 probe (magenta) and whole chromosome paints as indicated. Centre image, chromosome labelled with an SST1 probe (magenta) and centromeric satellite probes for cen14/22 (orange) and cen13/21 (green). DNA was counterstained with DAPI. Right image, magnified view showing SST1 localization between the two centromere arrays. Scale bar, 1 µm. Right, averaged intensity profiles of lines drawn through the centromeres of multiple ROBs (GM03786: n = 10, GM04890: n = 20, GM03417: n = 11). Intensity profiles were aligned to the peak of the Gaussian of the SST1 signal and normalized to the maximum intensity of each channel. Error bars denote s.d. Bottom, synteny plots comparing the assembled ROB to the CHM13 genome sequence. The structure of each fused region is shown in detail. a.u., arbitrary units.
We selected three independent cell lines for sequencing and assembly, each harbouring a unique ROB: (1) GM03417 (45 or 46,XX,t(14;21); clinical features of Down syndrome); (2) GM04890 (45,XX, t(13;14); clinically normal with five miscarriages); and (3) GM03786 (45,XX,t(13;14); clinically normal). The fusions were initially confirmed in mitotic spreads using chromosome-specific paints (Fig. 1c–e). We generated Oxford Nanopore Technologies (ONT), Hi-C, Illumina and PacBio HiFi sequencing data (Supplementary Table 1). The Verkko assembler utilized ONT, PacBio and Hi-C data to generate complete de novo assemblies of the ROBs23. Each ROB was visible in the assembly graph by a node connecting regions representing two different acrocentric chromosomes (Extended Data Figs. 1–3). Notably, these connecting nodes skipped the rDNA arrays, consistent with the working model and supporting the loss of rDNA in the common ROBs.
We assessed the quality of the assemblies (see Methods), finding high Phred quality scores (quality values) ranging from 49.10 to 54.70, corresponding to approximately 99.999% accuracy, and high gene completeness, with all assemblies containing over 98% of benchmarking universal single-copy orthologues24 (BUSCOs) (Supplementary Fig. 1). Read coverage analysis across the breakpoint regions shows consistent coverage without significant drops or high-frequency secondary alleles, confirming the structural accuracy of the assembled fusion points (Supplementary Fig. 2). Hi-C data, which capture the three-dimensional organization of chromosomes, showed increased interactions between the q arms of each ROB, emphasizing the physical proximity of these regions due to the fusion (Supplementary Fig. 3), which was also observed by microscopy in nuclei (Extended Data Fig. 4).
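The relationship between a Phred quality value and per-base accuracy can be checked with a short calculation. The sketch below uses only the standard Phred definition (QV = -10 log10 of the error probability); the function names are illustrative and are not taken from the assembly pipeline described in Methods.

```python
import math


def phred_to_accuracy(qv: float) -> float:
    """Convert a Phred quality value to per-base accuracy (0..1)."""
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (0..1, exclusive of 1) to a Phred quality value."""
    return -10 * math.log10(1.0 - accuracy)


# The reported range of assembly quality values.
for qv in (49.10, 54.70):
    error_rate = 1.0 - phred_to_accuracy(qv)
    print(f"QV {qv:.2f}: about {error_rate:.2e} errors per base")
```

A quality value of 50 corresponds to one expected error per 100,000 bases, so the reported range brackets the stated accuracy of approximately 99.999%.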
SST1 macrosatellite arrays are present on several chromosomes25. Multiple subfamilies of SST1 arrays exist. One subfamily occurs in the PHRs of chromosomes 13, 14 and 21, whose population genetic and structural features have led to the hypothesis that they are the site of recombination in recurrent Robertsonian fusions7. Using cytogenetics, we probed centromeres of chromosomes 13 and 21 (cen13/21), cen14/22 and SST1 and 45S rDNA arrays. Cen13/21 and cen14/22 are highly similar at the sequence level and cannot be distinguished by conventional fluorescence in situ hybridization (FISH) probes. Our analysis confirmed that all three ROBs are missing 45S rDNA arrays and that the SST1 signal is present between centromeric arrays (Extended Data Fig. 4), consistent with the working model for the formation of common ROBs and the assembly graphs. Karyotypic analysis of GM03417 indicates cellular heterogeneity with regard to the number of copies of chromosome 21, with 3 copies of chromosome 21 being relatively rare (around 10%) (Extended Data Fig. 4). Notably, the normal copy of chromosome 14 in GM03786 has lost the SST1 array (Extended Data Figs. 2 and 4), suggesting that polymorphisms in this region exist. Among 26 contigs from the human pangenome that extend beyond the SST1 array, 9 carry a deletion (Extended Data Fig. 5).
To visualize the assemblies, we created synteny plots (Fig. 1c–e) using NGenomeSyn26. SST1 arrays are present on chromosomes 13, 14 and 21, but are inverted on chromosome 14 relative to chromosomes 13 and 21 in many genomes7. Owing to this inversion, when this region of chromosome 14 pairs with a 13 or 21 partner chromosome in meiosis, a crossover event is predicted to join the two long arms, forming a ROB. When we compared the assembled ROBs to their component chromosomes in the CHM13 genome, we observed that the sequence order and directionality were syntenic on either side of the SST1 region. This pattern is consistent with crossover within the SST1 array generating each of the assembled ROBs. Sequence asymmetry in PHRs on chromosomes 13 and 21 versus chromosome 14, which includes a partial SST1 monomer, allowed us to identify that the SST1 array was the breakpoint (Extended Data Fig. 6). We speculate that haplotypes of chromosome 14 that are missing this region would form ROBs less readily.
SST1 and interchromosomal exchange
To better understand the evolutionary relationships and potential exchange between SST1 arrays, we conducted a phylogenetic analysis of SST1 monomers derived from the HG002 and CHM13 genomes. The analysis revealed three distinct subfamilies. Subfamily 1 (sf1), also known as NBL2, consists of monomers primarily derived from large arrays on the acrocentric chromosomes and forms a single branch in the phylogenetic tree. Subfamily 2 (sf2) comprises monomers mainly from autosomal arrays (for example, chromosomes 4, 17 and 19) and forms chromosome-specific branches. Subfamily 3 (sf3), also referred to as TTY2, is composed of monomers primarily originating from arrays on the Y chromosome. This classification of subfamilies and their distinct characteristics are illustrated in Fig. 2a,c and Extended Data Fig. 6. The Y-derived repeats co-cluster with a few SST1 monomers from autosomes (see more below). The co-clustering of sf1 monomers from the large arrays on the acrocentric chromosomes due to their high sequence identity (approximately 99% compared with around 90% for sf2 and 75% for sf3) is consistent with frequent recombinational exchange between these repeats, leading to concerted evolution of their constituent monomers.
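The sequence identity figures quoted for each subfamily can be thought of as an average over pairwise comparisons of aligned monomers. The following is a minimal sketch of that calculation, assuming the monomers have already been aligned to equal length with `-` as the gap character; the function names and the gap handling are illustrative assumptions, not the procedure used for the phylogenetic analysis.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring columns gapped in both."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(aligned: list[str]) -> float:
    """Mean percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Toy example with three 4-bp "monomers".
print(mean_pairwise_identity(["ACGT", "ACGT", "ACGA"]))
```

A high mean within a set of monomers drawn from different chromosomes is the pattern expected under concerted evolution.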
a, All SST1 monomers from previous analysis of CHM137 and from HG002 were collected and phylogenetic analysis was performed using the maximum-likelihood method based on the best-fit substitution model (Kimura two-parameter + G, parameter = 5.5047) inferred by Jmodeltest2 with 1,000 bootstrap replicates. Bootstrap values higher than 75 are indicated at the base of each node. The colour indicates the source chromosome and the shape indicates the source genome. Three major subfamilies were identified: sf1, primarily on the acrocentrics (Acros); sf2, primarily on the remaining autosomes; and sf3, primarily on the Y chromosome. Black arrows indicate the location on the phylogenetic trees of sf2 monomers S and L from the acrocentric chromosomes (Fig. 1b). b, Predicted PRDM9 DNA binding site frequency (mean sites per kb, each dot indicates one haplotype) in SST1 arrays in n haploid genomes are plotted by chromosome. ANOVA analysis with the two-sided Tukey–Kramer test for pairwise mean comparisons. c, Schematic representation of the three subfamilies of SST1. SST1-sf1 has a central gap and a predicted PRDM9 DNA binding site (red box). d, A segmental duplication of 27 kb or larger was identified on several autosomes in CHM13 that includes Y-like α-satellite DNA (α-sat) and SST1-sf3. Phylogenetic analysis was performed using the maximum-likelihood method and general time reversible (GTR)-plus-gamma substitution parameters. Bootstrap values are shown. e, Comparison of overlaps between segmental duplications (SDs; ≥10 kb) and random regions or SST1 monomers across 147 genomes. Distributions show the number of overlaps versus density. A permutation test with 10,000 iterations per genome was used to generate random region overlaps. The significant difference between distributions (Wilcoxon signed-rank test, paired, two-sided) indicates non-random association between segmental duplications and SST1 regions.
The recurrent involvement of the SST1 array in ROB formation shown here and the concerted evolution of SST1 arrays on chromosomes 13, 14 and 21 (ref. 7) suggest that this array may comprise a meiotic recombination hotspot. To investigate this, we focused on PRDM9, a key protein in meiosis that has a crucial role in defining recombination hotspots. PRDM9 contains a variable number of zinc-fingers that allow it to bind to specific DNA sequences. Additionally, it possesses a histone methyltransferase domain that trimethylates histone H3 at lysine 4 (H3K4) and lysine 36 (H3K36)27,28. This trimethylation of H3K4 and H3K36 by PRDM9 creates an open chromatin environment that favours recombination events during meiosis19. Previous work suggested that SST1 arrays on chromosomes 13, 14 and 21 in the CHM13 genome contain predicted PRDM9 binding sites7. To further examine PRDM9 binding sites within SST1 arrays, we searched for the 13-bp PRDM9 DNA binding motif29 of the common A allele30 in the SST1 arrays in 147 human genomes (Methods). The density of PRDM9 DNA binding motifs is significantly higher in the SST1 arrays on chromosomes 13, 14 and 21 relative to chromosomes 4, 17, 18 and Y (Fig. 2b). Whereas previous analysis used a collection of PRDM9 binding motifs to identify sites across acrocentric p-arms7, our current analysis uses only the high-confidence 13-bp PRDM9 DNA binding motif, which appears at a lower frequency in rDNA arrays.
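A motif density of this kind can be computed by scanning both strands of each array for a degenerate motif and normalizing by array length. The sketch below assumes the widely reported degenerate 13-mer CCNCCNTNNCCNC for the PRDM9 A allele; the exact motif definition, the handling of overlapping sites and the function names are assumptions for illustration and may differ from the procedure in Methods.

```python
import re

# Assumption: degenerate 13-bp motif commonly associated with the PRDM9 A allele.
MOTIF = "CCNCCNTNNCCNC"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> str:
    """Translate a motif with N wildcards into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def count_motif_sites(seq: str, motif: str = MOTIF) -> int:
    """Count motif occurrences on both strands, allowing overlapping sites."""
    seq = seq.upper()
    # Lookahead patterns let overlapping occurrences be counted separately.
    forward = re.compile("(?=" + motif_to_regex(motif) + ")")
    reverse = re.compile("(?=" + motif_to_regex(reverse_complement(motif)) + ")")
    return sum(1 for _ in forward.finditer(seq)) + sum(1 for _ in reverse.finditer(seq))


def sites_per_kb(seq: str, motif: str = MOTIF) -> float:
    """Motif density in sites per kilobase of sequence."""
    if not seq:
        return 0.0
    return count_motif_sites(seq, motif) * 1000.0 / len(seq)


toy_array = "AAAA" + "CCACCGTAACCAC" + "TTTT"
print(count_motif_sites(toy_array), sites_per_kb(toy_array))
```

Applying `sites_per_kb` to each SST1 array in each haplotype would give one value per dot in a plot such as Fig. 2b.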
PRDM9 activity leads to the erosion of its own binding sites19. PRDM9 sites in SST1 arrays should erode on all three acrocentric chromosomes. We speculate that PRDM9 sites within the SST1 arrays can regenerate via intrachromosomal or interchromosomal exchange events. However, the inversion on chromosome 14 may create a barrier for interchromosomal gene conversion, tipping the relative balance of erosion and regeneration toward erosion and resulting in the observed lower density of sites on this chromosome. The significantly higher density of PRDM9 DNA binding motifs in SST1 arrays on acrocentric chromosomes aligns with our finding that the consensus sequence of their monomers contains a predicted PRDM9 DNA binding motif (Fig. 2c and Extended Data Fig. 7). These findings suggest that the predominant subfamily of SST1 repeats on acrocentric chromosomes (sf1) creates a sequence context that is permissive to meiotic recombination.
Nearly 100 PRDM9 alleles have been identified across human populations30. Some differ by a single nucleotide; others vary substantially by copy number and organization of their zinc-finger motifs. PRDM9 alleles can influence the activity of meiotic recombination hotspots. We reasoned that PRDM9 allele variation could influence the likelihood of ROB formation, so we examined the genotype of our three ROB cell lines. GM03417 is A/A, GM03786 is A/L24 and GM04890 is L24/L9. A is the most common allele, but L24 and L9 are rare, being present in only 1.66% of individuals in the previous study (12 out of 720), although these alleles are overrepresented in an ethnically southern European population30 (9 out of 109). No conclusions can be drawn regarding PRDM9 alleles and ROB formation at this time, as we lack information on vertical transmission and do not have enough samples to test for an association.
Further evidence for SST1-mediated interchromosomal recombination comes from two key observations. First, the phylogenetic analysis revealed the presence of Y-like SST1 monomers (sf3) on multiple autosomes (7, 9, 12, 17 and 20). These monomers are contained within larger sequence blocks ranging in size from 25 to 50 kb with 95–99% identity that also include Y-like α-satellite DNA (Fig. 2d). This pattern suggests that these blocks originated from the Y chromosome, although we could not identify syntenic blocks on the human Y chromosome. Second, we examined the association between SST1 arrays and segmentally duplicated regions, defined as regions longer than 10 kb with at least 90% identity at two or more locations in the genome. Segmental duplications arise and persist via recombination facilitated by repetitive DNA shared between chromosomes. To quantify the association between sf1, sf2 and segmental duplications, we conducted a permutation test with 10,000 iterations in 147 genomes (Methods). The results suggest that the observed association between SST1 and segmental duplications is unlikely to occur by random chance, and the association is highest with sf1 (Fig. 2e). Together, these findings suggest that, beyond its role in recurrent Robertsonian translocations, SST1 may be associated with interchromosomal recombination throughout the genome.
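The logic of a permutation test for interval overlap can be sketched for a single genome as follows. This is a simplified illustration, not the pipeline described in Methods: it treats the genome as one linear coordinate space, places random regions uniformly, and reports an empirical one-sided P value. The function names, the half-open interval convention and the uniform placement are all assumptions.

```python
import random

Interval = tuple[int, int]  # half-open [start, end)


def count_overlaps(regions: list[Interval], duplications: list[Interval]) -> int:
    """Number of regions that overlap at least one duplication."""
    return sum(
        any(start < d_end and d_start < end for d_start, d_end in duplications)
        for start, end in regions
    )


def permutation_p_value(
    regions: list[Interval],
    duplications: list[Interval],
    genome_length: int,
    iterations: int = 10_000,
    seed: int = 0,
) -> tuple[int, float]:
    """Observed overlap count and empirical P value from random re-placement."""
    rng = random.Random(seed)
    observed = count_overlaps(regions, duplications)
    lengths = [end - start for start, end in regions]
    at_least_as_extreme = 0
    for _ in range(iterations):
        # Re-place every region at a random position, preserving its length.
        shuffled = []
        for length in lengths:
            start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((start, start + length))
        if count_overlaps(shuffled, duplications) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible P of 0.
    return observed, (at_least_as_extreme + 1) / (iterations + 1)


toy_regions = [(100, 110), (500, 510)]
toy_duplications = [(90, 120), (490, 520)]
print(permutation_p_value(toy_regions, toy_duplications, genome_length=100_000, iterations=1_000))
```

Repeating this per genome yields paired observed and random overlap counts, which is the kind of input a paired signed-rank comparison across genomes would take.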
SST1 and rDNA in chimpanzee and bonobo genomes
Tandem arrays of the SST1 macrosatellite are not unique to humans; they also exist in nonhuman primate genomes, including several recently assembled telomere-to-telomere (T2T) genomes31. In gibbon, SST1 arrays have been suggested to be responsible for evolutionary patterns of genome instability. As chimpanzees (Pan troglodytes) and bonobos (Pan paniscus) are the most closely related species to humans, we examined the acrocentric chromosomes in their genomes32 for the position and orientation of rDNA and SST1 arrays to understand the potential for ROB formation, specifically, and chromosome evolution more generally. Two out of five rDNA array-positive acrocentric p arms in the bonobo genome (hsa14 and hsa22) and all five rDNA array-positive acrocentric p arms in the chimpanzee genome (hsa13, hsa14, hsa18, hsa21 and hsa22) have co-oriented SST1 arrays (Fig. 3a). The location of SST1 arrays on these chromosomes was validated by FISH (Supplementary Fig. 4).
a, Ideograms of all the rDNA array-bearing chromosomes in human, chimpanzee and bonobo, annotated with the human numbering system (indicated by Hsa prefix). The directionality of 45S rRNA gene arrays (grey) and SST1 arrays (coloured bars) are indicated with arrowheads. b, Predicted PRDM9 binding sites were identified in the chimpanzee genome, and the number of sites per kb is plotted for SST1 arrays for the indicated subfamily. Random regions of the genome (randBins and randGC (GC-matched random regions)) were used to determine background. c, All SST1 monomers from the chimpanzee genome were subjected to phylogenetic analysis using the maximum-likelihood method. The colour indicates the source chromosome. The SST1 monomers from Hsa13, Hsa14, Hsa18, Hsa21 and Hsa22 chromosomes form a single branch, indicating a high degree of similarity. d, All SST1 monomers from the bonobo genome were subjected to phylogenetic analysis using the maximum-likelihood method. The SST1 monomers from chromosomes 14 and 22 form a single branch, indicating a high degree of similarity. e, SST1 monomers from human (Hs), chimpanzee (Pt) and bonobo (Pp) were subjected to phylogenetic analysis using the maximum-likelihood method. The three subfamilies are apparent.
To investigate the potential role of PRDM9 in SST1-mediated recombination, we examined the frequency of predicted PRDM9 DNA binding sites33,34 in SST1 repeats in the chimpanzee genome. The density is high on the rDNA array-positive chromosomes but lower on the other chromosomes (Fig. 3b). Of note, although the SST1 arrays on hsa15 in the chimpanzee genome are composed of sf1 monomers, these monomers are not enriched for predicted PRDM9 DNA binding sites. This is consistent with the monophyletic state of SST1 monomers from hsa15 in chimpanzees and bonobos (Fig. 3c,d). Hsa15 does not have an rDNA array, and its resident SST1 appears not to recombine with SST1 monomers on other chromosomes with rDNA arrays. Together, our findings suggest that further analysis of SST1 and rDNA arrays will provide insight into the recombination and evolution of karyotypes in primate genomes.
To probe the evolutionary history of SST1 arrays and their potential role in chromosomal rearrangements across closely related species, we conducted a comparative phylogenetic analysis of SST1 monomers from rDNA array-positive chromosomes in chimpanzee and bonobo. The analysis revealed patterns similar to the human genome. In the chimpanzee genome, monomers from Hsa13, Hsa14, Hsa18, Hsa21 and Hsa22 co-cluster, whereas in the bonobo genome, monomers from Hsa14 and Hsa22 co-cluster (Fig. 3c,d). These findings provide evidence for interchromosomal exchange between SST1 monomers on rDNA array-positive chromosomes in both chimpanzee and bonobo genomes, suggesting that this phenomenon is not unique to humans but is a shared feature among great apes. When we performed a combined phylogenetic analysis of monomers from chimpanzee, human and bonobo, we observed that the acrocentric chromosomes from all three species contain SST1-sf1 monomers (Fig. 3e). The sf3 monomers from the Y chromosome also co-cluster. Monomers from other parts of the genome form a separate branch for sf2.
The inverted orientation of the SST1 array on the p arm of chromosome Hsa15 in both chimpanzee and bonobo genomes resembles the configuration of the SST1 array on chromosome 14 in the human genome (Fig. 3a). However, the SST1 monomers from Hsa15 form a chromosome-specific branch in the phylogenetic trees (Fig. 3c–e), indicating a lack of recombination between SST1 monomers on Hsa15 and other chromosomes in the ape genomes. This observation correlates with the absence of rDNA on Hsa15 in the chimpanzee and bonobo; we speculate that rDNA, where present, brings the SST1-containing regions into proximity at the nucleolus, facilitating interchromosomal exchange. Consequently, the chimpanzee and bonobo genomes lack the specific placement and orientation of rDNA and SST1 arrays that we hypothesize facilitate the formation of human ROBs, suggesting that this structural arrangement is unique to the human genome. Correspondingly, the presence of the rDNA and absence of SST1 on human chromosome 15 suggest that the ancestral Hsa15 of human, chimpanzee and bonobo had both rDNA32 and SST1 arrays. The greater sequence divergence of chromosome 15 from other rDNA array-positive chromosomes in humans highlights the importance of SST1 for maintaining their sequence similarity7.
The association between SST1 and interchromosomal recombination events is further supported by the identification of regions that are syntenic with the segmental duplication shown in Fig. 2d across several nonhuman primate genomes (Supplementary Fig. 5a). This finding suggests that these duplication events occurred in a common ancestor. Moreover, we identified a second segmental duplication on chromosome 16 in primates (hsa16), which exhibits a different structure but also contains Y-like SST1 and α-satellite sequences derived from cenY of the common ancestor of apes and New World monkeys (Supplementary Fig. 5b). These findings provide compelling evidence that SST1 is associated with interchromosomal recombination in primate genomes and that past events have involved the Y chromosome as a donor. Notably, the ancestral version of the Y chromosome was likely to contain rDNA35.
Centromere activity on ROBs
All ROBs examined in this study have two centromeric DNA arrays, roughly 5 Mb apart, yet they are faithfully transmitted through mitosis. This observation, consistent with previous cytogenetic observations21, suggests that epigenetic alterations to the centromeres might occur to enable correct chromosome transmission. To investigate centromere activity in ROBs, we leveraged both microscopy and genomic data. To analyse ROBs by microscopy, we performed immunolabelling with FISH (immunoFISH) using FISH probes targeting cen13/21 and cen14/22, along with antibodies against CENP-C, an inner kinetochore protein that marks the active centromere, and CENP-B, which binds to the 17-bp CENP-B box present in α-satellite DNA except on cenY. CENP-B was present broadly across α-satellite DNA, consistent with previous reports for a dicentric chromosome36. Confocal fluorescence microscopy (Supplementary Fig. 6) revealed that the CENP-B signal was proportional to the size of the centromeric array determined by the assembly (Supplementary Fig. 7).
To gain more detailed insights into centromere activity in ROBs, we used structured illumination microscopy (SIM) and single-particle averaging (Methods) to evaluate the localization of CENP-C in the two t(13;14) fusion chromosomes, as it provides higher resolution than confocal microscopy. CENP-C signal was confined to cen14 (Fig. 4a,b), suggesting that cen14 is the active centromere, whereas cen13 is inactive. This observation is consistent with a previous study that found cen14 to be the active centromeric array in most 13;14 dicentrics21. The active array did not depend strictly on size, as in GM04890 cen14 was smaller than cen13 (Supplementary Fig. 7). By contrast, the CENP-C signal for the t(14;21) fusion was less binary. CENP-C signal overlapped with both cen14 and cen21 within a single chromosome (Fig. 4c), and the pattern exhibited heterogeneity between chromosomes, suggesting that both arrays may remain active. Similar to CENP-C, two CENP-A signals were sometimes detected on the t(14;21) ROB (Extended Data Fig. 8). Imaging of NDC80 revealed that these two signals were often encompassed within a single outer kinetochore signal (Extended Data Fig. 8), indicating only one attachment site for microtubules, which would help prevent erroneous attachments.
a–c, ImmunoFISH, DNA methylation and CENP-A CUT&Tag analyses were performed for GM03786 (a), GM04890 (b) and GM03417 (c). Left, representative SIM images of ROBs labelled by immunoFISH with centromeric satellite probes for cen14/22 (orange), cen13/21 (green) and CENP-C antibody (red). DNA was counterstained with DAPI. Bottom images, magnified views depicting single CENP-C foci on cen14 in GM03786 (a) and GM04890 (b), and double CENP-C foci on cen21 and cen14 in GM03417 (c). Scale bars, 1 µm. Bottom left, averaged intensity profiles of lines drawn through the individual kinetochore regions of sister chromatids of multiple ROBs. GM03786: n = 22 (11 chromosomes) (a). GM04890: n = 26 (13 chromosomes) (b); GM03417: n = 24 (12 chromosomes) (c). Intensity profiles were aligned to the peak of the Gaussian of the cen14 signal and normalized to the maximum intensity of each channel. Error bars denote s.d. Top centre and right, corresponding heat maps of sequence similarity calculated for 5-kb bins for each centromere. Below the heat maps, DNA methylation tracks show methylation calls from ONT (orange) or PacBio HiFi (turquoise) sequencing, with hypomethylated regions suggesting active centromere localization. Active centromere regions are indicated by CENP-A enrichment on CUT&RUN (blue) and CUT&Tag (black) tracks.
To further investigate centromere activity in ROBs, we examined epigenetic markers that are associated with active centromeres. Previous studies have identified a dip in CpG methylation in active centromere arrays that coincides with enrichment for CENP-A, the centromere histone H3 variant. This region is referred to as the centromere dip region (CDR) and probably marks the site of kinetochore assembly37,38. We inspected CpG methylation from both HiFi and ONT reads across the centromere arrays in the three assembled ROBs. As controls, we examined CDRs in chromosomes 13, 14 and 21 from the normal chromosomes in the same cell line. We visualized CDRs in conjunction with pairwise sequence similarity heat maps between 5-kb bins of the centromeric array39 and methylation, CUT&RUN and CUT&Tag data for CENP-A. In the t(13;14) ROBs, there is a CDR on cen14, coincident with CENP-A enrichment, and although there is low CENP-A enrichment in the adjacent cen13 array, there is no corresponding CDR (Fig. 4a,b and Extended Data Fig. 9). This finding is consistent with the CENP-C signal being confined to the region with the signal from the cen14 FISH probe, suggesting that cen14 is the active centromere and that this chromosome is functionally monocentric. By contrast, cen13 on the normal chromosome has a clear CDR and CENP-A enrichment (Extended Data Fig. 10). Sequence similarity between centromeres 13/21 and 14/22 is a challenge for mapping short read data from CUT&RUN and CUT&Tag, but has less effect on CDR identification from long read data or imaging.
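The idea of locating a methylation dip within an otherwise highly methylated array can be illustrated with a simple thresholding sketch. It assumes the methylation calls have already been summarized as a mean CpG methylation fraction per fixed-size bin along the array. The bin size, the drop threshold relative to the array median and the minimum run length are hypothetical parameters chosen for illustration and are not the criteria used in this study.

```python
from statistics import median


def find_dip_regions(
    methylation: list[float],
    bin_size: int = 5_000,   # assumed bin width in bp
    drop: float = 0.3,       # assumed required drop below the array median
    min_bins: int = 2,       # assumed minimum run length, in bins
) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of runs of bins well below the median."""
    if not methylation:
        return []
    cutoff = median(methylation) - drop
    regions = []
    run_start = None
    # A trailing sentinel above any cutoff closes a run that reaches the array end.
    for i, value in enumerate(list(methylation) + [float("inf")]):
        if value < cutoff:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_bins:
                regions.append((run_start * bin_size, i * bin_size))
            run_start = None
    return regions


# Toy track: a highly methylated array with one hypomethylated stretch.
toy_track = [0.9] * 10 + [0.3, 0.2, 0.3] + [0.9] * 10
print(find_dip_regions(toy_track))
```

In practice a candidate dip found this way would still need to be checked against CENP-A enrichment, as done in the text, before being interpreted as a CDR.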
In the t(14;21) ROB, we observed a dip in methylation in both cen14 and cen21 arrays, located near the adjacent edges of the two arrays (Fig. 4c and Extended Data Fig. 9) towards the fusion breakpoint. The CENP-A enrichment is stronger on cen14, but CENP-A enrichment also exists on cen21, coincident with a CDR. In combination with imaging, the data suggest that inner kinetochore proteins are located at both centromeres. The CDR and CENP-A enrichment on cen21 on the normal chromosome are more pronounced (Extended Data Fig. 10). A previous study showed that isodicentric X chromosomes can be stably transmitted through mitosis when centromeres are less than 12 Mb apart20. Our findings suggest that ROBs are stably propagated owing to epigenetic adaptation in centromere activity. In some cases, this is achieved by complete inactivation of one array, whereas in other cases, activity is shared between two arrays that are physically close enough to be encompassed by the outer kinetochore.
Discussion
Whereas 15% of human ROBs appear to form by idiosyncratic mechanisms, 85% involve chromosome 14 (ref. 14). These common ROBs are likely to arise owing to a unique combination of factors, including the inversion on chromosome 14, the presence of an rDNA array, which draws the acrocentric short arms into physical proximity near nucleoli, and recombination initiation from a hotspot at or near the SST1 array. Tandem arrays of SST1-sf1 are found primarily on rDNA array-bearing chromosomes in chimpanzee, bonobo, gorilla and human genomes32. However, the assembled chimpanzee and bonobo genomes do not possess a chromosome containing both an inverted SST1 array and an rDNA array, suggesting that this particular structural arrangement is unique to the human lineage. Despite this, phylogenetic evidence suggests that interchromosomal exchange between SST1 arrays on rDNA-containing chromosomes is a common feature in human, bonobo and chimpanzee genomes. Furthermore, the SST1-sf1 arrays that show evidence of exchange in the human and chimpanzee genomes are also enriched for PRDM9 DNA binding sites. Together, the evidence suggests that the common ROBs in humans occur via a combination of four key factors: (1) large regions of homology on heterologous chromosomes; (2) an inversion on chromosome 14, which creates a unique structural arrangement; (3) the co-location of two of these regions in 3D space due to the presence of rDNA arrays that bring the regions into close proximity (within the nucleolus); and (4) meiotic recombination hotspots in SST1, potentially mediated by PRDM9, that lead to crossing over between the nonhomologous chromosomes17.
Our work consolidates many disparate observations regarding the SST1 macrosatellite family, efforts that have been limited by gaps in repetitive DNAs in reference genomes. Through careful analysis of multiple human, chimpanzee and bonobo genomes, we identified three distinct subfamilies of SST1. Several names for these subfamilies appear in the literature, including TTY2 for sf3 on the Y chromosome, MER22 for sf2 and NBL2 for sf1 on the acrocentric chromosomes. In several cases, SST1 sequence has been associated with genome instability and adverse health effects. For example, loss of SST1 repeats on the Y chromosome is associated with male infertility40, and hypomethylation is associated with cancer41,42, potentially contributing to its transcription and subsequent genomic instability43,44. Additionally, meiotic genes such as PRDM9 are often expressed in cancer45, and translocations in cancer often involve acrocentric chromosomes46. Our work is consistent with previous proposals that SST1 contributes to genome instability, and it may have a much broader role than previously appreciated.
SST1-sf1 sequences on the acrocentric chromosomes are highly enriched for PRDM9 DNA binding sites, further implicating this shared macrosatellite DNA as a candidate for meiotic recombination hotspots that would result in recombination between heterologous chromosomes and the formation of common ROBs. We speculate that PRDM9 DNA binding sites that are lost via erosion19 may be ‘regenerated’ by gene conversion or other exchange events, allowing SST1-hotspot associated activity to persist and self-sustain. It is possible that rare and functionally distinct PRDM9 alleles contribute to the incidence of ROB formation and segmental duplications, based on their binding properties, but more studies will be required. Together, the evidence suggests SST1 repeats have a substantial role in the stability and evolution of primate chromosomes, as first suggested for the gibbon genome31.
ROBs form more commonly in female meiosis12,13. Meiosis is sexually dimorphic in many ways that could contribute to the higher frequency of ROB formation in female meiosis compared with male meiosis. Female meiosis proceeds through the earliest stages of prophase with DNA in a more demethylated state than in male meiosis47, and open chromatin is more prone to recombination. Consistently, there is more recombination initiation and crossing over in female meiosis48,49,50. Furthermore, hotspots can be sexually dimorphic51. Chromosomes synapse later in female meiosis than in male meiosis49, which could make them less constrained and allow more interchromosomal exchange. Paradoxically, crossovers are suppressed at rDNA arrays in plants52 and yeast53, but the SST1 arrays implicated here may be sufficiently far from the rDNA to avoid any protective effect. Future efforts will be required to distinguish the transcriptional, epigenetic and hotspot status of the short arms of acrocentric chromosomes in developmental and disease states in males and females, and the factors that allow ROBs to occur more commonly in female meiosis.
ROBs are transmitted at a higher rate than Mendelian ratios in female meiosis, a phenomenon known as meiotic drive54. Drive can occur in females because of the segregation of material into ‘dead end’ polar bodies, potentially owing to weaker centromeres. The specific structure of each ROB, including centromere activity, may affect its transmission. In our study, the t(14;21) ROB appeared functionally dicentric for inner kinetochore proteins with a spanning outer kinetochore, which may create a stronger centromere, potentially influencing its transmission. Moreover, ROBs have two long q arms, so crossovers on the ROB are more likely relative to an individual acrocentric chromosome, potentially facilitating ROB segregation at M1. The structure of each ROB will differ based on variation in individual centromeres and other repeats, potentially conferring differential transmission. Analysis of many ROBs will be required to understand how individual features affect their propagation and carrier fertility. By integrating insights from genomic, cytogenetic and evolutionary studies, we can gain a more comprehensive understanding of the role of rDNA and SST1-based recombination in genome evolution and reproductive biology.
Methods
Cell culture
Human lymphoblastoid cell lines (LCLs) GM03786, GM04890 and GM03417 were obtained from Coriell. All LCLs were cultured in RPMI 1640 (Gibco) with L-glutamine supplemented with 15% fetal bovine serum (FBS) in a 37 °C incubator with 5% CO2.
ONT sequencing
Ultra-high molecular weight DNA was extracted from frozen cell pellets using the NEB Monarch HMW DNA Extraction Kit for Tissue and assessed for fragment size using a pulsed field gel. The fragments spanned from 50 kb to 1,000 kb in size. Genomic DNA libraries were prepared using the NEB_5ml_Ultra-Long Sequencing Kit (SQK-ULK001)-promethion protocol from Oxford Nanopore. Each library was loaded onto a FLO-PRO002 flow cell and run for 72 h with two subsequent loadings at 24-h intervals. The libraries were sequenced on a PromethION (Oxford Nanopore) running MinKNOW software v.22.12.5. Basecalling and modified base detection (5mC) were performed on-instrument using Guppy 6.4.6 with the following model: dna_r9.4.1_450bps_modbases_5mc_cg_hac_prom.cfg.
PacBio HiFi sequencing
PacBio library preparation was conducted using the SMRTbell Prep Kit 3.0. The prepared libraries were quantified and sequenced on a PacBio Revio system with Instrument Control Software v.12.0.0.183503 and chemistry v.12.0.0.172289. Sequencing was performed using two SMRT Cells, each with a movie length of 24 h.
Using the Pacific Biosciences SMRTbell Prep Kit 3.0 with binding kit 102-739-100 and sequencing kit 102-118-800, three libraries (one per sample) were prepared. The Megaruptor (Diagenode) was used for shearing and SageELF (Sage Science) was used for size selection. Library size was assessed using a FemtoPulse (Agilent). Each library was run on v.25M SMRT Cells using the first generation polymerase and chemistry v.1 (P1-C1). Sequencing was performed on a PacBio Revio system running instrument control software v.12.0.0.183503 with a movie collection time of 24 h per SMRT Cell. Using PacBio SMRT Link v.12.0.0.172289, CCS/HiFi reads were generated on-instrument using ccs v.7.0.0, lima v.2.7.1 (demultiplexing) and primrose v.1.4.0 (5mC calling).
Hi-C sequencing
Hi-C libraries were generated according to manufacturer’s directions using the Arima High Coverage Hi-C User Guide for Mammalian Cell Lines (A160161 v.01) and Arima-HiC+ User Guide for Library Preparation Using the Arima Library Prep Module (A160432 v.01). Starting with 5 million cells per sample, the Standard Input Crosslinking protocol was followed, resulting in 1.49–1.86 μg of DNA available per sample to generate large proximally ligated DNA as assessed using the Qubit Fluorometer (Life Technologies). Library preparation was performed using the S220 Focused-ultrasonicator (Covaris) to shear samples to 550 bp followed by a DNA purification bead cleanup with no size selection, and 5 or 7 cycles of library PCR amplification per sample. Resulting short fragment libraries were checked for quality and quantity using the Bioanalyzer (Agilent) and Qubit Fluorometer (Life Technologies). Libraries were pooled, requantified and sequenced as 150 bp paired reads on both the Illumina NextSeq 2000 and NextSeq 500 instruments to obtain at least 600 M read pairs per sample, using real-time analysis and instrument software versions current at the time of processing. Demultiplexing was performed with bcl-convert v.3.10.5. The cut sites (^) for the enzymes used were ^GATC, G^ANTC, C^TNAG and T^TAA.
Library construction for GM04890 and GM03786
Libraries were generated from 100 ng genomic DNA using a Covaris LE220-plus to shear the DNA and the 2S Plus DNA Library Kit (Integrated DNA Technologies 10009878) for library preparation. To minimize coverage bias, only four cycles of PCR amplification were used. The median insert sizes were approximately 300 bp. Libraries were tagged with unique dual index DNA barcodes to allow pooling of libraries and minimize the impact of barcode hopping. Libraries were pooled for sequencing on the NovaSeq X Plus (Illumina) across 14 lanes to obtain at least 369 million 151-base read pairs per individual library.
Library construction for GM03417
PCR-free libraries were generated from 1 μg genomic DNA using a Covaris R230 to shear the DNA and the TruSeq DNA PCR-Free HT Sample Preparation Kit (Illumina) for library preparation. The median insert sizes were approximately 400 bp. Libraries were tagged with unique dual index DNA barcodes to allow pooling of libraries and minimize the impact of barcode hopping. Libraries were pooled for sequencing on the NovaSeq X Plus (Illumina) across 7 lanes on 25B flow cells to obtain at least 388 million 151-base read pairs per individual library.
Assembly methods
Phased genome assemblies were generated using Verkko (v.1.4.1)23. The assembly process integrated PacBio HiFi reads and Oxford Nanopore (ONT) reads, with Hi-C reads used specifically for the phasing. The ONT reads included ultra-long reads, defined as reads that are at least 100 kb in length. Verkko was run with the command:
samples=("GM03417" "GM03786" "GM04890")
for sample in "${samples[@]}"; do
  verkko --slurm -d $sample \
    --screen human \
    --graphaligner conda/bin/GraphAligner \
    --mbg conda/bin/MBG \
    --hifi-coverage 30 \
    --hifi $sample/HiFi/*fa.gz \
    --nano $sample/ONT/fastq/*fq.gz \
    --hic1 $sample/HiC/*_1_[ACTG]*.fastq.gz \
    --hic2 $sample/HiC/*_2_[ACGT]*.fastq.gz
done
Haplotype-consistent contigs and scaffolds were automatically extracted from the labelled Verkko graph, with unresolved gap sizes estimated directly from the graph structure. After the assembly was generated, we collapsed all nodes composed of only rDNA k-mers into a single node and added telomere nodes to the graph to indicate ends of chromosomes using the commands:
seqtk hpc rDNA.fasta > rDNA_compressed.fasta
seqtk telo assembly.fasta > assembly.telomere.bed
mash sketch -i 8-hicPipeline/unitigs.hpc.fasta -o compressed.sketch.msh
mash screen compressed.sketch.msh rDNA_compressed.fasta | awk '{if ($1 > 0.9 && $4 < 0.05) print $NF}' > target.screennodes.out
python remove_nodes_add_telomere.py -r target.screennodes.out -t assembly.telomere.bed
In this simplified graph, the Robertsonian translocation was apparent in all cases (Extended Data Figs. 1–3). We extracted the assembly path corresponding to the ROB and identified gaps in the assembly. There was one gap in GM03417, one gap in GM03786 and two gaps in GM04890. Manual interventions were used to complete the chromosomes.
Assembly quality evaluation
We evaluated the quality and gene completeness of the genome assemblies using two approaches: a k-mer-based, reference-free method and a gene content assessment. For the k-mer-based evaluation, we employed Merqury55, a tool that assesses assembly completeness and accuracy without relying on a reference genome. Merqury uses k-mer frequencies from sequencing reads to estimate the quality value of the assemblies, which represents the phred-scaled error rate. For our evaluation, we used PacBio HiFi reads for the quality value estimation.
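The relationship between a phred-scaled quality value and the underlying error rate can be illustrated with a short Python sketch. It follows the k-mer survival formulation commonly attributed to Merqury (a base is taken to be correct with probability (K_shared/K_total)^(1/k)); the function names and the default k = 21 are assumptions for illustration and are not taken from the analysis described here.

```python
import math


def kmer_qv(shared_kmers: int, total_kmers: int, k: int = 21) -> float:
    """Phred-scaled consensus quality estimated from k-mer counts.

    shared_kmers: assembly k-mers that are also found in the read set
    total_kmers:  all k-mers in the assembly
    k:            k-mer size (21 is an assumed default for this sketch)
    """
    if not 0 < shared_kmers <= total_kmers:
        raise ValueError("expected 0 < shared_kmers <= total_kmers")
    # Probability that a single base is correct, assuming independent errors
    p_correct = (shared_kmers / total_kmers) ** (1.0 / k)
    error_rate = 1.0 - p_correct
    if error_rate == 0.0:
        return math.inf  # no unsupported k-mers observed
    return -10.0 * math.log10(error_rate)


def error_rate_from_qv(qv: float) -> float:
    """Invert the phred scale: QV 30 corresponds to one error per 1,000 bases."""
    return 10.0 ** (-qv / 10.0)
```

Under these assumptions, an assembly in which 0.1% of 21-mers are unsupported by the reads has a quality value of roughly 43.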
To assess gene completeness, we used compleasm56, a tool based on BUSCO. Compleasm evaluates the presence and integrity of a curated set of BUSCOs expected to be present in the genomes of the taxonomic group under study. We used the primate-specific BUSCO dataset, which includes 13,780 genes, to quantify the completeness, duplication and fragmentation of conserved genes in our assemblies.
PRDM9 site predictions and density
In 147 human haploid genomes (from 72 diploid individuals plus the haploid CHM13 and diploid HG002 genomes), predicted PRDM9 DNA binding sites were identified by using Motifence (v.0.1.1, commit fb1ebc0; https://github.com/AndreaGuarracino/motifence) to find DNA sequences matching the canonical 13-mer motif CCNCCNTNNCCNC57 or its reverse complement. To compute the density of PRDM9 DNA binding sites per kb in SST1 regions, SST1 arrays were first identified using TideHunter58. For a region to be defined as an SST1 array, the following criteria were applied: the monomeric unit within the array had to be at least 500 bp in length, there had to be at least two monomers, and the monomers had to overlap with RepeatMasker (v.4.1.5, http://repeatmasker.org/) SST1 annotations. The PRDM9 density was then calculated by dividing the number of PRDM9 binding sites in the SST1 regions by the total length of these SST1 regions. PRDM9 alleles were found by conducting a BLAST search (blast-plus/2.13.0) on GM03417, GM03786 and GM04890 with the A allele as the reference. To identify genotypes, these hits were aligned to the 69 alleles from Alleva et al.30 using MUSCLE and visualized in Geneious Prime 2024.0.7.
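As an illustration of the motif search and density calculation, the following Python sketch scans both strands of a sequence for the 13-mer and converts a site count to sites per kb. It is not the Motifence implementation; the function names and the decision to report overlapping matches are assumptions made for this example.

```python
import re

PRDM9_MOTIF = "CCNCCNTNNCCNC"  # canonical 13-mer, N = any base


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase sequence containing A, C, G, T or N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def motif_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif with N wildcards; the lookahead reports overlapping hits."""
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile(f"(?=({body}))")


def find_sites(seq: str, motif: str = PRDM9_MOTIF) -> list:
    """Return sorted (0-based start, strand) tuples for matches on both strands."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_regex(pattern).finditer(seq):
            hits.append((match.start(), strand))
    return sorted(hits)


def density_per_kb(n_sites: int, total_bp: int) -> float:
    """Number of sites divided by the total region length, scaled to 1 kb."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 1000.0 * n_sites / total_bp
```

For example, `density_per_kb(len(find_sites(array_seq)), len(array_seq))` gives the density for a single array, where `array_seq` is a hypothetical string holding the array sequence.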
In the chimpanzee genome, PRDM9 site density in sites per kb on SST1 regions was calculated using R and Bioconductor. The function vmatchPattern from the Biostrings library was used to map the occurrence of the chimpanzee PRDM9 motifs: prdm9_E CNNCCNAANAA, prdm9_W CNGNNAANANTT and prdm9_pt1 ANTTNNATCNTCC, or their reverse complements, on the genome. SST1-containing regions were then queried for overlap of PRDM9 sites using the countOverlaps function from the GenomicRanges library. Query width was used to calculate sites per kb. SST1 regions larger than 10 kb were broken into 3-kb tiles to approximate resolution near SST1 feature size. Background PRDM9 site density was assessed in two ways. Random background PRDM9 density for each chromosome was determined using 100 randomly chosen 3-kb segments. To account for GC bias, the genome was scored for GC content at 3 kb resolution, and fragments within one s.d. of the average GC content of the SST1-containing elements were chosen to calculate background PRDM9 site density.
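The tiling and GC-matching steps can be sketched in Python as follows. This is an illustration rather than the R/Bioconductor code used for the analysis; the function names, the handling of a final partial tile and the use of the sample s.d. are assumptions.

```python
from statistics import mean, stdev


def tile_region(start: int, end: int, tile: int = 3000, max_len: int = 10000) -> list:
    """Split a region longer than max_len into tile-sized windows.

    Coordinates are half-open; the last window may be shorter than `tile`
    (an assumption of this sketch).
    """
    if end - start <= max_len:
        return [(start, end)]
    return [(s, min(s + tile, end)) for s in range(start, end, tile)]


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_matched(background_gc: list, target_gc: list) -> list:
    """Indices of background windows within one s.d. of the mean target GC."""
    mu = mean(target_gc)
    sd = stdev(target_gc)  # sample s.d.; requires at least two target values
    return [i for i, gc in enumerate(background_gc) if abs(gc - mu) <= sd]
```

Background density would then be computed only over the windows returned by `gc_matched`, so that differences from the SST1 tiles are less likely to reflect base composition alone.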
SST1–segmental duplication association
To examine associations between SST1 repeats and segmental duplications, we performed the following analysis in 147 human genomes (from 72 diploid individuals plus the haploid CHM13 and diploid HG002 genomes). First, repetitive regions in the genomic sequences were masked using RepeatMasker (v.4.1.5, http://repeatmasker.org/) and Tandem Repeats Finder (v.4.09.1)59. Segmental duplications were then identified using SEDEF (v.1.1)60 on each haploid masked genome. SST1 repeats were detected using RepeatMasker and refined with TideHunter, as described above. Finally, we used the R package regioneR (v.1.36.0)61 to perform permutation testing (n = 10,000) to assess the significance of spatial associations between SST1 repeats and segmental duplications. This analysis was conducted on 147 haplotype-resolved genomes to provide a comprehensive view of these genomic features across diverse human genomes.
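The logic of such a permutation test can be illustrated with a minimal Python sketch. It is not regioneR: it assumes a single linear sequence, places each interval uniformly at random without masking, and uses the (r + 1)/(n + 1) convention for the empirical P value; all names are hypothetical.

```python
import random


def count_overlaps(a: list, b: list) -> int:
    """Number of half-open intervals in `a` that overlap any interval in `b`."""
    return sum(any(s < e2 and s2 < e for s2, e2 in b) for s, e in a)


def randomize(intervals: list, genome_len: int, rng: random.Random) -> list:
    """Place each interval at a uniformly random position, keeping its length."""
    placed = []
    for s, e in intervals:
        length = e - s
        new_start = rng.randrange(0, genome_len - length + 1)
        placed.append((new_start, new_start + length))
    return placed


def permutation_test(a: list, b: list, genome_len: int,
                     n_perm: int = 10000, seed: int = 0) -> tuple:
    """Return (observed overlap count, one-sided empirical P value)."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    null = [count_overlaps(randomize(a, genome_len, rng), b) for _ in range(n_perm)]
    as_extreme = sum(x >= observed for x in null)
    # Adding 1 to numerator and denominator keeps P strictly above zero
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Here `a` would hold the SST1 intervals and `b` the segmental duplication intervals on one haplotype; with n = 10,000 permutations the smallest attainable P value under this convention is 1/10,001.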
SST1 monomer characterization
We used RepeatMasker to find the SST1 regions and retrieved FASTA sequences with 1 kb of flanking sequence for all arrays. We then manually curated all clusters by visual inspection of dot plots generated with the Dotlet applet62 using a 15 bp word size and a 60% similarity cut-off. We made regressive changes to the consensus sequences used, which enabled us to describe the sequences properly. Through manual curation, we identified the beginning and end of each array and of each monomer relative to the consensus generated. For the sake of alignment, all monomeric sequences analysed were defined with the same start and end points relative to the consensus.
Maximum-likelihood phylogenetic analysis
We aligned all SST1 full-length monomeric sequences retrieved from assembled genomes using MUSCLE63. We conducted the phylogenetic analysis by using the maximum-likelihood method based on the best-fit substitution model (Kimura two-parameter + G, parameter = 5.5047) inferred by Jmodeltest2 with 1,000 bootstrap replicates. Bootstrap values higher than 75 are indicated at the base of each node.
Chromosome spreads, FISH and immunoFISH
For the preparation of chromosome spreads, cells were blocked in mitosis by the addition of Karyomax colcemid solution (0.1 µg ml−1, Life Technologies) for 6–7 h. Adherent fibroblast cells were collected by trypsinization. Collected cells were incubated in hypotonic 0.4% KCl solution for 12 min and pre-fixed by addition of methanol:acetic acid (3:1) fixative solution (1% total volume). Pre-fixed cells were spun down and then fixed in methanol:acetic acid (3:1).
For SST1 and centromere FISH, spreads were dropped on a glass slide and incubated at 65 °C overnight. Before hybridization, slides were treated with 0.1 mg ml−1 RNase A (Qiagen) in 2× SSC for 45 min at 37 °C and dehydrated in a 70%, 80% and 100% ethanol series for 2 min each. Slides were denatured in 70% deionized formamide/2× SSC solution pre-heated to 72 °C for 1.5 min. Denaturation was stopped by immersing slides in a 70%, 80% and 100% ethanol series chilled to −20 °C. Labelled DNA probes were denatured separately in a hybridization buffer by heating to 80 °C for 10 min before applying to denatured slides. Fluorescently labelled human centromere probes for D13Z1/D21Z1 and D14Z1/D22Z1 were from Cytocell. The biotin-labelled BAC probe for SST1 (RP11-614F17) was obtained from Empire Genomics. Specimens were hybridized to the probes under a glass coverslip or HybriSlip hybridization cover (GRACE Biolabs) sealed with rubber cement or Cytobond (SciGene) in a humidified chamber at 37 °C for 48–72 h. After hybridization, slides were washed in 50% formamide/2× SSC 3 times for 5 min per wash at 45 °C, then in 1× SSC solution at 45 °C for 5 min twice and at room temperature once. For biotin detection, slides were incubated with fluorescent streptavidin conjugated with Cy5 (ThermoFisher Scientific) for 2–3 h in PBS containing 0.1% Triton X-100 and 5% bovine serum albumin (BSA), and then washed 3 times for 5 min with PBS/0.1% Triton X-100. Slides were mounted in Vectashield containing DAPI (Vector Laboratories). Confocal z-stack images were acquired on the Nikon TiE microscope equipped with PlanApo 100× oil immersion objective NA 1.45, Yokogawa CSU-W1 spinning disk, Flash 4.0 sCMOS camera (Hamamatsu), and NIS Elements software.
For chimpanzee and bonobo cell lines, chromosome spread specimens were hybridized to the probes under a glass coverslip or HybriSlip hybridization cover (GRACE Biolabs) sealed with rubber cement or Cytobond (SciGene) in a humidified chamber at 37 °C for 48 h. After hybridization, slides were washed in 50% formamide/2× SSC 3 times for 5 min per wash at 45 °C, then in 1× SSC solution at 45 °C for 5 min twice and at room temperature once. For biotin detection, slides were incubated with fluorescent streptavidin conjugated with 488 (ThermoFisher Scientific) for 45 min in PBS containing 0.1% Triton X-100 and 5% BSA, and then washed 3 times for 5 min with PBS/0.1% Triton X-100. Slides were mounted in Vectashield containing DAPI (Vector Laboratories). Confocal z-stack images were acquired on the Zeiss LSM 800 microscope equipped with a Plan-Apochromat 63×/1.4 oil immersion objective and Zen Blue software.
For chimpanzee and bonobo, we used the following SST1-sf1 probe: (5′-AGGCCAAATATCAGCTGCAAATTCAATCATCCATCAGCCCTCTGCCTACCTCTTCCTTTGAAAGGGCAGTGGCCGGCCCGGCTTGTAAAAGCCCTGGGGTTCCAGAAAGCCGACCGCGCTTTACAGAACAACTGTAATGAGGAACACAGGCGAATCCGAGGGGGTGACCATGTGACCACGCGTGGTACTGGCCAATCCCACAGCAGCTGGTGTTAATGTGTGTCACCGGAGGCATACGGGGCGACGGCGAAACAAAGGGTGGTGTCCAGGAATGTGCCGGTGGATGGGGAAACGGGTGACCTTTCCATCAATGCCAACGAAAATCAAAGAACAACTGGGACCCGGGGGTTGGGGGTGCCGCCTGTGCCTGACCCAAGCCACGTTTTCAAATGCCTACCAGAGGAGCAGAGAGGTTTCTGCAAAATTCGCAGCATCCCCAATCCTCCACCGACCTGGTAGCCCTGACGAAACTTCGGCTGGCACAAACCCAGAGAGGGTGGGGAGTCATACAGCAGAGGAGAGCAGCCCAGGGGCACGCAGGCCGACCCGTCATCGAGATCACGGACGGCCGCACGACTTTTCGGGAGACTCACCCCAGCCAACACCGTCCGTGCAGGCCTGAGGCTGGTATCCCGTGCTGCTTCCCCCCGTCTCCGCCTGGGGTTTCCTCATCAAGGTCGGCCCTTTGCGACTCCTGGCATCCGGAGACGTTCCCTTCGACCCCGTGGAGAGGTGAGGCTTTAGCCTCAGAGCCTCGACACCCAAGCACTGCAACGGAGGGCTCCTGCTCTGCCAAGCCTCGGGGCCTGGTTTCTAAGAAAACCGTGGGAACCACTGTGACGGGAGATACCGCTCGCGCCTCGCGCATGCGCATTGGCCGAGCCGATTCGCGCTCCACTGCTGACAGATAGGCTGCGTCCGCTTTAAATATCGCCACCACCACGCGGCGGCCTTGGTGCTCCTGCTGCCGCTGCGGCGGCGGCTGGATCCTGGGTCCTGTTTGGGGCGGCATGCGAAAGGGGACCGCGGGTGTCTCGTCCTGTCCCAGGCCCACACCCCCAGGGGTCCTGTCCACAGGACCTGCTTCAGCCGACTTCCACCGAGGGAGGGGGAGCTTCAGGACGCCTGCTGTGTTCTCCGGACTCCCGTTGAGATCCGATTTTGGCCCTCTCCGAGTGAGATAGGACGAGCTCACCACACCCGGACAGGCCGGCAGGGCCTC GCTGCAGCACAGAATGATCCCGTAGGTCTGA-3′).
For CENP-B and CENP-C immunoFISH, freshly prepared chromosome spreads were dropped on a glass slide, washed with PBS/0.1% Triton X-100, and blocked with 5% BSA in PBS/0.1% Triton X-100. Primary antibody (rabbit polyclonal anti-CENP-B, Abcam, ab25734, rabbit polyclonal anti-CENP-C, Millipore, ABE1957) and secondary antibody (goat anti-rabbit Alexa Fluor 647, ThermoFisher Scientific) were diluted in 5% BSA/PBS/0.1% Triton X-100. Specimens were incubated with primary antibody overnight, washed 3 times for 5 min, incubated with secondary antibody for 2–4 h and washed again 3 times for 5 min. All washes were performed with PBS/0.1% Triton X-100. After antibody incubation, spreads were post-fixed in 2% paraformaldehyde diluted in PBS for 15 min, washed in PBS, and processed for FISH as described above, starting with an ethanol dehydration series. DNA was stained with 1.5 µg ml−1 DAPI. Confocal z-stack images of CENP-B immunoFISH were acquired on the Nikon TiE microscope as described above. For SIM performed on CENP-C immunoFISH, slides were rinsed in ddH2O, air-dried in the dark, mounted in ProLong Glass antifade mountant (ThermoFisher Scientific) and allowed to cure for at least 24 h before imaging. z-stack images were acquired on an Elyra 7 Lattice SIM2 microscope (Zeiss) equipped with two PCO.edge 4.2 sCMOS cameras, four high power continuous wave lasers (405, 488, 561 and 642 nm) and a Zeiss PlanApo 63× oil immersion objective NA 1.4. The illumination pattern was set to 15 phases, and the z-stack spacing was set at 100 nm. Raw SIM images were reconstructed using the ZEN Black software (Zeiss) with 10.5 manual adjustments for sharpness and best-fit settings for all channels except 405 nm (DAPI), which was processed in the widefield mode. Image pre-processing for SPA-SIM included channel alignment; this analysis randomizes any residual chromatic shifts by averaging randomly oriented chromosomes.
For CENP-A and NDC80 immunoFISH experiments, fibroblasts plated on 150-mm dishes were treated with 100 µM Monastrol (Tocris Bioscience) and 100 µM Apcin (Selleck Chemicals) for 5 h and collected by mitotic shake-off. Collected cells were further incubated with Karyomax colcemid solution (0.1 µg ml−1, Life Technologies) for 15 min. After that, cells were spun down and resuspended in 0.075 M KCl swelling buffer containing 10 mM HEPES, incubated at room temperature for 12 min, washed with ice-cold PBS and kept on ice. Cells (3–4 × 10^5) were spun onto glass slides using a Shandon Cytospin 4 centrifuge (Thermo Scientific) at 700–1000 rpm for 3–5 min, washed in KCM buffer (120 mM KCl, 20 mM NaCl, 10 mM Tris-HCl, pH 8, 0.5 mM EDTA, 0.1% (v/v) Triton X-100) and blocked in 5% (w/v) BSA/KCM for 30 min. Slides were then incubated with primary antibodies (mouse anti-CENP-A (3-19) Enzo ADI-KAM-CC006, or mouse anti-NDC80 (9G3.23) ThermoFisher Scientific MA1-23308 with rabbit anti-CENP-A ProSci 30-143) used at 1:100 dilution for 1 h at room temperature, washed 3 times in KCM for 5 min, followed by incubation for 1 h with species-specific secondary antibodies conjugated with Alexa Fluor Plus dyes at 2 µg ml−1, washed again and post-fixed in 4% (v/v) paraformaldehyde/KCM for 10 min. Fixed slides were incubated in 50% glycerol/PBS at 4 °C for at least 1 h or overnight. Before hybridization, slides were subjected to a freeze-thaw treatment by dipping into liquid nitrogen, then treated with 0.1 N HCl for 5 min, washed twice in 2× SSC buffer, and pre-incubated in 50% formamide/2× SSC overnight. Fluorescently labelled probes were pre-denatured for 7 min at 80 °C, followed by incubation with the specimen for 3 min at 74 °C, and hybridized under a HybriSlip hybridization cover (GRACE Biolabs) sealed with Cytobond (SciGene) in a humidified chamber at 37 °C for 24–48 h.
After hybridization, slides were washed in 50% formamide/2× SSC 3 times for 5 min per wash at 45 °C, then in 1× SSC solution at 45 °C for 5 min twice and at room temperature once. DNA was stained with 1.5 µg ml−1 DAPI. After staining was completed, slides were rinsed in ddH2O, air-dried in the dark, mounted in ProLong Glass (ThermoFisher Scientific) and allowed to cure for at least 24 h before SIM imaging.
Centromere intensity profiling of centromere and SST1 FISH and CENP-B immunoFISH
Maximum intensity projections from spinning disk confocal z-stacks were generated, and chromosomes of interest were segmented manually on the basis of DNA and centromere labelling. Segmented chromosomes from each cell line were oriented vertically and assembled in a new stack consisting of identified specific chromosomes from multiple chromosome spreads. Intensity plot profiles were generated from 2 µm vertical lines with the width of 10 pixels drawn through centromeric regions of each chromosome. Intensity profiles were combined by channel, fit to single Gaussian functions, and aligned to the peak of the Gaussian of the indicated channel. These profiles were then averaged together and normalized to the maximum intensity of each peak. For each chromosome from each cell line, at least ten intensity profiles were averaged and plotted with the s.d. All image processing and analysis were performed using ImageJ/FIJI. A detailed description of this type of analysis and relevant plugins are available at https://research.stowers.org/imagejplugins/spasim.html.
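The alignment, normalization and averaging of line profiles can be sketched in Python as follows. As a simplification, this example estimates the peak centre from intensity-weighted moments instead of a least-squares Gaussian fit, and aligns profiles to the nearest pixel; it is not the ImageJ/FIJI plugin code, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_moments(ys: list) -> tuple:
    """Estimate peak centre and width (in pixels) from weighted moments."""
    base = min(ys)  # crude baseline: the lowest sample in the profile
    weights = [y - base for y in ys]
    total = sum(weights)
    if total <= 0:
        raise ValueError("profile is flat")
    mu = sum(i * w for i, w in enumerate(weights)) / total
    var = sum(w * (i - mu) ** 2 for i, w in enumerate(weights)) / total
    return mu, math.sqrt(var)


def aligned_window(ys: list, half_width: int) -> list:
    """Cut a window centred on the estimated peak and scale its maximum to 1."""
    mu, _ = gaussian_moments(ys)
    centre = round(mu)
    lo, hi = centre - half_width, centre + half_width + 1
    if lo < 0 or hi > len(ys):
        raise ValueError("window extends beyond the profile")
    window = ys[lo:hi]
    peak = max(window)
    return [y / peak for y in window]


def average_profiles(profiles: list, half_width: int) -> tuple:
    """Per-position mean and s.d. of peak-aligned, normalized profiles."""
    windows = [aligned_window(p, half_width) for p in profiles]
    columns = list(zip(*windows))
    return [mean(c) for c in columns], [stdev(c) for c in columns]
```

In a multichannel setting, the centre estimated from the anchor channel would be applied to the other channels of the same chromosome before averaging, so that relative offsets between signals are preserved.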
Semi-automated intensity profiling of CENP-C immunoFISH from SIM images
Reconstructed SIM images were mean projected, except for the DAPI channel, which had the slice of highest contrast selected. ROBs and corresponding normal acrocentric chromosomes were identified using centromere FISH signals and segmented manually or with a Cellpose model trained on a combination of the DAPI and centromere signals. Individual chromosomes were transferred to a new image and oriented vertically using a second Cellpose model trained to find a skeleton of the chromosome. Bent chromosomes were straightened in ImageJ/FIJI by manually drawing two annotation lines across centromeres, one through each kinetochore. The straightened images were then aligned to the peak of the specified centromere FISH signal used as the anchor point, and the line intensity profiles were aggregated over multiple images and split by cell line. At least ten chromosomes were analysed for each instance from each cell line. All analysis was performed in ImageJ/FIJI and Python with code at https://github.com/jouyun/Gerton_Robertsonian_2024.
Methylation calls
HiFi BAM and ONT FASTQ files with 5mC methylation calls as MM and ML tags were aligned against the generated assemblies using pbmm2 (v.1.13.0, https://github.com/PacificBiosciences/pbmm2) for HiFi reads and Winnowmap (v.2.03)64 for ONT reads. The alignments were then converted to sorted BAM files containing only primary mappings with samtools (v.1.17)65:
# HiFi reads
pbmm2 align {genome}.mmi {bam_with_meth_calls} -j 42 > {output.bam}
samtools view -@ 24 -Sb -F 2048 {output.bam} | samtools sort -@ 24 -T {temporary_directory} - > {output.bam}
samtools index {output.bam}
# ONT reads
winnowmap -t 48 -W {genome}_repetitive_k15.txt -ax map-ont -y {assembly_fasta} {fastq_with_meth_calls} > {output.sam}
samtools view -@ 24 -Sb -F 2048 {output.sam} | samtools sort -@ 24 -T {temporary_directory} - > {output.bam}
samtools index {output.bam}
Aggregated methylation percentages at all CpGs were obtained using modbam2bed (v.0.10.0, https://github.com/epi2me-labs/modbam2bed) with bases with >0.8 probability called “methylated” and bases with <0.2 probability called “unmethylated”:
modbam2bed -t 48 -e -m 5mC --cpg -a 0.20 -b 0.80 {assembly_fasta} {output.bam} > {output.bed}
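The effect of the two probability thresholds can be illustrated with a short Python sketch. It is not modbam2bed itself: the function names are hypothetical, calls between the thresholds are simply excluded from the percentage, and the conversion of an ML tag value to the midpoint of its probability bin is an assumption of this example.

```python
def ml_to_probability(ml: int) -> float:
    """Map an ML tag value (0-255) to the midpoint of its probability bin."""
    if not 0 <= ml <= 255:
        raise ValueError("ML values are 8-bit integers")
    return (ml + 0.5) / 256.0


def classify(prob: float, low: float = 0.20, high: float = 0.80) -> str:
    """Call a base from its 5mC probability using the two thresholds."""
    if prob > high:
        return "methylated"
    if prob < low:
        return "unmethylated"
    return "filtered"  # ambiguous calls are not counted either way


def percent_methylated(probs: list, low: float = 0.20, high: float = 0.80):
    """Percentage of confident calls at one CpG that are methylated."""
    calls = [classify(p, low, high) for p in probs]
    n_meth = calls.count("methylated")
    n_unmeth = calls.count("unmethylated")
    if n_meth + n_unmeth == 0:
        return None  # no confident calls at this site
    return 100.0 * n_meth / (n_meth + n_unmeth)
```

For a CpG covered by four reads with probabilities 0.95, 0.90, 0.50 and 0.10, the read at 0.50 is filtered and the site is reported as two of three confident calls methylated.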
CUT&RUN library preparation
The CUT&RUN assay was performed using the CUT&RUN Assay Kit (86652, Cell Signaling Technology) in accordance with the manufacturer’s protocol. For each condition, 250,000 cells were pelleted and washed in 1× wash buffer, prepared from 10× wash buffer (31415, Cell Signaling Technology), 100× spermidine (27287, Cell Signaling Technology) and 200× protease inhibitor cocktail (7012, Cell Signaling Technology). Cell suspensions were then incubated with concanavalin A-coated beads for 5 min at room temperature to facilitate binding, followed by resuspension in 1× binding buffer containing 100× spermidine, 200× protease inhibitor cocktail, 40× digitonin solution (Cell Signaling Technology, 16359) and antibody binding buffer (Cell Signaling Technology, 15338). For the detection of CENP-A–DNA interactions, a monoclonal antibody against CENP-A (Enzo, ADI-KAM-CC006-E) was employed at a 1:50 dilution. As controls, tri-methyl-histone H3 (Lys4) (Cell Signaling Technology, 9751, C42D8 rabbit monoclonal antibody) was used at 1:50 dilution as a positive control, while a rabbit IgG XP isotype control (Cell Signaling Technology, 66362, DA1E monoclonal antibody) was applied at 1:10 dilution as a negative control. Antibody incubation was conducted at 4 °C overnight (16 h). Later, the beads were subjected to magnetic separation and washed in digitonin buffer.
The beads were then resuspended in digitonin buffer containing pAG-MNase enzyme (40366) and incubated at 4 °C for 1 h. Following another wash in digitonin buffer, the beads were treated with calcium chloride in digitonin buffer and incubated at 4 °C for 30 min to facilitate MNase activation. The enzymatic digestion was terminated by adding 1× stop buffer (prepared from 4× stop buffer (Cell Signaling Technology, 48105), digitonin solution and 200× RNase A (Cell Signaling Technology, 7013)). For normalization, spike-in DNA (Cell Signaling Technology, 40366) was introduced at a final concentration of 10 pg μl−1 (1:100 dilution). Samples were then incubated at 37 °C for 10 min, and the supernatants were collected by centrifugation. DNA was liberated via incubation at 65 °C for 2 h before purification. Input chromatin samples were sheared to fragments ranging from 100–700 base pairs using a Covaris S2 sonicator prior to purification.
DNA purification was performed using a DNA purification with spin columns kit (Cell Signaling Technology, 14209). DNA concentration was assessed using the Qubit dsDNA HS kit for the Qubit Fluorometer.
CUT&Tag library preparation
For anti-CENP-A CUT&Tag, library preparation used the CUT&Tag-IT kit from Active Motif (53160). Each experiment was performed with 500,000 fresh cells. Fresh cells were washed using 1× wash buffer and nuclei were isolated and incubated with activated concanavalin A-coated magnetic beads in 2 ml PCR tubes at room temperature for 10 min. A 1:100 dilution of primary antibody anti-CENP-A (human) monoclonal antibody (D115-3) in antibody buffer was added and nuclei were incubated overnight at 4 °C. The next day, tubes were incubated on a magnetic tube holder and supernatants were discarded. Secondary antibody (rabbit anti-mouse) was diluted at 1:100 in Dig-Wash buffer and nuclei were incubated for 1 h on an orbital rotator at room temperature. Nuclei were washed three times in Dig-Wash buffer and then incubated with a 1:100 dilution of CUT&Tag-IT pA–Tn5 Transposomes for 1 h on an orbital rotator at room temperature. Afterwards, 125 μl of tagmentation buffer was added to each sample. To stop tagmentation, 4.2 μl 0.5 M EDTA, 1.25 μl 10% SDS and 1.1 μl 10 mg ml−1 proteinase K were added to each reaction, which was then incubated at 55 °C for 1 h. DNA was barcoded and amplified using the following conditions: a PCR mix of 25 μl NEBNext 2× mix, 2 μl each of barcoded forward and reverse 10 μM primers, and 21 μl of extracted DNA was amplified at: 58 °C for 5 min, 72 °C for 5 min, 98 °C for 45 s, 16× 98 °C for 15 s followed by 63 °C for 10 s, 72 °C for 1 min. Amplified DNA libraries were purified by adding a 1.1× volume of SPRI beads to each sample and incubating for 10 min at 23 °C. Samples were placed on a magnet and liquid was removed. Beads were rinsed twice with 80% ethanol, and DNA was eluted with 20 μl elution buffer. All individually i7-barcoded libraries were mixed at equimolar proportions for sequencing.
CUT&Tag and CUT&RUN libraries and sequencing
Libraries were quantified and individually converted to process on the Singular Genomics G4 with the SG Library Compatibility Kit (700141), following the Adapting Libraries for the G4–Retaining Original Indices protocol. The converted libraries were sequenced in individual lanes on an F3 flow cell (700125) on the G4 instrument, using Instrument Control Software 23.08.1-1 with 100 bp paired reads. Following sequencing, sgdemux 1.2.0 was run to generate FASTQ files.
CUT&Tag and CUT&RUN bioinformatic analysis
CUT&Tag and CUT&RUN sequencing reads were trimmed using the trim-galore tool (v.0.6.10, https://github.com/FelixKrueger/TrimGalore), which included adapter removal. The trimmed reads of each sample were then aligned to the corresponding de novo assembly using bowtie2 (v.2.5.3)66. After alignment, the reads were sorted and indexed using samtools (v.1.17)65, and depth information for primary alignments was extracted with mosdepth67.
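The steps above can be sketched as a small Python helper that assembles the command lines for one sample without running them. This is a minimal illustration, not the authors' script: the exact options they used are not stated, so the sample name, file names, index prefix and thread count are hypothetical, and the choice to exclude supplementary alignments in mosdepth on top of its default filter is an assumption about how "primary alignments" was enforced.

```python
import shlex


def cut_tag_depth_commands(sample, r1, r2, index_prefix, threads=4):
    """Return the per-sample commands, in run order, as argument lists."""
    # With --basename, trim_galore names paired outputs
    # <sample>_val_1.fq.gz and <sample>_val_2.fq.gz (gzipped input assumed).
    t1, t2 = f"{sample}_val_1.fq.gz", f"{sample}_val_2.fq.gz"
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    return [
        # Adapter and quality trimming of the read pairs.
        ["trim_galore", "--paired", "--basename", sample, r1, r2],
        # Alignment to the sample's own assembly (index built beforehand).
        ["bowtie2", "-p", str(threads), "-x", index_prefix,
         "-1", t1, "-2", t2, "-S", sam],
        # Coordinate sort, then index.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # ASSUMPTION: 3844 = 1796 (mosdepth default: unmapped, secondary,
        # QC-fail, duplicate) + 2048 (supplementary).
        ["mosdepth", "--flag", "3844", sample, bam],
    ]


# Hypothetical sample and file names, for illustration only.
for cmd in cut_tag_depth_commands(
    "GM03417_CENPA", "reads_R1.fastq.gz", "reads_R2.fastq.gz", "GM03417_asm"
):
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps file names with unusual characters safe if the commands are later passed to `subprocess.run`.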
Pairwise sequence identity heat maps
To generate pairwise sequence identity heat maps of each centromeric region, we used a modified version of StainedGlass (v.0.6)39 with the following parameters: window = 5000, mm_f = 30000 and mm_s = 1000. Our modifications were applied to visualize the identity heat maps with methylation and CENP-A CUT&Tag information included at the bottom.
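To give a sense of scale for the window setting, the sketch below splits a region into consecutive 5 kb windows and counts the window pairs that an all-against-all identity heat map has to cover. It only illustrates the windowing arithmetic and does not reproduce StainedGlass itself; the 3 Mb region length and the function names are hypothetical.

```python
def window_bounds(region_length, window=5000):
    """Half-open (start, end) coordinates of consecutive windows.

    The last window is shorter when the region length is not a multiple
    of the window size.
    """
    return [
        (start, min(start + window, region_length))
        for start in range(0, region_length, window)
    ]


def heatmap_cells(n_windows):
    """Number of unordered window pairs, self-comparisons included."""
    return n_windows * (n_windows + 1) // 2


region_length = 3_000_000  # hypothetical centromeric region, in bp
windows = window_bounds(region_length)
print(len(windows), "windows;", heatmap_cells(len(windows)), "pairwise cells")
```

Because the number of cells grows quadratically with the number of windows, halving the window size roughly quadruples the number of comparisons.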
Synteny plots
To visualize the alignment between the generated assemblies and the CHM13 genome, we used NGenomeSyn26 to generate the synteny plots, which were then manually curated.
Hi-C data analysis
We mapped Hi-C reads against the CHM13 genome and the phased genome assemblies of the three cell lines with the BWA aligner68, configured to handle the chimeric nature of Hi-C reads by allowing local mapping and tuning the parameters to minimize gaps. Following read mapping, for each cell line, we constructed three Hi-C contact matrices, one against CHM13 and one against each of the two haplotypes of the respective assembly, by specifying a bin size of 10,000 bp and incorporating restriction site information using HiCExplorer tools69. The resulting matrices were then binned at various resolutions (100 kb, 200 kb and 500 kb) and corrected to normalize the contact frequencies across bins and remove GC and open chromatin biases. Finally, we visualized the corrected matrices using hicPlotMatrix, applying log transformation to handle the wide range of contact counts.
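The re-binning step can be made concrete with a short Python sketch that derives the merge factor for each target resolution from the 10 kb base matrix and lists the corresponding HiCExplorer calls. It is an illustration under stated assumptions, not the authors' pipeline: the file names are hypothetical, and the `--filterThreshold` values are placeholders that would normally be chosen from the diagnostic plot for each matrix.

```python
BASE_BIN_BP = 10_000
TARGET_BINS_BP = [100_000, 200_000, 500_000]


def merge_factor(target_bp, base_bp=BASE_BIN_BP):
    """Number of base bins merged into one bin at the target resolution."""
    if target_bp % base_bp != 0:
        raise ValueError("target resolution must be a multiple of the base bin")
    return target_bp // base_bp


def hicexplorer_commands(base_matrix, prefix):
    """Merge, correct and plot commands for every target resolution."""
    commands = []
    for target in TARGET_BINS_BP:
        merged = f"{prefix}.{target // 1000}kb.h5"
        corrected = f"{prefix}.{target // 1000}kb.corrected.h5"
        commands.append(["hicMergeMatrixBins", "--matrix", base_matrix,
                         "--numBins", str(merge_factor(target)),
                         "--outFileName", merged])
        # ASSUMPTION: placeholder thresholds, to be replaced per matrix.
        commands.append(["hicCorrectMatrix", "correct", "--matrix", merged,
                         "--filterThreshold", "-1.5", "5",
                         "--outFileName", corrected])
        # --log1p plots log(1 + count), which compresses the wide count range.
        commands.append(["hicPlotMatrix", "--matrix", corrected, "--log1p",
                         "--outFileName", f"{prefix}.{target // 1000}kb.png"])
    return commands


# Hypothetical matrix name for one cell line mapped against CHM13.
for cmd in hicexplorer_commands("GM03417_vs_CHM13.10kb.h5", "GM03417_vs_CHM13"):
    print(" ".join(cmd))
```

Starting from one 10 kb matrix and merging upwards means the reads are mapped and binned once per reference, and each coarser resolution is a cheap derived product.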
Genome versions used
We leveraged multiple reference genomes and assemblies. The primary reference was T2T-CHM13v2.0. We also incorporated the recent diploid T2T-HG002v1.1 genome and 72 samples from the Human Pangenome Reference Consortium (HPRC). The HPRC samples were assembled using Verkko (v.2.1)23, with a combination of sequencing technologies for each sample. The assembly process utilized PacBio High-Fidelity (HiFi) reads and Oxford Nanopore Technologies (ONT) long reads. For phasing, we primarily used trio information from short Illumina reads. In cases where trio information was unavailable, Hi-C reads were used for phasing instead.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data were derived from de-identified human patient DNA samples. Assemblies and supporting data types (Hi-C, PacBio, ONT and Illumina sequencing) are available via dbGaP under accession phs003920.v1.p1 in accordance with patient consent from Coriell (Supplementary Table 1). CUT&RUN and CUT&Tag data are available at NCBI BioProject under accessions PRJNA1289885 (GM04890), PRJNA1289767 (GM03786) and PRJNA1286808 (GM03417). Original raw data files for imaging are publicly available and can be accessed from the Stowers Original Data Repository at http://www.stowers.org/research/publications/libpb-2501.
References
Hamerton, J. L., Canning, N., Ray, M. & Smith, S. A cytogenetic survey of 14,069 newborn infants. I. Incidence of chromosome abnormalities. Clin. Genet. 8, 223–243 (1975).
Nielsen, J. & Wohlert, M. Chromosome abnormalities found among 34,910 newborn children: results from a 13-year incidence study in Arhus, Denmark. Hum. Genet. 87, 81–83 (1991).
Zhao, W. W. et al. Robertsonian translocations: an overview of 872 Robertsonian translocations identified in a diagnostic laboratory in China. PLoS ONE 10, e0122647 (2015).
Schoemaker, M. J. et al. Mortality and cancer incidence in carriers of balanced Robertsonian translocations: a national cohort study. Am. J. Epidemiol. 188, 500–508 (2019).
Poot, M. & Hochstenbach, R. Prevalence and phenotypic impact of Robertsonian translocations. Mol. Syndromol. https://doi.org/10.1159/000512676 (2021).
Robertson, W. R. B. Taxonomic relationships shown in the chromosomes of Tettigidae and Acrididae: V-shaped chromosomes and their significance in Acrididae, Locustidae, and Gryllidae: chromosomes and variation. J. Morphol. 27, 179–331 (1916).
Guarracino, A. et al. Recombination between heterologous human acrocentric chromosomes. Nature 617, 335–343 (2023).
Qumsiyeh, M. B. Evolution of number and morphology of mammalian chromosomes. J. Hered. 85, 455–465 (1994).
Brannan, E. O., Hartley, G. A. & O’Neill, R. J. Mechanisms of rapid karyotype evolution in mammals. Genes 15, 62 (2023).
Ferguson-Smith, M. A. & Trifonov, V. Mammalian karyotype evolution. Nat. Rev. Genet. 8, 950–962 (2007).
Liehr, T. Cytogenetic contribution to uniparental disomy (UPD). Mol. Cytogenet. 3, 8 (2010).
Bandyopadhyay, R. et al. Parental origin and timing of de novo Robertsonian translocation formation. Am. J. Hum. Genet. 71, 1456–1462 (2002).
Page, S. L. & Shaffer, L. G. Nonhomologous Robertsonian translocations form predominantly during female meiosis. Nat. Genet. 15, 231–232 (1997).
Page, S. L., Shin, J. C., Han, J. Y., Choo, K. H. & Shaffer, L. G. Breakpoint diversity illustrates distinct mechanisms for Robertsonian translocation formation. Hum. Mol. Genet. 5, 1279–1288 (1996).
Hook, E. B. & Cross, P. K. Rates of mutant and inherited structural cytogenetic abnormalities detected at amniocentesis: results on about 63,000 fetuses. Ann. Hum. Genet. 51, 27–55 (1987).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Gerton, J. L. A working model for the formation of Robertsonian chromosomes. J. Cell Sci. 137, jcs261912 (2024).
Ehrlich, M. Cancer-linked DNA hypomethylation and its relationship to hypermethylation. Curr. Top. Microbiol. Immunol. 310, 251–274 (2006).
Grey, C., Baudat, F. & de Massy, B. PRDM9, a driver of the genetic map. PLoS Genet. 14, e1007479 (2018).
Sullivan, B. A. & Willard, H. F. Stable dicentric X chromosomes with two functional centromeres. Nat. Genet. 20, 227–228 (1998).
Sullivan, B. A., Wolff, D. J. & Schwartz, S. Analysis of centromeric activity in Robertsonian translocations: implications for a functional acrocentric hierarchy. Chromosoma 103, 459–467 (1994).
McStay, B. The p-arms of human acrocentric chromosomes play by a different set of rules. Annu. Rev. Genomics Hum. Genet. 24, 63–83 (2023).
Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
Waterhouse, R. M., Tegenfeldt, F., Li, J., Zdobnov, E. M. & Kriventseva, E. V. OrthoDB: a hierarchical catalog of animal, fungal and bacterial orthologs. Nucleic Acids Res. 41, D358–D365 (2013).
Hoyt, S. J. et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376, eabk3112 (2022).
He, W. et al. NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes. Bioinformatics 39, btad121 (2023).
Eram, M. S. et al. Trimethylation of histone H3 lysine 36 by human methyltransferase PRDM9 protein. J. Biol. Chem. 289, 12177–12188 (2014).
Hayashi, K., Yoshida, K. & Matsui, Y. A histone H3 methyltransferase controls epigenetic events required for meiotic prophase. Nature 438, 374–378 (2005).
Altemose, N. et al. A map of human PRDM9 binding provides evidence for novel behaviors of PRDM9 and other zinc-finger proteins in meiosis. eLife 6, e28383 (2017).
Alleva, B., Brick, K., Pratto, F., Huang, M. & Camerini-Otero, R. D. Cataloging human PRDM9 allelic variation using long-read sequencing reveals PRDM9 population specificity and two distinct groupings of related alleles. Front. Cell Dev. Biol. 9, 675286 (2021).
Hartley, G. A., Okhovat, M., O’Neill, R. J. & Carbone, L. Comparative analyses of gibbon centromeres reveal dynamic genus-specific shifts in repeat composition. Mol. Biol. Evol. 38, 3972–3992 (2021).
Yoo, D. et al. Complete sequencing of ape genomes. Nature 641, 401–418 (2025).
Schwartz, J. J., Roach, D. J., Thomas, J. H. & Shendure, J. Primate evolution of the recombination regulator PRDM9. Nat. Commun. 5, 4370 (2014).
Stevison, L. S. et al. The time scale of recombination rate evolution in great apes. Mol. Biol. Evol. 33, 928–945 (2016).
Makova, K. D. et al. The complete sequence and comparative analysis of ape sex chromosomes. Nature 630, 401–411 (2024).
Earnshaw, W. C., Ratrie, H. 3rd & Stetten, G. Visualization of centromere proteins CENP-B and CENP-C on a stable dicentric chromosome in cytological spreads. Chromosoma 98, 1–12 (1989).
Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).
Logsdon, G. A. et al. The variation and evolution of complete human centromeres. Nature 629, 136–145 (2024).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Shaveisi-Zadeh, F. et al. TTY2 genes deletions as genetic risk factor of male infertility. Cell. Mol. Biol. 63, 57–61 (2017).
Nagai, H. et al. A novel sperm-specific hypomethylation sequence is a demethylation hotspot in human hepatocellular carcinomas. Gene 237, 15–20 (1999).
Thoraval, D. et al. Demethylation of repetitive DNA sequences in neuroblastoma. Genes Chromosomes Cancer 17, 234–244 (1996).
Samuelsson, J. K. et al. Helicase lymphoid-specific enzyme contributes to the maintenance of methylation of SST1 pericentromeric repeats that are frequently demethylated in colon cancer and associate with genomic damage. Epigenomes 1, 2 (2017).
Gonzalez, B. et al. Somatic hypomethylation of pericentromeric SST1 repeats and tetraploidization in human colorectal cancer cells. Cancers 13, 5353 (2021).
Lingg, L., Rottenberg, S. & Francica, P. Meiotic genes and DNA double strand break repair in cancer. Front. Genet. 13, 831620 (2022).
Lin, C. Y. et al. Translocation breakpoints preferentially occur in euchromatin and acrocentric chromosomes. Cancers 10, 13 (2018).
Gkountela, S. et al. DNA demethylation dynamics in the human prenatal germline. Cell 161, 1425–1436 (2015).
Broman, K. W., Murray, J. C., Sheffield, V. C., White, R. L. & Weber, J. L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63, 861–869 (1998).
Gruhn, J. R., Rubio, C., Broman, K. W., Hunt, P. A. & Hassold, T. Cytological studies of human meiosis: sex-specific differences in recombination originate at, or prior to, establishment of double-strand breaks. PLoS ONE 8, e85075 (2013).
Wang, S. et al. Inefficient crossover maturation underlies elevated aneuploidy in human female meiosis. Cell 168, 977–989.e17 (2017).
Bherer, C., Campbell, C. L. & Auton, A. Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales. Nat. Commun. 8, 14994 (2017).
Sims, J., Copenhaver, G. P. & Schlogelhofer, P. Meiotic DNA repair in the nucleolus employs a nonhomologous end-joining mechanism. Plant Cell 31, 2259–2275 (2019).
Li, P., Jin, H. & Yu, H. G. Condensin suppresses recombination and regulates double-strand break processing at the repetitive ribosomal DNA array to ensure proper chromosome segregation during meiosis in budding yeast. Mol. Biol. Cell 25, 2934–2947 (2014).
Pardo-Manuel de Villena, F. & Sapienza, C. Transmission ratio distortion in offspring of heterozygous female carriers of Robertsonian translocations. Hum. Genet. 108, 31–36 (2001).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Huang, N. & Li, H. compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics 39, btad595 (2023).
Myers, S., Freeman, C., Auton, A., Donnelly, P. & McVean, G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat. Genet. 40, 1124–1129 (2008).
Gao, Y., Liu, B., Wang, Y. & Xing, Y. TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain. Bioinformatics 35, i200–i207 (2019).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Numanagic, I. et al. Fast characterization of segmental duplications in genome assemblies. Bioinformatics 34, i706–i714 (2018).
Gel, B. et al. regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests. Bioinformatics 32, 289–291 (2016).
Junier, T. & Pagni, M. Dotlet: diagonal plots in a web browser. Bioinformatics 16, 178–179 (2000).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Jain, C., Rhie, A., Hansen, N. F., Koren, S. & Phillippy, A. M. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat. Methods 19, 705–710 (2022).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, W177–W184 (2020).
Acknowledgements
The authors thank the Sequencing and Discovery Genomics group at the Stowers Institute and the National Institutes of Health Intramural Sequencing Center (NISC) for generating sequencing data; and S. Hawley, A. Ruiz-Herrera, S. Zanders, C. Langley and T. Hassold. J.L.G., T.P., L.G.d.L., C.S. and S.M. are supported by the Stowers Institute for Medical Research. J.L.G., T.P. and L.G.d.L. are also supported by NCI R01CA266339. This work was supported, in part, by the Intramural Research Program of the National Human Genome Research Institute, US National Institutes of Health. This work utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). E.G. and A.G. are supported by NIH R01HG013017, NIH U01DA057530, NSF 2118743 and the State of Tennessee’s Center for Integrative and Translational Genomics.
Author information
Contributions
J.L.G., A.M.P. and E.G. conceived the study. T.P., L.G.d.L., M.P., K.H., S.Y.B. and A.C.Y. performed all experiments. S.K. performed and analysed assemblies, L.G.d.L. made phylogenies and collected CUT&RUN and CUT&Tag data, T.P. performed microscopy and analysed images, A.G. analysed the pangenome and other genomics data, M.P., K.H., S.Y.B. and A.C.Y. performed sequencing. A.R., S.J.S., B.L.F., S.M., C.S., B.P.W., G.G.B., J.C. and B.D.P. analysed data. J.L.G., S.K., A.G., T.P. and L.G.d.L. wrote the manuscript with input from all authors. J.L.G., A.M.P. and E.G. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks André Marques and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Assembly graphs of the t(14;21)-bearing GM03417 cell line.
Assembly graph of GM03417 integrating PacBio and ONT reads visualized using Bandage (https://doi.org/10.1093/bioinformatics/btv383). Each chromosome is mostly resolved as a single connected component, with the exception of the acrocentric chromosomes (13, 14, 15, 21 and 22), which are joined by the highly similar rDNA arrays. The graph is colored by Hi-C information used to phase both haplotypes (haplotype 1, red; haplotype 2, blue). The inset shows a portion of the graph involving the acrocentric chromosomes, highlighting the graph node representing the Robertsonian translocation, which connects chromosomes 14 and 21.
Extended Data Fig. 2 Assembly graphs of the t(13;14)-bearing GM03786 cell line.
Assembly graph of GM03786 integrating PacBio and ONT reads visualized using Bandage. Each chromosome is mostly resolved as a single connected component, with the exception of the acrocentric chromosomes (13, 14, 15, 21 and 22), which are joined by the highly similar rDNA arrays. The graph is colored by Hi-C information used to phase both haplotypes (haplotype 1, red; haplotype 2, blue). The inset shows a portion of the graph involving the acrocentric chromosomes, highlighting the graph node representing the Robertsonian translocation, which connects chromosomes 13 and 14.
Extended Data Fig. 3 Assembly graphs of the t(13;14)-bearing GM04890 cell line.
Assembly graph of GM04890 integrating PacBio and ONT reads visualized using Bandage. Each chromosome is mostly resolved as a single connected component, with the exception of the acrocentric chromosomes (13, 14, 15, 21 and 22), which are joined by the highly similar rDNA arrays. The graph is colored by Hi-C information used to phase both haplotypes (haplotype 1, red; haplotype 2, blue). The inset shows a portion of the graph involving the acrocentric chromosomes, highlighting the graph node representing the Robertsonian translocation, which connects chromosomes 13 and 14.
Extended Data Fig. 4 Cytogenetic evidence for the structural changes in ROBs.
Fluorescence in situ hybridization (FISH) was used to visualize the arrangement of centromeres, SST1 and 45S rDNA arrays on acrocentric chromosomes in the ROB-bearing cell lines. A. Representative nuclei of ROB cell lines labeled by FISH with probes for Cen 14/22 (orange), Cen 13/21 (green) and DAPI. Bar, 10 µm. Magnified insets show centromeres of ROB chromosomes (bar 1 µm). B. Extended karyograms of GM03786, GM04890, and GM03417 cell lines labeled by FISH with SST1 probe (magenta) and whole chromosome paints (14 in orange, 13 and 21 in green). All acrocentric chromosomes and other chromosomes with detectable SST1 signals are shown. Acrocentric chromosomes lacking SST1 signal are denoted with blue numbers. DNA was counter-stained with DAPI. C. Extended karyograms of acrocentric chromosomes only labeled by FISH with rDNA probe (red), whole chromosome paints, and DAPI. Note lack of rDNA signal on ROBs in all cell lines.
Extended Data Fig. 5 SST1 array deletion on chromosome 14.
Alignment of chromosome 14 contigs from 29 haplotypes, including 26 from the Human Pangenome Reference Consortium (HPRC) year 1 dataset and 3 from ROB cell lines (GM04890, GM03417, and GM03786, highlighted in bold). Each row represents a distinct haplotype aligned against the T2T-CHM13 reference chromosome 14 (bottom track). Gold bars indicate aligned regions, while gaps represent absent and unaligned sequences. The vertical black line marks the position of the SST1 array within the Pseudo-Homolog Region (PHR). The bottom track displays the CHM13 centromeric satellite annotation, with different colors representing various satellite families. Approximately 34.5% (10 out of 29) of the analyzed chromosome 14 haplotypes lack the SST1-containing PHR, including the non-ROB chromosome 14 in the GM03786 cell line. Haplotypes are clustered based on sequence similarity to better visualize the deletion pattern. The x-axis shows the genomic position in megabases (Mb).
Extended Data Fig. 6 Schematic representation of common ROB formation.
Top panel: Structure of the short arms of chromosomes 13/21 and chromosome 14 before fusion. Centromeric, rDNA, and SST1 arrays are highlighted. Note the inverted orientation of the SST1 type 1 array on 14 relative to 13/21. Middle panel: Alignment of 13/21 and 14 during meiosis, showing potential recombination within the SST1 type 1 array region. Bottom panel: Resulting ROB structure after fusion. The ROB retains two centromeres and the SST1 arrays from both parent chromosomes. The central multicopy SST1 array is flanked by sequences from 13 and 14 or 14 and 21. The fusion is within the SST1 type 1 array. The size of the array is variable. The SST1 type 2L monomer is derived from 13 or 21, while the SST1 type 2S monomer is from 14. The rDNA regions are lost. Key features are color-coded: magenta for SST1 type 1 array, pink for SST1 type 2S monomer, blue for SST1 type 2L monomer, and gray for centromeres and rDNA regions. Distances between features are indicated in kilobases (kb).
Extended Data Fig. 7 Sequence alignment comparison among SST1 monomer consensuses from each chromosome in CHM13 with a major array.
Each row corresponds to the consensus sequence for monomers of SST1 derived from major arrays on chromosomes 13, 14, 21, 17, 19, and 4. The consensuses from chromosomes 13, 14, and 21 (gray labels) have in common a large gap in the middle of the macrosatellite as well as a predicted PRDM9 DNA binding site (red). Darker shades indicate more conservation.
Extended Data Fig. 8 CENP-A and NDC80 localization on the dicentric ROB chromosome in GM03417.
A-B. Representative SIM images of ROB chromosome (A) and normal copies of chromosomes 14 and 21 (B) labeled by immuno-FISH with centromeric satellite probes for Cen 14/22 (orange), Cen 13/21 (green), and anti-CENP-A antibody (white). DNA was counterstained with DAPI. Plots below show averaged line intensity profiles along the individual kinetochore regions of sister chromatids normalized to the maximum intensity of each channel for 20 ROBs (A) and 11 normal copies of chr.14 and chr.21 (B). Error bars denote standard deviations. C. Example SIM images of four individual ROB chromosomes labeled by immuno-FISH with a probe for Cen 13/21 (green), and antibodies against CENP-A (white) and NDC80 (magenta). Magnified insets show centromeric regions labeled with each antibody individually. Line scans (1 and 2) along kinetochores of sister chromatids show normalized fluorescence intensity profiles for CENP-A (blue) and NDC80 (magenta), illustrating variations between kinetochores. D. SIM images and corresponding line scans of representative normal copies of acrocentric chromosomes 13 and 21 labeled and analyzed as in (C). Scale bar in all images is 1 µm.
Extended Data Fig. 9 CENP-A CUT&RUN on the ROB chromosomes.
Tracks displaying CENP-A enrichment in three ROBs: (A) GM03786, (B) GM04890, and (C) GM03417. From top to bottom, tracks denote the two α-satellite arrays with centromeric regions colored by chromosome, CENP-A CUT&RUN enrichment (blue) (two replicates), IgG negative control, Input coverage, and CpG methylation profile. DNA methylation tracks show methylation calls from ONT (orange) or PacBio HiFi (light blue) sequencing. CENP-A enrichment is associated with a dip in CpG methylation (CDRs).
Extended Data Fig. 10 Imaging and genomic analysis of chromosomes 13, 14, and 21 in assembled genomes.
Immuno-FISH, DNA methylation, and CENP-A CUT&RUN and CUT&Tag analyses of normal copies of acrocentric chromosomes from (A) GM03786, (B) GM04890 and (C) GM03417 cell lines are shown. The left panels show representative structured illumination super-resolution images of normal acrocentric chromosomes labeled by immuno-FISH with centromeric satellite probes for Cen 14/22 (orange), Cen 13/21 (green), and anti-CENP-C antibody (red). DNA was counterstained with DAPI. Scale bar is 1 µm. Note single CENP-C foci on corresponding centromeres. Plots below show averaged intensity profiles of lines drawn through the individual kinetochore regions of sister chromatids of multiple chromosomes: GM03786 n = 22 for each chromosome (11 chromosomes 13 and 11 chromosomes 14), GM04890 n = 24 for each chromosome (12 chromosomes 13 and 12 chromosomes 14), GM03417 n = 24 for each chromosome (12 chromosomes 14 and 12 chromosomes 21). Intensity profiles were aligned to the peak of the Gaussian of the corresponding Cen signals and normalized to the maximum intensity of each channel. Error bars denote standard deviations. The right panels display corresponding heatmaps of sequence similarity calculated for 5 kb bins for each centromere. Below the heatmaps, DNA methylation tracks show methylation calls from ONT (orange) or PacBio HiFi (blue) sequencing. Hypomethylated regions correspond to CENP-A enrichment on CUT&RUN (blue) and CUT&Tag (black) tracks below, indicating active centromere location.
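The profile averaging described in the legends above (normalize each line scan to its channel maximum, align scans on the peak of the centromere signal, then report the mean and standard deviation at each position) can be sketched in plain Python as follows. This is a simplified illustration with hypothetical function names and toy numbers: the peak is taken as the brightest pixel rather than the centre of a fitted Gaussian, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def normalize_to_max(profile):
    """Scale a line scan so that its brightest point equals 1."""
    peak = max(profile)
    return [value / peak for value in profile]


def crop_around_peak(profile, reference, half_width):
    """Crop `profile` to a window centred on the maximum of `reference`.

    SIMPLIFICATION: the brightest pixel stands in for a Gaussian-fit centre.
    """
    centre = max(range(len(reference)), key=reference.__getitem__)
    lo, hi = centre - half_width, centre + half_width + 1
    if lo < 0 or hi > len(profile):
        raise ValueError("window around the peak runs off the line scan")
    return profile[lo:hi]


def average_profiles(signals, references, half_width):
    """Position-wise mean and sample SD of aligned, normalized scans."""
    aligned = [
        crop_around_peak(normalize_to_max(sig), ref, half_width)
        for sig, ref in zip(signals, references)
    ]
    columns = list(zip(*aligned))
    return [mean(col) for col in columns], [stdev(col) for col in columns]


# Toy data: two kinetochore scans whose centromere peaks sit one pixel apart.
cen_scans = [[0, 1, 5, 1, 0, 0], [0, 0, 1, 4, 1, 0]]
cenp_scans = [[0, 2, 4, 2, 0, 0], [0, 0, 1, 2, 1, 0]]
avg, sd = average_profiles(cenp_scans, cen_scans, half_width=1)
print(avg, sd)
```

Aligning on the centromere channel rather than on the protein channel keeps the averaged protein profile free to show an offset or a second peak relative to the satellite array.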
Supplementary information
Supplementary Figs. 1–7 and Supplementary Table 1
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
de Lima, L.G., Guarracino, A., Koren, S. et al. The formation and propagation of human Robertsonian chromosomes. Nature (2025). https://doi.org/10.1038/s41586-025-09540-8
DOI: https://doi.org/10.1038/s41586-025-09540-8