- Research
- Open access
- Published:
Regulatory roles of three-dimensional structures of chromatin domains
Genome Biology volume 26, Article number: 184 (2025)
Abstract
Background
Transcriptional enhancers usually, but not always, regulate genes within the same topologically associating domain (TAD). We hypothesize that this incomplete insulation is partially due to three-dimensional structures of corresponding chromatin domains in individual cells: whereas enhancers and genes buried inside the core of a domain interact mostly with other regions in the same domain, those on the surface can more easily interact with the outside.
Results
Here we show that a simple measure, the intra-TAD ratio, can quantify the coreness of a region with respect to the single-cell domains to which it belongs. We show that domain surfaces are permissive for high gene expression. Cell type-specific active cis-regulatory elements, active histone marks, and transcription factor binding sites are enriched on domain surfaces, most strongly in chromatin subcompartments typically considered inactive.
Conclusions
These findings suggest a model of gene regulation that involves positioning active cis-regulatory elements on domain surfaces. We also find that disease-associated non-coding variants are enriched on domain surfaces.
Background
Mammalian genomes have a complex three-dimensional (3D) architecture, such that meter-long DNA sequences can be compactly fitted into the micrometer-scale nucleus. The 3D genome architecture enables loci far apart on the same chromosome or even different chromosomes to come to physical proximity, which is important at various functional levels [1,2,3,4,5,6].
Understandings of chromatin structures have been greatly expanded by two complementary classes of methods, respectively, based on imaging and chromosome conformation capture (3C) [7, 8]. Imaging methods enable direct visualization of spatial organization of whole chromosomes or individually labeled DNA regions within a single cell. While some existing imaging methods are highly advanced in visualizing large-scale chromosome organizations [9,10,11] and the simultaneous spatial distribution of RNA species [12,13,14], high-resolution imaging (with \(\le\)50 kb consecutive probes) of whole mammalian genomes with the genomic locations of all imaged loci resolved is still not achieved. In contrast, high-throughput 3C-based methods, such as Hi-C [15, 16], can detect chromatin interactions in the whole genome at high resolution, but these methods provide only information about pairs of genomic loci in close proximity. More recent “multiplex” methods such as Chromatin-Interaction Analysis via Droplet-based and Barcode-linked Sequencing (ChIA-Drop) [17] and Split-Pool Recognition of Interactions by Tag Extension (SPRITE) [18] can identify chromatin interactions between multiple (\(\ge\)2) loci simultaneously, but they still do not provide spatial coordinates of genomic regions directly as imaging methods do. It is therefore necessary to combine imaging and 3C-based data to gain a more holistic view of genome organization.
From Hi-C data, multiple types of structural components that constitute the 3D genome architecture have been observed, including compartments and subcompartments of megabases long on average, topologically associating domains (TADs) of tens to hundreds of kilobases long on average, and individual chromatin loops [19]. TADs are consecutive genomic regions with a clear enrichment of chromatin interactions among loci within the TAD as compared to the background distribution [20, 21], and they can be identified using various methods [22]. Previous studies have proposed that TADs are formed through loop extrusion, where loop-extruding factors form progressively larger loops until stalled by boundary proteins, such as CTCF (CCCTC-binding factor) [23, 24]. Regions near TAD boundaries usually have properties different from other regions within the TAD, such as enrichment of housekeeping genes and presence of a pair of convergent CTCF binding sites [20, 24]. Functionally, promoters tend to interact with transcriptional regulatory elements such as enhancers within the same TAD, but this insulation is incomplete as revealed by high-resolution promoter interaction profiling [25,26,27].
For Hi-C experiments performed on a population of cells in a bulk sample, the resulting data only reflect average interaction frequencies across the cells. In contrast, single-cell Hi-C (scHi-C) and single-nucleus Hi-C (snHi-C) [28,29,30] detect chromatin interactions in individual cells. Despite challenges in data interpretation due to dropout and low signal-to-noise ratio caused by data sparsity, TAD-like domains have been also observed in individual cells. With the help of data imputation, variability of domain boundary appearance and location in individual cells has been quantified [31]. In corroboration with these observations from sequencing data, super-resolution imaging has also shown that TAD-like chromatin domains at the mesoscale (hundreds of nanometers wide) appear in single cells [9, 32, 33], which roughly match the size of TADs. However, the boundaries of these chromatin domains vary across single cells and TAD boundaries appear to be statistical constructs that emerge by aggregating across single cells [9, 29, 34,35,36].
A potential molecular mechanism of TAD-like domains in individual single cells (to be referred to as “scDomains” hereafter) emerged from the observation in imaging studies that these domains can take a somewhat globular structure in the 3D space [9, 33, 37, 38]. Quantified by either distance to inter-chromatin compartment or voxel intensity, scDomains can be divided into exterior perichromatin regions and interior core regions [32, 33]. As compared to core regions, perichromatin regions are enriched for transcriptional activities and active histone marks, and depleted of repressive histone marks [32, 33]. These observations suggest that the 3D structures of scDomains may have functional significance.
Here we hypothesize that due to the 3D structures of scDomains, genomic regions closer to the “core” are more isolated from loci outside the domain, while those closer to the “surface” are more exposed to outside loci and thus have a higher chance of interacting with them (Fig. 1a). We further hypothesize that despite cell-to-cell variability, information about this core-versus-surface distinction is partially preserved in TADs detected from bulk Hi-C data, which contributes to their incomplete insulation. We propose that the “coreness” of a region with respect to the 3D structure of a scDomain can be approximated from 3C-based and multiplex data, and it negatively correlates with the region’s molecular activities by affecting its opportunities to interact with outside protein factors.
Calculation of intra-scDomain ratio and intra-TAD ratio using different types of data. a Schematic figure showing how the proportions of intra- and inter-scDomain interactions are related to the location of a genomic locus relative to the 3D structure of its scDomain. b Distribution of Pearson correlations between intra-TAD ratio and average intra-scDomain ratio in x single cells of IMR90 cells, for x equal to 10–990. Intra-TAD ratio was calculated using bulk Hi-C data [16]. Intra-scDomain ratio was calculated using single-cell imaging data [13]. For each value of x, we first sub-sampled x single cells and computed the average intra-scDomain ratio of each 50 kb region over these cells. Then, the vector of these average intra-scDomain ratios and the corresponding vector of intra-TAD ratios of the same 50 kb regions were taken to compute a correlation. Finally, by repeating the sub-sampling of x cells 100 times, we obtained a distribution of correlation values. c Violin plots comparing the intra-scDomain ratios of regions on the convex hulls of scDomains (vertices) and other regions (non-vertices) defined using single-cell imaging data in IMR90 cells [13]. d Scatterplots comparing intra-TAD ratios calculated from SPRITE data [18] with those calculated from bulk Hi-C [16], scHi-C [34, 39], and structural models [34, 40]. Black curves represent moving averages. PCC, Pearson correlation coefficient; SCC, Spearman correlation coefficient
Results
TAD-like domains in single cells exhibit 3D structural tendencies despite cell-to-cell variability
Considering the cell-to-cell variability of scDomain boundaries, it is intriguing that TAD boundaries are usually fairly obvious in bulk Hi-C contact maps. To quantify the similarity between scDomains in different single cells and the corresponding bulk-aggregated TADs, we took a whole-chromosome tracing data set with individual 50 kb regions on chromosome 21 of human IMR90 cells sequentially imaged at high resolution [13]. In each cell, scDomains were identified by the original authors based on the imaging data using an insulation approach [13]. For each pair of 50 kb genomic regions on chromosome 21 separated by a certain distance (from 100 to 500 kb), we labeled it as either “Same TAD” or “Different TADs” based on whether they belong to the same TAD called from IMR90 bulk Hi-C data. We then determined the number of cells in which these two regions also belong to the same scDomain as defined by the imaging data. The results (Additional file 1: Fig. S1) show that consistently across all distance thresholds, region pairs from the same TAD are significantly more often to belong to the same scDomain than distance-matched region pairs not from the same TAD. These results show that in terms of one-dimensional (1D) boundaries, scDomains display location tendencies that are consistent with the clear TAD boundaries. This is in line with previous findings that although chromatin interactions are dynamic and vary across different cells, 1D TAD boundaries tend to be more stable [31, 41, 42].
In a similar manner, we explored whether scDomains also display some tendencies in terms of their 3D structures. Using the same imaging data set, we defined a spatial neighborhood for each 50 kb region (the “starting region”) in each cell based on its 3D coordinates in the cell and then computed the fraction of regions within this neighborhood coming from the same scDomain as the starting region (Methods section). For a starting region buried inside the core of a scDomain 3D structure that is isolated from outside, this “intra-scDomain ratio” should be high (closer to 1); in contrast, for a starting region residing on the surface of a scDomain 3D structure that interacts with outside regions, its intra-scDomain ratio should be low (closer to 0) (Fig. 1a). Analogously, for each 50 kb region, we computed an “intra-TAD ratio” as the proportion of its bulk Hi-C contacts coming from the same TAD. We found that the intra-scDomain ratio of a 50 kb region averaged over a certain number of single cells correlates positively with its intra-TAD ratio, and the correlation reaches its maximum when averaging over 200 or more single cells consistently for TADs called by three different methods [20, 43, 44] (Fig. 1b, Additional file 1: Fig. S2a–c). These results show that despite cell-to-cell variability, the degree of insulation of a region provided by the 3D structure of its scDomain also displays some tendencies that are partially captured by bulk Hi-C data. As a control, intra-TAD ratios computed from randomly shuffled TADs correlated much less strongly with average intra-scDomain ratios (Methods section, Additional file 1: Fig. S3), which further supports that real TADs capture information about 3D structures of scDomains.
To further establish the relationship between intra-scDomain ratio and the “coreness” of a region relative to the 3D structure of its scDomain, we considered shape-related features. It has been shown by super-resolution imaging that TAD-like domains in single cells can take mostly globular shapes that are sometimes elongated [33], but a systematic analysis of the shapes of many scDomains has been missing. To avoid making any assumptions about scDomain shapes, we determined the convex hull of each scDomain based on the 3D coordinates of its constituent 50 kb regions. Regardless of the actual shape of a scDomain, regions that define the convex hull (called the vertices) should be residing on the exterior of the 3D structure, while non-vertices can be on the exterior or in the interior. If intra-scDomain ratio represents the coreness of a region, the vertex regions should have lower intra-scDomain ratios than non-vertices on average. This was indeed the case based on the imaging data (Fig. 1c), even when the chromosome compartment or size of the scDomains is controlled for (Additional file 1: Fig. S4).
Collectively, these results show that a genomic region’s coreness with respect to the 3D structure of its scDomain in individual cells exhibits some level of consistency across cells and this tendency is partially captured by the intra-TAD ratio computed from bulk Hi-C data.
Intra-TAD ratios computed from different types of data show consistency
In order to study functional roles of 3D structures of scDomains, we need the capability to estimate the coreness of a genomic region relative to its scDomain in a variety of cell types with available functional data. On the basis that intra-TAD ratios computed from bulk Hi-C data correlate with average intra-scDomain ratios in IMR90 cells (Fig. 1b, Additional file 1: Fig. S2), we further explored the possibility to compute similar intra-TAD ratios using other types of data.
Considering the ability to capture multiplex interactions from individual cells, we first explored SPRITE data. We computed intra-TAD ratios for all equal-sized genomic bins that tile the whole genome based on SPRITE data produced from the GM12878 human lymphoblastoid cells and mouse embryonic stem cells (mESCs) [18]. In general, the distribution of intra-TAD ratios computed from SPRITE data remains highly stable for different sizes of the genomic bins (Additional file 1: Fig. S5). We developed and applied a data pre-processing procedure to ensure the robustness of the intra-TAD ratios and their analyses (Methods section). Since TAD boundaries approximate average scDomain boundaries (Additional file 1: Fig. S1), and a 1D boundary of a scDomain is expected to be on the exterior of its 3D structure (for otherwise the segment between this “boundary” and the surface of the structure should also belong to this scDomain, thus invalidating the region as the boundary), we would expect intra-TAD ratios to be lower at TAD boundaries. Indeed, we observed that bins overlapping 1D TAD boundaries have significantly lower intra-TAD ratios than other bins within TADs, in both GM12878 and mESC (Additional file 1: Fig. S6a–d, i–l) (\(p<2.23 \times 10^{-308}\) in all cases, two-sided Mann-Whitney U test). Importantly, in addition to TAD boundaries, many other bins also had low intra-TAD ratios, including some “valleys” with particularly low intra-TAD ratios (Additional file 1: Fig. S6e–h, m–p), which likely correspond to genomic loci that tend to stay on domain surfaces. Some of these valleys overlap genes with high expression levels and regions with active chromatin signals, suggesting their potential functional significance (Additional file 1: Fig. S7).
Next, we compared intra-TAD ratios computed from SPRITE and bulk Hi-C data. Since SPRITE can detect longer-distance chromatin interactions among regions in each barcoded complex [18], conceptually the intra-TAD ratios computed from SPRITE and bulk Hi-C data can be different (Additional file 1: Fig. S8). However, we still found their values highly consistent in the actual data (Pearson correlation coefficient (PCC) = 0.83 and Spearman correlation coefficient (SCC) = 0.79 for GM12878; PCC = 0.82 and SCC = 0.82 for mESC) (Fig. 1d), suggesting that 3D structures of scDomains are consistently captured by both multiplex SPRITE data and pairwise Hi-C data.
We also computed intra-TAD ratios using scHi-C data produced from GM12878 cells [34] and mESCs [39]. Despite the sparsity of scHi-C data, intra-TAD ratios computed from SPRITE and scHi-C data still correlated positively (Fig. 1d). We further used structural models derived from scHi-C data of GM12878 [34] and mESC [40] to determine the fraction of proximal loci of each genomic bin that are within the same TAD in the structural models, which can be considered an independent way of computing intra-TAD ratio. These structurally derived intra-TAD ratios were found to be highly correlated with the intra-TAD ratios computed from SPRITE data (PCC = 0.64 and SCC = 0.66 for GM12878; PCC = 0.63 and SCC = 0.64 for mESC) (Fig. 1d). When intra-TAD ratios computed from SPRITE data were used to color genomic loci in the structural models, the interior regions of TADs were indeed enriched for colors of large intra-TAD ratios (Additional file 1: Fig. S9).
In the above comparisons, for a region involved in multiple SPRITE clusters, we computed its average intra-TAD ratio across these clusters. We also repeated the same comparisons based on data from individual SPRITE clusters without averaging over different clusters (Additional file 1: Fig. S10). The resulting intra-TAD ratios computed from SPRITE data still correlate quite strongly with those computed from bulk Hi-C data, scHi-C data, and structural models. In addition, given the variability of scDomains across single-cells, loci from two adjacent TADs could appear in the same scDomain frequently. To make sure that the intra-TAD ratio calculation is not dominated by such adjacent TADs, we checked the SPRITE data and found that contacts among loci from adjacent TADs constitute only a small proportion of all the inter-TAD contacts (Additional file 1: Fig. S11a, c). Besides, excluding contacts from adjacent TADs resulted in intra-TAD ratios highly correlated with the original intra-TAD ratios (Additional file 1: Fig. S11b, d), thus confirming that they are not dominated by contacts from adjacent TADs.
Taken together, these results show that the intra-TAD ratio, as a proxy of intra-scDomain ratio, can be computed using a variety of data. Considering the multiplex nature of SPRITE, in the following analyses we compute intra-TAD ratios using SPRITE data unless otherwise stated.
3D structural properties of chromatin domains differ across chromosome subcompartments
Previous studies have classified genomic regions roughly into the A (active) and B (inactive) compartments [15], and further divided them into different subcompartments [16, 45]. These subcompartments have been shown to be associated with various epigenomic signals and molecular activities, such as gene expression, histone modifications, and DNA replication timing [16, 45]. To see if the 3D structures of chromatin domains in different subcompartments have different properties, we compared the distributions of intra-TAD ratios of genomic bins in five subcompartments in GM12878 (A1, A2, B1, B2, and B3) [45]. We found that the intra-TAD ratio has different distributions in the different subcompartments, generally larger in the B2 and B3 subcompartments than in the A1, A2, and B1 subcompartments (Additional file 1: Fig. S12a). This could be related to previous observations that TAD-like domains in A1 and A2 are more dispersed [16, 37, 46], which allow core regions in these domains to be more accessible to regions outside the TAD and thus their lower intra-TAD ratios. We also found that TADs in B2 and B3 tend to be larger than TADs in the other subcompartments in terms of 1D genomic span (Additional file 1: Fig. S12b), which may also contribute to their larger intra-TAD ratios since core regions in a large TAD are expected to be more insulated from outside. This is supported by the fact that intra-TAD ratio is positively correlated with TAD size in all subcompartments (Additional file 1: Fig. S13). A recent study has demonstrated that DNA in smaller nucleosome clutches is more accessible and active, which is in line with our finding here since TADs may represent clusters of clutches [46]. The generally smaller TADs in the B1 subcompartment as compared to those in B2 and B3 likely explain their different distributions of intra-TAD ratios despite all three subcompartments belong to the inactive B compartment.
Genes on chromatin domain surfaces tend to have higher expression
Chromatin structures and gene regulation are coordinated during cell differentiation and other biological processes [2,3,4, 47], but how such an interplay is mediated by the 3D structures of chromatin domains has not been investigated in detail. To study whether 3D structures of chromatin domains are related to gene expression, we downloaded RNA-seq data produced from GM12878 cells and mESCs by ENCODE [48]. As expected, genes in the A compartment have significantly higher expression than those in the B compartment (Additional file 1: Fig. S14). In both GM12878 cells and mESCs, partitioning genes into groups based on intra-TAD ratios of their transcription start sites (TSSs), there is a trend for genes with larger intra-TAD ratios to have lower expression levels (Fig. 2a, Additional file 1: Fig. S15a). The same negative correlation is also observed when the SPRITE cluster size or the genomic bin size is changed to other values (Additional file 1: Fig. S16), or when TADs are called by other methods [43, 44] (Additional file 1: Fig. S17). In contrast, the negative correlation largely disappears when the locations of TADs are randomly shuffled (Additional file 1: Fig. S18). These findings suggest that genes with high expression levels tend to reside on domain surfaces. As to be shown below, this may make these genes more exposed to interactions coming from outside the domain such as transcription factor binding.
Inverse relationship between intra-TAD ratio and gene expression in GM12878 cells. a Violin plots of gene expression (\(\text {log}_{2}\)(FPKM + 0.01)) of TSS bins in groups with increasing intra-TAD ratio and similar number of bins. The blue and red lines connect median and mean values of the different groups, respectively. b Violin plots of intra-TAD ratios of the highly expressed bins and lowly expressed bins, defined as the genomic bins containing TSSs of genes whose FPKM values rank within top and bottom 1000, respectively. c Violin plots of intra-TAD ratio of housekeeping genes and non-housekeeping genes. d Distribution of Spearman correlations between gene expression and intra-TAD ratio of genomic bins in individual TADs. Black dotted curve shows the fitted normal distribution. The vertical black dotted line shows the mean of Spearman correlations. The vertical red dotted line shows zero Spearman correlation. e Violin plots of gene expression levels (\(\text {log}_{2}\)(FPKM + 0.01)) in domain surface TSS bins and domain core TSS bins. p values in all panels are calculated using the two-sided Mann-Whitney U test
To further characterize this dependency, we compared the most highly expressed genes (top 1000) and the most lowly expressed genes (bottom 1000), and found a significant difference between their intra-TAD ratios (Fig. 2b, Additional file 1: Fig. S15b). As a specific example, we found that housekeeping genes [49], which tend to have high expression, have significantly smaller intra-TAD ratios (Fig. 2c, Additional file 1: Fig. S15c), even when TAD 1D boundaries and their neighboring regions were excluded (Additional file 1: Fig. S19).
To rule out the possibility that the inverse relationship between intra-TAD ratio and gene expression is simply caused by TAD size or differences between different chromosome subcompartments, we computed the correlation between intra-TAD ratio and gene expression level across different genes in the same TAD for each TAD containing at least 5 genes. In this case, any effects due to TAD size and chromosome subcompartment would be eliminated. The resulting distribution of correlation values collected from these TADs has a mean significantly smaller than zero (Fig. 2d, Additional file 1: Fig. S15d, \(p=4.61 \times 10^{-89}\) for GM12878 and \(p=3.59 \times 10^{-200}\) for mESC, two-sided t-test). As a further confirmation, when we plotted these correlation values against TAD sizes for each chromosome subcompartment separately, we did not observe any significant correlation between them (Additional file 1: Fig. S20). These results show that the inverse relationship between gene expression level and intra-TAD ratio is not an artifact caused by TAD size or subcompartment.
We also used an intra-TAD ratio threshold to classify genes into “domain core genes” and “domain surface genes” (Methods section), and found that domain surface genes have significantly higher expression levels than domain core genes (\(p=3.06 \times 10^{-252}\) for GM12878 and \(p<2.23 \times 10^{-308}\) for mESC, two-sided Mann-Whitney U test) (Fig. 2e, Additional file 1: Fig. S15e).
We repeated the above analyses in different subcompartments separately (subcompartments call available only in GM12878) and reached the same conclusions highly consistently across subcompartments (Additional file 1: Fig. S21), even when 1D TAD boundaries were excluded (Additional file 1: Fig. S22). Further, to ensure that the above results are not dominated by lowly expressed genes, we repeated the analyses but excluded TSS bins with extremely low expression (FPKM < 0.02). The overall trends are highly consistent with those considering all genes (Additional file 1: Fig. S23).
All these results confirm that genes with higher expression generally have a lower intra-TAD ratio.
Chromatin signals are correlated with chromatin domain 3D structures
Chromatin signals such as chromatin accessibility and histone modifications have been shown to characterize chromosome subcompartments [16, 45], but whether they are associated with chromatin domain 3D structures remains unclear. We computed the correlation between intra-TAD ratio and each of 12 chromatin signals in GM12878, including chromatin accessibility signals from ATAC-seq and enrichment of 11 histone marks from ChIP-seq [48, 50] (Fig. 3a). When considering all genomic bins together, ATAC-seq signals (Fig. 3a, b) and most active histone marks (Fig. 3a, c) are negatively correlated with intra-TAD ratio, while some repressive histone marks are positively correlated with intra-TAD ratio (Fig. 3a, d). When considering genomic bins of different chromosome subcompartments separately, stronger correlations are seen in the B2 and B3 subcompartments (Fig. 3a). These observations are consistent with the negative correlation between gene expression and intra-TAD ratio shown above that active chromatin signals are enriched on domain surfaces. Similar trends are also observed from the correlations between epigenomic signals and intra-TAD ratio in mESCs in the A and B compartments (Additional file 1: Fig. S24). Excluding TAD boundaries and their neighboring regions does not affect the correlation trends (Additional file 1: Fig. S25). In contrast, when repeating the calculations with randomly shuffled TADs, the correlations are much weaker (Additional file 1: Fig. S26).
Correlation between intra-TAD ratio and chromatin signals. a Spearman correlation between intra-TAD ratio and ATAC-seq signals and 11 histone modifications across either all genomic bins or only bins in a genomic subcompartment. The columns are sorted by the correlation values when all TADs are considered (in the “All” row). Violin plots of ATAC-seq signals (b), H3K4me1 (c), and H3K9me3 (d) in groups with increasing intra-TAD ratio and similar number of bins. y-axis shows \(\text {log}_{2}\)(signal + 0.01). The blue and red lines connect median and mean values of the different groups, respectively
Interestingly, the correlation between the repressive mark H3K27me3 and intra-TAD ratio is slightly positive in A1 and A2 but quite negative in B2 and B3 (Fig. 3a, Additional file 1: Fig. S27). This is different from the universal positive correlation between the repressive mark H3K9me3 and intra-TAD ratio across all subcompartments. This may be related to differences between facultative and constitutive heterochromatin that H3K27me3-marked regions can more easily be reactivated if they are on domain surfaces. In addition, this may also be due to the different molecular weights (MW) of the enzyme complexes that deposit H3K27me3 and H3K9me3, as previously suggested [33]. The H3K27me3 modifier polycomb repressive complex 2 (PRC2) (MW: \(\sim\)300 kDa) has a larger molecular weight than H3K9me3 modifiers such as SETDB1 (MW: \(\sim\)140 kDa), SETDB2 (MW: \(\sim\)80 kDa), SUV39H1 (MW: \(\sim\)50 kDa), and SUV39H2(MW: \(\sim\)50 kDa). As a result, the large TADs in B2 and B3 subcompartments may impose stronger hindrance to the deposition of H3K27me3 to their core regions than to H3K9me3.
To further explore the effect of H3K9me3 and H3K27me3 on intra-TAD ratio, we built a linear regression model with H3K9me3 signal, H3K27me3 signal, and the interaction term between them as the explanatory variables, and intra-TAD ratio as the response variable. Consistent with the correlation analysis above, both H3K9me3 and H3K27me3 had positive model coefficients in the A1, A2, and B1 subcompartments, while the coefficients of H3K27me3 were much more negative than H3K9me3 in the B2 and B3 subcompartments (Additional file 1: Fig. S28). Surprisingly, the coefficients of the interaction term were consistently opposite in sign to the coefficients of the two independent variables, suggesting that the distribution of the two marks when they appear together with respect to chromatin domain 3D structures is different from the distribution when they appear separately. The exact reason for this discrepancy requires further investigations.
Cell type-specific cis-regulatory elements and transcription factors are enriched on chromatin domain surfaces
As shown above, despite the different properties of individual chromatin domains, domain surfaces are generally more active than domain cores, especially for the domains in the B2 and B3 subcompartments. We hypothesized two alternative models for the domain surface genes to be regulated by cis-regulatory elements (CREs) in the same domain, namely having the CREs (such as enhancers) also residing on domain surfaces or being buried in the domain cores (Fig. 4a). The former, “domain surface CRE” model, provides a better explanation for the CREs to be easily accessible by transcription factors, while the latter, “domain core CRE” model, provides a better explanation for the insulation of the CREs in a TAD from regulating genes outside the TAD. To test these models, we downloaded different categories of candidate CREs (cCREs) active in GM12878 from ENCODE [51] and examined their locations with respect to chromatin domain 3D structures based on intra-TAD ratios (Methods section). Several categories of cCREs were found enriched on domain surfaces (Fig. 4b) and depleted from domain cores (Additional file 1: Fig. S29a), most notably proximal enhancer-like signatures (pELS) and promoter-like signatures (PLS). This is not simply due to 1D TAD boundaries, as the enrichment of cCREs on domain surfaces and depletion of them from domain cores still hold when cCREs at and near TAD boundaries are excluded from the analysis (Fig. 4c, Additional file 1: Fig. S29b). To check whether the enrichment of cCREs on domain surfaces is related to cell type-specific activities or it simply reflects the locations of CREs regardless of their activities, we also downloaded cell type-agnostic cCREs and cCREs active in other cell types (Methods section). When cCREs active in GM12878 were excluded from these sets, the remaining cCREs were no longer enriched on the surfaces of GM12878 domains (Fig. 4d). Together, these results demonstrate that cCREs are enriched on domain surfaces dependent on their activities in a cell type-specific manner.
Enrichment of CREs and transcription factors on chromatin domain surfaces. a A schematic diagram showing the “domain surface CRE” model and the “domain core CRE” model. b Fold of enrichment of cCREs on domain surfaces in GM12878. c Fold of enrichment of cCREs on domain surfaces in GM12878 when regions around TAD boundaries are excluded. d Fold of enrichment of cCREs on domain surfaces when the cCREs are cell type-agnostic or defined in other cell types and the GM12878-specific cCREs are excluded. e Heatmap showing log2(enrichment fold) of transcription factor binding sites on domain surfaces and in domain cores in GM12878. cCREs All, all categories of cCREs; dELS, distal enhancer-like signatures; pELS, proximal enhancer-like signatures; PLS, promoter-like signatures; CTCF-bound, bound by CTCF
To see whether residing on domain surfaces allows CREs to be easily bound by transcription factors, we next examined the enrichment of cell type-specific transcription factor binding sites on domain surfaces and in domain cores, based on ChIP-seq peaks of 153 TFs in GM12878 produced by ENCODE [48, 50]. Strikingly, the binding sites of almost all the TFs are enriched on domain surfaces and depleted from domain cores (Fig. 4e, Additional file 1: Fig. S30), consistent with the finding that enhancers and promoters are enriched on domain surfaces. The two most prominent exceptions are CBX3, which is associated with heterochromatin marked with H3K9me3, and EZH2, which is the enzymatic subunit of the polycomb repressive complex 2 that catalyzes H3K27me3. In contrast, the binding sites of the lysine acetyltransferase KAT2A, which is involved in the acytelation of H3K9, are strongly enriched on domain surfaces.
Again, the same patterns are still observed even when TAD boundaries are excluded (Additional file 1: Fig. S31) but they become much weaker with randomly shuffled TADs (Additional file 1: Fig. S32), thus confirming the robustness of the results.
To check whether the enrichment of TFs on domain surfaces is cell type-specific, we also downloaded ChIP-seq peak locations of TFs in four other cell types. When the TF peaks in GM12878 were excluded from these sets, the remaining peaks had reduced enrichment on the GM12878 domain surfaces. Furthermore, when the cCREs in GM12878 were also excluded from these sets, the enrichment largely disappeared (Additional file 1: Figs. S33–S36). These results further support the “domain surface CRE” model that the localization of CREs on domain surfaces allows TFs to bind them more easily in a cell type-specific manner.
Chromatin domain 3D structures correlate with replication timing and nuclear spatial compartmentalization
Previous studies have demonstrated that DNA replication timing has strong association with spatial genome organization. In general, loci in the active A compartment tend to replicate early, whereas loci in the repressive B compartment tend to replicate late [52]. Although TAD boundaries often align with boundaries of replication domains, the relationship between chromatin domain 3D structures and replication timing remains obscure [52].
To see whether replication timing is correlated with chromatin domain 3D structures, we analyzed Repli-seq data produced from GM12878 cells by ENCODE [48] and found that replication timing signal (higher value corresponding to earlier replication) negatively correlated with intra-TAD ratio when considering all the TADs genome-wide, and even more so for the TADs in the B subcompartments (Fig. 5a, Additional file 1: Fig. S37). The negative correlation was also observed when TAD boundaries were excluded from the analysis (Additional file 1: Fig. S38). These results suggest that 3D structures of chromatin domains play a stronger role in replication timing for genomic loci in the B subcompartments and that early replication tends to occur in regions on the surfaces of these domains.
Intra-TAD ratio correlates with replication timing, nuclear compartments, and disease variants. a Replication timing is correlated with intra-TAD ratio. Left panel: Spearman correlation between intra-TAD ratio and Repli-seq signals; right panel: violin plots of Repli-seq signals in groups with increasing intra-TAD ratio and similar number of bins. The blue and red lines connect median and mean values of the different groups, respectively. b Distribution of intra-TAD ratios of genomic bins in individual SPIN states. Speckle: genomic regions predicted to be associated with nuclear speckles; Interior_Act1, Interior_Act2, Interior_Act3: predicted active interior regions; Interior_Repr1, Interior_Repr2: predicted repressive interior regions; Near_Lm1, Near_Lm2: regions predicted to be near nuclear lamina; Lamina_Like and Lamina: regions predicted to be associated with nuclear lamina. c Violin plots of intra-TAD ratio in GM12878 of lymphoma-related non-coding variants and non-lymphoma-related non-coding bins
Nuclear compartments, such as nuclear speckles and lamina, participate in chromosome packaging and potentially associate with genome function [18, 52, 53]. Given the correlation between chromatin domain 3D structure and replication timing, we wondered whether chromatin domain 3D structure is also related to nuclear compartments. A recent study has identified several chromatin states (called Spatial Position Inference of the Nuclear genome, “SPIN” states) that correspond to nuclear compartmentalization [54]. We compared the intra-TAD ratios of genomic loci belonging to different SPIN states in human K562 myelogenous leukemia cells (Fig. 5b). In general, the inactive and heterochromatic lamina and lamina-like regions have the highest intra-TAD ratios, significantly higher than the intra-TAD ratios of the active interior regions. It has been shown that an active inter-chromosomal interaction hub is organized around nuclear speckles [18]. We found that genomic regions near nuclear speckles have lower intra-TAD ratios than those close to the nuclear lamina but higher intra-TAD ratios than some active interior regions. This may be due to spatial constraints imposed on chromatin structures by the nuclear speckles, which require further investigations.
Chromatin domain 3D structure is related to disease-associated genetic variants
Beyond the correlation of chromatin domain 3D structures with the various molecular- and cellular-level features shown above, we further explored whether they are also related to higher-level phenotypes. Specifically, we studied the locations of disease-associated genetic variants relative to chromatin domain 3D structures. Genome-wide association studies (GWAS) have identified millions of variants associated with various diseases. A large proportion of these variants are outside coding sequences and tend to locate in the regulatory regions [55]. As such, they may tend to reside on domain surfaces in the relevant cell types. To test this hypothesis, we downloaded lymphoma-related variants from the GWAS Catalog [56]. We first found that these variants are strongly enriched in the cCREs in GM12878 (2.84-fold of enrichment, \(p=1.60 \times 10^{-35}\), hypergeometric test), as expected. We then specifically focused on the lymphoma-associated non-coding variants and found that they have significantly smaller intra-TAD ratios as compared to all other non-coding regions (Fig. 5c), indicating that indeed these variants tend to locate on domain surfaces in this B lymphocyte cell line.
Discussion
While previous imaging studies have shown that TAD-like domains in individual cells can take a somewhat globular structure in the 3D space [9, 33, 37, 38], this could be partially due to limited imaging resolution. Importantly, most of our analyses in the current study did not require the assumption of globular structures of scDomains. Instead, we only assumed that there are genomic regions that tend to stay on the surface of scDomain 3D structures (i.e., vertices in the convex hulls and regions with low intra-scDomain ratios and intra-TAD ratios) while some tend to stay in the core, which can also happen in scDomain structures that are not fully globular.
To ensure the robustness of our findings, we have repeated our analyses (i) using different SPRITE cluster sizes (Additional file 1: Fig. S16), (ii) using different genomic bin sizes (Additional file 1: Figs. S6, S16), (iii) excluding 1D TAD boundaries (Fig. 4c, Additional file 1: Figs. S19, S22, S25, S31, S38), (iv) using different TAD calling methods (Additional file 1: Figs. S2, S17), and (v) excluding genes with extremely low expression (Additional file 1: Fig. S23). All these analyses confirm that our major findings cannot be explained by these potential confounding factors. As a negative control, we also repeated the analyses using randomly shuffled TADs rather than real TADs (Additional file 1: Figs. S3b, c, S18, S26, S32) and found most of our findings abrogated in these cases, thus confirming that our findings were properties of real TADs identified from actual chromatin interaction data.
For the A compartment, since the chromatin domains are smaller in 1D genomic span and the chromatin is not highly packed, genomic regions in the core of a domain 3D structure and those on the surface may have similar access to transcription factors and epigenetic modifiers outside. In contrast, the core-versus-surface distinction is more critical for the B compartment, which has bigger chromatin domains and more packed chromatin. This is highly consistent with our observations that in most of our analyses, stronger functional correlations are observed in the B compartment, especially the B2 and B3 subcompartments.
The plots of gene expression against intra-TAD ratio (Fig. 2a, Additional file 1: Fig. S15a) show that there is a depletion of genes with both a high intra-TAD ratio and a high gene expression level. On the other hand, genes with a low intra-TAD ratio are not guaranteed to have a high expression level. This relationship resembles findings about promoter DNA methylation in gene expression control: whereas a gene with high promoter methylation is repressed, a gene with low promoter methylation is permitted to express but its expression level depends on other factors such as transcription factor binding [57]. Analogously, tendency to stay on the surface of chromatin domains, as indicated by a low intra-TAD ratio, appears to be a condition permissive for a gene to be highly expressed, but whether it actually has a high expression level depends on additional factors.
The strong enrichment of cell type-specific CREs and TF binding sites on domain surfaces led us to propose the “domain surface CRE” model, in which CREs and genes are all on domain surfaces, which make it easy for them to interact with each other and with transcription factors coming from outside the TAD. Indeed, we found that in general most TFs have their binding sites enriched on domain surfaces, although variations were observed across chromatin subcompartments.
One question associated with the “domain surface CRE” model is why CREs tend to regulate genes in the same TAD, given that they are also exposed to genes outside (Fig. 4a). One possibility is that genes in the same TAD as a CRE have more stable proximity with it, while the proximity between the CRE and genes outside the TAD could be more dynamic. As a result, a CRE can occasionally get into contact with genes outside its TAD when proximity requirements are satisfied, resulting in the incomplete insulation of CREs [27].
A recent study has proposed another model for CREs to regulate genes outside their domains, termed boundary stacking interactions [58]. In this model, 1D boundaries of different domains can become physically close to each other, which makes it easier for CREs close to a domain boundary to interact with genes close to a boundary of another domain. This model is consistent with our work since 1D domain boundaries are on the surface of domain 3D structures. On the other hand, our domain CRE model also explains cross-domain interactions that involve CREs not close to 1D domain boundaries.
Among the different types of functional correlations we have studied, one of the strongest is with replication timing (Fig. 5a), where regions with a low intra-TAD ratio tend to replicate early, most clearly in the B2 and B3 subcompartments (Additional file 1: Fig. S37). This suggests that DNA replication tends to start at regions on the surface of chromatin domain 3D structures. Some of these structures may remain partially maintained from the S phase to the G2 phase and get involved in transcriptional regulation. Our discovery aligns well with a recent study that used super-resolution imaging to show that replication initiation generally occurs at the spatial boundary of a TAD, with this spatial positioning presumably driven by transcription [59].
In this study, we have demonstrated the use of multiplex data from SPRITE to study chromatin domain 3D structures, which offer both semi-single-cell information (by having each cluster expected to come from a single cell but different clusters in the same data set coming from a population of cells) and longer-range interactions that are not captured by the proximity ligation of Hi-C. The fairly strong correlation between the intra-TAD ratios computed from SPRITE and bulk Hi-C suggest that chromatin domain 3D structures are somewhat stable across single cells and that binary interactions from Hi-C could be sufficient for computing the intra-TAD ratio. This opens the opportunity for large-scale study of intra-TAD ratios in different cell types, given the much more abundant published bulk Hi-C data than multiplex data. On the other hand, to further investigate the stability of chromatin domain 3D structures across single cells, single-cell SPRITE data [60] could be utilized. By comparing intra-TAD ratios computed from bulk Hi-C, scHi-C, SPRITE, and single-cell SPRITE, differences in intra-TAD ratios caused by cell-cell variation and those caused by multiplexing can be separately analyzed.
A common limitation of all these “all-against-all” sequencing methods is that due to the very large number of chromosome interactions genome-wide, it is practically impossible to achieve sub-kilobase data resolution even with very deep sequencing. Instead, by using digitonin cell permeabilization, micrococcal nuclease (MNase) digestion, and direct sequencing of ligation junctions, it is possible to probe interactions of distinct viewpoints (the Micro Capture-C method [61]) or tiled regions (the Tiled-MCC method [62]) at near-base pair resolution, which helps explore nanoscale regulatory interactions. Based on the data produced from these new methods so far, it has been found that active loop extrusion is not essential for enhancer-promoter interactions but it contributes to their specificity and robustness, while local insulation of interactions among regulatory elements depends on CTCF [62]. We hypothesize that these findings are related to scDomains, whose boundary locations and correspondingly 3D structures are partially specified by loop extrusion and CTCF binding sites, and CTCF binding can create physical barriers between nearby regions in the 3D space. Further investigations are required to elucidate these mechanisms in detail.
Our analysis of lymphoma-associated non-coding variants shows that there is a trend for them to reside on domain surfaces. A difficulty of this analysis was the small number of variants available from the GWAS Catalog, which prohibited us from further focusing on only the non-coding variants within GM12878 cCREs since that would leave too few variants to remain and thus decrease statistical power. Similarly, it would be interesting to investigate whether the coreness of a region relative to its chromatin domain provides non-redundant information about disease risk on top of its chromatin state, but the same issue with statistical power pertains. Nevertheless, our initial results here suggest that similar analyses can also be performed for other diseases and corresponding cell types. By pooling data from related diseases, such as a pan-cancer analysis, more significant enrichment of disease-associated non-coding variants on domain surfaces may be found.
In some previous work, efforts have been devoted to deriving useful 1D features (i.e., a single numeric value for each genomic locus) from Hi-C contact maps and to linking them with chromatin organization and gene regulation [63]. As the most well-known examples, the first principal component of the normalized Hi-C contact map was introduced to partition genomic loci into the A and B compartments [15], and both the directionality index (DI) [20] and insulation score (IS) [43] were used to identify TAD boundaries. The intra-TAD ratio we introduce in this work can also be considered a type of 1D feature. The previously proposed open chromatin index (OCI) [64] and distal-to-local ratio (DLR) [65] bear some similarity to the intra-TAD ratio, in that they are both based on the comparison of Hi-C interactions that involve proximal regions versus distal regions. Yet to the best of our knowledge, the current study is the first one that uses 1D features derived from Hi-C contact maps to study chromatin domain 3D structures.
Limitations of the study
There are several limitations of this study. First, although the whole-chromosome tracing data set has enabled us to study scDomain structures, it only provides high-resolution data (at 50 kb resolution) for one chromosome, while the whole-genome data from the same study do not have the resolution required by our analyses. Second, data involved in our study of chromatin domain structures (imaging, SPRITE, bulk Hi-C, and scHi-C) were only obtained from the same cell types as the functional data but not the same single cells. We were therefore not able to directly relate chromatin domain structures and functional signals at the single cell level. Third, all the data we have used in this study were obtained from cell lines, which may differ from in vivo situations. Fourth, the structural models we used were originally derived from a limited number of cells, and thus may not have captured sufficient information about cell-to-cell variability. Fifth, the different types of data we used all provide only a snapshot of the cells but do not provide information about dynamic changes over time. Sixth, the analyses performed for studying relationships between intra-TAD ratio and functional signals were correlative in nature. Causal relationships, such as whether change of location of a gene from the core to the surface of a scDomain would lead to increase of its expression, are yet to be confirmed. We propose that by analyzing large-scale time-course data, it is possible to test whether surface-versus-core positioning precedes gene activity or vice versa while controlling for other factors that may also affect expression levels, such as change of domain 1D boundary locations or change of A/B compartment membership. Finally, some of the correlations between the intra-TAD ratio and chromatin signals are weak. While the binding sites of several proteins that associate with specific histone marks (CBX3, EZH2, and KAT2A) have provided supports for the biological relevance of these correlations, a more systematic analysis involving binding sites of more histone writers would be required.
Conclusions
In this study, we have shown that the intra-TAD ratio serves as a proxy for chromatin domain 3D structures. It enabled us to discover various molecular features that are correlated with chromatin domain 3D structures, including the enrichment of highly expressed genes, CREs, active histone marks, and TF binding sites on domain surfaces. Most of these correlations are cell type-specific and are more prominent in the B2 and B3 subcompartments, which contain larger and more inactive TADs. We have also shown that genomic loci with different replication timing and near different nuclear compartments have different preferences of their localization relative to chromatin domain 3D structures. Finally, we have also shown that disease-associated non-coding variants tend to reside more on domain surfaces.
Methods
Quantifying the location tendencies of scDomain boundaries
High-resolution imaging data of 50 kb genomic bins in chromosome 21 of IMR90 cells produced by Su et al. [13] were downloaded from Zenodo at https://doi.org/10.5281/zenodo.3928890. Processed bulk Hi-C data produced by Rao et al. [16] were also downloaded from the same link. scDomains were identified from the imaging data following the code provided by Su et al. [13], available at https://github.com/ZhuangLab/Chromatin_Analysis_2020_cell. Briefly, pairwise distances between all pairs of 50 kb bins in each individual cell were calculated from the imaging data. For each 50 kb bin in each cell, upstream and downstream loci within a fixed window size were selected to calculate the insulation score. With this sliding window calculation, loci exhibiting local maxima of insulation scores were identified as potential domain boundaries. Boundaries that separated domains exhibiting similar patterns were then removed.
TADs called from the bulk Hi-C data using directionality index [20] were downloaded from the 3D Genome Browser [66] at http://3dgenome.fsm.northwestern.edu/. For each pair of 50 kb bins separated by a certain genomic distance, we divided them into two groups based on whether they belonged to the same TAD or not. In each group, we counted the number of cells in which the two bins belonged to the same scDomain.
Quantifying the 3D structural tendencies of scDomains
For each 50 kb bin in chromosome 21 of an IMR90 cell, we defined its spatial neighborhood as a sphere centered at this bin (based on its coordinates from the imaging data) with a radius of 500 nm, which is the distance threshold that provides the best correlation with Hi-C contact data [13]. We then calculated the intra-scDomain of the bin as the fraction of bins in its neighborhood that come from its scDomain.
Using the Hi-C contact matrix of chromosome 21 in IMR90 provided by Su et al. [13], the intra-TAD ratio of each 50 kb bin was calculated as the total intra-TAD contacts divided by the total contacts involving this bin.
To compare the intra-scDomain ratio computed from imaging data with the intra-TAD ratio computed from bulk Hi-C data, we randomly sampled a certain number of cells, among all single cells imaged, and took the average of the intra-scDomain ratio of each 50 kb bin in these sampled cells. The resulting vector of average intra-scDomain ratios was used to compute Pearson correlation and Spearman correlation with the vector of intra-TAD ratios. For each number of cells, we repeated the random sampling independently 100 times to obtain a distribution of correlation values.
We also created shuffled TADs with the same size (i.e., 1D span) distribution and inter-TAD gap pattern as the actual TADs. This was done by first collecting the real sizes of the TADs and the gaps between them and putting them in a list following their original order. We then randomly permuted the list of TAD sizes and gap sizes, together with the label of whether the value is a TAD or a gap. We repeated this random permutation 100 times to produce 100 sets of shuffled TAD locations. We then used these shuffled TAD locations to compute intra-TAD ratios and calculated their correlations with average intra-scDomain ratios (over 200 randomly sampled cells), in the same way that we did with the real TADs.
With the 3D coordinates of 50 kb genomic bins in each scDomain, we identified the convex hull for each scDomain using the ConvexHull function implemented in the scipy.spatial module of Python [67]. We then compared the intra-scDomain ratio of the 50 kb genomic bins forming the convex hull (“vertices”) with other bins (“non-vertices”).
Calculation of intra-TAD ratio using SPRITE data
SPRITE multi-contact read clusters from GM12878 cells and mESCs were downloaded from the 4D Nucleome data portal (https://data.4dnucleome.org/) [68] (data processed in GRCh38 [4DNESI1U7ZW9] and GRCm38 [4DNESOJRTZZR]) and Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) [69] (GEO; accession: GSE114242) (data processed in GRCh37). All sequencing reads within the read clusters were overlapped with genomic bins of a fixed size (5 kb, 10 kb, 25 kb, or 50 kb). TAD calls using directionality index [20] were downloaded from the 3D Genome Browser (http://3dgenome.fsm.northwestern.edu/) [66]. For a genomic bin i involved in a read cluster j, its normalized intra-TAD ratio in the read cluster, \(r_{ij}\), was defined as \(N_{ij}/(C_j-1)\), where \(N_{ij}\) is the number of intra-TAD interactions between bin i and other bins involved in this cluster and \(C_j\) is the total number of bins in cluster j. The overall intra-TAD ratio of bin i, \(r_i\), is then the average of all its intra-TAD ratios in the read clusters that it is involved in. For calculations that considered only intra-chromosomal interactions, we first partitioned the reads in each cluster into sub-clusters according to their chromosomes, and then computed intra-TAD ratios of each sub-cluster separately.
In order to compare with intra-TAD ratios calculated from bulk Hi-C data, scHi-C data, and structural models, which were processed using different reference genome builds in the case of GM12878, we computed intra-TAD ratios using SPRITE data based on both GRCh38 and GRCh37. In the case of mESC, all calculations were based on GRCm38.
In the analyses that involved classifying bins into “domain surface bins” and “domain core bins,” domain surface bins were defined in each TAD separately as bins with an intra-TAD ratio smaller than or equal to the \(q_k\)-th quantile of intra-TAD ratio within TAD k, where \(q_k = n_k^{-\frac{1}{3}}\) and \(n_k\) is the total number of bins in TAD k. This definition of \(q_k\) is based on the ratio between surface area (S) and volume (V) of a sphere, which is proportional to \(V^{-\frac{1}{3}}\). The remaining TAD regions were defined as domain core bins.
Pre-processing of SPRITE data when computing intra-TAD ratios
Since SPRITE clusters involving too few or too many reads are potentially problematic, we tested different ranges of cluster size (i.e., number of reads per cluster) following the original SPRITE paper [18] and found 2–1000 to be a reasonable range that provides a balance between filtering out problematic clusters and maintaining the diversity of intra-TAD ratios (Fig. S5). Previous studies have shown that inter-chromosomal interactions provide additional information about chromatin organizations [18, 45] and considering them led to more consistent intra-TAD ratios across different ranges of cluster sizes compared with using intra-chromosomal interactions alone to compute intra-TAD ratios (Fig. S5). Considering these observations and the fact that intra-TAD ratios remain fairly stable across bin sizes (Fig. S5), in our default setting for computing the intra-TAD ratio, we used a small bin size of 10 kb to obtain fine-grained information and considered both intra-chromosomal and inter-chromosomal interactions based on SPRITE clusters with 2–1000 reads. We also filtered out genomic bins with less than 100 SPRITE clusters involving them.
Calculation of intra-TAD ratio using bulk Hi-C data
Processed in situ Hi-C data (in.hic format) produced from GM12878 cells [16] (4DNES3JX38V5), K562 cells [16] (4DNESI7DEJTM), and mESCs (4DNESDXUWBD9) [70] were downloaded from the 4D Nucleome data portal (https://data.4dnucleome.org/). To extract whole-genome intra- and inter-chromosomal contact matrices, we first added SCALE normalization vectors to the original .hic data using the addNorm function of Juicer Tools (version 1.22.01). The contact matrices were generated by Juicer Tools (version 1.22.01) dump at 10 kb, 25 kb, and 50 kb resolution. The intra-TAD ratio of each genomic bin was then calculated as the intra-TAD contacts divided by the total contacts involving this bin in a Hi-C contact matrix.
Calculation of intra-TAD ratio using single-cell Hi-C data
Single-cell Hi-C contacts from GM12878 cells and mESCs were downloaded from GEO (accessions: GSE117876 and GSE80280, respectively). The Hi-C contacts were converted to .pairs format (https://github.com/4dn-dcic/pairix/blob/master/pairs_format_specification.md#standard-format) if they were originally not. The intra-TAD contacts and total contacts of a given genomic bin were extracted using the Pyhton package pypairix (v0.3.7), and its intra-TAD ratio was then calculated as their ratio. We used 10 kb bins for GM12878 scHi-C data and 50 kb bins for mESC scHi-C data due to the sparsity of the latter data set. Finally, for each bin, the average intra-TAD ratio across all single cells was compared with its intra-TAD ratio computed using SPRITE data.
Calculation of intra-TAD ratio using structural models
Dip-C [34] structures at 20 kb resolution for GM12878 cells were downloaded from GEO (accession: GSE117876) and Si-C [40] structures at 10 kb resolution for mESCs were downloaded from https://github.com/TheMengLab/Si-C_3D_structure. To calculate the intra-TAD ratio of a genomic bin according to a 3D structure, we first found the \(k=10\) nearest neighbors of it in the structure based on their Euclidean distances using the Python package sklearn (v0.23.2). Among these k nearest neighbors, we counted the number of them belonging to the same TAD as the genomic bin of interest and divided it by k to become the intra-TAD ratio. For Dip-C structures, which also contained the diploid chromosome information, genomic bins were regarded as intra-TAD neighbors only if they were from the same TAD and the same chromosome haplotype. To obtain the intra-TAD ratio at 10 kb resolution for the 20 kb bins in Dip-C data, we separated each 20 kb bin into two virtual 10 kb bins and assigned the same intra-TAD ratio to them. Finally, for each genomic bin, its intra-TAD ratios over all the available structural models (including cells and configurations) were averaged and compared with its intra-TAD ratio computed using SPRITE data.
Discretization of a range of values into intervals
In order to illustrate the correlation between two variables X and Y, we divided all values of X into distinct intervals with approximately equal numbers of values in each interval. By plotting the distribution of Y within each interval, we could observe how Y changes as X increases.
Violin plots
We used the violinplot function inside the seaborn module of Python [71] to create violin plots. By default, it estimates the data distribution based on the data points observed, which could result in non-zero probability densities of values outside the observed maximum and minimum values. To avoid the impression that values were observed outside the actual observed range of values, we disabled such data extrapolation by setting the parameter cut = 2.
RNA-seq data analysis
Processed RNA-seq data from GM12878 cells were downloaded from GEO (accessions: GSE88583 and GSE88627). GENCODE [72] (v24) was used to annotate the genes and extract TSSs. Replicates were averaged and gene expression levels were quantified by fragments per kilobase of exons per million fragments (FPKM). Processed RNA-seq data from mESCs were downloaded from GEO (accession: GSE90277) and processed in a similar way with GENCODE (vM4) annotations.
ATAC-seq and histone ChIP-seq data analysis
Processed ATAC-seq data and ChIP-seq data for H2AFZ, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me3, H3K27ac, H3K27me3, H3K36me3, H3K79me2, and H4K20me1 (in .bigwig format) from GM12878 cells were downloaded from the ENCODE data matrix (https://www.encodeproject.org/). For each type of data, the averaged signal (fold change over control) in each genomic bin was computed using bigWigAverageOverBed.
Enrichment of cis-regulatory elements
Candidate cis-regulatory elements (cCREs) in GM12878, HeLaS3, HepG2, IMR90, and K562 cells as well as cell type-agnostic cCREs were downloaded from ENCODE (https://www.encodeproject.org/). All cCREs were overlapped with genomic bins at 5 kb, 10 kb, 25 kb, and 50 kb resolution. Regions either with no cCRE labels or labeled with “Low-DNase” were filtered out since they are unlikely CREs, and the remaining cCREs were intersected with all TAD regions, domain surface regions, and domain core regions. Suppose the number of all TAD bins is a, the number of all domain surface bins is b, the number of all filtered cCRE bins within TAD regions is c, and the number of cCRE bins on domain surfaces is d, the domain surface enrichment fold was defined as \(\left( \frac{d}{c}\right) /\left( \frac{b}{a}\right)\). Enrichment p values of cCREs on domain surfaces were calculated by fitting a hypergeometric distribution using the hypergeom.sf() function inside the scipy.stats module of Python [67]. Domain core enrichment folds and p values were calculated similarly.
For each of the four categories of cCREs (dELS, pELS, PLS, and CTCF-bound), a genomic bin was labeled with a certain category if it overlapped with a region of this subtype, regardless of its overlap with regions of other categories. To test the effect of TAD boundaries on surface cCRE enrichment, we excluded TAD boundary bins and the neighboring 10% bins of the whole TAD from the domain surface bins and repeated the analysis. To test cell-type specificity of surface cCRE enrichment, we excluded all cCREs defined in GM12878 from cell type-agnostic cCREs or cCREs in HeLaS3, HepG2, IMR90, or K562, and repeated the analysis.
Enrichment of transcription factors
ChIP-seq peaks of 153 transcription factors (TFs) obtained from GM12878 cells were downloaded from ENCODE (https://www.encodeproject.org/). All TF peaks were overlapped with genomic bins at 5 kb, 10 kb, 25 kb, and 50 kb resolution and intersected with domain surfaces and domain cores. Enrichment of TFs on domain surfaces and in domain cores was calculated in the same way as described above for cCREs.
The molecular weights of TFs were downloaded from UniProt (https://www.uniprot.org/ [73]). Only the “reviewed” entry was kept if a TF had multiple entries. For TFs with multiple “reviewed” entries, we manually selected one of them to define the molecular weight. We only considered each TF protein as a single sub-unit without considering other proteins in the possible TF complexes.
Repli-seq data analysis
Wavelet-smoothed signals of Repli-seq data (in .bigWig format) from GM12878 cells were downloaded from ENCODE (https://www.encodeproject.org/). Averaged signals in each genomic bin at 5 kb, 10 kb, 25 kb, and 50 kb resolution were obtained using bigWigAverageOverBed.
SPIN state analysis
SPIN annotations of genomic regions in K562 cells were downloaded from https://github.com/ma-compbio/SPIN [54]. Since SPRITE data were not available for K562 cells, intra-TAD ratios were calculated using bulk Hi-C data downloaded from the 4D Nucleome data portal (https://data.4dnucleome.org/ and processed in the same way for other Hi-C data, as described above.
Disease variants analysis
Lymphoma-related variants were downloaded from GWAS Catalog [56] with trait ID of “EFO_0000574.” The variants overlapped with any human transcript defined by GENCODE (v29) were excluded. The remaining variants were considered non-coding variants and were overlapped with 10 kb bins. Intra-TAD ratios (calculated in GM12878) of these lymphoma-related non-coding variants were compared with those of all the non-coding 10 kb bins.
Data availability
All data used in this paper are publicly available. Custom code used to calculate intra-TAD ratios is publicly available on GitHub (https://github.com/kellyliyichen/3D_chromatin_domain) [74] and Zenodo (https://doi.org/10.5281/zenodo.15611861) [75] under the MIT License. High-resolution imaging data of 50 kb genomic bins on chromosome 21 of IMR90 cells produced by Su et al. [13] are available at Zenodo (https://doi.org/10.5281/zenodo.3928890) [76]. SPRITE data in GM12878 and mESC produced by Quinodoz et al. [18] are available in the 4D Nucleome data portal (https://data.4dnucleome.org) [77, 78] and GEO with accession number GSE114242 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114242) [79]. Bulk Hi-C data produced by Rao et al. [16] and Bonev et al. [70] are available in the 4D Nucleome data portal (https://data.4dnucleome.org) [80,81,82]. Single-cell Hi-C contacts from mESCs are available at GEO accession GSE80280 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80280) [83]. Single-cell Hi-C contacts from GM12878 cells and Dip-C structures at 20 kb resolution are available in GEO with accession number GSE117876 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117876) [84]. Si-C [40] structures at 10 kb resolution for mESCs are available at https://github.com/TheMengLab/Si-C_3D_structure [85]. Processed RNA-seq, ATAC-seq, ChIP-seq, Repli-seq, and candidate cis-regulatory elements are available in the ENCODE data portal (https://www.encodeproject.org) [86]. TAD calls using the directionality index are available in the 3D Genome Browser (http://3dgenome.fsm.northwestern.edu) [87]. SPIN annotations of genomic regions in K562 cells are available at https://github.com/ma-compbio/SPIN [88]. Lymphoma-related variants are available in the GWAS Catalog (https://www.ebi.ac.uk/gwas) [89].
References
Hug CB, Vaquerizas JM. The birth of the 3D genome during early embryonic development. Trends Genet. 2018;34(12):903–14.
Kim S, Shendure J. Mechanisms of interplay between transcription factors and the 3D genome. Mol Cell. 2019;76(2):306–19.
Stadhouders R, Filion GJ, Graf T. Transcription factors and 3D genome conformation in cell-fate decisions. Nature. 2019;569(7756):345–54.
Zheng H, Xie W. The role of 3D genome organization in development and cell differentiation. Nat Rev Mol Cell Biol. 2019;20(9):535–50.
Oudelaar AM, Higgs DR. The relationship between genome structure and function. Nat Rev Genet. 2020;22(3):154–68.
Aboelnour E, Bonev B. Decoding the organization, dynamics, and function of the 4D genome. Dev Cell. 2021;56(11):1562–73.
Han J, Zhang Z, Wang K. 3C and 3C-based techniques: the powerful tools for spatial genome organization deciphering. Mol Cytogenet. 2018;11:21.
Kempfer R, Pombo A. Methods for mapping 3D chromosome architecture. Nat Rev Genet. 2019;21(4):207–26.
Bintu B, Mateo LJ, Su JH, Sinnott-Armstrong NA, Parker M, Kinrot S, et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science. 2018;362(6413):eaau1783.
Cardozo Gizzi AM, Cattoni DI, Fiche JB, Espinola SM, Gurgo J, Messina O, et al. Microscopy-based chromosome conformation capture enables simultaneous visualization of genome organization and transcription in intact organisms. Mol Cell. 2019;74(1):212–22.
Xie L, Liu Z. Single-cell imaging of genome organization and dynamics. Mol Syst Biol. 2021;17(7):e9653.
Liu M, Lu Y, Yang B, Chen Y, Radda JSD, Hu M, et al. Multiplexed imaging of nucleome architectures in single cells of mammalian tissue. Nat Commun. 2020;11(1):1–14.
Su JH, Zheng P, Kinrot SS, Bintu B, Zhuang X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Cell. 2020;182(6):1641–59.
Takei Y, Yun J, Zheng S, Ollikainen N, Pierson N, White J, et al. Integrated spatial genomics reveals global architecture of single nuclei. Nature. 2021;590(7845):344–50.
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326(5950):289–93.
Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80.
Zheng M, Tian SZ, Capurso D, Kim M, Maurya R, Lee B, et al. Multiplex chromatin interactions with single-molecule precision. Nature. 2019;566(7745):558–62.
Quinodoz SA, Ollikainen N, Tabak B, Palla A, Schmidt JM, Detmar E, et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell. 2018;174(3):744–57.
Rowley MJ, Corces VG. Organizational principles of 3D genome architecture. Nat Rev Genet. 2018;19(12):789–800.
Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–80.
Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012;485(7398):381–5.
Zufferey M, Tavernari D, Oricchio E, Ciriello G. Comparison of computational methods for the identification of topologically associating domains. Genome Biol. 2018;19(1):1–18.
Sanborn AL, Rao SSP, Huang S-C, Durand NC, Huntley MH, Jewett AI, et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci U S A. 2015;112(47):E6456–65.
Fudenberg G, Imakaev M, Lu C, Goloborodko A, Abdennur N, Mirny LA. Formation of chromosomal domains by loop extrusion. Cell Rep. 2016;15(9):2038–49.
Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167(5):1369-1384.e19.
Jung I, Schmitt A, Diao Y, Lee AJ, Liu T, Yang D, et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat Genet. 2019;51(10):1442–9.
Schoenfelder S, Fraser P. Long-range enhancer-promoter contacts in gene expression control. Nat Rev Genet. 2019;20(8):437–55.
Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature. 2013;502(7469):59–64.
Flyamer IM, Gassler J, Imakaev M, Brandão HB, Ulianov SV, Abdennur N, et al. Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature. 2017;544(7648):110–4.
Ramani V, Deng X, Qiu R, Gunderson KL, Steemers FJ, Disteche CM, et al. Massively multiplex single-cell Hi-C. Nat Methods. 2017;14(3):263–6.
Zhang R, Zhou T, Ma J. Multiscale and integrative single-cell Hi-C analysis with higashi. Nat Biotechnol. 2022;40(2):254–61.
Cremer M, Brandstetter K, Maiser A, Rao SSP, Schmid VJ, Guirao-Ortiz M, et al. Cohesin depleted cells rebuild functional nuclear compartments after endomitosis. Nat Commun. 2020;11(1):6146.
Miron E, Oldenkamp R, Brown JM, Pinto DMS, Xu CS, Faria AR, et al. Chromatin arranges in chains of mesoscale domains with nanoscale functional topography independent of cohesin. Sci Adv. 2020;6(39):eaba8811.
Tan L, Xing D, Chang CH, Li H, Xie XS. Three-dimensional genome structures of single diploid human cells. Science. 2018;361(6405):924–8.
Finn EH, Misteli T. Molecular basis and biological function of variability in spatial genome organization. Science. 2019;365(6457):eaaw9498.
Luppino JM, Joyce EF. Single cell analysis pushes the boundaries of TAD formation and function. Curr Opin Genet Dev. 2020;61:25–31.
Lakadamyali M, Cosma MP. Visualizing the genome in high resolution challenges our textbook understanding. Nat Methods. 2020;17(4):371–9.
Cheng Y, Liu M, Hu M, Wang S. TAD-like single-cell domain structures exist on both active and inactive X chromosomes and persist under epigenetic perturbations. Genome Biol. 2021;22(1):1–26.
Stevens TJ, Lando D, Basu S, Atkinson LP, Cao Y, Lee SF, et al. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature. 2017;544(7648):59–64.
Meng L, Wang C, Shi Y, Luo Q. Si-C is a method for inferring super-resolution intact genome structure from single-cell Hi-C data. Nat Commun. 2021;12(1):1–11.
Zhou J, Ma J, Chen Y, Cheng C, Bao B, Peng J, et al. Robust single-cell Hi-C clustering by convolution- and random-walk-based imputation. Proc Natl Acad Sci U S A. 2019;116(28):14011–8.
Di Stefano M, Stadhouders R, Farabella I, Castillo D, Serra F, Graf T, Marti-Renom MA. Transcriptional activation during cell reprogramming correlates with the formation of 3D open chromatin hubs. Nat Commun. 2020;11(1):1–12.
Crane E, Bian Q, McCord RP, Lajoie BR, Wheeler BS, Ralston EJ, et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature. 2015;523(7559):240–4.
Shin H, Shi Y, Dai C, Tjong H, Gong K, Alber F, Zhou XJ. TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Res. 2016;44(7):e70.
Xiong K, Ma J. Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions. Nat Commun. 2019;10(1):5069.
Otterstrom J, Castells-Garcia A, Vicario C, Gomez-Garcia PA, Cosma MP, Lakadamyali M. Super-resolution microscopy reveals how histone tail acetylation affects DNA compaction within nucleosomes in vivo. Nucleic Acids Res. 2019;47(16):8470–84.
Dixon JR, Jung I, Selvaraj S, Shen Y, Antosiewicz-Bourget JE, Lee AY, et al. Chromatin architecture reorganization during stem cell differentiation. Nature. 2015;518(7539):331–6.
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74.
Hounkpe BW, Chenou F, de Lima F, de Paula EV. HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets. Nucleic Acids Res. 2021;49(D1):D947–55.
Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, et al. The encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 2018;46(D1):D794–801.
Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583(7818):699–710.
Marchal C, Sima J, Gilbert DM. Control of DNA replication timing in the 3d genome. Nat Rev Mol Cell Biol. 2019;20(12):721–37.
van Steensel B, Belmont AS. Lamina-associated domains: links with chromosome architecture, heterochromatin, and gene repression. Cell. 2017;169(5):780–91.
Wang Y, Zhang Y, Zhang R, van Schaik T, Zhang L, Sasaki T, et al. SPIN reveals genome-wide landscape of nuclear compartmentalization. Genome Biol. 2021;22(1):1–23.
Khurana E, Fu Y, Chakravarty D, Demichelis F, Rubin MA, Gerstein M. Role of non-coding sequence variants in cancer. Nat Rev Genet. 2016;17(2):93–108.
Buniello A, Macarthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47(D1):D1005–12.
Lou S, Lee HM, Qin H, Li JW, Gao Z, Liu X, et al. Whole-genome bisulfite sequencing of multiple individuals reveals complementary roles of promoter and gene body methylation in transcriptional regulation. Genome Biol. 2014;15(7):408.
Hung T-C, Kingsley DM, Boettiger AN. Boundary stacking interactions enable cross-TAD enhancer–promoter communication during limb development. Nat Genet. 2024;56(2):306–14.
Li Y, Xue B, Zhang M, Zhang L, Hou Y, Qin Y, et al. Transcription-coupled structural dynamics of topologically associating domains regulate replication origin efficiency. Genome Biol. 2021;22:206.
Arrastia MV, Jachowicz JW, Ollikainen N, Curtis MS, Lai C, Quinodoz SA, et al. Single-cell measurement of higher-order 3d genome organization with scSPRITE. Nat Biotechnol. 2022;40(1):64–73.
Hua P, Badat M, Hanssen LLP, Hentges LD, Crump N, Downes DJ, et al. Defining genome architecture at base-pair resolution DNA long range interactions. Nature. 2021;595(7865):125–9.
Aljahani A, Hua P, Karpinska MA, Quililan K, Davies JOJ, Oudelaar AM. Analysis of sub-kilobase chromatin topology reveals nano-scale regulatory interactions with variable dependence on cohesin and CTCF. Nat Commun. 2022;13:2139.
Wang J, Nakato R. HiC1Dmetrics: framework to extract various one-dimensional features from chromosome structure data. Brief Bioinforma. 2022;23(1):bbab509.
Chandra T, Ewels PA, Schoenfelder S, Furlan-Magaril M, Wingett SW, Kirschner K, et al. Global reorganization of the nuclear landscape in senescent cells. Cell Rep. 2015;10(4):471–83.
Heinz S, Texari L, Hayes MGB, Urbanowski M, Chang MW, Givarkes N, et al. Transcription elongation can affect genome 3D structure. Cell. 2018;174(6):1522–36.
Wang Y, Song F, Zhang B, Zhang L, Xu J, Kuang D, et al. The 3D genome browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 2018;19(1):1–12.
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in python. Nat Methods. 2020;17(3):261–72.
Dekker J, Belmont AS, Guttman M, Leshyk VO, Lis JT, Lomvardas S, et al. The 4D nucleome project. Nature. 2017;549(7671):219–26.
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data - update. Nucleic Acids Res. 2013;41(D1):D991–5.
Bonev B, Mendelson Cohen N, Szabo Q, Fritsch L, Papadopoulos GL, Lubling Y, et al. Multiscale 3D genome rewiring during mouse neural development. Cell. 2017;171(3):557–72.
Waskom ML. seaborn: statistical data visualization. J Open Source Softw. 2021;6(60):3021.
Frankish A, Diekhans M, Ferreira A-M, Johnson R, Jungreis I, Loveland J, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2018;47(D1):D766–73.
The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9.
Li KY. Intra-TAD ratio calculation. Github. 2025. https://github.com/kellyliyichen/3D_chromatin_domain. Accessed 06 June 2025.
Li KY. Intra-TAD ratio calculation. Zenodo. 2025. https://doi.org/10.5281/zenodo.15611861.
Su J-H, Zheng P, Kinrot S, Bintu B, Zhuang X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Zenodo. 2020. https://doi.org/10.5281/zenodo.3928890.
Quinodoz SA, Ollikainen N, Tabak B, Palla A, Schmidt JM, Detmar E, et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Datasets. 2018. https://data.4dnucleome.org/experiment-set-replicates/4DNESI1U7ZW9. Accessed 08 Dec 2020.
Quinodoz SA, Ollikainen N, Tabak B, Palla A, Schmidt JM, Detmar E, et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Datasets. 2018. https://data.4dnucleome.org/experiment-set-replicates/4DNESOJRTZZR. Accessed 08 Dec 2020.
Quinodoz SA, Ollikainen N, Tabak B, Palla A, Schmidt JM, Detmar E, et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Datasets. Gene Expression Omnibus. 2018. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114242. Accessed June 2019.
Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Datasets. 2017. https://data.4dnucleome.org/experiment-set-replicates/4DNES3JX38V5. Accessed May 2021.
Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Datasets. 2017. https://data.4dnucleome.org/experiment-set-replicates/4DNESI7DEJTM. Accessed May 2021.
Bonev B, Mendelson Cohen N, Szabo Q, Fritsch L, Papadopoulos GL, Lubling Y, et al. Multiscale 3D genome rewiring during mouse neural development. Datasets. 2019. https://data.4dnucleome.org/experiment-set-replicates/4DNESDXUWBD9. Accessed May 2021.
Stevens TJ, Lando D, Basu S, Atkinson LP, Cao Y, Lee SF, et al. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Datasets. Gene Expression Omnibus. 2017. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80280. Accessed Aug 2019.
Tan L, Xing D, Chang CH, Li H, Xie XS. Three-dimensional genome structures of single diploid human cells. Datasets. Gene Expression Omnibus. 2018. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117876. Accessed Aug 2019.
Meng L, Wang C, Shi Y, Luo Q. Si-C is a method for inferring super-resolution intact genome structure from single-cell Hi-C data. Datasets. 2021. https://github.com/TheMengLab/Si-C_3D_structure. Accessed Sept 2021.
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Datasets. 2012. https://www.encodeproject.org. Accessed May 2020.
Wang Y, Song F, Zhang B, Zhang L, Xu J, et al. The 3D genome browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Datasets. 2018. http://3dgenome.fsm.northwestern.edu. Accessed July 2020.
Wang Y, Zhang Y, Zhang R, van Schaik T, Zhang L, Sasaki T, et al. SPIN reveals genome-wide landscape of nuclear compartmentalization. Datasets. 2021. https://github.com/ma-compbio/SPIN. Accessed Jan 2022.
Buniello A, Macarthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Datasets. 2019. https://www.ebi.ac.uk/gwas. Accessed Apr 2022.
Acknowledgements
We thank Aniruddha Deshpande and Geneviève Fourel for their comments on this work.
Peer review information
Andrew Cosgrove was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.
Funding
This project was supported by Hong Kong Research Grants Council General Research Fund under Award Number 14107420. QC is supported by National Natural Science Foundation of China under Award Number 32100515 and CUHK direct grant for research under Award Numbers 2021.061 and 2022.080. KYY is supported by National Cancer Institute of the National Institutes of Health under Award Numbers P30CA030199 and R01CA287114, National Institute on Aging of the National Institutes of Health under Award Numbers R01AG085498 and U54AG079758, and internal grants of Sanford Burnham Prebys Medical Discovery Institute. SHC is supported by the California Institute for Regenerative Medicine under a pre-doctoral fellowship (EDUC4-12813). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Authors and Affiliations
Contributions
KYY conceived the study. KYL, QC, and KYY developed the methodology and designed the analyses. KYL and SHC performed data processing and analyses. KYL, QC, SHC, CN, PLP, HW, DL, and KYY interpreted the results. KYL and KYY wrote the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, K.Y., Cao, Q., Chow, S.HC. et al. Regulatory roles of three-dimensional structures of chromatin domains. Genome Biol 26, 184 (2025). https://doi.org/10.1186/s13059-025-03659-7
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1186/s13059-025-03659-7