Abstract
Unraveling the regulatory mechanisms that govern complex traits is pivotal for advancing crop improvement. Here we present a comprehensive regulome atlas for rice (Oryza sativa), charting the chromatin accessibility across 23 distinct tissues from three representative varieties. Our study uncovers 117,176 unique open chromatin regions (OCRs), accounting for ~15% of the rice genome, a notably higher proportion compared to previous reports in plants. Integrating RNA-seq data from matched tissues, we confidently predict 59,075 OCR-to-gene links, with enhancers constituting 69.54% of these associations, including many known enhancer-to-gene links. Leveraging this resource, we re-evaluate genome-wide association study results and discover a previously unknown function of OsbZIP06 in seed germination, which we subsequently confirm through experimental validation. We optimize deep learning models to decode regulatory grammar, achieving robust modeling of tissue-specific chromatin accessibility. This approach allows to predict cross-variety regulatory dynamics from genomic sequences, shedding light on the genetic underpinnings of cis-regulatory divergence and morphological disparities between varieties. Overall, our study establishes a foundational resource for rice functional genomics and precision molecular breeding, providing valuable insights into regulatory mechanisms governing complex traits.
Similar content being viewed by others
Introduction
Rice (Oryza sativa) is not only one of the most important crops in the world but also an outstanding model species for studying plant growth and development. Over the past two decades, tremendous efforts have been made to understand the genetic basis of important agronomic traits in rice1. Genome-wide association studies (GWAS) have played a pivotal role in this pursuit, helping to link genetic variations to phenotypic diversity. These studies have identified a large number of candidate genes that hold promise for trait improvement2,3,4,5. However, despite these advances, our understanding of the regulatory mechanisms governing complex traits in rice remains incomplete.
Gene regulatory networks (GRNs) are largely dictated by cis-regulatory DNA sequences, such as promoters and enhancers, which are bound by specific transcription factors (TFs)6. Deciphering the regulatory code within these regulatory sequences and linking the regulatory sequences to target genes are crucial for rewiring GRNs for crop improvement and trait optimization6. Nonetheless, efforts to profile the regulome, encompassing the entirety of regulatory elements in the genome, remain constrained in rice. These efforts often concentrate on specific tissues, neglecting the comprehensive landscape across developmental stages and tissues7,8. Similarly, endeavors to establish links between regulatory regions and their target genes in rice are also limited8.
Meanwhile, many functional genetic variants associated with agronomic traits in rice reside within noncoding regulatory regions (e.g., qSH19, DROT110, and FZP11), which makes their interpretation challenging and underscores the necessity for a systematic dissection of regulatory sequences. Given that diverse traits manifest across distinct developmental stages and tissues, systematic annotation of noncoding regulatory variants in rice is currently hindered by the lack of a comprehensive epigenome map across various tissues and growth stages.
To bridge these gaps, we systematically mapped chromatin accessibility profiles in various tissues across the life cycle of three representative rice cultivars using the UMI-ATAC-seq method12, a modified ATAC-seq (assay for transposase accessible chromatin-sequencing) protocol developed in our lab. Through analysis of 145 ATAC-seq datasets, we obtained a total of 117,176 unique open chromatin regions (OCRs), accounting for ~15% of the rice genome. By integration of RNA-seq data from matched tissues, we predicted potential target genes for OCRs based on the correlation of gene expression and adjacent chromatin accessibility across tissues. Through TF footprinting analysis, we inferred tissue- or stage-specific regulatory networks and identified cultivar-polymorphic/trait-associated OCRs by comparing the regulatory landscapes between indica and japonica rice subspecies. Notably, our analysis unveiled a preference for GWAS-associated variants within tissue-specific OCRs, enabling the identification of causal associations between 209 complex agronomic traits and noncoding regulatory variants using this OCR landscape. Utilizing optimized deep learning models, we decoded the regulatory grammar through modeling of tissue-specific chromatin accessibility and across-variety predictions from sequences. The modeling approach sheds light on the key genetic alterations contributing to cis-regulatory divergence. Overall, these data not only serve as a cornerstone resource for the plant research community but also provide valuable regulatory variants for precision molecular breeding.
Results
Charting a reference atlas of chromatin accessibility in rice
To generate a comprehensive landscape of accessible chromatin in rice (Oryza sativa), we took advantage of an improved ATAC-seq protocol (UMI-ATAC-seq12, which incorporates unique molecular identifiers to the regular ATAC-seq technique for accurate quantification and footprinting) to perform chromatin accessibility profiling in 23 tissues/organs spanning the entire life cycle of rice. The representative tissues include callus, radicle, plumule, leaf, leaf sheath, root, apical meristem (AM1/AM2), dormant buds (DBuds), shoot apical meristem (SAM1/SAM2/SAM3), panicle neck node (PNN), stem, young panicle (Panicle1/Panicle2/Panicle3/Panicle4), lemma, palea, pistil, stamen and seed coat (Seed1/Seed2/Seed3). The experiments were conducted in three representative rice varieties, namely Nipponbare (NIP; japonica subspecies), Minghui 63 (MH63; indica subspecies type II), and Zhenshan 97 (ZS97; indica subspecies type I), with each experiment consisting of at least two biological replicates (Fig. 1a and Supplementary Data 1). In total, 145 genome-wide chromatin accessibility datasets with high sequencing depth (~30.7 M reads on average) were generated. We applied the ENCODE standards13,14 to establish the analysis pipeline (see Methods). Compared to published ATAC-seq datasets in the plants as deposited in the ChIP-Hub database14, our data exhibited a significantly higher signal-to-noise ratio (Supplementary Fig. 1). Through data analysis using the corresponding reference genomes of the three cultivars15,16, we identified on average of 40,676 (ranging from 28,991 to 49,737) reproducible OCRs (with an Irreproducible Discovery Rate [IDR]17 < 0.05) per experiment (Fig. 1b). As expected, the identified OCRs from all experiments predominantly located either in the proximal upstream regions of the transcription start site (TSS) or the distal intergenic regions (Fig. 1b, f, Supplementary Fig. 2 and Supplementary Data 2), resembling promoters or enhancers, respectively18. Of note, OCRs from intragenic regions accounted for a relatively small proportion (about 15.7%), while most of these OCRs originated from intronic regions (Supplementary Fig. 2b). These observations indicate that the vast majority of OCRs originate from noncoding regions of the rice genome.
a ATAC-seq and RNA-seq experiments were conducted in three varieties (Nipponbare, Minghui 63, and Zhenshan 97) of rice in various tissues across the entire life. See Supplementary Data 1 for detailed descriptions of sample collection. Consistent tissue color code is used throughout the figure. b Bar plot showing the number of reproducible OCRs identified from each tissue in the three rice varieties. The OCRs are classified into three categories based on the distance of the OCR summit to its closest transcription start site (TSS): distal (>1 kb), proximal (<=1 kb), and intragenic. No data from the tissues of SAM3 (NIP), Seed1 (ZS97), and Stem (ZS97). c The proportion of the rice genome annotated as open chromatin regions (OCRs) in our study. d The accumulative number of unique OCRs in each tissue, calculated by excluding OCRs that overlap with the OCR superset. e Density plot showing the enrichment of TF binding sites (TFBSs) around the OCRs in Nipponbare (NIP). TFBSs were predicted either by ChIP-seq datasets for 56 distinct TFs (left) or DNA motifs for 458 TFs (right), which were obtained from the ChIP-Hub database14. The flanking area on both sides is 1 kb. f The distribution of the distance of OCR summit to its closest TSS in the three rice varieties. Published open chromatin data14 in rice (NIP) were included for comparison. Based on the distribution, a cutoff of 1 kb (dashed line) was used to distinguish the proximal and distal regulatory OCRs. g The distribution of the conservation PhastCons score19 around the NIP OCRs. h The t-SNE plot showing an unsupervised clustering analysis of chromatin accessibility across different samples. Each dot represents one replicate. Color code as in a. i Boxplot showing the distribution of tissue specificity score of intragenic (n = 14239), proximal (n = 29524) and distal (n = 57153) OCRs (left) or the median score in each tissue. P1 = 4.01e-39, p2 = 2.13e-96, p3 = 1.23e-95. All p-values were calculated by two-sided Mann–Whitney U test between proximal and distal OCRs in terms of specificity. Tissue color code as in a. Boxplot shows the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). Source data are provided as a Source Data file.
We estimated that approximate 15% of the rice genome could be annotated as OCRs, with a consistent pattern observed across each variety (Fig. 1c), and the estimation appeared to have reached saturation in rice (Fig. 1d). OCRs contain multiple TF binding sites and are responsible for regulating the expression of target genes6,18. We collected publicly available ChIP-seq data for 56 distinct TFs (Supplementary Data 3) and predicted DNA motifs for 458 TFs in rice from the ChIP-Hub database14, and showed that OCRs were significantly enriched for TF binding sites (Fig. 1e). Furthermore, we found that OCRs are highly evolutionarily constrained compared to flanking genomic regions (Fig. 1g), supporting previous findings that conserved noncoding sequences (CNSs) are predictive of OCRs in plants19,20.
We next assessed the overall similarities and differences of chromatin accessibility across varieties and tissues. We quantified all datasets based on the merged OCRs (n = 117,176) called from the same reference genome (i.e., Nipponbare) and visualized their global patterns using t-distributed stochastic neighbor embedding (t-SNE)21. While dimension 1 and 2 of t-SNE results generally reflected differences between the indica (MH63 and ZS97) and japonica (NIP) subspecies, dimensions 2 and 3 primarily delineated distinct clusters among tissue types (Fig. 1h). For instance, the chromatin accessibility patterns of vegetative and productive tissues of NIP were separated into distinct clusters, whereas young panicles and callus tissues exhibited similar patterns regardless of their variety origin. We further calculated the tissue specificity of each OCR based on the Jensen-Shannon divergence (JSD) index. Obviously, distal OCRs showed significantly higher specificity scores than proximal OCRs (Fig. 1i and Supplementary Fig. 3a, b), consistent with previous findings14,18,22.
In short, the comprehensive accessible chromatin landscape in rice represents a value resource for crop functional genomic studies.
Linking open chromatin regions to target genes
To decipher which genes these OCRs may regulate, we generated matched RNA-seq datasets for the investigated tissues in each rice variety (Supplementary Fig. 3c and Supplementary Data 4). We adopted a strategy23 to predict OCR-to-gene links based on correlation analysis between the OCR accessibility and gene expression across all samples (Fig. 2a; see Methods). Genes can be regulated by multiple OCRs (including promoters and enhancers) through chromatin interactions, which are supposed to occur within topologically associated domains (TADs). Since the size of TADs in the rice genome was estimated to be ranging from 35 kilobase pair (kb) to 45 kb based on Hi-C data24,25, we restricted our analysis to 40 kb (i.e., from 20 kb upstream to 20 kb downstream of the TSS) to predict target genes of OCRs. Using a cutoff of absolute Pearson correlation coefficient |R|>= 0.4 and P < 0.05, we obtained a total of 59,075 unique links between OCRs (n = 38,437, 32.8% of all OCRs) and genes (n = 18,781, 48.1% of annotated genes; Supplementary Fig. 4a, b and Supplementary Data 5). As expected, the OCR-to-gene links tended to occur more frequently in the proximal OCRs, and consequently the correlation between the gene expression and chromatin accessibility is higher for proximal links (Supplementary Fig. 4c–f).
a Schematic diagram illustrating the correlation-based approach to link ATAC-seq OCRs to target genes based on correlation analysis between chromatin accessibility and gene expression. b Heatmap showing the tissue-specific OCR-to-gene links (R > = 0.4, P < 0.05, two-tailed Z-test). Each row in the left panel is a unique OCR. Each row in the middle panel is a gene, corresponding to target genes for OCRs in the left panel. Representative genes are shown on the right. c Examples of tissue-specific OCRs (in the dashed box) regulating dynamic expression of the corresponding target genes. The orange lines indicate the OCR-to-gene links, and the deeper the line the higher the correlation between the chromatin accessibility and gene expression. d Enrichment of biological processes gene ontology (GO) terms for target genes in each OCR cluster in b. The asterisk (*) denotes P < 0.05 (P-values were calculated by Hypergeometric test after Bonferroni correction). e Bar plot showing the percentage of OCRs from different categories based on the genomic location. Source data are provided as a Source Data file.
Genetic variants within OCRs can contribute to changes in gene expression levels through expression quantitative trait loci (eQTL). We colocalized the identified OCR-to-gene links from our study with published eQTL data in rice26, and we found a significant overlap (Chi-squared test, P < 1.55e − 06) between OCR-to-gene links and eQTL-gene pairs (Supplementary Fig. 4g). As expected, the correlation coefficients of colocalized OCR-to-gene links with eQTLs are significantly higher than those without colocalization (Wilcoxon test, P = 4.11e−38; Supplementary Fig. 4h). We identified numerous known regulatory variants that influence the expression of genes associated with agronomic traits. To name a few, a variant within a distal regulatory region ( ~ 12 kb upstream) of qSH1 modulates its expression dynamics, leading to change the seed shattering in rice9. Accordingly, there is a positive correlation (R = 0.47, P < 0.013) between the accessibility of this enhancer and the expression of qSH1 in various tissues, particularly in SAM where gene expression increases (Supplementary Fig. 4i, l). Similarly, OsLG1 is tightly linked to upstream regulatory regions that colocalize with a strong QTL associated with the panicle shape trait27 (Supplementary Fig. 4j, l). IPA1 showed significantly positive correlation (R = 0.84, P < 2.95e−8) between its enhancer activity and gene expression, with increased expression in yield-related tissues (Supplementary Fig. 4k, l), confirming an important role of IPA1 to shape rice ideal plant architecture (IPA) and thus to enhance grain yield28.
Taken together, the predicted OCR-to-gene links provide regulatory insights into agronomic trait development in rice and highlight targetable OCRs of important genes for genome editing.
Dissecting tissue-specific and stage-specific regulatory grammar
The comprehensive chromatin accessibility landscape of representative tissues gave us an opportunity to uncover tissue-specific regulatory grammar. We quantified the tissue-specificity of OCRs by utilizing the JSD score, which enables the discrimination of target genes from housekeeping (e.g., GAPDH29 and OsGOGAT130) to tissue-specific (e.g., OsYABBY531 and OsWRKY4732) according to the above predicted OCR-to-gene links (Supplementary Fig. 5 and Supplementary Data 6). We have specifically focused on analyzing highly tissue-specific OCRs (n = 6686 with a cutoff of JSD > 0.08, ~ 7% of all OCRs) as they may encode the tissue-specific regulatory grammar. These OCRs were further annotated as promoters (n = 2322) or enhancers (n = 4364) according to the genomic distance to the TSS. By performing joint clustering analysis of chromatin accessibility and target gene expression using OCR-to-gene links, we identified 20 distinct clusters of OCRs (Fig. 2b and Supplementary Data 7). Each cluster had 200~500 OCR-to-gene links that were highly activated in specific tissues, and showed a high degree of consistency with the known biological characteristics of the corresponding tissues (Fig. 2b–d). For instance, the palea- and lemma-specific links in cluster 5 (C5) contained promoter-enhancer interactions at the locus of GW8, which is a known gene controlling grain weight in rice33 (Fig. 2c). Accordingly, GW8 was highly expressed in pistil, lemma, and palea. Gene ontology (GO) enrichment analysis using genes from C5 revealed that biological processes such as ‘pollen−pistil interaction’ and ‘pollination’ were overrepresented (Fig. 2d). Similarly, we identified a number of OCRs in C19 that were highly and specifically accessible in meristem-like tissues (including young panicle and shoot apical meristem), and the associated target genes showed significant enrichment for functions related to ‘reproductive system development’, ‘flower development’, and ‘shoot system development’ (Fig. 2b, d). Notably, RFL, a crucial regulator for plant architecture and flowering time34,35, was among these target genes (Fig. 2c). Interestingly, we observed that a higher proportion (28.9%) of tissue-specific OCRs originated from distal intergenic regions compared to constitutive OCRs (12.3%). In contrast, approximately 85% of constitutive OCRs were derived from the proximal-promoter regions. (Fig. 2e).
To delineate the TFs that may bind to these tissue-specific OCRs, we used GimmeMotifs36, a versatile tool can detect tissue-specific TF binding motifs by comparing TF binding activity across multiple experiments. We restricted our analysis to the top 2,500 OCRs in each tissue, as determined by their specificity measurement (SPM) score37. The predicted regulatory motifs showed significant enrichments in a tissue-specific manner in matching tissue types (Supplementary Fig 6 and Supplementary Data 8). We narrowed our focus to the top enriched regulators in each tissue type, and found many of the inferred links correspond to known regulatory relationships (Fig. 3a). For example, OsIDS1, a gene that plays a vital role in shaping inflorescence structure and establishing floral meristems38,39, exhibited relatively high activity in the panicle. OsbZIP72, enriched in plumule tissue, has been found to regulate plumule length by modulating abscisic acid (ABA) signaling and promote seed germination40,41. Notably, the tissues of seed and pistil demonstrated a co-enrichment pattern of crucial regulators involved in flower and seed development, including MFO1 and MADS6342,43,44 from the MADS gene family (Fig. 3a). For each tissue type, we performed a systematic analysis to calculate the relative preference of regulators within TF families. Our analysis revealed distinct tissue-specific TF binding patterns, indicating clear preferences for specific regulators in different tissues (Fig. 3b). For instance, the TCP TF family showed a preference for enrichment in stem, stamen, and panicle neck node (PNN) tissues. This observation aligns with the known biological function of TCP genes, specifically their role in regulating cell proliferation in developing tissues45.
a Enrichment of TF motifs in tissue-specific OCRs. Only top 5 enriched TFs in each tissue are shown. See Supplementary Data 8 for the full list. The thickness of edges is proportion to the corresponding enrichment score. b The relative preference of regulators within TF families in each tissue type. Only the top 100 TF motifs in each tissue were used for analysis. c The scatter plot showing the distribution of the Pearson correlation coefficient between TF footprint score and its expression. Only absolute values of correlation coefficients greater than 0.5 are marked. d The scatter plot showing the distribution of TF footprint score and its gene expression in NIP, MH63, and ZS97(left). The error bands indicate 95% confidence intervals. Distribution of Tn5 cuts around the footprint of DL and OsSPL9 at different stages of young panicle (right). Source data are provided as a Source Data file.
Analyzing temporal ATAC-seq data through footprinting could assist in identifying key regulators, such as pioneer factors, that control developmental progression and transition46. We generated temporal open chromatin data from the young panicle, which is a crucial organ determining the yield of rice47,48, across four successive developmental stages (<1 mm, 1–2 mm, 3–5 mm, and 5–10 mm; Fig. 1a). We endeavored to identify regulatory motifs that exhibited either positive or negative correlation with the young panicle developmental stage in terms of enrichment, using dynamically changing OCRs (n = 9244; Fig. 3c, Supplementary Fig. 7a and Supplementary Data 9). The regulators that were most enriched displayed predominantly positive correlations, indicating their function as transcriptional activators. Conversely, a subset of factors exhibited negative correlations, suggesting a repressive role. In this regard, DL (encoding OsYABBY49), OsSPL950, and OsSPL1451 were identified as representative positive regulators, during the development of young panicles in rice (Fig. 3d and Supplementary Fig. 7b). However, further experimental data is necessary to validate the potential involvement of these TFs in young panicle development.
Overall, the above results provide a valuable resource that can help guide studies of candidate key regulators for tissue-specific gene regulation.
Systemic localization of GWAS variants in tissue-specific regulatory DNA
Genome-wide association studies (GWAS) have identified numerous natural variations linked to various agronomic traits in rice3. To systematically colocalize GWAS-associated variants with the above annotated regulatory elements, especially those from noncoding regulatory regions, we compiled a comprehensive rice GWAS catalog from recent genome-wide association meta-analysis studies2,52,53,54 as well as the NGDC GWAS Atlas database55. In total, we collected 4831 significant (P < 1e−5) and representative (only considering lead SNP) associations for 209 distinct quantitative traits which can be classified into seven major categories56: morphological characteristics, physiological features, yield components, grain quality, resistance, coloration, and others (Fig. 4a and Supplementary Data 10). In a nutshell, these GWAS SNPs dominantly located in intergenic noncoding regions (Fig. 4b and Supplementary Fig. 8a) and 24.5% of them were either situated within a noncoding OCR (21.1%) or located in linkage disequilibrium (LD) with SNPs in a neighboring OCR (3.4%) (Fig. 4c). Moreover, OCRs revealed significantly higher enrichment of GWAS SNPs than protein-coding sequences (Fig. 4d), highlighting the crucial function of regulatory variants in determining phenotypic characteristics.
a Categorical proportions of lead SNP in each GWAS. The inner circle indicates the proportions of the seven major categories, and the outer circle indicates the subcategories contained in each major category. Only high proportions are marked in the outer circle. b Distribution of curated lead SNPs by genomic context. All lead SNPs are the same as in a. c Overlap proportions of lead SNPs and sets of SNPs with strong linkage disequilibrium (LD > 0.8) with lead SNPs with ATAC-seq OCRs, ChIP-seq peaks and footprints identified by NIP ATAC-seq, respectively. d The barplot showing the SNP density of OCR and CDS regions at different GWAS P-value thresholds. The error bars are the standard deviations of the SNP densities in the six GWAS catalogs from a. Data represents the mean ± SD of 6 independent GWAS catalogs. The P values were calculated by two-tailed Student’s t-test. e Boxplots showing the tissue-specificity score distribution of OCRs that overlap with grain width53 and leaf blade width52 GWAS SNPs. For grain width, the sample sizes for the “with” and “without” groups are 896 and 4480, respectively. For leaf blade width, the sample sizes for the “with” and “without” groups are 2864 and 5728, respectively. Boxplot shows the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). The P-values were calculated by two-tailed Student’s t-test. f The enrichment of GWAS SNPs2 in OCRs with different GWAS P-value threshold. g Manhattan plot showing the GWAS signal distribution of vg0724670482 and the LD distribution of its surrounding SNPs. The track plot demonstrates that the OCR where this SNP is located has a higher accessibility in palea tissue. “O2G” represents OCR-to-gene links. h Same meaning as g, except that vg0431203743 has a higher accessibility in SAM and young panicle. Source data are provided as a Source Data file.
Furthermore, our findings demonstrated that OCRs containing GWAS SNPs exhibited greater tissue specificity (Fig. 4e, f and Supplementary Fig. 8b-d). For instance, one of the OCRs containing a GWAS lead variant vg072467105553 (C/T, GWAS P < 9.27e−8) significantly associated to panicle number. This OCR was found to be highly accessible specifically to young panicle tissues and its accessibility showed a positive OCR-to-gene link with the expression of GW7 (R = 0.59, P < 9.14e−5; Fig. 4g). In another example, the GWAS lead variant vg0431427332 is significantly associated to leaf blade width52 (P < 1.58e−8), which was located in a SAM/Panicle-specific OCR to positively regulate the expression of NAL1 (R = 0.72, P < 1.16e−6) (Fig. 4h). The previous studies have shown that NAL1 is not only associated with leaf width but also with yield52 and has natural variations in expression levels26. More examples of validated OCR-related associations are presented in Supplementary Fig. 8e.
Tissue-specific regulatory variants explain agronomic trait associations
The variation in DNA sequences within OCRs plays a significant role in driving phenotypic innovation through altering chromatin state and gene expression patterns, which usually occurs in a tissue-specific manner. To investigate the relationship between genetic variations associated with agronomic traits and tissue-specific OCRs, we calculated the enrichment of genetic variations within OCRs in a tissue-specific manner. It turned out that significant GWAS SNPs were frequently enriched in OCRs of trait-relevant tissues (Fig. 4f and Supplementary Fig. 8d). For example, GWAS variants associated with spikelet traits were highly enriched in OCRs specific to the tissues of SAM1, pistil and panicle. Motivated by this observation, we performed an enrichment analysis of GWAS-identified SNPs in OCRs from various tissues, using a SNP enrichment method termed CHEERS57 (Supplementary Fig. 9). Of the 209 curated GWAS-related traits, ~78% (163 of 209) phenotypic traits showed GWAS SNP enrichment in at least one tissue (Supplementary Fig. 10 and Supplementary Data 11). The observed enrichment of agronomic trait-associated variants in regulatory elements was highly specific to tissue types, and the association is largely compatible with our current understanding of the tissue function (Fig. 5a). For example, in various GWAS studies, regulatory variants associated with plant height was enriched in stem-related tissues; while genetic associations for grain-related traits (such as grain thickness, grain width, grain length, blighted grains per plant, and filled grains per plant) were highly enriched in OCRs specific to the tissues of seed, lemma, pistil, and stamen (Fig. 5a). Meanwhile, we found that variants associated with root length were predominantly enriched in the root tissue. Specifically, a significant SNP (vg080620195758, P < 3.98e-8) located in a root-specific enhancer of OsHAK12, which has been shown to be involved in K+ uptake in roots59 (Supplementary Fig. 11a).
a GWAS SNPs enrichments for ATAC-seq OCRs of different tissues. The heatmap showing the significant tissue-specific enrichment results. The values are transformed by -log10(P) and then normalized by row. Those marked with an asterisk represent P < 0.05 for this result. The P values were calculated by Kolmogorov-Smirnov test. Only tissue data for the NIP variety were used for this analysis. The full list for GWAS enrichment result could be accessed by Supplementary Data 11. b One representative examples of genomic tracks at loci OsbZIP06 showing that GWAS lead SNP is located in tissue-specific OCRs. The GWAS study name and SNP location (denoted by red dashed line) are shown at the top of panel c. Haplotype distribution of vg0131729028 in the population. This result was obtained from the RiceVarMap 2.0 database7. d Identification of mutation information of two OsbZIP06 mutants based on Sanger sequencing. e The images show seed germination rates of wild type and mutants of OsbZIP06. f The line graph showing the germination rates of different mutants osbzip06 at different days of imbibition. “OE” represents overexpression. g Boxplot showing the enrichment results of proximal and distal OCRs with 209 GWAS results respectively. Only results where GWAS was significantly enriched with at least one of proximal and distal OCRs are shown. The sample size of each group is 764. The P value was calculated by Student’s t-test. Boxplot shows the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). h Venn plot showing the number of results significantly enriched (P < 0.05, Kolmogorov-Smirnov test) by proximal and distal OCRs. i Enrichment of GWAS SNPs in TSS proximal and distal OCRs. The names of the GWAS are marked at the top of the panel. The grey dashed line indicates the P-value threshold of 0.05. The P values were calculated by Kolmogorov-Smirnov test. Source data are provided as a Source Data file.
In the case of seed germination percentage, GWAS SNPs were most significantly enriched in plumule-specific OCRs (Fig. 5a). We noted a lead SNP (vg013172902860, A/G, P < 8.4e−8) localized within an intronic OCR of OsbZIP06, where the intronic OCR and OsbZIP06 formed a positive OCR-to-gene link (R = 0.82, P < 2.55e-7) with high tissue specificity in plumule and radicle (Fig. 5b). The minor allele (G) of vg0131729028 was present in a very small proportion (0.3%) in the XI population, but in 65.80% of the Aus population (Fig. 5c). We mutated the coding region (mainly 1st exon) of OsbZIP06 with CRISPR/Cas9 and found that the germination rate was higher in two frameshift mutations (osbzip06-1 and osbzip06-2) than in the wild type (Fig. 5d-f and Supplementary Data 12). In contrast, overexpression of the OsbZIP06 resulted in a lower germination rate (Fig. 5e, f). Therefore, the integration of publish GWAS data and our chromatin landscape can greatly facilitate the identification of candidate genes and the functional annotation of noncoding variants.
Furthermore, when we divided the OCRs into proximal (<3 kb from the TSS, 60,006 OCRs) and distal OCRs (>3 kb from TSS, 35,691 OCRs) before using CHEERS to do enrichment analysis. We observed that the proximal OCRs are more enriched in GWAS SNPs (Fig. 5g–i and Supplementary Fig. 11b). This implies that the enrichment above is mainly driven by the OCR close to the TSS and this result is consistent with previous studies57,61.
Deep learning models accurately predict differences in chromatin accessibility between tissues and unveil common regulatory grammar among varieties
We further investigated whether the tissue- and stage-specific regulatory grammar can be modelled. Deep learning has been successfully utilized to learn and identify essential features in genomic sequences, such as the identification of cis-elements62,63. Our previous study demonstrated that the Basenji deep learning framework64 is powerful for modelling epigenomic data in rice, such as the ability to accurately predict chromatin accessibility and to assess the impacts of variants7. Therefore, we optimized the Basenji framework to effectively model our ATAC-seq datasets from multiple tissues (Supplementary Fig. 12a,b). Three distinct models were trained for the varieties of NIP, MH63, and ZS97, demonstrating high accuracy with the mean AUROC values of 0.931, 0.921, and 0.928, respectively (Fig. 6a and Supplementary Fig. 12c). We observed that the Pearson’s correlation coefficient between the predicted and observed values of chromatin accessibility at different locations on the genome reached approximately 0.81, with the best prediction at the location of <1 kb upstream regions (Fig. 6b and Supplementary Fig. 12d). This implies that the regulatory syntax patterns within promoter regions could carry more significant information encoded in sequences, which can be effectively captured by deep learning models. Furthermore, the predicted signals from the test sets exhibit the ability to discern between distinct tissues and closely align with the clustering results of the actual values (Fig. 6c). For example, the root-specific expressed gene RCc3, responsible for regulating lateral root growth65, exhibits distinct chromatin accessibility patterns specifically in root (Fig. 6d and Supplementary Fig. 13).
a Receiver operating characteristic curves for different tissues in the NIP cultivar. The average AUORC value was 0.931. b Distribution of Pearson correlation coefficients between predicted and true signal values for different genomic regions using NIP model. Each point represents one tissue (n = 24). Data are displayed as mean ± SD. c Comparison of clustering results based on predicted and true signal values using NIP model. d The genomic tracks show the signal values predicted by NIP model versus the true signal values for Panicle1, PNN and Root, respectively. The shaded area is labelled with the gene region of RCc3. The heatmap below the tracks show the expression of the RCc3 in NIP varieties. e The boxplot showing the distribution of Pearson’s correlation coefficients for the models of NIP, MH63 and ZS97 tested separately using sequences from the other two varieties. The red dashed line represents a correlation coefficient at 0.80. Each sample consists of 24 observations. Boxplot shows the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). f The genomic tracks showing the signal values predicted with the ZS97 model for NIP, MH63 and ZS97 sequences versus the true signal values in Stamen and Stem tissues, respectively. The shaded area represents the orthologous region of GSE9 in NIP, MH63 and ZS97 varieties. The heatmap below the tracks show the expression of the GSE9 in NIP, MH63, and ZS97 varieties. g Comparison of OCRs in the three rice cultivars (NIP, MH63, and ZS97). For each cultivar, OCRs from all tissues were merged and then compared based on whole genome sequence alignments. h Ternary plot showing the chromatin accessibility of orthologous OCRs among the three rice cultivars with Panicle1 tissue. i Comparison of the SNP density within the balanced (n = 19793) and unbalanced (n = 8385) orthologous OCRs. The P value was calculated by two-tailed Student’s t test. Boxplot shows the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). j Sankey diagram showing the true chromatin accessibility difference and the chromatin accessibility difference predicted by the deep learning model for orthologous OCRs in NIP, MH63, and ZS97. The color representation is categorized in the same way as in h. Source data are provided as a Source Data file.
Subsequently, for each variety-specific model, we used test sets from the remaining two varieties to evaluate the model’s capacity for making predictions across different varieties. Our analysis revealed high Pearson correlation coefficients (about 0.8) between the predicted and observed signals (Fig. 6e). Notably, in the GSE9 promoter region, there is divergence between indica and japonica rice, marked by a 9 bp deletion and several SNPs in MH63 when compared to the sequences of NIP and ZS9766. The ZS97 model predicted the chromatin accessibility of this region in MH63 with weak signals. Contrarily, the ZS97 model accurately predicted the chromatin accessibility in NIP and ZS97, showing strong signals (Fig. 6f and Supplementary Fig. 14). These results suggest that the deep learning model can effectively make accurate predictions across varieties, implying that shared regulatory grammar across rice varieties.
We next performed comparative analyses on ATAC-seq data of 22 matched tissues/organs in both japonica rice (NIP) and indica rice (MH63 and ZS97), utilizing their respective reference genomes (Fig. 1a, b). We found that roughly 60% (60,764 out of 95,697) of OCRs were shared across all three cultivars (Fig. 6g and Supplementary Data 13). The indica varieties MH63 and ZS97 exhibited a higher proportion of shared OCRs compared to NIP from different subspecies (Fig. 6g). We next sought to compare chromatin accessibility dynamics of the 1:1:1 orthologous OCRs across the three varieties (referred to as triads; see Methods). To investigate the accessible bias of orthologous OCRs, we compared the chromatin accessibility of orthologous OCRs in each individual tissue (Fig. 6h). Orthologous OCRs were assigned into seven categories on the ternary plot based on their relative accessibility, including a balanced category and six dominated or suppressed categories in specific cultivars (Supplementary Fig. 15). The proportion of OCR triads assigned to unbalanced categories varied among different tissues, ranging from 3.2% in plumule to 24.8% in AM1 (Fig. 6h and Supplementary Fig. 16a). While promoters generally display balanced OCRs, indicating consistent accessibility across different cultivars, enhancers frequently exhibit unbalanced OCRs, reflecting cultivar-specific regulation (Supplementary Fig. 16b). Interestingly, unbalanced OCRs harbored more genotypic variations in terms of SNPs (Fig. 6i). This observation led us to suppose whether sequence variation among different varieties caused the differences in chromatin accessibility of these OCR orthologs. Therefore, we used NIP-based deep learning model to predict the chromatin accessibility signals of sequences from orthologous OCRs in NIP, MH63 and ZS97, respectively, and then compare these predictions. The results showed that about 50% of the differences in orthologous OCRs could be successfully resolved in terms of sequence variation (Fig. 6j and Supplementary Fig. 17).
In summary, the above results illustrate that deep learning models could accurately predict chromatin accessibility across tissues and varieties. The high accuracy of the models also indicates the high quality of our data.
Elucidate key genetic changes underlying cis-regulatory divergence by deep learning models
Genetic variants and de novo mutations in regulatory regions may lead to cis-regulatory divergence and thus changes in gene expression and organismal phenotypes67. We systematically dissected the cis-regulatory divergence due to genomic sequence changes (e.g., SNPs) in regulatory regions, which could be inferred from ATAC-seq data. To measure the effect of the variant on chromatin accessibility, we extracted variants that differed in the three varieties. The effect of different alleles of each variant on chromatin accessibility was evaluated using the deep learning models. We found that unbalanced OCRs had a higher absolute effect score than the balanced OCRs (Supplementary Fig. 18a) and these large-effect loci were significantly enriched for eQTLs26,68 (Supplementary Fig. 18b). This observation suggests that these putative large-effect variants are associated with changes in chromatin accessibility and gene expression. Meanwhile, we performed separate OCR-to-gene correlation analysis for each of the three varieties. We then identified conserved OCR-to-gene links and compared the correlation coefficients between them (Fig. 7a). Notably, OCRs with significant differences in correlation coefficients exhibited higher SNP density (Fig. 7b), and the OCR-to-gene links with large differences in correlation coefficients between MH63 and ZS97 were significantly enriched for differential cis-eQTL between MH63 and ZS97 (Fisher’s exact test, odds ratio = 1.81 and P < 1.83e−28)69. These suggesting that regulatory sequence variations among different varieties could influence gene expression. For instance, we observed that a SNP (vg0336150781, G/A) located in the GNP1 promoter region control grain number and plant height70. Among the OCR-to-gene links we inferred, the allele in NIP (‘G’ at this SNP) correlated with GNP1 (R = 0.59, P < 6.48e−04), whereas the allele (‘A’ at this SNP) did not show OCR-to-gene correlation in MH63 (R = 0.01, P = 0.99) and ZS97 (R = 0.17, P = 0.34) (Fig. 7c). In addition, eGWAS also demonstrated that this SNP affects GNP1 expression (Fig. 7d). When we evaluated the effects of this SNP with the deep learning model, we found that mutation of this SNP from “G” to “A” in Panicle2 significantly reduced chromatin accessibility (Fig. 7e). We also found that this variant overlaps with the footprint of OsSPL10 identified in Panicle2, and its binding site shows the typical “GTAC” motif of the SBP TF family. These results suggest that mutations control gene expression by affecting TF binding to alter chromatin accessibility.
a Density plot showing the difference in Pearson correlation coefficients (R) between the OCR-to-gene of NIP, MH63, and ZS97, respectively. The R of OCR-to-gene is not less than 0.4 we consider large differences while R located between −0.05 and 0.05 we consider no difference. b Boxplots showing the density of SNP differences between big and small difference groups. Comparisons are made by two-tailed Student’s t-test. Sample sizes for each group are labeled above their respective boxes. Boxplot shows the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). c The dot plot demonstrates that the GNP1 gene associates to an OCR (chr3:36150374-36152039) in NIP, but not in MH63 and ZS97 due to the presence of a variant (vg0336150781, G/A). Pearson’s correlation coefficient is used for the test. The error bands indicate 95% confidence intervals. The P-values were calculated by a two-tailed Z-test. d Manhattan plot showing local eGWAS results for GNP1. The eGWAS results were obtained from Ming et al. 68. e Changes in chromatin accessibility using deep learning models for mutations of 100 bp each on the left and right of vg0336150781. “Loss” represents reduced chromatin accessibility after the mutation compared to before the mutation, and “gain” represents increased. The figure shows the change in chromatin accessibility before and after the mutation in Panicle2. f The treemap showing the proportion and composition of OCRs without structural variants (SV) and OCRs with SV. Here we only consider deletions (DEL), inversions (INV), and duplications (DUP) for SV. OCRs were considered SV-related when it overlaps with DEL, DUP, and INV by at least 1 bp. g The heatmap showing the 12,313 OCR-to-gene links (R > = 0.4, P < 0.05, two-tailed Z-test) associated with SV. They were grouped into 6 clusters based on their chromatin accessibility. The number of OCRs in each cluster and the number of target genes are labeled on the right side of the heatmap. h The doughnut showing the proportion of DEL, DUP, and INV in each cluster. i Scatter plot demonstrates Pearson correlation coefficients (R = 0.83, P < 6.94e−09) between tissues for the accessibility of OCR associated with deletion and the expression of target genes (Oshsp18.0-CII). The error bands indicate 95% confidence intervals. The P-values were calculated by two-tailed Z-test. j Genome Browser showing ATAC-seq signal distribution in the vicinity of gene Oshsp18.0-CII. The gray dashed bracket represents the absence of this OCR in MH63 and ZS97 due to the deletion of this sequence. The barplot on the right shows the expression of the gene in each tissue. Source data are provided as a Source Data file.
Besides point mutations, small genomic alterations (including short insertions/deletions, inversions, and duplications) may abolish OCRs and thus confer an important avenue of regulatory divergence. We quantified all OCRs based on the NIP reference genome and investigated whether their regulatory activity dynamics were associated with short genomic alterations, which were determined by whole genome comparison across different cultivars (see Methods). In total, we found that nearly one third (26.6%) of the OCRs harbored small alterations (Fig. 7f). The regulatory activity of these mutation-associated OCRs is positively correlated with their surrounding gene expression patterns in a cultivar-specific manner (Fig. 7g, h), as exemplified at the loci of Oshsp18.0-CII and MAG2 (Fig. 7i, j and Supplementary Fig. 19a). Notably, GO analysis showed that these genes were highly enriched for various ‘response’ related functions (Supplementary Fig. 19b and Supplementary Data 14). Further investigation revealed that the identified mutation-embedded OCRs were significantly overlapped with transposable elements (TEs) (Supplementary Fig. 19c). The above results indicate that TEs may contribute to modification of regulatory sequences, fine-tuning gene expression networks and driving new functions71.
Discussion
Despite substantial progress, a complete catalog of regulatory sequences within the rice genome remains elusive, limiting the understanding of tissue-specific regulatory dynamics and GRNs. Our study presents a comprehensive exploration of rice genome regulation using the UMI-ATAC-seq technique12, providing insights into tissue-specific regulatory elements and their influence on complex agronomic traits. Of note, the identified OCRs in rice encompass approximately 15% of the genome, a notably higher proportion compared to previous reports in plants such as Arabidopsis (~4%)72 and maize (~4%)73. This expanded coverage underscores the importance of sampling depth in characterizing the regulatory complexity in plants and highlights the need for further comparative analyses to elucidate species-specific regulatory features.
Predicting OCR-to-gene links presents a significant challenge due to the intricate regulatory mechanisms governing gene expression. By integrating RNA-seq data from matched tissues, we predicted 59,075 OCR-to-gene links, including many reported enhancer-to-gene links. This analysis offers a holistic view of how changes in chromatin accessibility directly impact gene expression patterns, underscoring the significance of regulatory elements in shaping the rice transcriptome. The identified associations between enhancers and target genes provide guidelines for dissecting complex regulatory mechanisms and gene editing in non-coding regions. The approach for predicting OCR-to-gene links based on multi-omics data is versatile and transferable to other plant species. Despite our efforts to predict OCR-to-gene links, less than half of the protein-coding genes exhibit a relatively strong correlation (Pearson correlation coefficient |R|≥ 0.4 and P < 0.05) with OCRs. The complexities of dynamic and context-dependent regulation, coupled with long-range and indirect regulatory mechanisms, introduce additional layers of complexity to OCR-to-gene link prediction beyond the capabilities of linear models aimed at directly mapping OCRs to their target genes. These factors likely contribute to the weaker correlations observed for certain genes. Moreover, tissue-specific and housekeeping genes are difficult to correlate through linear models due to the small variation in expression levels between tissues (Supplementary Fig. 20).
Deep learning has emerged as a potent tool for interpreting the genomic and epigenomic data62,63, but its application in rice is hindered by the scarcity of high-quality epigenomic datasets. Our study addressed this gap and successfully modelled the chromatin accessibility of three rice varieties. The highly accurate models enable the prediction of chromatin accessibility variation across varieties using sequences, providing a reference for scientists to explore the functional effects of rare variants or new variants across different tissues.
Moreover, our comparative analysis across varieties revealed cis-regulatory divergence that could largely be predicted using deep learning models based on sequences, highlighting the genetic diversity of rice varieties and its impact on regulatory architecture. By integrating GWAS data, we localized significant variants within noncoding regulatory regions, demonstrating that these variants are preferentially located in tissue-specific OCRs, thus providing insights into the influence of regulatory variations on phenotypic outcomes. A notable achievement of our study is the identification of OsbZIP06’s role in seed germination, demonstrating the potential of integrating GWAS data with chromatin accessibility to uncover the genetic basis of complex traits.
In summary, our extensive chromatin accessibility atlas and the deep learning models we have constructed not only enhance our understanding of regulatory elements in rice but also serve as a versatile resource for gene editing and breeding strategies targeting non-coding regions. Nevertheless, there are several limitations associated with our study. Firstly, our map solely encompasses data from normal conditions, omitting insights into responses to biotic or abiotic stresses, mutants, and diverse environmental circumstances. Secondly, the inferred associations between OCRs and genes require experimental validation to confirm their regulatory relationships. Thirdly, our study primarily employed the NIP reference genome, thereby excluding sequences that were not available in the NIP genome. Furthermore, the advent of single-cell technologies has opened avenues for studying cis-elements at a single-cell resolution73,74. In the future, incorporating single-cell data will be crucial for further characterizing the heterogeneity among different cell types. These advancements will collectively contribute to a more comprehensive understanding of the regulatory landscape in rice and beyond.
Methods
Plant materials, ATAC-seq, and RNA-seq experiments
Three rice varieties, Nipponbare, Zhenshan 97 and Minghui 63, were planted in a field in Wuhan, China in the summer of 2020 and were used to obtain most of the tissues or organs used in this study. Details of the sampling are listed in Supplementary Data 1. We followed our previously established method to perform UMI-ATAC-seq experiments12. RNA was isolated using TRIzol reagent (Invitrogen Life Technologies), and sequencing libraries were prepared using the MGIEasy RNA Library Preparation Kit. The libraries were subsequently sequenced on the MGISEQ-2000.
ATAC-seq data analysis
For the pre-processing of ATAC-seq data, we follow the workflow of ChIP-Hub14 and cisDynet75. The raw reads were first trimmed by Trimmomatic (v.0.36)76 to remove sequencing adapters. The trimmed reads were aligned to the Oryza sativa L.ssp.japonica (cv. Nipponbare) reference genome (v.7.0)16 using Bowtie277 with the following parameters”-q—no-unal—threads 8—sensitive”. All reads mapped to mitochondrial and chloroplast DNA were removed. After sorting mapped reads with SAMtools78 (version 0.1.19), we only used properly paired reads with high mapping quality (MAPQ score > 30) for the subsequent analysis. The PCR duplicates were removed using the MarkDuplicates function from Picard tools (version 2.60; http://broadinstitute.github.io/picard/). The “callpeak” function in MACS279 (version 2.1.0) was used to call peaks with the following parameters: “-g 3.0e8 --nomodel --keep-dup 1 -B --SPMR --call-summits”. The “-shift” used in the model was determined by the analysis of cross-correlation scores using the phantompeakqualtools package (https://code.google.com/archive/p/phantompeakqualtools/).
RNA-seq data analysis
RNA-seq reads were aligned to the Nipponbare reference genome16 using STAR80 (version 2.7.1a). The expression of annotated genes was measured by RSEM81 (version 1.2.22) and normalized with transcripts per million (TPM).
Linking OCRs to target genes
To assign OCRs to genes, we used an approach similar to the previous study23,82. First, we prepared the ATAC-seq quantification matrix, with each row representing a merged OCR and each column representing a sample. After merging replicates, 66 tissues with both ATAC-seq data and RNA-seq data were taken as independent samples for the analysis. For the gene expression quantification matrix, we removed possible noise by considering only those genes whose TPM of each row added up to > 1.5. For each of the remaining 29,571 genes, we screened the OCRs that might regulate the gene within 20 kb upstream and downstream of the TSS of that gene separately. Then we calculated the Pearson correlation coefficients between the chromatin accessibility of these OCRs and the expression of that gene. Then we randomly generated pseudo-peak sets of the same length and number as these OCRs on other chromosomes, repeated the process 10,000 times, and used Z-test (z.test function from the R package ‘TeachDemos’) to calculate P values. Finally, we considered that absolute Pearson correlation coefficients (|R|) >= 0.4, and P < 0.05 were significant OCR-to-gene links. For the identification of OCR-to-gene links of NIP, MH63, and ZS97, we used the same strategy except that we used ATAC-seq and RNA-seq samples of the corresponding varieties.
Tissue-specific OCRs analysis
We merged the peaks with NIP tissues, counted the number of Tn5 cuts of these peaks in different tissues, normalized them, and then used the Jensen-Shannon Divergence (JSD) from the philentropy R package (https://github.com/drostlab/philentropy) to screen tissue-specific OCRs. Here we considered OCRs with JSD score > 0.08 (except > 0.1 for young panicle) as tissue-specific OCRs. For tissue-specific OCR, we performed Z-score transformation by row for visualization. To identify the top tissue-specific OCRs in each tissue, we employed a scoring metric known as Specificity Measurement (SPM), as detailed in the method provided at https://github.com/apcamargo/tspex. Subsequently, we sorted the OCRs within each tissue based on their SPM scores to select the top 2500 tissue-specific OCRs in each tissue.
Motif enrichment analysis
For motif enrichment analysis of tissue-specific OCRs, we first calculated the Tau index score using SPM metric for each tissue’s OCRs and selected the top 2500 OCRs of each tissue for motif enrichment analysis according to the ranking of Tau index scores. Then we used GimmeMotifs36 with maelstrom function to determine the tissue-specific motifs enrichment. We set the “--filter-cutoff” to 0.4. The input Position weight matrix (PWM) was downloaded from the JASPAR83 database (https://jaspar.genereg.net/). We combined the enrichment results of three methods (Lasso, Bayesian ridge regression, and boosted trees regression) to get the final motif enrichment lists.
ChIP-seq enrichment analysis
The public ChIP-seq data used in this study are provided in Supplementary Data 3. We downloaded the narrow Peak files of these TFs from the ChIP-Hub database (https://biobigdata.nju.edu.cn/ChIPHub/), and then used BEDTools84 (version 2.29.1) fisher function to calculate the enrichment level with the OCRs.
TF motif and footprinting analysis
For the TF motif enrichment analysis, we used the SEA program from the MEME suite and used constitutive OCRs as background. We considered motifs with P-value < 1e−5 to be significantly enriched. For genome-wide TF potential binding sites, we used the FIMO program in MEME to identify them and also used P-value < 1e−5 as the cutoff.
TF footprints were calculated by TOBIAS (version 0.13.1)85. We first used TOBIAS ATACCorrect function to correct the Tn5 inherent insertion bias. Then we calculated the footprint score in OCRs using FootprintScores function with default parameters. Finally, we used BINDetect function to predict the transcription factor binding footprint for each sample, which were matched to curated list of JASPAR83 motifs (https://jaspar.genereg.net/).
Cross-variety comparisons of OCRs
We first aligned the whole genome sequences of NIP, MH63, and ZS97 to each other. The strategy used for the whole-genome alignment was similar to the previously described method23. The results were further filtered to obtain more reliable conserved sequences following the default process of “Reciprocal Best” (http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best). We obtained three superset OCRs by merging OCRs of tissues shared by three varieties s (n = 22). Then we used the bnMapper.py script in bx-python (https://github.com/bxlab/bx-python) to convert the OCRs coordinates of MH63, ZS97 to the corresponding coordinates of the NIP. We then considered the OCRs of MH63, and ZS97 with at least a 50% overlap with the OCRs of NIP to be conserved OCRs for the three varieties. To obtain a quantitative matrix of conserved OCRs, we first quantified all OCRs for each variety and divided the length of the corresponding OCRs by the CPM strategy, and then extracted the conserved OCRs for each variety for subsequent analysis. We then refer to it to classify conservative OCRs into seven categories (NIP dominant, MH63 dominant, ZS97 dominant, NIP suppressed, MH63 suppressed, ZS97 suppressed, and balanced).
Deep learning model analysis
We used the Basenji64 deep learning framework with modifications to accommodate the relatively small rice genome for deep learning model training. We first use the bam_cov.py script to convert the bam files into bigwig files. We then used the basenji_data.py script to prepare the input files for the deep learning model according to the following parameters: “-d 1.0 -s 0.1 –local -t 1 -v 4”. The “-c, -l, -w” of these parameters are shown in the Supplementary Data 15. Data from chromosome 1 was used as the test sets and data from chromosome 2 was used as the validation sets. Next, we used the basenji_train.py script to train the model on a NVIDIA GTX 3090. The basenji_test.py script (default parameters) was used to perform the model performance test. We found differences in the training performance for different parameter settings for rice, with “-l 32768 -c 2048 -w 128” being the best, and subsequent analyses were based on models trained with this parameter. To measure the effect of variation in OCRs, we used the basenji_sat_bed.py script to perform base mutations at this locus and calculated the difference in signals between the reference and the mutation as the variation effect value. To predict the chromatin accessibility of orthologous OCRs, we extended the center of the OCR by 16,384 bp left and right to make a total length of 32,768 bp. Sequences exceeding the length of the corresponding chromosome were removed, and then sequences of the corresponding varieties were extracted using BEDTools getfasta function, and then basenji-predict_bed.py was modified to enable it to use fasta as input.
Analysis of structural variants and transposable elements
We downloaded deletions, duplications and inversions for MH63 (CX145), ZS97 (B156) in Rice SNP-Seek Database (https://snp-seek.irri.org/). Since this database provides large structural variants with a minimum length of 10 bp, we also integrated a series of variants with reference to this workflow7. Briefly, we selected Leaf ATAC-seq data from NIP, MH63, and ZS97 varieties, then aligned them to the Nipponbare reference genome using BWA-MEM78 (version 0.7.12-r1039) and identified INDELs using GATK86 (version 3.3-0-g37228af). The annotation files of transposable elements (TEs) were downloaded from Phytozome database. We use the default parameters of the BEDtools84 (version 2.29.1) intersect function to identify OCRs that overlap with structural variants and TEs.
GO enrichment analysis
All GO enrichment analysis was done in the Rice Gene Index database (https://riceome.hzau.edu.cn/)87 using default parameters. We considered FDR < 0.05 as a significantly enriched pathway.
GWAS data processing
The genotype and phenotype data used in this study were downloaded from four published cohorts. We refer to this reference3 to name them as 529 rice accessions2, 1,275 Chinese rice accessions53, 176 Japanese rice accessions52, and 3 K rice accessions54, respectively. GWAS was performed separately for each cohort by GCTA88 (version 7.93.2) with a mixed linear model. To determine the significant SNPs cutoff, we first used Genetic type 1 error calculator (GEC89, version 0.2) to evaluate the effective numbers of independent SNPs (N) and approximated by 0.05/N to estimate the cutoff. The threshold for significant SNPs varied by cohorts, we set the thresholds to 1 × 10−6, 1 × 10−4, 1 × 10−5, and 1 × 10−6 for 3 K rice accessions, 176 Japanese rice accessions, 529 rice accessions, and 1,275 Chinese rice accessions, respectively. Variants with a minor allele frequency (MAF) that was <5% were excluded. For the lead SNP identification, we used PLINK90 (version 1.9) and set the parameter “--clump-p1” to the threshold we defined above, “--clump-p2 0.05 --clump-r2 0.6 --clump-kb 1000” for the first round of parameters. Then we set the second round of “--clump-r2” to 0.1, other parameters are unchanged. We used PLINK with the following parameters “--ld-window-kb 1000 --ld-window 99999 --ld-window-r2 0.8” to calculate the SNPs with strong linkage disequilibrium (r2 > 0.8) with lead SNPs.
Enrichment analysis of GWAS-associated SNPs of different P-values with OCRs
The enrichment in the OCRs of tissue at a given threshold of different P-values was calculated as the fraction of SNPs with P-values below this threshold that overlap with the OCRs (merged all NIP tissues’ OCRs), divided by the fraction of all noncoding SNPs that overlap with the OCRs in the study. Enrichment was performed at P-value thresholds ranging from 1e-1 to 1e-7. The smallest threshold had at least 50 SNPs in their study to ensure a sufficient sample size.
GWAS SNPs enrichments
We first merged the peaks from all tissues in Nipponbare and used this peak superset to quantify each tissue. To make sure that our analysis was not interfered by low confident peaks, we dropped the peaks in the tenth percentile of the lowest Tn5 cuts coverage, yielding 86,011 ATAC-seq peaks finally. Then we performed the normalization with CHEERS_normalize.py from CHEERS57 (Chromatin Element Enrichment Ranking by Specificity) software (https://github.com/TrynkaLab/CHEERS/tree/python3). The normalized quantification matrix was next transformed to tissue-specificity score with range 0–1. To do the enrichment analysis, we used the set of lead SNPs and SNPs with strong linkage disequilibrium (r2 > 0.8) with the lead SNPs computed separately for the corresponding cohort from the 209 GWAS above as the input to CHEERS_computeEnrichment.py. The enrichment P-values were transformed by -log10 and normalized by row with Z-score for visualization. For the proximal and distal GWAS SNPs enrichment analysis, we first divided OCRs into proximal and distal according to its summit distance from the nearest TSS. All other steps are the same as described above.
Generation of transgenic rice plants
To obtain overexpression lines of OsbZIP06, the cDNA of OsbZIP06 was cloned using primers OsbZIP06-OE-F and OsbZIP06-OE-R and inserted into the Kpn1-BamH1 site of the pCAMBIA1301 vector and fused with the maize Ubiquitin promoter and three FLAG tags at its C-terminus using the ClonExpress II One Step Cloning Kit (Vazyme). The construct was then transformed into ZhongHua11 (ZH11) by Biogle GeneTech. Primers used to clone OsbZIP06 are listed in Supplementary Data 16.
For the OsbZIP06 mutant strain, T1 generation seeds produced using the CRISPR-Cas9 system were purchased from Biogle GeneTech. The sgRNA sequence OsbZIP06-CR-gRNA is listed in Supplementary Data 16.
Seed germination experiments
Seed germination experiments were performed as previously described60. Seeds of Zhonghua 11, OsbZIP06 mutant in the Zhonghua 11 background were used for germination experiments.
Statistics and reproducibility
If not specified, all statistical analyses and data visualization were done in R (version 4.0.0) or Python (version 3.8.9). R packages (e.g. ggplot2 and plotly) and Python packages (e.g. Seaborn) are heavily used for graphics. All the sources data for each figure can be found in the Supplementary Information. Specific tests used to determine statistical analyses are noted in each figure legend.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The sequencing data from ATAC-seq and RNA-seq generated in this study have been deposited in the NCBI BioProject database under accession code PRJNA940508. All public ChIP-seq used in this study are download from ChIP-Hub database (https://biobigdata.nju.edu.cn/ChIPHub/). The accession number are provided in the Supplementary Data 3. Some critical analysis results about this study can be accessed in the CART database (https://biobigdata.nju.edu.cn/cart/).
Code availability
The code related to figures is available at https://github.com/compbioNJU/CART.
References
Wang, C. & Han, B. Twenty years of rice genomics research: From sequencing and functional genomics to quantitative genomics. Mol. Plant. 15, 593–619 (2022).
Xie, W. et al. Breeding signatures of rice improvement revealed by a genomic variation map from a large germplasm collection. Proc. Natl Acad. Sci. 112, E5411–E5419 (2015).
Wei, X. et al. A quantitative genomics map of rice provides genetic insights and guides breeding. Nat. Genet. 53, 243–253 (2021).
Purugganan, M. D. & Jackson, S. A. Advancing crop genomics from lab to field. Nat. Genet. 53, 595–601 (2021).
Li, Y. et al. Rice functional genomics research: past decade and future. Mol. Plant 11, 359–380 (2018).
Schmitz, R. J., Grotewold, E. & Stam, M. Cis-regulatory sequences in plants: Their importance, discovery, and future challenges. Plant Cell 34, 718–741 (2022).
Zhao, H. et al. An inferred functional impact map of genetic variants in rice. Mol. Plant 14, 1584–1599 (2021).
Zhang, W. et al. High-resolution mapping of open chromatin in the rice genome. Genome Res. 22, 151–162 (2012).
Konishi, S. et al. An SNP caused loss of seed shattering during rice domestication. Science 312, 1392–1396 (2006).
Sun, X. et al. Natural variation of DROT1 confers drought adaptation in upland rice. Nat. Commun. 13, 4265 (2022).
Bai, X. et al. Duplication of an upstream silencer of FZP increases grain yield in rice. Nat. Plants 3, 885–893 (2017).
Zhu, T., Liao, K., Zhou, R., Xia, C. & Xie, W. ATAC-seq with unique molecular identifiers improves quantification and footprinting. Commun. Biol. 3, 675 (2020).
Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).
Fu, L.-Y. et al. ChIP-Hub provides an integrative platform for exploring plant regulome. Nat. Commun. 13, 3413 (2022).
Zhang, J. et al. Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63. Proc. Natl Acad. Sci. 113, E5163–E5171 (2016).
Kawahara, Y. et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice 6, 4 (2013).
Qunhua, L., James, B. B., Haiyan, H. & Peter, J. B. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779 (2011).
Yan, W. et al. Dynamic control of enhancer activity drives stage-specific gene expression during flower morphogenesis. Nat. Commun. 10, 1705 (2019).
Zhou, X. et al. Systematic annotation of conservation states provides insights into regulatory regions in rice. J. Genet. Genomics 49, 1127–1137 (2022).
Van de Velde, J., Van Bel, M., Vaneechoutte, D. & Vandepoele, K. A collection of conserved noncoding sequences to study gene regulation in flowering plants. Plant Physiol. 171, 2586–2598 (2016).
Linderman, G. C., Rachh, M., Hoskins, J. G., Steinerberger, S. & Kluger, Y. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nat. Methods 16, 243–245 (2019).
Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015).
Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
Liu, C., Cheng, Y.-J., Wang, J.-W. & Weigel, D. Prominent topologically associated domains differentiate global chromatin packing in rice from Arabidopsis. Nat. Plants 3, 742–748 (2017).
Golicz, A. A., Bhalla, P. L., Edwards, D. & Singh, M. B. Rice 3D chromatin structure correlates with sequence variation and meiotic recombination rate. Commun. Biol. 3, 235 (2020).
Liu, C. et al. eQTLs play critical roles in regulating gene expression and identifying key regulators in rice. Plant Biotechnol. J. n/a, https://doi.org/10.1111/pbi.13912 (2022).
Ishii, T. et al. OsLG1 regulates a closed panicle trait in domesticated rice. Nat. Genet. 45, 462–465 (2013).
Zhang, L. et al. A natural tandem array alleviates epigenetic repression of IPA1 and leads to superior yielding rice. Nat. Commun. 8, 14789 (2017).
Zhang, X.-H., Rao, X.-L., Shi, H.-T., Li, R.-J. & Lu, Y.-T. Overexpression of a cytosolic glyceraldehyde-3-phosphate dehydrogenase gene OsGAPC3 confers salt tolerance in rice. Plant Cell Tissue Organ Cult. (PCTOC) 107, 1–11 (2011).
Lee, S. et al. Concurrent activation of OsAMT1;2 and OsGOGAT1 in rice leads to enhanced nitrogen use efficiency under nitrogen limitation. Plant J. 103, 7–20 (2020).
Tanaka, W. et al. The YABBY Gene TONGARI-BOUSHI1 is involved in lateral organ development and maintenance of meristem organization in the rice spikelet. Plant Cell 24, 80–95 (2012).
Raineri, J., Wang, S., Peleg, Z., Blumwald, E. & Chan, R. L. The rice transcription factor OsWRKY47 is a positive regulator of the response to water deficit stress. Plant Mol. Biol. 88, 401–413 (2015).
Wang, S. et al. The OsSPL16-GW7 regulatory module determines grain shape and simultaneously improves rice yield and grain quality. Nat. Genet. 47, 949–954 (2015).
Rao, N. N., Prasad, K., Kumar, P. R. & Vijayraghavan, U. Distinct regulatory role for RFL, the rice LFY homolog, in determining flowering time and plant architecture. Proc. Natl Acad. Sci. 105, 3646–3651 (2008).
Huang, L. et al. The LARGE2-APO1/APO2 regulatory module controls panicle size and grain number in rice. Plant Cell 33, 1212–1228 (2021).
van Heeringen, S. J. & Veenstra, G. J. C. GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 27, 270–271 (2011).
Xiao, S.-J., Zhang, C., Zou, Q. & Ji, Z.-L. TiSGeD: a database for tissue-specific genes. Bioinformatics 26, 1273–1275 (2010).
Lee, D.-Y. & An, G. Two AP2 family genes, SUPERNUMERARY BRACT (SNB) and OsINDETERMINATE SPIKELET 1 (OsIDS1), synergistically control inflorescence architecture and floral meristem establishment in rice. Plant J. 69, 445–461 (2012).
Lee, Y.-S., Lee, D.-Y., Cho, L.-H. & An, G. Rice miR172 induces flowering by suppressing OsIDS1 and SNB, two AP2 genes that negatively regulate expression of Ehd1 and florigens. Rice 7, 31 (2014).
Baoxiang, W. et al. OsbZIP72 is involved in transcriptional gene-regulation pathway of abscisic acid signal transduction by activating rice high-affinity potassium transporter OsHKT1;1. Rice Sci. 28, 257–267 (2021).
Wang, Y. et al. Abscisic acid promotes jasmonic acid biosynthesis via a ‘SAPK10-bZIP72-AOC’ pathway to synergistically inhibit seed germination in rice (Oryza sativa). N. Phytol. 228, 1336–1353 (2020).
Yu, X. et al. Abnormal flower and grain 1 encodes OsMADS6 and determines palea identity and affects rice grain yield and quality. Sci. China Life Sci. 63, 228–238 (2020).
Li, H. et al. Rice MADS6 interacts with the floral homeotic genes SUPERWOMAN1, MADS3, MADS58, MADS13, and drooping leaf in specifying floral organ identities and meristem fate. Plant Cell 23, 2536–2552 (2011).
Li, H. et al. The AGL6-like gene OsMADS6 regulates floral organ and meristem identities in rice. Cell Res. 20, 299–313 (2010).
Martín-Trillo, M. & Cubas, P. TCP genes: a family snapshot ten years later. Trends Plant Sci. 15, 31–39 (2010).
Zaret, K. S. & Mango, S. E. Pioneer transcription factors, chromatin dynamics, and cell fate control. Curr. Opin. Genet Dev. 37, 76–81 (2016).
Xing, Y. & Zhang, Q. Genetic and molecular bases of rice yield. Annu. Rev. Plant Biol. 61, 421–442 (2010).
Li, G., Zhang, H., Li, J., Zhang, Z. & Li, Z. Genetic control of panicle architecture in rice. Crop J. 9, 590–597 (2021).
Yamaguchi, T. et al. The YABBY Gene DROOPING LEAF regulates carpel specification and midrib development in Oryza sativa [W]. Plant Cell 16, 500–509 (2004).
Hu, L. et al. OsSPL9 regulates grain number and grain yield in rice. Front Plant Sci. 12, 682018 (2021).
Song, X. et al. Targeting a gene regulatory element enhances rice grain yield by decoupling panicle number and size. Nat. Biotechnol. 40, 1403–1411 (2022).
Yano, K. et al. Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice. Nat. Genet. 48, 927–934 (2016).
Li, X. et al. Analysis of genetic architecture and favorable allele usage of agronomic traits in a large collection of Chinese rice accessions. Sci. China Life Sci. 63, 1688–1702 (2020).
Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43–49 (2018).
Liu, X. et al. GWAS Atlas: an updated knowledgebase integrating more curated associations in plants and animals. Nucleic Acids Res. 51, D969–D976 (2023).
Huang, X. et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat. Genet. 42, 961–967 (2010).
Soskic, B. et al. Chromatin activity at GWAS loci identifies T cell states driving complex immune diseases. Nat. Genet. 51, 1486–1493 (2019).
Zhao, Y. et al. Genetic basis and network underlying synergistic roots and shoots biomass accumulation revealed by genome-wide association studies in rice. Sci. Rep. 11, 13769 (2021).
Bañuelos, M. A., Garciadeblas, B., Cubero, B. & Rodríguez-Navarro, A. Inventory and functional characterization of the HAK potassium transporters of rice. Plant Physiol. 130, 784–795 (2002).
Magwa, R. A., Zhao, H. & Xing, Y. Genome-wide association mapping revealed a diverse genetic basis of seed dormancy across subpopulations in rice (Oryza sativa L.). BMC Genet. 17, 28 (2016).
Chen, K. M., Wong, A. K., Troyanskaya, O. G. & Zhou, J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940–949 (2022).
Zhao, H. et al. PlantDeepSEA, a deep learning-based web service to predict the regulatory effects of genomic variants in plants. Nucleic Acids Res. 49, W523–W529 (2021).
Maslova, A. et al. Deep learning of immune cell differentiation. Proc. Natl Acad. Sci. 117, 25655–25666 (2020).
Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).
Li, X. et al. Overexpression of RCc3 improves root system architecture and enhances salt tolerance in rice. Plant Physiol. Biochem. 130, 566–576 (2018).
Chen, R. et al. A de novo evolved gene contributes to rice grain shape difference between indica and japonica. Nat. Commun. 14, 5906 (2023).
Wittkopp, P. J. & Kalay, G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2012).
Ming, L. et al. Transcriptome-wide association analyses reveal the impact of regulatory variants on rice panicle architecture and causal gene regulatory networks. Nat. Commun. 14, 7501 (2023).
Wang, J. et al. An expression quantitative trait loci-guided co-expression analysis for constructing regulatory network using a rice recombinant inbred line population. J. Exp. Bot. 65, 1069–1079 (2014).
Wu, Y. et al. The QTL GNP1 encodes GA20ox1, which increases grain number and yield by increasing cytokinin activity in rice Panicle Meristems. PLoS Genet. 12, e1006386 (2016).
Bourque, G. et al. Ten things you should know about transposable elements. Genome Biol. 19, 199 (2018).
Sullivan, A. M. et al. Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep. 8, 2015–2030 (2014).
Marand, A. P., Chen, Z., Gallavotti, A. & Schmitz, R. J. A cis-regulatory atlas in maize at single-cell resolution. Cell 184, 3041–3055.e3021 (2021).
Cuperus, J. T. Single-cell genomics in plants: current state, future directions, and hurdles to overcome. Plant Physiol. 188, 749–755 (2022).
Zhu, T. et al. cisDynet: An integrated platform for modeling gene-regulatory dynamics and networks. iMeta 2, e152 (2023).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 12, 323 (2011).
Sheffield, N. C. et al. Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res 23, 777–788 (2013).
Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Bentsen, M. et al. ATAC-seq footprinting unravels kinetics of transcription factor binding during zygotic genome activation. Nat. Commun. 11, 4267 (2020).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Yu, Z. et al. Rice Gene Index (RGI): a comprehensive pan-genome database for comparative and functional genomics of Asian rice. Mol. Plant, https://doi.org/10.1016/j.molp.2023.03.012 (2023).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: A tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Li, M.-X., Yeung, J. M. Y., Cherny, S. S. & Sham, P. C. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum. Genet. 131, 747–756 (2012).
Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Acknowledgements
This work is supported by grants from STI2030-Major Projects (2023ZD04076), the National Natural Science Foundation of China (No. 32070656, U23A20189, 31922065), the Hubei Provincial Natural Science Foundation of China (2023AFA043), the Earmarked Fund for the China Agriculture Research System (CARS-01-01), the Fundamental Research Funds for the Central Universities (2662023PY002), the Foundation of Hubei Hongshan Laboratory (2021hszd005) and HZAU-AGIS Cooperation Fund (SZYJY2023003). The authors acknowledge the Center for Information Technology and the High-Performance Computing Center of Nanjing University and the bioinformatics computing platform of the National Key Laboratory of Crop Genetic Improvement at Huazhong Agricultural University for providing high performance computing (HPC) resources.
Author information
Authors and Affiliations
Contributions
D.C. and W.X. conceived and designed the project. T.Z., C.X., X.X., Y.L., and J.Y. performed experiments. T.Z., D.C., X.Z., R.Y., L.W., Z.Z., L.M. and Y.Y. conducted the bioinformatics analysis. D.C., W.X. and T.Z. wrote the paper. All the authors reviewed and approved the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Xuehui Huang and the other anonymous reviewer for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhu, T., Xia, C., Yu, R. et al. Comprehensive mapping and modelling of the rice regulome landscape unveils the regulatory architecture underlying complex traits. Nat Commun 15, 6562 (2024). https://doi.org/10.1038/s41467-024-50787-y
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41467-024-50787-y
This article is cited by
-
One versus many independent assemblies of symbiotic nitrogen fixation in flowering plants
Nature Communications (2025)
-
Constraint of accessible chromatins maps regulatory loci involved in maize speciation and domestication
Nature Communications (2025)
-
Cis-regulatory elements: systematic identification and horticultural applications
aBIOTECH (2025)
-
Enhancer Elements in Plants: Genomic Context and Biological Functions
Journal of Plant Biology (2025)