Main

TDP-43 binds to uridine/guanine-rich motifs in RNA transcripts1,2 and has a critical role in various aspects of RNA metabolism3,4,5. Defects in RNA metabolism are considered central to frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) pathogenesis3. However, it is not yet fully understood what genes TDP-43 regulates, what aspect of their RNA metabolism TDP-43 regulates and how their dysregulation promotes neurodegeneration. A major role of TDP-43 has emerged as a repressor of so-called cryptic exons during splicing6. Cryptic exons reside in introns of genes and are normally excluded from mature mRNAs. When TDP-43 is dysfunctional (that is, when depleted from the nucleus in FTD/ALS), these cryptic exons are spliced into final mRNAs, often leading to frameshifts, decreased RNA stability or even the production of new peptide sequences7,8,9,10,11,12,13. Notably, some of these cryptic exons are in genes critical for neuronal functions (for example, STMN2 (refs. 7,8,9,14)) or genes that harbor disease-associated variants that sensitize them to cryptic splicing upon loss of TDP-43 (for example, UNC13A10,11). Cryptic splicing events could serve as powerful biomarkers for TDP-43 dysfunction12,13 or even as therapeutic targets15,16.

Besides its well-established role in splicing, TDP-43 also has a critical role in other aspects of RNA processing. Do some of these additional TDP-43-dependent RNA-processing pathways also contribute to disease? One potential disease-relevant RNA-processing pathway is alternative polyadenylation (APA)—a major layer of gene regulation that occurs in >60% of human genes17. When a gene is transcribed into mRNA, it is cleaved and polyadenylated, which functions to stabilize the mRNA, facilitate its nuclear export and regulate its translation18. If alternative polyadenylation (polyA) sites are used, it could produce mRNA isoforms that have different 3′ untranslated region (UTR) lengths, impacting RNA and protein levels, subcellular localization or even protein functions17,19. If APA occurs prematurely, it can truncate mRNAs, reducing full-length protein levels17,19. Previous genome-wide TDP-43 binding studies revealed that TDP-43 binding is enriched not only in introns of genes but also in 3′ UTRs1,2, suggesting a role in regulating polyadenylation. Indeed, TDP-43 knockdown (KD) or disease-associated TDP-43 mutations affect polyadenylation20,21,22, including its own polyadenylation2,23 and premature polyadenylation in STMN2 (refs. 7,8). Notably, widespread APA changes have been observed in ALS patient samples using bulk RNA sequencing (RNA-seq)24 and single-nucleus RNA-seq25, although it is unclear whether these changes are directly owing to TDP-43 dysfunction.

Here we show that loss of TDP-43 from neuronal nuclei in postmortem brain tissue of FTD/ALS patients is associated with APA changes. Using high-resolution polyadenylation mapping in human stem cell-derived neurons, we comprehensively defined TDP-43-regulated APA events and discovered that TDP-43 loss caused widespread APA changes, impacting expression of disease-relevant genes, such as NEFL, SFPQ and TMEM106B. Notably, the strength and position of TDP-43 binding, along with the relative strengths of polyA sites, influence the outcome of APA changes upon TDP-43 dysfunction. Together, these findings provide evidence for a crucial role of TDP-43-mediated APA in maintaining neuronal health and its dysregulation as a key contributor to neurodegenerative disease.

Results

TDP-43 loss is linked to APA changes

To test the hypothesis that APA caused by TDP-43 dysfunction contributes to FTD/ALS, we first performed APA analysis in an RNA-seq dataset26, in which neuronal nuclei with or without TDP-43 were sorted from FTD/ALS postmortem brain samples for RNA-seq (Fig. 1a, top). We recently re-analyzed this dataset to discover TDP-43-dependent cryptic splicing events11. Here we ‘re-re-analyzed’ it to search for APA events (Fig. 1a, bottom). We used two different APA analysis programs and identified 41 APA changes using APAlyzer27 and 62 APA changes using QAPA28 (Fig. 1b, Supplementary Fig. 1a and Supplementary Table 1; adjusted P < 0.1). Only two genes with substantial APA changes were common between both programs—LRFN1 and MARK3 (Fig. 2c and Supplementary Fig. 2l), likely because of different polyA databases used by each program and the low resolution of RNA-seq for APA analysis. We thus considered APA changes identified by each program, which revealed APA events in genes critical for neuronal function, such as LRFN1 and SYN2. APA changes in 34 genes are associated with gene expression changes (Supplementary Fig. 1b), likely impacting their corresponding protein levels (see below).

Fig. 1: Loss of TDP-43 leads to widespread APA changes.
figure 1

a, APA analysis scheme. Scheme for sorting neuronal nuclei from frontal cortices of FTD or FTD–ALS patients based on TDP-43 levels for RNA-seq (top)26. Workflow of APA analysis using RNA-seq data (bottom). The ∆PUI is defined as the difference in log2-transformed distal polyA site usage between TDP-43 negative nuclei and TDP-43 positive nuclei. b, Loss of TDP-43 is associated with APA changes. APA change (∆PUI by APAlyzer) is plotted on the x axis and the corresponding statistical significance (the adjusted P value is calculated using the Benjamini–Hochberg procedure) on the y axis. Genes with statistically significant APA change (adjusted P < 0.1) are labeled in red for favoring distal polyA sites and in blue for favoring proximal polyA sites. Dashed horizontal line marks the adjusted P = 0.1. c, Western blot shows efficient TDP-43 KD after 7 days. Quantitation of protein levels is presented as mean ± s.e.m.; the P value was calculated using one-sided Student’s t-test; control, samples treated with scrambled shRNA (n = 3); TDP-43 KD, samples treated with TDP-43 shRNA (n = 3). d, Volcano plot shows widespread APA changes upon TDP-43 KD in iNeurons; thresholds for significant changes are |∆PUI| > 0.10, calculated by APAlyzer as in b, and adjusted P < 0.05, calculated as in b.

Source data

Fig. 2: High-resolution mapping of TDP-43-dependent APA using 3′ end-seq.
figure 2

a, Workflow of 3′ end-seq (top). Example of 3′ end-seq read coverage, indicative of polyA sites, at the 3′ UTR of MRS2 (bottom). Top track marks RNA-seq read coverage, middle track marks read coverage of 3′ end-seq with new sites marked in green and bottom track marks gene structure. b, Coverage of filtered read ends across transcripts. TSS, transcription start site; TES, transcription end site. c, Example of TDP-43-regulated APA changes detected by 3′ end-seq. The direction of LRFN1 gene is flipped for illustrative purpose (left). Example of premature polyadenylation captured by 3′ end-seq (right). Sashimi plots in the top four tracks illustrate splicing; the pink curved line marks the usage of a cryptic 3′ splice site. The ‘polyA DB’ track marks annotated polyA sites31 and the ‘TDP-43 sites’ track marks the observed TDP-43 binding34. Gene structure is shown at the bottom. The asterisk marks the positions of observed polyA site. d, Volcano plot showing 3′ end-seq captured widespread APA changes upon TDP-43 KD. The APA change (∆dPUR) is defined by the difference in the distal polyA site usage between TDP-43 KD and control samples. Thresholds for significant changes are |∆dPUR| > 0.05 and adjusted P < 0.05. The adjusted P value is calculated using the Benjamini–Hochberg procedure. e, Scatter plot with APA changes, represented by distal polyA site usage, on the x axis, and RNA level changes on the y axis, illustrating that APA changes could alter RNA levels. Genes with significant APA changes are plotted; genes with significant RNA-level changes are labeled in red when favoring distal polyA sites or blue when favoring proximal polyA sites. f, TDP-43 KD activated cryptic splicing and premature polyadenylation in intron 12 of ARHGAP32. Details are shown in c.

To test whether TDP-43 directly regulates APA, we knocked down TDP-43 levels in cortical neurons differentiated from human stem cells (iNeurons) and performed RNA-seq for APA analysis (Fig. 1c and Supplementary Fig. 1c,d). TDP-43 KD caused widespread APA changes (Fig. 1d and Supplementary Table 2; |∆PUI| > 0.1 and adjusted P < 0.05 from APAlyzer), including 24 changes, that were also observed in FTD/ALS postmortem brain samples (Fig. 1b and Supplementary Fig. 1a). Applying the same analysis to published RNA-seq datasets, we also observed APA changes in human-induced pluripotent stem cell-derived motor neurons (iMNs) upon TDP-43 KD7 (Supplementary Fig. 1e), in iNeurons carrying a pathogenic mutation (TDP-43-K263E), which disrupts RNA binding29 (Supplementary Fig. 1f), and in patient-derived iMNs carrying a different pathogenic mutation (TDP-43-M337V), which disrupts TDP-43 intermolecular interactions21 (Supplementary Fig. 1g). Although technical differences among these studies preclude direct comparisons, the presence of widespread APA changes across different studies strongly suggests that TDP-43 regulates APA in neurons and that TDP-43 loss of function or ALS-linked mutations result in widespread APA changes owing to direct and indirect effects of TDP-43 dysfunction.

Comprehensive high-resolution mapping of TDP-43-dependent APA

To comprehensively map TDP-43-dependent APA events and define how TDP-43 dysfunction contributes to neurodegeneration through APA, we sought a more sensitive assay than RNA-seq for APA analysis, because RNA-seq cannot map new polyadenylation events, identify actual polyA sites or capture ‘internal’ premature polyadenylation events (that is, ones that truncate the RNA/protein). We used a specialized transcriptomic method, 3′ end-seq, to map polyA sites with single-nucleotide resolution in iNeurons with or without TDP-43 KD (Fig. 2a). We used stringent filtering criteria to remove polyA sites likely derived from mis-priming events during cDNA synthesis, sites with no known polyA signals or sites with low read coverage (see Methods for details), resulting in high-quality polyA sites with read ends that align exactly at the 3′ ends of annotated transcripts (Fig. 2b). In total, we identified 60,369 polyA sites with a known upstream polyA signal, of which 7,975 polyA sites are new, when compared with a compendium of polyA site annotations30,31. Both annotated and new polyA sites harbored known polyA signals located ~20 nts upstream (Supplementary Fig. 2a,b). Consistent with previous findings2,20, the majority of polyA sites we identified were in 3′ UTRs, though the new polyA sites had more diverse distributions within transcripts (Supplementary Fig. 2c), consistent with the idea that they are normally repressed by TDP-43 under normal conditions. Furthermore, 3′ end-seq confirmed that loss of TDP-43 lengthened the 3′ UTR of LRFN1 (Fig. 2c, left), which we also observed in our analysis of FTD/ALS postmortem brain samples (Fig. 1b and Supplementary Fig. 1a). Notably, 3′ end-seq also captured premature polyadenylation in STMN2, which was previously identified but not apparent in standard RNA-seq data (Fig. 2c, right), demonstrating the power of 3′ end-seq for studying premature polyadenylation, the most detrimental form of APA, because it truncates the RNA/protein.

In total, TDP-43 KD altered the usage of 7,304 polyA sites (|∆dPUR| > 0.05 and adjusted P < 0.05) and caused APA changes in 3,206 genes (Fig. 2d and Supplementary Tables 3 and 4). Like recent findings in ALS samples24 and FTD samples (Fig. 1b and Supplementary Fig. 1a), as well as in our RNA-seq data from iNeurons (Fig. 1d), the majority of genes with APA changes (2,752 genes) detected by 3′ end-seq lengthened their RNA transcripts; 433 APA events were associated with at least a 1.5-fold change in RNA level (Fig. 2e), suggesting that APA changes could alter gene expression. Notably, we also observed several of these APA changes in FTD/ALS postmortem brain samples, and they were in genes connected to ALS or FTD (Supplementary Table 5).

In addition, 3′ end-seq also empowered the discovery of ‘cryptic’ polyA sites (ones not used under normal conditions but revealed upon TDP-43 KD). TDP-43 KD activated 404 cryptic polyA sites in 372 genes (Supplementary Table 3); 65 cryptic polyA sites occurred downstream of a gene’s annotated 3′ end-seq, such as the FTD risk factor RFNG32 (Supplementary Fig. 2d), ELP6 (Supplementary Fig. 3e) and SIX3, TLX1 and ELK1 (Supplementary Fig. 2e–g; ref. 33). In addition, 149 cryptic events induced premature polyadenylation in genes such as TNIP1, EGFR and GSTO2 (Supplementary Fig. 2h–j). Like coordinated activation of splicing and polyadenylation in intron 1 of STMN2, we observed apparent coordination of these two processes in 13 additional genes (Supplementary Table 6), such as ARHGAP32 (Fig. 2f; ref. 33). Cryptic splicing of ARHGAP32 was recently reported12, and we also confirmed this by qRT–PCR (Supplementary Fig. 2k) and in RNA-seq data from FTD/ALS postmortem brain samples (Fig. 2f). These results suggest a potential coupling between cryptic splicing and the activation of cryptic polyadenylation sites.

The position and strength of TDP-43 binding influence APA

By cross-referencing TDP-43-regulated APA sites with a curated genome-wide TDP-43-binding site dataset34, we found that ~70% of genes with these APA events had at least one TDP-43-binding site. Interestingly, TDP-43-binding sites located downstream of polyA sites were found closer to APA sites with increased usage, and farther from those with decreased usage, upon TDP-43 KD (Fig. 3a). Consistent with this observation, strong TDP-43 binding was enriched 50–100 nts upstream of APA sites with reduced usage, whereas two of the strongest TDP-43 binding motifs (GUGUGU and UGUGUG) were enriched 0–50 nts downstream of APA sites with increased usage (Fig. 3b). Notably, these two GU-repeat containing hexamers were also binding sites for cleavage stimulation factor 2 (CstF2)19, which binds immediately downstream of a polyA site to facilitate its cleavage and polyadenylation. Thus, TDP-43 might block CstF2’s binding to inhibit polyA site usage, and when TDP-43 is absent, CstF2 can bind to the region and promote polyA site usage.

Fig. 3: The position and strength of TDP-43 binding influence polyA site usage.
figure 3

a, Cumulative plots showing the location of TDP-43 binding relative to TDP-43-regulated polyA sites. b, Volcano plots showing the enrichment of hexamers in different regions surrounding polyA sites. c, Cumulative plots show that TDP-43 KD caused a larger absolute usage change in polyA sites with weak TDP-43 binding (low GU content) than sites with strong TDP-43 binding (high GU content). d, Cumulative plots show that for TDP-43-regulated APA sites, polyA sites with increased usage are stronger than polyA sites with decreased usage. e, Cumulative plots show that for genes with significant 3′ UTR lengthening upon TDP-43 KD, their distal polyA sites are stronger than their proximal ones. PolyA site scores (in d and e) were calculated using Aparent2, expressed as log odds ratio, and reflect polyA site strengths. P values in a and ce were calculated using two-sided Mann–Whitney U test, and adjusted P values in b were calculated using the Benjamini–Hochberg procedure.

Using GU content to estimate the strength of TDP-43 binding located downstream of APA sites, we found that TDP-43 KD caused a larger usage change in polyA sites with weak TDP-43 binding than sites with strong sites (Fig. 3c and Supplementary Fig. 2o), suggesting that, when reduced, TDP-43 likely dissociates from weak sites first, then from stronger ones.

PolyA site strength influences TDP-43-dependent APA

To determine whether polyA site strength influences TDP-43-regulated APA, we estimated the probability of cleavage and polyadenylation for each identified polyA site using a deep residual neural network-based model, Aparent2 (ref. 35), that accurately predicts polyA site usage and strength. Interestingly, we found that polyA sites with decreased usage upon TDP-43 KD are weaker than those with increased usage upon TDP-43 KD (Fig. 3d). Whereas genes with substantial 3′ UTR shortening show no difference in polyA strength between proximal and distal polyA sites, genes with substantial 3′ UTR lengthening have stronger distal sites compared with proximal ones (Fig. 3e). Given the cotranscriptional nature of cleavage and polyadenylation, our findings suggest that in the case of 3′ UTR lengthening upon TDP-43 KD, a stronger distal polyA site might compensate for its positional disadvantage over the proximal polyA site during transcription and therefore gets activated upon TDP-43 KD.

TDP-43 loss induces APA changes of disease-associated genes

TDP-43 depletion increased the usage of two distal polyA sites in ELP1 that lengthened its 3′ UTR (Fig. 4a). We confirmed such lengthening by qRT–PCR (Supplementary Fig. 3a) and longer ELP1 3′ UTR is found in multiple published datasets that have TDP-43 dysfunction (Supplementary Fig. 3b; ref. 36). ELP1, also called IKBKAP, encodes a subunit of the elongator complex, which mediates the methylation of the wobble base in tRNAs, and has been functionally and genetically linked to ALS37,38,39. In addition to ELP1, we observed APA changes in two other subunits of the elongator complex (ELP3 and ELP6) upon TDP-43 KD (Supplementary Fig. 3c–e). These APA changes are associated with protein levels; TDP-43 KD led to increased protein levels of ELP1 and ELP3 (Fig. 4b and Supplementary Fig. 3d). Increased ELP1 protein levels may be due to increased ELP1 RNA levels (Supplementary Fig. 3c). These results suggest a possible contribution of abnormal tRNA metabolism to FTD/ALS.

Fig. 4: Loss of TDP-43 causes APA changes in ELP1 and NEFL, impacting their protein levels.
figure 4

a, TDP-43 KD lengthened the 3′ UTR of ELP1. b, TDP-43 KD increased ELP1 protein levels. Quantitation of protein levels (n = 4 per condition) is presented as mean ± s.e.m.; P values were calculated using one-sided Student’s t-test. c, TDP-43 KD downregulates the usage of the proximal polyA site of NEFL, leading to the increased usage of the most distal polyA site. The asterisk in a and c marks the positions of observed polyA site. d, TDP-43 KD reduced NF-L protein levels. Quantitation of protein levels (n = 4 per condition) as in a.

Source data

Previous studies found that TDP-43 directly binds to the 3′ UTR of NEFL to stabilize its RNA levels40. NEFL encodes neurofilament light (NF-L) chain, which has emerged as a sensitive prognostic biomarker for diverse neurodegenerative diseases41, including FTD/ALS42. TDP-43 KD not only reduced NEFL RNA levels (Supplementary Fig. 3f), consistent with previous findings43, but also further shifted the apparent polyA site usage from proximal to distal (Fig. 4c), which we confirmed by qRT–PCR (Supplementary Fig. 3g), and reduced NF-L protein levels (Fig. 4d). These observations suggest that, together with NEFL 3′ UTR-targeting microRNAs44, loss of TDP-43-induced APA changes might further reduce NF-L levels.

TDP-43 KD also shifted polyA site usage from proximal to distal in another FTD/ALS-linked gene, SFPQ (Fig. 5a and Supplementary Fig. 3h), which encodes a ubiquitously expressed RNA-binding protein that has key roles in RNA metabolism45,46, and is depleted from the nucleus in sporadic ALS spinal cord47, losing its interaction with the FTD/ALS gene FUS, in neuronal nuclei in FTLD–TDP48,49. The gene structure of SFPQ suggests that, when the distal polyA site in SFPQ is used, a downstream 3′ splice site of intron 9 might be used (Fig. 5a and Supplementary Fig. 3h). Indeed, we confirmed that TDP-43 KD promoted the usage of both the distal polyA site (Supplementary Fig. 3i) and the alternative 3′ splice site (Supplementary Fig. 3j).

Fig. 5: Loss of TDP-43 promotes the usage of a distal polyA site and a downstream 3′ splice site in SFPQ, resulting in an NMD-sensitive mRNA isoform.
figure 5

a, TDP-43 KD shifted the polyA site usage from proximal to distal in SFPQ. The gene structure illustrates the following two mRNA isoforms: the canonical mRNA isoform that uses a proximal polyA site and an upstream 3′ splice site of intron 9; the alternative mRNA isoform that uses a distal polyA site and a downstream 3′ splice site of intron 9. The presence of additional exons downstream of the termination codon in the alternative mRNA isoform suggests it to be a potential NMD target. b, Inhibiting NMD with a small molecule (11j at 0.1 µm) stabilized the alternative SFPQ mRNA isoform in both control and TDP-43 KD conditions (n = 4 per condition). Data are presented as mean ± s.e.m.; P values were calculated using one-sided Student’s t-test. c, Western blot shows that TDP-43 KD reduced SFPQ protein levels. Quantitation of protein levels (n = 3 per condition) is presented as mean ± s.e.m.; P values were calculated using one-sided Student’s t-test. d,e, Levels of the long 3′ UTR (d) and the alternative splicing isoform (e) of SFPQ are substantially increased in the frontal cortices of FTLD–TDP patients (n = 221) compared with controls (n = 53). Data are presented as mean ± s.e.m.; P values were calculated using two-tailed Mann–Whitney test.

Source data

Intriguingly, this alternative SFPQ mRNA isoform seems to be a nonsense-mediated decay (NMD) target (Fig. 5a), as there are additional exons downstream of the termination codon. Indeed, inhibiting NMD increased the level of this alternative SFPQ mRNA isoform (Fig. 5b and Supplementary Fig. 3k), and this alternative isoform is less stable than the canonical isoform (Supplementary Fig. 3l). Notably, TDP-43 KD also reduced SFPQ protein levels (Fig. 5c). Together, these results provide evidence that TDP-43 KD promotes the production of an SFPQ alternative mRNA isoform that is an NMD target, leading to reduced SFPQ protein levels.

To validate the functional significance of the APA change in SFPQ, we analyzed the usage of the distal polyA site and the alternative 3′ splice site in a series of 274 frontal cortex brain samples from the Mayo Clinic Brain Bank using qRT–PCR. We found a substantial increase in the level of the longer SFPQ 3′ UTR and the usage of the alternative 3′ splice site in patients with FTLD–TDP compared with control samples (Fig. 5d,e), indicating that this SFPQ APA event occurs in human disease.

FTD risk gene TMEM106B is a TDP-43-regulated APA target

The sensitivity of 3′ end-seq further revealed that TDP-43 KD lengthened TMEM106B’s 3′ UTR (Fig. 6a), which we also confirmed by qRT–PCR (Supplementary Fig. 3m). TMEM106B is a top genetic risk factor that emerged in a genome-wide association study for FTLD–TDP50. C-terminal fragments of TMEM106B were recently found to form amyloid fibrils in the brains of older individuals and patients with neurodegenerative disorders51,52,53,54. By targeted analysis of RNA-seq read coverage across the TMEM106B 3′ UTR, we confirmed that loss of TDP-43 is associated with a longer TMEM106B 3′ UTR in FTD/ALS postmortem brain samples (Fig. 6b). We also found a substantial increase in longer TMEM106B 3′ UTRs in the frontal cortices of patients with FTLD–TDP compared with healthy controls (Fig. 6c), indicating that the increased 3′ UTR length of TMEM106B might be functionally relevant in FTD/ALS.

Fig. 6: FTD risk gene TMEM106B is a TDP-43-regulated APA target.
figure 6

a, TDP-43 KD lengthened the 3′ UTR of TMEM106B. b, Targeted APA analysis of TMEM106B using RNA-seq from FTD/ALS brain samples, shown in box plots with centerline, median; box limits, upper and lower quartiles; and whiskers, minimum and maximum. c, Level of the long TMEM106B 3′ UTR is substantially increased in the frontal cortices of FTLD–TDP patients (n = 221) compared with controls (n = 53). Data are presented as mean ± s.e.m.; P values were calculated using two-tailed Mann–Whitney test. d, Western blot shows reduced TMEM106B dimer levels after 12-day TDP-43 KD. Quantitation of protein levels (n = 3 per condition) is presented as mean ± s.e.m.; P values were calculated using one-sided Student’s t-test. e, The schematic of the luciferase activity assay for TMEM106B 3′ UTRs. f, The presence of the long TMEM106B 3′ UTR reduces translation efficiency. Dot plots show that the longer TMEM106B 3′ UTR led to lower luciferase activity (left) and higher RNA levels (right). No insert, a dual-luciferase reporter without TMEM106B 3′ UTRs; short, a dual-luciferase reporter with the short TMEM106B 3′ UTR; long, a dual-luciferase reporter with the long TMEM106B 3′ UTR. Data are presented as mean ± s.e.m.; P values were calculated using one-sided Student’s t-test. Panel e created with BioRender.com.

Source data

Because TMEM106B protein levels are altered in disease50,55, we next examined whether TDP-43 KD affected TMEM106B protein levels. Using a condition that has been shown previously to preserve TMEM106B dimers on SDS–PAGE55 (Supplementary Fig. 3n), we found that TDP-43 KD did not affect levels of TMEM106B monomers (~37 kDa) but reduced its dimer levels (Fig. 6d). A recent proteomics study also detected decreased TMEM106B levels caused by TDP-43 KD12. TDP-43 KD did not substantially affect total TMEM106B RNA levels (Supplementary Fig. 3o) but increased the RNA stability of the longer 3′ UTR isoform (Supplementary Fig. 3p), suggesting that the longer 3′ UTR might affect TMEM106B protein levels through translation. To test this hypothesis, we cloned short and long versions of the TMEM106B 3′ UTR into a dual-luciferase reporter (Fig. 6e). We mutated the proximal polyA site in the long 3′ UTR-containing reporter to prevent its usage and confirmed the production of the intended long 3′ UTR (Supplementary Fig. 3q). The long 3′ UTR-containing reporter had substantially lower luciferase activity and higher RNA levels than the short 3′ UTR-containing reporter (Fig. 6f), providing evidence that the loss of TDP-43-induced longer TMEM106B 3′ UTR reduces translation efficiency. TMEM106B dimer levels formed from the construct lacking a 3′ UTR were not sensitive to TDP-43 KD (Supplementary Fig. 3r), suggesting that TDP-43 influences TMEM106B dimer levels through its 3′ UTR.

Discussion

In the present study, we show that TDP-43 dysfunction, a hallmark of FTD/ALS, leads to widespread APA changes in disease-relevant neuronal cell lines and postmortem brain samples of FTD/ALS (Figs. 1b,d and 2d and Supplementary Fig. 1a,e–g). Furthermore, the interplay between the position and strength of TDP-43 binding and intrinsic polyA site strength influences TDP-43-regulated APA (Fig. 3). Notably, TDP-43 loss-induced APA changes are found in genes critical for neuronal function and disease-associated genes (Figs. 46 and Supplementary Fig. 3), impacting their expression (for example, ELP1, NEFL, SFPQ and TMEM106B). Together, these findings establish that APA dysregulation is a new pathogenic mechanism in TDP-43 proteinopathies.

Due to the constraints of RNA-seq for detecting APA, we applied 3′ end-seq to comprehensively map neuronal polyadenylation on a transcriptomic scale with base-pair resolution and discovered thousands of TDP-43-dependent APA events, including cryptic polyA sites. TDP-43 loss-induced APA changes substantially altered the transcriptome (Fig. 2d,f and Supplementary Fig. 2d–n), primarily through 3′ UTR lengthening, but also through 3′ UTR shortening, alternative last exon usage and premature polyadenylation. These findings underscore a critical role for TDP-43 in regulating APA.

By examining TDP-43 binding characteristics, we provide evidence that TDP-43-binding sites closer to the polyA generally result in repression of polyA site usage (Fig. 3a,b), consistent with previous findings20. Furthermore, polyA sites with weaker TDP-43 binding showed greater usage changes upon TDP-43 loss (Fig. 3c and Supplementary Fig. 2o), supporting the idea that reducing TDP-43 levels, such as in disease, likely causes it to dissociate first from weak binding sites before the strong ones. Intrinsic polyA site strength also substantially influences these APA events (Fig. 3d,e); stronger distal polyA sites could overcome their positional disadvantages over proximal polyA sites to lengthen 3′ UTRs upon TDP-43 loss. The impact of TDP-43 binding, its strength and polyA site strength on TDP-43-regulated APA provides a basis for future biochemical characterization and a toehold to prioritize antisense oligonucleotide-based therapeutic strategies to target specific APA events akin to current approaches underway to target cryptic splicing events15.

Many of the APA changes we identified were in genes with established links to FTD/ALS and other neurodegenerative diseases, often resulting in altered protein expression. For example, TDP-43 KD led to APA changes and increased protein levels of ELP1 (Fig. 4a,b), a subunit of the elongator complex involved in tRNA modification. Because abnormal tRNA metabolism affects translation and a recent finding of reduced aminoacylation of tRNAPhe in FTD/ALS56, our observations suggest that defects in tRNA metabolism might be an important mechanism contributing to FTD/ALS. Conversely, upon TDP-43 loss, NEFL exhibited a shift to distal polyA site usage accompanied by its reduced RNA and protein levels (Fig. 4c,d), adding complexity to its use as a biomarker in TDP-43 proteinopathies. For another FTD/ALS-linked gene, SFPQ, TDP-43 loss promoted the usage of a distal polyA site and a downstream 3′ splice site (Fig. 5a,d,e and Supplementary Fig. 3i,k), resulting in an alternative mRNA isoform targeted by NMD (Fig. 5b and Supplementary Fig. 3k), leading to reduced SFPQ protein (Fig. 5c); such APA and splicing change in SFPQ were validated in FTLD–TDP patient brains (Fig. 5d,e).

Notably, TDP-43 loss lengthened the 3′ UTR of TMEM106B, a top genetic risk factor for FTLD–TDP (Fig. 3a–c). Intriguingly, this 3′ UTR lengthening correlated with reduced TMEM106B dimer levels (Fig. 3d). But how could this APA event reduce the formation of TMEM106B dimers? TMEM106B APA might impact dimer levels by influencing the subcellular localization of TMEM106B mRNA and its ability to serve as a scaffold for protein–protein interactions, as has been shown for other APA events17. Recently, an Alu element insertion in the TMEM106B 3′ UTR was found in strong linkage disequilibrium with the top FTLD–TDP risk allele at this locus57,58,59 and the risk haplotype of TMEM106B was associated with reduced TMEM106B protein dimer levels and increased fibril levels60. Given the proximity of these variants to TMEM106B’s distal polyA site, future work will explore whether and how these disease-associated genetic variants impact APA and their association with TMEM106B dimer levels.

In summary, our findings, together with independent studies, reveal widespread APA changes associated with TDP-43 dysfunction in FTD/ALS33,36. Despite using different approaches, each of the three studies observed common APA changes, underscoring the importance of APA changes during neurodegeneration. Our application of 3′ end-seq complements and extends RNA-seq-based APA analysis because it enables de novo identification of polyadenylation events with base-pair resolution, revealing complex and highly sensitive APA changes (for example, NEFL, SFPQ and TMEM106B). Together, these findings and the related studies provide evidence that TDP-43 pathology contributes to disease pathogenesis through not only cryptic splicing but now also changes in APA. Future research should focus on the downstream functional consequences of these specific APA alterations in disease-relevant genes, examine APA changes in other cell types or diseases with TDP-43 pathology and explore whether targeted correction of these APA events, perhaps using antisense oligonucleotides, could offer therapeutic benefits.

Methods

Cell culture

HEK293T cells were purchased from American Type Culture Collection (CRL-3216) and authenticated by American Type Culture Collection. HEK293 cells were maintained in Dulbecco’s Modified Eagle Medium (1×) + GlutaMax-I media (Gibco, 10564011) with 10% FBS (Gibco, 16000044) and 100 U ml−1 penicillin–streptomycin (Gibco, 15140122). Cells were tested to be negative for mycoplasma using VENORGEM mycoplasma detection kit (Sigma-Aldrich, MP0025).

Stem cell maintenance and differentiation into iNeurons

Human embryonic stem cells (hESCs; H1) were purchased from WiCell (WA01), authenticated by WiCell. H1 cells were maintained in mTeSR1 Plus media (STEMCELL Technologies, 100-0276) with 50 U ml−1 penicillin–streptomycin (Gibco, 15140122) on plates coated with Matrigel (Corning, 354230). hESCs were fed every 2 days and split every 4–7 days using ReLeSR (STEMCELL Technologies, 100-0483) according to the manufacturer’s instructions. The differentiation of hESCs to neurons by forcing NGN2 overexpression was carried out as previously described61. In brief, cells were transduced with a Tet-On induction system to drive the expression of the transcription factor NGN2. Cells were dissociated on day 3 of differentiation and replated on poly-d-lysine and laminin double-coated tissue culture plates in Neurobasal medium (Thermo Fisher Scientific, 21103049) containing neurotrophic factors, brain-derived neurotrophic factor and glial cell line-derived neurotrophic factor (R&D Systems).

TDP-43 KD in iNeurons

After 7 days of being cultured for differentiation, iNeurons were transduced with lentivirus expressing scrambled shRNA or TDP-43 shRNA as previously described11, cultured for additional 7 days or 12 days, and then collected for downstream analyses. The KD efficiency was assessed by western blotting.

Immunoblotting

After 7 days of TDP-43 KD, cells were lysed at 4 °C for 15 min in ice-cold RIPA buffer (Sigma-Aldrich, R0278) supplemented with a protease inhibitor cocktail (Thermo Fisher Scientific, 78429) and a phosphatase inhibitor cocktail (Thermo Fisher Scientific, 78426). After pelleting lysates at 20,000g for 15 min at 4 °C, the supernatant was used for bicinchoninic acid (Invitrogen, 23225) assays to determine protein concentrations. Unless stated otherwise, 5–10 µg of protein lysates from each sample were denatured for 10 min at 70 °C in LDS sample buffer (Invitrogen, NP0008) containing 2.5% 2-mercaptoethanol (Sigma-Aldrich). These samples were loaded onto 4–12% Bis–Tris Mini gels (Thermo Fisher Scientific, NP0335BOX) for gel electrophoresis and then transferred onto 0.45-μm nitrocellulose membranes (Bio-Rad, 162-0115) using the semi-dry transfer method (Bio-Rad Trans-Blot Turbo Transfer System, 1704150) or at 100 V for 2 h at 4 °C using the wet transfer method (Bio-Rad Mini Trans-Blot Electrophoretic Cell, 170-3930). Membranes were blocked in EveryBlot Blocking Buffer (Bio-Rad, 12010020) or 5% nonfat dry milk in Tris-buffered saline with Tween 20 for 1 h then incubated overnight at 4 °C in blocking buffer containing antibodies against TMEM106B (1:500; Cell Signaling Technology, 93334), TDP-43 (1:1,500; Proteintech, 10782-2-AP), ELP1 (1:500; Cell Signaling Technology, 5071S), ELP3 (1:1,000; Proteintech, 17016-1-AP), NEFL (1:1,000; Thermo Fisher Scientific, MA5-14981), SFPQ (EPR11847) (1:1,000; Abcam, ab177149), GAPDH (1:2,000; Sigma-Aldrich, G8795) or GAPDH (D16H11) XP (1:1,000; Cell Signaling Technology, 8884S). Membranes were subsequently incubated in blocking buffer containing horseradish peroxidase-conjugated antimouse IgG (H + L; 1:5,000; Thermo Fisher Scientific, 62-6520) or horseradish peroxidase-conjugated antirabbit IgG (H + L; 1:5,000; Life Technologies, 31462) for 1 h. The Amersham ECL Prime kit (Cytiva, RPN2232) or SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Fisher Scientific, 34094) was used to develop blots, which were then imaged using ChemiDox XRS+ System (Bio-Rad). The intensity of bands was quantified using Fiji, and then normalized to the corresponding controls.

Immunofluorescence

After 7 days of being cultured for differentiation, cells were fixed with 4% formaldehyde (Electron Microscopy Sciences, 15710) in 1× PBS at room temperature for 15 min, washed four times with 1× PBS and permeabilized with 0.2% Triton X-100 in the blocking buffer (5% normal goat serum in 1× PBS) at room temperature for 1 h. After permeabilization and blocking, cells were stained in blocking buffer containing antibodies against VAT1L (1:500; Thermo Fisher Scientific, PA5-98934), NeuN (1:500; EMD Millipore, MAB377) and Tuj1 (1:1,000; Covance, MMS-435P) in a cold room overnight. After primary antibody staining, cells were washed four times with 1× PBS, twice with the blocking buffer and incubated with secondary antibodies in the blocking buffer containing Hoechst 33342 (1:1,000) at room temperature for 1 h. After the 1 h incubation, cells were washed four times with 1× PBS, and imaged on a Leica DMI6000 B with a ×20 objective and an HAMAMATSU ORCA-flash 4.0 camera.

Immunoblotting of TMEM106B

After 12 days of TDP-43 KD, cells were lysed and normalized as described above. To detect the impact of TDP-43 KD on the dimer level of TMEM106B, normalized cell lysates were mixed with 4× Laemmli buffer (Bio-Rad) containing 5% 2-mercaptoethanol (Sigma-Aldrich) and loaded directly onto 4–20% Tris–Glycine mini gels (Thermo Fisher Scientific, XP04205BOX) for gel electrophoresis on ice. To detect the temperature sensitivity of TMEM106B dimer, cell lysates were incubated at 4 °C, 37 °C, 70 °C and 85 °C for 10 min and then loaded onto a 4–20% Tris–Glycine mini gel for gel electrophoresis on ice. After electrophoresis, samples were transferred onto 0.45-μm polyvinylidene difluoride membranes (Bio-Rad, 162-0115) at 250 mA for 2 h using the wet transfer method (Bio-Rad Mini Trans-Blot Electrophoretic Cell, 170-3930). Membranes were blocked in 5% nonfat dry milk in Tris-buffered saline with Tween 20 for 1 h and then incubated overnight at 4 °C in blocking buffer containing the antibody against TMEM106B (1:500; Cell Signaling Technology, 93334). The secondary antibody, imaging and quantitation are the same as described above.

Total RNA extraction from iNeurons

Total RNA was extracted using TRIzol according to the manufacturer’s instructions. Total RNA was then treated with Turbo DNase (Thermo Fisher Scientific, AM2238) and cleaned using Zymo’s clean and concentration columns. The quality of RNA was examined on a High Sensitivity RNA ScreenTape (Agilent Technologies, TapeStation).

qRT–PCR from iNeurons

Total RNA (500 ng) was reverse transcribed to cDNA using the PrimeScript RT reagent Kit with gDNA Eraser (Takara, RR047A). qPCR was carried out using PowerUp SYBR Green Master Mix kit and detected by the QuantStudio3 system (Thermo Fisher Scientific). Primers are listed in Supplementary Table 7.

Competition PCR from iNeurons

Total RNA (250 ng) was reverse transcribed to cDNA using the PrimeScript RT Reagent Kit with gDNA Eraser (Takara, RR047A). Competition PCR was carried out using the GoTaq DNA polymerase (Promega, M3001) using one forward primer and two reverse primers—one targeting the canonical junction of intron 9 and the other targeting the alternative junction of intron 9. The resulting PCR products were visualized on a 2% agarose gel. Primers are listed in Supplementary Table 7.

Total RNA-seq

Total RNA from scrambled shRNA or TDP-43 shRNA-treated cells was used to construct RNA-seq libraries using the SMARTer Stranded Total RNA-Seq Kit v2—Pico Input Mammalian kit (Takara, 634411), according to the manufacturer’s instructions. The resulting libraries were quantified, pooled and sequenced on a NextSeq 500 machine using the 150-cycle high-output kit in a 75 bp paired-end mode (Illumina).

Gene expression analysis

Adapters in FASTQ files were trimmed using fastp. The adapter-trimmed FASTQ files were used for differential gene expression analysis using Salmon and DESeq2.

Splicing analysis

The adapter-trimmed FASTQ files were mapped to the human genome (hg38) following ENCODE’s recommended settings using STAR. The uniquely mapped, properly paired reads were then used for splicing analysis using LeafCutter. Cryptic splicing events were called by LeafCutter.

APA analysis using RNA-seq data

Both in-house RNA-seq and publicly available datasets were used for the analysis. The publicly available datasets were downloaded from GEO or SRA (GSE126542, GSE121569, GSE147544, GSE196144 and ERP126666). The adapter-trimmed FASTQ files were mapped to the human genome (hg38) following ENCODE’s recommended settings using STAR with GENCODE v42 transcriptome. The unique-mapping, properly paired reads were then used for APA detection by either APAlyzer or QAPA and then for differential analysis by DEXSeq. APAlyzer uses polyA DB3 (https://exon.apps.wistar.org/PolyA_DB/) and QAPA uses PolyASite 2.0 (https://polyasite.unibas.ch/) as polyA site references for detecting APA events. The ∆PUI used in APAlyzer is defined as the difference in the log2-transformed distal polyA site usage between two conditions.

3′ end-seq

To comprehensively map polyadenylation and quantify APA, three control samples and three TDP-43 KD samples were used for 3′ end-seq. For each sample, 500 ng of total RNA was used to construct the 3′ end-seq library using the Quantseq REV kit from Lexogen, according to the manufacturer’s instructions. The resulting 3′ end-seq libraries were quantified by Qubit, checked for library sizes on a D1000 high-sensitivity chip (Agilent Technologies, TapeStation), pooled and sequenced on a NextSeq 500 or NextSeq 2000 machine using the 150-cycle high-output kit in a 75 bp paired-end mode (Illumina).

APA analysis using 3′ end-seq data

The sequenced 3′ end-seq libraries were adapter trimmed and quality filtered using BBduk. The filtered reads were mapped to the human genome (hg38) using STAR with GENCODE v42 transcriptome, extracted for unique-mapping, properly paired reads using Samtools, and deduplicated using Picard tools. To remove mis-primed reads, we discarded reads that were mapped upstream of six consecutive As, upstream of a 10-bp region with at least 50% of As or upstream of one of the six pentamers—AAAAA, GAAAA, AGAAA, AAGAA, AAAGA and AAAAG—that might form the G–T wobble pair during cDNA synthesis. The resulting filtered reads were analyzed using a modified version of long-read alternative polyadenylation analysis (LAPA)62 that used the default setting, except that it required a replication rate cutoff at 0.75 and could identify polyA site-containing reads mapped to the reverse strand. We further filtered for identified polyA sites if they were present in all samples in at least one condition, had at least ten average reads across samples, had at least 5% usage rate in at least one condition and were not located in intergenic regions. This stringent filtering may remove false negative reads (that is, reads misclassified as mis-primed reads); however, this filtering allows us to capture most reproducible and substantial APA changes caused by TDP-43 KD. After this filtering, genes with at least two polyA sites were used for APA analysis. We used StringTie to assemble transcript isoforms from matched RNA-seq data and then used the assembled isoforms to help assign polyA sites that are located downstream of annotated transcript 3′ ends. The change in APA is defined as the usage difference in the distal p-aminosalicylic acid usage rate (∆dPUR) between TDP-43 KD and control samples. The APA change was considered substantial if |∆dPUR| > 0.05 with an adjusted P value of <0.05. If genes had more than two polyA sites, they were filtered for two polyA sites with the two smallest adjusted P values and used for plotting and downstream sequence feature analysis (Supplementary Table 4). PolyA sites were considered cryptic if their usage was <10% under the control condition but >10% under the TDP-43 KD condition, and their usage increase was ≥10% (Supplementary Table 3). Cryptic splicing events detected in RNA-seq data and 3′ end-seq data were used to search for coordinated cryptic polyadenylation events.

Alignment of read ends across transcripts

Using DeepTools, the ends of filtered reads were calculated for normalized coverage (RPKM) across protein-coding genes and lncRNA genes, processed with computeMatrix, and then plotted using plotHeatmap.

Analysis of polyA signals

The 60-nt region upstream of each polyA identified by LAPA was examined for any known polyA signals previously documented30. The frequency of each polyA signal was then plotted for both annotated and new polyA sites. Polyadenylation signals that were not AATAA or ATTAAA were categorized as the APA signal. Subsequently, the distances from these polyA signals to the corresponding polyA sites were plotted for both annotated and new polyA sites.

The position and strength of TDP-43-binding site

The 1,000-nt region surrounding each TDP-43-regulated APA site was analyzed for overlaps with TDP-43-binding sites identified through CLIP-seq. We calculated the distance from each overlapping TDP-43-binding site to its corresponding APA site. Only those TDP-43-binding sites with the shortest distances were retained for further analysis. For these selected binding sites, we estimated their strengths based on the GU content within a 50-nt region surrounding each site.

Hexamer analysis

To identify enriched and depleted hexamers in specific regions surrounding TDP-43-regulated APA sites, k-mer analysis was performed using the transite package in R.

Calculation of polyA site score

For each polyA site identified in the 3′ end-seq libraries, a 205 bp sequence centered at the site was extracted and used to calculate the polyA site score using Aparent2, which was then converted to the log odds ratio.

Inhibition of the NMD

iNeurons were treated with different concentrations of 11j (MedChem Express, HY-124719), an SMG-1 inhibitor or an equal volume of dimethyl sulfoxide for 24 h and collected for western blot to confirm that phosphorylation of UPF1 was inhibited. Western blot was performed as described above with wet transfer and proteins were detected using the following antibodies: UPF1 (1:1,000; Abcam, ab109363), pUPF1 (1:1,000; EMD Millipore, 07-1016) and GAPDH (1:2,000; Sigma-Aldrich, G8795). Samples treated with 0.01 µm 11j were further used for qRT–PCR with primers listed in Supplementary Table 7.

Pulse-chase experiment using actinomycin-D

After 12 days of TDP-43 KD, iNeurons were treated with 20 µm of actinomycin-D (Sigma-Aldrich, A1410) and collected at time points of 0, 1, 3 and 6 h post-treatment for total RNA extraction using TRIzol (Thermo Fisher Scientific, 15596026). The resulting RNA was measured by Qubit (Thermo Fisher Scientific), normalized to the same concentration, and used for qRT–PCR with primers listed in Supplementary Table 7. The normalized RNA levels over time were fitted with an exponential decay model using nls function in R (y ~ y0 × exp(−k × time) with starting values of y0 at 1 and k at 0.01) to calculate the decay rate (r), which were then used to estimate the half life with ln(2)/r.

Evaluation of SFPQ 3′ UTR length, SPFQ cryptic splicing and TMEM106B 3′ UTR length in FTLD–TDP brain samples by qRT–PCR

Postmortem brain tissues from patients with FTLD–TDP and cognitively normal control individuals were obtained from the Mayo Clinic Florida Brain Bank. Diagnosis was independently ascertained by trained neurologists and neuropathologists upon neurological and pathological examinations, respectively. Written informed consent was given by all participants or authorized family members, and all protocols were approved by the Mayo Clinic Institutional Review Board and Ethics Committee. Our study cohort included a total of 83 postmortem cases classified into the following two main groups: healthy controls (n = 53) and FTD cases (n = 221). A summary of patient data is included in Supplementary Table 8. RNA was extracted from the frontal cortices of healthy and FTD patients following the manufacturer’s protocol using the RNAeasy Plus Mini Kit (Qiagen) and as previously described9,11,63. Up to three cuts of the sample were used for extraction and only the high-quality RNA samples were processed for downstream analysis. RNA concentration was measured by using Nanodrop technologies (Thermo Fisher Scientific) and the RNA integrity number was evaluated by the Agilent 2100 Bioanalyzer (Agilent Technologies, TapeStation). Subsequently, 500 ng of the total RNA extracted was reverse transcribed into cDNA using the High-Capacity cDNA Transcription Kit (Applied Biosystems) as per the manufacturer’s instructions. cDNA samples, in triplicate, with SYBR GreenER qPCR SuperMix (Invitrogen), were further evaluated for qRT–PCR on a QuantStudio 7 Flex Real-Time PCR System (Applied Biosystems). Relative quantification of levels of the long SFPQ 3′ UTR, the total SFPQ 3′ UTR, the long TMEM106B 3′ UTR and total TMEM106B was determined using the ΔΔCt method and normalized to two endogenous controls, GAPDH and RPLP0. All statistical analyses were performed using the GraphPad Prism 10 (GraphPad software). For comparison of the frontal cortex RNA levels between the healthy and FTD cases, the Mann–Whitney test was used. Primers are listed in Supplementary Table 8.

Construction of luciferase reporters

The short TMEM106B 3′ UTR and the long TMEM106B 3′ UTR were amplified from human genomic DNA (H1) and then cloned into the pmirGLO Dual-Luciferase Vector (Promega, E1330) using Gibson assembly. The proximal polyA site in the long TMEM106B 3′ UTR was then mutated using Gibson assembly. Mutations that disrupt the proximal polyA site were identified using Aparent2. The polyA signal was identified using Aparent2. All of the reporters were confirmed to have correct sequences by whole-plasmid sequencing, and their sequences can be found in Supplementary Table 9.

Overexpression of exogenous TMEM106B

The lentiviral plasmid containing FLAG-tagged TMEM106B lacking its 3′ UTR and the corresponding lentivirus were made by VectorBuilder. Their sequences can be found in Supplementary Table 9. After 7 days of TDP-43 KD, iNeurons were transduced with lentivirus containing either the FLAG-tagged TMEM106B construct or the EGFP control and collected 5 days post-transduction for dimer-level analysis using western blot. The western blot was carried out as described in the immunoblotting of TMEM106B.

Measurement of luciferase activity

For TMEM106B reporters, HEK293T cells were plated on a 96-well plate or a 24-well plate and transfected with dual-luciferase reporters that have no insert, the short TMEM106B 3′ UTR or the long TMEM106B 3′ UTR using Lipofectamine 3000 (Thermo Fisher Scientific, L3000001). Two days later, the transfected cells on a 96-well plate were measured for the luciferase activities using the Dual-Glo Luciferase Assay System (Promega, E2920), according to the manufacturer’s instructions; the transfected cells on a 24-well plate were used for qRT–PCR as described above.

Quantitation and statistical analysis

All quantification and statistical analyses were done in R. No statistical methods were used to predetermine sample sizes, but our sample sizes are similar to those reported in previous publications11,20. Experiments were generally not anonymized or randomized during analysis. Data distributions in Fig. 3a,c–e and Supplementary Fig. 2o were tested for normality and other data distributions were not formally tested, but all data points were plotted for every experiment. Analysis details can be found in figure legends, Methods and ‘Main’. Genome tracks were prepared using IGV. All plots were prepared using tidyplots, ggplot2, ggpubr, patchwork and ggrepel in R.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.