Abstract
The mechanisms by which mRNA sequences specify translational control remain poorly understood in mammalian cells. Here we generate a transcriptome-wide atlas of translation efficiency (TE) measurements encompassing more than 140 human and mouse cell types from 3,819 ribosomal profiling datasets. We develop RiboNN, a state-of-the-art multitask deep convolutional neural network, and classic machine learning models to predict TEs in hundreds of cell types from sequence-encoded mRNA features. While most earlier models solely considered the 5′ untranslated region (UTR) sequence, RiboNN integrates how the spatial positioning of low-level dinucleotide and trinucleotide features (that is, including codons) influences TE, capturing mechanistic principles such as how ribosomal processivity and tRNA abundance control translational output. RiboNN predicts the translational behavior of base-modified therapeutic RNA and explains evolutionary selection pressures in human 5′ UTRs. Finally, it detects a common language governing mRNA regulatory control and highlights the interconnectedness of mRNA translation, stability and localization in mammalian organisms.
Data availability
We provide the processed data without restriction in supplementary tables herein.
Code availability
Code and pretrained models are available on Zenodo (ref. 127) and GitHub (https://github.com/Sanofi-Public/RiboNN/). Our classic ML model code is available on Zenodo (ref. 128) and GitHub (https://github.com/CenikLab/TE_classic_ML).
References
Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 31, 107663 (2020).
Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).
Wang, J. & Agarwal, V. How DNA encodes the start of transcription. Science 384, 382–383 (2024).
Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. 57, 949–961 (2025).
Agarwal, V. & Kelley, D. R. The genetic and biochemical determinants of mRNA degradation rates in mammals. Genome Biol. 23, 245 (2022).
Gingold, H. & Pilpel, Y. Determinants of translation efficiency and accuracy. Mol. Syst. Biol. 7, 481 (2011).
Zur, H. & Tuller, T. Predictive biophysical modeling and understanding of the dynamics of mRNA translation and its evolution. Nucleic Acids Res. 44, 9031–9049 (2016).
Nieuwkoop, T. et al. Revealing determinants of translation efficiency via whole-gene codon randomization and machine learning. Nucleic Acids Res. 51, 2363–2376 (2023).
Shao, B. et al. Riboformer: a deep learning framework for predicting context-dependent translation dynamics. Nat. Commun. 15, 2011 (2024).
Tian, T., Li, S., Lang, P., Zhao, D. & Zeng, J. Full-length ribosome density prediction by a multi-input and multi-output model. PLoS Comput. Biol. 17, e1008842 (2021).
Tunney, R. et al. Accurate design of translational output by a neural network model of ribosome distribution. Nat. Struct. Mol. Biol. 25, 577–582 (2018).
Sample, P. J. et al. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat. Biotechnol. 37, 803–809 (2019).
Cao, J. et al. High-throughput 5′ UTR engineering for enhanced protein production in non-viral gene therapies. Nat. Commun. 12, 4138 (2021).
Karollus, A., Avsec, Ž. & Gagneur, J. Predicting mean ribosome load for 5′UTR of any length using deep learning. PLoS Comput. Biol. 17, e1008982 (2021).
Bazzini, A. A. et al. Codon identity regulates mRNA stability and translation efficiency during the maternal-to-zygotic transition. EMBO J. 35, 2087–2103 (2016).
Hanson, G. & Coller, J. Codon optimality, bias and usage in translation and mRNA decay. Nat. Rev. Mol. Cell Biol. 19, 20–30 (2018).
Li, S. et al. CodonBERT large language model for mRNA vaccines. Genome Res. 34, 1027–1035 (2024).
Szostak, E. & Gebauer, F. Translational control by 3′-UTR-binding proteins. Brief. Funct. Genomics 12, 58–65 (2013).
Floor, S. N. & Doudna, J. A. Tunable protein synthesis by transcript isoforms in human cells. eLife 5, e10921 (2016).
Schlusser, N., González, A., Pandey, M. & Zavolan, M. Current limitations in predicting mRNA translation with deep learning models. Genome Biol. 25, 227 (2024).
Li, S. et al. mRNA-LM: full-length integrated SLM for mRNA analysis. Nucleic Acids Res. 53, gkaf044 (2025).
Vogel, C. et al. Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol. Syst. Biol. 6, 400 (2010).
Eraslan, B. et al. Quantification and discovery of sequence determinants of protein-per-mRNA amount in 29 human tissues. Mol. Syst. Biol. 15, e8513 (2019).
Eisen, T. J., Li, J. J. & Bartel, D. P. The interplay between translational efficiency, poly(A) tails, microRNAs, and neuronal activation. RNA 28, 808–831 (2022).
Li, J. J., Chew, G.-L. & Biggin, M. D. Quantitative principles of cis-translational control by general mRNA sequence features in eukaryotes. Genome Biol. 20, 162 (2019).
Battle, A. et al. Genomic variation. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015).
Cenik, C. et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 25, 1610–1621 (2015).
Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
Jovanovic, M. et al. Immunogenetics. Dynamic profiling of the protein life cycle in response to pathogens. Science 347, 1259038 (2015).
Hernandez-Alias, X., Benisty, H., Radusky, L. G., Serrano, L. & Schaefer, M. H. Using protein-per-mRNA differences among human tissues in codon optimization. Genome Biol. 24, 34 (2023).
Spies, N., Burge, C. B. & Bartel, D. P. 3′UTR-isoform choice has limited influence on the stability and translational efficiency of most mRNAs in mouse fibroblasts. Genome Res. 23, 2078–2090 (2013).
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
Li, J. J., Bickel, P. J. & Biggin, M. D. System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ 2, e270 (2014).
Gorgoni, B., Marshall, E., McFarland, M. R., Romano, M. C. & Stansfield, I. Controlling translation elongation efficiency: tRNA regulation of ribosome flux on the mRNA. Biochem. Soc. Trans. 42, 160–165 (2014).
Sonenberg, N. & Hinnebusch, A. G. Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell 136, 731–745 (2009).
Jackson, R. J., Hellen, C. U. T. & Pestova, T. V. The mechanism of eukaryotic translation initiation and principles of its regulation. Nat. Rev. Mol. Cell Biol. 11, 113–127 (2010).
Hinnebusch, A. G., Ivanov, I. P. & Sonenberg, N. Translational control by 5′-untranslated regions of eukaryotic mRNAs. Science 352, 1413–1416 (2016).
Sharp, P. M. & Li, W. H. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38 (1986).
Presnyak, V. et al. Codon optimality is a major determinant of mRNA stability. Cell 160, 1111–1124 (2015).
Torrent, M., Chalancon, G., de Groot, N. S., Wuster, A. & Madan Babu, M. Cells alter their tRNA abundance to selectively regulate protein synthesis during stress conditions. Sci. Signal. 11, eaat6409 (2018).
Weinberg, D. E. et al. Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation. Cell Rep. 14, 1787–1799 (2016).
Gamble, C. E., Brule, C. E., Dean, K. M., Fields, S. & Grayhack, E. J. Adjacent codons act in concert to modulate translation efficiency in yeast. Cell 166, 679–690 (2016).
Mauger, D. M. et al. mRNA structure regulates protein expression through changes in functional half-life. Proc. Natl Acad. Sci. USA 116, 24075–24083 (2019).
Verma, M. et al. A short translational ramp determines the efficiency of protein synthesis. Nat. Commun. 10, 5774 (2019).
Burke, P. C., Park, H. & Subramaniam, A. R. A nascent peptide code for translational control of mRNA stability in human cells. Nat. Commun. 13, 6829 (2022).
Narula, A., Ellis, J., Taliaferro, J. M. & Rissland, O. S. Coding regions affect mRNA stability in human cells. RNA 25, 1751–1764 (2019).
Forrest, M. E. et al. Codon and amino acid content are associated with mRNA stability in mammalian cells. PLoS ONE 15, e0228730 (2020).
Wu, Q. et al. Translation affects mRNA stability in a codon-dependent manner in human cells. eLife 8, e45396 (2019).
Hia, F. et al. Codon bias confers stability to human mRNAs. EMBO Rep. 20, e48220 (2019).
Zhu, X., Cruz, V. E., Zhang, H., Erzberger, J. P. & Mendell, J. T. Specific tRNAs promote mRNA decay by recruiting the CCR4-NOT complex to translating ribosomes. Science 386, eadq8587 (2024).
Ozadam, H., Geng, M. & Cenik, C. RiboFlow, RiboR and RiboPy: an ecosystem for analyzing ribosome profiling data at read length resolution. Bioinformatics 36, 2929–2931 (2020).
Liu, Y. et al. Translation efficiency covariation across cell types is a conserved organizing principle of mammalian transcriptomes. Preprint at bioRxiv https://doi.org/10.1101/2024.08.11.607360 (2024).
Larsson, O., Sonenberg, N. & Nadon, R. Identification of differential translation in genome wide studies. Proc. Natl Acad. Sci. USA 107, 21487–21492 (2010).
Guo, J. U. & Bartel, D. P. RNA G-quadruplexes are globally unfolded in eukaryotic cells and depleted in bacteria. Science 353, aaf5371 (2016).
Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol. Syst. Biol. 15, e8503 (2019).
Rogers, D. W., Böttcher, M. A., Traulsen, A. & Greig, D. Ribosome reinitiation can explain length-dependent translation of messenger RNA. PLoS Comput. Biol. 13, e1005592 (2017).
Fernandes, L. D., de Moura, A. P. S. & Ciandrini, L. Gene length as a regulator for ribosome recruitment and protein synthesis: theoretical insights. Sci. Rep. 7, 17409 (2017).
Witte, F. et al. A trans locus causes a ribosomopathy in hypertrophic hearts that affects mRNA translation in a protein length-dependent fashion. Genome Biol. 22, 191 (2021).
Thompson, M. K., Rojas-Duran, M. F., Gangaramani, P. & Gilbert, W. V. The ribosomal protein Asc1/RACK1 is required for efficient translation of short mRNAs. eLife 5, e11154 (2016).
Dever, T. E., Ivanov, I. P. & Hinnebusch, A. G. Translational regulation by uORFs and start codon selection stringency. Genes Dev. 37, 474–489 (2023).
Lewis, C. J. T. et al. Quantitative profiling of human translation initiation reveals elements that potently regulate endogenous and therapeutically modified mRNAs. Mol. Cell 85, 445–445 (2024).
Strayer, E. C. et al. NaP-TRAP reveals the regulatory grammar in 5′UTR-mediated translation regulation during zebrafish development. Nat. Commun. 15, 10898 (2024).
Alqaraawi, A., Schuessler, M., Weiß, P., Costanza, E. & Berthouze, N. Evaluating saliency map explanations for convolutional neural networks: a user study. Preprint at https://arxiv.org/abs/2002.00772 (2020).
Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at https://arxiv.org/abs/1312.6034 (2013).
Shrikumar, A. et al. Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version 0.5.6.5. Preprint at https://arxiv.org/abs/1811.00416 (2018).
Chu, D. et al. Translation elongation can control translation initiation on eukaryotic mRNAs. EMBO J. 33, 21–34 (2014).
Wu, C. C.-C., Zinshteyn, B., Wehner, K. A. & Green, R. High-resolution ribosome profiling defines discrete ribosome elongation states and translational regulation during cellular stress. Mol. Cell 73, 959–970 (2019).
Gogakos, T. et al. Characterizing expression and processing of precursor and mature human tRNAs by hydro-tRNAseq and PAR-CLIP. Cell Rep. 20, 1463–1475 (2017).
Sterne-Weiler, T. et al. Frac-seq reveals isoform-specific recruitment to polyribosomes. Genome Res. 23, 1615–1623 (2013).
Ritter, A. J., Draper, J. M., Vollmers, C. & Sanford, J. R. Long-read subcellular fractionation and sequencing reveals the translational fate of full-length mRNA isoforms during neuronal differentiation. Genome Res. 34, 2000–2011 (2024).
Nachtergaele, S. & He, C. Chemical modifications in the life of an mRNA transcript. Annu. Rev. Genet. 52, 349–372 (2018).
Whiffin, N. et al. Characterising the loss-of-function impact of 5′ untranslated region variants in 15,708 individuals. Nat. Commun. 11, 2523 (2020).
Sevilla, T. et al. Mutations in the MORC2 gene cause axonal Charcot–Marie–Tooth disease. Brain 139, 62–72 (2015).
Dueñas Rey, A. et al. Combining a prioritization strategy and functional studies nominates 5′UTR variants underlying inherited retinal disease. Genome Med. 16, 7 (2024).
Liu, L. et al. Mutation of the CDKN2A 5′ UTR creates an aberrant initiation codon and predisposes to melanoma. Nat. Genet. 21, 128–132 (1999).
Damjanovich, K. et al. 5′UTR mutations of ENG cause hereditary hemorrhagic telangiectasia. Orphanet J. Rare Dis. 6, 85 (2011).
Pan, X. et al. 5′-UTR SNP of FGF13 causes translational defect and intellectual disability. eLife 10, e63021 (2021).
Lee, D. S. M. et al. Disrupting upstream translation in mRNAs is associated with human disease. Nat. Commun. 12, 1515 (2021).
Lim, Y. et al. Multiplexed functional genomic analysis of 5′ untranslated region mutations across the spectrum of prostate cancer. Nat. Commun. 12, 4217 (2021).
Stephens, S. B. & Nicchitta, C. V. Divergent regulation of protein synthesis in the cytosol and endoplasmic reticulum compartments of mammalian cells. Mol. Biol. Cell 19, 623–632 (2008).
Horste, E. L. et al. Subcytoplasmic location of translation controls protein output. Mol. Cell 83, 4509–4523 (2023).
Hubstenberger, A. et al. P-body purification reveals the condensation of repressed mRNA regulons. Mol. Cell 68, 144–157 (2017).
Chew, G.-L., Pauli, A. & Schier, A. F. Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish. Nat. Commun. 7, 11663 (2016).
Jia, L. et al. Decoding mRNA translatability and stability from the 5′ UTR. Nat. Struct. Mol. Biol. 27, 814–821 (2020).
Akirtava, C., May, G. E. & McManus, C. J. Deciphering the landscape of cis-acting sequences in natural yeast transcript leaders. Nucleic Acids Res. 53, gkaf165 (2025).
Choi, Y. et al. Time-resolved profiling of RNA binding proteins throughout the mRNA life cycle. Mol. Cell 84, 1764–1782 (2024).
Singh, G., Pratt, G., Yeo, G. W. & Moore, M. J. The clothes make the mRNA: past and present trends in mRNP fashion. Annu. Rev. Biochem. 84, 325–354 (2015).
May, G. E. et al. Unraveling the influences of sequence and position on yeast uORF activity using massively parallel reporter systems and machine learning. eLife 12, e69611 (2023).
Arribere, J. A. et al. Translation readthrough mitigation. Nature 534, 719–723 (2016).
Kramarski, L. & Arbely, E. Translational read-through promotes aggregation and shapes stop codon identity. Nucleic Acids Res. 48, 3747–3760 (2020).
Yordanova, M. M. et al. AMD1 mRNA employs ribosome stalling as a mechanism for molecular memory formation. Nature 553, 356–360 (2018).
Hashimoto, S., Nobuta, R., Izawa, T. & Inada, T. Translation arrest as a protein quality control system for aberrant translation of the 3′-UTR in mammalian cells. FEBS Lett. 593, 777–787 (2019).
Sherlock, M. E., Baquero Galvis, L., Vicens, Q., Kieft, J. S. & Jagannathan, S. Principles, mechanisms, and biological implications of translation termination–reinitiation. RNA 29, 865–884 (2023).
Wu, Q. et al. Translation of small downstream ORFs enhances translation of canonical main open reading frames. EMBO J. 39, e104763 (2020).
Mayr, C. Evolution and biological roles of alternative 3′UTRs. Trends Cell Biol. 26, 227–237 (2016).
Subtelny, A. O., Eichhorn, S. W., Chen, G. R., Sive, H. & Bartel, D. P. Poly(A)-tail profiling reveals an embryonic switch in translational control. Nature 508, 66–71 (2014).
Ozadam, H. et al. Single-cell quantification of ribosome occupancy in early mouse development. Nature 618, 1057–1064 (2023).
Gruber, A. R. et al. Global 3′ UTR shortening has a limited effect on protein abundance in proliferating T cells. Nat. Commun. 5, 5465 (2014).
Requião, R. D., Barros, G. C., Domitrovic, T. & Palhano, F. L. Influence of nascent polypeptide positive charges on translation dynamics. Biochem. J. 477, 2921–2934 (2020).
Dao Duc, K. & Song, Y. S. The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation. PLoS Genet. 14, e1007166 (2018).
Ahmed, N. et al. Pairs of amino acids at the P- and A-sites of the ribosome predictably and causally modulate translation–elongation rates. J. Mol. Biol. 432, 166696 (2020).
Kirchner, S. & Ignatova, Z. Emerging roles of tRNA in adaptive translation, signalling dynamics and disease. Nat. Rev. Genet. 16, 98–112 (2015).
Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011).
Riba, A. et al. Protein synthesis rates and ribosome occupancies reveal determinants of translation elongation rates. Proc. Natl Acad. Sci. USA 116, 15023–15032 (2019).
Barrington, C. L. et al. Synonymous codon usage regulates translation initiation. Cell Rep. 42, 113413 (2023).
Lyons, E. F. et al. Translation elongation as a rate limiting step of protein production. Preprint at bioRxiv https://doi.org/10.1101/2023.11.27.568910 (2024).
Chen, K. Y., Park, H. & Subramaniam, A. R. Massively parallel identification of sequence motifs triggering ribosome-associated mRNA quality control. Nucleic Acids Res. 52, 7171–7187 (2024).
Bicknell, A. A. & Ricci, E. P. When mRNA translation meets decay. Biochem. Soc. Trans. 45, 339–351 (2017).
Bicknell, A. A. et al. Attenuating ribosome load improves protein output from mRNA by limiting translation-dependent mRNA decay. Cell Rep. 43, 114098 (2024).
Mishima, Y., Han, P., Ishibashi, K., Kimura, S. & Iwasaki, S. Ribosome slowdown triggers codon-mediated mRNA decay independently of ribosome quality control. EMBO J. 41, e109256 (2022).
Bae, H. & Coller, J. Codon optimality-mediated mRNA degradation: linking translational elongation to mRNA stability. Mol. Cell 82, 1467–1476 (2022).
Inada, T. Quality controls induced by aberrant translation. Nucleic Acids Res. 48, 1084–1096 (2020).
Matsuo, Y. et al. RQT complex dissociates ribosomes collided on endogenous RQC substrate SDD1. Nat. Struct. Mol. Biol. 27, 323–332 (2020).
Mercier, B. C. et al. Translation-dependent and -independent mRNA decay occur through mutually exclusive pathways defined by ribosome density during T cell activation. Genome Res. 34, 394–409 (2024).
Leppek, K., Das, R. & Barna, M. Functional 5′ UTR mRNA structures in eukaryotic translation regulation and how to find them. Nat. Rev. Mol. Cell Biol. 19, 158–174 (2018).
Liu, T.-Y. et al. Time-resolved proteomics extends ribosome profiling-based measurements of protein synthesis dynamics. Cell Syst. 4, 636–644 (2017).
Shah, P., Ding, Y., Niemczyk, M., Kudla, G. & Plotkin, J. B. Rate-limiting steps in yeast protein translation. Cell 153, 1589–1601 (2013).
The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
Gerashchenko, M. V. & Gladyshev, V. N. Translation inhibitors cause abnormalities in ribosome profiling experiments. Nucleic Acids Res. 42, e134 (2014).
Rodriguez, J. M. et al. APPRIS: selecting functionally important isoforms. Nucleic Acids Res. 50, D54–D59 (2022).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In Proc. 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. & Guyon, I.) 3146–3154 (Curran Associates, 2017).
Kokhlikyan, N. et al. Captum: a unified and generic model interpretability library for PyTorch. Preprint at https://arxiv.org/abs/2009.07896 (2020).
Gudmundsson, S. et al. Addendum: the mutational constraint spectrum quantified from variation in 141,456 humans. Nature 597, E3–E4 (2021).
Zheng, D., Wang, J. & Agarwal, V. RiboNN: a deep learning model to predict translation efficiency from mRNA sequence. Zenodo https://doi.org/10.5281/zenodo.15360345 (2025).
Persyn, L., Liu, Y. & Cenik, C. Classic TE prediction model. Zenodo https://doi.org/10.5281/zenodo.15360966 (2025).
Pagès, H., Aboyoun, P., Gentleman, R. & DebRoy, S. Biostrings: efficient manipulation of biological strings. Bioconductor https://doi.org/10.18129/B9.bioc.Biostrings (2025).
Acknowledgements
We thank I. Hoskins (UT Austin) for the code and data to generate secondary structure features and M. Miladi (Sanofi) for providing critical feedback. We thank C. Thoreen and W. Gilbert (Yale University) for sharing their data before publication. Research reported in this publication was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under award R35GM150667 (to C.C.). This work was also supported by the National Institutes of Health (grant HD110096) and the Welch Foundation (grant F-2027-20230405 to C.C.). C.C. was a CPRIT Scholar in Cancer Research supported by the CPRIT (grant RR180042).
Author information
Contributions
D.Z. trained RiboNN models, validated model predictions with public datasets and contributed to model interpretation. J.W. interpreted RiboNN, performed comparisons between TE and third-party measurements, and analyzed genetic variant data. L.P. trained and interpreted classic ML models. Y.L. helped synthesize the data compendia and developed the compositional approach to calculate TE. F.U.-M., C.C. and V.A. supervised the study. C.C. and V.A. conceptualized and designed the study.
Ethics declarations
Competing interests
D.Z., J.W., F.U.-M. and V.A. are employees of Sanofi and may hold shares and/or stock options in the company. The other authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 2 Visualization of the RiboNN model architecture.
Shown is a layer-by-layer graph of the RiboNN architecture, with input/output dimensions labeled for each layer. The ConvBlock in the broken-line box was applied 10 times in total to compress the sequence length. Light yellow nodes reflect input/output tensors, light blue nodes reflect functions, light green nodes reflect modules and numbers in parentheses reflect tensor dimensions.
Extended Data Fig. 3 Performance of deep learning models on all human cell types.
These panels mirror those shown in Supplementary Fig. 4, except they show the performance of multitask deep learning models with one of four architectures: (1) our final RiboNN architecture, (2) our RiboNN architecture with the input channel recording codon positions ablated, (3) our RiboNN architecture with all mRNAs anchored at their 5′ end instead of at start codons and (4) the Saluki architecture (ref. 7) with the splice site input channel removed. For each architecture, the r² values were measured on ten held-out CV folds (n = 10,242 total genes among the ten folds). The center of the boxes corresponds to the median (the 50th percentile). The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5× IQR (interquartile range, or distance between the first and third quartiles) from the hinge. The lower whisker extends from the hinge to the smallest value no further than 1.5× IQR from the hinge. Data beyond the end of the whiskers are plotted individually.
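The box-plot convention described in this legend can be illustrated with a short sketch. This is not the authors' plotting code; it only restates the median, hinge and 1.5× IQR whisker rules in NumPy, and the function name `box_stats` and the r² values are hypothetical toy inputs.

```python
import numpy as np

def box_stats(values):
    """Summarize values using the hinge and 1.5 x IQR whisker rules from the legend."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that stay within the fences.
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": inside.min(),
        "upper_whisker": inside.max(),
        # Points beyond the whiskers are the ones plotted individually.
        "outliers": x[(x < lo_fence) | (x > hi_fence)].tolist(),
    }

# Toy r² values for ten held-out folds (made up for illustration only).
toy_r2_by_fold = [0.58, 0.60, 0.61, 0.61, 0.62, 0.62, 0.63, 0.64, 0.65, 0.40]
stats = box_stats(toy_r2_by_fold)
```

Note that quartile estimates depend on the interpolation rule; `np.percentile` uses linear interpolation by default, which may differ slightly from the plotting software used for the figure.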
Extended Data Fig. 4 Performance of RiboNN on mouse cell types.
a, These panels mirror those shown in Supplementary Fig. 4, except they show the performance of our multitask RiboNN model on mouse cell types using r² measured on ten held-out CV folds (n = 10,242 total genes among the ten folds). The center of the boxes corresponds to the median (the 50th percentile). The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5× IQR (interquartile range, or distance between the first and third quartiles) from the hinge. The lower whisker extends from the hinge to the smallest value no further than 1.5× IQR from the hinge. Data beyond the end of the whiskers are plotted individually. b,c, Scatter plots showing the relationship between our mouse RiboNN predictions and the observed mean TEs for human mRNAs (b), as well as the relationship between our human RiboNN predictions and the observed mean TEs for mouse mRNAs (c). Pearson (r) and Spearman (ρ) correlation coefficients are also shown. d,e, Scatter plots showing the relationship between sequence homology, considering the interspecies pair of mRNAs with the maximum homology, and the residual prediction error between the TE observed in one species and the TE predicted from the alternative species. This is shown for human (d) and mouse (e) mean TE data. ‘Max homology %’ was computed as follows: (1) all human–mouse mRNA pairs were locally aligned using the ‘pairwiseAlignment’ function from the Biostrings (version 2.70.2) R package (ref. 129) (‘match: 1, mismatch: −3, gap open: −2 and gap extend: −1’) and (2) for each mRNA, the final value was computed using the highest scoring alignment from the other species, calculating the maximum homology score divided by mRNA length.
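The ‘Max homology %’ calculation can be sketched in Python as follows. This is an illustrative reimplementation, not the Biostrings code used in the study: it runs a Smith–Waterman local alignment with affine gaps using the scoring values quoted above, and it assumes that a gap of length k costs gap open + k × gap extend and that the ratio is multiplied by 100 to give a percentage. The function names and toy sequences are hypothetical.

```python
def local_alignment_score(a, b, match=1, mismatch=-3, gap_open=2, gap_extend=1):
    """Best local alignment score (Smith-Waterman with affine gap penalties)."""
    neg_inf = float("-inf")
    prev_h = [0] * (len(b) + 1)        # best score ending at (i-1, j)
    prev_f = [neg_inf] * (len(b) + 1)  # best score ending with a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (len(b) + 1)
        cur_f = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending with a gap in a
        for j in range(1, len(b) + 1):
            e = max(e - gap_extend, cur_h[j - 1] - gap_open - gap_extend)
            cur_f[j] = max(prev_f[j] - gap_extend,
                           prev_h[j] - gap_open - gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best

def max_homology_percent(mrna, other_species_mrnas):
    """Highest local alignment score against the other species, per query nucleotide."""
    top = max(local_alignment_score(mrna, other) for other in other_species_mrnas)
    return 100.0 * top / len(mrna)

# Toy sequences (made up); the second differs from the first by one deleted base.
query = "ACGTACGTACGT"
homology = max_homology_percent(query, ["ACGTACTACGT", "TTTT"])
```

With match = 1, a perfect full-length match scores exactly the mRNA length, so the value is bounded at 100%. The pure-Python loop is quadratic in sequence length and is meant only to make the definition concrete, not to align a transcriptome.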
Extended Data Fig. 5 Interpretation of human and mouse RiboNN models.
a, Attribution score plot for the human RiboNN model focusing on specific regions along valid mRNAs (defined in Methods). The windows include the first 50 nt of the 5′ UTR, 50 nt upstream to 250 nt downstream of the start codon, 250 nt upstream and 250 nt downstream of the stop codon and the last 250 nt of the 3′ UTR. The absolute values of attribution scores were averaged across all valid mRNAs, which were grouped into one of four equally sized bins according to their mean TE. b, Metagene plot for the absolute value of attribution scores derived from the mouse RiboNN model, averaged across all mRNAs, for percentiles along the 5′ UTR, CDS and 3′ UTR. mRNAs were grouped into one of four equally sized bins according to their mean TE. c, Same as a, except it reflects results from the mouse RiboNN model. d,e, Enriched motifs learned by human (d) and mouse (e) RiboNN models for each functional region of mRNA. Motifs are ranked by the number of seqlets (ref. 67) supporting each motif.
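The binning and averaging step described for panels a–c can be sketched as below. This is a minimal illustration under stated assumptions: attribution scores are taken to be already extracted into fixed-length windows (one row per mRNA), the function name `mean_abs_attribution_by_te_bin` is hypothetical, and the input values are toy numbers rather than model output.

```python
import numpy as np

def mean_abs_attribution_by_te_bin(attributions, mean_te, n_bins=4):
    """Average |attribution| per position within equally sized mean-TE bins."""
    attributions = np.asarray(attributions, dtype=float)  # (n_mrnas, window_len)
    mean_te = np.asarray(mean_te, dtype=float)            # (n_mrnas,)
    # Rank mRNAs by mean TE and split the ranking into equally sized groups.
    order = np.argsort(mean_te, kind="stable")
    groups = np.array_split(order, n_bins)
    # One averaged profile per bin, ordered from lowest to highest TE.
    return np.stack([np.abs(attributions[idx]).mean(axis=0) for idx in groups])

# Toy input: 8 mRNAs, a 3-nt window, alternating-sign scores of magnitude i + 1.
toy_te = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
toy_attr = [[(-1) ** i * (i + 1)] * 3 for i in range(8)]
profile = mean_abs_attribution_by_te_bin(toy_attr, toy_te)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so each profile reflects how strongly a position influences the prediction regardless of direction.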
Extended Data Fig. 6 Amino acid-level-based correlation among codon influence scores.
a–c, Scatter plots showing the relationship between the amino acid-level-based codon influence (that is, the predicted effect size of each inserted codon, averaged across all positional bins and across codons for each amino acid) from the human RiboNN model and the mouse model (a), A-site ribosome occupancy scores (ref. 69) (b) and mean codon stability coefficients (ref. 49) (c). Pearson (r) and Spearman (ρ) correlation coefficients are also shown. The properties of amino acids are labeled by different colors for hydrophobicity and by different shapes for charge. Error bars represent the standard error across codons encoding the same amino acid. To compute amino acid-level scores, we computed the mean score among codons encoding the same amino acid. All 61 non-stop codons were included in the amino acid-based analysis.
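The collapse from codon-level to amino acid-level scores can be sketched as follows. This is an illustration rather than the authors' code: it assumes the standard genetic code, takes the standard error as the sample standard deviation divided by the square root of the number of codons, and reports it as undefined (NaN) for amino acids encoded by a single codon. The function name and the toy scores (GC count per codon) are hypothetical.

```python
import math
import statistics
from itertools import product

# Standard genetic code in TCAG order; '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
    if aa != "*"
}

def amino_acid_level_scores(codon_scores):
    """Return {amino acid: (mean, standard error)} over its synonymous codons."""
    grouped = {}
    for codon, aa in CODON_TABLE.items():
        grouped.setdefault(aa, []).append(codon_scores[codon])
    summary = {}
    for aa, scores in grouped.items():
        mean = statistics.mean(scores)
        # The standard error is undefined when only one codon encodes the amino acid.
        sem = (statistics.stdev(scores) / math.sqrt(len(scores))
               if len(scores) > 1 else math.nan)
        summary[aa] = (mean, sem)
    return summary

# Toy codon scores: the number of G or C bases in each codon (illustrative only).
toy_scores = {codon: sum(base in "GC" for base in codon) for codon in CODON_TABLE}
aa_scores = amino_acid_level_scores(toy_scores)
```

The table holds the 61 non-stop codons, matching the set used in the analysis, and yields 20 amino acid-level values.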
Extended Data Fig. 7 In silico mutagenesis of disease-associated gene 5′ UTRs.
a–c, In silico mutagenesis of 5′ UTR regions of RDH12 (a), ENG (b) and FGF13 (c). Positions of wild-type uAUG are highlighted in purple at the top. The known disease-associated variants are boxed. Single-point mutations resulting in predicted TE differences are shown alongside annotations reflecting the corresponding gain or loss of TE.
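In silico mutagenesis of a 5′ UTR amounts to scoring every single-nucleotide substitution and comparing it with the wild-type prediction. The sketch below shows that loop under explicit assumptions: `toy_predict_te` is a made-up placeholder that simply penalizes upstream ATGs and is not RiboNN or its API, sequences are written in the DNA alphabet (T in place of U), and all function names and sequences are hypothetical.

```python
def find_uaugs(utr5):
    """Return 0-based positions of upstream AUGs (written as ATG) in a 5' UTR."""
    return [i for i in range(len(utr5) - 2) if utr5[i:i + 3] == "ATG"]

def in_silico_mutagenesis(utr5, downstream, predict_te):
    """Score every single-point 5' UTR mutation as (position, ref, alt, delta TE)."""
    wild_type = predict_te(utr5, downstream)
    effects = []
    for pos, ref in enumerate(utr5):
        for alt in "ACGT":
            if alt == ref:
                continue
            mutant = utr5[:pos] + alt + utr5[pos + 1:]
            effects.append((pos, ref, alt, predict_te(mutant, downstream) - wild_type))
    return effects

def toy_predict_te(utr5, downstream):
    """Placeholder predictor (NOT RiboNN): subtract 1 for each upstream ATG."""
    return -float(len(find_uaugs(utr5)))

# Toy 5' UTR containing one uAUG at position 3, followed by a toy coding region.
toy_utr5 = "GCCATGGCC"
toy_downstream = "ATGGCTTAA"
effects = in_silico_mutagenesis(toy_utr5, toy_downstream, toy_predict_te)
gains = [e for e in effects if e[3] > 0]
```

In this toy case the only mutations with a nonzero effect are the nine substitutions that disrupt the uAUG, each of which raises the placeholder score. Substituting a trained model for `toy_predict_te` would produce the kind of per-position gain and loss map shown in the figure.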
Extended Data Fig. 8 In silico mutagenesis of disease-associated gene 5′ UTRs.
Continuation of results from Extended Data Fig. 7, except with BCL2L13.
Extended Data Fig. 9 In silico mutagenesis of cancer-associated gene 5′ UTRs.
a–d, In silico mutagenesis of 5′ UTR regions of ADAM32 (a), NUMA1 (b), COMT (c) and QARS (d). Positions of wild-type uAUG are highlighted in purple at the top. The known cancer-associated variants are boxed. Single-point mutations resulting in predicted TE differences are shown alongside annotations reflecting the corresponding gain or loss of TE.
Extended Data Fig. 10 In silico mutagenesis of cancer-associated gene 5′ UTRs.
Continuation of results from Extended Data Fig. 9, except with AKT3.
Supplementary information
Supplementary Information
Supplementary Discussion, Methods and Figs. 1–12.
Supplementary Table 1
Feature sizes, sequences, CV folds and TEs of human genes.
Supplementary Table 2
Feature sizes, sequences, CV folds and TEs of mouse genes.
Supplementary Table 3
Feature sizes, sequences, CV folds and TEs predicted by the human RiboNN models.
Supplementary Table 4
Feature sizes, sequences, CV folds and TEs predicted by the mouse RiboNN models.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zheng, D., Persyn, L., Wang, J. et al. Predicting the translation efficiency of messenger RNA in mammalian cells. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02712-x
DOI: https://doi.org/10.1038/s41587-025-02712-x