Abstract
Characterizing shared patterns of RNA expression between genes across conditions has led to the discovery of regulatory networks and biological functions. However, it is unclear whether such coordination extends to translation. In this study, we uniformly analyze 3,819 ribosome profiling datasets from 117 human and 94 mouse tissues and cell lines. We introduce the concept of translation efficiency covariation (TEC), identifying coordinated translation patterns across cell types. We nominate candidate mechanisms driving shared patterns of translation regulation. TEC is conserved across human and mouse cells and uncovers gene functions that are not evident from RNA or protein co-expression. Moreover, our observations indicate that proteins that physically interact are highly enriched for positive covariation at both translational and transcriptional levels. Our findings establish TEC as a conserved organizing principle of mammalian transcriptomes. TEC has potential as a predictive marker for gene function and may offer a framework for designing gene expression systems in synthetic biology and biotechnological applications.
Data availability
Metadata about RiboBase can be found in Supplementary Table 1. Ribo files for the HeLa cell line are accessible via Zenodo at https://doi.org/10.5281/zenodo.15660080 (ref. 103). Full TEC and RNA co-expression matrices are accessible via Zenodo at https://doi.org/10.5281/zenodo.10373032 (ref. 127). A RiboFlow configuration file and processed ribo files for RBP knockout can be accessed via Zenodo at https://doi.org/10.5281/zenodo.11388478 (ref. 135). Sequencing data and ribo files for the RBP knockout experiments are available under GEO accession code GSE269734.
Code availability
The code and data used in this study are available via Zenodo at https://doi.org/10.5281/zenodo.10373032 (ref. 127) and via GitHub at https://github.com/CenikLab/TE_model. The code and data used to generate figures can be found via Zenodo at https://doi.org/10.5281/zenodo.15337774 (ref. 136) and via GitHub at https://github.com/CenikLab/coTE_paper.
References
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Combs, P. A. & Eisen, M. B. Sequencing mRNA from cryo-sliced Drosophila embryos to determine genome-wide spatial patterns of gene expression. PLoS ONE 8, e71820 (2013).
Achim, K. et al. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat. Biotechnol. 33, 503–509 (2015).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA 95, 14863–14868 (1998).
Skinnider, M. A., Squair, J. W. & Foster, L. J. Evaluating measures of association for single-cell transcriptomics. Nat. Methods 16, 381–386 (2019).
Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. O. & Eisenberg, D. A combined algorithm for genome-wide prediction of protein function. Nature 402, 83–86 (1999).
DeRisi, J. L., Iyer, V. R. & Brown, P. O. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686 (1997).
Jansen, R., Greenbaum, D. & Gerstein, M. Relating whole-genome expression data with protein–protein interactions. Genome Res. 12, 37–46 (2002).
Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. & Church, G. M. Systematic determination of genetic network architecture. Nat. Genet. 22, 281–285 (1999).
Roth, F. P., Hughes, J. D., Estep, P. W. & Church, G. M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16, 939–945 (1998).
Nusinow, D. P. et al. Quantitative proteomics of the Cancer Cell Line Encyclopedia. Cell 180, 387–402 (2020).
Gonçalves, E. et al. Pan-cancer proteomic map of 949 human cell lines. Cancer Cell 40, 835–849 (2022).
Ryan, C. J., Kennedy, S., Bajrami, I., Matallanas, D. & Lord, C. J. A compendium of co-regulated protein complexes in breast cancer reveals collateral loss events. Cell Syst. 5, 399–409 (2017).
Singh, G., Pratt, G., Yeo, G. W. & Moore, M. J. The clothes make the mRNA: past and present trends in mRNP fashion. Annu. Rev. Biochem. 84, 325–354 (2015).
Keene, J. D. & Tenenbaum, S. A. Eukaryotic mRNPs may represent posttranscriptional operons. Mol. Cell 9, 1161–1167 (2002).
Keene, J. D. RNA regulons: coordination of post-transcriptional events. Nat. Rev. Genet. 8, 533–543 (2007).
Li, G.-W., Burkhardt, D., Gross, C. & Weissman, J. S. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635 (2014).
Taggart, J. C. & Li, G.-W. Production of protein-complex components is stoichiometric and lacks general feedback regulation in eukaryotes. Cell Syst. 7, 580–589 (2018).
Amirbeigiarab, S. et al. Invariable stoichiometry of ribosomal proteins in mouse brain tissues with aging. Proc. Natl Acad. Sci. USA 116, 22567–22572 (2019).
Soto, I. et al. Balanced mitochondrial and cytosolic translatomes underlie the biogenesis of human respiratory complexes. Genome Biol. 23, 170 (2022).
Natan, E. et al. Cotranslational protein assembly imposes evolutionary constraints on homomeric proteins. Nat. Struct. Mol. Biol. 25, 279–288 (2018).
Li, G.-W., Oh, E. & Weissman, J. S. The anti-Shine–Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541 (2012).
Bertolini, M. et al. Interactions between nascent proteins translated by adjacent ribosomes drive homomer assembly. Science 371, 57–64 (2021).
Ozadam, H., Geng, M. & Cenik, C. RiboFlow, RiboR and RiboPy: an ecosystem for analyzing ribosome profiling data at read length resolution. Bioinformatics 36, 2929–2931 (2020).
Gerashchenko, M. V. & Gladyshev, V. N. Ribonuclease selection for ribosome profiling. Nucleic Acids Res. 45, e6 (2017).
Mohammad, F., Green, R. & Buskirk, A. R. A systematically-revised ribosome profiling method for bacteria reveals pauses at single-codon resolution. eLife 8, e42591 (2019).
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
Larsson, O., Sonenberg, N. & Nadon, R. Identification of differential translation in genome wide studies. Proc. Natl Acad. Sci. USA 107, 21487–21492 (2010).
van den Boogaart, K. G., Filzmoser, P., Hron, K., Templ, M. & Tolosana-Delgado, R. Classical and robust regression analysis with compositional data. Math. Geosci. 53, 823–858 (2021).
Quinn, T. P. et al. A field guide for the compositional analysis of any-omics data. Gigascience 8, giz107 (2019).
Quinn, T. P., Richardson, M. F., Lovell, D. & Crowley, T. M. propr: an R-package for identifying proportionally abundant features using compositional data analysis. Sci. Rep. 7, 16252 (2017).
Sudmant, P. H., Alexis, M. S. & Burge, C. B. Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biol. 16, 287 (2015).
Wang, Z.-Y. et al. Transcriptome and translatome co-evolution in mammals. Nature 588, 642–647 (2020).
Lu, P., Takai, K., Weaver, V. M. & Werb, Z. Extracellular matrix degradation and remodeling in development and disease. Cold Spring Harb. Perspect. Biol. 3, a005058 (2011).
Artieri, C. G. & Fraser, H. B. Evolution at two levels of gene expression in yeast. Genome Res. 24, 411–421 (2014).
McManus, C. J., May, G. E., Spealman, P. & Shteyman, A. Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast. Genome Res. 24, 422–430 (2014).
Breschi, A., Gingeras, T. R. & Guigó, R. Comparative transcriptomics in human and mouse. Nat. Rev. Genet. 18, 425–440 (2017).
Crow, M., Suresh, H., Lee, J. & Gillis, J. Coexpression reveals conserved gene programs that co-vary with cell type across kingdoms. Nucleic Acids Res. 50, 4302–4314 (2022).
Thoreen, C. C. et al. A unifying model for mTORC1-mediated regulation of mRNA translation. Nature 485, 109–113 (2012).
Wurth, L. et al. UNR/CSDE1 drives a post-transcriptional program to promote melanoma invasion and metastasis. Cancer Cell 36, 337 (2019).
Pierson, E. et al. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput. Biol. 11, e1004220 (2015).
Kershaw, C. J. et al. Translation factor and RNA binding protein mRNA interactomes support broader RNA regulons for posttranscriptional control. J. Biol. Chem. 299, 105195 (2023).
Hentze, M. W., Castello, A., Schwarzl, T. & Preiss, T. A brave new world of RNA-binding proteins. Nat. Rev. Mol. Cell Biol. 19, 327–341 (2018).
Liu, Y. The number of genes whose TE significantly correlates with an RBP’s expression. Zenodo https://doi.org/10.5281/zenodo.11359114 (2024).
Korbel, J. O., Jensen, L. J., von Mering, C. & Bork, P. Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat. Biotechnol. 22, 911–917 (2004).
Szklarczyk, R. et al. WeGET: predicting new genes for molecular systems by weighted co-expression. Nucleic Acids Res. 44, D567–D573 (2016).
Zhang, M. et al. RNA-binding protein IMP3 is a novel regulator of MEK1/ERK signaling pathway in the progression of colorectal cancer through the stabilization of MEKK1 mRNA. J. Exp. Clin. Cancer Res. 40, 200 (2021).
Bodén, M. & Bailey, T. L. Associating transcription factor-binding site motifs with target GO terms and target genes. Nucleic Acids Res. 36, 4108–4117 (2008).
Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
Eichhorn, S. W. et al. mRNA destabilization is the dominant effect of mammalian microRNAs by the time substantial repression ensues. Mol. Cell 56, 104–115 (2014).
Bartel, D. P. Metazoan microRNAs. Cell 173, 20–51 (2018).
Mecham, R. The Extracellular Matrix: An Overview (Springer Science & Business Media, 2011).
Kagan, H. M. & Li, W. Lysyl oxidase: properties, specificity, and biological roles inside and outside of the cell. J. Cell. Biochem. 88, 660–672 (2003).
Kikuchi, A. et al. Structural basis for activation of DNMT1. Nat. Commun. 13, 7130 (2022).
Wu, Y.-Y. et al. The hTERT-p50 homodimer inhibits PLEKHA7 expression to promote gastric cancer invasion and metastasis. Oncogene 42, 1144–1156 (2023).
Kurita, S., Yamada, T., Rikitsu, E., Ikeda, W. & Takai, Y. Binding between the junctional proteins afadin and PLEKHA7 and implication in the formation of adherens junction in epithelial cells. J. Biol. Chem. 288, 29356–29368 (2013).
Pulimeno, P., Paschoud, S. & Citi, S. A role for ZO-1 and PLEKHA7 in recruiting paracingulin to tight and adherens junctions of epithelial cells. J. Biol. Chem. 286, 16743–16750 (2011).
Jeung, H.-C. et al. PLEKHA7 signaling is necessary for the growth of mutant KRAS driven colorectal cancer. Exp. Cell. Res. 409, 112930 (2021).
Tavano, S. et al. Insm1 induces neural progenitor delamination in developing neocortex via downregulation of the adherens junction belt-specific protein Plekha7. Neuron 97, 1299–1314 (2018).
Sukonina, V. et al. FOXK1 and FOXK2 regulate aerobic glycolysis. Nature 566, 279–283 (2019).
Kobe, B. & Kajava, A. V. The leucine-rich repeat as a protein recognition motif. Curr. Opin. Struct. Biol. 11, 725–732 (2001).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).
Carlsson, P. & Mahlapuu, M. Forkhead transcription factors: key players in development and metabolism. Dev. Biol. 250, 1–23 (2002).
Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).
Kustatscher, G. et al. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol. 37, 1361–1371 (2019).
Szklarczyk, D. et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
Shiber, A. et al. Cotranslational assembly of protein complexes in eukaryotes revealed by ribosome profiling. Nature 561, 268–272 (2018).
Ewing, R. M. et al. Large-scale mapping of human protein–protein interactions by mass spectrometry. Mol. Syst. Biol. 3, 89 (2007).
Drew, K., Wallingford, J. B. & Marcotte, E. M. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol. 17, e10016 (2021).
Heider, M. R. et al. Subunit connectivity, assembly determinants and architecture of the yeast exocyst complex. Nat. Struct. Mol. Biol. 23, 59–66 (2016).
Kee, Y. et al. Subunit structure of the mammalian exocyst complex. Proc. Natl Acad. Sci. USA 94, 14438–14443 (1997).
Lalanne, J.-B. et al. Evolutionary convergence of pathway-specific enzyme expression stoichiometry. Cell 173, 749–761 (2018).
Bicknell, A. A. et al. Attenuating ribosome load improves protein output from mRNA by limiting translation-dependent mRNA decay. Cell Rep. 43, 114098 (2024).
Liu, T.-Y. et al. Time-resolved proteomics extends ribosome profiling-based measurements of protein synthesis dynamics. Cell Syst. 4, 636–644 (2017).
Wang, M., Herrmann, C. J., Simonovic, M., Szklarczyk, D. & von Mering, C. Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 15, 3163–3168 (2015).
Piepoli, A. et al. The expression of leucine-rich repeat gene family members in colorectal cancer. Exp. Biol. Med. 237, 1123–1128 (2012).
Liu, Y. et al. Identification of differential expression of genes in hepatocellular carcinoma by suppression subtractive hybridization combined cDNA microarray. Oncol. Rep. 18, 943–951 (2007).
Chen, H. et al. miR-218 contributes to drug resistance in multiple myeloma via targeting LRRC28. J. Cell. Biochem. 122, 305–314 (2021).
Vander Heiden, M. G., Cantley, L. C. & Thompson, C. B. Understanding the Warburg effect: the metabolic requirements of cell proliferation. Science 324, 1029–1033 (2009).
Liu, Y. et al. Histone H2AX promotes metastatic progression by preserving glycolysis via hexokinase-2. Sci. Rep. 12, 3758 (2022).
Zheng, D. et al. Predicting the translation efficiency of messenger RNA in mammalian cells. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02712-x (2025).
Rodriguez, J. M. et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 41, D110–D117 (2013).
Rao, S. et al. Genes with 5′ terminal oligopyrimidine tracts preferentially escape global suppression of translation by the SARS-CoV-2 Nsp1 protein. RNA 27, 1025–1045 (2021).
Mills, E. W. & Green, R. Ribosomopathies: there’s strength in numbers. Science 358, eaan2755 (2017).
Ozadam, H. et al. Single-cell quantification of ribosome occupancy in early mouse development. Nature 618, 1057–1064 (2023).
VanInsberghe, M., van den Berg, J., Andersson-Rolf, A., Clevers, H. & van Oudenaarden, A. Single-cell Ribo-seq reveals cell cycle-dependent translational pausing. Nature 597, 561–565 (2021).
Benoit Bouvrette, L. P., Bovaird, S., Blanchette, M. & Lécuyer, E. oRNAment: a database of putative RNA binding protein target sites in the transcriptomes of model species. Nucleic Acids Res. 48, D166–D173 (2020).
Krismer, K. et al. Transite: a computational motif-based analysis platform that identifies RNA-binding proteins modulating changes in gene expression. Cell Rep. 32, 108064 (2020).
Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
Hou, Y., Xie, T., He, L., Tao, L. & Huang, J. Topological links in predicted protein complex structures reveal limitations of AlphaFold. Commun. Biol. 6, 1098 (2023).
Burke, D. F. et al. Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol. 30, 216–225 (2023).
Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).
National Center for Biotechnology Information. SRA Tools. GitHub https://github.com/ncbi/sra-tools (2018).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Liu, Y. HeLa ribosome profiling data. Zenodo https://doi.org/10.5281/zenodo.15660080 (2024).
Gerashchenko, M. V. & Gladyshev, V. N. Translation inhibitors cause abnormalities in ribosome profiling experiments. Nucleic Acids Res. 42, e134 (2014).
Wu, C. C.-C., Zinshteyn, B., Wehner, K. A. & Green, R. High-resolution ribosome profiling defines discrete ribosome elongation states and translational regulation during cellular stress. Mol. Cell 73, 959–970 (2019).
Wolin, S. L. & Walter, P. Ribosome pausing and stacking during translation of a eukaryotic mRNA. EMBO J. 7, 3559–3569 (1988).
Sharma, J. et al. A small molecule that induces translational readthrough of CFTR nonsense mutations by eRF1 depletion. Nat. Commun. 12, 4358 (2021).
Tukey, J. W. The future of data analysis. Ann. Math. Stat. 33, 1–67 (1962).
Zhang, X.-O., Yin, Q.-F., Chen, L.-L. & Yang, L. Gene expression profiling of non-polyadenylated RNA-seq across species. Genom. Data 2, 237–241 (2014).
Yang, L., Duff, M. O., Graveley, B. R., Carmichael, G. G. & Chen, L.-L. Genomewide characterization of non-polyadenylated RNAs. Genome Biol. 12, R16 (2011).
van den Boogaart, K. G. & Tolosana-Delgado, R. Analyzing Compositional Data with R (Springer, 2013).
Cenik, C. et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 25, 1610–1621 (2015).
Greenacre, M. Compositional data analysis. Annu. Rev. Stat. Appl. 8, 271–299 (2021).
Ramsköld, D., Wang, E. T., Burge, C. B. & Sandberg, R. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput. Biol. 5, e1000598 (2009).
Csárdi, G., Franks, A., Choi, D. S., Airoldi, E. M. & Drummond, D. A. Accounting for experimental noise reveals that mRNA levels, amplified by post-transcriptional processes, largely determine steady-state protein levels in yeast. PLoS Genet. 11, e1005206 (2015).
Schilder, B. M. & Skene, N. G. orthogene: An R package for easy mapping of orthologous genes across hundreds of species. R package version 3.21 https://doi.org/10.18129/B9.bioc.orthogene (2022).
van den Boogaart, K. G. & Tolosana-Delgado, R. ‘compositions’: a unified R package to analyze compositional data. Comput. Geosci. 34, 320–338 (2008).
Kim, S. ppcor: an R package for a fast calculation to semi-partial correlation coefficients. Commun. Stat. Appl. Methods 22, 665–674 (2015).
Berriz, G. F., Beaver, J. E., Cenik, C., Tasan, M. & Roth, F. P. Next generation software for functional trend analysis. Bioinformatics 25, 3043–3044 (2009).
Buttrey, S. & Whitaker, L. TreeClust: an R package for tree-based clustering dissimilarities. R J. 7, 227 (2015).
Wainberg, M. et al. A genome-wide atlas of co-essential modules assigns function to uncharacterized genes. Nat. Genet. 53, 638–649 (2021).
Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, iyad031 (2023).
Philippe, L., van den Elzen, A. M. G., Watson, M. J. & Thoreen, C. C. Global analysis of LARP1 translation targets reveals tunable and dynamic features of 5′ TOP motifs. Proc. Natl Acad. Sci. USA 117, 5319–5328 (2020).
Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).
Carlson, M. org.Mm.eg.db: Genome wide annotation for mouse. R package version 3.21 https://doi.org/10.18129/B9.bioc.org.Mm.eg.db (2025).
Carlson, M. org.Hs.eg.db: Genome wide annotation for human. R package version 3.21 https://doi.org/10.18129/B9.bioc.org.Hs.eg.db (2025).
Liu, Y. Intermediate data for TE calculation. Zenodo https://doi.org/10.5281/zenodo.10373032 (2024).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Hu, Y. et al. Paralog Explorer: a resource for mining information about paralogs in common research organisms. Comput. Struct. Biotechnol. J. 20, 6570–6577 (2022).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Ho, D., Imai, K., King, G. & Stuart, E. A. MatchIt: nonparametric preprocessing for parametric causal inference. J. Stat. Softw. https://doi.org/10.18637/jss.v042.i08 (2011).
Sanson, K. R. et al. Optimized libraries for CRISPR–Cas9 genetic screens with multiple modalities. Nat. Commun. 9, 5416 (2018).
Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
Liu, Y. KO_validation_RiboBase. Zenodo https://doi.org/10.5281/zenodo.11388478 (2024).
Yue, L. coTE_paper: code and to generate main figures. Zenodo https://doi.org/10.5281/zenodo.15337774 (2025).
Acknowledgements
We thank all contributors to metadata curation: H. Chiang, A. Hoffman, T. Tonn, A. Segura, C. Tante, E. Vasquez and L. Xu. We also thank Y. Shin and V. D. Chapman for their help with the experiments, and M. Miladi for providing critical feedback. The original text in this paper was written by the authors. A large language model (OpenAI ChatGPT, https://chat.openai.com) was used to suggest edits for clarity and grammar. The authors acknowledge the Texas Advanced Computing Center at The University of Texas at Austin (http://www.tacc.utexas.edu) for providing high-performance computing and storage resources that contributed to the research results reported within this paper.
Research reported in this publication was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health (NIH) under award numbers R35GM150667 (C.C.) and R35GM138340 (E.S.C.). This work was also supported by NIH grant HD110096 (C.C.) and Welch Foundation grants F-2027-20230405 (C.C.) and F-2133-20230405 (E.S.C.). C.C. was a Cancer Prevention and Research Institute of Texas (CPRIT) Scholar in Cancer Research, supported by CPRIT grant RR180042.
Author information
Contributions
Y.L., I.H. and C.C. co-wrote the original manuscript. Y.L., I.H. and S.R. generated the figures for the manuscript. H.O., M.G. and J.C. downloaded all the data from the GEO and processed raw sequencing data. Y.L. and C.C. developed the TE calculation pipeline. J.C. and C.C. designed and implemented the winsorization method. Y.Z. performed the deduplication comparison. Y.L. and C.C. developed the TE covariation analysis and function prediction pipelines. Y.L., K.Q. and H.O. performed the quality control analysis for all sequencing data. Y.L. carried out covariation analysis, gene function prediction and AlphaFold2 analysis. I.H. conducted the RBP analysis. L.P., J.W., D.Z. and V.A. assessed the quality of TE measurements by developing machine learning approaches. H.O., J.W., D.Z., V.A., Q.Z. and E.S.C. provided suggestions for the manuscript. Y.L., Q.Z. and E.S.C. conducted the literature search to evaluate gene function predictions. S.R. conducted the experimental validation of LRRC28. S.R., I.H., V.G. and D.P. performed other experiments. C.C. provided study oversight, conceptualized the study and acquired funding. All authors approved the final manuscript.
Ethics declarations
Competing interests
D.Z., J.W. and V.A. are employees of Sanofi and may hold shares and/or stock options in the company. H.O. is an employee of Sail Biomedicines. I.H. is an employee of Monoceros Biosystems. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Sequencing quality of ribosome profiling data.
a, Distribution of read counts for 2,195 human and 1,624 mouse ribosome profiling datasets in RiboBase. In each boxplot, the horizontal line corresponds to the median, the box represents the interquartile range (IQR) and the whiskers extend to 1.5 times the IQR. b, Distribution plot similar to panel a for the 1,282 human and 995 mouse ribosome profiling datasets with matched RNA-seq. c, Distributions of the proportion of reads aligned to transcripts, the proportion of reads with high-quality alignments and the percentage of reads remaining after PCR deduplication, each relative to the total number of reads in panel a. d, Same plot as panel c for ribosome profiling datasets with matched RNA-seq. e, Read length distribution of RPFs aligned to coding sequences for all human experiments. The color in the heatmap represents the z-score adjusted RPF counts (Methods). Each experiment in which more than 70% of RPFs mapped to the CDS and that achieved sufficient transcript coverage (≥0.1×) was annotated as QC-pass. f, Similar to panel e for mouse samples.
Extended Data Fig. 2 Quality control and RPF length selection.
a, RPFs shorter than 21 nucleotides were removed, and the RPF length with the highest number of reads mapping to the CDS was chosen as the starting point. We then compared the lengths one nucleotide longer and one nucleotide shorter than the current range and added the one with more reads. This process was repeated until at least 85% of all CDS-mapping RPFs were included. b, We compared the usable reads selected with two different boundary cutoffs (y-axis) and the proportion of these selected reads that map to coding regions (x-axis) for each ribosome profiling experiment. c, Percentage of ribosome profiling experiments from GEO that pass or fail quality control (QC pass: more than 70% of RPFs mapping to the CDS and at least 0.1× coverage of the transcript).
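The length-selection loop in panel a and the QC rule in panel c can be sketched in Python. This is a minimal illustration under stated assumptions, not the RiboFlow implementation. The following choices are ours:

- the input `cds_counts`, a mapping from read length to CDS-mapping RPF count;
- the function names;
- breaking ties toward the longer length;
- stopping when no adjacent length has reads.

Transcript coverage is passed in as a precomputed number because its exact definition is given in Methods.

```python
from typing import Dict, Tuple


def select_rpf_length_range(
    cds_counts: Dict[int, int],
    min_length: int = 21,
    target_fraction: float = 0.85,
) -> Tuple[int, int]:
    """Grow a contiguous RPF length window, one nucleotide at a time,
    until it holds at least `target_fraction` of CDS-mapping reads.

    `cds_counts` maps read length (nt) to the number of CDS-mapping RPFs.
    """
    # Drop fragments shorter than the minimum length (21 nt in the caption).
    counts = {n: c for n, c in cds_counts.items() if n >= min_length and c > 0}
    if not counts:
        raise ValueError("no CDS-mapping reads at or above min_length")
    total = sum(counts.values())

    # Start from the single most abundant read length.
    lo = hi = max(counts, key=counts.get)
    kept = counts[lo]

    while kept / total < target_fraction:
        below = counts.get(lo - 1, 0)  # one nt shorter than the window
        above = counts.get(hi + 1, 0)  # one nt longer than the window
        if below == 0 and above == 0:
            break  # no adjacent length has reads; stop (assumption)
        # Add whichever neighbor has more reads; ties go to the longer one.
        if above >= below:
            hi += 1
            kept += above
        else:
            lo -= 1
            kept += below
    return lo, hi


def passes_qc(cds_fraction: float, coverage: float) -> bool:
    """QC pass as stated in the caption: more than 70% of RPFs on the CDS
    and at least 0.1x transcript coverage (coverage computed elsewhere)."""
    return cds_fraction > 0.70 and coverage >= 0.1


# Usage example with made-up counts (read length -> CDS-mapping RPFs).
example = {20: 500, 27: 50, 28: 300, 29: 400, 30: 200, 31: 30, 33: 20}
print(select_rpf_length_range(example))  # (28, 30)
```

With these toy counts the window starts at 29 nt, adds 28 nt and then 30 nt, and stops once 90% of the retained reads are covered.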
Extended Data Fig. 3 Assessment of periodicity and data matching for TE estimation.
a–d, Ribosome profiling experiments in RiboBase were classified according to distinct periodicity patterns (Methods). In all panels, error bars represent the standard deviation across samples. Statistical significance was assessed using the Wilcoxon test, and P values were adjusted for all 33 comparisons using the Benjamini–Hochberg method. We considered the Group 1 pattern to indicate the expected three-nucleotide periodicity. Human samples that pass quality control (a), human samples that fail quality control (b), mouse samples that pass quality control (c) and mouse samples that fail quality control (d). e, We calculated the coefficient of determination (R²) between each ribosome profiling experiment and its corresponding RNA-seq experiment in RiboBase. We also computed the average R² between the same ribosome profiling sample and all other RNA-seq data from the same study. The matching score is the difference between these two R² values (x-axis; Methods). f, The dashed line at 0.188 marks the threshold used to identify poorly matched samples (Methods). In each panel containing boxplots, the horizontal line corresponds to the median, the box represents the IQR and the whiskers extend to 1.5 times the IQR. g, Distribution of the standard error of TE values across tissues and cell lines (y-axis) for genes with and without polyA tails.
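The matching score in panel e is a simple difference of R² values. The sketch below makes several assumptions:

- Each experiment is represented as a vector of per-gene expression values, for example log-transformed CDS counts.
- R² is computed as the squared Pearson correlation of a simple linear fit.
- The names `matching_score` and `is_poorly_matched` are hypothetical.
- Reading scores below 0.188 as poor matches is our interpretation of panel f.

```python
from statistics import mean
from typing import Dict, Sequence

# Threshold from panel f; treating scores below it as poor matches is our
# reading of the caption, not a rule confirmed by the text.
POOR_MATCH_THRESHOLD = 0.188


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a simple linear fit of y on x (the squared Pearson r)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with >= 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance")
    return sxy * sxy / (sxx * syy)


def matching_score(
    ribo: Sequence[float],
    rna_by_sample: Dict[str, Sequence[float]],
    matched_id: str,
) -> float:
    """R^2 with the matched RNA-seq minus the mean R^2 with the other
    RNA-seq samples from the same study."""
    matched = r_squared(ribo, rna_by_sample[matched_id])
    others = [
        r_squared(ribo, values)
        for sample_id, values in rna_by_sample.items()
        if sample_id != matched_id
    ]
    if not others:
        raise ValueError("study has no other RNA-seq samples")
    return matched - mean(others)


def is_poorly_matched(score: float) -> bool:
    return score < POOR_MATCH_THRESHOLD


# Usage example with toy per-gene values for one study.
ribo = [1, 2, 3, 4, 5]
rna = {"matched": [1, 2, 3, 4, 5], "other1": [2, 1, 4, 3, 5], "other2": [1, 3, 2, 5, 4]}
score = matching_score(ribo, rna, "matched")
```

Here the matched pair has R² = 1.0 and each unmatched pair has R² = 0.64, so the score is about 0.36.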
Extended Data Fig. 4 Detailed workflow of data processing for TE and TEC calculations.
a, We selected ribosome profiling data with matched RNA-seq and removed duplicated reads with identical positions and lengths (PCR deduplication). We set the RPF read length range for each sample with our dynamic cutoff and filtered out ribosome profiling experiments that failed quality control. After selecting high-quality samples, we reprocessed all of these ribosome profiling experiments using the winsorization method on non-deduplicated data. We removed genes without polyA tails and kept genes with sufficient counts per million RPFs. After obtaining CDS read counts from both ribosome profiling and RNA-seq, we performed CLR normalization and compositional linear regression, defining the residuals as the TE of each gene in each sample. Sample-level TE values were then averaged by cell line and tissue, and TEC was calculated with rho scores (ref. 38). To build an RNA co-expression matrix, we CLR-transformed CDS counts from RNA-seq experiments, averaged them by cell line and tissue, and calculated pairwise proportionalities (rho scores).
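The TE and TEC steps of this workflow can be illustrated with a simplified sketch. It assumes per-gene CDS counts for one ribosome profiling sample and its matched RNA-seq, and it adds a pseudocount of 1 before the log (an assumption). In place of the compositional regression used in the paper, it fits ordinary least squares on CLR values. The `rho` function applies the proportionality formula 1 − var(x − y) / (var(x) + var(y)) directly to two genes' TE profiles across cell types. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio: log of each value minus the mean log value.
    The pseudocount for zero counts is an assumption of this sketch."""
    logs = [math.log(c + pseudocount) for c in counts]
    center = mean(logs)
    return [v - center for v in logs]


def te_residuals(
    ribo_counts: Sequence[float],
    rna_counts: Sequence[float],
    pseudocount: float = 1.0,
) -> List[float]:
    """Per-gene TE for one sample: residuals of an OLS fit of CLR(ribo) on
    CLR(RNA) across genes (a stand-in for compositional regression)."""
    x = clr(rna_counts, pseudocount)
    y = clr(ribo_counts, pseudocount)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("RNA CLR values have zero variance")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Proportionality rho = 1 - var(x - y) / (var(x) + var(y)) for two
    log-scale profiles, e.g. two genes' TE across cell types."""
    diff = [a - b for a, b in zip(x, y)]
    return 1.0 - pvariance(diff) / (pvariance(x) + pvariance(y))
```

In this sketch, rho equals 1 for identical profiles and −1 for perfectly mirrored ones. Computing it for every gene pair over cell-type-averaged TE gives a TEC-like matrix.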
Extended Data Fig. 5 Spearman correlation between TE and protein abundance.
a, Correlation between protein abundance and clr-transformed RPF counts from ribosome profiling (left), clr-transformed read counts from RNA-seq (middle) or TE calculated from winsorized RPF counts with the linear regression model (right). Each dot is one experiment, colored by study. The samples per cell line are:

- HEK293: 68 samples from 11 studies
- HeLa: 86 samples from 10 studies
- U2OS: 58 samples from 4 studies
- A549: 29 samples from 5 studies
- MCF7: 5 samples from 2 studies
- K562: 7 samples from 2 studies
- HepG2: 10 samples from 2 studies

In the boxplot, the horizontal line corresponds to the median, the box represents the IQR and the whiskers extend to 1.5 times the IQR. b, TE was calculated from winsorized RPF counts either without deduplication or with deduplication based on position and fragment length. The y-axis shows the Spearman correlation coefficient between protein abundance (ref. 82) and TE from non-deduplicated winsorized RPF counts. This is plotted against the delta correlation (x-axis), defined as the correlation obtained without deduplication minus the correlation obtained with PCR deduplication.
Extended Data Fig. 6 PCR vs UMI deduplication comparison for GSE144140.
a, Metagene plots centered on the start codon for samples GSM4282032 (RPF range: 28–36 nt), GSM4282033 (RPF range: 28–36 nt) and GSM4282034 (RPF range: 26–35 nt), generated with three deduplication methods: non-deduplication (ND), UMI deduplication (UMI) and PCR deduplication (PCR). b, Correlation of gene counts for GSM4282032 between the three deduplication methods. The blue diagonal line represents a 1:1 ratio in all panels. c,d, Same analysis as panel b for GSM4282033 (c) and GSM4282034 (d).
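The difference between the three deduplication modes can be shown with a toy read model:

- PCR deduplication keeps one read per identical position and length, as stated for the workflow in Extended Data Fig. 4.
- The UMI mode here assumes the key is position, length and UMI.

The `Read` type and the function names are hypothetical. Dedicated UMI tools usually also merge UMIs that differ by sequencing errors, which this sketch does not model.

```python
from typing import Callable, Hashable, Iterable, List, NamedTuple, Optional


class Read(NamedTuple):
    transcript: str
    start: int                # 5' alignment position on the transcript
    length: int               # fragment length in nt
    umi: Optional[str] = None


def _collapse(reads: Iterable[Read], key: Callable[[Read], Hashable]) -> List[Read]:
    """Keep the first read seen for each distinct key."""
    seen = set()
    kept = []
    for read in reads:
        k = key(read)
        if k not in seen:
            seen.add(k)
            kept.append(read)
    return kept


def pcr_deduplicate(reads: Iterable[Read]) -> List[Read]:
    # Reads with identical transcript, position and length count once.
    return _collapse(reads, lambda r: (r.transcript, r.start, r.length))


def umi_deduplicate(reads: Iterable[Read]) -> List[Read]:
    # Same key plus the UMI, so distinct molecules at one position survive.
    return _collapse(reads, lambda r: (r.transcript, r.start, r.length, r.umi))


# Usage example: one true PCR duplicate and one distinct molecule that
# happens to share its position and length with another read.
reads = [
    Read("tx1", 100, 30, "AAAA"),
    Read("tx1", 100, 30, "AAAA"),  # PCR duplicate
    Read("tx1", 100, 30, "CCCC"),  # different molecule, same position/length
    Read("tx1", 100, 31, "AAAA"),
]
```

Here ND keeps 4 reads, UMI keeps 3 and PCR keeps 2. This illustrates how position-and-length deduplication can also discard genuinely distinct fragments at highly occupied positions.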
Extended Data Fig. 7 Conservation of gene expression between human and mouse.
a, Relationship between the mean RNA expression (CLR-transformed counts) of 9,194 orthologous genes in the two species. Dots represent genes in all figure panels. b, The variability of each gene's RNA expression across cell lines and tissues in human or mouse was quantified with the metric standard deviation (msd; Methods). To account for the correlation between mean RNA expression and its variability, we adjusted the msd values for the corresponding mean values (Methods). c, Scatter plot of adjusted msd values (y-axis; Methods) against average TE across cell types (x-axis) for human genes. d, Same analysis as in panel c for mouse genes.
Extended Data Fig. 8 Evaluation of TEC calculation methods and TEC patterns.
a, AUROCs for biological functions calculated using the similarity scores among genes at the ribosome occupancy level, determined by eight distinct methods with 1,794 human ribosome profiling datasets (Methods). In the boxplot, the horizontal line corresponds to the median, the box represents the IQR, and the whiskers extend to the largest value within 1.5 times the IQR from the hinge. The dot represents the AUROC for human 5′ TOP mRNAs. b, TEC calculated after the TE values of each gene were randomly reassigned from the original data (shuffled). The panel plots the number of orthologous gene pairs within specified ranges; each dot represents the log10-transformed aggregate count of these gene pairs. The dashed line captures 95% of the data. c, Distribution of absolute TEC among 110 TOP motif-containing mRNAs123 and 83 transcripts targeted by CSDE1 (Supplementary Table 22 (ref. 47); Methods), compared with all 11,149 human genes as background. Statistical significance between groups was assessed using a two-sided Wilcoxon test.
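One simple way to turn gene-gene similarity scores into an AUROC for a biological function is to ask how well the scores rank pairs of annotated genes above pairs in which only one gene is annotated. The sketch below does this with the Mann-Whitney formulation of the AUROC (ties count as one half). The pairing scheme and the function names are assumptions for illustration and may differ from the procedure described in Methods.

```python
from itertools import combinations


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def go_term_auroc(similarity, genes, term_genes):
    """similarity maps frozenset({gene_a, gene_b}) to a score such as TEC (rho)."""
    term = set(term_genes)
    pos, neg = [], []
    for a, b in combinations(genes, 2):
        score = similarity[frozenset((a, b))]
        if a in term and b in term:
            pos.append(score)  # both genes share the annotation
        elif a in term or b in term:
            neg.append(score)  # only one gene is annotated
        # pairs with neither gene annotated are ignored
    return auroc(pos, neg)
```

An AUROC of 0.5 corresponds to random ranking, which is why values of 0.8 or higher are treated as strong functional signal in the following figure.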
Extended Data Fig. 9 TEC and RNA co-expression among genes with shared functions.
a, Comparison between the number of human GO terms with an AUROC of 0.8 or higher using either TEC or RNA co-expression. b, Motif enrichment in human GO terms. RNA-binding proteins (RBPs) from oRNAment94 or Transite95 are indicated. P values were corrected using the Holm method, and k-mers with an adjusted P value < 0.05 are shown. c, Venn diagram of mouse GO terms that achieve an AUROC of 0.8 or higher with proportionality scores (rho) among genes at either the TE or the RNA expression level. d, AUROC plot calculated with genes associated with mannosyltransferase activity in mouse. e, Connections among the genes from d represent absolute rho values above 0.1 in the TE pattern alone (green), in both RNA co-expression and the TE pattern (blue), or in RNA co-expression alone (gray). f, Motif enrichment in mouse GO terms. RBPs from oRNAment94 or Transite95 are indicated. P values were corrected using the Holm method, and k-mers with an adjusted P value < 0.05 are shown. g, Summary of mouse GO terms in which genes exhibit greater similarity at the TE level than at the RNA expression level (AUROC with TEC > 0.8 and a difference in AUROC between TEC and RNA co-expression > 0.1). The bottom panel shows the distribution of absolute rho scores at the TE level for gene pairs within each GO term (gene pairs with abs(rho) > 0.1).
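The Holm correction used for the motif-enrichment panels is a step-down procedure: the i-th smallest of m P values is multiplied by (m − i + 1), and the results are made non-decreasing and capped at 1. A minimal sketch, with the k-mer filter at 0.05 shown as a hypothetical helper (`significant_kmers`); the enrichment test that produces the raw P values is not reproduced here.

```python
def holm_adjust(pvalues):
    """Holm-adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 is the smallest P value
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def significant_kmers(kmer_pvalues, alpha=0.05):
    """Keep k-mers whose Holm-adjusted P value is below alpha."""
    kmers = list(kmer_pvalues)
    adjusted = holm_adjust([kmer_pvalues[k] for k in kmers])
    return {k: p for k, p in zip(kmers, adjusted) if p < alpha}
```

Holm controls the family-wise error rate like Bonferroni but is never more conservative, since only the smallest P value is multiplied by the full number of tests.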
Extended Data Fig. 10 3D structure of the interaction between LRRC28 and FOXK1.
a, AlphaFold2-multimer prediction of binding between LRRC28 and FOXK1. b,c, Kinetic ECAR response to 10 mM glucose and 100 mM 2-DG in MCF-7 cells (b; n = 6) and HEK293T cells (c; n = 6) stably overexpressing LRRC28 or LRRC42. Unpaired two-sided Student's t-test (MCF-7: measurement 4, P = 0.06; 5, P = 0.1; 6, P = 0.3. HEK293T: measurement 4, P = 0.6; 5, P = 0.8; 6, P = 0.4). Data in b and c are shown as mean ± s.d.; n indicates biologically independent experiments.
Supplementary information
Supplementary Information
Supplementary Information and Figs. 1–7.
Supplementary Tables 1–23
Twenty-three supplementary tables support the results in this paper; the data can also serve as source data for reproducing the analyses.
Supplementary Table 1: Curated metadata for human and mouse datasets in RiboBase.
Supplementary Table 2: Quality control of human ribosome profiling data.
Supplementary Table 3: Quality control of mouse ribosome profiling data.
Supplementary Table 4: Quality control of human RNA-seq data.
Supplementary Table 5: Quality control of mouse RNA-seq data.
Supplementary Table 6: RPF boundaries for human ribosome profiling data.
Supplementary Table 7: RPF boundaries for mouse ribosome profiling data.
Supplementary Table 8: Non-poly(A) gene list for human.
Supplementary Table 9: Non-poly(A) gene list for mouse.
Supplementary Table 10: Human linear regression-based TE.
Supplementary Table 11: Mouse linear regression-based TE.
Supplementary Table 12: Homologous genes between human and mouse.
Supplementary Table 13: AUROC values for human GO terms with either ribosome profiling or RNA-seq data.
Supplementary Table 14: AUROC values for mouse GO terms with either ribosome profiling or RNA-seq data.
Supplementary Table 15: Literature-supported predictions of genes exhibiting TEC with genes associated with specific human and mouse GO terms.
Supplementary Table 16: All predictions of new genes exhibiting TEC with genes associated with specific human GO terms.
Supplementary Table 17: All predictions of new genes exhibiting TEC with genes associated with specific mouse GO terms.
Supplementary Table 18: pDockQ and ipTM+PTM scores between human LRRC proteins and members of the forkhead family of transcription factors.
Supplementary Table 19: Two-sided chi-square test on the direction of similarity (same or different) among human gene pairs from GO terms and the STRING database.
Supplementary Table 20: gRNA sequences for RBP validation.
Supplementary Table 21: Primer sequences for human LRRC28 and LRRC42.
Supplementary Table 22: Human 5′ TOP mRNA list and CSDE1 target gene list.
Supplementary Table 23: Example comparing TE calculation using the canonical log-ratio method and the linear regression approach.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, Y., Rao, S., Hoskins, I. et al. Translation efficiency covariation identifies conserved coordination patterns across cell types. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02718-5
DOI: https://doi.org/10.1038/s41587-025-02718-5