Intermediate data for TE calculation

Liu, Yue

doi:10.5281/zenodo.10373032

Published December 13, 2023 | Version v1

Dataset Open

This dataset contains the intermediate data from RiboBase used to calculate translation efficiency (TE). The code that generates these files is available at https://github.com/CenikLab/TE_model.

We uploaded demo HeLa .ribo files. Because the full dataset requires a large amount of storage, we recommend contacting Dr. Can Cenik directly to request access to the complete version of RiboBase if you need the original data.

Detailed description of each file:

human_flatten_ribo_clr.rda: ribosome profiling clr normalized data with GEO GSM ids in columns and genes in rows in human.

human_flatten_rna_clr.rda: matched RNA-seq clr normalized data with GEO GSM ids in columns and genes in rows in human.

human_flatten_te_clr.rda: TE clr data with GEO GSM ids in columns and genes in rows in human.

human_TE_cellline_all_plain.csv: TE clr data with genes in rows and cell lines in columns in human.

human_RNA_rho_new.rda: matched RNA-seq proportional similarity data as genes by genes matrix in human.

human_TE_rho.rda: TE proportional similarity data as genes by genes matrix in human.

mouse_flatten_ribo_clr.rda: ribosome profiling clr normalized data with GEO GSM ids in columns and genes in rows in mouse.

mouse_flatten_rna_clr.rda: matched RNA-seq clr normalized data with GEO GSM ids in columns and genes in rows in mouse.

mouse_flatten_te_clr.rda: TE clr data with GEO GSM ids in columns and genes in rows in mouse.

mouse_TE_cellline_all_plain.csv: TE clr data with genes in rows and cell lines in columns in mouse.

mouse_RNA_rho_new.rda: matched RNA-seq proportional similarity data as genes by genes matrix in mouse.

mouse_TE_rho.rda: TE proportional similarity data as genes by genes matrix in mouse.
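Several of the files above hold clr (centered log-ratio) values. As a reminder of what that transform does, here is a minimal sketch on a made-up count matrix. It assumes strictly positive counts and applies the transform per sample (per column); the exact preprocessing used for this dataset (filtering, zero handling) lives in the TE_model repository and is not reproduced here. The names `counts` and `clr_transform` are illustrative only.

```r
# Toy count matrix: genes in rows, samples in columns (values are made up).
counts <- matrix(
  c(10, 20, 40,
    5, 5, 80,
    100, 10, 1),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("GSM_x", "GSM_y", "GSM_z"))
)

# clr for one sample: log of each value minus the mean of the logs in that sample.
# Assumes all values are > 0; zeros would need separate handling.
clr_transform <- function(m) {
  apply(m, 2, function(x) log(x) - mean(log(x)))
}

clr <- clr_transform(counts)

# Each sample's clr values sum to zero (up to floating-point error).
print(round(colSums(clr), 10))
```

Because every column is centered on its own mean log value, clr values are comparable across samples with different sequencing depths.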

All data passed quality control, leaving 1054 human samples and 835 mouse samples. The quality control criteria were:
* coverage > 0.1 X
* CDS percentage > 70%
* R2 between RNA and RIBO >= 0.188 (remove outliers)
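The thresholds above can be expressed as a simple filter. The sketch below uses a made-up per-sample table; the column names `coverage`, `cds_pct`, and `r2` are hypothetical and are not taken from RiboBase or the TE_model code.

```r
# Hypothetical per-sample QC table (all values invented for illustration).
qc <- data.frame(
  sample = c("GSM_a", "GSM_b", "GSM_c", "GSM_d"),
  coverage = c(0.5, 0.05, 0.3, 0.2),   # coverage in X
  cds_pct = c(85, 90, 60, 75),         # percentage of reads mapping to CDS
  r2 = c(0.40, 0.50, 0.30, 0.188),     # R2 between RNA-seq and ribosome profiling
  stringsAsFactors = FALSE
)

# Keep samples that satisfy all three criteria listed above.
passed <- subset(qc, coverage > 0.1 & cds_pct > 70 & r2 >= 0.188)

print(passed$sample)
```

In this toy table, GSM_b fails the coverage threshold and GSM_c fails the CDS percentage threshold, so only GSM_a and GSM_d remain.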

All ribosome profiling data here are non-deduplicated and winsorized, and are paired with RNA-seq data that are deduplicated and not winsorized. The file names include "flatten" only to keep a consistent naming format.

Code
To read the .rda files, use load("rdaname.rda") in R.
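Since load() restores objects under the names they were saved with, it can help to load into a separate environment and inspect what came back. The sketch below is self-contained: it writes a toy .rda file to a temporary path first, and the object name `toy` is made up. The object names stored in the dataset's own files are not assumed here.

```r
# Build a toy matrix and save it, standing in for one of the .rda files.
toy <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("GSM1", "GSM2", "GSM3"))
)
rda_path <- tempfile(fileext = ".rda")
save(toy, file = rda_path)

# load() returns the names of the restored objects; loading into a new
# environment avoids overwriting variables in the current workspace.
env <- new.env()
loaded_names <- load(rda_path, envir = env)
print(loaded_names)

# Retrieve the first restored object by name.
restored <- get(loaded_names[1], envir = env)
print(dim(restored))
```

For a real file, replace `rda_path` with the downloaded file's path, for example "human_flatten_te_clr.rda".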

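To calculate proportional similarity from clr data:
library(propr)
# clr_data: a clr matrix, e.g. one loaded from the .rda files above
# lr2rho is not exported by propr, hence the triple colon
human_TE_homo_rho <- propr:::lr2rho(as.matrix(clr_data))
# label rows and columns of the result with the row names of clr_data
rownames(human_TE_homo_rho) <- colnames(human_TE_homo_rho) <- rownames(clr_data)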
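For readers without propr installed, the rho proportionality metric can also be written out in base R from its published definition, rho(i, j) = 1 - var(x_i - x_j) / (var(x_i) + var(x_j)), applied to clr values. The sketch below is an illustration under stated assumptions, not the code used to build the rho files: it compares columns, the input is random toy data, and agreement with propr's output for a given matrix orientation has not been checked here. The function name `rho_from_lr` is made up.

```r
# Toy clr-like matrix: rows are observations, columns are the features to compare
# (an assumed layout; transpose your data if features are in rows).
set.seed(1)
lr <- matrix(rnorm(20), nrow = 5, dimnames = list(NULL, paste0("gene", 1:4)))

# rho_ij = 2 * cov(i, j) / (var(i) + var(j)),
# which is algebraically the same as 1 - var(i - j) / (var(i) + var(j)).
rho_from_lr <- function(lr) {
  cv <- cov(lr)               # feature-by-feature covariance matrix
  v <- diag(cv)               # per-feature variances
  2 * cv / outer(v, v, "+")   # divide by pairwise sums of variances
}

rho <- rho_from_lr(lr)
print(dim(rho))
```

The result is a symmetric feature-by-feature matrix with ones on the diagonal, matching the genes-by-genes layout described for the rho files.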

Files (4.2 GB)

Name	MD5	Size
human_flatten_ribo_clr.rda	b1a92cb1791955616a2e4d4de79e260b	34.1 MB
human_flatten_rna_clr.rda	dcb48f58b2a44f5a9aa17d440866d46f	33.8 MB
human_flatten_te_clr.rda	bae3ffdf91183c53cd5b8bab52aa840f	90.2 MB
human_RNA_rho_new.rda	3e54e62877adebb0b3fdc2d51ae41309	943.8 MB
human_TE_cellline_all_plain.csv	45c9336ecf4ea8d12b1c81bb9304af74	15.7 MB
human_TE_rho.rda	63f7230e0fcdf64d65a5f76dd11c0648	944.1 MB
mouse_flatten_ribo_clr.rda	7e3fc2071d65ef88a60744699f1d719a	26.4 MB
mouse_flatten_rna_clr.rda	1cf927f7d66409e25fce27327bfd977d	27.6 MB
mouse_flatten_te_clr.rda	065d0bbb0635f98e1f7cf68a75c8ae22	73.1 MB
mouse_RNA_rho_new.rda	98c3bb06ad803b0f76a115a2bbe10596	993.5 MB
mouse_TE_cellline_all_plain.csv	8cdfc2b233c46b9c704160055b913c4a	14.2 MB
mouse_TE_rho.rda	3991de0778b8210762f591406f819a75	993.1 MB
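Given the size of some of these files, it may be worth checking a download against the listed MD5 value. A minimal sketch using tools::md5sum from base R follows; the helper name `verify_md5` is made up, and the demo runs on an empty temporary file (whose MD5 is a well-known constant) so that it works without the dataset.

```r
# Compare a file's MD5 checksum with an expected hex string (case-insensitive).
verify_md5 <- function(path, expected) {
  actual <- unname(tools::md5sum(path))
  identical(tolower(actual), tolower(expected))
}

# Demo on an empty temporary file.
demo_file <- tempfile()
file.create(demo_file)
print(verify_md5(demo_file, "d41d8cd98f00b204e9800998ecf8427e"))
```

For a downloaded file, the call would look like verify_md5("human_flatten_ribo_clr.rda", "b1a92cb1791955616a2e4d4de79e260b").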
