There is a newer version of the record available.

Published August 6, 2023 | Version 1.0
Journal article Open

Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types

  • 1. Sanofi Pasteur Inc.
  • 2. Berlin Institute of Health at Charité - Universitätsmedizin Berlin
  • 3. Vavilov Institute of General Genetics

Description

The human genome contains millions of candidate regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing an extensive set of annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific ‘on switches’ providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.

Notes

  • best_model.zip: pre-trained model weights for MPRALegNet, trained on each of the three cell types examined
  • final_joint_dump.zip: pre-trained model weights for MPRALegNet, trained on the joint library tested in each of the three cell types examined
  • human_legnet-main.zip: GitHub code for training and testing MPRALegNet
  • sequence_cnn_models-master.zip: GitHub code for training and testing MPRAnn, as well as interpreting in silico mutagenesis scores

Files

Files (7.6 GB)

MD5 checksum                          Size
md5:e8d11cdd0710f6b4523da02059ace0b0  3.5 GB
md5:edf24033ce6adf8e3cb27e465730c329  4.0 GB
md5:d926b0c7b0f2a70c52e0b4ac3ca284cb  32.9 MB
md5:199b3c4fa89318ccf9a19cf86771eae8  1.5 MB
md5:513cb7678f4ae31634e501c6bcd10474  56.2 MB
md5:cf9fdf589d60f1d85244e9cd06aa8ae2  6.1 MB
md5:5d9b71452e9648e8473edd74a4ae26fc  8.3 MB
md5:ef967c8388d84da9ec81d22e11c868bd  3.6 MB
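
The md5 values listed above can be used to check that a downloaded archive arrived intact, which matters most for the multi-gigabyte files. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` are illustrative assumptions, not part of the record or its code. Because the listing does not pair checksums with file names, the path and checksum in the usage comment are placeholders.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 with an expected value.

    `expected` may be given with or without the 'md5:' prefix used in
    the file listing; comparison is case-insensitive.
    """
    expected_hex = expected.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected_hex


# Hypothetical usage (placeholder path and checksum, to be replaced with
# the downloaded file and the md5 value shown next to it on the record):
# ok = verify_download("path/to/downloaded_file.zip", "md5:<checksum>")
```

A mismatch most often indicates an interrupted or corrupted download, in which case fetching the file again is the usual remedy.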