WO2021058145A1 - Phage T7 promoters for boosting in vitro transcription
- Publication number
- WO2021058145A1 (application PCT/EP2020/066070)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rna
- sequence
- positions
- polynucleotide
- promoter
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6865—Promoter-based amplification, e.g. nucleic acid sequence amplification [NASBA], self-sustained sequence replication [3SR] or transcription-based amplification system [TAS]
Definitions
- the present invention relates to a DNA polynucleotide and a method for in vitro transcribing (IVT) RNA using said DNA polynucleotide.
- the DNA polynucleotide comprises (i) a T7 promoter sequence comprising (1) a T7 core promoter sequence and (2) a directly adjacent downstream flanking region, wherein the downstream flanking region comprises eight bases (+1 to +8) comprising three guanines at positions +1 to +3, an adenine or thymine at position +4, at least two adenines at positions +4 to +8 and/or a thymine at position +4, and at most one cytosine at positions +4 to +8, and downstream thereof (ii) a nucleotide sequence encoding the RNA to be transcribed.
- the method comprises (a) providing said DNA polynucleotide as an IVT template, and (b) in vitro transcribing said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates.
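The composition rule stated above for the eight-base downstream flanking region can be sketched as a small check. This is only an illustrative reading of the text, not the applicants' own procedure: the function name `satisfies_basic_rule` is hypothetical, the "and/or" in the rule is assumed to mean a logical OR, and only DNA letters (A, C, G, T) are accepted.

```python
def satisfies_basic_rule(flank: str) -> bool:
    """Check an 8-base downstream flanking region (+1 to +8) against the rule above.

    Assumption: "at least two adenines at +4 to +8 and/or a thymine at +4"
    is read as a logical OR of the two conditions.
    """
    flank = flank.upper()
    if len(flank) != 8 or set(flank) - set("ACGT"):
        return False
    tail = flank[3:]  # positions +4 to +8
    return (
        flank[:3] == "GGG"                                # guanines at +1 to +3
        and tail[0] in "AT"                               # adenine or thymine at +4
        and (tail.count("A") >= 2 or tail[0] == "T")      # adenine count, or thymine at +4
        and tail.count("C") <= 1                          # at most one cytosine at +4 to +8
    )


if __name__ == "__main__":
    for candidate in ("GGGATAAT", "GGGAGAGT", "GGGGTTCA", "GGGATACC"):
        print(candidate, satisfies_basic_rule(candidate))
```

Under this reading, GGGATAAT and GGGAGAGT pass, while GGGGTTCA (guanine at +4) and GGGATACC (two cytosines at +4 to +8) do not.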
- RNAs have been considered as the major determinant for converting genomic information from DNA into biological function.
- research has been focusing on the transcriptome of cells including RNAs that are translated into protein as well as regulatory RNAs.
- Unraveling RNA biology and the biology of diseases associated with RNA dysregulation requires detection, quantification, and characterization of different types of RNAs.
- RNAs exist with strongly varying copy numbers per cell and many approaches require a minimum amount of a given target RNA for further investigation.
- approaches are of interest for amplification of RNAs that maintain the relative abundances of different RNAs present in the original cell or tissue.
- IVT using a T7 promoter has been applied for low-amount DNA sequencing using linear amplification via transposon insertion (LIANTI) that uses the Tn5 transposon to fragment the genome and simultaneously insert a T7 RNA promoter for in vitro transcription (IVT) (Chen, C.
- CRISPR RNAs or (single) guide RNAs can be used for disrupting or restoring the sequence of target genes, and antisense RNAs and/or small-interfering RNAs can be used for translation inhibition of target mRNAs.
- the T7 promoter refers to a 23-nucleotide-long sequence comprising a transcription start site (TSS) denoted position +1.
- TSS transcription start site
- the T7 RNA polymerase binds the promoter DNA from nucleotide position -17 to -5 with high specificity, while the DNA double strand is melted from -4 to +2 to prime RNA synthesis from a GTP nucleotide at +1.
- sequences of the T7 core promoter, in particular the region from -17 to -1, and the first three transcribed nucleotides (+1 to +3) can affect the transcriptional activity, i.e. the promoter strength (defined by relative production of transcripts from a promoter; Ikeda, Biol. Chem. 267, 11322-8, 1992), and that sequences upstream of the T7 core promoter positively affect T7 polymerase binding affinity to an IVT DNA template (Tang et al., Biol. Chem. 280, 40707-13, 2005).
- the +1 and +2 nucleotides as guanines (e.g.
- the present application addresses the need for optimized T7 promoter sequences and methods using said promoter sequences for enhanced amplification of antisense RNA (aRNA) and/or improved in vitro transcription of RNAs by providing the embodiments as recited in the claims.
- aRNA antisense RNA
- the present invention relates in preferred aspects to a DNA polynucleotide, a kit comprising said DNA polynucleotide, and a method for in vitro transcribing (IVT) RNA using said DNA polynucleotide.
- said DNA polynucleotide comprises (i) a T7 promoter sequence comprising (1) a T7 core promoter sequence and (2) a directly adjacent downstream flanking region, and downstream thereof (ii) a nucleotide sequence encoding the RNA to be transcribed.
- the downstream flanking region of said DNA polynucleotide comprises eight bases (+1 to +8) comprising {three guanines at positions +1 to +3 (GGG)},
- the method comprises the steps of (a) providing said DNA polynucleotide as an IVT template, and (b) in vitro transcribing the RNA from said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates.
- the present invention relates in a further preferred aspect to a method for determining the partial or full nucleotide sequence(s) of an RNA and/or the transcript level of at least one gene, wherein the method comprises the steps of (a) reverse transcribing an RNA to the first strand of a cDNA comprising annealing a DNA polynucleotide according to the present invention, and (b) transcribing the cDNA into RNA using a T7 polymerase.
- T7 promoters comprising
- An optimized T7 promoter according to the present invention can be readily applied in various IVT-based methods. It is especially advantageous when aiming at boosting linear amplification of nucleic acids.
- the T7 promoters of the invention comprising a defined +1 to +8 downstream flanking region ensure efficient amplification of target polynucleotides in a sample while avoiding a bias in IVT efficiency between different target polynucleotides.
- Optimized T7 promoters can further be used for increasing the efficiency of the synthesis of RNA in vitro, e.g. to produce large amounts of unmodified or modified RNA or recombinant proteins, preferably for therapeutic applications.
- An optimized T7 promoter is also highly valuable for IVT-based RNA-sequencing methods such as CEL-seq2, inDrop and MARS-seq as especially linear amplification of antisense RNA (aRNA) can be boosted and/or performed more accurately by using a reverse transcription primer comprising the optimized T7 promoter according to the invention instead of a commercially available reverse transcription primer.
- employing an inventive T7 promoter provided herein in a single-cell RNA-seq method allows the detection of more gene transcripts, i.e. mRNA of lowly expressed genes, for example of important cell state determinants such as transcription factors and chromatin modifiers.
- an inventive T7 promoter provided herein, which is especially advantageous, for example, in case of analyzing transcriptomes on a single-cell level and/or detecting polynucleotides in liquid biopsies, i.e. in case raw material such as circulating tumor cells (CTC) is scarce and/or precious (Lawson et al., Nat. Cell Biol. 20, 1349-1360 (2018)).
- an optimized T7 promoter is also highly valuable for IVT-based DNA-sequencing methods such as LIANTI, CHIL-Seq, Dam-seq (e.g. scDam&T-seq) or the sci-L3 method.
- the present invention relates to a DNA polynucleotide, comprising a T7 promoter comprising a T7 core promoter sequence and a directly adjacent downstream flanking region, wherein the downstream flanking region comprises eight bases (+1 to +8) comprising {three guanines at positions +1 to +3},
- brackets, i.e. “{ }”, “[ ]” and “< >”, are used to specify relationships and layers within rules defined herein, with terms and expressions being grouped within an opening and a closing bracket of the same type, e.g. “{ }”.
- brackets are to be understood as auxiliary means only for visually simplifying specific relationships and layers within rules defined herein.
- polynucleotide refers to a polynucleotide of at least 13 nucleotides covalently bonded in a chain and thus, representing a sequence of nucleotides.
- nucleotide refers to deoxyribonucleotides in case of DNA and cDNA (complementary DNA) molecules and to ribonucleotides in case of RNA molecules.
- Nucleotides consist of a nitrogenous base (also known as nucleobase), a five-carbon sugar (ribose in case of RNA or deoxyribose in case of DNA and cDNA), and at least one phosphate group.
- Nucleotides comprising as the respective nitrogenous base adenine, cytosine, guanine, thymine, and uracil are referred to as A, C, G, T and U nucleotides, respectively. Said nucleotides are preferred, though the term nucleotides can further comprise nucleotide analogues, such as peptide nucleic acids, morpholinos and/or locked nucleic acids, and/or modified nucleotides. Modified nucleotides include for example nucleotides comprising methylated, formylated or hydroxylated nucleobases.
- modified or analogous nucleobases include for example 5-methyl-cytosine, 5-hydroxymethyl-cytosine, N6-methyladenine, 7-methylguanine, 2-aminopurine, pyrrolo-dC, 5-bromouracil, and hypoxanthine.
- polynucleotide refers to single- or double-stranded DNA (deoxyribonucleic acid) or RNA (ribonucleic acid) molecule built up of A, C, G, T and/or U nucleotides and/or modified versions thereof.
- RNA molecules include, but are not limited to, mRNAs, lncRNAs, circular RNAs, miRNAs, snoRNAs, tRNAs, snRNAs, siRNAs, crRNAs and sgRNAs.
- adenine refers not only to the respective nucleobase, but also to the respective nucleotide or nucleoside.
- an “adenine”, “A” or “adenosine” refers to an “adenine”, “adenosine”, “adenosine triphosphate” (“ATP”), “adenosine diphosphate” (“ADP”) or “adenosine monophosphate” (“AMP”).
- the polynucleotide of the invention is a polynucleotide, preferably a DNA polynucleotide, more preferably a double-stranded DNA polynucleotide.
- a single-stranded polynucleotide consists of one nucleotide sequence and a double-stranded polynucleotide of two complementary nucleotide sequences.
- the term “complementary” refers to nucleotides, the nitrogenous bases of which can naturally bind to each other by hydrogen bonds, i.e. A and T nucleotides, A and U nucleotides as well as C and G nucleotides. Both a nucleotide and a nucleotide sequence have a 5’ (five prime) end and a 3’ (three prime) end based on the five carbon sites of sugar molecules in adjacent nucleotides.
- a nucleotide sequence is read herein from its respective 5’ end to its respective 3’ end, wherein the term “downstream” refers to a nucleotide which is in 3’ direction of a reference nucleotide.
- the term “upstream” refers to a nucleotide which is in 5’ direction of a reference nucleotide within a respective nucleotide sequence.
- a nucleotide which is located downstream of a reference nucleotide within a respective nucleotide sequence can be directly, i.e. covalently, linked to the 3’ end of said reference nucleotide, which is referred to herein as “directly adjacent”.
- a nucleotide which is located upstream of a reference nucleotide within a respective nucleotide sequence can be directly, i.e. covalently linked to the 5’ end of said reference nucleotide, which is also referred to herein as “directly adjacent”.
- the polynucleotide, preferably the DNA polynucleotide, according to the present invention comprises a T7 promoter sequence comprising a T7 core promoter sequence and a directly adjacent downstream flanking region.
- promoter and “promoter sequence” are used interchangeably herein and refer to a nucleotide sequence required for initiation of transcription of a target nucleotide sequence by DNA-binding enzymes such as RNA polymerase, transcription factors and/or epigenetic modifiers, e.g. histone acetylases, histone methylases or DNA-methylases.
- the promoter is a T7 bacteriophage (Escherichia virus T7) promoter
- the respective enzyme(s) binding to it preferably comprise a T7 RNA polymerase.
- the T7 promoter and the T7 RNA polymerase are derived from a T7 bacteriophage and thus, may comprise the sequence of the naturally occurring T7 bacteriophage core promoter (e.g. SEQ ID NO:1) and RNA polymerase, respectively, and/or may comprise modifications compared to their respective naturally occurring sequence.
- the T7 RNA polymerase may also be a recombinant protein and/or comprise altered post-translational modifications.
- T7 promoter is used interchangeably with the term “T7 promoter sequence” and refers to a nucleotide sequence comprising a transcription start site (TSS) which is denoted as position +1. Positions downstream the TSS are denoted herein with a positive sign (“+”) and positions upstream of the TSS with a negative sign (“-”).
- the T7 promoter sequence comprises at least a T7 core promoter sequence and a directly adjacent downstream flanking region that is directly, i.e. covalently, linked to the 3’ end of said T7 core promoter sequence.
- T7 core promoter sequence refers to the nucleotide sequence upstream of the TSS, more specifically to the nucleotide sequence from positions -17 to -1, and has the nucleotide sequence TAATACGACTCACTATA (SEQ ID NO:1).
- the downstream flanking region comprises eight positions (positions +1 to +8 of the T7 promoter sequence) that have a nucleotide sequence composed according to certain rules.
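The position numbering described above (negative positions upstream of the TSS, positive positions from the TSS onward, no position 0) can be made concrete with a short sketch. The helper name `position_map` is hypothetical, and the example flanking region GGGATAAT is simply one of the sequences named elsewhere in the text.

```python
from typing import Dict

# T7 core promoter, positions -17 to -1, as given in the text.
T7_CORE = "TAATACGACTCACTATA"


def position_map(flank: str, core: str = T7_CORE) -> Dict[int, str]:
    """Map promoter positions to bases: core at -17..-1, flanking region at +1..+8."""
    positions = {}
    for i, base in enumerate(core):
        positions[i - len(core)] = base      # -17, -16, ..., -1
    for i, base in enumerate(flank, start=1):
        positions[i] = base                  # +1 (the TSS), +2, ...
    return positions


if __name__ == "__main__":
    pm = position_map("GGGATAAT")
    for pos in sorted(pm):
        print(f"{pos:+d}\t{pm[pos]}")
```

With this convention the core promoter occupies keys -17 to -1, the TSS is key 1, and the downstream flanking region ends at key 8.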
- the downstream flanking region comprises
- the TSS is preferably a G nucleotide.
- the two nucleotides at positions +2 and +3 are preferably G nucleotides.
- the number and concrete position of certain nucleotides at positions +4 to +8 influence the activity of the T7 promoter.
- at least three adenines or thymines at positions +4 to +8 are advantageous for enhancing the activity of the T7 promoter.
- the nucleotides at positions +4 to +7 and especially at position +4 have a strong impact on the performance of the T7 promoter.
- the absence of a cytosine is preferred within positions +4 to +7, in particular within positions +4 to +6 and/or in particular when the cytosine directly follows a thymine and/or directly precedes a guanine.
- sequences of covalently linked thymines e.g. TT, TTT or TTTT
- the combination of rules can result in an increased effect compared to sequences according to only one of the respective rules.
- a negative effect of sequences of covalently linked thymines on T7 promoter activity is reduced in the absence of a guanine or cytosine at positions +4 to +8 or the presence of an adenine at position +4.
- a negative effect of a cytosine within positions +4 to +6 on T7 promoter activity is reduced when at least three adenines are present at positions +4 to +8.
- a guanine at position +5 has a positive effect on T7 promoter activity in combination with an adenine or thymine at position +4 and/or an adenine at position +6.
- GGGAGA and GGGTGA at positions +1 to +6 can have positive effects on T7 promoter activity despite the presence of a guanine within positions +4 to +8.
- the downstream flanking region further comprises at least two adenines at positions +4 to +8.
- downstream flanking region further comprises
- the downstream flanking region does not comprise three consecutive thymines.
- downstream flanking region does not comprise three consecutive thymines within positions +4 to +8. In certain embodiments, the downstream flanking region does not comprise
- the downstream flanking region does not comprise two consecutive thymines within positions +4 to +7.
- the downstream flanking region does not comprise a thymine followed by a cytosine within positions +4 to +7.
- the downstream flanking region does not comprise a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8.
- the downstream flanking region does not comprise a cytosine within positions +4 to +6.
- the downstream flanking region does not comprise a cytosine followed by a guanine within positions +4 to +7.
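The exclusion rules listed above lend themselves to a simple motif scan. The sketch below is illustrative only: the names `window` and `excluded_motifs` are hypothetical, each rule is treated as an independent embodiment, a motif is assumed to count only if all of its bases lie inside the stated window, and the conditional cytosine rule (cytosine tolerated when at least three adenines are present) is not modelled.

```python
from typing import List


def window(flank: str, start: int, end: int) -> str:
    """Return the bases at positions +start to +end (1-based, inclusive)."""
    return flank[start - 1:end]


def excluded_motifs(flank: str) -> List[str]:
    """List which of the exclusion rules above an 8-base flanking region violates."""
    flank = flank.upper()
    checks = [
        ("TTT within +4..+8", "TTT" in window(flank, 4, 8)),
        ("TT within +4..+7", "TT" in window(flank, 4, 7)),
        ("TC within +4..+7", "TC" in window(flank, 4, 7)),
        ("C within +4..+6", "C" in window(flank, 4, 6)),
        ("CG within +4..+7", "CG" in window(flank, 4, 7)),
    ]
    return [name for name, hit in checks if hit]


if __name__ == "__main__":
    for candidate in ("GGGATAAT", "GGGAGTTC", "GGGGTTCA"):
        print(candidate, excluded_motifs(candidate))
```

An empty list means that none of these particular exclusions applies; it does not by itself establish that a sequence satisfies the positive composition rules.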
- the downstream flanking region comprises {[at least three adenines or thymines at positions +4 to +8] and/or
- cytosine or guanine is present at positions +4 to +8> or <a thymine, cytosine or guanine is present at position +4>]},
- downstream flanking region further comprises an adenine at position +4.
- the downstream flanking region has the sequence GGGAGAGT.
- the downstream flanking region further comprises at most one guanine or cytosine at positions +4 to +8.
- an adenine at position +4 improves the T7 promoter strength compared to any other nucleotide at this position. It was further found that downstream sequences are advantageous if they do not comprise a guanine within positions +4 to +8, except where the guanine is covalently linked to an adenine in the context of an AGA (or TGA) nucleotide sequence within positions +4 to +6. Thus, more than one guanine or cytosine within positions +4 to +8 is preferably avoided for increasing the activity of the T7 promoter.
- downstream flanking region comprises
- cytosine or guanine is present at positions +4 to +8> or <a thymine, cytosine or guanine is present at position +4>]},
- downstream flanking region further comprises {an adenine at position +4} and {at most one guanine or cytosine at positions +4 to +8}.
- the downstream flanking region is selected from the group consisting of GGGAAATA, GGGAAAAT, GGGAATAT, GGGATAAT, GGGAGAAT, GGGAATAC, GGGAAGTA, GGGAGATT, GGGAGATA, GGGAAATG, GGGAAAAC and GGGAAAGT.
- position +8 has the smallest effect on the T7 promoter activity among positions +4 to +8.
- the downstream flanking region is selected from the group consisting of GGGAAATN, GGGAAAAN, GGGAATAN, GGGATAAN, GGGAGAAN, GGGAATAN, GGGAAGTN, GGGAGATN, GGGAGATN, GGGAAATN, GGGAAAAN and GGGAAAGN; wherein N denotes A, C, G, T or U.
- the downstream flanking region comprises {three guanines at positions +1 to +3},
- <a cytosine or guanine is present at positions +4 to +8> or
- the downstream flanking region comprises {three guanines at positions +1 to +3}
- the downstream flanking region does not comprise a thymine at position +5 followed by a guanine at position +6.
- the downstream flanking region does not comprise two consecutive thymines within positions +4 to +8.
- the downstream flanking region does not comprise {a thymine at position +5 followed by a guanine at position +6} and {two consecutive thymines within positions +4 to +8}.
- downstream flanking region comprises
- downstream flanking region further comprises
- the downstream flanking region is selected from the group consisting of GGGAAATA, GGGAAAAT, GGGAATAT, GGGATAAT, GGGAGAAT, GGGAATAC and GGGAAGTA.
- the downstream flanking region is selected from the group consisting of GGGAAATN, GGGAAAAN, GGGAATAN, GGGATAAN, GGGAGAAN, GGGAATAN and GGGAAGTN; wherein N denotes A, C, G, T or U.
- the downstream flanking region comprises {three guanines at positions +1 to +3},
- the downstream flanking region comprises {three guanines at positions +1 to +3},
- the downstream flanking region further comprises an adenine at position +4, three to four adenines at positions +4 to +8, and no guanines or cytosines at positions +4 to +8.
- downstream flanking region comprises
- downstream flanking region further comprises {an adenine at position +4} and
- downstream flanking region does not comprise
- downstream flanking region further comprises
- downstream flanking region has the sequence GGGATAAT.
- the downstream flanking region has a sequence selected from the group consisting of GGGADDDN, wherein D denotes A, G or T/U, and wherein N denotes A, C, G, T or U.
- the sequence of the downstream flanking region is selected from the group consisting of GGGADDWH, preferably from GGGAWWWW, wherein D denotes an A, G or T/U, wherein W denotes A or T/U, and wherein H denotes A, C or T/U.
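The degenerate patterns above (GGGADDDN, GGGADDWH, GGGAWWWW) each stand for a finite set of concrete sequences. As a minimal sketch, they can be enumerated by expanding the degenerate letters as defined in the text (D = A, G or T; W = A or T; H = A, C or T; N = A, C, G or T). The function name `expand` is hypothetical, and only the DNA alphabet is expanded, so the T/U alternative is represented by T.

```python
from itertools import product
from typing import List

# Degenerate letters as defined in the text, restricted to the DNA alphabet.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "AGT",   # A, G or T
    "W": "AT",    # A or T
    "H": "ACT",   # A, C or T
    "N": "ACGT",  # any base
}


def expand(pattern: str) -> List[str]:
    """Enumerate every concrete DNA sequence matched by a degenerate pattern."""
    choices = (DEGENERATE[letter] for letter in pattern.upper())
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    for pattern in ("GGGADDDN", "GGGADDWH", "GGGAWWWW"):
        print(pattern, len(expand(pattern)))
```

The three patterns expand to 108, 54 and 16 sequences respectively, and every GGGAWWWW sequence is also a GGGADDWH sequence.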
- the downstream flanking region comprises three guanines at positions +1 to +3, an adenine at position +4, three to four adenines at positions +4 to +8, no guanine or cytosine at positions +4 to +8, and no consecutive thymines within positions +4 to +8.
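Because this stricter rule is fully specified, the sequences that satisfy it can be enumerated by brute force. The sketch below is an illustration under stated assumptions, not the applicants' selection procedure: the function name `strict_candidates` is hypothetical, "no consecutive thymines" is read as "no TT anywhere in positions +4 to +8", and the adenine at +4 is counted toward the three to four adenines.

```python
from itertools import product
from typing import List


def strict_candidates() -> List[str]:
    """Enumerate 8-base flanking regions meeting the stricter rule above."""
    hits = []
    for rest in product("ACGT", repeat=4):       # positions +5 to +8
        tail = "A" + "".join(rest)               # adenine fixed at +4
        if (
            tail.count("A") in (3, 4)            # three to four adenines at +4 to +8
            and "G" not in tail                  # no guanine at +4 to +8
            and "C" not in tail                  # no cytosine at +4 to +8
            and "TT" not in tail                 # no consecutive thymines
        ):
            hits.append("GGG" + tail)            # guanines at +1 to +3
    return hits


if __name__ == "__main__":
    print(strict_candidates())
```

Under these assumptions the enumeration yields seven sequences, among them GGGAAATA, GGGAAAAT, GGGAATAT and GGGATAAT, which are also named individually in the text.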
- the downstream flanking region has the sequence GGGAAATA (see the T7 promoter sequence as set forth in SEQ ID NO:21), GGGAAAAT (see the T7 promoter sequence as set forth in SEQ ID NO:23), GGGAATAT (see the T7 promoter sequence as set forth in SEQ ID NO:25), GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO:17), or GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO:19).
- the downstream flanking region has the sequence GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO: 17) or GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO: 19), more preferably GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO: 17).
- the downstream flanking region is selected from the group consisting of GGGAAATN, GGGAAAAN, GGGAATAN, GGGATAAN, and GGGAGAGN; wherein N denotes A, C, G, T or U.
- the length of the primer may be as small as possible.
- the T7 promoter of the invention may be used for improving a pre-existing protocol, it may be preferable to modify the pre-existing protocol as little as possible while profiting from the enhanced T7 promoter activity according to the invention.
- the downstream flanking region of the inventive T7 promoter may overlap with the sequencing adapter further downstream thereof.
- the pre-existing sequencing adapter may be part of the downstream flanking region, i.e. at positions +6 to +8 or +7 to +8.
- the downstream flanking region further comprises the sequence AGT at positions +6 to +8, or the sequence AG at positions +7 to +8, preferably AGT at positions +6 to +8.
- this rule may be combined with other rules for the composition of the downstream flanking region, as provided herein, as far as there is no conflict.
- T7 promoters with such a downstream flanking region (i.e. with AGT at positions +6 to +8) may be particularly useful in some embodiments of the inventive sequencing methods provided herein.
- the downstream flanking region has the sequence GGGAAAGT, GGGTAAGT, GGGATAGT, GGGTGAGT, or GGGAGAGT, preferably GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO: 19).
- the downstream flanking region has the sequence GGGAAAAG, GGGAATAG, GGGATAAG, GGGTGAAG, GGGTAAAG, GGGAGAAG, GGGTGTAG, GGGAGTAG or GGGATTAG.
- the downstream flanking region has the sequence GGGAAATA (see the T7 promoter sequence as set forth in SEQ ID NO:21), GGGAAAAT (see the T7 promoter sequence as set forth in SEQ ID NO:23), GGGAATAT (see the T7 promoter sequence as set forth in SEQ ID NO:25), GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO:17), GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO:19), GGGAAAGT, GGGTAAGT, GGGATAGT or GGGTGAGT, preferably GGGAGAGT.
- the T7 promoter of the invention is selected primarily based on the 5’RACE-seq rank (Table 1).
- the sequence of the downstream flanking region is selected from the group consisting of the “Pos. +1 to +8” sequences in Table 1 of ranks 1 to 510, ranks 1 to 176, ranks 1 to 132 or ranks 1 to 66, preferably ranks 1 to 66.
- the sequence of the downstream flanking region is selected from the group consisting of the “Pos. +1 to +8” sequences in Table 1 of ranks 1 to 64, ranks 1 to 48, ranks 1 to 32, ranks 1 to 24, or ranks 1 to 12.
- sequence of the downstream flanking region does not have the sequence GGGGTTCA, GGGAGTTC or GGGATACC. In some embodiments, the sequence of the downstream flanking region does not have the sequence GGGTAGAT.
- nucleotide sequences comprising AATT, in particular GAATT, directly adjacent upstream of the T7 core promoter further enhance the T7 promoter strength, especially in combination with a downstream flanking region according to the present invention.
- Further increasing the strength of an optimized T7 promoter sequence comprising a downstream flanking region as shown above by combining it with an upstream flanking region according to the present invention is particularly advantageous for approaches based on scarce and/or precious raw material (e.g. liquid biopsies) such as in case of in vitro transcription (IVT) when the concentration of the IVT template is low, e.g. very low as described herein, and/or analyses are performed, for example, on a single-cell level, or on less than 100 cells, preferably less than 10 cells, very preferably a single cell.
- upstream flanking region refers to the sequence directly adjacent upstream of the T7 core promoter sequence and thus, covalently bound to its 5’ end.
- the upstream flanking region comprises 4 or 5 nucleotides, preferably 5 nucleotides.
- the upstream flanking region has the sequence AATT, GAATT, GATTT or GAAAT, preferably AATT or GAATT, even more preferably GAATT (see e.g. the T7 promoter sequences as set forth in SEQ ID NO: 18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26).
- the T7 promoter further comprises four bases consisting of adenines and/or thymines directly upstream of the T7 core promoter.
- it further comprises a guanine directly upstream of said adenines and/or thymines.
- the T7 promoter comprises a downstream flanking region of the invention and an upstream flanking region of the invention.
- the T7 promoter further comprises the sequence AATT directly upstream of the T7 core promoter sequence.
- the T7 promoter further comprises the sequence GAATT directly upstream of the T7 core promoter sequence.
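Putting the pieces together, a promoter with both flanking regions is the concatenation of the upstream flanking region, the core promoter and the downstream flanking region, read 5’ to 3’. The following sketch assumes exactly that; the function name `assemble_promoter` is hypothetical, and the upstream check encodes only the description given above (four adenines and/or thymines, optionally preceded by one guanine). It is not claimed to reproduce any specific SEQ ID NO from the sequence listing.

```python
import re

# T7 core promoter, positions -17 to -1, as given in the text.
T7_CORE = "TAATACGACTCACTATA"


def assemble_promoter(upstream: str, flank: str, core: str = T7_CORE) -> str:
    """Concatenate upstream flanking region, core promoter and downstream flanking region."""
    upstream, flank = upstream.upper(), flank.upper()
    # Four A/T bases directly upstream of the core, optionally preceded by a G.
    if not re.fullmatch(r"G?[AT]{4}", upstream):
        raise ValueError(f"unexpected upstream flanking region: {upstream!r}")
    if len(flank) != 8 or not flank.startswith("GGG"):
        raise ValueError(f"unexpected downstream flanking region: {flank!r}")
    return upstream + core + flank


if __name__ == "__main__":
    promoter = assemble_promoter("GAATT", "GGGATAAT")
    print(promoter, len(promoter))
```

With a five-base upstream region the assembled promoter is 30 nucleotides long (5 + 17 + 8).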
- the DNA polynucleotide according to the present invention can further comprise downstream of the T7 core promoter sequence a nucleotide sequence encoding an mRNA or non-coding RNA, preferably an mRNA, antisense RNA, siRNA, sgRNA, guide RNA or CRISPR RNA.
- the terms “encoding” or “encode” refer to genetic information comprised in a DNA sequence that can be transcribed into an RNA molecule, i.e. that can be used as a template for synthesis of an RNA molecule.
- nucleotide sequence preferably refers to the strand that comprises the coding strand.
- a coding strand comprises a sequence identical to the RNA it encodes.
- a sequence is identical to another sequence if each nucleotide at each position is the same, except that a thymine can be in place of a uracil and a uracil can be in place of a thymine according to the nucleotides comprised in DNA and RNA, respectively.
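The relationship between a coding strand and the RNA it encodes, and the notion of identity up to T/U exchange, can be illustrated in a few lines. The helper names `to_rna` and `identical` are hypothetical, and the sketch assumes plain upper- or lower-case sequence strings without modified nucleotides.

```python
def to_rna(coding_strand: str) -> str:
    """Write the RNA encoded by a DNA coding strand: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def identical(seq_a: str, seq_b: str) -> bool:
    """Compare two sequences position by position, treating T and U as equivalent."""
    return to_rna(seq_a) == to_rna(seq_b)


if __name__ == "__main__":
    flank = "GGGATAAT"   # transcription starts at +1, so the RNA begins with this region
    print(to_rna(flank))
    print(identical(flank, "GGGAUAAU"))
```

Since the TSS is position +1, the downstream flanking region forms the first eight nucleotides of the transcript in this picture.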
- the encoded RNA can be an in-vitro transcribed RNA according to the invention, in particular an mRNA or a non-coding RNA.
- mRNA molecule and “mRNA” are used interchangeably and refer to a class of RNA molecules, preferably single-stranded sequences built up of A, C, G, and U nucleotides that contain one or more coding sequences. Said one or more coding sequences can be used for example as a translation template during synthesis of an amino acid sequence and thus, for converting genomic information stored in an mRNA molecule into an amino acid sequence.
- mRNA should be understood to mean any RNA molecule which is suitable for the expression of an amino acid sequence or which is translatable into an amino acid sequence such as a protein.
- an mRNA comprises preferably a 5’ UTR, a coding sequence and a 3’ UTR.
- the 5’ end of the 5’ UTR is defined by the TSS and its 3’ end is followed by the coding sequence.
- the coding sequence is delineated by the start and the stop codon, i.e. the first and the last three nucleotides of the mRNA that can be translated, respectively.
- the 3’ UTR starts after the stop codon of the coding sequence and can be followed by a poly A tail.
- the 5’ UTR generally comprises at least one ribosomal binding site (RBS) such as the Shine-Dalgarno sequence in prokaryotes and the Kozak sequence or translation initiation site in eukaryotes.
- RBS ribosomal binding site
- the activity of a given RBS can be optimized by varying its length and sequence as well as its distance to the start codon.
- the 5’ UTR comprises internal ribosome entry sites or IRES.
- a 5’ UTR can comprise one or more additional regulatory sequences such as a binding site for an amino acid sequence that enhances the stability of the mRNA, a binding site for an amino acid sequence that enhances the translation of the mRNA, a regulatory element such as a riboswitch, a binding site for a regulatory RNA molecule such as an miRNA, and/or a nucleotide sequence that positively affects initiation of translation.
- the mRNA may optionally comprise a 3’ UTR.
- the 3’ UTR may comprise one or more regulatory sequences such as a binding site for an amino acid sequence that enhances the stability of the mRNA molecule, a binding site for a regulatory RNA molecule such as an miRNA, and/or a signal sequence involved in intracellular transport of the mRNA.
- the coding sequence of an mRNA comprises codons that can be translated into an amino acid sequence.
- the coding sequence can contain the codons of a naturally occurring coding sequence or it can be a partially or completely synthetic coding sequence. Alternatively, the coding sequence can be a partly or fully codon optimized sequence derived from the natural sequence to be used. Most of the amino acids are encoded by more than one codon, i.e. three consecutive nucleotides of an mRNA that can be translated into an amino acid. Codons exist that are used preferentially in some species for a given amino acid.
- codons can enhance the amount of amino acid sequences translated based on a given mRNA molecule compared to the same mRNA molecule but comprising comparably rare codons.
- amino acid sequence encompasses any kind of amino acid sequence, i.e. chains of two or more amino acids which are each linked via peptide bonds and refers to any amino acid sequence of interest.
- the encoded amino acid sequence is at least 5 amino acids long, more preferably at least 10 amino acids, even more preferably at least 50, 100, 200 or 500 amino acids.
- amino acid sequence covers short peptides, oligopeptides, polypeptides, fusion proteins, proteins as well as fragments thereof, such as parts of known proteins, preferably functional parts.
- the function of the encoded amino acid sequence in the cell or in the vicinity of the cell is needed or beneficial, e.g. an amino acid sequence the lack or defective form of which is a trigger for a disease or an illness, the provision of which can moderate or prevent a disease or an illness, or an amino acid sequence which can promote a process which is beneficial for the body, in a cell or its vicinity.
- the encoded amino acid sequence can be the complete amino acid sequence or a functional variant thereof.
- the encoded amino acid sequence can act as a factor, inducer, regulator, stimulator or enzyme, or a functional fragment thereof, where this amino acid sequence is one whose function is necessary in order to remedy a disorder, in particular a metabolic disorder or in order to initiate processes in vivo such as the formation of new blood vessels, tissues, etc.
- functional variant is understood to mean a fragment which in the cell can undertake the function of the amino acid sequence whose function in the cell is needed or the lack or defective form whereof is pathogenic.
- such an amino acid sequence is advantageous with respect to applications in supplemental or medical purposes to generate or regenerate physiological functions caused by suboptimal amino acid sequence biosynthesis and thus also to favorably influence directly or indirectly the course of diseases.
- a preferred coding sequence downstream of the T7 core promoter encodes a transposase or a genome editing enzyme.
- a transposase refers to an enzyme that binds to the end of a transposon and catalyzes its movement to another part of the genome by a cut and paste mechanism or a replicative transposition mechanism.
- Preferred transposases include Sleeping Beauty transposases and PiggyBac transposases.
- a genome editing enzyme refers to an enzyme which recognizes a specific nucleotide sequence, preferably a DNA sequence, and cuts the nucleotide sequence at or nearby the recognition position.
- Some genome editing enzymes in particular CRISPR associated proteins, such as, but not limited to Cas9, require a polynucleotide, for example, a CRISPR RNA or guide RNA, which guides them to the specific target sequence.
- Some genome editing enzymes, in particular ZFNs and TALENs, recognize the specific target sequence by themselves.
- Preferred genome editing enzymes include zinc finger nucleases (ZFNs), transcription activator-like effector-based nucleases (TALENs), and CRISPR-associated proteins such as Cas9, Cas12b or Cpf1, preferably Cas9.
- a CRISPR-associated protein is an enzyme that uses CRISPR sequences as a guide to recognize and cleave specific strands of DNA that are complementary to the CRISPR sequence.
- Zinc-finger nucleases (ZFNs) are artificial restriction enzymes generated by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain. Zinc finger domains can target specific desired DNA sequences, which enables zinc-finger nucleases to target unique sequences even within complex genomes.
- the DNA-binding domains of individual ZFNs typically contain between three and six individual zinc finger repeats and can each recognize between 9 and 18 base pairs.
- the non-specific cleavage domain from the type IIs restriction endonuclease FokI is typically used as the cleavage domain in ZFNs.
- Transcription activator-like effector nucleases are restriction enzymes that can be engineered to cut specific sequences of DNA. They are made by fusing a TAL effector DNA-binding domain to a DNA cleavage domain (a nuclease which cuts DNA strands).
- Transcription activator-like effectors can be engineered to bind to practically any desired DNA sequence, so when combined with a nuclease, DNA can be cut at specific locations.
- the DNA binding domain contains a repeated highly conserved 33-34 amino acid sequence with divergent 12th and 13th amino acids. These two positions, referred to as the Repeat Variable Diresidue (RVD), are highly variable and show a strong correlation with specific nucleotide recognition.
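The correlation between the Repeat Variable Diresidue and the recognized nucleotide can be illustrated with a short sketch. The four-entry mapping below is the commonly cited RVD code from the general TALE literature and is not stated in this text; the function names and the example repeat are hypothetical placeholders.

```python
# Minimal sketch: reading the RVD (12th and 13th residues) from a TAL
# effector repeat and predicting the recognized base.
# Assumption: the commonly cited RVD code (NN can also tolerate A).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def rvd_of_repeat(repeat):
    """Return residues 12 and 13 (1-based) of a 33-34 amino acid repeat."""
    if len(repeat) not in (33, 34):
        raise ValueError("expected a repeat of 33 or 34 amino acids")
    return repeat[11:13]


def predicted_target(rvds):
    """Map a list of RVDs to the predicted DNA target; unknown RVDs give 'N'."""
    return "".join(RVD_TO_BASE.get(rvd, "N") for rvd in rvds)
```

For example, a hypothetical array with the RVDs NI, HD, NG and NN would be predicted to recognize the sequence ACTG.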
- non-coding RNA refers preferably to regulatory RNA such as microRNA (miRNA), small-interfering RNA (siRNA), antisense RNA, CRISPR RNA (crRNA), single guide RNA (sgRNA), guide RNA (gRNA), antagoNAT, or a precursor thereof such as pri-miRNA, pre-miRNA or short-hairpin RNA (shRNA).
- a microRNA refers to a small non-coding RNA molecule (comprising about 22 nucleotides) that functions in RNA silencing and post-transcriptional regulation of gene expression. miRNAs function via base-pairing with complementary sequences within mRNA molecules. As a result, these mRNA molecules are cleaved, destabilized by shortening the polyA tail of the mRNA and/or less efficiently translated into proteins by ribosomes. Hence, microRNA molecules can reduce the amount of protein produced by a respective mRNA.
- small interfering RNA, also known as short interfering RNA or silencing RNA, refers to double-stranded RNA molecules of 20 to 25 nucleotides in length. siRNA molecules are involved in degrading mRNA after transcription, thus preventing its translation into proteins.
- An antisense RNA refers to a single stranded RNA that is complementary to another polynucleotide.
- an antisense RNA is a non-coding RNA.
- an antisense RNA can hybridize to the complementary polynucleotide.
- the complementary polynucleotide is an RNA, preferably an mRNA.
- the antisense RNA is used for inhibiting the translation of a complementary mRNA into protein by hybridization of the two RNAs. Preferably, the inhibition occurs within a living cell.
- the antisense RNA is used for detecting a further polynucleotide by hybridization of the RNA to the polynucleotide.
- the polynucleotide to be detected is an RNA, preferably an mRNA.
- the hybridization can occur outside or within a cell.
- said cell is fixed.
- An antisense RNA which is used for detecting another polynucleotide is also termed “probe”.
- the antisense RNA is an amplified antisense RNA which is generated in the linear amplification method according to the invention.
- the amplified antisense RNA may be transcribed from a double-stranded DNA.
- the amplified antisense RNA is transcribed from a cDNA.
- the cDNA has been reverse-transcribed from an RNA, preferably an mRNA, by using the DNA polynucleotide of the invention as reverse transcription primer.
- aRNA refers to an amplified antisense RNA.
- An antagoNAT refers to a single stranded oligonucleotide which can inhibit a naturally occurring antisense RNA and thus, can be used for increasing gene expression, in particular by antagonizing said antisense RNA.
- CRISPR refers to a “clustered regularly interspaced short palindromic repeats” sequence, and in particular to a CRISPR RNA (crRNA) and a single-guide RNA (sgRNA).
- a crRNA comprises a guide RNA which is an approximately 20 nucleotides long sequence and which guides genome editing enzymes such as CRISPR associated proteins to the respective nucleotide sequence to be edited which is preferably comprised in a double-stranded genomic DNA molecule.
- CRISPR associated proteins further require a trans-activating CRISPR RNA (tracrRNA), which can be provided as a single guide RNA (sgRNA) and thus a fusion product comprising at least one crRNA and a trans-activating CRISPR RNA (tracrRNA).
- the DNA polynucleotide according to the present invention can comprise downstream of the T7 core promoter sequence a nucleotide sequence encoding an mRNA or non-coding RNA, preferably an mRNA, antisense RNA, siRNA, sgRNA, guide RNA or CRISPR RNA, wherein the said nucleotide sequence is not, partially, or completely overlapping with the downstream flanking region directly adjacent to the T7 core promoter sequence.
- said nucleotide sequence is not or partially overlapping with the downstream flanking region directly adjacent to the T7 core promoter sequence.
- the DNA polynucleotide according to the present invention can further comprise a polyT-stretch, wherein the polyT-stretch is a nucleotide sequence of 9 to 35 consecutive thymines, preferably 22 to 27 consecutive thymines, more preferably 24 consecutive thymines, preferably followed by an adenine, cytosine or guanine, preferably at the 3’ end of the DNA polynucleotide.
- polyT-stretch refers to a nucleotide sequence consisting of consecutive, i.e. covalently linked thymines. Said polyT-stretch is especially advantageous for using the DNA polynucleotide as a reverse transcription primer for reverse transcription of an RNA, e.g. an mRNA, into a cDNA.
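The length and anchor requirements for the polyT-stretch can be checked programmatically. The following is a minimal sketch under the assumption that the stretch sits at the 3’ end of the primer and is optionally followed by a single A, C or G; the function names are hypothetical.

```python
# Minimal sketch: inspect the 3' end of a primer for a polyT-stretch.
def describe_polyt_3prime(sequence):
    """Return (number of consecutive 3'-terminal thymines, anchor base or None).

    A single terminal A, C or G is treated as the optional anchor base
    that follows the polyT-stretch.
    """
    seq = sequence.upper()
    anchor = None
    if seq and seq[-1] in "ACG":
        anchor = seq[-1]
        seq = seq[:-1]
    run_length = len(seq) - len(seq.rstrip("T"))
    return run_length, anchor


def has_valid_polyt(sequence, min_t=9, max_t=35):
    """True if the 3'-terminal thymine run has between min_t and max_t bases."""
    run_length, _ = describe_polyt_3prime(sequence)
    return min_t <= run_length <= max_t
```

The defaults mirror the broad range of 9 to 35 thymines; passing `min_t=22, max_t=27` would test the preferred range instead.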
- a “primer”, as used herein, refers to a poly- or oligonucleotide which comprises a sequence complementary to a sequence comprised in another poly- or oligonucleotide.
- a primer can be used for example for PCR, reverse transcription and/or the synthesis of the complementary strand of a DNA molecule.
- the DNA polynucleotide according to the present invention comprises downstream of the T7 core promoter sequence a polyT-stretch.
- Embodiments comprising the DNA polynucleotide and a polyT-stretch are especially advantageous for use as a reverse transcription primer.
- the DNA polynucleotide according to the present invention can further comprise a polyT-stretch and a sequencing adapter, wherein the sequencing adapter is a nucleotide sequence comprising a nucleotide sequence comprised in a library preparation primer, wherein the library preparation primer comprises a flow cell binding sequence, which is complementary to an oligo- or polynucleotide present at the surface of the flow cell of a next-generation-sequencing device, and wherein the sequencing adapter does not comprise more than eight consecutive thymines and/or only thymines.
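The thymine constraint on the sequencing adapter keeps it distinguishable from the polyT-stretch. A minimal sketch of such a check is shown below; the function names are hypothetical, and the sketch interprets the constraint as requiring both conditions (no run of more than eight thymines, and not thymines only).

```python
# Minimal sketch: verify that a candidate sequencing adapter cannot be
# mistaken for a polyT-stretch.
from itertools import groupby


def longest_run(sequence, base="T"):
    """Length of the longest run of consecutive `base` in the sequence."""
    return max(
        (sum(1 for _ in group) for b, group in groupby(sequence.upper()) if b == base),
        default=0,
    )


def adapter_is_acceptable(adapter, max_t_run=8):
    """True if the adapter is non-empty, not T-only, and has no T run > max_t_run."""
    seq = adapter.upper()
    if not seq:
        return False
    only_thymines = set(seq) == {"T"}
    return not only_thymines and longest_run(seq, "T") <= max_t_run
```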
- a sequencing adapter refers to a polynucleotide which functions as a binding sequence for a polymerase chain reaction (PCR) primer and/or library preparation primer.
- a library preparation primer (LPP) comprises a flow cell binding sequence (FCBS) and a sequence which is identical or highly similar to a sequence comprised in the sequencing adapter, such that the LPP can bind to the complementary sequence of the sequencing adapter.
- a flow cell binding site refers to an oligo- or polynucleotide which binds to a complementary sequence immobilized at the surface of a flow cell.
- a flow cell is a part of a sequencing device that can be loaded with the material to be sequenced, wherein a sequencing device refers to a device for determining the sequence of a polynucleotide.
- the sequencing adapters are sequencing adapters used in next-generation sequencing methods comprising library preparation methods and in particular Illumina library preparation protocols.
- a preferred sequencing adapter sequence is a sequence comprised in an Illumina library preparation primer such as PE1, PE2 or PE2-N6 and suitable flow cell binding sequences are for example comprised in Illumina P5 and P7 oligo- or polynucleotides.
- for an oligonucleotide as used herein, the same applies as has been described in the context of a polynucleotide, except for its length.
- a polynucleotide can comprise more than two, preferably 24 or more, or even up to 250 nucleotides
- an oligonucleotide is to be understood as a sequence of two to 500, preferably 2 to 250 covalently linked nucleotides.
- the other features of such an oligonucleotide can be as described above.
- the DNA polynucleotide according to the present invention comprises a T7 promoter sequence and a sequencing adapter. This is especially advantageous for use of the DNA polynucleotide as a reverse transcription primer in approaches that comprise a sequencing step following reverse transcription of a nucleotide sequence encoding for example an mRNA, sgRNA, guide RNA or CRISPR RNA.
- the sequencing adapter can optionally at least partially overlap with the T7 promoter sequence and/or the polyT-stretch.
- the sequencing adapter is directly adjacent downstream of the T7 promoter sequence.
- the DNA polynucleotide according to the present invention further comprises a barcode and/or a unique molecular identifier (UMI).
- a barcode refers to an oligonucleotide which can be used to uniquely label e.g. an RNA of a specific sample, in particular of one specific cell.
- A UMI refers herein to an oligonucleotide which can be used to specifically label one specific RNA molecule.
- the DNA polynucleotide comprises a sequencing adapter downstream of the T7 core promoter and a polyT-stretch downstream of the sequencing adapter and not overlapping with the sequencing adapter.
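The element order described above (T7 promoter, then sequencing adapter, then a non-overlapping polyT-stretch) can be sketched as a simple assembly function. Several points are assumptions for illustration only: the placement of the barcode and UMI between adapter and polyT-stretch, the representation of the UMI as degenerate `N` positions, the use of the IUPAC code `V` (A, C or G) for the base following the thymines, and all sequences passed in by the caller. The inventive promoter and its flanking regions are not reproduced here.

```python
# Minimal sketch: assemble a reverse transcription primer from its parts
# and record where each part lies (0-based, end-exclusive coordinates).
def build_rt_primer(promoter, adapter, barcode="", umi_length=0, n_thymines=24):
    """Concatenate the primer elements 5'->3' and return (sequence, layout)."""
    if not 9 <= n_thymines <= 35:
        raise ValueError("polyT-stretch must have 9 to 35 thymines")
    parts = [
        ("t7_promoter", promoter),
        ("sequencing_adapter", adapter),
        ("barcode", barcode),            # assumed position, not from the text
        ("umi", "N" * umi_length),       # degenerate positions, assumed position
        ("polyT", "T" * n_thymines),
        ("anchor", "V"),                 # IUPAC: A, C or G
    ]
    sequence = ""
    layout = {}
    for name, segment in parts:
        if segment:
            layout[name] = (len(sequence), len(sequence) + len(segment))
            sequence += segment
    return sequence, layout
```

Because each element is appended after the previous one, the recorded coordinate ranges never overlap, which corresponds to the non-overlapping arrangement of adapter and polyT-stretch.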
- the present invention relates to a method for in vitro transcribing (IVT) RNA, said method comprising the steps of (a) providing a DNA polynucleotide according to the present invention as an IVT template and downstream thereof a nucleotide sequence encoding the RNA to be transcribed, and (b) in vitro transcribing said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates.
- IVT refers to a process, wherein RNA is produced in vitro and thus, in a system that is neither a cell nor comprised in a cell.
- the term “cell” refers to a living organism or a unit of a multicellular organism or an explant thereof which can preferably be cultured.
- In vitro transcription requires an IVT template comprising a promoter and a template sequence downstream thereof, as well as ribonucleotide triphosphates and an RNA polymerase.
- IVT can be used for linear amplification of a polynucleotide.
- linear amplification refers to increasing the amount of a polynucleotide over time in any relationship which resembles more a linear relationship than an exponential relationship at least for a certain period of time, for example 1 min, 10 min, 0.5 h, 1 h, 5 h, 10 h or 1 day.
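The difference between linear and exponential amplification, and its effect on amplification bias, can be illustrated with a toy model. The model and all parameter values are assumptions chosen for illustration: in the linear case each template yields a fixed number of transcripts per round, while in the exponential case the products themselves are copied in every cycle.

```python
# Toy model (illustrative assumptions only): compare how a small
# difference in per-template efficiency propagates.
def linear_copies(templates, copies_per_template_per_round, rounds):
    """Products accumulate in proportion to time; templates are not multiplied."""
    return templates * copies_per_template_per_round * rounds


def exponential_copies(templates, efficiency, cycles):
    """Each cycle multiplies the pool by (1 + efficiency), as in idealized PCR."""
    return templates * (1 + efficiency) ** cycles


def bias(copies_a, copies_b):
    """Ratio of products from two targets that started at equal abundance."""
    return copies_a / copies_b


# Two targets with equal starting amounts but slightly different efficiencies.
linear_bias = bias(linear_copies(10, 0.95, 20), linear_copies(10, 0.90, 20))
exponential_bias = bias(exponential_copies(10, 0.95, 20), exponential_copies(10, 0.90, 20))
```

In this model the linear bias stays at the ratio of the two efficiencies regardless of the number of rounds, whereas the exponential bias grows with every cycle.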
- Linear amplification is advantageous over exponential amplification methods such as PCR for certain applications, for example, but not limited to single-cell sequencing, in particular by reducing the amplification bias. Reducing an amplification bias is especially important when the starting material is scarce, such as the material obtained from a few cells or a single cell.
- the in vitro transcribed RNA is linearly amplified during IVT.
- linear amplification of a polynucleotide comprises the amplification of the polynucleotide as RNA or DNA having the identical and/or complementary sequence of said polynucleotide.
- the DNA polynucleotide of the present invention is used as the IVT template.
- for the IVT template, the same applies as has been described above in connection with the DNA polynucleotide according to the present invention.
- the other features of such an IVT template can be as described above.
- the DNA polynucleotide provided in step (a) comprises a T7 promoter sequence comprising a T7 core promoter sequence and a directly adjacent downstream flanking region as defined above and a nucleotide sequence encoding the RNA to be transcribed, wherein said RNA is preferably an mRNA or non-coding RNA, preferably an mRNA, antisense RNA, siRNA, sgRNA, guide RNA and/or CRISPR RNA.
- the nucleotide sequence encoding the RNA to be transcribed encodes an mRNA that encodes a transposase or a genome editing enzyme.
- the transposase is a Sleeping Beauty transposase.
- the genome editing enzyme is a CRISPR enzyme, for example, Cas9, Cas12b or Cpf1.
- the DNA polynucleotide which is used as an IVT template is particularly suitable for large-scale synthesis of RNA in vitro.
- the IVT template is preferably linear, double-stranded and/or purified, even more preferably double-stranded, linearized and purified.
- the DNA polynucleotide of the invention is comprised in a plasmid such as a cloning and/or an expression vector.
- a linear IVT template can be obtained, for example, by linearizing a DNA plasmid comprising the IVT template by cutting the DNA plasmid using a restriction enzyme.
- a plasmid or vector, as used herein, refers to a double-stranded DNA molecule which can replicate within a bacterial host, preferably an E. coli strain.
- a plasmid or vector preferably comprises an origin of replication, a selection marker, such as a gene conferring resistance to an antibiotic, a cloning site such as restriction enzyme site, a recombination region, a promoter, an enhancer, a transcription termination site, a ribosomal binding site, a Shine Dalgarno and/or a Kozak sequence.
- Suitable plasmids as backbone are, for example, but not limited to, pT7 FLAG (Merck), pT7 MAT (Merck), Gateway™ pDEST™ (Thermo Fisher) and pRSET (Thermo Fisher) or derivatives thereof.
- purification of the preferably linear and/or double-stranded IVT template is preferably performed using a spin-column such as a Monarch® PCR & DNA Cleanup Kit.
- in vitro transcribing the IVT template is performed in the presence of ribonucleotide triphosphates which are preferably adenosine, uridine, cytidine and guanosine triphosphates or modified versions thereof.
- the T7 polymerase is preferably a T7 polymerase that is capable of binding to the T7 promoter.
- In vitro transcribing the IVT template comprises using the IVT template as a template during transcription, i.e. the synthesis of an RNA molecule.
- said RNA has a nucleotide sequence that is identical to the sequence of the IVT template (identical to the coding strand and complementary to the template strand of the IVT template) except that instead of T nucleotides U nucleotides are comprised in the synthesized RNA molecule.
- RNA is used as a template for the synthesis of a cDNA whereof the first strand has a sequence complementary to the sequence of the RNA and the second strand a sequence which is identical to the RNA except that instead of U nucleotides T nucleotides are comprised in the synthesized cDNA molecule.
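The sequence relationships between the IVT template, the transcribed RNA and the two cDNA strands can be written out explicitly. This is a minimal sketch with hypothetical function names; it handles only the four standard bases and ignores promoter recognition, transcription start site selection and any modified nucleotides.

```python
# Minimal sketch: sequence relationships between coding strand, RNA and cDNA.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_strand):
    """RNA is identical to the coding strand except that U replaces T."""
    return coding_strand.upper().replace("T", "U")


def first_strand_cdna(rna):
    """First-strand cDNA is complementary to the RNA (returned 5'->3')."""
    dna = rna.upper().replace("U", "T")
    return dna.translate(DNA_COMPLEMENT)[::-1]


def second_strand_cdna(rna):
    """Second-strand cDNA is identical to the RNA except that T replaces U."""
    return rna.upper().replace("U", "T")
```

For a coding strand `GGGAGATTC`, `transcribe` gives `GGGAGAUUC`, the first cDNA strand is `GAATCTCCC`, and the second cDNA strand restores `GGGAGATTC`.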
- the DNA polynucleotide which is used as an IVT template is present in a high concentration in step (b), wherein the term “a high concentration” refers to a concentration of more than 30 ng/µl, preferably of more than 40 ng/µl.
- the term “very low concentration” refers to a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl, and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl.
- a very low concentration is more than 0, e.g. at least 0.005 pg/µl, 0.01 pg/µl or 0.05 pg/µl.
- the term “pg” refers to “picogram”, the term “ng” to “nanogram”, and the terms “µl” or “ul” to “microliter”.
- the DNA polynucleotide which is used as an IVT template is present in a very low concentration in step (b), wherein the term “a very low concentration” refers to a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl.
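The concentration thresholds defined above can be captured in a small helper, which also makes the unit conversion between pg/µl and ng/µl explicit. This is a minimal sketch; the function names and the label "intermediate" for values between the two thresholds are assumptions, since the text defines only the "high" and "very low" ranges.

```python
# Minimal sketch: classify an IVT template concentration using the
# broadest thresholds given in the text (in ng/µl).
HIGH_ABOVE_NG_PER_UL = 30.0
VERY_LOW_BELOW_NG_PER_UL = 1.0


def pg_to_ng(picograms):
    """Convert picograms to nanograms (1 ng = 1000 pg)."""
    return picograms / 1000.0


def classify_template_concentration(ng_per_ul):
    """Return 'high', 'very low' or 'intermediate' (the last label is assumed)."""
    if ng_per_ul <= 0:
        raise ValueError("concentration must be greater than 0")
    if ng_per_ul > HIGH_ABOVE_NG_PER_UL:
        return "high"
    if ng_per_ul < VERY_LOW_BELOW_NG_PER_UL:
        return "very low"
    return "intermediate"
```

For instance, a template at 50 pg/µl corresponds to 0.05 ng/µl and falls in the "very low" range.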
- Said DNA polynucleotide or IVT template may be a mix of different DNA molecules, e.g.
- each of said DNA molecules comprises the T7 promoter of the invention, i.e. the same T7 core promoter and the same inventive upstream and downstream flanking regions.
- Said mix of DNA molecules may be obtained, for example, by reverse transcribing the mRNA from a few cells, i.e. less than 100 cells, preferably less than 10 cells, or, very preferably, from a single cell, and/or a liquid biopsy, with the reverse transcription primer comprising the inventive T7 promoter provided herein.
- the resulting cDNA (IVT template) of the single cells (or different samples) may be pooled if the reverse transcription primers have a barcode as described herein, but the concentration may still be very low as defined herein.
- the same principle applies to synthesizing the complementary strand of a target DNA polynucleotide, i.e. in a PCR reaction, from such few cells or a single cell and/or a liquid biopsy with a primer comprising the inventive T7 promoter and using the resulting library of DNA polynucleotides, i.e. double-stranded DNA molecules, which have incorporated said T7 promoter as IVT template, as described herein.
- the primers used for the PCR reaction may be complementary to different target DNA molecules, and thus the resulting IVT template may be a mix of different DNA molecules comprising the same inventive T7 promoter, which may be present at a very low concentration, as described herein.
- the IVT template comprising the T7 promoter of the invention may be obtained from at least one target polynucleotide that is present freely in a body fluid, such as blood, urine or cerebrospinal fluid.
- in case said target polynucleotide is an RNA, the IVT template may be obtained by reverse-transcription, and in case said target polynucleotide is a DNA, the IVT template may be obtained by synthesizing the complementary strand, i.e. in a PCR reaction, as described herein.
- the IVT template may also be obtained by attaching and/or integrating the DNA polynucleotide of the invention harboring the inventive T7 promoter to the target polynucleotide via a transposon system, i.e. Tn5-mediated transposition, as described herein.
- the IVT template may be present at a very low concentration as described herein.
- a target polynucleotide or at least one target polynucleotide refers, in particular, to at least one polynucleotide to be analyzed in a sample, e.g. one or a few cells, a liquid biopsy, and/or a body fluid.
- the individual molecules of the target polynucleotide are not necessarily fully identical although they comprise the same or a very similar nucleotide sequence to which the reverse transcription primer or primer comprising the T7 promoter of the invention can bind.
- a very similar nucleotide sequence in this context may have at most 5, 4, 3, 2 or 1 nucleotide mismatches, e.g. due to SNPs, deletions and/or insertions, compared to the reference sequence which is complementary to a sequence comprised in the reverse-transcription primer or primer, and allows binding of the (reverse transcription) primer.
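The "very similar" criterion counts mismatches that may arise from SNPs, deletions and insertions. One way to count such differences is the edit (Levenshtein) distance, sketched below. Using edit distance as the mismatch measure is an assumption made for illustration; the text does not specify how mismatches are counted, and the function names are hypothetical.

```python
# Minimal sketch: count substitutions, insertions and deletions between
# an observed binding site and the reference sequence.
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def is_very_similar(observed, reference, max_mismatches=5):
    """True if the sequences differ by at most max_mismatches edits."""
    return edit_distance(observed.upper(), reference.upper()) <= max_mismatches
```

Whether a primer actually binds a site with a given number of mismatches depends on the position of the mismatches and the reaction conditions, which this count does not model.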
- a target polynucleotide may refer to the mRNA molecules with a poly-A-tail in a sample, wherein the reverse transcription primer comprising a poly-T-stretch as provided herein binds to said poly-A-tail.
- a target polynucleotide may also refer, for example, to a polynucleotide comprising a unique nucleic acid sequence, and the reverse-transcription primer or primer comprising the complementary sequence of said unique sequence binds to said unique sequence.
- the target polynucleotide and the reverse-transcription primer or primer may be a mix of different polynucleotides, wherein the target polynucleotide is a plurality of polynucleotides which may comprise different unique nucleic acid sequences to which the reverse-transcription primer or primer mix binds.
- the target polynucleotide refers to a genomic DNA sequence to which the DNA polynucleotide of the present invention comprising the inventive T7 promoter is attached and/or integrated (i.e. covalently linked via the DNA backbone).
- said attachment and/or integration may be mediated by a transposase (transposon system), i.e. a Tn5 transposase, e.g. as described in Harada (2019), Nat. Cell Biol. 21.
- the DNA polynucleotide of the invention further comprises a transposase binding sequence, i.e. a Tn5 transposase binding sequence, which may also be known in the context of the Tn5 transposase as a “mosaic end”.
- Said transposase binding sequence is preferably downstream of the +1 to +8 downstream flanking region of the T7 promoter provided herein.
- the individual molecules of the target polynucleotide may be structurally unrelated, wherein the target polynucleotide refers to genomic DNA sequences which are in proximity (e.g. within 5000 bp, 1000 bp, 500 bp, 200 bp or 100 bp, preferably within 200 bp) to a certain chromatin modification, e.g.
- the DNA polynucleotide may be conjugated to a molecule (e.g. an antibody) which binds to a certain DNA binding molecule, as described herein.
- the DNA polynucleotide of the invention may be conjugated to an antibody, e.g.
- ChIL-Seq is a suitable, non-limiting example for using the T7 promoter of the invention for transposon-mediated attachment/integration to genomic DNA sites (target polynucleotides) (Harada (2019), Nat. Cell Biol. 21).
- a T7 promoter sequence is fused at the 5’ end to an Illumina adapter and a Tn5 recognition sequence and covalently coupled at the 3’ end to an antibody with specificity for chromatin components, i.e. specific histone modifications, and the Tn5 recognition sequence is loaded with Tn5 transposase.
- the cells (e.g. 100-1000 cells, or a single cell) containing the target polynucleotide of interest are then fixed, permeabilized and incubated with the antibody-T7 promoter-Tn5 complex.
- Upon binding of the antibody to its target epitopes on the chromatin (DNA-binding molecules), the DNA polynucleotide harboring the T7 promoter sequence is covalently linked to the DNA backbone via Tn5-mediated transposition. The opposite DNA strand is repaired by incubation of the cells with DNA ligase, and DNA sequences adjacent to the integration sites (IVT template) are then in vitro transcribed into RNA by T7 polymerase. Conversion of the in vitro transcribed and amplified RNA into sequencing libraries followed by deep sequencing enables identification and/or quantification of the genomic antibody binding sites (i.e. the target polynucleotide(s) in proximity to the DNA binding molecules that are recognized by the antibody).
- a target polynucleotide or at least one target polynucleotide refers, in particular, to those polynucleotide molecules (e.g. in a sample) which are converted to an IVT template (comprising a T7 promoter according to the invention) as described herein (e.g. by reverse-transcription and/or synthesis of the complementary strand, or transposase-mediated integration/attachment). Therefore, the concentration, e.g. the very low concentration, of a target polynucleotide or at least one target polynucleotide, as used herein, refers, in particular, to the total concentration of those polynucleotide molecules (e.g. in a sample) which are converted to an IVT template.
- the concentration of the IVT template is similar to the concentration of the corresponding target polynucleotide or at least one target polynucleotide, wherein “similar” may include a 5-fold, 3-fold, 2-fold, 1.5-fold, 1.2-fold, 1.1-fold or 1.05-fold difference, preferably a 2-fold or 1.5-fold difference.
- for example, a 2-fold difference is 10 pg/µl vs. 20 pg/µl, or 10 pg/µl vs. 5 pg/µl; and a 1.5-fold difference is 10 pg/µl vs. 15 pg/µl, or 10 pg/µl vs. 6.66 pg/µl.
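The fold-difference examples above treat increases and decreases symmetrically, which corresponds to dividing the larger value by the smaller one. A minimal sketch follows; the function names and the small tolerance (added so that the rounded example value 6.66 still counts as a 1.5-fold difference) are assumptions.

```python
# Minimal sketch: symmetric fold difference between two concentrations
# given in the same unit (e.g. pg/µl).
def fold_difference(concentration_a, concentration_b):
    """Larger value divided by smaller value, so the result is always >= 1."""
    if concentration_a <= 0 or concentration_b <= 0:
        raise ValueError("concentrations must be greater than 0")
    return max(concentration_a, concentration_b) / min(concentration_a, concentration_b)


def is_similar(concentration_a, concentration_b, max_fold=2.0, tolerance=0.01):
    """True if the fold difference does not exceed max_fold (within tolerance)."""
    return fold_difference(concentration_a, concentration_b) <= max_fold + tolerance
```

With these definitions, 10 vs. 20 and 10 vs. 5 both give a fold difference of 2, and 10 vs. 15 gives 1.5.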
- when the inventive DNA polynucleotide is present in a very low concentration, i.e. in step (b) of the inventive IVT method provided herein, the DNA polynucleotide preferably comprises an upstream flanking region as described above in the context of the upstream region of a DNA polynucleotide according to the present invention.
- the in vitro transcribed RNA is a probe for hybridization with another RNA or DNA.
- the probe is used for an in-situ hybridization method or Southern blot.
- the in-situ hybridization method is FISH.
- if the in vitro transcribed RNA is to be used as a FISH probe, it is transcribed in the presence of fluorescently labeled ribonucleotides.
- a probe may be an antisense RNA which is complementary to the target oligo- or polynucleotide, for example an mRNA.
- the RNA is in vitro transcribed in the presence of a capping and/or polyA-tailing enzyme.
- a 5’ cap is added at the 5’ end and/or a 3’ polyA tail is added at the 3’ end of the in vitro transcribed RNA.
- a 5’ cap preferably refers to a 7-methyl guanosine (m7G) cap structure at the 5’ end of an mRNA.
- the capping may be performed by enzyme-based capping following the transcription reaction (posttranscriptional capping) and/or by incorporation of a cap analog during transcription (co-transcriptional capping).
- Suitable cap analogues include 7-methyl guanosine (m7G) and 3’-O-Me-7-meGpppG cap analog (ARCA).
- ARCA is methylated at the 3’ position of the m7G, preventing RNA elongation by phosphodiester bond formation at this position.
- transcripts synthesized using ARCA contain 5’-m7G cap structures in the correct orientation, with the 7-methylated G as the terminal residue.
- Suitable for posttranscriptional capping is, for example, the Vaccinia Capping System.
- a poly-A-tail may be added during transcription and/or after transcription.
- a reverse PCR primer comprising a polyT-stretch can be used for amplifying the template sequence.
- the polyA tail can be also added after transcription using e.g. an E. coli polyA polymerase. The length of the added tail can be adjusted by titrating the polyA polymerase.
- a polyA tail refers to a sequence comprising, preferably 5 to 300, covalently linked adenines.
- a 5’ cap is added at the 5’ end and a 3’ polyA tail is added at the 3’ end of the in vitro transcribed RNA.
- the 5’ cap and/or the 3’ polyA tail is added while the RNA is in vitro transcribed.
- the 5’ cap and/or the 3’ polyA tail is added after the RNA is in vitro transcribed.
- the polyA tail is transcribed from an IVT template comprising a polyT-stretch at the 3’ end of the IVT template.
- the RNA is purified before and/or after a 5’ cap and/or a 3’ polyA tail is added.
- the nucleotide sequence encoding the RNA to be transcribed is an mRNA and the mRNA synthesized in vitro in step (b) is used to produce a recombinant protein in vitro or within a cell after providing an IVT mRNA.
- the synthesis of the recombinant protein occurs in vitro.
- the nucleotide sequence encoding the RNA to be transcribed is an mRNA and the mRNA synthesized in vitro in step (b) is then transfected into a cultured cell.
- a cultured cell can be a mammalian cell line, an insect cell line, a yeast or a bacterium.
- the cultured cell is a bacterium, even more preferably a cultured cell from E. coli.
- Transfection of a cell with an mRNA is particularly advantageous to only temporarily express the protein encoded by the mRNA in the cell and/or expressing said protein without changing the genome of the cell.
- Temporary expression is especially advantageous in case of a genome editing enzyme as the risk of unspecific genome-editing activity can be reduced.
- the DNA polynucleotide according to the present invention can thus be used for producing a protein, e.g. a recombinant protein.
- IVT is advantageous for amplifying a polynucleotide, wherein the amplified polynucleotide is DNA or RNA.
- the polynucleotide to be amplified is RNA which is reverse transcribed into cDNA which in turn is amplified by IVT.
- the RNA is an mRNA which is amplified as antisense RNA (aRNA).
- the present invention relates to a method for transcribing RNA within a cultured cell, said method comprising the steps of (a) providing a DNA polynucleotide according to the present invention comprising a T7 promoter sequence and downstream thereof a nucleotide sequence encoding the RNA to be transcribed and (b) introducing the DNA polynucleotide into a cultured cell expressing a T7 polymerase.
- the RNA which is transcribed in a cultured cell is an mRNA which is further translated into a protein within the cultured cell.
- the cultured cell wherein the RNA is transcribed is a bacterium.
- the present invention relates to a cultured cell expressing a T7 polymerase comprising a DNA polynucleotide according to the present invention comprising a T7 promoter sequence and downstream thereof a nucleotide sequence encoding the RNA to be transcribed.
- the cultured cell comprising the DNA polynucleotide of the invention is a bacterium.
- DNA polynucleotide according to the present invention comprising an inventive T7 promoter provided herein may be used as a primer, i.e. for incorporating the sequence of the inventive DNA polynucleotide comprising the T7 promoter of the invention, into a copy of a target polynucleotide.
- said DNA polynucleotide may be used for synthesizing the complementary strand of at least one target polynucleotide, wherein said synthesizing comprises annealing said DNA polynucleotide to said at least one target polynucleotide, in particular, thereby generating an IVT template according to the invention.
- said IVT template may be used in an inventive IVT method provided herein.
- the DNA polynucleotide according to the present invention is used as a reverse transcription primer.
- the DNA polynucleotide is used for reverse transcribing RNA to cDNA, thereby incorporating the sequence of the DNA polynucleotide at least partially into the cDNA.
- As regards the use of the DNA polynucleotide of the invention as a primer or reverse transcription primer, the same applies as has been described above in connection with the T7 promoter, and in particular the downstream flanking region and/or the upstream flanking region of the T7 promoter, according to the invention.
- Reverse transcriptases (RTs) use RNA as template and a short primer complementary to the 3' end of the RNA to direct the synthesis of the first strand cDNA, which may be used directly as a template for PCR.
- This combination, referred to as RT-PCR, allows the detection of low abundance RNAs in a sample, and production of the corresponding cDNA.
- the first strand cDNA can be made double-stranded using e.g. DNA Polymerase I and DNA Ligase. This is for example advantageous for cloning approaches without amplification.
- Suitable RT polymerases are, for example, Avian Myeloblastosis Virus (AMV) Reverse Transcriptase and Moloney Murine Leukemia Virus (M-MuLV, MMLV) Reverse Transcriptase or, preferably, a Superscript RT polymerase.
- the DNA polynucleotide according to the present invention may be used as a primer for synthesizing the complementary strand of a target DNA molecule, i.e. in a PCR reaction, thereby incorporating the sequence of the DNA polynucleotide of the invention at least partially into copies of the target polynucleotide.
- the synthesis of the complementary strand may be accompanied and/or followed by synthesizing the complementary strand of the strand having incorporated the DNA polynucleotide of the invention, thereby generating a double stranded DNA molecule comprising at least part of the target DNA molecule, and at one end the T7 promoter of the invention.
- This procedure may be considered as one cycle of a PCR reaction, wherein the DNA molecule of the invention is one PCR primer.
- typical PCR reaction conditions may be applied for this step, i.e. employing a DNA polymerase, for example, inter alia, a Taq polymerase.
- said PCR reaction comprises only few cycles, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 cycles, preferably 1, 2, 3, 4 or 5 cycles, preferably 1 or 2 cycles, preferably 1 cycle.
- the target polynucleotide which is amplified by using a primer comprising the T7 promoter of the invention may be obtained, in particular, from a few cells (i.e. less than 100 cells), a single cell, a single cell equivalent, a liquid biopsy, and/or a body fluid, as described herein.
- the few cells may be 100 cells or more, for example 200, 400, 600, 800, 1000, 1500 or 2000 cells but less than 10000 cells.
- the few cells, as described herein are preferably less than 100 cells, more preferably less than 10 cells.
- the target polynucleotide, i.e. the target polynucleotide that is amplified, analyzed and/or sequenced according to the invention, is from a prokaryote or a eukaryote, preferably an animal, preferably a mammal, preferably a human.
- said target polynucleotide may be associated with a disease and/or is suspected to be associated with a disease, i.e. as described herein.
- said target polynucleotide may be not from a virus, i.e. not from a virus that exists for less than 1000, 100 or 10 years.
- a liquid biopsy refers to the sampling of non-solid biological tissue, in particular, blood, and the analysis of at least one target polynucleotide comprised therein as described herein in the context of the invention.
- a liquid biopsy is mainly used as a diagnostic and monitoring tool, e.g. for diseases such as cancer, and may be largely non-invasive.
- a liquid biopsy may contain circulating tumor cells, e.g. inter alia, from metastatic breast, metastatic colon, and/or metastatic prostate cancer, and/or circulating tumor DNA.
- a liquid biopsy may contain circulating endothelial cells which are an indicator of vascular dysfunction or damage, e.g. associated with a heart attack.
- the presence or abundance of circulating tumor cells or circulating endothelial cells may be determined by a method, e.g. a sequencing method according to the invention, comprising amplifying target polynucleotides, e.g. mRNA molecules, from such single circulating cells by using a primer, e.g. a reverse transcription primer comprising an inventive T7 promoter, according to the invention.
- Circulating tumor DNA is tumor-derived fragmented DNA in the bloodstream that is not associated with cells.
- Because circulating tumor DNA (ctDNA) may reflect the entire tumor genome, it may be useful in clinics, i.e. for diagnosing and/or monitoring cancer.
- the presence or abundance of circulating tumor DNA may be determined by a method, i.e. a sequencing method according to the invention, comprising amplifying target tumor DNA polynucleotides from a blood sample by using a primer comprising an inventive T7 promoter according to the invention.
- a liquid biopsy may be from blood or amniotic fluid, and contain cell-free fetal DNA (cffDNA).
- cffDNA originates from placental trophoblasts and is fragmented when placental microparticles are shed into the maternal blood circulation.
- cffDNA fragments are approximately 200 base pairs (bp) in length which is significantly smaller than maternal DNA fragments. This difference in size allows cffDNA to be distinguished from maternal DNA fragments.
- the presence or abundance of cell-free fetal DNA may be determined by a method, e.g. a sequencing method according to the invention, comprising amplifying target cell-free fetal DNA polynucleotides from a blood sample (or an amniotic fluid sample) by using a primer comprising an inventive T7 promoter according to the invention.
- the ctDNA or cffDNA is linearly amplified by in vitro transcription employing the T7 promoter of the invention, and the IVT template is generated by few or one cycle(s) of PCR employing the inventive primer comprising said T7 promoter.
- the DNA polynucleotide and/or the T7 promoter, i.e. the primer or reverse transcription primer of the invention, may be used for diagnostic purposes, e.g. for diagnosing and/or monitoring cancer, a cardiovascular disease such as a heart attack, or a fetal gene defect or gene variant.
- the sensitivity and accuracy of single cell sequencing methods can be increased by employing a primer comprising an inventive T7 promoter provided herein for capturing target polynucleotides and amplifying them by in vitro transcription. This increase in sensitivity and accuracy may be particularly useful for diagnostic approaches, e.g. to determine certain cell states, e.g. of circulating tumor cells or leukemic cells, more reliably.
- cell state determinants such as transcription factors and chromatin binders, which are missed when using a conventional T7 promoter, are detected by the improved sequencing method of the invention employing a reverse transcription primer comprising an inventive T7 promoter provided herein.
- the invention relates to a method for amplifying a target polynucleotide, wherein said method comprises the steps of
- synthesizing the complementary strand of a target polynucleotide comprising annealing a DNA polynucleotide comprising the T7 promoter sequence according to the invention to said target polynucleotide, thereby obtaining an IVT template comprising said T7 promoter and a nucleotide sequence downstream thereof, i.e. wherein said nucleotide sequence reflects the target polynucleotide sequence, and
- said amplification is a linear amplification as described herein.
- reflecting the target polynucleotide sequence refers to the target polynucleotide sequence and the complementary sequence or antisense sequence thereof, and/or a well attributable fragment thereof.
- a fragment can be attributed to the target polynucleotide sequence by methods well known in the art, as described herein or illustrated in the appended Examples, e.g. by a sequence alignment algorithm.
- the target polynucleotide is an RNA which is reverse-transcribed into cDNA.
- said RNA is thereby amplified as aRNA.
- the IVT template is obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide from less than 100 cells, preferably less than 10 cells, preferably a single cell.
- the cell containing an RNA that is to be reverse transcribed to cDNA or a target polynucleotide that is to be amplified, and/or a polynucleotide that is to be analyzed and/or sequenced may be a prokaryotic cell or an eukaryotic cell, preferably an eukaryotic cell, preferably an animal cell, preferably a mammalian cell, preferably a human cell.
- the human and/or animal cell may be associated with a disease and/or be suspected to be associated with a disease, e.g. inter alia cancer or a cardiovascular disease.
- Said cancer may be, in particular, characterized by metastatic cells, circulating tumor cells and/or blood cancer cells / leukemic cells.
- the IVT template may be obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide, wherein said RNA and/or at least one target polynucleotide is present at a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl, and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl.
- the IVT template may be obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide from a liquid biopsy and/or a body fluid.
- At least one target polynucleotide is amplified from less than 100 cells, preferably less than 10 cells, preferably a single cell, wherein the T7 promoter further comprises the sequence AATT directly upstream of the T7 core promoter sequence.
- the T7 promoter further comprises a poly-T-stretch as described herein.
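The layout described above (upstream flanking sequence, T7 core promoter, defined downstream flanking region, poly-T-stretch) can be sketched as a simple string assembly. This is a minimal illustration, not the sequence of any SEQ ID NO: the 17-nt core used here is the commonly cited T7 core promoter (positions -17 to -1) and is an assumption, `build_primer` is a hypothetical helper, and the poly-T length is an arbitrary parameter.

```python
# Illustrative only: T7_CORE is the commonly cited 17-nt core (assumption),
# not taken from the sequence listing of this document.
T7_CORE = "TAATACGACTCACTATA"
UPSTREAM_FLANK = "AATT"        # sequence named in the text, directly upstream of the core
DOWNSTREAM_FLANK = "GGGATAAT"  # +1 to +8 region quoted in the figure legends


def build_primer(poly_t_len: int = 24) -> str:
    """Concatenate the promoter elements 5'->3' and append a poly-T-stretch.

    poly_t_len is a hypothetical parameter; the text does not fix a length here.
    """
    if poly_t_len < 0:
        raise ValueError("poly_t_len must be non-negative")
    return UPSTREAM_FLANK + T7_CORE + DOWNSTREAM_FLANK + "T" * poly_t_len


primer = build_primer(poly_t_len=5)
print(primer)
```

The order of elements follows the text: the poly-T-stretch sits downstream of the core promoter so that it can anneal to a poly-A tail while the promoter stays at the 5' end.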
- mRNA molecules from less than 100 cells, preferably less than 10 cells, preferably a single cell corresponding to at least 5000, preferably at least 7000, preferably at least 9000 different genes may be amplified.
- mRNA molecules from a single cell corresponding to at least 9000 different genes may be amplified.
- An mRNA molecule corresponding to a gene refers, in particular, to a transcript of said gene.
- the maximum number of different genes detected and/or amplified is limited by the genes expressed in a cell, and may thus vary between individual cells or samples.
- a gene may refer to the genomic sequence of a gene comprising the exons of said gene. Moreover, said genomic sequence may comprise the intronic regions and/or, in some cases, the regulatory regions of said gene such as a promoter. Moreover, a gene may refer to a cDNA or a coding sequence. In particular, in the context of RNA sequencing, transcripts, reverse transcription, mRNA and/or cDNAs, a gene may preferably refer to the part of the gene that is transcribed (i.e. a cDNA).
- the RNA which is reverse transcribed into cDNA is obtained from a single cell or a single cell equivalent.
- a single cell equivalent as used herein, is a fraction of a sample comprising RNA, wherein the concentration of the RNA and/or the total amount of RNA is similar to what is obtained from a single cell by methods used in the art.
- the amount of RNA obtained from a single cell is about 10 to 30 pg and the amount of mRNA obtained from a single cell is about 0.1 to 1.5 pg.
- a single cell equivalent may be a sample of a liquid biopsy and/or a body fluid, such as blood, urine or cerebrospinal fluid, comprising a similar amount and/or concentration of RNA, mRNA and/or DNA as a single cell.
- a single cell equivalent may comprise about 0.1 to 1.5 pg and/or at least one target polynucleotide, e.g. at least one target mRNA and/or at least one target DNA, at a very low concentration as described herein.
- a sample of a body fluid and/or liquid biopsy may be concentrated or diluted and/or subsampled to comprise about 0.1 to 1.5 pg and/or at least one target polynucleotide at a very low concentration, as described herein.
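As a rough illustration of subsampling a body-fluid sample down to a single cell equivalent, the volume that carries a chosen RNA mass follows directly from the sample concentration. The function names and the default target of 1 pg are hypothetical; only the 0.1 to 1.5 pg window is taken from the text.

```python
def subsample_volume_ul(conc_pg_per_ul: float, target_pg: float = 1.0) -> float:
    """Volume (in microliters) of sample that contains target_pg of RNA."""
    if conc_pg_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_pg / conc_pg_per_ul


def in_single_cell_equivalent_range(mass_pg: float,
                                    low_pg: float = 0.1,
                                    high_pg: float = 1.5) -> bool:
    """Check a mass against the 0.1 to 1.5 pg window stated in the text."""
    return low_pg <= mass_pg <= high_pg


# Hypothetical sample at 50 pg/µl: 0.02 µl would carry about 1 pg.
volume = subsample_volume_ul(50.0, target_pg=1.0)
print(volume, in_single_cell_equivalent_range(1.0))
```

In practice such a small volume would be reached by dilution first; the sketch only shows the arithmetic.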
- a polyT-stretch is particularly useful when the DNA polynucleotide of the invention is used as a reverse transcription primer for reverse transcribing mRNA to cDNA.
- the DNA polynucleotide comprising the T7 promoter of the invention and further comprising a polyT-stretch is used as a primer for reverse transcribing mRNA to cDNA, wherein the polyT-stretch binds to the polyA tail of the mRNA.
- a polyT-stretch is useful for reverse-transcribing a non-coding RNA comprising a polyA tail to cDNA.
- a cDNA comprising the T7 promoter of the invention and downstream thereof the antisense sequence of an mRNA is used as IVT template for in vitro transcribing antisense RNA (aRNA).
- the aRNA is thereby linearly amplified.
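The strand logic of this step can be checked in silico. The sketch below assumes a commonly cited 17-nt T7 core sequence and a toy mRNA written in DNA letters; `first_strand_cdna` and `transcribe_arna` are hypothetical helpers, and second strand synthesis is implied rather than modeled. The transcript has the sequence of the promoter-bearing strand from position +1 onward, so it ends in the reverse complement of the mRNA body, i.e. it is antisense.

```python
# Sketch only: sequences are handled as DNA letters (T in place of U).
T7_CORE = "TAATACGACTCACTATA"   # assumption: commonly cited core, -17 to -1
DOWNSTREAM_FLANK = "GGGATAAT"   # +1 to +8 region quoted in the figure legends

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def first_strand_cdna(mrna: str, poly_t_len: int = 8) -> str:
    """Primer (promoter + poly-T) extended along the mRNA body.

    Simplification: every trailing A is treated as part of the poly(A) tail.
    """
    body = mrna.rstrip("A")
    primer = T7_CORE + DOWNSTREAM_FLANK + "T" * poly_t_len
    return primer + reverse_complement(body)


def transcribe_arna(promoter_strand: str) -> str:
    """RNA copy of the promoter-bearing strand, starting at position +1."""
    start = promoter_strand.index(T7_CORE) + len(T7_CORE)
    return promoter_strand[start:].replace("T", "U")


mrna = "ATGGCCTAG" + "A" * 12          # toy transcript with a poly(A) tail
arna = transcribe_arna(first_strand_cdna(mrna))
print(arna)
```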
- the DNA polynucleotide comprising the T7 promoter of the invention is used for linearly amplifying RNA from cDNA by in vitro transcription.
- the DNA polynucleotide comprising the T7 promoter comprising the downstream flanking region of the invention and the upstream flanking region of the invention is used for linearly amplifying RNA from cDNA by in vitro transcription.
- cDNA is generated from RNA by using the DNA polynucleotide comprising the T7 promoter comprising the downstream flanking region of the invention and the upstream flanking region of the invention as reverse transcription primer.
- the RNA is obtained from less than 100 cells, preferably less than 10 cells, very preferably from a single cell.
- the T7 promoter comprising the downstream flanking region of the invention and the upstream flanking region of the invention is particularly suitable for in vitro transcribing RNA, when the IVT template concentration is low, i.e. very low as described herein.
- the IVT template concentration is low, for example, when the IVT template is a cDNA which is reverse transcribed from an RNA obtained from a single cell.
- the DNA polynucleotide of the invention comprising a T7 promoter comprising a downstream flanking region of the invention and an upstream flanking region of the invention is particularly useful for in vitro transcribing RNA from cDNA, wherein the cDNA has been reverse transcribed from RNA present in a very low concentration and/or derived or obtained from less than 100 cells, preferably a single cell.
- said DNA polynucleotide comprising a T7 promoter comprising a downstream flanking region of the invention and an upstream flanking region of the invention is very useful for in vitro transcribing RNA from a DNA polynucleotide, i.e. a DNA polynucleotide that has been generated from a target DNA polynucleotide that is present in a very low concentration and/or derived or obtained from less than 100 cells, preferably a single cell.
- a polynucleotide that is derived or obtained from a cell has been contained in said cell, and/or is contained in such a cell.
- aRNA is linearly amplified by in vitro transcribing aRNA from a cDNA IVT template, wherein the cDNA is obtained by reverse transcription from an mRNA, wherein the DNA polynucleotide of the invention is comprised in the reverse transcription primer, and wherein the mRNA is obtained from a single cell.
- the method for in vitro transcribing RNA further comprises the steps of reverse transcribing an RNA to cDNA and subsequently transcribing the cDNA to RNA, wherein the DNA polynucleotide of the invention is used as reverse transcription primer and the cDNA is the IVT template.
- the reverse transcribed RNA is an mRNA
- the DNA polynucleotide comprises a poly-T-stretch, preferably downstream of the T7 core promoter
- the in vitro transcribed RNA is an antisense RNA (aRNA).
- a second strand of the cDNA comprising the T7 promoter of the invention is synthesized upon first strand cDNA synthesis.
- the linearly amplified RNA is further reverse transcribed to cDNA, preferably by using random primers for first strand synthesis and the DNA polynucleotide of the invention comprising a polyT-stretch as primer for second strand synthesis.
- the cDNA derived from linearly amplified RNA can again be used as an IVT template to again in vitro transcribe RNA.
- the cDNA comprising the T7 promoter of the invention and downstream thereof a nucleotide sequence encoding preferably an RNA is used as IVT template for in vitro transcribing RNA.
- the RNA is thereby linearly amplified.
- the DNA polynucleotide of the invention can be used for, preferably linearly, amplifying a nucleotide sequence which is complementary to a nucleotide sequence comprised in said DNA polynucleotide.
- the presence and/or abundance of a target polynucleotide in a sample such as a liquid biopsy, a sample from a body fluid, a few cells and/or a single cell, as described herein, may be determined by employing the inventive IVT and T7 promoter based amplification methods provided herein.
- the amount of the amplified target polynucleotide(s), e.g. aRNA, may be determined by methods known in the art and/or as described herein, e.g. by quantitative PCR, digital PCR and/or next-generation sequencing.
- Further methods to detect the amplified target polynucleotide(s) may be also used, e.g. fluorescence in-situ hybridization (FISH) and/or dCas9-based imaging.
- the invention relates to a method for amplifying a target polynucleotide, wherein said target polynucleotide is a genomic DNA sequence, and wherein said method comprises the steps of
- As regards the in vitro transcription (step b), the amplification of the target polynucleotide, and “reflecting the target polynucleotide sequence”, the same applies as is described herein in the context of further methods for amplifying a target polynucleotide.
- a primer or reverse transcription primer comprising a T7 promoter with an inventive and defined downstream flanking region, as provided herein, makes it possible to compare the abundances of target polynucleotides (e.g. non-poly-A RNA or DNA) with improved sensitivity and/or accuracy.
- T7-promoter based transposase-mediated amplification of genomic DNA target polynucleotides e.g. which are in proximity to a certain epigenetic modification or DNA-binding molecule.
- the improved accuracy is due to the improved efficiency of amplifying the target polynucleotide, and furthermore, due to the lack of an amplification bias.
- an amplification bias may occur when only a core T7 promoter or a T7 core promoter with less than 8 defined directly adjacent downstream bases is used because in such a case the efficiency of the T7 promoter may vary between different target polynucleotides since they may constitute a (random) part of the downstream flanking region.
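A toy model makes the bias argument concrete. It assumes, purely for illustration, that the amount of amplified product is proportional to template abundance times a per-template promoter efficiency; the efficiency values are hypothetical and `observed_ratio` is not a function from any library.

```python
def observed_ratio(true_a: float, true_b: float,
                   eff_a: float, eff_b: float) -> float:
    """Ratio of amplified products A/B under a simple proportional model."""
    if true_b <= 0 or eff_b <= 0:
        raise ValueError("abundance and efficiency of B must be positive")
    return (true_a * eff_a) / (true_b * eff_b)


# Defined downstream flanking region: both templates share one efficiency,
# so the measured ratio equals the true ratio.
print(observed_ratio(100, 100, eff_a=1.0, eff_b=1.0))

# Target-dependent flanking bases: efficiencies differ (hypothetical values),
# so two equally abundant targets appear four-fold different.
print(observed_ratio(100, 100, eff_a=2.0, eff_b=0.5))
```

With a fixed flanking region the efficiency term cancels in every pairwise comparison, which is the point made in the text.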
- the T7 promoter of the invention comprising an optimized and defined downstream flanking region is thus particularly useful for detecting and/or quantifying target polynucleotides in a sample, i.e.
- an inventive sequencing method employing an inventive primer provided herein may be used.
- the IVT method using a DNA polynucleotide according to the present invention is part of a sequencing method.
- the term “sequencing” refers to approaches aiming at determining the identity of at least one nucleotide, preferably most nucleotides, even more preferably all nucleotides in a given nucleotide sequence.
- the term “sequencing method” refers to a method for determining the partial or full sequence(s) of a polynucleotide and/or to determine the relative abundance of identical molecules of said polynucleotide.
- a sequencing method may be a first, next (second) or third generation sequencing method. Preferred are next or third generation sequencing methods.
- a preferred next-generation sequencing technique is Illumina sequencing.
- the sequencing method is suitable for single cell sequencing. Suitable single cell sequencing techniques are for example based on SMART-seq2, CEL-seq2, inDrop, MARS-seq, Drop-seq, STRT, Quartz-seq, LIANTI, CHIL-Seq, Dam-seq (e.g. scDam&T-seq) and the sci-L3 method.
- a preferred third generation sequencing method is Nanopore sequencing which allows direct sequencing of RNA and/or DNA in the absence of a library preparation step comprising PCR based amplification of reverse-transcribed aRNA.
- the DNA polynucleotide of the present invention is used to reverse transcribe a target polynucleotide, e.g. an RNA, amplify it, i.e. by IVT, and then subject the amplified RNA directly and/or after reverse transcription to sequencing.
- the DNA polynucleotide used for the sequencing method comprises a polyT-stretch to capture mRNAs which are then amplified as antisense RNAs (aRNAs).
- the entire sequence of RNAs, including the sequence of the respective poly(A) tails, can be determined by sequencing the synthesized cDNA library.
- the DNA polynucleotide of the invention used for the sequencing method further comprises a barcode and/or a unique molecular identifier (UMI) and preferably a polyT-stretch.
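When a barcode and a UMI are part of the primer, each sequencing read starts with those elements and has to be split before counting. The sketch below assumes a hypothetical read layout (6-nt UMI, then 6-nt barcode, then the remainder); the lengths and order are illustrative defaults and are not specified in this section.

```python
from typing import NamedTuple


class ParsedRead(NamedTuple):
    umi: str
    barcode: str
    rest: str


def parse_read(seq: str, umi_len: int = 6, barcode_len: int = 6) -> ParsedRead:
    """Split a read into UMI, barcode and remaining sequence.

    The UMI-first layout and both lengths are assumptions for illustration.
    """
    prefix = umi_len + barcode_len
    if len(seq) < prefix:
        raise ValueError("read shorter than UMI + barcode")
    return ParsedRead(seq[:umi_len], seq[umi_len:prefix], seq[prefix:])


read = parse_read("AAACCCGGGTTTACGT")
print(read)
```

The barcode assigns the read to a cell or sample, and the UMI lets reads that stem from the same original molecule be collapsed later.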
- the sequencing method is suitable for scarce material, in particular for less than 10000 cells, preferably less than 100 cells, preferably less than 10 cells. In a very preferred embodiment, the sequencing method is a single cell sequencing method.
- the invention relates to a method for determining the partial or full sequence(s) of a polynucleotide, i.e. a target polynucleotide, and/or the relative abundance of identical molecules of said polynucleotide comprising the steps of (a) synthesizing a DNA strand which is complementary to the polynucleotide that is to be sequenced (target polynucleotide) or the complementary strand thereof comprising annealing the DNA polynucleotide of the invention (to said target polynucleotide), and (b) transcribing the polynucleotide that is to be sequenced (target polynucleotide) or the complementary strand thereof into RNA by using a T7 polymerase.
- the present invention relates to a method for determining the partial or full nucleotide sequence(s) of an RNA and/or the transcript level of at least one gene comprising the steps of (a) reverse transcribing an RNA into a first strand of a cDNA comprising annealing the DNA polynucleotide according to the present invention; and (b) transcribing the cDNA into aRNA by using a T7 polymerase.
- As regards step (a) of the method, the same applies as has been described above in connection with reverse transcribing RNA to cDNA, synthesizing the complementary strand of a target polynucleotide and/or the use of the DNA polynucleotide of the invention as a primer or reverse transcription primer.
- annealing the DNA polynucleotide refers herein to the use of the DNA polynucleotide of the invention as a primer for reverse transcribing or synthesizing the complementary strand of the polynucleotide the sequence of which is to be determined (target polynucleotide).
- the method further comprises a step (a’) of synthesizing a second strand of the first strand of the cDNA obtained from step (a) or during the method for amplifying a target polynucleotide according to the invention.
- the invention relates to a method for determining the partial or full nucleotide sequence(s) and/or abundance of at least one target polynucleotide comprising the steps of
- determining the abundance of at least one target polynucleotide may comprise counting the nucleotide sequences corresponding to said target polynucleotide. Methods for determining and normalizing the counts are well known in the art, and described herein and in the appended Examples. Moreover, the abundance of at least one mRNA may further refer to the transcript level of the corresponding gene.
- the invention also relates to a method for determining the transcript level of at least one gene comprising the steps of
- the partial or full nucleotide sequence(s), the abundance of at least one target polynucleotide and/or the transcript level of at least one gene is determined in a sample comprising less than 100 cells, preferably less than 10 cells, preferably a single cell.
- the mRNA and/or at least one target polynucleotide may be present at a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl, and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl.
- the expression of at least about 9750 genes can be detected in a single cell (e.g. a K562 cell) with an exemplary sequencing method according to the invention employing an inventive reverse transcription primer provided herein (e.g. SEQ ID NO:15) comprising a T7 promoter of the invention (e.g. SEQ ID NO:20).
- the partial or full nucleotide sequence(s) and/or the abundance of at least 5000, preferably at least 7000, preferably at least 9000 different target polynucleotides and/or the transcript level of at least 5000, preferably at least 7000, preferably at least 9000 genes may be determined according to the sequencing method of the invention.
- at least 5000, preferably at least 7000, preferably at least 9000 transcripts of unique genes may be detected in a single cell.
- at least 9100, 9300 or 9500 transcripts of unique genes may be detected in a single cell.
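The number of genes detected in a cell is typically read off a per-cell count table. A minimal sketch, assuming a gene counts as detected when at least one transcript is observed (the threshold is an assumption, and the gene names and counts are made up):

```python
def genes_detected(counts: dict[str, int], min_count: int = 1) -> int:
    """Number of genes with at least min_count observed transcripts."""
    return sum(1 for c in counts.values() if c >= min_count)


# Hypothetical count table for one cell.
cell_counts = {"GENE_A": 3, "GENE_B": 0, "GENE_C": 1}
print(genes_detected(cell_counts))
```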
- the sequencing method of the invention, e.g. the method for determining the partial or full nucleotide sequence(s) of an RNA and/or the transcript level of at least one gene, comprises a step (c) of generating a double-stranded cDNA from the aRNA of step (b), and/or a step of sequencing the aRNA of step (b) and/or the cDNA of step (c).
- the sequencing method of the invention further comprises a step (c’) of synthesizing the second strand of the cDNA of step (c) upon synthesis of the first strand of the cDNA.
- the method further comprises a step (d) of amplifying the cDNA of step (c) by PCR, wherein a library preparation primer binds to the sequencing adapter.
- the sequencing method further comprises a step of sequencing the amplified cDNA of step (d).
- the partial or full nucleotide sequence(s) of an mRNA and/or the transcript level of at least one gene comprising a coding sequence is determined, wherein the DNA polynucleotide of the invention comprises a polyT-stretch and a sequencing adapter, wherein the polyT-stretch binds to the poly-A-tail of the mRNA.
- the polynucleotide that is to be sequenced is derived or obtained from less than 10000 cells, preferably, less than 100 cells, preferably less than 10 cells.
- the polynucleotide to be sequenced is derived or obtained from a single cell and/or a single cell equivalent.
- the polynucleotide that is to be sequenced is derived from and/or obtained from a single cell.
- the polynucleotide to be sequenced may be derived or obtained from a body fluid, i.e. a single cell equivalent obtained from a body fluid, as described herein.
- the sequencing method of the invention further comprises a step of analyzing the sequencing data, wherein the analysis comprises a sequence alignment, counting reads, and/or normalizing read counts.
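The counting and normalizing steps can be sketched as follows, assuming the alignment has already assigned each read to a gene. Collapsing reads by UMI and scaling to counts per million are common choices but are assumptions here; the text does not prescribe a particular normalization.

```python
from collections import defaultdict
from typing import Iterable


def count_umis(aligned_reads: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count distinct UMIs per gene from (gene, umi) pairs."""
    seen: dict[str, set[str]] = defaultdict(set)
    for gene, umi in aligned_reads:
        seen[gene].add(umi)
    return {gene: len(umis) for gene, umis in seen.items()}


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale counts so that they sum to one million (zero-safe)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical aligned reads: the duplicate (GENE_A, u1) is counted once.
reads = [("GENE_A", "u1"), ("GENE_A", "u1"), ("GENE_A", "u2"), ("GENE_B", "u1")]
umi_counts = count_umis(reads)
cpm = counts_per_million(umi_counts)
print(umi_counts, cpm)
```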
- the present invention relates to the use of the DNA polynucleotide according to the present invention in a sequencing method, e.g. for determining the partial or full nucleotide sequence(s) of an RNA contained in a single cell or single cell equivalent and/or the transcript level of at least one gene of a single cell or single cell equivalent.
- As regards the DNA polynucleotide and the determination of the partial or full nucleotide sequence(s) of an RNA contained in a single cell or single cell equivalent and/or the transcript level of at least one gene of a single cell or single cell equivalent, the same applies as described above.
- the present invention further relates to a kit comprising the DNA polynucleotide of the invention, one or more modified and/or unmodified ribonucleotide triphosphate(s), library preparation primer(s), sequencing primer(s), and/or a microfluidic chip.
- the kit can further comprise an enzyme for 5’ capping of RNA and/or an enzyme for poly-A-tailing of RNA, and/or a manual describing an in vitro transcription method and/or a sequencing method comprising an IVT step using said kit.
- the polynucleotide of the invention is an RNA polynucleotide.
- As regards the RNA polynucleotide of the invention, the same applies as has been described in the context of a DNA polynucleotide.
- the RNA polynucleotide of the invention is reverse-transcribed to DNA, in particular before its use as a reverse transcription primer, for IVT and/or a sequencing method according to the invention.
- FIG. 1 Scheme of 5’RACE-Seq for investigating the efficiency of different T7 promoter sequences.
- a double stranded DNA (dsDNA) library (SEQ ID NO:2), wherein each DNA polynucleotide comprised a T7 core promoter sequence, followed by a guanine at position +1 and a randomized nucleotide sequence at positions +2 to +16, was transcribed in vitro using a T7 RNA polymerase. The resulting 210 nucleotides long RNAs were reverse transcribed, and the 5’ end of the respective cDNA converted into a library for deep sequencing. The activity of the T7 promoter sequences was determined by counting the reads comprising a respective flanking region directly adjacent downstream the T7 core promoter sequence. The DNA IVT template was amplified by PCR and sequenced for normalization of the read counts.
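The read-count normalization described in this legend can be sketched as a log2 ratio of each motif's share among the RNA-derived reads to its share among the DNA template reads. The function name, the optional pseudocount, and the motif labels are assumptions; the exact normalization used for the figures is not given in this section.

```python
import math


def log2_activity(rna_counts: dict[str, int],
                  dna_counts: dict[str, int],
                  pseudocount: float = 0.0) -> dict[str, float]:
    """log2(RNA read fraction / DNA template read fraction) per motif."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one read")
    activity = {}
    for motif, dna in dna_counts.items():
        rna_frac = (rna_counts.get(motif, 0) + pseudocount) / rna_total
        dna_frac = (dna + pseudocount) / dna_total
        if rna_frac <= 0 or dna_frac <= 0:
            continue  # undefined without a pseudocount
        activity[motif] = math.log2(rna_frac / dna_frac)
    return activity


# Hypothetical counts for four flanking motifs, equally represented in the DNA.
rna = {"m1": 400, "m2": 200, "m3": 100, "m4": 100}
dna = {"m1": 50, "m2": 50, "m3": 50, "m4": 50}
activity = log2_activity(rna, dna)
print(activity)
```

Dividing by the DNA fraction corrects for motifs that are simply over- or under-represented in the template library.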
- FIG. 4 Activity of T7 promoter variants carrying nucleotide sequences from positions +4 to +8 determined by 5’RACE-Seq shown as log2 fold change (FC) (relative abundances of individual sequence motifs). All promoters contained a G at positions +1 to +3. A high correlation (R²) of 0.98 was observed between two independent experiments denoted as RepA and RepB. Highlighted are exemplary T7 promoter sequences with high, mid, or low activity.
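Agreement between two replicates such as RepA and RepB can be summarized by a squared correlation coefficient. The sketch below computes the squared Pearson correlation of paired log2 fold changes; whether the reported value is a Pearson R² is an assumption, and the numbers are made up.

```python
def pearson_r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation of two equally long value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy * sxy / (sxx * syy)


# Hypothetical log2 FC values of four motifs in two replicates.
rep_a = [1.0, 2.0, 3.0, 4.0]
rep_b = [1.0, 3.0, 2.0, 4.0]
r2 = pearson_r_squared(rep_a, rep_b)
print(r2)
```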
- FIG. 5 Linear amplification of RNA by in vitro transcription.
- FIG. 6 Comparison of the IVT activity of T7 promoter variants with different 5’RACE-Seq ranks as shown in Table 1.
- All T7 promoter variants comprised a G at positions +1 to +3 and different nucleotide sequences at positions +4 to +8 and were used to in vitro transcribe a 410 nucleotides long RNA for 2h. Shown is the fold amplification relative to the template DNA.
- Indicated below is the +4 to +8 RACE-Seq rank as shown in Table 1. Results are shown from left to right for SEQ ID NO:8, SEQ ID NO:7, SEQ ID NO:6, SEQ ID NO:3, and SEQ ID NO:4. Error bars represent standard deviation for triplicate experiments.
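Fold amplification with triplicate error bars can be sketched as below. The sketch assumes RNA yield and template input are expressed in the same unit (the legend does not state whether mass or molar amounts were compared) and uses the sample standard deviation; the yields are made-up numbers.

```python
import statistics


def fold_amplification(rna_yield: float, template_input: float) -> float:
    """RNA yield divided by template input, both in the same unit."""
    if template_input <= 0:
        raise ValueError("template input must be positive")
    return rna_yield / template_input


def summarize(replicates: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of replicate measurements."""
    return statistics.mean(replicates), statistics.stdev(replicates)


# Hypothetical triplicate yields from 1 unit of template each.
folds = [fold_amplification(y, 1.0) for y in (90.0, 100.0, 110.0)]
mean_fold, sd_fold = summarize(folds)
print(mean_fold, sd_fold)
```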
- FIG. 7 Comparison of the IVT activity of two T7 promoter variants using different IVT DNA template concentrations (SEQ ID NO:4 and SEQ ID NO: 8). IVT was performed for 2h.
- FIG. 8 Comparison of the IVT activity of a T7 promoter variant with a 5’RACE-Seq rank as shown in Table 1 with or without an AT-rich 4 nucleotides long upstream flanking region. IVTs were performed for 2h using T7 promoter variants of the indicated 5'RACE-Seq rank. Shown is the resulting fold amplification of template DNA for SEQ ID NO:4, SEQ ID NO:5, and SEQ ID NO: 8 (from left to right). The upstream flanking region was GAATT located directly adjacent upstream the T7 core promoter (comprised in SEQ ID NO:5).
- FIG. 9 An AT-rich upstream flanking region boosts the IVT activity of the T7 promoter in an IVT template that mimics the structure of CEL-seq2 derived cDNA, in particular at very low IVT template concentrations.
- a 410 nucleotides long RNA was in vitro transcribed for 2h from 1 nanogram of standard IVT templates (SEQ ID NO:4 and SEQ ID NO:5).
- the RNA was in vitro transcribed for 2h from 1 nanogram (50 pg/µl) of 305 nucleotides long CEL-seq cDNA mimics (SEQ ID NO: 10 and SEQ ID NO: 11).
- RNA was in vitro transcribed for 15h from 1 picogram (0.05 pg/µl) of those CEL-seq cDNA mimics.
- the T7 promoter in all IVT templates further comprised a directly adjacent downstream flanking region with the sequence “GGGATAAT”.
- Figure 10 A T7 promoter of the present invention, in particular with a respective upstream flanking region improves IVT from a dsDNA that mimics the structure of CEL-seq2 derived cDNA.
- the relative activity of the T7 promoter as found in the CEL-seq2 adaptor is compared to the T7 promoter with rank #4 (see SEQ ID NO: 10) in the 5’RACE-Seq data as shown in Table 1, and the latter in combination with the directly adjacent upstream flanking region with the sequence GAATT (see SEQ ID NO: 11).
- the template was pre-synthesized as double stranded DNA to monitor the specific effect of the promoter sequence, independent of the mRNA capturing step during CEL-seq2. 1 picogram template was in vitro transcribed for 15h.
- the data for “CEL-seq + #4” and “CEL-seq up + #4” are the same as shown in the right panel of Figure 9.
- T7 promoter in combination with specific directly adjacent upstream flanking regions improves aRNA yield during CEL-seq2 from 10 single cells.
- CEL-seq2 experiments were performed until RNA amplification using the reverse transcription primers with SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 14 or SEQ ID NO: 16 (from left to right).
- the 5’RACE-Seq rank of the T7 promoter sequence is indicated as shown in Table 1. “up1” refers to the upstream flanking region GAATT (comprised in SEQ ID NO: 13 and SEQ ID NO: 15) and “up4” refers to the upstream flanking region AATTG (comprised in SEQ ID NO: 14 and SEQ ID NO: 16).
- CEL-seq indicates the original CEL-seq2 reverse transcription primer (SEQ ID NO: 12). Error bars represent standard deviation for triplicate experiments.
- FIG. 12 The combination of a directly adjacent up- and downstream region of the T7 improves IVT in a single cell RNA sequencing experiment based on CEL-seq2.
- CEL-seq2 was performed from single K562 cells using the indicated primers (SEQ ID NO: 12 and SEQ ID NO: 15). After reverse transcription and second strand synthesis cDNA from 10 cells was pooled and in vitro transcribed for 15h. Purified RNA was fragmented and quantified on a Tapestation (Agilent). Error bars represent the standard deviation from triplicate experiments.
- T7: T7 promoter
- Illumina A: partial sequencing adapter A
- UMI: unique molecular identifier
- BC: cellular barcode
- a modified CEL-Seq2 reverse transcription primer (SEQ ID NO: 15; CEL-Seq+) harboring a modified T7 promoter (SEQ ID NO:20) significantly improved CEL-Seq2-based RNA-Sequencing of single cells, i.e. by enhancing the efficacy of the linear amplification of cDNA. Specifically, a significantly increased number of genes was detected with CEL-Seq+.
- FIG. 16 Genes from deep sequenced bulk K562 RNA-Seq (Prost (2015), Nature 525) were sorted into quartiles by expression level with the lowest expressed genes in the first quartile and the highest expressed genes in the fourth quartile. Shown are the numbers of genes from the indicated bulk quartiles that were detected in individual cells by CEL-Seq2 (with the reverse transcription primer shown in SEQ ID NO: 12) or CEL-Seq+ (with the reverse transcription primer shown in SEQ ID NO: 15). Within each of the four quartile plots, CEL-Seq2 is shown on the left, and CEL-Seq+ on the right.
- FIG. 17 Shown are the numbers of genes of different categories detected with CEL-Seq2 (with the reverse transcription primer shown in SEQ ID NO: 12) or CEL-Seq+ (with the reverse transcription primer shown in SEQ ID NO: 15) in individual cells.
- the first category encompasses genes that are differentially expressed throughout chronic myeloid leukemia (CML) disease progression (Prost (2015), Nature 525; left plot), the second category encompasses genes with the GO association “DNA binding transcription factor (TF)” (middle plot), and the third category encompasses genes with the GO association “chromatin binding” (right plot).
- CEL-Seq2 is shown on the left, and CEL-Seq+ on the right. Whiskers reach to 1.5x IQR away from the 1st/3rd quartile.
- a saturated randomized 5' RACE-Seq screen was performed to determine the self-transcription activity of T7 promoter sequences comprising the T7 core promoter, a guanine at position +1 (TSS) and different combinations of nucleotides at positions +2 to +16.
- TSS guanine at position +1
- the 5’ end sequence of over 100 million 210 nucleotides long RNAs transcribed from said randomized promoter sequences comprised in a 500 nucleotides long dsDNA polynucleotide library (10 ng; scale of 10¹⁰ (10 billion) molecules of the +2 to +16 randomized T7 promoter template) was determined to interrogate the influence of the directly adjacent downstream flanking region of the core promoter on the transcriptional activity of the T7 promoter.
- a scheme of the applied experimental procedure of 5' RACE-Seq is shown in Figure 1.
- 5' “RACE-Seq” refers to a rapid amplification of cDNA ends and subsequent sequencing using a next-generation-sequencing method (deep sequencing).
- a “saturated” screen refers to a screen of virtually any possible combination of nucleotides at positions +2 to +16 contained in a 5’RACE-Seq library. Any possible sequence at positions +2 to +8 is covered at least 100 times, but typically thousands of times during sequencing. The promoter strength of a given promoter sequence was assessed based on the relative production of transcripts from said promoter.
- cDNA was purified using 1.6 volumes AMPure XP beads and eluted in 10 µl water.
- a poly(A)-tailing reaction was carried out using 3 U/µl terminal transferase (Invitrogen 10533).
- Second-strand cDNA synthesis was performed in 50 µl with 0.5 µM Oligo i5_5N_dT20VN, 2000 U Q5® High-Fidelity DNA Polymerase, and 0.2 mM dNTPs, using the temperature cycle 98 °C for 30 sec; 55 °C for 30 sec; 72 °C for 5 min.
- 2 µl Exonuclease I and 48 µl water were added and the reaction was incubated for 1 h at 37 °C.
- dsDNA was purified with 1.6 volumes AMPure XP beads, and eluted in 22 µl water. Sequencing adapters were introduced during 8 cycles of PCR with KAPA HiFi HotStart.
- dsDNA template T7 15N was amplified by PCR using KAPA HiFi HotStart Polymerase, P5 and BG T7 sequencing adapters. Here, simply 7 cycles of PCR had to be performed.
- Libraries were purified twice with 0.9 volumes AMPure XP beads and sequenced in single end mode using a NextSeq 500 instrument from Illumina. The first 15 nucleotides were used for data analysis.
- the dsDNA template for RACE-Seq sequence was as follows (SEQ ID NO:2):
- GCATCGAG wherein bold and italics refer to the T7 core sequence, bold and not in italics to the sequencing adapter binding site, and bold and underlined to the reverse transcription binding site, and wherein N denotes any nucleotide selected from the group consisting of A, C, G, and T.
- T7 promoter variants comprising three guanines at positions +1 to +3 and a certain set of nucleotide sequences at positions +4 to +8 are shown in Table 1.
- the downstream flanking regions (positions +1 to +8) of T7 promoter variants are ranked based on the transcriptional activity measured by 5’RACE-Seq.
- the normalized activity measurements of two replicate experiments (A and B) and the mean of the normalized activity obtained in the two experiments are shown.
- GGGAACCC 1.593360743 1.994660137 1.79401044
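As a quick arithmetic check on the screen and on the Table 1 row above, the following sketch computes the size of the randomized sequence space and recomputes the mean of the two replicate values. It assumes that reads and template molecules are spread uniformly over all sequences, which real data will not satisfy exactly, and all variable names are illustrative.

```python
# Size of the randomized sequence space (4 possible bases per position).
n_full = 4 ** 15    # positions +2 to +16
n_short = 4 ** 7    # positions +2 to +8

# Idealized average coverage under a uniform-distribution assumption
# (an assumption made for this sketch, not a measured value).
reads = 100_000_000             # lower bound: "over 100 million" RNAs sequenced
template_molecules = 10 ** 10   # stated scale of the 10 ng template library
reads_per_short = reads / n_short
molecules_per_full = template_molecules / n_full

# Table 1 row GGGAACCC: third value is the mean of replicates A and B.
rep_a, rep_b = 1.593360743, 1.994660137
mean_ab = (rep_a + rep_b) / 2

print(n_full, n_short)
print(round(reads_per_short), round(molecules_per_full, 1))
print(round(mean_ab, 8))
```

Under these assumptions each +2 to +8 sequence would be read several thousand times on average, which is consistent with the statement that such sequences are typically covered thousands of times.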
Abstract
The invention relates to a DNA polynucleotide, a kit comprising said DNA polynucleotide, and a method for in vitro transcribing (IVT) RNA using said DNA polynucleotide, wherein said DNA polynucleotide comprises (i) a T7 promoter sequence comprising (1) a T7 core promoter sequence and (2) a directly adjacent downstream flanking region, wherein the downstream flanking region comprises eight bases (+1 to +8) comprising three guanines at positions +1 to +3, an adenine or thymine at position +4, at least two adenines at positions +4 to +8 and/or a thymine at position +4, and at most one cytosine at positions +4 to +8, and downstream thereof (ii) a nucleotide sequence encoding the RNA to be transcribed; and wherein the method comprises (a) providing said DNA polynucleotide as an IVT template, and (b) in vitro transcribing said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates.
Description
Phage T7 promoters for boosting in vitro transcription
The present invention relates to a DNA polynucleotide and a method for in vitro transcribing (IVT) RNA using said DNA polynucleotide. In particular, the DNA polynucleotide comprises (i) a T7 promoter sequence comprising (1) a T7 core promoter sequence and (2) a directly adjacent downstream flanking region, wherein the downstream flanking region comprises eight bases (+1 to +8) comprising three guanines at positions +1 to +3, an adenine or thymine at position +4, at least two adenines at positions +4 to +8 and/or a thymine at position +4, and at most one cytosine at positions +4 to +8, and downstream thereof (ii) a nucleotide sequence encoding the RNA to be transcribed. Furthermore, the method comprises (a) providing said DNA polynucleotide as an IVT template, and (b) in vitro transcribing said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates.
Background
Obtaining a comprehensive understanding of RNA biology and the biology of diseases associated with RNA dysregulation is crucial as RNAs have been considered as the major determinant for converting genomic information from DNA into biological function. Thus, research has been focusing on the transcriptome of cells including RNAs that are translated into protein as well as regulatory RNAs. Unraveling RNA biology and the biology of diseases associated with RNA dysregulation requires detection, quantification, and characterization of different types of RNAs. However, RNAs exist with strongly varying copy numbers per cell and many approaches require a minimum amount of a given target RNA for further investigation. Hence, approaches are of interest for amplification of RNAs that maintain the relative abundances of different RNAs present in the original cell or tissue.
Different approaches exist for an efficient amplification of nucleic acids, which is critical for many molecular biology procedures (Sauer et al., Nat. Rev. Genet. 6, 465-476, 2005). A widely applied procedure being core to a host of analytical genomics applications is the in vitro transcription (IVT) of RNA using efficient viral promoter sequences for RNA amplification (Melton et al., Nucleic Acids Res. 12, 7035-7056, 1984). As examples, recent single-cell RNA-sequencing (scRNA-seq) methods such as CEL-seq (Hashimshony et al., Cell Rep. 2012 Sep 27;2(3):666-73), CEL-seq2 (Hashimshony et al., Genome Biol. 17, 77, 2016) and microdroplet-based procedures (Klein, A. M. et al., Cell 161, 1187-1201, 2015), rely on IVT using a T7 promoter and optimized reaction conditions (e.g. ionic strength and temperature) for engineered RNA polymerases. Moreover, IVT using a T7 promoter has been applied for low-amount DNA sequencing using linear amplification via transposon insertion (LIANTI) that uses the Tn5 transposon to fragment the genome and simultaneously insert a T7 RNA promoter for in vitro transcription (IVT) (Chen, C. et al., Science, 356, 189-194, 2017) and more recently sci-L3, a single-cell/nuclei sequencing method that combines combinatorial indexing (sci-) and linear (L) amplification (Yin, Y. et al., Mol Cell. 2019 Aug 28. pii: S1097-2765(19)30618-5, 2019, doi: 10.1016/j.molcel.2019.08.002. [Epub ahead of print]). Compared to exponential amplification by PCR, linear amplification by IVT features a much lower amplification bias (Hashimshony et al., Cell Rep. 2012 Sep 27;2(3):666-73). Further recent procedures comprising single-cell DNA sequencing such as scDam&T-seq (Rooijers (2019), Nat. Biotechnol. 37) or Chromatin Integration Labeling sequencing (CHIL-Seq; Harada (2019), Nat. Cell Biol. 21) also rely fundamentally on T7-promoter based in vitro transcription. IVT has also gained significant commercial interest for therapeutic or biotechnological applications. This includes the synthesis of RNA-based vaccines, mRNAs for transfection of cells for (over)expressing a protein without leaving a footprint in the genome, and regulatory RNAs for modulating gene or protein expression in vitro and in vivo (Sahin et al., Nat Rev Drug Discov. 2014 Oct;13(10):759-80). For example, CRISPR RNAs or (single) guide RNAs can be used for disrupting or restoring the sequence of target genes, and antisense RNAs and/or small-interfering RNAs can be used for translation inhibition of target mRNAs.
Many molecular biology procedures for an efficient amplification of nucleic acids take advantage of the phage T7 promoter. The T7 promoter refers to a 23 nucleotides long sequence comprising a transcription start site (TSS) denoted position +1. During transcription initiation, the T7 RNA polymerase binds the promoter DNA from nucleotide position -17 to -5 with high specificity, while the DNA double strand is melted from -4 to +2 to prime RNA synthesis from a GTP nucleotide at +1. Furthermore, it has been shown that sequences of the T7 core promoter, in particular the region from -17 to -1, and the first three transcribed nucleotides (+1 to +3) can affect the transcriptional activity, i.e. the promoter strength (defined by relative production of transcripts from a promoter; Ikeda, Biol. Chem. 267, 11322-8, 1992) and that sequences upstream of the T7 core promoter positively affect T7 polymerase binding affinity to an IVT DNA template (Tang et al., Biol. Chem. 280, 40707-13, 2005). However, it has been also suggested to only specify the +1 and +2 nucleotides as guanines (e.g. https://www.neb.com/protocols/2015/11/24/sgrna-synthesis-using-the-hiscribe-quick-t7-high-yield-rna-synthesis-kit-neb-e2050; downloaded on 30.08.2019). Furthermore, to date, the sequence further downstream was not considered to be relevant for regulating the promoter strength (Patwardhan et al., Nat Biotechnol. 2009 Dec;27(12):1173-5. doi: 10.1038/nbt.1589). Moreover, a comprehensive characterization of the impact of flanking promoter sequences on transcription has not been shown. It is therefore unknown whether the extended downstream flanking sequence affects the T7 promoter strength or whether certain upstream and downstream flanking sequences affect the T7 promoter strength. However, there is still a need to expand the repertoire of effective T7 promoters. In particular, optimized T7 promoters with stronger outcome would be advantageous for a variety of applications such as linear amplification procedures.
Summary
The present application addresses the need for optimized T7 promoter sequences and methods using said promoter sequences for enhanced amplification of antisense RNA (aRNA) and/or improved in vitro transcription of RNAs by providing the embodiments as recited in the claims.
In particular, the present invention relates in preferred aspects to a DNA polynucleotide, a kit comprising said DNA polynucleotide, and a method for in vitro transcribing (IVT) RNA using said DNA polynucleotide. In particular, said DNA polynucleotide comprises (i) a T7 promoter sequence comprising (1) a T7 core promoter sequence and (2) a directly adjacent downstream flanking region, and downstream thereof (ii) a nucleotide sequence encoding the RNA to be transcribed.
The downstream flanking region of said DNA polynucleotide comprises eight bases (+1 to +8) comprising {three guanines at positions +1 to +3 (GGG)},
{an adenine (A) or thymine (T) at position +4},
{ [at least two adenines at positions +4 to +8] and/or [a thymine at position +4] }, and {at most one cytosine (C) at positions +4 to +8}.
Furthermore, the method comprises the steps of (a) providing said DNA polynucleotide as an IVT template, and (b) in vitro transcribing the RNA from said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates. The present invention relates in a further preferred aspect to a method for determining the partial or full nucleotide sequence(s) of an RNA and/or the transcript level of at least one gene, wherein the method comprises the steps of (a) reverse transcribing an RNA to the first strand of a cDNA comprising annealing a DNA polynucleotide according to the present invention, and (b) transcribing the cDNA into RNA using a T7 polymerase.
It has surprisingly been found that the +4 to +8 bases/nucleotides of the downstream flanking region of the T7 promoter drastically affect its transcription activity/efficiency/strength. As demonstrated herein, combinations of bases/nucleotides at the positions +4 to +8 according to defined rules described in the following have positive effects compared to a T7 core promoter with a conserved guanine-triplet (GGG) at positions +1 to +3 and a random sequence at positions +4 to +8. More specifically, it was found that at positions +4 to +8 sequences that are rich in adenines and thymines (AT-rich) improve the transcription strength of the T7 promoter, especially compared to respective sequences that are rich in guanines and cytosines (GC-rich). In particular, it was surprisingly found that T7 promoters, comprising
{an adenine or thymine at position +4},
{ [at least two adenines at positions +4 to +8] and/or
[a thymine at position +4] }, and
{at most one cytosine at positions +4 to +8} are advantageous for efficiently increasing transcription activity and thus, the efficiency of the T7 promoter. In addition, it was surprisingly found that the combination of a downstream flanking region according to the present invention with an AT-rich upstream flanking region further enhances the transcription activity of the T7 promoter. This is especially advantageous in applications wherein the in vitro transcription template is present and/or available at low concentration, e.g. a very low concentration as described herein.
An optimized T7 promoter according to the present invention can be readily applied in various IVT-based methods. It is especially advantageous when aiming at boosting linear amplification of nucleic acids. In particular, the T7 promoters of the invention comprising a defined +1 to +8 downstream flanking region ensure efficient amplification of target polynucleotides in a sample while avoiding a bias in IVT efficiency between different target polynucleotides. Optimized T7 promoters can be further used for increasing the efficiency of the synthesis of RNA in vitro, e.g. to produce large amounts of unmodified or modified RNA or recombinant proteins, preferably for therapeutic applications. An optimized T7 promoter is also highly valuable for IVT-based RNA-sequencing methods such as CEL-seq2, inDrop and MARS-seq, as especially linear amplification of antisense RNA (aRNA) can be boosted and/or performed more accurately by using a reverse transcription primer comprising the optimized T7 promoter according to the invention instead of a commercially available reverse transcription primer. As illustrated in the appended examples, employing an inventive T7 promoter provided herein in a single cell RNA-seq method allows the detection of more gene transcripts, i.e. mRNA of lowly expressed genes, for example of important cell state determinants such as transcription factors and chromatin modifiers. Moreover, many of those lowly expressed genes were only detected with a reverse transcription primer comprising the inventive T7 promoter but not with the conventional CEL-Seq2 reverse transcription primer. In addition, more transcripts per gene were detected in a single cell over the entire expression range compared to the conventional CEL-Seq2 reverse transcription primer. Thus, the expression level of a gene in a single cell can be determined more precisely, i.e. with a reduced variability. Thus, the sensitivity and/or accuracy of analytical methods used for detecting and/or sequencing nucleic acid molecules can be increased by employing an inventive T7 promoter provided herein, which is especially advantageous, for example, in case of analyzing transcriptomes on a single-cell level and/or detecting polynucleotides in liquid biopsies, i.e. in case raw material such as circulating tumor cells (CTC) is scarce and/or precious (Lawson et al., Nat. Cell Biol. 20, 1349-1360 (2018)).
As for RNA applications, an optimized T7 promoter is also highly valuable for IVT-based DNA-sequencing methods such as LIANTI, CHIL-Seq, Dam-seq (e.g. scDam&T-seq) or the sci-L3 method.
General terms
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which the invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, preferred methods and materials are described. For the purposes of the present invention, the following terms are defined below.
The articles “a” and “an” are used herein to refer to one or to more than one (i.e. to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
As used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (or).
Detailed description
In a first aspect, the present invention relates to a DNA polynucleotide, comprising a T7 promoter comprising a T7 core promoter sequence and a directly adjacent downstream flanking region, wherein the downstream flanking region comprises eight bases (+1 to +8) comprising {three guanines at positions +1 to +3},
{an adenine or thymine at position +4},
{ [at least two adenines at positions +4 to +8] and/or
[a thymine at position +4] }, and
{at most one cytosine at positions +4 to +8}.
Herein, pairs of brackets, i.e. “{ }”, “[ ]” and “< >”, are used to specify relationships and layers within rules defined herein, with terms and expressions being grouped within an opening and a closing bracket of the same type, e.g. “{ }”. Hence, brackets are to be understood as auxiliary means only for visually simplifying specific relationships and layers within rules defined herein.
In the context of the present invention, the term “polynucleotide” refers to a polynucleotide of at least 13 nucleotides covalently bonded in a chain and thus, representing a sequence of nucleotides. Herein, the term “nucleotide” refers to deoxyribonucleotides in case of DNA and cDNA (complementary DNA) molecules and to ribonucleotides in case of RNA molecules. Nucleotides consist of a nitrogenous base (also known as nucleobase), a five-carbon sugar (ribose in case of RNA or deoxyribose in case of DNA and cDNA), and at least one phosphate group. Nucleotides comprising as the respective nitrogenous base adenine, guanine, cytosine, thymine, and uracil are referred to as A, G, C, T and U nucleotides, respectively. Said nucleotides are preferred, though the term nucleotides can further comprise nucleotide analogues, such as peptide nucleic acids, morpholinos and/or locked nucleic acids, and/or modified nucleotides. Modified nucleotides include for example nucleotides comprising methylated, formylated or hydroxylated nucleobases. Examples for modified or analogous nucleobases include 5-methyl-cytosine, 5-hydroxymethyl-cytosine, N6-methyladenine, 7-methylguanine, 2-aminopurine, pyrrolo-dC, 5-bromouracil, and hypoxanthine. Hence, the term “polynucleotide” refers to a single- or double-stranded DNA (deoxyribonucleic acid) or RNA (ribonucleic acid) molecule built up of A, C, G, T and/or U nucleotides and/or modified versions thereof. Different types of RNA molecules exist including, but not limited to mRNA, lncRNAs, circular RNAs, miRNAs, snoRNAs, tRNAs, snRNAs, siRNAs, crRNAs and sgRNAs.
In the context of the invention, the terms “adenine”, “cytosine”, “guanine”, “thymine” and “uracil” refer not only to the respective nucleobase, but also to the respective nucleotide or nucleoside. For example, an “adenine”, “A” or “adenosine” refers to an “adenine”, “adenosine”, “adenosine triphosphate” (“ATP”), “adenosine diphosphate” (“ADP”) or “adenosine monophosphate” (“AMP”).
The polynucleotide of the invention is a polynucleotide, preferably a DNA polynucleotide, more preferably a double-stranded DNA polynucleotide.
A single-stranded polynucleotide consists of one nucleotide sequence and a double-stranded polynucleotide of two complementary nucleotide sequences. The term “complementary” refers to nucleotides, the nitrogenous bases of which can naturally bind to each other by hydrogen bonds, i.e. A and T nucleotides, A and U nucleotides as well as C and G nucleotides. Both a nucleotide and a nucleotide sequence have a 5’ (five prime) end and a 3’ (three prime) end based on the five carbon sites of sugar molecules in adjacent nucleotides. A nucleotide sequence is read herein from its respective 5’ to its respective 3’ end, wherein the term “downstream” refers to a nucleotide which is in 3’ direction of a reference nucleotide. The term “upstream” refers to a nucleotide which is in 5’ direction of a reference nucleotide within a respective nucleotide sequence. A nucleotide which is located downstream of a reference nucleotide within a respective nucleotide sequence, can be directly, i.e. covalently, linked to the 3’ end of said reference nucleotide, which is referred to herein as “directly adjacent”. A nucleotide which is located upstream of a reference nucleotide within a respective nucleotide sequence, can be directly, i.e. covalently, linked to the 5’ end of said reference nucleotide, which is also referred to herein as “directly adjacent”.
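The pairing and reading-direction conventions above can be illustrated with a minimal sketch that derives the complementary strand of a DNA sequence and writes it 5’ to 3’. The function name and the lookup table are illustrative and not part of the specification; the input is the T7 core promoter sequence given in this description as SEQ ID NO:1.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    # The complementary strand runs antiparallel, so the input is reversed
    # before each base is replaced by its pairing partner.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

core = "TAATACGACTCACTATA"  # T7 core promoter (SEQ ID NO:1), 5' to 3'
print(reverse_complement(core))  # TATAGTGAGTCGTATTA
```

Applying the function twice returns the original sequence, which is a convenient sanity check for the lookup table.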
The polynucleotide, preferably the DNA polynucleotide, according to the present invention comprises a T7 promoter sequence comprising a T7 core promoter sequence and a directly adjacent downstream flanking region. The terms “promoter” and “promoter sequence” are used interchangeably herein and refer to a nucleotide sequence required for initiation of transcription of a target nucleotide sequence by DNA-binding enzymes such as RNA polymerase, transcription factors and/or epigenetic modifiers, e.g. histone acetylases, histone methylases or DNA-methylases. As the promoter is a T7 bacteriophage (Escherichia virus T7) promoter, the respective enzyme(s) binding to it preferably comprise a T7 RNA polymerase. The T7 promoter and the T7 RNA polymerase, as used herein, are derived from a T7 bacteriophage and thus, may comprise the sequence of the naturally occurring T7 bacteriophage core promoter (e.g. SEQ ID NO:1) and RNA polymerase, respectively, and/or may comprise modifications compared to their respective naturally occurring sequence. The T7 RNA polymerase may also be a recombinant protein and/or comprise altered post-translational modifications.
Herein, the term “T7 promoter” is used interchangeably with the term “T7 promoter sequence” and refers to a nucleotide sequence comprising a transcription start site (TSS) which is denoted as position +1. Positions downstream the TSS are denoted herein with a positive sign (“+”) and positions upstream of the TSS with a negative sign (“-”). The T7 promoter sequence comprises at least a T7 core promoter sequence and a directly adjacent downstream flanking region that is directly, i.e. covalently, linked to the 3’ end of said T7 core promoter sequence.
The term “T7 core promoter sequence”, herein also referred to as “T7 core promoter”, refers to the nucleotide sequence upstream of the TSS, more specifically to the nucleotide sequence from positions -1 to -17 and has the nucleotide sequence TAATACGACTCACTATA (SEQ ID NO:1).
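The position numbering used herein (negative positions for the core promoter, positive positions from the TSS onward, and no position 0) can be sketched as a small lookup helper. The function `region` and its parameter names are hypothetical and serve only as an illustration; the downstream flanking region “GGGATAAT” is the one named for the IVT templates in the figure descriptions.

```python
T7_CORE = "TAATACGACTCACTATA"  # positions -17 to -1 (SEQ ID NO:1)

def region(core: str, downstream: str, start: int, end: int) -> str:
    """Return the bases from position `start` to `end`, inclusive.

    Negative positions index the core promoter (-len(core) to -1),
    positive positions index the downstream flanking region (+1 upward).
    """
    if start == 0 or end == 0 or start > end:
        raise ValueError("positions must be non-zero and start <= end")
    promoter = core + downstream

    def to_index(pos: int) -> int:
        # Skip the non-existent position 0 when mapping to a string index.
        return pos + len(core) if pos < 0 else pos + len(core) - 1

    i, j = to_index(start), to_index(end)
    if i < 0 or j >= len(promoter):
        raise ValueError("position outside the promoter")
    return promoter[i:j + 1]

flank = "GGGATAAT"                      # positions +1 to +8
print(region(T7_CORE, flank, -17, -5))  # TAATACGACTCAC
print(region(T7_CORE, flank, -4, 2))    # TATAGG
print(region(T7_CORE, flank, 4, 8))     # ATAAT
```

The first two calls return the stretches described in the Background as bound by the polymerase (-17 to -5) and melted during initiation (-4 to +2).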
Herein, the downstream flanking region comprises eight positions (positions +1 to +8 of the T7 promoter sequence) that have a nucleotide sequence composed according to certain rules. In particular, the downstream flanking region comprises
{three guanines at positions +1 to +3},
{an adenine or thymine at position +4},
{ [at least two adenines at positions +4 to +8] and/or
[a thymine at position +4] }, and
{at most one cytosine at positions +4 to +8}.
Thus, the TSS is preferably a G nucleotide. Furthermore, also the two nucleotides at positions +2 and +3 are preferably G nucleotides.
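The four bracketed conditions above translate directly into a boolean check. The following is a minimal sketch under one reading of the rule: it assumes the input is exactly the eight bases at positions +1 to +8, and the function name is illustrative rather than part of the specification.

```python
def satisfies_core_rule(flank: str) -> bool:
    """Check positions +1 to +8 against the four bracketed conditions."""
    flank = flank.upper()
    if len(flank) != 8 or set(flank) - set("ACGT"):
        raise ValueError("expected exactly eight DNA bases (+1 to +8)")
    tail = flank[3:]  # positions +4 to +8
    return (
        flank[:3] == "GGG"                             # three guanines at +1 to +3
        and tail[0] in "AT"                            # adenine or thymine at +4
        and (tail.count("A") >= 2 or tail[0] == "T")   # >= 2 adenines and/or T at +4
        and tail.count("C") <= 1                       # at most one cytosine at +4 to +8
    )

print(satisfies_core_rule("GGGATAAT"))  # True
print(satisfies_core_rule("GGGAACCC"))  # False
```

The first example passes because it has an adenine at +4, three adenines within +4 to +8 and no cytosine; the second fails only on the cytosine condition, since it carries three cytosines within +4 to +8.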
Surprisingly, it was found that the number and concrete position of certain nucleotides at positions +4 to +8 influence the activity of the T7 promoter. In particular, it was found that at least three adenines or thymines at positions +4 to +8 are advantageous for enhancing the activity of the T7 promoter. It was further found that, in particular, the nucleotides at positions +4 to +7 and especially at position +4 have a strong impact on the performance of the T7 promoter. The absence of a cytosine is preferred within positions +4 to +7, in particular within positions +4 to +6 and/or in particular when the cytosine directly follows a thymine and/or directly precedes a guanine. It was further found that sequences of covalently linked thymines (e.g. TT, TTT or TTTT) have a negative effect on the T7 promoter strength, in particular when three or more consecutive thymines are present.
Furthermore, it was surprisingly found that the combination of rules can result in an increased effect compared to sequences according to only one of the respective rules. For example, a negative effect of sequences of covalently linked thymines on T7 promoter activity is reduced in the absence of a guanine or cytosine at positions +4 to +8 or the presence of an adenine at position +4. As another example, a negative effect of a cytosine within positions +4 to +6 on T7 promoter activity is reduced when at least three adenines are present at positions +4 to +8. As a further example, a guanine at position +5 has a positive effect on T7 promoter activity in combination with an adenine or thymine at position +4 and/or an adenine at position +6. Hence, GGGAGA and GGGTGA at positions +1 to +6 can have positive effects on T7 promoter activity despite the presence of a guanine within positions +4 to +8.
In certain embodiments, the downstream flanking region further comprises at least two adenines at positions +4 to +8.
In certain embodiments, the downstream flanking region further comprises
{ [at least three adenines or thymines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] }.
In certain embodiments, the downstream flanking region does not comprise three consecutive thymines.
In certain embodiments, the downstream flanking region does not comprise three consecutive thymines within positions +4 to +8.
In certain embodiments, the downstream flanking region does not comprise
{two consecutive thymines within positions +4 and +7, when
[a cytosine or guanine is present at positions +4 to +8], or
[a thymine, cytosine or guanine is present at position +4] }.
In certain embodiments, the downstream flanking region does not comprise two consecutive thymines within positions +4 and +7.
In certain embodiments, the downstream flanking region does not comprise a thymine followed by a cytosine within positions +4 and +7.
In certain embodiments, the downstream flanking region does not comprise a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8.
In certain embodiments, the downstream flanking region does not comprise a cytosine within positions +4 to +6.
In certain embodiments, the downstream flanking region does not comprise a cytosine followed by a guanine within positions +4 to +7.
In preferred embodiments, the downstream flanking region comprises { [at least three adenines or thymines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] }, and said downstream flanking region does not comprise { [three consecutive thymines within positions +4 to +8], and/or [two consecutive thymines within positions +4 and +7 when
<a cytosine or guanine is present at positions +4 to +8> or <a thymine, cytosine or guanine is present at position +4> ] },
{a thymine followed by a cytosine within positions +4 and +7},
{a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8}, and
{a cytosine followed by a guanine within positions +4 to +7}.
In certain embodiments, the downstream flanking region further comprises an adenine at position +4.
In certain embodiments, the downstream flanking region has the sequence GGGAGAGT.
In certain embodiments, the downstream flanking region further comprises at most one guanine or cytosine at positions +4 to +8.
It was surprisingly found that an adenine at position +4 improves the T7 promoter strength compared to any other nucleotide at this position. It was further found that advantageous downstream sequences do not comprise a guanine within positions +4 to +8, except where the guanine is covalently linked to an adenine in the context of an AGA (or TGA) nucleotide sequence within positions +4 to +6. Thus, more than one guanine or cytosine within positions +4 to +8 is preferably avoided for increasing the activity of the T7 promoter.
Hence, in preferred embodiments, the downstream flanking region comprises
{ [at least three adenines or thymines at positions +4 to +8] and/or [a guanine followed by an adenine within positions +5 and +6] }, and said downstream flanking region does not comprise
{ [three consecutive thymines within positions +4 to +8], and/or
[two consecutive thymines within positions +4 and +7 when
<a cytosine or guanine is present at positions +4 to +8> or <a thymine, cytosine or guanine is present at position +4> ] },
{a thymine followed by a cytosine within positions +4 and +7},
{a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8}, and
{a cytosine followed by a guanine within positions +4 to +7}; wherein the downstream flanking region further comprises {an adenine at position +4} and {at most one guanine or cytosine at positions +4 to +8}.
Preferably, the downstream flanking region is selected from the group consisting of GGGAAATA, GGGAAAAT, GGGAATAT, GGGATAAT, GGGAGAAT, GGGAATAC, GGGAAGTA, GGGAGATT, GGGAGATA, GGGAAATG, GGGAAAAC and GGGAAAGT.
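The combined rules of the preceding preferred embodiment can be sketched as a simple sequence check. The following is only one possible reading of the rule language, not a definitive implementation: the function name `passes_combined_rules` is hypothetical, positions +1 to +8 are assumed to map to string indices 0 to 7, and "within positions +4 and +7" is assumed to mean the window +4 to +7.

```python
def passes_combined_rules(region: str) -> bool:
    """Check an 8-nt downstream flanking region (+1 to +8) against one
    reading of the combined rules (hypothetical helper, for illustration)."""
    s = region.upper().replace("U", "T")
    if len(s) != 8 or set(s) - set("ACGT"):
        return False

    w48 = s[3:8]  # positions +4 to +8
    w47 = s[3:7]  # positions +4 to +7
    w46 = s[3:6]  # positions +4 to +6
    n_a = w48.count("A")
    n_at = n_a + w48.count("T")
    n_gc = w48.count("G") + w48.count("C")

    # Positive rule: at least three A/T at +4..+8 and/or GA at +5/+6.
    positive = n_at >= 3 or s[4:6] == "GA"

    # Exclusions.
    three_t = "TTT" in w48
    two_t = "TT" in w47 and (n_gc > 0 or s[3] in "TCG")
    t_then_c = "TC" in w47
    c_early = "C" in w46 and n_a < 3
    c_then_g = "CG" in w47

    # Additional requirements: A at +4, at most one G/C at +4..+8.
    extra = s[3] == "A" and n_gc <= 1

    excluded = three_t or two_t or t_then_c or c_early or c_then_g
    return positive and extra and not excluded


LISTED = ["GGGAAATA", "GGGAAAAT", "GGGAATAT", "GGGATAAT", "GGGAGAAT", "GGGAATAC",
          "GGGAAGTA", "GGGAGATT", "GGGAGATA", "GGGAAATG", "GGGAAAAC", "GGGAAAGT"]
print(all(passes_combined_rules(seq) for seq in LISTED))
```

Under this reading, each of the twelve sequences listed above satisfies the check, whereas a sequence such as GGGATTTA is rejected because of its three consecutive thymines.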
It was further surprisingly found that position +8 has the smallest effect on the T7 promoter activity among positions +4 to +8.
Thus, in certain embodiments, the downstream flanking region is selected from the group consisting of GGGAAATN, GGGAAAAN, GGGAATAN, GGGATAAN, GGGAGAAN, GGGAATAN, GGGAAGTN, GGGAGATN, GGGAGATN, GGGAAATN, GGGAAAAN and GGGAAAGN; wherein N denotes A, C, G, T or U.
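As an illustration of how the group with N at position +8 relates to the twelve fully specified sequences, the following sketch masks position +8 and collects the distinct patterns. The helper name `mask_position_8` is hypothetical and introduced only for this example.

```python
# The twelve fully specified downstream flanking regions listed above.
PREFERRED = ["GGGAAATA", "GGGAAAAT", "GGGAATAT", "GGGATAAT", "GGGAGAAT", "GGGAATAC",
             "GGGAAGTA", "GGGAGATT", "GGGAGATA", "GGGAAATG", "GGGAAAAC", "GGGAAAGT"]


def mask_position_8(seq: str) -> str:
    """Replace the nucleotide at position +8 with the wildcard N."""
    return seq[:7] + "N"


# Several listed sequences differ only at +8, so they share one pattern.
patterns = sorted({mask_position_8(seq) for seq in PREFERRED})
for pattern in patterns:
    print(pattern)
```

Because several of the twelve sequences differ only at position +8, masking that position yields eight distinct patterns, which is why some entries of the N-terminated group appear more than once.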
In other preferred embodiments, the downstream flanking region comprises
{three guanines at positions +1 to +3},
{an adenine at position +4},
{at least two adenines at positions +4 to +8}, and { [at least three adenines or thymines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] }; and does not comprise
{more than one guanine or cytosine at positions +4 to +8},
{ [three consecutive thymines within positions +4 to +8] and/or [two consecutive thymines within positions +4 and +7, when
<a cytosine or guanine is present at positions +4 to +8> or
<a thymine, cytosine or guanine is present at position +4> ] },
{a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8}, and {a cytosine followed by a guanine within positions +4 to +7}.
In other preferred embodiments, the downstream flanking region comprises
{three guanines at positions +1 to +3},
{an adenine at position +4},
{ [at least three adenines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] },
{at most one guanine or cytosine at positions +4 to +8},
{at most two consecutive thymines within positions +4 and +8},
{no consecutive thymines within positions +4 and +7} and {no cytosine within positions +4 to +6}.
In certain embodiments, the downstream flanking region does not comprise a thymine at position +5 followed by a guanine at position +6.
In certain embodiments, the downstream flanking region does not comprise two consecutive thymines within positions +4 to +8.
In certain embodiments, the downstream flanking region does not comprise {a thymine at position +5 followed by a guanine at position +6} and {two consecutive thymines within positions +4 to +8}.
It was further surprisingly found that already two consecutive thymines within positions +4 to +8 might have a slightly negative effect on the activity of the T7 promoter in some nucleotide sequence compositions and that also a thymine-guanine doublet (TG) at positions +5 and +6 is preferably avoided for increasing T7 promoter activity.
Hence, in preferred embodiments, the downstream flanking region comprises
{ [at least three adenines or thymines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] }, and said downstream flanking region does not comprise
{ [three consecutive thymines within positions +4 to +8], and/or
[two consecutive thymines within positions +4 and +7 when
<a cytosine or guanine is present at positions +4 to +8> or
<a thymine, cytosine or guanine is present at position +4> ] },
{a thymine followed by a cytosine within positions +4 and +7},
{a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8},
{a cytosine followed by a guanine within positions +4 to +7}; wherein the downstream flanking region further comprises
{an adenine at position +4 and at most one guanine or cytosine at positions +4 to +8}; and wherein the downstream flanking region does not comprise {a thymine at position +5 followed by a guanine at position +6}, and
{two consecutive thymines within positions +4 to +8}.
Preferably, the downstream flanking region is selected from the group consisting of GGGAAATA, GGGAAAAT, GGGAATAT, GGGATAAT, GGGAGAAT, GGGAATAC and GGGAAGTA. In certain embodiments, the downstream flanking region is selected from the group consisting of GGGAAATN, GGGAAAAN, GGGAATAN, GGGATAAN, GGGAGAAN, GGGAATAN and GGGAAGTN; wherein N denotes A, C, G, T or U.
In other preferred embodiments, the downstream flanking region comprises {three guanines at positions +1 to +3 },
{an adenine at position +4},
{at least two adenines at positions +4 to +8}, and { [at least three adenines or thymines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] }; and does not comprise
{more than one guanine or cytosine at positions +4 to +8},
{two consecutive thymines within positions +4 to +8},
{a thymine at position +5 followed by a guanine at position +6}, and
{a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8}.
In further preferred embodiments, the downstream flanking region comprises {three guanines at positions +1 to +3},
{an adenine at position +4},
{ [at least three adenines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] }, {at most one guanine or cytosine at positions +4 to +8},
{no consecutive thymines within positions +4 and +8},
{no thymine at position +5 followed by a guanine at position +6}, and {no cytosine within positions +4 to +6}.
In certain embodiments, the downstream flanking region further comprises an adenine at position +4, three to four adenines at positions +4 to +8, and no guanines or cytosines at positions +4 to +8.
It was further surprisingly found that the complete absence of guanines and cytosines and consecutive thymines within positions +4 to +8 has a particularly strong positive effect on the transcription activity of the T7 promoter. It was further surprising that the presence of one or two thymines at positions +4 to +8 is preferred over the presence of only adenines.
Hence, in preferred embodiments, the downstream flanking region comprises
{ [at least three adenines or thymines at positions +4 to +8] and/or
[a guanine followed by an adenine within positions +5 and +6] }, and said downstream flanking region does not comprise
{ [three consecutive thymines within positions +4 to +8], and/or
[two consecutive thymines within positions +4 and +7 when
<a cytosine or guanine is present at positions +4 to +8> or <a thymine, cytosine or guanine is present at position +4> ] },
{a thymine followed by a cytosine within positions +4 and +7},
{a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8}, and
{a cytosine followed by a guanine within positions +4 to +7}; wherein the downstream flanking region further comprises {an adenine at position +4} and
{at most one guanine or cytosine at positions +4 to +8}; wherein the downstream flanking region does not comprise
{a thymine at position +5 followed by a guanine at position +6} and
{two consecutive thymines within positions +4 to +8}; wherein the downstream flanking region further comprises
{an adenine at position +4},
{three to four adenines at positions +4 to +8}, and {no guanines or cytosines at positions +4 to +8}, preferably wherein the downstream flanking region has the sequence GGGATAAT.
In particularly preferred embodiments, the downstream flanking region has a sequence selected from the group consisting of GGGADDDN, wherein D denotes A, G or T/U, and wherein N denotes A, C, G, T or U. In even more preferred embodiments, the sequence of the downstream flanking region is selected from the group consisting of GGGADDWH, preferably from GGGAWWWW, wherein D denotes an A, G or T/U, wherein W denotes A or T/U, and wherein H denotes A, C or T/U.
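The degenerate patterns above can be expanded into explicit sequences using the standard IUPAC nucleotide codes (D = A, G or T; W = A or T; H = A, C or T; N = any). The sketch below writes T for T/U; the function name `expand` is hypothetical.

```python
from itertools import product

# Standard IUPAC codes used in the patterns above (T stands for T/U).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "D": "AGT", "H": "ACT", "N": "ACGT"}


def expand(pattern: str) -> set:
    """Return every concrete sequence matched by a degenerate pattern."""
    return {"".join(combo) for combo in product(*(IUPAC[c] for c in pattern))}


broad = expand("GGGADDDN")    # 3 * 3 * 3 * 4 sequences
narrower = expand("GGGADDWH")  # 3 * 3 * 2 * 3 sequences
narrowest = expand("GGGAWWWW")  # 2 ** 4 sequences
print(len(broad), len(narrower), len(narrowest))
```

Each pattern is a subset of the previous one, which matches the increasing order of preference in the text. For example, GGGATAAT falls within all three groups, whereas GGGAGAGT falls only within GGGADDDN.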
In further preferred embodiments, the downstream flanking region comprises three guanines at positions +1 to +3, an adenine at position +4, three to four adenines at positions +4 to +8, no guanine or cytosine at positions +4 to +8, and no consecutive thymines within positions +4 and +8.
In particularly preferred embodiments, the downstream flanking region has the sequence GGGAAATA (see the T7 promoter sequence as set forth in SEQ ID NO:21), GGGAAAAT (see the T7 promoter sequence as set forth in SEQ ID NO:23), GGGAATAT (see the T7 promoter sequence as set forth in SEQ ID NO:25), GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO: 17), or GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO: 19). Preferably the downstream flanking region has the sequence GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO: 17) or GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO: 19), more preferably GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO: 17).
In certain embodiments, the downstream flanking region is selected from the group consisting of GGGAAATN, GGGAAAAN, GGGAATAN, GGGATAAN and GGGAGAGN; wherein N denotes A, C, G, T or U.
Furthermore, when the DNA polynucleotide of the invention is used as a primer, i.e. a reverse transcription primer, e.g. for RNA sequencing, it may be desirable to keep the primer as short as possible. In particular, if the T7 promoter of the invention is used for improving a pre-existing protocol, it may be preferable to modify the pre-existing protocol as little as possible while profiting from the enhanced T7 promoter activity according to the invention. For example, for improving a CEL-Seq2 based protocol by replacing the T7 promoter, the downstream flanking region of the inventive T7 promoter may overlap with the sequencing adapter further downstream thereof. In other words, the pre-existing sequencing adapter may be part of the downstream flanking region, i.e. at positions +6 to +8 or +7 to +8.
Thus, in certain embodiments, the downstream flanking region further comprises the sequence AGT at positions +6 to +8, or the sequence AG at positions +7 to +8, preferably AGT at positions +6 to +8. In particular, this rule may be combined with other rules for the composition of the downstream flanking region, as provided herein, as far as there is no conflict. T7 promoters with such a downstream flanking region (i.e. with AGT at positions +6 to +8) may be particularly useful in some embodiments of the inventive sequencing methods provided herein.
Thus, in one embodiment, the downstream flanking region has the sequence GGGAAAGT, GGGTAAGT, GGGATAGT, GGGTGAGT, or GGGAGAGT, preferably GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO: 19).
In one further embodiment, the downstream flanking region has the sequence GGGAAAAG, GGGAATAG, GGGATAAG, GGGTGAAG, GGGTAAAG, GGGAGAAG, GGGTGTAG, GGGAGTAG or GGGATTAG.
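The two overlap variants described above (AGT at positions +6 to +8, or AG at positions +7 to +8) can be distinguished with a short check. This is a minimal sketch; the function name `adapter_overlap` and its return strings are hypothetical, and positions +1 to +8 are assumed to map to string indices 0 to 7.

```python
from typing import Optional


def adapter_overlap(region: str) -> Optional[str]:
    """Report which adapter overlap, if any, an 8-nt region ends with."""
    s = region.upper()
    if len(s) != 8:
        raise ValueError("expected positions +1 to +8 (8 nucleotides)")
    if s[5:8] == "AGT":   # positions +6 to +8
        return "AGT at +6 to +8"
    if s[6:8] == "AG":    # positions +7 to +8
        return "AG at +7 to +8"
    return None


AGT_GROUP = ["GGGAAAGT", "GGGTAAGT", "GGGATAGT", "GGGTGAGT", "GGGAGAGT"]
AG_GROUP = ["GGGAAAAG", "GGGAATAG", "GGGATAAG", "GGGTGAAG", "GGGTAAAG",
            "GGGAGAAG", "GGGTGTAG", "GGGAGTAG", "GGGATTAG"]

for seq in AGT_GROUP + AG_GROUP:
    print(seq, adapter_overlap(seq))
```

The AGT variant is tested first because it is the preferred overlap in the text; a region without either ending, such as GGGATAAT, returns `None`.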
Since a high-ranking 5’RACE-seq position (see Table 1) indicates a strong T7 promoter activity, and overlap with the CEL-Seq2 sequencing adapter downstream of the T7 promoter may be advantageous for certain sequencing applications, both a very high rank in Table 1, and a high rank in combination with an overlap with said sequencing adapter, may be particularly useful for said applications.
Thus, in one embodiment, the downstream flanking region has the sequence GGGAAATA (see the T7 promoter sequence as set forth in SEQ ID NO:21), GGGAAAAT (see the T7 promoter sequence as set forth in SEQ ID NO:23), GGGAATAT (see the T7 promoter sequence as set forth in SEQ ID NO:25), GGGATAAT (see the T7 promoter sequence as set forth in SEQ ID NO: 17), GGGAGAGT (see the T7 promoter sequence as set forth in SEQ ID NO: 19), GGGAAAGT, GGGTAAGT, GGGATAGT or GGGTGAGT, preferably GGGAGAGT.
However, in certain embodiments, the T7 promoter of the invention is selected primarily based on the 5’RACE-seq rank (Table 1).
Thus, in certain embodiments, the sequence of the downstream flanking region is selected from the group consisting of the “Pos. +1 to +8” sequences in Table 1 of ranks 1 to 510, ranks 1 to 176, ranks 1 to 132 or ranks 1 to 66, preferably ranks 1 to 66.
Thus, in certain embodiments, the sequence of the downstream flanking region is selected from the group consisting of the “Pos. +1 to +8” sequences in Table 1 of ranks 1 to 64, ranks 1 to 48, ranks 1 to 32, ranks 1 to 24, or ranks 1 to 12.
Moreover, in certain embodiments, the downstream flanking region does not have the sequence GGGGTTCA, GGGAGTTC or GGGATACC. In some embodiments, the downstream flanking region does not have the sequence GGGTAGAT.
It was furthermore surprisingly found that certain nucleotide sequences comprising AATT, in particular GAATT, directly adjacent upstream of the T7 core promoter, further enhance the T7 promoter strength, especially in combination with a downstream flanking region according to the present invention. Further increasing the strength of an optimized T7 promoter sequence comprising a downstream flanking region as shown above by combining it with an upstream flanking region according to the present invention is particularly advantageous for approaches based on scarce and/or precious raw material (e.g. liquid biopsies) such as in case of in vitro transcription (IVT) when the concentration of the IVT template is low, e.g. very low as described herein, and/or analyses are performed, for example, on a single-cell level, or on less than 100 cells, preferably less than 10 cells, very preferably a single cell.
Herein, the term “upstream flanking region” refers to the sequence directly adjacent upstream of the T7 core promoter sequence and thus, covalently bound to its 5’ end. The upstream flanking region comprises 4 or 5 nucleotides, preferably 5 nucleotides.
In preferred embodiments, the upstream flanking region has the sequence AATT, GAATT, GATTT or GAAAT, preferably AATT or GAATT, even more preferably GAATT (see e.g. the T7 promoter sequences as set forth in SEQ ID NO: 18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and SEQ ID NO:26).
In some embodiments, the T7 promoter further comprises four bases consisting of adenines and/or thymines directly upstream of the T7 core promoter. Preferably, it further comprises a guanine directly upstream of said adenines and/or thymines.
In some embodiments, the T7 promoter comprises a downstream flanking region of the invention and an upstream flanking region of the invention.
In preferred embodiments, the T7 promoter further comprises the sequence AATT directly upstream of the T7 core promoter sequence.
In particularly preferred embodiments, the T7 promoter further comprises the sequence GAATT directly upstream of the T7 core promoter sequence.
The DNA polynucleotide according to the present invention can further comprise downstream of the T7 core promoter sequence a nucleotide sequence encoding an mRNA or non-coding RNA, preferably an mRNA, antisense RNA, siRNA, sgRNA, guide RNA or CRISPR RNA.
In the context of the present invention, the terms “encoding” or “encode” refer to genetic information comprised in a DNA sequence that can be transcribed into an RNA molecule, i.e. that can be used as a template for synthesis of an RNA molecule.
In a double-stranded DNA polynucleotide, the term “nucleotide sequence” preferably refers to the strand that comprises the coding strand. As used herein, a coding strand comprises a sequence identical to the RNA it encodes. As used herein, a sequence is identical to another sequence if each nucleotide at each position is the same, except that a thymine can be in place of a uracil and a uracil can be in place of a thymine according to the nucleotides comprised in DNA and RNA, respectively.
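The identity definition above, under which thymine and uracil are interchangeable, can be illustrated with a short comparison. This is a minimal sketch; the function names `is_identical` and `coding_strand_to_rna` are hypothetical.

```python
def is_identical(seq_a: str, seq_b: str) -> bool:
    """Compare two sequences position by position, treating T and U as equal."""
    def normalize(seq: str) -> str:
        return seq.upper().replace("U", "T")
    return normalize(seq_a) == normalize(seq_b)


def coding_strand_to_rna(coding_strand: str) -> str:
    """Write the RNA encoded by a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


dna = "GGGATAAT"
rna = coding_strand_to_rna(dna)
print(rna, is_identical(dna, rna))
```

Sequences of different length, or with any other mismatch, are not identical under this definition.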
The encoded RNA can be an in-vitro transcribed RNA according to the invention, in particular an mRNA or a non-coding RNA.
Herein, the terms “mRNA molecule” and “mRNA” are used interchangeably and refer to a class of RNA molecules, preferably single-stranded sequences built up of A, C, G, and U nucleotides that contain one or more coding sequences. Said one or more coding sequences can be used for example as a translation template during synthesis of an amino acid sequence and thus, for converting genomic information stored in an mRNA molecule into an amino acid sequence. In other words, the term “mRNA” should be understood to mean any RNA molecule which is suitable for the expression of an amino acid sequence or which is translatable into an amino acid sequence such as a protein.
More specifically, an mRNA comprises preferably a 5’ UTR, a coding sequence and a 3’ UTR. The 5’ end of the 5’ UTR is defined by the TSS and its 3’ end is followed by the coding sequence. The coding sequence is delineated by the start and the stop codon, i.e. the first and the last three nucleotides of the mRNA that can be translated, respectively. The 3’ UTR starts after the stop codon of the coding sequence and can be followed by a poly A tail.
The 5’ UTR generally comprises at least one ribosomal binding site (RBS) such as the Shine-Dalgarno sequence in prokaryotes and the Kozak sequence or translation initiation site in eukaryotes. An RBS promotes efficient and accurate translation of an mRNA molecule by recruiting ribosomes during initiation of translation. The activity of a given RBS can be optimized by varying its length and sequence as well as its distance to the start codon. Alternatively or optionally, the 5’ UTR comprises internal ribosome entry sites or IRES. Furthermore, a 5’ UTR can comprise one or more additional regulatory sequences such as a binding site for an amino acid sequence that enhances the stability of the mRNA, a binding site for an amino acid sequence that enhances the translation of the mRNA, a regulatory element such as a riboswitch, a binding site for a regulatory RNA molecule such as an miRNA, and/or a nucleotide sequence that positively affects initiation of translation. Furthermore, within the 5’ UTR there are preferably no functional upstream open reading frames, out-of-frame upstream translation initiation sites, out-of-frame upstream start codons and/or nucleotide sequences giving rise to a secondary structure that reduces or prevents translation. The presence of such nucleotide sequences in the 5’ UTR can have a negative effect on translation.
The mRNA may optionally comprise a 3’ UTR. The 3’ UTR may comprise one or more regulatory sequences such as a binding site for an amino acid sequence that enhances the stability of the mRNA molecule, a binding site for a regulatory RNA molecule such as a miRNA, and/or a signal sequence involved in intracellular transport of the mRNA.
The coding sequence of an mRNA comprises codons that can be translated into an amino acid sequence. The coding sequence can contain the codons of a naturally occurring coding sequence or it can be a partially or completely synthetic coding sequence. Alternatively, the coding sequence can be a partly or fully codon optimized sequence derived from the natural sequence to be used. Most of the amino acids are encoded by more than one codon, i.e. three consecutive nucleotides of an mRNA that can be translated into an amino acid. Codons exist that are used preferentially in some species for a given amino acid. The presence of more frequently occurring codons can enhance the amount of amino acid sequences translated based on a given mRNA molecule compared to the same mRNA molecule but comprising comparably rare codons. Hence, it is advantageous to adapt the codons in a given coding sequence in a species-specific manner, by avoiding rare codons and enhancing the occurrence of abundant codons for a given amino acid, to improve the translation of said mRNA.
As regards the function of the encoded amino acid sequence, there is no limitation on the possible amino acid sequences to be encoded by said mRNA. Herein, the term “amino acid sequence” encompasses any kind of amino acid sequence, i.e. chains of two or more amino acids which are each linked via peptide bonds and refers to any amino acid sequence of interest. Preferably, the encoded amino acid sequence is at least 5 amino acids long, more preferably at least 10 amino acids, even more preferably at least 50, 100, 200 or 500 amino acids. In other words, the term “amino acid sequence” covers short peptides, oligopeptides, polypeptides, fusion proteins, proteins as well as fragments thereof, such as parts of known proteins, preferably functional parts. These can, for example, be biologically active parts of a protein or antigenic parts such as epitopes which can be effective in raising antibodies. Preferably, the function of the encoded amino acid sequence in the cell or in the vicinity of the cell is needed or beneficial, e.g. an amino acid sequence the lack or defective form of which is a trigger for a disease or an illness, the provision of which can moderate or prevent a disease or an illness, or an amino acid sequence which can promote a process which is beneficial for the body, in a cell or its vicinity. The encoded amino acid sequence can be the complete amino acid sequence or a functional variant thereof. Further, the encoded amino acid sequence can act as a factor, inducer, regulator, stimulator or enzyme, or a functional fragment thereof, where this amino acid sequence is one whose function is necessary in order to remedy a disorder, in particular a metabolic disorder or in order to initiate processes in vivo such as the formation of new blood vessels, tissues, etc. Here, functional variant is understood to mean a fragment which in the cell can undertake the function of the amino acid sequence whose function in the cell is needed or the lack or defective form whereof is pathogenic. Preferably, such an amino acid sequence is advantageous with respect to applications in supplemental or medical purposes to generate or regenerate physiological functions caused by suboptimal amino acid sequence biosynthesis and thus also to favorably influence directly or indirectly the course of diseases.
A preferred coding sequence downstream of the T7 core promoter encodes a transposase or a genome editing enzyme.
A transposase refers to an enzyme that binds to the end of a transposon and catalyzes its movement to another part of the genome by a cut and paste mechanism or a replicative transposition mechanism. Preferred transposases include Sleeping Beauty transposases and PiggyBac transposases.
A genome editing enzyme, as used herein, refers to an enzyme which recognizes a specific nucleotide sequence, preferably a DNA sequence, and cuts the nucleotide sequence at or nearby the recognition position. Some genome editing enzymes, in particular CRISPR associated proteins, such as, but not limited to Cas9, require a polynucleotide, for example, a CRISPR RNA or guide RNA, which guides them to the specific target sequence. Some genome editing enzymes, in particular ZFNs and TALENs, recognize the specific target sequence by themselves.
Preferred genome editing enzymes include zinc finger nucleases (ZFNs), transcription activator-like effector-based nucleases (TALENs), and CRISPR-associated proteins such as Cas9, Cas12b or Cpf1, preferably Cas9. A CRISPR-associated protein is an enzyme that uses CRISPR sequences as a guide to recognize and cleave specific strands of DNA that are complementary to the CRISPR sequence. Zinc-finger nucleases (ZFNs) are artificial restriction enzymes generated by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain. Zinc finger domains can target specific desired DNA sequences, which enables zinc-finger nucleases to target unique sequences even within complex genomes. The DNA-binding domains of individual ZFNs typically contain between three and six individual zinc finger repeats and can each recognize between 9 and 18 base pairs. The non-specific cleavage domain from the type IIs restriction endonuclease FokI is typically used as the cleavage domain in ZFNs. Transcription activator-like effector nucleases (TALENs) are restriction enzymes that can be engineered to cut specific sequences of DNA. They are made by fusing a TAL effector DNA-binding domain to a DNA cleavage domain (a nuclease which cuts DNA strands). Transcription activator-like effectors (TALEs) can be engineered to bind to practically any desired DNA sequence, so when combined with a nuclease, DNA can be cut at specific locations. The DNA binding domain contains a repeated highly conserved 33-34 amino acid sequence with divergent 12th and 13th amino acids. These two positions, referred to as the Repeat Variable Diresidue (RVD), are highly variable and show a strong correlation with specific nucleotide recognition.
Genetic information comprised in a DNA sequence can also be transcribed into non-mRNA molecules such as non-coding RNA molecules. Herein, the term “non-coding RNA” refers preferably to regulatory RNA such as microRNA (miRNA), small-interfering RNA (siRNA), antisense RNA, CRISPR RNA (crRNA), single guide RNA (sgRNA), guide RNA (gRNA), antagoNAT, or a precursor thereof such as pri-miRNA, pre-miRNA or short-hairpin RNA (shRNA).
A microRNA refers to a small non-coding RNA molecule (comprising about 22 nucleotides) that functions in RNA silencing and post-transcriptional regulation of gene expression. miRNAs function via base-pairing with complementary sequences within mRNA molecules. As a result, these mRNA molecules are cleaved, destabilized by shortening the polyA tail of the mRNA and/or less efficiently translated into proteins by ribosomes. Hence, microRNA molecules can reduce the amount of protein produced by a respective mRNA.
Small interfering RNA (siRNA), also known as short interfering RNA or silencing RNA, refers to double-stranded RNA molecules of 20 to 25 nucleotides in length. siRNA molecules are involved in degrading mRNA after transcription, thus preventing its translation into proteins.
An antisense RNA, as used herein, refers to a single stranded RNA that is complementary to another polynucleotide. Preferably, an antisense RNA is a non-coding RNA. For example, an antisense RNA can hybridize to the complementary polynucleotide. Preferably, the complementary polynucleotide is an RNA, preferably an mRNA. In preferred embodiments, the antisense RNA is used for inhibiting the translation of a complementary mRNA into protein by hybridization of the two RNAs. Preferably, the inhibition occurs within a living cell.
In other preferred embodiments, the antisense RNA is used for detecting a further polynucleotide by hybridization of the RNA to the polynucleotide. Preferably, the polynucleotide to be detected is an RNA, preferably an mRNA. The hybridization can occur outside or within a cell. Preferably, said cell is fixed. An antisense RNA which is used for detecting another polynucleotide is also termed “probe”.
In preferred embodiments, the antisense RNA is an amplified antisense RNA which is generated in the linear amplification method according to the invention. The amplified antisense RNA (aRNA) may be transcribed from a double-stranded DNA. Preferably, the amplified antisense RNA is transcribed from a cDNA. Preferably, the cDNA has been reverse-transcribed from an RNA, preferably an mRNA, by using the DNA polynucleotide of the invention as reverse transcription primer.
As used herein, the term “aRNA” refers to an amplified antisense RNA. An antagoNAT refers to a single stranded oligonucleotide which can inhibit a naturally occurring antisense RNA and thus, can be used for increasing gene expression, in particular by antagonizing said antisense RNA.
Herein, the term “CRISPR” refers to a “clustered regularly interspaced short palindromic repeats” sequence, and in particular to a CRISPR RNA (crRNA) and a single-guide RNA (sgRNA). A crRNA comprises a guide RNA which is an approximately 20 nucleotides long sequence and which guides genome editing enzymes such as CRISPR associated proteins to the respective nucleotide sequence to be edited which is preferably comprised in a double-stranded genomic DNA molecule. Some CRISPR associated proteins further require a trans-activating CRISPR RNA (tracrRNA), which can be provided as a single guide RNA (sgRNA) and thus a fusion product comprising at least one crRNA and a trans-activating CRISPR RNA (tracrRNA).
The DNA polynucleotide according to the present invention can comprise downstream of the T7 core promoter sequence a nucleotide sequence encoding an mRNA or non-coding RNA, preferably an mRNA, antisense RNA, siRNA, sgRNA, guide RNA or CRISPR RNA, wherein the said nucleotide sequence is not, partially, or completely overlapping with the downstream flanking region directly adjacent to the T7 core promoter sequence. Preferably, said nucleotide sequence is not or partially overlapping with the downstream flanking region directly adjacent to the T7 core promoter sequence.
The DNA polynucleotide according to the present invention can further comprise a polyT-stretch, wherein the polyT-stretch is a nucleotide sequence of 9 to 35 consecutive thymines, preferably 22 to 27 consecutive thymines, more preferably 24 consecutive thymines, preferably followed by an adenine, cytosine or guanine, preferably at the 3’ end of the DNA polynucleotide.
Hence, the term “polyT-stretch” refers to a nucleotide sequence consisting of consecutive, i.e. covalently linked thymines. Said polyT-stretch is especially advantageous for using the DNA polynucleotide as a reverse transcription primer for reverse transcription of an RNA, e.g. an mRNA, into a cDNA. A “primer”, as used herein, refers to a poly- or oligonucleotide which comprises a sequence complementary to a sequence comprised in another poly- or oligonucleotide. Hence, a primer can be used for example for PCR, reverse transcription and/or the synthesis of the complementary strand of a DNA molecule.
In some embodiments, the DNA polynucleotide according to the present invention comprises downstream of the T7 core promoter sequence a polyT-stretch. Embodiments comprising the DNA polynucleotide and a polyT-stretch are especially advantageous for use as a reverse transcription primer.
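The polyT-stretch definition given above (9 to 35 consecutive thymines, followed by one adenine, cytosine or guanine, at the 3’ end) can be checked programmatically. The following is a minimal sketch only; the function name and the example primer prefix are hypothetical and not taken from the invention.

```python
import re
from typing import Optional

# 9 to 35 consecutive T, not preceded by another T, followed by exactly one
# anchoring base (A, C or G) at the very 3' end of the sequence.
POLYT_3PRIME = re.compile(r"(?<!T)(T{9,35})[ACG]$")


def polyt_length_at_3prime(primer: str) -> Optional[int]:
    """Return the length of a terminal polyT-stretch, or None if there is none.

    Hypothetical helper: the sequence is read 5'->3' and must end with the
    polyT-stretch plus one non-T anchoring base.
    """
    match = POLYT_3PRIME.search(primer.upper())
    return len(match.group(1)) if match else None


if __name__ == "__main__":
    # "GGAC" is an arbitrary placeholder for the upstream part of a primer.
    example = "GGAC" + "T" * 24 + "A"
    print(polyt_length_at_3prime(example))
```

The lookbehind rejects runs longer than 35 thymines instead of silently matching their last 35 bases.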
The DNA polynucleotide according to the present invention can further comprise a polyT-stretch and a sequencing adapter, wherein the sequencing adapter is a nucleotide sequence comprising a nucleotide sequence comprised in a library preparation primer, wherein the library preparation primer comprises a flow cell binding sequence, which is complementary to an oligo- or polynucleotide present at the surface of the flow cell of a next-generation-sequencing device, and wherein the sequencing adapter does not comprise more than eight consecutive thymines and/or only thymines.
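The thymine constraint on the sequencing adapter (no more than eight consecutive thymines, and not only thymines) can be expressed as a simple check. This is an illustrative sketch; the function names are hypothetical and the test sequences are arbitrary, not real adapter sequences.

```python
import re


def longest_t_run(seq: str) -> int:
    """Length of the longest run of consecutive thymines in seq (0 if none)."""
    runs = re.findall(r"T+", seq.upper())
    return max((len(run) for run in runs), default=0)


def adapter_ok(adapter: str) -> bool:
    """True if the adapter has at most eight consecutive T and is not T-only."""
    seq = adapter.upper()
    if not seq:
        return False  # an empty string is not a usable adapter
    return longest_t_run(seq) <= 8 and set(seq) != {"T"}


if __name__ == "__main__":
    print(adapter_ok("ACG" + "T" * 8 + "A"))  # run of exactly eight T
    print(adapter_ok("ACG" + "T" * 9 + "A"))  # run of nine T
```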
A sequencing adapter, as used herein, refers to a polynucleotide which functions as a binding sequence for a polymerase chain reaction (PCR) primer and/or library preparation primer. In the context of the present invention, a library preparation primer (LPP) comprises a flow cell binding sequence (FCBS) and a sequence which is identical or highly similar to a sequence comprised in the sequencing adapter, such that the LPP can bind to the complementary sequence of the sequencing adapter. A flow cell binding site, as used herein, refers to an oligo- or polynucleotide which binds to a complementary sequence immobilized at the surface of a flow cell. A flow cell is a part of a sequencing device that can be loaded with the material to be sequenced, wherein a sequencing device refers to a device for determining the sequence of a polynucleotide. Preferably, the sequencing adapters are sequencing adapters used in next-generation sequencing methods comprising library preparation methods and in particular Illumina library preparation protocols. Hence, a preferred sequencing adapter sequence is a sequence comprised in an Illumina library preparation primer such as PE1, PE2 or PE2-N6 and suitable flow cell binding sequences are for example comprised in Illumina P5 and P7 oligo- or polynucleotides.
As regards an oligonucleotide as used herein, the same applies as it has been described in the context of a polynucleotide, except its length. Whereas a polynucleotide can comprise more than two, preferably 24 or more, or even up to 250 nucleotides, an oligonucleotide is to be understood as a sequence of two to 500, preferably 2 to 250 covalently linked nucleotides. Moreover, also the other features of such an oligonucleotide can be as described above.
In some embodiments, the DNA polynucleotide according to the present invention comprises a T7 promoter sequence and a sequencing adapter. This is especially advantageous for use of the DNA polynucleotide as a reverse transcription primer in approaches that comprise a sequencing step following reverse transcription of a nucleotide sequence encoding for example an mRNA, sgRNA, guide RNA or CRISPR RNA. The sequencing adapter can optionally at least partially overlap with the T7 promoter sequence and/or the polyT-stretch. Preferably, the sequencing adapter is directly adjacent downstream of the T7 promoter sequence.
In some embodiments, the DNA polynucleotide according to the present invention further comprises a barcode and/or a unique molecular identifier (UMI). A barcode, as used herein, refers to an oligonucleotide which can be used to uniquely label e.g. an RNA of a specific sample, in particular of one specific cell. A UMI refers herein to an oligonucleotide which can be used to specifically label one specific RNA molecule.
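Because a barcode labels the sample or cell and a UMI labels the individual RNA molecule, sequencing reads sharing the same barcode, UMI and transcript can be collapsed into one original molecule. The following sketch illustrates this common counting approach; it is an assumption about how such labels are typically evaluated, and the function name, the read tuples and the gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str, str]  # (barcode, umi, gene), all hypothetical labels


def count_molecules(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (barcode, gene) pair.

    Reads with identical barcode, UMI and gene are treated as amplification
    copies of the same original RNA molecule and are counted once.
    """
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for barcode, _umi, gene in set(reads):  # set() collapses duplicate reads
        counts[(barcode, gene)] += 1
    return dict(counts)


if __name__ == "__main__":
    reads = [
        ("CELL1", "AAAC", "geneX"),
        ("CELL1", "AAAC", "geneX"),  # duplicate of the first read
        ("CELL1", "GGTA", "geneX"),
        ("CELL2", "AAAC", "geneX"),
    ]
    print(count_molecules(reads))
```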
In particularly preferred embodiments of the present invention, the DNA polynucleotide comprises a sequencing adapter downstream of the T7 core promoter and a polyT-stretch downstream of the sequencing adapter and not overlapping with the sequencing adapter.
In another aspect, the present invention relates to a method for in vitro transcribing (IVT) RNA, said method comprising the steps of (a) providing a DNA polynucleotide according to the present invention as an IVT template and downstream thereof a nucleotide sequence encoding the RNA to be transcribed, and (b) in vitro transcribing said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates.
In the context of the present invention, IVT refers to a process, wherein RNA is produced in vitro and thus, in a system that is neither a cell nor comprised in a cell. The term “cell” refers to a living organism or a unit of a multicellular organism or an explant thereof which can preferably be cultured. In vitro transcription requires an IVT template comprising a promoter and a template sequence downstream thereof, as well as ribonucleotide triphosphates and an RNA polymerase.
IVT can be used for linear amplification of a polynucleotide. Herein, the term “linear amplification” refers to increasing the amount of a polynucleotide over time in any relationship which resembles more a linear relationship than an exponential relationship at least for a certain period of time, for example 1 min, 10 min, 0.5 h, 1 h, 5 h, 10 h or 1 day. Linear amplification, as used herein, is advantageous over exponential amplification methods such as PCR for certain applications, for example, but not limited to, single-cell sequencing, in particular by reducing the amplification bias. Reducing an amplification bias is especially important when the starting material is scarce, such as the material obtained from a few cells or a single cell. Hence, in a preferred embodiment, the in vitro transcribed RNA is linearly amplified during IVT.
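Why linear amplification reduces amplification bias can be illustrated with a toy model. The sketch below assumes an idealized linear process, in which only the original templates are copied at a constant rate, and an idealized exponential process, in which every product becomes a new template. The rates, efficiencies and function names are hypothetical values chosen for illustration, not measurements.

```python
def linear_yield(templates: float, copies_per_round: float, rounds: int) -> float:
    """Idealized linear amplification: only the original templates are copied."""
    return templates * copies_per_round * rounds


def exponential_yield(templates: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential amplification: products serve as new templates."""
    return templates * (1.0 + efficiency) ** cycles


def bias_linear(rate_a: float, rate_b: float, rounds: int) -> float:
    """Yield ratio of two equally abundant templates after linear amplification."""
    return linear_yield(1.0, rate_a, rounds) / linear_yield(1.0, rate_b, rounds)


def bias_exponential(eff_a: float, eff_b: float, cycles: int) -> float:
    """Yield ratio of two equally abundant templates after exponential amplification."""
    return exponential_yield(1.0, eff_a, cycles) / exponential_yield(1.0, eff_b, cycles)


if __name__ == "__main__":
    for n in (1, 10, 20):
        print(n, bias_linear(10.0, 9.0, n), bias_exponential(0.95, 0.85, n))
```

In this model the distortion between two templates stays constant under linear amplification, whereas under exponential amplification a small per-cycle difference compounds with every cycle.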
The term linear amplification of a polynucleotide, as used herein, comprises the amplification of the polynucleotide as RNA or DNA having the identical and/or complementary sequence of said polynucleotide.
For in vitro transcribing (IVT) RNA, the DNA polynucleotide of the present invention is used as the IVT template. As regards the IVT template the same applies as it has been described above in connection with the DNA polynucleotide according to the present invention. Moreover, also the other features of such an IVT template can be as described above.
Hence, the DNA polynucleotide provided in step (a) comprises a T7 promoter sequence comprising a T7 core promoter sequence and a directly adjacent downstream flanking region as defined above and a nucleotide sequence encoding the RNA to be transcribed, wherein said RNA is preferably an mRNA or non-coding RNA, preferably an mRNA, antisense RNA, siRNA, sgRNA, guide RNA and/or CRISPR RNA.
In some embodiments, the nucleotide sequence encoding the RNA to be transcribed is an mRNA that encodes a transposase or a genome editing enzyme. Preferably, the transposase is a Sleeping Beauty transposase. Preferably, the genome editing enzyme is a CRISPR enzyme, for example, Cas9, Cas12b or Cpf1.
In some embodiments, the DNA polynucleotide which is used as an IVT template is particularly suitable for large-scale synthesis of RNA in vitro.
The IVT template is preferably linear, double-stranded and/or purified, even more preferably double-stranded, linearized and purified. In some embodiments, the DNA polynucleotide of the invention is comprised in a plasmid such as a cloning and/or an expression vector. In this case, a linear IVT template can be obtained, for example, by linearizing a DNA plasmid comprising the IVT template by cutting the DNA plasmid using a restriction enzyme. A plasmid or vector, as used herein, refers to a double-stranded DNA molecule which can replicate within a bacterial host, preferably an E. coli strain. A plasmid or vector preferably comprises an origin of replication, a selection marker, such as a gene conferring resistance to an antibiotic, a cloning site such as a restriction enzyme site, a recombination region, a promoter, an enhancer, a transcription termination site, a ribosomal binding site, a Shine-Dalgarno and/or a Kozak sequence. Suitable plasmids as backbone are, for example, but not limited to, pT7 FLAG (Merck), pT7 MAT (Merck), Gateway™ pDEST™ (Thermo Fisher) and pRSET (Thermo Fisher) or derivatives thereof. Furthermore, purification of the preferably linear and/or double-stranded IVT template is preferably performed using a spin-column such as a Monarch® PCR & DNA Cleanup Kit.
In step (b), in vitro transcribing the IVT template is performed in the presence of ribonucleotide triphosphates which are preferably adenosine, uridine, cytidine and guanosine triphosphates or modified versions thereof. Furthermore, the T7 polymerase is preferably a T7 polymerase that is capable of binding to the T7 promoter.
In vitro transcribing the IVT template comprises using the IVT template as a template during transcription, i.e. the synthesis of an RNA molecule. In this case, said RNA has a nucleotide sequence that is identical to the sequence of the IVT template (identical to the coding strand and complementary to the template strand of the IVT template) except that instead of T nucleotides U nucleotides are comprised in the synthesized RNA molecule. In contrast, in the case of reverse transcription an RNA is used as a template for the synthesis of a cDNA whereof the first strand has a sequence complementary to the sequence of the RNA and the second strand a sequence which is identical to the RNA except that instead of U nucleotides T nucleotides are comprised in the synthesized cDNA molecule.
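The sequence relationships between the coding strand, the transcribed RNA and the two cDNA strands can be sketched as follows. This is a minimal illustration for unmodified A, C, G, T/U sequences written 5’ to 3’; the function names and the example sequence are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_strand: str) -> str:
    """RNA transcript: identical to the coding strand, with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def second_strand_cdna(rna: str) -> str:
    """Second cDNA strand: identical to the RNA, with T instead of U."""
    return rna.upper().replace("U", "T")


def first_strand_cdna(rna: str) -> str:
    """First cDNA strand: reverse complement of the RNA, written 5'->3'."""
    return second_strand_cdna(rna).translate(_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    coding = "GGGAGACAT"  # arbitrary example sequence
    rna = transcribe(coding)
    print(rna, first_strand_cdna(rna), second_strand_cdna(rna))
```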
In some embodiments, the DNA polynucleotide which is used as an IVT template is present in a high concentration in step (b), wherein the term “a high concentration” refers to a concentration of more than 30 ng/µl, preferably of more than 40 ng/µl.
As used herein, i.e. in the context of a polynucleotide, IVT template and/or target polynucleotide, the term “very low concentration” refers to a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl, and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl. Evidently, a very low concentration is more than 0, e.g. at least 0.005 pg/µl, 0.01 pg/µl or 0.05 pg/µl. Evidently, the term “pg” refers to “picogram”, the term “ng” to “nanogram”, and the terms “µl” or “ul” to “microliter”.
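The concentration thresholds defined herein (high: more than 30 ng/µl; very low: more than 0 and less than 1 ng/µl) can be applied as in the sketch below. The function names and the label “intermediate” for the range in between are hypothetical and are not terms used in this description.

```python
def pg_to_ng(pg_per_ul: float) -> float:
    """Convert a concentration from pg/µl to ng/µl (1 ng = 1000 pg)."""
    return pg_per_ul / 1000.0


def classify_template_concentration(ng_per_ul: float) -> str:
    """Classify an IVT template concentration given in ng/µl.

    Thresholds follow the broadest definitions in the text; "intermediate"
    is a hypothetical label for concentrations covered by neither term.
    """
    if ng_per_ul <= 0:
        raise ValueError("concentration must be more than 0")
    if ng_per_ul > 30:
        return "high"
    if ng_per_ul < 1:
        return "very low"
    return "intermediate"


if __name__ == "__main__":
    print(classify_template_concentration(40.0))
    print(classify_template_concentration(pg_to_ng(0.05)))
```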
In some embodiments, the DNA polynucleotide which is used as an IVT template is present in a very low concentration in step (b), wherein the term “a very low concentration” refers to a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl. Said DNA polynucleotide or IVT template may be a mix of different DNA molecules, e.g. cDNA molecules, wherein each of said DNA molecules comprises the T7 promoter of the invention, i.e. the same T7 core promoter and the same inventive upstream and downstream flanking regions. Said mix of DNA molecules may be obtained, for example, by reverse transcribing the mRNA from a few cells, i.e. less than 100 cells, preferably less than 10 cells, or, very preferably, from a single cell, and/or a liquid biopsy, with the reverse transcription primer comprising the inventive T7 promoter provided herein. The resulting cDNA (IVT template) of the single cells (or different samples) may be pooled if the reverse transcription primers have a barcode as described herein, but the concentration may still be very low as defined herein. The same principle applies to synthesizing the complementary strand of a target DNA polynucleotide, i.e. in a PCR reaction, from such few cells or a single cell and/or a liquid biopsy with a primer comprising the inventive T7 promoter and using the resulting library of DNA polynucleotides, i.e. double-stranded DNA molecules, which have incorporated said T7 promoter as IVT template, as described herein. Furthermore, the primers used for the PCR reaction may be complementary to different target DNA molecules, and thus the resulting IVT template may be a mix of different DNA molecules comprising the same inventive T7 promoter, which may be present at a very low concentration, as described herein.
Moreover, the IVT template comprising the T7 promoter of the invention may be obtained from at least one target polynucleotide that is present freely in a body fluid, such as blood, urine or cerebrospinal fluid. In case said target polynucleotide is an RNA, the IVT template may be obtained by reverse-transcription, and in case said target polynucleotide is a DNA, the IVT template may be obtained by synthesizing the complementary strand, i.e. in a PCR reaction, as described herein. If the target polynucleotide is a DNA, the IVT template may be also obtained by attaching and/or integrating the DNA polynucleotide of the invention harboring the inventive T7 promoter to the target polynucleotide via a transposon system, i.e. Tn5-mediated transposition, as described herein. In particular, the IVT template may be present at a very low concentration as described herein.
As used herein, i.e. in the context of the use of the DNA polynucleotide of the invention as reverse transcription primer or (PCR) primer, i.e. for obtaining/generating an IVT template according to the invention and/or amplifying a target polynucleotide, e.g. an mRNA, a target polynucleotide or at least one target polynucleotide refers, in particular, to at least one polynucleotide to be analyzed in a sample, e.g. one or a few cells, a liquid biopsy, and/or a body fluid. In the context of reverse-transcription or synthesis of the complementary strand, the individual molecules of the target polynucleotide are not necessarily fully identical although they comprise the same or a very similar nucleotide sequence to which the reverse transcription primer or primer comprising the T7 promoter of the invention can bind. In particular, a very similar nucleotide sequence in this context may have at most 5, 4, 3, 2 or 1 nucleotide mismatches, e.g. due to SNPs, deletions and/or insertions, compared to the reference sequence which is complementary to a sequence comprised in the reverse-transcription primer or primer, and allows binding of the (reverse transcription) primer. For example, a target polynucleotide may refer to the mRNA molecules with a poly-A-tail in a sample, wherein the reverse transcription primer comprising a poly-T-stretch as provided herein binds to said poly-A-tail. However, a target polynucleotide may also refer, for example, to a polynucleotide comprising a unique nucleic acid sequence, and the reverse-transcription primer or primer comprising the complementary sequence of said unique sequence binds to said unique sequence. Furthermore, the target polynucleotide and the reverse-transcription primer or primer may be a mix of different polynucleotides, wherein the target polynucleotide is a plurality of polynucleotides which may comprise different unique nucleic acid sequences to which the reverse-transcription primer or primer mix binds.
In some embodiments, the target polynucleotide refers to a genomic DNA sequence to which the DNA polynucleotide of the present invention comprising the inventive T7 promoter is attached and/or integrated (i.e. covalently linked via the DNA backbone). In particular, said attachment and/or integration may be mediated by a transposase (transposon system), i.e. a Tn5 transposase, e.g. as described in Harada (2019), Nat. Cell Biol. 21. In corresponding embodiments, the DNA polynucleotide of the invention further comprises a transposase binding sequence, i.e. a Tn5 transposase binding sequence which may be also known in the context of the Tn5 transposase as a “mosaic end”. Said transposase binding sequence is preferably downstream of the +1 to +8 downstream flanking region of the T7 promoter provided herein. In those embodiments, the individual molecules of the target polynucleotide may be structurally unrelated, wherein the target polynucleotide refers to genomic DNA sequences which are in proximity (e.g. within 5000 bp, 1000 bp, 500 bp, 200 bp or 100 bp, preferably within 200 bp) to a certain chromatin modification, e.g. a histone modification, and/or a DNA binding molecule, e.g. inter alia a transcription factor, an epigenetic modifier, CCCTC-binding factor (CTCF) or a polymerase. In particular, in the context of transposase-mediated attachment/integration of the T7 promoter of the invention, the DNA polynucleotide may be conjugated to a molecule (e.g. an antibody) which binds to a certain DNA binding molecule, as described herein. For example, the DNA polynucleotide of the invention may be conjugated to an antibody, e.g. via a linker such as a PEG linker, or to an antibody binding molecule such as protein A (which then binds to an antibody binding to a DNA-binding molecule) (Kaya-Okur (2019), Nat Commun. 10(1):1930).
ChIL-Seq is a suitable, non-limiting example for using the T7 promoter of the invention for transposon-mediated attachment/integration to genomic DNA sites (target polynucleotides) (Harada (2019), Nat. Cell Biol. 21). In ChIL-Seq, a T7 promoter sequence is fused at the 5’ end to an Illumina adapter and a Tn5 recognition sequence and covalently coupled at the 3’ end to an antibody with specificity for chromatin components, i.e. specific histone modifications, and the Tn5 recognition sequence is loaded with Tn5 transposase. The cells (e.g. 100-1000 cells, or a single cell) containing the target polynucleotide of interest are then fixed, permeabilized and incubated with the antibody-T7 promoter-Tn5 complex. Upon binding of the antibody to its target epitopes on the chromatin (DNA-binding molecules), the DNA polynucleotide harboring the T7 promoter sequence is covalently linked to the DNA backbone via Tn5-mediated transposition. The opposite DNA strand is repaired by incubation of the cells with DNA ligase, and DNA sequences adjacent to the integration sites (IVT template) are then in vitro transcribed into RNA by T7 polymerase. Conversion of the in vitro transcribed and amplified RNA into sequencing libraries followed by deep sequencing enables identification and/or quantification of the genomic antibody binding sites (i.e. the target polynucleotide(s) in proximity to the DNA binding molecules that are recognized by the antibody).
Thus, a target polynucleotide or at least one target polynucleotide, as used herein, refers, in particular, to those polynucleotide molecules (e.g. in a sample) which are converted to an IVT template (comprising a T7 promoter according to the invention) as described herein (e.g. by reverse-transcription and/or synthesis of the complementary strand, or transposase-mediated integration/attachment). Therefore, the concentration, e.g. the very low concentration, of a target polynucleotide or at least one target polynucleotide, as used herein, refers, in particular, to the total concentration of those polynucleotide molecules (e.g. in a sample) which are converted to an IVT template.
In certain embodiments, i.e. in the context of reverse-transcription and/or amplification of RNA, the concentration of the IVT template is similar to the concentration of the corresponding target polynucleotide or at least one target polynucleotide, wherein “similar” may include a 5-fold, 3-fold, 2-fold, 1.5-fold, 1.2-fold, 1.1-fold or 1.05-fold difference, preferably a 2-fold or 1.5-fold difference. For example, a 2-fold difference is 10 pg/µl vs. 20 pg/µl, or 10 pg/µl vs. 5 pg/µl; and a 1.5-fold difference is 10 pg/µl vs. 15 pg/µl, or 10 pg/µl vs. 6.66 pg/µl.
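The fold difference used above is symmetric, i.e. 10 vs. 20 and 10 vs. 5 are both a 2-fold difference. A minimal sketch of this calculation follows; the function names and the default threshold are hypothetical choices for illustration.

```python
def fold_difference(conc_a: float, conc_b: float) -> float:
    """Symmetric fold difference between two positive concentrations."""
    if conc_a <= 0 or conc_b <= 0:
        raise ValueError("concentrations must be more than 0")
    return max(conc_a, conc_b) / min(conc_a, conc_b)


def is_similar(template: float, target: float, max_fold: float = 2.0) -> bool:
    """True if the two concentrations differ by at most max_fold.

    Both values must be given in the same unit, e.g. pg/µl.
    """
    return fold_difference(template, target) <= max_fold


if __name__ == "__main__":
    print(fold_difference(10.0, 20.0), fold_difference(10.0, 5.0))
    print(is_similar(10.0, 15.0, max_fold=1.5))
```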
Furthermore, when the inventive DNA polynucleotide is present in a very low concentration, i.e. in step (b) of the inventive IVT method provided herein, the DNA polynucleotide comprises preferably an upstream flanking region as described above in the context of the upstream region of a DNA polynucleotide according to the present invention.
In some embodiments, the in vitro transcribed RNA is a probe for hybridization with another RNA or DNA. Preferably, the probe is used for an in-situ hybridization method or Southern blot. Preferably, the in-situ hybridization method is FISH. In case the in vitro transcribed RNA is to be used as a FISH probe, it is transcribed in the presence of fluorescently labeled ribonucleotides. A probe may be an antisense RNA which is complementary to the target oligo- or polynucleotide, for example an mRNA.
In some embodiments, the RNA is in vitro transcribed in the presence of a capping and/or polyA-tailing enzyme. In some embodiments, a 5’ cap is added at the 5’ end and/or a 3’ polyA tail is added at the 3’ end of the in vitro transcribed RNA.
A 5’-cap, as used herein, preferably refers to a 7-methyl guanosine (m7G) cap structure at the 5' end of an mRNA. The capping may be performed by enzyme-based capping following the transcription reaction (posttranscriptional capping) and/or by incorporation of a cap analog during transcription (co-transcriptional capping). Suitable cap analogues include 7-methyl guanosine (m7G) and 3' O-me 7-meGpppG cap analog (ARCA). ARCA is methylated at the 3' position of the m7G, preventing RNA elongation by phosphodiester bond formation at this position. Thus, transcripts synthesized using ARCA contain 5'-m7G cap structures in the correct orientation, with the 7-methylated G as the terminal residue. Suitable for posttranscriptional capping is, for example, the Vaccinia Capping System.
A poly-A-tail may be added during transcription and/or after transcription. Alternatively, a reverse PCR primer comprising a polyT-stretch can be used for amplifying the template sequence. The polyA tail can be also added after transcription using e.g. an E. coli polyA polymerase. The length of the added tail can be adjusted by titrating the polyA polymerase. As used herein, a polyA tail refers to a sequence comprising, preferably 5 to 300, covalently linked adenines.
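Whether a transcript carries a polyA tail within the preferred length range of 5 to 300 adenines can be checked as sketched below. The function names and the example transcript body are hypothetical, and the sketch simply counts uninterrupted adenines at the 3’ end.

```python
def polya_tail_length(rna: str) -> int:
    """Number of uninterrupted adenines at the 3' end of the RNA sequence."""
    seq = rna.upper()
    return len(seq) - len(seq.rstrip("A"))


def has_preferred_polya_tail(rna: str, min_len: int = 5, max_len: int = 300) -> bool:
    """True if the 3' terminal adenine run is within the preferred range."""
    return min_len <= polya_tail_length(rna) <= max_len


if __name__ == "__main__":
    transcript = "GGGAGACUGC" + "A" * 30  # arbitrary body plus a 30-nt tail
    print(polya_tail_length(transcript), has_preferred_polya_tail(transcript))
```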
In preferred embodiments, a 5’ cap is added at the 5’ end and a 3’ polyA tail is added at the 3’ end of the in vitro transcribed RNA. In some embodiments, the 5’ cap and/or the 3’ polyA tail is added while the RNA is in vitro transcribed. In some embodiments, the 5’ cap and/or the 3’ polyA tail is added after the RNA is in vitro transcribed.
In some embodiments, the polyA tail is transcribed from an IVT template comprising a polyT-stretch at the 3’ end of the IVT template.
In some embodiments, the RNA is purified before and/or after a 5’ cap and/or a 3’ polyA tail is added.
In some embodiments, the nucleotide sequence encoding the RNA to be transcribed is an mRNA and the mRNA synthesized in vitro in step (b) is used to produce a recombinant protein in vitro or within a cell after providing an IVT mRNA. Preferably, the synthesis of the recombinant protein occurs in vitro.
In preferred embodiments, the nucleotide sequence encoding the RNA to be transcribed is an mRNA and the mRNA synthesized in vitro in step (b) is then transfected into a cultured cell. A cultured cell can be a mammalian cell line, an insect cell line, a yeast or a bacterium. Preferably, the cultured cell is a bacterium, even more preferably a cultured cell from E. coli. Transfection of a cell with an mRNA is particularly advantageous to only temporarily express the protein encoded by the mRNA in the cell and/or expressing said protein without changing the genome of the cell. Temporary expression is especially advantageous in case of a genome editing enzyme as the risk of unspecific genome-editing activity can be reduced.
Furthermore, the DNA polynucleotide according to the present invention can thus be used for producing, e.g. a recombinant, protein.
IVT is advantageous for amplifying a polynucleotide, wherein the amplified polynucleotide is DNA or RNA. Preferably, the polynucleotide to be amplified is RNA which is reverse transcribed into cDNA which in turn is amplified by IVT. Even more preferably, the RNA is an mRNA which is amplified as antisense RNA (aRNA).
In another aspect, the present invention relates to a method for transcribing RNA within a cultured cell, said method comprising the steps of (a) providing a DNA polynucleotide according to the present invention comprising a T7 promoter sequence and downstream thereof a nucleotide sequence encoding the RNA to be transcribed and (b) introducing the DNA polynucleotide into a cultured cell expressing a T7 polymerase.
In preferred embodiments, the RNA which is transcribed in a cultured cell is an mRNA which is further translated into a protein within the cultured cell.
In preferred embodiments, the cultured cell wherein the RNA is transcribed is a bacterium.
In another aspect, the present invention relates to a cultured cell which expresses a T7 polymerase and comprises a DNA polynucleotide according to the present invention comprising a T7 promoter sequence and downstream thereof a nucleotide sequence encoding the RNA to be transcribed.
In preferred embodiments, the cultured cell comprising the DNA polynucleotide of the invention is a bacterium.
As regards the cultured cell and the DNA polynucleotide for transcription of an RNA within a cultured cell, the same applies as has been described above, in particular regarding the DNA polynucleotide of the invention, the cultured cell, the RNA to be transcribed and the production of a recombinant protein.
In particular, the DNA polynucleotide according to the present invention comprising an inventive T7 promoter provided herein may be used as a primer, i.e. for incorporating the sequence of the inventive DNA polynucleotide comprising the T7 promoter of the invention, into a copy of a target polynucleotide.
Thus, said DNA polynucleotide (primer) may be used for synthesizing the complementary strand of at least one target polynucleotide, wherein said synthesizing comprises annealing said DNA polynucleotide to said at least one target polynucleotide, in particular, thereby generating an IVT template according to the invention. In particular, said IVT template may be used in an inventive IVT method provided herein.
In preferred embodiments, the DNA polynucleotide according to the present invention is used as a reverse transcription primer. In particular, the DNA polynucleotide is used for reverse transcribing RNA to cDNA, thereby incorporating the sequence of the DNA polynucleotide at least partially into the cDNA.
As regards the use of the DNA polynucleotide of the invention as a primer or reverse transcription primer, the same applies as has been described above in connection with the T7 promoter, and in particular the downstream flanking region and/or the upstream flanking region of the T7 promoter, according to the invention.
Reverse transcription can be done using reverse transcriptases (RTs). RTs use an RNA as template and a short primer complementary to the 3' end of the RNA to direct the synthesis of the first strand cDNA, which may be used directly as a template for PCR. This combination, referred to as RT-PCR, allows the detection of low abundance RNAs in a sample, and production of the corresponding cDNA. Alternatively, the first strand cDNA can be made double-stranded using e.g. DNA Polymerase I and DNA Ligase. This is for example advantageous for cloning approaches without amplification. Suitable RT polymerases are, for example, Avian Myeloblastosis Virus (AMV) Reverse Transcriptase and Moloney Murine Leukemia Virus (M-MuLV, MMLV) Reverse Transcriptase or, preferably, a SuperScript RT polymerase.
Moreover, the DNA polynucleotide according to the present invention may be used as a primer for synthesizing the complementary strand of a target DNA molecule, i.e. in a PCR reaction, thereby incorporating the sequence of the DNA polynucleotide of the invention at least partially into copies of the target polynucleotide. In particular, the synthesis of the complementary strand may be accompanied and/or followed by synthesizing the complementary strand of the strand having incorporated the DNA polynucleotide of the invention, thereby generating a double stranded DNA molecule comprising at least part of the target DNA molecule, and at one end the T7 promoter of the invention. This procedure may be considered as one cycle of a PCR reaction, wherein the DNA molecule of the invention is one PCR primer. Thus, typical PCR reaction conditions may be applied for this step, i.e. employing a DNA polymerase, for example, inter alia, a Taq polymerase. Preferably, said PCR reaction comprises only few cycles, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 cycles, preferably 1, 2, 3, 4 or 5 cycles, preferably 1 or 2 cycles, preferably 1 cycle.
The target polynucleotide which is amplified by using a primer comprising the T7 promoter of the invention may be obtained, in particular, from a few cells (i.e. less than 100 cells), a single cell, a single cell equivalent, a liquid biopsy, and/or a body fluid, as described herein. In some embodiments, i.e. in the context of transposase-mediated attachment/integration of the T7 promoter sequence, as described herein, the few cells may be 100 cells or more, for example 200, 400, 600, 800, 1000, 1500 or 2000 cells but less than 10000 cells. However, in general, the few cells, as described herein, are preferably less than 100 cells, more preferably less than 10 cells.
In certain embodiments, the target polynucleotide, i.e. the target polynucleotide that is amplified, analyzed and/or sequenced, according to the invention is from a prokaryote or a eukaryote, preferably an animal, preferably a mammal, preferably a human. In particular, said target polynucleotide may be associated with a disease and/or is suspected to be associated with a disease, i.e. as described herein. In some embodiments, said target polynucleotide may be not from a virus, i.e. not from a virus that exists for less than 1000, 100 or 10 years.
A liquid biopsy, as used herein, refers to the sampling of non-solid biological tissue, in particular, blood, and the analysis of at least one target polynucleotide comprised therein as described herein in the context of the invention. A liquid biopsy is mainly used as a diagnostic and monitoring tool, e.g. for diseases such as cancer, and may be largely non-invasive. In particular, a liquid biopsy may contain circulating tumor cells, e.g. inter alia, from metastatic breast, metastatic colon, and/or metastatic prostate cancer, and/or circulating tumor DNA. Furthermore, a liquid biopsy may contain circulating endothelial cells which are an indicator of vascular dysfunction or damage, e.g. associated with a heart attack. The presence or abundance of circulating tumor cells or circulating endothelial cells may be determined by a method, e.g. a sequencing method according to the invention, comprising amplifying target polynucleotides, e.g. mRNA molecules, from such single circulating cells by using a primer, e.g. a reverse transcription primer comprising an inventive T7 promoter, according to the invention.
Circulating tumor DNA (ctDNA) is tumor-derived fragmented DNA in the bloodstream that is not associated with cells. In particular, because ctDNA may reflect the entire tumor genome, it may be useful in clinics, i.e. for diagnosing and/or monitoring cancer. The presence or abundance of circulating tumor DNA may be determined by a method, i.e. a sequencing method according to the invention, comprising amplifying target tumor DNA polynucleotides from a blood sample by using a primer comprising an inventive T7 promoter according to the invention.
Furthermore, a liquid biopsy may be from blood or amniotic fluid, and contain cell-free fetal DNA (cffDNA). cffDNA originates from placental trophoblasts and is fragmented when placental microparticles are shed into the maternal blood circulation. cffDNA fragments are approximately 200 base pairs (bp) in length, which is significantly smaller than maternal DNA fragments. This difference in size allows cffDNA to be distinguished from maternal DNA fragments. Thus, the presence or abundance of cell-free fetal DNA may be determined by a method, e.g. a sequencing method according to the invention, comprising amplifying target cell-free fetal DNA polynucleotides from a blood sample (or an amniotic fluid sample) by using a primer comprising an inventive T7 promoter according to the invention.
In particular, the ctDNA or cffDNA is linearly amplified by in vitro transcription employing the T7 promoter of the invention, and the IVT template is generated by one or a few cycle(s) of PCR employing the inventive primer comprising said T7 promoter.
Thus, the DNA polynucleotide and/or the T7 promoter, i.e. the primer or reverse transcription primer of the invention, may be used for diagnostic purposes, e.g. for diagnosing and/or monitoring cancer, a cardiovascular disease such as a heart attack, or a fetal gene defect or gene variant. As illustrated in the appended Examples, the sensitivity and accuracy of single cell sequencing methods can be increased by employing a primer comprising an inventive T7 promoter provided herein for capturing target polynucleotides and amplifying them by in vitro transcription. This increase in sensitivity and accuracy may be particularly useful for diagnostic approaches, e.g. to determine certain cell states, e.g. of circulating tumor cells or leukemic cells, more reliably. In fact, as illustrated in the appended Examples, cell state determinants such as transcription factors and chromatin binders, which are missed when using a conventional T7 promoter, are detected by the improved sequencing method of the invention employing a reverse transcription primer comprising an inventive T7 promoter provided herein.
Thus, in a further preferred aspect, the invention relates to a method for amplifying a target polynucleotide, wherein said method comprises the steps of
(a) synthesizing the complementary strand of a target polynucleotide, wherein said synthesizing comprises annealing a DNA polynucleotide comprising the T7 promoter sequence according to the invention to said target polynucleotide, thereby obtaining an IVT template comprising said T7 promoter and a nucleotide sequence downstream thereof, i.e. wherein said nucleotide sequence reflects the target polynucleotide sequence, and
(b) in vitro transcribing said IVT template according to the IVT method of the invention, thereby amplifying said target polynucleotide. In particular, said amplification is a linear amplification as described herein. In particular, reflecting the target polynucleotide sequence refers to the target polynucleotide sequence and the complementary sequence or antisense sequence thereof, and/or a well attributable fragment thereof. A fragment can be attributed to the target polynucleotide sequence by methods well known in the art, as described herein or illustrated in the appended Examples, e.g. by a sequence alignment algorithm.
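The practical consequence of a linear amplification, as opposed to an exponential one, may be illustrated by a minimal numerical sketch. The function names, the fixed number of transcripts per template and the per-cycle efficiencies below are illustrative assumptions and are not values taken from the invention.

```python
def linear_yield(templates: float, transcripts_per_template: float) -> float:
    """Copies after linear amplification: every template is transcribed
    the same number of times, so the output scales with the input."""
    return templates * transcripts_per_template


def exponential_yield(templates: float, cycles: int, efficiency: float) -> float:
    """Copies after exponential amplification with a per-cycle efficiency
    between 0 (no duplication) and 1 (perfect doubling)."""
    return templates * (1.0 + efficiency) ** cycles


# Two targets present at a 2:1 ratio keep that ratio after linear amplification
# (hypothetical: 1000 transcripts per template).
ratio_linear = linear_yield(20, 1000) / linear_yield(10, 1000)

# Two equally abundant targets that differ slightly in per-cycle efficiency
# (hypothetical: 1.0 versus 0.9) diverge after 20 exponential cycles.
ratio_exponential = exponential_yield(10, 20, 1.0) / exponential_yield(10, 20, 0.9)
```

In this sketch the input ratio is preserved by the linear scheme, whereas a small efficiency difference compounds over the cycles of the exponential scheme.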
In certain embodiments, the target polynucleotide is an RNA which is reverse-transcribed into cDNA. In particular, said RNA is thereby amplified as aRNA.
In preferred embodiments, the IVT template is obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide from less than 100 cells, preferably less than 10 cells, preferably a single cell.
In the context of the present invention, the cell containing an RNA that is to be reverse transcribed to cDNA or a target polynucleotide that is to be amplified, and/or a polynucleotide that is to be analyzed and/or sequenced may be a prokaryotic cell or a eukaryotic cell, preferably a eukaryotic cell, preferably an animal cell, preferably a mammalian cell, preferably a human cell. In particular, the human and/or animal cell may be associated with a disease and/or be suspected to be associated with a disease, e.g. inter alia cancer or a cardiovascular disease. Said cancer may be, in particular, characterized by metastatic cells, circulating tumor cells and/or blood cancer cells / leukemic cells.
Furthermore, the IVT template may be obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide, wherein said RNA and/or at least one target polynucleotide is present at a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl, and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl. Moreover, the IVT template may be obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide from a liquid biopsy and/or a body fluid.
In preferred embodiments, at least one target polynucleotide is amplified from less than 100 cells, preferably less than 10 cells, preferably a single cell, wherein the T7 promoter further comprises the sequence AATT directly upstream of the T7 core promoter sequence. Preferably, the T7 promoter further comprises a poly-T-stretch as described herein. In particular, mRNA molecules from less than 100 cells, preferably less than 10 cells, preferably a single cell corresponding to at least 5000, preferably at least 7000, preferably at least 9000 different genes may be amplified. Preferably, mRNA molecules from a single cell corresponding to at least 9000 different genes may be amplified. An mRNA molecule corresponding to a gene refers, in particular, to a transcript of said gene. Evidently, the maximum amount of different genes detected and/or amplified is limited by the genes expressed in a cell, and may thus vary between individual cells or samples.
Herein, a gene may refer to the genomic sequence of a gene comprising the exons of said gene. Moreover, said genomic sequence may comprise the intronic regions and/or, in some cases, the regulatory regions of said gene such as a promoter. Moreover, a gene may refer to a cDNA or a coding sequence. In particular, in the context of RNA sequencing, transcripts, reverse transcription, mRNA and/or cDNAs, a gene may preferably refer to the part of the gene that is transcribed (i.e. a cDNA).
In preferred embodiments, the RNA which is reverse transcribed into cDNA is obtained from a single cell or a single cell equivalent. A single cell equivalent, as used herein, is a fraction of a sample comprising RNA, wherein the concentration of the RNA and/or the total amount of RNA is similar to what is obtained from a single cell by methods used in the art. The amount of RNA obtained from a single cell is about 10 to 30 pg and the amount of mRNA obtained from a single cell is about 0.1 to 1.5 pg. Moreover, a single cell equivalent may be a sample of a liquid biopsy and/or a body fluid, such as blood, urine or cerebrospinal fluid, comprising a similar amount and/or concentration of RNA, mRNA and/or DNA as a single cell. In some embodiments, a single cell equivalent may comprise about 0.1 to 1.5 pg and/or at least one target polynucleotide, e.g. at least one target mRNA and/or at least one target DNA, at a very low concentration as described herein. In some cases, a sample of a body fluid and/or liquid biopsy may be concentrated or diluted and/or subsampled to comprise about 0.1 to 1.5 pg and/or at least one target polynucleotide at a very low concentration, as described herein.
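The subsampling of a body fluid sample to a single cell equivalent may be sketched as a simple mass and volume calculation. The range of 0.1 to 1.5 pg is taken from the passage above; the function names and the example concentration of 5 pg/µl are hypothetical and serve only as an illustration.

```python
# Range of mRNA per single cell stated in the description (in picograms).
SINGLE_CELL_MRNA_PG = (0.1, 1.5)


def is_single_cell_equivalent(mrna_pg: float) -> bool:
    """True if the mRNA amount lies within the stated single cell range."""
    low, high = SINGLE_CELL_MRNA_PG
    return low <= mrna_pg <= high


def subsample_volume_ul(concentration_pg_per_ul: float, target_pg: float) -> float:
    """Volume (in microliters) that contains target_pg at the given concentration."""
    if concentration_pg_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_pg / concentration_pg_per_ul


# Hypothetical sample with 5 pg/µl mRNA, subsampled to 1 pg.
volume = subsample_volume_ul(concentration_pg_per_ul=5.0, target_pg=1.0)
```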
A polyT-stretch is particularly useful when the DNA polynucleotide of the invention is used as a reverse transcription primer for reverse transcribing mRNA to cDNA. Hence, in a preferred embodiment, the DNA polynucleotide comprising the T7 promoter of the invention and further comprising a polyT-stretch is used as a primer for reverse transcribing mRNA to cDNA, wherein the polyT-stretch binds to the polyA tail of the mRNA. Similarly, a polyT-stretch is useful for reverse-transcribing a non-coding RNA comprising a polyA tail to cDNA.
In a preferred embodiment, a cDNA comprising the T7 promoter of the invention and downstream thereof the antisense sequence of an mRNA is used as IVT template for in vitro transcribing antisense RNA (aRNA). Preferably, the aRNA is thereby linearly amplified.
In certain embodiments, the DNA polynucleotide comprising the T7 promoter of the invention is used for linearly amplifying RNA from cDNA by in vitro transcription.
In a preferred embodiment, the DNA polynucleotide comprising the T7 promoter comprising the downstream flanking region of the invention and the upstream flanking region of the invention is used for linearly amplifying RNA from cDNA by in vitro transcription.
In a particularly preferred embodiment, cDNA is generated from RNA by using the DNA polynucleotide comprising the T7 promoter comprising the downstream flanking region of the invention and the upstream flanking region of the invention as reverse transcription primer. Preferably, the RNA is obtained from less than 100 cells, preferably less than 10 cells, very preferably from a single cell.
The T7 promoter comprising the downstream flanking region of the invention and the upstream flanking region of the invention is particularly suitable for in vitro transcribing RNA, when the IVT template concentration is low, i.e. very low as described herein. The IVT template concentration is low, for example, when the IVT template is a cDNA which is reverse transcribed from an RNA obtained from a single cell. Hence, the DNA polynucleotide of the invention comprising a T7 promoter comprising a downstream flanking region of the invention and an upstream flanking region of the invention is particularly useful for in vitro transcribing RNA from cDNA, wherein the cDNA has been reverse transcribed from RNA present in a very low concentration and/or derived or obtained from less than 100 cells, preferably a single cell. Moreover, said DNA polynucleotide comprising a T7 promoter comprising a downstream flanking region of the invention and an upstream flanking region of the invention is very useful for in vitro transcribing RNA from a DNA polynucleotide, i.e. a double stranded DNA molecule, wherein said DNA polynucleotide has been generated from a target DNA polynucleotide that is present in a very low concentration and/or derived or obtained from less than 100 cells, preferably a single cell.
Evidently, a polynucleotide that is derived or obtained from a cell has been contained in said cell and/or is contained in such a cell.
In a particularly preferred embodiment, aRNA is linearly amplified by in vitro transcribing aRNA from a cDNA IVT template, wherein the cDNA is obtained by reverse transcription from an mRNA, wherein the DNA polynucleotide of the invention is comprised in the reverse transcription primer, and wherein the mRNA is obtained from a single cell.
In a preferred embodiment, the method for in vitro transcribing RNA further comprises the steps of reverse transcribing an RNA to cDNA and subsequently transcribing the cDNA to RNA, wherein the DNA polynucleotide of the invention is used as reverse transcription primer and the cDNA is the IVT template. Preferably, the reverse transcribed RNA is an mRNA, the DNA polynucleotide comprises a poly-T-stretch, preferably downstream of the T7 core promoter, and the in vitro transcribed RNA is an antisense RNA (aRNA).
As regards the initial reverse transcription step in certain embodiments of the method of the invention for in vitro transcribing RNA, the same applies as it has been described above in connection with use of the DNA polynucleotide as reverse transcription primer.
In preferred embodiments, a second strand of the cDNA comprising the T7 promoter of the invention is synthesized upon first strand cDNA synthesis.
In certain embodiments, the linearly amplified RNA is further reverse transcribed to cDNA, preferably by using random primers for first strand synthesis and the DNA polynucleotide of the invention comprising a polyT-stretch as primer for second strand synthesis. The cDNA derived from linearly amplified RNA can again be used as an IVT template to again in vitro transcribe RNA.
In some embodiments, the cDNA comprising the T7 promoter of the invention and downstream thereof a nucleotide sequence encoding preferably an RNA is used as IVT template for in vitro transcribing RNA. Preferably, the RNA is thereby linearly amplified. Hence, the DNA polynucleotide of the invention can be used for, preferably linearly, amplifying a nucleotide sequence which is complementary to a nucleotide sequence comprised in said DNA polynucleotide.
Furthermore, in the context of the present invention, the presence and/or abundance of a target polynucleotide in a sample such as a liquid biopsy, a sample from a body fluid, a few cells and/or a single cell, as described herein, may be determined by employing the inventive IVT and T7 promoter based amplification methods provided herein. Moreover, the amount of the amplified target polynucleotide(s) (e.g. aRNA) may be determined by methods known in the art and/or as described herein, e.g. by quantitative PCR, digital PCR and/or next-generation sequencing. Further methods to detect the amplified target polynucleotide(s) may also be used, e.g. fluorescence in-situ hybridization (FISH) and/or dCas9-based imaging.
In a further aspect, the invention relates to a method for amplifying a target polynucleotide, wherein said target polynucleotide is a genomic DNA sequence, and wherein said method comprises the steps of
(a) attaching and/or integrating a DNA polynucleotide comprising the T7 promoter sequence according to the invention to said target polynucleotide by transposition, i.e. Tn5 transposition, thereby obtaining an IVT template comprising said T7 promoter and a nucleotide sequence downstream thereof, i.e. wherein said nucleotide sequence reflects the target polynucleotide sequence, and
(b) in vitro transcribing said IVT template according to the IVT method of the invention, thereby amplifying said target polynucleotide.
As regards the in vitro transcription (step b), the amplification of the target polynucleotide, and “reflecting the target polynucleotide sequence”, the same applies as is described herein in the context of further methods for amplifying a target polynucleotide.
In particular, employing a primer or reverse transcription primer comprising a T7 promoter with an inventive and defined downstream flanking region, as provided herein, allows the abundances of target polynucleotides (e.g. non-poly-A RNA or DNA) to be compared with improved sensitivity and/or accuracy. The same applies for T7-promoter based transposase-mediated amplification of genomic DNA target polynucleotides (e.g. which are in proximity to a certain epigenetic modification or DNA-binding molecule).
In particular, the improved accuracy is due to the improved efficiency of amplifying the target polynucleotide, and furthermore, due to the lack of an amplification bias. Such an amplification bias may occur when only a core T7 promoter or a T7 core promoter with less than 8 defined directly adjacent downstream bases is used because in such a case the efficiency of the T7 promoter may vary between different target polynucleotides since they may constitute a (random) part of the downstream flanking region. The T7 promoter of the invention comprising an optimized and defined downstream flanking region is thus particularly useful for detecting and/or quantifying target polynucleotides in a sample, i.e. from less than 100 cells, preferably less than 10 cells, preferably a single cell, a single cell equivalent, a body fluid, and/or a liquid biopsy as described herein. In particular, for quantification of a plurality of target polynucleotides, e.g. a single cell transcriptome, an inventive sequencing method employing an inventive primer provided herein may be used.
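The origin of the amplification bias described above may be illustrated by a minimal sketch: if fewer than 8 bases are defined directly downstream of the T7 core promoter, the remaining positions up to +8 are filled by bases of the respective target. The function name and the two target sequences are hypothetical; the sequence GGGATAAT is the downstream flanking region named in the description of Figure 9.

```python
def initiation_window(defined_downstream: str, target: str, length: int = 8) -> str:
    """Return the bases at positions +1 to +length of an IVT template.

    Positions that are not covered by the defined downstream flanking
    region are filled by the first bases of the target sequence.
    """
    return (defined_downstream + target)[:length]


# Hypothetical target sequences following the promoter.
targets = ["ACGTACGTAC", "TTTTTGGGGG"]

# Only 3 defined bases: the window differs from target to target.
short_windows = {initiation_window("GGG", t) for t in targets}

# 8 defined bases: the window is identical for every target.
full_windows = {initiation_window("GGGATAAT", t) for t in targets}
```

With only three defined bases the two targets yield two different windows, whereas the fully defined flanking region yields one window irrespective of the target.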
In particularly preferred embodiments, the IVT method using a DNA polynucleotide according to the present invention is part of a sequencing method.
Herein, the term "sequencing" refers to approaches aiming at determining the identity of at least one nucleotide, preferably most nucleotides, even more preferably all nucleotides in a given nucleotide sequence. Furthermore, the term “sequencing method” refers to a method for determining the partial or full sequence(s) of a polynucleotide and/or to determine the relative abundance of identical molecules of said polynucleotide.
Examples for sequencing methods and/or commercial suppliers of sequencing methods include Sanger sequencing, pyrosequencing, SOLiD sequencing, Illumina sequencing, Ion Torrent sequencing, single molecule real time sequencing and Nanopore sequencing. A sequencing method may be a first, next (second) or third generation sequencing method. Preferred are next or third generation sequencing methods. A preferred next-generation sequencing technique is Illumina sequencing. Preferably, the sequencing method is suitable for single cell sequencing. Suitable single cell sequencing techniques are for example based on SMART-seq2, CEL-seq2, inDrop, MARS-seq, Drop-seq, STRT, Quartz-seq, LIANTI, CHIL-Seq, Dam-seq (e.g. scDam&T-seq) and the sci-L3 method. Particularly suitable are sequencing methods that comprise an in vitro transcription step such as CEL-seq, CEL-seq2, inDrop, MARS-seq, LIANTI, CHIL-Seq, Dam-seq (e.g. scDam&T-seq), and sci-L3. A preferred third generation sequencing method is Nanopore sequencing which allows direct sequencing of RNA and/or DNA in the absence of a library preparation step comprising PCR based amplification of reverse-transcribed aRNA.
Hence, in some embodiments, the DNA polynucleotide of the present invention is used to reverse transcribe a target polynucleotide, e.g. an RNA, amplify it, i.e. by IVT, and then subject the amplified RNA directly and/or after reverse transcription to sequencing.
As regards the DNA polynucleotide the same applies as described above. Thus, in particularly preferred embodiments, the DNA polynucleotide used for the sequencing method comprises a polyT-stretch to capture mRNAs which are then amplified as antisense RNAs (aRNAs). Thus, the entire sequence of RNAs, including the sequence of the respective poly(A) tails, can be determined by sequencing the synthesized cDNA library. In further particularly preferred embodiments, the DNA polynucleotide of the invention used for the sequencing method further comprises a barcode and/or a unique molecular identifier (UMI) and preferably a polyT-stretch.
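The arrangement of such a reverse transcription primer may be sketched as a simple concatenation of its elements. The upstream flanking region GAATT and the downstream flanking region GGGATAAT are the sequences named in the descriptions of Figures 8 and 9. The core promoter sequence used below is the commonly cited T7 consensus for positions -17 to -1 and is an assumption here, as are the order of barcode and UMI, the element lengths and the function name; the sequences actually claimed are defined by the respective SEQ ID NOs.

```python
UPSTREAM_FLANK = "GAATT"        # upstream flanking region named for Figure 8
T7_CORE = "TAATACGACTCACTATA"   # assumption: common T7 consensus, positions -17 to -1
DOWNSTREAM_FLANK = "GGGATAAT"   # downstream flanking region named for Figure 9


def build_rt_primer(barcode: str, umi_length: int = 6, poly_t_length: int = 24) -> str:
    """Assemble a hypothetical reverse transcription primer (5' to 3').

    The UMI is written as a run of the IUPAC symbol N (any base), and the
    polyT-stretch at the 3' end is the part that anneals to the polyA tail.
    """
    umi = "N" * umi_length
    poly_t = "T" * poly_t_length
    return UPSTREAM_FLANK + T7_CORE + DOWNSTREAM_FLANK + barcode + umi + poly_t


# Hypothetical 6-nucleotide cell barcode.
primer = build_rt_primer("ACGTAC")
```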
In preferred embodiments, the sequencing method is suitable for scarce material, in particular for less than 10000 cells, preferably less than 100 cells, preferably less than 10 cells. In a very preferred embodiment, the sequencing method is a single cell sequencing method.
In preferred embodiments, the invention relates to a method for determining the partial or full sequence(s) of a polynucleotide, i.e. a target polynucleotide, and/or the relative abundance of identical molecules of said polynucleotide comprising the steps of (a) synthesizing a DNA strand which is complementary to the polynucleotide that is to be sequenced (target polynucleotide) or the complementary strand thereof comprising annealing the DNA polynucleotide of the invention (to said target polynucleotide), and (b) transcribing the polynucleotide that is to be sequenced (target polynucleotide) or the complementary strand thereof into RNA by using a T7 polymerase.
In preferred embodiments, the present invention relates to a method for determining the partial or full nucleotide sequence(s) of an RNA and/or the transcript level of at least one gene comprising the steps of (a) reverse transcribing an RNA into a first strand of a cDNA comprising annealing the DNA polynucleotide according to the present invention; and (b) transcribing the cDNA into aRNA by using a T7 polymerase.
As regards the reverse transcription step (step (a)) of the method, the same applies as it has been described above in connection with reverse transcribing RNA to cDNA, synthesizing the complementary strand of a target polynucleotide and/or the use of the DNA polynucleotide of the invention as a primer or reverse transcription primer. Furthermore, the term “annealing the DNA polynucleotide” refers herein to the use of the DNA polynucleotide of the invention as a primer for reverse transcribing or synthesizing the complementary strand of the polynucleotide the sequence of which is to be determined (target polynucleotide).
In a preferred embodiment, the method further comprises a step (a’) of synthesizing a second strand of the first strand of the cDNA obtained from step (a) or during the method for amplifying a target polynucleotide according to the invention.
Thus, in a further preferred aspect, the invention relates to a method for determining the partial or full nucleotide sequence(s) and/or abundance of at least one target polynucleotide comprising the steps of
(a) amplifying at least one target polynucleotide according to the method for amplifying a target polynucleotide according to the invention, and
(b) determining the partial or full nucleotide sequence(s) and/or abundance of said at least one amplified target polynucleotide as described herein or illustrated in the appended Examples.
In particular, determining the abundance of at least one target polynucleotide may comprise counting the nucleotide sequences corresponding to said target polynucleotide. Methods for determining and normalizing the counts are well known in the art, and described herein and in the appended Examples. Moreover, the abundance of at least one mRNA may further refer to the transcript level of the corresponding gene.
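Counting and normalizing may be sketched as follows. The scaling to counts per million is merely one common normalization and is an assumption here, not necessarily the one used in the appended Examples; the function names and the example reads are hypothetical.

```python
from collections import Counter


def count_targets(assigned_reads: list[str]) -> Counter:
    """Count the reads that were assigned to each target polynucleotide."""
    return Counter(assigned_reads)


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw counts so that all targets of a sample sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {target: 0.0 for target in counts}
    return {target: count * 1_000_000 / total for target, count in counts.items()}


# Hypothetical reads, each already assigned to a gene by sequence alignment.
reads = ["GAPDH", "GAPDH", "ACTB", "GAPDH"]
counts = count_targets(reads)
cpm = counts_per_million(counts)
```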
Thus, the invention also relates to a method for determining the transcript level of at least one gene comprising the steps of
(a) amplifying at least one target polynucleotide according to the method for amplifying a target polynucleotide according to the invention, wherein said target polynucleotide is an mRNA corresponding to said at least one gene, and
(b) determining the partial or full nucleotide sequence(s) and/or abundance of said at least one amplified target polynucleotide as described herein or illustrated in the appended Examples.
In preferred embodiments, the partial or full nucleotide sequence(s), the abundance of at least one target polynucleotide and/or the transcript level of at least one gene is determined in a sample comprising less than 100 cells, preferably less than 10 cells, preferably a single cell. Moreover, the mRNA and/or at least one target polynucleotide may be present at a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, e.g. less than 50 pg/µl, and even more preferably of less than 10 pg/µl, in particular less than 1 pg/µl, or even less than 0.1 pg/µl, e.g. 0.05 pg/µl.
As illustrated in the appended Examples, the expression of at least about 9750 genes can be detected in a single cell (e.g. a K562 cell) with an exemplary sequencing method according to the invention employing an inventive reverse transcription primer provided herein (e.g. SEQ ID NO:15) comprising a T7 promoter of the invention (e.g. SEQ ID NO:20).
Thus, the partial or full nucleotide sequence(s) and/or the abundance of at least 5000, preferably at least 7000, preferably at least 9000 different target polynucleotides and/or the transcript level of at least 5000, preferably at least 7000, preferably at least 9000 genes may be determined according to the sequencing method of the invention. In particular, at least 5000, preferably at least 7000, preferably at least 9000 transcripts of unique genes may be detected in a single cell. Moreover, at least 9100, 9300 or 9500 transcripts of unique genes may be detected in a single cell.
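The number of unique genes detected in a single cell may be determined, for example, as sketched below. The sketch assumes that each read has already been assigned to a gene and carries a unique molecular identifier (UMI); the function name, the detection threshold and the example data are hypothetical.

```python
def genes_detected(reads: list[tuple[str, str]], min_molecules: int = 1) -> int:
    """Count genes supported by at least min_molecules distinct UMIs.

    Each read is a (gene, umi) pair. Reads sharing gene and UMI are
    treated as copies of one original molecule and counted once.
    """
    molecules: dict[str, set[str]] = {}
    for gene, umi in reads:
        molecules.setdefault(gene, set()).add(umi)
    return sum(1 for umis in molecules.values() if len(umis) >= min_molecules)


# Hypothetical reads from one cell: GATA1 has two molecules, TAL1 has one.
reads = [
    ("GATA1", "AAAC"),
    ("GATA1", "AAAC"),  # amplification duplicate of the read above
    ("GATA1", "CCGT"),
    ("TAL1", "GGTA"),
]
detected = genes_detected(reads)
detected_strict = genes_detected(reads, min_molecules=2)
```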
In preferred embodiments, the sequencing method of the invention, e.g. the method for determining the partial or full nucleotide sequence(s) of an RNA and/or the transcript level of at least one gene comprises a step (c) of generating a double-stranded cDNA from the aRNA of step (b), and/or a step of sequencing the aRNA of step (b) and/or the cDNA of step (c).
In some embodiments, the sequencing method of the invention further comprises a step (c’) of synthesizing the second strand of the cDNA of step (c) upon synthesis of the first strand of the cDNA.
In preferred embodiments, the method further comprises a step (d) of amplifying the cDNA of step (c) by PCR, wherein a library preparation primer binds to the sequencing adapter. Preferably, the sequencing method further comprises a step of sequencing the amplified cDNA of step (d).
In preferred embodiments, the partial or full nucleotide sequence(s) of an mRNA and/or the transcript level of at least one gene comprising a coding sequence is determined, wherein the DNA polynucleotide of the invention comprises a polyT-stretch and a sequencing adapter, wherein the polyT-stretch binds to the poly-A-tail of the mRNA.
In a preferred embodiment, the polynucleotide that is to be sequenced (target polynucleotide) is derived or obtained from less than 10000 cells, preferably, less than 100 cells, preferably less than 10 cells. In a further preferred embodiment, the polynucleotide to be sequenced (target polynucleotide) is derived or obtained from a single cell and/or a single cell equivalent.
Preferably, the polynucleotide that is to be sequenced (target polynucleotide) is derived from and/or obtained from a single cell. Furthermore, the polynucleotide to be sequenced may be derived or obtained from a body fluid, i.e. a single cell equivalent obtained from a body fluid, as described herein.
In a very preferred embodiment, the sequencing method of the invention further comprises a step of analyzing the sequencing data, wherein the analysis comprises a sequence alignment, counting reads, and/or normalizing read counts. In a further aspect, the present invention relates to the use of the DNA polynucleotide according to the present invention in a sequencing method, e.g. for determining the partial or full nucleotide sequence(s) of an RNA contained in a single cell or single cell equivalent and/or the transcript level of at least one gene of a single cell or single cell equivalent.
As regards the DNA polynucleotide and the determination of the partial or full nucleotide sequence(s) of an RNA contained in a single cell or single cell equivalent and/or the transcript level of at least one gene of a single cell or single cell equivalent, the same applies as described above.
The present invention further relates to a kit comprising the DNA polynucleotide of the invention, one or more modified and/or unmodified ribonucleotide triphosphate(s), library preparation primer(s), sequencing primer(s), and/or a microfluidic chip.
The kit can further comprise an enzyme for 5’ capping of RNA and/or an enzyme for poly-A-tailing of RNA, and/or a manual describing an in vitro transcription method and/or a sequencing method comprising an IVT step using said kit. As regards the DNA polynucleotide and modified and/or unmodified ribonucleotide triphosphate(s) of the kit, the same applies as it has been described above in connection with the DNA polynucleotide according to the invention.
As regards the use of the kit, the same applies as it has been described above in connection with the use of the DNA polynucleotide according to the invention.
In some embodiments, the polynucleotide of the invention is an RNA polynucleotide. As regards the RNA polynucleotide of the invention, the same applies as it has been described in the context of a DNA polynucleotide.
In a preferred embodiment, the RNA polynucleotide of the invention is reverse-transcribed to DNA, in particular before its use as a reverse transcription primer, for IVT and/or a sequencing method according to the invention.
Brief description of the figures
Figure 1. Scheme of 5’RACE-Seq for investigating the efficiency of different T7 promoter sequences. A double stranded DNA (dsDNA) library (SEQ ID NO:2), wherein each DNA polynucleotide comprised a T7 core promoter sequence, followed by a guanine at position +1 and a randomized nucleotide sequence at positions +2 to +16, was transcribed in vitro using a T7 RNA polymerase. The resulting 210 nucleotides long RNAs were reverse transcribed, and the 5’ end of the respective cDNA converted into a library for deep sequencing. The activity of the T7 promoter sequences was determined by counting the reads comprising a respective flanking region directly adjacent downstream the T7 core promoter sequence. The DNA IVT template was amplified by PCR and sequenced for normalization of the read counts.
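The normalization outlined for Figure 1 may be sketched as the ratio between the frequency of a promoter variant among the RNA-derived reads and its frequency among the reads of the DNA IVT template. The exact normalization used for the figure is not reproduced here; the function name, the second variant sequence and the read counts are hypothetical.

```python
def relative_activity(rna_counts: dict[str, int], dna_counts: dict[str, int]) -> dict[str, float]:
    """Frequency of each variant in the RNA reads divided by its frequency
    in the DNA template reads. Variants absent from the template are skipped."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries must contain reads")
    activity = {}
    for variant, dna in dna_counts.items():
        if dna == 0:
            continue
        rna_freq = rna_counts.get(variant, 0) / rna_total
        dna_freq = dna / dna_total
        activity[variant] = rna_freq / dna_freq
    return activity


# Hypothetical read counts for two downstream flanking regions.
rna = {"GGGATAAT": 300, "GCTAGCTA": 100}
dna = {"GGGATAAT": 100, "GCTAGCTA": 100}
activity = relative_activity(rna, dna)
```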
Figure 2. Relative abundances of T7 promoter sequence variants from the TSS to position +16 directly adjacent downstream the T7 core promoter sequence (each with a G at position +1) and grouped by the respective dinucleotide sequence at positions +2 and +3. Promoters with three guanines at positions +1 to +3 (GGG) showed the highest activity on average as detected by 5’RACE-Seq.
Figure 3. Normalized average nucleotide compositions of T7 promoter sequence variants from positions +2 to +16 in amplified (i.e. in vitro transcribed) RNAs, determined by 5’RACE-Seq. The overlapping part of the extended initiation bubble, which extends from positions -4 to +7, is highlighted in light grey.
Figure 4. Activity of T7 promoter variants carrying nucleotide sequences from positions +4 to +8 determined by 5’RACE-Seq shown as log2 fold change (FC) (relative abundances of individual sequence motifs). All promoters contained a G at positions +1 to +3. A high correlation (R²) of 0.98 was observed between two independent experiments denoted as RepA and RepB. Highlighted are exemplary T7 promoter sequences with high, mid, or low activity.
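The two quantities named for Figure 4 may be computed as sketched below. The sketch assumes that the fold change is expressed relative to a common reference abundance and that the correlation is the squared Pearson correlation coefficient; both are assumptions, and the example values are hypothetical.

```python
import math


def log2_fold_change(value: float, reference: float) -> float:
    """log2 of the abundance of a motif relative to a reference abundance."""
    return math.log2(value / reference)


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical relative abundances of three motifs in two replicates.
rep_a = [log2_fold_change(v, 1.0) for v in (4.0, 1.0, 0.25)]
rep_b = [log2_fold_change(v, 1.0) for v in (8.0, 1.0, 0.125)]
r2 = r_squared(rep_a, rep_b)
```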
Figure 5. Linear amplification of RNA by in vitro transcription. In vitro transcription was performed to compare the activity of T7 promoter variants with a G at positions +1 to +3 followed by different nucleotide sequences at positions +4 to +8 with either a high, intermediate or low rank as determined by 5’RACE-Seq as shown in Table 1. A 410 nucleotides long RNA was in vitro transcribed for the indicated time points using the respective promoter variant (SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8). Shown is the resulting fold amplification of template DNA. Error bars represent standard deviation for triplicate experiments.
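The fold amplification and the error bars of triplicate experiments may be computed as sketched below. The sketch assumes that fold amplification is the ratio of RNA product to DNA template expressed in the same unit and that the sample standard deviation is used; both are assumptions, and the yields are hypothetical.

```python
from statistics import mean, stdev


def fold_amplification(rna_amount: float, template_amount: float) -> float:
    """Amount of RNA product divided by the amount of DNA template."""
    return rna_amount / template_amount


def summarize(replicates: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of replicate measurements."""
    return mean(replicates), stdev(replicates)


# Hypothetical RNA yields (ng) of three replicates from 1 ng of template.
yields_ng = [900.0, 1000.0, 1100.0]
folds = [fold_amplification(y, template_amount=1.0) for y in yields_ng]
average, deviation = summarize(folds)
```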
Figure 6. Comparison of the IVT activity of T7 promoter variants with different 5’RACE-Seq ranks as shown in Table 1. All T7 promoter variants comprised a G at positions +1 to +3 and different nucleotide sequences at positions +4 to +8 and were used to in vitro transcribe a 410 nucleotides long RNA for 2h. Shown is the fold amplification relative to the template DNA. Indicated below is the +4 to +8 RACE-Seq rank as shown in Table 1. Results are shown from left to right for SEQ ID NO:8, SEQ ID NO:7, SEQ ID NO:6, SEQ ID NO:3, and SEQ ID NO:4. Error bars represent standard deviation for triplicate experiments.
Figure 7. Comparison of the IVT activity of two T7 promoter variants using different IVT DNA template concentrations (SEQ ID NO:4 and SEQ ID NO: 8). IVT was performed for 2h.
Figure 8. Comparison of the IVT activity of a T7 promoter variant with a 5’RACE-Seq rank as shown in Table 1 with or without an AT-rich 4 nucleotides long upstream flanking region. IVTs were performed for 2h using T7 promoter variants of the indicated 5’RACE-Seq rank. Shown is the resulting fold amplification of template DNA for SEQ ID NO:4, SEQ ID NO:5, and SEQ ID NO:8 (from left to right). The upstream flanking region was GAATT, located directly adjacent upstream of the T7 core promoter (comprised in SEQ ID NO:5).
Figure 9. An AT-rich upstream flanking region boosts the IVT activity of the T7 promoter in an IVT template that mimics the structure of CEL-seq2 derived cDNA, in particular at very low IVT template concentrations. In the left panel, a 410 nucleotides long RNA was in vitro transcribed for 2h from 1 nanogram of standard IVT templates (SEQ ID NO:4 and SEQ ID NO:5). In the middle panel, the RNA was in vitro transcribed for 2h from 1 nanogram (50 pg/µl) of 305 nucleotides long CEL-seq cDNA mimics (SEQ ID NO: 10 and SEQ ID NO: 11). In the right panel, the RNA was in vitro transcribed for 15h from 1 picogram (0.05 pg/µl) of those CEL-seq cDNA mimics. The T7 promoter in all IVT templates further comprised a directly adjacent downstream flanking region with the sequence “GGGATAAT”.
Figure 10. A T7 promoter of the present invention, in particular with a respective upstream flanking region improves IVT from a dsDNA that mimics the structure of CEL-seq2 derived cDNA. Exemplarily, the relative activity of the T7 promoter as found in the CEL-seq2 adaptor (see SEQ ID NO:9) is compared to the T7 promoter with rank #4 (see SEQ ID NO: 10) in the 5’RACE-Seq data as shown in Table 1, and the latter in combination with the directly adjacent upstream flanking region with the sequence GAATT (see SEQ ID NO: 11). The template was pre-synthesized as double stranded DNA to monitor the specific effect of the promoter sequence, independent of the mRNA capturing step during CEL-seq2. 1 picogram template was in vitro transcribed for 15h. The data for “CEL-seq + #4” and “CEL-seq up + #4” are the same as shown in the right panel of Figure 9.
Figure 11. T7 promoter in combination with specific directly adjacent upstream flanking regions improves aRNA yield during CEL-seq2 from 10 single cells. CEL-seq2 experiments were performed until RNA amplification using the reverse transcription primers with SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 14 or SEQ ID NO: 16 (from left to right). The 5’RACE-Seq rank of the T7 promoter sequence is indicated as shown in Table 1. “up1” refers to the upstream flanking region GAATT (comprised in SEQ ID NO: 13 and SEQ ID NO: 15) and “up4” refers to the upstream flanking region AATTG (comprised in SEQ ID NO: 14 and SEQ ID NO: 16). Due to some sequence similarity of the directly adjacent downstream flanking region of the T7 core promoter comprised in the original CEL-seq2 primer and the T7 promoter with rank #66, only two nucleotides (AG) were inserted after position +3 in case of the T7 promoter sequence with rank #66 (see SEQ ID NO: 15), in contrast to the variant with rank #4 (see SEQ ID NO: 13), wherein six nucleotides were inserted (ATAATG). The term “CEL-seq” indicates the original CEL-seq2 reverse transcription primer (SEQ ID NO: 12). Error bars represent standard deviation for triplicate experiments.
Figure 12. The combination of directly adjacent up- and downstream flanking regions of the T7 core promoter improves IVT in a single cell RNA sequencing experiment based on CEL-seq2. CEL-seq2 was performed from single K562 cells using the indicated primers (SEQ ID NO: 12 and SEQ ID NO: 15). After reverse transcription and second strand synthesis, cDNA from 10 cells was pooled and in vitro transcribed for 15h. Purified RNA was fragmented and quantified on a Tapestation (Agilent). Error bars represent the standard deviation from triplicate experiments.
Figure 13. CEL-Seq2 workflow. Shown on top is a schematic of the CEL-Seq2 reverse transcription primer (T7 = T7 promoter; Illumina A = partial sequencing adapter A; UMI = unique molecular identifier; BC = cellular barcode). Polyadenylated RNA from single cells is captured via the oligo d(T) terminus of the CEL-Seq2 reverse transcription primer. After reverse transcription (RT) and second strand synthesis, mRNA sequences are amplified by in vitro transcription (aRNA production). Antisense RNA (aRNA) is fragmented, and reverse transcribed using a random hexamer primer fused to partial sequencing adapter B (Illumina B). Sequencing adapters are completed, and the library is amplified in a final PCR step.
Figure 14. A modified CEL-Seq2 reverse transcription primer (SEQ ID NO: 15; CEL-Seq+) harboring a modified T7 promoter (SEQ ID NO:20) significantly improved CEL-Seq2-based RNA-Sequencing of single cells, i.e. by enhancing the efficacy of the linear amplification of cDNA. Specifically, a significantly increased number of genes was detected with CEL-Seq+. With the modified CEL-Seq2 reverse transcription primer (SEQ ID NO: 15) 9749 genes were detected per cell on average (n=24 cells), whereas only 8281 genes were detected per cell on average (n=14 cells) with the conventional CEL-seq2 reverse transcription primer (SEQ ID NO: 12) (left plot). Moreover, CEL-SEQ+ significantly increased the number of detected molecules. With the modified CEL-Seq2 reverse transcription primer (SEQ ID NO: 15), 85066 unique molecular identifiers (UMIs)/transcripts per cell were detected on average, whereas only 53541 UMIs per cell were detected on average with the conventional CEL-seq2 reverse transcription primer (SEQ ID NO: 12) (right plot). Statistical analysis was performed using the Mann-Whitney Wilcoxon test.
Figure 15. Average UMI count per gene (left plot) and corresponding coefficient of variation (CV) for UMI counts (right plot) in CEL-Seq2 employing the conventional CEL-seq2 reverse transcription primer (SEQ ID NO: 12) and CEL-Seq+ employing the modified CEL-Seq2 reverse transcription primer (SEQ ID NO: 15). Shown are all genes with more than 1 average UMI per cell in both assays (n=7904). Genes are independently sorted by expression rank. 14 cells from CEL-Seq+ were randomly chosen for comparison with 14 cells from CEL-Seq2. Error bars show the standard deviation between cells. Thick lines refer to the average UMI counts (left) or CV (right) of all 14 cells interpolated across all genes assayed with either CEL-Seq2 or CEL-Seq+ (as marked by the arrows).
Figure 16. Genes from deep sequenced bulk K562 RNA-Seq (Prost (2015), Nature 525) were sorted into quartiles by expression level with the lowest expressed genes in the first quartile and the highest expressed genes in the fourth quartile. Shown are the numbers of genes from the indicated bulk quartiles that were detected in individual cells by CEL-Seq2 (with the reverse transcription primer shown in SEQ ID NO: 12) or CEL-Seq+ (with the reverse transcription primer shown in SEQ ID NO: 15). Within each of the four quartile plots, CEL-Seq2 is shown on the left, and CEL-Seq+ on the right.
Figure 17. Shown are the numbers of genes of different categories detected with CEL-Seq2 (with the reverse transcription primer shown in SEQ ID NO: 12) or CEL-Seq+ (with the reverse transcription primer shown in SEQ ID NO: 15) in individual cells. The first category encompasses genes that are differentially expressed throughout chronic myeloid leukemia (CML) disease progression (Prost (2015), Nature 525; left plot), the second category encompasses genes with the GO association “DNA binding transcription factor (TF)” (middle plot), and the third category encompasses genes with the GO association “chromatin binding” (right plot). Within each of the three gene category plots, CEL-Seq2 is shown on the left, and CEL-Seq+ on the right. Whiskers reach to 1.5x IQR away from the 1st/3rd quartile.
Other aspects and advantages of the invention will be described in the following examples, which are given for purposes of illustration and not by way of limitation. Each publication, patent, patent application or other document cited in this application is hereby incorporated by reference in its entirety.
Examples
Methods and materials are described herein for use in the present disclosure; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting.
Example 1. Assessment of the strength of T7 core promoter sequences with different directly adjacent flanking regions using 5’RACE-Seq
5’RACE-Seq
A saturated randomized 5’RACE-Seq screen was performed to determine the self-transcription activity of T7 promoter sequences comprising the T7 core promoter, a guanine at position +1 (TSS) and different combinations of nucleotides at positions +2 to +16. In particular, the 5’ end sequence of over 100 million 210 nucleotides long RNAs transcribed from said randomized promoter sequences comprised in a 500 nucleotides long dsDNA polynucleotide library (10 ng; scale of 10^10 (10 billion) molecules of the +2 to +16 randomized T7 promoter template) was determined to interrogate the influence of the directly adjacent downstream flanking region of the core promoter on the transcriptional activity of the T7 promoter. A scheme of the applied experimental procedure of 5’RACE-Seq is shown in Figure 1. “5’RACE-Seq” refers to a rapid amplification of cDNA 5’ ends and subsequent sequencing using a next-generation-sequencing method (deep sequencing). A “saturated” screen refers to a screen of virtually every possible combination of nucleotides at positions +2 to +16 contained in a 5’RACE-Seq library. Every possible sequence at positions +2 to +8 is covered at least 100 times, but typically thousands of times, during sequencing. The promoter strength of a given promoter sequence was assessed based on the relative production of transcripts from said promoter.
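The saturation claim can be checked with simple counting. The following is a minimal sketch, not part of the described procedure: it assumes four equally likely nucleotides per randomized position and takes the order of magnitude of 10^10 template molecules from the text; the names `n_variants`, `full_space`, `scored_space` and `copies_per_variant` are illustrative only.

```python
# Sketch: how well can ~10^10 template molecules cover the randomized positions?
ALPHABET_SIZE = 4                 # A, C, G, T
TEMPLATE_MOLECULES = 10**10       # order of magnitude stated in the text


def n_variants(n_positions: int) -> int:
    """Number of distinct sequences over n randomized positions."""
    return ALPHABET_SIZE ** n_positions


full_space = n_variants(15)       # positions +2 to +16
scored_space = n_variants(7)      # positions +2 to +8
copies_per_variant = TEMPLATE_MOLECULES / full_space

print(f"+2..+16 variants: {full_space:,}")
print(f"+2..+8 variants:  {scored_space:,}")
print(f"template copies per +2..+16 variant: {copies_per_variant:.1f}")
```

Under these assumptions the template pool holds roughly nine copies of each 15-mer, while the 16,384 possible +2 to +8 motifs are each represented many thousands of times, which is consistent with the coverage described above.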
Generation of 5’ RACE-Seq libraries
10 ng of dsDNA template (T7_15N; SEQ ID NO:2) containing a T7 RNA polymerase promoter randomized from position +2 to +16 (gBlocks Gene Fragments, Integrated DNA Technologies) was used as input for in vitro transcription using the HiScribe T7 RNA Synthesis Kit (NEB E2040S, E2070S). After 2h incubation, RNA was purified with 1.6 volumes RNAClean XP beads (Beckman Coulter), and residual template DNA was digested using the TURBO DNA-free™ Kit (Invitrogen AM1907). Final RNA yield was assayed using the Qubit™ RNA Assay Kit. Next, 500 ng of RNA was reverse transcribed using 6.66 U/µl SuperScript™ IV (Invitrogen 18090010) and 1.2 mM RT oligonucleotide. Following RNAseA and RNAseH treatment, cDNA was purified using 1.6 volumes AMPure XP beads and eluted in 10 µl water. A poly(A)-tailing reaction was carried out using 3 U/µl terminal transferase (Invitrogen 10533). Second-strand cDNA synthesis was performed in 50 µl with 0.5 µM Oligo i5_5N_dT20VN, 2000 U Q5® High-Fidelity DNA Polymerase, and 0.2 mM dNTPs, using the temperature cycle 98 °C for 30 sec; 55 °C for 30 sec; 72 °C for 5 min. Next, 2 µl Exonuclease I and 48 µl water were added and the reaction was incubated for 1 h at 37 °C. Then, dsDNA was purified with 1.6 volumes AMPure XP beads, and eluted in 22 µl water. Sequencing adapters were introduced during 8 cycles of PCR with KAPA HiFi HotStart. Libraries were purified twice with 1.2 volumes AMPure XP beads and sequenced using a NextSeq 500 instrument from Illumina in paired-end mode (2x35), detecting the randomized 15mer sequence in Read 1 and a 5 base UMI (unique molecular identifier) in Read 2. As the adapter for second strand synthesis following polyadenylation contained a UMI, any bias introduced during the final 8 cycles of PCR could be filtered out.
For background library preparation, a total of 2.5 ng of dsDNA template (T7_15N) was amplified by PCR using KAPA HiFi HotStart Polymerase, P5 and BG T7 sequencing adapters. Here, only 7 cycles of PCR were required. Libraries were purified twice with 0.9 volumes AMPure XP beads and sequenced in single end mode using a NextSeq 500 instrument from Illumina. The first 15 nucleotides were used for data analysis.
The dsDNA template for RACE-Seq sequence was as follows (SEQ ID NO:2):
ACGAGAGCTTGTCTGCTCCCAGGCATCGCTTACAGACCTATATGACCACGCGTAT
CGATGTCGACCTACACGACGCTCTTCCGATCTTTTCATCGGGCCGTGCAGGCTAAT
ACGACTCACTATAGNNNNNNNNNNNNNNNCCTCTCGTTCTGTCGAAGGGCATCGA
CTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAG
CCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTT
CAAGATCCGCCACAACATCGAGGACGCAAGATCGGAAGAGCGGTTCAGCACC
GCATCGAG wherein bold and italics refer to the T7 core sequence, bold and not in italics to the sequencing adapter binding site, and bold and underlined to the reverse transcription binding site, and wherein N denotes any nucleotide selected from the group consisting of A, C, G, and T.
Sequence data analysis
Raw reads from 5’RACE-Seq libraries were filtered for PhiX using Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). Duplicate 15mer-UMI combinations were removed. Forward reads harboring 15mer sequences were selected for the presence of a C at position 16, followed by TT, CT or CC at positions 17 and 18 (to account for Cs added by the reverse transcriptase to the end of cDNAs, as well as for T7 polymerase sliding in promoter variants with transcription starting on triple G sequences), followed by a polyT sequence, wherein positions refer to positions within the respective forward read. After reverse-complementing the 15mer, 10mers corresponding to positions +2 to +11 in the in vitro transcribed RNA (as numbered within the T7 promoter) were mapped against the constant sequences of the sequencing library and T7 template DNA using Bowtie 2 allowing one mismatch, to remove spurious contaminating sequences. After filtering, 35 × 10^6 and 84 × 10^6 reads were used for motif analysis in replicates A and B, respectively. For background libraries, reads were filtered for PhiX using Bowtie 2, resulting in 232 × 10^6 total reads for motif analysis. To determine relative abundances of +2 to +8 promoter variants, reads with identical +2 to +8 sequence were pooled, and the resulting read counts were divided by the total number of filtered reads and by the corresponding counts in background libraries. Homopolymeric motifs were removed from the analysis as G nucleotides appeared to be slightly disproportionally removed by the real time analysis (RTA) software application of the Illumina NextSeq 500 sequencer (reads containing stretches of dark sequencing cycles correspond to Gs in this two-fluorescence channel sequencing system).
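The read selection and normalization described above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the code used to generate the data: the function names, the minimum polyT length (`min_poly_t`), the reading of "homopolymeric" as a motif consisting of a single repeated nucleotide, and the skipping of motifs absent from the background are all assumptions, and the PhiX and contaminant filtering with Bowtie 2 is omitted.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def passes_read1_filter(read1: str, min_poly_t: int = 5) -> bool:
    """C at read position 16, TT/CT/CC at positions 17-18, then a polyT run.

    Positions are 1-based within Read 1; min_poly_t is an assumed cutoff.
    """
    if len(read1) < 18 + min_poly_t:
        return False
    return (
        read1[15] == "C"
        and read1[16:18] in {"TT", "CT", "CC"}
        and read1[18:18 + min_poly_t] == "T" * min_poly_t
    )


def relative_abundances(read_pairs, background_counts):
    """Relative abundance per +2..+8 motif from (Read 1, UMI) pairs."""
    seen = set()
    counts = Counter()
    total = 0
    for read1, umi in read_pairs:
        key = (read1[:15], umi)
        if key in seen:                 # drop duplicate 15mer-UMI combinations
            continue
        seen.add(key)
        if not passes_read1_filter(read1):
            continue
        total += 1
        downstream = reverse_complement(read1[:15])  # RNA positions +2..+16
        counts[downstream[:7]] += 1                  # pool by +2..+8

    result = {}
    for motif, n in counts.items():
        if len(set(motif)) == 1:        # assumed meaning of "homopolymeric"
            continue
        bg = background_counts.get(motif, 0)
        if bg == 0:                     # assumption: skip motifs without background
            continue
        result[motif] = n / total / bg  # divide by total reads and by background
    return result
```

A toy call would pass an iterable of `(read1, umi)` string pairs and a dictionary of background counts keyed by 7-mer; the returned values are only meaningful relative to one another.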
Results of 5’RACE-Seq with T7 promoter variants and a T7 RNA polymerase
The most enriched nucleotide dimer at positions +2 and +3 consisted of two deoxyguanosines (Figure 2). Thus, in combination with the conserved deoxyguanosine at position +1, 5’RACE-Seq revealed that the most effective bases at positions +1 to +3 of the T7 promoter were three guanines (“GGG”).
A substantial sequence preference was observed within the first 8 nucleotides (+1 to +8), as shown by the high variance in the base frequency. In contrast, a comparatively low variance was observed at positions +9 to +16 (Figure 3). Hence, 5’RACE-Seq revealed that the T7 promoter strength depended primarily on the nucleotides at positions +1 to +8. The measured self-transcription activity of individual T7 promoter variants with different +4 to +8 nucleotide sequences following GGG at positions +1 to +3 was highly reproducible in two replicate experiments (Figure 4).
A comprehensive list of 5’RACE-Seq data for T7 promoter variants comprising three guanines at positions +1 to +3 and a certain set of nucleotide sequences at positions +4 to +8 is shown in Table 1. The downstream flanking regions (positions +1 to +8) of T7 promoter variants are ranked based on the transcriptional activity measured by 5’RACE-Seq. Furthermore, the normalized activity measurements of two replicate experiments (A and B) and the mean of the normalized activity obtained in the two experiments are shown.
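As a small worked example of how the ranking relates to the replicate measurements, the sketch below recomputes the mean activity and rank for the four highest-ranked variants of Table 1. The function name `rank_by_mean` and the tuple layout are illustrative assumptions; only the sequences and activity values are taken from the table.

```python
# (positions +1 to +8, activity in replicate A, activity in replicate B)
rows = [
    ("GGGATAAT", 3.097185327, 3.594921408),
    ("GGGAAATA", 3.525168248, 3.823209146),
    ("GGGAATAT", 3.232736283, 3.601622929),
    ("GGGAAAAT", 3.438692553, 3.789168012),
]


def rank_by_mean(rows):
    """Return (rank, sequence, A, B, mean) sorted by descending mean activity."""
    scored = [(seq, a, b, (a + b) / 2) for seq, a, b in rows]
    scored.sort(key=lambda r: r[3], reverse=True)
    return [(i + 1, *r) for i, r in enumerate(scored)]


ranked = rank_by_mean(rows)
for rank, seq, a, b, mean in ranked:
    print(f"{rank}\t{seq}\t{a:.9f}\t{b:.9f}\t{mean:.9f}")
```

The same computation can be used to spot-check any row of the table, since each mean should be the arithmetic average of the two replicate values.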
Table 1. 5’RACE-Seq data with T7 promoter variants
Rank Pos. +1 to +8 Activity (A) Activity (B) Mean Activity
1 GGGAAATA 3.525168248 3.823209146 3.674188697
2 GGGAAAAT 3.438692553 3.789168012 3.613930283
3 GGGAATAT 3.232736283 3.601622929 3.417179606
GGGATAAT 3.097185327 3.594921408 3.346053368
GGGAGAAT 3.075681634 3.586431153 3.331056393
GGGAATAC 3.053363516 3.415617863 3.23449069
GGGAAGTA 2.972851942 3.388513435 3.180682688
GGGAGATT 2.869634955 3.420750535 3.145192745
GGGAGATA 2.900774601 3.374546127 3.137660364
GGGAAATG 2.945055434 3.315268421 3.130161927
GGGAAAAC 2.924415033 3.321505123 3.122960078
GGGAAAGT 2.951492193 3.291203249 3.121347721
GGGTAAAT 2.951959296 3.27633853 3.114148913
GGGATATT 2.897644609 3.313241754 3.105443182
GGGAATAA 2.837061301 3.314414742 3.075738022
GGGATACA 2.821467492 3.317186151 3.069326822
GGGAATTA 2.876415915 3.257396204 3.06690606
GGGAGATG 2.888913879 3.167365009 3.028139444
GGGAAATT 2.904651194 3.149765628 3.027208411
GGGATATA 2.802264185 3.246699636 3.02448191
GGGTGAAT 2.778302803 3.251270686 3.014786744
GGGAAAAG 2.740336533 3.200535702 2.970436117
GGGAAAAA 2.834863563 3.105204943 2.970034253
GGGATAAC 2.763659019 3.173558391 2.968608705
GGGATACT 2.661347668 3.226933538 2.944140603
GGGTGATG 2.795002498 3.090837867 2.942920183
GGGTAATA 2.766622429 3.115855797 2.941239113
GGGTGACG 2.885805743 2.986973135 2.936389439
GGGAAAGA 2.696791071 3.16673056 2.931760815
GGGAATAG 2.676987678 3.178098041 2.92754286
GGGTGACT 2.739784061 3.103831997 2.921808029
GGGTGACA 2.73581603 3.085354489 2.910585259
GGGAAGAT 2.640417175 3.175261328 2.907839252
GGGAGACT 2.668992818 3.143510608 2.906251713
GGGATAAG 2.693674813 3.118556314 2.906115564
GGGTGATT 2.696857823 3.114911609 2.905884716
GGGAAGTT 2.65827908 3.153233322 2.905756201
GGGTAAGT 2.740432423 3.019535979 2.879984201
GGGTAATG 2.706213926 3.049661779 2.877937852
GGGTAATT 2.718573103 3.028335092 2.873454097
GGGATATG 2.598076221 3.137393258 2.86773474
GGGAGAAA 2.631782575 3.095781911 2.863782243
GGGAAACA 2.670917539 3.050891454 2.860904497
GGGTGATA 2.615365374 3.098605087 2.85698523
GGGAAATC 2.704008581 3.007839613 2.855924097
GGGATTAT 2.616186414 3.081415351 2.848800882
GGGTAATC 2.692477936 2.991362364 2.84192015
GGGTGAAG 2.673033067 3.009394835 2.841213951
GGGATAGT 2.649694018 3.03117066 2.840432339
GGGATAAA 2.614840591 3.060912475 2.837876533
GGGATTAC 2.618533517 2.991334561 2.804934039
GGGAGACA 2.607068837 2.999938955 2.803503896
GGGAGTAT 2.522614184 3.082969421 2.802791802
GGGTGAGT 2.664084601 2.937448795 2.800766698
GGGTATAT 2.61460393 2.985719391 2.80016166
GGGAGATC 2.510935934 3.037315306 2.77412562
GGGACAAT 2.534079055 3.005570996 2.769825025
GGGAAGAC 2.573541755 2.962569118 2.768055437
GGGTAAAG 2.636775491 2.880652312 2.758713901
GGGATATC 2.556732902 2.959057729 2.757895316
GGGAAGTC 2.631055179 2.881123403 2.756089291
GGGAGAAC 2.539694252 2.941637993 2.740666122
GGGAGACG 2.543611003 2.934566036 2.73908852
GGGAAGTG 2.516972031 2.922638554 2.719805292
GGGTAGAT 2.583853931 2.854888699 2.719371315
GGGAGAGT 2.492741578 2.941430236 2.717085907
GGGAATGT 2.49929406 2.909344717 2.704319389
GGGAGAAG 2.506781408 2.899277796 2.703029602
GGGTAGTT 2.555265009 2.846707963 2.700986486
GGGTGTAG 2.615352515 2.778187954 2.696770234
GGGTAACA 2.533091616 2.848794729 2.690943172
GGGAGTAG 2.440985911 2.903625647 2.672305779
GGGAATGA 2.443050742 2.894136575 2.668593658
GGGACATA 2.415643828 2.908091944 2.661867886
GGGAAACT 2.444148231 2.875473019 2.659810625
GGGAACTA 2.430998927 2.888583778 2.659791352
GGGTGATC 2.419206738 2.894117701 2.65666222
GGGTAACT 2.444109396 2.850656605 2.647383
GGGTGTAT 2.424965377 2.856765463 2.64086542
GGGATAGA 2.467401891 2.813455435 2.640428663
GGGTATTA 2.465268736 2.79993901 2.632603873
GGGATTAG 2.350601593 2.893425139 2.622013366
GGGAATTG 2.455477307 2.759835535 2.607656421
GGGATTAA 2.398889072 2.809019347 2.60395421
GGGTAGTC 2.513950486 2.675168106 2.594559296
GGGTAGTG 2.477116173 2.711622605 2.594369389
GGGATAGC 2.351672105 2.835310639 2.593491372
GGGTAAAC 2.439773564 2.743370846 2.591572205
GGGTAGAC 2.424153979 2.745254592 2.584704286
GGGTACAT 2.43555543 2.733071621 2.584313526
GGGTAAGC 2.469103654 2.69475792 2.581930787
GGGAGTAA 2.314737543 2.823695711 2.569216627
GGGAACAC 2.323461972 2.811493817 2.567477895
GGGTGAAA 2.330090711 2.801544034 2.565817372
GGGATACG 2.320300681 2.809100901 2.564700791
GGGAAACG 2.405717439 2.707372291 2.556544865
GGGTGAGC 2.423439891 2.6876188 2.555529346
GGGTAGTA 2.394560426 2.70744816 2.551004293
GGGAAAGC 2.353424293 2.728586852 2.541005572
GGGACAAC 2.246809452 2.8266996 2.536754526
GGGAAGAA 2.26851203 2.803885432 2.536198731
GGGACATT 2.329625189 2.741597047 2.535611118
GGGTATTG 2.367929344 2.69411322 2.531021282
GGGAGTTA 2.298925226 2.747558214 2.52324172
GGGACACA 2.281647623 2.759472589 2.520560106
GGGAACAT 2.322575333 2.718004853 2.520290093
GGGTAAGA 2.337434156 2.696779147 2.517106652
GGGAATCA 2.263582631 2.746542613 2.505062622
GGGGAATA 2.264737749 2.731519678 2.498128714
GGGAACAA 2.219442057 2.774576166 2.497009112
GGGAAACC 2.291204308 2.699796118 2.495500213
GGGTAACG 2.339762518 2.649176272 2.494469395
GGGTACAC 2.340027025 2.630472091 2.485249558
GGGTCATA 2.408042629 2.553972856 2.481007743
GGGATGAT 2.259865364 2.695257376 2.47756137
GGGACAGT 2.274875643 2.661297488 2.468086565
GGGTGTAA 2.233272358 2.670932091 2.452102224
GGGTATAA 2.268812122 2.628036487 2.448424304
GGGAATGC 2.226058142 2.670720286 2.448389214
GGGTCAAT 2.321239547 2.572445229 2.446842388
GGGAACAG 2.187053649 2.704966571 2.44601011
GGGTGAGA 2.288140393 2.599166118 2.443653256
GGGTATTC 2.287414018 2.590367798 2.438890908
GGGTGTAC 2.256179165 2.609760255 2.43296971
GGGAGTAC 2.276089658 2.583595912 2.429842785
GGGTATGT 2.308667464 2.533499682 2.421083573
GGGAATCT 2.207419595 2.61344807 2.410433832
GGGACACT 2.160048721 2.653218578 2.40663365
GGGTAGGT 2.301560408 2.504926886 2.403243647
GGGTACTA 2.25242405 2.533334944 2.392879497
GGGTATAG 2.25569686 2.515430556 2.385563708
GGGACAAA 2.112463309 2.635037684 2.373750496
GGGATACC 2.15880511 2.582377789 2.370591449
GGGAGAGC 2.222711715 2.511843928 2.367277821
GGGTACAG 2.208226606 2.522376518 2.365301562
GGGACATG 2.17763587 2.548977868 2.363306869
GGGATGAC 2.186835287 2.536098983 2.361467135
GGGTGTTA 2.188037663 2.5288003 2.358418981
GGGACAAG 2.135488316 2.580642926 2.358065621
GGGTATAC 2.231101467 2.476377725 2.353739596
GGGGAAAT 2.172462515 2.534126221 2.353294368
GGGAGAGA 2.154888899 2.545266441 2.35007767
GGGGAATT 2.133100297 2.545183507 2.339141902
GGGTACAA 2.130819175 2.546468329 2.338643752
GGGTATCA 2.159975668 2.516627535 2.338301601
GGGAGACC 2.030706645 2.633735328 2.332220986
GGGTGAAC 2.216872909 2.447407227 2.332140068
GGGATTTA 2.130172498 2.532604807 2.331388652
GGGTAGAA 2.148878507 2.513485388 2.331181948
GGGGATTA 2.141310831 2.510496704 2.325903768
GGGAATTT 2.18858784 2.461124006 2.324855923
GGGTGTGT 2.198335029 2.444737988 2.321536508
GGGTGTCT 2.174406603 2.457650545 2.316028574
GGGAACCA 2.056753591 2.569208373 2.312980982
GGGAGTTG 2.144283404 2.480395815 2.312339609
GGGTGACC 2.070818868 2.515940589 2.293379728
GGGTATGA 2.116762323 2.44124428 2.279003302
GGGACAGA 2.112343544 2.444615914 2.278479729
GGGAAAGG 2.086520294 2.465835554 2.276177924
GGGATTGA 2.07577825 2.46933734 2.272557795
GGGAATGG 2.057805742 2.480679717 2.26924273
GGGAATTC 2.089684697 2.448179526 2.268932111
GGGTACTG 2.136828571 2.400067079 2.268447825
GGGAACTC 1.990947214 2.545818409 2.268382812
GGGTGTCA 2.132187157 2.396965306 2.264576231
GGGACACG 2.040543386 2.482632121 2.261587754
GGGTATTT 2.124420198 2.390429513 2.257424856
GGGGATAT 2.061329679 2.452278288 2.256803983
GGGACATC 2.012648332 2.487042106 2.249845219
GGGACTAT 2.036250084 2.462174887 2.249212485
GGGAATCC 2.004999389 2.488090451 2.24654492
GGGTTAAT 2.108810747 2.382916154 2.245863451
GGGTAAAA 2.114827135 2.373009687 2.243918411
GGGTAACC 2.073748571 2.406702831 2.240225701
GGGATGTA 2.064964503 2.406583709 2.235774106
GGGTGTCG 2.194064494 2.273300293 2.233682393
GGGAGTTC 2.029853974 2.431526858 2.230690416
GGGTACTT 2.051229964 2.407713149 2.229471557
GGGGAATC 2.001475686 2.456648719 2.229062202
GGGGAATG 2.039064528 2.415949439 2.227506983
GGGACAGC 2.058922725 2.386210877 2.222566801
GGGAACTT 1.982161271 2.450320651 2.216240961
GGGACTAC 2.001614949 2.415561718 2.208588334
GGGTGTTC 2.007655259 2.397526441 2.20259085
GGGTACTC 2.10743426 2.284449499 2.19594188
GGGAGTGT 1.962581201 2.409956088 2.186268644
GGGAGTCC 2.001160528 2.342501865 2.171831197
GGGATGAA 1.98644248 2.355115159 2.170778819
GGGAACCT 1.978036933 2.361030766 2.16953385
GGGTGAGG 2.049138883 2.270687567 2.159913225
GGGAGTGG 1.98084729 2.332774235 2.156810762
GGGTTATT 2.003451267 2.304575258 2.154013263
GGGTATGC 2.030888693 2.275129093 2.153008893
GGGAACTG 1.919623052 2.37333938 2.146481216
GGGATGAG 1.978952839 2.306845553 2.142899196
GGGTTACA 2.005904056 2.277918985 2.14191152
GGGAGTGA 1.923839752 2.35691958 2.140379666
GGGTTATG 2.033259992 2.245053466 2.139156729
GGGTGTGA 1.973488452 2.299700215 2.136594333
GGGTCATT 2.018069144 2.254047135 2.136058139
GGGGATTG 1.986976034 2.270625643 2.128800839
GGGTGTGC 1.986333191 2.26705449 2.12669384
GGGATTGT 1.974482657 2.265271634 2.119877145
GGGTATCT 1.923758953 2.313401648 2.118580301
GGGTAGAG 2.000314625 2.234481254 2.11739794
GGGAGTCT 1.933203058 2.289694262 2.11144866
GGGAAGCC 1.951507181 2.269882841 2.110695011
GGGTTAAG 1.961308247 2.258714887 2.110011567
GGGATTCA 1.918470215 2.300013379 2.109241797
GGGAGTGC 1.928866039 2.283330454 2.106098246
GGGTTATA 1.947486113 2.26231139 2.104898752
GGGTACGA 1.949726554 2.245653816 2.097690185
GGGTACGT 1.994707803 2.196961156 2.09583448
GGGTATCG 1.928579924 2.23642074 2.082500332
GGGTGTGG 1.978837818 2.183286129 2.081061973
GGGGATAC 1.897417132 2.263868607 2.08064287
GGGATAGG 1.900245625 2.260756457 2.080501041
GGGTTACT 1.917138265 2.233515628 2.075326946
GGGATGTG 1.908908946 2.238603734 2.07375634
GGGGATTC 1.884724439 2.256374458 2.070549449
GGGAGTCA 1.88628317 2.243065618 2.064674394
GGGTAGCC 1.919484828 2.190115597 2.054800213
GGGTCAAG 1.879268387 2.218725531 2.048996959
GGGTGTTG 1.884623073 2.205325625 2.044974349
GGGACTAA 1.86597052 2.219800333 2.042885427
GGGGAACT 1.859803904 2.220456335 2.04013012
GGGGATAA 1.863240128 2.211652305 2.037446216
GGGTCACA 1.872957221 2.1967677 2.03486246
GGGGAAAG 1.885389902 2.178418754 2.031904328
GGGATCAT 1.776386418 2.282007784 2.029197101
GGGAATCG 1.851352612 2.204271098 2.027811855
GGGGTAAT 1.856054509 2.195058451 2.02555648
GGGATTGG 1.833951997 2.203394501 2.018673249
GGGTGTTT 1.822871343 2.213474158 2.01817275
GGGGAAAC 1.85390056 2.178736954 2.016318757
GGGAACGA 1.811149881 2.217627235 2.014388558
GGGAGTTT 1.818170659 2.208710908 2.013440783
GGGCAGAT 1.801456648 2.222732349 2.012094498
GGGTTAGT 1.891836656 2.12998131 2.010908983
GGGTCAAA 1.845984922 2.15868546 2.002335191
GGGAAGGT 1.825490677 2.174186025 1.999838351
GGGATTTG 1.833596725 2.163511681 1.998554203
GGGGAAGT 1.814243726 2.179295865 1.996769795
GGGTCAAC 1.842827604 2.140469817 1.99164871
GGGTTACG 1.836256406 2.146983736 1.991620071
GGGATTCT 1.804700296 2.177155428 1.990927862
GGGTTAAA 1.853838743 2.126031942 1.989935343
GGGGATAG 1.846680125 2.132792 1.989736063
GGGACTAG 1.809495949 2.16648971 1.98799283
GGGATTGC 1.787847117 2.187882494 1.987864806
GGGAACGT 1.779509345 2.194668607 1.987088976
GGGCAAAT 1.769460291 2.197696402 1.983578346
GGGCAATA 1.736199936 2.222101206 1.979150571
GGGTAAGG 1.807098143 2.13709442 1.972096281
GGGGACAT 1.779846652 2.161025476 1.970436064
GGGTATCC 1.779402162 2.138804277 1.959103219
GGGATGTC 1.807447869 2.109060772 1.95825432
GGGGATCA 1.808455369 2.103912761 1.956184065
GGGTTAAC 1.843988526 2.068056711 1.956022618
GGGTTATC 1.803601166 2.105858303 1.954729734
GGGGTATT 1.786264531 2.117142725 1.951703628
GGGGTACA 1.794289886 2.10848321 1.951386548
GGGCAGAC 1.730448272 2.171266719 1.950857495
GGGAGTCG 1.838655383 2.062612176 1.950633779
GGGGTACT 1.801854803 2.092110181 1.946982492
GGGACACC 1.77132266 2.121212474 1.946267567
GGGTAGCT 1.769324462 2.109812526 1.939568494
GGGATCAG 1.713688598 2.163470211 1.938579404
GGGGAACA 1.757411931 2.117862017 1.937636974
GGGTCAGT 1.837620369 2.031553479 1.934586924
GGGTACGC 1.715353221 2.147611128 1.931482174
GGGGACTA 1.775581273 2.086763029 1.931172151
GGGTTAGA 1.786527721 2.072526492 1.929527106
GGGTCATG 1.845253936 1.992488922 1.918871429
GGGTGGAT 1.752532486 2.077762209 1.915147348
GGGTAGCA 1.762845665 2.057330281 1.910087973
GGGTGTCC 1.725938302 2.083058753 1.904498528
GGGACAGG 1.686501109 2.122330168 1.904415638
GGGATCAA 1.672341435 2.132516982 1.902429209
GGGTACCA 1.7406267 2.062709357 1.901668029
GGGCAAAC 1.661357046 2.137396145 1.899376596
GGGGAACG 1.749084075 2.04670937 1.897896722
GGGGTAAC 1.777022003 2.016881658 1.896951831
GGGGACAA 1.693414331 2.099698345 1.896556338
GGGATCAC 1.635619135 2.150292911 1.892956023
GGGGACAG 1.733449263 2.048787741 1.891118502
GGGTAGCG 1.776748302 2.001880584 1.889314443
GGGTAGGC 1.793314381 1.982093153 1.887703767
GGGCATAT 1.63624506 2.126471303 1.881358182
GGGAAGAG 1.73378526 2.026863781 1.880324521
GGGACTTA 1.693202305 2.063148682 1.878175494
GGGTTAGC 1.74071129 2.015219347 1.877965318
GGGCGAAT 1.669320912 2.066414686 1.867867799
GGGTCACG 1.725618522 2.002765097 1.86419181
GGGAACGC 1.670765666 2.057238826 1.864002246
GGGTATGG 1.694779359 2.029463821 1.86212159
GGGATTTC 1.707513263 2.006913807 1.857213535
GGGGATTT 1.669204112 2.036490769 1.85284744
GGGGTAGT 1.754650412 1.950374087 1.85251225
GGGATTCG 1.692832615 2.010368803 1.851600709
GGGTCACT 1.683783529 2.017823831 1.85080368
GGGCAATG 1.625698706 2.073594707 1.849646706
GGGGTATA 1.670594464 2.01302099 1.841807727
GGGGTAAG 1.708796353 1.968739394 1.838767874
GGGCAAGT 1.64458869 2.027414308 1.836001499
GGGAAGGA 1.690504412 1.980296687 1.835400549
GGGGTATG 1.686322928 1.974051142 1.830187035
GGGGTTAT 1.62646437 2.03177573 1.82912005
GGGGTACG 1.688328901 1.966081093 1.827204997
GGGATTCC 1.638941658 2.007231792 1.823086725
GGGGAAAA 1.672829066 1.966547355 1.81968821
GGGTGGTT 1.593443902 2.023827303 1.808635602
GGGACTCA 1.648432061 1.96549606 1.80696406
GGGTTGAT 1.662083756 1.947197737 1.804640746
GGGTGGAC 1.613064753 1.995787984 1.804426368
GGGGATGT 1.662265782 1.943765994 1.803015888
GGGATCTA 1.583372966 2.021305925 1.802339446
GGGGATGA 1.70372902 1.897891153 1.800810087
GGGCAAAG 1.570885798 2.029499293 1.800192546
GGGTCATC 1.688791496 1.908520868 1.798656182
GGGCAGTA 1.573127458 2.016218327 1.794672892
GGGAACCC 1.593360743 1.994660137 1.79401044
GGGGATCT 1.5841111 2.000569639 1.79234037
GGGGAGTT 1.595272827 1.987373176 1.791323001
GGGGTATC 1.642562982 1.939046191 1.790804587
GGGCAATC 1.526138714 2.054371814 1.790255264
GGGGATGC 1.667925593 1.901832917 1.784879255
GGGATGTT 1.624490234 1.944619888 1.784555061
GGGGTTAC 1.597995144 1.970107387 1.784051266
GGGTCAGA 1.655173077 1.90908927 1.782131173
GGGATGGC 1.61298302 1.948734118 1.780858569
GGGGACTG 1.605119159 1.943975019 1.774547089
GGGTGGTA 1.617700962 1.93080561 1.774253286
GGGTTGAG 1.679437455 1.865515321 1.772476388
GGGGTAAA 1.63252583 1.906202708 1.769364269
GGGGACTC 1.630680544 1.907510477 1.769095511
GGGACTGA 1.550643585 1.979550997 1.765097291
GGGGACAC 1.592349826 1.917966732 1.755158279
GGGTCAGC 1.656889813 1.848468341 1.752679077
GGGTCTAT 1.609816684 1.878214127 1.744015405
GGGTGGAG 1.630971685 1.856199831 1.743585758
GGGGATCG 1.60261806 1.881831213 1.742224637
GGGAACCG 1.556661433 1.918884348 1.73777289
GGGGACTT 1.532038828 1.943054043 1.737546436
GGGAACGG 1.553036029 1.919142579 1.736089304
GGGGAGTA 1.554869343 1.912314022 1.733591682
GGGCAGTC 1.52701458 1.940032102 1.733523341
GGGCTAAC 1.511457346 1.947263378 1.729360362
GGGCAGTG 1.526367853 1.90433955 1.715353702
GGGGTTAG 1.525572092 1.904637137 1.715104615
GGGCAATT 1.470997857 1.95474127 1.712869564
GGGATGGA 1.563489467 1.855498032 1.709493749
GGGTAGGA 1.583340067 1.834319804 1.708829936
GGGCGAAC 1.429720572 1.986701602 1.708211087
GGGAGAGG 1.575359746 1.839136342 1.707248044
GGGGAGTC 1.555059557 1.857268326 1.706163941
GGGCAAGA 1.522289184 1.887797475 1.705043329
GGGAAGCA 1.528104647 1.875894815 1.701999731
GGGATGCC 1.580459643 1.821511285 1.700985464
GGGGATCC 1.515186554 1.885826632 1.700506593
GGGTTTAT 1.545238609 1.854616224 1.699927417
GGGCGATA 1.460915718 1.93571387 1.698314794
GGGGTTAA 1.5073654 1.880395033 1.693880216
GGGACTGT 1.506244801 1.877306662 1.691775731
GGGTGGTC 1.609705307 1.773005387 1.691355347
GGGACTCT 1.514610124 1.866191758 1.690400941
GGGATGGT 1.514898672 1.865058901 1.689978786
GGGTGGAA 1.516743556 1.856077199 1.686410378
GGGTACCT 1.543940408 1.820318887 1.682129648
GGGTGGTG 1.535761259 1.826622504 1.681191882
GGGGAAGA 1.478138965 1.877285853 1.677712409
GGGATGCT 1.456603492 1.8890316 1.672817546
GGGAGGAT 1.494080791 1.847644161 1.670862476
GGGACTTG 1.510541129 1.821692682 1.666116905
GGGTGGCG 1.645349808 1.683769409 1.664559608
GGGCAGTT 1.439173432 1.882698028 1.66093573
GGGATCCA 1.469075831 1.84711568 1.658095755
GGGCGATG 1.472830846 1.842379724 1.657605285
GGGCGAAG 1.457935462 1.853300753 1.655618107
GGGTCTAA 1.482885656 1.823722267 1.653303961
GGGCTACT 1.463545566 1.841685823 1.652615695
GGGAAGCT 1.486008897 1.816850091 1.651429494
GGGTCTAC 1.544931566 1.748979523 1.646955545
GGGACTCG 1.520591474 1.773267005 1.646929239
GGGCAGAG 1.471302937 1.820186931 1.645744934
GGGTACGG 1.487768002 1.798865575 1.643316788
GGGCCATT 1.481861123 1.803900752 1.642880938
GGGCTACA 1.461861958 1.821151072 1.641506515
GGGCCAAT 1.395127211 1.886170583 1.640648897
GGGCATAG 1.400317969 1.876220267 1.638269118
GGGAGCCA 1.477066272 1.793441079 1.635253675
GGGCTAAG 1.431618202 1.838726933 1.635172567
GGGAAGGC 1.454240347 1.813098852 1.6336696
GGGGAAGC 1.489272707 1.776972389 1.633122548
GGGGTAGC 1.506115066 1.759395398 1.632755232
GGGATGCA 1.447850443 1.802776458 1.625313451
GGGTTTAA 1.466354467 1.783656194 1.62500533
GGGTTTAG 1.489736855 1.760137868 1.624937361
GGGGTTGT 1.49651839 1.748225411 1.622371901
GGGCAAGG 1.424995544 1.81876264 1.621879092
GGGCCACA 1.436218502 1.806971468 1.621594985
GGGTTACC 1.484405655 1.757551991 1.620978823
GGGACTGC 1.482441838 1.756748767 1.619595302
GGGTTGAC 1.4835237 1.751966903 1.617745302
GGGACCAT 1.391563902 1.843761574 1.617662738
GGGCTATT 1.389697114 1.844599631 1.617148372
GGGACGAC 1.471281341 1.753290773 1.612286057
GGGGAGTG 1.50063389 1.719468583 1.610051236
GGGCTATC 1.38764757 1.828850422 1.608248996
GGGCAACC 1.358050489 1.856238652 1.607144571
GGGTTAGG 1.4717753 1.740323148 1.606049224
GGGTTGAA 1.47087015 1.735764665 1.603317408
GGGCCAAC 1.436699332 1.76916963 1.602934481
GGGGAGAT 1.402145235 1.802664535 1.602404885
GGGCAGAA 1.41410585 1.790127941 1.602116896
GGGTGGCT 1.513493212 1.686508095 1.600000654
GGGCGACT 1.312354396 1.887007343 1.59968087
GGGCCATA 1.382569396 1.815520858 1.599045127
GGGGAACC 1.380503305 1.816507548 1.598505427
GGGAGGTG 1.457080956 1.732427301 1.594754128
GGGTCAGG 1.497351152 1.691501026 1.594426089
GGGAGGTA 1.441960796 1.746654735 1.594307766
GGGCTTAT 1.392255014 1.795844272 1.594049643
GGGCCAGT 1.43039275 1.754314482 1.592353616
GGGCAACT 1.373333695 1.804899146 1.589116421
GGGCATAC 1.377498049 1.800576227 1.589037138
GGGGTTGA 1.433340241 1.744351841 1.588846041
GGGACGAT 1.394853751 1.779439992 1.587146872
GGGACCAC 1.366687654 1.806889838 1.586788746
GGGCAAGC 1.352731406 1.820770559 1.586750982
GGGACTTC 1.403547037 1.766693147 1.585120092
GGGCGACA 1.359292129 1.805348834 1.582320482
GGGAGCCT 1.393461798 1.765522529 1.579492164
GGGGTAGA 1.460561415 1.692538316 1.576549866
GGGCTACG 1.404321615 1.747978846 1.57615023
GGGCACAC 1.386118947 1.764020613 1.57506978
GGGTTGTG 1.522599995 1.621944101 1.572272048
GGGCTATA 1.377250619 1.763506856 1.570378738
GGGACCAG 1.342424906 1.797054914 1.56973991
GGGCCACT 1.377106659 1.756823596 1.566965127
GGGCTAAT 1.3904259 1.741354544 1.565890222
GGGCATAA 1.372483573 1.759021256 1.565752415
GGGCGTAT 1.40132169 1.729406043 1.565363867
GGGACTGG 1.420247216 1.710235376 1.565241296
GGGCACAT 1.371738926 1.755168094 1.56345351
GGGCTATG 1.381549424 1.740212673 1.560881048
GGGGAGAC 1.376454908 1.744805575 1.560630242
GGGCAACG 1.382860748 1.736910151 1.55988545
GGGTCTCA 1.436485865 1.682769966 1.559627916
GGGTCTAG 1.424928646 1.688187817 1.556558231
GGGCCAGC 1.385405381 1.722791147 1.554098264
GGGCATTA 1.346556974 1.758832655 1.552694814
GGGATGCG 1.35726161 1.747988087 1.552624848
GGGAGCAT 1.36925603 1.732111841 1.550683935
GGGCGATT 1.310772016 1.788839949 1.549805982
GGGCAGGT 1.347839216 1.747456287 1.547647751
GGGCGAGT 1.338167127 1.756998151 1.547582639
GGGAGCCC 1.427378836 1.658784861 1.543081849
GGGAGCAA 1.39261322 1.686631705 1.539622462
GGGATCTG 1.314061706 1.764628505 1.539345106
GGGCCAAG 1.336849487 1.741391858 1.539120673
GGGCTAGT 1.360885996 1.714869427 1.537877712
GGGCACAG 1.329583517 1.732671993 1.531127755
GGGAGGAA 1.364904501 1.694282761 1.529593631
GGGAGGAG 1.426232735 1.632894288 1.529563512
GGGAAGCG 1.37697587 1.678792607 1.527884239
GGGGACGT 1.41293951 1.640203505 1.526571508
GGGACTCC 1.399591498 1.649627215 1.524609357
GGGCTAGA 1.339816238 1.702635776 1.521226007
GGGCGATC 1.235044686 1.801303449 1.518174068
GGGTGGCA 1.412128433 1.623400169 1.517764301
GGGCGACG 1.33013329 1.704435473 1.517284382
GGGAGGTT 1.316687896 1.717068602 1.516878249
GGGATCCT 1.304687068 1.726814863 1.515750965
GGGCTGAC 1.339572808 1.690562329 1.515067569
GGGGCAAT 1.331998275 1.697109582 1.514553929
GGGCAACA 1.332695462 1.696025918 1.51436069
GGGTTTAC 1.376185845 1.64734778 1.511766812
GGGATCTC 1.315294526 1.70650872 1.510901623
GGGGTGAT 1.347381879 1.674260883 1.510821381
GGGGCAAG 1.334151845 1.685965897 1.510058871
GGGACCAA 1.311266617 1.705467719 1.508367168
GGGATCCG 1.291676679 1.721112137 1.506394408
GGGACTTT 1.356008382 1.656083368 1.506045875
GGGCAGGA 1.332176668 1.67846716 1.505321914
GGGAGGTC 1.398761864 1.609112555 1.503937209
GGGGTTTA 1.306299152 1.700034139 1.503166645
GGGCTAAA 1.306639007 1.696022282 1.501330645
GGGTTGTT 1.385544755 1.616691808 1.501118282
GGGACGTA 1.35863433 1.641221166 1.499927748
GGGGTGAC 1.377753902 1.61836542 1.498059661
GGGCCACG 1.322506508 1.663946091 1.493226299
GGGAGCCG 1.360848844 1.621678261 1.491263552
GGGCCAGA 1.291231679 1.690790453 1.491011066
GGGCTTAC 1.263874216 1.717338556 1.490606386
GGGTGCCT 1.324572799 1.656554119 1.490563459
GGGTACCG 1.313317613 1.653491277 1.483404445
GGGCAGCC 1.290079787 1.674263221 1.482171504
GGGCGTAC 1.290621825 1.670012963 1.480317394
GGGGAGAA 1.279350807 1.673343769 1.476347288
GGGGATGG 1.306002938 1.644952449 1.475477694
GGGCTACC 1.263986157 1.685677626 1.474831892
GGGCTTAG 1.291760993 1.654991885 1.473376439
GGGTCTTA 1.343713634 1.601447539 1.472580587
GGGCGGAC 1.416702128 1.517736966 1.467219547
GGGGTTCT 1.25183187 1.661183914 1.456507892
GGGGTTGG 1.321329655 1.589928483 1.455629069
GGGTTGTA 1.37666463 1.531249939 1.453957284
GGGTCTGA 1.353428072 1.553499661 1.453463867
GGGTGCCA 1.324037826 1.582243247 1.453140537
GGGGTTCA 1.241196334 1.659907262 1.450551798
GGGCAGCA 1.27745916 1.621221239 1.4493402
GGGCGAGG 1.288401165 1.609981022 1.449191093
GGGCATGT 1.25216033 1.641891285 1.447025808
GGGGTGTA 1.286890826 1.605329169 1.446109998
GGGGTCAT 1.316313378 1.5743067 1.445310039
GGGTCTCT 1.293983651 1.596126943 1.445055297
GGGTGGGC 1.287293885 1.599271809 1.443282847
GGGCATGA 1.293992061 1.591382317 1.442687189
GGGATCGA 1.25756324 1.621385426 1.439474333
GGGGACGA 1.296999648 1.57763962 1.437319634
GGGCGAAA 1.251814471 1.622814201 1.437314336
GGGTGGCC 1.300467157 1.570149528 1.435308343
GGGCATTG 1.249706621 1.618909159 1.43430789
GGGCATTC 1.218023868 1.65013139 1.434077629
GGGTCACC 1.275117798 1.588447256 1.431782527
GGGATCGT 1.264167937 1.597533527 1.430850732
GGGCTGAA 1.328116834 1.52346481 1.425790822
GGGTGCAA 1.260561764 1.589299505 1.424930635
GGGAGCTA 1.274231748 1.572575147 1.423403447
GGGGCATA 1.227951781 1.614338972 1.421145377
GGGCGAGC 1.233813092 1.605141516 1.419477304
GGGCCATC 1.198043039 1.638488997 1.418266018
GGGAGGAC 1.222623699 1.613529124 1.418076411
GGGAGCAG 1.325774538 1.506345834 1.416060186
GGGACCTC 1.222005738 1.609548825 1.415777282
GGGCGAGA 1.31076846 1.518002703 1.414385581
GGGACGAA 1.256057541 1.563288613 1.409673077
GGGGTTGC 1.271879776 1.546688261 1.409284019
GGGCGTAG 1.231711271 1.581772656 1.406741964
GGGCACTA 1.200263293 1.612695098 1.406479196
GGGACCTA 1.215693453 1.587225511 1.401459482
GGGTTTTA 1.282186645 1.516913458 1.399550051
GGGCCAAA 1.212886068 1.586176229 1.399531148
GGGACGTG 1.246400013 1.550856349 1.398628181
GGGGCAGA 1.286640703 1.499536328 1.393088516
GGGCACCA 1.185204245 1.596761683 1.390982964
GGGGAAGG 1.266922378 1.50419376 1.385558069
GGGGTGAA 1.205101189 1.555804405 1.380452797
GGGCAGCG 1.222680785 1.534712746 1.378696765
GGGGCAAC 1.174514181 1.576266764 1.375390472
GGGGTTTG 1.225639141 1.52102262 1.373330881
GGGTGCAT 1.20836358 1.533312312 1.370837946
GGGACGTT 1.196648275 1.544411956 1.370530115
GGGCGTAA 1.214565666 1.520253456 1.367409561
GGGGCATG 1.228579918 1.497883539 1.363231729
GGGGACGC 1.251894862 1.469558775 1.360726819
GGGGACCA 1.17974468 1.53922331 1.359483995
GGGGCACG 1.207036721 1.506721657 1.356879189
GGGGCAGT 1.176893459 1.53527039 1.356081925
GGGCAGCT 1.166147951 1.54373288 1.354940415
GGGCTTAA 1.172425555 1.53424909 1.353337322
GGGTTTGA 1.242218594 1.463802047 1.35301032
GGGGTTCG 1.165947905 1.539375975 1.35266194
GGGATCGC 1.161386807 1.539014024 1.350200415
GGGGTGTG 1.247298863 1.452310299 1.349804581
GGGCACTG 1.138470376 1.55100549 1.344737933
GGGCACTC 1.167060504 1.517037521 1.342049013
GGGCTAGC 1.16877609 1.507389291 1.33808269
GGGGTAGG 1.23564343 1.436594587 1.336119009
GGGGCATT 1.155021679 1.516843086 1.335932383
GGGTGCTA 1.168990621 1.501651876 1.335321249
GGGAGCAC 1.229726213 1.440480614 1.335103413
GGGATCGG 1.154638495 1.508409985 1.33152424
GGGCTAGG 1.162104612 1.497646619 1.329875615
GGGCTGAT 1.123587938 1.53489989 1.329243914
GGGTTTCA 1.185280615 1.470745594 1.328013105
GGGCCATG 1.183884642 1.472084329 1.327984485
GGGTTCAT 1.174715658 1.479440281 1.32707797
GGGACGGA 1.239702773 1.410834089 1.325268431
GGGGTGAG 1.194933987 1.453054235 1.323994111
GGGGTACC 1.187140147 1.460733306 1.323936727
GGGACCTT 1.153461132 1.494390687 1.32392591
GGGCATGG 1.141956823 1.503604352 1.322780587
GGGCATCC 1.116970107 1.528520049 1.322745078
GGGTTGGT 1.213708867 1.42561064 1.319659753
GGGTTGGA 1.206566406 1.432154998 1.319360702
GGGACGAG 1.129966018 1.508739384 1.319352701
GGGTGCTT 1.153617836 1.483917978 1.318767907
GGGCCAGG 1.149586096 1.48770152 1.318643808
GGGGCTAT 1.127499196 1.508667514 1.318083355
GGGCCTAC 1.172135181 1.461417432 1.316776307
GGGCAGGC 1.136149706 1.49662025 1.316384978
GGGAGCTG 1.219789571 1.412709304 1.316249438
GGGCATTT 1.127315327 1.505117692 1.316216509
GGGCATGC 1.141866436 1.489336536 1.315601486
GGGCTTTA 1.119950501 1.509177047 1.314563774
GGGGTGGC 1.233684417 1.394546834 1.314115626
GGGGCAAA 1.11343278 1.513768491 1.313600636
GGGCTGAG 1.148972598 1.477731372 1.313351985
GGGATCTT 1.118424888 1.507097982 1.312761435
GGGTGCAG 1.18485031 1.436784829 1.31081757
GGGGCTAC 1.15126594 1.467586369 1.309426155
GGGGACGG 1.269003395 1.349686897 1.309345146
GGGGCTAG 1.164638464 1.451964248 1.308301356
GGGTGGGT 1.162275835 1.449291071 1.305783453
GGGACCTG 1.139378532 1.472135584 1.305757058
GGGCAAAA 1.087953978 1.51853905 1.303246514
GGGAGCTT 1.151827059 1.454253451 1.303040255
GGGGACCT 1.09811049 1.506845146 1.302477818
GGGGTGTT 1.150050571 1.447013382 1.298531977
GGGTGCCG 1.180511478 1.41426225 1.297386864
GGGTCTCG 1.215036189 1.376562749 1.295799469
GGGGTGTC 1.189361571 1.400644946 1.295003259
GGGGAGCC 1.144723879 1.44488174 1.294802809
GGGCCACC 1.098979907 1.487988671 1.293484289
GGGTTGGC 1.185522406 1.398577526 1.292049966
GGGTTCAA 1.162783994 1.420219302 1.291501648
GGGGTCAA 1.113394585 1.468354689 1.290874637
GGGCACGG 1.126094726 1.454136159 1.290115443
GGGGTTCC 1.084441997 1.49024448 1.287343238
GGGCACGT 1.093257479 1.477056378 1.285156929
GGGGCAGC 1.147364171 1.413810081 1.280587126
GGGTTTCT 1.141472283 1.419286836 1.280379559
GGGCATCT 1.082059481 1.473159701 1.277609591
GGGATGGG 1.142405913 1.411396875 1.276901394
GGGTCGAG 1.171565411 1.380604425 1.276084918
GGGCACGC 1.053120266 1.498806231 1.275963249
GGGCCTAT 1.089637635 1.458613603 1.274125619
GGGTCGAA 1.135912262 1.407487061 1.271699662
GGGATTTT 1.158342091 1.382679144 1.270510617
GGGGTCAG 1.14229633 1.397885896 1.270091113
GGGCTCAG 1.083550978 1.446989163 1.265270071
GGGCACTT 1.079810259 1.450613075 1.265211667
GGGCACCT 1.067498925 1.458938909 1.263218917
GGGTTCTA 1.10846698 1.416981517 1.262724249
GGGTTTGT 1.174150246 1.330408522 1.252279384
GGGGCACT 1.087181723 1.41628219 1.251731956
GGGCTTGA 1.108993039 1.393978237 1.251485638
GGGTCTGT 1.168399232 1.330529981 1.249464607
GGGGCCAG 1.079035261 1.416339438 1.24768735
GGGCTGTA 1.073693277 1.411997852 1.242845564
GGGTTGTC 1.180672138 1.303177442 1.24192479
GGGTCGAT 1.136383562 1.346527432 1.241455497
GGGCACGA 1.073479606 1.405753862 1.239616734
GGGGTCTA 1.07770371 1.395716039 1.236709874
GGGGCAGG 1.119865233 1.351089406 1.23547732
GGGAAGGG 1.094235901 1.374881299 1.2345586
GGGCTCAT 1.034702298 1.433421224 1.234061761
GGGGAGAG 1.144693885 1.32310795 1.233900917
GGGTGCTC 1.096554411 1.369337594 1.232946002
GGGTCTGC 1.120569558 1.341568569 1.231069064
GGGGTTTC 1.059567724 1.399014889 1.229291306
GGGGCACA 1.07537414 1.382779119 1.22907663
GGGTCCAC 1.087165713 1.368099563 1.227632638
GGGCGGAT 1.143897601 1.302920384 1.223408993
GGGCTTCA 1.048653706 1.396143062 1.222398384
GGGTTGCT 1.097503169 1.346564738 1.222033954
GGGCGTCA 1.057975588 1.385956928 1.221966258
GGGCGGAG 1.139597345 1.303579098 1.221588222
GGGGAGCG 1.113751009 1.321392676 1.217571842
GGGCCTAA 1.081511546 1.35186859 1.216690068
GGGTTGCA 1.102717756 1.330425901 1.216571829
GGGCTGTG 1.026581915 1.404032314 1.215307115
GGGTGGGA 1.049911965 1.37551789 1.212714927
GGGACCGA 0.997250078 1.424363886 1.210806982
GGGCTTTG 1.034582443 1.386361267 1.210471855
GGGACGTC 1.092347032 1.32778294 1.210064986
GGGCTTTC 1.020506625 1.396714087 1.208610356
GGGACGCC 1.067032646 1.34947504 1.208253843
GGGGTCAC 1.025730565 1.389775354 1.207752959
GGGCGTGG 1.074736411 1.340669667 1.207703039
GGGTCCTA 1.056156973 1.355138292 1.205647633
GGGGCTTA 1.047722617 1.35989151 1.203807063
GGGTCGTG 1.144914247 1.26241838 1.203666314
GGGTGCAC 1.073059152 1.327651836 1.200355494
GGGCCTAG 1.049551502 1.350267437 1.19990947
GGGGCTAA 1.060830325 1.338016321 1.199423323
GGGGTCTG 1.07949421 1.31230278 1.195898495
GGGCGTCT 1.045495282 1.343550809 1.194523046
GGGAGGCC 1.083471734 1.303907519 1.193689627
GGGCGTTA 1.041864756 1.342913734 1.192389245
GGGAGCGT 1.024756678 1.358610631 1.191683654
GGGCTCAC 0.973767502 1.407525024 1.190646263
GGGAGCGA 1.048623591 1.331447921 1.190035756
GGGCTTCT 1.002998502 1.37609706 1.189547781
GGGTCTGG 1.120082901 1.258800862 1.189441881
GGGTCGTT 1.076164666 1.301911661 1.189038163
GGGTGCGA 1.055794911 1.315381323 1.185588117
GGGGAGCA 1.059627868 1.310097534 1.184862701
GGGGCACC 1.041077031 1.327330484 1.184203757
GGGTTCTT 1.03655947 1.329219528 1.182889499
GGGTCTTT 1.075779556 1.289333911 1.182556734
GGGAGCTC 1.051807625 1.310936225 1.181371925
GGGACCCT 0.984989653 1.37563794 1.180313797
GGGGCCAC 1.014423285 1.345965945 1.180194615
GGGACCCA 1.018652415 1.336316472 1.177484443
GGGTCTTC 1.057862556 1.293579131 1.175720844
GGGCTTCC 0.985771983 1.364993035 1.175382509
GGGCATCA 1.021133791 1.320958336 1.171046064
GGGTGCGT 1.015748695 1.321670256 1.168709476
GGGTTCAC 1.012140577 1.323796196 1.167968386
GGGCGGAA 1.073479606 1.25990523 1.166692418
GGGTTTTG 1.079161164 1.251012031 1.165086597
GGGCACCC 0.996907893 1.332285715 1.164596804
GGGGCCAT 1.018404942 1.309642292 1.164023617
GGGTCCAG 1.033840886 1.293167351 1.163504118
GGGCGTGT 0.998511908 1.327167071 1.16283949
GGGCACAA 0.95477195 1.369897556 1.162334753
GGGTCCTT 0.989122431 1.333926058 1.161524245
GGGCCTCA 1.00267392 1.318974929 1.160824425
GGGTTTTC 1.048451374 1.269810762 1.159131068
GGGGTGGT 1.027405951 1.290688353 1.159047152
GGGTCCAT 1.026196378 1.291101565 1.158648971
GGGAGCGG 1.045556118 1.269390542 1.15747333
GGGCTGCG 1.000984121 1.308322475 1.154653298
GGGTTCAG 1.051412382 1.253641003 1.152526693
GGGCTTGT 1.005842908 1.298330557 1.152086733
GGGACCGC 0.973714398 1.329109838 1.151412118
GGGTCCTG 1.026992875 1.2754272 1.151210037
GGGTTCTG 1.051252644 1.248374356 1.1498135
GGGGAGGT 1.021254108 1.277869939 1.149562024
GGGGCTCA 0.962698256 1.335413474 1.149055865
GGGGAGCT 1.020731817 1.275151571 1.147941694
GGGCTGTC 1.00642428 1.289134608 1.147779444
GGGTTGCG 1.06291599 1.23197112 1.147443555
GGGCGTGA 1.010358685 1.279817359 1.145088022
GGGTTTGG 1.048680563 1.223148221 1.135914392
GGGTCGTA 1.05954625 1.209845371 1.13469581
GGGTGCGC 0.99111156 1.277023002 1.134067281
GGGGACCG 0.999998237 1.265711609 1.132854923
GGGCTCCT 0.916274365 1.339722999 1.127998682
GGGTTCTC 0.989219527 1.262031095 1.125625311
GGGACCCG 0.937824993 1.304637687 1.12123134
GGGTCTCC 0.980603234 1.258212296 1.119407765
GGGGTGGA 1.02173738 1.211699253 1.116718317
GGGACGGT 1.009140624 1.2237037 1.116422162
GGGAGGGA 1.00545809 1.225897346 1.115677718
GGGCTTGG 0.978315755 1.252961492 1.115638623
GGGGCTCG 0.960048811 1.267765298 1.113907054
GGGCTGCA 0.966499807 1.26081142 1.113655613
GGGGCTGA 0.998684764 1.228189136 1.11343695
GGGGTCGA 1.006955321 1.219416331 1.113185826
GGGTCGAC 1.033424019 1.189919462 1.111671741
GGGCGACC 0.853786195 1.368623048 1.111204621
GGGGTCTC 0.981640086 1.240419044 1.111029565
GGGCCTGT 0.954429934 1.264414682 1.109422308
GGGTTTCG 1.009372337 1.207087957 1.108230147
GGGCTGGT 0.908192758 1.306024375 1.107108566
GGGCCTGA 0.939169229 1.274630364 1.106899797
GGGGCCAA 0.927602584 1.279855841 1.103729212
GGGTAGGG 1.005037081 1.200303866 1.102670474
GGGCTCCA 0.914044685 1.290965711 1.102505198
GGGACCGT 0.909520933 1.293569461 1.101545197
GGGGTCCA 0.94841533 1.251581508 1.099998419
GGGTTTGC 0.988912996 1.199883096 1.094398046
GGGCGTTG 0.961520386 1.226942067 1.094231227
GGGAGGGC 0.961223645 1.223039702 1.092131673
GGGCTGCT 0.929174437 1.25308631 1.091130374
GGGGTGCA 0.977028531 1.204108783 1.090568657
GGGCCTCT 0.90115578 1.279930452 1.090543116
GGGCGGTA 0.976285204 1.204363584 1.090324394
GGGTCTTG 0.991685294 1.188074897 1.089880096
GGGCGTTC 0.921665244 1.255307591 1.088486418
GGGCAGGG 0.973087969 1.203814298 1.088451133
GGGTTCCT 0.942066454 1.231055394 1.086560924
GGGGTGCT 0.929735435 1.241252535 1.085493985
GGGAGCGC 0.952502286 1.216988355 1.08474532
GGGGTGCC 0.966734804 1.201995945 1.084365374
GGGACGCT 0.920780822 1.247139594 1.083960208
GGGGTCCT 0.939870134 1.22604622 1.082958177
GGGAGGGT 0.948808819 1.212253659 1.080531239
GGGCTGGA 0.938274017 1.219411352 1.078842685
GGGGTCGT 0.994642772 1.162530341 1.078586557
GGGGTCTT 0.957261882 1.198948142 1.078105012
GGGCGTGC 0.911203711 1.242424505 1.076814108
GGGCACCG 0.91268158 1.240659903 1.076670741
GGGTTGGG 1.006581541 1.146161026 1.076371284
GGGCTTGC 0.893593894 1.252811574 1.073202734
GGGTCGGA 0.992036794 1.15231386 1.072175327
GGGCCTTA 0.929372204 1.214796023 1.072084114
GGGCTCAA 0.890721719 1.252645678 1.071683699
GGGCTGGC 0.917715312 1.221486368 1.06960084
GGGCTCTC 0.890955583 1.245786912 1.068371247
GGGGTCCG 0.903490503 1.232301046 1.067895774
GGGACCGG 0.879186277 1.254042059 1.066614168
GGGCTTCG 0.899384061 1.23104636 1.06521521
GGGTGCTG 0.932364182 1.195516541 1.063940362
GGGGCTTG 0.927196435 1.19987893 1.063537682
GGGTTCCA 0.925036438 1.201617426 1.063326932
GGGTCCTC 0.943430534 1.179903183 1.061666858
GGGACGGC 0.967680165 1.153156194 1.06041818
GGGCGTCG 0.96776185 1.151397589 1.05957972
GGGCTCTA 0.856511175 1.259755315 1.058133245
GGGGCGTA 0.909890987 1.204600524 1.057245755
GGGGCCTA 0.883236207 1.230431793 1.056834
GGGGCATC 0.901896295 1.211448076 1.056672185
GGGGTGCG 0.937392479 1.175165173 1.056278826
GGGTCCAA 0.885408491 1.225663291 1.055535891
GGGCGTTT 0.888170221 1.219461681 1.053815951
GGGGCCTG 0.904191381 1.197457744 1.050824562
GGGCCTGC 0.908801199 1.192023715 1.050412457
GGGCTCCG 0.843257115 1.248492192 1.045874653
GGGGCTGC 0.918031764 1.166834155 1.042432959
GGGGCTTC 0.897768 1.177575311 1.037671655
GGGGCTCT 0.869879285 1.199159931 1.034519608
GGGAGGCA 0.935882731 1.127178942 1.031530837
GGGGCGGA 0.996270812 1.059781739 1.028026275
GGGGCGAA 0.930809491 1.123842327 1.027325909
GGGGCGAT 0.87994859 1.172738055 1.026343322
GGGGAGGA 0.938248389 1.111623357 1.024935873
GGGGCTGT 0.899097366 1.142268382 1.020682874
GGGCTGTT 0.85513976 1.177802981 1.016471371
GGGATCCC 0.87730801 1.154737689 1.016022849
GGGTTGCC 0.935102602 1.096036689 1.015569646
GGGCCTCG 0.891243443 1.136088456 1.01366595
GGGACGCG 0.8736865 1.150303895 1.011995197
GGGCTCTG 0.837248409 1.172423868 1.004836138
GGGCGGCG 0.916914745 1.090148245 1.003531495
GGGAGGCG 0.904298318 1.100757116 1.002527717
GGGGAGGC 0.878414131 1.117803739 0.998108935
GGGGCTTT 0.837534723 1.158435252 0.997984987
GGGCGGTT 0.84498216 1.146853883 0.995918021
GGGTTCGT 0.899055179 1.091107476 0.995081327
GGGCGTCC 0.895943475 1.092641594 0.994292534
GGGGTCGG 0.888789915 1.098890383 0.993840149
GGGCTCGT 0.830576081 1.154782872 0.992679477
GGGTCCGA 0.84912367 1.132678409 0.990901039
GGGGCGGC 0.933071809 1.044640789 0.988856299
GGGTGCGG 0.871788106 1.104163662 0.987975884
GGGCGGTC 0.889992037 1.084987969 0.987490003
GGGCTGCC 0.834749396 1.139175973 0.986962684
GGGGCGAC 0.803619654 1.159795424 0.981707539
GGGCTCGA 0.859417969 1.103936126 0.981677047
GGGTTTCC 0.868571712 1.094250825 0.981411269
GGGAGGCT 0.869291009 1.093223556 0.981257282
GGGCCTGG 0.833944817 1.123631074 0.978787946
GGGGCCTT 0.810708251 1.145407929 0.97805809
GGGCCTTC 0.819760472 1.136175094 0.977967783
GGGCGGCT 0.866116754 1.087168826 0.97664279
GGGCTCGC 0.82444142 1.124662983 0.974552202
GGGGTCGC 0.858992102 1.09006089 0.974526496
GGGTCGCA 0.854300029 1.090303219 0.972301624
GGGGCGAG 0.857516099 1.083835219 0.970675659
GGGTCCGC 0.814087643 1.126444215 0.970265929
GGGCTCTT 0.786728814 1.152315864 0.969522339
GGGTTCCG 0.816643367 1.120744406 0.968693886
GGGGCTCC 0.824124433 1.111387204 0.967755819
GGGCGGTG 0.837420533 1.09135739 0.964388962
GGGTCGGC 0.897282185 1.029080267 0.963181226
GGGCCGAG 0.798690296 1.126292192 0.962491244
GGGGCGTG 0.827684187 1.093130149 0.960407168
GGGGCTGG 0.816249928 1.092526673 0.9543883
GGGTCCCT 0.811605917 1.091811308 0.951708612
GGGACGCA 0.852630305 1.0501131 0.951371702
GGGTCGCG 0.858621566 1.040809215 0.94971539
GGGTTCGA 0.834159779 1.06447304 0.94931641
GGGCCTCC 0.811641989 1.086433025 0.949037507
GGGGCCGC 0.78584837 1.107992012 0.946920191
GGGGCCTC 0.774010918 1.113836447 0.943923683
GGGCTCGG 0.799434473 1.087959593 0.943697033
GGGTGCCC 0.845580808 1.037706156 0.941643482
GGGACGGG 0.817694638 1.06150595 0.939600294
GGGTCCCG 0.805077709 1.073884815 0.939481262
GGGTCCGT 0.810489574 1.061177877 0.935833726
GGGCCCAT 0.780212766 1.086413565 0.933313166
GGGCCTTT 0.77228643 1.086813883 0.929550156
GGGCCTTG 0.799233757 1.053826346 0.926530052
GGGCGGGT 0.779310194 1.071312563 0.925311378
GGGCCGGA 0.798490332 1.051993114 0.925241723
GGGCCGAT 0.760764872 1.082402123 0.921583497
GGGGCCGG 0.797219068 1.035586724 0.916402896
GGGCTGGG 0.780204752 1.043646649 0.911925701
GGGTCGGT 0.807361829 1.015103878 0.911232853
GGGCCGAC 0.752892612 1.067629274 0.910260943
GGGTCGTC 0.835149633 0.984849438 0.909999535
GGGCGGGC 0.763593625 1.049598695 0.90659616
GGGTCGCT 0.778604562 1.027981709 0.903293135
GGGGCGTT 0.766928813 1.03297024 0.899949526
GGGGTTTT 0.790317204 1.005793826 0.898055515
GGGCCCAG 0.74053741 1.053102626 0.896820018
GGGCGCAT 0.771005907 1.019213378 0.895109643
GGGGCGGT 0.833889892 0.949555275 0.891722584
GGGCCCAC 0.745432108 1.03150701 0.888469559
GGGCCGTA 0.760475971 1.013472627 0.886974299
GGGGCCGA 0.699252229 1.073905021 0.886578625
GGGTCCGG 0.761879742 1.00688192 0.884380831
GGGCCGAA 0.717074829 1.041923839 0.879499334
GGGCGGGA 0.793469355 0.960662694 0.877066024
GGGCGGCA 0.776347119 0.973290444 0.874818782
GGGTTCGC 0.757070562 0.984833364 0.870951963
GGGCCGTT 0.72548859 1.010081113 0.867784851
GGGCATCG 0.750197435 0.983874672 0.867036053
GGGTTCGG 0.764944353 0.949135631 0.857039992
GGGTACCC 0.735316161 0.975439679 0.85537792
GGGGCGTC 0.778874069 0.92958646 0.854230264
GGGCCCTC 0.682599222 1.025028238 0.85381373
GGGCCCTT 0.695481969 1.010491834 0.852986901
GGGGCGCA 0.721711781 0.976572059 0.84914192
GGGCGGCC 0.729520176 0.962493436 0.846006806
GGGCCGCT 0.688717973 0.998708921 0.843713447
GGGCGCAG 0.739575714 0.946966516 0.843271115
GGGCCGCA 0.693272675 0.990779033 0.842025854
GGGGACCC 0.725687048 0.956626381 0.841156714
GGGCTTTT 0.716126735 0.947968018 0.832047376
GGGTCGGG 0.748170268 0.900045686 0.824107977
GGGGCCCG 0.666577944 0.98116379 0.823870867
GGGGCCGT 0.696814402 0.925568305 0.811191354
GGGCCGCG 0.67504302 0.941603598 0.808323309
GGGCCCGC 0.643118441 0.958669926 0.800894184
GGGCCCTG 0.630391381 0.970728917 0.800560149
GGGCCCAA 0.670374828 0.928175342 0.799275085
GGGCCCTA 0.635446263 0.950086716 0.792766489
GGGCCGGT 0.648684661 0.912764914 0.780724788
GGGCGCAA 0.642797558 0.918165346 0.780481452
GGGCCCGT 0.620196502 0.931155704 0.775676103
GGGGCGCG 0.657008366 0.893797382 0.775402874
GGGCCGCC 0.663090757 0.878640912 0.770865835
GGGCCGGC 0.637532266 0.902223067 0.769877666
GGGGCCCA 0.645771951 0.890534695 0.768153323
GGGGCGCT 0.630096445 0.876843404 0.753469925
GGGCGCAC 0.628966893 0.848755356 0.738861125
GGGGTGGG 0.639463276 0.837036022 0.738249649
GGGCCGTG 0.617249474 0.854371572 0.735810523
GGGCCGGG 0.626902228 0.840367143 0.733634686
GGGTCCCA 0.654146184 0.808181444 0.731163814
GGGGCGGG 0.639794668 0.821578808 0.730686738
GGGCGCCT 0.593439868 0.865301703 0.729370785
GGGCCGTC 0.631923712 0.824867234 0.728395473
GGGCGCCA 0.606296915 0.849657615 0.727977265
GGGGCCCT 0.62498374 0.818075426 0.721529583
932 GGGGCGCC 0.633242621 0.808583459 0.72091304
933 GGGCCCGG 0.577208362 0.851008666 0.714108514
934 GGGCGCCG 0.582330054 0.820782469 0.701556262
935 GGGGAGGG 0.621337097 0.770629388 0.695983242
936 GGGCGCTA 0.581578527 0.798615425 0.690096976
937 GGGCGCTT 0.555824449 0.808696826 0.682260637
938 GGGTCGCC 0.611183748 0.753207192 0.68219547
939 GGGCGCGA 0.534214373 0.809461337 0.671837855
940 GGGCGCGT 0.573492984 0.754941686 0.664217335
941 GGGCTCCC 0.526359442 0.770270271 0.648314856
942 GGGCCCGA 0.514554248 0.78083549 0.647694869
943 GGGCGCGG 0.545982711 0.748550461 0.647266586
944 GGGCGCTG 0.525524266 0.737397388 0.631460827
945 GGGCGCTC 0.48781661 0.704183023 0.595999816
946 GGGCGCGC 0.463809507 0.697426999 0.580618253
947 GGGGTCCC 0.486124521 0.644497285 0.565310903
948 GGGCGCCC 0.407405312 0.619372279 0.513388795
949 GGGTTCCC 0.414025675 0.559109324 0.486567499
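The numeric columns of the table above can be checked with a short script. The following is a minimal sketch, not part of the original disclosure: it assumes that the first two numeric columns are two separate measurements and that the third is their arithmetic mean, which is consistent with the values printed in the rows. The helper names `parse_rows` and `third_column_is_mean` are hypothetical, and only a few rows are copied here for illustration.

```python
from statistics import mean

# A few rows copied verbatim from the table above. The last row carries a
# leading rank number, as the final rows of the table do.
ROWS = """\
GGGCAAGT 1.64458869 2.027414308 1.836001499
GGGAAGGA 1.690504412 1.980296687 1.835400549
GGGCGCCC 0.407405312 0.619372279 0.513388795
949 GGGTTCCC 0.414025675 0.559109324 0.486567499
"""


def parse_rows(text):
    """Parse table rows into (sequence, value_1, value_2, value_3) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Drop a leading rank number if the row has one.
        if fields[0].isdigit():
            fields = fields[1:]
        seq = fields[0]
        v1, v2, v3 = (float(x) for x in fields[1:4])
        rows.append((seq, v1, v2, v3))
    return rows


def third_column_is_mean(row, tol=1e-6):
    """Return True if the third value equals the mean of the first two (assumption)."""
    _, v1, v2, v3 = row
    return abs(mean([v1, v2]) - v3) < tol


rows = parse_rows(ROWS)
for row in rows:
    print(row[0], third_column_is_mean(row))
```

Under the same assumption, sorting the parsed rows by the third value in descending order reproduces the order in which they appear in the table.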
Example 2. Assessing the IVT activity of T7 promoter sequences
In vitro transcription assay (IVT)
One ng of dsDNA template (gBlocks Gene Fragments, Integrated DNA Technologies) was in vitro transcribed using the HiScribe T7 RNA Synthesis Kit (NEB E2040S, E2070S), producing a 410-nucleotide RNA (see SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8). RNA was purified with 1.6 volumes of RNAClean XP beads, followed by elution in 20 µl water. RNA was quantified using the Qubit™ RNA Assay Kit.
IVT activity of T7 promoter variants using a T7 RNA polymerase
T7 promoter variants that ranked as comparatively high, moderate, or low self-transcribing in the 5’RACE-Seq experiments shown in Table 1, namely #4 (see SEQ ID NO:4), #368 (see SEQ ID NO:6), and #949 (see SEQ ID NO:8), respectively, were assayed as examples for their activity in transcribing template DNA. The higher a promoter ranked in 5’RACE-Seq, the higher the fold amplification of the template over time (Figures 4 and 5). Assessment of two further variants, the high-ranked #1 (see SEQ ID NO:3) and the low-ranked #948 (see SEQ ID NO:7), likewise revealed high and low fold amplification of the template, respectively, as determined after 2 h of IVT (Figure 6). This confirmed that the 5’RACE-Seq experiments were predictive of the efficiency/strength/activity of a given T7 promoter sequence in transcribing a downstream template sequence.
Furthermore, transcript levels increased approximately linearly over time (Figure 5), demonstrating that the rank in 5’RACE-Seq experiments correlated well with the activity of the T7 promoter variant in linearly amplifying a template.
As shown by way of example in Figure 7, selecting a high-ranking downstream sequence (#4; see SEQ ID NO:4) improved the in vitro transcription activity of the T7 promoter relative to a low-ranking downstream sequence (#949; see SEQ ID NO:8) over a range of IVT template DNA concentrations, including saturating concentrations (40 ng). This suggested that selecting a suitable downstream flanking region can also improve the performance of a T7 promoter in large-scale RNA synthesis.
The high-ranked promoter sequence #4 was further assayed with or without a 5-nucleotide AT-rich upstream sequence (GAATT) after 2 h of IVT. The directly adjacent upstream sequence in the controls was CAGGC (see SEQ ID NO:4, SEQ ID NO:8). It was found that replacing this control sequence with GAATT (see SEQ ID NO:5) further improved the activity of the T7 promoter (Figure 8).
Example 3. T7 promoter variants improve CEL-seq2
In vitro transcription assay (IVT) using CEL-seq mimics
The pre-synthesized double-stranded DNA templates harboring a T7 promoter variant mimicked the cDNAs that are generated during the CEL-seq or CEL-seq2 protocol after mRNA capture, so that the specific effect of the T7 promoter could be monitored independently of the mRNA capturing step of CEL-seq(2); see SEQ ID NO:10 and SEQ ID NO:11. One pg (0.05 pg/µl) of those double-stranded CEL-seq cDNA mimics was in vitro transcribed using the HiScribe T7 RNA Synthesis Kit. RNA was purified with 1.6 volumes of RNAClean XP beads and eluted in 10 µl water. RNA was quantified using a High Sensitivity RNA ScreenTape (Agilent 5067-5579) on an Agilent TapeStation.
IVT activity of T7 promoter sequences in CEL-seq mimics
It was found that the directly adjacent upstream sequence GAATT enhanced the relative activity of the T7 promoter (total RNA yield) with the directly adjacent downstream sequence “GGGATAAT” in CEL-seq cDNA mimics (SEQ ID NO:10 and SEQ ID NO:11) much as it did in standard IVT templates (SEQ ID NO:4 and SEQ ID NO:5); see Figure 9 (left and middle panels).
It was furthermore shown that addition of the directly adjacent upstream sequence GAATT (see SEQ ID NO:11) improved the RNA yield especially when the starting material was scarce. When the starting material was 1 ng (50 pg/µl), the increase in yield was about 1.1-fold. When only 1 pg (0.05 pg/µl) of IVT template was used, the improvement was about 1.5-fold (Figure 9, middle and right panels).
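The two template amounts and concentrations stated above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the concentrations are given in pg per microliter and derives the reaction volume each pair implies. The function name `implied_volume_ul` is hypothetical and does not come from the protocol.

```python
def implied_volume_ul(mass_pg, conc_pg_per_ul):
    """Reaction volume (in microliters) implied by a template mass and concentration."""
    return mass_pg / conc_pg_per_ul


# 1 ng = 1000 pg at 50 pg per microliter, and 1 pg at 0.05 pg per microliter.
high_input = implied_volume_ul(1000.0, 50.0)
low_input = implied_volume_ul(1.0, 0.05)

print(round(high_input, 3), round(low_input, 3))
```

Both pairs imply the same reaction volume (about 20 microliters), so under this reading the two conditions differ only in the amount of template, not in the reaction scale.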
It was further revealed that the T7 promoter variant #4 (see SEQ ID NO:10; cf. Table 1), in particular with the directly adjacent upstream flanking sequence GAATT (see SEQ ID NO:11), had a strongly improved activity compared to the T7 promoter used in the original CEL-seq2 protocol (see SEQ ID NO:9) (Figure 10).
This suggested that combining a suitable downstream sequence (e.g. GGGATAAT) with a suitable upstream sequence (e.g. GAATT), as in SEQ ID NO:5 and SEQ ID NO:11, can improve the transcription efficiency, especially in applications where only a very small amount of template DNA is available, for example single-cell analyses such as single-cell sequencing.
Amplification of mRNA in CEL-seq2 experiments
CEL-seq2 experiments were performed with single K562 cells according to Hashimshony et al., Genome Biol. 2016 Apr 28; 17:77 up to the in vitro transcription step, and the amplified RNA was measured. Specifically, the cDNA from 10 cells was pooled and in vitro transcribed for 15 h after reverse transcription and second strand cDNA synthesis. Different reverse transcription (RT) primers comprising T7 promoter variants were compared to the original CEL-seq2 RT primer (SEQ ID NO:12). RT primers with the T7 promoter variants ranking #4 and #66 in Table 1 and further comprising “GAATT” as upstream sequence (SEQ ID NO:13 and SEQ ID NO:15, respectively) showed stronger activity than the original CEL-seq2 RT primer (SEQ ID NO:12) (Figures 11 and 12). The #66 T7 promoter (SEQ ID NO:15) showed the strongest activity in this experiment, probably because it was the shortest primer: only two nucleotides (corresponding to positions +4 and +5) had to be inserted between the “GGG” at positions +1 to +3 and the directly adjacent sequencing adapter of the original CEL-seq2 primer (starting at position +6 and encompassing positions +6 to +8). Hence, reducing the length of RT primers may have a positive effect on mRNA capture and thus on the performance of CEL-seq(2)-based methods.
This suggested that increasing the overlap of the +1 to +8 directly adjacent downstream flanking region of the T7 core promoter with the sequencing adapter may further improve the performance of CEL-seq or CEL-seq2 methods.
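The idea of letting the downstream flanking region overlap with the start of the sequencing adapter can be sketched as a suffix/prefix overlap search. The adapter string used below is a made-up placeholder, chosen only so that it shares three bases with the flanking region (mirroring positions +6 to +8 in the text); it is not the actual CEL-seq2 adapter, and the function names are hypothetical.

```python
def suffix_prefix_overlap(flank: str, adapter: str) -> int:
    """Length of the longest suffix of `flank` that is also a prefix of `adapter`."""
    max_len = min(len(flank), len(adapter))
    for k in range(max_len, 0, -1):
        if flank[-k:] == adapter[:k]:
            return k
    return 0


def merge_with_overlap(flank: str, adapter: str) -> str:
    """Join flank and adapter, writing the shared bases only once."""
    k = suffix_prefix_overlap(flank, adapter)
    return flank + adapter[k:]


flank = "GGGAGAGT"                   # one of the flanking regions named in the claims
placeholder_adapter = "AGTTCAGACG"   # hypothetical adapter, for illustration only

overlap = suffix_prefix_overlap(flank, placeholder_adapter)
merged = merge_with_overlap(flank, placeholder_adapter)
saved = len(flank) + len(placeholder_adapter) - len(merged)
print(overlap, merged, saved)
```

Each overlapping base shortens the primer by one nucleotide relative to simple concatenation, which is the quantity the paragraph above proposes to increase.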
RT primers comprising #4 and #66 ranking sequences with “AATTG” as directly adjacent upstream sequence (SEQ ID NO: 14 and SEQ ID NO: 16, respectively) showed no improvement over the original CEL-seq2 RT primer having “GCCGG” as upstream sequence (SEQ ID NO: 12) (Figure 11).
This indicated that “GAATT” is advantageous as the upstream sequence directly adjacent to the T7 promoter, in particular in CEL-seq or CEL-seq2 experiments, as it had a positive effect on the strength of the T7 promoter.
CEL-Seq2 based single-cell RNA-sequencing
The full CEL-Seq2 protocol was performed as described in Hashimshony et al., Genome Biol. 2016 Apr 28;17:77, and as illustrated in Figure 13, unless indicated differently. In particular, a modified reverse transcription primer (CEL-Seq+; SEQ ID NO: 15) was employed instead of the original (conventional) CEL-Seq2 reverse transcription primer (SEQ ID NO: 12) where indicated.
In brief, single cells were FACS sorted using a BD Aria III device into 96-well plates with barcoded reverse transcription primers (Figure 13) comprising either the modified T7 promoter (SEQ ID NO: 15) or the conventional CEL-Seq2 T7 promoter (0.4 mM) in lysis buffer (110 µM dNTPs, 0.007 % Triton X-100, 1.4 U SUPERase•In) and stored at -80 °C. After thawing, plates were heated to 72 °C for 3 min and incubated for 10 min at 10 °C. Four µl of SuperScript II reaction mix were added at room temperature and the plate was incubated for 1 h at 42 °C followed by heat inactivation for 10 min at 70 °C. Second strand synthesis was performed at 16 °C for 2 h with the NEBNext® Ultra II Non-Directional RNA Second Strand Synthesis Module (NEB E6111S). Single-cell reactions were pooled separately for CEL-Seq2 or CEL-Seq+ primers, purified with 1.2 volumes of Ampure XP, and in vitro transcribed for 15 h using the MEGAscript T7 kit. After ExoSAP-IT treatment (Thermo, 78201) and fragmentation for 1.5 min (NEB E6150S), the RNA was cleaned up with 1.8 volumes of Ampure XP. The fragmented RNA was combined with HEX-primer and dNTPs and incubated at 65 °C for 5 min, followed by addition of SuperScript II mix and incubation at 25 °C for 10 min and 42 °C for 1 h. The cDNA was amplified with 12 cycles of PCR using KAPA HiFi HotStart ReadyMix. Final reactions were cleaned up with 0.8 volumes of Ampure XP and sequenced on a MiniSeq (Illumina) instrument.
Single-cell RNA-sequencing data analysis
Fastq files from each dataset were processed with the zUMIs software (version 2.5) for filtering, single-cell demultiplexing and mapping of reads to genes. Filtering used the default zUMIs parameters (minimum of 100 reads per cell). Reads were mapped to the hg38 genome assembly, and the GENCODE (v27) annotation was used for gene read counts. For each dataset, the output matrix files from zUMIs were loaded into Seurat (version 3). Reads from intronic and exonic regions of genes were included, and no further coverage filtering was applied.
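The per-cell summary statistics derived from such a count matrix (genes detected and total counts per cell, after a minimum-coverage filter) can be sketched in plain Python. This is not the API of the tools named above; the data structure, the function name and the toy values are assumptions for illustration, and the threshold of 100 is applied here to per-cell totals as a stand-in for the read-level filter.

```python
from typing import Dict, Tuple

# Hypothetical representation: cell barcode -> gene -> count
CountMatrix = Dict[str, Dict[str, int]]


def per_cell_metrics(counts: CountMatrix, min_total: int = 100) -> Dict[str, Tuple[int, int]]:
    """Return {cell: (genes_detected, total_counts)} for cells passing the filter."""
    metrics: Dict[str, Tuple[int, int]] = {}
    for cell, gene_counts in counts.items():
        total = sum(gene_counts.values())
        if total < min_total:
            continue  # drop cells below the minimum coverage
        detected = sum(1 for c in gene_counts.values() if c > 0)
        metrics[cell] = (detected, total)
    return metrics


# Toy data, invented for this sketch only.
toy_counts: CountMatrix = {
    "cell_A": {"GENE1": 60, "GENE2": 50, "GENE3": 0},
    "cell_B": {"GENE1": 5, "GENE2": 0, "GENE3": 1},
}
toy_metrics = per_cell_metrics(toy_counts)
print(toy_metrics)
```

A gene counts as detected in a cell when at least one count is assigned to it, which is the quantity compared between primers in the results that follow.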
The CEL-Seq+ reverse transcription primer (SEQ ID NO: 15) improves sensitivity and accuracy of CEL-Seq2 based single-cell RNA sequencing
Successful preparation of a sequencing library strongly depends on the efficiency of the T7-based amplification step. A main characteristic of conventional single-cell RNA-seq (scRNA-seq) is a high “drop out” rate, which means that features (e.g. transcripts) that are present in an individual cell escape detection. Accordingly, conventional scRNA-seq data is inherently shallow, meaning that most expressed genes are only represented by a small number of transcript counts, and some are not represented by any transcript count at all (missed expressed genes). This is because many mRNA molecules are not initially captured by hybridization or are subsequently lost in one of the downstream steps of library preparation, none of which is 100% efficient.
It was thus tested whether additional copies generated by a more efficient T7 promoter (e.g. SEQ ID NO: 20) may increase the probability that transcripts, especially those of low abundance, are represented in the final sequencing output, corresponding to enhanced sensitivity of scRNA-seq. It was found that the CEL-Seq+ reverse transcription primer (SEQ ID NO: 15) harboring a modified and more active T7 promoter (SEQ ID NO: 20) facilitated library preparation, and substantially increased library complexity and the number of expressed genes detected per cell, highlighting a particular value for bioanalytical applications (Figures 11 to 17), as detailed below.
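The drop-out argument above can be illustrated with a simple probability sketch. It assumes, purely for illustration, that every library preparation step retains each molecule independently with a fixed efficiency. The step efficiencies below are hypothetical numbers, not measurements from this document.

```python
from math import prod
from typing import Sequence


def overall_efficiency(step_efficiencies: Sequence[float]) -> float:
    """Probability that a single molecule survives all steps (independence assumed)."""
    if any(not 0.0 <= e <= 1.0 for e in step_efficiencies):
        raise ValueError("efficiencies must lie between 0 and 1")
    return prod(step_efficiencies)


def detection_probability(n_molecules: int, efficiency: float) -> float:
    """Probability that at least one of n molecules of a transcript is detected."""
    return 1.0 - (1.0 - efficiency) ** n_molecules


# Hypothetical efficiencies for capture, amplification and downstream handling.
steps = [0.5, 0.8, 0.5]
eff = overall_efficiency(steps)

for n in (1, 10, 100):
    print(n, round(detection_probability(n, eff), 3))
```

Under this toy model, transcripts present in only a few copies are the ones most likely to be missed, and any gain in per-step efficiency raises their detection probability the most.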
Following mRNA capture with the modified reverse transcription primer (SEQ ID NO: 15), a robust ~2.5-fold increase in aRNA amplification from pooled cDNA of 10 single K562 cells was observed, which facilitated library preparation from scarce material (Figures 12 and 13). Moreover, employing the modified reverse transcription primer (SEQ ID NO: 15) harboring the modified T7 promoter sequence (SEQ ID NO: 20) for CEL-seq2 based scRNA-seq (CEL-Seq+) substantially increased the number of detected genes (average 9749 vs. 8281) and the number of unique molecular identifiers (UMIs; average 85066 vs. 53541) from single cells compared to the original CEL-seq2 reverse transcription primer (SEQ ID NO: 12) (Figure 14). Detection of almost 10,000 genes per cell likely approaches the entire set of expressed genes present in a single cell at any given time. In addition to higher detection rates, expression levels of individual genes were measured with higher accuracy by CEL-Seq+ across the entire gene expression spectrum, as reflected by higher UMI counts and a consistently lower coefficient of variation (CV) (Figures 14 and 15). As an additional benchmark, the genes detected in deeply sequenced bulk K562 RNA-Seq (Prost (2015), Nature 525) were divided into expression quartiles and their recovery in CEL-seq2 vs. CEL-Seq+ was compared. Enhanced recovery of genes in CEL-Seq+, in particular from lower expression quartiles, suggested that improved linear amplification of RNA molecules based on an improved T7 promoter sequence (e.g. SEQ ID NO: 20 or SEQ ID NO: 18) directly translates into substantially increased sensitivity of single-cell genomics applications (Figure 16). At the same time, the detection rate of transcription factors, chromatin binders, or genes involved in chronic myelogenous leukemia (CML) disease progression was significantly increased (Figure 17). Specifically, 64 transcription factors were exclusively detected by CEL-Seq+, including many with known relevance for leukemia (e.g. TLX21), CML tumor stem cells (e.g. PPARG20), CML cancer cell metabolism (e.g. SIX120) or response to treatment (e.g. PBX120).
Accordingly, drug development and diagnostics, e.g. in the field of tumor biology, may broadly benefit from application of an improved T7 promoter for scRNA-seq, such as in CEL-Seq+.
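The reported averages can be turned into fold changes, and the coefficient of variation used as the accuracy measure can be written out explicitly. In the sketch below the gene and UMI averages are taken from the text, while the replicate values passed to the CV function are invented toy numbers. The population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev
from typing import Sequence


def fold_change(new: float, old: float) -> float:
    """Ratio of the new value to the reference value."""
    if old == 0:
        raise ValueError("reference value must be non-zero")
    return new / old


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = standard deviation / mean (population SD assumed here)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean must be non-zero")
    return pstdev(values) / m


# Averages reported in the text (modified primer vs. original primer).
genes_fc = fold_change(9749, 8281)
umis_fc = fold_change(85066, 53541)
print(round(genes_fc, 2), round(umis_fc, 2))

# Toy expression values for one gene across cells, for illustration only.
toy_cv = coefficient_of_variation([8, 10, 12])
print(round(toy_cv, 3))
```

The averages correspond to roughly 1.18-fold more detected genes and roughly 1.59-fold more UMIs per cell; a lower CV at a given expression level indicates less cell-to-cell measurement noise.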
Further single-cell DNA- and RNA-sequencing approaches and nucleic acids based diagnostics of scarce clinical material
Furthermore, since important low-abundance regulatory genes are usually missed by microdroplet-based single-cell RNA-sequencing methods that produce particularly shallow transcriptomes, microdroplet-based methods such as inDrop (Klein, A. M. et al., Cell 161, 1187-1201, 2015) may benefit from employing the herein described improved T7 promoters (e.g. SEQ ID NO: 20 or SEQ ID NO: 18) to increase gene detection rates.
In general, since uniform and accurate amplification of nucleic acids is important when starting material is scarce and precious, the improved T7 promoter sequences provided herein (e.g. SEQ ID NO: 20 or SEQ ID NO: 18) can be readily applied in various IVT-based methods to boost linear amplification of nucleic acids. This is of particular interest for various single-cell DNA- and RNA-sequencing approaches, and for nucleic acid based diagnostics of scarce clinical material such as circulating tumor cells (CTCs) (Lawson et al., Nat. Cell Biol. 20, 1349-1360 (2018)).
Claims
1. A method for in vitro transcribing (IVT) RNA, comprising
(a) providing a DNA polynucleotide as an IVT template comprising (i) a T7 promoter sequence comprising
(1) a T7 core promoter sequence and
(2) a directly adjacent downstream flanking region, wherein the downstream flanking region comprises eight bases (+1 to +8) comprising three guanines at positions +1 to +3, an adenine or thymine at position +4, at least two adenines at positions +4 to +8 and/or a thymine at position +4, and at most one cytosine at positions +4 to +8, and downstream thereof (ii) a nucleotide sequence encoding the RNA to be transcribed, and
(b) in vitro transcribing said IVT template in the presence of a T7 polymerase and ribonucleotide triphosphates.
2. The method of claim 1, wherein the IVT template is present at a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, preferably of less than 10 pg/µl.
3. The method of claims 1 or 2, wherein step (a) comprises reverse transcribing an RNA to cDNA, wherein said cDNA is the IVT template, and wherein a DNA polynucleotide comprising said T7 promoter sequence is used as reverse transcription primer, and wherein step (b) comprises transcribing said cDNA to RNA.
4. The method of claim 3, wherein the reverse transcription primer comprises a poly-T-stretch, wherein the RNA that is reverse transcribed to cDNA is an mRNA, the in vitro transcribed RNA is an antisense RNA (aRNA), and wherein the polyT-stretch is a nucleotide sequence of 9 to 35 consecutive thymines.
5. A method for amplifying a target polynucleotide, wherein said method comprises the steps of
(a) synthesizing the complementary strand of a target polynucleotide, wherein said synthesizing comprises annealing a DNA polynucleotide comprising the T7 promoter sequence according to any of claims 1 to 4 to said target polynucleotide, thereby obtaining an IVT template comprising said T7 promoter and a nucleotide sequence downstream thereof, i.e. wherein said nucleotide sequence reflects the target polynucleotide sequence, and
(b) in vitro transcribing said IVT template according to any of claims 1 to 4, thereby amplifying said target polynucleotide.
6. The method of claim 3, wherein the target polynucleotide is an RNA which is reverse-transcribed into cDNA, thereby amplifying said RNA as aRNA.
7. The method of any of the preceding claims, wherein the IVT template is obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide from less than 100 cells, preferably less than 10 cells.
8. The method of any of the preceding claims, wherein the IVT template is obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide from a single cell.
9. The method of any of the preceding claims, wherein the IVT template is obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide from a liquid biopsy and/or a body fluid.
10. The method of any of the preceding claims, wherein the IVT template is obtained by reverse transcribing RNA to cDNA and/or synthesizing the complementary strand of at least one target polynucleotide, wherein said RNA and/or at least one target polynucleotide is present at a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, preferably of less than 10 pg/µl.
11. A method for determining the partial or full nucleotide sequence(s) of an RNA and/or the transcript level of at least one gene comprising the steps of
(a) reverse transcribing an RNA to the first strand of a cDNA comprising annealing a DNA polynucleotide comprising the T7 promoter according to any one of the preceding claims; and
(b) transcribing the cDNA into aRNA by using a T7 polymerase.
12. The method of claim 11, further comprising a step (c) of generating a double-stranded cDNA from the aRNA of step (b), and/or a step of sequencing the aRNA of step (b) and/or the cDNA of step (c).
13. A method for determining the partial or full nucleotide sequence(s) and/or abundance of at least one target polynucleotide comprising the steps of
(a) amplifying at least one target polynucleotide according to any of claims 5 to 10, and
(b) determining the partial or full nucleotide sequence(s) and/or abundance of said at least one amplified target polynucleotide.
14. A method for determining the transcript level of at least one gene comprising the steps of
(a) amplifying at least one target polynucleotide according to any of claims 5 to 10, wherein said target polynucleotide is an mRNA corresponding to said at least one gene, and
(b) determining the partial or full nucleotide sequence(s) and/or abundance of said at least one amplified target polynucleotide.
15. The method of any of claims 11 to 14, wherein the partial or full nucleotide sequence(s), the abundance of at least one target polynucleotide and/or the transcript level of at least one gene is determined in a sample comprising less than 100 cells, preferably less than 10 cells, preferably a single cell.
16. The method of any of claims 11 to 15, for determining the partial or full nucleotide sequence(s) of an RNA contained in a single cell or single cell equivalent and/or the transcript level of at least one gene of a single cell or single cell equivalent.
17. The method of any of claims 11 to 16, wherein the partial or full nucleotide sequence(s), the abundance of at least one target polynucleotide and/or the transcript level of at least one gene is determined in a sample, wherein said mRNA and/or at least one target polynucleotide is present at a concentration of less than 1 ng/µl, preferably of less than 100 pg/µl, preferably of less than 10 pg/µl.
18. A DNA polynucleotide, comprising a T7 promoter sequence comprising
(1) a T7 core promoter sequence and
(2) a directly adjacent downstream flanking region, wherein the downstream flanking region comprises eight bases (+1 to +8) comprising three guanines at positions +1 to +3, an adenine or thymine at position +4, at least two adenines at positions +4 to +8 and/or a thymine at position +4, and at most one cytosine at positions +4 to +8.
19. The method of any of claims 1 to 17, or the DNA polynucleotide of claim 18, wherein the downstream flanking region further comprises at least three adenines or thymines at positions +4 to +8 and/or a guanine followed by an adenine within positions +5 and +6, and
the downstream flanking region does not comprise three consecutive thymines within positions +4 to +8, and/or two consecutive thymines within positions +4 and +7, when a cytosine or guanine is present at positions +4 to +8, or a thymine, cytosine or guanine is present at position +4; a thymine followed by a cytosine within positions +4 and +7; a cytosine within positions +4 to +6 when less than three adenines are present at positions +4 to +8; and a cytosine followed by a guanine within positions +4 to +7.
20. The method or the DNA polynucleotide of claim 19, wherein the downstream flanking region further comprises an adenine at position +4 and at most one guanine or cytosine at positions +4 to +8.
21. The method or the DNA polynucleotide of claim 20, wherein the downstream flanking region does not comprise a thymine at position +5 followed by a guanine at position +6 and two consecutive thymines within positions +4 to +8.
22. The method or the DNA polynucleotide of claim 21, wherein the downstream flanking region further comprises an adenine at position +4, three to four adenines at positions +4 to +8, and no guanines or cytosines at positions +4 to +8, preferably wherein the downstream flanking region has the sequence GGGATAAT.
23. The method of any of claims 1 to 20 or the DNA polynucleotide of any of claims 22 to 24, wherein the downstream flanking region further comprises the sequence AGT at positions +6 to +8.
24. The method or the DNA polynucleotide of claim 23, wherein the downstream flanking region has the sequence GGGAAAGT, GGGTAAGT, GGGATAGT, GGGTGAGT, or GGGAGAGT, preferably GGGAGAGT.
25. The method of any of claims 1 to 19 or the DNA polynucleotide of claims 18 or 19, wherein the downstream flanking region further comprises the sequence AG at positions +7 to +8.
26. The method or the DNA polynucleotide of claim 25, wherein the downstream flanking region has the sequence GGGAAAAG, GGGAATAG, GGGATAAG, GGGTGAAG, GGGTAAAG, GGGAGAAG, GGGTGTAG, GGGAGTAG, or GGGATTAG.
27. The method of any of claims 1 to 17 or 19 to 26 or the DNA polynucleotide of any of claims 18 to 26, wherein the T7 promoter further comprises the sequence, AATT, preferably GAATT, directly upstream of the T7 core promoter sequence.
28. The method of any of claims 1 to 17 or 19 to 27 or the DNA polynucleotide of any of claims 18 to 27, wherein the DNA polynucleotide is a reverse transcription primer, further comprising a polyT-stretch, wherein the polyT-stretch is a nucleotide sequence of 9 to 35 consecutive thymines, preferably 22 to 27 consecutive thymines, preferably 24 consecutive thymines, followed by an adenine, cytosine or guanine.
29. The method of any of claims 1 to 17 or 19 to 28 or the DNA polynucleotide of any of claims 18 to 28, wherein the DNA polynucleotide is a reverse transcription primer, further comprising a sequencing adapter, wherein the sequencing adapter is a nucleotide sequence comprising a nucleotide sequence comprised in a library preparation primer, wherein the library preparation primer comprises a flow cell binding sequence, which is complementary to an oligo- or polynucleotide present at the surface of the flow cell of a next-generation-sequencing device, and wherein the sequencing adapter does not comprise more than eight consecutive thymines and/or only thymines.
30. The method or the DNA polynucleotide of claim 29, wherein the sequencing adapter is downstream of the T7 core promoter, and wherein the polyT-stretch is downstream of the sequencing adapter and not overlapping with the sequencing adapter.
31. The method of any of claims 5 to 17, or 19 to 30, wherein at least one target polynucleotide is amplified from less than 100 cells, preferably less than 10 cells, preferably a single cell, and wherein the T7 promoter further comprises the sequence AATT directly upstream of the T7 core promoter sequence.
32. The method of claim 31, wherein an mRNA is amplified, and wherein the T7 promoter further comprises a poly-T-stretch according to claims 28 or 30.
33. The method of any of claims 5 to 17 or 19 to 32, wherein the target polynucleotide is from a prokaryote or a eukaryote, preferably an animal or a human.
34. The method of any of claims 1 to 17, or 19 to 33 or the DNA polynucleotide of any of claims 18 to 30, wherein the DNA polynucleotide further comprises downstream of the T7 core promoter sequence a nucleotide sequence encoding an mRNA or non-coding RNA, preferably an mRNA, antisense RNA, siRNA, sgRNA, guide RNA or CRISPR RNA.
35. A kit comprising the DNA polynucleotide of any one of claims 18 to 30, further comprising a T7 polymerase, one or more modified and/or unmodified ribonucleotide triphosphate(s), library preparation primer(s), sequencing primer(s), and/or a microfluidic chip.
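The base constraints on the downstream flanking region recited in claims 1 and 18 can be expressed as a small sequence check. The sketch below encodes one reading of the claim language, in which "at least two adenines at positions +4 to +8 and/or a thymine at position +4" is treated as a logical OR. It is an illustration, not a legal interpretation, and it does not cover the additional limitations of the dependent claims. The function name is hypothetical.

```python
def satisfies_base_constraints(flank: str) -> bool:
    """Check an 8-base flanking region (+1 to +8) against one reading of claims 1/18."""
    flank = flank.upper()
    if len(flank) != 8 or set(flank) - set("ACGT"):
        return False
    tail = flank[3:]  # positions +4 to +8
    return (
        flank[:3] == "GGG"                                # guanines at +1 to +3
        and tail[0] in "AT"                               # adenine or thymine at +4
        and (tail.count("A") >= 2 or tail[0] == "T")      # >= 2 adenines and/or T at +4
        and tail.count("C") <= 1                          # at most one cytosine
    )


# Sequences named in claims 22, 24 and 26.
claimed = [
    "GGGATAAT",
    "GGGAAAGT", "GGGTAAGT", "GGGATAGT", "GGGTGAGT", "GGGAGAGT",
    "GGGAAAAG", "GGGAATAG", "GGGATAAG", "GGGTGAAG", "GGGTAAAG",
    "GGGAGAAG", "GGGTGTAG", "GGGAGTAG", "GGGATTAG",
]
results = {seq: satisfies_base_constraints(seq) for seq in claimed}
print(all(results.values()))
```

Sequences with a cytosine or guanine at position +4, or with two or more cytosines at positions +4 to +8, are rejected by this check.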
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19199289.0 | 2019-09-24 | | |
EP19199289 | 2019-09-24 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021058145A1 (en) | 2021-04-01 |
Family
ID=68172082
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2020/066070 WO2021058145A1 (en) | 2019-09-24 | 2020-06-10 | Phage t7 promoters for boosting in vitro transcription |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021058145A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11898186B1 (en) | 2022-08-10 | 2024-02-13 | Genscript Usa Inc. | Compositions and methods for preparing capped mRNA |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000005359A1 (en) * | 1998-07-24 | 2000-02-03 | Oklahoma Medical Research Foundation | Use of prohibitin rna in treatment of cancer |
WO2002029118A1 (en) * | 2000-10-05 | 2002-04-11 | Hong Kong Dna Chips Limited | A kit for detecting non-pathogenic or pathogenic influenza a subtype h5 virus |
US20020055095A1 (en) * | 2000-09-01 | 2002-05-09 | Yang Yeasing Y. | Amplification of HIV-1 sequences for detection of sequences associated with drug-resistance mutations |
WO2003102243A1 (en) * | 2002-05-31 | 2003-12-11 | Janssen Pharmaceutica N.V. | Methods for improving rna transcription reactions |
WO2007080126A2 (en) * | 2006-01-12 | 2007-07-19 | Devgen N.V. | Dsrna as insect control agent |
EP1908492A1 (en) * | 2005-06-22 | 2008-04-09 | Ajinomoto Co., Inc. | Metabotropic glutamate receptor activator |
WO2009017743A2 (en) * | 2007-07-30 | 2009-02-05 | Argos Therapeutics, Inc. | Improved primers and probes for the amplification and detection of hiv gag, rev and nef polynucleotides |
WO2010057069A1 (en) * | 2008-11-14 | 2010-05-20 | Gen-Probe Incorporated | Compositions, kits and methods for detection campylobacter nucleic acid |
WO2010099378A2 (en) * | 2009-02-26 | 2010-09-02 | Gen-Probe Incorporated | Assay for detection of human parvovirus nucleic acid |
WO2012058072A1 (en) * | 2010-10-27 | 2012-05-03 | Harrisvaccines, Inc | Method of rapidly producing vaccines for animals |
WO2013071047A1 (en) * | 2011-11-11 | 2013-05-16 | Children's Medical Center Corporation | Compositions and methods for in vitro transcription of rna |
WO2013116774A1 (en) * | 2012-02-01 | 2013-08-08 | Gen-Probe Incorporated | Asymmetric hairpin target capture oligomers |
WO2013163188A1 (en) * | 2012-04-24 | 2013-10-31 | Gen-Probe Incorporated | Compositions, methods and kits to detect herpes simplex virus nucleic acids |
EP2687603A1 (en) * | 2012-07-19 | 2014-01-22 | Ruhr-Universität Bochum | T7 promoter variants and methods of using the same |
WO2014167514A1 (en) * | 2013-04-09 | 2014-10-16 | Tuttle Chris | Formulations and methods for control of weedy species |
CN104531675A (en) * | 2014-12-10 | 2015-04-22 | 中国医学科学院医学实验动物研究所 | Method for in-vitro synthesis and activity detection on gRNA |
WO2017139412A1 (en) * | 2016-02-09 | 2017-08-17 | Brookhaven Science Associates, Llc | Improved cloning and expression vectors and systems |
WO2018138201A1 (en) * | 2017-01-25 | 2018-08-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Promoter construct for cell-free protein synthesis |
2020-06-10: WO application PCT/EP2020/066070 (WO2021058145A1), status active, Application Filing
Non-Patent Citations (24)
Title |
---|
A. J. LAMBERT ET AL: "Nucleic Acid Amplification Assays for Detection of La Crosse Virus RNA", JOURNAL OF CLINICAL MICROBIOLOGY, vol. 43, no. 4, 1 April 2005 (2005-04-01), US, pages 1885 - 1889, XP055671337, ISSN: 0095-1137, DOI: 10.1128/JCM.43.4.1885-1889.2005 * |
CHEN, C. ET AL., SCIENCE, vol. 356, 2017, pages 189 - 194 |
DUNN J J ET AL: "Complete nucleotide sequence of bacteriophage T7 DNA and the locations of T7 genetic elements", JOURNAL OF MOLECULAR BIOLOGY, ACADEMIC PRESS, UNITED KINGDOM, vol. 166, no. 4, 5 June 1983 (1983-06-05), pages 477 - 535, XP026026688, ISSN: 0022-2836, [retrieved on 19830605], DOI: 10.1016/S0022-2836(83)80282-4 * |
GUO-QING TANG ET AL: "Extended upstream A-T sequence increases T7 promoter strength", JOURNAL OF BIOLOGICAL CHEMISTRY, AMERICAN SOCIETY FOR BIOCHEMISTRY AND MOLECULAR BIOLOGY, US, vol. 280, no. 49, 9 December 2005 (2005-12-09), pages 40707 - 40713, XP002685708, ISSN: 0021-9258, [retrieved on 20051007], DOI: 10.1074/JBC.M508013200 * |
HARADA, NAT. CELL BIOL., vol. 21, 2019 |
HASHIMSHONY ET AL., CELL REP., vol. 2, no. 3, 27 September 2012 (2012-09-27), pages 666 - 73 |
HASHIMSHONY ET AL., GENOME BIOL., vol. 17, 2016, pages 77 |
HASHIMSHONY ET AL., GENOME BIOL., vol. 17, 28 April 2016 (2016-04-28), pages 77 |
IKEDA, BIOL. CHEM., vol. 267, 1992, pages 11322 - 8 |
KAYA-OKUR, NAT COMMUN., vol. 10, no. 1, 2019, pages 1930 |
KLEIN, A. M. ET AL., CELL, vol. 161, 2015, pages 1187 - 1201 |
LAU L-T ET AL: "Nucleic acid sequence-based amplification methods to detect avian influenza virus", BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, ELSEVIER, AMSTERDAM, NL, vol. 313, no. 2, 9 January 2004 (2004-01-09), pages 336 - 342, XP004601915, ISSN: 0006-291X, DOI: 10.1016/J.BBRC.2003.11.131 * |
LAWSON ET AL., NAT. CELL BIOL., vol. 20, 2018, pages 1349 - 1360 |
MELTON ET AL., NUCLEIC ACIDS RES., vol. 12, 1984, pages 7035 - 7056 |
PATWARDHAN ET AL., NAT BIOTECHNOL., vol. 27, no. 12, December 2009 (2009-12-01), pages 1173 - 5 |
PING-HUA TENG: "Rapid and sensitive detection of Taura syndrome virus using nucleic acid-based amplification", DISEASES OF AQUATIC ORGANISMS, 1 January 2006 (2006-01-01), pages 13 - 22, XP055722157, Retrieved from the Internet <URL:https://www.int-res.com/articles/dao2006/73/d073p013.pdf> [retrieved on 20200812] * |
PROST, NATURE, vol. 525, 2015 |
ROOIJERS, NAT. BIOTECHNOL., vol. 37, 2019 |
S. PAUL ET AL: "Selection of a T7 promoter mutant with enhanced in vitro activity by a novel multi-copy bead display approach for in vitro evolution", NUCLEIC ACIDS RESEARCH, vol. 41, no. 1, 15 October 2012 (2012-10-15), pages e29 - e29, XP055159046, ISSN: 0305-1048, DOI: 10.1093/nar/gks940 * |
SAHIN ET AL., NAT REV DRUG DISCOV., vol. 13, no. 10, October 2014 (2014-10-01), pages 759 - 80 |
SASKIA A. RUTJES ET AL: "ABSTRACT", APPLIED AND ENVIRONMENTAL MICROBIOLOGY, vol. 72, no. 8, 1 August 2006 (2006-08-01), US, pages 5349 - 5358, XP055722158, ISSN: 0099-2240, DOI: 10.1128/AEM.00751-06 * |
SAUER ET AL., NAT. REV. GENET., vol. 6, 2005, pages 465 - 476 |
TANG ET AL., BIOL. CHEM., vol. 280, 2005, pages 40707 - 13 |
YIN, Y. ET AL., MOL CELL, vol. S1097-2765, no. 19, 28 August 2019 (2019-08-28), pages 30618 - 5 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20731098; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20731098; Country of ref document: EP; Kind code of ref document: A1 |