WO2014048185A1 - Method for enriching a transcript from an RNA sample and use thereof - Google Patents
Method for enriching a transcript from an RNA sample and use thereof
- Publication number
- WO2014048185A1 (PCT/CN2013/081581)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequencing
- rna
- sequence
- transcript
- optionally
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
Definitions
- The present invention relates to the field of biotechnology and, in particular, to a method for enriching transcripts from RNA samples and uses thereof. More specifically, the present invention relates to a method for enriching transcripts from an RNA sample, a method for constructing a sequencing library, a sequencing library, a nucleic acid sample sequencing method, a method for determining a transcription start site (TSS), an enrichment reagent for enriching transcripts from an RNA sample, a device for constructing a sequencing library, a nucleic acid sample sequencing device, and a system for determining TSS.
- TSS: transcription start site
- The transcription of a gene begins with the binding of RNA polymerase to the promoter region of the DNA template and then proceeds from the transcription start site (TSS herein) to form a complete RNA.
- The RNA molecules present in an organism start at a TSS, so studying TSS by high-throughput sequencing helps to predict the location and structure of promoters across the whole genome and thus to understand gene transcriptional regulation networks globally. TSS research also helps to correct existing gene annotations and to discover new genes.
- The present invention is directed to solving at least one of the above technical problems or at least providing a useful commercial choice. To this end, an object of the present invention is to provide a means for efficiently enriching transcripts, which in turn allows TSS to be determined effectively.
- Methods for studying TSS by high-throughput sequencing are usually directed to RNA with a cap structure, and CAGE or RACE is used to capture the 5' end of the RNA molecule.
- Common methods are deepCAGE, PEAT, deep-RACE, nanoCAGE and CAGEscan.
- deepCAGE, PEAT, deep-RACE and CAGEscan require cumbersome operations such as enzymatic cleavage, place high demands on the RNA, yield short sequencing reads (about 20 nt), and are applicable only to RNA with a cap structure.
- Although the nanoCAGE procedure is relatively simple and requires less RNA, it is likewise applicable only to RNA with a cap structure, and the data it generates contain more false positives.
- The inventors have found that a 5' exonuclease can specifically degrade 5'-monophosphate RNA while leaving RNA molecules with a 5' cap or 5' triphosphate intact, and can therefore be used effectively to enrich transcripts. The resulting high-throughput sequencing of TSS is applicable to eukaryotic and prokaryotic RNA alike and has the advantages of simple operation, high accuracy and low cost.
- The invention proposes a method of enriching transcripts from an RNA sample.
- The method for enriching a transcript from an RNA sample comprises: treating an RNA sample with an enrichment reagent to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity and the transcript is an RNA molecule having a cap structure or a 5' triphosphate at its 5' end.
- An enrichment reagent with this enzymatic activity can effectively enrich transcripts and can therefore be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
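The selection rule behind this enrichment can be illustrated with a minimal sketch. It only models the outcome stated in the text (5'-monophosphate RNA is degraded; capped and 5'-triphosphate RNA are retained); the class and function names (`FivePrimeEnd`, `RnaMolecule`, `enrich_transcripts`) and the sample molecules are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class FivePrimeEnd(Enum):
    """Chemical state of the 5' end of an RNA molecule."""
    CAP = "cap"
    TRIPHOSPHATE = "triphosphate"
    MONOPHOSPHATE = "monophosphate"


@dataclass(frozen=True)
class RnaMolecule:
    name: str                      # illustrative label only
    five_prime_end: FivePrimeEnd


def enrich_transcripts(sample):
    """Return the molecules expected to survive the exonuclease treatment.

    Assumption: the reagent degrades every 5'-monophosphate RNA and leaves
    capped and 5'-triphosphate RNA intact, as described in the text.
    """
    protected = {FivePrimeEnd.CAP, FivePrimeEnd.TRIPHOSPHATE}
    return [rna for rna in sample if rna.five_prime_end in protected]


sample = [
    RnaMolecule("eukaryotic_mRNA", FivePrimeEnd.CAP),
    RnaMolecule("prokaryotic_primary_transcript", FivePrimeEnd.TRIPHOSPHATE),
    RnaMolecule("degraded_fragment", FivePrimeEnd.MONOPHOSPHATE),
]
enriched = enrich_transcripts(sample)
print([rna.name for rna in enriched])
```

Because both protected end types pass the same filter, one treatment covers capped eukaryotic transcripts and triphosphate-bearing prokaryotic transcripts, which is the point the text makes about applicability to both.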
- The invention proposes a method of constructing a sequencing library.
- The method of constructing a sequencing library comprises: enriching transcripts from an RNA sample according to the method described above; removing the 5' cap structure or 5' triphosphate of the transcript to obtain a transcript from which the 5' cap structure or 5' triphosphate has been removed; ligating an RNA linker to the 5' end of the transcript from which the 5' cap structure or 5' triphosphate has been removed, to obtain a transcript to which the RNA linker is ligated; reverse transcribing that transcript to obtain a cDNA corresponding to the transcript; amplifying the cDNA to obtain an amplification product; and constructing a sequencing library based on the amplification product.
- With this method, a sequencing library can be constructed efficiently for the transcripts enriched from a nucleic acid sample, so the method can be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- The invention proposes a sequencing library, which is constructed by the method described above.
- The sequencing library enables efficient sequencing of RNA transcripts and can be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- The invention proposes a nucleic acid sample sequencing method.
- The nucleic acid sample sequencing method comprises: constructing a sequencing library according to the method described above; and sequencing the sequencing library to obtain a sequencing result.
- The invention proposes a method for determining TSS.
- The method for determining a TSS comprises: extracting an RNA sample from a host; obtaining a sequencing result composed of a plurality of sequencing sequences using the method described above; and determining a TSS based on the sequencing result.
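As a rough illustration of how a TSS might be derived from a sequencing result, the sketch below counts reads by the genomic position of their 5' end and keeps positions supported by enough reads. This is an assumption-laden stand-in: the simple read-count threshold, the `(chromosome, strand, position)` tuple layout, the function name `candidate_tss` and the example data are all hypothetical, and the screening actually used is not specified in this section.

```python
from collections import Counter


def candidate_tss(read_starts, min_reads=2):
    """Return positions supported by at least `min_reads` read 5' ends.

    `read_starts` is an iterable of (chromosome, strand, position) tuples,
    one per sequencing read, giving where the read's 5' end maps.
    The threshold is an illustrative placeholder, not the patent's filter.
    """
    counts = Counter(read_starts)
    return sorted(site for site, n in counts.items() if n >= min_reads)


# Hypothetical mapped read 5' ends.
read_starts = [
    ("chr1", "+", 1000),
    ("chr1", "+", 1000),
    ("chr1", "+", 1000),
    ("chr1", "+", 1042),   # single read: dropped by the threshold
    ("chr1", "-", 5000),
    ("chr1", "-", 5000),
]
print(candidate_tss(read_starts))
```

The rationale for looking at read 5' ends is that the RNA linker is ligated to the 5' end exposed by cap or triphosphate removal, so the first transcript-derived base of a read would be expected to mark the start of the transcript.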
- The invention proposes an enrichment reagent for enriching transcripts from an RNA sample.
- The enrichment reagent has 5' exonuclease activity.
- The enrichment reagent can efficiently enrich transcripts and can therefore be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- The invention proposes an apparatus for constructing a sequencing library.
- The apparatus can efficiently construct a sequencing library for the transcripts enriched from a nucleic acid sample and can therefore be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- The present invention provides a nucleic acid sample sequencing apparatus, comprising: a library construction device, which is the apparatus described above and constructs a sequencing library for a nucleic acid sample; and a sequencing device, coupled to the library construction device and adapted to sequence the sequencing library to obtain sequencing results.
- The apparatus can efficiently sequence RNA transcripts and can be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- The invention proposes a system for determining TSS.
- The system comprises: a sample extraction device for extracting an RNA sample from a host; a nucleic acid sample sequencing device, coupled to the sample extraction device, which is the nucleic acid sample sequencing device described above and sequences the RNA sample to obtain a sequencing result composed of a plurality of sequencing sequences; and a TSS determining device, connected to the sequencing device and adapted to determine the TSS based on the sequencing result.
- The system can effectively determine TSS in a nucleic acid sample.
- Figure 1 shows a flow diagram of a method of constructing a sequencing library in accordance with one embodiment of the present invention;
- Figure 2 shows a flow chart of the informatics analysis for determining TSS in accordance with one embodiment of the present invention;
- Figure 3 shows a schematic diagram of a system for determining TSS in accordance with one embodiment of the present invention;
- Figure 4 shows a schematic diagram of a nucleic acid sample sequencing device in accordance with one embodiment of the present invention;
- Figure 5 shows a schematic diagram of an apparatus for constructing a sequencing library in accordance with one embodiment of the present invention;
- Figure 6 shows a schematic diagram of an apparatus for determining TSS according to an embodiment of the present invention;
- Figure 7 shows the distribution of the screened TSS on the genome according to an embodiment of the present invention. The upper and lower panels are TSS maps of human RNA and E. coli RNA samples, respectively, where 0 is the start site of the gene coding region and the region upstream of it contains the transcription start sites. As can be seen from the figure, most of the sequences fall upstream of the coding region of the gene;
- Figure 8 shows TSS maps of eight human RNA samples in accordance with one embodiment of the present invention, from which the distribution of TSS in different samples can be seen;
- Figure 9 shows the base distribution upstream of the TSS in accordance with one embodiment of the present invention, where abscissa 1 corresponds to the position of the TSS, which is dominated by purines (A/G). The upper panel is the base distribution upstream of the TSS for human RNA samples, with an obvious GC-enriched region, which is also the main promoter type of eukaryotes; the lower panel is the base distribution upstream of the TSS for E. coli RNA samples, in which a typical TATA box can also be found in the upstream -10 region;
- Figure 10 shows the length distribution of the 5' UTR, i.e. the distance from the TSS to the coding region, in accordance with one embodiment of the present invention. The upper panel shows the 5' UTR length distribution for human RNA samples, and the lower panel shows the 5' UTR length distribution for E. coli RNA samples;
- Figure 11 shows a correlation analysis that provides an assessment of the reliability of the experimental results and of operational stability. The upper panel shows two replicates of a human RNA sample, and the lower panel shows two replicates of an E. coli RNA sample; and
- Figure 12 is a graphical representation of the results of predicting genes in accordance with an embodiment of the present invention. The upper panel shows the TSS distribution of two human genes, NM_018997 and NM_031901, both of which undergo alternative splicing. Red vertical lines indicate the TSS retained after screening, and black vertical lines indicate the sequences obtained before filtering; blue horizontal lines represent the exons of the gene and yellow horizontal lines its introns. The lower panel shows the TSS distribution of an operon of Escherichia coli; introns do not exist in prokaryotes, so only blue horizontal lines representing genes are shown, and the four genes of this operon share one TSS.
- Unless otherwise explicitly stated and defined, the terms “installation”, “connected”, “fixed” and the like should be understood broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal communication between two components.
- “Upstream” and “downstream” as used herein are determined in the 5'-to-3' direction.
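The 5'-to-3' convention means that "upstream" depends on the strand when positions are expressed as genomic coordinates. The following sketch shows one way to encode that, together with the 5' UTR length described for Figure 10 as the distance from the TSS to the coding region. It assumes coordinates that increase along the plus strand; the function names `is_upstream` and `five_prime_utr_length` and the example positions are hypothetical.

```python
def is_upstream(pos, ref, strand):
    """True if `pos` lies upstream (on the 5' side) of `ref` on `strand`.

    Assumes genomic coordinates that increase along the '+' strand, so the
    5'-to-3' direction runs toward larger numbers on '+' and smaller on '-'.
    """
    if strand == "+":
        return pos < ref
    if strand == "-":
        return pos > ref
    raise ValueError(f"unknown strand: {strand!r}")


def five_prime_utr_length(tss, cds_start, strand):
    """Distance from the TSS to the start of the coding region."""
    if tss != cds_start and not is_upstream(tss, cds_start, strand):
        raise ValueError("TSS lies downstream of the coding-region start")
    return abs(cds_start - tss)


print(is_upstream(950, 1000, "+"))              # True
print(is_upstream(950, 1000, "-"))              # False
print(five_prime_utr_length(950, 1000, "+"))    # 50
print(five_prime_utr_length(5050, 5000, "-"))   # 50
```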
- Methods for studying TSS by high-throughput sequencing are usually directed to RNA with a cap structure, and the 5' end of the RNA molecule is captured by CAGE or RACE.
- Common methods are deepCAGE, PEAT, deep-RACE, nanoCAGE and CAGEscan.
- deepCAGE, PEAT, deep-RACE and CAGEscan require cumbersome operations such as enzymatic cleavage, place high demands on the RNA, yield short sequencing reads (about 20 nt), and are applicable only to RNA with a cap structure.
- These methods therefore cannot be used to study the TSS of prokaryotic RNA, which has no cap structure.
- Although the nanoCAGE procedure is relatively simple and requires less RNA, it is likewise applicable only to RNA with a cap structure, and the data it generates contain more false positives.
- The inventors found that a 5' exonuclease can specifically degrade 5'-monophosphate RNA while leaving RNA molecules with a 5' cap or 5' triphosphate intact. It can therefore be used effectively to enrich transcripts and can be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- The invention proposes a method of enriching transcripts from an RNA sample.
- The method for enriching a transcript from an RNA sample comprises: treating an RNA sample with an enrichment reagent to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity and the transcript is an RNA molecule having a cap structure or a triphosphate at its 5' end.
- Examples of enzymes having 5' exonuclease activity include: exonuclease XRN-1, Terminator™ 5'-Phosphate-Dependent Exonuclease, and TAKARA™ alkaline phosphatase.
- An enrichment reagent with this enzymatic activity can effectively enrich transcripts and can therefore be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- The method of enriching transcripts from RNA samples can employ any enrichment reagent having 5' exonuclease activity.
- The enrichment reagent contains DNase I. Thereby, the specificity and efficiency of the degradation of 5'-monophosphate RNA can be further improved, which further improves the efficiency of the method of enriching transcripts.
- The enrichment reagent may further contain a buffer and a soluble salt to further increase the enzymatic activity of the DNase.
- The pH of the enrichment reagent is 8.0.
- The buffer is Tris-HCl, and the soluble salt is at least one selected from the group consisting of sodium chloride and magnesium chloride.
- The RNA sample is treated with the enrichment reagent at 30 degrees Celsius. Thereby, the efficiency of enriching transcripts using the enrichment reagent according to the embodiment of the present invention can be further improved.
- The invention proposes a method of constructing a sequencing library.
- The method for constructing a sequencing library includes:
- S100 transcript enrichment: Transcripts are enriched from an RNA sample according to the method described above. This step has been described in detail above and is not repeated here.
- S200 end trimming: The 5' cap structure or 5' triphosphate of the transcript is removed to obtain a transcript from which the 5' cap structure or 5' triphosphate has been removed.
- The 5' cap structure or 5' triphosphate of the transcript is removed using an end-trimming reagent, wherein the end-trimming reagent has tobacco acid pyrophosphatase activity.
- The end-trimming reagent comprises: tobacco acid pyrophosphatase, a soluble salt, EDTA, β-mercaptoethanol, and Triton X-100.
- The soluble salt is sodium acetate.
- The pH of the end-trimming reagent is 7.5. Thereby, the effect of end trimming the RNA can be further improved, that is, the 5' cap structure or 5' triphosphate of the transcript is removed efficiently, which improves the efficiency of constructing the sequencing library.
- S300 linker ligation: An RNA linker is ligated to the 5' end of the transcript from which the 5' cap structure or 5' triphosphate has been removed, to obtain a transcript to which the RNA linker is ligated.
- The RNA linker is ligated to the 5' end of the transcript from which the 5' cap structure or 5' triphosphate has been removed using a ligation reagent, wherein the ligation reagent has T4 RNA ligase activity.
- The ligation reagent comprises: T4 RNA ligase, a buffer, a soluble salt, and dithiothreitol.
- The pH of the ligation reagent is 7.5.
- The buffer is Tris-HCl.
- The soluble salt is magnesium chloride.
- The RNA linker is ligated to the 5' end of the transcript from which the 5' cap structure or 5' triphosphate has been removed at 30 degrees Celsius using the ligation reagent. Thereby, the efficiency of ligation can be improved, which improves the efficiency of constructing the sequencing library.
- S400 reverse transcription: The transcript to which the RNA linker is ligated is reverse transcribed to obtain a cDNA corresponding to the transcript.
- The reverse transcription primer used for reverse transcription has a sequence corresponding to the RNA linker at its end, so that the resulting cDNA also has a linker at its end, which facilitates subsequent library construction and sequencing.
- "Corresponding to an RNA linker" means that a sequence contained in the reverse transcription primer is capable of matching the RNA linker and of supporting an amplification reaction, thereby yielding a cDNA having a linker at both ends.
- One of the two reverse transcription primers for reverse transcription contains the same sequence as one of the RNA linkers, and the other reverse transcription primer contains a sequence complementary to the other RNA linker.
- The reverse transcription uses an oligonucleotide having the sequence of SEQ ID NO: 1 as a reverse transcription primer.
- At least one N of the reverse transcription primer (SEQ ID NO: 1) is thio-modified, thereby preventing degradation of the primer by nucleases.
- The penultimate N of the reverse transcription primer (SEQ ID NO: 1) is thio-modified.
- S500 amplification: The cDNA is amplified to obtain an amplification product.
- A person skilled in the art can perform the amplification by any known method, for example by a conventional PCR method; it is only necessary to design corresponding primers according to the sequence of the linker.
- S600 sequencing library construction: A sequencing library is constructed based on the amplification product. Those skilled in the art can process the amplification product according to the sequencing method they wish to use and can refer to the operating instructions provided by the manufacturer; details are not described here. It should be noted that the amplification products obtained by the method of the present invention can be applied to high-throughput sequencing with Illumina Hiseq2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology and Nanopore sequencing technology.
- With this method, a sequencing library can be constructed efficiently for the transcripts enriched from a nucleic acid sample, so the method can be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- a step of purifying the product may optionally be included. According to an embodiment of the present invention, the RNA may be purified by phenol/chloroform/isoamyl alcohol (volume ratio 25:24:1) extraction followed by ethanol precipitation, which removes the enzyme from the reaction mixture so that it does not affect the next reaction step. Ethanol precipitation also retains some small-molecule transcripts such as microRNAs, so that TSS information can be obtained from this portion of non-coding RNA, helping to understand the state of transcriptional regulation.
- the invention proposes a sequencing library which is constructed by the method described above.
- the sequencing library enables efficient sequencing of RNA transcripts and can be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- the invention proposes a nucleic acid sample sequencing method.
- the nucleic acid sample sequencing method comprises: constructing a sequencing library according to the method described above; and sequencing the sequencing library to obtain a sequencing result.
- the sequencing is performed using at least one of Illumina Hiseq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology.
- the sequencing is performed using Illumina Hiseq 2000.
- the invention proposes a method of determining TSS.
- the method of determining a TSS comprises: extracting an RNA sample from a host; obtaining a sequencing result composed of a plurality of sequencing sequences using the method described above; and determining a TSS based on the sequencing result. With this method, the TSS in the nucleic acid sample can be effectively determined.
- the RNA sample is at least a portion of total RNA of the host.
- the host may be a eukaryote, such as a human, or a prokaryote, such as E. coli.
- determining the TSS based on the sequencing result further includes: comparing the sequencing data with a reference sequence.
- the reference sequence comprises at least a portion of the 5'-UTR sequence of a predetermined gene; a sequencing sequence that aligns to the reference sequence and lies toward its upstream (5') end is selected as a positive sequence, and the first base of the positive sequence is determined to be the transcription start site.
- the term "predetermined gene" refers to a set of genes specified in advance on a reference genome; these may be known genes or genes predicted by bioinformatics.
- the length of the reference sequence is not particularly limited, and according to an embodiment of the present invention, the reference sequence contains at least a translation initiation site of a predetermined gene and a sequence of a predetermined length upstream thereof.
- by choosing the length of the reference sequence appropriately, the transcription start site can be made to fall within it.
- the reference sequence comprises a nucleic acid sequence between a translation initiation site of the predetermined gene and a 700 bp site upstream of the translation initiation site.
- the reference sequence comprises a nucleic acid sequence between a translation initiation site of the predetermined gene and a 5000 bp site upstream of the translation initiation site.
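The reference window described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reference_window`, the 0-based forward-strand coordinates and the toy genome are hypothetical and are not taken from the patent; the 700 bp and 5000 bp window sizes are the values given in the text.

```python
def reference_window(genome, tis_pos, upstream=700):
    """Return (window_start, window_sequence) for one gene.

    genome   : forward-strand sequence as a string (hypothetical input)
    tis_pos  : 0-based position of the translation initiation site
    upstream : number of bases to include upstream of the site
               (700 or 5000 in the embodiments described in the text)
    The window is clipped at the start of the sequence.
    """
    if not 0 <= tis_pos < len(genome):
        raise ValueError("tis_pos lies outside the genome sequence")
    window_start = max(0, tis_pos - upstream)
    # The window runs from the upstream boundary to the initiation site itself.
    return window_start, genome[window_start:tis_pos + 1]


# Toy genome: 1000 filler bases followed by a start codon.
toy_genome = "A" * 1000 + "ATG" + "C" * 10
start, window = reference_window(toy_genome, tis_pos=1000, upstream=700)
print(start, len(window))
```

A gene on the reverse strand would need the mirrored calculation on the reverse complement, which this sketch leaves out.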
- the alignment can be performed using SOAP Alignment.
- the clean sequence fragments obtained by high-throughput sequencing are aligned, using the short-sequence mapping program soapalignment v2.2, to the reference genome and to the reference gene sequences separately, and base mismatches are not allowed.
- Reference genomic sequences and reference gene sequences are available in public databases.
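As a rough illustration of mismatch-free alignment and of taking the first base of an aligned read as a candidate TSS, the sketch below uses plain exact string matching as a stand-in for soapalignment v2.2. It is not the SOAP algorithm or its interface. The function name `call_candidate_tss`, the `ref_offset` parameter and the rule of discarding reads that match more than once are assumptions made for the example.

```python
from collections import Counter


def call_candidate_tss(reads, reference, ref_offset=0):
    """Count reads per candidate TSS position.

    A read counts only if it matches the reference exactly (no mismatches)
    and at exactly one place (assumption: multi-mapping reads are dropped).
    The position of the first base of the read is the candidate TSS.
    ref_offset converts window coordinates into genome coordinates.
    """
    counts = Counter()
    for read in reads:
        pos = reference.find(read)
        if pos == -1:
            continue  # no exact match
        if reference.find(read, pos + 1) != -1:
            continue  # more than one exact match
        counts[ref_offset + pos] += 1
    return counts


toy_reference = "GGGACGTTTCAGATGAAA"
toy_reads = ["ACGTTTCA", "ACGTTTCA", "CGTTTCAG", "TTTTTTTT"]
candidates = call_candidate_tss(toy_reads, toy_reference, ref_offset=100)
print(dict(candidates))
```

A real run would use the aligner's own output; the point here is only that the read start, not the read body, defines the candidate site.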
- the method further comprises screening the positive sequence, wherein the screening criterion is that the number of the positive sequences is at least N times the average number of sequencing sequences inside the predetermined gene.
- N is a real number greater than 1, preferably, the N is a real number of at least 10.
- the results may be first screened to obtain reliable TSS information.
- the screening method is as follows: the first position at which a clean sequence aligns to the gene (the sequence corresponding to the predetermined gene) is taken as a raw TSS. Some of these sequences, however, align inside the gene and give false-positive TSSs, so further filtering is needed.
- the method enriches the obtained sequences at the 5' end of the gene, so the number of sequences at a real TSS is higher than the average number of sequences falling inside the gene. A fold threshold N between the two is therefore introduced to filter the TSSs: a candidate TSS is accepted as a true TSS if its number of sequences is at least N times the average number of sequences inside the corresponding gene.
- N may be a real number of at least 10.
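The fold-threshold screen can be written as a short filter. The sketch below is a minimal version under assumptions: the name `screen_tss` is hypothetical, and the "average number of sequences inside the gene" is taken to be the mean of per-position read-start counts inside the gene, which the text does not spell out.

```python
def screen_tss(candidate_counts, internal_counts, n_fold=10.0):
    """Keep candidate TSSs supported by at least n_fold times the mean
    number of reads starting inside the gene.

    candidate_counts : dict mapping candidate TSS position -> read count
    internal_counts  : list of read counts at positions inside the gene
    n_fold           : the multiple N (greater than 1, preferably >= 10)
    """
    if n_fold <= 1:
        raise ValueError("N must be greater than 1")
    if not internal_counts:
        raise ValueError("need at least one internal count")
    mean_internal = sum(internal_counts) / len(internal_counts)
    threshold = n_fold * mean_internal
    return {pos: n for pos, n in candidate_counts.items() if n >= threshold}


# Mean internal count is 1.0, so the threshold is 10 reads.
kept = screen_tss({103: 50, 104: 3}, internal_counts=[1, 2, 0, 1], n_fold=10)
print(kept)
```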
- the method further comprises performing a chi-square test on the screening result.
- the test value of the chi-square test is 3.84 or more, corresponding to a confidence greater than 95%.
- the method uses a chi-square test to verify the reliability of the screening result. Specifically, based on the previous embodiment, the mean and standard deviation of the fold values corresponding to all TSSs are calculated and, after normalization, the chi-square value is calculated using the following formula:
- according to the chi-square table, the chi-square value at a confidence level of 0.95 is 3.84, so TSSs with a reliability greater than 95% can be obtained.
- the chi-square value calculated according to the formula must be greater than 3.84.
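The 3.84 cut-off is the tabulated critical value of the chi-square distribution with one degree of freedom at the 0.95 level. The sketch below only checks that relationship and applies the cut-off; it does not reproduce the patent's formula for the chi-square value, which is not shown in this text. The function names are hypothetical.

```python
import math


def chi2_cdf_1df(x):
    """CDF of the chi-square distribution with one degree of freedom.

    For one degree of freedom the statistic is the square of a standard
    normal variable, so P(X <= x) = erf(sqrt(x / 2)).
    """
    if x < 0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0))


def passes_chi2(chi2_value, critical=3.84):
    """True if the chi-square value reaches the 95% critical value."""
    return chi2_value >= critical


print(round(chi2_cdf_1df(3.84), 4))
```

The comparison uses "3.84 or more", following the first statement of the criterion above.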
- the step of removing the unqualified sequence from the sequencing sequence to obtain a clean sequencing sequence may also be included.
- the unqualified sequences include:
- sequences in which the number of bases whose sequencing quality is below a certain threshold exceeds 50% of the number of bases in the entire sequence;
- the low-quality threshold is determined by the specific sequencing technology and sequencing environment;
- sequences in which the number of bases with undetermined sequencing results exceeds 10% of the total number of bases in the sequence;
- sequences that, apart from the sample linker sequence, align with other exogenous sequences introduced by the experiment, such as various linker sequences; a sequence containing such an exogenous sequence is considered unqualified.
- sequence data obtained by removing the unqualified sequence of the original sequence data is called clean reads and can be used as the basis for subsequent analysis, thereby improving the effectiveness of subsequent analysis.
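The three rejection rules for obtaining clean reads can be sketched as one predicate. The 50% and 10% limits come from the text. The function name `is_clean_read`, the default quality threshold of 20 and the substring test for exogenous sequences are assumptions; the text says the quality threshold depends on the sequencing technology.

```python
def is_clean_read(seq, quals, low_q=20, exogenous=()):
    """Return True if a read passes all three filters.

    seq       : read sequence
    quals     : per-base quality scores, same length as seq
    low_q     : quality threshold (assumed value; technology dependent)
    exogenous : linker or other foreign sequences to screen against
    """
    if not seq or len(seq) != len(quals):
        return False
    n = len(seq)
    # Rule 1: more than 50% of bases below the quality threshold.
    if sum(1 for q in quals if q < low_q) > 0.5 * n:
        return False
    # Rule 2: more than 10% of bases undetermined (N).
    if seq.upper().count("N") > 0.1 * n:
        return False
    # Rule 3: read contains an exogenous sequence (simple substring test).
    if any(x and x in seq for x in exogenous):
        return False
    return True


print(is_clean_read("ACGTACGTAC", [30] * 10))
```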
- the screened TSSs can be divided into two categories: a TSS that aligns to the genome and has a corresponding gene annotation is called an annotated TSS; a TSS that aligns to the genome but has no annotated gene information nearby is called an unannotated TSS and can be used for the prediction of new genes.
- TSS annotation: the TSSs that fall on known genes are annotated here, including the expression level of the TSS, the location of the TSS, and the corresponding gene annotation information.
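The split into annotated and unannotated TSSs can be illustrated as follows. The sketch makes several assumptions not stated in the text: genes are on the forward strand, "having a corresponding annotation" means lying within `max_upstream` bases upstream of a gene's coding start, and the names `Gene`, `annotate_tss` and `classify_tss` are hypothetical.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    cds_start: int  # 0-based coding-region start on the forward strand


def annotate_tss(tss_pos, genes, max_upstream=700) -> Optional[str]:
    """Return the name of the nearest gene whose coding start lies at most
    max_upstream bases downstream of the TSS, or None if there is none."""
    best = None
    for gene in genes:
        distance = gene.cds_start - tss_pos
        if 0 <= distance <= max_upstream:
            if best is None or distance < best.cds_start - tss_pos:
                best = gene
    return best.name if best else None


def classify_tss(tss_positions, genes, max_upstream=700):
    """Split TSS positions into annotated (with gene name) and unannotated."""
    result = {"annotated": {}, "unannotated": []}
    for pos in tss_positions:
        name = annotate_tss(pos, genes, max_upstream)
        if name is None:
            result["unannotated"].append(pos)
        else:
            result["annotated"][pos] = name
    return result


toy_genes = [Gene("geneA", 1000), Gene("geneB", 5000)]
print(classify_tss([900, 3000, 4999], toy_genes))
```

The unannotated positions are the ones whose neighbouring sequence would be passed on to a gene predictor.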
- the TSSs found for the same species by the method can be displayed as a picture to form a TSS map, from which the location of each TSS and its expression can be seen directly. The differences in TSS expression and distribution between samples can also be seen.
- New gene prediction: for TSSs near which no reference gene is found, the sequences near these TSSs can be extracted for gene prediction. Prokaryotes are predicted using glimmer, and eukaryotes are predicted using genscan.
- the analysis results can be used to plot the TSS distribution of the gene or region of interest.
- the invention proposes an enrichment reagent for enriching transcripts from an RNA sample.
- the enrichment reagent has 5'-exonuclease activity.
- the enrichment reagent can effectively enrich transcripts, and thus can be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- the enrichment reagent contains DNase I. Thereby, the specificity and efficiency of the degradation of the RNA of the monophosphate can be further improved, thereby further improving the efficiency of the method of enriching the transcript.
- the enrichment reagent may further contain a buffer and a soluble salt to further increase the enzyme activity of DNase.
- the pH of the enrichment reagent is 8.0.
- the buffer is Tris-HCl, and the soluble salt is at least one selected from the group consisting of sodium chloride and magnesium chloride.
- the RNA sample is treated with the enrichment reagent at 30 degrees Celsius. Thereby, the efficiency of enriching the transcript using the enrichment reagent according to the embodiment of the present invention can be further improved.
- Examples of enzymes having 5'-exonuclease activity according to an embodiment of the present invention include: exonuclease XRN-1, Terminator™ 5'-phosphate-dependent exonuclease, or TAKARA™ alkaline phosphatase.
- the invention proposes an apparatus for constructing a sequencing library.
- the apparatus for constructing a sequencing library comprises: a transcript enrichment unit 211, an end trimming unit 212, an RNA adaptor joining unit 213, a reverse transcription unit 214, an amplification unit 215, and a library construction unit 216.
- the transcript enrichment unit 211 is provided with the enrichment reagent described above to enrich the transcript from the RNA sample; the end trimming unit 212 is connected to the transcript enrichment unit 211 and is adapted to remove the 5' cap structure or 5' triphosphate of the transcript, to obtain a transcript from which the 5' cap structure or 5' triphosphate has been removed; the RNA adaptor joining unit 213 is connected to the end trimming unit 212 and is adapted to ligate an RNA linker to the 5' end of that transcript, to obtain a transcript to which an RNA linker is ligated; the reverse transcription unit 214 is connected to the RNA adaptor joining unit 213 and is adapted to reverse transcribe the transcript to which an RNA linker is ligated, to obtain a cDNA corresponding to the transcript; the amplification unit 215 is connected to the reverse transcription unit 214 and is adapted to amplify the cDNA to obtain an amplification product; and the library construction unit 216 is connected to the amplification unit 215 and is adapted to construct a sequencing library from the amplification product.
- the device can efficiently construct a sequencing library for the transcripts enriched from the nucleic acid sample, and can therefore be applied to high-throughput sequencing of the TSS of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
- the end trimming unit 212 is provided with an end trimming reagent, wherein the end trimming reagent has tobacco acid pyrophosphatase activity.
- the end trimming reagent comprises: tobacco acid pyrophosphatase, a soluble salt, EDTA, β-mercaptoethanol, and Triton X-100.
- the soluble salt is sodium acetate.
- the pH of the end trimming reagent is 7.5.
- an oligonucleotide having the sequence of SEQ ID NO: 1 is provided in the reverse transcription unit 214 as a reverse transcription primer.
- at least one N of the reverse transcription primer is modified by thio.
- the penultimate N of the reverse transcription primer is modified by thio.
- the RNA adaptor ligation unit 213 is provided with a ligation reagent, wherein the ligation reagent has T4 RNA ligase activity.
- the ligation reagent comprises: T4 RNA ligase, a buffer, a soluble salt, dithiothreitol.
- the pH of the ligation reagent is 7.5.
- the buffer is Tris-HCl.
- the soluble salt is magnesium chloride.
- the invention provides a nucleic acid sample sequencing device.
- the apparatus includes: a library construction device 210, which is the device described above, to construct a sequencing library for a nucleic acid sample; and a sequencing device 220, the sequencing device 220 is coupled to the library constructing device 210 and is adapted to sequence the sequencing library to obtain sequencing results.
- the sequencing device is at least one of Illumina Hiseq2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS system, and nanopore sequencing system.
- the invention proposes a system for determining a TSS.
- the system includes: a sample extraction device 100 for extracting an RNA sample from a host; a nucleic acid sample sequencing device 200, which is connected to the sample extraction device 100 and is the nucleic acid sample sequencing device described above, to sequence the RNA sample and obtain a sequencing result composed of a plurality of sequencing sequences; and a TSS determining device 300, which is coupled to the sequencing device 200 and is adapted to determine a TSS based on the sequencing result.
- the system can effectively determine TSS in a nucleic acid sample.
- the TSS determining device further includes: a comparing device 310, configured to compare the sequencing data with a reference sequence; and a determining device 320, adapted to determine the TSS based on the result of the comparison, wherein the reference sequence comprises at least a portion of the 5'-UTR sequence of a predetermined gene, and the determining device 320 is adapted to select, as a positive sequence, the sequencing sequence that aligns to the sequence corresponding to the predetermined gene and lies closest to the 5' end of that sequence, and to determine the first base of the positive sequence as the transcription start site.
- the comparison device is adapted to perform the comparison using SOAP Alignment.
- the determining device further includes a selecting unit adapted to screen the positive sequence, wherein the screening criterion is that the number of sequences of the positive sequence is at least N times the average number of sequences inside the sequence corresponding to the predetermined gene, wherein N is a real number greater than 1 and, preferably, N is a real number of at least 10.
- the determining means further comprises a checking unit, the checking unit being adapted to perform a chi-square test on the screening result.
- the check value of the chi-square test is 3.84 or more, and the corresponding confidence is greater than 95%.
- the term "predetermined gene" as used in the present invention is to be understood broadly: it may refer to any known gene, and may also refer to a nucleic acid sequence predicted by a known method to encode a protein.
- the methods used in the examples mainly include TSS library construction and post-sequencing analysis, wherein the TSS library construction method mainly includes the following steps:
- RNA phenol/chloroform/isoamyl alcohol (25:24:1) is extracted and purified (2) to obtain RNA;
- step (1) Library concentration and fragment size were determined using an Agilent Bioanalyzer 2100 and Q-PCR.
- the amount of total RNA is 5 μg.
- the reagent I contains: 1 μL of 5'-exonuclease (… U/μL), 50 mM buffer salt, 2 mM-100 mM soluble salt, pH 8.0; the solvent is water.
- the buffer salt in the reagent I is Tris-HCl.
- the soluble salt in the reagent I is sodium chloride or magnesium chloride.
- the temperature of the RNA obtained in the step (2) mixed with the reagent I is 30 °C.
- the reagent II contains: 0.2 μL of tobacco acid pyrophosphatase (10 U/μL), 50 mM soluble salt, pH 6.0, 1 mM EDTA, 0.1% β-mercaptoethanol, 0.01% Triton X-100; the solvent is water.
- the soluble salt in Reagent II is sodium acetate. The temperature at which the sample and reagent II were mixed was 37 °C.
- the reagent III contains: T4 RNA ligase 1, 50 mM buffer salt, 10 mM soluble salt, 1 mM dithiothreitol, pH 7.5; the solvent is water.
- the buffer salt in Reagent III is Tris-HCl.
- the soluble salt in Reagent III is magnesium chloride.
- the mixing temperature of the obtained RNA and the reagent III in the step (6) is 20 °C.
- the specific reverse transcription primer sequence used in the step (7) is: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNN-3', where the penultimate N is thio-modified.
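Because the primer ends in two degenerate bases (NN), it is really a pool of 16 oligonucleotides. The sketch below enumerates that pool from the sequence given above. The function name `expand_degenerate` is hypothetical, and the thio modification of the penultimate N is a chemical modification that this string model does not represent.

```python
from itertools import product

# Primer sequence as given in the text, written 5' to 3'.
RT_PRIMER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNN"


def expand_degenerate(primer):
    """List every concrete sequence encoded by a primer in which N stands
    for any of A, C, G or T."""
    slots = ["ACGT" if base == "N" else base for base in primer]
    return ["".join(combo) for combo in product(*slots)]


variants = expand_degenerate(RT_PRIMER)
print(len(variants), variants[0][-2:], variants[-1][-2:])
```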
- the RNA is purified by phenol/chloroform/isoamyl alcohol extraction followed by ethanol precipitation, which removes the enzyme from the reaction mixture so that it does not affect the next reaction step. Ethanol precipitation also retains some small-molecule transcripts, such as microRNAs, so that TSS information can be obtained from this part of the non-coding RNA, helping to understand the state of transcriptional regulation.
- the bioinformatics analysis of the data generated by sequencing the TSS library includes the following steps:
- high-throughput sequencing technology can be Illumina Hiseq2000 sequencing technology, or other existing high-throughput sequencing technologies.
- the unqualified sequences include sequences in which the number of bases whose sequencing quality is below a certain threshold exceeds 50% of the number of bases in the entire sequence; the low-quality threshold is determined by the specific sequencing technology and the sequencing environment.
- sequences in which the number of bases with undetermined sequencing results (such as N in Illumina Hiseq2000 sequencing results) exceeds 10% of the number of bases in the entire sequence are also unqualified.
- sequences that, apart from the sample linker sequence, align with other exogenous sequences introduced by the experiment, such as various linker sequences, are also unqualified: a sequence containing such an exogenous sequence is considered unqualified.
- the sequence data obtained by removing the unqualified sequence of the original sequence data is called clean reads and serves as the basis for subsequent analysis.
- a clean sequence fragment obtained by high-throughput sequencing technology is separately aligned to a reference genome and a reference gene sequence by a short sequence mapping program soapalignment v2.2, which does not allow base mismatches.
- Reference genomic sequences and reference gene sequences are available in public databases.
- the screening method is as follows: the first position at which a clean sequence aligns to the genome is taken as a raw TSS. Some of these sequences, however, align inside the gene and give false-positive TSSs, so further filtering is required. The method enriches the obtained sequences at the 5' end of the gene, so the number of sequences at a real TSS is higher than the average number of sequences falling inside the gene. A fold threshold N between the two is therefore introduced to filter the TSSs: a candidate TSS is accepted as a true TSS if its number of sequences is at least N times the average number of sequences inside the corresponding gene.
- the method uses a chi-square test to verify the reliability of the filtering result, that is, the chi-square test value should be greater than 3.84, that is, the confidence is greater than 95%.
- the screened TSSs can be divided into two categories: a TSS that aligns to the genome and has a corresponding gene annotation is called an annotated TSS; a TSS that aligns to the genome but has no annotated gene information nearby is called an unannotated TSS and can be used for the prediction of new genes.
- TSS annotation: the TSSs of known genes are annotated here, including the expression of the TSS, the location of the TSS, and the corresponding gene annotation information.
- the TSSs found for the same species by the method can be displayed as a picture to form a TSS map, from which the location of each TSS and its expression can be seen directly. The differences in TSS expression and distribution between samples can also be seen.
- New gene prediction: for TSSs near which no reference gene is found, the sequences near these TSSs can be extracted for gene prediction. Prokaryotes are predicted using glimmer, and eukaryotes are predicted using genscan.
- the analysis results can be used to plot the TSS distribution of the gene or region of interest.
- Example 1: Sequence analysis of transcription start sites of a human RNA sample and an E. coli RNA sample. The human RNA sample (Sample 1) was purchased from Agilent, Inc.; the E. coli RNA (Sample 2) was obtained by culturing E. coli to logarithmic growth phase and extracting the RNA.
- 1-5 μg of total RNA was digested with DNase I and purified by ethanol precipitation. The purified RNA was mixed with reagent I to obtain intact RNA carrying a 5' cap or 5' triphosphate. After phenol/chloroform/isoamyl alcohol extraction and purification, the RNA was mixed with reagent II to remove the 5' cap or triphosphate, leaving a 5' monophosphate, and was again purified by phenol/chloroform/isoamyl alcohol extraction. The 5'-monophosphate RNA was then mixed and reacted with reagent III and the RNA linker, adding a linker at the 5' end of the RNA.
- the RNA carrying the 5' linker was reverse transcribed with the specific reverse transcription primer into cDNA with a fixed sequence at both ends, and purified with magnetic beads.
- the cDNA product was amplified by polymerase chain reaction (PCR), and the PCR product was purified with magnetic beads and sequenced. The sequencing was performed using Illumina Hiseq2000.
- FIG. 7 shows the distribution of the screened TSS on the genome.
- the upper and lower panels are the TSS profiles of human RNA and E. coli RNA samples, respectively. It is the starting site of the coding region of the gene, and the upstream is the site of transcription initiation. As can be seen from the figure, most of the sequences fall upstream of the coding region of the gene.
- the first is the classification of TSS.
- the screened TSSs are divided into two categories: a TSS that aligns to the genome and has a corresponding gene annotation is called an annotated TSS; a TSS that aligns to the genome but has no annotated gene information nearby is called an unannotated TSS and can be used for the prediction of new genes.
- the TSSs of known genes are mainly annotated, including the expression level of TSS, the location of TSS, and the corresponding gene annotation information.
- a TSS map is constructed: the inventors display, as a picture, the TSSs found for the same species by the method.
- the location of each TSS and their expression can be seen very intuitively from the map.
- the difference in TSS expression and distribution in different samples can also be seen.
- each is a TSS map of a sample of 8 people, from which the distribution of TSS in different samples can be seen.
- FIG. 9 is the base distribution map upstream of the TSS, where abscissa 1 corresponds to the location of the TSS, which is mainly a purine (A/G). The upper panel shows the base distribution map of human TSSs, with an obvious GC-enriched region, which is also the main promoter type of eukaryotes.
- the lower panel shows the base distribution map of E. coli; a typical TATA box can also be found at the -10 region upstream of the TSS.
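A base distribution of this kind can be computed by stacking the sequence windows that end at each TSS and counting bases per column. The sketch below is a minimal version; the function name `base_frequencies`, the forward-strand-only handling and the toy sequence are assumptions made for illustration.

```python
from collections import Counter


def base_frequencies(genome, tss_positions, upstream=50):
    """Per-column base frequencies for windows ending at each TSS.

    Column 0 is the position `upstream` bases before the TSS and the last
    column is the TSS itself. TSSs too close to the sequence start, or
    outside the sequence, are skipped.
    """
    columns = [Counter() for _ in range(upstream + 1)]
    for tss in tss_positions:
        start = tss - upstream
        if start < 0 or tss >= len(genome):
            continue
        for i, base in enumerate(genome[start:tss + 1]):
            columns[i][base] += 1
    freqs = []
    for col in columns:
        total = sum(col.values())
        freqs.append({b: (col[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


toy_genome = "CCTATAATCCCCCCA"
freqs = base_frequencies(toy_genome, [14], upstream=3)
print(freqs[-1])
```

With many TSSs, a column dominated by A and G at the last position, or an AT-rich stretch around -10, would show up directly in these frequencies.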
- Figure 10 shows the length distribution of the 5' UTR in human (top) and E. coli (bottom), i.e. the distance from the TSS to the coding region. The length of the 5' UTR affects how gene function is exerted, and the eukaryotic 5' UTR is longer than the prokaryotic one.
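The 5' UTR length used for such a distribution is simply the distance between the TSS and the start of the coding region. A minimal sketch, with a hypothetical function name and 0-based coordinates assumed:

```python
def utr5_length(tss_pos, cds_start, strand="+"):
    """Distance in bases from the TSS to the start of the coding region.

    On the forward strand the TSS lies at a smaller coordinate than the
    coding start; on the reverse strand it lies at a larger one.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    length = cds_start - tss_pos if strand == "+" else tss_pos - cds_start
    if length < 0:
        raise ValueError("TSS lies downstream of the coding start")
    return length


print(utr5_length(900, 1000), utr5_length(2100, 2000, strand="-"))
```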
- a correlation analysis of the results of two parallel experiments was also performed to evaluate the reliability of the experimental results and the stability of the operation; Fig. 11 shows the correlation between two parallel experiments on the same sample.
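One common way to quantify agreement between two parallel experiments is the Pearson correlation of per-TSS read counts. The text does not say which correlation measure was used, so the choice of Pearson here is an assumption, as are the function names and the treatment of a TSS missing from one replicate as a count of zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length number lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1, rep2):
    """Correlate read counts of two replicates given as {position: count}."""
    positions = sorted(set(rep1) | set(rep2))
    return pearson([rep1.get(p, 0) for p in positions],
                   [rep2.get(p, 0) for p in positions])


r = replicate_correlation({1: 10, 2: 20, 3: 30}, {1: 20, 2: 40, 3: 60})
print(round(r, 6))
```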
- the present invention uses the results of the analysis to plot the TSS distribution of a gene or region of interest, as shown in Figure 12, which gives the TSS distribution of the two human genes NM-018997 and NM-031901, both of which undergo alternative splicing.
- in the figure, red vertical lines indicate the screened TSSs, black vertical lines are the sequences obtained before filtering, blue horizontal lines represent the exons of the gene, and yellow horizontal lines the introns.
- the figure below shows the TSS distribution of an operon of E. coli.
- the description of the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" and the like means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention.
- the schematic representation of the above terms does not necessarily refer to the same embodiment or example.
- the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biomedical Technology (AREA)
- Microbiology (AREA)
- Physics & Mathematics (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Plant Pathology (AREA)
- Immunology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a method for enriching transcripts from an RNA sample and to its use. The method comprises treating an RNA sample with an enrichment reagent in order to enrich transcripts, the enrichment reagent having 5'-monophosphate exonuclease activity and the transcript being an RNA molecule with a cap structure or a triphosphate group at its 5' end. Using this method, transcripts can be enriched efficiently.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210379402.8A CN103710336B (zh) | 2012-09-29 | 2012-09-29 | 从rna样本富集转录本的方法及其用途 |
CN201210379402.8 | 2012-09-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014048185A1 true WO2014048185A1 (fr) | 2014-04-03 |
Family
ID=50386954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2013/081581 WO2014048185A1 (fr) | 2012-09-29 | 2013-08-15 | Méthode d'enrichissement d'un produit de transcription à partir d'un échantillon d'arn et son utilisation |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103710336B (fr) |
WO (1) | WO2014048185A1 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106319639B (zh) * | 2015-06-17 | 2018-09-04 | 深圳华大智造科技有限公司 | 构建测序文库的方法及设备 |
CN113463202B (zh) * | 2020-03-31 | 2022-04-15 | 广州序科码生物技术有限责任公司 | 一种新的rna高通量测序的方法、引物组和试剂盒及其应用 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000056913A1 (fr) * | 1999-03-19 | 2000-09-28 | Genetics Institute, Inc. | Elongation vectorielle liee aux amorces (pave): strategie de clonage d'adnc orientee 5' |
WO2002028876A2 (fr) * | 2000-10-05 | 2002-04-11 | Riken | Lieurs oligonucleotidiques comprenant une partie cohesive variable et procede de preparation de banques de polynucleotides au moyen desdits lieurs |
WO2007117039A1 (fr) * | 2006-04-07 | 2007-10-18 | Riken | Méthode pour isoler des extrémités 5' d'acide nucléique et son application |
CN101967476A (zh) * | 2010-09-21 | 2011-02-09 | 深圳华大基因科技有限公司 | 一种基于接头连接的DNA PCR-Free标签文库构建方法 |
CN102076851A (zh) * | 2008-05-02 | 2011-05-25 | Epi中心科技公司 | Rna的选择性的5′连接标记 |
CN102533752A (zh) * | 2012-02-28 | 2012-07-04 | 盛司潼 | 一种Oligo dT引物及构建cDNA文库的方法 |
CN102534813A (zh) * | 2011-11-15 | 2012-07-04 | 杭州联川生物信息技术有限公司 | 构建中小片段rna测序文库的方法 |
WO2013063308A1 (fr) * | 2011-10-25 | 2013-05-02 | University Of Massachusetts | Procédé enzymatique pour l'enrichissement en arn coiffés, trousses pour la mise en œuvre de celui-ci et compositions issues de ce procédé |
-
2012
- 2012-09-29 CN CN201210379402.8A patent/CN103710336B/zh active Active
-
2013
- 2013-08-15 WO PCT/CN2013/081581 patent/WO2014048185A1/fr active Application Filing
Non-Patent Citations (1)
Title |
---|
BORRIES, A. ET AL.: "Differential RNA Sequencing (dRNA-Seq): Deep-Sequencing-Based Analysis of Primary Transcriptomes", TAG-BASED NEXT GENERATION SEQUENCING, 23 January 2012 (2012-01-23), pages 109 - 121 * |
Also Published As
Publication number | Publication date |
---|---|
CN103710336A (zh) | 2014-04-09 |
CN103710336B (zh) | 2017-02-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hu et al. | Next-generation sequencing technologies: An overview | |
Podnar et al. | Next‐generation sequencing RNA‐Seq library construction | |
Deininger | Alu elements: know the SINEs | |
JP6483249B2 (ja) | Isolated oligonucleotide and its use in nucleic acid sequencing | |
Morozova et al. | Applications of next-generation sequencing technologies in functional genomics | |
Head et al. | Library construction for next-generation sequencing: overviews and challenges | |
WO2016037537A1 (fr) | Method for constructing a sequencing library based on a single-stranded DNA molecule and its application | |
CN108220394B (zh) | Method and system for identifying gene-regulatory chromatin interactions, and application thereof | |
EP2083090A1 (fr) | Nucleic acid interaction analysis | |
EP2702175A2 (fr) | Methods and compositions for nucleic acid analysis | |
CN104153003A (zh) | Method for constructing a large-fragment DNA library based on the Illumina sequencing platform | |
WO2012028105A1 (fr) | Sequencing library and method for preparing it, and method and system for determining nucleic acid terminal sequences | |
WO2018184495A1 (fr) | Method for constructing an amplicon library in a single step | |
Ren et al. | MicroRNA signatures from multidrug‑resistant Mycobacterium tuberculosis | |
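CN102839168A (zh) | Nucleic acid probe, method for preparing it, and application thereof | |
JP2008504805A (ja) | Method for preparing nucleotide sequence tags | |
CN104711340A (zh) | Transcriptome sequencing method | |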
Cao et al. | Very long intergenic non-coding (vlinc) RNAs directly regulate multiple genes in cis and trans | |
CN111433359B (zh) | Method for preparing a cDNA library | |
US20210102246A1 (en) | Genetic test for detecting congenital adrenal hyperplasia | |
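CN109750086B (zh) | Method for constructing a single-stranded circular library | |
WO2014048185A1 (fr) | Method for enriching transcripts from an RNA sample and use thereof | |
CN110951827B (zh) | Rapid method for constructing a transcriptome sequencing library and application thereof | |
CN117343989B (zh) | Targeted library construction method for detecting gene fusions | |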
Dunwell et al. | Adaptor Template Oligo-Mediated Sequencing (ATOM-Seq) is a new ultra-sensitive UMI-based NGS library preparation technology for use with cfDNA and cfRNA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 13841076 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/08/2015) |
|
122 | EP: PCT application non-entry in European phase |
Ref document number: 13841076 Country of ref document: EP Kind code of ref document: A1 |