WO1999042621A2 - Thermodynamic properties of nucleic acids - Google Patents
Thermodynamic properties of nucleic acids
- Publication number
- WO1999042621A2 (PCT/US1999/003754)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- nucleotide
- protein
- determining
- nucleotide sequence
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 83
- 150000007523 nucleic acids Chemical class 0.000 title claims abstract description 61
- 102000039446 nucleic acids Human genes 0.000 title claims abstract description 27
- 108020004707 nucleic acids Proteins 0.000 title claims abstract description 27
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 53
- 108090000623 proteins and genes Proteins 0.000 claims description 86
- 239000003446 ligand Substances 0.000 claims description 73
- 230000027455 binding Effects 0.000 claims description 69
- 108091033319 polynucleotide Proteins 0.000 claims description 51
- 102000040430 polynucleotide Human genes 0.000 claims description 51
- 239000002157 polynucleotide Substances 0.000 claims description 51
- 239000002773 nucleotide Substances 0.000 claims description 49
- 125000003729 nucleotide group Chemical group 0.000 claims description 49
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 44
- 230000035772 mutation Effects 0.000 claims description 40
- 230000009257 reactivity Effects 0.000 claims description 34
- 238000009826 distribution Methods 0.000 claims description 21
- 230000000890 antigenic effect Effects 0.000 claims description 16
- 230000036438 mutation frequency Effects 0.000 claims description 5
- 238000005266 casting Methods 0.000 claims description 2
- 239000002253 acid Substances 0.000 claims 1
- 108091005461 Nucleic proteins Proteins 0.000 abstract description 2
- 108020004414 DNA Proteins 0.000 description 33
- 108091034117 Oligonucleotide Proteins 0.000 description 20
- 108020004999 messenger RNA Proteins 0.000 description 20
- 108090000765 processed proteins & peptides Chemical group 0.000 description 20
- 239000000074 antisense oligonucleotide Substances 0.000 description 19
- 238000012230 antisense oligonucleotides Methods 0.000 description 19
- 230000000692 anti-sense effect Effects 0.000 description 17
- 230000000694 effects Effects 0.000 description 17
- 102000004196 processed proteins & peptides Human genes 0.000 description 16
- 108020000948 Antisense Oligonucleotides Proteins 0.000 description 15
- 229920001184 polypeptide Chemical group 0.000 description 15
- 230000001965 increasing effect Effects 0.000 description 14
- 239000000523 sample Substances 0.000 description 13
- 241000894006 Bacteria Species 0.000 description 12
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 10
- 108700023483 L-lactate dehydrogenases Proteins 0.000 description 9
- 239000000203 mixture Substances 0.000 description 9
- 239000013612 plasmid Substances 0.000 description 9
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 8
- 238000004364 calculation method Methods 0.000 description 8
- 230000008018 melting Effects 0.000 description 8
- 238000002844 melting Methods 0.000 description 8
- 230000015572 biosynthetic process Effects 0.000 description 7
- 230000000295 complement effect Effects 0.000 description 7
- 230000003247 decreasing effect Effects 0.000 description 7
- 238000013461 design Methods 0.000 description 7
- 238000002474 experimental method Methods 0.000 description 7
- 238000012360 testing method Methods 0.000 description 7
- 108091026890 Coding region Proteins 0.000 description 6
- 230000007423 decrease Effects 0.000 description 6
- 238000006467 substitution reaction Methods 0.000 description 6
- 238000003786 synthesis reaction Methods 0.000 description 6
- 108091005904 Hemoglobin subunit beta Proteins 0.000 description 5
- 102000003855 L-lactate dehydrogenase Human genes 0.000 description 5
- 150000001413 amino acids Chemical class 0.000 description 5
- 238000004458 analytical method Methods 0.000 description 5
- 230000001419 dependent effect Effects 0.000 description 5
- 239000000499 gel Substances 0.000 description 5
- 230000004044 response Effects 0.000 description 5
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 4
- 108091035707 Consensus sequence Proteins 0.000 description 4
- 206010020460 Human T-cell lymphotropic virus type I infection Diseases 0.000 description 4
- 241000714260 Human T-lymphotropic virus 1 Species 0.000 description 4
- 125000003275 alpha amino acid group Chemical group 0.000 description 4
- 230000001580 bacterial effect Effects 0.000 description 4
- 239000011780 sodium chloride Substances 0.000 description 4
- 241000193830 Bacillus <bacterium> Species 0.000 description 3
- 108020004705 Codon Proteins 0.000 description 3
- 241000588724 Escherichia coli Species 0.000 description 3
- 238000013459 approach Methods 0.000 description 3
- 238000003556 assay Methods 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 230000000368 destabilizing effect Effects 0.000 description 3
- 230000002068 genetic effect Effects 0.000 description 3
- 229910052739 hydrogen Inorganic materials 0.000 description 3
- 239000001257 hydrogen Substances 0.000 description 3
- 238000001727 in vivo Methods 0.000 description 3
- 238000002703 mutagenesis Methods 0.000 description 3
- 231100000350 mutagenesis Toxicity 0.000 description 3
- 238000002264 polyacrylamide gel electrophoresis Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 108091008146 restriction endonucleases Proteins 0.000 description 3
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 2
- 241000304886 Bacilli Species 0.000 description 2
- HEDRZPFGACZZDS-UHFFFAOYSA-N Chloroform Chemical compound ClC(Cl)Cl HEDRZPFGACZZDS-UHFFFAOYSA-N 0.000 description 2
- 102000004190 Enzymes Human genes 0.000 description 2
- 108090000790 Enzymes Proteins 0.000 description 2
- 102100021519 Hemoglobin subunit beta Human genes 0.000 description 2
- 102100034343 Integrase Human genes 0.000 description 2
- 102000003960 Ligases Human genes 0.000 description 2
- 108090000364 Ligases Proteins 0.000 description 2
- 238000012408 PCR amplification Methods 0.000 description 2
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 2
- 239000012472 biological sample Substances 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 239000003795 chemical substances by application Substances 0.000 description 2
- 238000000205 computational method Methods 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 238000005315 distribution function Methods 0.000 description 2
- 239000003814 drug Substances 0.000 description 2
- 238000007689 inspection Methods 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 230000000869 mutational effect Effects 0.000 description 2
- 230000006911 nucleation Effects 0.000 description 2
- 238000010899 nucleation Methods 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 230000000087 stabilizing effect Effects 0.000 description 2
- 238000010561 standard procedure Methods 0.000 description 2
- 239000000126 substance Substances 0.000 description 2
- 101710159080 Aconitate hydratase A Proteins 0.000 description 1
- 101710159078 Aconitate hydratase B Proteins 0.000 description 1
- 208000035143 Bacterial infection Diseases 0.000 description 1
- 102000052510 DNA-Binding Proteins Human genes 0.000 description 1
- 101710096438 DNA-binding protein Proteins 0.000 description 1
- 108010005054 Deoxyribonuclease BamHI Proteins 0.000 description 1
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 1
- 241000192125 Firmicutes Species 0.000 description 1
- 108010010803 Gelatin Proteins 0.000 description 1
- 108700039691 Genetic Promoter Regions Proteins 0.000 description 1
- 101710203526 Integrase Proteins 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- 108060004795 Methyltransferase Proteins 0.000 description 1
- 102000008300 Mutant Proteins Human genes 0.000 description 1
- 108010021466 Mutant Proteins Proteins 0.000 description 1
- 241001045988 Neogene Species 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 102000044126 RNA-Binding Proteins Human genes 0.000 description 1
- 101710105008 RNA-binding protein Proteins 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 108010017842 Telomerase Proteins 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 238000000211 autoradiogram Methods 0.000 description 1
- 208000022362 bacterial infectious disease Diseases 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 229920001222 biopolymer Polymers 0.000 description 1
- 238000006555 catalytic reaction Methods 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 230000002939 deleterious effect Effects 0.000 description 1
- 230000001687 destabilization Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000029087 digestion Effects 0.000 description 1
- VHJLVAABSRFDPM-QWWZWVQMSA-N dithiothreitol Chemical compound SC[C@@H](O)[C@H](O)CS VHJLVAABSRFDPM-QWWZWVQMSA-N 0.000 description 1
- 238000010828 elution Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 239000008273 gelatin Substances 0.000 description 1
- 229920000159 gelatin Polymers 0.000 description 1
- 235000019322 gelatine Nutrition 0.000 description 1
- 235000011852 gelatine desserts Nutrition 0.000 description 1
- 238000009396 hybridization Methods 0.000 description 1
- 238000011534 incubation Methods 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 101150066555 lacZ gene Proteins 0.000 description 1
- 101150104734 ldh gene Proteins 0.000 description 1
- 231100000518 lethal Toxicity 0.000 description 1
- 230000001665 lethal effect Effects 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 229910052943 magnesium sulfate Inorganic materials 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 244000005700 microbiome Species 0.000 description 1
- 238000002887 multiple sequence alignment Methods 0.000 description 1
- 101150091879 neo gene Proteins 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- 239000002831 pharmacologic agent Substances 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 239000013615 primer Substances 0.000 description 1
- 239000002987 primer (paints) Substances 0.000 description 1
- 238000000159 protein binding assay Methods 0.000 description 1
- 230000004853 protein function Effects 0.000 description 1
- 238000001243 protein synthesis Methods 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 239000011541 reaction mixture Substances 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 239000000758 substrate Substances 0.000 description 1
- 238000001308 synthesis method Methods 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- 229940124597 therapeutic agent Drugs 0.000 description 1
- 238000002560 therapeutic procedure Methods 0.000 description 1
- 230000014616 translation Effects 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
Definitions
- the invention relates to nucleotide and polypeptide sequences having selected functional characteristics or having selected values for a free energy parameter.
- the methods of the invention allow selection of anti-sense oligonucleotides (e.g., for use as pharmacological agents), determination of protein or polypeptide function from a nucleotide sequence which codes for the polypeptide, prediction of antigenic regions of a protein, determination of a region of a polynucleotide or protein which is susceptible to mutation, and the like.
- the primary structure of the protein or polypeptide which is encoded by a nucleic acid sequence, such as a gene, can be determined through the use of the well-known genetic code.
- the function of a protein which is encoded by a particular gene is unknown at the time the gene is sequenced.
- the function of such proteins can often be determined, but the cost can be considerable.
- Antisense therapy involves the administration of exogenous oligonucleotides that bind to a target nucleic acid, typically an RNA molecule, located within cells.
- the name "antisense" is so given because the oligonucleotides are typically complementary to mRNA molecules ("sense strands") which encode a cellular product. The ability to use anti-sense oligonucleotides to inhibit expression of mRNAs.
- Anti-sense agents typically need to continuously bind all target RNA molecules so as to inactivate them or alternatively provide a substrate for endogenous ribonuclease H (RNase H) activity.
- Sensitivity of RNA-oligonucleotide complexes, generated by the methods of the present invention, to RNase H digestion can be evaluated by standard methods (see, e.g., Donia, B. P., et al., J. Biol. Chem. 268 (19):14514-14522 (1993); Kawasaki, A. M., et al., J. Med. Chem. 6(7):831-841 (1993), incorporated herein by reference).
- the invention relates to nucleic acid sequences, and, in particular, to methods for identifying or designing structural and/or energetic characteristics of nucleic acids, and proteins which are encoded by the nucleic acids.
- the invention relates to methods for selecting antisense oligonucleotides.
- Prior art methods do not provide an efficient means of determining which oligonucleotides complementary to a given mRNA will be useful in an application.
- Shorter (15-200) base anti-sense molecules are preferred in clinical applications. In fact, a minimum of 15 base anti-sense oligonucleotides is preferred.
- the invention includes methods for selecting desired anti-sense oligonucleotides from the set of candidates provided by any given nucleic acid, e.g., an mRNA.
- the invention provides a means of determining desired sequence positions on the mRNA, e.g., those which present a desired level of free energy variation, against which to design anti-sense oligonucleotides, thus reducing the empiricism currently employed.
- the invention features a method of identifying a site on a nucleic acid sequence having high free energy variability. This allows determination of sites which are preferred for oligonucleotide, e.g., antisense, binding.
- the method includes some or all of the following steps: providing a nucleotide sequence, e.g., a sequence from a target gene; casting the nucleotide sequence as the free energy as a function of base pair position; calculating the free energy of X windows centered on a base pair for a plurality of base pairs from the nucleotide sequence, for every, or at least a plurality of, window sizes between 2 and Y.
- Y is an integer between 3 and 1,000, more preferably between 2 and 100
- for each window size constructing a free energy distribution along the sequence, preferably normalizing the distribution to a standard scale (to account for the fact that the free energy is proportional to window size) (this calculation gives the results which can be plotted as shown in Figure 1);
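The windowing step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `window_energies` and the per-base-pair values are hypothetical, and windows are indexed by their start position rather than their center, which only shifts positions by a constant offset for each window size.

```python
from typing import Dict, List


def window_energies(per_bp: List[float], max_window: int) -> Dict[int, List[float]]:
    """Sum per-base-pair free energies over sliding windows of size 2..max_window."""
    profiles: Dict[int, List[float]] = {}
    for size in range(2, max_window + 1):
        if size > len(per_bp):
            break
        # One value per window start position; windows never run past the sequence ends.
        profiles[size] = [
            sum(per_bp[start:start + size])
            for start in range(len(per_bp) - size + 1)
        ]
    return profiles


# Hypothetical per-base-pair free energies (arbitrary units), for illustration only.
example = [1.0, 2.0, 3.0, 4.0]
profiles = window_energies(example, max_window=3)
print(profiles[2])  # [3.0, 5.0, 7.0]
print(profiles[3])  # [6.0, 9.0]
```

Because the raw sums grow with window size, each profile would still need to be put on a common scale before profiles from different window sizes are compared.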
- the invention provides for a method for determining preferred anti-sense sequence complements within a predefined RNA sequence; these are generally high variability sequences.
- high variability can be a relative parameter, e.g., relative to other variability in the sequence. Alternatively, it can be relative to a predefined value.
- the invention provides for sets (e.g., sets of 2, 3, 4 or more) of sequences, e.g., anti-sense oligonucleotides, of an optimal duplex free energy or variability but variable length at the sites of anti-sense candidates within candidate regions.
- the invention provides sets of isoenergetic, or isovariable, oligonucleotides, e.g., anti-sense candidates of a set length within a candidate region.
- the invention provides for establishing oligonucleotides (e.g., sets of 2, 3, 4 or more oligonucleotides), e.g., anti-sense oligonucleotides, of a preselected melting temperature, Tm, within candidate regions.
- the method allows for identification, choosing, and matching of sequences with desired free energy variability characteristics.
- Methods of the invention can be used for any of the following: Determining the best anti-sense candidate regions, or sub-sequences, within any given anti-sense target. Such sequences exhibit wide variation in average energy as a function of increasing length.
- compositions of sequence which display the identified variation in sequence composition with increasing window size.
- the invention provides methods for determining functionally important regions of a protein. In another aspect, the invention provides methods for predicting mutation-prone regions of a protein. In still another aspect, the invention provides methods for determining flanking sequences which provide a pre-selected value for a thermodynamic parameter, such as free energy of ligand binding, to a ligand binding site of a polynucleotide.
- the invention provides methods for selecting flanking sequences to a ligand binding site of a nucleic acid which affect ligand binding to a nucleic acid.
- the invention provides methods for determining the ability of a pre-selected nucleic acid sequence to transmit binding energy to a portion of nucleic acid which is remote from the pre-selected sequence.
- the invention provides methods for increasing or decreasing the mutation rate of a nucleic acid sequence, e.g., by stabilizing or destabilizing the nucleotide sequence.
- Any method of the invention can include providing a sequence, e.g., by synthesizing, or by placing in a reaction mixture which includes a carrier, e.g., a liquid, e.g., water.
- Figure 1 is a plot of normalized energy as a function of window size and position along a representative DNA sequence.
- Figure 2 is an overlaid plot of the data shown in Figure 1.
- Figure 3 is a plot of the variability of energy distributions along the representative DNA sequence.
- Figure 4 is a schematic illustration of a ligand binding experiment.
- Figure 5 is an autoradiogram of a gel shift assay.
- Figure 6 shows the sequences of several DNA constructs identified by ligand binding experiments.
- Figure 7 is a graph in which the distributions of free energies from 15 base window free energy calculations along the mouse betaglobin major gene are plotted as a function of amino acid position.
- the distributions were determined by calculating the free energy for each of the possible codon sequences dictated by the amino acid sequences and categorized into discrete energy ranges. Since all permutations of codons representing the amino acids are present, the distribution describes a probability of the energies available to the five amino acids in each window.
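The enumeration over synonymous codons can be sketched as below, for one five-residue (15 base) window. Everything here is illustrative: the codon table is a partial excerpt of the standard genetic code covering only the residues in the example, and `toy_energy` (a G/C count) is a hypothetical stand-in for a real window free energy calculation.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List

# Partial standard genetic code (DNA codons) covering only the residues used below.
CODONS: Dict[str, List[str]] = {
    "M": ["ATG"],
    "W": ["TGG"],
    "F": ["TTT", "TTC"],
    "K": ["AAA", "AAG"],
    "D": ["GAT", "GAC"],
}


def toy_energy(dna: str) -> float:
    """Hypothetical stand-in for a window free energy: counts G and C bases."""
    return float(sum(base in "GC" for base in dna))


def energy_distribution(
    peptide: str, energy: Callable[[str], float] = toy_energy
) -> Dict[float, float]:
    """Probability of each energy value over all codon choices encoding `peptide`."""
    choices = [CODONS[aa] for aa in peptide]
    # Every combination of synonymous codons yields one candidate DNA window.
    energies = [energy("".join(codons)) for codons in product(*choices)]
    counts = Counter(energies)
    total = len(energies)
    return {e: n / total for e, n in sorted(counts.items())}


print(energy_distribution("MFKDW"))
# {4.0: 0.125, 5.0: 0.375, 6.0: 0.375, 7.0: 0.125}
```

Binning the energies into discrete ranges, as the text describes, would replace the exact-value `Counter` with a histogram over energy intervals.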
- the two introns of the betaglobin gene were deleted for this calculation.
- the axes are: x, free energy in calories; y, normalized probability; z, base position in the gene sequence.
- Figure 8 is an alternative representation of the energetic analysis shown in Figure 7.
- the distributions are normalized to their respective means, and each distribution is reduced by 2.5% from each end.
- the energy profile of the actual mouse betaglobin sequence is plotted after normalizing it to the same mean value.
- These confidence intervals define the range over which 95% of the energies of the DNA sequences that can be coded from the amino acid sequence of the gene can fall.
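Trimming 2.5% from each end of a distribution to obtain a central 95% range can be sketched as follows. The helper name `central_interval` and the sample data are hypothetical, and rounding the trimmed count down is an assumption; the source does not say how fractional counts are handled.

```python
from typing import List, Tuple


def central_interval(energies: List[float], tail: float = 0.025) -> Tuple[float, float]:
    """Return (low, high) after dropping a fraction `tail` of values from each end."""
    ordered = sorted(energies)
    cut = int(len(ordered) * tail)  # number of values trimmed per end, rounded down
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return kept[0], kept[-1]


# Hypothetical energies 1.0 .. 200.0; five values are trimmed from each end.
sample = [float(v) for v in range(1, 201)]
print(central_interval(sample))  # (6.0, 195.0)
```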
- Figure 9 is a graph which displays the relative free energy of a segment of the HTLV-I LTR as a function of base position and the natural logarithm of the mutation rate over the same region (inset). Seven HTLV-I sequences were used in the mutation rate calculations.
- Figure 10: a) The positions along the consensus sequence which have a standard deviation of energy less than 1 kcal, determined from the statistical analysis of the 24 ldh sequences, are plotted as a function of base position in the consensus sequence. The total number of points is 79. b) The points in the mean free energy deviation versus base position which exceed the +95% and -95% confidence limits are designated as +1 or -1, respectively, and are plotted as a function of consensus base position. All points which fall within the 95% confidence limits are zero. The two regions denoted by vertical bars have correlations between these two datasets: around 600 and around 825. c), d) The antigenic (c) and nonantigenic (d) regions of lactate dehydrogenase are displayed relative to the consensus gapped DNA sequence. Regions at 600 and 825 cross-correlate to a nonantigenic and an antigenic region, respectively. The antigenic region is the third most antigenic region found by Hogrefe et al.
- Figure 11 shows the wildtype and unstable insert DNA sequences of a neo gene.
- the invention relates to methods for determining structurally or functionally important regions of a nucleic acid sequence or a protein or polypeptide sequence.
- the invention provides a method(s) for determining base position(s) on a preselected mRNA sequence where best hybridization of an oligonucleotide will occur.
- the mRNA may be a pre-mRNA (hnRNA), thus containing untranscribed regions to be spliced out, and included in this mRNA/pre-mRNA are a variety of control sequences which allow binding of various cellular components.
- oligonucleotide e.g., 30 bases
- the method is described below with reference to data from a representative target nucleic acid sequence (LDH M72545, base positions from 64-924, the sequence is available through GENBANK)
- the algorithm for determining relatively "reactive" sites along genomic DNA is based on a representation of duplex DNA in terms of its sequence dependent melting free-energy. This provides the DNA sequence as energy contours that, when scrutinized in the proper way, can lead to direct determination of specific sites that are optimum for targeting by anti-sense therapeutic agents.
- each bp $i$ can be assigned a melting free-energy value:
- $$\Delta G_i = \Delta G^{H\text{-}B}_i + \frac{\Delta G^{S}_{i,i-1} + \Delta G^{S}_{i,i+1}}{2}$$
- $\Delta G^{H\text{-}B}_i$ is the free-energy of hydrogen bonding, which typically can take on only two values (for A-T or G-C type bps), and $\Delta G^{S}_{i,i-1}$ and $\Delta G^{S}_{i,i+1}$ are the nearest-neighbor sequence dependent stacking free-energies for the stacking interactions between bp $i$ and bps $i-1$ and $i+1$. Utilizing this equation, each bp can be assigned a free-energy of melting.
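The per-base-pair assignment (a hydrogen bonding term plus half the sum of the two neighboring stacking terms) can be sketched as below. All numerical values are hypothetical placeholders, not parameters from the source, and treating the missing neighbor at each sequence end as contributing zero is an assumption.

```python
from typing import Dict, List

# Hypothetical hydrogen-bonding free energies (arbitrary units); NOT from the source.
# Only two values occur: one for A-T type and one for G-C type base pairs.
H_BOND: Dict[str, float] = {"A": 1.0, "T": 1.0, "G": 2.0, "C": 2.0}


def stacking(a: str, b: str) -> float:
    """Placeholder nearest-neighbor stacking energy for the step a->b (hypothetical)."""
    return 0.5 if a == b else 1.0


def per_bp_free_energy(seq: str) -> List[float]:
    """Delta G_i = H-bond term + mean of the stacking terms with bp i-1 and bp i+1."""
    energies = []
    for i, base in enumerate(seq):
        # End base pairs have only one neighbor; the missing stack is taken as 0.
        left = stacking(seq[i - 1], base) if i > 0 else 0.0
        right = stacking(base, seq[i + 1]) if i < len(seq) - 1 else 0.0
        energies.append(H_BOND[base] + (left + right) / 2)
    return energies


print(per_bp_free_energy("AAGC"))  # [1.25, 1.75, 3.0, 2.5]
```

A real calculation would replace `stacking` with a table of the ten nearest-neighbor doublet values measured for the salt conditions of interest.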
- plotting the values of $\Delta G_j$ vs bp position $s$ results in an energy contour for that particular window size, $j$. Since the magnitude of $\Delta G_j$ increases with the size of $j$, relative features of energy contours constructed for different window sizes are difficult to compare directly.
- $$\Delta G_j^{norm}(s) = \frac{\Delta G_j(s) - \Delta G_j(\min)}{\Delta G_j(\max) - \Delta G_j(\min)}$$
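Putting contours from different window sizes on a common scale can be sketched as a min-max rescaling of each contour to the range 0 to 1. This form is an assumption adopted for illustration, since the normalization equation is not fully legible in the source; the function name `normalize_profile` is hypothetical.

```python
from typing import List


def normalize_profile(values: List[float]) -> List[float]:
    """Rescale one window-size energy contour to [0, 1] (assumed min-max form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat contour carries no relative features; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(normalize_profile([3.0, 5.0, 7.0]))  # [0.0, 0.5, 1.0]
```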
- $N_w$ is the number of window sizes
- $T_m$ is the melting temperature
- $\Delta H_D$ and $\Delta S_D$ are the calculated melting enthalpy and entropy for the particular sequence.
- the entropy of nucleation is $\Delta S_{nuc}$ and is regarded as a constant for a particular type of target in our equational formulation. That is, it does not depend on oligomer length.
- the enthalpy of duplex nucleation, $\Delta H_{nuc}$, is primarily electrostatic in nature and therefore depends on sequence length, G-C percentage and salt concentration.
- the total strand concentration is $C$ and $\alpha$ is a factor that properly accounts for sequence degeneracies in association of the oligomers. Overall, stability of the chosen oligomers can therefore be adjusted by changes in G-C percentage and length.
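The melting temperature equation itself did not survive in this text, so the sketch below uses a generic two-state expression as an assumption: total melting enthalpy divided by total melting entropy minus R ln(C/alpha), with duplex and nucleation terms simply added. The name `alpha` for the degeneracy factor, its default of 4 (a common choice for non-self-complementary strands), and all numerical inputs are hypothetical.

```python
import math

R = 1.987  # gas constant in cal/(mol*K)


def melting_temperature(
    dh_duplex: float,
    ds_duplex: float,
    dh_nuc: float,
    ds_nuc: float,
    strand_conc: float,
    alpha: float = 4.0,
) -> float:
    """Two-state Tm in kelvin, using melting (dissociation) sign conventions.

    Enthalpies are in cal/mol, entropies in cal/(mol*K), concentration in mol/L.
    This functional form is an assumption, not the equation from the source.
    """
    total_dh = dh_duplex + dh_nuc
    total_ds = ds_duplex + ds_nuc
    return total_dh / (total_ds - R * math.log(strand_conc / alpha))


# Hypothetical inputs: nucleation terms set to zero for simplicity.
tm_dilute = melting_temperature(120000.0, 330.0, 0.0, 0.0, strand_conc=1e-6)
tm_concentrated = melting_temperature(120000.0, 330.0, 0.0, 0.0, strand_conc=1e-4)
print(tm_dilute < tm_concentrated)  # True
```

Under this form, raising the strand concentration raises the predicted melting temperature, and a larger melting enthalpy (for example from higher G-C content or greater length) does the same.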
- the invention relates to methods for selecting flanking nucleic acid sequences, e.g., nucleic acid sequences which flank (in the 3' and/or 5' direction) a selected nucleic acid sequence (such as a ligand binding site).
- the methods of the invention are useful for determining flanking sequences which provide desired characteristics, such as Tm, ability of the ligand binding site to bind to a ligand, ability of a ligand to react with the nucleic acid sequence, stability of the nucleic acid sequence, and the like.
- thermodynamic parameters of a nucleic acid sequence can be modulated by providing appropriate flanking sequences.
- a flanking sequence(s) can be selected to increase the ability of a ligand to bind to a ligand binding site; decrease the ability of a ligand to bind to a ligand binding site; increase the ability of a ligand to react with a nucleic acid sequence; decrease the ability of a ligand to react with a nucleic acid sequence; increase the stability of the nucleic acid sequence (e.g., the mutability of the sequence); decrease the stability of the nucleic acid sequence; and the like.
- the ability of a nucleic acid sequence to bind to a ligand is related to the thermodynamic stability of the nucleic acid sequence. It is believed that less stable sequences bind to ligands better than more stable sequences do.
- the ability of a polynucleotide to bind to a ligand is related to the stability of the polynucleotide sequence.
- the stability of a polynucleotide which includes the ligand binding site can be modulated by providing flanking sequences which affect the stability of the polynucleotide, e.g., at the ligand binding site.
- the method includes the steps of providing a polynucleotide which includes a ligand binding site and at least one sequence which flanks the ligand binding site (e.g., in the 3' and/or 5' direction); and determining the ability of the ligand binding site to bind to (or react with) the ligand.
- a plurality (such as combinatorial library) of polynucleotides, each including the same ligand binding site and different (e.g., randomly differing) flanking sequences, are provided.
- the mixture of polynucleotides can be screened against a limiting concentration of the ligand, and polynucleotide sequences which preferentially bind to the ligand (or do not bind to the ligand) can be selected, and (preferably) are then sequenced to determine an appropriate flanking sequence(s).
- Example 1, infra, describes the preparation of a plurality of DNA sequences; each sequence included a binding site for the restriction enzyme BamHI, flanked on each end by a random polynucleotide sequence 40 bases in length.
- the mixture of sequences was titrated with a known concentration of BamHI, and those sequences which bound most strongly to BamHI were selected (in this example, by gel shift assay and recovery of the shifted bands). Similarly, the poorest-binding sequences were selected. Certain of the selected polynucleotides were then sequenced to determine the flanking sequences which conferred increased or decreased ligand-binding ability on the ligand binding site.
- the ability of a polynucleotide sequence to bind to a ligand is believed to be related to the stability of the polynucleotide sequence. It is further believed that the ability of a polynucleotide sequence to bind to ligands is at least largely independent of the ligand selected. Thus, a flanking sequence which destabilizes the binding of any one ligand to a ligand binding site adjacent the flanking sequence, will also destabilize the binding of other ligands to the ligand binding site. Thus, the particular ligand selected for use according to the methods of the invention, to determine the ability of a flanking sequence to affect ligand binding, is a matter of convenience and design choice which will be routine to one of ordinary skill in the art.
- flanking sequences which confer particular energetic or reactivity attributes upon a neighboring sequence will have many potential uses.
- flanking sequences can be selected to promote binding to or reaction with a ligand, such as an RNA or DNA binding protein, a polymerase, a reverse transcriptase, a telomerase, a helicase, a transcription factor, and the like.
- the invention provides methods for selecting flanking sequences which can be used in vivo, e.g., to study the interaction of ligands and nucleic acids, or to provide improved probes or primers for PCR amplification, and the like.
- flanking sequences of a polynucleotide can also be provided in a non-random manner.
- a flanking sequence can be provided, e.g., by oligonucleotide chemical or biochemical synthesis, to provide a flanking region of any known sequence.
- This flanking sequence can then be tested to determine the effect on ligand binding or sequence stability.
- One particularly preferred practice of the invention involves the construction of a plurality of oligonucleotides, each including a ligand binding site flanked by at least one flanking sequence which has a known, repeating motif. The effect on stability of each flanking sequence can then be assayed, e.g., as described herein.
- This embodiment of the invention is useful in constructing sequence reactivity data compilations, e.g., a database, which quantifies the effect on stability of any possible flanking sequence (see infra).
- the invention provides methods for determining polynucleotides which are more (or less) prone to mutation.
- the methods of the invention can be used to determine regions of a polynucleotide sequence, including a gene, which are more (or less) likely to mutate, e.g., in response to a selection pressure on the organism.
- the methods of the invention are useful, e.g., for determining which portions of a gene are optimal targets for design of probes, e.g., for the detection of the presence of a microorganism in a biological sample.
- a probe which is complementary to a portion of bacterial nucleic acid can be used to detect the presence of the bacterium in a biological sample, e.g., to detect bacterial infection, as is well known in the art.
- the probe will no longer bind (or will bind with decreased affinity) to the nucleic acid of the mutated bacterium, thus rendering detection of the bacterium more difficult.
- it is therefore preferred that a probe be designed to be complementary to a portion of the bacterial nucleic acid which is stable, e.g., less prone to mutation. Mutations of the bacterium are less likely to occur in a stable region, and therefore, the probability that the probe will be rendered useless by subsequent mutation is decreased.
- destabilization of a portion of a polynucleotide sequence can result in increased mutation of nucleic acid sequences which are remote from the destabilized portion of the polynucleotide.
- insertion of a destabilized, A-T rich region of polynucleotide sequence into a gene can result in significant increases in mutation rate of regions of the gene as much as 200-250 bases away from the destabilized region, compared to the wild-type gene, when the gene is inserted into a bacterial host cell which is then subjected to mutational pressure.
- the methods of the invention are also useful for determining functionally important portions of a protein which is encoded by a nucleic acid sequence.
- nucleic acid sequences which code for critical residues or regions of the protein will reside in regions of the gene which are relatively resistant to mutation, i.e., in stable regions of the gene, to avoid deleterious mutations which would decrease or abolish the desired function of the protein.
- by determining the stability of regions of a gene, e.g., by determining the effect on stability of flanking sequences, critical residues of the encoded protein can be identified.
- the methods of the invention can be used to determine or predict regions of a protein or polypeptide which are antigenically important.
- multiple nucleic acid sequences can often code for a single polypeptide. Routine computational methods allow the determination of the free energy of each nucleic acid sequence which encodes a selected polypeptide. For a given polypeptide, the energy of a naturally-occurring coding sequence can be compared to the energies of all possible nucleic acids which could code for that polypeptide, to determine unusually stable or unstable coding sequences.
- as described in Example 3, by identifying regions of the coding nucleic acid sequence where a coding nucleic acid is unusually stable or unusually unstable with respect to the possible range of nucleic acid sequences that could code for a given polypeptide sequence, important regions of the polypeptide can be identified.
- a particular "window" of length n bases (e.g., 15 bases in Example 3)
- the window can be moved through the entire coding sequence, or any portion thereof, to determine the free energy of the polynucleotide window subsequence.
- free energies of window subsequences of other potential (generally non-natural) coding sequences can be determined, and the window free energies for the naturally-occurring (or test) coding sequence can be compared to the corresponding window free energies from the potential coding sequences.
- Unusually stable regions of the test sequence, compared to the potential coding sequences, can then be identified.
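The window comparison described above can be illustrated with a short Python sketch. It enumerates every synonymous coding sequence for a very short peptide and reports, for each window, where the test sequence falls among the alternatives. The `toy_energy` score (G/C counted as 2, A/T as 1), the function names, and the four-residue peptide are hypothetical stand-ins chosen for brevity; a real analysis would substitute hydrogen-bonding and stacking free energies. The codon entries are the standard genetic code, restricted to the residues used here.

```python
from itertools import product

# Standard genetic code, restricted to the residues used in this toy example.
CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
}


def toy_energy(dna):
    """Placeholder stability score (assumption): G/C counts 2, A/T counts 1."""
    return sum(2 if b in "GC" else 1 for b in dna)


def window_energies(dna, n):
    """Energy of every window of n bases, moved one base at a time."""
    return [toy_energy(dna[s:s + n]) for s in range(len(dna) - n + 1)]


def synonymous_sequences(peptide):
    """Yield every DNA sequence that codes for the given peptide."""
    for combo in product(*(CODONS[aa] for aa in peptide)):
        yield "".join(combo)


def window_percentiles(test_dna, peptide, n):
    """Per window: fraction of synonymous sequences with energy <= the test's."""
    test = window_energies(test_dna, n)
    alts = [window_energies(s, n) for s in synonymous_sequences(peptide)]
    return [sum(a[i] <= t for a in alts) / len(alts) for i, t in enumerate(test)]
```

Windows with a fraction near 1 are unusually stable under this toy score, and windows near 0 are unusually unstable. Full enumeration is only feasible for short peptides; for a whole gene, one option is to sample synonymous sequences at random instead.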
- the identified regions of the polypeptide (or the coding nucleic acid) can then be altered to provide polypeptides with altered properties (e.g., increased or decreased antigenicity) according to a variety of methods, some of which are known in the art.
- the invention provides methods for modulating (e.g., increasing or decreasing) the relative or absolute susceptibility to mutation of a polynucleotide sequence.
- the method includes the step of providing a polynucleotide sequence in which a portion of the polynucleotide sequence has been stabilized (or destabilized) relative to a control polynucleotide sequence, such that the relative or absolute susceptibility to mutation of a polynucleotide sequence is modulated.
- the invention provides means for stabilizing or destabilizing a polynucleotide sequence with respect to mutational susceptibility.
- a polynucleotide sequence can be stabilized, e.g., to prevent mutation when the polynucleotide sequence is inserted into a host cell or organism, or destabilized, e.g., to promote mutation.
- the addition of destabilizing regions of nucleic acid into a gene can provide increased rates of mutation when the gene is inserted into an organism.
- This method can thereby provide a method for producing non-naturally occurring nucleic acid sequences (and proteins encoded by them); this is a form of "directed evolution" in that a particular gene or portion thereof can be targeted for mutation without increasing the propensity for mutation of other regions of the genome.
- Such non-naturally occurring proteins can be assayed to determine properties such as binding specificity, binding affinity, rate of catalysis of a reaction, and the like, to identify proteins which have desirable characteristics.
- the methods of the invention can be used to speed the process of preparing and selecting mutant proteins.
- the methods of the invention can be used to increase the stability of a sequence and thereby decrease the mutation frequency of the sequence.
- a nucleic acid sequence, and the protein encoded thereby, can be "protected" to prevent mutations, e.g., by altering the coding nucleic acid sequence, or a nearby (e.g., flanking) sequence, to increase stability of the coding sequence.
- the invention also provides methods for determining sequence stability of a test polynucleotide sequence by comparing the test sequence with sequences having a known stability, and thereby determining the stability of the test sequence.
- a "database" of sequence stability values can be constructed, e.g., using the methods described herein.
- the invention provides methods for selecting, from a plurality of polynucleotides, flanking sequences which confer increased (or decreased) thermodynamic stability on a nucleic acid binding site.
- the results can be stored in a database. It will be appreciated that for a polynucleotide sequence, such as a flanking sequence, of length n bases, there will be $4^n$ possible polynucleotides. For sequences of length greater than about 10 bases, the number of possible polynucleotides becomes large, and experimental determination of the energies of all possible sequences may not be practical. It is therefore desirable to reduce the number of sequences which must be tested, while still providing sufficient data to permit the determination of the energy of any test sequence with reasonable accuracy.
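To make the growth in sequence space concrete, the count of possible sequences of length n can be evaluated directly. The function name below is hypothetical, and the lengths printed are examples only.

```python
def n_sequences(n: int) -> int:
    """Number of distinct polynucleotides of length n over A, G, C, T."""
    return 4 ** n


# 4**10 is 1,048,576, so ten bases already give over a million candidates.
for n in (5, 10, 15, 20, 40):
    print(f"n = {n:2d}: {n_sequences(n):,} possible sequences")
```

The count multiplies by four with every added base, which is why exhaustive experimental measurement quickly becomes impractical for longer flanking sequences.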
- flanking sequence i.e., more homogeneous flanking sequences (e.g., flanking sequences which have repeated subsequence motifs) are better able to "transmit" energy to the sequence of interest. This relationship can be expressed as
- the free energy, $\Delta G_j$, can be readily computed, e.g., by using known values and methods. Therefore, the ability of a flanking sequence to stabilize (or destabilize) a sequence of interest can be evaluated by determining the effect of sequence homogeneity on reactivity for a variety of sample flanking sequences. It is believed that conformational variability is a common, although heretofore unappreciated, feature of nucleic acids such as DNA. Studies have indicated that small (GC)$_8$ DNA segments can influence the overall structure of sequences of DNA as large as one thousand bases in length (e.g., Kim et al., Biopolymers 33: 1725-1745 (1993); see also Example 4, infra).
- flanking sequences are constructed to have a pre-determined amount of sequence homogeneity (e.g., repeated subsequence motifs).
- the ability of a ligand binding site flanked by such a flanking sequence to bind a ligand can be determined, e.g., by methods described herein, such as the binding assay described in Example 1, infra.
- the effect of sequence homogeneity on reactivity can then be determined to provide a database of sequence reactivity values.
- the methods described in Example 1 and Example 2 herein can advantageously be combined to better determine the effect of any given flanking sequence on the binding or reactivity characteristics of a ligand binding site.
- the method of Example 1 can be used to explore flanking sequence effects by sampling a population (or at least a random subpopulation) of all possible flanking sequences to determine those sequences which most increase, or decrease, the reactivity of a ligand binding site for a ligand.
- the method of Example 2 can be used to explore flanking sequence effects in a systematic fashion, by preparing any desired number of flanking sequences which have known amounts of sequence homogeneity.
- the methods are complementary, and can be used in tandem to provide greater information about flanking sequence effects.
- A schematic diagram illustrating one embodiment of a method of selecting flanking sequence(s) which affect the relative reactivity of a binding site flanked by such sequence(s) is shown in Figure 4.
- the ligand can be, e.g., a protein which binds to a nucleic acid, e.g., an enzyme, e.g., a restriction enzyme.
- the ligand can be the endonuclease BamHI, which binds to the sequence 5'-GGATCC-3'; binding sites for other ligands can be employed as is known in the art.
- the binding site is flanked in both directions by A (to desymmetrize the construct), and then a 40-base long random insert is provided in both the 5' and 3' directions (longer or shorter random sequences can be employed if desired, e.g., to study the effect of remote flanking sequences on binding site reactivity).
- PCR primer sites were provided to permit amplification of the construct, if desired.
- Each PCR primer site included an EcoRI restriction site.
- the constructs used in this example were synthesized on an automated DNA synthesizer (although other synthesis methods can be used). During the synthesis, the synthesizer was programmed to provide a mixture of each of the four nucleotides A, G, C, and T.
- the population of constructs was amplified with PCR under standard conditions and purified by polyacrylamide gel electrophoresis (PAGE), followed by elution into buffer including 50mM NaCl and 50mM Tris-HCl (pH 8.0). The result was a population of polynucleotide duplexes (the PCR reaction provided the complementary strand). It was determined through appropriate control experiments that the synthesis and PCR amplification of the duplexes resulted in correct binding sites for BamHI.
- Lanes 2 through 10 show the result of incubating duplex polynucleotide aliquots with varying amounts of BamHI as indicated in the legend. It was found that the ratio of shifted to unshifted molecules increased with increasing BamHI concentration, as expected. The duplexes having highest affinity for BamHI should be shifted at relatively low BamHI concentrations, while lower-affinity duplexes will begin to bind as the BamHI concentration is increased.
- Samples of shifted and unshifted duplexes were then digested with 200 units of EcoRI per microgram of duplex polynucleotide.
- the polynucleotides were cleaved at the EcoRI recognition sites, purified by PAGE, and ligated into Lambda ZAP vector (predigested with EcoRI and treated with CIAP) at a 1:1 insert:vector ratio in the presence of 2 U of T4 ligase in 5 microliters of T4 ligase buffer at 40°C overnight.
- the ligated samples were then packaged using Gigapack II Gold packaging extract (Stratagene), cloned into E. coli XL1-Blue host strain and subjected to blue/white selection.
- the recombinant (white) clones were selected and eluted in 500 microliters SM buffer (100 mM NaCl, 8 mM MgSO4, 50 mM Tris-HCl, pH 7.5, 0.01% gelatin, 0.04% chloroform).
- Ten microliters of the eluate was amplified by PCR using T3/T7 primers, purified by Qiagen PCR purification kit and sequenced.
- methods such as the methods described above can be used to generate compilations of data for the prediction of the reactivity of a potential binding site for a ligand based upon sequences which flank the ligand binding site.
- Example 2: This Example describes a method for providing nucleotide sequences that represent all sequence possibilities. A minimal number of nucleotide repeat sequences of any specified length are created that represent all sequence possibilities using the nucleotides A, G, C, and T.
- a "pure repeat" is a repeating polynucleotide (e.g., DNA) sequence for which all base positions are defined.
- a pure dinucleotide repeat is (AG)$_n$, where n is an integer and A and G are the defined nucleotides. Pure trinucleotide repeats, tetranucleotide repeats, pentanucleotide repeats, and higher repeating units are also possible.
- An "impure repeat" is a repeating polynucleotide (e.g., DNA) sequence in which random, non-repeating nucleotides may occur.
- an impure trinucleotide repeat is (AGX)$_n$, where n is an integer and X is a random nucleotide. Impure trinucleotide repeats, tetranucleotide repeats, pentanucleotide repeats, and higher repeating units are also possible.
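The two repeat types defined above can be generated in silico as follows. This is a minimal sketch: the function names are hypothetical, and the assumption that each X is drawn independently and uniformly from A, G, C and T is made here for illustration (on a synthesizer, X would instead be a mixed-base position).

```python
import random


def pure_repeat(unit, n):
    """Pure repeat: every base position is defined, e.g. (AG)n."""
    return unit * n


def impure_repeat(unit, n, rng=None):
    """Impure repeat, e.g. (AGX)n.

    Each 'X' in the unit is replaced by an independently drawn random
    nucleotide (assumption: uniform over A, G, C, T).
    """
    rng = rng or random.Random()
    return "".join(
        rng.choice("AGCT") if ch == "X" else ch
        for _ in range(n)
        for ch in unit
    )


print(pure_repeat("AG", 4))
print(impure_repeat("AGX", 5, random.Random(0)))
```

Passing a seeded `random.Random` makes the impure repeat reproducible, which is convenient when the same flanking sequence must be regenerated later.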
- flanking sequences can be used to probe flanking sequence reactivity, e.g., in a binding experiment analogous to the experiment described in Example 1, supra.
- instead of random flanking sequences surrounding a BamHI binding site, the flanking sequences include pure or impure repeat units. Synthesis of a population of duplex polynucleotides having pure dinucleotide repeats, pure trinucleotide repeats, impure dinucleotide repeats, impure trinucleotide repeats, and the like, can be performed according to standard methods, e.g., on a DNA synthesizer.
- the standard deviation (SD) of DNA free energy for a protein shared by different organisms was studied.
- the calculation of the SD of energy for a gene starts with extracting the coding sequences for the protein of interest and translating them into their respective protein sequence. These sequences are aligned simultaneously to provide a representation of the protein sequences containing gaps which can be used for intersequence comparison.
- a consensus protein sequence is determined.
- the energy profile for each of the DNA sequences is calculated using known hydrogen-bonding and base stacking free energies over an interval or window of the DNA sequence. This window is then moved one base step along the sequence and a new free energy value is calculated for the new window. This procedure is continued for the entire length of the DNA.
- the energy profiles are then gapped according to the gapped protein sequences.
- the SD and mean free energy values are calculated.
- the distribution of energies from the permutations is calculated for each base position. Upper and lower confidence limits for each base position are then calculated.
- Regions where the mean free energy exceeds the confidence limits are tabulated.
- the confidence limit outliers are cross-correlated with the positions of low SD. Regions of cross-correlation are prime candidates for antigenic and nonantigenic response for the protein of interest.
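The final steps of this procedure (per-position mean and SD across gapped energy profiles, flagging of positions outside the confidence limits, and cross-correlation with low-SD positions) can be sketched as follows. The function names, the use of `None` to mark an alignment gap, the population form of the SD, and the `sd_cutoff` threshold are assumptions made for illustration; the confidence limits are taken as given inputs rather than derived from the permutation distributions.

```python
from statistics import mean, pstdev


def column_stats(profiles):
    """Per-position (mean, SD) over aligned energy profiles.

    profiles: equal-length lists of window free energies, one per organism;
    None marks an alignment gap and is skipped.
    """
    stats = []
    for col in zip(*profiles):
        vals = [v for v in col if v is not None]
        # Assumption: population SD; the source does not say which form is used.
        stats.append((mean(vals), pstdev(vals)) if vals else (None, None))
    return stats


def flag_outliers(means, lower, upper):
    """+1 above the upper limit, -1 below the lower limit, 0 otherwise."""
    flags = []
    for m, lo, hi in zip(means, lower, upper):
        if m is None:
            flags.append(0)
        elif m > hi:
            flags.append(1)
        elif m < lo:
            flags.append(-1)
        else:
            flags.append(0)
    return flags


def candidate_positions(flags, sds, sd_cutoff):
    """Positions that are outliers AND have low SD across organisms."""
    return [
        i for i, (f, sd) in enumerate(zip(flags, sds))
        if f != 0 and sd is not None and sd <= sd_cutoff
    ]
```

A position is returned only when both conditions hold, mirroring the cross-correlation step: an outlier with high SD across organisms is not treated as a candidate.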
- the analysis of the lactate dehydrogenase genes from bacteria along with some data from the mouse beta globin gene and HTLV-I LTR gene are presented.
- the distribution of free energy states available to the DNA as determined by the amino acid sequence is calculated for each position along the mouse beta globin gene with a 15 base window and defined in discrete energy ranges. A representation of this data is shown in Figure 7. Each distribution represents the probability of finding the DNA at any particular energy.
- the lactate dehydrogenase (LDH) genes from three different bacilli were analyzed using the method described above.
- the low temperature bacillus has a larger number of points above the upper 95% confidence limit.
- the ambient temperature bacillus has a number of points exceeding the upper and lower 95% confidence limits which is between the number of points found for the high and low temperature bacilli.
- the mutation rate is dependent on the DNA stability. Too high a mutation rate may kill the organism due to lethal mutations, and too low a mutation rate would not allow the organism to adapt to environmental changes, thereby reducing the survivability of the organism.
- lactate dehydrogenase was chosen since it is a very common gene, there is a large database of information about the protein structure and antigenicity and many gene sequences are available from Genbank. The search for gene sequences was limited to gram positive bacteria and provided 27 unique complete lactate dehydrogenase gene sequences for the analysis. The methodology described above for determining the SD of energy and the unusually high and low energy positions was applied to this set of gene sequences.
- Positions along the DNA which have a mean free energy deviation that is higher or lower than the 95% confidence limits are designated as +1 or -1, respectively. All points which fall within the 95% confidence limits are assigned a 0.
- the result is plotted in the second line graph of Figure 10. From a total of 400 points (amino acid positions) 22 exceeded the 95% confidence limits.
- the antigenic and nonantigenic sites relative to the consensus gapped DNA determined by Hogrefe et al., J.B.C., 1987 for mouse lactate dehydrogenase in rabbits are displayed in the two bottom graphs of Figure 10.
- the vertical bars around base positions 600 and 825 denote the only regions where we find low SD of energy and the mean free energy deviation exceeds the 95% confidence limits. In fact, the region with the unusually stable sequence at 600 is nonantigenic and the region at 825 with an unusually unstable sequence is the second most antigenic region in the protein.
- regions of the coding DNA that have a low SD of energy and are stable with respect to the overall possible range of energy states available for the peptide code for nonantigenic regions of the protein. In other words, these regions of the protein look like "self" and thus do not induce an antigenic response. Conversely, regions of the coding DNA which have a low SD of energy and are unstable with respect to the overall possible range of energy states available for the peptide code for antigenic regions of the protein. Based on the consensus sequence determined from a set of coding sequences for a particular protein from different organisms and the statistical variation of the free energy of the DNA, it is possible to predict which peptides will be the most likely candidates for eliciting an antigenic response. One advantage of this technique is the short time that is necessary to determine the most effective peptide candidates for inducing an antigenic response.
- the stable regions exhibit 2/3 of their GC substitution bias in the 3rd position of the codon. Those that are unstable exhibit 50% of their GC substitution in the 3rd position.
- the stable regions appear to have a higher prevalence of GC substitutions than the unstable regions, which appear to be more like a normal DNA sequence. Nevertheless, the stable regions with the higher rate of substitution change their amino acid (i.e., make nonsynonymous substitutions) only 12.5% of the time, whereas the unstable regions make nonsynonymous substitutions 27% of the time.
- the unstable regions look more like normal or random DNA than the stable regions, which have a higher rate of GC substitution that does not change the amino acid identity.
- a section of a gene ("Wild Type Sequence"; the sequence is shown in Figure 11) was replaced with a DNA sequence which codes for the same polypeptide sequence, but which is richer in A and T than the wildtype sequence, and therefore is energetically less stable than the wildtype ("Unstable Insert Sequence", shown in Figure 11).
- the altered gene was inserted into a plasmid and used to transform E. coli.
- the wildtype sequence was also inserted into a plasmid and used to transform E. coli as a control.
- the plasmids included genes to permit blue/white selection of bacteria (e.g., see Example 1, supra).
- Transformed bacteria were then subjected to chemical mutagenesis conditions to cause mutations in the sequences.
- the level of mutagenesis was monitored by scoring the number of white colonies that result (white colonies form if the lacZ alpha-complementation, also carried on the plasmids, becomes mutated to the point where it is no longer functional). Isolated blue colonies were selected for expansion and the plasmid isolated from the resulting colony. This plasmid was then used to transform stock bacteria and the cycle of mutagenesis and plasmid isolation was repeated for a total of four cycles.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU33049/99A AU3304999A (en) | 1998-02-21 | 1999-02-19 | Thermodynamic properties of nucleic acids |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US7563398P | 1998-02-21 | 1998-02-21 | |
US60/075,633 | 1998-02-21 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1999042621A2 true WO1999042621A2 (fr) | 1999-08-26 |
WO1999042621A3 WO1999042621A3 (fr) | 2000-03-16 |
Family
ID=22127047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/003754 WO1999042621A2 (fr) | 1998-02-21 | 1999-02-19 | Proprietes thermodynamiques d'acides nucleiques |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU3304999A (fr) |
WO (1) | WO1999042621A2 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999063074A3 (fr) * | 1998-06-04 | 2000-04-06 | Tm Technologies Inc | Methode permettant de modifier le niveau d'expression relatif d'un quelconque gene; procedes et produits connexes |
WO2001037191A3 (fr) * | 1999-11-19 | 2002-03-21 | Proteom Ltd | Procede de manipulation de donnees de sequences de proteines ou d'adn servant a generer des ligands peptidiques complementaires |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2112130C (fr) * | 1991-06-27 | 1996-08-06 | Cynthia A. Edwards | Analyse de detection de molecules de fixation de l'adn |
CA2165163A1 (fr) * | 1993-06-17 | 1995-01-05 | Michael J. Lane | Thermodynamique, conception et utilisation de sequences d'acide nucleique |
JP2001523952A (ja) * | 1997-02-24 | 2001-11-27 | ティーエム テクノロジーズ,インク. | アンチセンス・オリゴヌクレオチドを選択するプロセス |
WO1999032664A1 (fr) * | 1997-12-23 | 1999-07-01 | Tm Technologies, Inc. | Procede de selection de sequences adjacentes porteuses d'affinites de liaison relatives a un site de liaison aux ligands |
-
1999
- 1999-02-19 AU AU33049/99A patent/AU3304999A/en not_active Abandoned
- 1999-02-19 WO PCT/US1999/003754 patent/WO1999042621A2/fr active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999063074A3 (fr) * | 1998-06-04 | 2000-04-06 | Tm Technologies Inc | Methode permettant de modifier le niveau d'expression relatif d'un quelconque gene; procedes et produits connexes |
WO2001037191A3 (fr) * | 1999-11-19 | 2002-03-21 | Proteom Ltd | Procede de manipulation de donnees de sequences de proteines ou d'adn servant a generer des ligands peptidiques complementaires |
US6721663B1 (en) | 1999-11-19 | 2004-04-13 | Proteom Limited | Method for manipulating protein or DNA sequence data in order to generate complementary peptide ligands |
Also Published As
Publication number | Publication date |
---|---|
WO1999042621A3 (fr) | 2000-03-16 |
AU3304999A (en) | 1999-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cech et al. | Secondary structure of the Tetrahymena ribosomal RNA intervening sequence: structural homology with fungal mitochondrial intervening sequences. | |
Harris et al. | Use of photoaffinity crosslinking and molecular modeling to analyze the global architecture of ribonuclease P RNA. | |
Wells et al. | The role of DNA structure in genetic regulatio | |
Hanvey et al. | Site-specific inhibition of Eco RI restriction/modification enzymes by a DNA triple helix | |
Su et al. | Mispair specificity of methyl-directed DNA mismatch correction in vitro. | |
Lowary et al. | New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning | |
Wang et al. | The actinomycete Thermobispora bispora contains two distinct types of transcriptionally active 16S rRNA genes | |
Komine et al. | Genomic organization and physical mapping of the transfer RNA genes in Escherichia coli K12 | |
Nielsen et al. | Sequence specific inhibition of DNA restriction enzyme cleavage by PNA | |
Woese et al. | Secondary structure model for bacterial 16S ribosomal RNA: phylogenetic, enzymatic and chemical evidence | |
EP2245187B1 | Methods for accurately determining sequence data and a modified base position | |
US5556747A (en) | Method for site-directed mutagenesis | |
US6242222B1 (en) | Programmed sequential mutagenesis | |
Blackburn et al. | DNA termini in ciliate macronuclei | |
Westhof et al. | Mapping in three dimensions of regions in a catalytic RNA protected from attack by an Fe(II)-EDTA reagent | |
Yoshizawa et al. | Nuclease resistance of an extraordinarily thermostable mini-hairpin DNA fragment, d(GCGAAGC) and its application to in vitro protein synthesis | |
Zito et al. | Lead-catalyzed cleavage of ribonuclease P RNA as a probe for integrity of tertiary structure | |
Nieuwlandt et al. | The RNA component of RNase P from the archaebacterium Haloferax volcanii. | |
Qian et al. | Structural alterations far from the anticodon of the tRNA(Pro)GGG of Salmonella typhimurium induce +1 frameshifting at the peptidyl-site | |
Jilk et al. | The organization of the outside end of transposon Tn5 | |
Hansen et al. | Structural isomers of bis-PNA bound to a target in duplex DNA | |
Sayre et al. | TF1, the bacteriophage SPO1-encoded type II DNA-binding protein, is essential for viral multiplication | |
Munishkin et al. | Systematic deletion analysis of ricin A-chain function: single amino acid deletions | |
Senger et al. | The presence of a D-stem but not a T-stem is essential for triggering aminoacylation upon anticodon binding in yeast methionine tRNA | |
WO1999042621A2 | Thermodynamic properties of nucleic acids |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
AK | Designated states | Kind code of ref document: A3; Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
NENP | Non-entry into the national phase | Ref country code: KR |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
122 | EP: PCT application non-entry in European phase | |