Methods and adaptors for analyzing specific nucleic acid populations
TECHNICAL FIELD
The present invention relates to methods and kits for generating or analyzing nucleic acid populations, and enriching desired nucleic acid sequences or sub-sets of populations, by use of repeated hybridization and cleavage reactions.
BACKGROUND ART
Various comparative nucleic acid techniques are available to analyze differences in nucleic acid populations. One widely known technique is referred to as "representational difference analysis" (RDA); see, for example, U.S. Pat. No. 5,436,142 and Lisitsyn et al., Science, 259:946 (1993). RDA is a subtractive hybridization method that uses restriction digestion of genomic DNA, followed by amplification and selection methods, to isolate molecules that are present in one nucleic acid population but not in a second nucleic acid population.
Other RDA-based and related methods are disclosed by Birkenmeyer et al. (U.S. Pat. No. 6,455,255), "Method of performing subtractive hybridization using RDA"; Hou et al., Nucl. Acids Res. 24:2196 (1996); Sorge et al., U.S. Pat. No. 6,017,701; Suzuki et al., Nucl. Acids Res. 24:797 (1996); and Burmer et al., U.S. Pat. No. 5,935,788. For a review, see O. D. Ermolaeva et al., Genetic Anal.: Biomol. Eng. 13:49-58 (1996).
Generally speaking, these methods may require multiple steps and numerous costly reagents, and may be relatively slow. Thus, there remains a need in the art for new and improved methods to generate specific populations of nucleic acids as used, for example, in producing subtracted libraries.
DISCLOSURE OF THE INVENTION
The present invention relates to novel methods of enriching target nucleic acids present in a sample. In the description of the invention, the following terms are used for brevity:
A nucleic acid sample containing or believed to contain one or more target nucleic acids may be referred to as the "tester" sample.
A nucleic acid sample which is believed to specifically lack a target nucleic acid, or to include it only at a relatively low level, is referred to as the "driver" sample.
It should be noted that these terms are used relatively in respect of particular target nucleic acid levels. If different targets are present at different levels in two samples then each may be a "tester" for the nucleic acids it contains in excess.
Hybrid double stranded nucleic acids formed after combining the tester and driver nucleic acid samples are denoted, for example, "driver/driver", "tester/driver", and "tester/tester".
Nucleic acid complexes formed after covalently attaching adaptors to the tester and driver nucleic acid samples are denoted, for example, "adaptorA: tester" and "adaptorB: driver".
Summary of the invention
Briefly, the present invention is based on the use of novel adaptors which may be used, for example, to preferentially enrich target nucleic acids represented only or primarily in a tester sample, by allowing the tester sample to be depleted of the nucleic acids which are also present in a driver sample. The methods thus permit the preferential amplification of nucleic acids in a manner in which physical separation techniques or other labour-intensive processes may be avoided.
Thus in a first aspect the invention provides a method for enriching target nucleic acids present in a tester sample, the method comprising:
(a) providing tester and driver nucleic acid samples;
(b) covalently attaching a first adaptor to said tester nucleic acid such as to form adaptor : tester nucleic acid complexes;
(c) covalently attaching a second adaptor to said driver nucleic acid such as to form adaptor : driver nucleic acid complexes;
(d) combining said adaptor : tester nucleic acid complexes and said adaptor : driver nucleic acid complexes to form a combined sample;
(e) subjecting said combined sample to denaturing and annealing conditions such that hybrid non-target double-stranded adaptor : tester/adaptor : driver nucleic acid complexes are formed;
(f) adding restriction enzymes capable of preferentially digesting said hybrid non-target nucleic acid complexes and incubating under restriction digestion conditions;
(g) optionally repeating steps (e) and (f).
In order to consider the steps in more detail, it is convenient to refer to the first adaptor (used with the tester nucleic acid) as adaptor A, and the second (used with the driver nucleic acid) as adaptor B. It will be appreciated that these designations are used for ease of explanation only, and are not intended to be limiting. The structure of the adaptors is discussed in more detail hereinafter.
Preferably the nucleic acid samples of step (a) are provided as restriction-enzyme treated samples since this may facilitate the ligation of the adaptors.
In steps (b) and (c), adaptor A is covalently attached to the ends of nucleic acids in a tester nucleic acid sample, and adaptor B is covalently attached to the ends of nucleic acids in a driver nucleic acid sample. Any convenient method for covalently attaching the adaptors onto the ends of the tester and driver nucleic acid sample fragments can be employed. Ligation (for example using a ligase) is a preferred method for covalent attachment. Such ligation methods are known to and used by those of skill in the art.
An alternative method to achieve covalent attachment of the desired adaptor to the nucleic acid sample is a combination of ligation of an 'ordinary' adaptor to the nucleic acid sample, amplification and exonuclease digestion, which turns an ordinary adaptor : sample complex into the desired adaptor : sample complex. As discussed below, an ordinary adaptor is a precursor adaptor having a simpler (usually double stranded) structure than the adaptors used in the preferential elimination/protection steps of the present invention, but which can be converted into a preferential elimination/protection adaptor by enzymatic activity, for example by exonuclease activity.
Thus in one embodiment, an ordinary adaptor may be ligated to the driver sample. The ligated driver sample is then amplified by PCR using primers having a sequence identical or substantially identical to the up-strand of the ordinary adaptor. The amplified driver nucleic acid is treated with an exonuclease that acts in the 5' to 3' direction, catalyzing the removal of 5' nucleotides from duplex DNA. This treatment removes some nucleotides from the 5' ends and leaves 3' overhangs. The resulting adaptor : driver complex is functionally identical to the ligation complex of adaptor B in Fig. 1B and the driver sample. The exonuclease used herein can be, but is not limited to, lambda exonuclease or T7 exonuclease. If lambda exonuclease is used, the 5' ends of the amplified product or of the primers used for amplification are 5' phosphorylated. To prevent removal of too many nucleotides from the 5' ends, blocking nucleotides may be incorporated into the double stranded DNA. One method for adding blocking nucleotides is to modify the amplification primer by synthesizing the 3' part of the primer with phosphorothioated nucleotides, which are resistant to exonuclease digestion. Another method is to add phosphorothioated nucleotides to the nucleotide mix during the PCR amplification. Any other method having a similar effect known to those skilled in the art can be used.
In another embodiment, an 'ordinary' adaptor may be ligated to the tester sample. The ligated tester sample is then amplified by PCR using primers having a sequence identical or substantially identical to the up-strand of the ordinary adaptor. The amplified tester nucleic acid is treated with an exonuclease that acts in the 3' to 5' direction, catalyzing the removal of 3' nucleotides from duplex DNA. This treatment removes some nucleotides from the 3' ends and leaves 5' overhangs. The resulting adaptor : tester complex is functionally identical to the ligation complex of adaptor A in Fig. 1B and the tester sample. The exonuclease used herein can be, but is not limited to, exonuclease III or the 3'-5' exonuclease activity of any DNA polymerase.
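The effect of the two exonuclease treatments on the ends of an amplified duplex can be pictured with a toy string model. The following Python sketch is an illustration only: the example sequence, the number of nucleotides removed and the helper names are hypothetical, and real exonuclease digestion is not this uniform.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def exo_digest(top, n, direction):
    """Trim n nucleotides from both strands of a blunt-ended duplex.

    top       -- top strand written 5'->3'; the bottom strand is its reverse complement
    direction -- "5to3" (e.g. lambda or T7 exonuclease) or "3to5" (e.g. exonuclease III)
    Returns (top, bottom, overhang_type), with both strands written 5'->3'.
    """
    if not 0 < n < len(top) // 2:
        raise ValueError("n must leave a base-paired core")
    bottom = revcomp(top)
    if direction == "5to3":
        # 5' ends are removed, so the intact 3' ends are left unpaired
        return top[n:], bottom[n:], "3' overhang"
    if direction == "3to5":
        # 3' ends are removed, so the intact 5' ends are left unpaired
        return top[:-n], bottom[:-n], "5' overhang"
    raise ValueError("direction must be '5to3' or '3to5'")


def overhang(strand, n, overhang_type):
    """Single-stranded tail of one trimmed strand, written 5'->3'."""
    return strand[-n:] if overhang_type == "3' overhang" else strand[:n]
```

In this model a driver amplicon treated in the "5to3" direction ends in 3' overhangs, and a tester amplicon treated in the "3to5" direction ends in 5' overhangs, which matches the two embodiments described above.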
The denaturing conditions in steps (d) and (e) can be any suitable conditions that are known in the art, such as alkali treatment or high temperature (94-100 °C). High temperature treatment is preferred.
In one embodiment, the adaptorA: tester nucleic acid complexes and adaptorB: driver nucleic acid complexes are combined at a ratio of about 1:1, and this may be preferred where the nucleic acid samples both contain target nucleic acids which are to be simultaneously isolated. In another embodiment, the adaptorA: tester nucleic acid complexes and adaptorB: driver nucleic acid complexes are combined at a ratio of 1:5 to about 1:500 and, more preferably, at a ratio of about 1:10 to 1:100.
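To give a feel for why an excess of driver may be preferred, the following Python sketch estimates what share of the shared (non-target) tester strands would end up in tester/driver hybrids after a single denature/anneal cycle. It is a simplified illustration, not part of the disclosed method: it assumes that reannealing is complete and that strands pair at random in proportion to their abundance, and the function name is hypothetical.

```python
from fractions import Fraction


def hybrid_fraction(tester_parts, driver_parts):
    """Expected share of shared tester strands paired with a driver strand
    after one cycle, assuming random and complete reannealing."""
    if tester_parts <= 0 or driver_parts < 0:
        raise ValueError("tester_parts must be > 0 and driver_parts >= 0")
    return Fraction(driver_parts, tester_parts + driver_parts)


# Tester:driver ratios taken from the ranges stated in the text
for driver_parts in (1, 5, 10, 100, 500):
    share = hybrid_fraction(1, driver_parts)
    print(f"1:{driver_parts:<3}  hybrid share = {float(share):.3f}")
```

Under these assumptions a 1:1 mixture leaves half of the shared tester strands in homoduplexes after one cycle, whereas a 1:100 mixture leaves about 1%.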
Typically, the hybridization reaction will be carried out at high stringency conditions, usually at about 60-70 °C and generally at least about equivalent to or greater than about 0.1 M NaCl, usually about 1 M NaCl. The various buffers and salt concentrations used can be adjusted to achieve the necessary stringency using techniques known to those of skill in the art.
The effect of steps (d) and (e) is to produce a combined sample which includes a mixture of nucleic acid species such as the original double-stranded adaptorA: tester nucleic acid complexes and double-stranded adaptorB: driver nucleic acid complexes; single stranded adaptorA: tester nucleic acids; single stranded adaptorB: driver nucleic acids; and hybrid (non-target) double-stranded adaptor : tester/adaptor : driver nucleic acids.
In step (f), restriction enzymes capable of specifically digesting complexes containing double stranded hybrid adaptorA/adaptorB, preferentially over other double stranded nucleic acid complexes, are added. These therefore digest the adaptor regions of the double-stranded adaptorA: tester/adaptorB: driver nucleic acid complexes. The tester homoduplex and driver homoduplex double stranded nucleic acid complexes do not contain functional restriction sites and are resistant to cleavage. The single stranded tester and driver nucleic acids are also resistant to cleavage. The incubation conditions used will be appropriate to the restriction enzymes in question, and those skilled in the art will be able to select such conditions, based on the choice of enzyme, without difficulty.
Regarding step (g), it is preferred that the process of denaturing, annealing and digesting is repeated several times, preferably in a single, homogeneous sample mixture. Generally, the process is repeated 2 to 10 times, and 3 to 6 times is preferred. If the restriction enzymes are inactivated during the denaturing process, new aliquots of the restriction enzymes are added for each digestion. Preferably thermo-stable restriction enzymes are used, to minimise the need for this.
One effect of steps (f) and (g) is that non-target sequences are gradually incorporated into hybrids and preferentially digested, while target adaptor : tester nucleic acid complexes (which cannot readily form hybrids) remain intact. The target adaptor : tester nucleic acid complexes are predominantly present as double-stranded homoduplexes because of the annealing process. However, single-stranded target adaptor : tester nucleic acid complexes may be present, especially in instances where the target nucleic acid is present at a low concentration in the sample. Both the double-stranded homoduplex and the single-stranded intact adaptor : tester nucleic acid complexes that are enriched may be referred to herein as "desired nucleic acids".
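The gradual depletion of non-target sequences over repeated cycles can be illustrated with a toy model. The Python sketch below is not part of the disclosed method and rests on strong simplifying assumptions, all of them hypothetical: every cycle reanneals completely, strands of one species pair at random in proportion to their abundance, and every tester/driver hybrid is digested so that it no longer counts as an intact adaptor : tester or adaptor : driver complex. The species names and amounts are invented for the example.

```python
def digest_round(tester, driver):
    """One denature/anneal/digest cycle.

    tester, driver -- dicts mapping species name to relative amount of intact
    adaptor-bearing duplex. With random pairing, the intact tester amount of a
    species falls from t to t*t/(t+d), and the driver amount from d to d*d/(t+d).
    """
    new_tester, new_driver = {}, {}
    for species in set(tester) | set(driver):
        t = tester.get(species, 0.0)
        d = driver.get(species, 0.0)
        total = t + d
        new_tester[species] = t * t / total if total else 0.0
        new_driver[species] = d * d / total if total else 0.0
    return new_tester, new_driver


def enrich(tester, driver, rounds):
    """Apply the cycle `rounds` times and return the remaining intact tester amounts."""
    for _ in range(rounds):
        tester, driver = digest_round(tester, driver)
    return tester


if __name__ == "__main__":
    tester = {"shared": 1.0, "target": 1.0}   # target is absent from the driver
    driver = {"shared": 100.0}                # 1:100 tester:driver for the shared species
    for rounds in (1, 2, 3):
        print(rounds, enrich(tester, driver, rounds))
```

In this model the target species, which has no driver partner, keeps its full amount in every cycle, while the shared species falls by roughly two orders of magnitude in the first cycle and faster thereafter. Real samples will deviate from this because annealing and digestion are incomplete.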
Amplification of targets
In a preferred further step (h), the (undigested) double- and single-stranded adaptor : tester nucleic acid complexes are amplified by known methods such as PCR. Alternatively, although this is less preferred, other species may be amplified, e.g. the (undigested) double- and single-stranded adaptor : driver nucleic acid complexes.
Generally, the desired target nucleic acid can be directly amplified from the hybridization and digestion mixture by using primers capable of hybridising to priming portions of the adaptors on the target nucleic acid of interest.
Preferably such priming portions are preferentially or specifically excised from hybrid non-target nucleic acid complexes in step (f) such as to prevent the amplification thereof. Such priming portions are described in more detail below.
A variety of primer-dependent polynucleotide amplification techniques may be used for amplification. Such techniques include strand displacement amplification, 3SR amplification, and the like. The polymerase chain reaction (PCR) is particularly preferred for amplifying the desired nucleic acid. The polymerase chain reaction is described in, among other places, Dieffenbach and Dveksler, PCR Primer, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1995) and U.S. Pat. Nos. 4,683,202; 4,683,195; 4,800,159; 4,965,188; and 5,333,675.
Alternatively or additionally, before amplification the desired target nucleic acid may be captured on a solid support by using biotinylated moieties tagged on the adaptors. It will be readily apparent to those skilled in the art that, in addition to biotin/streptavidin capture, other ligand/anti-ligand capture methods can be employed to isolate the target nucleic acid of interest.
In preferred embodiments the candidate target sequence is then retested for its presence or absence in tester and driver, using conventional means e.g. hybridisation or amplification using the target sequence as the basis for a probe or primers.
The sequence information contained in the nucleic acid subset obtained by performing steps (a) to (h) can itself, whether as a collection of clones, as purified molecules, or as a nucleotide sequence, be useful for research applications. Furthermore, desired nucleic acids or populations can be incorporated into appropriate vectors to produce libraries. One skilled in the art is familiar with many examples of appropriate vectors and the use of these vectors for producing libraries.
Some particular embodiments and aspects of the method will now be further described:
The adaptors
The adaptors of the present invention are designed to preferentially eliminate undesired nucleic acids prior to an amplification step. More specifically, when covalently attached to nucleic acids, the adaptors function to protect desired target nucleic acid complexes from cleavage by restriction enzymes, while other nucleic acid complexes are rendered susceptible to cleavage by selected enzymes. The protected target nucleic acids can then be preferentially amplified.
Preferably, the methods employ adaptors comprising cleavage sites, such as restriction enzyme sites. Usually two adaptors (termed herein, for convenience only, Adaptor A and Adaptor B) are covalently attached respectively to the tester and driver nucleic acids.
Each adaptor comprises two strands, termed an "up-strand" and a "down-strand" (Fig. 1). The up-strand is also sometimes referred to as the "first strand"; the down-strand is also sometimes referred to as the "second strand". As depicted non-limitingly in Fig. 1, the up-strand has its 5' end to the left and 3' end to the right; the down-strand has its 3' end to the left and 5' end to the right.
The adaptors may contain sufficient double-stranded sequence to permit them to be ligated to the sample nucleic acids.
Alternatively, the covalent attachment of the desired adaptor to the sample nucleic acids is achieved by the combination of ligation of an ordinary adaptor, amplification and exonuclease digestion. As described below, each adaptor also comprises at least one strand having a "complementary portion", the function of which is to generate a functional double stranded restriction site in hybrid complexes. At least one strand of one adaptor (generally Adaptor A) also has a "priming portion", the function of which is to facilitate specific amplification of intact double- and single-stranded adaptorA: tester nucleic acid complexes, but to be cleaved from hybrid complexes. It will therefore generally be in a 5' direction to a restriction site in the adaptor; in Fig. 1 it is at the 5' end of the up-strand.
Complementary portion
The "complementary" portion present in at least one strand in each adaptor comprises a restriction enzyme site sequence which is the target of a restriction enzyme having double-stranded nucleic acid specificity.
The restriction site in the adaptor is non-functional and therefore cannot be cleaved by the corresponding enzyme. However, hybridisation of the complementary portions of two different adaptors creates a functional restriction site. In practicing the invention, restriction enzymes having double-stranded nucleic acid specificity are used; thus any design that affects the double strandedness may render the restriction site non-functional. This includes, but is not limited to, the following embodiments. Those skilled in the art will readily recognise that other similar designs are possible.
Thus in one embodiment, this region of each adaptor may be single stranded, such that the restriction site is non-functional (see e.g. Figure 1B).
Alternatively the complementary portion of the adaptor may be double-stranded but contain one or more mismatches in the region of the restriction enzyme site i.e. the complementary portion of the adaptor is not 100% complementary, but is substantially complementary, and capable of forming a double stranded portion. Nevertheless, the mismatches in or surrounding the restriction enzyme site prevent the adaptor from being cleaved by the corresponding restriction enzyme.
In one embodiment, the complementary portion of the adaptor is double-stranded and each strand contains a different restriction site sequence in the corresponding position (i.e. a target site for different restriction enzymes). In one preferred embodiment, the restriction sites of the up- and down-strands within one adaptor are related (Fig. 1A). The term "related" means that the two restriction site sequences contain at least two matched nucleotides, preferably either three or four matched nucleotides. An example of related restriction site sequences is AGATCT (Bgl II site) and TGATCA (Bcl I site). In this case the mismatches in the restriction enzyme sites prevent the adaptors from being cleaved by either of the corresponding restriction enzymes.
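The notion of "related" sites can be checked mechanically. The short Python sketch below counts position-by-position matches between two site sequences of equal length; treating "matched nucleotides" as identical bases at the same position is an interpretation made for this illustration, and the function names are hypothetical.

```python
def matched_positions(site_a, site_b):
    """Number of positions at which two equal-length site sequences carry the same base."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must have the same length")
    return sum(a == b for a, b in zip(site_a, site_b))


def are_related(site_a, site_b, minimum=2):
    """True if the sites differ but share at least `minimum` matched positions."""
    return site_a != site_b and matched_positions(site_a, site_b) >= minimum


print(matched_positions("AGATCT", "TGATCA"))  # Bgl II vs Bcl I -> 4
print(are_related("AGATCT", "TGATCA"))        # -> True
```

For the Bgl II and Bcl I example the central GATC is shared and only the two outer positions differ, so the pair falls within the preferred range of three or four matched nucleotides.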
However, in all cases the "complementary" portion present in at least one strand in each adaptor is complementary or substantially complementary to the "complementary" portion present in at least one strand of the other adaptor. Thus, the up-strand restriction site region of adaptor A completely matches the down-strand restriction site region of adaptor B. This match makes double stranded hybrid adaptorA/adaptorB susceptible to cleavage by relevant restriction enzymes.
Where both strands of each adaptor comprise a different, optionally related, restriction site, similarly the down-strand restriction site region of adaptor A matches the up-strand restriction site region of adaptor B. These matches make double stranded hybrid adaptorA/adaptorB susceptible to cleavage by both of the relevant restriction enzymes.
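The matching rules above can be sketched in Python using the Bgl II/Bcl I example. This is an illustration under stated assumptions: only the six-base site region of each strand is modelled, a site is treated as functional only when the two strands are perfectly complementary across it, and the assignment of sites to the strands of adaptor B is inferred from the matching rule in the text rather than taken from a figure. All names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


BGL_II = "AGATCT"
BCL_I = "TGATCA"

# Site region carried by each strand, written 5'->3' (illustrative assignment)
ADAPTOR_A = {"up": BGL_II, "down": BCL_I}
ADAPTOR_B = {"up": BCL_I, "down": BGL_II}


def forms_functional_site(up_site, down_site):
    """True if the two strands are fully complementary across the site region."""
    return down_site == revcomp(up_site)


duplexes = {
    "adaptorA homoduplex": (ADAPTOR_A["up"], ADAPTOR_A["down"]),
    "adaptorB homoduplex": (ADAPTOR_B["up"], ADAPTOR_B["down"]),
    "hybrid A-up/B-down": (ADAPTOR_A["up"], ADAPTOR_B["down"]),
    "hybrid B-up/A-down": (ADAPTOR_B["up"], ADAPTOR_A["down"]),
}
for name, (up_site, down_site) in duplexes.items():
    print(name, "cleavable" if forms_functional_site(up_site, down_site) else "protected")
```

In this model both homoduplexes are protected, while the two hybrids regenerate an intact Bgl II site and an intact Bcl I site respectively.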
It will be understood that a complementary portion of an adaptor can thus be of any length and can contain any desired sequence. Usually, the restriction site containing complementary portion will be at least about 12 nt, more usually at least 17 nt, and generally fewer than about 200 nt, more usually fewer than about 100 nt. The non-restriction site containing complementary portion will be at least about 6 nt, more usually at least about 10 nt, and generally fewer than about 100 nt.
The priming portion
As discussed above, at least one strand of one adaptor (generally Adaptor A) additionally comprises a "priming portion". This includes sufficient additional nucleotides (additional with respect to those nucleotides that are required to form the complementary portion described above) to permit the adaptor to be used as a site for primer binding in a nucleic acid amplification reaction, e.g. to selectively amplify desired nucleic acid complexes by PCR following the digestion steps. A priming portion of an adaptor can thus be of any length and can contain any desired sequence. Usually, the priming portion will be at least about 12 nt, more usually at least 17 nt, and generally fewer than about 200 nt, more usually fewer than about 100 nt.
A priming portion may be present in only one strand of the adaptor (see e.g. Figure 1B). In this embodiment, a single primer which is identical or substantially identical to the single priming portion sequence may be used for amplifying desired target sequences. Before performing amplification using this primer, the resulting target double-stranded adaptor : tester nucleic acid complexes may need to be treated with DNA polymerase under nucleotide extension conditions to make the single stranded adaptor(s) double stranded.
Optionally, both strands of one adaptor may include a priming portion (see e.g. Figure 1). In such cases the priming portion of one strand of each adaptor can be complementary to the priming portion of the other strand of the same adaptor. In this case a single primer which is identical or substantially identical to a priming portion sequence may be used for amplifying desired target sequences.
However, it is preferred that each strand of one adaptor includes a priming portion, wherein the priming portion of one strand of the adaptor is non-complementary to the priming portion of the other strand of the same adaptor. The non-complementarity of the priming portions within one adaptor allows a primer pair to be used to amplify desired target nucleic acids: one primer in the primer pair can have a sequence identical or substantially identical to the up-strand priming portion sequence, whereas the other primer in the primer pair can have a sequence complementary or substantially complementary to the down-strand priming portion sequence.
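The primer pair described above can be derived from the two priming portions as in the following Python sketch. The sequences are invented placeholders, not sequences from this disclosure, and the model is simplified: a tester strand is represented as up-strand priming portion, insert, and down-strand priming portion, all written 5' to 3', with the complementary portions omitted.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical priming portions of adaptor A, written 5'->3'
UP_PRIMING = "GTCATGCCTGAACTGGTA"
DOWN_PRIMING = "CCTTAGACGTGCATCGAA"
INSERT = "GATCAAGGTTCCAAGGTTGATC"   # hypothetical tester fragment


def primer_pair(up_priming, down_priming):
    """Primer 1 is identical to the up-strand priming portion;
    primer 2 is complementary to the down-strand priming portion."""
    if up_priming == revcomp(down_priming):
        raise ValueError("priming portions are complementary; a single primer suffices")
    return up_priming, revcomp(down_priming)


def anneals_at_3prime_end(primer, template):
    """True if the primer is complementary to the 3' terminus of the template."""
    return template.endswith(revcomp(primer))


primer_1, primer_2 = primer_pair(UP_PRIMING, DOWN_PRIMING)
strand = UP_PRIMING + INSERT + DOWN_PRIMING   # one strand of an intact complex
first_copy = revcomp(strand)                  # product of extending primer_2
```

In this model primer 2 anneals to the original strand and primer 1 anneals to the copy made from it, so only strands that retain both priming portions support exponential amplification.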
In addition to the priming portion of Adaptor A above (i.e. the adaptor used in the tester sample, which can ultimately be used to amplify the target sequence from that sample), the other adaptor may also include priming portions. In such cases the priming portion of one adaptor is non-complementary or non-identical to the priming portion of any strand of the other adaptor. This allows different primer pairs specific for each adaptor to be used to amplify desired target sequences from mixed nucleic acid populations, each of which is ligated to one of the two different adaptors.
Ligating portion
As discussed above, the adaptors may contain sufficient double-stranded sequence to permit them to be ligated to the sample nucleic acids. Where the complementary region of each adaptor contains only a single restriction site sequence (whether this region is double or single stranded), the essential function of the other strand of the adaptor is to make the adaptor double-stranded at or near one terminus so that it is suitable for ligating to a fragmented nucleic acid sample (Fig. 1B).
Preferably the adaptors have a partially or fully double stranded "terminus" within, or at the end of, the complementary portion that is capable of being joined to the terminus of a sample restriction fragment. The terminus of the adaptor may have a 3' overhang, a 5' overhang, or may be blunt-ended. The terminus of the adaptor is designed to be compatible with the terminus of restriction fragments produced by a given restriction endonuclease (e.g. as used to provide restriction-enzyme treated samples in step (a)). Naturally, the nature of the terminus of the adaptor will vary in accordance with the nature of the termini of the restriction fragments for ligation. The terminal nucleotide moieties of the adaptor are selected so as to be compatible with the particular joining method used to join the adaptor to the restriction fragments of interest. For example, when the joining is catalyzed by a DNA ligase, the 5' terminal nucleotide is phosphorylated and the 3' terminal nucleotide has a hydroxy group.
As stated above, an alternative method to achieve covalent attachment of the adaptor having all desired properties described above except the ligating portion to the nucleic acid sample is a combination of ligation of an ordinary adaptor, amplification and exonuclease digestion.
The ordinary adaptor used herein can be any adaptor that is capable of being converted into the desired adaptor by amplification and exonuclease treatment. An ordinary adaptor may be ligated to the driver or tester sample. The ligated driver or tester sample is then amplified by PCR using a primer having a sequence identical or substantially identical to the up-strand of the ordinary adaptor. The amplified nucleic acid is treated with an exonuclease that acts in the 5' to 3' or the 3' to 5' direction, catalyzing the removal of nucleotides from duplex DNA ends. This treatment removes some nucleotides from the 5' ends or 3' ends and leaves 3' or 5' overhangs, respectively. The resulting adaptor : sample complex is functionally identical to the ligation complex of the adaptor in Fig. 1B and the sample.
In certain embodiments, the adaptor can include one or more "separation moieties" incorporated into a 5' or 3' terminus (not required for ligation) or internally, that allow for the separation of products including the moiety from those which do not. Preferred moieties are those that can interact specifically with a cognate ligand. For example, a capture moiety can include biotin, digoxigenin, etc. Other examples of capture groups include ligands, receptors, antibodies, haptens, enzymes, and chemical groups recognizable by antibodies or aptamers. The capture moieties can be immobilized on any desired substrate. Examples of desired substrates include, e.g., particles, beads, magnetic beads, optically trapped beads, microtiter plates, glass slides, papers, test strips, gels, other matrices, nitrocellulose, and nylon. For example, when the capture moiety is biotin, the substrate can include streptavidin.
Nucleic acid and sources
In this disclosure, nucleic acids refer to DNA, cDNA, RNA, and mRNA molecules, combinations thereof, or the like from any source, with or without modified nucleotides. The methods of the present invention are particularly well-suited for the use of cDNA. To prepare cDNA, RNA is isolated as a subset of the genomic nucleic acid using any of the isolation methods known in the art and, thereafter, cDNA is synthesized using any of the cDNA synthesis methods known in the art, such as by using reverse transcriptase.
See, for example, Innis, et al., PCR Protocols, infra, and Ehrlich, ed., PCR Technology, W. H. Freeman and Company, N.Y. (1991), the teachings of which are incorporated herein by reference for all purposes.
For the purposes of the present invention, the tester and driver nucleic acid populations to be analyzed are derived from two or more sources of nucleic acid. The nucleic acid can be from any source in which one is interested in identifying nucleic acid sequences that are present in different abundance between two nucleic acid populations. Sources of nucleic acid suitable for use in the methods of the present invention include, but are not limited to, eukaryotic or prokaryotic, invertebrate or vertebrate, mammalian (e.g. cancerous) or non-mammalian and plant or other higher eukaryotic sources. For example, the methods described herein can be used to investigate nucleic acid differences between:
Two different genomic sources, optionally from the same species, e.g. to identify markers associated with phenotypic or other differences between them. For example, the subject method may be used in forensic medicine to establish similarities between the DNA from two sources, where one is interested in the degree of relationship between the two sources.
Two different human sources e.g. to identify a marker or lesion associated with an individual's predisposition to a disease or to determine the alteration of a gene wherein detection of the alteration itself is diagnostic of the disease. In these determinations, a pooled normal sample serves as the driver, and the patient sample is used as the tester.
A non-disease (or pre-inoculation, or pre-infection) sample and a disease (or post-inoculation, or post-infection) sample to identify if an infectious agent or pathogen is present. For example to determine if a viral sequence is integrated into the genome, or a change in the genome as a result of cancer has occurred.
The subject method may also be used to enrich nucleic acid sequences that are more abundant in one pool of nucleic acid samples than in another pool of nucleic acid samples (from which they may be absent). Such pools may be created from individuals that have in common a particular phenotypic or genotypic character but little else. In this manner, nucleic acid markers or potentially genes can be enriched that are genetically associated with the trait or genes that formed the basis of the separation into pools.
If genomic DNA is to be the source of the nucleic acid, the DNA is isolated and then substantially completely digested with a sample fragmenting restriction endonuclease. If RNA is the source of nucleic acid, cDNA is synthesized from the corresponding RNA and then substantially completely digested with a sample fragmenting restriction endonuclease. Normally, the tester and driver nucleic acid samples will be those which are expected to have substantially similar nucleic acid sequences.
Restriction treatment of sample nucleic acids
Regardless of the sources of nucleic acid, the tester and driver nucleic acid samples are preferably separately fragmented using a sample fragmenting restriction enzyme to create tester nucleic acid sample fragments and driver nucleic acid sample fragments, respectively. It is preferred that both the tester and driver nucleic acid samples are fragmented using the same sample fragmenting restriction enzyme, and that the sample fragmenting restriction enzyme is one which recognizes and cuts at a four base site, leaving an overhang.
In one preferred embodiment, the fragmentation restriction enzyme may be selected to be one which does not leave the whole of the restriction site as an overhang. In this way, the sequence of the adaptor could be such that the ligation of the adaptor to the fragments of DNA does not regenerate the sequence of the fragmentation restriction site - consequently the digestion of the DNA and ligation of the adaptors may take place simultaneously.
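Whether ligation recreates the fragmentation site depends only on the bases that flank the junction, which can be checked as in the Python sketch below. The example uses Msp I (C^CGG, which leaves a two-base CG overhang) purely as an illustration of an enzyme that does not leave its whole site as an overhang; the choice of enzyme, the adaptor sequences and the function name are assumptions made for this example.

```python
def regenerates_site(adaptor_up, fragment_top, site):
    """True if joining the 3' end of adaptor_up to the 5' end of fragment_top
    (both written 5'->3') recreates `site` across the ligation junction."""
    joined = adaptor_up + fragment_top
    junction = len(adaptor_up)
    # Only windows that span the junction are examined
    for start in range(max(0, junction - len(site) + 1), junction):
        if joined[start:start + len(site)] == site:
            return True
    return False


MSP_I_SITE = "CCGG"
fragment = "CGGTTAGCATGCA"            # top strand of a fragment end after C^CGG cleavage
adaptor_ok = "GTCATGCCTGAACTGGTA"     # hypothetical up-strand ending in A
adaptor_bad = "GTCATGCCTGAACTGGTC"    # hypothetical up-strand ending in C

print(regenerates_site(adaptor_ok, fragment, MSP_I_SITE))   # junction reads ACGG
print(regenerates_site(adaptor_bad, fragment, MSP_I_SITE))  # junction reads CCGG
```

An adaptor of the first kind stays ligated in the presence of the fragmenting enzyme, which is what would allow digestion and ligation to proceed in the same reaction.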
Moreover, the sample fragmenting restriction enzyme site and the restriction enzyme site(s) used in the adaptors (discussed above) may be related. In this context "related" means that the restriction site sequences contain sufficient nucleotides in common that the action of the former enzyme destroys any sites for the latter enzyme. For example, in a preferred embodiment, the restriction enzyme sites in the adaptor, which are usually 6 base sites, comprise the sample fragmenting restriction enzyme site, which is usually a 4 base site. For example, where the restriction site in an adaptor is AGATCT (Bgl II site), the sample fragmenting restriction enzyme site can be GATC (Mbo I site). Upon complete digestion of the nucleic acid sample by Mbo I, there will be no remaining internal GATC sites, and therefore no AGATCT sites either. This design will prevent desired target nucleic acids from being digested by the restriction enzymes used for digesting hybrid adaptors.
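The nesting of the four-base site inside the six-base sites can be verified on a toy sequence. The Python sketch below models complete Mbo I digestion of one strand (cutting immediately 5' of every GATC) and confirms that no fragment retains an internal Bgl II or Bcl I site. The input sequence is invented for the example and only the top strand is modelled.

```python
def mbo1_fragments(seq):
    """Split a top-strand sequence at every Mbo I site (^GATC), i.e. just before each GATC."""
    fragments = []
    start = 0
    pos = seq.find("GATC")
    while pos != -1:
        if pos > start:
            fragments.append(seq[start:pos])
        start = pos
        pos = seq.find("GATC", pos + 1)
    fragments.append(seq[start:])
    return fragments


def contains_any_site(fragments, sites):
    """True if any fragment still contains one of the given site sequences."""
    return any(site in frag for frag in fragments for site in sites)


sample = "TTAGATCTCCGGTGATCAGGAAGATCGG"   # hypothetical; contains AGATCT and TGATCA
pieces = mbo1_fragments(sample)
print(pieces)
print(contains_any_site(pieces, ("AGATCT", "TGATCA")))
```

Because every AGATCT or TGATCA contains GATC, each such site is split between two fragments, so the only intact six-base sites present after adaptor attachment are those formed within the adaptors.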
Prior to fragmentation by restriction treatment, the sample nucleic acid can be amplified by any convenient method. The amplified sample nucleic acid can be purified by any method before and/or after restriction treatment.
Other methods
By providing methods to generate powerful new libraries, the invention facilitates the discovery of new genes and helps to uncover pathways involved in various disease states. Researchers can screen such libraries directly or use them in subtractive protocols to produce further libraries for use in the art.
Using the methods of the present invention, the resulting target nucleic acid will be greatly enriched. The target nucleic acid may be sequenced directly after PCR or, as noted above, it may be cloned by inserting it in a cloning vector for cloning into a host cell. The cloned DNA can subsequently be sequenced to determine the nature of the target DNA.
In other aspects of the invention, methods may be used as follows, in each case comprising steps (a)-(g), and optionally (h), above:
A method for providing subtractive nucleic acid populations from a tester sample and a driver sample, for example to determine the presence of one or more nucleic acid sequence differences between the tester and driver samples. Such differences may, for example, be associated with recessive or dominant traits.
A method for identifying a target nucleic acid in a tester sample.
A method of comparing two related sources of nucleic acid to determine whether they share a particular sequence.
A method for producing a probe capable of distinguishing at least one sequence difference between a tester and a driver sample, for example by purifying the enriched target nucleic acid obtained by the method.
Conventional and well-known techniques and methods in the fields of molecular biology, microbiology, and recombinant DNA technology may be employed in the practice of the present invention unless otherwise noted. These techniques and methods are explained and detailed in the literature and standard textbooks and are therefore known to those of ordinary skill in the art. (See, for example, J. Sambrook et al., "Molecular Cloning: A Laboratory Manual," 2nd edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989).)
Adaptors & kits
Adaptors for nucleic acid replication and amplification techniques, as well as primers that generally anneal to the adaptors, are well known in the art. However, the design and use of the adaptors disclosed herein, optionally in combination with a replication or amplification reaction, provides a novel, simplified, and generally applicable reagent for preferentially directing the various replication and amplification reactions to desired nucleic acids.
The adaptors may be any of those discussed above. Thus in a further aspect the present invention provides a nucleic acid adaptor for use in the present invention, which adaptor comprises: a first strand and a second strand, wherein the first strand comprises (i) a complementary portion including a restriction enzyme site sequence, which is the target of a restriction enzyme having double-stranded nucleic acid specificity, and (ii) in a 5' direction to said restriction enzyme site sequence, and preferably at the 5' end of said restriction enzyme site sequence, a priming portion capable of binding a primer suitable for a nucleic acid amplification reaction, wherein the region of the adaptor including the restriction enzyme site sequence is either single-stranded, or double-stranded but containing one or more mismatches in the region of the restriction enzyme site sequence, in either case so as to render the restriction site non-functional, and wherein the second strand is adapted such that the terminus of the adaptor which is distal to the priming portion is capable of being ligated to a double-stranded nucleic acid sample.
Preferably the priming portion will be at least about 12 nt, more usually at least 17 nt, and generally fewer than about 200 nt, more usually fewer than about 100 nt. Similar lengths apply to the complementary portion of the adaptor.
Preferably the adaptors have a terminus within, or at the end of, the complementary portion that is capable of being joined to the terminus of a sample restriction fragment. Preferably the terminus has an overhang, e.g. a 5' overhang in which the terminal nucleotide is phosphorylated, or a 3' overhang in which the terminal nucleotide has a hydroxy group.
Optionally, the adaptor includes one or more separation moieties, incorporated into a terminus or internally, that allow for the separation of products including the moiety from those which do not.
In one embodiment, the complementary portion of the adaptor is double-stranded and each strand contains a different restriction site sequence in the corresponding position.
In one preferred embodiment, the restriction sites of the up- and down-strands within one adaptor are related.
In one embodiment, each strand of the adaptor includes a priming portion, which priming portions may or may not be complementary to each other.
Thus, preferably, the adaptor comprises a first strand and a second strand, and each strand comprises a priming portion and a complementary portion, wherein the priming portion provides a site for a primer to bind, and wherein the complementary portions of anti-parallel first and second strands of the adaptor form a double-stranded portion of the adaptor, and wherein the complementary portions of the strands each comprise a related restriction enzyme site, which site is the target of a different restriction enzyme having double-stranded nucleic acid specificity, wherein the related restriction site sequences are different but contain at least two matched nucleotides, and wherein the related restriction sites are located in corresponding positions in the complementary portion of anti-parallel first and second strands of the adaptor such that the double-stranded portion contains one or more sequence mismatches in the restriction enzyme sites, which mismatches prevent the adaptor from being cleaved by the different restriction enzymes.
More preferably the invention provides a pair of first and second adaptors as described herein, e.g. present as a composition of matter, wherein the complementary portion present in at least one strand in each adaptor is complementary to, and completely matches, the complementary portion present in at least one strand of the other adaptor, such that hybrid first-adaptor/second-adaptor molecules are capable of forming a double-stranded portion which is susceptible to cleavage by restriction enzymes.
As described above, both strands of each adaptor may comprise different, optionally related, restriction sites. These are capable of making double stranded hybrid adaptorA/adaptorB susceptible to cleavage by both of the relevant restriction enzymes.
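The pairing logic described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the adaptor design itself: only the restriction site hexamers are modelled (not the full adaptor sequences), the Bgl II site AGATCT is taken from the text, and the second site is assumed to be the Bcl I site TGATCA, i.e. the second enzyme named in the description of FIG. 2. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Site sequences, 5'->3'. The Bcl I sequence is an assumption (see lead-in).
SITES = {"Bgl II": "AGATCT", "Bcl I": "TGATCA"}

# Site carried by each strand, each written 5'->3' on its own strand.
ADAPTOR_A = {"up": SITES["Bgl II"], "down": SITES["Bcl I"]}
ADAPTOR_B = {"up": SITES["Bcl I"], "down": SITES["Bgl II"]}

def mismatches(up_site, down_site):
    """Number of unpaired positions when the two strands anneal anti-parallel."""
    return sum(a != b for a, b in zip(up_site, revcomp(down_site)))

def cleavable_by(up_site, down_site):
    """Enzymes whose site is fully base-paired in the duplex (none if any mismatch)."""
    if mismatches(up_site, down_site) > 0:
        return []
    return [name for name, site in SITES.items() if site == up_site]

# Each adaptor on its own: mismatched, so protected from both enzymes.
a_alone = cleavable_by(ADAPTOR_A["up"], ADAPTOR_A["down"])
b_alone = cleavable_by(ADAPTOR_B["up"], ADAPTOR_B["down"])

# Hybrid adaptorA/adaptorB duplexes formed after tester/driver hybridization.
hybrid_1 = cleavable_by(ADAPTOR_A["up"], ADAPTOR_B["down"])
hybrid_2 = cleavable_by(ADAPTOR_B["up"], ADAPTOR_A["down"])
```

Under these assumptions the two sites share the central GATC, so each intact adaptor carries two mismatches (the outer positions of the hexamer), while each hybrid duplex restores one perfectly paired site.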
Thus, the invention specifically includes the disclosed adaptors, with or without the appropriate modifications and annealing primers, functioning or capable of functioning in the disclosed methods.
The invention further provides kits for performing the methods of the invention e.g. for research or in diagnostic applications to identify the presence or absence of certain nucleic acids in a sample.
One skilled in the art is familiar with amplification reaction-based assays and kits, as exemplified in the listed references and others known in the art, as well as nucleic acid replication-based assays, such as primer extension reactions.
Thus in another aspect, the present invention provides kits for carrying out the methods described herein. Combinations of reagents useful in the methods set out above can be packaged together with instructions for using them in the described methods. In particular, such kits can contain a separate container for the first adaptor, the second adaptor, and primers to amplify the tester nucleic acid sample fragments and primers to amplify the driver nucleic acid sample fragments as described above. Further, the kits can contain adaptors tagged with moieties such as biotin which can be used to isolate the target nucleic acid of interest.
The invention will now be further described with reference to the following non-limiting Figures and Examples. Other embodiments of the invention will occur to those skilled in the art in the light of these.
The disclosure of all references cited herein, inasmuch as it may be used by those skilled in the art to carry out the invention, is hereby specifically incorporated herein by cross-reference.
FIGURES
Throughout these figures and by convention, double-stranded nucleic acid is represented as two horizontal lines, either solid or broken. The strand represented by the top line has its 5' end to the left and 3' end to the right. The strand represented by the bottom line has its 3' end to the left and 5' end to the right.
FIG. 1 shows examples of adaptors. 1A shows preferred adaptors, each of which includes 2 priming portions and 2 related restriction sites. 1B shows other adaptors of the invention, which include only a single priming portion and a single restriction site. In each case the adaptors include a double-stranded terminus (including an overhang) to facilitate ligation.
FIG. 2 Two samples, tester cDNA and driver cDNA, are prepared by cleaving with a restriction enzyme, for example Mbo I. The fragmented tester cDNA and driver cDNA are ligated to adaptors A and B, respectively. The adaptors (A and B) contain 5' phosphate groups in their down-strands. The up-strand and the down-strand of adaptor A comprise restriction sites Bgl II and Bcl I, respectively. Similarly the up-strand and the down-strand of adaptor B comprise restriction sites Bcl I and Bgl II, respectively. The two samples are combined at a desired ratio, followed by treatment under appropriate denaturing and annealing conditions. Some of the possible hybrids and single-stranded nucleic acids formed after hybridization are depicted in the figure. The hybridized mixture is treated with restriction enzymes Bgl II and Bcl I. The heteroduplex hybrids formed by tester and driver are cleaved by Bgl II and Bcl I in the adaptor sequence, resulting in the elimination of the primer binding site. These hybrids cannot be amplified. The single-stranded nucleic acids and the tester/tester and driver/driver homoduplexes can be amplified in an amplification reaction employing primers hybridizing to the priming portions of adaptor A or adaptor B. If required, the processes of denaturing, hybridization and digestion are repeated several times before the amplification. The resulting amplified nucleic acids are incorporated into an appropriate vector to generate a library of preferentially amplified nucleic acids.
EXAMPLE
Example 1
mRNA from spleen of a normal mouse strain was used to generate driver cDNA, and mRNA from spleen of a diseased mouse strain was used to generate tester cDNA. Briefly, five μg mRNA from mouse spleen was converted to double-stranded cDNA using a cDNA synthesis kit (Invitrogen) following the manufacturer's protocol, using the primer poly(T)19. The cDNA was then cleaved with Mbo I. The digested tester cDNA was ligated to adaptor A, and the digested driver cDNA was ligated to adaptor B. The adaptor sequences are shown in FIG. 1A. After ligation, the adaptor:nucleic acid complexes were purified using a Qiagen PCR purification kit.
0.2 μg of the tester cDNA ligated to adaptor A and 2.0 μg of driver cDNA ligated to adaptor B were mixed, then the mixture was ethanol precipitated, dissolved in 5 μl of 2x NEB buffer 3 (New England Biolabs) and overlaid with 30 μl of mineral oil (Perkin Elmer Cetus). Following heat denaturation at 95 °C for 5 min, DNA was hybridised for 1 h at 60 °C. 10 units of restriction enzyme Bgl II and 10 units of Bcl I were then added, the volume was brought to 10 μl, and the mixture was incubated at 37 °C for 1 h. The denaturation, hybridisation and digestion were repeated 5 times with the addition of enzymes 2 times in the digestion steps. All these steps were performed in the same single tube. After the final digestion step, the tester-specific sequences were amplified using primers 5'-GGTAAAACGACGGCCAGT-3' and 5'-GTACCAGTATCGACAAAGG-3'. The amplified products were cloned into the BamH I site of pBluescript vector (Stratagene). Colonies were screened for inserts by PCR using T7 and T3 sequences outside the cloning site as primers. About 20 clones containing inserts were identified by PCR amplification and were sequence analysed. The sequences obtained were searched against a database to determine their identities.
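To give a feel for why a driver excess and repeated rounds are used, the following Python sketch runs a deliberately crude toy model. It is not taken from the disclosure and makes strong assumptions: annealing goes to completion with random partner choice, digestion of every hybrid adaptor A/adaptor B duplex is complete, a strand that has lost its adaptor stays in the tube and still competes for partners, and the 0.2 : 2.0 amounts are applied to a single sequence purely as a ratio. The function name is hypothetical.

```python
def intact_tester_after_rounds(tester, driver, n_rounds):
    """Toy model: adaptor-bearing tester amount of ONE sequence after each round.

    tester, driver: starting amounts of that sequence in each sample (same units).
    Assumes complete random re-annealing and complete digestion (see lead-in).
    """
    total = tester + driver            # strands stay in the tube, cleaved or not
    t_intact, d_intact = tester, driver
    history = []
    for _ in range(n_rounds):
        # An intact tester strand loses its priming site only if its partner
        # is a driver strand that still carries adaptor B, and vice versa.
        p_tester_cleaved = d_intact / total
        p_driver_cleaved = t_intact / total
        t_intact, d_intact = (t_intact * (1 - p_tester_cleaved),
                              d_intact * (1 - p_driver_cleaved))
        history.append(t_intact)
    return history

# A sequence equally represented in both samples, mixed at a 1:10 tester:driver ratio.
common = intact_tester_after_rounds(0.2, 2.0, 5)
# A tester-specific sequence that is absent from the driver.
specific = intact_tester_after_rounds(0.2, 0.0, 5)
```

In this model a tester-specific sequence keeps all of its priming sites, while a shared sequence loses most of them in every round; real reactions will deviate because hybridization and digestion are incomplete and amplification is not uniform.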
Among the analysed fragments, genes known to be specifically expressed in the diseased mouse spleen were identified, including immunoglobulin genes, gp70-related sequences, and several virus-related sequences. This result was consistent with previous analyses of disease-related genes using another approach (Fu et al., Nucl. Acids Res. 30(6):1394 (2002)).
Example 2
Double-stranded cDNAs are obtained as in Example 1. The cDNA is then cleaved with Mbo I. The digested tester cDNA is ligated to adaptor A, the sequences of which are shown in FIG. 1B. The digested driver cDNA is ligated to an ordinary adaptor which has the sequences:
upstrand 5' gctcgattcagatctgacgc 3', and downstrand 5' gatcgcgtcagatctgaatcgagc 3'
The ligation product is amplified by primer having a sequence:
5' gctcgattcagatctgacgcgatc 3' ,
in which the last three nucleotides at the 3' end are phosphorothioate nucleotides. After PCR amplification, the product is treated with T7 exonuclease. As a result, the end of the exonuclease-treated product may have the following structure:
5'                      atcnnnnnn...3'
3' cgagctaagtctagactgcgctagnnnnnn...5'
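The relationship between the driver adaptor, the primer and the structure shown above can be checked with a small Python sketch. The sequences are those given in this Example; the variable names are hypothetical, and the exonuclease step is modelled on the assumption that digestion of the primer strand proceeds 5' to 3' and stops at the three phosphorothioate nucleotides, which is what the structure above implies.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

UP_STRAND = "gctcgattcagatctgacgc"        # driver adaptor up-strand, 5'->3'
DOWN_STRAND = "gatcgcgtcagatctgaatcgagc"  # driver adaptor down-strand, 5'->3'
PRIMER = "gctcgattcagatctgacgcgatc"       # amplification primer, 5'->3'
N_PROTECTED = 3                           # phosphorothioate nucleotides at the primer 3' end

# The down-strand is longer than the up-strand; the extra 5' bases form the overhang.
overhang = DOWN_STRAND[: len(DOWN_STRAND) - len(UP_STRAND)]
strands_anneal = DOWN_STRAND[len(overhang):] == revcomp(UP_STRAND)

# The primer is the up-strand extended across the ligation junction.
primer_spans_junction = PRIMER == UP_STRAND + overhang

# Assumed exonuclease outcome: only the protected 3' nucleotides of the primer remain
# on the top strand, while the complementary strand stays intact.
top_strand_end = PRIMER[-N_PROTECTED:]             # read 5'->3'
bottom_strand_end = PRIMER.translate(COMPLEMENT)   # read 3'->5', as drawn above
```

Under these assumptions the bottom strand is left with a single-stranded 3' tail that carries the complement of the adaptor sequence, matching the two-line structure shown above.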
The adaptor:nucleic acid complexes are purified using a Qiagen PCR purification kit. 0.2 μg of the adaptor A:tester complex and 2.0 μg of the adaptor:driver complex are mixed, then the mixture is ethanol precipitated, dissolved in 3 μl of hybridization buffer and overlaid with 30 μl of mineral oil (Perkin Elmer Cetus). Following heat denaturation at 95 °C for 3 min, DNA is hybridised for 12 h at 65 °C. The hybridized DNA is purified using the Qiagen PCR purification kit, and digested with 10 units of restriction enzyme Bgl II at 37 °C for 3 h. The Bgl II treated DNA is ethanol precipitated and subjected to denaturation, hybridisation and digestion another 2 times. After the final digestion step, the tester-specific sequences are amplified using the primer 5'-GGTAAAACGACGGCCAGT-3'. The amplified fragments are cloned, sequenced and analysed.