WO2002004686A2

WO2002004686A2 - Detecting methylated cytosine in polynucleotides

Info

Publication number: WO2002004686A2
Application number: PCT/US2001/041321
Authority: WO
Inventors: Norbert O. Reich; Alec M. Wodtke
Original assignee: Epigenx Pharmaceutical, Inc.
Priority date: 2000-07-10
Filing date: 2001-07-10
Publication date: 2002-01-17
Also published as: WO2002004686A3; AU2001281311A1

Abstract

The methylation status of cytosine bases in a polynucleotide can be determined, with enhanced sensitivity, by contacting the polynucleotide with an agent that modifies either unmethylated cytosine or methylated cytosine, amplifying the modified polynucleotide using one or more 'heavyweight' nucleotides and then determining the mass of the amplified product. The mass of the amplified product is determined, for example, by mass spectrometry, and gauged in relation to the mass of a control sample having identical base sequence to the nucleotide sequence under study. The presence or absence of a mass difference indicates whether one or more cytosine bases of the polynucleotide are methylated.

Description

DETECTING METHYLATED CYTOSINE IN POLYNUCLEOTIDES

FIELD OF THE INVENTION

The present invention relates to the detection of a modified nucleotide sequence and, more specifically, to the use of amplification and mass analysis to detect cytosine base methylation of a nucleotide sequence.

BACKGROUND OF THE INVENTION

In most organisms, genetic material is composed of deoxyribonucleic acid ("DNA"), the primary structure of which codes for the amino acid sequence of proteins in the organism. DNA consists of two intertwined polynucleotide chains, each comprising a string of nucleic acid bases linked together by a sugar-phosphate linkage. In a protein-encoding DNA segment, or "gene," the sequence of the protein is determined by the sequence of four types of bases in the DNA: adenine (A), guanine (G), cytosine (C), and thymine (T).

The "expression" of a gene, resulting in the production of an encoded protein, involves transcribing the DNA of the gene into a nucleic acid intermediate called "messenger ribonucleic acid" (mRNA). The mRNA has the same nucleotide base sequence as the DNA from which it is transcribed, except the mRNA contains another base, uracil (U), in place of thymine.

Regulation of gene expression in a cell is what distinguishes one cell type from another. During cellular differentiation, a cell-type-specific pattern of gene expression is established via complex interactions that can involve, for example, extracellular signals and tissue-specific transcription factors. Another phenomenon, which can affect gene expression, entails covalent modification of cytosine bases through methylation. A gene can be silenced transcriptionally when cytosine bases are methylated in areas that, generally, are outside the protein-encoding DNA sequence and that contain elements responsible for regulating transcription. More specifically, regions of DNA are observed, usually in the vicinity of genes, that are G:C-rich and contain many cytosines in a cytosine-guanine dinucleotide ("CpG") motif. See Bird (1980); Gardiner-Garden and Frommer (1987); Larsen et al. (1992). Although not affected by methylation normally, CpG islands may be methylated in the upstream regulatory region of a gene, such as a tumor-suppression gene, the inactivation of which is associated with cancer. See Issa et al. (1994); Merlo et al. (1995); Herman et al. (1996).

There are a number of ways to detect the presence of methylated cytosines in DNA, but each detection methodology has deficiencies. Some methods depend upon cleavage, via methylation-sensitive restriction enzymes or reactive chemicals, of the phosphodiester bond that connects cytosine nucleosides. Such methods, however, tend to work only in limited areas of genomic DNA. Conventional sequencing protocols which identify methylated cytosine residues in genomic DNA also require a large amount of genomic DNA and turn on the observation of hard-to-discern gaps in a sequencing ladder. A variety of approaches that obviate the need for restriction enzymes involve treating the DNA with bisulfite, which converts all unmethylated cytosines to uracil. The methylated form of cytosine is not converted to uracil by this chemistry. Consequently, the methylation-dependent uracil conversion can be used to mark unmethylated CpG (as well as other cytosines not appearing within the CpG context). Various means have been devised to detect the methylation-dependent sequence. For example, the altered DNA can be amplified by polymerase chain reaction (PCR) and sequenced. Other methods also have been reported (Eads et al., 2000), but these approaches are technically difficult and labor-intensive.

SUMMARY OF THE INVENTION

Accordingly, it is an object of the present invention to provide an improved methodology, readily automated, for detecting methylated cytosines in genomic DNA.

In accomplishing this and other objectives, the present invention provides, in accordance with one of its aspects, a method for determining whether one or more cytosine bases are methylated in a polynucleotide that has a nucleotide sequence containing one or more cytosine bases, comprising:

(a) contacting the polynucleotide with an agent that modifies either unmethylated cytosine or methylated cytosine in the polynucleotide, the modification causing the modified unmethylated or methylated cytosine to be replaced by another nucleic acid base upon amplification of the polynucleotide;

(b) effecting an amplification using the modified polynucleotide as template and in the presence of a mixture of nucleotides, wherein at least one of the nucleotides is a heavyweight nucleotide that has increased molecular weight compared to its natural counterpart; and then

(c) using mass analysis to determine if the mass of the amplified product is different from a control sample having identical base sequence to the nucleotide sequence, whereby the presence or absence of the mass difference indicates if one or more cytosine bases of the polynucleotide are methylated.

In a preferred embodiment, the control sample: (i) is an actual control sample that is either fully methylated or fully unmethylated and has been subjected to steps (a) and (b) before actual mass determination; or (ii) is a theoretical control sample that is fully methylated or fully unmethylated, the mass of which is estimated by taking into account the mass of heavyweight and normal nucleotides that would be present if the theoretical control sample were subject to steps (a) and (b). In another embodiment, the control sample in step (c) is one that contains a specified number of methylated cytosines. For example, there can be one or more cytosine bases present in the sequence as a CpG sequence motif. Also, the CpG sequence motif may be represented in one or more CpG islands.

In yet a further embodiment of the invention, mass analysis is performed by means of mass spectrometry. These and other embodiments of the present invention are described in greater detail below.

DETAILED DESCRIPTION OF THE INVENTION

The present inventors have developed a new approach to detecting the methylation status of a polynucleotide sequence. The approach is characterized by an enhanced sensitivity to differences of even a single methylated cytosine, and yet it can be employed to ascertain the methylation status of an entire CpG island.

In accordance with the present invention, a polynucleotide containing one or more cytosines for which methylation is to be determined is contacted with an agent that modifies unmethylated cytosines in the polynucleotide but not methylated cytosines, or vice versa. Illustrative of suitable agents is a bisulfite salt such as sodium bisulfite, which converts non-methylated cytosine to uracil. The treated polynucleotide then is used as template in an amplification reaction which employs primers designed to amplify the particular segment of the polynucleotide containing the cytosines to be evaluated for methylation status. A key feature of this step is performing the amplification in the presence of one or more "heavyweight" deoxynucleoside triphosphates (dNTPs), which have greater mass than their corresponding "natural" nucleotide triphosphates. (In this context, amplification with heavyweight dNTPs is termed "heavyweight amplification" or "heavyweight PCR.") Finally, mass analysis of amplified product is performed to determine whether the product is larger or smaller in mass than one would find for an appropriate control nucleotide sequence.

Isolation of Nucleic Acids

Unless indicated otherwise, the term "polynucleotide" denotes DNA in this description. Also, the "base sequence" of a nucleic acid relates to the sequence of nucleic acid bases without regard to methylation status. Accordingly, if a polynucleotide contains a 5-methyl cytosine at a particular position, then the base sequence of the polynucleotide has a cytosine at that position.

Conventional techniques for isolating nucleic acids from cells and tissues are described, for example, in Maniatis et al. (1989). Methods for isolating nucleic acid from essentially any organ in the body, as well as from cultured cells, also are well known.

With the present invention, the polynucleotide for which methylation status is to be determined can be an isolated molecule or part of a mixture of nucleic acids. That is, the sequence to be analyzed can represent only a minor fraction of a complex of nucleic acids; also, it can constitute a portion or essentially all of the polynucleotide in question. The polynucleotide to be analyzed may comprise one or more genes or their fragments, and it may be methylated at individual cytosines, at small groups of cytosines, or at one or more CpG islands. In relation to CpG islands, the present description uses the phrase "CpG island" to denote a sequence region of at least 200 base pairs, with at least 50% C + G, that contains at least 0.6 of the statistical frequency of CpG, where 0.6 is the ratio of observed to expected CpG, that is:

$$\frac{\text{Number of CpG}}{\text{Number of C} \times \text{Number of G}} \times N$$

In this relationship, N is the total number of nucleotides analyzed in a sequence (N = 100). See Gardiner-Garden et al. (1987).
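The three criteria in this definition can be checked mechanically. The following is a minimal sketch only: the function names are hypothetical, and for simplicity it takes N to be the length of the whole sequence supplied rather than a fixed analysis window, which is an assumption made here and not a detail of the method described.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, C+G fraction and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG: (Number of CpG / (Number of C x Number of G)) x N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str) -> bool:
    """Apply the three thresholds from the definition: length, C+G content, CpG ratio."""
    stats = cpg_stats(seq)
    return (
        stats["length"] >= 200
        and stats["gc_fraction"] >= 0.5
        and stats["obs_exp"] >= 0.6
    )


# Artificial test sequences (not real genomic data).
print(looks_like_cpg_island("CG" * 100))  # True: 200 bases, all C/G, ratio 2.0
print(looks_like_cpg_island("AT" * 100))  # False: no C or G at all
```

A windowed scan along a longer sequence would call `cpg_stats` on successive slices; the window size and step would then be further assumptions.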

"Fixing" Methylation Status for Subsequent Amplification

As noted, a polynucleotide under study is contacted, pursuant to the present invention, with an agent that "fixes" the methylation pattern into the polynucleotide sequence (by employing methylation-dependent chemical conversion) and allows for this methylation-pattern-dependent sequence to be amplified by PCR in a way that maintains the information regarding the methylation pattern in the sequence of the PCR products. In conventional practice, methylation status is lost following amplification by polymerases. In a method of the present invention, however, methylation status is fixed in the polynucleotide by chemically modifying either the nonmethylated cytosine or the methylated cytosine, but not both. That is, the chemical modification causes the modified unmethylated or methylated cytosine to be replaced by another nucleic acid base, upon amplification of the polynucleotide.

If the polynucleotide is double-stranded, then the sequence of the two strands is likely no longer to be complementary.

An agent is preferred that chemically converts the nonmethylated cytosine to another type of base but that does not affect methylated cytosine. A bisulfite salt (e.g., NaHSO3) combined, for example, with hydroquinone, can be used to convert unmethylated cytosine to uracil without affecting methylated cytosine. See Frommer et al. (1992); Olek et al. (1996). Without being bound by any theory, the inventors believe that cytosine reacts with the bisulfite ion to form a sulfonated cytosine reaction intermediate, which is susceptible to deamination, giving rise to a sulfonated uracil, from which the sulfonate group can be removed to yield uracil. By changing the base sequence of the polynucleotide with respect to nonmethylated cytosine, bisulfite treatment fixes the methylation status into the resulting amplification product because the polymerase recognizes uracil as a thymine. Thus, the resultant amplification product contains cytosine only at the position where 5-methylcytosine was present in the starting DNA. Agents such as sodium pyrosulfite (e.g., Na2S2O5) or pyrosulfate also may be useful, provided that they can convert one of the two forms of cytosine (methylated or non-methylated) to another base.
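The way this conversion fixes methylation status into the base sequence can be sketched as simple string substitution. This is an illustrative model only: writing 5-methylcytosine as "M", the six-base example strand, and the function names are all assumptions made here, and the sketch treats conversion as complete and amplification as error-free.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def bisulfite_convert(strand: str) -> str:
    """Unmethylated C becomes U; 5-methylcytosine (written M) is left untouched."""
    return strand.replace("C", "U")


def amplified_copy(converted: str) -> str:
    """Sequence of the amplified strand: U is copied as T, M is copied as plain C."""
    return converted.replace("U", "T").replace("M", "C")


def complement(strand: str) -> str:
    """Base-by-base complement in the same left-to-right order (no reversal)."""
    return "".join(PAIRS[base] for base in strand)


# Same base sequence, with the first CpG either methylated (M) or not (C).
for label, top in (("methylated", "MGCTCG"), ("unmethylated", "CGCTCG")):
    converted = bisulfite_convert(top)
    copy = amplified_copy(converted)
    print(label, converted, copy, complement(copy))
# methylated   MGUTUG CGTTTG GCAAAC
# unmethylated UGUTUG TGTTTG ACAAAC
```

After amplification the two products differ only at the position of the methylated cytosine (C versus T) or at the position complementary to it (G versus A).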

Amplification of Chemically Modified DNA

The chemically treated polynucleotide is used as a template in an amplification reaction with oligonucleotide primers designed to amplify a particular sequence which contains one or more cytosines that may be methylated. The primers preferably are designed to hybridize effectively with the base sequence that exists in the polynucleotide following chemical modification. With publicly available sequence information, including that gleaned from the human genome project, primer sequences can be designed to hybridize and amplify virtually any region of the genome where methylation is suspected. Primers may be designed which encode specifically to the chemically modified sequence or which encode to regions of sequence not modified by the chemical treatment.

Primers also may be designed to amplify any of a variety of controls, such as a polynucleotide in which the methylation pattern is erased. This can be accomplished by amplifying the sequence directly from the polynucleotide that has not been treated with an agent to chemically modify and fix methylation status. Another control entails a polynucleotide that has been treated to methylate all cytosines in CpG sites. Saturation can be achieved by treating the polynucleotide with a bacterial methyltransferase, such as the M.SssI enzyme. In yet another control polynucleotide, only particular, defined cytosines of CpG sites are methylated.

An oligonucleotide primer, which is suitable for the present invention, should be of sufficient length and appropriate sequence to provide specific initiation of polymerization at the appropriate site on the polynucleotide under study. Primer extension is conducted in the presence of appropriate nucleoside triphosphates and an agent for polymerization, such as DNA polymerase, and under suitable conditions including, for example, an appropriate temperature and pH. The exact length of a given primer depends on many factors well known in the art, including, for example, temperature, chosen buffer, and nucleotide composition. A suitable primer typically contains between 12 and 20 bp, although it may be longer or shorter in some cases.

Primers can be prepared via any suitable method, such as conventional phosphotriester and phosphodiester techniques, including automated embodiments thereof. Diethylphosphoramidites can be used for synthesis and prepared as described by Beaucage et al. (1981). One method for synthesizing oligonucleotides on a modified solid support is described in U.S. patent No. 4,458,066. Primers may be chemically derivatized or may be synthesized using a derivatized or unnatural dNTP, as described in detail elsewhere, to provide a detectable agent for enhancing detection by mass analysis.

Various approaches can be used to amplify the nucleotide sequence of interest. These include any of a variety of primer extension reactions and the use of any of a variety of DNA polymerases. A preferred approach is to use PCR with a thermostable polymerase such as Taq polymerase.

If the polynucleotide is double-stranded, it is preferred to amplify only a single strand for subsequent mass analysis. Amplification of a single strand can readily be achieved without strand separation by designing strand specific primers essentially as described by Frommer et al. (1992). In this case, following chemical modification of the polynucleotide to fix the methylation status, the sequence of the two strands is no longer complementary. Thus, one can design primers that anneal only to a single strand. The design of the primer sequence and length depends on the sequence within and around the CpG island or cytosine(s) of interest and makes use of numerous design criteria. See Frommer et al. (1992). Software programs also are available for this purpose. See Grunau et al. (2000).
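The loss of complementarity that makes strand-specific primers possible can be shown with a short sketch. The eight-base duplex and the function names are hypothetical, and the model assumes no methylation, complete conversion, and that the polymerase reads each converted cytosine as T.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Partner strand, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(seq))


def converted_as_read(seq: str, methylated_positions=frozenset()) -> str:
    """Bisulfite-treated strand as a polymerase reads it: unmethylated C behaves as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def mismatches(top: str, bottom: str) -> int:
    """Number of positions at which `bottom` no longer pairs with `top` (both 5'->3')."""
    expected_bottom = reverse_complement(top)
    return sum(1 for x, y in zip(expected_bottom, bottom) if x != y)


top = "ACGTCCGA"                     # hypothetical top strand
bottom = reverse_complement(top)     # its perfectly paired partner

print(mismatches(top, bottom))                                        # 0
print(mismatches(converted_as_read(top), converted_as_read(bottom)))  # 5
```

Each converted cytosine, on either strand, produces one mismatch, so a primer written against the converted top strand does not match the converted bottom strand and vice versa.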

Single-strand amplification without strand separation also can be achieved using so-called "asymmetric" PCR. This approach relies on the use of unequal or asymmetric concentrations of the two amplification primers. In general, during the initial PCR cycles most of the product generated is double stranded and accumulates exponentially. As the low-concentration primer becomes depleted, further cycles generate an excess of one of the two strands, depending on which of the two primers was limited. This single-stranded DNA accumulates linearly and is complementary to the limiting partner. Typical primer ratios for asymmetric PCR are 50:1 to 100:1. See McCabe (1990); Gyllensten et al. (1988).
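The switch from exponential to linear accumulation can be seen in an idealized toy model. The sketch below assumes perfect efficiency (every strand is copied in every cycle while its primer lasts) and uses arbitrary molecule counts; the function name and all numbers are illustrative assumptions, not experimental values.

```python
def asymmetric_pcr(cycles, limiting_primer, excess_primer, template_duplexes=1):
    """Idealized asymmetric PCR.

    Returns a list with one (excess_strand, limiting_strand) tuple per cycle, where
    excess_strand counts strands made from the excess primer and limiting_strand
    counts strands made from the limiting primer (template strands included).
    """
    excess_strand = limiting_strand = template_duplexes
    history = []
    for _ in range(cycles):
        # Each primer anneals to the opposite strand, so the number of new strands
        # is capped both by available templates and by remaining primer molecules.
        new_excess = min(limiting_strand, excess_primer)
        new_limiting = min(excess_strand, limiting_primer)
        excess_primer -= new_excess
        limiting_primer -= new_limiting
        excess_strand += new_excess
        limiting_strand += new_limiting
        history.append((excess_strand, limiting_strand))
    return history


# 50:1 primer ratio, starting from a single duplex.
for cycle, counts in enumerate(asymmetric_pcr(10, 100, 5000), start=1):
    print(cycle, counts)
```

In this run both strands double through cycle 6; the limiting primer runs out in cycle 7, after which the strand made from the excess primer grows by a constant 101 copies per cycle while the other strand stays at 101.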

A type of asymmetric amplification also may be achieved by exploiting differences in the annealing temperature of the two primers. This can be achieved in situations where a polynucleotide has been chemically treated, such as with bisulfite as described above. In this case, the chemical treatment differentially modifies the sequence of the two strands, thus rendering them less than fully complementary. In most cases, this change in complementarity will be sufficient to cause the two strands to separate from one another under standard conditions. In other cases, a double-stranded polynucleotide may be separated into individual strands using methods well known in the art, which include, for example, use of various denaturing conditions, including physical (e.g., heat between 80-100°C), chemical, or enzymatic (e.g., helicases). Strand separation can be effected either as a separate step or simultaneously with the synthesis of the primer extension products. Alternatively, the individual strands can be isolated before amplification. Isolation can be accomplished, for example, by chromatography or other well-known methods. See Maniatis et al. (1989).

The illustration below demonstrates how bisulfite treatment ("bis") followed by PCR amplification affects the base sequence of DNA having a methylated CpG (M = 5-methylcytosine), a cytosine not in a CpG context, and a cytosine in a CpG context that is unmethylated. The effect of bisulfite treatment on the base sequence of both strands of a polynucleotide is shown below.

Before Bis Treatment          After Bis Treatment

5'--MG--C--T--CG--3'          5'--MG--U--T--UG--3'

3'--GM--G--A--GC--5'          3'--GM--G--A--GU--5'

As described above, amplification of a single strand is accomplished by two "bis" primers, i.e., primers based on the bis-treated sequence, which are designed to anneal to the bis-treated DNA, or by asymmetric PCR. The sequence of the top strand in the above illustration is shown below following bis treatment (#2) and amplification (#3). The amplified product, which is the complement of the amplified strand, is shown linked to a marker (e.g., a dye) which is linked to one of the PCR primers used for amplification.

Bis treatment                 PCR

#2                            #3

5'--MG--U--T--UG--3'          5'--CG--T--T--TG--3'

                              3'--GC--A--A--AC--//Dye 5'

As shown, PCR amplification following bis treatment results in two strands that are no longer complementary (#3). Depicted below is a single strand of an amplification (the complement of the amplified strand, formed by extension from the dye-containing primer described above) following Bis or no Bis treatment of methylated or nonmethylated DNA. The differences between methylated and unmethylated sequences after bis treatment are shown in bold font and are underlined.

Methylation

[1] No Bis treatment    3'--MC--G--A--GC--//Dye 5'

[2] Bis treatment       3'--GC--A--A--AC--//Dye 5'

No methylation

[3] No Bis treatment    3'--GC--G--A--GC--//Dye 5'

[4] Bis treatment       3'--AC--A--A--AC--//Dye 5'

As illustrated, the complementary strand that results from amplification following bis treatment has a guanine at the far left position if the position was methylated in the complementary strand and an adenine at the far left position if the position was not methylated in the complementary strand. Thus, in relation to the complementary strand resulting from a single strand amplification, methylation in effect substitutes guanine for adenine. Conversely, in relation to the non-complementary strand resulting from a single strand amplification, methylation in effect substitutes cytosine for thymine. In either case, after bis treatment and amplification by a primer set using methylated and nonmethylated starting DNA of the same base sequence, the only difference would be the base that appears at the position of the methylated cytosine or at the position complementary thereto. Bisulfite treatment converts nonmethylated cytosines to uracil. After PCR with suitable primers, the uracils are converted to thymines, while 5-methylcytosines are converted to cytosines. In the present illustration, mass analysis of the non-bisulfite treated [3], full-length PCR product may be used as a baseline to which the bisulfite treated sample [4] is compared. Amplification results in products of different mass, depending upon the methylation state of the original polynucleotide. An analysis along these lines also may include any of a number of controls, such as polynucleotides where the methylation pattern has been erased, where all cytosines in CpG sites are methylated, or where only defined cytosines in CpG sites are methylated.

Heavyweight Amplification

Another aspect of the present invention concerns amplification in the presence of one or more "heavyweight" nucleotides (dNTPs). In this context, a heavyweight nucleotide is one that has increased mass over its "natural" counterpart nucleotide, although it still functions as a substrate in a primer extension reaction. The natural nucleotides include, for example, dATP, dTTP, dCTP, and dGTP. For purposes of illustration, the molecular weights of the nucleic acid bases of preferred dNTPs are: A = 135; T = 126; C = 111; and G = 151. A heavyweight nucleotide based on A, for example, has a molecular weight that is greater than 135 daltons. A heavyweight nucleotide can be prepared by chemically substituting the nucleic acid base, the sugar or the phosphate moiety of a natural nucleotide. Substitution of the nucleic acid base of the nucleotide can be achieved, for example, by halogenation with fluorine, chlorine, bromine or iodine. In the case where substituted versions of ATP or TTP are contemplated, one can use 8-bromodeoxyadenosine triphosphate (BdATP, Sigma), 8-iododeoxyadenosine triphosphate (IdATP), 5-bromodeoxythymidine triphosphate (BdTTP, Sigma) or 5-iododeoxythymidine triphosphate (IdTTP). Heavyweight forms of CTP such as 5-iododeoxycytosine triphosphate (IdCTP) and GTP such as 8-iododeoxyguanine triphosphate (IdGTP) also may be prepared. The nucleic acid base of a nucleotide also may be made heavier by substitution with biotin or other chemical moieties provided the substituted base can still be incorporated into the amplification product. Many such substituted nucleotides are commercially available (e.g., bio-7-ATP, Sigma). A nucleotide also can be chelated with a metal to increase its mass.

In addition, heavyweight nucleotides can be prepared by synthesizing the nucleotide with heavier-than-natural elements. For example, the mass of the sugar moiety or alpha phosphate moiety of a nucleotide may be increased over its natural mass by synthesizing the nucleotide using heavy isotopes such as ¹⁴C for the sugar or ³²P for the phosphate. Other radionuclides such as ³H or ¹² I also can be used.

By including heavyweight nucleotides during amplification, the differences in methylation status can be further enhanced, providing greater ease in detecting a mass difference, particularly in the case where the analysis is focused on only a few cytosines or CpG sites. The amplification reaction should include all of the nucleotide precursors (i.e., nucleoside triphosphates) needed for primer extension where at least one of the precursors is a heavyweight nucleotide. If one heavyweight nucleotide is used, for example, then the three remaining nucleotides are "natural" nucleotides.

It is preferred that a heavyweight nucleotide not limit the length of the amplification product, relative to that obtained when only natural or unsubstituted nucleotides are used for amplification. DNA polymerases are known to "accept" a variety of heavyweight NTPs with no apparent problems of mispairing leading to mutations in the amplified strands. See Innis et al. (1990). The determination of polymerase enzymes, standard NTP concentrations, analog concentration, and enzyme concentrations can be conducted according to standard procedures well known in the art.

Heavyweight nucleotides increase the ability to detect the methylation status of an amplified sequence of a polynucleotide. The choice of which heavyweight nucleotide to use in a particular circumstance may depend on several factors including, for example, whether the polynucleotide is chemically treated, what chemical treatment of the polynucleotide is used for fixing methylation status, and which strand is to be amplified.

In one approach, amplification can be performed using chemically treated polynucleotide as template, adding a pair of primers designed for the chemically treated sequence on one strand of the polynucleotide, and further adding one or more heavyweight nucleotides. In this case, a double stranded product results where both strands incorporate the heavyweight nucleotides.

In another approach, the chemically treated polynucleotide may first be amplified as above but with natural nucleotides, resulting in formation of a double stranded product incorporating only natural nucleotides. A sample of the double stranded product can then be used as template in a subsequent asymmetric-type PCR reaction performed with appropriate primers and in the presence of one or more heavyweight nucleotides. In this case, the heavyweight nucleotides are incorporated only into the single amplified strand, removing "background" incorporation of heavyweight nucleotides into the other strand.

This same result also may be accomplished by initiating amplification of chemically treated polynucleotide as template in the presence of natural nucleotides for a time sufficient to form some double stranded template, interrupting the reaction to add a large amount of one of the primers and one or more heavyweight nucleotides and then continuing the amplification. By adding the large excess of the primer, an asymmetric PCR reaction is achieved, which then amplifies a single strand of the earlier-generated double stranded template, incorporating the heavyweight nucleotide(s) solely into the single amplified strand. As an alternative to adding the additional primer during the interruption in amplification, asymmetric amplification with heavyweight nucleotide(s) may be achieved by exploiting differences in the annealing temperatures of the two primers used in the initial amplification.

The value provided by using heavyweight nucleotides for amplification can be appreciated in the following example. For a polynucleotide that has been bisulfite-treated and then subjected to PCR with "natural" nucleotides, any methylated C position in the top strand of the amplified product would incorporate a C (mw. 111), while any nonmethylated C position would incorporate a T (mw. 126). On the bottom strand of the amplified product, the complement of a methylated C position would incorporate a G (mw. 151), and the complement of a nonmethylated C position would incorporate an A (mw. 135). Thus, one obtains a net mass difference of 15 daltons for C-versus-T in the top strand (minus 15, if methylated is compared to unmethylated) or a difference of 16 daltons for G-versus-A in the bottom strand (+16, if methylated is compared to unmethylated).

The difference in mass due to amplification with "natural" nucleotides only becomes significant when the amplified sequence contains many methylated cytosines such as in a CpG island. For example, in a typical CpG island with 500 bases, of which 15 are methylated, the mass difference between the methylated island and unmethylated island is 210 daltons. After bisulfite treatment and PCR amplification with "natural" nucleotides, depending on which strand of the double stranded amplification product is mass analyzed or "weighed," the mass difference is either +240 or -225 daltons.
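The figures in this example follow from the base masses given earlier (A = 135, T = 126, C = 111, G = 151). The short sketch below reproduces the arithmetic; the 14-dalton value for a methyl group replacing a hydrogen is an assumption inferred here from 210 / 15, and the function name is hypothetical.

```python
BASE_MASS = {"A": 135, "T": 126, "C": 111, "G": 151}  # daltons, as quoted in the text
METHYL_SHIFT = 14  # assumed: CH3 replacing H adds about 14 daltons (210 / 15 sites)


def per_site_shift(methylated_base: str, unmethylated_base: str) -> int:
    """Mass of the base found at a methylated site minus that at an unmethylated site."""
    return BASE_MASS[methylated_base] - BASE_MASS[unmethylated_base]


sites = 15  # methylated cytosines in the example CpG island

untreated = sites * METHYL_SHIFT                  # direct methyl-group difference
bottom_strand = sites * per_site_shift("G", "A")  # bisulfite + PCR, bottom strand
top_strand = sites * per_site_shift("C", "T")     # bisulfite + PCR, top strand

print(untreated, bottom_strand, top_strand)  # 210 240 -225
```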

The difference in mass between a methylated and a non-methylated polynucleotide, amplified with heavyweight nucleotides, increases detectability of methylation status even for one or few cytosines or for small CpG island(s). In the case where bisulfite chemistry and PCR amplification are used, for example, one can focus the analysis on the non-methylated cytosine positions. In this case, PCR can be performed using heavyweight versions of ATP or TTP. When the bis-treated, methylated and non-methylated amplified strands depicted above, [2] and [4] respectively, are compared, each methylated cytosine results in a single G-instead-of-A substitution and a mass difference of 63 (G vs BdA, 150 vs 213; amplification conducted in the presence of BdATP). By looking at the complementary strand and using BdTTP, one can effect an even larger mass difference: 94 (C vs BdT; amplification conducted in the presence of BdTTP).

Accordingly, a much larger, methylation-related mass shift is produced when a substituted nucleotide is employed in amplification (compared to +16 or -15 daltons, respectively, for amplification with natural nucleotides). After bisulfite treatment and PCR amplification, the mass difference for a typical CpG island, where 15 of 500 bases are methylated, is either 945 (using BdATP) or 1410 daltons (using BdTTP), depending on which strand of the double-stranded amplification product is mass analyzed or "weighed" (compared to +240 or -225 daltons, respectively, for amplification with natural nucleotides). Thus, heavyweight PCR contributes significantly to the detection of mass differences, reflecting methylation patterns of nucleic acid.
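The island-level figures can be checked by scaling the per-site shifts. The sketch below takes the per-site magnitudes exactly as quoted in the text (16 and 15 daltons with natural nucleotides, 63 and 94 daltons with BdATP and BdTTP) rather than deriving them; the dictionary layout and function name are illustrative assumptions.

```python
# Per-site mass shift magnitudes in daltons, as quoted in the text.
PER_SITE = {
    "G vs A (natural)": 16,
    "C vs T (natural)": 15,
    "G vs BdA (BdATP)": 63,
    "C vs BdT (BdTTP)": 94,
}


def island_shift(per_site: int, methylated_sites: int) -> int:
    """Total shift magnitude when every methylated site contributes the same amount."""
    return per_site * methylated_sites


sites = 15
for label, shift in PER_SITE.items():
    print(f"{label}: {island_shift(shift, sites)} daltons")
# G vs A (natural): 240 daltons
# C vs T (natural): 225 daltons
# G vs BdA (BdATP): 945 daltons
# C vs BdT (BdTTP): 1410 daltons
```

On these figures the brominated nucleotides enlarge the shift roughly fourfold (63 / 16) on one strand and roughly sixfold (94 / 15) on the other.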

Mass Analysis of Amplified Products

The present invention makes use of mass analysis to determine whether the mass of the amplified product is different from a control sample having identical base sequence to the nucleotide sequence that is amplified. Several types of control samples are useful. With respect to a preferred control sample, the starting polynucleotide has been subjected to amplification but not chemical modification to fix methylation states. In this case, the methylation pattern of the polynucleotide is erased and provides a baseline for a fully non-methylated control sequence.

Another control involves a polynucleotide that has been treated to methylate all cytosines in CpG sites. This control provides the other extreme of full CpG site methylation. Other control polynucleotides are possible where only cytosines of particular CpG sites are methylated. In all of these cases, the control sample is subjected to mass analysis, following amplification. Another option is to compare to a theoretical control sample(s) that is either fully methylated or fully unmethylated. In this case, the mass of the control sample(s) is estimated by taking into account the mass of heavyweight and natural nucleotides that would be present if said theoretical control sample were subject to chemical modification to fix the methylation status and amplification. Comparison to actual control samples is preferred over comparison to theoretical control samples.
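A theoretical control of this kind could be estimated along the following lines. This is a sketch under several stated assumptions: only base masses are summed (the sugar-phosphate backbone, primers and any tag are ignored because they are common to both controls), "fully methylated" is taken to mean every cytosine in a CpG context, conversion and amplification are treated as ideal, and the weighed strand is the complement of the converted strand. The six-base sequence and all function names are hypothetical.

```python
NATURAL = {"A": 135, "T": 126, "C": 111, "G": 151}  # base masses quoted in the text
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def converted_top(seq: str, cpg_methylated: bool) -> str:
    """Top strand after bisulfite treatment and PCR for a fully (un)methylated control."""
    out = []
    for i, base in enumerate(seq):
        if base != "C":
            out.append(base)
        elif cpg_methylated and seq[i + 1:i + 2] == "G":
            out.append("C")   # methylated CpG cytosine survives as C
        else:
            out.append("T")   # every other cytosine is read as T
    return "".join(out)


def complement(seq: str) -> str:
    return "".join(PAIRS[base] for base in seq)


def base_mass_sum(seq: str, masses: dict) -> int:
    return sum(masses[base] for base in seq)


def theoretical_controls(seq: str, masses: dict) -> dict:
    """Summed base mass of the weighed strand for both extreme controls."""
    return {
        label: base_mass_sum(complement(converted_top(seq, flag)), masses)
        for label, flag in (("fully_methylated", True), ("fully_unmethylated", False))
    }


natural = theoretical_controls("CGCTCG", NATURAL)
# Heavyweight case: A replaced by a brominated A with the base mass of 213 quoted above.
heavy = theoretical_controls("CGCTCG", {**NATURAL, "A": 213})
print(natural)
print(heavy)
```

An observed mass lying between the two control values would then point to partial methylation, with the offset from either extreme scaling with the number of methylated sites.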

Amplification methods such as PCR generally produce a double stranded product, an exception being asymmetric PCR. In such cases, it is preferred to determine the mass of only one strand of the amplified product. Mass analysis of a single strand may be accomplished by any of a number of approaches. For example, the two strands may be separated and isolated, as discussed above. Another approach is to use the double stranded amplified product as template in asymmetric PCR. An additional approach is to have one of the two strands marked to enhance its detection by mass analysis as is well known in the art. Methods are well known for appending a detectable moiety or molecular "tag" such as a small organic molecule (often, a dye) to one of the two primers used in amplification. See Glen Research; Innis et al. (1990). The detectable moiety or molecular "tag" may be attached to the 5' end of one of the primers before or after synthesis, or a detectably labeled nucleotide may be incorporated into the primer during synthesis.

There are various strategies for mass analysis of amplified products. For example, an amplified product may be analyzed directly for mass, if it can be discerned from other potential products by means, for example, of a "tag" used as described above. Alternatively, multiple products amplified in a single pot can be separated by hybridizing to specific capture probes, attached to a solid phase. Such capture probes can be similar to the probes used for amplification. A solid phase can be a "microarray biochip" or "DNA chip," where the probes are attached in an array. For example, see U.S. patents No. 5,741,644, No. 5,861,242 and No. 5,556,752. Attachment can be achieved by conventional means, for example, using VLSIPS™ technology as described in U.S. patents No. 5,143,854 and 5,561,071. Mass analysis from a microarray biochip can be readily automated for high throughput analysis as is well known in the art.

A variety of methods are known for mass analysis, including mass spectrometry, ultracentrifugation, and gel electrophoresis. Mass spectrometry (MS), a preferred approach in this context, is an analytical technique for determining the mass of, and structural information about, particles of many kinds, including neutral atoms, molecules, clusters (i.e., aggregates of atoms or molecules), and polymers.

Traditionally, an analyte in the vapor phase is bombarded by a high-energy beam of electrons, atoms or molecules, causing the formation of ions. The resulting ions are accelerated through a magnetic, electrostatic or electrodynamic field, which confines and directs the ions into a mass analyzer, where they are separated on the basis of their mass-to-charge ratio (m/z). An ion detector measures the relative abundance of each ion and produces a spectrum of signal intensity versus m/z for each ion detected.

A mass spectrometer combines the features of ion formation, or ionization, with mass analysis and ion detection. Exemplary ionization methods useful with the present invention include chemical ionization (CI), plasma discharge ionization, glow discharge ionization, electron impact ionization (EI), electrospray ionization (ESI), fast-atom bombardment ionization (FAB), field ionization, laser desorption/ionization, matrix-assisted laser desorption/ionization (MALDI), laser multi-photon ionization, laser single-photon ionization, electron capture ionization, Penning ionization, plasma desorption/ionization, resonance ionization, secondary ionization, spark ionization and thermal ionization. Exemplary mass analyzer designs useful with the present invention include time-of-flight (TOF), magnetic sector, electric sector, combined magnetic/electric sector instruments, Fourier-transform ion cyclotron resonance, quadrupole and quadrupole trap, and any combination of these or other mass spectrometric approaches in the many so-called "hyphenated mass spectrometries." See Branee (1987). Exemplary ion detectors useful with the present invention include electron multipliers, multichannel plates, microchannel plates, microsphere plates, ceratrons, and cryogenic detectors.

The mass spectrometer type is denoted by the mass analyzer design. For example, a mass spectrometer equipped with a time-of-flight mass analyzer is herein referred to as a time-of-flight mass spectrometer.

A preferred mass spectrometer design employs TOF mass analysis. Ions are separated in TOF by measuring their time of flight from the ion source to an ion detector. The ions travel through an electric field-free region in a vacuum with velocities determined by their respective mass-to-charge ratios (m/z). Accordingly, smaller m/z ions will travel through the vacuum region faster than larger m/z ions, thereby causing a separation. The flight path of an ion in a TOF mass analyzer may be linear or reflective, as is well known in the art. Exemplary TOF mass spectrometers are disclosed in U.S. patents No. 5,045,694, No. 5,160,840 and No. 5,627,369.

Matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI) are useful, in combination with a variety of mass analyzer designs, for analyzing the mass of amplified products as described in the methods of the invention. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) employs laser pulses focused on a small sample plate with the molecules to be analyzed, e.g., nucleic acids, embedded in either a solid or liquid matrix comprising a small, highly absorbing compound. The laser pulses transfer energy to the matrix, which causes a microscopic ablation and concomitant ionization of the analyte molecules and produces a gaseous plume of intact, charged nucleic acids in single-stranded form. The ions generated by the laser pulses are accelerated to a fixed kinetic energy by a strong electric field and then made to pass through an electric field-free region in vacuum. Because the kinetic energy is fixed, the velocity of each ion decreases as its mass-to-charge ratio (m/z) increases: the smaller m/z ions travel faster than the larger m/z ions, producing a mass-to-charge-related molecular separation. Towards the end of the electric field-free region, the ions collide with a detector, thus generating a separate signal for each set of ions of a particular mass-to-charge ratio that strikes the detector. The ions may also be detected by an electron multiplier, a micro-channel plate detector or a microsphere plate detector, where the electron multiplier is preferably provided with a conversion dynode.
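The relationship between mass-to-charge ratio and flight time can be made concrete with a short Python sketch of an idealized linear TOF analyzer. It assumes that every ion starts at rest, gains kinetic energy z·e·U in the acceleration region, and then drifts a length L, so that t = L·sqrt(m / (2·z·e·U)). The 20 kV accelerating voltage and 1 m drift length are illustrative assumptions, not values taken from this specification, and the function name is hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge=1, accel_voltage=20_000.0, drift_length=1.0):
    """Drift time (s) of an ion in an idealized linear TOF analyzer.

    mass_da       -- ion mass in daltons
    charge        -- number of elementary charges (z)
    accel_voltage -- accelerating potential in volts (assumed value)
    drift_length  -- field-free path length in metres (assumed value)
    """
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage  # joules
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return drift_length / velocity


if __name__ == "__main__":
    for mass in (6000.0, 6100.0):
        print(f"{mass:.0f} Da: {flight_time(mass) * 1e6:.2f} microseconds")
```

Under these assumptions a singly charged ion of a few thousand daltons arrives after some tens of microseconds, and quadrupling the mass doubles the flight time, so a heavier amplified product is registered later than a lighter one.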

For analysis of an individual sample, 10 to 100 mass spectra resulting from individual laser pulses can be averaged to obtain a single composite mass spectrum with an improved signal-to-noise ratio. The mass of an ion, such as a charged nucleic acid, is measured by using its velocity to determine the mass-to-charge ratio by time-of-flight analysis. The mass of the molecule thus correlates directly with the time it takes to travel from the sample plate to the detector, with the entire process taking place in the microsecond range. Mass spectrometry may be automated by methods well known in the art. In this way, tens to hundreds of samples per minute may be analyzed.

A sample containing the amplified nucleic acid product can be embedded in a matrix or adsorbed onto a surface for MALDI mass spectrometry. The matrix preferably comprises a dye or a frozen solvent, and more preferably comprises dye diluted in a solvent. The matrix may be formed in the manner known in the art, and preferably comprises sinapinic acid, 2,5-dihydroxybenzoic acid, 2-cyano-4-hydroxycinnamic acid, gentisic acid, dithranol, 2-amino-4-methyl-5-nitropyridine, 2-amino-5-nitropyridine, 6-aza-2-thiothymine, caffeic acid, 3-hydroxypicolinic acid, nicotinic acid, 2,4,6-trihydroxyacetophenone or 3-hydroxy-4-methoxybenzaldehyde. Other constituents and mixtures of constituents are also suitable, as is known in the art.
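The benefit of averaging 10 to 100 single-pulse spectra, as described above, can be sketched in Python with simulated data. The sketch assumes independent Gaussian noise of equal magnitude in every channel of every shot, in which case the baseline noise of the averaged spectrum falls roughly as the square root of the number of shots. The peak shape, noise level and function names are hypothetical and serve only to illustrate the averaging step.

```python
import random
import statistics


def simulate_shot(true_signal, noise_sd, rng):
    """One single-pulse spectrum: the true signal plus Gaussian noise per channel."""
    return [s + rng.gauss(0.0, noise_sd) for s in true_signal]


def average_spectra(shots):
    """Channel-by-channel mean of equally long spectra."""
    n = len(shots)
    return [sum(channel) / n for channel in zip(*shots)]


def baseline_noise(spectrum, baseline_channels):
    """Standard deviation of the signal over channels that contain no peak."""
    return statistics.pstdev([spectrum[i] for i in baseline_channels])


if __name__ == "__main__":
    rng = random.Random(0)                    # fixed seed for repeatability
    true_signal = [0.0] * 200
    true_signal[100] = 1.0                    # a single hypothetical peak
    baseline = [i for i in range(200) if i != 100]

    shots = [simulate_shot(true_signal, 1.0, rng) for _ in range(100)]
    composite = average_spectra(shots)

    print("single-shot baseline noise:", baseline_noise(shots[0], baseline))
    print("100-shot baseline noise:   ", baseline_noise(composite, baseline))
```

With noise as large as the peak itself, a single shot does not reveal the peak reliably, whereas the 100-shot composite reduces the baseline noise by roughly a factor of ten under the stated assumption.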

Because amplification methods such as PCR generally produce a double-stranded product, mass spectrometry analysis of a single strand may be accomplished by electron attachment ("EA"). An EA-enhancing molecule can be attached to the 5' end of one of the PCR primers. Preferred EA agents are water-soluble C60 derivatives as described, for example, by Prato and Maggini (1998). A few water-soluble fulleropyrrolidines have been synthesized and used in experiments with various enzymes. See Sijbesma et al. (1993); Schinazi et al. (1993). The fulleropyrrolidines can be readily converted to the amides, which in turn can be coupled to the 5' end of the appropriate amino-linked oligonucleotide primer. Diphenyl C60 alcohol and di-isopropyl cyclohexyl C60 alcohol derivatives have also been prepared and studied with HIV protease. See Friedman et al. (1998). Both provide ready attachment strategies to 5' amino-linked oligonucleotides (Glen Research; Sterling, Virginia).

Applications of Methylation Detection by Heavyweight Amplification

The present invention can be used to detect the methylation state of any polynucleotide, whether DNA or RNA, provided that one can prepare suitable primers for amplification. Accordingly, the present invention is particularly suited for study of human genomic DNA, by virtue of the public availability of sequence information for essentially the entire human genome.

The use of heavyweight nucleotides in amplification enhances sensitivity and ease of detection, even for an entire CpG island. Thus, the present invention allows for rapid and relatively cheap interrogation of the methylation status of virtually all of the 45,000 CpG-rich islands known to exist in the human genome. The information thereby gleaned informs diagnoses and prognoses that reflect, in whole or in part, the gene-silencing pattern characterizing a particular disease. That is, the genome methylation determined via the present invention offers a disease-state indicator of exceptional selectivity, specificity, and sensitivity.

Aberrant CpG island hypermethylation, which occurs at high frequency in tumors, can yield diagnostic information as well. Determining a patient's genome methylation by means of the present invention opens the way, in a cost-effective manner, for an unprecedented early warning diagnosis of many common cancers.

In addition, epigenetic changes have been implicated in several other important diseases. These include atherosclerosis (Post et al., 1999), Angelman syndrome (Lalande et al., 1999), Duchenne muscular dystrophy (Yoshioka et al., 1998) and ICF syndrome (Kondo et al., 2000), to name a few. Knowledge gained, in accordance with the present invention, about cellular methylation fingerprints specific to a given disease also can illuminate improved therapeutic strategies. By combining disease-specific gene methylation analysis with information on the human genome, one can establish which genes are being silenced, leading to acceleration of pathogenesis. Other applications for the present invention are detailed below.

Diagnosis: Inappropriate methylation change in CpG islands is one of the earliest known stages in the development of many cancers, and direct detection of these molecular aberrations using the methods disclosed herein provides an extraordinary opportunity for unprecedented early stage molecular diagnosis of cancer.

Enabling technology for improved clinical trials: The methods disclosed herein can be used to determine which individuals are afflicted with methylation-dependent cancers. This can increase the success of efficacy studies in clinical trials of drugs targeting the basis for the methylation difference in the cancer. Drug candidates that affect methylation status represent the next generation of non-cytotoxic cancer therapies.

Personalized Medicine: With the high cost of cancer therapies and the wide variety of cancer types, it is important that tests be developed to determine which patients will respond to which therapies. As the new generation of methylation-dependent cancer therapies advances, the assays of the present invention will be important to determine which cancer patients are afflicted with methylation defects. Such data can inform the oncologist's choice of therapy by indicating patient suitability for a methylation-based drug regimen.

Discovery: Detection of inappropriate methylation in CpG islands acts as an indicator of which genes are involved in the development of cancer. Furthermore, only a small fraction (< 3%) of all CpG islands have been investigated as to their role in cancer. Discoveries of new gene silencing events in cancer using the methods described herein will provide critical information for the initiation of new drug development strategies.

Toxicology: The cost of bringing drug candidates through clinical trials that eventually fail due to toxicological problems is enormous. Thus, there is a great need for methylation detection methods to "weed out" drugs with toxicology problems at an early (pre-clinical) stage. Applying the methods of the present invention in a high throughput screening format will be helpful in determining if a particular drug impacts the methylation status of cells or tissues. Such screening would lower the likelihood that candidate drugs with mutagenic or epigenetic-based toxicity will proceed inappropriately to clinical trials.

Thus, detecting methylation status by amplification with heavyweight nucleotides, as described above, can advance any purpose for which determination of methylation status is important. For instance, the present invention can be employed to determine whether a gene is involved in a pathology, by determining the activation state of the gene based on its extent of cytosine CpG methylation. Also, the present invention can be used for screening drug candidates, potentially for reduced toxicity, by determining the effect of the candidate compounds on the activation state of various genes, as reflected by associated cytosine methylation.
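One way to turn a measured mass into an "extent of cytosine CpG methylation" is to interpolate between the masses of a fully unmethylated and a fully methylated control. The following Python sketch is an illustration only, not the procedure of the specification. It assumes that every CpG site contributes the same mass shift, so that mass varies linearly with the number of methylated sites, and that measurement error is small relative to the per-site shift. The function name and the example masses are hypothetical.

```python
def estimate_methylated_sites(observed_mass, unmethylated_mass, methylated_mass, n_sites):
    """Estimate how many of `n_sites` CpG cytosines are methylated.

    Linear interpolation between two control masses (all in Da):
    an observed mass equal to `unmethylated_mass` maps to 0 sites and
    an observed mass equal to `methylated_mass` maps to `n_sites`.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    span = methylated_mass - unmethylated_mass
    if span == 0:
        raise ValueError("controls have identical mass; no discrimination possible")
    fraction = (observed_mass - unmethylated_mass) / span
    # Clamp so that measurement noise cannot yield an impossible estimate.
    fraction = min(1.0, max(0.0, fraction))
    return fraction * n_sites


if __name__ == "__main__":
    # Hypothetical control masses for an amplicon with two CpG sites.
    estimate = estimate_methylated_sites(
        observed_mass=2906.09,
        unmethylated_mass=3000.00,
        methylated_mass=2812.18,
        n_sites=2,
    )
    print(f"estimated methylated CpG sites: {estimate:.2f}")
```

A larger per-site mass shift, such as that obtained with heavyweight nucleotides, makes adjacent methylation states easier to distinguish under this kind of interpolation because the same measurement error corresponds to a smaller fraction of one site.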

CITED PUBLICATIONS

• Beaucage, et al. (1981), Tetrahedron Letters, 22:1859-1862.

• Bird, A. P. (1980), "DNA methylation and the frequency of CpG in animal DNA." Nucleic Acids Res., 8:1499-504.

• Branee (1987), Int. J. Mass Spectrom., 76:125-237.

• Eads, C.A., Danenberg, K.D., Kawakami, K., Saltz, L.B., Blake, C., Shibata, D., Danenberg, P.V., Laird, P.W. (2000), "MethyLight: a high-throughput assay to measure DNA methylation." Nucleic Acids Res., 28:E32.

• Friedman, S.H., Ganapathi, P.S., Rubin, Y. and Kenyon, G.L. (1998), "Optimizing the binding of fullerene inhibitors of the HIV-1 protease through predicted increases in hydrophobic desolvation," J. Med. Chem., 41:2424-2429.

• Frommer, M., McDonald, L. E., Millar, D. S., Collis, C. M., Watt, F., Grigg, G. W., Molloy, P. L., Paul, C. L. (1992), "A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands." Proc. Nat'l Acad. Sci. (USA), 89:1827-1831.

• Gardiner-Garden, M., and Frommer, M. (1987), "CpG islands in vertebrate genomes." J. Mol. Biol., 196:261-282.

• Grunau, C., Schattevoy, R., Mache, N. & Rosenthal, A. (2000), "MethTools--a toolbox to visualize and analyze DNA methylation data," Nucleic Acids Res., 28(5):1053-8. (Also see a review of pattern recognition methods, NMR Biomed., 11:148-156.)

• Gyllensten et al. (1988), Proc. Nat'l Acad. Sci. (USA), 85:7652.

• Herman, et al. (1996), Cancer Res., 56:722.

• Innis, M.A., Gelfand, D.H., Sninsky, J.J. and White, T.J. (1990), PCR Protocols: A Guide to Methods and Applications, Academic Press, Inc.

• Issa et al. (1994), Nature Genet., 7:536.

• Kondo, T., Bobek, M. P., Kuick, R., Lamb, B., Zhu, X., Narayan, A., Bourc'his, D., Viegas-Pequignot, E., Ehrlich, M., and Hanash, S. M. (2000), "Whole-genome methylation scan in ICF syndrome: hypomethylation of non-satellite DNA repeats D4Z4 and NBL2." Hum. Mol. Genet., 9:597-604.

• Lalande, M., Minassian, B. A., DeLorey, T. M., and Olsen, R. W. (1999), "Parental imprinting and Angelman syndrome." Adv. Neurol., 79:421-429.

• Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. (1992), "CpG islands as gene markers in the human genome." Genomics, 13:1095-1107.

• Maniatis et al. (1989), "Molecular Cloning: A Laboratory Manual." Cold Spring Harbor Laboratory.

• Merlo, et al. (1995), Nature Med., 1:686.

• McCabe, P.C. (1990), "Production of single stranded DNA by asymmetric PCR," in PCR Protocols: A Guide to Methods and Applications, pages 76-83.

• Olek et al. (1996), "A modified and improved method for bisulphite based cytosine methylation analysis." Nucleic Acids Res., 24:5064-66.

• Post, W. S., Goldschmidt-Clermont, P. J., Wilhide, C. C., Heldman, A. W., Sussman, M. S., Ouyang, P., Milliken, E. E., and Issa, J. P. (1999), "Methylation of the estrogen receptor gene is associated with aging and atherosclerosis in the cardiovascular system." Cardiovasc. Res., 43:985-991.

• Prato, M. and Maggini, M. (1998), "Fulleropyrrolidines: a family of full-fledged fullerene derivatives," Accounts of Chem. Res., 31:519-526.

• Schinazi, R.F., Sijbesma, R.P., Srdanov, G., Hill, C., Wudl, F. (1993), "Synthesis and virucidal activity of a water-soluble, configurationally stable, derivatized C60 fullerene," Antimicrob. Agents Chemother., 37:1707-1710.

• Sijbesma, R., Srdanov, G., Wudl, F., Castoro, J.A., Wilkins, C., Friedman, S., Decamp, Kenyon, G.L. (1993), J. Am. Chem. Soc., 115:6510-6512.

• Yoshioka, M., Yorifuji, T., and Mitayoshi, I. (1998), "Skewed X inactivation in manifesting carriers of Duchenne muscular dystrophy." Clin. Genet., 53:102-107.

The invention thus has been disclosed broadly and illustrated in reference to representative embodiments described above. Those skilled in the art will recognize that various modifications can be made to the present invention without departing from the spirit and scope thereof. All publications, patent applications, and issued patents are herein incorporated by reference to the same extent as if each individual publication, patent application or issued patent were specifically and individually indicated to be incorporated by reference in its entirety.

Claims

What is claimed is:

1. A method for determining whether one or more cytosine bases are methylated in a polynucleotide that has a nucleotide sequence containing one or more cytosine bases, comprising:

(a) contacting said polynucleotide with an agent that modifies either unmethylated cytosine or methylated cytosine in the polynucleotide, said modification causing the modified unmethylated or methylated cytosine to be replaced by another nucleic acid base upon amplification of the polynucleotide;

(b) effecting an amplification using the modified polynucleotide as template and in the presence of a mixture of nucleotides, wherein at least one of the nucleotides is a heavyweight nucleotide that has increased molecular weight compared to its natural counterpart; and then

(c) using mass analysis to determine if the mass of said amplified product is different from a control sample having identical base sequence to said nucleotide sequence, whereby the presence or absence of said mass difference indicates if one or more cytosine bases of the polynucleotide are methylated.

2. The method of claim 1, wherein said control sample:

(i) is an actual control sample that is either fully methylated or fully unmethylated and has been subjected to steps (a) and (b) before actual mass determination; or

(ii) is a theoretical control sample that is fully methylated or fully unmethylated, the mass of which is estimated by taking into account the mass of heavyweight and normal nucleotides that would be present if said theoretical control sample were subject to steps (a) and (b).

3. The method of claim 1, wherein said control sample in step (c) is a control sample that contains a specified number of methylated cytosines.

4. The method of claim 1 wherein the one or more cytosine bases are present in the sequence as a CpG sequence motif.

5. The method of claim 3, wherein said CpG sequence motif is represented in one or more CpG islands.

6. The method of claim 1, wherein said modifying agent is a bisulfite salt.

7. The method of claim 1, wherein unmethylated cytosine is modified to uracil.

8. The method of claim 1, wherein said amplification in step (b) comprises a polymerase chain reaction.

9. The method of claim 1, wherein said heavyweight nucleotides are adenine-containing or thymine-containing.

10. The method of claim 9, wherein said one or more heavyweight nucleotides is a halogenated nucleotide.

11. The method of claim 10, wherein said halogenated nucleotide is halogenated with bromine or iodine.

12. The method of claim 11, wherein said halogenated nucleotide is 8-bromodeoxyadenosine triphosphate.

13. The method of claim 11, wherein said halogenated nucleotide is 5-bromodeoxythymidine triphosphate.

14. The method of claim 11, wherein said halogenated nucleotide is 8-iododeoxyadenosine triphosphate.

15. The method of claim 11, wherein said halogenated nucleotide is 5-iododeoxythymidine triphosphate.

16. The method of claim 1, wherein said polynucleotide is double-stranded and said amplification amplifies only a single strand of the polynucleotide.

17. The method of claim 16, wherein said single strand is amplified using asymmetric polymerase chain reaction.

18. The method of claim 1, wherein the amplified product is double-stranded and mass analysis is conducted on only one strand.

19. The method of claim 18, wherein mass analysis of only one strand is achieved by amplifying in step (b) with a primer chemically modified to enhance detectability.

20. The method of claim 1, wherein said mass analysis is mass spectrometry.

21. The method of claim 20, wherein mass spectrometry is conducted on only a single strand of a double stranded amplification product by amplifying in step (b) with a primer chemically modified by addition of an electron attachment-enhancing molecule.

22. The method of claim 21, wherein said electron attachment-enhancing molecule is a water-soluble C60 derivative.

23. The method of claim 20, wherein mass spectrometry is matrix-assisted laser desorption/ionization mass spectrometry.

24. The method of claim 20, wherein mass spectrometry is electrospray mass spectrometry.

25. The method of claim 1, for incorporating heavyweight nucleotides into a single strand of the polynucleotide, wherein in step (b), a double stranded amplification product incorporating only natural nucleotides is first produced by amplifying the modified polynucleotide in the presence of a pair of primers designed to amplify a single strand of the modified polynucleotide and in the presence of natural nucleotides, and wherein said double stranded product is used as template in a subsequent asymmetric amplification in the presence of said mixture of nucleotides wherein at least one of the nucleotides is a heavyweight nucleotide.

26. The method of claim 25, wherein said step of asymmetric amplification is achieved by adding an excess of one of the primers used to amplify the double stranded product in step (c).

27. The method of claim 25, wherein said step of asymmetric amplification is achieved by choosing an annealing temperature during amplification that favors annealing of one of the two primers used to amplify the double stranded product in step (c).