
WO2007095038A2 - Mutations and polymorphisms of erbb2 - Google Patents


Info

Publication number
WO2007095038A2
Authority
WO
WIPO (PCT)
Prior art keywords
erbb2
cancer
seq
polypeptide
mutations
Prior art date
Application number
PCT/US2007/003305
Other languages
French (fr)
Other versions
WO2007095038A8 (en)
Inventor
Kenneth Wayne Culver
Jian Zhu
Stan Lilleberg
Original Assignee
Novartis Ag
Novartis Pharma Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Novartis Ag, Novartis Pharma Gmbh
Publication of WO2007095038A2
Publication of WO2007095038A8


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/106 Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C12Q2600/172 Haplotypes

Definitions

  • This invention relates generally to the analytical testing of tissue samples in vitro, and more particularly to aspects of genetic mutations and polymorphisms of v-erb-b2 erythroblastic leukaemia viral oncogene homolog 2 (ERBB2).
  • ERBB2 erythroblastic leukaemia viral oncogene homolog 2
  • Theranostic tests are also useful to select subjects for treatments that are particularly likely to benefit from the treatment or to provide an early and objective indication of treatment efficacy in individual subjects, so that the treatment can be altered with a minimum of delay.
  • Theranostics are useful in the clinical diagnosis and management of a variety of diseases and disorders, including, but not limited to, cardiovascular disease, cancer, infectious diseases and Alzheimer's disease, as well as in the prediction of drug toxicity or drug resistance.
  • Theranostic tests may be developed in any suitable diagnostic testing format, including, but not limited to, immunohistochemical tests, clinical chemistry, immunoassays, cell-based technologies and nucleic acid tests.
  • v-erb-b2 erythroblastic leukaemia viral oncogene homolog 2 neuro/glioblastoma derived oncogene homolog (avian)
  • ERBB2 neuro/glioblastoma derived oncogene homolog (avian)
  • HER2 neuro/glioblastoma derived oncogene homolog (avian)
  • HER2 a.k.a. C-erbB-2, c-erb B2, HER2, HER-2, HER-2/neu, MLN 19, NEU, NEU proto-oncogene, NGL, p185erbB2, receptor tyrosine-protein kinase erbB-2 precursor, TKR1, Tyrosine kinase-type cell surface receptor HER2
  • TKR1 Tyrosine kinase-type cell surface receptor HER2
  • ERBB2 encodes a 185-kDa, 1,255-amino-acid receptor tyrosine kinase belonging to a family of four transmembrane receptor tyrosine kinases (RTKs).
  • RTK receptor tyrosine kinase
  • This HER family of RTKs consists of four members that mediate the growth, differentiation and survival of cells: epidermal growth factor receptor (EGFR, also called HER-1 or erbB-1), HER-2 (also called erbB-2 or Neu), HER-3 and HER-4 (also called erbB-3 and erbB-4, respectively).
  • EGFR epidermal growth factor receptor
  • HER-2 also called erbB-2 or Neu
  • HER-3 and HER-4 also called erbB-3 and erbB-4, respectively.
  • the ERBB2 protein is expressed in several human organs and tissues, such as normal epithelium, endometrium and ovarian epithelium, and at the neuromuscular level, as well as in the prostate, pancreas, lung, kidney, liver, heart and haematopoietic cells.
  • ERBB2 expression is low in mononuclear cells from bone marrow, peripheral blood (PB) and mobilized PB; higher expression has been found in cord blood-derived cells.
  • PB peripheral blood
  • Quiescent CD34+ progenitor cells from all blood sources and resting lymphocytes are ERBB2 negative, but the expression of this receptor is up-regulated during cell-cycle recruitment of progenitor cells.
  • ERBB2 phosphatidylinositol-3 kinase
  • ERBB2 mutations and altered expression levels have been associated with different cancers: ERBB2 is overexpressed in 25% to 40% of several human tumours and is associated with malignancy of the disease, a high mitotic index and shorter patient survival (www.infobiogen.fr/services/chromcancer/Genes/ERBB2ID162ch17q11.html). ERBB2 overexpression occurs in 25-30% of human breast cancers, where it is associated with tumour aggressiveness and thus with a shorter time to relapse and lower overall survival.
  • ERBB2 protein overexpression is caused by amplification of the erbB-2 gene.
  • ERBB2 has a role in Paget's disease of the breast, in which the epidermis of the nipple is infiltrated by large neoplastic cells of glandular origin.
  • ERBB2 may be activated in the early stage of pathogenesis of cervical carcinoma in geriatric patients and is frequently amplified in squamous cell carcinoma of the uterine cervix. Further, overexpression of ERBB2 in medulloblastoma is associated with poor prognosis and metastasis, and HER2-HER4 receptor heterodimerization is of particular biological significance in this disease. ERBB2 expression has also been reported in blast cells from patients with haematological malignancies, including acute lymphoblastic leukaemia (ALL).
  • ALL acute lymphoblastic leukaemia
  • ERBB2 Overexpression of ERBB2 has also been shown to be associated with transitional cell carcinoma of the bladder. Further, ERBB2 overexpression has been shown to occur in muscle-invasive urothelial carcinomas of the bladder and to be associated with worse survival. Amplification of the ERBB2 gene is also frequently linked to alterations of the TOP2A gene in bladder cancer. Overexpression of ERBB2 has also been shown to occur in a significant number of colorectal cancers, where it is significantly associated with poor survival and related to tumour progression. The E6/E7 proteins of HPV type 16 and ERBB2 cooperate to induce neoplastic transformation of primary normal oral epithelial cells.
  • ERBB2-receptor Overexpression of the ERBB2 receptor is a frequent event in oral squamous cell carcinoma and is correlated with poor survival. ERBB2 amplification/overexpression has been reported as unlikely to play a role in the molecular pathogenesis of most gastrinomas. However, mild gene amplification occurs in a subset of gastrinomas, and overexpression of this receptor is associated with aggressiveness of the disease. ERBB2 expression is also correlated with tumour histological differentiation and is associated with poor prognosis in well-differentiated gastric adenocarcinoma. A significant correlation was also observed between ERBB2 overexpression
  • ERBB2 Increased ERBB2 expression was shown to contribute to the progression of cholangiocarcinoma to an advanced stage associated with tumour metastasis.
  • overexpression of ERBB2 and COX-2 were reported to directly correlate with tumour differentiation.
  • NSCLC non-small cell lung cancer
  • Higher frequency of ERBB2 expression has been observed in samples from patients with metastatic disease at presentation and at the time of relapse, and it correlates with worse histologic response and decreased event-free survival.
  • the Cancer Genome Project and Collaborative Group sequenced the ERBB2 gene from 120 primary lung tumours and identified mutations within the kinase domain in 4% of them; in the adenocarcinoma subtype of lung cancer, 10% of cases had mutations (Cancer Genome Project and Collaborative Group, Nature 431:525-526, 2004). Accordingly, there is a need in the art for additional information about the relationship between ERBB2 mutations and cancer.
  • the invention provides for the use of an ERBB2 modulating agent in the manufacture of a medicament for the treatment of cancer in a selected patient population.
  • the patient population is selected on the basis of the genotype of the patients at an ERBB2 genetic locus indicative of efficacy of the ERBB2 modulating agent in treating cancer.
  • the cancer can be breast cancer.
  • the invention also provides an isolated polynucleotide having a sequence encoding an ERBB2 mutation.
  • the ERBB2 mutations are the previously unidentified mutations listed in TABLE 1.
  • the invention provides vectors and organisms containing the ERBB2 mutations of the invention and polypeptides encoded by polynucleotides containing the ERBB2 mutations of the invention.
  • the invention further provides a method for treating cancer in a subject.
  • the genotype or haplotype of a subject is obtained at an ERBB2 gene locus, where the genotype and/or haplotype is indicative of a propensity of the cancer to respond to the drug.
  • an anti-cancer therapy is administered to the subject.
  • the invention provides a method for diagnosing cancer in a subject and a method for choosing subjects for inclusion in a clinical trial for determining efficacy of an ERBB2 modulating agent; in both these methods the genotype and/or haplotype of a subject is interrogated at an ERBB2 gene locus. Also provided by the invention are kits for use in determining a treatment strategy for cancer.
  • the invention also provides for the use of each of the mutations of the invention as a drug target.
  • the various aspects of the present invention relate to polynucleotides encoding ERBB2 mutations and polymorphisms of the invention, expression vectors encoding the ERBB2 mutant polypeptides of the invention and organisms that express the ERBB2 mutant/polymorphic polynucleotides and/or ERBB2 mutant/polymorphic polypeptides of the invention.
  • the various aspects of the present invention further relate to diagnostic/theranostic methods and kits that use the ERBB2 mutations and/or polymorphisms of the invention to identify individuals predisposed to disease or to classify individuals and tumours with regard to drug responsiveness, side effects, or optimal drug dose.
  • the invention provides methods for compound validation and a computer system for storing and analyzing data related to the ERBB2 mutations and polymorphisms of the invention. Accordingly, various particular embodiments that illustrate these aspects follow.
  • allele means a particular form of a gene or DNA sequence at a specific chromosomal location (locus).
  • the term “antibody” includes, but is not limited to, polyclonal antibodies, monoclonal antibodies, humanized or chimaeric antibodies and biologically functional antibody fragments sufficient for binding of the antibody fragment to the protein.
  • the term “clinical response” means any or all of the following: a quantitative measure of the response, no response, and adverse response (i.e., side effects).
  • the term “clinical trial” means any research study designed to collect clinical data on responses to a particular treatment, and includes but is not limited to phase I, phase II and phase III clinical trials. Standard methods are used to define the patient population and to enrol subjects.
  • the term "effective amount" of a compound is a quantity sufficient to achieve a desired pharmacodynamic, toxicologic, therapeutic and/or prophylactic effect, for example, an amount which results in the prevention of or a decrease in the symptoms associated with a disease that is being treated, e.g., the diseases associated with ERBB2 mutant polypeptides and ERBB2 mutant polynucleotides identified herein.
  • the amount of compound administered to the subject will depend on the type, degree and severity of the disease and on the characteristics of the individual, such as general health, age, sex, body weight and tolerance to drugs. The skilled artisan will be able to determine appropriate dosages depending on these and other factors.
  • an effective amount of the compounds of the present invention, sufficient for achieving a therapeutic or prophylactic effect, ranges from about 0.000001 mg per kilogram body weight per day to about 10,000 mg per kilogram body weight per day.
  • the dosage ranges are from about 0.0001 mg per kilogram body weight per day to about 100 mg per kilogram body weight per day.
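As a purely arithmetic illustration of the weight-normalised ranges above, the following Python sketch converts a dose in mg per kilogram body weight per day into a total daily dose and checks it against the two stated ranges. The function names and the example values (70 kg, 2 mg/kg/day) are hypothetical and are not taken from the application; the bounds are the approximate ("about") limits quoted above, treated here as exact and inclusive.

```python
# Ranges stated in the text, in mg per kg body weight per day. The text says
# "about", so treating the bounds as exact and inclusive is an assumption.
BROAD_RANGE = (0.000001, 10_000.0)
NARROW_RANGE = (0.0001, 100.0)


def daily_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total dose in mg per day for a subject of the given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return dose_mg_per_kg * body_weight_kg


def within(dose_range: tuple[float, float], dose_mg_per_kg: float) -> bool:
    """True if a weight-normalised dose lies inside the given range."""
    low, high = dose_range
    return low <= dose_mg_per_kg <= high


print(daily_dose_mg(2.0, 70.0))     # 140.0
print(within(NARROW_RANGE, 2.0))    # True
print(within(NARROW_RANGE, 500.0))  # False
```

A dose of 500 mg/kg/day falls outside the narrower range but inside the broad one, which is the only distinction this sketch draws.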
  • the compounds of the present invention can also be administered in combination with each other, or with one or more additional therapeutic compounds.
  • Glivec® (Gleevec®; imatinib) is a medication for chronic myeloid leukaemia (CML) and certain stages of gastrointestinal stromal tumours (GIST). It targets and interferes with the molecular abnormalities that drive the growth of cancer cells. Corless CL et al., J. Clin. Oncol. 22(18):3813-25 (September 15, 2004); Verweij J et al., Lancet 364(9440):1127-34 (September 25, 2004); Kantarjian HM et al., Blood 104(7):1979-88 (October 1, 2004). By inhibiting multiple targets, Glivec® has potential as an anticancer therapy for several types of cancer, including leukaemia and solid tumours.
  • the aromatase inhibitor FEMARA® is a treatment for advanced breast cancer in postmenopausal women. It blocks the use of oestrogen by certain types of breast cancer that require oestrogen to grow. Janicke F, Breast 13 Suppl 1:S10-8 (December 2004); Mouridsen H et al., Oncologist 9(5):489-96 (2004).
  • Sandostatin® LAR® is used to treat patients with acromegaly and to control symptoms, such as severe diarrhoea and flushing, in patients with functional gastro-entero-pancreatic (GEP) tumours (e.g., metastatic carcinoid tumours and vasoactive intestinal peptide-secreting tumours [VIPomas]).
  • GEP gastro-entero-pancreatic
  • Sandostatin® LAR® regulates hormones in the body to help manage diseases and their symptoms.
  • ZOMETA® is a treatment for hypercalcaemia of malignancy (HCM) and for bone metastases across a broad range of tumour types. These tumours include multiple myeloma, prostate cancer, breast cancer, lung cancer, renal cancer and other solid tumours. Rosen LS et al., Cancer 100(12):2613-21 (June 15, 2004).
  • Vatalanib (1-[4-chloroanilino]-4-[4-pyridylmethyl]phthalazine succinate) is an inhibitor of multiple VEGF receptors (VEGFRs) that may block the creation of new blood vessels to prevent tumour growth.
  • VEGFR VEGF receptor
  • This compound inhibits all known VEGF receptor tyrosine kinases, blocking angiogenesis and lymphangiogenesis. Drevs J et al., Cancer Res. 60:4819-4824 (2000); Wood JM et al., Cancer Res. 60:2178-2189 (2000).
  • Vatalanib is being studied in two large, multinational, randomized, phase III, placebo-controlled trials in combination with FOLFOX-4 in first-line and second-line treatment of patients with metastatic colorectal cancer.
  • Thomas A et al., 37th Annual Meeting of the American Society of Clinical Oncology, San Francisco, CA, Abstract 279 (May 12-15, 2001).
  • everolimus inhibits oncogenic signalling in tumour cells.
  • mTOR mammalian target of rapamycin
  • everolimus exhibits broad antiproliferative activity in tumour cell lines and animal models of cancer. Boulay A et al., Cancer Res. 64:252-261 (2004).
  • everolimus also potently inhibited the proliferation of human umbilical vein endothelial cells, indicating that it also acts directly on angiogenesis.
  • everolimus may provide a clinical benefit to patients with cancer.
  • Everolimus is being investigated for its antitumour properties in a number of clinical studies in patients with haematological and solid tumours. Huang S & Houghton PJ, Curr. Opin. Investig. Drugs 3:295-304 (2002).
  • Gimatecan is a novel oral inhibitor of topoisomerase I (topo I). Gimatecan blocks cell division in rapidly dividing cells, such as cancer cells, which activates apoptosis. Preclinical data indicate that gimatecan is not a substrate for multidrug resistance pumps and that it increases the drug-target interaction. De Cesare M et al., Cancer Res. 61:7189-7195 (2001). Phase I clinical studies indicate that the dose-limiting toxicity of gimatecan is myelosuppression.
  • Patupilone is a microtubule stabilizer.
  • Altmann K-H, Curr. Opin. Chem. Biol. 5:424-431 (2001); Altmann K-H et al., Biochim. Biophys. Acta 470:M79-M91 (2000); O'Neill V et al., 36th Annual Meeting of the American Society of Clinical Oncology; May 19-23, 2000; New Orleans, LA, Abstract 829; Calvert PM et al., Proceedings of the 11th National Cancer Institute-European Organization for Research and Treatment of Cancer/American Association for Cancer Research Symposium on New Drugs in Cancer Therapy; November 7-10, 2000; Amsterdam, The Netherlands, Abstract 575.
  • Patupilone blocked mitosis and induced apoptosis more potently than the frequently used anticancer drug paclitaxel. Patupilone also retained full activity against human cancer cells that were resistant to paclitaxel and other chemotherapeutic agents.
  • the somatostatin analogue pasireotide is a stable cyclohexapeptide with broad somatotropin release inhibiting factor (SRIF) receptor binding.
  • SRIF somatotropin release inhibiting factor
  • LBH589 By triggering apoptosis, LBH589 induces growth inhibition and regression in tumour cell lines. LBH589 is being tested in phase I clinical trials as an anticancer agent. See also, George P et al, Blood 105(4): 1768-76 (February 15, 2005).
  • AEE788 inhibits multiple receptor tyrosine kinases including EGFR, HER2, and VEGFR, which stimulate tumour cell growth and angiogenesis. Traxler P et al., Cancer Res. 64:4931-4941 (2004). In preclinical studies, AEE788 showed high target specificity and demonstrated antiproliferative effects against tumour cell lines and in animal models of cancer. AEE788 also exhibited direct antiangiogenic activity. AEE788 is currently in phase I clinical development.
  • ERBB2 modulating agent is any compound that alters (e.g., increases or decreases) the expression level or biological activity level of ERBB2 polypeptide compared to the expression level or biological activity level of ERBB2 polypeptide in the absence of the ERBB2 modulating agent.
  • ERBB2 modulating agent can be a small molecule, antibody, polypeptide, carbohydrate, lipid, nucleotide, or combination thereof.
  • the ERBB2 modulating agent can be an organic compound or an inorganic compound.
  • the ERBB2 modulating agent is selected from the group consisting of: AEE788, lapatinib (GW572016), HKI-272, PD158780, PKI-166, AG879, TAK165, CI-1033, CP-654577, AG825, BMS-599626, EKB-569, PD153035, SU11925, ZM 252868, CP 127,374, SUC102, pertuzumab and trastuzumab.
  • expression includes but is not limited to one or more of the following: transcription of the gene into precursor mRNA; splicing and other processing of the precursor mRNA to produce mature mRNA; mRNA stability; translation of the mature mRNA into protein (including codon usage and tRNA availability); and glycosylation and/or other modifications of the translation product, if required for proper expression and function.
  • gene means a segment of DNA that contains all the information for the regulated biosynthesis of an RNA product, including promoters, exons, introns, and other untranslated regions that control expression.
  • genotype means an unphased 5' to 3' sequence of nucleotide pairs found at one or more polymorphic or mutant sites in a locus on a pair of homologous chromosomes in an individual.
  • genotype includes a full genotype and/or a sub-genotype.
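The genotype definition above (unphased, ordered 5' to 3', one nucleotide pair per site) can be modelled with a small data structure. This Python sketch is an illustration under stated assumptions, not a format used in the application: each site stores its two alleles as a sorted pair, so phase is discarded, and a "sub-genotype" is assumed to mean the genotype restricted to a subset of its sites. All names are hypothetical.

```python
from typing import Sequence

# One unphased nucleotide pair per site; sites are kept in 5' -> 3' order.
Genotype = tuple[tuple[str, str], ...]


def make_genotype(calls: Sequence[tuple[str, str]]) -> Genotype:
    """Build an unphased genotype: ordering each pair discards which
    homologous chromosome carries which allele."""
    return tuple((min(a, b), max(a, b)) for a, b in calls)


def sub_genotype(genotype: Genotype, site_indices: Sequence[int]) -> Genotype:
    """Assumed meaning: the genotype restricted to a subset of its sites,
    still listed in 5' -> 3' order."""
    return tuple(genotype[i] for i in sorted(site_indices))


def heterozygous_sites(genotype: Genotype) -> list[int]:
    """Indices of sites where the two alleles differ."""
    return [i for i, (a, b) in enumerate(genotype) if a != b]


# Same alleles, opposite phase: the unphased genotypes are identical.
g1 = make_genotype([("A", "G"), ("C", "C"), ("T", "G")])
g2 = make_genotype([("G", "A"), ("C", "C"), ("G", "T")])
print(g1 == g2)                  # True
print(sub_genotype(g1, [2, 0]))  # (('A', 'G'), ('G', 'T'))
print(heterozygous_sites(g1))    # [0, 2]
```

Sorting each pair is the simplest way to make two phase-swapped calls compare equal; a phased representation (haplotypes) would instead keep the pairs as observed.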
  • locus means a location on a chromosome or DNA molecule corresponding to a gene or a physical or phenotypic feature.
  • modulate or “modify” are used interchangeably herein and refer to the up-regulation or down-regulation of a target gene or a target protein.
  • modifies or modified also refers to the increase, decrease, elevation, or depression of processes or signal transduction cascades involving a target gene or a target protein (e.g., a cascade or pathway that induces growth arrest in a cell).
  • a target gene can be a gene involved in apoptosis.
  • the target gene can also encode a target protein that is involved in apoptosis.
  • Modification of the target protein (e.g., an ERBB2 protein) may occur when an ERBB2 modulating agent, such as AEE788, lapatinib (GW572016), HKI-272, PD158780, PKI-166, AG879, TAK165, CI-1033, CP-654577, AG825, BMS-599626, EKB-569, PD153035, SU11925, ZM 252868, CP 127,374, SUC102, pertuzumab and/or trastuzumab, binds to the target protein.
  • the modification may directly affect the ERBB2 protein, for example modifications that result in alteration in ERBB2 protein expression (i.e., an increase or decrease).
  • the modifications may occur as an indirect effect of binding to the target protein.
  • binding of an ERBB2 modulating agent that leads to a change in downstream processes involving ERBB2, such as activation of signal transduction pathways involved in apoptosis and cell proliferation, can therefore be a direct modification of the target protein or an indirect modification of a process or cascade involving the target protein.
  • Non-limiting examples of modifications include modifications of morphological and functional processes, and under- or overproduction or expression of proteins that, e.g., inhibit cell proliferation, cell activity, cell migration, chemotaxis and cell tumourigenicity.
  • the term "medical condition” includes, but is not limited to, any condition or disease manifested as one or more physical and/or psychological symptoms for which treatment and/or prevention is desirable, and includes previously and newly identified diseases and other disorders.
  • nucleotide pair means the two nucleotides bound to each other between the two nucleotide strands.
  • polymorphism means any sequence variant present at a frequency of >1% in a population.
  • the sequence variant may be present at a frequency significantly greater than 1% such as 5% or 10% or more.
  • the term may be used to refer to the sequence variation observed in an individual at a polymorphic site.
  • Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites and may, but need not, result in detectable differences in gene expression or protein function.
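To make the >1% frequency criterion concrete, the sketch below counts allele frequencies at a single site from diploid genotype calls and flags the site as polymorphic when an allele other than the most common one exceeds the threshold. It is a minimal Python illustration assuming simple diploid calls with no missing data; the function names and the example cohort are hypothetical.

```python
from collections import Counter

POLYMORPHISM_THRESHOLD = 0.01  # "a frequency of >1% in a population"


def allele_frequencies(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Allele frequencies at one site from diploid genotype calls."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotype calls")
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes: list[tuple[str, str]],
                   threshold: float = POLYMORPHISM_THRESHOLD) -> bool:
    """True if an allele other than the most common one exceeds `threshold`."""
    freqs = sorted(allele_frequencies(genotypes).values())
    return len(freqs) > 1 and any(f > threshold for f in freqs[:-1])


# Hypothetical cohort of 100 people, i.e. 200 chromosomes per site.
site_1 = [("A", "A")] * 97 + [("A", "G")] * 3   # G on 3/200 = 1.5%
site_2 = [("A", "A")] * 99 + [("A", "G")] * 1   # G on 1/200 = 0.5%
print(is_polymorphic(site_1))  # True
print(is_polymorphic(site_2))  # False
```

Under this definition, site_2 still carries a sequence variant; it simply falls below the frequency needed to be called a polymorphism.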
  • polynucleotide means any RNA or DNA, which may be unmodified or modified RNA or DNA.
  • Polynucleotides include, without limitation, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions.
  • polynucleotide also refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA.
  • the term polynucleotide also includes DNAs or RNAs containing one or more modified bases and DNAs or RNAs with backbones modified for stability or for other reasons.
  • the polynucleotide contains polynucleotide sequences from the ERBB2 gene.
  • polypeptide means any polypeptide comprising two or more amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres.
  • Polypeptide refers to both short chains, commonly referred to as peptides, glycopeptides or oligomers, and to longer chains, generally referred to as proteins.
  • Polypeptides may contain amino acids other than the 20 gene-encoded amino acids.
  • Polypeptides include amino acid sequences modified either by natural processes, such as post- translational processing, or by chemical modification techniques that are well known in the art. Such modifications are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature.
  • the polypeptide contains polypeptide sequences from the ERBB2 protein.
  • SNP nucleic acid means a nucleic acid sequence, which comprises a nucleotide that is variable within an otherwise identical nucleotide sequence between individuals or groups of individuals, thus, existing as alleles. Such SNP nucleic acids are preferably from about 15 to about 500 nucleotides in length.
  • the SNP nucleic acids may be part of a chromosome, or they may be an exact copy of a part of a chromosome, e.g., by amplification of such a part of a chromosome through PCR or through cloning.
  • the SNP nucleic acids are referred to hereafter simply as "SNPs".
  • the SNP probes according to the invention are oligonucleotides that are complementary to a SNP nucleic acid. In a particular embodiment, the SNP is in the ERBB2 gene.
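An oligonucleotide probe that is complementary to a SNP nucleic acid is the reverse complement of the target window around the variant. The Python sketch below builds such an allele-specific probe. The 7-base flanks (giving a 15-mer) are an arbitrary choice, and the helper names are hypothetical. Practical probe design would also weigh melting temperature, secondary structure and specificity, none of which is modelled here.

```python
# Complement table for upper- and lower-case DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that base-pairs with `seq`, written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_probe(target: str, snp_index: int, allele: str, flank: int = 7) -> str:
    """Probe complementary to `target` (written 5' -> 3') around a SNP, with
    `allele` substituted at the SNP position and up to `flank` bases of
    context on each side (fewer at the ends of `target`)."""
    if not 0 <= snp_index < len(target):
        raise IndexError("SNP index outside the target sequence")
    start = max(0, snp_index - flank)
    end = min(len(target), snp_index + flank + 1)
    window = target[start:snp_index] + allele + target[snp_index + 1:end]
    return reverse_complement(window)


target = "GATTACAGCTAGCTA"  # toy target; SNP at index 7 (G, alternative T)
print(allele_probe(target, 7, "G", flank=3))  # TAGCTGT
print(allele_probe(target, 7, "T", flank=3))  # TAGATGT
```

The two probes differ only at the base opposite the SNP, which is what lets a hybridization assay tell the alleles apart.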
  • the term "subject" preferably means a mammal, such as a human, but can also be an animal, e.g., domestic animals (e.g., dogs, cats and the like), farm animals (e.g., cows, sheep, pigs, horses and the like) and laboratory animals (e.g., monkeys (e.g., cynomolgus monkeys), rats, mice, guinea pigs and the like).
  • the administration of an agent or drug to a subject or patient includes self-administration and the administration by another.
  • Sequence variation in the human germline consists primarily of SNPs, the remainder being short tandem repeats (including micro-satellites), long tandem repeats (mini-satellites), and other insertions and deletions.
  • a SNP is the occurrence of nucleotide variability at a single position in the genome, in which two alternative bases occur at appreciable frequency (i.e., >1%) in the human population.
  • a SNP may occur within a gene or within intergenic regions of the genome.
  • SNPs Due to their prevalence and widespread nature, SNPs have the potential to be important tools for locating genes that are involved in human disease conditions. See e.g.,
  • An association between SNPs and/or mutations and a particular phenotype (e.g., cancer type) does not necessarily indicate or require that the SNP or mutation is causative of the phenotype. Instead, an association with a SNP may merely be due to genome proximity between a SNP and those genetic factors actually responsible for a given phenotype, such that the SNP and said genetic factors are closely linked. That is, a SNP may be in linkage disequilibrium ("LD") with the "true" functional variant. LD exists when alleles at two distinct locations of the genome are more highly associated than expected by chance. Thus, a SNP may serve as a marker that has value by virtue of its proximity to a mutation or other DNA alteration (e.g., gene duplication) that causes a particular phenotype.
  • LD linkage disequilibrium
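Linkage disequilibrium between two biallelic sites is commonly quantified from haplotype frequencies with the coefficient D = p_AB - p_A * p_B and its normalised forms D' (Lewontin) and r^2. The application does not specify a measure. The Python sketch below uses these standard textbook definitions, with hypothetical function and argument names, to show how strong LD between a marker SNP and a functional variant would appear numerically.

```python
import math


def ld_stats(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> dict[str, float]:
    """Return D, D' and r^2 for two biallelic loci (alleles A/a and B/b)
    from the four haplotype frequencies, which must sum to 1."""
    if not math.isclose(p_AB + p_Ab + p_aB + p_ab, 1.0):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B  # deviation from independence
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return {"D": d, "D_prime": d / d_max, "r2": r2}


# Complete LD: only the AB and ab haplotypes are observed.
print(ld_stats(0.3, 0.0, 0.0, 0.7))      # D' and r^2 both ~1.0
# Independence: haplotype frequencies equal the allele-frequency products.
print(ld_stats(0.25, 0.25, 0.25, 0.25))  # D = 0, D' = 0, r^2 = 0
```

An r^2 near 1 means genotyping the marker is almost as informative as genotyping the functional variant itself, which is the basis for using SNPs as markers.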
  • SNPs and mutations that are associated with disorders may also have a direct effect on the function of the genes in which they are located.
  • a sequence variant e.g., SNP
  • SNP may result in an amino acid change or may alter exon-intron splicing, thereby directly modifying the relevant protein, or it may exist in a regulatory region, altering the cycle of expression or the stability of the mRNA (see, e.g., Nowotny et al., Current Opinion in Neurobiology, 11:637-641 (2001)).
  • nucleic acid molecules containing the gene may be complementary double stranded molecules and thus reference to a particular site on the sense strand refers as well to the corresponding site on the complementary antisense strand. That is, reference may be made to the same polymorphic or mutant site on either strand and an oligonucleotide may be designed to hybridize specifically to either strand at a target region containing the polymorphic and/or mutant site.
  • the invention also includes single-stranded polynucleotides and mutations that are complementary to the sense strand of the genomic variants described herein.
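Because the two strands are antiparallel and complementary, a site at 0-based position i on a sense strand of length L (read 5' to 3') corresponds to position L - 1 - i on the antisense strand (also read 5' to 3'), where it carries the complementary base. The short Python sketch below makes that mapping explicit. The helper names are hypothetical. This correspondence is what allows an oligonucleotide to be designed against either strand at the same polymorphic or mutant site.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def antisense(sense: str) -> str:
    """Antisense strand written 5' -> 3' (the reverse complement)."""
    return "".join(PAIRS[b] for b in reversed(sense))


def antisense_site(sense: str, i: int) -> tuple[int, str]:
    """Map 0-based position i on the sense strand to (position, base) of the
    same base pair on the antisense strand, both read 5' -> 3'."""
    if not 0 <= i < len(sense):
        raise IndexError("position outside the sequence")
    return len(sense) - 1 - i, PAIRS[sense[i]]


sense = "ATGCGT"
pos, base = antisense_site(sense, 2)  # sense base at index 2 is 'G'
print(antisense(sense))               # ACGCAT
print(pos, base)                      # 3 C
assert antisense(sense)[pos] == base  # both descriptions name one base pair
```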
  • SNPs and Mutations Many different techniques can be used to identify and characterize SNPs and mutations, including single-strand conformation polymorphism (SSCP) analysis, heteroduplex analysis by denaturing high-performance liquid chromatography (DHPLC), direct DNA sequencing and computational methods (Shi et al., Clin. Chem. 47:164-172 (2001)).
  • SSCP single-strand conformation polymorphism
  • DHPLC denaturing high-performance liquid chromatography
  • the most common SNP-typing methods currently include hybridization, primer extension, and cleavage methods. Each of these methods must be connected to an appropriate detection system.
  • Detection technologies include fluorescent polarization (Chan et al., Genome Res. 9:492-499 (1999)), luminometric detection of pyrophosphate release (pyrosequencing) (Ahmadian et al., Anal. Biochem. 280:103-110 (2000)), fluorescence resonance energy transfer (FRET)-based cleavage assays, DHPLC, and mass spectrometry (Shi, Clin. Chem. 47:164-172 (2001); U.S. Pat. No. 6,300,076 B1). Other methods of detecting and characterizing SNPs and mutations are those disclosed in U.S. Pat. Nos. 6,297,018 B1 and 6,300,063 B1.
  • the polymorphisms and mutations are detected using INVADER™ technology (available from Third Wave Technologies Inc., Madison, Wisconsin, USA).
  • INVADER™ technology available from Third Wave Technologies Inc., Madison, Wisconsin, USA.
  • a specific upstream "invader” oligonucleotide and a partially overlapping downstream probe together form a specific structure when bound to complementary DNA template.
  • This structure is recognized and cut at a specific site by the Cleavase enzyme, resulting in the release of the 5' flap of the probe oligonucleotide.
  • This fragment then serves as the "invader” oligonucleotide with respect to synthetic secondary targets and secondary fluorescently labelled signal probes contained in the reaction mixture. This results in specific cleavage of the secondary signal probes by the Cleavase enzyme.
  • Fluorescent signal is generated when this secondary probe (labelled with dye molecules capable of fluorescence resonance energy transfer) is cleaved.
  • Cleavases have stringent requirements relative to the structure formed by the overlapping DNA sequences or flaps and can, therefore, be used to specifically detect single base pair mismatches immediately upstream of the cleavage site on the downstream DNA strand.
  • See Ryan D et al., Molecular Diagnosis 4(2):135-144 (1999) and Lyamichev V et al., Nature Biotechnology 17:292-296 (1999); see also U.S. Pat. Nos. 5,846,717 and 6,001,567.
  • polymorphisms and mutations may also be determined using a mismatch detection technique including, but not limited to, the RNase protection method using riboprobes (Winter et al., Proc. Natl. Acad. Sci. USA 82:7575 (1985); Meyers et al., Science 230:1242 (1985)) and proteins which recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich P, Ann. Rev. Genet. 25:229-253 (1991)).
  • variant alleles can also be identified by single-strand conformation polymorphism (SSCP) analysis (Orita et al., Genomics 5:874-879 (1989); Humphries et al., in Molecular Diagnosis of Genetic Diseases, Elles R, ed. (1996) pp. 321-340) or denaturing gradient gel electrophoresis (DGGE) (Wartell et al., Nucl. Acids Res. 18:2699-2706 (1990); Sheffield et al., Proc. Natl. Acad. Sci. USA 86:232-236 (1989)).
  • a polymerase-mediated primer extension method may also be used to identify the polymorphisms/mutations.
  • multiple polymorphic and/or mutant sites may be investigated by simultaneously amplifying multiple regions of the nucleic acid using sets of allele-specific primers as described in WO 89/10414.
  • the invention provides methods and compositions for haplotyping and/or genotyping the genetic polymorphisms (and possibly mutations) in an individual.
  • the terms "genotype” and “haplotype” mean the genotype or haplotype containing the nucleotide pair or nucleotide, respectively, that is present at one or more of the novel polymorphic (or mutant) sites described herein and may optionally also include the nucleotide pair or nucleotide present at one or more additional polymorphic (or mutant) sites in the gene.
  • the additional polymorphic (and mutant) sites may be currently known polymorphic/mutant sites or sites that are subsequently discovered.
  • compositions contain oligonucleotide probes and primers designed to specifically hybridize to one or more target regions containing, or that are adjacent to, a polymorphic or mutant site.
  • Oligonucleotide compositions of the invention are useful in methods for genotyping and/or haplotyping a gene in an individual.
  • the methods and compositions for establishing the genotype or haplotype of an individual at the novel polymorphic/mutant sites described herein are useful for studying the effect of the polymorphisms and mutations in the aetiology of diseases affected by the expression and function of the protein, studying the efficacy of drugs targeting the gene product, predicting individual susceptibility to diseases affected by the expression and function of the protein, and predicting individual responsiveness to drugs targeting the gene product.
  • Some embodiments of the invention contain two or more differently labelled genotyping oligonucleotides, for simultaneously probing the identity of nucleotides at two or more polymorphic or mutant sites. It is also contemplated that primer compositions may contain two or more sets of allele-specific primer pairs to allow simultaneous targeting and amplification of two or more regions containing a polymorphic or mutant site. Genotyping oligonucleotides of the invention may be immobilized on or synthesized on a solid surface such as a microchip, bead, or glass slide (see, e.g., WO 98/20020 and WO 98/20019).
  • Immobilized genotyping oligonucleotides may be used in a variety of polymorphism and mutation detection assays, including but not limited to probe hybridization and polymerase extension assays.
  • Immobilized genotyping oligonucleotides of the invention may comprise an ordered array of oligonucleotides designed to rapidly screen a DNA sample for polymorphisms and mutations in multiple genes at the same time.
  • An allele-specific oligonucleotide primer of the invention has a 3' terminal nucleotide, or preferably a 3' penultimate nucleotide, that is complementary to only one nucleotide of a particular SNP, thereby acting as a primer for polymerase-mediated extension only if the allele containing that nucleotide is present.
  • Allele-specific oligonucleotide (ASO) primers hybridizing to either the coding or noncoding strand are contemplated by the invention.
  • An ASO primer for detecting gene polymorphisms and mutations can be developed using techniques known to those of skill in the art.
  • genotyping oligonucleotides of the invention hybridize to a target region located one to several nucleotides downstream of one of the novel polymorphic or mutant sites identified herein. Such oligonucleotides are useful in polymerase-mediated primer extension methods for detecting one of the novel polymorphisms or mutations described herein and therefore such genotyping oligonucleotides are referred to herein as “primer-extension oligonucleotides”.
  • the 3'-terminus of a primer-extension oligonucleotide is a deoxynucleotide complementary to the nucleotide located immediately adjacent to the polymorphic/mutant site.
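The allele-specific and primer-extension principles above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the assay described here: sequences are A/C/G/T strings written 5'->3' on the sense strand, both gene copies share the same upstream sequence, and a forward primer is treated as extendable only when it exactly matches the sense sequence it copies (so, with identical upstream sequence, only the 3'-terminal base can decide). The penultimate-base primer design is not modelled, and the helper names and example sequence are hypothetical.

```python
from typing import Iterable, Set


def aso_primer(sense: str, snp_index: int, allele: str, length: int = 20) -> str:
    """Hypothetical helper: forward ASO primer whose 3'-terminal base is `allele`."""
    start = snp_index - (length - 1)
    if start < 0:
        raise ValueError("not enough upstream sequence for a primer of this length")
    return sense[start:snp_index] + allele


def extends(primer: str, copy_sense: str, snp_index: int) -> bool:
    """Toy rule: a forward primer anneals to the antisense strand and is extended
    only if it matches the sense sequence over its whole length, 3' end included."""
    start = snp_index - len(primer) + 1
    return copy_sense[start:snp_index + 1] == primer


def aso_call(copies: Iterable[str], snp_index: int, alleles: str, length: int = 20) -> Set[str]:
    """Alleles whose ASO primer gives an extension product on at least one gene copy."""
    copies = list(copies)
    called = set()
    for allele in alleles:
        primer = aso_primer(copies[0], snp_index, allele, length)
        if any(extends(primer, c, snp_index) for c in copies):
            called.add(allele)
    return called


def single_base_extension(copies: Iterable[str], snp_index: int) -> Set[str]:
    """Primer-extension oligonucleotide ending immediately 5' of the site: one labelled
    terminator is added per copy; for a forward primer it equals the sense-strand base."""
    return {c[snp_index] for c in copies}


upstream = "ACGTTGCAAGGCTTACCGTA"  # 20 nt, arbitrary example sequence
copy_g = upstream + "G" + "CCTGA"
copy_a = upstream + "A" + "CCTGA"
print(aso_call([copy_g, copy_a], 20, "AG"))                  # heterozygote: both primers extend
print(aso_call([copy_g, copy_g], 20, "AG"))                  # {'G'}
print(sorted(single_base_extension([copy_g, copy_a], 20)))   # ['A', 'G']
```

In a real assay the two allele-specific products would be distinguished by separate reactions or labels; the sets returned here stand in for those readouts.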
  • a genotyping method of the invention involves isolating from an individual a nucleic acid mixture comprising at least one copy of the gene of interest and/or a fragment or flanking regions thereof, and determining the identity of the nucleotide pair at one or more of the polymorphic/mutant sites in the nucleic acid mixture.
  • the two "copies" of a germline gene in an individual may be the same on each allele or may be different on each allele.
  • the genotyping method comprises determining the identity of the nucleotide pair at each polymorphic and mutant site.
  • the nucleic acid mixture is isolated from a biological sample taken from the individual, such as a blood sample, tumour or tissue sample.
  • tissue samples include whole blood, tumour or any other tissue type, semen, saliva, tears, urine, faecal material, sweat, buccal smears, skin and hair.
  • the nucleic acid mixture may be comprised of genomic DNA, mRNA, or cDNA and, in the latter two cases, the biological sample must be obtained from an organ in which the gene may be expressed.
  • mRNA or cDNA preparations would not be used to detect polymorphisms or mutations located in introns or in 5' and 3' nontranscribed regions.
  • If a gene fragment is isolated, it must usually contain the polymorphic and/or mutant sites to be genotyped. Exceptions can include mutations leading to truncation of the gene, in which a specific polymorphism may be lost. In these cases, the specific DNA alterations are determined by assessing the flanking sequences of the gene, which underscores the need to look specifically for both polymorphisms and mutations.
  • a haplotype pair is determined for an individual by identifying the phased sequence of nucleotides at one or more of the polymorphic/mutant sites in each copy of the gene that is present in the individual.
  • the haplotyping method comprises identifying the phased sequence of nucleotides at each polymorphic/mutant site in each copy of the gene.
  • the identifying step is preferably performed with each copy of the gene being placed in separate containers. However, if the two copies are labelled with different tags, or are otherwise separately distinguishable or identifiable, it is possible in some cases to perform the method in the same container.
  • if the first and second copies of the gene are labelled with different first and second fluorescent dyes, respectively, and an allele-specific oligonucleotide labelled with a third, different fluorescent dye is used to assay the polymorphic/mutant sites, then detecting a combination of the first and third dyes identifies the polymorphism or mutation in the first gene copy, while detecting a combination of the second and third dyes identifies the polymorphism or mutation in the second gene copy.
  • the identity of a nucleotide (or nucleotide pair) at a polymorphic and/or mutant site may be determined by amplifying a target region containing the polymorphic and/or mutant sites directly from one or both copies of the gene, or fragments thereof, and sequencing the amplified regions by conventional methods. It will be readily appreciated by the skilled artisan that only one nucleotide will be detected at a polymorphic or mutant site in individuals who are homozygous at that site, while two different nucleotides will be detected if the individual is heterozygous for that site.
  • the polymorphism or mutation may be identified directly, known as positive-type identification, or by inference, referred to as negative-type identification.
  • a site may be positively determined to be either guanine or cytosine for all individuals homozygous at that site, or both guanine and cytosine, if the individual is heterozygous at that site.
  • the site may be negatively determined to be not guanine (and thus cytosine/cytosine) or not cytosine (and thus guanine/guanine).
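The positive-type and negative-type identification just described can be expressed as two small functions. This is an illustrative sketch assuming a biallelic site and error-free base calls; the function names are hypothetical.

```python
from typing import Set, Tuple


def positive_type(observed: Set[str]) -> Tuple[str, str]:
    """Positive-type identification: genotype from the bases actually detected."""
    if len(observed) == 1:
        (base,) = observed
        return (base, base)            # one base detected: homozygous
    if len(observed) == 2:
        first, second = sorted(observed)
        return (first, second)         # two bases detected: heterozygous
    raise ValueError("expected one or two bases at a single site")


def negative_type(absent: str, site_alleles: Tuple[str, str]) -> Tuple[str, str]:
    """Negative-type identification at a biallelic site: 'not X' implies
    homozygosity for the other allele."""
    if absent not in site_alleles:
        raise ValueError("absent base is not one of the site's alleles")
    other = site_alleles[1] if absent == site_alleles[0] else site_alleles[0]
    return (other, other)


print(positive_type({"G"}))            # ('G', 'G')
print(positive_type({"G", "C"}))       # ('C', 'G')
print(negative_type("G", ("G", "C")))  # ('C', 'C')
```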
  • the target regions may be amplified using any oligonucleotide-directed amplification method, including but not limited to polymerase chain reaction (PCR).
  • PCR polymerase chain reaction
  • LCR ligase chain reaction
  • OLA oligonucleotide ligation assay
  • Oligonucleotides useful as primers or probes in such methods should specifically hybridize to a region of the nucleic acid that contains or is adjacent to the polymorphic/mutant site.
  • the oligonucleotides are between 10 and 35 nucleotides in length and preferably, between 15 and 30 nucleotides in length. Most preferably, the oligonucleotides are 20 to 25 nucleotides long. The exact length of the oligonucleotide will depend on many factors that are routinely considered and practiced by the skilled artisan.
  • nucleic acid amplification procedures may be used to amplify the target region including transcription-based amplification systems (U.S. Pat. No. 5,130,238; EP 329,822; U.S. Pat. No. 5,169,766, published PCT patent application WO 89/06700) and isothermal methods (Walker et al., Proc. Natl. Acad. Sci. USA 89: 392-396 (1992)).
  • a polymorphism or mutation in the target region may be assayed before or after amplification using one of several hybridization-based methods known in the art. Typically, allele-specific oligonucleotides are utilized in performing such methods.
  • the allele-specific oligonucleotides may be used as differently labelled probe pairs, with one member of the pair showing a perfect match to one variant of a target sequence and the other member showing a perfect match to a different variant.
  • more than one polymorphic/mutant site may be detected at once using a set of allele-specific oligonucleotides or oligonucleotide pairs.
  • the members of the set have melting temperatures within 5°C, and more preferably within 2°C, of each other when hybridizing to each of the polymorphic or mutant sites being detected.
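The length bands and the melting-temperature window above can be checked programmatically. The sketch is illustrative only: Tm comes from two common composition-only approximations (the Wallace rule, 2 °C per A/T and 4 °C per G/C, for oligos under 14 nt, and 64.9 + 41·(G+C-16.4)/N for longer ones). These ignore salt, oligo concentration and sequence context, and nearest-neighbour thermodynamic models are generally more accurate. Function names are hypothetical.

```python
from typing import Iterable


def basic_tm(oligo: str) -> float:
    """Approximate Tm (deg C) from base composition only."""
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("oligo must be a non-empty A/C/G/T string")
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    at = n - gc
    if n < 14:
        return 2.0 * at + 4.0 * gc              # Wallace rule for short oligos
    return 64.9 + 41.0 * (gc - 16.4) / n       # GC-content formula for longer oligos


def length_preference(n: int) -> str:
    """Length bands from the text: 10-35 nt, preferably 15-30, most preferably 20-25."""
    if 20 <= n <= 25:
        return "most preferred"
    if 15 <= n <= 30:
        return "preferred"
    if 10 <= n <= 35:
        return "acceptable"
    return "outside stated range"


def tm_compatible(oligos: Iterable[str], window: float = 5.0) -> bool:
    """True if all estimated Tms lie within `window` deg C (5, or preferably 2)."""
    tms = [basic_tm(o) for o in oligos]
    return max(tms) - min(tms) <= window


print(round(basic_tm("ACGT" * 5), 2))             # 51.78
print(tm_compatible(["ACGT" * 5, "AGCT" * 5]))    # True
print(tm_compatible(["ACGT" * 5, "GC" * 10]))     # False
```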
  • Hybridizing Allele-Specific Oligonucleotide to a Target Gene: Hybridization of an allele-specific oligonucleotide to a target polynucleotide may be performed with both entities in solution, or such hybridization may be performed when either the oligonucleotide or the target polynucleotide is covalently or noncovalently affixed to a solid support. Attachment may be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin or avidin-biotin, salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking, etc.
  • the genotype or haplotype for the gene of an individual may also be determined by hybridization of a nucleic sample containing one or both copies of the gene to nucleic acid arrays and subarrays such as described in WO 95/11995.
  • the arrays would contain a battery of allele-specific oligonucleotides representing each of the polymorphic or mutant sites to be included in the genotype or haplotype.
  • the present invention provides a method for determining the frequency of a genotype or haplotype in a population.
  • the method comprises determining the genotype or the haplotype for a gene present in each member of the population, wherein the genotype or haplotype comprises the nucleotide pair or nucleotide detected at one or more of the polymorphic sites in the gene and mutations identified in the region, and calculating the frequency at which the genotype or haplotype is found in the population.
  • the population may be a reference population, a family population, a same sex population, a population group, or a trait population (e.g., a group of individuals exhibiting a trait of interest such as a medical condition or response to a therapeutic treatment).
  • the trait is susceptibility to a disease, severity of a disease, the staging of a disease or response to a drug.
  • Such methods have applicability in developing diagnostic tests and therapeutic treatments for all pharmacogenetic applications where there is the potential for an association between a genotype and a treatment outcome, including efficacy measurements, pharmacodynamic (PD) measurements, pharmacokinetic (PK) measurements and side-effect measurements.
  • the frequency data for the reference and/or trait populations are obtained by accessing previously determined frequency data, which may be in written or electronic form.
  • the frequency data may be present in a database that is accessible by a computer.
  • the frequencies of the genotypes or haplotypes of interest in the reference and trait populations are compared.
  • the frequencies of all genotypes and/or haplotypes observed in the populations are compared. If a particular genotype or haplotype for the gene is more frequent in the trait population than in the reference population by a statistically significant amount, then the trait is predicted to be associated with that genotype or haplotype.
  • the haplotype frequency data for different ethnogeographic groups are examined to determine whether they are consistent with Hardy- Weinberg equilibrium.
  • Hartl DL & Clark AG, Principles of Population Genetics, 3rd Ed. (Sinauer Associates, Sunderland, MA, 1997).
  • a statistically significant difference between the observed and expected haplotype frequencies could be due to one or more factors, including significant inbreeding in the population group, strong selective pressure on the gene, sampling bias, and/or errors in the genotyping process. If large deviations from Hardy-Weinberg equilibrium are observed in an ethnogeographic group, the number of individuals in that group can be increased to see whether the deviation is due to sampling bias. If a larger sample size does not reduce the difference between observed and expected haplotype pair frequencies, then one may wish to consider haplotyping the individual using a direct haplotyping method such as, for example, CLASPER System™ technology (U.S. Pat. No. 5,866,404), SMD, or allele-specific long-range PCR (Michalatos-Beloin et al., Nucl. Acids Res. 24:4841-4843 (1996)).
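A Hardy-Weinberg check of observed haplotype-pair counts can be sketched as a chi-square goodness-of-fit test. This is an illustration under common assumptions, not a prescribed procedure: expected pair frequencies are f_i² for homozygous pairs and 2·f_i·f_j for heterozygous pairs, estimated haplotype frequencies are treated as known, and the chi-square approximation becomes unreliable when expected counts are small (an exact test would then be preferable). Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import erfc, sqrt
from typing import Iterable, Tuple


def hwe_chi_square(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom comparing observed haplotype-pair
    (or genotype) counts with Hardy-Weinberg expectations."""
    pairs = [tuple(sorted(p)) for p in pairs]          # unordered pairs
    n = len(pairs)
    pair_counts = Counter(pairs)
    hap_counts = Counter(h for p in pairs for h in p)
    freqs = {h: c / (2 * n) for h, c in hap_counts.items()}
    chi2 = 0.0
    for h1, h2 in combinations_with_replacement(sorted(freqs), 2):
        f = freqs[h1] ** 2 if h1 == h2 else 2 * freqs[h1] * freqs[h2]
        expected = n * f
        chi2 += (pair_counts.get((h1, h2), 0) - expected) ** 2 / expected
    k = len(freqs)
    return chi2, k * (k - 1) // 2


def p_value_df1(chi2: float) -> float:
    """Upper-tail chi-square p-value for 1 degree of freedom, P(|Z| > sqrt(chi2))."""
    return erfc(sqrt(chi2 / 2))


in_hwe = [("h1", "h1")] * 25 + [("h1", "h2")] * 50 + [("h2", "h2")] * 25
no_het = [("h1", "h1")] * 50 + [("h2", "h2")] * 50
print(hwe_chi_square(in_hwe))   # (0.0, 1)
print(hwe_chi_square(no_het))   # (100.0, 1)
```

For one degree of freedom, a statistic above about 3.841 corresponds to p < 0.05.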
  • the assigning step involves performing the following analysis. First, each of the possible haplotype pairs is compared to the haplotype pairs in the reference population. Generally, only one of the haplotype pairs in the reference population matches a possible haplotype pair and that pair is assigned to the individual. Occasionally, only one haplotype represented in the reference haplotype pairs is consistent with a possible haplotype pair for an individual, and in such cases the individual is assigned a haplotype pair containing this known haplotype and a new haplotype derived by subtracting the known haplotype from the possible haplotype pair.
  • the individual is preferably haplotyped using a direct molecular haplotyping method such as, for example, those discussed supra.
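The assigning step can be sketched for unphased multi-site genotypes. This is a simplified illustration, assuming the reference population is summarized as a set of known haplotypes (rather than full haplotype pairs) and that an ambiguous result is reported as `(None, None)` so that a direct molecular haplotyping method can be used instead. Enumeration grows as 2^(h-1) for h heterozygous sites, which is fine for a handful of sites. All names and sequences are hypothetical.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]


def possible_pairs(genotype: List[Tuple[str, str]]) -> Set[Pair]:
    """All unordered haplotype pairs consistent with an unphased multi-site genotype."""
    het_sites = [i for i, (x, y) in enumerate(genotype) if x != y]
    pairs: Set[Pair] = set()
    # Phase at the first heterozygous site is fixed to avoid mirror-image duplicates.
    for flips in product((False, True), repeat=max(len(het_sites) - 1, 0)):
        flip_at = dict(zip(het_sites[1:], flips))
        h1, h2 = [], []
        for i, (x, y) in enumerate(genotype):
            if flip_at.get(i, False):
                x, y = y, x
            h1.append(x)
            h2.append(y)
        first, second = sorted(("".join(h1), "".join(h2)))
        pairs.add((first, second))
    return pairs


def assign_pair(genotype: List[Tuple[str, str]],
                reference: Set[str]) -> Tuple[Optional[Pair], Optional[str]]:
    """Return (assigned pair, new haplotype or None), or (None, None) if ambiguous."""
    candidates = possible_pairs(genotype)
    full = [p for p in candidates if p[0] in reference and p[1] in reference]
    if len(full) == 1:
        return full[0], None                       # both haplotypes already known
    partial = [p for p in candidates if (p[0] in reference) != (p[1] in reference)]
    if not full and len(partial) == 1:
        pair = partial[0]
        new = pair[1] if pair[0] in reference else pair[0]  # "subtract" the known one
        return pair, new
    return None, None


genotype = [("A", "G"), ("C", "C"), ("T", "A")]
print(assign_pair(genotype, {"ACT", "GCA", "GCT"}))  # (('ACT', 'GCA'), None)
print(assign_pair(genotype, {"ACT", "TTT"}))         # (('ACT', 'GCA'), 'GCA')
```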
  • statistical analysis is performed using standard ANOVA tests with a Bonferroni correction and/or a bootstrapping method that simulates the genotype-phenotype correlation many times and calculates a significance value. When many polymorphisms and/or mutations are being analyzed, a correction may be applied to account for significant associations that might be found by chance.
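The resampling and multiple-testing steps can be sketched with the standard library. The text mentions ANOVA with a Bonferroni correction and/or bootstrapping; the sketch below swaps in a closely related resampling method, a two-sided permutation test on the difference in mean response between carriers and non-carriers of one genotype, and pairs it with a Bonferroni threshold across sites. Function names, the permutation count and the seed are illustrative assumptions.

```python
import random
from statistics import mean
from typing import List, Sequence


def permutation_p_value(carriers: Sequence[float], non_carriers: Sequence[float],
                        n_perm: int = 5000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in mean response."""
    rng = random.Random(seed)
    observed = abs(mean(carriers) - mean(non_carriers))
    pooled = list(carriers) + list(non_carriers)
    k = len(carriers)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                          # random relabelling of individuals
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)                 # add-one correction keeps p > 0


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Bonferroni: each of m tests is judged at the threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


print(permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
print(bonferroni_significant([0.01, 0.04, 0.2]))   # [True, False, False]
```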
  • the trait of interest is a clinical response exhibited by a patient to some therapeutic treatment, for example, response to a drug targeting the gene product or to a therapeutic treatment for a medical condition.
  • a detectable genotype or haplotype that is in linkage disequilibrium with a genotype or haplotype of interest may be used as a surrogate marker.
  • a genotype that is in linkage disequilibrium with another genotype is indicated where a particular genotype or haplotype for a given gene is more frequent in the population that also demonstrates the potential surrogate marker genotype than in the reference population. If the frequency is statistically significant, then the marker genotype is predictive of that genotype or haplotype, and can be used as a surrogate marker.
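The text does not say how linkage disequilibrium is quantified; a common choice, assumed here, is the coefficient D together with its normalized forms D' and r², computed from phased two-site haplotypes. An r² close to 1 suggests the marker carries nearly the same information as the site it would stand in for. This is a minimal sketch with hypothetical names and data.

```python
from typing import Sequence, Tuple


def ld_measures(haplotypes: Sequence[Tuple[str, str]], a: str, b: str) -> Tuple[float, float, float]:
    """D, D' and r^2 between allele `a` at site 1 and allele `b` at site 2,
    from phased two-site haplotypes (one tuple per chromosome)."""
    n = len(haplotypes)
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D; it is positive whenever both sites are polymorphic.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max, d * d / denom


perfect = [("A", "B")] * 5 + [("a", "b")] * 5
print(ld_measures(perfect, "A", "B"))   # (0.25, 1.0, 1.0)
```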
  • genotype or haplotype data is obtained on the clinical responses exhibited by a population of individuals who received the treatment, hereinafter the "clinical population".
  • This clinical data may be obtained by analyzing the results of a clinical trial that has already been previously conducted and/or by designing and carrying out one or more new clinical trials.
  • the individuals included in the clinical population are preferably graded for the existence of the medical condition of interest. This grading of potential patients could employ a standard physical exam or one or more lab tests. Alternatively, grading of patients could use genotyping or haplotyping in situations where there is a strong correlation between haplotype pair and disease susceptibility or severity.
  • the therapeutic treatment of interest is administered to each individual in the trial population, and each individual's response to the treatment is measured using one or more predetermined criteria. It is contemplated that in many cases, the trial population will exhibit a range of responses, and that the investigator may choose more than one responder group (e.g., low, medium, high) based on the various responses. In addition, the gene for each individual in the trial population is genotyped and/or haplotyped, which may be done before or after administering the treatment.
  • a second method for finding correlations between genotype and haplotype content and clinical responses uses predictive models based on error-minimizing optimization algorithms, one of which is a genetic algorithm.
  • Judson R Genetic Algorithms and Their Uses in Chemistry, in Reviews in Computational Chemistry, Vol. 10, Lipkowitz KB & Boyd DB, eds. (VCH Publishers, New York, 1997) pp. 1-73.
  • simulated annealing (Press et al., Numerical Recipes in C: The Art of Scientific Computing, Ch. 10 (Cambridge University Press, Cambridge, 1992)).
  • neural networks (Rich E & Knight K, Artificial Intelligence, 2nd Edition, Ch.
  • Correlations may also be analyzed using analysis of variance (ANOVA) techniques to determine how much of the variation in the clinical data is explained by different subsets of the polymorphic and mutant sites in the gene.
  • ANOVA is used to test hypotheses about whether a response variable is caused by or correlates with one or more traits or variables that can be measured (Fisher & van Belle, supra, Ch. 10).
  • correlations between individual response and genotype or haplotype content are created. Correlations may be produced in several ways. In one method, individuals are grouped by their genotype or haplotype (or haplotype pair) (also referred to as a polymorphism/mutation group), and then the averages and standard deviations of clinical responses exhibited by the members of each polymorphism/mutation group are calculated.
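The grouping step and the ANOVA described above can be sketched as follows: responses are grouped by genotype, each group is summarized by its mean and standard deviation, and a one-way ANOVA F statistic compares between-group with within-group variation. Turning F into a p-value needs the F distribution, which the standard library lacks (SciPy's `scipy.stats.f_oneway` provides the full test). Names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

Groups = Dict[str, List[float]]


def group_by_genotype(records: Iterable[Tuple[str, float]]) -> Groups:
    """Collect clinical responses per polymorphism/mutation group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for genotype, response in records:
        groups[genotype].append(response)
    return dict(groups)


def summarize(groups: Groups) -> Dict[str, Tuple[float, float, int]]:
    """(mean, sample standard deviation, size) of the response in each group."""
    return {g: (mean(v), stdev(v) if len(v) > 1 else 0.0, len(v)) for g, v in groups.items()}


def one_way_anova_f(groups: Groups) -> float:
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    values = [x for v in groups.values() for x in v]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = mean(values)
    group_means = {g: mean(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (group_means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - group_means[g]) ** 2 for g, v in groups.items() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


records = [("AA", 1), ("AA", 2), ("AA", 3),
           ("AG", 4), ("AG", 5), ("AG", 6),
           ("GG", 7), ("GG", 8), ("GG", 9)]
groups = group_by_genotype(records)
print(summarize(groups)["AA"])
print(one_way_anova_f(groups))   # 27.0
```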
  • a mathematical model that predicts clinical response as a function of genotype or haplotype content may readily be constructed by the skilled artisan.
  • the identification of an association between a clinical response and a genotype or haplotype (or haplotype pair) for the gene may be the basis for designing a diagnostic method to determine those individuals who will or will not respond to the treatment, or who will respond at a lower level and thus may require more treatment (i.e., a greater dose of a drug), or who may suffer an adverse reaction.
  • the diagnostic method may take one of several forms: for example, a direct DNA test (i.e., genotyping or haplotyping one or more of the polymorphic/mutant sites in the gene), a serological test, or a physical exam measurement. The only requirement is that there be a good correlation between the diagnostic test results and the underlying genotype or haplotype. In a preferred embodiment, this diagnostic method uses the predictive genotyping/haplotyping method described above.
  • Genotypes and haplotypes that correlate with efficacious drug responses will be used to select patients for therapy of existing diseases.
  • Genotypes and haplotypes that correlate with adverse consequences will be used either to modify how the drug is administered (e.g., dose, schedule or combination with other drugs) or to eliminate the drug as an option.
  • the invention also provides a computer system for storing and displaying polymorphism and mutation data determined for the gene.
  • the computer system comprises a computer processing unit, a display, and a database containing the polymorphism/mutation data.
  • the polymorphism/mutation data includes the polymorphisms, mutations, the genotypes and the haplotypes identified for a given gene in a reference population.
  • the computer system is capable of producing a display showing haplotypes organized according to their evolutionary relationships.
  • a computer may implement any or all analytical and mathematical operations involved in practicing the methods of the present invention.
  • the computer may execute a program that generates views (or screens) displayed on a display device and with which the user can interact to view and analyze large amounts of information relating to the gene and its genomic variation, including chromosome location, gene structure, gene family, gene expression data, polymorphism data, mutation data, genetic sequence data, and clinical population data (e.g., data on ethnogeographic origin, clinical responses, genotypes, and haplotypes for one or more populations).
  • the polymorphism and mutation data described herein may be stored as part of a relational database (e.g., an instance of an Oracle database or a set of ASCII flat files). These polymorphism and mutation data may be stored on the computer's hard drive or may, for example, be stored on a CD-ROM or on one or more other storage devices accessible by the computer. For example, the data may be stored on one or more databases in communication with the computer via a network.
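The relational storage described above can be sketched with Python's built-in `sqlite3` module, used here purely as a stand-in for the Oracle database or flat files the text mentions. The schema, table and column names are hypothetical, and the inserted variant is a placeholder record, not a real ERBB2 polymorphism or mutation.

```python
import sqlite3
from typing import Dict

# Hypothetical schema: one row per variant site, one row per individual genotype call.
SCHEMA = """
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,
    gene       TEXT NOT NULL,
    position   INTEGER NOT NULL,
    ref        TEXT NOT NULL,
    alt        TEXT NOT NULL,
    kind       TEXT NOT NULL CHECK (kind IN ('polymorphism', 'mutation'))
);
CREATE TABLE genotype (
    individual_id TEXT NOT NULL,
    population    TEXT NOT NULL,
    variant_id    INTEGER NOT NULL REFERENCES variant(variant_id),
    allele1       TEXT NOT NULL,
    allele2       TEXT NOT NULL,
    PRIMARY KEY (individual_id, variant_id)
);
"""


def genotype_counts(conn: sqlite3.Connection, variant_id: int, population: str) -> Dict[str, int]:
    """Count unordered genotypes (e.g. 'A/G') at one site within one population."""
    rows = conn.execute(
        """
        SELECT min(allele1, allele2) || '/' || max(allele1, allele2) AS gt, count(*)
        FROM genotype
        WHERE variant_id = ? AND population = ?
        GROUP BY gt
        ORDER BY gt
        """,
        (variant_id, population),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder coordinate and alleles, for illustration only.
conn.execute("INSERT INTO variant VALUES (1, 'ERBB2', 1000, 'A', 'G', 'polymorphism')")
conn.executemany(
    "INSERT INTO genotype VALUES (?, ?, ?, ?, ?)",
    [("p1", "reference", 1, "A", "G"),
     ("p2", "reference", 1, "G", "A"),
     ("p3", "reference", 1, "G", "G")],
)
counts = genotype_counts(conn, 1, "reference")
print(counts)  # {'A/G': 2, 'G/G': 1}
```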
  • the invention provides SNP and mutation probes, which are useful in classifying subjects according to their types of genetic variation.
  • the SNP and mutation probes according to the invention are oligonucleotides, which discriminate between SNPs or mutations and the wild-type sequence in conventional allelic discrimination assays.
  • the oligonucleotides according to this aspect of the invention are complementary to one allele of the SNP/mutant nucleic acid, but not to any other allele of the SNP/mutant nucleic acid. Oligonucleotides according to this embodiment of the invention can discriminate between SNPs and mutations in various ways.
  • Kits: the invention provides nucleic acid and polypeptide detection kits useful for haplotyping and/or genotyping the genes in an individual. Such kits are useful for classifying individuals. Specifically, the invention encompasses kits for detecting the presence of a polypeptide or nucleic acid corresponding to a marker of the invention in a biological sample, e.g., any tissue or bodily fluid including, but not limited to, serum, plasma, lymph, cystic fluid, urine, stool, cerebrospinal fluid, ascites fluid or blood, and including biopsy samples of body tissue.
  • the kit can comprise a labelled compound or agent capable of detecting a polypeptide or an mRNA encoding a polypeptide corresponding to a marker of the invention in a biological sample and means for determining the amount of the polypeptide or mRNA in the sample, e.g., an antibody which binds the polypeptide or an oligonucleotide probe which binds to DNA or mRNA encoding the polypeptide.
  • Kits can also include instructions for interpreting the results obtained using the kit.
  • the invention provides a kit comprising at least two genotyping oligonucleotides packaged in separate containers.
  • the kit may also contain other components such as hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate container.
  • the kit may contain, packaged in separate containers, a polymerase and a reaction buffer optimized for primer extension mediated by the polymerase, such as in the case of PCR.
  • such kit may further comprise a DNA sample collecting means.
  • the genotyping primer composition may comprise at least two sets of allele specific primer pairs.
  • the two genotyping oligonucleotides are packaged in separate containers.
  • the kit can comprise, e.g., (1) a first antibody, e.g., attached to a solid support, which binds to a polypeptide corresponding to a marker of the invention; and, optionally, (2) a second, different antibody which binds to either the polypeptide or the first antibody and is conjugated to a detectable label.
  • the kit can comprise, e.g., (1) an oligonucleotide, e.g., a detectably-labelled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a polypeptide corresponding to a marker of the invention; or (2) a pair of primers useful for amplifying a nucleic acid molecule corresponding to a marker of the invention.
  • the kit can also comprise, e.g., a buffering agent, a preservative or a protein-stabilizing agent.
  • the kit can further comprise components necessary for detecting the detectable label, e.g., an enzyme or a substrate.
  • the kit can also contain a control sample or a series of control samples, which can be assayed and compared to the test sample.
  • Each component of the kit can be enclosed within an individual container and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.
  • the present invention includes one or more polynucleotides encoding mutant or polymorphic polypeptides, including degenerate variants thereof.
  • the invention also encompasses allelic variants of the same, that is, naturally occurring alternative forms of the isolated polynucleotides that encode mutant polypeptides that are identical, homologous or related to those encoded by the polynucleotides.
  • non-naturally occurring variants may be produced by mutagenesis techniques or by direct synthesis techniques well known in the art. Accordingly, nucleic acid sequences capable of hybridizing at low stringency with any nucleic acid sequences encoding mutant polypeptide of the present invention are considered to be within the scope of the invention.
  • a typical prehybridization, hybridization, and wash protocol is as follows: (1) prehybridization: incubate nitrocellulose filters containing the denatured target DNA for 3-4 hours at 55 °C in 5× Denhardt's solution, 6× SSC (20× SSC consists of 175 g NaCl, 88.2 g sodium citrate in 800 ml H2O adjusted to pH …
  • Recombinant Expression Vectors: Another aspect of the invention includes vectors containing one or more nucleic acid sequences encoding a mutant or polymorphic polypeptide.
  • many conventional techniques in molecular biology, microbiology and recombinant DNA are used. These techniques are well known and are explained in, e.g., Current Protocols in Molecular Biology, Vols. I-III, Ausubel, ed. (1997); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989); Glover DN, DNA Cloning: A Practical Approach, Vols.
  • the nucleic acid containing all or a portion of the nucleotide sequence encoding the polypeptide is inserted into an appropriate cloning vector, or an expression vector (i.e., a vector that contains the necessary elements for the transcription and translation of the inserted polypeptide coding sequence) by recombinant DNA techniques well known in the art and as detailed below.
  • expression vectors useful in recombinant DNA techniques are often in the form of plasmids.
  • "plasmid" and "vector" can be used interchangeably, as the plasmid is the most commonly used form of vector.
  • the invention is intended to include such other forms of expression vectors that are not technically plasmids.
  • viral vectors (e.g., replication-defective retroviruses, adenoviruses and adeno-associated viruses) may also be used.
  • Such viral vectors permit infection of a subject and expression in that subject of a compound (Becker et al., Meth. Cell Biol. 43:161-89 (1994)).
  • the recombinant expression vectors of the invention comprise a nucleic acid encoding a mutant or polymorphic polypeptide in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory sequences, selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed.
  • "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequences in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
  • the expression vectors of the invention can be introduced into host cells to thereby produce polypeptides or peptides, including fusion polypeptides, encoded by nucleic acids as described herein (e.g., mutant polypeptides and mutant-derived fusion polypeptides, etc.).
  • Mutant and Polymorphic Polypeptide-Expressing Host Cells: Another aspect of the invention pertains to mutant and polymorphic polypeptide-expressing host cells, which contain a nucleic acid encoding one or more mutant/polymorphic polypeptides of the invention.
  • the desired isogene may be introduced into a host cell in a vector such that the isogene remains extrachromosomal. In such a situation, the gene will be expressed by the cell from the extrachromosomal location.
  • the isogene is introduced into a cell in such a way that it recombines with the endogenous gene present in the cell.
  • Such recombination requires the occurrence of a double recombination event, thereby resulting in the desired gene polymorphism or mutation.
  • Vectors for the introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector or vector construct may be used in the invention. Methods such as electroporation, particle bombardment, calcium phosphate co-precipitation and viral transduction for introducing DNA into cells are known in the art; therefore, the choice of method may lie with the competence and preference of the skilled practitioner.
  • the recombinant expression vectors of the invention can be designed for expression of mutant polypeptides in prokaryotic or eukaryotic cells.
  • mutant/polymorphic polypeptides can be expressed in bacterial cells such as Escherichia coli (E. coli), insect cells (using baculovirus expression vectors), fungal cells (e.g., yeast cells) or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology (Academic Press, San Diego, Calif., 1990).
  • the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.
  • the SMP2 promoter is useful in the expression of polypeptides in smooth muscle cells. Qian et al., Endocrinology 140(4): 1826 (1999).
  • Fusion vectors add a number of amino acids to a polypeptide encoded therein, usually to the amino terminus of the recombinant polypeptide.
  • Such fusion vectors typically serve three purposes: (i) to increase expression of recombinant polypeptide; (ii) to increase the solubility of the recombinant polypeptide; and (iii) to aid in the purification of the recombinant polypeptide by acting as a ligand in affinity purification.
  • a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant polypeptide to enable separation of the recombinant polypeptide from the fusion moiety subsequent to purification of the fusion polypeptide.
  • proteolytic enzymes suitable for this purpose, with their cognate recognition sequences, include Factor Xa, thrombin and enterokinase.
  • Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, Gene 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.), which fuse glutathione S-transferase (GST), maltose E-binding protein, or protein A, respectively, to the target recombinant polypeptide.
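The proteolytic release step described above can be illustrated with a short Python sketch. It assumes the commonly cited recognition motifs for Factor Xa (cleaves after Ile-Glu-Gly-Arg), enterokinase (cleaves after Asp-Asp-Asp-Asp-Lys) and thrombin (cleaves between Arg and Gly in Leu-Val-Pro-Arg-Gly-Ser). The `PROTEASE_SITES` table, the `release_target` helper and the example sequence are hypothetical, and a plain string search does not model secondary or off-target cleavage by real proteases.

```python
# Hypothetical table: recognition motif and the offset within the motif at which
# the protease cuts (commonly cited motifs, assumed for illustration).
PROTEASE_SITES = {
    "factor_xa": ("IEGR", 4),      # cuts after IEGR
    "enterokinase": ("DDDDK", 5),  # cuts after DDDDK
    "thrombin": ("LVPRGS", 4),     # cuts between R and G
}


def release_target(fusion: str, protease: str) -> tuple[str, str]:
    """Split a fusion polypeptide (one-letter code) at the first cleavage site.

    Returns (fusion_moiety_side, recombinant_polypeptide_side).
    Raises ValueError if the recognition motif is absent.
    """
    motif, cut_offset = PROTEASE_SITES[protease]
    start = fusion.find(motif)
    if start == -1:
        raise ValueError(f"no {protease} site in sequence")
    cut = start + cut_offset
    return fusion[:cut], fusion[cut:]


# Hypothetical fusion: tag + Factor Xa site + target peptide.
tag, target = release_target("MSPILGYWKIEGRMELAALCR", "factor_xa")
print(tag, target)  # MSPILGYWKIEGR MELAALCR
```

Only the first occurrence of the motif is used; a target that itself contains the motif would need a different protease.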
  • suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., Gene 69: 301-315 (1988)) and pET 11d (Studier et al.
  • One strategy to maximize recombinant polypeptide expression in E. coli is to express the polypeptide in host bacteria with an impaired capacity to proteolytically cleave the recombinant polypeptide. See, e.g., Gottesman, Gene Expression Technology: Methods in Enzymology (Academic Press, San Diego, Calif., 1990) pp. 119-128.
  • Another strategy is to alter the sequence of the nucleic acid to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in the expression host, e.g., E.
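A minimal sketch of the codon-substitution strategy follows, assuming a hypothetical table that assigns one preferred codon to each amino acid. Every codon listed does encode the amino acid shown, but the table is illustrative rather than a measured E. coli codon-usage table, and `codon_optimize` is a hypothetical helper. Practical codon-optimization tools also weigh GC content, mRNA secondary structure and unwanted restriction sites, which this sketch ignores.

```python
# Hypothetical "preferred codon" table: one valid codon per amino acid plus a stop.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def codon_optimize(protein: str) -> str:
    """Back-translate a one-letter protein sequence using one preferred codon per residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue {err.args[0]!r}") from None


print(codon_optimize("MKLE*"))  # ATGAAACTGGAATAA
```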
  • mutant/polymorphic polypeptide expression vector is a yeast expression vector.
  • examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari et al., EMBO J. 6: 229-234 (1987)), pMFa (Kurjan & Herskowitz, Cell 30: 933-943 (1982)), pJRY88 (Schultz et al., Gene 54: 113-123 (1987)), pYES2 (InVitrogen Corporation, San Diego, Calif., USA), and picZ (InVitrogen Corp, San Diego, Calif., USA).
  • mutant polypeptide can be expressed in insect cells using baculovirus expression vectors.
  • Baculovirus vectors available for expression of polypeptides in cultured insect cells include the pAc series (Smith et al., Mol. Cell Biol. 3: 2156-2165 (1983)) and the pVL series (Luckow & Summers, Virology 170: 31-39 (1989)).
  • a nucleic acid of the invention is expressed in mammalian cells using a mammalian expression vector.
  • mammalian expression vectors include pCDM8 (Seed, Nature 329: 842-846 (1987)) and pMT2PC (Kaufman et al., EMBO J. 6: 187-195 (1987)).
  • the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue specific regulatory elements are used to express the nucleic acid).
  • tissue specific regulatory elements are known in the art.
  • suitable tissue-specific promoters include the albumin promoter (liver specific; Pinkert et al., Genes Dev. 1: 268-277 (1987)), lymphoid-specific promoters (Calame & Eaton, Adv. Immunol. 43: 235-275 (1988)), in particular promoters of T cell receptors (Winoto & Baltimore, EMBO J.
  • promoters are also encompassed, e.g., the murine hox promoters (Kessel & Gruss, Science 249: 374-379 (1990)) and the α-fetoprotein promoter (Campes & Tilghman, Genes Dev. 3: 537-546 (1989)).
  • the invention further provides a recombinant expression vector comprising a DNA molecule of the invention cloned into the expression vector in an antisense orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a manner that allows for expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to a mutant polypeptide mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense orientation can be chosen that direct the continuous expression of the antisense RNA molecule in a variety of cell types, for instance viral promoters and/or enhancers, or regulatory sequences can be chosen that direct constitutive, tissue specific or cell type specific expression of antisense RNA.
  • the antisense expression vector can be in the form of a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids are produced under the control of a high efficiency regulatory region, the activity of which can be determined by the cell type into which the vector is introduced.
  • a gene that encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene of interest.
  • selectable markers include those that confer resistance to drugs, such as G418, hygromycin and methotrexate.
  • Nucleic acid encoding a selectable marker can be introduced into a host cell on the same vector as that encoding mutant polypeptide or can be introduced on a separate vector.
  • Transgenic Animals. Recombinant organisms, i.e., transgenic animals, expressing a variant gene of the invention are prepared using standard procedures known in the art. Transgenic animals carrying the constructs of the invention can be made by several methods known to those having skill in the art. See, e.g., U.S. Pat. No. 5,610,053 and "The Introduction of Foreign Genes into Mice" and the cited references therein, in: Recombinant DNA, Watson JD, Gilman M, Witkowski J & Zoller M, eds. (W. H. Freeman and Company, New York) pp. 254-272.
  • Transgenic animals stably expressing a human isogene and producing human protein can be used as biological models for studying diseases related to abnormal expression and/or activity, and for screening and assaying various candidate drugs, compounds, and treatment regimens to reduce the symptoms or effects of these diseases.
  • Characterizing Gene Expression Level. Methods to detect and measure mRNA levels (i.e., gene transcription level) and levels of polypeptide gene expression products (i.e., gene translation level) are well known in the art and include the use of nucleotide microarrays and polypeptide detection methods involving mass spectrometers, reverse transcription and amplification, and/or antibody detection and quantification techniques. See also Strachan T & Read A, Human Molecular Genetics, 2nd Edition.
  • Any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA from cells. See, e.g., Ausubel et al., Ed., Curr. Prot.
  • the level of the mRNA expression product of the target gene is determined.
  • Methods to measure the level of a specific mRNA are well known in the art and include Northern blot analysis, reverse transcription PCR, real-time quantitative PCR, and hybridization to an oligonucleotide array or microarray.
  • the determination of the level of expression may be performed by determination of the level of the protein or polypeptide expression product of the gene in body fluids or tissue samples including but not limited to blood or serum. Large numbers of tissue samples can readily be processed using techniques well-known to those of skill in the art.
  • the isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses and probe arrays.
  • One preferred diagnostic method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected.
  • the nucleic acid probe can be, e.g., a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an mRNA or genomic DNA encoding a marker of the present invention.
  • Other suitable probes for use in the diagnostic assays of the invention are described herein. Hybridization of an mRNA with the probe indicates that the marker in question is being expressed.
  • the probes are immobilized on a solid surface and the mRNA is contacted with the probes, for example, in an Affymetrix gene chip array (Affymetrix, Calif., USA).
  • An alternative method for determining the level of mRNA corresponding to a marker of the present invention in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in U.S. Pat. No. 4,683,202); ligase chain reaction (Barany et al., Proc. Natl. Acad. Sci. USA 88: 189-193 (1991)); self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci.
  • amplification primers are defined as being a pair of nucleic acid molecules that can anneal to 5' or 3' regions of a gene (plus and minus strands, respectively, or vice-versa) and contain a short region in between.
  • amplification primers are from about 10-30 nucleotides in length and flank a region from about 50-200 nucleotides in length.
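The primer constraints stated above (primers of about 10-30 nucleotides flanking a region of about 50-200 nucleotides) can be checked with a short sketch. It assumes an uppercase plus-strand template, treats the 50-200 nt window as the bases lying between the two primer sites (the text does not say whether primer bases are included), and uses the Wallace rule, a rough melting-temperature estimate for short oligonucleotides, as a stand-in for a proper thermodynamic calculation. All function names and sequences are hypothetical.

```python
# Complement map for uppercase DNA (assumes only A, C, G, T are present).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C): 2 per A/T plus 4 per G/C (Wallace rule)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer_pair(template: str, fwd: str, rev: str,
                      primer_len=(10, 30), flank=(50, 200)):
    """Return (ok, interior_length) for a primer pair on a plus-strand template.

    The forward primer must occur on the plus strand; the reverse primer anneals
    to the minus strand, so its reverse complement must occur downstream.
    """
    if not all(primer_len[0] <= len(p) <= primer_len[1] for p in (fwd, rev)):
        return False, None
    f = template.find(fwd)
    r = template.find(revcomp(rev), f + len(fwd)) if f != -1 else -1
    if r == -1:
        return False, None
    interior = r - (f + len(fwd))  # bases between the two primer sites
    return flank[0] <= interior <= flank[1], interior


fwd = "ACGTTGCAAGGCTTACCG"
rev = revcomp("GGTACCTTAGCAGTCCAT")
template = fwd + "A" * 100 + "GGTACCTTAGCAGTCCAT"
print(check_primer_pair(template, fwd, rev))  # (True, 100)
```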
  • The TAQMAN® assay (PE Applied Biosystems, Foster City, Calif., USA), performed with AMPLITAQ GOLD™ DNA polymerase, exploits the 5' nuclease activity of the polymerase to cleave a specific form of probe during a PCR reaction.
  • This is referred to as a TAQMAN™ probe. See Luthra et al., Am. J. Pathol. 153: 63-68 (1998); Kuimelis et al., Nucl. Acids Symp. Ser. 37: 255-256 (1997); and Mullah et al., Nucl. Acids Res. 26(4): 1026-1031 (1998).
  • cleavage of the probe separates a reporter dye and a quencher dye, resulting in increased fluorescence of the reporter.
  • the accumulation of PCR products is detected directly by monitoring the increase in fluorescence of the reporter dye. See Heid et al., Genome Res. 6(6): 986-994 (1996). The higher the starting copy number of nucleic acid target, the sooner a significant increase in fluorescence is observed. See Gibson, Heid & Williams, Genome Res. 6: 995-1001 (1996).
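The relationship stated above, that a higher starting copy number gives an earlier rise in fluorescence, is usually quantified with a threshold cycle (Ct) and a standard curve of Ct against log10 of known input copy numbers. A minimal sketch, assuming background-subtracted readings and a standard curve whose slope and intercept were already fitted from dilutions of known template; the values below are hypothetical and the helper names are illustrative.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which fluorescence first rises through the threshold.

    fluorescence[i] is the reading after cycle i + 1. Linear interpolation is used
    between adjacent cycles; returns None if the threshold is never crossed from below.
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            return i + (threshold - lo) / (hi - lo)
    return None


def copies_from_ct(ct, slope, intercept):
    """Invert a standard curve Ct = slope * log10(N0) + intercept to estimate N0."""
    return 10 ** ((ct - intercept) / slope)


def efficiency_from_slope(slope):
    """Amplification efficiency implied by the standard-curve slope (1.0 = perfect doubling)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical standard curve: a slope near -3.32 corresponds to ~100% efficiency.
ct = threshold_cycle([1, 2, 4, 8, 16, 32], threshold=10)  # 4.25
print(ct, copies_from_ct(28.04, slope=-3.32, intercept=38.0))
```

Because the slope is negative, a smaller Ct maps to a larger estimated starting copy number, matching the behavior described in the text.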
  • cDNA pools such as by sequencing sufficient bases, e.g., 20-50 bases, in each of multiple cDNAs to identify each cDNA, or by sequencing short tags, e.g., 9-10 bases, which are generated at known positions relative to a defined mRNA end pathway pattern. See, e.g., Velculescu, Science 270: 484-487 (1995).
  • the cDNA levels in the samples are quantified, and the mean and standard deviation of each cDNA are determined by standard statistical means well known to those of skill in the art. See Norman T.J. Bailey, Statistical Methods in Biology, 3rd Edition (Cambridge University Press, 1995).
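The short-tag approach and the per-cDNA summary statistics described above can be sketched as follows. The sketch assumes a SAGE-like scheme in which each tag is the 10 bases immediately 3' of the 3'-most CATG site (the NlaIII recognition sequence) in a cDNA; actual protocols differ in tag length and in whether the anchoring site is counted as part of the tag. `extract_tag`, `tag_counts` and `summarize` are hypothetical helpers, and the sequences are made up.

```python
from collections import Counter
from statistics import mean, stdev


def extract_tag(cdna, anchor="CATG", tag_len=10):
    """Tag = the tag_len bases just 3' of the 3'-most anchor site; None if absent or too short."""
    pos = cdna.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def tag_counts(cdnas):
    """Count tags in one sample's collection of cDNA sequences."""
    return Counter(t for t in map(extract_tag, cdnas) if t is not None)


def summarize(samples, tag):
    """Mean and sample standard deviation of one tag's count across samples."""
    values = [counts[tag] for counts in samples]  # Counter returns 0 for unseen tags
    return mean(values), stdev(values)


sample_a = tag_counts(["AAA" + "CATG" + "ACGTACGTAC" + "TTTT", "GG" + "CATG" + "ACGTACGTAC"])
sample_b = tag_counts(["CATG" + "ACGTACGTAC" + "A", "TTTTTTTT"])
print(summarize([sample_a, sample_b], "ACGTACGTAC"))
```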
  • the probe is an antibody that recognizes the expressed protein.
  • a variety of formats can be employed to determine whether a sample contains a target protein that binds to a given antibody.
  • Immunoassay methods useful in the detection of target polypeptides of the present invention include, but are not limited to, dot blotting, western blotting, protein chips, competitive and non-competitive protein binding assays, immunohistochemistry, enzyme-linked immunosorbent assays (ELISA), fluorescence-activated cell sorting (FACS), and others commonly used and widely described in the scientific and patent literature, many of which are employed commercially.
  • a skilled artisan can readily adapt known protein/antibody detection methods for use in determining whether cells express a marker of the present invention and the relative concentration of that specific polypeptide expression product in blood or other body tissues.
  • Proteins from individuals can be isolated using techniques that are well known to those of skill in the art. The protein isolation methods employed can, e.g., be those described in Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988).
  • Monoclonal antibodies, which are homogeneous populations of antibodies to a particular antigen, may be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler & Milstein, Nature 256: 495-497 (1975), and U.S. Pat. No. 4,376,110; the human B-cell hybridoma technique of Kosbor et al., Immunol. Today 4: 72 (1983); Cole et al., Proc. Natl. Acad. Sci. USA 80: 2026-2030 (1983); and the EBV-hybridoma technique of Cole et al., Monoclonal Antibodies and Cancer Therapy (Alan R. Liss, Inc., 1985) pp. 77-96.
  • chimaeric antibodies are molecules whose different portions are derived from different animal species, such as those having a variable or hypervariable region derived from a murine mAb and a human immunoglobulin constant region.
  • Suitable solid phase supports or carriers include any support capable of binding an antigen or an antibody.
  • Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros and magnetite.
  • a useful method, for ease of detection, is the sandwich ELISA, of which a number of variations exist, all of which are intended to be used in the methods and assays of the present invention.
  • the term "sandwich assay" is intended to encompass all variations on the basic two-site technique. Immunofluorescence and EIA techniques are both very well established in the art. However, other reporter molecules, such as radioisotopes, chemiluminescent or bioluminescent molecules, may also be employed. It will be readily apparent to the skilled artisan how to vary the procedure to suit the required use.
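In a sandwich ELISA the reporter signal is usually converted to analyte concentration through a standard curve. A common choice, assumed here and not specified in the source, is a four-parameter logistic (4PL) model fitted to standards of known concentration. The sketch below only evaluates and inverts such a curve; the parameter values are hypothetical placeholders for a fit performed elsewhere, and the function names are illustrative.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic: signal at concentration x.

    a = signal at zero concentration, d = signal at saturation,
    c = inflection concentration (EC50), b = slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration giving signal y; y must lie strictly between a and d."""
    if not min(a, d) < y < max(a, d):
        raise ValueError("signal outside the fitted range")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


params = dict(a=0.05, b=1.2, c=50.0, d=2.5)  # hypothetical fitted values
signal = four_pl(12.5, **params)
print(round(inverse_four_pl(signal, **params), 6))  # 12.5
```

Rejecting signals outside the (a, d) interval avoids extrapolating beyond the standards, where the inverse is undefined or unreliable.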
  • Whole-genome monitoring of protein, i.e., the "proteome", can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome.
  • antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to testing or confirming a biological network model of interest.
  • methods for making monoclonal antibodies are well known. See, e.g., Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988).
  • monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array and their binding is measured with assays known in the art.
  • Detection of Polypeptides. Two-Dimensional Gel Electrophoresis. Two-dimensional gel electrophoresis is well known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE along a second dimension. See, e.g., Hames et al., Gel Electrophoresis of Proteins: A Practical Approach (IRL Press, New York, 1990); Shevchenko et al., Proc. Natl. Acad. Sci. USA 93: 14440-14445 (1996); Sagliocco et al., Yeast 12: 1519-1533 (1996); and Lander, Science 274: 536-539 (1996).
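Because the first dimension separates polypeptides by isoelectric point (pI), it can help to estimate where a wild-type and a variant polypeptide will focus. The sketch below applies the Henderson-Hasselbalch equation with one set of approximate pKa values (an assumption: published tables differ, so predicted pI values can shift by a few tenths of a pH unit) and finds the zero-charge pH by bisection. It ignores post-translational modifications and folding effects; the function names are hypothetical.

```python
# Approximate pKa values (assumed; other published tables give different results).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a peptide (one-letter code) at the given pH."""
    pos = [PKA_POS["Nterm"]] + [PKA_POS[aa] for aa in seq if aa in PKA_POS]
    neg = [PKA_NEG["Cterm"]] + [PKA_NEG[aa] for aa in seq if aa in PKA_NEG]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in pos)   # protonated fraction
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in neg)  # deprotonated fraction
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge (net charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("KKKK"), 2), round(isoelectric_point("DDDD"), 2))
```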
  • MS-based analysis methodology is useful for analysis of isolated target polypeptide as well as analysis of target polypeptide in a biological sample.
  • MS formats for use in analyzing a target polypeptide include ionization (I) techniques, such as, but not limited to, matrix assisted laser desorption (MALDI), continuous or pulsed electrospray ionization (ESI) and related methods, such as ionspray or thermospray, and massive cluster impact (MCI).
  • Such ion sources can be matched with detection formats, including linear or non-linear reflectron time-of-flight (TOF), single or multiple quadrupole, single or multiple magnetic sector, Fourier transform ion cyclotron resonance (FTICR), ion trap, and combinations thereof such as ion-trap/TOF.
  • for ionization, numerous matrix/wavelength combinations (e.g., MALDI) or solvent combinations (e.g., ESI) can be employed.
  • the target polypeptide can be solubilised in an appropriate solution or reagent system, e.g., an organic or inorganic solvent.
  • MS of peptides also is described, e.g., in International PCT Application No.
  • the matrix also can be an inorganic compound, such as ammonium nitrate, which decomposes essentially without leaving any residue.
  • Use of these and other solvents is known to those of skill in the art. See, e.g., U.S. Pat. No. 5,062,935.
  • Electrospray MS has been described by Fenn et al., J. Phys. Chem. 88: 4451-4459 (1984), and in PCT Application No. WO 90/14148; current applications are summarized in review articles. See Smith et al., Anal. Chem. 62: 882-89 (1990); and Ardrey, Spectroscopy 4: 10-18 (1992).
  • the mass of a target polypeptide determined by MS can be compared to the mass of a corresponding known polypeptide.
  • the corresponding known polypeptide can be the corresponding non-mutant protein, e.g., wild-type protein.
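Comparing an observed mass with that of the corresponding wild-type polypeptide amounts to computing the expected mass difference caused by the variant. A minimal sketch, assuming unmodified linear peptides and using standard monoisotopic residue masses; the function names and example peptides are hypothetical and unrelated to any specific ErbB2 variant.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_shift(wild_type: str, variant: str) -> float:
    """Expected mass difference (variant minus wild type)."""
    return monoisotopic_mass(variant) - monoisotopic_mass(wild_type)


print(round(monoisotopic_mass("PEPTIDE"), 4))          # 799.3599
print(round(mass_shift("PEPTIDE", "PEPTIDK"), 5))      # -0.94763
```

As the table shows, Leu and Ile have identical masses, so that substitution cannot be detected from mass alone, and Gln and Lys differ by only about 0.036 Da, which calls for high mass resolution.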
  • with ESI, the determination of molecular weights from femtomole amounts of sample is very accurate owing to the presence of multiple ion peaks, all of which can be used for mass calculation.
  • Sub-attomole levels of protein have been detected, e.g., using ESI MS (Valaskovic et al., Science 273: 1199-1202 (1996)) and MALDI MS (Li et al., J. Am. Chem. Soc. 118: 1662-1663 (1996)).
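The multiple ion peaks mentioned above are charge states [M+zH]z+ of the same molecule, each observed at m/z = (M + z*m_H)/z, where m_H is the proton mass. Two adjacent peaks are enough to solve for both z and M: from M = z(m1 - m_H) = (z+1)(m2 - m_H), with m1 the peak at charge z and m2 the peak at charge z+1, it follows that z = (m2 - m_H)/(m1 - m2). A minimal sketch; the function names and the simulated 10,000 Da protein are illustrative assumptions.

```python
PROTON_MASS = 1.007276  # Da


def charge_and_mass(mz_z, mz_z_plus_1):
    """Neutral mass from adjacent ESI peaks at charge z (higher m/z) and z + 1 (lower m/z)."""
    z = round((mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1))
    mass_from_z = z * (mz_z - PROTON_MASS)
    mass_from_z1 = (z + 1) * (mz_z_plus_1 - PROTON_MASS)
    return z, (mass_from_z + mass_from_z1) / 2  # average the two estimates


def mz(mass, z):
    """m/z of the [M+zH]z+ ion."""
    return (mass + z * PROTON_MASS) / z


# Simulated peaks for a hypothetical 10,000 Da protein at charges 10+ and 11+.
z, mass = charge_and_mass(mz(10000.0, 10), mz(10000.0, 11))
print(z, round(mass, 3))  # 10 10000.0
```

With real spectra, every pair of adjacent peaks gives an independent estimate, and averaging them is what makes ESI mass determination accurate.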
  • Matrix Assisted Laser Desorption. The level of the target protein in a biological sample, e.g., body fluid or tissue sample, may be measured by means of mass spectrometric (MS) methods including, but not limited to, those techniques known in the art as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), as further detailed below.
  • MALDI-TOF-MS has been described by Hillenkamp et al., Biological Mass Spectrometry, Burlingame & McCloskey, eds. (Elsevier Science Publ., Amsterdam, 1990) pp. 49-60. A variety of techniques for marker detection using mass spectroscopy can be used.
  • MS techniques allow the successful volatilization of high molecular weight biopolymers, without fragmentation, and have enabled a wide variety of biological macromolecules to be analyzed by mass spectrometry.
  • Surfaces Enhanced for Laser Desorption/Ionization (SELDI)
  • Other techniques are used which employ new MS probe element compositions with surfaces that allow the probe element to actively participate in the capture and docking of specific analytes, described as Affinity Mass Spectrometry (AMS). See SELDI patents U.S. Pat. Nos. 5,719,060; 5,894,063; 6,020,208; 6,027,942; 6,124,137; and U.S. Patent application No. U.S. 2003/0003465.
  • SEAC probe elements have been designed with Surfaces Enhanced for Affinity Capture (SEAC). See Hutchens & Yip, Rapid Commun. Mass Spectrom. 7: 576-580 (1993).
  • SEAC probe elements have been used successfully to retrieve and tether different classes of biopolymers, particularly proteins, by exploiting what is known about protein surface structures and biospecific molecular recognition.
  • the immobilized affinity capture devices on the MS probe element surface (i.e., SEAC) determine the location and affinity (specificity) of the analyte for the probe surface; therefore, the subsequent analytical MS process is efficient.
  • Surfaces Enhanced for Neat Desorption (SEND)
  • the probe element surfaces (i.e., sample presenting means)
  • Energy Absorbing Molecules (EAM)
  • affinity capture devices to facilitate either the specific or non-specific attachment or adsorption (so-called docking or tethering) of analytes to the probe surface, by a variety of mechanisms (mostly non-covalent);
  • Surfaces Enhanced for Photolabile Attachment and Release (SEPAR)
  • the chemical specificities determining the type and number of the photolabile molecule attachment points between the SEPAR sample presenting means (i.e., probe element surface) and the analyte (e.g., protein) may involve any one or more of a number of different residues or chemical structures in the analyte (e.g., His, Lys, Arg, Tyr, Phe and Cys residues in the case of proteins and peptides).
  • Functionalizing Polypeptides
  • a polypeptide of interest also can be modified to facilitate conjugation to a solid support.
  • a chemical or physical moiety can be incorporated into the polypeptide at an appropriate position.
  • a polypeptide of interest can be modified by adding an appropriate functional group to the carboxyl terminus or amino terminus of the polypeptide, or to an amino acid in the peptide (e.g., to a reactive side chain or to the peptide backbone).
  • a modification, e.g., the incorporation of a biotin moiety, can affect the ability of a particular reagent to interact specifically with the polypeptide; accordingly, this factor should be considered, if relevant, in selecting how best to modify a polypeptide of interest.
  • a naturally-occurring amino acid normally present in the polypeptide also can contain a functional group suitable for conjugating the polypeptide to the solid support.
  • a cysteine residue present in the polypeptide can be used to conjugate the polypeptide to a support containing a sulfhydryl group through a disulfide linkage, e.g., a support having cysteine residues attached thereto.
  • bonds that can be formed between two amino acids include, but are not limited to, e.g., monosulfide bonds between two lanthionine residues, which are non-naturally-occurring amino acids that can be incorporated into a polypeptide; a lactam bond formed by a transamidation reaction between the side chains of an acidic amino acid and a basic amino acid, such as between the γ-carboxyl group of Glu (or alpha carboxyl group of Asp) and the amino group of Lys; or a lactone bond produced, e.g., by a crosslink between the hydroxy group of Ser and the carboxyl group of Glu (or alpha carboxyl group of Asp).
  • a solid support can be modified to contain a desired amino acid residue, e.g., a Glu residue, and a polypeptide having a Ser residue, particularly a Ser residue at the N-terminus or C-terminus, can be conjugated to the solid support through the formation of a lactone bond.
  • the support need not be modified to contain the particular amino acid, e.g., Glu, where it is desired to form a lactone-like bond with a Ser in the polypeptide, but can be modified, instead, to contain an accessible carboxyl group, thus providing a function corresponding to the alpha carboxyl group of Glu.
  • a thiol-reactive functionality is particularly useful for conjugating a polypeptide to a solid support.
  • a thiol-reactive functionality is a chemical group that can rapidly react with a nucleophilic thiol moiety to produce a covalent bond, e.g., a disulfide bond or a thioether bond.
  • thiol-reactive functionalities include, e.g., haloacetyls, such as iodoacetyl; diazoketones; epoxy ketones; alpha- and beta-unsaturated carbonyls, such as alpha-enones and beta-enones; and other reactive Michael acceptors, such as maleimide; acid halides; benzyl halides; and the like. See Greene & Wuts, Protective Groups in Organic Synthesis, 2nd Edition (John Wiley & Sons, 1991).
  • the thiol groups can be blocked with a photocleavable protecting group, which then can be selectively cleaved, e.g., by photolithography, to provide portions of a surface activated for immobilization of a polypeptide of interest.
  • Photocleavable protecting groups are known in the art (see, e.g., published International PCT Application No. WO 92/10092; and McCray et al, Ann. Rev. Biophys. Biophys. Chem. 18: 239-270 (1989)) and can be selectively de-blocked by irradiation of selected areas of the surface using, e.g., a photolithography mask.
  • Linkers. A polypeptide of interest can be attached directly to a support via a linker. Any linkers known to those of skill in the art to be suitable for linking peptides or amino acids to supports, either directly or via a spacer, may be used. For example, the polypeptide can be conjugated to a support, such as a bead, through means of a variable spacer.
  • Linkers include Rink amide linkers (see, e.g., Rink, Tetrahedron Lett. 28: 3787 (1987)); trityl chloride linkers (see, e.g., Leznoff, Acc. Chem. Res.
  • linkers, see, e.g., Bodanszky et al., Peptide Synthesis, 2nd Edition (Academic Press, New York, 1976)
  • trityl linkers are known. See, e.g., U.S. Pat. Nos. 5,410,068 and 5,612,474.
  • Amino trityl linkers are also known. See, e.g., U.S. Pat. No. 5,198,531.
  • Other linkers include those that can be incorporated into fusion proteins and expressed in a host cell. Such linkers may be selected amino acids, enzyme substrates or any suitable peptide.
  • the linker may be made, e.g., by appropriate selection of primers when isolating the nucleic acid. Alternatively, they may be added by post-translational modification of the protein of interest.
  • Linkers that are suitable for chemically linking peptides to supports include disulfide bonds, thioether bonds, hindered disulfide bonds and covalent bonds between free reactive groups, such as amine and thiol groups.
  • Cleavable Linkers. A linker can provide a reversible linkage such that it is cleaved under selected conditions.
  • selectively cleavable linkers, including photocleavable linkers (see U.S. Pat. No. 5,643,722), acid-cleavable linkers (see Fattom et al., Infect. Immun. 60: 584-589 (1992)), acid-labile linkers (see Welhöner et al., J. Biol. Chem. 266: 4309-4314 (1991)) and heat-sensitive linkers, are useful.
  • a linkage can be, e.g., a disulfide bond, which is chemically cleavable by mercaptoethanol or dithioerythritol; a biotin/streptavidin linkage, which can be photocleavable; a heterobifunctional derivative of a trityl ether group, which can be cleaved by exposure to acidic conditions or under conditions of MS (see Köster et al., Tetrahedron Lett.
  • a levulinyl-mediated linkage, which can be cleaved under almost neutral conditions with a hydrazinium/acetate buffer; an arginine-arginine or a lysine-lysine bond, either of which can be cleaved by an endopeptidase, such as trypsin; a pyrophosphate bond, which can be cleaved by a pyrophosphatase; or a ribonucleotide bond, which can be cleaved using a ribonuclease or by exposure to alkaline conditions.
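Whether trypsin will cut a given linker, or where it will also cut the target polypeptide, can be predicted with the simplified rule that trypsin cleaves on the carboxyl side of Lys or Arg unless the next residue is Pro. The sketch assumes that rule; real digests also show missed cleavages and exceptions, so this is an illustration rather than a validated digestion model, and `tryptic_digest` is a hypothetical helper.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Split after every K or R that is not followed by P (simplified trypsin rule)."""
    # Zero-width split points: preceded by K/R, not followed by P (needs Python 3.7+).
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]


print(tryptic_digest("AKRPGKLR"))  # ['AK', 'RPGK', 'LR']
print(tryptic_digest("GGKKGG"))    # ['GGK', 'K', 'GG']
```

The second example shows why a lysine-lysine linker is a trypsin-cleavable junction: both peptide bonds following the two lysines are predicted cut sites.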
  • a photolabile cross-linker such as 3-amino-(2-nitrophenyl)propionic acid can be employed as a means for cleaving a polypeptide from a solid support.
  • Other linkers include RNA linkers that are cleavable by ribozymes and other RNA enzymes, and linkers such as the various domains (e.g., CH1, CH2 and CH3) from the constant region of human IgG1.
  • linker that is cleavable under MS conditions, such as a silyl linkage or photocleavable linkage, can be combined with a linker, such as an avidin biotin linkage, that is not cleaved under these conditions, but may be cleaved under other conditions.
  • Acid-labile linkers are particularly useful chemically cleavable linkers for mass spectrometry, especially for MALDI-TOF, because the acid labile bond is cleaved during conditioning of the target polypeptide upon addition of a 3-HPA matrix solution.
  • Pin tools include those disclosed herein or otherwise known in the art. See, e.g., U.S. Application Serial Nos. 08/786,988 and 08/787,639; and International PCT Application No. WO 98/20166.
  • a pin tool in an array e.g., a 4 x 4 array, can be applied to wells containing polypeptides of interest.
  • the pin tool has a functional group attached to each pin tip, or a solid support (e.g., functionalized beads or paramagnetic beads) is attached to each pin.
  • the polypeptides in a well can be captured (1 pmol capacity).
  • the pins can be kept in motion (vertical, 1-2 mm travel) to increase the efficiency of the capture.
  • where a reaction such as in vitro transcription is being performed in the wells, movement of the pins can increase the efficiency of the reaction.
  • Further immobilization can result by applying an electrical field to the pin tool.
  • the polypeptides are attracted to the anode or the cathode, depending on their net charge.
  • the pin tool (with or without voltage) can be modified to have conjugated thereto a reagent specific for the polypeptide of interest, such that only the polypeptides of interest are bound by the pins.
  • the pins can have nickel ions attached, such that only polypeptides containing a polyhistidine sequence are bound.
  • the pins can have antibodies specific for a target polypeptide attached thereto, or to beads that, in turn, are attached to the pins, such that only the target polypeptides, which contain the epitope recognized by the antibody, are bound by the pins.
  • Captured polypeptides can be analyzed by a variety of means including, e.g., _spectrometric techniques, such as UWV I S. IR, fluorescence, chemiluminescence, NMR spectroscopy, MS or other methods known in the art, or combinations thereof. If conditions preclude direct analysis of captured polypeptides, the polypeptides can be released or transferred from the pins, under conditions such that the advantages of sample concentration are not lost. Accordingly, the polypeptides can be removed from the pins using a minimal volume of eluent, and without any loss of sample. Where the polypeptides are bound to the beads attached to the pins, the beads containing the polypeptides can be removed from the pins and measurements made directly from the beads.
  • Pin tools can be useful for immobilizing polypeptides of interest in a spatially addressable manner on an array. Such spatially addressable or pre-addressable arrays are useful in a variety of processes, including, for example, quality control, amino acid sequencing and diagnostics.
  • the pin tools described in the U.S. Application Nos. 08/786,988 and 08/787,639 and International PCT Application No. WO 98/20166 are serial and parallel dispensing tools that can be employed to generate multi-element arrays of polypeptides on a surface of the solid support.
  • the array surface can be flat (with or without beads) or geometrically altered to include wells, which can contain beads.
  • MS geometries can be adapted for accommodating a pin tool apparatus.
  • aspects of the biological activity state, or mixed aspects, can be measured in order to obtain drug and pathway responses.
  • the activities of proteins relevant to the characterization of cell function can be measured, and embodiments of this invention can be based on such measurements.
  • Activity measurements can be performed by any functional, biochemical or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with natural substrates, and the rate of transformation measured. Where the activity involves association in multimeric units, e.g., association of an activated DNA binding complex with DNA, the amount of associated protein or secondary consequences of the association, such as amounts of mRNA transcribed, can be measured.
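When the rate of a chemical transformation is measured, a common way to reduce a time course to a single number is the least-squares slope over the early, approximately linear phase. The sketch below assumes such initial-rate conditions. The function name and the readings are hypothetical and do not come from the source.

```python
def initial_rate(times, amounts):
    """Least-squares slope of product amount versus time (amount per unit time).

    Assumes the points lie in the initial, roughly linear phase of the reaction.
    """
    n = len(times)
    if n != len(amounts) or n < 2:
        raise ValueError("need at least two paired time points")
    mean_t = sum(times) / n
    mean_a = sum(amounts) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times, amounts))
    return sxy / sxx


# Hypothetical early readings: minutes versus µM product formed.
print(initial_rate([0, 1, 2, 3], [0.0, 2.0, 4.0, 6.0]))  # 2.0
```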
  • response data may be formed of mixed aspects of the biological state of a cell.
  • Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances and changes in certain protein activities.
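A mixed-aspect response profile of this kind can be held as a single record keyed by measurement type. The minimal sketch below expresses each change as a log2 ratio of perturbed to control values. That choice is illustrative and is not a method stated in the source, and the keys and numbers are hypothetical.

```python
import math


def response_profile(control, perturbed):
    """Map each measurement present in both inputs to log2(perturbed / control).

    Keys can mix mRNA abundances, protein abundances and protein activities;
    non-positive values are skipped because the log ratio is undefined.
    """
    profile = {}
    for key, base in control.items():
        value = perturbed.get(key)
        if value is not None and base > 0 and value > 0:
            profile[key] = math.log2(value / base)
    return profile


control = {"mRNA:GENE_A": 10.0, "protein:PROT_B": 4.0, "activity:KINASE_C": 1.0}
treated = {"mRNA:GENE_A": 40.0, "protein:PROT_B": 2.0, "activity:KINASE_C": 1.0}
print(response_profile(control, treated))
# {'mRNA:GENE_A': 2.0, 'protein:PROT_B': -1.0, 'activity:KINASE_C': 0.0}
```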
  • TABLES 6 to 9 show the alignments of the wild-type ERBB2 sequence with the Pfam models of each domain.
  • Two of the mutations identified in the present invention are found in a receptor L domain (residues 366-486), and three are found in the protein tyrosine kinase domain.
  • A848 and R868 are highly conserved in the protein tyrosine kinase domain. Mutated positions are highlighted in bold and underlined text. Amino acid change in the highly conserved region may alter the protein structure and hence the protein function.
  • Sequence alignment of the wild-type human ERBB2 polypeptide sequence with the Pfam receptor L domain sequence is summarized below in TABLE 6.
  • ERBB2 220 CAGGCA RCKGPLPTDCCHEQCAAGCT—GPKH-SDCLACLHFNHSGI 263
  • ERBB2 264 C ⁇ LHCPALVTYNTDTFESMPNPEGRYTFGASCVTACPYNYLSTDVGSCTL 313 sCPsgHktevgAesGvreCekCReGpCPKvCe ⁇ -* +CP + ++ev+Ae+G+++CekC ++pC++vC+
  • ERBB2 1 314 VCPLH-NQEVTAEDGTQRCEKC-SKPCARVCY 343 SEQ ID NO:26
  • Sequence alignment of the wild-type human ERBB2 polypeptide sequence with the second Pfam receptor L domain sequence is summarized below in TABLE 8.
  • ERBB2 1 450 ISWLGLRSLRELGS—GLALIHHNT—HLCFVHT-VPWDQLF 486
  • NetPhos produces neural network predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins (Blom et al., J. Mol. Biol. 294(5):1351-1362, 1999).
  • Potential ERBB2 phosphorylation sites predicted by NetPhos are summarized below in TABLE 10. With the exception of Y1221, the other published phosphorylation sites were identified as predicted phosphorylation sites by the software in these studies. NetPhos analysis of ERBB2 also indicated additional serine, threonine and tyrosine phosphorylation sites in the ERBB2 polypeptide. A residue needed a threshold score of 0.5 to be considered a potential phosphorylation site.
  • P1170 is close to a potential serine phosphorylation site, S1174, and a potential threonine phosphorylation site, T1172.
  • D873 is close to a potential threonine phosphorylation site, T875, and a potential tyrosine phosphorylation site, Y877.
  • Threonine: 23, 328, 686, 701, 733, 759, 875, 900, 948, 1172, 1198, 1236
  • Tyrosine: 83, 112, 289, 590, 772, 877, 1023, 1127, 1139, 1196, 1222,
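A simple window search over the predicted sites can check the proximity statements for P1170 and D873. In this sketch the threonine and tyrosine lists are copied from the partial TABLE 10 excerpt, which was already filtered at the 0.5 NetPhos threshold. S1174 is the only serine included because it is the only serine site named in the text. The ±4-residue window is an assumption, chosen because it covers the distances described. `nearby_sites` is a hypothetical helper.

```python
# Predicted sites from the partial TABLE 10 excerpt (threshold score 0.5).
# Only S1174 is listed for serine because the full serine list is not shown.
PREDICTED_SITES = {
    "S": [1174],
    "T": [23, 328, 686, 701, 733, 759, 875, 900, 948, 1172, 1198, 1236],
    "Y": [83, 112, 289, 590, 772, 877, 1023, 1127, 1139, 1196, 1222],
}


def nearby_sites(position, sites=PREDICTED_SITES, window=4):
    """Return predicted sites within `window` residues of a mutated position.

    The window size is an assumption; results are sorted by residue number.
    """
    hits = []
    for residue, positions in sites.items():
        for pos in positions:
            if pos != position and abs(pos - position) <= window:
                hits.append(f"{residue}{pos}")
    return sorted(hits, key=lambda site: int(site[1:]))


print(nearby_sites(1170))  # ['T1172', 'S1174']
print(nearby_sites(873))   # ['T875', 'Y877']
```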
  • SUMO: small ubiquitin-related modifier family proteins
  • PIC1: small ubiquitin-related modifier family proteins
  • UBL1: UBL1
  • Sentrin: GMP1
  • Smt3: small ubiquitin-related modifier family proteins
  • SUMO modification is reversible and does not appear to target proteins for degradation. Instead, it alters target-protein function through changes in cellular localization or biochemical activation, or through protection from ubiquitin-dependent degradation.
  • Posttranslational modification via sumoylation influences numerous biological processes, including signal transduction, transcriptional regulation, and growth control.
  • Shiio and Eisenmann have demonstrated that the DNA-binding histone proteins are subject to sumoylation. Shiio & Eisenmann, Proc. Natl Acad. Sci. U.S.A. 100(23):13225-30 (2003).
  • ERBB2 sumoylation sites were identified using the SUMOplot™ computational analysis tool. Hinsley et al., Protein Sci. 13:2588-2599 (2004); Van Dyck et al., J. Biol. Chem. 279:36121-36131 (2004).
  • SUMOplot™ predicts the probability that a SUMO consensus sequence (SUMO-CS) is engaged in SUMO attachment. Most SUMO-modified proteins contain the tetrapeptide motif B-K-x-D/E, where B is a hydrophobic residue, K is the lysine conjugated to SUMO, x is any amino acid (aa), and D or E is an acidic residue.
  • Substrate specificity appears to be derived directly from Ubc9 and the respective substrate motif.
  • the SUMOplot™ scoring system is based on two criteria: 1) a direct amino acid match to the SUMO-CS observed and shown to bind Ubc9, and 2) substitution of the consensus amino acid residues with amino acid residues of similar hydrophobicity.
  • No SUMO modification has been reported for ERBB2.
  • Potential SUMO modification sites predicted by SUMOplot™ are summarized below in TABLE 12 and TABLE 13.
  • ERBB2 mutation P856S of the present invention is predicted to eliminate a potential SUMO modification site at K854.
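A strict scan for the B-K-x-D/E consensus illustrates only the first SUMOplot™ criterion. It does not reproduce SUMOplot™ scoring, which also credits substitutions of similar hydrophobicity. The set of residues treated as hydrophobic at position B is an assumption, and `strict_sumo_sites` is a hypothetical helper. A lysine followed by x-P is not an exact consensus match, so this strict scan would not report it. K854 is such a case, because residue 856 is a proline in the wild type. A site like that can only be scored through the second, substitution-based criterion.

```python
import re

# Residues accepted as hydrophobic at position B: an assumed set for illustration.
HYDROPHOBIC = "AVLIMF"
# Zero-width lookahead so that overlapping motifs are all reported.
SUMO_CS = re.compile(rf"(?=[{HYDROPHOBIC}]K.[DE])")


def strict_sumo_sites(sequence):
    """Return 1-based positions of lysines in an exact B-K-x-D/E match."""
    # m.start() is the 0-based index of B; the lysine is one residue later.
    return [m.start() + 2 for m in SUMO_CS.finditer(sequence.upper())]


print(strict_sumo_sites("AVKAEGGLKTD"))  # [3, 9]
print(strict_sumo_sites("VKSP"))         # []
```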
  • None of the ERBB2 mutations identified in the present invention is located close to the predicted positions T1117, T1240 and T1242. Thus, none of the mutations of the present invention is likely to change the O-glycosylation pattern of ERBB2.
  • PROSITE Analysis of the Potential Effect of ERBB2 Mutations on Other ERBB2 Protein Regulatory Sites.
  • the effect of the ERBB2 mutations on other protein regulatory sites was analyzed using the PROSITE computational analysis tool.
  • PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to, as well as potential sites for protein modification (Hulo N. et al., Nucleic Acids Res. 32:D134-D137 (2004); Sigrist C.J.A. et al., Brief. Bioinform. 3:265-274 (2002); Gattiker A.
  • Protein kinases ATP-binding region (PS00107): 726-753
  • Tyrosine protein kinases specific active-site (PS00109): 841-853 (close to P856)
  • EF-hand calcium-binding domain (PS00018): 1011-1023 (includes G1015)
  • Proline-rich region profile (PS50099): 1102-1234
  • N-myristoylation site (PS00008)
  • N-glycosylation site (PS00001)
  • Tyrosine kinase phosphorylation site (PS00007): 765-772
  • Tyrosine sulfation site (PS00003): 870-884 (includes D873), 1016-1030 (close to G1015), 1215-1229, 1241-1255
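PROSITE patterns use a compact syntax. An example is the PS00001 N-glycosylation pattern `N-{P}-[ST]-{P}`. In this syntax, `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden residues, `(n)` or `(n,m)` gives a repeat count, and `<`/`>` anchor the N- or C-terminus. The sketch below translates that syntax into a Python regular expression and reports overlapping matches. It is a simplified illustration: it does not handle `>` inside brackets, nor PROSITE profiles such as PS50099. The helper names are hypothetical and the peptide is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = element.startswith("<")
        end_anchor = element.endswith(">")
        element = element.strip("<>")
        # Split an element into its core and an optional (n) or (n,m) repeat.
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # forbidden residues
        else:
            regex = core                      # single residue or [allowed] class
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if start_anchor else "") + regex + ("$" if end_anchor else ""))
    return "".join(pieces)


def find_motif(pattern, sequence):
    """Return 1-based start positions of (possibly overlapping) matches."""
    finder = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in finder.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))           # N[^P][ST][^P]
print(find_motif("N-{P}-[ST]-{P}", "AANGSAANPSA"))  # [3]
```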
  • ClustalW Polypeptide Alignment and Sequence Analysis to Estimate the Potential Effect of ERBB2 Mutation on ERBB2 Function
  • ClustalW polypeptide alignment and sequence analysis was used to estimate the effect of ERBB2 mutation on ERBB2 biological function.
  • Known ERBB2 sequences from human (NP_004439), mouse (NP_001003817), rat (NP_058699), dog (NP_001003217) and zebrafish (NP_956413) were obtained from GenBank and aligned using ClustalW. Chenna et al., Nucleic Acids Res. 31(13):3497-3500 (2003).
  • ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.
  • mice_erbb2      MELAAWCRWGFLLALLSPGAAGTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQ
    rat_erbb2       MIIMELAAWCRWGFLLALLPPGIAGTQVCTGTDMKLRLPASPETHLD⁇-SLRHLYQGCQVVQ
    human_erbb2     MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQ
    dog_erbb2       MELAAWCRWGLLLALLPSGAAGTQVCTGTDMKLRLPASP⁇THLDMLRHLYQGCQVVQ
    zebrafish_erbb2 -M⁇ADRSFGLAWVLLLLLGITAATGREVCLGTDMKLALPSSLENHYEMLRLLYTGCQVVH
  • mice_erbb2      GNL⁇LTYLPANASLSFLQDIQEVQGYMLIAHNRVKHVPLQRLRIVRGTQLF⁇DKYALAVL
    rat_erbb2       GNLELTYVPANASLSFLQDIQEVQGYMLIAHNQVKRVPLQRLRIVRGTQLFEDKYALAVL
    human_erbb2     GNLELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVL
    dog_erbb2       GNLELTYLPANASLSFLQDIQ⁇VQGYVLIAHSQVRQIPLQRLRIVRGTQLFEDNYALAVL
    zebrafish_erbb2 GNLEITHLQGNPDLSFLQ⁇IVEVQGYVLIAHVSVRSLPLDNLRIIRG⁇QLYKSNYALAVH
  • mouse_erbb2     DNRDPLDNVTTAAPGRTPEGtRELQLRSLTEILKG
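To ask whether a mutated residue is conserved, its position in the reference sequence must first be mapped to an alignment column, skipping gap characters. The residues of the other species at that column can then be compared. The sketch below assumes gapped sequences with `-` as the gap symbol, as in ClustalW output. The toy alignment is invented because the rows above are only a partial excerpt, and the helper names are hypothetical.

```python
def residue_column(aligned_seq, position):
    """Return the 0-based alignment column of the residue at 1-based `position`."""
    count = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":  # gaps do not advance the residue count
            count += 1
            if count == position:
                return column
    raise ValueError("position beyond end of sequence")


def column_residues(alignment, reference, position):
    """Return the residue each aligned sequence carries at the reference residue's column."""
    column = residue_column(alignment[reference], position)
    return {name: seq[column] for name, seq in alignment.items()}


# Invented toy alignment (equal-length gapped rows).
toy = {
    "human": "MKLA-RVQ",
    "mouse": "MKLAGRVQ",
    "fish":  "MRL--KVH",
}
print(column_residues(toy, "human", 5))  # {'human': 'R', 'mouse': 'R', 'fish': 'K'}
```

A position where every species shows the same residue can then be flagged as conserved, for example with `len(set(result.values())) == 1`.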
  • nnPredict Method Analysis of the Wild-type ERBB2 Secondary Structure
  • Secondary structure predictions of wild-type ERBB2 (TABLE 18) and mutant ERBB2 polypeptides (TABLE 20, TABLE 22, TABLE 24, TABLE 26, TABLE 28, TABLE 30, TABLE 32 and TABLE 34) were performed by nnPredict.
  • the basis of the prediction is a two-layer, feedforward neural network.
  • the network weights were determined by a separate program, a modification of the Parallel Distributed Processing suite of McClelland & Rumelhart (MIT Press, Cambridge, MA, Vol. 3, pp. 318-362 (1988)). Complete details of how the network weights were determined are given in Kneller et al. (J. Mol. Biol. 214:171-182 (1990)).
  • the output is a secondary structure prediction for each position in the sequence.
  • ERBB2 protein secondary structure as predicted by nnPredict uses “H”, “E” and a dash (“-”) as identifiers, which are defined as follows.
  • a helix element is designated by the letter “H”.
  • a strand element is designated by the letter “E”.
  • No prediction is designated by a dash (“-”).
  • Gray shading represents polypeptide regions where mutation was identified.
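A point substitution does not change sequence length, so wild-type and mutant nnPredict outputs can be compared residue by residue. Each output can also be summarized as runs of helix (H), strand (E) and no-prediction (-) states. The sketch below assumes the predictions are available as plain strings of those three symbols. The example strings and helper names are hypothetical.

```python
from itertools import groupby


def segments(prediction):
    """Collapse a per-residue prediction into (state, start, end) runs, 1-based."""
    runs, pos = [], 1
    for state, group in groupby(prediction):
        length = len(list(group))
        runs.append((state, pos, pos + length - 1))
        pos += length
    return runs


def changed_positions(wild_type, mutant):
    """List (position, wild-type state, mutant state) wherever the predictions differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("predictions must cover the same residues")
    return [(i, w, m) for i, (w, m) in enumerate(zip(wild_type, mutant), start=1) if w != m]


print(segments("HHH--EE"))                          # [('H', 1, 3), ('-', 4, 5), ('E', 6, 7)]
print(changed_positions("HHHH--EEE", "HHH---EEE"))  # [(4, 'H', '-')]
```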
  • [0195] A schematic representation of the secondary structure of wild-type ERBB2 polypeptide (SEQ ID NO:29), as predicted using nnPredict analysis, is shown below in TABLE 19. The positions of the mutated amino acid residues are identified by grey shaded text.
  • A schematic representation of the secondary structure of ERBB2 mutant polypeptide R868Q (SEQ ID NO:47), as predicted using nnPredict analysis, is shown below in TABLE 29. The position of the mutated amino acid residue is identified by grey shaded text.
  • G1015R (SEQ ID NO:49), as predicted using nnPredict analysis, is shown below in
  • Protein secondary structure is indicated as “h”, “e”, “t”, and “c”, which are defined as follows.
  • An “h” designates alpha helix protein secondary structure.
  • An “e” designates extended strand protein secondary structure.
  • a “t” designates beta turn protein secondary structure.
  • a “c” designates random coil protein secondary structure.
  • the position of the mutated amino acid residue is highlighted as bold underlined text.
  • a shaded area designates a region of the ERBB2 polypeptide where a mutation of the invention was identified.
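A prediction written in the four-state alphabet can be summarized as the fraction of residues in each state. This gives a quick way to compare wild-type and mutant outputs. The minimal sketch below assumes the prediction is a lowercase string of `h`, `e`, `t` and `c`. The `composition` helper and the example string are hypothetical.

```python
from collections import Counter

STATES = {"h": "alpha helix", "e": "extended strand", "t": "beta turn", "c": "random coil"}


def composition(prediction):
    """Return the fraction of residues assigned to each of the four states."""
    if not prediction:
        raise ValueError("empty prediction")
    counts = Counter(prediction)
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    total = len(prediction)
    return {STATES[code]: counts[code] / total for code in STATES}


print(composition("hhec"))
# {'alpha helix': 0.5, 'extended strand': 0.25, 'beta turn': 0.0, 'random coil': 0.25}
```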
  • The amino acid sequence of ERBB2 mutant polypeptide F371L (SEQ ID NO:43) is shown below in TABLE 38. The position of the mutated amino acid residue is highlighted in bold underlined text.
  • The amino acid sequence of ERBB2 mutant polypeptide C475S (SEQ ID NO:44) is shown below in TABLE 39. The position of the mutated amino acid residue is highlighted in bold underlined text.
  • [0219] The amino acid sequence of ERBB2 mutant polypeptide R868Q (SEQ ID NO:47) is shown below in TABLE 42. The position of the mutated amino acid residue is highlighted in bold underlined text.
  • [0220] The amino acid sequence of ERBB2 mutant polypeptide D873N (SEQ ID NO:48) is shown below in TABLE 43. The position of the mutated amino acid residue is highlighted in bold underlined text.
  • The amino acid sequence of ERBB2 mutant polypeptide G1015R (SEQ ID NO:49) is shown below in TABLE 44. The position of the mutated amino acid residue is highlighted in bold underlined text.
  • [0222] The amino acid sequence of ERBB2 mutant polypeptide P1170A (SEQ ID NO:50) is shown below in TABLE 45. The position of the mutated amino acid residue is highlighted in bold underlined text.
  • an agent that modulates ERBB2 biological activity i.e., ERBB2 modulating agent, e.g., ERBB2 antagonist
  • ERBB2 modulating agent e.g., ERBB2 antagonist
  • cancer e.g., breast cancer
  • SNP single nucleotide polymorphism
  • the SNP is selected from the group consisting of the ERBB2 mutation summarized in TABLE 1 and

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)

Abstract

This invention relates generally to the analytical testing of tissue samples in vitro, and more particularly to aspects of genetic polymorphisms and mutations of the ERBB2 gene. The invention provides new ERBB2 mutations and SNPs, useful in the diagnosis and treatment of subjects in need thereof. Accordingly, the various aspects of the present invention relate to polynucleotides encoding the ERBB2 mutations of the invention, expression vectors encoding the ERBB2 mutant polypeptides of the invention and organisms that express the ERBB2 mutant and polymorphic polynucleotides and/or ERBB2 mutant/polymorphic polypeptides of the invention. The various aspects of the present invention further relate to diagnostic/theranostic methods and kits that use the ERBB2 mutations and polymorphisms of the invention to identify individuals predisposed to disease or to classify individuals with regard to drug responsiveness, side effects, or optimal drug dose.

Description

MUTATIONS AND POLYMORPHISMS OF ERBB2
FIELD OF THE INVENTION
[0001] This invention relates generally to the analytical testing of tissue samples in vitro, and more particularly to aspects of genetic mutations and polymorphisms of v-erb-b2 erythroblastic leukaemia viral oncogene homolog 2 (ERBB2).
BACKGROUND OF THE INVENTION
[0002] Conventional medical approaches to the diagnosis and treatment of disease are based on clinical data alone or on clinical data combined with a diagnostic test. Such traditional practices often lead to therapeutic choices that neither maximize the efficacy of the prescribed drug therapy nor minimize the likelihood of side effects for an individual subject. Therapy-specific diagnostics (a.k.a. theranostics) is an emerging medical technology field that provides tests useful to diagnose a disease, choose the correct treatment regimen and monitor a subject's response. That is, theranostics are useful to predict and assess drug response in individual subjects, i.e., individualized medicine. Theranostic tests are also useful to select subjects who are particularly likely to benefit from a treatment, or to provide an early and objective indication of treatment efficacy in individual subjects so that the treatment can be altered with a minimum of delay. Theranostics are useful in the clinical diagnosis and management of a variety of diseases and disorders, which include, but are not limited to, e.g., cardiovascular disease, cancer, infectious diseases, Alzheimer's disease and the prediction of drug toxicity or drug resistance. Theranostic tests may be developed in any suitable diagnostic testing format, which includes, but is not limited to, e.g., immunohistochemical tests, clinical chemistry, immunoassay, cell-based technologies and nucleic acid tests.
[0003] Progress in pharmacogenomics and pharmacogenetics, which establish correlations between responses to specific drugs and the genetic profile of individual patients and/or their tumours, is foundational to the development of new theranostic approaches. There is therefore a need in the art to evaluate patient-to-patient variations and tumour mutations in gene sequence and gene expression. A common form of genetic profiling relies on the identification of DNA sequence variations called single nucleotide polymorphisms ("SNPs"), which are one type of genetic alteration leading to patient-to-patient variation in individual drug response. In addition, it is well established in the art that acquired DNA changes (mutations) are responsible, alone or in part, for pathological processes. It follows that there is a need in the art to identify and characterize genetic mutations and SNPs that are useful to identify the genotypes of subjects and their tumours associated with drug responsiveness, side effects, or optimal dose.
[0004] v-erb-b2 erythroblastic leukaemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) (ERBB2; a.k.a. C-erbB-2, c-erb B2, HER2, HER-2, HER-2/neu, MLN 19, NEU, NEU proto-oncogene, NGL, p185erbB2, receptor tyrosine-protein kinase erbB-2 precursor, TKR1, tyrosine kinase-type cell surface receptor HER2) is located at chromosomal location 17q11.2-q12 (Yang-Feng et al., Cytogenet. Cell Genet. 40:784 (1985); Coussens et al., Science 230:1132-1139 (1985)). ERBB2 encodes a 185-kDa, 1255-amino-acid receptor tyrosine kinase. It belongs to the HER family of four transmembrane receptor tyrosine kinases (RTKs), which mediate the growth, differentiation and survival of cells: epidermal growth factor receptor (EGFR, also called HER-1 or erbB-1), HER-2 (also called erbB-2 or Neu), HER-3 and HER-4 (also called erbB-3 and erbB-4, respectively). Yarden et al., Nat. Rev. Mol. Cell Biol. 2:127-137 (2001); Gschwind et al., Nat. Rev. Cancer 4:361-370 (2004).
[0005] The ERBB2 protein is expressed in several human organs and tissues, such as normal epithelium, endometrium and ovarian epithelium, at the neuromuscular level, and in the prostate, pancreas, lung, kidney, liver, heart and haematopoietic cells. ERBB2 expression is low in mononuclear cells from bone marrow, peripheral blood (PB) and mobilized PB. Higher expression has been found in cord blood-derived cells. Quiescent CD34+ progenitor cells from all blood sources and resting lymphocytes are ERBB2 negative, but expression of this receptor is up-regulated during cell-cycle recruitment of progenitor cells. Similarly, expression increases in mature, proliferating haematopoietic cells, underscoring the correlation between ERBB2 and the proliferative status of haematopoietic cells. The most important intracellular pathways activated by ERBB2 are those involving mitogen-activated protein kinase and phosphatidylinositol-3 kinase.
[0006] ERBB2 mutations and altered expression levels have been found to be associated with different cancers: ERBB2 is overexpressed in 25% to 40% of several human tumours and is associated with malignancy of the disease, a high mitotic index and shorter patient survival (www.infobiogen.fr/services/chromcancer/Genes/ERBB2ID162ch17q11.html).
[0007] ERBB2 overexpression occurs in 25-30% of human breast cancers. It is associated with tumour aggressiveness and thus with a shorter time to relapse and lower overall survival. Further, it has been shown to be associated with patient responsiveness to doxorubicin, to cyclophosphamide, methotrexate and fluorouracil (CMF), and to paclitaxel, whereas tamoxifen was found to be ineffective and even detrimental in patients with ERBB2-positive tumours. In Paget's disease of the breast, ERBB2 protein overexpression is caused by amplification of the erbB-2 gene. ERBB2 has a role in this disease of the breast, in which the epidermis of the nipple is infiltrated by large neoplastic cells of glandular origin. Binding of heregulin-alpha to the receptor complex on Paget cells appears to result in chemotaxis of these breast cancer cells.
[0008] ERBB2 may be activated in the early stage of the pathogenesis of cervical carcinoma in geriatric patients and is frequently amplified in squamous cell carcinoma of the uterine cervix. Further, overexpression of ERBB2 in medulloblastoma is associated with poor prognosis and metastasis, and HER2-HER4 receptor heterodimerization is of particular biological significance in this disease. ERBB2 expression has also been reported in blast cells from patients with hematological malignancies, including acute lymphoblastic leukaemia (ALL). Overexpression of ERBB2 has also been shown to be associated with transitional cell carcinoma of the bladder. Further, ERBB2 overexpression occurs in muscle-invasive urothelial carcinomas of the bladder and is associated with worse survival.
Amplification of the ERBB2 gene is also frequently linked to alterations of the TOP2A gene in bladder cancer. Overexpression of ERBB2 has also been shown to occur in a significant number of colorectal cancers, where it is significantly associated with poor survival and related to tumour progression. E6/E7 proteins of HPV type 16 and ERBB2 cooperate to induce neoplastic transformation of primary normal oral epithelial cells. Overexpression of the ERBB2 receptor is a frequent event in oral squamous cell carcinoma and is correlated with poor survival. ERBB2 amplification/overexpression has been reported as unlikely to play a role in the molecular pathogenesis of most gastrinomas. However, mild gene amplification occurs in a subset of gastrinomas, and overexpression of this receptor is associated with aggressiveness of the disease. ERBB2 expression also correlates with tumour histological differentiation and is associated with poor prognosis in well-differentiated gastric adenocarcinoma. A significant correlation was also observed between ERBB2 overexpression and clinical outcome in germ-cell testicular tumours. Increased ERBB2 expression was shown to contribute to the progression of cholangiocarcinoma to an advanced stage associated with tumour metastasis. In addition, overexpression of ERBB2 and COX-2 was reported to correlate directly with tumour differentiation. Further, ERBB2 is reportedly overexpressed in fewer than 20% of patients with non-small cell lung cancer (NSCLC), and studies have shown that overexpression of this receptor is correlated with a poor prognosis in both resected and advanced NSCLC. A higher frequency of ERBB2 expression has been observed in samples from patients with metastatic disease at presentation and at the time of relapse, and it correlates with worse histologic response and decreased event-free survival. Patients with ERBB2-overexpressing tumours have a significantly worse prognosis than patients with ERBB2-negative tumours. Overexpression of ERBB2 in pancreatic adenocarcinoma was reported to result from increased transcription rather than gene amplification. The coexpression of ERBB2 oncogene protein, epidermal growth factor receptor and TGF-beta1 in pancreatic ductal adenocarcinoma is related to the histopathological grades and clinical stages of tumours. The expression of ERBB2 in prostate cancer is relatively low, but it is up-modulated at the onset of hormone resistance. Several studies demonstrated significant positive staining of ERBB2 in salivary tumorigenic tissue but not in the surrounding non-tumorigenic tissue, pointing to a biological role in the tumorigenic process. Increased levels of ERBB2 in synovial sarcoma are associated with a more favorable clinical course.
[0009] Polymorphisms in the ERBB2 gene have also been described. Allelic variations at amino acid positions 654 and 655 of isoform (a) (positions 624 and 625 of isoform (b)) have been reported: allele B1 (Ile-654/Ile-655), which is the most common, allele B2 (Ile-654/Val-655) and allele B3 (Val-654/Val-655). This polymorphism has been reported as possibly associated with the development of gastric carcinoma and with breast cancer risk, particularly among younger women. Xie et al., J. Natl. Cancer Inst. 92:412-417 (2000). Additionally, the Cancer Genome Project and Collaborative Group sequenced the ERBB2 gene from 120 primary lung tumours and found mutations within the kinase domain in 4% of them; in the adenocarcinoma subtype of lung cancer, 10% of cases had mutations (Cancer Genome Project and Collaborative Group, Nature 431:525-526, 2004).
[0010] Accordingly, there is a need in the art for additional information about the relationship between ERBB2 mutations and cancer.
SUMMARY OF THE INVENTION
[0011] The invention provides for the use of an ERBB2 modulating agent in the manufacture of a medicament for the treatment of cancer in a selected patient population. The patient population is selected on the basis of the genotype of the patients at an ERBB2 genetic locus indicative of efficacy of the ERBB2 modulating agent in treating cancer. In several embodiments, the cancer can be breast cancer.
[0012] The invention also provides an isolated polynucleotide having a sequence encoding an ERBB2 mutation. In several embodiments, the ERBB2 mutations are the previously unidentified mutations listed in TABLE 1. Accordingly, the invention provides vectors and organisms containing the ERBB2 mutations of the invention and polypeptides encoded by polynucleotides containing the ERBB2 mutations of the invention.
[0013] The invention further provides a method for treating cancer in a subject. The genotype or haplotype of the subject is obtained at an ERBB2 gene locus, wherein the genotype and/or haplotype is indicative of a propensity of the cancer to respond to a drug. An anti-cancer therapy is then administered to the subject.
[0014] The invention provides a method for diagnosing cancer in a subject and a method for choosing subjects for inclusion in a clinical trial for determining efficacy of an ERBB2 modulating agent; in both these methods the genotype and/or haplotype of a subject is interrogated at an ERBB2 gene locus. Also provided by the invention are kits for use in determining a treatment strategy for cancer.
[0015] The invention also provides for the use of each of the mutations of the invention as a drug target.
DETAILED DESCRIPTION OF THE INVENTION
[0016] It is to be appreciated that certain aspects, modes, embodiments, variations and features of the invention are described below in various levels of detail in order to provide a substantial understanding of the present invention. In general, such disclosure provides new ERBB2 mutations and SNPs that may be useful, alone or in combination, in the diagnosis and treatment of subjects in need thereof. Accordingly, the various aspects of the present invention relate to polynucleotides encoding ERBB2 mutations and polymorphisms of the invention, expression vectors encoding the ERBB2 mutant polypeptides of the invention and organisms that express the ERBB2 mutant/polymorphic polynucleotides and/or ERBB2 mutant/polymorphic polypeptides of the invention. The various aspects of the present invention further relate to diagnostic/theranostic methods and kits that use the ERBB2 mutations and/or polymorphisms of the invention to identify individuals predisposed to disease or to classify individuals and tumours with regard to drug responsiveness, side effects, or optimal drug dose. In other aspects, the invention provides methods for compound validation and a computer system for storing and analyzing data related to the ERBB2 mutations and polymorphisms of the invention. Accordingly, various particular embodiments that illustrate these aspects follow.
[0017] Definitions. The definitions of certain terms as used in this specification are provided below. Definitions of other terms may be found in the glossary provided by the U.S. Department of Energy, Office of Science, Human Genome Project (http://www.ornl.gov/sci/techresources/Human_Genome/glossary/).
[0018] As used herein, the term "allele" means a particular form of a gene or DNA sequence at a specific chromosomal location (locus).
[0019] As used herein, the term "antibody" includes, but is not limited to, polyclonal antibodies, monoclonal antibodies, humanized or chimaeric antibodies and biologically functional antibody fragments sufficient for binding of the antibody fragment to the protein.
[0020] As used herein, the term "clinical response" means any or all of the following: a quantitative measure of the response, no response, and adverse response (i.e., side effects).
[0021] As used herein, the term "clinical trial" means any research study designed to collect clinical data on responses to a particular treatment, and includes but is not limited to phase I, phase II and phase III clinical trials. Standard methods are used to define the patient population and to enrol subjects.
[0022] As used herein, the term "effective amount" of a compound is a quantity sufficient to achieve a desired pharmacodynamic, toxicologic, therapeutic and/or prophylactic effect. An example is an amount that prevents or decreases the symptoms associated with a disease being treated, e.g., the diseases associated with the ERBB2 mutant polypeptides and ERBB2 mutant polynucleotides identified herein. The amount of compound administered to the subject will depend on the type, degree and severity of the disease and on the characteristics of the individual, such as general health, age, sex, body weight and tolerance to drugs. The skilled artisan will be able to determine appropriate dosages depending on these and other factors. Typically, an effective amount of the compounds of the present invention, sufficient for achieving a therapeutic or prophylactic effect, ranges from about 0.000001 mg per kilogram body weight per day to about 10,000 mg per kilogram body weight per day. Preferably, the dosage ranges from about 0.0001 mg per kilogram body weight per day to about 100 mg per kilogram body weight per day. The compounds of the present invention can also be administered in combination with each other, or with one or more additional therapeutic compounds.
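Multiplying the per-kilogram ranges above by body weight converts them into total daily amounts. The sketch below shows only that arithmetic for the preferred range. The 70 kg body weight and the helper name are hypothetical.

```python
def daily_dose_range_mg(body_weight_kg, low_mg_per_kg=0.0001, high_mg_per_kg=100.0):
    """Convert a mg/kg/day range into total mg/day for a given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg


# Hypothetical 70 kg body weight, preferred range of 0.0001 to 100 mg/kg/day.
low, high = daily_dose_range_mg(70)
print(f"{low:g} to {high:g} mg/day")  # 0.007 to 7000 mg/day
```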
[0023] Glivec® (Gleevec®; imatinib) is a medication for chronic myeloid leukaemia (CML) and certain stages of gastrointestinal stromal tumours (GIST). It targets and interferes with the molecular abnormalities that drive the growth of cancer cells. Corless CL et al., J. Clin. Oncol. 22(18):3813-25 (September 15, 2004); Verweij J et al., Lancet 364(9440):1127-34 (September 25, 2004); Kantarjian HM et al., Blood 104(7):1979-88 (October 1, 2004). By inhibiting multiple targets, Glivec® has potential as an anticancer therapy for several types of cancer, including leukaemia and solid tumours.
[0024] The aromatase inhibitor FEMARA® is a treatment for advanced breast cancer in postmenopausal women. It blocks the use of oestrogen by certain types of breast cancer that require oestrogen to grow. Janicke F, Breast 13 Suppl 1:S10-8 (December 2004); Mouridsen H et al., Oncologist 9(5):489-96 (2004).
[0025] Sandostatin® LAR® is used to treat patients with acromegaly and to control symptoms, such as severe diarrhoea and flushing, in patients with functional gastro-entero-pancreatic (GEP) tumours (e.g., metastatic carcinoid tumours and vasoactive intestinal peptide-secreting tumours [VIPomas]). Oberg K, Chemotherapy 47 Suppl 2:40-53 (2001); Raderer M et al., Oncology 60(2):141-5 (2001); Aparicio T et al., Eur. J. Cancer 37(8):1014-9 (May 2001). Sandostatin® LAR® regulates hormones in the body to help manage diseases and their symptoms.
[0026] ZOMETA® is a treatment for hypercalcaemia of malignancy (HCM) and for bone metastases across a broad range of tumour types, including multiple myeloma, prostate cancer, breast cancer, lung cancer, renal cancer and other solid tumours. Rosen LS et al., Cancer 100(12):2613-21 (June 15, 2004).
[0027] Vatalanib (1-[4-chloroanilino]-4-[4-pyridylmethyl]phthalazine succinate) is a multi-VEGF receptor (VEGFR) inhibitor that may block the formation of new blood vessels and thereby prevent tumour growth. This compound inhibits all known VEGF receptor tyrosine kinases, blocking angiogenesis and lymphangiogenesis. Drevs J et al., Cancer Res. 60:4819-4824 (2000); Wood JM et al., Cancer Res. 60:2178-2189 (2000). Vatalanib is being studied in two large, multinational, randomized, placebo-controlled phase III trials in combination with FOLFOX-4 in first-line and second-line treatment of patients with metastatic colorectal cancer. Thomas A et al., 37th Annual Meeting of the American Society of Clinical Oncology, San Francisco, CA, Abstract 279 (May 12-15, 2001).
[0028] The orally bioavailable rapamycin derivative everolimus inhibits oncogenic signalling in tumour cells. By blocking signalling mediated by the mammalian target of rapamycin (mTOR), everolimus exhibits broad antiproliferative activity in tumour cell lines and animal models of cancer. Boulay A et al., Cancer Res. 64:252-261 (2004). In preclinical studies, everolimus also directly and potently inhibited the proliferation of human umbilical vein endothelial cells, indicating an involvement in angiogenesis. By blocking both tumour cell proliferation and angiogenesis, everolimus may provide a clinical benefit to patients with cancer. Everolimus is being investigated for its antitumour properties in a number of clinical studies in patients with haematological and solid tumours. Huang S & Houghton PJ, Curr. Opin. Investig. Drugs 3:295-304 (2002).
[0029] Gimatecan is a novel oral inhibitor of topoisomerase I (topo I). Gimatecan blocks division of rapidly dividing cells, such as cancer cells, and thereby activates apoptosis. Preclinical data indicate that gimatecan is not a substrate for multidrug resistance pumps and that it increases the drug-target interaction. De Cesare M et al., Cancer Res. 61:7189-7195 (2001). Phase I clinical studies indicate that the dose-limiting toxicity of gimatecan is myelosuppression.
[0030] Patupilone is a microtubule stabilizer. Altmann K-H, Curr. Opin. Chem. Biol. 5:424-431 (2001); Altmann K-H et al., Biochim. Biophys. Acta 470:M79-M91 (2000); O'Neill V et al., 36th Annual Meeting of the American Society of Clinical Oncology, May 19-23, 2000, New Orleans, LA, Abstract 829; Calvert PM et al., Proceedings of the 11th National Cancer Institute-European Organization for Research and Treatment of Cancer/American Association for Cancer Research Symposium on New Drugs in Cancer Therapy, November 7-10, 2000, Amsterdam, The Netherlands, Abstract 575. Patupilone blocked mitosis and induced apoptosis more potently than the widely used anticancer drug paclitaxel. Patupilone also retained full activity against human cancer cells that were resistant to paclitaxel and other chemotherapeutic agents.
[0031] Midostaurin is an inhibitor of multiple signalling proteins. By targeting specific receptor tyrosine kinases and components of several signal transduction pathways, midostaurin affects several targets involved in cell growth (e.g., KIT, PDGFR, PKC), leukaemic cell proliferation (e.g., FLT3) and angiogenesis (e.g., VEGFR2). Weisberg E et al., Cancer Cell 1:433-443 (2002); Fabbro D et al., Anticancer Drug Des. 15:17-28 (2000). In preclinical studies, midostaurin showed broad antiproliferative activity against various tumour cell lines, including those that were resistant to several other chemotherapeutic agents.
[0032] The somatostatin analogue pasireotide is a stable cyclohexapeptide with broad somatotropin release-inhibiting factor (SRIF) receptor binding. Bruns C et al., Eur. J. Endocrinol. 146(5):707-16 (May 2002); Weckbecker G et al., Endocrinology 143(10):4123-30 (October 2002); Oberg K, Chemotherapy 47 Suppl 2:40-53 (2001).
[0033] LBH589 is a histone deacetylase (HDAC) inhibitor. By blocking the deacetylase activity of HDAC, HDAC inhibitors activate transcription of critical genes that cause apoptosis (programmed cell death). By triggering apoptosis, LBH589 induces growth inhibition and regression in tumour cell lines. LBH589 is being tested in phase I clinical trials as an anticancer agent. See also George P et al., Blood 105(4):1768-76 (February 15, 2005).
[0034] AEE788 inhibits multiple receptor tyrosine kinases, including EGFR, HER2 and VEGFR, which stimulate tumour cell growth and angiogenesis. Traxler P et al., Cancer Res. 64:4931-4941 (2004). In preclinical studies, AEE788 showed high target specificity and demonstrated antiproliferative effects against tumour cell lines and in animal models of cancer. AEE788 also exhibited direct antiangiogenic activity. AEE788 is currently in phase I clinical development.
[0035] AMN107 is an oral tyrosine kinase inhibitor that targets BCR-ABL, KIT and PDGFR. Preclinical studies in cellular assays using Philadelphia chromosome-positive (Ph+) CML cells have shown that AMN107 is highly potent and highly selective for BCR-ABL, KIT and PDGFR. Weisberg E et al., Cancer Cell 7(2):129-41 (February 2005); O'Hare T et al., Cancer Cell 7(2):117-9 (February 2005). AMN107 also shows activity against mutated variants of BCR-ABL. AMN107 is currently being studied in phase I clinical trials.
[0036] As used herein, the term "ERBB2 modulating agent" means any compound that alters (e.g., increases or decreases) the expression level or biological activity level of ERBB2 polypeptide compared to the expression level or biological activity level of ERBB2 polypeptide in the absence of the ERBB2 modulating agent. An ERBB2 modulating agent can be a small molecule, antibody, polypeptide, carbohydrate, lipid, nucleotide, or combination thereof. The ERBB2 modulating agent can be an organic compound or an inorganic compound. Preferably, the ERBB2 modulating agent is selected from the group consisting of: AEE788, lapatinib (GW572016), HKI-272, PD158780, PKI-166, AG879, TAK165, CI-1033, CP-654577, AG825, BMS-599626, EKB-569, PD153035, SU11925, ZM 252868, CP 127,374, SUC102, pertuzumab and trastuzumab.
[0037] As used herein, "expression" includes but is not limited to one or more of the following: transcription of the gene into precursor mRNA; splicing and other processing of the precursor mRNA to produce mature mRNA; mRNA stability; translation of the mature mRNA into protein (including codon usage and tRNA availability); and glycosylation and/or other modifications of the translation product, if required for proper expression and function. [0038] As used herein, the term "gene" means a segment of DNA that contains all the information for the regulated biosynthesis of an RNA product, including promoters, exons, introns, and other untranslated regions that control expression.
[0039] As used herein, the term "genotype" means an unphased 5' to 3' sequence of nucleotide pairs found at one or more polymorphic or mutant sites in a locus on a pair of homologous chromosomes in an individual. As used herein, genotype includes a full-genotype and/or a sub-genotype.
[0040] As used herein, the term "locus" means a location on a chromosome or DNA molecule corresponding to a gene or a physical or phenotypic feature.
[0041] The terms "modulate" and "modify" are used interchangeably herein and refer to the up-regulation or down-regulation of a target gene or a target protein. The terms "modifies" and "modified" also refer to the increase, decrease, elevation or depression of processes or signal transduction cascades involving a target gene or a target protein (e.g., a cascade or pathway that induces growth arrest in a cell). A target gene can be a gene involved in apoptosis. The target gene can also encode a target protein that is involved in apoptosis. Modification of the target protein, e.g., an ERBB2 protein, may occur when an ERBB2 modulating agent such as AEE788, lapatinib (GW572016), HKI-272, PD158780, PKI-166, AG879, TAK165, CI-1033, CP-654577, AG825, BMS-599626, EKB-569, PD153035, SU11925, ZM 252868, CP 127,374, SUC102, pertuzumab and/or trastuzumab binds to the target protein. The modification may directly affect the ERBB2 protein, for example, modifications that result in an alteration in ERBB2 protein expression (i.e., an increase or decrease). Alternatively, the modifications may occur as an indirect effect of binding to the target protein; for example, binding of an ERBB2 modulating agent may lead to a change in downstream processes involving ERBB2, such as activation of signal transduction pathways involving apoptosis and cell proliferation. The modifications can therefore be direct modifications of the target protein, or indirect modifications of a process or cascade involving the target protein. Non-limiting examples of modifications include modifications of morphological and functional processes, and under- or over-production or expression of proteins that, e.g., inhibit cell proliferation, cell activity, cell migration, chemotaxis and cell tumourigenicity.
[0042] As used herein, the term "mutant" means any heritable or acquired variation from the wild-type that alters the nucleotide sequence thereby changing the protein sequence. The term "mutant" is used interchangeably with the terms "marker", "biomarker", and "target" throughout the specification.
[0043] As used herein, the term "medical condition" includes, but is not limited to, any condition or disease manifested as one or more physical and/or psychological symptoms for which treatment and/or prevention is desirable, and includes previously and newly identified diseases and other disorders.
[0044] As used herein, the term "nucleotide pair" means the two nucleotides bound to each other between the two nucleotide strands.
[0045] As used herein, the term "polymorphic site" means a position within a locus at which at least two alternative sequences are found in a population, the most frequent of which has a frequency of no more than 99%.
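The 99% threshold in this definition can be checked directly from allele observations at a site. The sketch below is a minimal Python illustration, assuming one allele call per chromosome; the function name and the example counts are hypothetical.

```python
from collections import Counter


def is_polymorphic_site(alleles: list[str], max_major_freq: float = 0.99) -> bool:
    """Return True if at least two alleles are observed and the most
    frequent one has a frequency of no more than max_major_freq."""
    if not alleles:
        return False
    counts = Counter(alleles)
    if len(counts) < 2:
        return False  # only one sequence observed: not polymorphic
    major_count = counts.most_common(1)[0][1]
    return major_count / len(alleles) <= max_major_freq


if __name__ == "__main__":
    # Hypothetical allele calls, one per chromosome.
    print(is_polymorphic_site(["G"] * 99 + ["A"]))   # True  (major allele 99%)
    print(is_polymorphic_site(["G"] * 999 + ["A"]))  # False (major allele 99.9%)
```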
[0046] As used herein, the term "polymorphism" means any sequence variant present at a frequency of >1% in a population. The sequence variant may be present at a frequency significantly greater than 1%, such as 5% or 10% or more. The term may also be used to refer to the sequence variation observed in an individual at a polymorphic site. Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites and may, but need not, result in detectable differences in gene expression or protein function.
[0047] As used herein, the term "polynucleotide" means any RNA or DNA, which may be unmodified or modified RNA or DNA. Polynucleotides include, without limitation, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions. In addition, polynucleotide refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The term polynucleotide also includes DNAs or RNAs containing one or more modified bases and DNAs or RNAs with backbones modified for stability or for other reasons. In a particular embodiment, the polynucleotide contains polynucleotide sequences from the ERBB2 gene.
[0048] As used herein, the term "polypeptide" means any polypeptide comprising two or more amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres. Polypeptide refers to both short chains, commonly referred to as peptides, glycopeptides or oligomers, and to longer chains, generally referred to as proteins. Polypeptides may contain amino acids other than the 20 gene-encoded amino acids. Polypeptides include amino acid sequences modified either by natural processes, such as post-translational processing, or by chemical modification techniques that are well known in the art. Such modifications are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature. In a particular embodiment, the polypeptide contains polypeptide sequences from the ERBB2 protein.
[0049] As used herein, the term "small molecule" means a composition that has a molecular weight of preferably less than about 5 kDa, and more preferably less than about 2 kDa. Small molecules can be, e.g., nucleic acids, peptides, polypeptides, glycopeptides, peptidomimetics, carbohydrates, lipids, lipopolysaccharides, combinations of these, or other organic or inorganic molecules.
[0050] As used herein, the term "mutant nucleic acid" means a nucleic acid sequence, which comprises a nucleotide that is variable within an otherwise identical nucleotide sequence between individuals or groups of individuals, thus existing as alleles. Such mutant nucleic acids are preferably from about 15 to about 500 nucleotides in length. The mutant nucleic acids may be part of a chromosome, or they may be an exact copy of a part of a chromosome, e.g., by amplification of such a part of a chromosome through PCR or through cloning. The mutant probes according to the invention are oligonucleotides that are complementary to a mutant nucleic acid.
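A probe that is complementary to a mutant nucleic acid is, written 5' to 3', the reverse complement of that sequence, and a variant at 0-based position i of a sequence of length L sits at position L - 1 - i on the reverse-complement strand. The Python sketch below illustrates this with the mutated exon 9 sequence listed in TABLE 1 (SEQ ID NO:02); the helper names are hypothetical, and the sketch ignores practical probe-design criteria such as melting temperature.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_index(sense_index: int, length: int) -> int:
    """0-based position, on the reverse complement, of the base paired with sense_index."""
    if not 0 <= sense_index < length:
        raise IndexError("position outside sequence")
    return length - 1 - sense_index


if __name__ == "__main__":
    mutant = "CAAGAAGATCCTTGGGAGCCTG"  # SEQ ID NO:02 (TTT>CTT, F371L)
    probe = reverse_complement(mutant)
    site = antisense_index(10, len(mutant))  # the variant C is at sense index 10
    print(probe, site, probe[site])  # CAGGCTCCCAAGGATCTTCTTG 11 G
```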
[0051] As used herein, the term "SNP nucleic acid" means a nucleic acid sequence, which comprises a nucleotide that is variable within an otherwise identical nucleotide sequence between individuals or groups of individuals, thus, existing as alleles. Such SNP nucleic acids are preferably from about 15 to about 500 nucleotides in length. The SNP nucleic acids may be part of a chromosome, or they may be an exact copy of a part of a chromosome, e.g., by amplification of such a part of a chromosome through PCR or through cloning. The SNP nucleic acids are referred to hereafter simply as "SNPs". The SNP probes according to the invention are oligonucleotides that are complementary to a SNP nucleic acid. In a particular embodiment, the SNP is in the ERBB2 gene.
[0052] As used herein, the term "subject" preferably means a mammal, such as a human, but the subject can also be another animal, e.g., domestic animals (e.g., dogs, cats and the like), farm animals (e.g., cows, sheep, pigs, horses and the like) and laboratory animals (e.g., monkeys (e.g., cynomolgus monkeys), rats, mice, guinea pigs and the like).
[0053] As used herein, the administration of an agent or drug to a subject or patient includes self-administration and administration by another. It is also to be appreciated that the various modes of treatment or prevention of medical conditions as described are intended to mean "substantial", which includes total but also less than total treatment or prevention, and wherein some biologically or medically relevant result is achieved.
[0054] The details of one or more embodiments of the invention are set forth in the accompanying description below. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. Other features, objects, and advantages of the invention will be apparent from the description and the claims. In the specification and the appended claims, the singular forms include plural referents unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All references cited herein are incorporated herein by reference in their entireties and for all purposes to the same extent as if each individual publication, patent, or patent application was specifically and individually incorporated by reference in its entirety for all purposes.
ERBB2 Mutations and Polymorphisms of the Invention
[0055] To investigate ERBB2 mutations associated with breast cancer, DHPLC analysis (Lilleberg SL, Curr. Opin. Drug Discov. Devel. 6(2):237-52 (2003)) was conducted on 45 tumour tissue samples from breast cancer patients. Six (6) SNPs and six (6) mutations of ERBB2 were identified, as summarized in TABLE 1 and TABLE 2 below.
Computational analyses were designed to evaluate the effect of these mutations on ERBB2 function.
TABLE 1
ERBB2 Mutations Identified in Breast Cancer Patients

| Location | Mutation/SNP | Allelic Frequency | Unmutated Sequence | Mutated Sequence |
|---|---|---|---|---|
| Exon 9 | TTT>CTT (F371L) | 0.3 | CAAGAAGATCTTTGGGAGCCTG (SEQ ID NO:01) | CAAGAAGATCCTTGGGAGCCTG (SEQ ID NO:02) |
| 21 and 27 bp from 5' SA of exon 9 | CCCTTCCC>TCCTTCCT | Het | GACGGCCCCTTCCCCACCCAC (SEQ ID NO:03) | GACGGCTCCTTCCTCACCCAC (SEQ ID NO:04) |
| 14 bp 5' intron of exon 9 | G>A | Het | CAGGTCATCGTGCCCACTCT (SEQ ID NO:05) | CAGGTCATCATGCCCACTCT (SEQ ID NO:06) |
| Exon 12 | TGC>AGC (C475S) | 0.5 | CCCACCTCTGCTTCG (SEQ ID NO:07) | CCCACCTCAGCTTCG (SEQ ID NO:08) |
| Exon 13/14 | CCA>CCG (P523P) | Het | CTGGGGTCCAGGGCCCACCC (SEQ ID NO:09) | CTGGGGTCCGGGGCCCACCC (SEQ ID NO:10) |
| Exon 13/14 | CCG>CCA (P562P) | Het | CACTGTTTGCCGTGCCACCCTG (SEQ ID NO:11) | CACTGTTTGCCATGCCACCCTG (SEQ ID NO:12) |
| Exon 21 | CGG>CAG (R868Q) | 0.25 | GGGCTGGCTCGGCTGCTGGACATT (SEQ ID NO:13) | GGGCTGGCTCAGCTGCTGGACATT (SEQ ID NO:14) |
| Exon 21 | CCC>TCC (P856S) | 0.2 | TGGTCAAGAGTCCCAACCATGT (SEQ ID NO:15) | TGGTCAAGAGTTCCAACCATGT (SEQ ID NO:16) |
| Exon 21 | GCT>TCT (A848S) | 0.15 | GGACTTGGCCGCTCGGAACGTG (SEQ ID NO:17) | GGACTTGGCCTCTCGGAACGTG (SEQ ID NO:18) |
| Exon 21 | GAC>AAC (D873N) | 0.1 | GCTGGACATTGACGAGACAGAGT (SEQ ID NO:19) | GCTGGACATTAACGAGACAGAGT (SEQ ID NO:20) |
| Exon 25 | GGG>AGG (G1015R) | Het | TGACATGGGGGACCTGGTG (SEQ ID NO:21) | TGACATGGAGGACCTGGTG (SEQ ID NO:22) |
| Exon 27 | CCC>GCC (P1170A) | Het | CTGGAAAGGCCCAAGACTCTC (SEQ ID NO:23) | CTGGAAAGGGCCAAGACTCTC (SEQ ID NO:24) |

* hom: homozygous; het: heterozygous; SA: splice acceptor site.

[0056] As shown above in TABLE 1 and further summarized below in TABLE 2, six (6) SNPs and six (6) mutations of ERBB2 were identified in the present invention.
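The variant positions in TABLE 1 can be recovered by comparing each unmutated sequence with its mutated counterpart base by base. A minimal Python sketch, assuming equal-length sequences that differ only by substitutions (true for every pair listed in TABLE 1); the function name is hypothetical:

```python
def variant_positions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """List (1-based position, ref base, alt base) for every differing base."""
    if len(ref) != len(alt):
        raise ValueError("sequences differ in length; only substitutions are handled")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


if __name__ == "__main__":
    # SEQ ID NO:01 vs NO:02 (exon 9, TTT>CTT)
    print(variant_positions("CAAGAAGATCTTTGGGAGCCTG", "CAAGAAGATCCTTGGGAGCCTG"))
    # [(11, 'T', 'C')]
    # SEQ ID NO:03 vs NO:04 (intronic CCCTTCCC>TCCTTCCT)
    print(variant_positions("GACGGCCCCTTCCCCACCCAC", "GACGGCTCCTTCCTCACCCAC"))
    # [(7, 'C', 'T'), (14, 'C', 'T')]
```

For SEQ ID NOs: 03 and 04 the comparison finds two C>T changes, consistent with the CCCTTCCC>TCCTTCCT entry.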
TABLE 2
ERBB2 Mutations in Breast Cancer Patients

| Gene | Cancer | Nucleotide Change | Amino Acid Change / Location | Mutation/SNP | Allele Freq. | Observations |
|---|---|---|---|---|---|---|
| ERBB2 | Breast | TTT>CTT | F371L | Mutation | 0.3 | 1 |
| ERBB2 | Breast | CCCTTCCC>TCCTTCCT | 21 and 27 bp from 5' SA of exon 9 | SNP | Het | 1 |
| ERBB2 | Breast | G>A | 14 bp 5' intron of exon 9 | SNP | Het | 1 |
| ERBB2 | Breast | TGC>AGC | C475S | Mutation | 0.5 | 1 |
| ERBB2 | Breast | CCA>CCG | P523P | SNP | Het | 1 |
| ERBB2 | Breast | CCG>CCA | P562P | SNP | Het | 1 |
| ERBB2 | Breast | CGG>CAG | R868Q | Mutation | 0.25 | 1 |
| ERBB2 | Breast | CCC>TCC | P856S | Mutation | 0.2 | 1 |
| ERBB2 | Breast | GCT>TCT | A848S | Mutation | 0.15 | 1 |
| ERBB2 | Breast | GAC>AAC | D873N | Mutation | 0.1 | 1 |
| ERBB2 | Breast | GGG>AGG | G1015R | SNP | Het | 3 |
| ERBB2 | Breast | CCC>GCC | P1170A | SNP | Het | 23 |

* hom: homozygous; het: heterozygous; SA: splice acceptor site.
[0057] The mutations were located in the human ERBB2 gene (protein reference sequence NP_004439). Bioinformatics analysis of the ERBB2 mutations of the invention is further detailed in EXAMPLE I.
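Whether each coding change in TABLE 2 is synonymous or alters the encoded amino acid can be checked against the standard genetic code. The Python sketch below builds the standard codon table and classifies a codon substitution; it is an illustration only, assumes the codons are given in frame on the sense strand, and does not distinguish germline SNPs from somatic mutations. The function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> tuple[str, str, str]:
    """Return (ref amino acid, alt amino acid, effect); '*' denotes a stop codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return ref_aa, alt_aa, effect


if __name__ == "__main__":
    for ref, alt in [("TTT", "CTT"), ("TGC", "AGC"), ("CCA", "CCG"), ("GGG", "AGG")]:
        print(ref, alt, classify_codon_change(ref, alt))
```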
[0058] Identification of ERBB2 Mutations and Polymorphisms of the Invention in Human Cancers. Identification and Characterization of Gene Sequence Variation. Sequence variation in the human germline consists primarily of SNPs; the remainder consists of short tandem repeats (including microsatellites), long tandem repeats (minisatellites), and other insertions and deletions. A SNP is the occurrence of nucleotide variability at a single position in the genome, at which two alternative bases occur at appreciable frequency (i.e., >1%) in the human population. A SNP may occur within a gene or within intergenic regions of the genome.
[0059] Because of their prevalence and widespread distribution, SNPs have the potential to be important tools for locating genes that are involved in human disease conditions. See, e.g., Wang et al., Science 280:1077-1082 (1998).
[0060] An association between SNPs and/or mutations and a particular phenotype (e.g., cancer type) does not necessarily indicate or require that the SNP or mutation causes the phenotype. Instead, an association with a SNP may merely be due to genomic proximity between the SNP and the genetic factors actually responsible for a given phenotype, such that the SNP and those genetic factors are closely linked. That is, a SNP may be in linkage disequilibrium ("LD") with the "true" functional variant. LD exists when alleles at two distinct locations in the genome are associated more often than expected by chance. Thus, a SNP may serve as a marker that has value by virtue of its proximity to a mutation or other DNA alteration (e.g., gene duplication) that causes a particular phenotype.
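The specification does not state how LD is measured; a common choice in population genetics is the coefficient D = p(AB) - p(A)p(B), together with its normalized forms D' and r². The Python sketch below computes these from counts of the four two-locus haplotypes; the function name and the example counts are hypothetical.

```python
def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float, float]:
    """Return (D, D', r^2) from counts of the four two-locus haplotypes.

    A/a are the alleles at the first site, B/b the alleles at the second.
    D' keeps the sign of D; its absolute value is often what gets reported.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes counted")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A
    p_B = (n_AB + n_aB) / n  # frequency of allele B
    d = p_AB - p_A * p_B     # deviation from independence
    # D' scales D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_measures(50, 0, 0, 50))    # complete LD: (0.25, 1.0, 1.0)
    print(ld_measures(25, 25, 25, 25))  # independence: (0.0, 0.0, 0.0)
```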
[0061] SNPs and mutations that are associated with disorders may also have a direct effect on the function of the genes in which they are located. For example, a sequence variant (e.g., a SNP) may result in an amino acid change or may alter exon-intron splicing, thereby directly modifying the relevant protein, or it may lie in a regulatory region, altering the cycle of expression or the stability of the mRNA (see, e.g., Nowotny et al., Current Opinion in Neurobiology 11:637-641 (2001)).
[0062] In describing the polymorphic and mutant sites of the invention, reference is made to the sense strand of the gene for convenience. As recognized by the skilled artisan, however, nucleic acid molecules containing the gene may be complementary double-stranded molecules, and thus reference to a particular site on the sense strand also refers to the corresponding site on the complementary antisense strand. That is, reference may be made to the same polymorphic or mutant site on either strand, and an oligonucleotide may be designed to hybridize specifically to either strand at a target region containing the polymorphic and/or mutant site. Thus, the invention also includes single-stranded polynucleotides and mutations that are complementary to the sense strand of the genomic variants described herein.
[0063] Identification and Characterization of SNPs and Mutations. Many different techniques can be used to identify and characterize SNPs and mutations, including single-strand conformation polymorphism (SSCP) analysis, heteroduplex analysis by denaturing high-performance liquid chromatography (DHPLC), direct DNA sequencing and computational methods (Shi et al., Clin. Chem. 47:164-172 (2001)). Public databases contain a wealth of sequence information, and computational tools can identify SNPs in silico by aligning independently submitted sequences (either cDNA or genomic sequences) for a given gene. The most common SNP-typing methods currently include hybridization, primer extension and cleavage methods. Each of these methods must be connected to an appropriate detection system. Detection technologies include fluorescence polarization (Chan et al., Genome Res. 9:492-499 (1999)), luminometric detection of pyrophosphate release (pyrosequencing) (Ahmadian et al., Anal. Biochem. 280:103-10 (2000)), fluorescence resonance energy transfer (FRET)-based cleavage assays, DHPLC, and mass spectrometry (Shi, Clin. Chem. 47:164-172 (2001); U.S. Pat. No. 6,300,076 B1).
Other methods of detecting and characterizing SNPs and mutations are those disclosed in U.S. Pat. Nos. 6,297,018 B1 and 6,300,063 B1.
[0064] In a particularly preferred embodiment, polymorphisms and mutations are detected using INVADER™ technology (available from Third Wave Technologies Inc., Madison, Wisconsin, USA). In this assay, a specific upstream "invader" oligonucleotide and a partially overlapping downstream probe together form a specific structure when bound to a complementary DNA template. This structure is recognized and cut at a specific site by the Cleavase enzyme, releasing the 5' flap of the probe oligonucleotide. This fragment then serves as the "invader" oligonucleotide with respect to synthetic secondary targets and secondary fluorescently labelled signal probes contained in the reaction mixture, resulting in specific cleavage of the secondary signal probes by the Cleavase enzyme. A fluorescent signal is generated when this secondary probe (labelled with dye molecules capable of fluorescence resonance energy transfer) is cleaved. Cleavases have stringent requirements relative to the structure formed by the overlapping DNA sequences or flaps and can, therefore, be used to specifically detect single base pair mismatches immediately upstream of the cleavage site on the downstream DNA strand. Ryan D et al., Molecular Diagnosis 4(2):135-144 (1999); Lyamichev V et al., Nature Biotechnology 17:292-296 (1999); see also U.S. Pat. Nos. 5,846,717 and 6,001,567.
[0065] The identity of polymorphisms and mutations may also be determined using a mismatch detection technique, including, but not limited to, the RNase protection method using riboprobes (Winter et al., Proc. Natl. Acad. Sci. USA 82:7575 (1985); Meyers et al., Science 230:1242 (1985)) and proteins that recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich P, Ann. Rev. Genet. 25:229-253 (1991)).
Alternatively, variant alleles can be identified by single-strand conformation polymorphism (SSCP) analysis (Orita et al., Genomics 5:874-879 (1989); Humphries et al., in Molecular Diagnosis of Genetic Diseases, Elles R, ed. (1996) pp. 321-340) or denaturing gradient gel electrophoresis (DGGE) (Wartell et al., Nucl. Acids Res. 18:2699-2706 (1990); Sheffield et al., Proc. Natl. Acad. Sci. USA 86:232-236 (1989)). A polymerase-mediated primer extension method may also be used to identify the polymorphisms/mutations. Several such methods have been described in the patent and scientific literature and include the "Genetic Bit Analysis" method (WO 92/15712) and the ligase/polymerase-mediated genetic bit analysis (U.S. Pat. No. 5,679,524). Related methods are disclosed in WO 91/02087, WO 90/09455, WO 95/17676, and U.S. Pat. Nos. 5,302,509 and 5,945,283. Extended primers containing a polymorphism or mutation may be detected by mass spectrometry as described in U.S. Pat. No. 5,605,798. Another primer extension method is allele-specific PCR. Ruaño et al., Nucl. Acids Res. 17:8392 (1989); Ruaño et al., Nucl. Acids Res. 19:6877-6882 (1991); WO 93/22456; Turki et al., J. Clin. Invest. 95:1635-1641 (1995). In addition, multiple polymorphic and/or mutant sites may be investigated by simultaneously amplifying multiple regions of the nucleic acid using sets of allele-specific primers as described in WO 89/10414.
[0066] Haplotyping and Genotyping Oligonucleotides. The invention provides methods and compositions for haplotyping and/or genotyping the genetic polymorphisms (and possibly mutations) in an individual. As used herein, the terms "genotype" and "haplotype" mean the genotype or haplotype containing the nucleotide pair or nucleotide, respectively, that is present at one or more of the novel polymorphic (or mutant) sites described herein, and may optionally also include the nucleotide pair or nucleotide present at one or more additional polymorphic (or mutant) sites in the gene. The additional polymorphic (and mutant) sites may be currently known polymorphic/mutant sites or sites that are subsequently discovered.
[0067] The compositions contain oligonucleotide probes and primers designed to specifically hybridize to one or more target regions containing, or adjacent to, a polymorphic or mutant site. Oligonucleotide compositions of the invention are useful in methods for genotyping and/or haplotyping a gene in an individual. The methods and compositions for establishing the genotype or haplotype of an individual at the novel polymorphic/mutant sites described herein are useful for studying the effect of the polymorphisms and mutations on the aetiology of diseases affected by the expression and function of the protein, studying the efficacy of drugs targeting the gene product, predicting individual susceptibility to diseases affected by the expression and function of the protein, and predicting individual responsiveness to drugs targeting the gene product.
[0068] Some embodiments of the invention contain two or more differently labelled genotyping oligonucleotides, for simultaneously probing the identity of nucleotides at two or more polymorphic or mutant sites. It is also contemplated that primer compositions may contain two or more sets of allele-specific primer pairs to allow simultaneous targeting and amplification of two or more regions containing a polymorphic or mutant site. [0069] Genotyping oligonucleotides of the invention may be immobilized on or synthesized on a solid surface such as a microchip, bead, or glass slide (see, e.g., WO 98/20020 and WO 98/20019). Such immobilized genotyping oligonucleotides may be used in a variety of polymorphism and mutation detection assays, including but not limited to probe hybridization and polymerase extension assays. Immobilized genotyping oligonucleotides of the invention may comprise an ordered array of oligonucleotides designed to rapidly screen a DNA sample for polymorphisms and mutations in multiple genes at the same time. [0070] An allele-specific oligonucleotide primer of the invention has a 3' terminal nucleotide, or preferably a 3' penultimate nucleotide, that is complementary to only one nucleotide of a particular SNP, thereby acting as a primer for polymerase-mediated extension only if the allele containing that nucleotide is present. Allele-specific oligonucleotide (ASO) primers hybridizing to either the coding or noncoding strand are contemplated by the invention. An ASO primer for detecting gene polymorphisms and mutations can be developed using techniques known to those of skill in the art.
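The 3'-terminal and 3'-penultimate designs described above can be sketched as a simple sequence operation. The Python example below, an illustration only, builds one forward primer per allele from the sense-strand sequence of the exon 12 site in TABLE 1 (SEQ ID NO:07, T>A at 0-based index 8 within TGC>AGC). The function name, the 9-nt primer length (chosen to fit the short listed sequence) and the omission of melting-temperature and specificity checks are all assumptions.

```python
def allele_specific_primers(template: str, snp_index: int, alleles: tuple[str, ...],
                            length: int = 18, penultimate: bool = False) -> dict[str, str]:
    """Build one forward primer per allele from the sense-strand template.

    A forward primer has the sense-strand sequence and anneals to the antisense
    strand, so its allele base pairs only with the matching antisense base.
    penultimate=False: the allele is the 3'-terminal base.
    penultimate=True: the allele is second from the 3' end, and the template
    base just downstream of the SNP becomes the 3'-terminal base.
    """
    if length < (2 if penultimate else 1):
        raise ValueError("primer too short for this design")
    upstream = length - 2 if penultimate else length - 1
    start = snp_index - upstream
    if start < 0:
        raise ValueError("not enough sequence 5' of the SNP for this primer length")
    if penultimate and snp_index + 1 >= len(template):
        raise ValueError("no base 3' of the SNP for a penultimate design")
    tail = template[snp_index + 1] if penultimate else ""
    return {allele: template[start:snp_index] + allele + tail for allele in alleles}


if __name__ == "__main__":
    seq = "CCCACCTCTGCTTCG"  # SEQ ID NO:07
    print(allele_specific_primers(seq, 8, ("T", "A"), length=9))
    # {'T': 'CCCACCTCT', 'A': 'CCCACCTCA'}
    print(allele_specific_primers(seq, 8, ("T", "A"), length=9, penultimate=True))
    # {'T': 'CCACCTCTG', 'A': 'CCACCTCAG'}
```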
[0071] Other genotyping oligonucleotides of the invention hybridize to a target region located one to several nucleotides downstream of one of the novel polymorphic or mutant sites identified herein. Such oligonucleotides are useful in polymerase-mediated primer extension methods for detecting one of the novel polymorphisms or mutations described herein, and therefore such genotyping oligonucleotides are referred to herein as "primer-extension oligonucleotides". In a preferred embodiment, the 3'-terminus of a primer-extension oligonucleotide is a deoxynucleotide complementary to the nucleotide located immediately adjacent to the polymorphic/mutant site.
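A primer-extension oligonucleotide of this kind can be sketched as follows, assuming the primer anneals to the sense strand immediately 3' (downstream) of the site, so that its 3' terminus pairs with the adjacent nucleotide and a polymerase then adds the base complementary to the site. The Python example uses the D873N sequences from TABLE 1 (SEQ ID NOs: 19 and 20); the helper names and the 8-nt primer length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extension_primer(sense: str, site: int, length: int) -> str:
    """Primer annealing to the sense strand just downstream (3') of site.

    The primer's 3'-terminal base pairs with sense[site + 1], the nucleotide
    immediately adjacent to the site.
    """
    region = sense[site + 1: site + 1 + length]
    if len(region) < length:
        raise ValueError("not enough downstream sequence")
    return reverse_complement(region)


def extended_base(sense: str, site: int) -> str:
    """Base a polymerase adds to the primer 3' end: complement of the site base."""
    return sense[site].translate(COMPLEMENT)


if __name__ == "__main__":
    ref = "GCTGGACATTGACGAGACAGAGT"  # SEQ ID NO:19
    alt = "GCTGGACATTAACGAGACAGAGT"  # SEQ ID NO:20 (G>A at 0-based index 10, D873N)
    print(extension_primer(ref, 10, 8))                    # TGTCTCGT
    print(extended_base(ref, 10), extended_base(alt, 10))  # C T
```

Because the primer binds downstream of the site, the same primer serves both alleles, and the identity of the single added base reports the genotype.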
[0072] Direct Genotyping Method of the Invention. One embodiment of a genotyping method of the invention involves isolating from an individual a nucleic acid mixture comprising at least one copy of the gene of interest and/or a fragment or flanking regions thereof, and determining the identity of the nucleotide pair at one or more of the polymorphic/mutant sites in the nucleic acid mixture. As will be readily understood by the skilled artisan, the two "copies" of a germline gene in an individual may be the same or may differ between the two alleles. In a particularly preferred embodiment, the genotyping method comprises determining the identity of the nucleotide pair at each polymorphic and mutant site.
[0073] Typically, the nucleic acid mixture is isolated from a biological sample taken from the individual, such as a blood, tumour or tissue sample. Suitable samples include whole blood, tumour tissue or any other tissue type, semen, saliva, tears, urine, faecal material, sweat, buccal smears, skin and hair. The nucleic acid mixture may be composed of genomic DNA, mRNA or cDNA and, in the latter two cases, the biological sample must be obtained from an organ in which the gene may be expressed. Furthermore, it will be understood by the skilled artisan that mRNA or cDNA preparations would not be used to detect polymorphisms or mutations located in introns or in 5' and 3' nontranscribed regions. If a gene fragment is isolated, it must usually contain the polymorphic and/or mutant sites to be genotyped. Exceptions can include mutations leading to truncation of the gene, in which a specific polymorphism may be lost. In these cases, the specific DNA alterations are determined by assessing the flanking sequences of the gene, which underscores the need to look specifically for both polymorphisms and mutations.
[0074] Direct Haplotyping Method of the Invention. One embodiment of the haplotyping method of the invention comprises isolating from an individual a nucleic acid molecule containing only one of the two copies of a gene of interest, or a fragment thereof, and determining the identity of the nucleotide at one or more of the polymorphic or mutant sites in that copy. The nucleic acid may be isolated using any method capable of separating the two copies of the gene or fragment. As will be readily appreciated by those skilled in the art, any individual clone will only provide haplotype information on one of the two gene copies present in an individual. If haplotype information is desired for the individual's other copy, additional clones will need to be examined. Typically, at least five clones should be examined to have more than a 90% probability of haplotyping both copies of the gene in an individual. In a particularly preferred embodiment, the nucleotide at each polymorphic or mutant site is identified.
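The "at least five clones" figure can be checked with a short calculation. Assuming that each clone independently carries either gene copy with probability 1/2 (an assumption; real cloning efficiencies may differ), the probability that n clones include both copies is 1 - 2(1/2)^n. This first exceeds 0.90 at n = 5 (0.9375), whereas four clones give only 0.875. The function names below are illustrative.

```python
def prob_both_copies(n_clones: int, p_first: float = 0.5) -> float:
    """Probability that n independent clones include both gene copies.

    Assumes each clone carries the first copy with probability `p_first`
    and the second copy otherwise.
    """
    if not 0.0 < p_first < 1.0:
        raise ValueError("p_first must be strictly between 0 and 1")
    if n_clones < 1:
        return 0.0
    # Complement of "every clone carries copy 1" or "every clone carries copy 2".
    return 1.0 - p_first ** n_clones - (1.0 - p_first) ** n_clones


def min_clones(target: float, p_first: float = 0.5) -> int:
    """Smallest number of clones whose probability of covering both copies exceeds `target`."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    while prob_both_copies(n, p_first) <= target:
        n += 1
    return n


print(min_clones(0.90), prob_both_copies(5))  # -> 5 0.9375
```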
[0075] In a preferred embodiment, a haplotype pair is determined for an individual by identifying the phased sequence of nucleotides at one or more of the polymorphic/mutant sites in each copy of the gene that is present in the individual. In a particularly preferred embodiment, the haplotyping method comprises identifying the phased sequence of nucleotides at each polymorphic/mutant site in each copy of the gene. When haplotyping both copies of the gene, the identifying step is preferably performed with each copy of the gene being placed in separate containers. However, if the two copies are labelled with different tags, or are otherwise separately distinguishable or identifiable, it is possible in some cases to perform the method in the same container. For example, if the first and second copies of the gene are labelled with different first and second fluorescent dyes, respectively, and an allele-specific oligonucleotide labelled with yet a third different fluorescent dye is used to assay the polymorphic/mutant sites, then detecting a combination of the first and third dyes would identify the polymorphism or mutation in the first gene copy, while detecting a combination of the second and third dyes would identify the polymorphism or mutation in the second gene copy.
[0076] In both the genotyping and haplotyping methods, the identity of a nucleotide (or nucleotide pair) at a polymorphic and/or mutant site may be determined by amplifying a target region containing the polymorphic and/or mutant sites directly from one or both copies of the gene, or fragments thereof, and sequencing the amplified regions by conventional methods. It will be readily appreciated by the skilled artisan that only one nucleotide will be detected at a polymorphic or mutant site in individuals who are homozygous at that site, while two different nucleotides will be detected if the individual is heterozygous for that site. The polymorphism or mutation may be identified directly, known as positive-type identification, or by inference, referred to as negative-type identification. For example, where a SNP is known to have guanine and cytosine alleles in a reference population, a site may be positively determined to be either guanine or cytosine for an individual homozygous at that site, or both guanine and cytosine if the individual is heterozygous at that site. Alternatively, the site may be negatively determined to be not guanine (and thus cytosine/cytosine) or not cytosine (and thus guanine/guanine).
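A minimal sketch of the two calling modes for a biallelic G/C site follows. It assumes that the detected nucleotides (or the allele shown to be absent) have already been read out by some assay and that the site's two alleles are known from a reference population. The function names are illustrative only.

```python
def positive_call(detected: set) -> str:
    """Positive-type identification: report the nucleotide(s) actually detected."""
    alleles = sorted(detected)
    if len(alleles) == 1:
        return f"{alleles[0]}/{alleles[0]}"  # homozygous: one nucleotide seen
    if len(alleles) == 2:
        return f"{alleles[0]}/{alleles[1]}"  # heterozygous: two nucleotides seen
    raise ValueError("expected one or two detected nucleotides")


def negative_call(site_alleles: tuple, excluded: str) -> str:
    """Negative-type identification at a biallelic site.

    If one known allele is shown to be absent, the individual is inferred
    to be homozygous for the other known allele.
    """
    if len(set(site_alleles)) != 2 or excluded not in site_alleles:
        raise ValueError("need two distinct known alleles, one of them `excluded`")
    other = next(a for a in site_alleles if a != excluded)
    return f"{other}/{other}"


print(positive_call({"G", "C"}), positive_call({"G"}), negative_call(("G", "C"), "G"))
# -> C/G G/G C/C
```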
[0077] Indirect Genotyping Method using Polymorphic and Mutation Sites in Linkage Disequilibrium with a Target Polymorphism or Mutation. In addition, the identity of the alleles present at any of the novel polymorphic/mutant sites of the invention may be indirectly determined by genotyping other polymorphic/mutant sites in linkage disequilibrium with those sites of interest. As described supra, two sites are said to be in linkage disequilibrium if the presence of a particular variant (polymorphism or mutation) at one site is indicative of the presence of another variant at a second site. See Stevens JC, Mol Diag. 4:309-317 (1999). Polymorphic and mutant sites in linkage disequilibrium with the polymorphic or mutant sites of the invention may be located in regions of the same gene or in other genomic regions. Genotyping of a polymorphic/mutant site in linkage disequilibrium with the novel polymorphic/mutant sites described herein may be performed by, but is not limited to, any of the above-mentioned methods for detecting the identity of the allele at a polymorphic/mutant site.
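The source does not specify how linkage disequilibrium is quantified. The sketch below uses the standard two-site statistics D, D' and r², and assumes that the four haplotype frequencies for alleles A/a at the first site and B/b at the second site are already known, for example from direct haplotyping. A high r² is what makes one site a usable proxy for the other.

```python
def ld_measures(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> dict:
    """Return D, D' and r^2 for two biallelic sites from haplotype frequencies."""
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A at site 1
    p_B = f_AB + f_aB  # frequency of allele B at site 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")
    d = f_AB - p_A * p_B  # deviation from independent association
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max  # signed; |D'| = 1 when at least one haplotype is absent
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Only AB and ab haplotypes observed: complete linkage disequilibrium.
print(ld_measures(0.5, 0.0, 0.0, 0.5))  # -> {'D': 0.25, 'D_prime': 1.0, 'r2': 1.0}
```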
[0078] Amplifying a Target Gene Region. The target regions may be amplified using any oligonucleotide-directed amplification method, including but not limited to polymerase chain reaction (PCR) (U.S. Pat. No. 4,965,188), ligase chain reaction (LCR) (Barany et al., Proc. Natl. Acad. Sci. USA 88:189-193 (1991); published PCT patent application WO 90/01069), and oligonucleotide ligation assay (OLA) (Landegren et al., Science 241:1077-1080 (1988)). Oligonucleotides useful as primers or probes in such methods should specifically hybridize to a region of the nucleic acid that contains or is adjacent to the polymorphic/mutant site. Typically, the oligonucleotides are between 10 and 35 nucleotides in length and preferably between 15 and 30 nucleotides in length. Most preferably, the oligonucleotides are 20 to 25 nucleotides long. The exact length of the oligonucleotide will depend on many factors that are routinely considered and practiced by the skilled artisan.
[0079] Other known nucleic acid amplification procedures may be used to amplify the target region, including transcription-based amplification systems (U.S. Pat. No. 5,130,238; EP 329,822; U.S. Pat. No. 5,169,766; published PCT patent application WO 89/06700) and isothermal methods (Walker et al., Proc. Natl. Acad. Sci. USA 89:392-396 (1992)).
[0080] A polymorphism or mutation in the target region may be assayed before or after amplification using one of several hybridization-based methods known in the art. Typically, allele-specific oligonucleotides are utilized in performing such methods. The allele-specific oligonucleotides may be used as differently labelled probe pairs, with one member of the pair showing a perfect match to one variant of a target sequence and the other member showing a perfect match to a different variant. In some embodiments, more than one polymorphic/mutant site may be detected at once using a set of allele-specific oligonucleotides or oligonucleotide pairs. Preferably, the members of the set have melting temperatures within 5°C, and more preferably within 2°C, of each other when hybridizing to each of the polymorphic or mutant sites being detected.
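The length and melting-temperature preferences in paragraphs [0078] and [0080] can be screened with a short script. This sketch estimates Tm with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. That rule is a rough approximation chosen here for illustration and is not specified by the source; nearest-neighbour thermodynamic models are more accurate. The probe sequences are hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Approximate Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo may contain only A, C, G and T")
    return 2.0 * at + 4.0 * gc


def check_probe_set(oligos, max_tm_spread=5.0, length_range=(15, 30)) -> dict:
    """Check lengths and Tm agreement for a set of allele-specific oligonucleotides."""
    lo, hi = length_range
    tms = [wallace_tm(o) for o in oligos]
    spread = max(tms) - min(tms)
    return {
        "lengths_ok": all(lo <= len(o) <= hi for o in oligos),
        "tms": tms,
        "tm_spread": spread,
        "tm_ok": spread <= max_tm_spread,
    }


# Hypothetical probe pair differing only at the variant base (A vs G).
pair = ["CTGAAGTCAACGTCTAGCTA", "CTGAAGTCAGCGTCTAGCTA"]
print(check_probe_set(pair, max_tm_spread=2.0))
```

The 5°C default mirrors the "preferably" threshold above. Passing `max_tm_spread=2.0` applies the "more preferably" threshold.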
[0081] Hybridizing Allele-Specific Oligonucleotide to a Target Gene. Hybridization of an allele-specific oligonucleotide to a target polynucleotide may be performed with both entities in solution, or such hybridization may be performed when either the oligonucleotide or the target polynucleotide is covalently or noncovalently affixed to a solid support. Attachment may be mediated, for example, by antibody-antigen interactions, poly-L-Lys, streptavidin or avidin-biotin, salt bridges, hydrophobic interactions, chemical linkages, UV cross-linking, baking, etc. Allele-specific oligonucleotides may be synthesized directly on the solid support or attached to the solid support subsequent to synthesis. Solid supports suitable for use in detection methods of the invention include substrates made of silicon, glass, plastic, paper and the like, which may be formed, for example, into wells (as in 96-well plates), slides, sheets, membranes, fibres, chips, dishes and beads. The solid support may be treated, coated or derivatised to facilitate the immobilization of the allele-specific oligonucleotide or target nucleic acid.
[0082] The genotype or haplotype for the gene of an individual may also be determined by hybridization of a nucleic acid sample containing one or both copies of the gene to nucleic acid arrays and subarrays such as those described in WO 95/11995. The arrays would contain a battery of allele-specific oligonucleotides representing each of the polymorphic or mutant sites to be included in the genotype or haplotype.
[0083] Determining Population Genotypes and Haplotypes and Correlating them with a Trait. The present invention provides a method for determining the frequency of a genotype or haplotype in a population. The method comprises determining the genotype or the haplotype for a gene present in each member of the population, wherein the genotype or haplotype comprises the nucleotide pair or nucleotide detected at one or more of the polymorphic sites in the gene and mutations identified in the region, and calculating the frequency at which the genotype or haplotype is found in the population. The population may be a reference population, a family population, a same sex population, a population group, or a trait population (e.g., a group of individuals exhibiting a trait of interest such as a medical condition or response to a therapeutic treatment).
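The frequency calculation in this paragraph amounts to counting. The sketch below assumes that each population member's genotype (or haplotype pair) has already been determined and is supplied as an unordered 2-tuple. The function name and data are illustrative.

```python
from collections import Counter


def genotype_frequencies(calls) -> dict:
    """Frequency of each unordered genotype (or haplotype pair) in a population.

    `calls` holds one 2-tuple per population member, e.g. ("G", "C") for a
    single site or ("GA", "CT") for a pair of multi-site haplotypes.
    """
    # Sort within each call so ("C", "G") and ("G", "C") are counted together.
    normalized = [tuple(sorted(call)) for call in calls]
    if not normalized:
        raise ValueError("population is empty")
    counts = Counter(normalized)
    return {genotype: n / len(normalized) for genotype, n in counts.items()}


population = [("G", "C"), ("C", "G"), ("G", "G"), ("C", "C")]
print(genotype_frequencies(population))
# -> {('C', 'G'): 0.5, ('G', 'G'): 0.25, ('C', 'C'): 0.25}
```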
[0084] In another aspect of the invention, frequency data for genotypes and/or haplotypes found in a reference population are used in a method for identifying an association between a trait and a genotype or a haplotype. The trait may be any detectable phenotype, including but not limited to cancer, susceptibility to a disease or response to a treatment. The method involves obtaining data on the frequency of the genotypes or haplotypes of interest in a reference population and comparing the data to the frequency of the genotypes or haplotypes in a population exhibiting the trait. Frequency data for one or both of the reference and trait populations may be obtained by genotyping or haplotyping each individual in the populations using one of the methods described above. The haplotypes for the trait population may be determined directly or, alternatively, by the predictive genotype to haplotype approach described above.
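The comparison of trait and reference frequencies can be sketched as a 2x2 test of carriers versus non-carriers. This example uses an uncorrected Pearson chi-square test with 1 degree of freedom, which is one reasonable choice but is not specified by the source; Fisher's exact test is preferable when counts are small. The optional `n_tests` argument applies the Bonferroni correction mentioned later in this section. All counts are hypothetical.

```python
import math


def association_test(trait_carriers, trait_total, ref_carriers, ref_total, n_tests=1):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table."""
    a, b = trait_carriers, trait_total - trait_carriers
    c, d = ref_carriers, ref_total - ref_carriers
    if min(a, b, c, d) < 0:
        raise ValueError("carrier counts cannot exceed population sizes")
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return {
        "trait_freq": a / (a + b),
        "ref_freq": c / (c + d),
        "chi2": chi2,
        "p": p,
        "p_bonferroni": min(1.0, p * n_tests),  # Bonferroni over n_tests sites
    }


# Genotype carried by 30/100 trait-population and 10/100 reference individuals.
print(association_test(30, 100, 10, 100, n_tests=10))
```

The association is in the direction described above only when `trait_freq` exceeds `ref_freq`.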
[0085] In preferred embodiments, the trait is susceptibility to a disease, severity of a disease, the staging of a disease or response to a drug. Such methods are applicable to developing diagnostic tests and therapeutic treatments for all pharmacogenetic applications where there is the potential for an association between a genotype and a treatment outcome, including efficacy measurements, pharmacodynamic (PD) measurements, pharmacokinetic (PK) measurements and side-effect measurements.
[0086] In another embodiment, the frequency data for the reference and/or trait populations are obtained by accessing previously determined frequency data, which may be in written or electronic form. For example, the frequency data may be present in a database that is accessible by a computer. Once the frequency data are obtained, the frequencies of the genotypes or haplotypes of interest in the reference and trait populations are compared. In a preferred embodiment, the frequencies of all genotypes and/or haplotypes observed in the populations are compared. If a particular genotype or haplotype for the gene is more frequent in the trait population than in the reference population by a statistically significant amount, then the trait is predicted to be associated with that genotype or haplotype.
[0087] In a preferred embodiment, the haplotype frequency data for different ethnogeographic groups are examined to determine whether they are consistent with Hardy-Weinberg equilibrium (Hartl DL et al., Principles of Population Genetics, 3rd Ed. (Sinauer Associates, Sunderland, MA, 1997)). Hardy-Weinberg equilibrium postulates that the frequency of finding the haplotype pair H1/H2 is
$$P_{H\text{-}W}(H_1/H_2) = \begin{cases} 2\,p(H_1)\,p(H_2) & \text{if } H_1 \neq H_2 \\ p(H_1)\,p(H_2) & \text{if } H_1 = H_2 \end{cases}$$
where p(H) denotes the frequency of haplotype H in the population group. A statistically significant difference between the observed and expected haplotype pair frequencies could be due to one or more factors, including significant inbreeding in the population group, strong selective pressure on the gene, sampling bias and/or errors in the genotyping process. If large deviations from Hardy-Weinberg equilibrium are observed in an ethnogeographic group, the number of individuals in that group can be increased to see whether the deviation is due to sampling bias. If a larger sample size does not reduce the difference between observed and expected haplotype pair frequencies, then one may wish to consider haplotyping the individual using a direct haplotyping method such as, for example, CLASPER System™ technology (U.S. Pat. No. 5,866,404), SMD, or allele-specific long-range PCR (Michalotos-Beloin et al., Nucl. Acids Res. 24:4841-4843 (1996)).
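The Hardy-Weinberg check can be sketched directly from the formula for P_H-W. This example estimates haplotype frequencies by counting the 2n haplotypes in n observed pairs, builds the expected pair frequencies (2p(H1)p(H2) for distinct haplotypes, p(H1)p(H2) for identical ones), and returns a chi-square statistic with k(k-1)/2 degrees of freedom for k haplotypes. The choice of a chi-square goodness-of-fit test and all names are assumptions. A p-value could then be taken from the chi-square distribution, for example with SciPy's `scipy.stats.chi2.sf`.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hw_expected(hap_freqs: dict) -> dict:
    """Expected haplotype-pair frequencies under Hardy-Weinberg equilibrium."""
    expected = {}
    for h1, h2 in combinations_with_replacement(sorted(hap_freqs), 2):
        p1, p2 = hap_freqs[h1], hap_freqs[h2]
        # 2 p(H1) p(H2) when H1 != H2; p(H1) p(H2) when H1 == H2.
        expected[(h1, h2)] = p1 * p2 if h1 == h2 else 2 * p1 * p2
    return expected


def hw_chi_square(observed: dict) -> tuple:
    """Return (chi-square, degrees of freedom) for observed haplotype-pair counts."""
    counts = Counter()
    for pair, count in observed.items():
        counts[tuple(sorted(pair))] += count
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no observations")
    hap_counts = Counter()
    for (h1, h2), count in counts.items():
        hap_counts[h1] += count
        hap_counts[h2] += count
    freqs = {h: c / (2 * n) for h, c in hap_counts.items()}  # 2n haplotypes in total
    chi2 = 0.0
    for pair, e in hw_expected(freqs).items():
        expected_count = n * e
        chi2 += (counts.get(pair, 0) - expected_count) ** 2 / expected_count
    k = len(freqs)
    return chi2, k * (k - 1) // 2


print(hw_chi_square({("H1", "H1"): 50, ("H2", "H2"): 50}))  # no heterozygous pairs
```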
[0088] In one embodiment of this method for predicting a haplotype pair, the assigning step involves performing the following analysis. First, each of the possible haplotype pairs is compared to the haplotype pairs in the reference population. Generally, only one of the haplotype pairs in the reference population matches a possible haplotype pair, and that pair is assigned to the individual. Occasionally, only one haplotype represented in the reference haplotype pairs is consistent with a possible haplotype pair for an individual; in such cases, the individual is assigned a haplotype pair containing this known haplotype and a new haplotype derived by subtracting the known haplotype from the possible haplotype pair. In rare cases, either no haplotypes in the reference population are consistent with the possible haplotype pairs, or, alternatively, multiple reference haplotype pairs are consistent with the possible haplotype pairs. In such cases, the individual is preferably haplotyped using a direct molecular haplotyping method such as, for example, those discussed supra.
[0089] In a preferred embodiment, statistical analysis is performed using standard ANOVA tests with a Bonferroni correction and/or a bootstrapping method that simulates the genotype-phenotype correlation many times and calculates a significance value. When many polymorphisms and/or mutations are being analyzed, a correction may be applied to account for significant associations that might be found by chance. For statistical methods useful in the methods of the present invention, see Bailey NTJ, Statistical Methods in Biology, 3rd Edition (Cambridge Univ. Press, Cambridge, 1997); Waterman MS, Introduction to Computational Biology (CRC Press, 2000); and Bioinformatics, Baxevanis AD & Ouellette BFF, eds. (John Wiley & Sons, Inc., 2001).
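The assigning step of paragraph [0088] can be sketched as follows, under one reading of that paragraph: a "match" means that the exact unordered pair occurs among the reference pairs. Genotypes are assumed to be unphased lists of allele pairs per site, and haplotypes are strings of alleles in site order. All names are hypothetical. A `None` result corresponds to referring the individual for direct molecular haplotyping.

```python
from itertools import product


def possible_pairs(genotype) -> set:
    """All unordered haplotype pairs consistent with an unphased genotype.

    `genotype` lists the two alleles seen at each site, e.g. [("G", "C"), ("A", "T")].
    Enumeration is 2**len(genotype), so this sketch suits only a handful of sites.
    """
    pairs = set()
    for phase in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[i] for site, i in zip(genotype, phase))
        h2 = "".join(site[1 - i] for site, i in zip(genotype, phase))
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


def assign_pair(genotype, reference_pairs):
    """Return an assigned (sorted) haplotype pair, or None if direct haplotyping is needed."""
    reference = {tuple(sorted(p)) for p in reference_pairs}
    candidates = possible_pairs(genotype)
    matches = candidates & reference
    if len(matches) == 1:  # usual case: exactly one reference pair fits
        return next(iter(matches))
    if matches:  # several reference pairs fit
        return None
    known = {h for pair in reference for h in pair}
    consistent = {h for pair in candidates for h in pair if h in known}
    if len(consistent) == 1:  # exactly one known haplotype fits
        h = consistent.pop()
        # The genotype fixes the partner of h ("subtracting" h from the genotype).
        return next(pair for pair in candidates if h in pair)
    return None  # no known haplotype fits, or several do


genotype = [("G", "C"), ("A", "T")]
print(assign_pair(genotype, [("GA", "CT"), ("GA", "GA")]))  # -> ('CT', 'GA')
print(assign_pair(genotype, [("GT", "GT")]))                # -> ('CA', 'GT')
```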
[0090] In a preferred embodiment of the method, the trait of interest is a clinical response exhibited by a patient to some therapeutic treatment, for example, response to a drug targeting or to a therapeutic treatment for a medical condition.
[0091] In another embodiment of the invention, a detectable genotype or haplotype that is in linkage disequilibrium with a genotype or haplotype of interest may be used as a surrogate marker. A genotype that is in linkage disequilibrium with another genotype is indicated where a particular genotype or haplotype for a given gene is more frequent in the population that also demonstrates the potential surrogate marker genotype than in the reference population. If the difference in frequency is statistically significant, then the marker genotype is predictive of that genotype or haplotype and can be used as a surrogate marker.
[0092] Correlating Subject Genotype or Haplotype to Treatment Response. In order to deduce a correlation between a clinical response to a treatment and a genotype or haplotype, genotype or haplotype data are obtained together with data on the clinical responses exhibited by a population of individuals who received the treatment, hereinafter the "clinical population". These clinical data may be obtained by analyzing the results of a clinical trial that has already been conducted and/or by designing and carrying out one or more new clinical trials.
[0093] It is preferred that the individuals included in the clinical population be graded for the existence of the medical condition of interest. This grading of potential patients could employ a standard physical exam or one or more lab tests. Alternatively, grading of patients could use genotyping or haplotyping in situations where there is a strong correlation between haplotype pair and disease susceptibility or severity.
[0094] The therapeutic treatment of interest is administered to each individual in the trial population, and each individual's response to the treatment is measured using one or more predetermined criteria. It is contemplated that in many cases the trial population will exhibit a range of responses, and that the investigator may define more than one responder group (e.g., low, medium, high) based on the various responses. In addition, the gene for each individual in the trial population is genotyped and/or haplotyped, which may be done before or after administering the treatment.
[0095] These results are then analyzed to determine whether any observed variation in clinical response between polymorphism/mutation groups is statistically significant. Statistical analysis methods that may be used are described in Fisher LD & vanBelle G, Biostatistics: A Methodology for the Health Sciences (Wiley-Interscience, New York, 1993). This analysis may also include a regression calculation of which polymorphic/mutant sites in the gene contribute most significantly to the differences in phenotype.
[0096] A second method for finding correlations between genotype and haplotype content and clinical responses uses predictive models based on error-minimizing optimization algorithms, one of which is a genetic algorithm (Judson R, Genetic Algorithms and Their Uses in Chemistry, in Reviews in Computational Chemistry, Vol. 10, Lipkowitz KB & Boyd DB, eds. (VCH Publishers, New York, 1997) pp. 1-73). Simulated annealing (Press et al., Numerical Recipes in C: The Art of Scientific Computing, Ch. 10 (Cambridge University Press, Cambridge, 1992)), neural networks (Rich E & Knight K, Artificial Intelligence, 2nd Edition, Ch. 10 (McGraw-Hill, New York, 1991)), standard gradient descent methods (Press et al., Numerical Recipes in C: The Art of Scientific Computing, Ch. 10 (Cambridge University Press, Cambridge, 1992)), or other global or local optimization approaches (see discussion in Judson, supra) can also be used.
[0097] Correlations may also be analyzed using analysis of variance (ANOVA) techniques to determine how much of the variation in the clinical data is explained by different subsets of the polymorphic and mutant sites in the gene. ANOVA is used to test hypotheses about whether a response variable is caused by or correlates with one or more traits or variables that can be measured (Fisher & vanBelle, supra, Ch. 10).
[0098] After the clinical, mutation and polymorphism data have been obtained, correlations between individual response and genotype or haplotype content are created. Correlations may be produced in several ways. In one method, individuals are grouped by their genotype or haplotype (or haplotype pair) (also referred to as a polymorphism/mutation group), and then the averages and standard deviations of clinical responses exhibited by the members of each polymorphism/mutation group are calculated.
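The grouping described in [0098] and the ANOVA of [0097] can be combined in a small sketch. It assumes that each polymorphism/mutation group maps to a list of numeric clinical responses, computes each group's mean and sample standard deviation, and then computes a one-way ANOVA F statistic. The group labels and values are hypothetical. If SciPy is available, `scipy.stats.f_oneway` returns the F statistic together with its p-value.

```python
from statistics import mean, stdev


def group_summary(responses_by_group: dict) -> dict:
    """Mean and sample standard deviation of clinical response per group."""
    return {
        group: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for group, values in responses_by_group.items()
    }


def one_way_anova(responses_by_group: dict) -> tuple:
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    groups = [list(v) for v in responses_by_group.values() if len(v) > 0]
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


responses = {"G/G": [1, 2, 3], "G/C": [4, 5, 6], "C/C": [7, 8, 9]}
print(group_summary(responses))
print(one_way_anova(responses))  # -> (27.0, 2, 6)
```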
[0099] From the analyses described above, the skilled artisan may readily construct a mathematical model that predicts clinical response as a function of genotype or haplotype content. The identification of an association between a clinical response and a genotype or haplotype (or haplotype pair) for the gene may be the basis for designing a diagnostic method to determine those individuals who will or will not respond to the treatment, or, alternatively, who will respond at a lower level and thus may require more treatment (i.e., a greater dose of a drug), or who may suffer an adverse reaction. The diagnostic method may take one of several forms: for example, a direct DNA test (i.e., genotyping or haplotyping one or more of the polymorphic/mutant sites in the gene), a serological test, or a physical exam measurement. The only requirement is that there be a good correlation between the diagnostic test results and the underlying genotype or haplotype. In a preferred embodiment, this diagnostic method uses the predictive genotyping/haplotyping method described above.
[0100] Patient Selection for Therapy Based Upon Polymorphisms and/or Mutations. Genotypes and/or haplotypes that correlate with efficacious drug responses will be used to select patients for therapy of existing diseases. Genotypes and haplotypes that correlate with adverse consequences will be used either to modify how the drug is administered (e.g., dose, schedule or combination with other drugs) or to eliminate the drug as an option.
[0101] Patient Selection for Prophylactic Therapy Based Upon Polymorphisms and/or Mutations. Genotypes and/or haplotypes that correlate with a predisposition for disease will be used to select patients for preventative therapy.
[0102] Computer System for Storing or Displaying Polymorphism and Mutation Data. The invention also provides a computer system for storing and displaying polymorphism and mutation data determined for the gene. The computer system comprises a computer processing unit, a display and a database containing the polymorphism/mutation data. The polymorphism/mutation data include the polymorphisms, mutations, genotypes and haplotypes identified for a given gene in a reference population. In a preferred embodiment, the computer system is capable of producing a display showing haplotypes organized according to their evolutionary relationships. A computer may implement any or all analytical and mathematical operations involved in practicing the methods of the present invention. In addition, the computer may execute a program that generates views (or screens) displayed on a display device, with which the user can interact to view and analyze large amounts of information relating to the gene and its genomic variation, including chromosome location, gene structure, gene family, gene expression data, polymorphism data, mutation data, genetic sequence data and clinical population data (e.g., data on ethnogeographic origin, clinical responses, genotypes and haplotypes for one or more populations). The polymorphism and mutation data described herein may be stored as part of a relational database (e.g., an instance of an Oracle database) or as a set of ASCII flat files. These polymorphism and mutation data may be stored on the computer's hard drive or may, for example, be stored on a CD-ROM or on one or more other storage devices accessible by the computer.
For example, the data may be stored on one or more databases in communication with the computer via a network.
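As a self-contained stand-in for the relational database described above, the sketch below uses SQLite through Python's standard `sqlite3` module rather than Oracle. The schema, table and column names, coordinates and sample rows are all hypothetical. The query counts unordered genotypes at one variant, which is the input needed for the population frequency calculations described earlier.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,
    gene       TEXT NOT NULL,
    position   INTEGER NOT NULL,   -- coordinate system is an assumption
    ref_allele TEXT NOT NULL,
    alt_allele TEXT NOT NULL,
    kind       TEXT CHECK (kind IN ('polymorphism', 'mutation'))
);
CREATE TABLE individual (
    individual_id     INTEGER PRIMARY KEY,
    population        TEXT,        -- e.g. ethnogeographic group
    clinical_response TEXT
);
CREATE TABLE genotype (
    individual_id INTEGER REFERENCES individual(individual_id),
    variant_id    INTEGER REFERENCES variant(variant_id),
    allele1       TEXT NOT NULL,
    allele2       TEXT NOT NULL,
    PRIMARY KEY (individual_id, variant_id)
);
"""


def genotype_counts(conn: sqlite3.Connection, variant_id: int) -> dict:
    """Count unordered genotypes (e.g. 'C/G') observed at one variant."""
    rows = conn.execute(
        "SELECT MIN(allele1, allele2) || '/' || MAX(allele1, allele2) AS gt, "
        "COUNT(*) FROM genotype WHERE variant_id = ? GROUP BY gt ORDER BY gt",
        (variant_id,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO variant VALUES (1, 'ERBB2', 100, 'G', 'C', 'polymorphism')")
conn.executemany(
    "INSERT INTO individual VALUES (?, ?, ?)",
    [(1, "group A", "responder"), (2, "group A", "non-responder"), (3, "group B", "responder")],
)
conn.executemany(
    "INSERT INTO genotype VALUES (?, ?, ?, ?)",
    [(1, 1, "G", "C"), (2, 1, "C", "G"), (3, 1, "G", "G")],
)
print(genotype_counts(conn, 1))  # -> {'C/G': 2, 'G/G': 1}
```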
[0103] Nucleic Acid-based Diagnostics. In another aspect, the invention provides SNP and mutation probes, which are useful in classifying subjects according to their types of genetic variation. The SNP and mutation probes according to the invention are oligonucleotides that discriminate between SNPs or mutations and the wild-type sequence in conventional allelic discrimination assays. In certain preferred embodiments, the oligonucleotides according to this aspect of the invention are complementary to one allele of the SNP/mutant nucleic acid, but not to any other allele of the SNP/mutant nucleic acid. Oligonucleotides according to this embodiment of the invention can discriminate between SNPs and mutations in various ways. For example, under stringent hybridization conditions, an oligonucleotide of appropriate length will hybridize to one SNP or mutant allele, but not to any other. The oligonucleotide may be labelled using a radiolabel or a fluorescent molecular tag. Alternatively, an oligonucleotide of appropriate length can be used as a PCR primer whose 3'-terminal nucleotide is complementary to one allele containing a SNP or mutation, but not to any other allele. In this embodiment, the presence or absence of PCR amplification determines which allele is present at the SNP or mutation site.
[0104] Genomic and cDNA fragments of the invention comprise at least one novel polymorphic site or mutation identified herein, have a length of at least 10 nucleotides and may range up to the full length of the gene. Preferably, a fragment according to the present invention is between 100 and 3000 nucleotides in length, more preferably between 200 and 2000 nucleotides in length, and most preferably between 500 and 1000 nucleotides in length.
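The allele-specific PCR embodiment in [0103] can be illustrated with an idealized sketch. The primer copies the template strand (5' to 3') up to and including the variant site, so its 3'-terminal nucleotide is the queried allele. The prediction rule assumes that extension occurs only when the whole primer, including its 3' base, matches the target; real polymerases can extend 3' mismatches at low efficiency. The sequence and function names are hypothetical.

```python
def allele_specific_primer(template: str, site: int, allele: str, length: int = 20) -> str:
    """Primer matching `template` (5' to 3') whose 3'-terminal base is `allele`.

    The primer covers the `length - 1` bases upstream of the variant site
    and ends on the site itself.
    """
    start = site - length + 1
    if start < 0 or site >= len(template):
        raise ValueError("not enough upstream sequence for this primer length")
    return template[start:site] + allele


def predicts_amplification(primer: str, target: str, site: int) -> bool:
    """Idealized rule: extension only if the primer, including its 3' base, matches."""
    start = site - len(primer) + 1
    return start >= 0 and target[start:site + 1] == primer


template = "ACGTACGTACGTTTT"  # hypothetical sequence; variant at index 10 ('G')
g_primer = allele_specific_primer(template, 10, "G", length=5)
a_primer = allele_specific_primer(template, 10, "A", length=5)
print(g_primer, predicts_amplification(g_primer, template, 10))  # -> GTACG True
print(a_primer, predicts_amplification(a_primer, template, 10))  # -> GTACA False
```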
[0105] Kits of the Invention. The invention provides nucleic acid and polypeptide detection kits useful for haplotyping and/or genotyping the genes in an individual. Such kits are useful for classifying individuals. Specifically, the invention encompasses kits for detecting the presence of a polypeptide or nucleic acid corresponding to a marker of the invention in a biological sample, e.g., any tissue or bodily fluid including, but not limited to, serum, plasma, lymph, cystic fluid, urine, stool, cerebrospinal fluid, ascites fluid or blood, and including biopsy samples of body tissue. For example, the kit can comprise a labelled compound or agent capable of detecting a polypeptide, or an mRNA encoding a polypeptide, corresponding to a marker of the invention in a biological sample, and means for determining the amount of the polypeptide or mRNA in the sample, e.g., an antibody that binds the polypeptide or an oligonucleotide probe that binds to DNA or mRNA encoding the polypeptide. Kits can also include instructions for interpreting the results obtained using the kit.
[0106] In another embodiment, the invention provides a kit comprising at least two genotyping oligonucleotides packaged in separate containers. The kit may also contain other components such as hybridization buffer (where the oligonucleotides are to be used as a probe) packaged in a separate container. Alternatively, where the oligonucleotides are to be used to amplify a target region, the kit may contain, packaged in separate containers, a polymerase and a reaction buffer optimized for primer extension mediated by the polymerase, such as in the case of PCR.
[0107] In a preferred embodiment, such a kit may further comprise a DNA sample collecting means. In particular, the genotyping primer composition may comprise at least two sets of allele-specific primer pairs. Preferably, the two genotyping oligonucleotides are packaged in separate containers.
[0108] For antibody-based kits, the kit can comprise, e.g., (1) a first antibody, e.g., attached to a solid support, which binds to a polypeptide corresponding to a marker of the invention; and, optionally, (2) a second, different antibody which binds to either the polypeptide or the first antibody and is conjugated to a detectable label.
[0109] For oligonucleotide-based kits, the kit can comprise, e.g., (1) an oligonucleotide, e.g., a detectably-labelled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a polypeptide corresponding to a marker of the invention; or (2) a pair of primers useful for amplifying a nucleic acid molecule corresponding to a marker of the invention.
[0110] The kit can also comprise, e.g., a buffering agent, a preservative or a protein-stabilizing agent. The kit can further comprise components necessary for detecting the detectable label, e.g., an enzyme or a substrate. The kit can also contain a control sample or a series of control samples, which can be assayed and compared to the test sample. Each component of the kit can be enclosed within an individual container, and all of the various containers can be within a single package, along with instructions for interpreting the results of the assays performed using the kit.
[0111] Making Polymorphisms and Mutations of the Invention. Effects of the polymorphisms and mutations identified herein on gene expression may be investigated by preparing recombinant cells and/or organisms, preferably recombinant animals, containing a polymorphic variant and/or mutation of the gene.
[0112] In one aspect, the present invention includes one or more polynucleotides encoding mutant or polymorphic polypeptides, including degenerate variants thereof. The invention also encompasses allelic variants of the same, that is, naturally occurring alternative forms of the isolated polynucleotides that encode mutant polypeptides that are identical, homologous or related to those encoded by the polynucleotides. Alternatively, non-naturally occurring variants may be produced by mutagenesis techniques or by direct synthesis techniques well known in the art. Accordingly, nucleic acid sequences capable of hybridizing at low stringency with any nucleic acid sequence encoding a mutant polypeptide of the present invention are considered to be within the scope of the invention. For example, for a nucleic acid sequence of about 20-40 bases, a typical prehybridization, hybridization and wash protocol is as follows: (1) prehybridization: incubate nitrocellulose filters containing the denatured target DNA for 3-4 hours at 55°C in 5x Denhardt's solution, 6x SSC (20x SSC consists of 175 g NaCl and 88.2 g sodium citrate in 800 ml H2O, adjusted to pH 7.0 with 10 N NaOH), 0.1% SDS and 100 µg/ml denatured salmon sperm DNA; (2) hybridization: incubate filters in prehybridization solution plus probe at 42°C for 14-48 hours; (3) wash: three 15-minute washes in 6x SSC and 0.1% SDS at room temperature, followed by a final 1-1.5 minute wash in 6x SSC and 0.1% SDS at 55°C. Other equivalent procedures, e.g., employing organic solvents such as formamide, are well known in the art. Standard stringency conditions are well characterized in standard molecular biology cloning texts. See, for example, Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, 2nd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989); Glover DN, DNA Cloning, Volumes I and II (1985); Oligonucleotide Synthesis, Gait MJ, ed. (1984); Nucleic Acid Hybridization, Hames BD & Higgins SJ, eds. (1984).
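The buffer arithmetic behind the hybridization protocol can be checked with a short sketch. It assumes that the 20x SSC stock is brought to a final volume of 1 L after the pH adjustment (the text gives only the initial 800 ml) and that the sodium citrate is trisodium citrate dihydrate. Under those assumptions the stock is about 3 M NaCl and 0.3 M sodium citrate, the usual 20x SSC composition. The working 6x SSC is then made by simple dilution (C1V1 = C2V2).

```python
NACL_MW = 58.44                # g/mol, sodium chloride
CITRATE_DIHYDRATE_MW = 294.10  # g/mol, trisodium citrate dihydrate (assumed form)


def molarity(grams: float, mol_weight: float, liters: float) -> float:
    """Molar concentration of `grams` of solute made up to `liters` final volume."""
    return grams / mol_weight / liters


def stock_volume_ml(stock_x: float, target_x: float, final_ml: float) -> float:
    """Volume of an Nx stock needed for `final_ml` of working solution (C1V1 = C2V2)."""
    if target_x > stock_x:
        raise ValueError("target concentration exceeds stock concentration")
    return target_x * final_ml / stock_x


# 20x SSC, assuming a final volume of 1 L:
print(round(molarity(175, NACL_MW, 1.0), 2))                # about 2.99 M NaCl
print(round(molarity(88.2, CITRATE_DIHYDRATE_MW, 1.0), 2))  # about 0.3 M citrate
# 6x SSC prehybridization/wash buffer from the 20x stock:
print(stock_volume_ml(20, 6, 1000))                         # 300.0 ml per litre
```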
[01 13] Recombinant Expression Vectors. Another aspect of the invention includes vectors containing one or more nucleic acid sequences encoding a mutant or polymorphic polypeptide. In practicing the present invention, many conventional techniques in molecular biology, microbiology and recombinant DNA are used. These techniques are well known and are explained in, e.g., Current Protocols in Molecular Biology, VoIs. I-III, Ausubel, ed. (1997); Sambrook et a!., Molecular Cloning: A Laboratory Manual, 2"d Edition. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989); Glover DN, DNA Cloning: A Practical Approach, VoIs. I and II (1985); Oligonucleotide Synthesis, Gait, Ed. (1984); Nucleic Acid Hybridization, Hames & Higgins, eds. (1985); Transcription and Translation, Hames & Higgins, Eds. (1984); Animal Cell Culture, Freshney, ed. (1986); Immobilized Cells and Enzymes (IRL Press, 1986); Perbal, A Practical Guide to Molecular Cloning; the series Methods in EnzymoL, (Academic Press, Inc., 1984); Gene Transfer Vectors for Mammalian Cells, Miller & Calos, eds. (Cold Spring Harbor Press, Cold Spring Harbor Laboratory, New York, 1987); and Methods in Enzymology, VoIs. 154 and 155, Wu & Grossman, and Wu, Eds., respectively. [01 14] For recombinant expression of one or more the polypeptides of the invention, the nucleic acid containing all or a portion of the nucleotide sequence encoding the polypeptide is inserted into an appropriate cloning vector, or an expression vector (i.e., a vector that contains the necessary elements for the transcription and translation of the inserted polypeptide coding sequence) by recombinant DNA techniques well known in the art and as detailed below. [0115] In general, expression vectors useful in recombinant DNA techniques are often in the form of plasmids. In the present specification, "plasmid" and "vector" can be used interchangeably as the plasmid is the most commonly used form of vector. 
However, the invention is intended to include such other forms of expression vectors that are not technically plasmids, such as viral vectors (e.g., replication-defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions. Such viral vectors permit infection of a subject and expression of a compound in that subject. Becker et al., Meth. Cell Biol. 43: 161-89 (1994).
[0116] The recombinant expression vectors of the invention comprise a nucleic acid encoding a mutant or polymorphic polypeptide in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory sequences, selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequences in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
[0117] The term "regulatory sequence" is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990). Regulatory sequences include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of polypeptide desired, etc. The expression vectors of the invention can be introduced into host cells to thereby produce polypeptides or peptides, including fusion polypeptides, encoded by nucleic acids as described herein (e.g., mutant polypeptides and mutant-derived fusion polypeptides, etc.).
[0118] Mutant and Polymorphic Polypeptide-Expressing Host Cells. Another aspect of the invention pertains to mutant and polymorphic polypeptide-expressing host cells, which contain a nucleic acid encoding one or more mutant/polymorphic polypeptides of the invention. To prepare a recombinant cell of the invention, the desired isogene may be introduced into a host cell in a vector such that the isogene remains extrachromosomal. In such a situation, the gene will be expressed by the cell from the extrachromosomal location. In a preferred embodiment, the isogene is introduced into a cell in such a way that it recombines with the endogenous gene present in the cell. Such recombination requires the occurrence of a double recombination event, thereby resulting in the desired gene polymorphism or mutation. Vectors for the introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector or vector construct may be used in the invention. Methods such as electroporation, particle bombardment, calcium phosphate co-precipitation and viral transduction for introducing DNA into cells are known in the art; therefore, the choice of method may lie with the competence and preference of the skilled practitioner.
[0119] The recombinant expression vectors of the invention can be designed for expression of mutant polypeptides in prokaryotic or eukaryotic cells. For example, mutant/polymorphic polypeptides can be expressed in bacterial cells such as Escherichia coli (E. coli), insect cells (using baculovirus expression vectors), fungal cells (e.g., yeast cells) or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase. The SMP2 promoter is useful in the expression of polypeptides in smooth muscle cells; see Qian et al., Endocrinology 140(4): 1826 (1999).
[0120] Expression of polypeptides in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion polypeptides. Fusion vectors add a number of amino acids to a polypeptide encoded therein, usually to the amino terminus of the recombinant polypeptide. Such fusion vectors typically serve three purposes: (i) to increase expression of the recombinant polypeptide; (ii) to increase the solubility of the recombinant polypeptide; and (iii) to aid in the purification of the recombinant polypeptide by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant polypeptide to enable separation of the recombinant polypeptide from the fusion moiety subsequent to purification of the fusion polypeptide. Such proteolytic enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith & Johnson, Gene 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.), which fuse glutathione S-transferase (GST), maltose E-binding polypeptide, or polypeptide A, respectively, to the target recombinant polypeptide. [0121] Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., Gene 69: 301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 60-89). [0121] One strategy to maximize recombinant polypeptide expression in E. coli is to express the polypeptide in host bacteria with an impaired capacity to proteolytically cleave the recombinant polypeptide. See, e.g., Gottesman, Gene Expression Technology: Methods In Enzymology (Academic Press, San Diego, Calif., 1990) pp. 119-128.
Another strategy is to alter the sequence of the nucleic acid to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in the expression host, e.g., E. coli (see, e.g., Wada et al., Nucl. Acids Res. 20: 2111-2118 (1992)). Such alteration of nucleic acid sequences of the invention can be carried out by standard DNA synthesis techniques. In another embodiment, the mutant/polymorphic polypeptide expression vector is a yeast expression vector.
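The codon-substitution strategy of [0121] can be illustrated with a short Python sketch that translates a coding sequence and re-encodes each amino acid with a single host-preferred codon. The preference table `PREFERRED_CODON` and the abbreviated codon table are hypothetical and cover only the residues in the demo; a practical table would be derived from codon-usage data for the chosen host, and real optimization tools also weigh GC content, mRNA secondary structure and restriction sites.

```python
"""Sketch of codon substitution toward host-preferred codons.

Assumption: PREFERRED_CODON is a hypothetical, abbreviated table used only
for illustration; it is not a published codon-usage table.
"""

# Hypothetical host-preferred codon for each amino acid used in the demo.
PREFERRED_CODON = {
    "M": "ATG",
    "R": "CGT",
    "L": "CTG",
    "G": "GGC",
    "K": "AAA",
    "*": "TAA",
}

# Subset of the standard genetic code, sufficient for the demo sequences.
CODON_TO_AA = {
    "ATG": "M",
    "CGT": "R", "AGA": "R", "AGG": "R",
    "CTG": "L", "TTA": "L", "CTA": "L",
    "GGC": "G", "GGA": "G",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TGA": "*",
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using CODON_TO_AA."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str) -> str:
    """Re-encode each amino acid with its preferred codon (synonymous changes only)."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(cds))


if __name__ == "__main__":
    original = "ATGAGAAGGTTACTAGGATGA"  # Met-Arg-Arg-Leu-Leu-Gly-stop
    optimized = optimize_codons(original)
    print(optimized)  # ATGCGTCGTCTGCTGGGCTAA
    assert translate(optimized) == translate(original)
```

The encoded protein is unchanged; only synonymous codons differ. AGA and AGG, for example, are arginine codons used rarely in E. coli, which is why a host-oriented table would replace them.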
[0122] Examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari et al., EMBO J. 6: 229-234 (1987)), pMFa (Kurjan & Herskowitz, Cell 30: 933-943 (1982)), pJRY88 (Schultz et al., Gene 54: 113-123 (1987)), pYES2 (InVitrogen Corporation, San Diego, Calif., USA), and picZ (InVitrogen Corp, San Diego, Calif., USA). Alternatively, mutant polypeptide can be expressed in insect cells using baculovirus expression vectors. Baculovirus vectors available for expression of polypeptides in cultured insect cells (e.g., Sf9 cells) include the pAc series (Smith et al., Mol. Cell Biol. 3: 2156-2165 (1983)) and the pVL series (Luckow & Summers, Virology 170: 31-39 (1989)). [0123] In yet another embodiment, a nucleic acid of the invention is expressed in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, Nature 329: 842-846 (1987)) and pMT2PC (Kaufman et al., EMBO J. 6: 187-195 (1987)). When used in mammalian cells, the expression vector's control functions are often provided by viral regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus, and simian virus 40. For other suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989). [0124] In another embodiment, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art. Nonlimiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert et al., Genes Dev. 1: 268-277 (1987)), lymphoid-specific promoters (Calame & Eaton, Adv. Immunol.
43: 235-275 (1988)), in particular promoters of T cell receptors (Winoto & Baltimore, EMBO J. 8: 729-733 (1989)) and immunoglobulins (Banerji et al., Cell 33: 729-740 (1983); Queen & Baltimore, Cell 33: 741-748 (1983)), neuron-specific promoters (e.g., the neurofilament promoter; Byrne & Ruddle, Proc. Natl. Acad. Sci. USA 86: 5473-5477 (1989)), pancreas-specific promoters (Edlund et al., Science 230: 912-916 (1985)), and mammary gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally regulated promoters are also encompassed, e.g., the murine hox promoters (Kessel & Gruss, Science 249: 374-379 (1990)) and the α-fetoprotein promoter (Camper & Tilghman, Genes Dev. 3: 537-546 (1989)).
[0125] The invention further provides a recombinant expression vector comprising a DNA molecule of the invention cloned into the expression vector in an antisense orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a manner that allows for expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to a mutant polypeptide mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense orientation can be chosen that direct the continuous expression of the antisense RNA molecule in a variety of cell types, for instance viral promoters and/or enhancers, or regulatory sequences can be chosen that direct constitutive, tissue-specific or cell type-specific expression of antisense RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids are produced under the control of a high-efficiency regulatory region, the activity of which can be determined by the cell type into which the vector is introduced. For a discussion of the regulation of gene expression using antisense genes see, e.g., Weintraub et al., "Antisense RNA as a molecular tool for genetic analysis," Reviews: Trends in Genetics, Vol. 1(1) (1986). [0126] Another aspect of the invention pertains to host cells into which a recombinant expression vector of the invention has been introduced. The terms "host cell" and "recombinant host cell" are used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein. [0127] A host cell can be any prokaryotic or eukaryotic cell.
For example, mutant polypeptide can be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells (such as Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are known to those skilled in the art.
[0128] Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms "transformation" and "transfection" are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable methods for transforming or transfecting host cells can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989), and other laboratory manuals. [0129] For stable transfection of mammalian cells, it is known that, depending upon the expression vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA into their genome. In order to identify and select these integrants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene of interest. Various selectable markers include those that confer resistance to drugs, such as G418, hygromycin and methotrexate. Nucleic acid encoding a selectable marker can be introduced into a host cell on the same vector as that encoding mutant polypeptide or on a separate vector. Cells stably transfected with the introduced nucleic acid can be identified by drug selection (e.g., cells that have incorporated the selectable marker gene will survive, while the other cells die). [0130] A host cell that includes a compound of the invention, such as a prokaryotic or eukaryotic host cell in culture, can be used to produce (i.e., express) recombinant mutant/polymorphic polypeptide.
In one embodiment, the method comprises culturing the host cell of the invention (into which a recombinant expression vector encoding mutant/polymorphic polypeptide has been introduced) in a suitable medium such that mutant polypeptide is produced. In another embodiment, the method further comprises the step of isolating mutant/polymorphic polypeptide from the medium or the host cell. Purification of recombinant polypeptides is well known in the art and includes ion-exchange purification techniques or affinity purification techniques, for example with an antibody to the compound. Methods of creating antibodies to the compounds of the present invention are discussed below.
[0131] Transgenic Animals. Recombinant organisms, i.e., transgenic animals, expressing a variant gene of the invention are prepared using standard procedures known in the art. Transgenic animals carrying the constructs of the invention can be made by several methods known to those having skill in the art. See, e.g., U.S. Pat. No. 5,610,053 and "The Introduction of Foreign Genes into Mice" and the references cited therein, in: Recombinant DNA, Watson JD, Gilman M, Witkowski J & Zoller M, eds. (W. H. Freeman and Company, New York) pp. 254-272. Transgenic animals stably expressing a human isogene and producing human protein can be used as biological models for studying diseases related to abnormal expression and/or activity, and for screening and assaying various candidate drugs, compounds, and treatment regimens to reduce the symptoms or effects of these diseases. [0132] Characterizing Gene Expression Level. Methods to detect and measure mRNA levels (i.e., gene transcription level) and levels of polypeptide gene expression products (i.e., gene translation level) are well known in the art and include the use of nucleotide microarrays and polypeptide detection methods involving mass spectrometers, reverse transcription and amplification, and/or antibody detection and quantification techniques. See also Strachan T & Read A, Human Molecular Genetics, 2nd Edition (John Wiley and Sons, Inc. Publication, New York, 1999). [0133] Determination of Target Gene Transcription. The level of the expression product of the gene in a biological sample, e.g., the tissue or body fluids of an individual, may be determined in a variety of ways. The term "biological sample" is intended to include tissues, cells, biological fluids and isolates thereof, isolated from a subject, as well as tissues, cells and fluids present within a subject. Many expression detection methods use isolated RNA.
For in vitro methods, any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA from cells. See, e.g., Ausubel et al., eds., Curr. Prot. Mol. Biol. (John Wiley & Sons, New York, 1987-1999). [0134] In one embodiment, the level of the mRNA expression product of the target gene is determined. Methods to measure the level of a specific mRNA are well known in the art and include Northern blot analysis, reverse transcription PCR, real-time quantitative PCR, and hybridization to an oligonucleotide array or microarray. In other, more preferred embodiments, the level of expression may be determined from the level of the protein or polypeptide expression product of the gene in body fluids or tissue samples, including but not limited to blood or serum. Large numbers of tissue samples can readily be processed using techniques well known to those of skill in the art, such as, e.g., the single-step RNA isolation process of U.S. Pat. No. 4,843,155. [0135] The isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses and probe arrays. One preferred diagnostic method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected. The nucleic acid probe can be, e.g., a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an mRNA or genomic DNA encoding a marker of the present invention. Other suitable probes for use in the diagnostic assays of the invention are described herein. Hybridization of an mRNA with the probe indicates that the marker in question is being expressed.
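Probe-based detection as described in [0135] relies on a probe complementary to the target mRNA. The minimal Python sketch below works on DNA (cDNA-sense) strings: it derives an antisense probe from a window of a target sequence and counts perfect-match sites in a set of sequences as a crude specificity check. The transcript names and sequences are hypothetical, and the 7-nt floor simply mirrors the shortest probe length mentioned above; real probe design also considers melting temperature, near-matches and secondary structure under the chosen stringent conditions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MIN_PROBE_LENGTH = 7  # shortest probe length mentioned in the text


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_probe(transcript: str, start: int, length: int) -> str:
    """Antisense probe that base-pairs with transcript[start:start + length]."""
    if length < MIN_PROBE_LENGTH:
        raise ValueError(f"probe must be at least {MIN_PROBE_LENGTH} nt")
    window = transcript[start:start + length]
    if len(window) != length:
        raise ValueError("probe window extends past the end of the transcript")
    return reverse_complement(window)


def count_target_sites(probe: str, sequences: dict) -> dict:
    """Count perfect-match sites (overlaps allowed) for the probe in each sequence."""
    site = reverse_complement(probe)  # sense-strand sequence the probe pairs with
    return {
        name: sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))
        for name, seq in sequences.items()
    }


if __name__ == "__main__":
    transcripts = {  # hypothetical sequences
        "target": "ATGGCCAAGCTTGACGTA",
        "other": "ATGTTTCCCGGGAAATAA",
    }
    probe = design_probe(transcripts["target"], start=3, length=9)
    print(probe, count_target_sites(probe, transcripts))
    # AAGCTTGGC {'target': 1, 'other': 0}
```

A probe whose site occurs once in the intended transcript and nowhere else in the collection is a reasonable starting candidate; any count above zero in a non-target sequence flags potential cross-hybridization.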
[0136] In one format, the probes are immobilized on a solid surface and the mRNA is contacted with the probes, for example, in an Affymetrix gene chip array (Affymetrix, Calif., USA). A skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of mRNA encoded by the markers of the present invention. [0137] An alternative method for determining the level of mRNA corresponding to a marker of the present invention in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in U.S. Pat. No. 4,683,202); ligase chain reaction (Barany et al., Proc. Natl. Acad. Sci. USA 88: 189-193 (1991)); self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87: 1874-1878 (1990)); transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci. USA 86: 1173-1177 (1989)); Q-Beta Replicase (Lizardi et al., Bio/Technology 6: 1197 (1988)); rolling circle replication (U.S. Pat. No. 5,854,033); or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well known to those of skill in the art. These detection schemes are especially useful for the detection of nucleic acid molecules present in very low numbers. As used herein, "amplification primers" are defined as a pair of nucleic acid molecules that can anneal to 5' or 3' regions of a gene (plus and minus strands, respectively, or vice versa) and flank a short region in between. In general, amplification primers are from about 10-30 nucleotides in length and flank a region from about 50-200 nucleotides in length.
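The primer definition in [0137] (two primers of about 10-30 nt annealing to opposite strands and flanking a region of about 50-200 nt) can be checked mechanically. The Python sketch below locates the forward primer on the plus strand and the reverse primer's complementary site, then reports the length of the flanked region and of the full amplicon. It assumes exact-match primer sites on a DNA template and uses only the first occurrence of each; the template and primer sequences in the usage lines are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_primer_pair(template, fwd, rev, primer_len=(10, 30), flank_len=(50, 200)):
    """Report where a primer pair binds a plus-strand template and what it flanks.

    fwd anneals to the minus strand, so it matches the plus strand as written;
    rev anneals to the plus strand, so its reverse complement is searched for.
    Only the first exact occurrence of each site is used.
    """
    for name, primer in (("forward", fwd), ("reverse", rev)):
        if not primer_len[0] <= len(primer) <= primer_len[1]:
            raise ValueError(f"{name} primer length {len(primer)} outside {primer_len}")
    f_pos = template.find(fwd)
    rev_site = reverse_complement(rev)
    r_pos = template.find(rev_site)
    if f_pos < 0 or r_pos < 0:
        raise ValueError("primer binding site not found on template")
    flanked = r_pos - (f_pos + len(fwd))      # bases strictly between the primers
    amplicon = r_pos + len(rev_site) - f_pos  # product length including primers
    return {
        "flanked": flanked,
        "amplicon": amplicon,
        "flank_ok": flank_len[0] <= flanked <= flank_len[1],
    }


if __name__ == "__main__":
    fwd, rev = "GCTAGCTTGACC", "CGTACGGATCCT"  # hypothetical 12-nt primers
    template = "TT" + fwd + "A" * 60 + reverse_complement(rev) + "TT"
    print(check_primer_pair(template, fwd, rev))
    # {'flanked': 60, 'amplicon': 84, 'flank_ok': True}
```

A negative `flanked` value means the reverse site lies upstream of, or overlaps, the forward primer, so no product of the intended kind would form.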
[0138] Real-time quantitative PCR (RT-PCR) is one way to assess gene expression levels, e.g., of genes of the invention, such as those containing SNPs and mutations of interest. The RT-PCR assay utilizes a reverse transcriptase to catalyze the synthesis of a DNA strand from an RNA strand, including an mRNA strand. The resultant DNA may be specifically detected and quantified, and this process may be used to determine the levels of specific species of mRNA. One method for doing this is TAQMAN® (PE Applied Biosystems, Foster City, Calif., USA), which exploits the 5' nuclease activity of AMPLITAQ GOLD™ DNA polymerase to cleave a specific form of probe, referred to as a TAQMAN™ probe, during a PCR reaction. See Luthra et al., Am. J. Pathol. 153: 63-68 (1998); Kuimelis et al., Nucl. Acids Symp. Ser. 37: 255-256 (1997); and Mullah et al., Nucl. Acids Res. 26(4): 1026-1031 (1998). During the reaction, cleavage of the probe separates a reporter dye and a quencher dye, resulting in increased fluorescence of the reporter. The accumulation of PCR products is detected directly by monitoring the increase in fluorescence of the reporter dye. See Heid et al., Genome Res. 6(6): 986-994 (1996). The higher the starting copy number of nucleic acid target, the sooner a significant increase in fluorescence is observed. See Gibson, Heid & Williams, Genome Res. 6: 995-1001 (1996). [0139] Other technologies for measuring the transcriptional state of a cell produce pools of restriction fragments of limited complexity for electrophoretic analysis, such as methods combining double restriction enzyme digestion with phasing primers (see, e.g., EP 0 534 858 A1), or methods selecting restriction fragments with sites closest to a defined mRNA end (see, e.g., Prashar & Weissman, Proc. Natl. Acad. Sci. USA 93(2): 659-663 (1996)).
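The statement in [0138] that a higher starting copy number gives an earlier rise in fluorescence can be made concrete with a threshold-cycle (Ct) calculation. The Python sketch below assumes an idealized curve (signal proportional to product, constant per-cycle efficiency E, a hard plateau) and a fixed, user-chosen threshold; it is not the algorithm of any particular instrument's software. The `fold_difference` helper applies the relation N0 proportional to (1 + E)^(-Ct) implied by that model.

```python
import math


def amplification_curve(n0, cycles=40, efficiency=1.0,
                        signal_per_copy=1e-9, plateau=1.0):
    """Idealised reporter signal per cycle: exponential growth capped at a plateau.

    Assumes signal is proportional to product copies and that each cycle
    multiplies the product by (1 + efficiency); efficiency=1.0 means doubling.
    """
    return [min(signal_per_copy * n0 * (1.0 + efficiency) ** c, plateau)
            for c in range(cycles + 1)]


def threshold_cycle(curve, threshold):
    """Fractional cycle where the curve first reaches threshold, or None.

    Interpolates linearly in log(signal), which is exact in the exponential
    phase of the idealised curve above.
    """
    for c in range(1, len(curve)):
        lo, hi = curve[c - 1], curve[c]
        if lo < threshold <= hi:
            return (c - 1) + (math.log(threshold) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return None


def fold_difference(ct_reference, ct_sample, efficiency=1.0):
    """Starting-copy ratio (sample / reference) implied by the Ct difference."""
    return (1.0 + efficiency) ** (ct_reference - ct_sample)


if __name__ == "__main__":
    ct_low = threshold_cycle(amplification_curve(1e4), threshold=0.1)
    ct_high = threshold_cycle(amplification_curve(1e5), threshold=0.1)
    print(round(ct_low, 2), round(ct_high, 2), round(fold_difference(ct_low, ct_high), 3))
```

With 100% efficiency, a tenfold difference in starting copies shifts Ct by log2(10), about 3.3 cycles, which is the quantitative form of "the sooner a significant increase in fluorescence is observed."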
[0140] Other methods statistically sample cDNA pools, such as by sequencing sufficient bases, e.g., 20-50 bases, in each of multiple cDNAs to identify each cDNA, or by sequencing short tags, e.g., 9-10 bases, which are generated at known positions relative to a defined mRNA end. See, e.g., Velculescu, Science 270: 484-487 (1995). The cDNA levels in the samples are quantified, and the mean and standard deviation of each cDNA are determined by standard statistical means well known to those of skill in the art. See Norman T.J. Bailey, Statistical Methods In Biology, 3rd Edition (Cambridge University Press, 1995).
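The tag-based sampling in [0140] follows the serial analysis of gene expression (SAGE) approach of Velculescu et al. The Python sketch below assumes, as in classic SAGE, that the tag is the 10 bases immediately 3' of the 3'-most CATG (NlaIII) site in each cDNA; it counts tags per sample and then computes each tag's mean and sample standard deviation across samples with the standard-library `statistics` module. The cDNA strings are invented, and library construction, ditag concatenation and sequencing-error handling are not modeled.

```python
import statistics
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition site (anchoring enzyme in classic SAGE)
TAG_LENGTH = 10


def extract_tag(cdna: str):
    """Return the TAG_LENGTH bases 3' of the last ANCHOR_SITE, or None."""
    pos = cdna.rfind(ANCHOR_SITE)
    if pos < 0:
        return None
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def count_tags(cdnas) -> Counter:
    """Tally tags across one sample's cDNAs, skipping cDNAs without a full tag."""
    return Counter(tag for tag in map(extract_tag, cdnas) if tag is not None)


def summarize(samples):
    """Per-tag (mean, sample standard deviation) of counts across >= 2 samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    summary = {}
    for tag in sorted(set().union(*samples)):
        counts = [sample[tag] for sample in samples]  # Counter returns 0 if absent
        summary[tag] = (statistics.mean(counts), statistics.stdev(counts))
    return summary


if __name__ == "__main__":
    sample_a = count_tags(["GGCATGAAACCCGGGTTT", "GGCATGAAACCCGGGTTT",
                           "TTCATGCCCAAAGGGTTTAA"])
    sample_b = count_tags(["GGCATGAAACCCGGGTTT", "TTCATGCCCAAAGGGTTTAA",
                           "TTCATGCCCAAAGGGTTTAA", "CATGAAA"])
    for tag, (mean, sd) in summarize([sample_a, sample_b]).items():
        print(tag, mean, round(sd, 3))
```

Because tag counts are proportional to transcript abundance, the per-tag mean and spread across samples give the kind of summary statistics the paragraph refers to.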
[0141] Detection of Polypeptides: Immunological Detection Methods. Expression of the protein encoded by the genes of the invention can be detected by a probe that is detectably labelled, or that can be subsequently labelled. The term "labelled", with regard to the probe or antibody, is intended to encompass direct labelling of the probe or antibody by coupling, i.e., physically linking, a detectable substance to the probe or antibody, as well as indirect labelling of the probe or antibody by reactivity with another reagent that is directly labelled. Examples of indirect labelling include detection of a primary antibody using a fluorescently labelled secondary antibody and end-labelling of a DNA probe with biotin such that it can be detected with fluorescently labelled streptavidin. Generally, the probe is an antibody that recognizes the expressed protein. A variety of formats can be employed to determine whether a sample contains a target protein that binds to a given antibody. Immunoassay methods useful in the detection of target polypeptides of the present invention include, but are not limited to, e.g., dot blotting, western blotting, protein chips, competitive and non-competitive protein binding assays, immunohistochemistry, enzyme-linked immunosorbent assays (ELISA), fluorescence-activated cell sorting (FACS), and others commonly used and widely described in the scientific and patent literature, many of which are employed commercially. A skilled artisan can readily adapt known protein/antibody detection methods for use in determining whether cells express a marker of the present invention and the relative concentration of that specific polypeptide expression product in blood or other body tissues. Proteins from individuals can be isolated using techniques that are well known to those of skill in the art. The protein isolation methods employed can, e.g., be those described in Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988).
[0142] For the production of antibodies to a protein encoded by one of the disclosed genes, various host animals may be immunized by injection with the polypeptide, or a portion thereof. Such host animals may include, but are not limited to, rabbits, mice and rats. Various adjuvants may be used to increase the immunological response, depending on the host species, including, but not limited to, Freund's (complete and incomplete), mineral gels, such as aluminium hydroxide; surface active substances, such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin and dinitrophenol; and potentially useful human adjuvants, such as bacille Calmette-Guérin (BCG) and Corynebacterium parvum.
[0143] Monoclonal antibodies (mAbs), which are homogeneous populations of antibodies to a particular antigen, may be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler & Milstein, Nature 256: 495-497 (1975), and U.S. Pat. No. 4,376,110; the human B-cell hybridoma technique of Kosbor et al., Immunol. Today 4: 72 (1983), and Cole et al., Proc. Natl. Acad. Sci. USA 80: 2026-2030 (1983); and the EBV-hybridoma technique of Cole et al., Monoclonal Antibodies and Cancer Therapy (Alan R. Liss, Inc., 1985) pp. 77-96.
[0144] In addition, techniques developed for the production of "chimaeric antibodies" (see Morrison et al., Proc. Natl. Acad. Sci. USA 81: 6851-6855 (1984); Neuberger et al., Nature 312: 604-608 (1984); and Takeda et al., Nature 314: 452-454 (1985)), by splicing the genes from a mouse antibody molecule of appropriate antigen specificity together with genes from a human antibody molecule of appropriate biological activity, can be used. A chimaeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable or hypervariable region derived from a murine mAb and a human immunoglobulin constant region.
[0145] Alternatively, techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778; Bird, Science 242: 423-426 (1988); Huston et al., Proc. Natl Acad. Sci. USA 85: 5879-5883 (1988); and Ward et al., Nature 334: 544-546 (1989)) can be adapted to produce differentially expressed gene single-chain antibodies.
[0146] Techniques useful for the production of "humanized antibodies" can be adapted to produce antibodies to the proteins, fragments or derivatives thereof. Such techniques are disclosed in U.S. Pat. Nos. 5,932,448; 5,693,762; 5,693,761; 5,585,089; 5,530,101; 5,569,825; 5,625,126; 5,633,425; 5,789,650; 5,661,016; and 5,770,429. [0147] Antibodies or antibody fragments can be used in methods such as Western blots or immunofluorescence techniques to detect the expressed proteins. In such uses, it is generally preferable to immobilize either the antibody or the proteins on a solid support. Suitable solid-phase supports or carriers include any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amyloses, natural and modified celluloses, polyacrylamides, gabbros and magnetite.
[0148] A useful method, for ease of detection, is the sandwich ELISA, of which a number of variations exist, all of which are intended to be used in the methods and assays of the present invention. As used herein, "sandwich assay" is intended to encompass all variations on the basic two-site technique. Immunofluorescence and EIA techniques are both very well established in the art. However, other reporter molecules, such as radioisotopes, chemiluminescent or bioluminescent molecules, may also be employed. It will be readily apparent to the skilled artisan how to vary the procedure to suit the required use. [0149] Whole genome monitoring of protein, i.e., the "proteome," can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome. Preferably, antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to testing or confirming a biological network model of interest. As noted above, methods for making monoclonal antibodies are well known. See, e.g., Harlow & Lane, Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1988). In a preferred embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed based on the genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted with the array and their binding is measured with assays known in the art. [0150] Detection of Polypeptides: Two-Dimensional Gel Electrophoresis. Two-dimensional gel electrophoresis is well known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE along a second dimension. See, e.g., Hames et al., Gel Electrophoresis of Proteins: A Practical Approach (IRL Press, New York, 1990); Shevchenko et al., Proc. Natl. Acad. Sci.
USA 93: 14440-14445 (1996); Sagliocco et al., Yeast 12: 1519-1533 (1996); and Lander, Science 274: 536-539 (1996).
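Quantifying an analyte with the sandwich ELISA of [0148] usually means reading sample signals against a standard curve. As a minimal sketch, the Python function below interpolates between bracketing standards on a log-concentration axis. It assumes blank-corrected optical densities (OD) that rise strictly with concentration; the standard values in the usage lines are invented for illustration, and a four- or five-parameter logistic fit is the more common choice in practice.

```python
import math
from bisect import bisect_left


def interpolate_concentration(standards, od):
    """Estimate concentration from a blank-corrected OD via a standard curve.

    standards: iterable of (concentration, od) pairs; OD must increase
    strictly with concentration. Interpolation is linear in log10(conc).
    """
    points = sorted(standards, key=lambda p: p[1])
    ods = [y for _, y in points]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard range; dilute or extend the curve")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return points[i][0]
    (c0, y0), (c1, y1) = points[i - 1], points[i]
    frac = (od - y0) / (y1 - y0)
    return 10 ** (math.log10(c0) + frac * (math.log10(c1) - math.log10(c0)))


if __name__ == "__main__":
    # Hypothetical standards: (concentration in ng/mL, blank-corrected OD450)
    curve = [(10.0, 0.2), (100.0, 0.8), (1000.0, 1.6)]
    print(round(interpolate_concentration(curve, 0.5), 1))  # 31.6
```

Refusing to extrapolate beyond the highest or lowest standard reflects common assay practice: out-of-range samples are diluted and re-read rather than estimated.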
[0151] Detection of Polypeptides: Mass Spectroscopy. The identity as well as the expression level of a target polypeptide can be determined using mass spectroscopy (MS). MS-based analysis methodology is useful for analysis of isolated target polypeptide as well as analysis of target polypeptide in a biological sample. MS formats for use in analyzing a target polypeptide include ionization (I) techniques, such as, but not limited to, matrix-assisted laser desorption (MALDI), continuous or pulsed electrospray ionization (ESI) and related methods, such as ionspray or thermospray, and massive cluster impact (MCI). Such ion sources can be matched with detection formats, including linear or non-linear reflectron time-of-flight (TOF), single or multiple quadrupole, single or multiple magnetic sector, Fourier transform ion cyclotron resonance (FTICR), ion trap, and combinations thereof, such as ion-trap/TOF. For ionization, numerous matrix/wavelength combinations (e.g., MALDI) or solvent combinations (e.g., ESI) can be employed.
[0152] For mass spectroscopy (MS) analysis, the target polypeptide can be solubilised in an appropriate solution or reagent system. The selection of a solution or reagent system, e.g., an organic or inorganic solvent, will depend on the properties of the target polypeptide and the type of MS performed, and is based on methods well known in the art. See, e.g., Vorm et al., Anal. Chem. 61: 3281 (1994), for MALDI; and Valaskovic et al., Anal. Chem. 61: 3802 (1995), for ESI. MS of peptides is also described, e.g., in International PCT Application No. WO 93/24834 and U.S. Pat. No. 5,792,664. A solvent is selected that minimizes the risk that the target polypeptide will be decomposed by the energy introduced for the vaporization process. A reduced risk of target polypeptide decomposition can be achieved, e.g., by embedding the sample in a matrix. A suitable matrix can be an organic compound such as a sugar, e.g., a pentose or hexose, or a polysaccharide such as cellulose. Such compounds are decomposed thermolytically into CO2 and H2O such that no residues are formed that can lead to chemical reactions. The matrix can also be an inorganic compound, such as ammonium nitrate, which is decomposed essentially without leaving any residue. Use of these and other solvents is known to those of skill in the art. See, e.g., U.S. Pat. No. 5,062,935. Electrospray MS has been described by Fenn et al., J. Phys. Chem. 88: 4451-4459 (1984), and in PCT Application No. WO 90/14148; current applications are summarized in review articles. See Smith et al., Anal. Chem. 62: 882-89 (1990); and Ardrey, Spectroscopy 4: 10-18 (1992).
[0153] The mass of a target polypeptide determined by MS can be compared to the mass of a corresponding known polypeptide. For example, where the target polypeptide is a mutant protein, the corresponding known polypeptide can be the corresponding non-mutant protein, e.g., the wild-type protein. With ESI, molecular weights can be determined very accurately from femtomole amounts of sample because the spectrum contains multiple ion peaks, all of which can be used for mass calculation. Sub-attomole levels of protein have been detected, e.g., using ESI MS (Valaskovic et al., Science 273: 1199-1202 (1996)) and MALDI MS (Li et al., J. Am. Chem. Soc. 118: 1662-1663 (1996)).
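The use of multiple ESI ion peaks for mass calculation, and the comparison with a wild-type mass, can be sketched as follows. This is a minimal illustration, not any vendor's deconvolution algorithm: it assumes the peaks are an unbroken series of protonated charge states [M+zH]z+ of one species, uses a proton mass of 1.007276 Da and standard monoisotopic residue masses, and builds its input from a hypothetical 25 kDa protein. The helper names are illustrative only.

```python
from statistics import fmean

PROTON_DA = 1.007276  # mass of a proton, Da

# Monoisotopic residue masses (Da) for the residues in the TABLE 3 missense changes.
RESIDUE_MONO_DA = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "C": 103.00919,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "F": 147.06841, "R": 156.10111,
}


def deconvolute(mz_peaks: list[float]) -> float:
    """Neutral mass from adjacent charge states [M+zH]z+ of a single species.

    Assumes the peaks form an unbroken series z, z+1, z+2, ... (no gaps).
    """
    peaks = sorted(mz_peaks, reverse=True)  # largest m/z = lowest charge first
    m1, m2 = peaks[0], peaks[1]
    # From m1 = (M + z*p)/z and m2 = (M + (z+1)*p)/(z+1): z = (m2 - p)/(m1 - m2)
    z = round((m2 - PROTON_DA) / (m1 - m2))
    # Every peak gives one estimate of M; averaging them is the benefit of multiple peaks.
    return fmean((z + i) * (mz - PROTON_DA) for i, mz in enumerate(peaks))


def expected_shift(substitution: str) -> float:
    """Mass difference (mutant minus wild type) for a change written like 'D873N'."""
    wild_type, mutant = substitution[0], substitution[-1]
    return RESIDUE_MONO_DA[mutant] - RESIDUE_MONO_DA[wild_type]


if __name__ == "__main__":
    true_mass = 25_000.0  # hypothetical protein mass, Da
    series = [(true_mass + z * PROTON_DA) / z for z in range(12, 18)]
    print(f"deconvoluted mass: {deconvolute(series):.3f} Da")
    for change in ("F371L", "D873N", "G1015R"):
        print(f"{change}: expected shift {expected_shift(change):+.5f} Da")
```

A shift below 1 Da, as for D873N, is small relative to an intact receptor, so in practice such a comparison would likely need high resolving power or analysis at the peptide level.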
[0154] Matrix-Assisted Laser Desorption (MALDI). The level of the target protein in a biological sample, e.g., a body fluid or tissue sample, may be measured by mass spectrometric (MS) methods including, but not limited to, those known in the art as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and surfaces enhanced for laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), as further detailed below. Methods for performing MALDI are well known to those of skill in the art. See, e.g., Juhasz et al., Anal. Chem. 68: 941-946 (1996); see also, e.g., U.S. Pat. Nos. 5,777,325; 5,742,049; 5,654,545; 5,641,959; and 5,760,393 for descriptions of MALDI and delayed extraction protocols. Numerous methods for improving resolution are also known. MALDI-TOF-MS has been described by Hillenkamp et al., Biological Mass Spectrometry, Burlingame & McCloskey, eds. (Elsevier Science Publ., Amsterdam, 1990) pp. 49-60.
[0155] A variety of techniques for marker detection using mass spectroscopy can be used. See Bordeaux Mass Spectrometry Conference Report, Hillenkamp, Ed., pp. 354-362 (1988); Bordeaux Mass Spectrometry Conference Report, Karas & Hillenkamp, Eds., pp. 416-417 (1988); Karas & Hillenkamp, Anal. Chem. 60: 2299-2301 (1988); and Karas et al., Biomed. Environ. Mass Spectrom. 18: 841-843 (1989). The use of laser beams in TOF-MS is shown, e.g., in U.S. Patent Nos. 4,694,167; 4,686,366; 4,295,046; and 5,045,694, which are incorporated herein by reference in their entireties. Other MS techniques allow the successful volatilization of high-molecular-weight biopolymers without fragmentation and have enabled a wide variety of biological macromolecules to be analyzed by mass spectrometry.
[0156] Surfaces Enhanced for Laser Desorption/Ionization (SELDI). Other techniques employ new MS probe element compositions with surfaces that allow the probe element to participate actively in the capture and docking of specific analytes, described as Affinity Mass Spectrometry (AMS). See SELDI patents U.S. Pat. Nos. 5,719,060; 5,894,063; 6,020,208; 6,027,942; 6,124,137; and U.S. Patent Application No. US 2003/0003465. Several types of new MS probe elements have been designed with Surfaces Enhanced for Affinity Capture (SEAC). See Hutchens & Yip, Rapid Commun. Mass Spectrom. 7: 576-580 (1993). SEAC probe elements have been used successfully to retrieve and tether different classes of biopolymers, particularly proteins, by exploiting what is known about protein surface structures and biospecific molecular recognition. The immobilized affinity capture devices on the MS probe element surface (i.e., SEAC) determine the location and affinity (specificity) of the analyte for the probe surface, which makes the subsequent analytical MS process efficient.
[0157] Within the general category of SELDI are three separate subcategories: (1) Surfaces Enhanced for Neat Desorption (SEND), where the probe element surfaces, i.e., sample presenting means, are designed to contain Energy Absorbing Molecules (EAM) instead of "matrix" to facilitate desorption/ionization of analytes added directly (neat) to the surface;
(2) SEAC, where the probe element surfaces, i.e., sample presenting means, are designed to contain chemically defined and/or biologically defined affinity capture devices to facilitate either the specific or non-specific attachment or adsorption (so-called docking or tethering) of analytes to the probe surface, by a variety of mechanisms (mostly non-covalent); and
(3) Surfaces Enhanced for Photolabile Attachment and Release (SEPAR), where the probe element surfaces, i.e., sample presenting means, are designed or modified to contain one or more types of chemically defined cross-linking molecules to serve as covalent docking devices. The chemical specificities determining the type and number of the photolabile molecule attachment points between the SEPAR sample presenting means (i.e., probe element surface) and the analyte (e.g., protein) may involve any one or more of a number of different residues or chemical structures in the analyte (e.g., His, Lys, Arg, Tyr, Phe and Cys residues in the case of proteins and peptides).
[0158] Functionalizing Polypeptides. A polypeptide of interest can also be modified to facilitate conjugation to a solid support. A chemical or physical moiety can be incorporated into the polypeptide at an appropriate position. For example, a polypeptide of interest can be modified by adding an appropriate functional group to its carboxyl terminus or amino terminus, or to an amino acid in the peptide (e.g., to a reactive side chain or to the peptide backbone). The artisan will recognize, however, that such a modification, e.g., the incorporation of a biotin moiety, can affect the ability of a particular reagent to interact specifically with the polypeptide and will therefore consider this factor, if relevant, in deciding how best to modify a polypeptide of interest. A naturally occurring amino acid normally present in the polypeptide can also contain a functional group suitable for conjugating the polypeptide to the solid support. For example, a cysteine residue present in the polypeptide can be used to conjugate the polypeptide, through a disulfide linkage, to a support containing a sulfhydryl group, e.g., a support having cysteine residues attached thereto.
Other bonds that can be formed between two amino acids include, but are not limited to, monosulfide bonds between two lanthionine residues, which are non-naturally occurring amino acids that can be incorporated into a polypeptide; a lactam bond formed by a transamidation reaction between the side chains of an acidic amino acid and a basic amino acid, such as between the γ-carboxyl group of Glu (or alpha carboxyl group of Asp) and the amino group of Lys; or a lactone bond produced, e.g., by a crosslink between the hydroxy group of Ser and the carboxyl group of Glu (or alpha carboxyl group of Asp). Thus, a solid support can be modified to contain a desired amino acid residue, e.g., a Glu residue, and a polypeptide having a Ser residue, particularly at the N-terminus or C-terminus, can be conjugated to the solid support through formation of a lactone bond. Where a lactone-like bond with a Ser in the polypeptide is desired, the support need not be modified to contain the particular amino acid, e.g., Glu; it can instead be modified to contain an accessible carboxyl group, which provides a function corresponding to the alpha carboxyl group of Glu.
[0159] Thiol-Reactive Functionalities. A thiol-reactive functionality is particularly useful for conjugating a polypeptide to a solid support. A thiol-reactive functionality is a chemical group that can react rapidly with a nucleophilic thiol moiety to produce a covalent bond, e.g., a disulfide bond or a thioether bond. A variety of thiol-reactive functionalities are known in the art, including, e.g., haloacetyls, such as iodoacetyl; diazoketones; epoxy ketones; alpha- and beta-unsaturated carbonyls, such as alpha-enones and beta-enones, and other reactive Michael acceptors, such as maleimide; acid halides; benzyl halides; and the like. See Greene & Wuts, Protective Groups in Organic Synthesis, 2nd Edition (John Wiley & Sons, 1991).
[0160] If desired, the thiol groups can be blocked with a photocleavable protecting group, which can then be selectively cleaved, e.g., by photolithography, to provide portions of a surface activated for immobilization of a polypeptide of interest. Photocleavable protecting groups are known in the art (see, e.g., published International PCT Application No. WO 92/10092; and McCray et al., Ann. Rev. Biophys. Biophys. Chem. 18: 239-270 (1989)) and can be selectively deblocked by irradiating selected areas of the surface through, e.g., a photolithography mask.
[0161] Linkers. A polypeptide of interest can be attached to a support via a linker. Any linker known to those of skill in the art to be suitable for linking peptides or amino acids to supports, either directly or via a spacer, may be used. For example, the polypeptide can be conjugated to a support, such as a bead, by means of a variable spacer. Linkers include Rink amide linkers (see, e.g., Rink, Tetrahedron Lett. 28: 3787 (1976)); trityl chloride linkers (see, e.g., Leznoff, Acc. Chem. Res. 11: 327 (1978)); and Merrifield linkers (see, e.g., Bodansky et al., Peptide Synthesis, 2nd Edition (Academic Press, New York, 1976)). Trityl linkers are known, see, e.g., U.S. Pat. Nos. 5,410,068 and 5,612,474, as are amino trityl linkers, see, e.g., U.S. Pat. No. 5,198,531. Other linkers include those that can be incorporated into fusion proteins and expressed in a host cell. Such linkers may be selected amino acids, enzyme substrates or any suitable peptide. The linker may be made, e.g., by appropriate selection of primers when isolating the nucleic acid. Alternatively, linkers may be added by post-translational modification of the protein of interest. Linkers suitable for chemically linking peptides to supports include disulfide bonds, thioether bonds, hindered disulfide bonds and covalent bonds between free reactive groups, such as amine and thiol groups.
[0162] Cleavable Linkers. A linker can provide a reversible linkage that is cleaved under selected conditions. In particular, selectively cleavable linkers are useful, including photocleavable linkers (see U.S. Pat. No. 5,643,722), acid-cleavable linkers (see Fattom et al., Infect. Immun. 60: 584-589 (1992)), acid-labile linkers (see Welhöner et al., J. Biol. Chem. 266: 4309-4314 (1991)) and heat-sensitive linkers. A linkage can be, e.g., a disulfide bond, which is chemically cleavable by mercaptoethanol or dithioerythritol; a biotin/streptavidin linkage, which can be photocleavable; a heterobifunctional derivative of a trityl ether group, which can be cleaved by exposure to acidic conditions or under conditions of MS (see Köster et al., Tetrahedron Lett. 31: 7095 (1990)); a levulinyl-mediated linkage, which can be cleaved under almost neutral conditions with a hydrazinium/acetate buffer; an arginine-arginine or lysine-lysine bond, either of which can be cleaved by an endopeptidase such as trypsin; a pyrophosphate bond, which can be cleaved by a pyrophosphatase; or a ribonucleotide bond, which can be cleaved using a ribonuclease or by exposure to alkaline conditions. A photolabile cross-linker, such as 3-amino-(2-nitrophenyl)propionic acid, can be employed as a means of cleaving a polypeptide from a solid support. See Brown et al., Mol. Divers., pp. 4-12 (1995); Rothschild et al., Nucl. Acids Res. 24: 351-66 (1996); and U.S. Pat. No. 5,643,722. Other linkers include RNA linkers that are cleavable by ribozymes and other RNA enzymes, and linkers such as the various domains (e.g., CH1, CH2 and CH3) from the constant region of human IgG1. See Batra et al., Mol. Immunol. 30: 379-396 (1993).
[0163] Combinations of any linkers are also contemplated herein.
For example, a linker that is cleavable under MS conditions, such as a silyl linkage or photocleavable linkage, can be combined with a linker, such as an avidin-biotin linkage, that is not cleaved under these conditions but may be cleaved under other conditions. Acid-labile linkers are particularly useful chemically cleavable linkers for mass spectrometry, especially for MALDI-TOF, because the acid-labile bond is cleaved during conditioning of the target polypeptide upon addition of a 3-HPA matrix solution. The acid-labile bond can be introduced as a separate linker group, e.g., an acid-labile trityl group, or can be incorporated into a synthetic linker by introducing one or more silyl bridges using diisopropylsilyl, thereby forming a diisopropylsilyl linkage between the polypeptide and the solid support. The diisopropylsilyl linkage can be cleaved under mildly acidic conditions, such as 1.5% trifluoroacetic acid (TFA) or 3-HPA/1% TFA MALDI-TOF matrix solution. Methods for preparing diisopropylsilyl linkages and analogues thereof are well known in the art. See, e.g., Saha et al., J. Org. Chem. 58: 7827-7831 (1993).
[0164] Use of a Pin Tool to Immobilize a Polypeptide. Immobilizing a polypeptide of interest on a solid support using a pin tool can be particularly advantageous. Pin tools include those disclosed herein or otherwise known in the art. See, e.g., U.S. Application Serial Nos. 08/786,988 and 08/787,639; and International PCT Application No. WO 98/20166.
[0165] A pin tool in an array, e.g., a 4 x 4 array, can be applied to wells containing polypeptides of interest. Where the pin tool has a functional group attached to each pin tip, or a solid support (e.g., functionalized beads or paramagnetic beads) is attached to each pin, the polypeptides in a well can be captured (1 pmol capacity). During the capture step, the pins can be kept in motion (vertical, 1-2 mm travel) to increase capture efficiency. Where a reaction, such as an in vitro transcription, is being performed in the wells, movement of the pins can increase the efficiency of the reaction. Further immobilization can be achieved by applying an electrical field to the pin tool: when a voltage is applied, the polypeptides are attracted to the anode or the cathode, depending on their net charge.
[0166] For greater specificity, the pin tool (with or without voltage) can be modified to have conjugated to it a reagent specific for the polypeptide of interest, such that only the polypeptides of interest are bound by the pins. For example, the pins can carry attached nickel ions, such that only polypeptides containing a polyhistidine sequence are bound. Similarly, antibodies specific for a target polypeptide can be attached to the pins, or to beads that are in turn attached to the pins, such that only target polypeptides containing the epitope recognized by the antibody are bound by the pins.
[0167] Captured polypeptides can be analyzed by a variety of means including, e.g., spectrometric techniques such as UV/VIS, IR, fluorescence, chemiluminescence, NMR spectroscopy, MS or other methods known in the art, or combinations thereof. If conditions preclude direct analysis of captured polypeptides, the polypeptides can be released or transferred from the pins under conditions that preserve the advantages of sample concentration. Accordingly, the polypeptides can be removed from the pins using a minimal volume of eluent and without loss of sample. Where the polypeptides are bound to beads attached to the pins, the beads carrying the polypeptides can be removed from the pins and measurements made directly from the beads.
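Whether a polypeptide moves toward the anode or the cathode when voltage is applied to the pin tool depends on the sign of its net charge at the working pH. The minimal sketch below estimates that charge with the Henderson-Hasselbalch equation. The pKa values are an assumed textbook-style set (published sets differ noticeably), the peptides are hypothetical, and the function names are illustrative only.

```python
# Approximate pKa values (an assumed textbook-style set; published sets differ).
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM = 9.0
PKA_C_TERM = 2.0


def positive_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a basic group that is protonated (+1)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def negative_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acidic group that is deprotonated (-1)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def net_charge(sequence: str, ph: float) -> float:
    """Estimated net charge of a linear peptide, termini included."""
    charge = positive_fraction(PKA_N_TERM, ph) - negative_fraction(PKA_C_TERM, ph)
    for residue in sequence.upper():
        if residue in PKA_BASIC:
            charge += positive_fraction(PKA_BASIC[residue], ph)
        elif residue in PKA_ACIDIC:
            charge -= negative_fraction(PKA_ACIDIC[residue], ph)
    return charge


def attracting_electrode(sequence: str, ph: float) -> str:
    """Cations move toward the cathode, anions toward the anode."""
    charge = net_charge(sequence, ph)
    if charge > 0:
        return "cathode"
    if charge < 0:
        return "anode"
    return "neither"


if __name__ == "__main__":
    # Hypothetical peptides, including one with a C-terminal hexahistidine tag.
    for peptide in ("GSKRKRK", "DEDEDGS", "MGSSHHHHHH"):
        print(peptide, round(net_charge(peptide, 7.4), 2), attracting_electrode(peptide, 7.4))
```

Because histidine has a side-chain pKa near neutrality in this assumed set, the net charge of a His-tagged peptide is sensitive to small pH changes, which is worth keeping in mind when combining voltage with nickel-ion capture.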
[0168] Pin tools can be useful for immobilizing polypeptides of interest in a spatially addressable manner on an array. Such spatially addressable or pre-addressable arrays are useful in a variety of processes, including, for example, quality control and amino acid sequencing diagnostics. The pin tools described in U.S. Application Serial Nos. 08/786,988 and 08/787,639 and International PCT Application No. WO 98/20166 are serial and parallel dispensing tools that can be employed to generate multi-element arrays of polypeptides on the surface of a solid support. The array surface can be flat, can carry beads, or can be geometrically altered to include wells, which can contain beads. In addition, MS geometries can be adapted to accommodate a pin tool apparatus.
[0169] Other Aspects of the Biological State. In various embodiments of the invention, aspects of the biological activity state, or mixed aspects can be measured in order to obtain drug and pathway responses. The activities of proteins relevant to the characterization of cell function can be measured, and embodiments of this invention can be based on such measurements. Activity measurements can be performed by any functional, biochemical or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with natural substrates, and the rate of transformation measured. Where the activity involves association in multimeric units, e.g., association of an activated DNA binding complex with DNA, the amount of associated protein or secondary consequences of the association, such as amounts of mRNA transcribed, can be measured. Also, where only a functional activity is known, e.g., as in cell cycle control, performance of the function can be observed. However known and measured, the changes in protein activities form the response data analyzed by the methods of this invention. In alternative and non-limiting embodiments, response data may be formed of mixed aspects of the biological state of a cell. Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances and changes in certain protein activities.
[0170] The following EXAMPLES are presented in order to more fully illustrate the preferred embodiments of the invention. These EXAMPLES should in no way be construed as limiting the scope of the invention, as defined by the appended claims.
EXAMPLE I
BIOINFORMATICS ANALYSIS OF ERBB2 MUTATIONS
[0171] Identification of ERBB2 Mutations in Breast Cancer. To identify ERBB2 mutations associated with cancer, DHPLC analysis (Lilleberg SL, Curr. Opin. Drug Discov. Devel., 6(2): 237-52 (March 2003)) was conducted on test samples derived from human tissues, e.g., breast cancer, as summarized in TABLE 3 below. Specifically, forty-five (45) tumour tissue samples from breast cancer patients were analyzed. Six (6) SNPs and six (6) mutations of ERBB2 were identified, as detailed below in TABLE 3 (see also TABLE 1 and TABLE 2, supra).
TABLE 3
ERBB2 Mutations in Breast Cancer Patients
Mutation | Amino Acid Change | Amino Acid Position | Identifier
TTT>CTT | F>L | 371 | F371L
SA | - | - | intronic
G>A, 14 bp 5' of exon 9 | - | - | intronic
TGC>AGC | C>S | 475 | C475S
CCA>CCG | - | 534 | P534P
CCC>CCA | - | 562 | P562P
CGC>CAG | R>Q | 868 | R868Q
CCC>TCC | P>S | 856 | P856S
GCT>TCT | A>S | 848 | A848S
GAC>AAC | D>N | 873 | D873N
GGG>AGG | G>R | 1015 | G1015R
CCC>GCC | P>A | 1170 | P1170A
SA: acceptor site
[0172] Computational Analysis of ERBB2 Mutations. The ERBB2 mutations identified in human cancer were analyzed using computational analysis tools to determine the effect(s) of these mutations on ERBB2 function. Only missense mutations and non-synonymous SNPs were included in the analysis.
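The filter described above (keep missense changes and non-synonymous SNPs, drop synonymous ones such as P534P) can be sketched by translating each codon change with the standard genetic code. This is a minimal illustration: the function name is hypothetical, and the example uses only codon changes from TABLE 3 that are printed legibly in the source.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(change: str) -> tuple[str, str, str]:
    """Classify a change written like 'TTT>CTT' as synonymous, missense or nonsense."""
    ref, alt = (codon.strip().upper() for codon in change.split(">"))
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    for change in ("TTT>CTT", "CCA>CCG", "GCT>TCT", "GGG>AGG"):
        ref_aa, alt_aa, kind = classify_codon_change(change)
        print(f"{change}: {ref_aa}>{alt_aa} ({kind})")
```

Only entries classified as missense would pass on to the domain and regulatory-site analyses; intronic variants have no codon and are outside the scope of this sketch.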
[0173] Comparison of Known ERBB2 Mutations and SNPs with the ERBB2 Mutations and SNPs of the Present Invention. Two (2) SNPs in the coding region of ERBB2 were reported in the SNP database (dbSNP: http://www.ncbi.nlm.nih.gov/SNP/index.html; see TABLE 4). In addition, numerous deletion and substitution mutations of ERBB2 in cancer have been reported. None of the previously reported SNPs or mutations within the coding region of ERBB2 was identified in this study.
TABLE 4
Known ERBB2 SNPs or mutations in the coding region
Identifier | Variation type | Cancer Type/Disease
S297I | SNP: rs1803385 | N/A
A78A | SNP: rs202641 | N/A
V655I | Substitution | N/A
V654I | Substitution | N/A
774insAYVM | Insertion/duplication | N/A
779insVGS | Insertion/duplication | N/A
L755P | Substitution | N/A
E914K | Substitution | N/A
G776S | Substitution | N/A
N857S | Substitution | N/A
775insYNMA | Insertion/duplication | NSCLC
G776V+776insC | Substitution+insertion | NSCLC
G776L+776insC | Substitution+insertion | NSCLC
780insGSP | Insertion/duplication | NSCLC
ins: insertion
[0174] Analysis of the Effect of ERBB2 Mutations on ERBB2 Protein Domain Structure and Function. Pfam Analysis of the Potential Effect of the ERBB2 Mutations on ERBB2 Protein Domain Structure. The effect of the ERBB2 mutations on the protein domain structure of ERBB2 was analyzed using the Pfam computational analysis tool. Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein families, based on the Swissprot 44.5 and SP-TrEMBL 27.5 protein sequence databases. Only the non-synonymous SNPs and missense mutations F371L, C475S, R868Q, P856S, A848S, D873N, G1015R and P1170A were selected for analysis. Pfam analysis (http://pfam.wustl.edu/) indicates that ERBB2 contains six structural domains (see TABLE 5).
TABLE 5
Protein domain structure of ERBB2
Model | Sequence From | Sequence To | Description | SNP/Mutation
Recep_L_domain | 52 | 173 | Receptor L domain |
Furin-like | 189 | 343 | Furin-like cysteine rich region |
Recep_L_domain | 366 | 486 | Receptor L domain | F371L, C475S
Pkinase_Tyr | 720 | 976 | Protein tyrosine kinase | A848S, R868Q, D873N
YLP | 1020 | 1028 | YLP motif |
YLP | 1193 | 1201 | YLP motif |
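Locating each substitution within the domain boundaries of TABLE 5 is an interval lookup. The minimal sketch below copies the boundaries from TABLE 5 (1-based, inclusive) and reports, for each analyzed substitution, which listed interval contains its residue number. The helper names are hypothetical; the sketch does not rerun Pfam or judge conservation.

```python
import re

# Domain boundaries (1-based, inclusive) copied from TABLE 5.
PFAM_DOMAINS = [
    ("Recep_L_domain", 52, 173),
    ("Furin-like", 189, 343),
    ("Recep_L_domain", 366, 486),
    ("Pkinase_Tyr", 720, 976),
    ("YLP", 1020, 1028),
    ("YLP", 1193, 1201),
]

MUTATIONS = ["F371L", "C475S", "A848S", "P856S", "R868Q", "D873N", "G1015R", "P1170A"]


def residue_position(change: str) -> int:
    """Extract the residue number from a substitution such as 'D873N'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", change)
    if match is None:
        raise ValueError(f"unrecognized substitution: {change!r}")
    return int(match.group(1))


def domains_hit(change: str) -> list[str]:
    """Names and bounds of every TABLE 5 interval containing the mutated residue."""
    pos = residue_position(change)
    return [f"{name} {start}-{end}" for name, start, end in PFAM_DOMAINS if start <= pos <= end]


if __name__ == "__main__":
    for change in MUTATIONS:
        print(change, domains_hit(change) or ["no Pfam domain"])
```

G1015R and P1170A fall outside every interval in TABLE 5, so any effect they have would be expected through regulatory sites rather than domain structure, which is what the NetPhos analysis addresses.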
[0175] In more detail, TABLES 6 to 9 show the alignments of the wild-type ERBB2 sequence with the Pfam models of each domain. Two of the mutations identified in the present invention are found in the receptor L domain at positions 366-486, and three of them are found in the protein tyrosine kinase domain. A848 and R868 are highly conserved in the protein tyrosine kinase domain. Mutated positions are highlighted in bold and underlined text. An amino acid change in a highly conserved region may alter the protein structure and hence the protein function.
[0176] Sequence alignment of the wild-type human ERBB2 polypeptide sequence with the Pfam receptor L domain sequence is summarized below in TABLE 6. The receptor L domain is located in the wild-type ERBB2 polypeptide at amino acid positions 52 to 173 (TABLE 6; ERBB2; score 154.9, E = 1.9e-43).
TABLE 6
Sequence Alignment Comparison of Human ERBB2 from AA 52 to AA 173 with Pfam Model of Receptor L Domain
*->nCtvIeGnLeItlrsengdkkwfsniedeleldseledlssLsniee
+C+V+ GnLe+t+ +n +ls+L+ i+e
ERBB2 52 GCQVVQGNLELTYLPTN ASLSFLQDIQE 79
itGyLlIyrtpgnlvslsFLpNLrvIrGrnlfddsntdnyalvildNpnl
+ Gy+ll+++ v +L++Lr++rG++lf+d nyal++ldN +
ERBB2 80 VQGYVLIAHN QVRQVPLQRLRIVRGTQLFED NYALAVLDNGDP 122
nkss sGLeeLglpsLkeltskgGgvyihnNpHPkLCytetei
+++++ ++ +++GL+eL+l+sL+eI+ Ggv i++Np +LCy +t i
ERBB2 123 LNNTtpvtgaspGGLRELQLRSLTEILK--GGVLIQRNP--QLCYQDT-I 167
dwflit<-*
+w++i+
ERBB2 168 LWKDIF 173
SEQ ID NO:25
[0177] Sequence alignment of the wild-type human ERBB2 polypeptide sequence with the Pfam furin-like cysteine rich region (Furin-like domain) sequence is summarized below in TABLE 7. The furin-like domain is located in the wild-type ERBB2 polypeptide at amino acid position 189 to 343 (TABLE 7; ERBB2; score 323.4, E = 3.4e-94).
TABLE 7
Sequence Alignment Comparison of Human ERBB2 from AA 189 to AA 343 with Pfam Model of Furin-Like Cysteine Rich Region
*->greCpkvChGTleakGesCkkttiNGefdyRCWGsgpedCQklTKlv
+r+C++ C ++Ck ++ RCWG+++edCQ lT++V
ERBB2 189 SRACHP-CS PMCK-GS RCWGESSEDCQSLTRTV 219
CpsqCsgGrrCtgpnptdCCHeeCaGGCTGHGPkdPtDClACRhFyddGi
C+ +C+ rC+gp+ptdCCHe+Ca+GCT GPk+ +DClAC+hF+++Gi
ERBB2 220 CAGGCA RCKGPLPTDCCHEQCAAGCT--GPKH-SDCLACLHFNHSGI 263
CketCPpptyynedTrqvdfNPegkYqfGasCVkeCPsnylvthnGsCvr
C+ +CP++++yn+dT+++++NPeg+Y+fGasCV++CP+nyl+t++GsC++
ERBB2 264 CELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACPYNYLSTDVGSCTL 313
sCPsgHktevgAesGvreCekCReGpCPKvCe<-*
+CP + ++ev+Ae+G+++CekC ++pC++vC+
ERBB2 314 VCPLH-NQEVTAEDGTQRCEKC-SKPCARVCY 343
SEQ ID NO:26
[0178] Sequence alignment of the wild-type human ERBB2 polypeptide sequence with the second Pfam receptor L domain sequence is summarized below in TABLE 8. The second receptor L domain is located in the wild-type ERBB2 polypeptide at amino acid positions 366 to 486 (TABLE 8; ERBB2; score 113.3, E = 6.1e-31).
TABLE 8
Sequence Alignment Comparison of Human ERBB2 from AA 366 to AA 486 with Pfam Model of Receptor L Domain
*->nCtvIeGnLeItlrsengdkkwfsniedeleldseledlssLsniee
+C +I-G+L ++++S+ gd-r sn++++ ++ l +++++ee
ERBB2 366 GCKKIFGSLAFLPESFDGDP--ASNTAPLQPE QLQVFETLEE 405
itGyLlIyrtpgnlvslsFLpNLrvIrGrnlfddsntdnyalvildNpnl
itGyL+l +p +l +ls+++NL+vIrGr l+++ y+l
ERBB2 406 ITGYLYISAWPDSLPDLSVFQNLQVIRGRILHNG AYSLTLQGLG-- 449
nksssGLeeLglpsLkeltskgGgvyihnNpHPkLCyteteidwflit<-*
++ Lgl+sL+e+ s G ih+N LC+++t + w++++
ERBB2 450 ISWLGLRSLRELGS--GLALIHHNT--HLCFVHT-VPWDQLF 486
SEQ ID NO:27
[0179] Sequence alignment of the wild-type human ERBB2 polypeptide sequence with the Pfam protein tyrosine kinase (Pkinase_Tyr) domain sequence is summarized below in TABLE 9. The protein tyrosine kinase domain is located in the wild-type ERBB2 polypeptide at amino acid positions 720 to 976 (TABLE 9; ERBB2; score 453.8, E = 1.9e-133).
TABLE 9
Sequence Alignment Comparison of Human ERBB2 from AA 720 to AA 976 with Pfam Model of Protein Tyrosine Kinase Domain
*->lklgkkLGeGaFGeVykGtlkgsgegtkikVAVKtLkeigasseeig
l++ k+LG+GaFG+VykG+4- ++ge+ ki+VA+K+L+e ++S+++
ERBB2 720 LRKVKVLGSGAFGTVYKGIWI PDGENVKIPVAIKVLRE-NTSPKA-- 763
redFlrEAsiMkklGdHpNiVrLlGvctkegePggpglyiVtEymegGdL
++ l EA +M p + rLlG+c+ + +Vt +m++G+L
ERBB2 764 NKEILDEAYVMAGV-GSPYVSRLLGICL--TST VQLVTQLMPYGCL 806
idfLrkhregrpLtlkdLlsfalQiAkGMeYLesknfvHRDLAARNcLVs
Id-l- r++ +^+L +dLl++++QiAkGM+YLe+ ++VHRDLAARN+LV
ERBB2 807 LDHVREN--RGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVK 854
enlvVKIsDFGLaRdiynddyYvrkkgggklPvkWmAPEslkygkFtskS
++++VKI+DFGLaR+++ d+ + +ggk+P+kWmA+Es+ ++Ft++S
ERBB2 855 SPNHVKITDFGLARLLDIDETE-YHADGGKVPIKWMALESILRRRFTHQS 903
DVWSFGVlLWEiftlGeqPFYpgmsneevlellyedGyRLprPenCPdel
DVWS+GV+ WE-r+t+G++P Y g++ e+ +ll eAG+RLp+P+ C+ +-r
ERBB2 904 DVWSYGVTVWELMTFGAKP-YDGIPAREIPDLL-EKGERLPQPPICTIDV 951
YdlMlqCWaedPedRPtFselverL<-*
Y++M +CW+ d e RP+F+elv+++
ERBB2 952 YMIMVKCWMIDSECRPRFRELVSEF 976
SEQ ID NO:28
[0180] Analysis of the Potential Effect of ERBB2 Mutations on ERBB2 Protein Regulatory Sites. NetPhos Analysis of the Effect of ERBB2 Mutations on ERBB2 Protein Phosphorylation. Known phosphorylation sites of ERBB2 include T686, Y1139, Y1196, Y1221, Y1222 and Y1248. Potential phosphorylation sites on serine, threonine and tyrosine were identified by computational analysis using the NetPhos computational analysis tool (http://www.cbs.dtu.dk/services/NetPhos/). NetPhos produces neural network predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins (Blom et al., J. Mol. Biol. 294(5): 1351-1362 (1999)). Potential ERBB2 phosphorylation sites predicted by NetPhos are summarized below in TABLE 10. With the exception of Y1221, all of the published phosphorylation sites were identified as predicted phosphorylation sites by the software in these studies. NetPhos analysis of ERBB2 indicated additional serine, threonine and tyrosine phosphorylation sites in the ERBB2 polypeptide. A threshold score of 0.5 was required for a site to be considered a potential phosphorylation site. The predicted phosphorylation sites that represent possible mutation interference are underlined in TABLE 10. P1170 is close to a potential serine phosphorylation site, S1174, and a potential threonine phosphorylation site, T1172. D873 is close to a potential threonine phosphorylation site, T875, and a potential tyrosine phosphorylation site, Y877.
TABLE 10
ERBB2 phosphorylation sites predicted by NetPhos
Phosphorylation | Positions
Serine | 38, 133, 189, 208, 214, 250, 281, 418, 602, 633, 649, 653, 760, 779, 834, 998, 1007, 1050, 1051, 1054, 1066, 1073, 1078, 1113, 1122, 1151, 1174, 1235
Threonine | 23, 328, 686, 701, 733, 759, 875, 900, 948, 1172, 1198, 1236
Tyrosine | 83, 112, 289, 590, 772, 877, 1023, 1127, 1139, 1196, 1222, 1248
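The proximity argument in the preceding paragraph (P1170 near S1174 and T1172; D873 near T875 and Y877) can be made explicit by searching the TABLE 10 positions around each mutated residue. In the minimal sketch below, the window of +/-4 residues is an assumption chosen for illustration, not a parameter stated in the source, and the function name is hypothetical.

```python
# Predicted sites copied from TABLE 10 (NetPhos score >= 0.5).
PREDICTED_SITES = {
    "S": [38, 133, 189, 208, 214, 250, 281, 418, 602, 633, 649, 653, 760, 779, 834,
          998, 1007, 1050, 1051, 1054, 1066, 1073, 1078, 1113, 1122, 1151, 1174, 1235],
    "T": [23, 328, 686, 701, 733, 759, 875, 900, 948, 1172, 1198, 1236],
    "Y": [83, 112, 289, 590, 772, 877, 1023, 1127, 1139, 1196, 1222, 1248],
}

MUTATED_POSITIONS = [371, 475, 848, 856, 868, 873, 1015, 1170]


def sites_near(position: int, window: int = 4) -> list[str]:
    """Predicted sites within +/- window residues of a mutated position, in sequence order."""
    hits = [
        (site, residue)
        for residue, sites in PREDICTED_SITES.items()
        for site in sites
        if site != position and abs(site - position) <= window
    ]
    return [f"{residue}{site}" for site, residue in sorted(hits)]


if __name__ == "__main__":
    for pos in MUTATED_POSITIONS:
        print(pos, sites_near(pos))
```

With this assumed window, only positions 873 and 1170 have predicted sites nearby, which matches the two cases singled out in the text; a wider window would flag more candidates.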
[0181] To evaluate the effect of the ERBB2 mutations on the phosphorylation pattern, each mutant sequence was analyzed by NetPhos. According to NetPhos, a site is predicted to be a potential phosphorylation site if its score is greater than or equal to 0.5; the higher the score, the greater its potential to be a phosphorylation site. As shown in TABLE 11, D873N eliminates the potential threonine phosphorylation site T875 and decreases the phosphorylation potential of Y877. P1170A slightly decreases the phosphorylation potential of S1174 and increases the phosphorylation potential of T1166 and T1172.
TABLE 11
Change in Phosphorylation Pattern by Mutations
Site | Wild type | F371L | C475S | A848S | P856S | R868Q | D873N | G1015R | P1170A
S1174 | 0.978 | 0.978 | 0.978 | 0.978 | 0.978 | 0.978 | 0.978 | 0.978 | 0.976
T1166 | 0.393 | 0.393 | 0.393 | 0.393 | 0.393 | 0.393 | 0.393 | 0.393 | 0.522
T1172 | 0.808 | 0.808 | 0.808 | 0.808 | 0.808 | 0.808 | 0.808 | 0.808 | 0.874
T875 | 0.515 | 0.515 | 0.515 | 0.515 | 0.515 | 0.515 | 0 | 0.515 | 0.515
Y877 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.965 | 0.971 | 0.971
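The comparison in TABLE 11 can be expressed as a small function that applies the 0.5 NetPhos cut-off from the text to wild-type and mutant scores and labels each changed site. The scores below are copied from TABLE 11 for the two mutations that alter any score; the labels "lost", "gained" and "changed" are illustrative terms introduced here, not NetPhos output.

```python
THRESHOLD = 0.5  # NetPhos cut-off stated in the text

# NetPhos scores copied from TABLE 11.
WILD_TYPE = {"S1174": 0.978, "T1166": 0.393, "T1172": 0.808, "T875": 0.515, "Y877": 0.971}
MUTANTS = {
    "D873N": {"S1174": 0.978, "T1166": 0.393, "T1172": 0.808, "T875": 0.0, "Y877": 0.965},
    "P1170A": {"S1174": 0.976, "T1166": 0.522, "T1172": 0.874, "T875": 0.515, "Y877": 0.971},
}


def compare_to_wild_type(mutant: dict[str, float], threshold: float = THRESHOLD) -> dict[str, tuple[str, float]]:
    """Label each site whose score changes: lost, gained, or changed (same call)."""
    report = {}
    for site, wt_score in WILD_TYPE.items():
        mut_score = mutant[site]
        if mut_score == wt_score:
            continue  # unchanged sites are not reported
        wt_call, mut_call = wt_score >= threshold, mut_score >= threshold
        if wt_call and not mut_call:
            status = "lost"
        elif mut_call and not wt_call:
            status = "gained"
        else:
            status = "changed"
        report[site] = (status, round(mut_score - wt_score, 3))
    return report


if __name__ == "__main__":
    for name, scores in MUTANTS.items():
        print(name, compare_to_wild_type(scores))
```

Note that under the stated cut-off T1166 moves from below 0.5 (0.393) to above it (0.522) in P1170A, so this sketch labels it "gained" rather than merely increased.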
[0182] Analysis of the Potential Effect of ERBB2 Mutations on ERBB2 Protein SUMO Regulatory Sites. Small ubiquitin-related modifier (SUMO) family proteins (a.k.a. PIC1, UBL1, Sentrin, GMP1 and Smt3) are covalently attached to select target proteins as a post-translational modification. Johnson, E. S., Ann. Rev. Biochem. 73: 355-382 (2004). SUMO is a member of a ubiquitin-like protein family that is ligated to lysine residues in a variety of target proteins, modulating their functions. SUMO modification is reversible and does not appear to target proteins for degradation; rather, it alters target protein function through changes in cellular localization, biochemical activation, or protection from ubiquitin-dependent degradation. Post-translational modification via sumoylation influences numerous biological processes, including signal transduction, transcriptional regulation and growth control. Shiio and Eisenmann have demonstrated that the DNA-binding histone proteins are subject to sumoylation. Shiio & Eisenmann, Proc. Natl. Acad. Sci. U.S.A. 100(23): 13225-30 (2003).
[0183] The potential ERBB2 sumoylation sites were identified by computational analysis using the SUMOplot™ computational analysis tool. Hinsley et al., Protein Sci. 13: 2588-2599 (2004); Van Dyck et al., J. Biol. Chem. 279: 36121-36131 (2004). SUMOplot™ predicts the probability that a SUMO consensus sequence (SUMO-CS) is engaged in SUMO attachment. Most SUMO-modified proteins contain the tetrapeptide motif B-K-x-D/E, where B is a hydrophobic residue, K is the lysine conjugated to SUMO, x is any amino acid (aa), and D or E is an acidic residue. Substrate specificity appears to be derived directly from Ubc9 and the respective substrate motif. The SUMOplot™ scoring system is based on two criteria: 1) direct amino acid match to the SUMO-CS observed and shown to bind Ubc9, and 2) substitution of the consensus amino acid residues with amino acid residues of similar hydrophobicity. No SUMO modification has been reported for ERBB2. Potential SUMO modification sites predicted by SUMOplot™ are summarized below in TABLE 12 and TABLE 13. ERBB2 mutation P856S of the present invention is predicted to eliminate a potential SUMO modification site at K854.
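The first SUMOplot™ criterion, a direct match to the B-K-x-D/E consensus, can be illustrated with a regular-expression scan. This minimal sketch is not SUMOplot™: it ignores the second criterion (hydrophobicity-similar substitutions) and produces no scores, so it will find fewer sites than TABLE 13. The set of residues accepted as hydrophobic "B" is an assumption, and the function name is hypothetical.

```python
import re

# Assumption: residues accepted at the hydrophobic "B" position of B-K-x-D/E.
HYDROPHOBIC = "AILMFVW"
# Zero-width lookahead so that overlapping motifs are all reported.
SUMO_CONSENSUS = re.compile(rf"(?=[{HYDROPHOBIC}]K.[DE])")


def consensus_lysines(sequence: str) -> list[int]:
    """1-based positions of lysines that sit in a strict B-K-x-D/E motif."""
    # match.start() is the 0-based index of B; the lysine follows it.
    return [match.start() + 2 for match in SUMO_CONSENSUS.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical peptides plus the K1177 group from TABLE 13.
    for peptide in ("MAVKLEGG", "VKIEVKAD", "KTLSPGKNGVVKDV"):
        print(peptide, consensus_lysines(peptide))
```

The K1177 group (KTLSPGKNGVVKDV) contains no strict consensus match, which illustrates why SUMOplot™ also credits near-matches when assigning its intermediate scores.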
[0184] The amino acid sequence of the ERBB2 wild-type polypeptide (SEQ ID NO:29) is shown below in TABLE 12. Bold text with grey shading designates a sumoylation motif. Bold, underlined text with grey shading designates a sumoylation motif of high probability.
TABLE 12
Sumoylation Motifs in Wild-type ERBB2
1 MELAALCRWG LLLALLPPGA ASTQVCTGTD MKLRLPASPE THLDMLRHLY
51 QGCQVVQGNL ELTYLPTNAS LSFLQDIQEV QGYVLIAHNQ VRQVPLQRLR
101 IVRGTQLFED NYALAVLDNG DPLNNTTPVT GASPGGLREL QLRSLTEILK
151 GGVLIQRNPQ LCYQDTILWK DIFHKNNQLA LTLIDTNRSR ACHPCSPMCK
201 GSRCWGESSE DCQSLTRTVC AGGCARCKGP LPTDCCHEQC AAGCTGPKHS
251 DCLACLHFNH SGICELHCPA LVTYNTDTFE SMPNPEGRYT FGASCVTACP
301 YNYLSTDVGS CTLVCPLHNQ EVTAEDGTQR CEKCSKPCAR VCYGLGMEHL
351 REVRAVTSAN IQEFAGCJKKΪ fjGSLAFLPES FDGDPASNTA PLQPEQLQVF
401 ETLEEITGYL YISAWPDSLP DLSVFQNLQV IRGRILHNGA YSLTLQGLGI
451 SWLGLRSLRE LGSGLALIHH NTHLCFVHTV PWDQLFRNPH QALLHTANRP
501 EDECVGEGLA CHQLCARGHC WGPGPTQCVN CSQFLRGQEC VEECRVLQGL
551 PREYVNARHC LPCHPECQPQ NGSVTCFGPE ADQCVACAHY KDPPFCVARC
601 PSGKKgEjLSY MPlWKEgDEE GACQPCPINC THSCVDLDDK GCPAΞQRASP
651 LTSIISAVVG ILLVWLGW FGILIKRRQQ KIRKYTMRRL LQETELVJPL
701 TPSGAMPNQA QMRILKETEL RKVKVLGSGA FGTVYKGIWI PDGENJZregJjV
751 AIKVLRΞNTS PKANKEILDE AYVMAGVGSP YVSRLLGICL TSTVQLVTQL
801 MPYGCLLDHV RENRGRLGSQ DLLNWCMQIA KGMSYLEDVR LVHRDLAARN
851 VLyKSP1NHVK ITDFGLARLL DIDETEYHAD [KWMA LESILRRRFT
901 HQSDVWSYGV TVWELMTFGA KPYDGIPARE IPDLLJSKGgR LPQPPICTID
951 VYMIMVKCWM IDSECRPRFR ELVSEFSRMA RDPQRFVVIQ NEDLGPASPL
1001 DSTFYRSLLE DDDMGDLVDA EΞYLVPQQGF FCPDPAPGAG GMVHHRHRSS
1051 STRSGGGDLT LGLEPSEEEA PRSPLAPSEG AGSDVFDGDL GMGAAKGLQS
1101 LPTHDPSPLQ RYSEDPTVPL PSETDGYVAP LTCSPQPEYV NQPDVRPQPP
1151 SPREGPLPAA RPAGATLERP KTLSPgSjSv VKDVFAFGGA VENPEYLTPQ
1201 GGAAPQPHPP PAFSPAFDNL YYWDQDPPER GAPPSTFKGT PTAENPEYLG
1251 LDVPV
SEQ ID NO:29
[0185] The potential ERBB2 sumoylation sites and their relative scores are summarized below in TABLE 13. A site with a score above 0.5 is considered a potential sumoylation site of high probability.
TABLE 13
Potential Sumoylation Sites in ERBB2
ERBB2 Amino Acid Position    Group    SEQ ID NO:    Score
K605    RCPSG VKPD LSYMP    SEQ ID NO:30    0.93
K854    ARNVL VKSP NHVKI    SEQ ID NO:31    0.82
K747    PDGEN VKIP VAIKV    SEQ ID NO:32    0.82
K150    SLTEI LKGG VLIQR    SEQ ID NO:33    0.73
K883    YHADG GKVP IKWMA    SEQ ID NO:34    0.57
K615    SYMPI WKFP DEEGA    SEQ ID NO:35    0.54
K1177    KTLSP GKNG VVKDV    SEQ ID NO:36    0.50
K937    IPDLL EKGE RLPQP    SEQ ID NO:37    0.50
K369    EFAGC KKIF GSLAF    SEQ ID NO:38    0.13
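The scores in TABLE 13 can be cross-checked against the mutation list with a short Python sketch. It treats each site's four-residue motif window (K-1 to K+2) as the region a substitution could disrupt; that window definition is an assumption drawn from the B-K-x-D/E pattern, and the function names are illustrative. Reading "above 0.5" strictly, K1177 and K937 (both 0.50) fall outside the high-probability set.

```python
# Lysine position -> SUMOplot score, transcribed from TABLE 13.
SUMO_SCORES = {605: 0.93, 854: 0.82, 747: 0.82, 150: 0.73, 883: 0.57,
               615: 0.54, 1177: 0.50, 937: 0.50, 369: 0.13}

# ERBB2 mutations discussed in this section.
MUTATIONS = ["F371L", "C475S", "A848S", "P856S",
             "R868Q", "D873N", "G1015R", "P1170A"]


def high_probability(scores: dict[int, float], threshold: float = 0.5) -> list[int]:
    """Lysines whose score is strictly above the threshold, in sequence order."""
    return sorted(k for k, s in scores.items() if s > threshold)


def motifs_hit(mutation: str, scores: dict[int, float]) -> list[int]:
    """High-probability lysines whose B-K-x-D/E window (K-1..K+2) covers the mutation."""
    pos = int(mutation[1:-1])  # e.g. "P856S" -> 856
    return [k for k in high_probability(scores) if k - 1 <= pos <= k + 2]


for m in MUTATIONS:
    hits = motifs_hit(m, SUMO_SCORES)
    if hits:
        print(m, hits)  # only "P856S [854]" is printed
```

Only P856S falls inside a high-probability window (V853-K854-S855-P856), in line with the prediction in [0183] that it eliminates the K854 site.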
[0186] Analysis of the Potential Effect of ERBB2 Mutation on ERBB2 Protein O-GalNAc Glycosylation Sites. The potential O-GalNAc glycosylation sites of ERBB2 were predicted using NetOGlyc 3.1 (Julenius K., et al., Glycobiology, 15: 153-164 (2005)). NetOGlyc 3.1 predicts T1117, T1240 and T1242 of ERBB2 to be potential O-GalNAc glycosylation sites. None of the ERBB2 mutations identified in the present invention is located close to T1117, T1240 or T1242; thus none of them is likely to change the O-glycosylation pattern of ERBB2.
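The text does not define "close". A minimal sketch that makes the comparison concrete reports, for each mutation, the nearest predicted O-GalNAc site and its distance along the primary sequence. This is linear sequence distance only (three-dimensional proximity is not considered), and the helper names are illustrative.

```python
# NetOGlyc 3.1 O-GalNAc predictions quoted above.
O_GLYC_SITES = [1117, 1240, 1242]

MUTATIONS = ["F371L", "C475S", "A848S", "P856S",
             "R868Q", "D873N", "G1015R", "P1170A"]


def position(mutation: str) -> int:
    """Residue number from one-letter notation, e.g. "G1015R" -> 1015."""
    return int(mutation[1:-1])


def nearest_site(mutation: str, sites: list[int]) -> tuple[int, int]:
    """Return (nearest predicted site, distance in residues)."""
    pos = position(mutation)
    site = min(sites, key=lambda s: abs(s - pos))
    return site, abs(site - pos)


for m in MUTATIONS:
    print(m, nearest_site(m, O_GLYC_SITES))
```

For the mutations listed in this section, the nearest case is P1170A, 53 residues from T1117.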
[0187] PROSITE Analysis of the Potential Effect of ERBB2 Mutations on Other ERBB2 Protein Regulatory Sites. The effect of the ERBB2 mutations on other protein regulatory sites was analyzed using the PROSITE computational analysis tool. PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to identify reliably which known protein family (if any) a new sequence belongs to, as well as potential sites of protein modification (Hulo N. et al., Nucl. Acids Res., 32:D134-D137 (2004); Sigrist C.J.A. et al., Brief. Bioinform., 3:265-274 (2002); Gattiker A. et al., Applied Bioinformatics, 1:107-108 (2002)). Other potential modification sites of the ERBB2 polypeptide predicted by PROSITE analysis are summarized below in TABLE 14. Amino acid positions highlighted in bold text are those at which the mutations may interfere. The ERBB2 mutation at amino acid position P856 is close to the active site of the protein tyrosine kinase domain; amino acid position G1015 is located in an EF-hand calcium-binding domain and is close to a potential tyrosine sulfation site; and amino acid position D873 is close to a potential sulfation site at Y877. The mutations identified in the present invention may therefore alter the phosphorylation, calcium binding and tyrosine sulfation of ERBB2.
TABLE 14 Potential ERBB2 protein modification sites predicted by PROSITE
Function    Positions
Protein kinases ATP-binding region (PS00107)    726-753
Tyrosine protein kinases specific active-site (PS00109)    841-853 (close to P856)
EF-hand calcium-binding domain (PS00018)    1011-1023 (includes G1015)
Cysteine-rich region profile (PS50311)    192-268
Proline-rich region profile (PS50099)    1102-1234
N-myristoylation site (PS00008)    10-15, 19-24, 131-136, 223-228, 327-332, 447-452, 462-467, 572-577, 668-673, 704-709, 729-734, 787-792, 1056-1061, 1062-1067, 1091-1096, 1093-1098, 1231-1236, 1239-1244
Casein kinase II phosphorylation site (PS00006)    27-30, 41-44, 144-147, 182-185, 208-211, 323-326, 402-405, 418-421, 457-460, 633-636, 834-837, 911-914, 998-1001, 1007-1010, 1066-1069, 1122-1125, 1151-1154
N-glycosylation site (PS00001)    68-71, 124-127, 187-190, 259-262, 530-533, 571-574, 629-632
Protein kinase C phosphorylation site (PS00005)    186-188, 328-330, 457-459, 686-688, 760-762, 1051-1053, 1151-1153, 1236-1238
cAMP- and cGMP-dependent protein kinase phosphorylation site (PS00004)    683-686, 897-900
Tyrosine kinase phosphorylation site (PS00007)    765-772
Tyrosine sulfation site (PS00003)    870-884 (includes D873), 1016-1030 (close to G1015), 1215-1229, 1241-1255
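The "inside" and "close to" annotations in TABLE 14 can be reproduced by measuring each mutation's distance to the PROSITE ranges. A minimal sketch over a subset of the TABLE 14 rows; the 5-residue cutoff for "close" is an assumption, and the function names are illustrative.

```python
# (feature, start, end) for a subset of the TABLE 14 PROSITE hits.
FEATURES = [
    ("Protein kinases ATP-binding region (PS00107)", 726, 753),
    ("Tyrosine protein kinases specific active-site (PS00109)", 841, 853),
    ("EF-hand calcium-binding domain (PS00018)", 1011, 1023),
    ("Tyrosine sulfation site (PS00003)", 870, 884),
    ("Tyrosine sulfation site (PS00003)", 1016, 1030),
    ("Tyrosine sulfation site (PS00003)", 1215, 1229),
    ("Tyrosine sulfation site (PS00003)", 1241, 1255),
]


def distance_to_range(pos: int, start: int, end: int) -> int:
    """0 if pos lies inside [start, end], else residues to the nearer end."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def nearby_features(mutation: str, max_distance: int = 5) -> list[tuple[str, int]]:
    """Features within max_distance residues of the mutated position (assumed cutoff)."""
    pos = int(mutation[1:-1])
    hits = []
    for name, start, end in FEATURES:
        d = distance_to_range(pos, start, end)
        if d <= max_distance:
            hits.append((name, d))
    return hits


for m in ["P856S", "D873N", "G1015R"]:
    print(m, nearby_features(m))
```

P856S lies 3 residues past the end of the active-site pattern, D873N falls inside the 870-884 sulfation site, and G1015R lies inside the EF-hand domain and 1 residue before the 1016-1030 sulfation site.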
[0188] ClustalW Polypeptide Alignment and Sequence Analysis to Estimate the Potential Effect of ERBB2 Mutation on ERBB2 Function. ClustalW polypeptide alignment and sequence analysis were used to estimate the effect of ERBB2 mutations on ERBB2 biological function. Known ERBB2 sequences from human (NP_004439), mouse (NP_001003817), rat (NP_058699), dog (NP_001003217) and zebrafish (NP_956413) were obtained from GenBank and aligned using ClustalW. Chenna et al., Nucleic Acids Res., 31(13):3497-500 (2003). ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple alignments of divergent sequences: it calculates the best match for the selected sequences and lines them up so that identities, similarities and differences can be seen.
[0189] For every position at which a mutation was reported, the mutant residue is checked for its occurrence in organisms other than human. The working hypothesis is that if the mutant residue is present in the wild-type sequence of another species at the corresponding position, the amino acid change may not have an adverse effect on protein function. The results of the ClustalW comparison analysis are summarized below in TABLE 15 and TABLE 16. The mutated amino acid residues identified in the present studies are highlighted in bold, underlined text.
TABLE 15 Summary of Sequence Alignment of ERBB2 Sequences from Multiple Organisms
mouse_erbb2 MELAAWCRWGFLLALLSPGAAGTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQ rat_erbb2 MIIMELAAWCRWGFLLALLPPGIAGTQVCTGTDMKLRLPASPETHLDΪ-SLRHLYQGCQVVQ huiran_erbb2 MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQ dog_erbb2 MELAAWCRWGLLLALLPSGAAGTQVCTGTDMKLRLPASPΞTHLDMLRHLYQGCQVVQ zebrafish_erbb2 -MΞADRSFGLAWVLLLLLGITAATGREVCLGTDMKLALPSSLENHYEMLRLLYTGCQVVH
mouse_erbb2 GNLΞLTYLPANASLSFLQDIQEVQGYMLIAHNRVKHVPLQRLRIVRGTQLFΞDKYALAVL rat_erbb2 GNLELTYVPANASLSFLQDIQEVQGYMLIAHNQVKRVPLQRLRIVRGTQLFEDKYALAVL human_erbb2 GNLELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVL dog_erbb2 GNLELTYLPANASLSFLQDIQΞVQGYVLIAHSQVRQIPLQRLRIVRGTQLFEDNYALAVL zebrafxsh_erbb2 GNLEITHLQGNPDLSFLQΞIVEVQGYVLIAHVSVRSLPLDNLRIIRGΞQLYKSNYALAVH mouse_erbb2 DNRDPLDNVTTAAPGRTPEGtRELQLRSLTEILKGGVLIRGNPQLCYQDMVLWKDVLRKN rαt_erbb2 DNRDPQDNVAASTPGRTPΞGLRΞLQLRSLTEILKGGVLIRGNPQLCYQDMVLWKDVFRKN human_erbb2 DNGDP-LNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKN dog_erbb2 DNGDP-LEGGIPAPGAAQGGLRCLQLRSLTEILKGGVLIQRSPQLCHQDTILWKDVTHKN zebrafXSh_erbb2 NNSNS SQAGLGLRELRLRSLTEILLGGVYIWGNPQLCFPRNINWEDTVSKV
; * ; ^ . *****.**** + * + * *** + * * * * ^ . * . * ^ * mouse_erbb2 NQLAPVDMDTNRSRACPPCAPTCKDNHCWGΞΞPEDCQILTGTICTSGCARCKGRLPTDCC rat_erbb2 NQLAPVDIDTNRSRACPPCAPACKDNHCWGΞSPΞDCQILTGTICTSGCARCKGRLPTDCC human_erbb2 NQLALTLI DTNRSRACHPCΞPMCKGSRCWGESSEDCQΞLTRTVCAGGCARCKGPLPTDCC dog_erbb2 NQLALTLI DTNRFSACPPCSPACKDAHCWGASSGDCQSLTRTVCAGGCARCKGPQPTDCC zebrafish_erbb2 Q—NKPLHLQDIPKNCPRCSΞACKSGGCWGEKDQDCQTLTSVNCSΞGCSRCKGPKPSDCC mouse_erbb2 HΞQCAAGCTGPKHSDCLACLHFNHΞGICELHCPALITYNTDTFESMLNPEGRYTFGAΞCV rat_erbb2 HEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMHNPEGRYTFGASCV human_erbb2 HEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGAΞCV dog_erbb2 HEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCV zebrafish erbb2 HVQCAAGCTGPKDSDCLACRHFNDSGTCKDSCPPPTIYDPITFQSKPNKDKKFSFGATCV
πouse_erbb2 TTCPYNYLSTEVGSCTLVCPPNNQEVTAEDG TQRCEKCΞKPCAGVCYGLGMEHLRGA rat_erbb2 TTCPYNYLSTEVGSCTLVCPPNNQEVTAEDG TQRCΞKCSKPCARVCYGLGMEHLRGA human_erbb2 TACPYNYLSTDVGSCTLVCPLHNQEVTAEDG TQRCEKCΞKPCARVCYGLGMEHLREV dog_erbb2 TSCPYNYLSTDVGSCTLVCPLNNQEVTAEDG TQRCEKCSKPCARVCYGLGMΞHLREV zebrafish erbb2 KQCPHNYLAMEVACTMVCPKANKEVISVEPDGQETQKCEKCEGECPKVCYGLGMGNLQGV nous e_erbb2 RAITSDNXQEFAGCKKIFGSLAFLPESFDGNPSSGVAPLKPEHLQVFETLEEITGYLYIS rat_erbb2 RAITSDNVQEFDGCKKIFGSLAFLPESFDGDPSSGIAPLRPEQLQVFETLEEITGYLYIS huiran_erbb2 RAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYIS dog_erbb2 RAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLRVFEALEEITGYLYIS zebrafish erbb2 SVVNSTNIGMFTGCEKIΫGSLAFLSDSFKGNADTNSSGLQPΞDLEKLKTIEEITGYLYID
mouse_erbb2 AWPESFQDLSVFQNLRVIRGRILHDGAYSLTLQGLGIHSLGLRSLRELGSGLALIHRNTH rat_erbb2 AWPDSLRDLSVFQNLRIIRGRILHDGAYSLTLQGLGIHSLGLRSLRELGSGLALIHRNAH human_erbb2 AWPDSLPDLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTH dog_erbb2 AWPDSLPNLSVFQNLRVIRGRVLHDGAYSLTLQGLGISWLGLRSLRELGSGLALIHRNAR zebrafish erbb2 AwsENLLDLsVΓENLKVIRGQMLYKGVFSLGVQSLQICSLGLRSLRSVSGGLVLIHNNSR nouse_erbb2 LCFVNTVPWDQLFRNPHQALLHSGNRPEE-ACGLΞGLVCNSLCARGHCWGPGPTQCVNCS rat_erbb2 LCFVHTVPWDQLFRNPHQALLHSGNRPEEDLCVSSGLVCNSLCAHGHCWGPGPTQCVNCS human_erbb2 LCFVHTVPWDQLFRNPHQALLHTANRPED-ECVGEGLACHQLCARGHCWGPGPTQCVNCS dog_erbb2 LCFVHTVPWDQLFRNPHQALLHSANRPEΞ-ECVGEGLACYP-CAHGHCWGPGPTQCVNCS zebrafish erbb2 LCYTSSLPWTSLLHPTQGPNLISNNNKDQQTCVSEGKICDPLCGDSGCWGPGPSQCVSCL mouse_erbb2 QFLRGQECVEECRVWKGLPREYVRGKKCLPCHPECQPQNSSETCYGSEADQCEACAHYKD rat_erbb2 HFLRGQECVEECRVWKGLPREYVSDKRCLPCHPECQPQNSSETCFGΞΞADQCAACAHYKD human_erbb2 QFLRGQECVEECRVLQGLPREYVNARHCLPCFPΞCQPQNGSVTCFGPEADQCVACAHYKD dog_erbb2 QFLRGQECVEECRVLQGLPREYVKDRYCLPCHSΞCQPQNGΞVTCFGΞEADQCVACAHYKD zebrafish erbb2 NYKRGTECVELCNVLHGSVREFEDGFNCVPCHPECRPINGTASCTGPGPDQCTDCMHFQD mouse_erbb2 SSSCVARCPSGVKPDLSYMPIWKY PDEΞGICQPCPINCTHSCVDLDERGCPAEQRASPVT rat_erbb2 SSSCVARCPSGVKPDLSYMPIWKYPDΞEGICQPCPINCTHSCVDLDERGCPAEQRASPVT hunan_erbb2 PPFCVARCPSGVKPDLSYMPIWKFPDΞEGACQPCPINCTHSCVDLDDKGCPAEQRASPLT dog_erbb2 PPFCVARCPSGVKPDLΞFMPIWKFADΞEGTCQPCPINCTHSCADLDEKGCPAEQRASPVT zebrafish erbb2 GDVCVERCPSGVKEE—QHTVWKYSNATGHCLPCETNCTVSCPLDD-RGCPIQQKTGPGT mouse_erbb2 FIIATVVGVLLFLIIVVVIGILIKRRRQKIRKYTMRRLLQETELVEPLTPSGAVPNQAQM rat_erbb2 FIIATVEGVLLFLILVVVVGILIKRRRQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQM human_erbb2 SI I SAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETΞLVEPLTPSGAMPNQAQM dog_erbb2 SIIAAVVGILLAVVVGLVLGILIKRRRQKIRKYTMRRLLQETELVΞPLTPSGAMPNQAQM zebrafish erbb2 TVAITVGGVLLFIILLALLVFYLRRQKhQKKKETIRRRLQEHELVEPLTPSGAMPNQAQM mouse_erbb2 RILKETELRKLKVLGSGAΓGTVYKGIWIPDGEUVKIPVAIKVLRENTSPKANKEILDEAY rat_erbb2 RILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAY human_erbb2 
RILKETΞLRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAY dog_erbb2 RILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAY zebrafish erbb2 RILKETELKKLRVLGSGAFGTVFKGIWAPDGENVRIPVAIKVLRENTSPKANKEILDEAY mouse_erbb2 VMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRΞHRGRLGSQDLLNWCVQIAKG rat_erbb2 VMAGVGSPYVΞRLLGICLTSTVQLVlQLMPYGCLLDHVREHRGRLGΞQDLLNWCVQIAKG human_erbb2 VMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKG dog erbb2 VMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVREHRGRLGSQDLLNWCVQIAKG zebrafish erbb2 VMAGVASPYVCRLLGICLTΞTVQLVTQLMPYGCLLDYVRENKDRIGSQYLLΞWCVQIAKG mouse_erbb2 MSYLΞEVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDΞTEYHADGGKVPIKWMALE rat_erbb2 MSYLEDVRLVHRDLAARNVLVKSENHVKITDFGLARLLDIDETEYHADGGKVPIKWMALE human_erbb2 MSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDΞTΞYHADGGKVPIKWMALE dog_erbb2 MSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALE zebrafish erbb2 MSYLEEVRLVHRDLAARNVLVKNPNHVKITDFGLARLLDIDEKCYHADGGKVPIKWMALE mouse_erbb2 SILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGI PAREIPDLLEKGERLPQPPICTIDVY rat_erbb2 SILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGI PAREIPDLLEKGERLPQPPICTIDVY human_erbb2 SILRRRFTHQS DVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVY dog_erbb2 SIPPRRFTHQSDVWSYGVTVWΞLMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVY zebrafxsh erbb2 SILHRKFTHQSDVWSYGVTVWELMTFGMKPYESFQARDIPELLEAGERLSQPCNCTKEVY
mouse_erbb2 MIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFVVIQNED-LGPSSPMDSTFYRΞLLED rat_erbb2 MIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFVVIQNED-LGPSSPMDSTFYRSLLED human_erbb2 MIMVKCWMIDSECRPRFRELVΞEFSRMARDPQRFVVIQNED-LGPASPLDSTFYRSLLED dog_erbb2 MIMVKCWMIDSECRPRFRΞLVAEF3RMARDPQRFVVIQNED-LGPASPLDSTFYRSLLED zebrafish erbb2 MIMVKCWQIDPDNRPRFKDLVDEFTTMARDPSRYVVIQNEDQMSLSΞPVDSΞFFRILMAE iπouse_erbb2 D—DMGΞLVDAEEYLVPQQG--FFSPDPALGTGSTAHRRHRSSSARΞGGGELTLGLEPΞΞ rat_erbb2 D—DMGDLVDAEEYLVPQQG--FFSPDPTPGTGSTAHRRHRSS STRSGGGELTLGLEPSE human_erbb2 D—DMGDLVDAEEYLVPQQG--FFCPDPAPGAGGMVHHRHRΞSSTRSGGGDLTLGLΞPSE dog_erbb2 D--DMGDLVDAEEYLVPQQG--FFCPΞPTPGAGGTAHRRHRSSSTRNGGGΞLTLGLEPSE zebrafish erbb2 EGGNVKΞFLDAEEYLVPQPGS IFNTHGΞMRANGPSRKHSHRSTDQMVEVDGLPNGRΞLYS raouse_erbb2 EEPPRS PLAPSEGAGSDVFDGDLAVGVTKGLQSLSP HDLSPLQRYΞEDPTLPL rat_erbb2 EGPPRS PLAPSEGAGSDVFDGDLAMGVTKGLQSLSP HDLSPLQRYSEDPTLPL human_erbb2 EEAPRS PLAPSΞGAGSDVFCGDLGMGAAKGLQSLPT HDPSPLQRYSSDPTVPL dog_erbb2 EEPPKS PLAPSEGAGSDVFDGDLGMGAAKGLQSLPS QDPSPLQRYSΞDPTVPL zebrafish erbb2 SVSMISQSQYPTLPVGATANGMWPGTQYPPLARSI ΞHRSAGGQSDSVFLDGYVEDSCPPS mouse_erbb2 PPETDGYVAPLACSPQPEYVNQPEVRPQSPLTPEGPPPPIRPAGATLERP KTLSP rat_erbb2 PPETDGYVAPLACSPQPEYVNQSEVQPQPPLTPEGPLPPVRPAGATLERP KTLSP huiτian_erbb2 PΞETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERP KTLSP dog_erbb2 PPETDGKVAPLTCSPQPEYVNQPEVWPQPPLALEGPLPPSRPAGATLERPKTLS PKTLS P zebrafxsh erbb2 SPCRYSKDPTMPNGI DGDLETDGNMVFLΞHTLPRGTHTQPEYVNQDMASERP-STLPRKA mouse_erbb2 GKNGVVKDVFAFGGAVENPEYLAPRAG TASQPHPSPAFSPAFDNLYYWDQNSSEQG rat_erbb2 GKNGVVKDVFAFGGAVENPEYLVPREG TASPPHPSPAFSPAFDNLYYWDQNSSEQG human_erbb2 GKNGVVKDVFAFGGAVENPEYLTPQGG AAPQPHPPPAFSPAFDNLYYWDQDPPERG dog_erbb2 GKNGVVKDVFArGSAVENPEYLAPRGR AAPQPHPPPAFSPAFDNLYYWDQDPSERG ∑ebrafish erbb2 SERRFILNGLSTGNSVENPEYLVPIGSITPTSPAFDNPYYHDIAAKAQAVARVAINGGTN
mouse_erbb2 PPPSTFEGTPTAENPEYLGLDVPV SEQ ID NO:39
rat_erbb2 PPPSNFEGTPTAENPEYLGLDVPV SEQ ID NO:40
human_erbb2 APPSTFKGTPTAENPEYLGLDVPV SEQ ID NO:29
dog_erbb2 SPPSTFEGTPTAENPDYLGLDVPV SEQ ID NO:41
zebrafish_erbb2 HRQPNGFMTPTAENPEYLGLADTWSGHKEYT SEQ ID NO:42
[0190] A summary of the sequence alignment of ERBB2 sequences from multiple organisms is shown in TABLE 16 below. Amino acid variations are found in zebrafish at the positions corresponding to F371, G1015 and P1170. Substitutions at highly conserved positions may alter protein function.
TABLE 16
Summary of Sequence Alignment of ERBB2 Sequences from Multiple Organisms
Mutation Comment
F371L "Y" is present in zebrafish
C475S Conserved
A848S Conserved
P856S Conserved
R868Q Conserved
D873N Conserved
G1015R "K" is present in zebrafish
P1170A "E" is present in zebrafish
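The comparison rule in [0189] amounts to a check on one alignment column. The sketch below uses a 20-column block of TABLE 15 covering human residues 861-880, with OCR-garbled characters in the mouse and human rows restored to E; the helper names are illustrative, and gap handling is included only so the same helpers would also work on gapped blocks.

```python
# Block of TABLE 15 covering human residues 861-880 (no gaps in this block).
ALIGNMENT = {
    "mouse": "ITDFGLARLLDIDETEYHAD",
    "rat": "ITDFGLARLLDIDETEYHAD",
    "human": "ITDFGLARLLDIDETEYHAD",
    "dog": "ITDFGLARLLDIDETEYHAD",
    "zebrafish": "ITDFGLARLLDIDEKCYHAD",
}


def column_of(aligned: str, first_residue: int, pos: int) -> int:
    """Alignment column holding residue `pos`, skipping '-' gap characters."""
    residue = first_residue - 1
    for col, ch in enumerate(aligned):
        if ch != "-":
            residue += 1
            if residue == pos:
                return col
    raise ValueError(f"residue {pos} is not in this block")


def species_with_mutant(mutation: str, alignment: dict[str, str],
                        ref: str = "human", first_residue: int = 861) -> list[str]:
    """Species whose wild-type residue equals the mutant residue in that column."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    col = column_of(alignment[ref], first_residue, pos)
    if alignment[ref][col] != wt:
        raise ValueError(f"{mutation}: reference has {alignment[ref][col]}")
    return [sp for sp, seq in alignment.items() if sp != ref and seq[col] == mut]


print(species_with_mutant("R868Q", ALIGNMENT))  # []
print(species_with_mutant("D873N", ALIGNMENT))  # []
```

Both calls return an empty list: no other species carries Q at R868 or N at D873, consistent with "Conserved" for those positions in TABLE 16.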
[0191] Analysis of the Potential Effect of ERBB2 Mutations on ERBB2 Protein Secondary Structure. The Effect of ERBB2 Mutation on Amino Acid Property. The changes in amino acid property caused by the ERBB2 mutations are summarized in TABLE 17 (Valdar W.S., Proteins 48(2): 227-41 (2002)).
TABLE 17 Influence of ERBB2 Mutations on ERBB2 Amino Acid Property
Mutation Property change
F371L Aromatic -> Aliphatic
C475S
A848S
P856S
R868Q Positive -> Polar
D873N Negative -> Polar
G1015R Tiny -> Positive
P1170A
[0192] nnPredict Method Analysis of the Wild-type ERBB2 Secondary Structure. Secondary structure predictions of wild-type ERBB2 (TABLE 18) and mutant ERBB2 polypeptides (TABLE 20, TABLE 22, TABLE 24, TABLE 26, TABLE 28, TABLE 30, TABLE 32 and TABLE 34) were performed with nnPredict. The prediction is based on a two-layer, feedforward neural network. The network weights were determined by a separate program, a modification of the Parallel Distributed Processing suite of McClelland & Rumelhart (MIT Press, Cambridge, MA, Vol. 3, pp. 318-362 (1988)). Complete details of the determination of the network weights are found in Kneller et al. (J. Mol. Biol., 214: 171-182 (1990)). The output is a secondary structure prediction for each position in the sequence.
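The two-layer, feedforward design described above can be sketched in plain Python. This is a hypothetical illustration, not nnPredict: the window width (17), hidden-layer size (10), the 0.5 threshold for emitting "-", the omission of bias terms and the random placeholder weights are all assumptions, so its calls carry no biological meaning. It only shows how a sliding sequence window is one-hot encoded and passed through two weight layers to give one H/E/- call per residue.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 17     # assumed input window (residues), centred on the predicted position
HIDDEN = 10     # assumed number of hidden units
CLASSES = "HE"  # helix and strand outputs; "-" is emitted when neither is confident


def one_hot_window(seq: str, centre: int, width: int = WINDOW) -> list[float]:
    """One-hot encode the window around `centre`; positions past either end stay all zero."""
    half = width // 2
    vec: list[float] = []
    for i in range(centre - half, centre + half + 1):
        block = [0.0] * len(AMINO_ACIDS)
        if 0 <= i < len(seq) and seq[i] in AMINO_ACIDS:
            block[AMINO_ACIDS.index(seq[i])] = 1.0
        vec.extend(block)
    return vec


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TwoLayerNet:
    """Input -> hidden -> output feedforward net with sigmoid units."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = WINDOW * len(AMINO_ACIDS)
        # Placeholder random weights; nnPredict's trained weights are not reproduced here.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(HIDDEN)]
        self.w2 = [[rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)] for _ in CLASSES]

    def forward(self, x: list[float]) -> list[float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in self.w2]


def predict(seq: str, net: TwoLayerNet, threshold: float = 0.5) -> str:
    """One symbol per residue: the stronger of H/E if it clears the threshold, else '-'."""
    calls = []
    for i in range(len(seq)):
        scores = net.forward(one_hot_window(seq, i))
        best = max(range(len(CLASSES)), key=lambda k: scores[k])
        calls.append(CLASSES[best] if scores[best] >= threshold else "-")
    return "".join(calls)


net = TwoLayerNet(seed=0)
wild_type = predict("LVHRDLAARNVLVKSPNHVK", net)  # ERBB2 residues 841-860
p856s = predict("LVHRDLAARNVLVKSSNHVK", net)      # same fragment carrying P856S
print(wild_type)
print(p856s)
```

In this sketch a substitution can change calls only within WINDOW // 2 residues of it, because every other window sees identical input.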
[0193] All TABLES (e.g., TABLE 19, TABLE 21, TABLE 23, TABLE 25, TABLE 27, TABLE 29, TABLE 31, TABLE 33 and TABLE 35) in this section that summarize the ERBB2 protein secondary structure as predicted by nnPredict use "H", "E" and a dash ("-") as identifiers, defined as follows: a helix element is designated by the letter "H", a strand element by the letter "E", and no prediction by a dash ("-").
Gray shading represents polypeptide regions where a mutation was identified.
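Because each schematic table carries one symbol per residue, a wild-type and a mutant prediction can be compared position by position. A minimal sketch, assuming the two strings are already aligned residue for residue (the extracted schematics reproduced here would first need cleaning); the function name and the 10-residue example strings are hypothetical and are not taken from the tables.

```python
def changed_positions(wild_type: str, mutant: str, offset: int = 1) -> list[tuple[int, str, str]]:
    """List (residue number, wild-type call, mutant call) wherever the H/E/- calls differ.

    `offset` is the residue number of the first symbol in both strings.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("predictions must cover the same residues")
    return [(offset + i, a, b)
            for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]


# Hypothetical predictions for residues 851-860.
print(changed_positions("HHHH--EEE-", "HHHHH-EEE-", offset=851))  # [(855, '-', 'H')]
```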
[0194] The amino acid sequence of wild-type ERBB2 polypeptide (SEQ ID NO:29) is shown below in TABLE 18.
TABLE 18
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL
ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG
DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA
LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC
AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP
YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN
IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV
PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC
VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC
PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG
ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL
RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP
YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFT
HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM
IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA
EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG
AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV
NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ
GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:29
[0195] A schematic representation of the secondary structure of wild-type ERBB2 polypeptide (SEQ ID NO:29) predicted using nnPredict analysis is shown below in TABLE 19. The positions of the mutated amino acid residues are identified by grey shaded text.
TABLE 19
--HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-ΞEΞEH HEΞEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH—HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH--HHHg—HH HHHHHHHHHHH-EEEEE HHHΞ-EΞE HEEEE EEEHHKHHHHHHH-HHHEEH §ΞE H—HHHHH HHHHH E
-HHHHHH HHH ΞΞ HHHH EEEEHEEE
EΞEEEEEEEEEEEEHHHH-H-H—HHH-HHHH—H HHHHHH-HHHH
HHHHEE EEE—EE HHEHH HHHHHHHHEΞ
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHHH-H-JΪH-HH 1 EE-HHHH-g g-H-H EEHHHHHHHHHHE— EEE—EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E g H
HHH ΞEEE EEE HHHHH E H__j3 EEEEHHH
[0196] The amino acid sequence of ERBB2 mutant polypeptide F371L (SEQ ID NO:43) is shown below in TABLE 20. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 20
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL
ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG
DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA
LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC
AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP
YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN
IQEFAGCKKILGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV
PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC
VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC
PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG
ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL
RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP
YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFT
HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM
IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA
EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG
AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV
NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ
GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:43
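Each mutant polypeptide (TABLE 20, TABLE 22 and so on) is the wild-type SEQ ID NO:29 with a single substitution. A minimal helper that builds such a sequence and checks the wild-type residue first; the function name is illustrative.

```python
def apply_substitution(seq: str, mutation: str, start: int = 1) -> str:
    """Return seq with a one-letter substitution such as "F371L" applied.

    `start` is the residue number of seq[0]; the wild-type residue named in
    the notation is checked against the sequence before substituting.
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    i = pos - start
    if not 0 <= i < len(seq):
        raise ValueError(f"{mutation} lies outside this fragment")
    if seq[i] != wt:
        raise ValueError(f"{mutation}: expected {wt}, found {seq[i]}")
    return seq[:i] + mut + seq[i + 1:]


# Wild-type residues 361-380 of SEQ ID NO:29.
print(apply_substitution("IQEFAGCKKIFGSLAFLPES", "F371L", start=361))
# IQEFAGCKKILGSLAFLPES
```

The wild-type check guards against off-by-one numbering errors, which would otherwise silently produce a different mutant.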
[0197] A schematic representation of the secondary structure of ERBB2 mutant polypeptide F371L (SEQ ID NO:43) predicted using nnPredict analysis is shown below in TABLE 21. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 21
— HHHHHHHH H H - H EE E HHHHHHHHH EEE
- EEE H -HH H-HHHHH-EEEEH HEEEE H HHHHHEE HHHHH-HHHHHH EEE HHHHHHHH — HHH H HH E E EE EEE EE H EEE HHHHHHHHH
-HHHH--HHHg—HH HHHHHHHHHHH-EEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHEEH EEE H--HHHHH HHHHH E
-HHHHHH HHH EE HHHH EEEEHEEE
EEEEEEEEEEEEEEHHHH-H-H—HHH-HHHH—H HHHHHH-HHHH
HHHHEE EEE—EE HHEHH HHHHHHHHEΞ
-E-EEEEEE HHEK HHHHHHHH HHHHH
HHHH-H-HH-HH EE-HHHH H-H EEHHHHHHHHHHE— EEE--EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E H
HHH EEEE EEE HHHHH E H EEEEHHH
[0198] The amino acid sequence of ERBB2 mutant polypeptide C475S (SEQ ID NO:44) is shown below in TABLE 22. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 22
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL
ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG
DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA
LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC
AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP
YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN
IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLSFVHTV
PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC
VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC
PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG
ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL
RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP
YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFT
HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM
IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA
EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG
AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV
NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ
GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:44
[0199] A schematic representation of the secondary structure of ERBB2 mutant polypeptide C475S (SEQ ID NO:44) predicted using nnPredict analysis is shown below in TABLE 23. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 23
—HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-EEEEH HEEEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH—HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH—HHHH—HH HHHHHHHHHHH-ΞEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHEEE |JEE H--HHHHH HHHHH E
-HHHHHH HHH EE HHHH EEEEHEEE
EEEΞEEEEEEEEEEHHHH-H-H—HHH-HHHH—H HHHHHH-HHHH
HHHHEE EEE—EE HHEHH HHHHHHHHEE
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHHH-H-HH-HH EE-HHHH H-H EEHHHHHHHHHHE— EEE—EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E H
HHH~" ~" ""-"E-IiEE —— ~EEE —— — __—. —_ HHHHH E H EEEEHHH
[0200] The amino acid sequence of ERBB2 mutant polypeptide A848S (SEQ ID NO:45) is shown below in TABLE 24. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 24
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL
ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG
DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA
LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC
AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP
YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN
IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV
PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC
VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC
PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG
ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL
RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP
YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLASRNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFT
HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM
IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA
EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG
AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV
NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ
GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:45
[0201] A schematic representation of the secondary structure of ERBB2 mutant polypeptide A848S (SEQ ID NO:45) predicted using nnPredict analysis is shown below in TABLE 25. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 25
—HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-EEEEH HEEEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH—HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH--HHHH--HH HHHHHHHHHHH-EEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHEEH EEE H—HHHHH HHHHH E
-HHHHHH HHH EE HHHH EEEEHEEE
EEEEEEEEEEEEEEHHHH-H-H--HHH-HHHH—H HHHHHH-HHHH
HHHHEE EEE--EE HHEHH HHHHHHHHEE
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHH 1 EE EE-HHHH H-H EEHHHHHHHHHHE— EEE—EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E H
HHH EEEE EEE . U L4 U U U — — — — _ _ _ _ _ _ __ _ — P — . H EEEEHHH
[0202] The amino acid sequence of ERBB2 mutant polypeptide P856S (SEQ ID NO:46) is shown below in TABLE 26. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 26
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL
ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG
DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA
LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC
AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP
YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN
IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV
PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC
VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC
PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG
ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL
RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP
YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLAARNVLVKSSNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFT
HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM
IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA
EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG
AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV
NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ
GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:46
[0203] A schematic representation of the secondary structure of ERBB2 mutant polypeptide P856S (SEQ ID NO:46) predicted using nnPredict analysis is shown below in TABLE 27. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 27
--HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-EEEEH HEEEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH--HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH—HHHH—HH HHHHHHHHHHH-EEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHΞEH EEE H—HHHHH HHHHH E
-HHHHHH HKH EE HHHH EEEEHEEE
EEEEEEEEEEEEEEHHHH-H-H—HHH-HHHH—H HHHHHH-HHHH
HHHHEE EEE--EE HHEHH HHHHHHHHEE
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHHH-H-HHHHHH If-EEEEHHHH H-H EEHHHHHHHHHHE— EEE—EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E H
HHH EEEE EEE HHHHH E H EEEEHHH
[0204] The amino acid sequence of ERBB2 mutant polypeptide R868Q (SEQ ID NO:47) is shown below in TABLE 28. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 28
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL
ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG
DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA
LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC
AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP
YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN
IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV
PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC
VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC
PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG
ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL
RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP
YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLAARNVLVKSPNHVKITDFGLAQLLDIDETEYHADGGKVPIKWMALESILRRRFT
HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM
IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA
EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG
AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV
NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ
GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:47
[0205] A schematic representation of the secondary structure of ERBB2 mutant polypeptide R868Q (SEQ ID NO:47) predicted using nnPredict analysis is shown below in TABLE 29. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 29
—HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-EEEEH HEEEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH--HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH—HHHH—HH HHHHHHHHHHH-ΞEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHEEH EEE H—HHHHH HHHHH E
-HHHHHH HHH EE HHHH EEEEHEEE
EEEEEEEEEEEEEΞHHHH-H-H—HHH-HHHH—H HHHHHH-HHHH
HHHHEE EEE—EE HHEHH HHHHHHHHEE
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHHH-H-HH-HH EE-HHHHH|E H-H EEHHHHHHHHHHE— EEE—EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E H
HHH EEEE EEE HHHHH E
[0206] The amino acid sequence of ERBB2 mutant polypeptide D873N (SEQ ID NO:48) is shown below in TABLE 30. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 30
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL
ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG
DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA
LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC
AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP
YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN
IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV
PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC
VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC
PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG
ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL
RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP
YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLAARNVLVKSPNHVKITDFGLARLLDINETEYHADGGKVPIKWMALESILRRRFT
HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM
IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA
EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG
AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV
NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ
GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:48
[0207] A schematic representation of the secondary structure of ERBB2 mutant polypeptide D873N (SEQ ID NO:48) predicted using nnPredict analysis is shown below in TABLE 31. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 31
--HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-EEEEH HEEEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH--HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH--HHHH--HH HHHHHHHHHHH-EEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHEEH EEE H--HHHHH HHHHH E
-HHHHHH HHH EE HHHH EEEEHEEE
EEEEEEEEEEEEEEHHHH-H-H--HHH-HHHH--H HHHHHH-HHHH
HHHHEE EEE--EE HHEHH HHHHHHHHEE
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHHH-H-HH-HH EE-HHHHH-H--H§j-H-E EEHHHHHHHHHHE-- EEE--EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E H
HHH EEEE EEE HHHHH E H EEEEHHH
[0208] The amino acid sequence of ERBB2 mutant polypeptide G1015R (SEQ ID NO:49) is shown below in TABLE 32. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 32
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFT HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMRDLVDA EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV NQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:49
[0209] A schematic representation of the secondary structure of ERBB2 mutant polypeptide G1015R (SEQ ID NO:49), as predicted using nnPredict analysis, is shown below in TABLE 33. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 33
--HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-EEEEH HEEEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH--HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH--HHHH--HH HHHHHHHHHHH-EEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHEEH EEE H--HHHHH HHHHH E
-HHHHHH HHH EE HHHH EEEEHEEE
EEEEEEEEEEEEEEHHHH-H-H--HHH-HHHH--H HHHHHH-HHHH
HHHHEE EEE--EE HHEHH HHHHHHHHEE
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHHH-H-HH-HH EE-HHHH H-H EEHHHHHHHHHHE-- EEE--EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E gj-HHHH
HHH EEEE EEE HHHHH E H EEEEHHH
[0210] The amino acid sequence of ERBB2 mutant polypeptide P1170A (SEQ ID NO:50) is shown below in TABLE 34. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 34
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNL ELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNG DPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLA LTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQC AAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACP YNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSAN IQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTV PWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQEC VEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARC PSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVG ILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETEL RKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSP YVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFT HQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWM IDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDA EEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEG AGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYV NQPDVRPQPPSPREGPLPAARPAGATLERAKTLSPGKNGVVKDVFAFGGAVENPEYLTPQ GGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:50
[0211] A schematic representation of the secondary structure of ERBB2 mutant polypeptide P1170A (SEQ ID NO:50), as predicted using nnPredict analysis, is shown below in TABLE 35. The position of the mutated amino acid residue is identified by grey shaded text.
TABLE 35
--HHHHHHHHHH-H EE E HHHHHHHHH EEE
-EEE H-HHH-HHHHH-EEEEH HEEEE HHHHHHEE HHHHH-HHHHHH EEE HHHHHHHH--HHH
EEE H EEEE HHHE E EE EEE EE H EEE HHHHHHHHH
-HHHH--HHHH--HH HHHHHHHHHHH-EEEEE HHHE-EEE HEEEE EEEHHHHHHHHHH-HHHEEH EEE H--HHHHH HHHHH E
-HHHHHH HHH EE HHHH EEEEHEEE
EEEEEEEEEEEEEEHHHH-H-H--HHH-HHHH--H HHHHHH-HHHH
HHHHEE EEE--EE HHEHH HHHHHHHHEE
-E-EEEEEE HHEH HHHHHHHH HHHHH
HHHH-H-HH-HH EE-HHHH H-H EEHHHHHHHHHHE-- EEE--EEEEHHHH HHHHH HHHEE EHHHHHHHH EEEE HHH-E H
HHH EEEE EEE HHHHH E
-HHHHHg EEEEHHH-
[0212] The influence of ERBB2 mutations on ERBB2 protein secondary structure, as predicted by nnPredict, is summarized below in TABLE 36.
TABLE 36
Influence of ERBB2 Mutations on ERBB2 Protein Secondary Structure
Mutation Predicted Protein Secondary Structure Change
F371L No change is predicted
C475S No change is predicted
A848S Alters the secondary structure of 5 amino acids
P856S Alters the secondary structure of 5 amino acids
R868Q Alters the secondary structure of 2 amino acids
D873N Alters the secondary structure of 4 amino acids
G1015R No change is predicted
P1170A Alters the secondary structure of 4 amino acids
[0213] Self-Optimized Prediction Method Analysis of the Effect of ERBB2 Mutations on ERBB2 Secondary Structure. The secondary structures of wild-type ERBB2 (TABLE 37) and of the mutated ERBB2 sequences (TABLES 38 to 45) were predicted using SOPM (self-optimized prediction method; Geourjon C & Deleage G, Protein Eng., 7(2): 157-164 (1994)). SOPM was developed to improve the success rate of protein secondary structure prediction. Each of TABLES 37 to 45 shows the wild-type or mutated ERBB2 polypeptide sequence together with the secondary structure predicted by SOPM. Secondary structure is indicated with the codes "h", "e", "t", and "c", defined as follows: "h" designates alpha helix; "e" designates extended strand; "t" designates beta turn; and "c" designates random coil. Where appropriate, the position of the mutated amino acid residue is highlighted in bold underlined text. A shaded area designates a region of the ERBB2 polypeptide where a mutation of the invention was identified.
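Because SOPM assigns exactly one of the four codes above to each residue, a prediction can be summarized as the fraction of residues in each state. A minimal Python sketch, assuming the prediction is supplied as a plain string of h/e/t/c codes; the function name `composition` is illustrative only.

```python
from collections import Counter

# SOPM state codes as defined in paragraph [0213].
SOPM_STATES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def composition(prediction: str) -> dict[str, float]:
    """Return the fraction of residues assigned to each SOPM state.

    Whitespace is ignored, so a prediction copied in 70-residue blocks
    can be passed in unchanged.
    """
    states = "".join(prediction.split())
    if not states:
        raise ValueError("empty prediction")
    unknown = set(states) - set(SOPM_STATES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    counts = Counter(states)  # missing codes count as 0
    return {name: counts[code] / len(states) for code, name in SOPM_STATES.items()}


if __name__ == "__main__":
    print(composition("hhhh eett"))
    # {'alpha helix': 0.5, 'extended strand': 0.25, 'beta turn': 0.25, 'random coil': 0.0}
```

Rejecting unknown codes, rather than skipping them, makes extraction debris in a copied prediction string visible instead of silently skewing the fractions.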
[0214] The amino acid sequence of wild-type ERBB2 polypeptide (SEQ ID NO:29) is shown below in TABLE 37.
TABLE 37
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL hhhhhhhhhhhteeeeehttccccccccheeettcceecthheeeeeetcccccccccccccccccchhh QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE hhhhhhhhhttteeeccccceeeccheehhhhecttccceeeeeeccccccccccccccccccccccccc DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE
SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL
REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP hhhheechtchhhhhtccccccceeeeeecccccccccccccccthhhhhhhhhhhhheeeeeccccccc DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH
QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC cheeeeccccccchhhhhhhccccccttcccccccccceeeeeecctccehhhhhhhttccchheccccc LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC cccccccccccceeeecccccccchhhhccccccceeeccccccccccccceeeeccccccccccccccc
THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR hhhhhtccctthhheeeccccchhhhhhhhcccchhhhhhhhtttcccchhhhhhhhhhhtthhhhhhhh LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV
TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA
RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL
PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA ccccccceeccccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeccc
VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV
SEQ ID NO:29
[0215] The amino acid sequence of ERBB2 mutant polypeptide F371L (SEQ ID NO:43) is shown below in TABLE 38. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 38
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL hhhhhhhhhhhteeeeehttccccccccheeettcceecthheeeeeetcccccccccccccccccchhh QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE hhhhhhhhhttteeeccccceeeccheehhhhecttccceeeeeeccccccccccccccccccccccccc DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE
SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL
REVRAVTSANIQEFAGCKKILGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP hhhheechtchhhhhtccccccceeeeeecccccccccccccccthhhhhhhhhhhhheeeeeccccccc DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhttteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC cheeeeccccccchhhhhhhccccccttcccccccccceeeeeecttccehhhhhhhttccchheccccc LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC
THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL cccccccccccccccccccccchhhhhhhhhhheeeehhhhheeehttcchhhhhhhhhhhhhhhhcccc TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE
AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR hhhhhtccctthhheeeccccchhhhhhhhcccchhhhhhhhtttcccchhhhhhhhhhhtthhhhhhhh LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV hhhhhhhhhheeeccccceeeeehhhhhhhcccchhhhhttcccceehhhhhhhhhhhcccccceeeetc TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA
RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL
PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA
VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV cccccccccccccccccccccccchhhhhhheecccccccccccccccccccccccceeeeecce
SEQ ID NO:43
[0216] The amino acid sequence of ERBB2 mutant polypeptide C475S (SEQ ID NO:44) is shown below in TABLE 39. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 39
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL hhhhhhhhhhhteeeeehttccccccccheeettcceecthheeeeeetcccccccccccccccccchhh QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE
DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE chhhhhhheccccccccccccccccccccccttccccccccheeeeeccttccccccccceeeeccccee SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL ecccttcceeeeeeeeecccteeeeettcceeeecccccceeectttccccccccccccccccccccchh REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP hhhheechtchhhhhtccccccceeeeeecccccccccccccccthhhhhhhhhhhhheeeeeccccccc DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLSFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhtcteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC cheeeeccccccchhhhhhhccccccttcccccccccceeeeeecttccehhhhhhhttccchheccccc LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC
THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL cccccccccccccccccccccchhhhhhhhhhheeeehhhhheeehttcchhhhhhhhhhhhhhhhcccc TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE
AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR hhhhhtccctthhheeeccccchhhhhhhhcccchhhhhhhhtttcccchhhhhhhhhhhtthhhhhhhh
TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA
RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc
STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV cccccccccccccccccccccccchhhhhhheecccccccccccccccccccccccceeeeecce
SEQ ID NO:44
[0217] The amino acid sequence of ERBB2 mutant polypeptide A848S (SEQ ID NO:45) is shown below in TABLE 40. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 40
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL
QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE hhhhhhhhhttteeeccccceeeccheehhhhecttccceeeeeeccccccccccccccccccccccccc DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE
SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL
REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP hhhheechtchhhhhtccccccceeeeeecccccccccccccccthhhhhhhhhhhhheeeeeccccccc DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhttteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC cheeeeccccccchhhhhhhccccccttcccccccccceeeeeecttccehhhhhhhttccchheccccc LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC
THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL cccccccccccccccccccccchhhhhhhhhhheeeehhhhheeehttcchhhhhhhhhhhhhhhhcccc TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE
AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLASRNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV hhhhhhhhhheeeccccceeeeehhhhhhhcccchhhhhttcccceehhhhhhhhhhhcccccceeeetc TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA ehhheeettccccttcchhhhhhhhhttccccccccceeeeeeeehhhheccttccchhhhhhhhhhhhh RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc
STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV cccccccccccccccccccccccchhhhhhheecccccccccccccccccccccccceeeeecce
SEQ ID NO:45
[0218] The amino acid sequence of ERBB2 mutant polypeptide P856S (SEQ ID NO:46) is shown below in TABLE 41. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 41
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL hhhhhhhhhhhteeeeehttccccccccheeettcceecthheeeeeetcccccccccccccccccchhh QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE
DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhttteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC cheeeeccccccchhhhhhhccccccttcccccccccceeeeeecttccehhhhhhhttccchheccccc LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC cccccccccccceeeecccccccchhhhccccccceeeccccccccccccceeeeccccccccccccccc THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL cccccccccccccccccccccchhhhhhhhhhheeeehhhhheeehttcchhhhhhhhhhhhhhhhcccc TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE
AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR hhhhhtccctthhheeeccccchhhhhhhhcccchhhhhhhhtttcccchhhhhhhhhhhtthhhhhhhh LVHRDLAARNVLVKSSNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV hhhhhhhhhheeeccccceeeeehhhhhhhcccchhhhhttcccceehhhhhhhhhhhcccccceeeetc TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA
RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc
STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV cccccccccccccccccccccccchhhhhhheecccccccccccccccccccccccceeeeecce
SEQ ID NO:46
[0219] The amino acid sequence of ERBB2 mutant polypeptide R868Q (SEQ ID NO:47) is shown below in TABLE 42. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 42
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL
QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE hhhhhhhhhttteeeccccceeeccheehhhhecttccceeeeeeccccccccccccccccccccccccc DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE
SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP
DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhttteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC
LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC cccccccccccceeeecccccccchhhhccccccceeeccccccccccccceeeeccccccccccccccc THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL cccccccccccccccccccccchhhhhhhhhhheeeehhhhheeehttcchhhhhhhhhhhhhhhhcccc TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE
AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR hhhhhtccctthhheeeccccchhhhhhhhcccchhhhhhhhtttcccchhhhhhhhhhhtthhhhhhhh LVHRDLAARNVLVKSPNHVKITDFGLAQLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV hhhhhhhhhheeeccccceeeeehhhhhhhcccchhhhcttcccceehhhhhhhhhhhcccccceeeetc TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA ehhheeettccccttcchhhhhhhhhttccccccccceeeeeeeehhhheccttccchhhhhhhhhhhhh RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc
STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV cccccccccccccccccccccccchhhhhhheecccccccccccccccccccccccceeeeecce
SEQ ID NO:47
[0220] The amino acid sequence of ERBB2 mutant polypeptide D873N (SEQ ID NO:48) is shown below in TABLE 43. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 43
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL hhhhhhhhhhhteeeeehttccccccccheeettcceecthheeeeeetcccccccccccccccccchhh QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE hhhhhhhhhttteeeccccceeeccheehhhhecttccceeeeeeccccccccccccccccccccccccc DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE chhhhhhheccccccccccccccccccccccttccccccccheeeeeccttccccccccceeeeccccee SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL ecccttcceeeeeeeeecccteeeeettcceeeecccccceeectttccccccccccccccccccccchh REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP hhhheechtchhhhhtccccccceeeeeecccccccccccccccthhhhhhhhhhhhheeeeeccccccc DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhttteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC cheeeeccccccchhhhhhhccccccttcccccccccceeeeeecttccehhhhhhhttccchheccccc LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC
THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL
TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE cccccccchhhhhhhhhhhhhheeeecccccceeeeeeeccttcccccceeeeeehttcccccchhhhhh AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR hhhhhtccctthhheeeccccchhhhhhhhcccchhhhhhhhtttcccchhhhhhhhhhhtthhhhhhhh LVHRDLAARNVLVKSPNHVKITDFGLARLLDINETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV hhhhhhhhhheeeccccceeeeehhhhhhhcccchhhhcttcccceehhhhhhhhhhhcccccceeeetc TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA ehhheeettccccttcchhhhhhhhhttccccccccceeeeeeeehhhheccttccchhhhhhhhhhhhh RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc
STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV cccccccccccccccccccccccchhhhhhheecccccccccccccccccccccccceeeeecce
SEQ ID NO:48
[0221] The amino acid sequence of ERBB2 mutant polypeptide G1015R (SEQ ID NO:49) is shown below in TABLE 44. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 44
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS
LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL hhhhhhhhhhhteeeeehttccccccccheeettcceecthheeeeeetcccccccccccccccccchhh QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE
DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL
REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP hhhheechtchhhhhtccccccceeeeeecccccccccccccccthhhhhhhhhhhhheeeeeccccccc DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhttteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC
LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC
THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL cccccccccccccccccccccchhhhhhhhhhheeeehhhhheeehttcchhhhhhhhhhhhhhhhcccc TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE
AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR
LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV hhhhhhhhhheeeccccceeeeehhhhhhhcccchhhhhttcccceehhhhhhhhhhhcccccceeeetc TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA ehhheeettccccttcchhhhhhhhhttccccccccceeeeeeeehhhheccttccchhhhhhhhhhhhh RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMRDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS ttttceeeeetcccccccccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeecccccc STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL
PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGA
VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV cccccccccccccccccccccccchhhhhhheecccccccccccccccccccccccceeeeecce
SEQ ID NO:49
[0222] The amino acid sequence of ERBB2 mutant polypeptide P1170A (SEQ ID NO:50) is shown below in TABLE 45. The position of the mutated amino acid residue is highlighted in bold underlined text.
TABLE 45
MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNAS hhhhhhhhhheeeeeccttccceeecttccceeeccccchhhhhhhhhhhtthheeetcheeeecccccc LSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLREL hhhhhhhhhhhteeeeehttccccccccheeettcceecthheeeeeetcccccccccccccccccchhh QLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSE
DCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFE chhhhhhheccccccccccccccccccccccttccccccccheeeeeccttccccccccceeeeccccee SMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHL ecccttcceeeeeeeeecccteeeeettcceeeeccccceeeeetttccccccccccccccccccccchh REVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLP hhhheechtchhhhhtccccccceeeeeecccccccccccccccthhhhhhhhhhhhheeeeeccccccc DLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPH cheehhhheeettceecttcceeeeetcccchhhhhhhhhhctteeeeettcceeeeecccchheecccc QALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHC hheeeeccccccchhhhhhhccccccttcccccccccceeeeeecttccehhhhhhhttccchheccccc LPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINC
THSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPL cccccccccccccccccccccchhhhhhhhhhheeeehhhhheeehttcchhhhhhhhhhhhhhhhcccc TPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDE
AYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVR hhhhhtccctthhheeeccccchhhhhhhhcccchhhhhhhhtttcccchhhhhhhhhhhtthhhhhhhh LVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGV hhhhhhhhhheeeccccceeeeehhhhhhhcccchhhhhttcccceehhhhhhhhhhhcccccceeeetc TVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMA ehhheeettccccttcchhhhhhhhhttccccccccceeeeeeeehhhheccttccchhhhhhhhhhhhh RDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSS
STRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPL PSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERAKTLSPGKNGVVKDVFAFGGA
VENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV ccccteeccccccccccccccccchhhhhheeecccccccccccccccccccccccceeeeecce
SEQ ID NO:50
[0223] The influence of ERBB2 mutations on ERBB2 protein secondary structure, as predicted by SOPM, is summarized below in TABLE 46.
TABLE 46
Influence of ERBB2 Mutations on ERBB2 Protein Secondary Structure
Mutation Predicted Protein Secondary Structure Change
F371L No change is predicted
C475S No change is predicted
A848S No change is predicted
P856S No change is predicted
R868Q Alters the secondary structure of 1 amino acid
D873N Alters the secondary structure of 1 amino acid
G1015R No change is predicted
P1170A Alters the secondary structure of 8 amino acids located remotely from the mutation site
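TABLES 36 and 46 report, for each mutant, how many residues receive a different predicted state from wild type, and TABLE 46 notes that the P1170A changes lie away from the mutation site. A minimal Python sketch of that comparison follows, assuming equal-length per-residue prediction strings. The function names and the `window` cutoff used to separate nearby from remote changes are hypothetical choices, not values from the source.

```python
def changed_positions(wild_type: str, mutant: str) -> list[int]:
    """Return 1-based positions where two per-residue predictions differ.

    Both arguments are state strings (for example SOPM h/e/t/c codes) for
    sequences of equal length; whitespace is ignored.
    """
    wt = "".join(wild_type.split())
    mt = "".join(mutant.split())
    if len(wt) != len(mt):
        raise ValueError("predictions must cover the same number of residues")
    return [i for i, (a, b) in enumerate(zip(wt, mt), start=1) if a != b]


def summarize_change(wild_type: str, mutant: str, site: int, window: int = 5) -> dict[str, int]:
    """Count changed residues within `window` residues of `site` and beyond it.

    The default window of 5 is an arbitrary illustrative cutoff.
    """
    changed = changed_positions(wild_type, mutant)
    near = sum(1 for p in changed if abs(p - site) <= window)
    return {"changed": len(changed), "near_site": near, "remote": len(changed) - near}


if __name__ == "__main__":
    wt = "cccchhhhcccc"
    mt = "ccceehhhcccc"
    print(changed_positions(wt, mt))  # [4, 5]
    print(summarize_change(wt, mt, site=12))  # {'changed': 2, 'near_site': 0, 'remote': 2}
```

The length check matters because a single dropped or duplicated character in a copied prediction would otherwise shift every downstream position and inflate the count of changed residues.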
[0224] General Overview Analysis. A summary of the results of the computational analysis of the effect of the ERBB2 mutations and SNPs identified in the present invention on selected features of wild-type ERBB2 is provided below in TABLE 47. In 45 tumour tissue samples from breast cancer patients, eight (8) missense mutations, or non-synonymous SNPs, of ERBB2 were identified. Mutations F371L and C475S are located in a receptor L domain of ERBB2. A848S, P856S, R868Q and D873N are found in the protein tyrosine kinase domain of ERBB2, and residues A848 and R868 are highly conserved. Mutations D873N and P1170A are predicted to alter the phosphorylation patterns of ERBB2. Secondary structure analysis indicates that A848S, P856S, D873N and P1170A significantly alter the secondary structure of ERBB2.
TABLE 47
Evaluation of ERBB2 Mutations by Sequence Features
Mutation | Protein domain | Phosphorylation | Other modification | AA conservation | AA property change | Secondary structure
F371L +
C475S + +
A848S ++ +
P856S + + +
R868Q ++ + +
D873N + + + + +
G1015R + +
P1170A +
+: the effect of mutation on protein function is low; ++: the effect of mutation on protein function is medium; +++: the effect of mutation on protein function is high
EXAMPLE II
ANALYSIS OF ERBB2 MUTATION FOR THERANOSTIC CANCER TREATMENT IN A SUBJECT
[0225] In this invention, an agent that modulates ERBB2 biological activity (i.e., an ERBB2 modulating agent, e.g., an ERBB2 antagonist) is administered to a patient with cancer, e.g., breast cancer, when the patient has a single nucleotide polymorphism (SNP) pattern indicative of an ERBB2 mutation that correlates with the disease. In one embodiment, the SNP is selected from the group consisting of the ERBB2 mutations summarized in TABLE 1 and TABLE 2.
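The selection step in [0225] amounts to checking whether a patient's ERBB2 genotype carries one of the listed mutations. As a purely illustrative sketch, not a clinical or diagnostic procedure, the membership check could be written as follows. It assumes variant calls are already available as protein substitution codes; the helper names, and the use of the eight substitutions analysed in this section as a stand-in for TABLE 1, are hypothetical.

```python
# Substitutions analysed in this section (TABLES 36 and 46). Treating this set
# as equivalent to TABLE 1 is an assumption; TABLE 1 is not reproduced here.
LISTED_MUTATIONS = frozenset(
    {"F371L", "C475S", "A848S", "P856S", "R868Q", "D873N", "G1015R", "P1170A"}
)


def matching_mutations(detected: list[str]) -> list[str]:
    """Return, sorted, the detected substitutions that appear in LISTED_MUTATIONS.

    Codes are normalised by removing whitespace and upper-casing them.
    """
    normalised = {"".join(code.split()).upper() for code in detected}
    return sorted(normalised & LISTED_MUTATIONS)


def has_listed_mutation(detected: list[str]) -> bool:
    """True if at least one listed ERBB2 substitution was detected."""
    return bool(matching_mutations(detected))


if __name__ == "__main__":
    print(matching_mutations(["d873n", "P1170A", "V777L"]))  # ['D873N', 'P1170A']
```

Returning the matching codes, rather than only a yes/no flag, keeps the basis for any selection decision visible.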
EQUIVALENTS
[0226] The present invention is not to be limited in terms of the particular embodiments described in this application, which are intended as single illustrations of individual aspects of the invention. Many modifications and variations of this invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the invention, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. The present invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

CLAIMS
We claim:
1. The use of an ERBB2 modulating agent in the manufacture of a medicament for the treatment of cancer in a selected patient population, wherein the patient population is selected on the basis of the genotype of the patients at an ERBB2 genetic locus indicative of a propensity for having cancer, wherein the locus has a sequence selected from the group of ERBB2 mutations listed in TABLE 1.
2. The use of an ERBB2 modulating agent according to claim 1, wherein the ERBB2 modulating agent is selected from the group consisting of: AEE788, lapatinib (GW572016), HKI-272, PD158780, PKI-166, AG879, TAK165, CI-1033, CP-654577, AG825, BMS-599626, EKB-569, PD153035, SU11925, ZM 252868, CP127,374, SUC102, pertuzumab and trastuzumab.
3. The use of an ERBB2 modulating agent according to claim 1 or 2, wherein the cancer is selected from the group consisting of: breast cancer, genitourinary cancer, ovarian cancer, lung cancer, non-small-cell lung cancer (NSCLC), prostate cancer, gastric cancer, gastrointestinal cancer, colon cancer, bladder cancer, renal cancer, pancreas cancer, glioblastoma, melanoma, cholangioma, epidermoid cancer, neuroblastoma, head cancer, neck cancer, brain cancer, gastrinomas, adenocarcinoma, oral squamous cell carcinoma, urothelial carcinomas, squamous cell carcinoma of the uterine cervix, chronic myeloid leukaemia (CML), acute myelogenous leukaemia (AML), and hyperplasias.
4. An isolated polynucleotide having a sequence encoding an ERBB2 polypeptide having a sequence selected from the group consisting of: SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, and SEQ ID NO:50.
5. A recombinant vector comprising a polynucleotide encoding an ERBB2 polypeptide having a sequence selected from the group consisting of: SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, and SEQ ID NO:50.
6. An isolated polypeptide having a sequence selected from the group consisting of: SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, and SEQ ID NO:50.
7. A method for treating cancer in a subject, comprising the steps of:
(a) obtaining the genotype or haplotype of a subject at an ERBB2 gene locus, wherein the genotype and/or haplotype is indicative of a propensity for having cancer, wherein the locus has a sequence selected from the group of ERBB2 mutations listed in TABLE 1; and
(b) administering an anti-cancer therapy to the subject.
8. The method of claim 7, wherein the anti-cancer therapy is selected from the group consisting of Glivec®, FEMARA®, Sandostatin® LAR®, ZOMETA®, vatalanib, everolimus, gimatecan, patupilone, midostaurin, pasireotide, LBH589, AEE788 and AMN107.
9. The method of claim 7 or 8, wherein the cancer is selected from the group consisting of: breast cancer, ovarian cancer, melanoma, glioblastoma, cholangioma, non-small-cell lung cancer (NSCLC), prostate cancer, and colon cancer.
10. The method of any one of claims 7 to 9, wherein the anti-cancer therapy is the administration of a therapeutically effective amount of an ERBB2 modulating agent selected from the group consisting of: AEE788, lapatinib (GW572016), HKI-272, PD158780, PKI-166, AG879, TAK165, CI-1033, CP-654577, AG825, BMS-599626, EKB-569, PD153035, SU11925, ZM 252868, CP127,374, SUC102, pertuzumab and trastuzumab.
11. A method for diagnosing a propensity for having cancer in a subject, comprising the steps of:
(a) obtaining the genotype or haplotype of a subject at an ERBB2 gene locus, wherein the genotype and/or haplotype is indicative of a propensity for having cancer, wherein the locus has a sequence selected from the group of ERBB2 mutations listed in TABLE 1; and
(b) identifying the subject as having a propensity for having cancer.
PCT/US2007/003305 2006-02-09 2007-02-07 Mutations and polymorphisms of erbb2 WO2007095038A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US77190706P 2006-02-09 2006-02-09
US60/771,907 2006-02-09

Publications (2)

Publication Number Publication Date
WO2007095038A2 true WO2007095038A2 (en) 2007-08-23
WO2007095038A8 WO2007095038A8 (en) 2008-04-10

Family

ID=38371999

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/003305 WO2007095038A2 (en) 2006-02-09 2007-02-07 Mutations and polymorphisms of erbb2

Country Status (1)

Country Link
WO (1) WO2007095038A2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130078270A1 (en) * 2010-06-07 2013-03-28 Pfizer Inc. Her-2 peptides and vaccines
WO2013108869A1 (en) * 2012-01-20 2013-07-25 国立大学法人岡山大学 Therapeutic or prophylactic agent for cancer
WO2013113796A1 (en) * 2012-01-31 2013-08-08 Smithkline Beecham (Cork) Limited Method of treating cancer
US8916574B2 (en) 2009-09-28 2014-12-23 Qilu Pharmaceutical Co., Ltd. 4-(substituted anilino)-quinazoline derivatives useful as tyrosine kinase inhibitors
US9139558B2 (en) 2007-10-17 2015-09-22 Wyeth Llc Maleate salts of (E)-N-{4-[3-Chloro-4-(2-pyridinylmethoxy)anilino]-3-cyano-7-ethoxy-6-quinolinyl}-4-(dimethylamino)-2-butenamide and crystalline forms thereof
US9211291B2 (en) 2009-04-06 2015-12-15 Wyeth Llc Treatment regimen utilizing neratinib for breast cancer
US9265784B2 (en) 2008-08-04 2016-02-23 Wyeth Llc Antineoplastic combinations of 4-anilino-3-cyanoquinolines and capecitabine
CN105693546A (en) * 2014-11-27 2016-06-22 中国科学院上海药物研究所 Uses of 2-hydroxy-N-(4-hydroxyphenyl)-benzamide compounds in preparation of tyrosinase inhibitors
US9511063B2 (en) 2008-06-17 2016-12-06 Wyeth Llc Antineoplastic combinations containing HKI-272 and vinorelbine
US10596162B2 (en) 2005-02-03 2020-03-24 Wyeth Llc Method for treating gefitinib resistant cancer
US10729672B2 (en) 2005-11-04 2020-08-04 Wyeth Llc Antineoplastic combinations with mTOR inhibitor, trastuzumab and/or HKI-272
WO2022206929A1 (en) * 2021-04-01 2022-10-06 上海医药集团股份有限公司 Application of compound in preparation of inhibitory drug targeting erbb2 mutant

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
No Search *

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10603314B2 (en) 2005-02-03 2020-03-31 The General Hospital Corporation Method for treating gefitinib resistant cancer
US10596162B2 (en) 2005-02-03 2020-03-24 Wyeth Llc Method for treating gefitinib resistant cancer
US10729672B2 (en) 2005-11-04 2020-08-04 Wyeth Llc Antineoplastic combinations with mTOR inhibitor, trastuzumab and/or HKI-272
US9630946B2 (en) 2007-10-17 2017-04-25 Wyeth Llc Maleate salts of (E)-N-{4-[3-chloro-4-(2-pyridinylmethoxy)anilino]-3-cyano-7-ethoxy-6-quinolinyl}-4-(dimethylamino)-2-butenamide and crystalline forms thereof
US9139558B2 (en) 2007-10-17 2015-09-22 Wyeth Llc Maleate salts of (E)-N-{4-[3-Chloro-4-(2-pyridinylmethoxy)anilino]-3-cyano-7-ethoxy-6-quinolinyl}-4-(dimethylamino)-2-butenamide and crystalline forms thereof
US10035788B2 (en) 2007-10-17 2018-07-31 Wyeth Llc Maleate salts of (E)-N-{4[3-chloro-4-(2-pyridinylmethoxy)anilino]-3-cyano-7-ethoxy-6-quinolinyl}-4-(dimethylamino)-2-butenamide and crystalline forms thereof
US10111868B2 (en) 2008-06-17 2018-10-30 Wyeth Llc Antineoplastic combinations containing HKI-272 and vinorelbine
US9511063B2 (en) 2008-06-17 2016-12-06 Wyeth Llc Antineoplastic combinations containing HKI-272 and vinorelbine
US9265784B2 (en) 2008-08-04 2016-02-23 Wyeth Llc Antineoplastic combinations of 4-anilino-3-cyanoquinolines and capecitabine
US9211291B2 (en) 2009-04-06 2015-12-15 Wyeth Llc Treatment regimen utilizing neratinib for breast cancer
US8916574B2 (en) 2009-09-28 2014-12-23 Qilu Pharmaceutical Co., Ltd. 4-(substituted anilino)-quinazoline derivatives useful as tyrosine kinase inhibitors
US20130078270A1 (en) * 2010-06-07 2013-03-28 Pfizer Inc. Her-2 peptides and vaccines
US8895017B2 (en) * 2010-06-07 2014-11-25 Pfizer Inc. HER-2 peptides and vaccines
WO2013108869A1 (en) * 2012-01-20 2013-07-25 国立大学法人岡山大学 Therapeutic or prophylactic agent for cancer
WO2013113796A1 (en) * 2012-01-31 2013-08-08 Smithkline Beecham (Cork) Limited Method of treating cancer
CN105693546A (en) * 2014-11-27 2016-06-22 中国科学院上海药物研究所 Uses of 2-hydroxy-N-(4-hydroxyphenyl)-benzamide compounds in preparation of tyrosinase inhibitors
CN105693546B (en) * 2014-11-27 2019-05-31 中国科学院上海药物研究所 Application of 4-hydroxysalicylanilide compounds in the preparation of tyrosinase inhibitors
WO2022206929A1 (en) * 2021-04-01 2022-10-06 上海医药集团股份有限公司 Application of compound in preparation of inhibitory drug targeting erbb2 mutant

Also Published As

Publication number Publication date
WO2007095038A8 (en) 2008-04-10

Similar Documents

Publication Publication Date Title
WO2007095038A2 (en) Mutations and polymorphisms of erbb2
WO2006130527A2 (en) Mutations and polymorphisms of fibroblast growth factor receptor 1
WO2007016532A2 (en) Mutations and polymorphisms of hdac4
US20100035251A1 (en) BioMarkers for the Progression of Alzheimer's Disease
KR20160068754A (en) Mutant calreticulin for the diagnosis of myeloid malignancies
EP1869214A2 (en) Biomarkers for pharmacogenetic diagnosis of type 2 diabetes
AU2003295986A1 (en) Methods for identifying risk of breast cancer and treatments thereof
US20100249107A1 (en) Biomarkers for Alzheimer's Disease Progression
WO2006110478A2 (en) Mutations and polymorphisms of epidermal growth factor receptor
WO2012131092A2 (en) Method and kits for the prediction of response/nonresponse to the treatment with an anti-egfr antibody in patients with colorectal cancer of all uicc stages
WO2007022041A2 (en) Mutations and polymorphisms of hdac3
WO2006060429A2 (en) Identification of variants in histone deacetylase 1 (hdac1) to predict drug response
AU2006227283B2 (en) Biomarkers for efficacy of aliskiren as a hypertensive agent
JP2010535517A (en) Predictive marker for EGFR inhibitor treatment
WO2007058992A2 (en) Mutations and polymorphisms of hdac6
WO2007030455A2 (en) Mutations and polymorphisms of hdac10
WO2007109183A2 (en) Mutations and polymorphisms of fms-related tyrosine kinase 1
WO2007038073A2 (en) Mutations and polymorphisms of hdac11
WO2007030454A2 (en) Mutations and polymorphisms of hdac9
WO2007002217A2 (en) Mutations and polymorphisms of bcl-2
WO2007109515A2 (en) Mutations and polymorphisms of knockdown resistance polypeptide
WO2007095032A2 (en) Mutations and polymorphisms of ptk2b
US20200370102A1 (en) Biomarker indicating response to poziotinib therapy for cancer
WO2007121017A2 (en) Mutations and polymorphisms of fms-like tyrosine kinase 4
WO2007127524A2 (en) Mutations and polymorphisms of insr

Legal Events

Date Code Title Description
NENP Non-entry into the national phase in: Ref country code: DE
122 Ep: PCT application non-entry in European phase; Ref document number: 07717219; Country of ref document: EP; Kind code of ref document: A2
