
WO2010093379A1 - Gene expression profiling identifies genes predictive of oral squamous cell carcinoma and its prognosis - Google Patents


Info

Publication number
WO2010093379A1
Authority
WO
WIPO (PCT)
Prior art keywords
sccigs
genes
oscc
gene
squamous cell
Application number
PCT/US2009/051743
Other languages
French (fr)
Inventor
Chu Chen
Eduardo Mendez
John Houck
Wenhong Fan
Pawadee Lohavanichbutr
Bevan Yueh
Neal D. Futran
Stephen M. Schwartz
Lue Ping Zhao
Original Assignee
Fred Hutchinson Cancer Research Center
The University Of Washington
Application filed by Fred Hutchinson Cancer Research Center and The University Of Washington
Publication of WO2010093379A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C12Q2600/118 Prognosis of disease development
    • C12Q2600/136 Screening for pharmacological compounds
    • C12Q2600/158 Expression markers

Definitions

  • the presently disclosed invention embodiments relate to compositions and methods for the detection and treatment of cancer.
  • the present embodiments relate to identifying the presence of, or a risk for having, squamous cell carcinoma including oral squamous cell carcinoma (OSCC) and head-and-neck squamous cell carcinoma (HNSCC) in a subject, by identifying differential expression of one or more squamous cell carcinoma indicator genes as described herein.
  • OSCC oral squamous cell carcinoma
  • HNSCC head-and-neck squamous cell carcinoma
  • OSCC Squamous cell carcinoma of the oral cavity and oropharynx
  • the present invention provides a method for identifying a risk for having, or presence of, oral squamous cell carcinoma (OSCC) in a subject, the method comprising (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; and (b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells; wherein differential expression of the SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC.
  • SCCIGS squamous cell carcinoma indicator gene set
  • the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells.
  • the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, and (c) the SCCIGS consisting of COL1A1 and PADM genes.
  • the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, (c) the SCCIGS consisting of COL1A1 and PADM genes, (d) the SCCIGS consisting of a C21orf81 gene, (e) the SCCIGS consisting of KRT17 and PRSS3 genes, (f) the SCCIGS consisting of COL1A2 and EST 230740_1 at genes, (g) the SCCIGS consisting of COL1A1 and XLKD1 genes, (h) the SCCIGS consisting of THY1, FLJ22671 (referred to as C2orf54 in the Affymetrix database) and HAS3 genes, (i) the SCCIGS consisting of POSTN and TIA2 (referred to as PDPN in the Affymetrix database) genes, (j) the SCCIGS consisting of MGC403
  • the step of determining a SCCIGS expression level comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
  • SCCIG squamous cell carcinoma indicator gene
  • the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of SEQ ID NOS:201-331.
  • the biological sample comprises a biopsy tissue, which in certain further embodiments is selected from an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue.
  • the biological sample comprises one or a plurality of dysplastic cells.
  • a method for identifying a risk for having, or presence of, oral squamous cell carcinoma (OSCC) in a subject having oral epithelial dysplasia but no frank OSCC comprising (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one dysplastic oral epithelial cell; and (b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of OSCC cells; wherein substantial similarity of the SCCIGS expression level in the biological sample relative to the OSCC reference SCCIGS expression levels indicates the subject has, or is at risk for having, OSCC.
  • the squamous cell carcinoma indicator gene set comprises any one or more of the genes shown in Figure 6.
  • the step of determining a SCCIGS expression level comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of: (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
  • the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of the probes listed in Figure 9.
  • the biological sample comprises a biopsy tissue.
  • the subject has no detectable cancer and the biological sample comprises one or a plurality of dysplastic cells.
  • a method for identifying a risk for having, or presence of, a squamous cell carcinoma (SCC) in a subject wherein the SCC is selected from oral SCC (OSCC) and head-and-neck SCC (HNSCC), the method comprising (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; and (b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells, wherein: if the biological sample comprises an OSCC cell then the control tissue comprises normal oral epithelium, and if the first biological sample comprises a HNSCC cell then the control tissue comprises normal epithelium from oropharynx, hypopharynx, larynx or
  • the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells.
  • the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, and (c) the SCCIGS consisting of
  • the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, (c) the SCCIGS consisting of COL1A1 and PADM genes, (d) the SCCIGS consisting of a C21orf81 gene, (e) the SCCIGS consisting of KRT17 and PRSS3 genes, (f) the SCCIGS consisting of COL1A2 and EST 230740_1at genes, (g) the SCCIGS consisting of COL1A1 and XLKD1 genes, (h) the SCCIGS consisting of THY1, FLJ22671 (referred to as C2orf54 in the Affymetrix database) and HAS3 genes, (i) the SCCIGS consisting of POSTN and TIA2 (referred to as PDPN in the Affymetrix database) genes, (j) the SCCIGS consisting of POSTN and TIA2
  • the biological sample comprises an OSCC cell and the control tissue comprises normal oral epithelium.
  • the biological sample comprises a HNSCC cell and the control tissue comprises normal epithelium from oropharynx, hypopharynx, larynx or oral cavity.
  • the step of determining a SCCIGS expression level comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of: (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
  • the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of SEQ ID NOS:201-331.
  • the biological sample comprises a biopsy tissue.
  • the biopsy tissue is selected from the group consisting of an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue.
  • the biological sample comprises one or a plurality of dysplastic cells.
  • a method for identifying an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC comprising: (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; (b) determining that the subject has, or is at risk for having, OSCC by comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells, wherein a differentially expressed SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC; and (c) identifying within said differentially expressed SCCIGS a presence or absence of a substantially up- or down-regulated SCCIGS subset (SCCIGSS), wherein
  • the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells.
  • the SCCIGSS is one or more SCCIGSS selected from the group consisting of (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGSS consisting of a LAMC2 gene, (c) the SCCIGSS consisting of OSMR, SERPINE1 and OASL genes, (d) the SCCIGSS consisting of a SLC16A1 gene, (e) the SCCIGSS consisting of a KLF7 gene, (f) the SCCIGSS consisting of THBS1 and SLC16A1 genes, (g) the SCCIGSS consisting of a HOMER3 gene, (h) the SCCIGSS consisting of a GRP68 gene, (i) the SCCIGSS consisting of a PDPN gene, (j) the SCCIGSS consisting of an ANKRD35 gene, and (k) the SCCIGSS consisting of CDH3 and EPS8L1 genes.
  • the SCCIGS is one or more SCCIGS selected from the group consisting of (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, (c) the SCCIGS consisting of COL1A1 and PADM genes, (d) the SCCIGS consisting of a C21orf81 gene, (e) the SCCIGS consisting of KRT17 and PRSS3 genes, (f) the SCCIGS consisting of COL1A2 and EST 230740_1 at genes, (g) the SCCIGS consisting of COL1A1 and XLKD1 genes, (h) the SCCIGS consisting of THY1, FLJ22671 and HAS3 genes, (i) the SCCIGS consisting of POSTN and TIA2 (PDPN) genes, (j) the SCCIGS consisting of MGC40368 (TCP11L2), GIP3 (IFI6) and COL27A1
  • At least one of the steps selected from the step of determining a SCCIGS expression level and the step of identifying a SCCIGSS comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
  • the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from SEQ ID NOS:201-331.
  • the biological sample comprises a biopsy tissue, which in certain further embodiments is selected from an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue.
  • the biological sample comprises one or a plurality of dysplastic cells.
  • determining one or a plurality of SCCIGS expression levels comprises measuring one or more protein levels in the biological sample.
  • the biological sample comprises a biological fluid, which in certain still further embodiments is selected from saliva, blood, serum, plasma and lymph.
  • Figure 1 shows the most prominently involved biological pathways in oral squamous cell carcinoma (OSCC). Top: IFN- ⁇ signaling pathway.
  • Figure 3 shows the top 10 squamous cell carcinoma indicator gene set (SCCIGS) models from the logistic regression analyses of the selected biomarker genes from Figure 2.
  • the predictive power of these SCCIGS models was validated using internal and external (GSE6791) controls, as measured by the area under the curve (AUC).
  • An AUC of 0.5 represents a test that is no better than chance at discriminating between cases and controls, and an AUC of 1.0 provides perfect discrimination.
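The AUC interpretation above can be illustrated with a minimal sketch. It assumes each sample has already been given a model score (for example, a logistic regression score); the function name `auc` and the example scores are hypothetical and are not data from the disclosure. The AUC is computed here as the fraction of case/control pairs in which the case scores higher, with ties counted as one half.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen control (ties count as 0.5).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores, for illustration only.
perfect = auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])  # every case outranks every control
chance = auc([0.5, 0.5], [0.5, 0.5])             # scores carry no information
partial = auc([0.6, 0.4], [0.5, 0.3])            # 3 of 4 pairs ranked correctly
```

Here `perfect` evaluates to 1.0, `chance` to 0.5, and `partial` to 0.75, matching the two reference points described above.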
  • Figure 4 shows qRT-PCR results comparing RNA transcripts for four genes between OSCC cases and controls (see Example 2).
  • Figure 5 shows the tissue specificity of a squamous cell carcinoma indicator gene set (SCCIGS) consisting of LAMC2 and COL4A1 (top) and a SCCIGS consisting of COL1A1 and PADH (bottom).
  • the data from Example 5 are represented in box-and-whisker plots of logistic regression scores (y axis) for normal controls and cases in an internal testing set (N: normal, DYS: dysplasia, T: OSCC), GEO GSE6791 head and neck normal controls (HNN) and cases (HNT), GEO GSE6791 cervical normal controls (CN) and cases (CT), and GEO GSE6044 lung normal controls (LN), lung squamous cell carcinoma (LSCC), lung adenocarcinoma (LAD) and lung small cell cancer (LSC).
  • Figure 6 shows a list of biomarker genes that are differentially expressed between OSCC and dysplasia/normal controls, and which can be utilized, for example, to distinguish between frank OSCC and dysplasia.
  • Figure 7 shows the sequence information, including GenBank accession numbers and descriptive annotations for SEQ ID NOS:1-200, a list of differentially expressed SCC biomarker genes that may be used to identify the risk or presence of SCC in a subject.
  • Figure 8 shows the sequence identifiers for SEQ ID NOS:201-331, a list of exemplary Affymetrix probes (Affymetrix Corp., Santa Clara, CA) that specifically hybridize to certain of the SCC biomarker genes or gene sets described herein.
  • the middle column shows the Affymetrix probe identifier and the far right column shows the corresponding biomarker gene to which the probe specifically hybridizes.
  • Figure 9 shows a selected set of exemplary Affymetrix probes that specifically hybridize to certain of the SCC biomarker genes that can be utilized to discriminate between OSCC tumor cells and dysplastic epithelial cells.
  • the middle column shows the Affymetrix probe identifier and the far right column shows the corresponding biomarker gene to which the probe specifically hybridizes.
  • Figure 10 shows supervised hierarchical cluster analysis of the gene expression data.
  • the 131 probe sets were clustered as described in the text.
  • the bar underneath the heat map codes the samples according to tissue phenotype: normal, dysplasias and tumors.
  • Each column in the heat map represents the expression levels for all genes in a particular sample, whereas each row represents the relative expression of a particular gene across all samples.
  • the expression level of any gene in any given sample was also recorded along a color scale (not shown) in which red represents transcription up-regulation, green represents down-regulation, and the color intensity indicates the magnitude of deviation from the mean.
  • Cluster 1 refers to a group of probe sets which appear to be only fully downregulated in a group of 45 patients labeled with a bar at the bottom of the heat map.
  • Figure 11 shows Principal Component Analysis (PCA) using the
  • the first principal component (PC) is plotted on the x-axis and captures 63.28 % of the variance.
  • the second PC is plotted on the y-axis and captures 5.66 % of the variance.
  • Figure 12 shows survival and OSCC-specific mortality estimates in OSCC patients. The two groups were identified with hierarchical clustering analysis using the 131 differentially expressed genes in invasive OSCC as described in the text.
  • 12A Kaplan-Meier analysis of all-cause mortality. Vertical marks represent censored events.
  • 12B Cumulative incidence of OSCC-specific mortality.
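For readers unfamiliar with the Kaplan-Meier analysis named in panel 12A, the following is a minimal sketch of the standard product-limit estimator. It is not the authors' analysis code; the function name `kaplan_meier` and the follow-up data are hypothetical. Censored subjects (the vertical marks in such plots) leave the risk set without lowering the survival estimate.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each subject
    events: 1 if the subject died at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data: the subject at time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

With these inputs the estimate steps down to 0.75 at time 1, to 0.375 at time 3, and to 0.0 at time 4; the censored subject at time 2 only reduces the number at risk.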
  • Figure 13 shows Receiver Operating Characteristic Analysis of 2-year Survival Comparing the Prognostic Ability of Stage with Gene Expression Data.
  • 13A ROC Curves for 2-year survival for 'stage', 'LAMC2' and 'PCA'.
  • 13B ROC Curves for 2-year survival for models 'stage', 'stage and LAMC2' and 'stage and PCA'.
  • 13C Area Under the Curve (AUC) and bootstrapped 95% Confidence Intervals for all five models.
  • Figure 14 shows a three-dimensional plot of the first and second principal components and the risk scores from the top Cox regression model (0.59151 * LAMC2). The samples are color coded according to vital status. Diamonds (0) are used to show the overlap between risk scores and samples from either the group of 45 (red diamonds) or the group of 74 (blue diamonds).
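The risk score from the top Cox regression model (0.59151 * LAMC2) can be sketched as follows. Only the coefficient comes from the text; the function names are hypothetical, and the sketch assumes the LAMC2 expression value is supplied on whatever scale the model was fitted on, which this excerpt does not state. In a Cox model the risk score is the linear predictor, so the relative hazard between two samples is the exponential of the difference in their scores.

```python
import math

# Coefficient quoted in the Figure 14 description; everything else is illustrative.
LAMC2_COEFFICIENT = 0.59151


def risk_score(lamc2_expression):
    """Linear predictor of the single-gene Cox model."""
    return LAMC2_COEFFICIENT * lamc2_expression


def relative_hazard(expression_a, expression_b):
    """Hazard of sample A relative to sample B under the Cox model."""
    return math.exp(risk_score(expression_a) - risk_score(expression_b))


# Hypothetical expression values: a one-unit difference in LAMC2 expression.
ratio = relative_hazard(8.0, 7.0)
```

A one-unit increase in the expression value multiplies the modeled hazard by exp(0.59151), regardless of the baseline value.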
  • Embodiments of the present invention relate generally to the use of highly predictive gene expression profiling, based on differentially expressed biomarker genes or gene sets, to detect the presence or risk of squamous cell carcinoma (SCC) in a subject, and in certain further embodiments to identify an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC.
  • the methods provided herein may be used to identify a variety of head and neck squamous cell carcinomas (HNSCC), including oral squamous cell carcinomas (OSCC).
  • Gene expression profiling is a useful way to distinguish between cells that express different phenotypes, and may be used in particular embodiments to distinguish between cancer cells and normal cells, or in other embodiments to distinguish between different types of cancer cells, and/or in certain further embodiments to identify aggressively neoplastic OSCC cells in a method for identifying an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC.
  • Gene expression profiling according to the methods provided herein relates generally to measurements of selected biomarker genes or gene sets shown to be differentially expressed in various types of SCC, such as HNSCC and OSCC.
  • differentially expressed biomarker genes or gene sets that may be used to identify SCC are referred to herein as squamous cell carcinoma indicator gene sets (SCCIGS), which are exemplified in Figures 2 and 7 and detailed below.
  • subsets of such differentially expressed biomarker genes or gene sets that may be used to identify aggressively neoplastic SCC that, as described herein, may be indicators of an increased risk of OSCC-specific mortality are referred to herein as squamous cell carcinoma indicator gene set subsets (SCCIGSS), which are exemplified in Tables 2 and 4 and which are described in greater detail below.
  • determination and comparison of the expression levels of one or more selected biomarker genes or gene sets provide novel and useful parameters for diagnosing the risk or presence of SCC in a subject.
  • certain embodiments of the present invention relate to methods for identifying the risk or presence of SCC in a subject by comparing the expression levels of selected biomarker genes or gene sets in a biological sample from that subject to the expression levels of those same biomarker genes or gene sets in an appropriate control, such as a normal tissue known to be free of SCC (e.g., a control tissue or a reference tissue).
  • the presence or risk of SCC may be identified by the differential expression of the selected biomarker genes or gene sets in the subject sample as compared to the control.
  • Certain embodiments may also include a simple genetic test based on the gene expression profile of one or more selected biomarker genes, such as a selected SCCIGS.
  • a simple genetic test may employ selected probes or probe sets that are specific for one or more SCCIGS as provided herein, including but not limited to the Affymetrix oligonucleotide probes exemplified herein and variants thereof, to measure the gene expression levels of a SCCIGS or other biomarker gene, and to compare those levels to a reference SCCIGS expression level in an appropriate control (e.g., a control tissue or a reference tissue).
  • certain embodiments further contemplate identifying substantial down-regulation (e.g., expression that is reduced in a statistically significant manner by at least 50%, 60%, 70%, 80%, 90%, 95% or more, relative to an appropriate control group) of a subset of SCCIGS, referred to as SCCIGSS, where such down-regulation may indicate an increased risk of OSCC-specific mortality relative to other OSCC cases identified according to the disclosure herein.
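The percentage thresholds for substantial down-regulation can be made concrete with a short sketch. It assumes expression levels are on a linear (not log-transformed) scale and checks only the magnitude of the reduction; it does not perform the statistical significance test that the text also requires. The function names and example values are hypothetical.

```python
def percent_reduction(sample_level, control_level):
    """Percent by which the sample level falls below the control level."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return 100.0 * (control_level - sample_level) / control_level


def is_substantially_down(sample_level, control_level, threshold_pct=50.0):
    """True if the reduction meets the chosen threshold (e.g., 50, 60, ... 95)."""
    return percent_reduction(sample_level, control_level) >= threshold_pct


# Hypothetical linear-scale expression levels.
reduction = percent_reduction(40.0, 100.0)
meets_50 = is_substantially_down(40.0, 100.0)          # 60% reduction vs 50% threshold
meets_90 = is_substantially_down(40.0, 100.0, 90.0)    # 60% reduction vs 90% threshold
```

In this example the sample shows a 60% reduction, which satisfies the 50% threshold but not the 90% threshold.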
  • patients who develop local recurrence and/or second primary oral tumors are those whose surgical margins or uninvolved buccal mucosa harbor molecular changes that are found in oral dysplasia or invasive OSCC.
  • the predictive models provided herein may be used to test biopsies of histologically normal surgical margins and clinically normal oral mucosa of OSCC patients, in order to identify a risk for having, or the presence of, local recurrence and/or second primary oral cancer.
  • the strong predictive power of the presently disclosed biomarker genes and/or gene sets when used according to the methods described herein may be exploited generally to differentiate between normal cells, pre-neoplastic cells (i.e., dysplasia), and SCC tumor cells, including a variety of HNSCC tumor cells, such as OSCC tumor cells.
  • this predictive power may find use in a clinical setting, for example, to identify or monitor subjects having SCC, or subjects at risk for developing SCC, and may also find use in research settings, for instance, to further characterize the underlying biological bases of SCC oncogenesis and pathology.
  • Squamous cell carcinoma generally includes malignant tumors of squamous epithelium (i.e., epithelium that shows squamous cell differentiation), which may occur in many different organs, including the skin, lips, mouth, esophagus, urinary bladder, prostate, lungs, vagina, and cervix.
  • Squamous cells form the surface tissue layer (i.e., epithelium) of much of the body, and include cells of the skin and mucous membranes.
  • SCC is thought to derive from keratinizing or malpighian epithelial cells.
  • keratin or "keratin pearls," on histologic evaluation.
  • SCC is morphologically variable, and may appear, by way of non-limiting example, as plaques, nodules, or verrucae. SCC usually begins as surface lesions with erythema and slight elevation, often termed erythroplasia. Some early SCC lesions may appear to be pure white, and are referred to as leukoplakia, but only a small percentage of leukoplakia lesions represent carcinoma in situ or invasive carcinoma. Erythroplasia lesions, or early red lesions, are typically asymptomatic and may represent either carcinoma in situ or invasive carcinoma.
  • Tender painful lesions usually are suggestive of perineural invasions. When lesions become palpable masses, symptoms such as a vague persistent sore throat or ear infection typically occur. In more advanced cases, dissemination to ipsilateral submandibular and jugulodigastric nodes is common, and a subject suspected of having SCC may present with a mass in the neck. When lymph node or remote bone and organ metastases are associated with an early oral primary lesion, often a second, more advanced primary upper aero-digestive or lung cancer is responsible for the metastases.
  • HNSCC refers generally to a group of biologically similar squamous cell carcinomas originating from the upper aero-digestive tract, including the lip, oral cavity (mouth), nasal cavity, paranasal sinuses, pharynx, and larynx, among others. HNSCC often spreads to the lymph nodes of the neck, which may represent the first, and sometimes only, manifestation of the disease at the time of diagnosis.
  • HNSCCs are typically characterized by their originating tissues.
  • HNSCCs may arise from the salivary glands, which produce saliva, the fluid that keeps mucosal surfaces in the mouth and throat moist.
  • the major salivary glands may be found in the floor of the mouth and near the jawbone.
  • the paranasal sinuses are small hollow spaces in the bones of the head surrounding the nose.
  • the nasal cavity is the hollow space inside the nose.
  • HNSCCs may also originate in the pharynx.
  • the pharynx is essentially a hollow tube common to the upper digestive and respiratory tracts, originating behind the nose, forming the throat lumen and leading to the esophagus and the trachea.
  • the pharynx has three parts, the nasopharynx, the oropharynx, and the hypopharynx.
  • Nasopharyngeal cancer arises in the nasopharynx, the region in which the nasal cavities and the Eustachian tubes connect with the upper part of the throat. Oropharyngeal cancer often begins in the oropharynx, the middle part of the throat that includes the soft palate, the base of the tongue, and the tonsils.
  • the hypopharynx includes the pyriform sinuses, the posterior pharyngeal wall, and the postcricoid area.
  • HNSCC may also originate in the larynx, or "voice box." Such cancers may occur on the vocal folds themselves (i.e., "glottic" cancer), or on tissues above and below the true cords (i.e., "supraglottic" and "subglottic" cancers, respectively). Laryngeal cancer is strongly associated with tobacco smoking. In general, HNSCC is highly curable if detected early, usually with some form of surgery, although chemotherapy and radiation therapy may also play an important role.
  • SCC of the oral cavity, or mouth, may represent one particular aspect of HNSCC, and is typically referred to as oral squamous cell carcinoma (OSCC).
  • OSCC is associated with substantial mortality and morbidity.
  • OSCC relates generally to the formation of SCC in the area extending from the vermilion border of the lips to a plane between the junction of the hard and soft palate superiorly and the circumvallate papillae of the tongue inferiorly.
  • This area includes, for example, the front two thirds of the tongue, the gingiva (gums), the buccal mucosa (the lining of the inside of the cheeks), the floor (bottom) of the mouth under the tongue, the hard palate (the roof of the mouth), and/or the retromolar trigone (the small area behind the wisdom teeth).
  • OSCC typically spreads primarily by either local extension or by the lymphatic system.
  • the extent of tumor invasion depends upon the anatomic site, the tumor's biologic aggressiveness, and host response factors.
  • the lymphatic system is the most important and frequent route of metastasis in OSCC.
  • the ipsilateral cervical lymph nodes are the primary site for metastatic deposits, but occasionally contralateral or bilateral metastatic deposits may be detected.
  • the risk for lymphatic spread is greater for posterior lesions of the oral cavity, possibly because of delayed diagnosis or increased lymphatic drainage at those sites, or both.
  • Cervical lymph nodes with metastatic deposits tend to appear as firm-to-hard, nontender enlargements. Once the tumor cells perforate the nodal capsule and invade the surrounding tissue, these lymph nodes often become fixed and non-mobile. Metastatic spread of tumor deposits from oral carcinoma usually occurs in an orderly pattern, beginning with the uppermost lymph nodes and spreading down the cervical chain.
  • Although lymph node metastasis is not an early event, many individuals with oral cancer nonetheless present at diagnosis with nodal metastasis. Hematogenous spread of tumor cells is infrequent in the oral cavity but may occur because of direct vascular invasion or seeding from surgical manipulation.
  • dysplasia refers generally to a maturation abnormality of cells within a tissue, which often involves the expansion of immature cells and a corresponding decrease in the number of mature cells at a given site. Dysplasia is often indicative of an early or preneoplastic process. The term dysplasia is typically used when the cellular abnormality is restricted to the originating tissue, as in the case of an early, in-situ neoplasm.
  • Dysplasia is often considered the earliest form of pre-cancerous lesion recognizable in a biopsy, and dysplasia relevant to HNSCC or OSCC typically relates to dysplasia of epithelial cells.
  • Dysplasia may be further characterized as "low grade” or "high grade.” The risk of low grade dysplasia transforming into high grade dysplasia and, eventually, cancer is low.
  • High grade dysplasia represents a more advanced progression towards malignant transformation, with increased risk of developing a carcinoma in situ.
  • Carcinoma in situ meaning "cancer in place,” represents generally the transformation of a neoplastic lesion to one in which cells undergo essentially no maturation, and thus may be considered cancer-like. In this state, cells are often considered to have lost their tissue identity, and have reverted to a primitive cell form that grows rapidly and without regulation. This form of cancer, however, often remains localized, and has not invaded into tissues below the surface.
  • Invasive carcinoma refers generally to a cancer that has invaded beyond the original tissue layer or basement membrane and may be able to spread to other parts of the body (i.e., metastasize).
  • The molecular events involved in the development of squamous dysplasia and subsequent carcinoma are poorly understood.
  • a subject may include any animal, any mammal, and particularly any human individual having, at risk for having, or suspected of having, a SCC tumor cell, such as a HNSCC tumor cell, an OSCC tumor cell, and/or a pre-neoplastic growth, such as a dysplastic cell.
  • a subject may have previously undergone treatment for SCC and may be at risk for developing another case of SCC, or may be newly suspected of having SCC or SCC-related dysplasia.
  • Subjects may be identified according to routine clinical techniques described herein and known in the art, including, for example, by clinical examination of the head and neck, skin, mouth, or other relevant tissue (see, e.g., Epstein et al., Can Fam Physician.
  • Toluidine blue also provides a useful adjunct to clinical examination.
  • The mechanism of vital staining is based on selective binding of the dye to dysplastic or malignant cells in the oral epithelium (e.g., Helsper, CA Cancer J. Clin. 22:172, 1972).
  • toluidine blue selectively stains for acidic tissue components and thus binds more readily to DNA, which is increased in neoplastic cells.
  • Vital staining can also help to determine the most appropriate biopsy sites and to surgically delineate margins.
  • Diagnostic imaging evaluation, such as computed tomography (CT) scanning or magnetic resonance imaging (MRI), may also be used to identify a subject at risk for SCC, and further to assess the extent of local and/or regional tumor spread, the depth of invasion, and the extent of lymphadenopathy.
  • CT is often considered superior in detecting early bone invasion and lymph node metastasis, but MRI is typically preferred for assessing the extent of soft tissue involvement and for providing a three-dimensional display of the tumor.
  • MRI is also the preferred technique for imaging carcinoma of the nasopharynx or lesions involving paranasal sinuses or the skull base.
  • Typical symptoms associated with SCC such as HNSCC or OSCC may include, for example, a sore on the lip or in the mouth that does not heal, a lump or thickening on the lips or gums or in the mouth, a white or red patch on the gums, tongue, tonsils, or lining of the mouth, bleeding, pain, or numbness in the lip or mouth, change in voice, loose teeth or dentures that no longer fit well, trouble chewing or swallowing or moving the tongue or jaw, swelling of jaw, and/or sore throat or feeling that something is caught in the throat.
  • a subject having or suspected of having SCC may be identified by oral lesions that appear in areas of erythroplakia or leukoplakia, and which may be exophytic or ulcerated. Both the latter variants are typically indurated and firm with a rolled border. Tonsillar carcinoma in a subject usually presents as an asymmetric swelling and sore throat in which pain often radiates to the ipsilateral ear; a metastatic mass in the neck may be the first symptom.
  • OSCC associated lesions are described in Detecting Oral Cancer, A Guide for Health Care Professionals, U.S. Department of Health and Human Services, National Institutes of Health, Bethesda, MD.
  • Risk factors for HNSCC and/or OSCC may include, for example, tobacco product use, heavy alcohol use, exposure to sunlight (e.g., lower lip SCC), being male, and being infected with human papillomavirus (HPV) or Epstein-Barr virus (EBV).
  • Environmental exposures to paint fumes, plastic byproducts, wood dust, asbestos, and/or gasoline fumes have also been implicated as risk factors.
  • Gastroesophageal reflux disease is thought to be a significant risk factor for cancer of the larynx, and especially the anterior two thirds of the vocal cords. Irritation from poorly fitting dentures also has been implicated.
  • Biological samples may be provided by obtaining a blood sample, biopsy specimen, tissue explant, organ culture or any other tissue or cell preparation from a subject or a biological source, including tissue extracts or lysates derived from biopsies, cell extracts or lysates, nucleic acid extracts (e.g., RNA or DNA), and/or protein extracts and/or biological fluids including body fluids.
  • the subject or biological source may be a human or non-human animal, a primary cell culture or culture adapted cell line including but not limited to genetically engineered cell lines that may contain chromosomally integrated or episomal recombinant nucleic acid sequences, immortalized or immortalizable cell lines, somatic cell hybrid cell lines, differentiated or differentiatable cell lines, transformed cell lines and the like.
  • the subject or biological source may be suspected of having or being at risk for having SCC, and in certain preferred embodiments of the invention the subject or biological source may be known to be free of a risk or presence of such a condition according to current art-accepted criteria with which the skilled person will be familiar.
  • a biological sample may include any type of cell-containing or cell- or tissue-derived sample that may be isolated, obtained, or derived from a subject and utilized to determine whether the cells from that subject show the differential SCCIGS expression profile that is characteristic of a SCC tumor cell as provided herein, or of a dysplastic cell as provided herein.
  • the expression level(s) may be determined for one or more SCC- related biomarker genes or SCCIGS as identified herein, using the methods described herein and molecular biology techniques as known in the art.
  • A suitable biological sample (e.g., a biological sample as provided herein) typically may be suspected of comprising a SCC tumor cell or a dysplastic cell, such as a dysplastic epithelial cell.
  • a biological sample may also include whole cells or fixed cells.
  • Other typical sources of biological samples include cell cultures, as noted above, including but not limited to those in which gene expression states may be manipulated to explore the relationship among genes (including, e.g., SCCIGS).
  • Biopsy tissues may be obtained, for example, using surgical scalpels, needles, biopsy punches or other means, and typically can be performed under local anesthesia. Incisional biopsy typically refers to the removal of a representative sample of the lesion; excisional biopsy typically refers to the complete removal of the lesion, with a border of normal tissue. A clinician may obtain multiple biopsy specimens of suspicious lesions to define the extent of the primary disease and to evaluate the patient for the presence of possible synchronous second malignancies. Useful adjuncts to biopsies include vital staining, exfoliative cytology, fine needle aspiration biopsy, routine dental radiographs and other plain films, and imaging with magnetic resonance imaging (MRI) or computed tomography (CT).
  • Biopsy tissues may include, by way of non-limiting example, excised tumors or suspected tumors, tumor-positive margin tissues, tumor-negative margin tissues, and/or close margin tissues.
  • Margin tissues refer generally to SCC-related surgical margins, which in turn relate to the area of tissue around the clinical border of a SCC tumor that should be surgically removed to reduce the chance of tumor recurrence at the margins of skin excision.
  • SCC surgical margins can range from about 3 mm to about 1 cm or more around the histologically established border of the SCC tumor, the size of which may be based in part on the staging by TNM classification to determine whether the tumor is considered a low-risk or high-risk tumor (see, e.g., Wittekind, Ch; Sobin, L. H. (2002). TNM classification of malignant tumours. New York: Wiley-Liss). High-grade tumors typically afford larger surgical margins, whereas low-grade tumors typically afford smaller surgical margins.
  • Surgical margin tissues may be monitored for the differential expression of SCCIGS or SCC gene biomarkers after initial excision, such as for post-operative confirmation of tumor-negative margins, or during a follow-up period, such as for monitoring the potential recurrence of SCC tumor-positive cells in the surgical margin tissues (see, e.g., de Visscher et al., International Journal of Oral and Maxillofacial Surgery 31:154-157, 2002).
  • biological fluids include body fluids such as blood, serum and serosal fluids, plasma, lymph, urine, cerebrospinal fluid, saliva, mucosal secretions of the secretory tissues and organs, vaginal secretions, ascites fluids such as those associated with non-solid tumors, fluids of the pleural, pericardial, peritoneal, abdominal and other body cavities, and the like.
  • Biological fluids may also include liquid solutions contacted with a subject or biological source, for example, cell and organ culture medium including cell or organ conditioned medium, lavage fluids and the like.
  • the biological sample is saliva
  • the biological sample is blood or a fluid fraction thereof (e.g., serum, plasma), or lymph.
  • the biological sample is a cell-free liquid solution.
  • Certain embodiments of the present invention relate to the identification and use of one or more selected Squamous Cell Carcinoma Gene Sets (SCCIGS).
  • SCCIGSs may be used to detect SCC in a subject, to identify a risk of developing SCC in a subject, and/or to monitor for the recurrence of SCC in a subject.
  • SCCIGSs represent one or more biomarker genes, alone or in selected combinations (i.e., a biomarker gene set), that identify a risk for having, or the presence of, SCC in a subject when the expression levels of the SCCIGS in a suspected biological sample reflect differential expression (e.g., in a statistically significant manner) compared to the expression levels of the same SCCIGS in control epithelial cells that are known to be free of SCC cells.
  • a gene relates generally to a unit of inheritance that occupies a specific locus on a chromosome and includes transcriptional and/or translational regulatory sequences and/or a coding region (e.g., a polypeptide encoding region or a region encoding a structural RNA such as tRNA or rRNA, or a functional RNA such as a miRNA) and/or non-translated sequences (i.e., introns, 5' and 3' untranslated sequences).
  • The individual genes in a given SCCIGS may be either over-expressed or under-expressed (i.e., statistically significantly higher or lower expression levels) in the biological sample comprising an SCC cell when compared to the control (non-cancer) cell.
  • it is the differential expression of one or more selected SCCIGS that may identify the presence or risk of SCC in a subject.
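  • As a minimal sketch of how over- or under-expression of a single biomarker gene might be assessed, the following Python code computes a log2 fold change and Welch's t statistic between suspected and control samples. The function names, the intensity values, and the choice of Welch's test are illustrative assumptions; they are not taken from the source and are not the statistical procedure of Example 1.

```python
import math
from statistics import mean, variance

def log2_fold_change(case, control):
    """Log2 ratio of mean expression, case over control."""
    return math.log2(mean(case) / mean(control))

def welch_t(case, control):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se

# Hypothetical normalized intensities for one gene (not data from the source).
tumor = [820.0, 910.0, 760.0, 880.0]
normal = [210.0, 190.0, 230.0, 205.0]

lfc = log2_fold_change(tumor, normal)
t_stat = welch_t(tumor, normal)
direction = "over-expressed" if lfc > 0 else "under-expressed"
print(f"log2FC={lfc:.2f}, t={t_stat:.2f}, {direction}")
```

  • A positive log2 fold change indicates over-expression relative to control and a negative one indicates under-expression; a significance threshold would still need to be chosen and corrected for the number of genes tested.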
  • A SCCIGS may be selected from the preferred SCCIGS models exemplified in Figure 3 (see SCCIGS models 1-10).
  • Figures 2 and 7 also provide a list of predictive SCC biomarker genes, one or more of which may be selected to generate a suitable SCCIGS (see, e.g., SEQ ID NOS:1-200) for identifying the risk or presence of SCC in a subject.
  • SCC biomarker genes and SCCIGS gene sets described herein were identified according to the exemplary processes described in Example 1 below and in Chen et al., Cancer Epidemiol Biomarkers Prev 17(8) Published Online July 30, 2008.
  • the data provided herein were generated by comparing gene expression in samples of (i) incident primary OSCC, (ii) oral dysplasia, and (iii) clinically normal oral tissue from surgical patients without head and neck cancer or pre-neoplastic oral lesions (controls), using Affymetrix U133 2.0 Plus arrays.
  • Selected differentially expressed probe sets and their corresponding biomarker genes were identified using a training set of 119 OSCC patients and 35 controls (see Figures 2 and 7; and SEQ ID NOS:1-200).
  • SCCIGSs that may be utilized according to the present methods include, for example, a SCCIGS that includes or consists essentially of the LAMC2 (SEQ ID NOS:18 and 19) and COL4A1 (SEQ ID NO:113) genes; a SCCIGS that includes or consists essentially of the COL1A1 (SEQ ID NOS:20 and 21) and PADI1 (SEQ ID NO:145) genes; a SCCIGS that includes or consists essentially of the C21orf81 gene (SEQ ID NOS:186 and 187); a SCCIGS that includes or consists essentially of the KRT17 (SEQ ID NO:59) and PRSS3 (SEQ ID NOS:123 and 124) genes; a SCCIGS that includes or consists essentially of the COL1A2 (SEQ ID NO:22) and EST 230740_1 at (SEQ ID NO:198) genes; a SCCIGS that includes or consists essentially of the COL1A1 and XLKD1 (SEQ
  • SCC biomarker gene or SCCIGS are “variants” of the biomarker or SCC indicator genes described herein (see, e.g., SEQ ID NOS:1-200), such as splice variants, isoforms, allelic variants (same locus), homologs (different locus), and orthologs (different organism), or biological functional equivalents of such genes.
  • Such variants may also include biomarker genes comprising polynucleotide sequences having 30, 40, 50, 60, 70, 75, 80, 85, 90, 95, 97, 98, or 99% identity to a sequence set forth in SEQ ID NOS:1-200, and which are differentially expressed in a SCC cell as compared to a control cell known to be free of SCC.
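  • To illustrate what a percent-identity figure means, the sketch below counts matching positions between two sequences that are assumed to be already aligned, equal in length, and free of gaps. The function name and the two short sequences are hypothetical; real comparisons against SEQ ID NOS would normally rely on an alignment tool, which this example does not attempt to reproduce.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)

# Hypothetical 10-base reference and a variant carrying one substitution.
reference = "ACGTACGTAC"
variant = "ACGTTCGTAC"
identity = percent_identity(reference, variant)
print(identity)
```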
  • Such variants also encompass polynucleotide sequences that are distinguished from a reference polynucleotide by the addition (e.g., insertion), deletion or substitution of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
  • Measurements of SCCIGS expression levels may include determination, in a biological sample from a subject as provided herein, of ribonucleic acid (RNA) and/or protein abundances, or protein activity levels.
  • the expression levels of a biomarker gene or gene set, such as a SCCIGS may be determined according to the RNA transcript levels of the individual genes within that gene set, such as by measuring the levels of a specific mRNA that is the transcription product of a given SCCIGS gene.
  • RNA transcripts may include, but are not limited to, pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s), and degradation products, in addition to nucleic acid amplification products of such sequences (e.g., cDNAs).
  • RNA levels may be determined according to techniques known in the art and exemplified herein.
  • RNA levels can be measured by utilizing arrays, such as RNA microarray-based techniques known in the art (see, e.g., Goley et al., BMC Cancer 4:20, 2004, performing RNA microarray analysis on needle core biopsies of tumors), and described herein (see Example 1), or by utilizing quantitative reverse-transcriptase polymerase chain reactions (qRT-PCR) (see Example 2).
  • RNA levels can be determined by relying on other quantitative RNA assays known in the art (see,
  • RNA levels may, for instance, be determined by reverse transcribing the mRNA transcript of a given gene to form a cDNA molecule, optionally amplifying the cDNA molecule, and measuring the levels of the DNA molecule, such as by quantitative real-time PCR (qRT-PCR, see, e.g., VanGuilder et al., Biotechniques. 44:619-26, 2008).
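  • Once threshold cycle (Ct) values have been measured by qRT-PCR, relative expression is often summarized with the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes that approach, a reference (housekeeping) gene, and roughly 100% amplification efficiency; none of these choices is stated in the source, and the Ct values and function name are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt)."""
    # Normalize the target gene to the reference gene within each specimen.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between specimens.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in the sample.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)
```

  • A result above 1 suggests higher transcript abundance in the sample than in the control, and a result below 1 suggests lower abundance.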
  • Examples of other useful techniques for determining the amount of nucleic acid target sequences (e.g., a mRNA transcript of a biomarker gene) present in a sample based on specific hybridization of an oligonucleotide primer or probe to the target sequence include specific amplification of target nucleic acid sequences and quantification of amplification products, including but not limited to polymerase chain reaction (PCR, Gibbs et al., Nucl. Ac. Res. 17:2437, 1989), transcriptional amplification systems, strand displacement amplification and self-sustained sequence replication (3SR, Gingeras et al., J. Infect. Dis. 164:1066, 1991).
  • ligase chain reaction (e.g., Landegren et al., Science 241:1077, 1988; Nickerson et al., Proc. Natl. Acad. Sci. USA 87:8923, 1990; Barany, Proc. Natl. Acad. Sci. USA 88:189, 1991; Wu et al., Genomics 4:560, 1989)
  • cycled probe technology (e.g., Landegren et al., Science 241:1077, 1988; Nickerson et al., Proc. Natl. Acad. Sci. USA 87:8923, 1990)
  • ISH In situ hybridization
  • Protein levels may also be determined according to certain embodiments, to determine the expression level(s) of an individual gene or genes in an SCCIGS, and thereby determine the SCCIGS expression level. Protein levels may be measured either directly, such as by measuring the amount of protein in an extract, cell, tissue, or other biological sample, or indirectly, such as by measuring the amount of protein activity in a biological sample.
  • Protein levels may be measured from cell or tissue extracts, or from whole cells or tissues. Protein extraction from cell samples may be performed according to any of a number of methodologies with which the skilled person will be familiar. For instance, depending on whether the collection of particular cell fractions is desired, cell samples may be typically lysed using one or more of a hypotonic buffer, urea, a chaotrope (e.g., guanidine-HCl) and buffers containing various detergents, such as Nonidet™ NP-40, Triton™ X-100, Tween™-20, alkyl glucosides, betaine-containing surfactants, sodium dodecyl sulfate (SDS) or other detergents recognized in the art for this purpose, and further processed to be compatible with an intended assay.
  • Cells or tissues can be either placed in an appropriate analysis buffer for live analysis, or fixed with various fixation agents, such as formaldehyde, paraformaldehyde, methanol, or ethanol, among others, followed by further processing according to the requirements of the intended assay (e.g., flow cytometry, immunohistochemistry, etc.).
  • protein or proteomic arrays may be used in certain embodiments, such as antibody microarrays, in which antibodies specific for one or more proteins of interest are spotted onto a protein chip and are used as capture molecules to detect and quantify the proteins from biological samples, such as cell lysate solutions (see generally, Jones et al. Nature 439:168-174, 2006; and Chen et al. Curr Opin Chem Biol 10:28-34, 2006 for protein arrays).
  • Antibodies specific for those proteins expressed by the SCC biomarker genes or gene sets described herein may be generated using techniques known in the art (see, e.g., Harlow and Lane, eds.
  • Proteomic array-based techniques that may be applied to clinical tissue samples include tissue microarrays and surface-enhanced laser desorption/ionization (SELDI-TOF) (see, e.g., Bertucci et al., Mol Cell Proteomics. 5:1772-86, 2006; and Bollard et al., Proteomics: Clinical Applications, 1:934-954, 2007).
  • Certain aspects may employ other techniques to measure protein levels, including, but not limited to, western blotting, radio-immunoprecipitation, proteomics, and flow cytometry (see, e.g., Prinz et al. Proteomics. 8:1179-96, 2008; and Peterson et al., Toxicol Pathol.
  • An array typically comprises a solid support with peptide or nucleic acid-based probes attached covalently or non-covalently to the support, but the presently contemplated embodiments need not be so limited and may also encompass assays based on fluid-phase interactions.
  • arrays typically comprise a plurality of different nucleic acid or peptide probes that are coupled to a surface of a substrate in discrete, known locations.
  • Arrays may be produced using mechanical synthesis methods or light directed synthesis methods, incorporating a combination of photolithographic methods and solid phase synthesis methods. Techniques for the synthesis of these arrays using mechanical synthesis methods are described, for example, in U.S. Pat. Nos. 5,384,261, and 6,040,193. Arrays may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. In certain embodiments, an array is fabricated on a planar array surface.
  • Arrays may in other embodiments take the form of peptides or nucleic acids on beads, gels, polymeric surfaces, fibers such as fiber optics, glass or any other appropriate substrate, see U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992.
  • certain embodiments may include the use of selected oligonucleotide probes.
  • Oligonucleotide probes refer generally to polymers composed of a multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or related structural variants or synthetic analogues thereof).
  • oligonucleotide probe typically refers to a nucleotide polymer in which the nucleotide residues and linkages between them are naturally occurring, it will be understood that the term also includes within its scope various analogues including, but not restricted to, peptide nucleic acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl ribonucleic acids, and the like.
  • The length of an oligonucleotide probe can vary depending on the particular application.
  • An oligonucleotide is typically rather short in length, generally from about 10 to 30 nucleotide residues (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotide residues), but the term can refer to molecules of any length, including oligonucleotides from about 30 to about 100 or more nucleotide residues in length (e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more, including all integers in between).
  • The length of an oligonucleotide probe may affect its ability specifically to bind or hybridize to its intended target sequence (e.g., a complementary sequence, according to well-established principles of Watson-Crick base pairing), which refers generally to its ability to bind more strongly to its intended target sequence than to any other sequences in a given sample, and, thus, to discriminate between its intended target and the other sequences present in the sample.
  • An oligonucleotide probe may be about 25 nucleotide residues in length, including, for example, certain Affymetrix® probes (see Example 1).
  • Oligonucleotide probes may be selected or designed according to routine techniques known in the art and described herein for their ability to specifically bind or hybridize to an intended target sequence.
  • Polynucleotides or oligonucleotide probes may be designed or selected to specifically hybridize or bind to the individual genes in certain preferred SCCIGSs (see Figure 3), or may be designed or selected to specifically hybridize or bind to one or more SCC gene biomarkers, as described herein (see Figures 2 and 7; and SEQ ID NOS:1-200).
  • A polynucleotide or oligonucleotide probe may also be selected to specifically bind or hybridize to variants, whether naturally-occurring or otherwise (e.g., allelic variants, splice variants), of the SCC biomarker genes described herein.
  • an oligonucleotide probe typically comprises a polynucleotide sequence that is complementary to at least a portion of the polynucleotide sequence of a target gene, e.g., the exemplary target genes of SEQ ID NOS:1-200.
  • complementary or complementarity refers generally to polynucleotides related by the well-known nucleotide base-pairing rules.
  • For example, the sequence “A-G-T” is complementary to the sequence “T-C-A.”
  • Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base pairing rules, or there may be “full” or “total” complementarity between the nucleic acids.
  • the degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands.
  • corresponds to or “corresponding to” is typically meant a polynucleotide having a nucleotide sequence that is substantially identical or complementary to all or a portion of a reference polynucleotide sequence.
  • an oligonucleotide probe is fully complementary to at least a portion of the polynucleotide sequence of a target gene, since simultaneous consideration of the percent similarity (≤90%), the length of identical sequence stretches (≤20 bases), and the binding free energy (>-35 kcal mol⁻¹) may be predictive of probe specificity (see, e.g., Liebich et al., Appl Environ Microbiol. 72:1688-1691, 2006).
  • a polynucleotide or probe is fully complementary when there are no base mismatches between the probe and the relevant portion of the target sequence.
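  • The base-pairing rules above can be sketched in a few lines of Python. The helper names are hypothetical, only unambiguous DNA bases (A, C, G, T) are handled, and both sequences are written 5'→3', so a probe is fully complementary when it equals the reverse complement of the target region. Under that convention the probe for the target “AGT” is written “ACT”, which is the same pairing as “T-C-A” read in the antiparallel direction.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def count_mismatches(probe: str, target_region: str) -> int:
    """Number of positions where the probe fails to pair with the target region."""
    if len(probe) != len(target_region):
        raise ValueError("probe and target region must have equal length")
    expected = reverse_complement(target_region)
    return sum(1 for p, e in zip(probe.upper(), expected) if p != e)

def is_fully_complementary(probe: str, target_region: str) -> bool:
    """True when there are no base mismatches (full complementarity)."""
    return count_mismatches(probe, target_region) == 0

print(reverse_complement("AGT"))
print(is_fully_complementary("ACT", "AGT"))
print(count_mismatches("ACA", "AGT"))
```

  • A non-zero mismatch count corresponds to partial complementarity in the sense used above.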
  • oligonucleotide probes or other polynucleotide sequence that is complementary to the polynucleotide sequence of the target gene
  • certain characteristics of such probes or sequences may be considered to optimize the ability of the probe to specifically hybridize and detect the target sequence. For example, to avoid false positives, if there is substantial sequence information available for a given source organism (e.g., human) or cell type (e.g., oral epithelial cell), oligonucleotide probes may be chosen that are not similar to any other expressed sequences in that organism or cell type.
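  • One way to screen a candidate probe against other expressed sequences is to measure, for each such sequence, the best ungapped percent identity and the longest run of identical bases. The sketch below does this with a simple sliding window. The function names, the example sequences, and the default cut-offs (`max_identity`, `max_stretch`) are assumptions made for illustration, and a production screen would normally use a dedicated alignment tool rather than this brute-force comparison.

```python
def best_ungapped_hit(target_region: str, other: str):
    """Slide target_region along other; return (max % identity, longest identical run)."""
    target_region, other = target_region.upper(), other.upper()
    n = len(target_region)
    best_identity, best_run = 0.0, 0
    for start in range(len(other) - n + 1):
        matches = run = longest = 0
        for p, w in zip(target_region, other[start:start + n]):
            if p == w:
                matches += 1
                run += 1
                longest = max(longest, run)
            else:
                run = 0
        best_identity = max(best_identity, 100.0 * matches / n)
        best_run = max(best_run, longest)
    return best_identity, best_run

def looks_specific(target_region, background, max_identity=90.0, max_stretch=20):
    """False if any background sequence exceeds either (hypothetical) cut-off."""
    for seq in background:
        identity, stretch = best_ungapped_hit(target_region, seq)
        if identity > max_identity or stretch > max_stretch:
            return False
    return True

# Hypothetical 25-base target region and two hypothetical background transcripts.
region = "ACGTACGTACGTACGTACGTACGTA"
unrelated = "T" * 30
near_copy = "GG" + region + "CC"
print(looks_specific(region, [unrelated]))
print(looks_specific(region, [unrelated, near_copy]))
```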
  • polynucleotide sequences such as those containing inverted repeats, may be able to self-hybridize and form secondary structures that interfere with specific detection of target sequences. Typically, such sequences may be avoided to improve probe specificity.
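  • A simple screen for inverted repeats looks for a short stretch whose reverse complement appears further along the same strand, since the two arms could pair to form a hairpin. The sketch below is a naive exact-match search; the function names, the stem length of 6, the minimum loop of 3 bases, and the example sequence are all assumptions for illustration, and it does not estimate folding energy.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_inverted_repeats(seq: str, stem: int = 6, min_loop: int = 3):
    """Return (i, j, arm) tuples where seq[i:i+stem] could pair with seq[j:j+stem]."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        arm = seq[i:i + stem]
        partner = reverse_complement(arm)
        # Only look downstream, leaving room for a loop between the two arms.
        j = seq.find(partner, i + stem + min_loop)
        while j != -1:
            hits.append((i, j, arm))
            j = seq.find(partner, j + 1)
    return hits

# Hypothetical sequence: GAATCC ... GGATTC can fold back on itself.
hairpin_prone = "GAATCCTTTTGGATTC"
print(find_inverted_repeats(hairpin_prone))
print(find_inverted_repeats("A" * 16))
```

  • Candidates that return any hits could be set aside or redesigned before use as probes.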
  • oligonucleotide probes of "high complexity,” as opposed to probes of "low complexity,” may provide more specific target sequence detection.
  • One example of a probe with low complexity includes “AAAAAAA GGAGTTTTTTTTTT CAAAAAACTTTTT AAAAAAGCTTT” (SEQ ID NO:332).
  • One example of a probe with higher complexity includes “CGTGACTGA CAGCTGACTGC TAGCCATGCAAC” (SEQ ID NO:333).
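  • Sequence complexity can be approximated numerically. The sketch below scores the two example probes above by base-composition Shannon entropy and by the longest single-base run; both metrics and the function names are assumptions chosen for illustration, not a definition of complexity taken from the source. The spaces in the printed sequences are treated as formatting and removed.

```python
import math
from collections import Counter
from itertools import groupby

def shannon_entropy(seq: str) -> float:
    """Base-composition entropy in bits (2.0 when A, C, G, T are equally frequent)."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(group)) for _, group in groupby(seq))

# The two example probes from the text, with spaces removed.
low = "AAAAAAA" "GGAGTTTTTTTTTT" "CAAAAAACTTTTT" "AAAAAAGCTTT"
high = "CGTGACTGA" "CAGCTGACTGC" "TAGCCATGCAAC"

for name, seq in (("low", low), ("high", high)):
    print(name, round(shannon_entropy(seq), 2), longest_homopolymer(seq))
```

  • The low-complexity example scores lower on entropy and has much longer single-base runs than the higher-complexity example.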
  • Oligonucleotide probes that may be used in the methods described herein include the Affymetrix oligonucleotide probes described in SEQ ID NOS:201-331 (see, e.g., Figures 2 and 8). Also contemplated are polynucleotide variants of these probes that are capable of specifically hybridizing or binding to a SCC biomarker gene, including one or more genes identified as a SCCIGS.
  • Polynucleotide variants refer to either a polynucleotide that displays substantial sequence identity with a reference polynucleotide sequence (e.g., at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% identity to SEQ ID NOS:201-331) or a polynucleotide that hybridizes with a reference sequence, or its complementary sequence, under moderate or stringent conditions that are described hereinafter.
  • Oligonucleotide probes may be modified according to techniques known in the art, such as to improve stability or facilitate detection.
  • oligonucleotides probes may be modified by directly attaching thereto one or more detectable molecules, as described below.
  • Oligonucleotide probes may also be modified by attaching thereto one or more ligand molecules, such as biotin, that may be used to indirectly attach a detectable molecule, such as a detectable molecule that is bound to one or more avidin molecules.
  • detectable molecules may be used to render an oligonucleotide probe detectable, such as radioisotopes, fluorochromes, dyes, enzymes, nanoparticles, chemiluminescent markers, biotin, or other monomers known in the art that can be detected directly (e.g., by light emission) or indirectly (e.g., by binding of a fluorescently-labeled antibody).
  • Radioisotopes provide examples of detectable molecules that can be utilized in certain aspects of the present invention.
  • Several radioisotopes can be used as detectable molecules for labeling nucleotides or proteins, including, for example, ³²P, ³³P, ³⁵S, ³H, and ¹²⁵I. These radioisotopes have different half-lives, types of decay, and levels of energy, which can be tailored to match the needs of a particular protocol.
  • ³H is a low energy emitter, which results in low background levels; however, this low energy also results in long time periods for autoradiography.
  • Radioactively labeled ribonucleotides, deoxyribonucleotides and amino acids are commercially available.
  • Nucleotides are available that are radioactively labeled at the first, or α, phosphate group, or the third, or γ, phosphate group.
  • both [α-³²P]dATP and [γ-³²P]dATP are commercially available.
  • different specific activities for radioactively labeled nucleotides are also available commercially and can be tailored for different protocols.
  • fluorophores can be used for labeling nucleotides including, for example, fluorescein, tetramethylrhodamine, Texas Red, and a number of others (e.g., Haugland, Handbook of Fluorescent Probes - 9th Ed., 2002, Molec. Probes, Inc., Eugene OR; Haugland, The Handbook: A Guide to Fluorescent Probes and Labeling Technologies-10th Ed., 2005, Invitrogen, Carlsbad, CA).
  • Non-radioactive and non-fluorescent detectable molecules are also available.
  • biotin can be attached directly to nucleotides and detected by specific and high affinity binding to avidin or streptavidin which has been chemically coupled to an enzyme catalyzing a colorimetric reaction (such as phosphatase, luciferase, or peroxidase).
  • Digoxigenin labeled nucleotides can also similarly be used for non-isotopic detection of nucleic acids. Biotinylated and digoxigenin-labeled nucleotides are commercially available.
  • Nanoparticles also can be used to label oligonucleotide probes. These particles range from 1-1000 nm in size and include diverse chemical structures such as gold and silver particles and quantum dots. When irradiated with angled incident white light, silver or gold nanoparticles ranging from 40-120 nm will scatter monochromatic light with high intensity. The wavelength of the scattered light is dependent on the size of the particle. Four to five different particles in close proximity will each scatter monochromatic light, which when superimposed will give a specific, unique color. The particles are being manufactured by companies such as Genicon Sciences (Carlsbad, CA).
  • Derivatized silver or gold particles can be attached to a broad array of molecules including proteins, antibodies, small molecules, receptor ligands, and nucleic acids.
  • the surface of the particle can be chemically derivatized to allow attachment to a nucleotide.
  • Quantum dots are fluorescing crystals 1-5 nm in diameter that are excitable by light over a large range of wavelengths. Upon excitation by light having an appropriate wavelength, these crystals emit light, such as monochromatic light, with a wavelength dependent on their chemical composition and size.
  • Quantum dots such as CdSe, ZnSe, InP, or InAs possess unique optical properties; these and similar quantum dots are available from a number of commercial sources (e.g., NN-Labs, Fayetteville, AR; Ocean Nanotech, Fayetteville, AR; Nanoco Technologies, Manchester, UK; Sigma-Aldrich, St. Louis, MO).
  • the size classes of the crystals are created either 1) by tight control of crystal formation parameters to create each desired size class of particle, or 2) by creation of batches of crystals under loosely controlled crystal formation parameters, followed by sorting according to desired size and/or emission wavelengths.
  • Two examples of references in which quantum dots are embedded within intrinsic silicon epitaxial layers of semiconductor light emitting/detecting devices are United States Patent Nos. 5,293,050 and 5,354,707 to Chapple-Sokol, et al.
  • oligonucleotide probes may be labeled with one or more light-emitting dyes.
  • the light emitted by the dyes can be visible light or invisible light, such as ultraviolet or infrared light.
  • the dye may be a fluorescence resonance energy transfer (FRET) dye; a xanthene dye, such as fluorescein and rhodamine; a dye that has an amino group in the alpha or beta position (such as a naphthylamine dye, 1-dimethylaminonaphthyl-5-sulfonate, 1-anilino-8-naphthalene sulfonate and 2-p-toluidinyl-6-naphthalene sulfonate); a dye that has 3-phenyl-7-isocyanatocoumarin; an acridine, such as 9-isothiocyanatoacridine and acridine orange; a pyrene, a benso
  • FRET fluor
  • HEX 5-carboxy-2',4',5',7'-tetrachlorofluorescein
  • ZOE 5-carboxy-2',4',5',7'-tetrachlorofluorescein
  • NAN NED
  • Cy3, Cy3.5, Cy5, Cy5.5, Cy7, Cy7.5, Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 647.
  • a detectable molecule can be directly attached to a nucleotide using methods well known in the art. Nucleotides can also be chemically modified or derivatized in order to attach a detectable molecule. For example, a fluorescent monomer such as a fluorescein molecule can be attached to dUTP (deoxyuridine-triphosphate) using a four-atom aminoalkynyl group. In this example, each detectable molecule may be attached to a nucleotide making a detectable molecule: nucleotide complex. Amine-reactive and thiol-reactive fluorophores are available and may be used for labeling nucleotides and biomolecules.
  • nucleotides may be fluorescently labeled during chemical synthesis, since incorporation of amines or thiols during nucleotide synthesis permits addition of fluorophores.
  • Fluorescently labeled nucleotides are commercially available.
  • uridine and deoxyuridine triphosphates are available that are conjugated to ten different fluorophores that cover the spectrum.
  • Fluorescent dyes that can be bound directly to nucleotides can also be utilized as detectable molecules.
  • FAM, JOE, TAMRA, and ROX are amine reactive fluorescent dyes that have been attached to nucleotides and are used in automated DNA sequencing.
  • These fluorescently labeled nucleotides, for example, ROX-ddATP, ROX-ddCTP, ROX-ddGTP and ROX-ddUTP, are commercially available.
  • the terms specifically binds or specifically hybridizes refer generally to an oligonucleotide probe or polynucleotide sequence that not only binds to its intended target gene sequence in a sample under selected hybridization conditions, but does not bind significantly to other target sequences in the sample, and thereby discriminates between its intended target and all other targets in the target pool.
  • a probe that specifically hybridizes to its intended target sequence may also detect concentration differences under the selected hybridization conditions.
  • An intended target sequence refers typically to a polynucleotide or nucleic acid sequence, which refers generally to mRNA, RNA, cRNA, cDNA or DNA (i.e., polymeric forms of nucleotides of at least 10 bases in length, either ribonucleotides or deoxynucleotides or a modified form of either type of nucleotide).
  • an oligonucleotide probe may specifically bind or specifically hybridize to at least a portion of one polynucleotide having a sequence selected from SEQ ID NOS:1-200, including variants thereof (e.g., allelic variants, splice variants, etc.), that is differentially expressed in SCC cells (e.g., OSCC) compared to control cells that are known to be free of SCC cells.
  • an oligonucleotide probe specifically hybridizes to one or more of a polynucleotide having all or a SCCIG-characteristic portion of a SCCIG transcript (e.g., mRNA transcript of a SCC biomarker gene, or a fragment thereof), a polynucleotide having a nucleotide sequence that is fully complementary to a SCCIG-characteristic portion of a SCCIG transcript, and/or a nucleic acid amplification product of the above noted polynucleotides (e.g., cDNA of a SCC biomarker gene or SCCIGS, or fragment thereof).
  • a nucleic acid amplification product may be obtained from a biological sample, for example, by performing RT-PCR on a sample cell extract that contains a polynucleotide, such as an mRNA transcript, having all or a SCCIG-characteristic portion of a SCCIG transcript.
  • a SCCIG-characteristic portion of a SCCIG transcript refers to a segment, stretch, domain, region, portion or the like of any one of the polynucleotides set forth as SEQ ID NOS:1-200 (or the full complement thereof), which comprises less than the full-length polynucleotide of the respective one of SEQ ID NOS:1-200, and which has a nucleotide sequence that is unique to that particular sequence among all polynucleotide transcript sequences found in the species from which the SCCIG set is obtained (e.g., the human transcriptome) such that an oligonucleotide probe that hybridizes specifically to the SCCIG-characteristic portion does not exhibit full complementarity to any other transcript in the subject transcriptome.
  • a SCCIG-characteristic portion of any one of SEQ ID NOS:1-200 may be derived from the biomarker gene sequences set forth in SEQ ID NOS:1-200, or from variants thereof.
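The uniqueness requirement for a SCCIG-characteristic portion can be sketched as a search for windows of a target transcript that do not occur in any other transcript. The following is a simplified illustration only: it uses exact substring matching on one strand, the function name and the toy sequences are hypothetical, and a practical screen would also need to consider the complementary strand and near-matches.

```python
from typing import Iterable, List, Tuple


def characteristic_windows(target: str, other_transcripts: Iterable[str], k: int) -> List[Tuple[int, str]]:
    """Return (offset, window) for each length-k window of `target` that is
    absent from every sequence in `other_transcripts` (exact match only)."""
    others = [t.upper() for t in other_transcripts]
    target = target.upper()
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        if not any(window in other for other in others):
            hits.append((i, window))
    return hits


# Toy sequences, not taken from the sequence listing.
target = "ACGTTTGACC"
background = ["ACGTTTAAAA", "CCCCTTGACC"]
print(characteristic_windows(target, background, k=6))
```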
  • Nucleic acid hybridization conditions include those described herein and known in the art for controlled, detectable annealing of a first oligonucleotide or polynucleotide sequence to a second oligonucleotide or polynucleotide sequence (see Examples 1 and 2), and will often vary depending on the particular application.
  • the term “hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions” refers generally to conditions for hybridization and washing.
  • Low stringency conditions referred to herein may include and encompass from at least about 1% v/v to at least about 15% v/v formamide and from at least about 1 M to at least about 2 M salt for hybridization at 42°C, and at least about 1 M to at least about 2 M salt for washing at 42°C.
  • Low stringency conditions also may include 1% Bovine Serum Albumin (BSA), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65°C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at room temperature.
  • low stringency conditions includes hybridization in 6 x sodium chloride/sodium citrate (SSC) at about 45°C, followed by two washes in 0.2 x SSC, 0.1% SDS at least at 50°C (the temperature of the washes can be increased to 55°C for low stringency conditions).
  • medium stringency conditions may include and encompass from at least about 16% v/v to at least about 30% v/v formamide and from at least about 0.5 M to at least about 0.9 M salt for hybridization at 42°C, and at least about 0.1 M to at least about 0.2 M salt for washing at 55°C.
  • Medium stringency conditions also may include 1% Bovine Serum Albumin (BSA), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65°C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at 60-65°C.
  • high stringency conditions may include and encompass from at least about 31% v/v to at least about 50% v/v formamide and from about 0.01 M to about 0.15 M salt for hybridization at 42°C, and about 0.01 M to about 0.02 M salt for washing at 55°C.
  • High stringency conditions also may include 1% BSA, 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65°C, and (i) 0.2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 1% SDS for washing at a temperature in excess of 65°C.
  • high stringency conditions includes hybridizing in 6 x SSC at about 45°C, followed by one or more washes in 0.2 x SSC, 0.1% SDS at 65°C.
  • very high stringency conditions includes hybridizing in 0.5 M sodium phosphate, 7% SDS at 65°C, followed by one or more washes in 0.2 x SSC, 1% SDS at 65°C.
  • Tm is the melting temperature, or temperature at which two complementary polynucleotide sequences dissociate. Methods for estimating Tm are well known in the art (see Ausubel et al., supra at page 2.10.8).
  • Tm = 81.5 + 16.6 (log10 M) + 0.41 (%G+C) - 0.63 (% formamide) - (600/length)
  • M is the concentration of Na+, preferably in the range of 0.01 molar to 0.4 molar
  • %G+C is the sum of guanosine and cytosine bases as a percentage of the total number of bases, within the range between 30% and 75% G+C
  • % formamide is the percent formamide concentration by volume
  • length is the number of base pairs in the DNA duplex.
  • the Tm of a duplex DNA decreases by approximately 1°C with every increase of 1% in the number of randomly mismatched base pairs. Washing is generally carried out at Tm - 15°C for high stringency, or Tm - 30°C for moderate stringency.
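The melting temperature estimate and the wash temperature rules above can be worked through numerically. The following is a minimal sketch of that arithmetic, assuming the logarithm in the formula is base 10; the function names and the example input values are hypothetical and chosen only to fall within the stated ranges.

```python
import math


def melting_temperature(na_molar: float, gc_percent: float,
                        formamide_percent: float, length_bp: int) -> float:
    """Estimated Tm in degrees C from the formula given above (log taken as base 10)."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_bp)


def mismatch_adjusted_tm(tm: float, mismatch_percent: float) -> float:
    """Lower Tm by about 1 degree C per 1% randomly mismatched base pairs."""
    return tm - mismatch_percent


def wash_temperature(tm: float, stringency: str) -> float:
    """Wash at Tm - 15 for high stringency or Tm - 30 for moderate stringency."""
    offsets = {"high": 15.0, "moderate": 30.0}
    return tm - offsets[stringency]


# Hypothetical duplex: 0.1 M Na+, 50% G+C, no formamide, 100 base pairs.
tm = melting_temperature(na_molar=0.1, gc_percent=50.0, formamide_percent=0.0, length_bp=100)
print(round(tm, 1), round(wash_temperature(tm, "high"), 1), round(wash_temperature(tm, "moderate"), 1))
```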
  • a membrane (e.g., a nitrocellulose membrane or a nylon membrane) or chip containing immobilized DNA is hybridized overnight at 42°C in a hybridization buffer (50% deionized formamide, 5 x SSC, 5 x Denhardt's solution (0.1% ficoll, 0.1% polyvinylpyrrolidone and 0.1% bovine serum albumin), 0.1% SDS and 200 mg/mL denatured salmon sperm DNA) containing a labeled probe.
  • the membrane is then subjected to two sequential medium stringency washes (i.e., 2 x SSC, 0.1% SDS for 15 min at 45°C, followed by 2 x SSC, 0.1% SDS for 15 min at 50°C), followed by two sequential higher stringency washes (i.e., 0.2 x SSC, 0.1% SDS for 12 min at 55°C followed by 0.2 x SSC and 0.1% SDS solution for 12 min at 65-68°C).
  • hybridization can be performed according to the Examples provided herein (see Example 1) or per other protocols known in the art.
  • a cRNA may be hybridized to an Affymetrix U133 2.0 Plus GeneChip array and scanned using an Affymetrix GeneChip array Scanner 3000 7G per Affymetrix protocols (Affymetrix Corp., Santa Clara, CA).
  • for PCR-based protocols such as qtRT-PCR, hybridization can be performed according to the Examples provided herein (see Example 2) or per other protocols known in the art (see, e.g., Skrzypski et al., Lung Cancer.
  • the methods of identifying a risk for having, or presence of, SCC in a subject may include comparing SCCIGS expression levels in (i) a reference sample that is from a subject known to be free from SCC, with (ii) SCCIGS expression levels in a suspected biological sample, wherein the differential expression of the SCCIGS indicates the subject has, or is at risk for having, SCC.
  • Differential expression of a SCCIGS refers generally to a statistically significant difference, in a biological sample from a subject that is suspected of having or being at risk for having SCC, in one or more gene expression levels of SCC biomarker(s) or SCCIGS member(s) as compared to the expression levels of the same SCC biomarker(s) or SCCIGS member(s) in an appropriate cancer-free control.
  • the statistically significant difference may relate to either an increase or a decrease in expression levels, as measured by RNA levels, protein levels, protein function, or any other relevant measure of gene expression such as those described herein.
  • a result is typically referred to as statistically significant if it is unlikely to have occurred by chance.
  • the significance level of a test or result relates traditionally to a frequentist statistical hypothesis testing concept.
  • statistical significance may be defined as the probability of making a decision to reject the null hypothesis when the null hypothesis is actually true (a decision known as a Type I error, or "false positive determination"). This decision is often made using the p-value: if the p-value is less than the significance level, then the null hypothesis is rejected. The smaller the p-value, the more significant the result.
  • Bayes factors may also be utilized to determine statistical significance (see, e.g., Goodman S., Ann Intern Med 130:1005-13, 1999).
  • the significance level of a test or result may reflect an analysis in which the probability of making a decision to reject the null hypothesis when the null hypothesis is actually true is no more than the stated probability. This type of analysis allows for those applications in which the probability of deciding to reject may be much smaller than the significance level for some sets of assumptions encompassed within the null hypothesis.
  • statistically significant differential expression may include situations wherein the expression level of a given SCCIGS provides at least about a 1.2X, 1.3X, 1.4X, 1.5X, 1.6X, 1.7X, 1.8X, 1.9X.
  • statistically significant differential expression may include situations wherein the expression level of a given SCCIGS provides at least about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 percent (%) or greater difference in expression (i.e., differential expression that may be higher or lower) in a suspected biological sample as compared to an appropriate control, including all integers and decimal points in between.
  • differential expression may also be determined by performing Z-testing, i.e., calculating an absolute Z score, as described herein and known in the art (see Example 1).
  • Z-testing is typically utilized to identify significant differences between a sample mean and a population mean. For example, as compared to a standard normal table (e.g., a control tissue), at a 95% confidence interval (i.e., at the 5% significance level), a Z-score with an absolute value greater than 1.96 indicates non-randomness. For a 99% confidence interval, if the absolute Z is greater than 2.58, it means that p < 0.01, and the difference is even more significant; the null hypothesis can be rejected with greater confidence.
  • an absolute Z-score of 1.96, 2, 2.58, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more, including all decimal points in between (e.g., 10.1, 10.6, 11.2, etc.), may provide a strong measure of statistical significance.
  • an absolute Z-score of greater than 6 may provide exceptionally high statistical significance.
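The relationship between the absolute Z-score thresholds above and their two-sided p-values can be checked with a short calculation. The following is a minimal sketch using the standard normal distribution; the function names are hypothetical, and the one-sample Z statistic shown is the generic textbook form rather than the specific procedure of Example 1.

```python
import math


def z_score(sample_mean: float, population_mean: float,
            population_sd: float, n: int) -> float:
    """Generic one-sample Z statistic: difference in means over the standard error."""
    return (sample_mean - population_mean) / (population_sd / math.sqrt(n))


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a Z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Thresholds named in the text.
for z in (1.96, 2.58, 6.0):
    print(z, two_sided_p_value(z))
```

An absolute Z-score of 1.96 corresponds to a two-sided p-value of approximately 0.05, 2.58 to just under 0.01, and 6 to a p-value several orders of magnitude smaller.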
  • differential expression may also be determined by the mean expression value summarized by Affymetrix Microarray Suite 5 software (Affymetrix, Santa Clara, CA), or other similar software, typically with a scaled mean expression value of 1000.
  • a control tissue or reference tissue refers generally to a cell-based sample, such as an epithelial cell sample, that is known to be free of SCC cells according to currently accepted diagnostic criteria such as those described herein, and may also in certain embodiments relate to an epithelial cell-containing sample that is free of dysplastic cells.
  • epithelial cells refers generally to any one or more of many types of closely packed cells that form the epithelium covering the body (e.g., skin) and the linings of body cavities, for instance, membranous mucosal tissue covering internal organs and other internal surfaces of the body (e.g., the inside of the mouth, the respiratory tract, the gastrointestinal tract, etc.).
  • control tissues may be obtained from individuals undergoing tonsillectomy or oral surgery for treatment of diseases other than cancer, such as obstructive sleep apnea (see Example 1). Control tissues, or an appropriate extract thereof, may also be obtained from commercially available sources.
  • a control tissue or reference tissue may be either internal (i.e., from the same subject as the biological sample) or external (i.e., from a source or subject that is different from the biological sample).
  • a control cell or reference cell may be obtained as, or derived from, an oral epithelial cell.
  • a control cell or reference cell may be obtained as, or derived from, a normal epithelium from the pharynx, hypopharynx, larynx, oral cavity, sinus tissue, or other appropriate control tissue.
  • Certain embodiments of the present invention relate to the use of the herein described SCC biomarker genes or SCCIGS that can differentiate between SCC tumor cells (e.g., OSCC tumor cells) and dysplastic cells, i.e., dysplasia, a pre-neoplastic cellular state typically considered at risk for developing into SCC carcinoma in situ or invasive carcinoma, although not necessarily destined to do so.
  • Certain of these and related embodiments include methods for identifying the risk or presence of OSCC in a subject having oral epithelial dysplasia but no frank OSCC (i.e., OSCC is not readily apparent), as described herein, such as by comparing the expression levels of one or more selected SCCIGSs in a biological sample from that subject with the expression levels of a reference SCCIGS that is characteristic of an OSCC tumor cell, wherein the substantial similarity of the selected SCCIGS between the biological sample and the reference sample indicates the presence or risk of OSCC.
  • a subject having no frank OSCC may include a subject having no clinically detectable SCC carcinoma, typically as determined by standard diagnostic techniques known in the art and described herein.
  • Substantial similarity relates generally to the lack of a statistically significant difference in the expression levels between the biological sample and the reference control.
  • substantially similar expression levels may include situations wherein the expression level of a given SCCIGS provides less than about a 0.05X, 0.1X, 0.2X, 0.3X, 0.4X, 0.5X, 0.6X, 0.7X, 0.8X, 0.9X, 1.0X, 1.1X, 1.2X, 1.3X, or 1.4X difference in expression (i.e., differential expression that may be higher or lower expression) in a suspected biological sample as compared to an OSCC reference sample, including all decimal points in between (e.g., 0.15X, 0.25X, 0.35X, etc.).
  • differential expression may include situations wherein the expression level of a given SCCIGS provides less than about 0.25, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50 percent (%) difference in expression (i.e., differential expression that may be higher or lower) in a suspected biological sample as compared to a reference sample, including all decimal points in between.
  • the particular SCC biomarker genes (e.g., SCCIG) described in Figure 6, including variants thereof, may be utilized to differentiate between SCC tumor cells and dysplastic cells, since as described herein these genes are differentially expressed between SCC tumor cells and dysplastic/control cells.
  • certain embodiments may include the use of oligonucleotide probes in the methods for differentiating between SCC tumor cells and dysplastic cells, such as the exemplary probes described in Figures 6 and 9, including variants thereof that can specifically hybridize to a differentially expressed SCC biomarker or SCCIG.
  • the oligonucleotide probe may be specifically hybridized to one or more of (i) a polynucleotide having all or a SCCIG-characteristic portion of a SCCIG transcript as provided herein (e.g., mRNA transcript of a SCC biomarker gene, or a fragment thereof), (ii) a polynucleotide having a nucleotide sequence that is fully complementary to a SCCIG-characteristic portion of a SCCIG transcript, and/or (iii) a nucleic acid amplification product of the above noted polynucleotides (e.g., cDNA of a SCC biomarker gene or SCCIGS, or fragment thereof).
  • a nucleic acid amplification product may be obtained from a biological sample, for example, by performing reverse transcription-polymerase chain reaction amplification (RT-PCR) of polynucleotide transcription products such as mRNAs that are present in a cell extract and that have all or a SCCIG-characteristic portion of a SCCIG transcript.
  • a SCCIG-characteristic portion of a SCCIG transcript may be derived from the biomarker gene sequences or SCCIGS set forth in SEQ ID NOS:1-200, and in particular, in the biomarker genes referred to in Figure 6, including variants thereof that are differentially expressed in a SCC tumor cell as compared to a dysplastic cell.
  • tissues were obtained from patients either having or suspected of having oral SCC (OSCC) and from control patients.
  • Eligible cases were patients with their first primary OSCC scheduled for surgical resection or biopsy between December 1, 2003 and April 17, 2007 at the University of Washington Medical Center, Harborview Medical Center and the VA Puget Sound Health Care System in Seattle, Washington. Patients with diagnosed dysplastic lesions were also enrolled at these medical centers during the same period.
  • Eligible controls were patients who had tonsillectomy or oral surgery for treatment of diseases other than cancer, e.g., obstructive sleep apnea, at the same institutions and during the same time periods in which the OSCC cases were treated.
  • Tumor tissue was obtained at time of resection or biopsy from patients with a primary OSCC, or dysplasia. Clinically normal tissue from the oral cavity or oropharynx was obtained from controls. For the small number of controls (~30%) with tonsillitis or tonsil hypertrophy, only mucosa tissue from tonsillar pillar was obtained to avoid potential influence of inflammation on the results. Immediately after surgical removal, the tissue was immersed in RNALater (Applied Biosystems, Inc., Foster City, CA) for a minimum of 12 hours at 4°C before being transferred to long term storage at -80°C prior to use.
  • a "back extraction buffer" (4 M guanidine thiocyanate, 50 mM sodium citrate, and 1 M Tris, pH 8.0).
  • RNA was used to generate biotin-labeled cRNA using the GeneChip Expression 3'-Amplification Reagents Kit (Affymetrix) per manufacturer's protocol.
  • the cRNA was hybridized to an Affymetrix U133 2.0 Plus GeneChip array and scanned using an Affymetrix GeneChip array Scanner 3000 7G in the Fred Hutchinson Cancer Research Center's Genomics Shared Resources per Affymetrix protocols.
  • At least one clinically normal tissue sample from a control subject was processed in tandem with every seven to eight tumor tissue samples from OSCC cases.
  • Preprocessing and probe set filtering was performed on the GeneChip arrays that passed QC checks.
  • the gcRMA algorithm from Bioconductor was used to extract gene expression values and perform normalization.
  • probe sets were eliminated that either showed no variation across the samples being compared (i.e., interquartile range (IQR) of expression levels less than 0.1 on log2 scale) or were expressed at very low magnitude (i.e., any probe set in which the maximum expression value for that probe set in any of the samples was less than 3 on log2 scale). After these criteria were applied, ~21,000 probe sets remained for differential expression analyses.
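The two filtering criteria above can be expressed as a simple predicate on the log2 expression values of a probe set. The following is a minimal sketch; the function name and the toy values are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state how the IQR was computed.

```python
import statistics
from typing import Sequence


def passes_filter(log2_values: Sequence[float],
                  min_iqr: float = 0.1, min_max: float = 3.0) -> bool:
    """Keep a probe set only if its IQR is at least `min_iqr` and its maximum
    expression is at least `min_max`, both on the log2 scale."""
    q1, _median, q3 = statistics.quantiles(log2_values, n=4, method="inclusive")
    return (q3 - q1) >= min_iqr and max(log2_values) >= min_max


# Toy probe sets with four samples each (values are illustrative only).
probe_sets = {
    "flat": [5.00, 5.01, 5.02, 5.03],      # almost no variation
    "low": [1.0, 2.0, 2.5, 2.9],           # never reaches 3 on the log2 scale
    "informative": [4.0, 6.0, 8.0, 10.0],  # varies and is well expressed
}
kept = [name for name, values in probe_sets.items() if passes_filter(values)]
print(kept)
```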
  • NFD false discoveries
  • this list of candidate probe sets was further narrowed using the following criteria, which retained only those probe sets that showed a significantly large difference in signal intensity between cases and controls: 1) absolute Z-score of greater than 6 in the differential gene expression analysis, implying exceptionally high statistical significance; 2) a 1.5-fold or greater difference in gene expression between controls and cases; and 3) the mean expression value summarized by Affymetrix Microarray Suite 5.0 across samples >300 (with the scaled mean expression value of 1000). Probe sets with such expression values are more likely to be suitable for validation by alternative methodologies such as qRT-PCR. A selected number of probe sets and their corresponding biomarker genes, including variants thereof, were selected by these three criteria (see Figures 2 and 7).
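The three selection criteria above can be combined into a single test per probe set. The following is a minimal sketch; the record type, field names, and example values are hypothetical, and the fold change is assumed to be a positive ratio of mean expression in cases over controls, so that a value of 0.5 counts as a 2-fold decrease.

```python
from dataclasses import dataclass


@dataclass
class ProbeSetStats:
    probe_id: str
    z_score: float      # from the differential expression analysis
    fold_change: float  # cases over controls, must be positive
    mas5_mean: float    # mean MAS 5.0 expression value across samples


def is_candidate(p: ProbeSetStats, min_abs_z: float = 6.0,
                 min_fold: float = 1.5, min_mas5: float = 300.0) -> bool:
    """Apply the three criteria: |Z| > 6, at least 1.5-fold difference in
    either direction, and mean MAS 5.0 expression value > 300."""
    fold = max(p.fold_change, 1.0 / p.fold_change)
    return abs(p.z_score) > min_abs_z and fold >= min_fold and p.mas5_mean > min_mas5


# Illustrative records only; not values from the figures.
records = [
    ProbeSetStats("hypothetical_up", 7.2, 2.0, 850.0),
    ProbeSetStats("hypothetical_down", -6.5, 0.5, 400.0),
    ProbeSetStats("hypothetical_small_fold", 6.5, 1.2, 900.0),
    ProbeSetStats("hypothetical_low_signal", 8.0, 3.0, 250.0),
]
print([r.probe_id for r in records if is_candidate(r)])
```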
  • the selected probe sets were analyzed using both forward and hybrid of forward-backward logistic regression procedures (SAS PROC LOGISTIC). For the one OSCC case with results from 5 replicate tissues and one control with results from duplicate tissues, the respective average of the replicate results was used.
  • probe sets were processed in the logistic regression model: one probe set at a time until no probe set could be added based on the significance level of 0.01.
  • when the hybrid stepwise selection was adopted, the probe set with the smallest p-value and p < 0.01 was entered first, and the significance levels of the other selected probe sets were evaluated for possible removal if their p-values were greater than 0.05 in the current model.
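The control flow of the forward and hybrid selection procedures described above can be sketched as follows. This is an illustration of the entry and removal logic only, not a reimplementation of SAS PROC LOGISTIC: the `fit_p_values` callback is a hypothetical stand-in for fitting a logistic regression on the listed probe sets and returning a p-value for each, and the toy callback below returns fixed p-values purely so that the sketch runs.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical callback: fit a model on the given probe sets, return their p-values.
PValueFn = Callable[[List[str]], Dict[str, float]]


def hybrid_stepwise(candidates: Sequence[str], fit_p_values: PValueFn,
                    enter_level: float = 0.01, stay_level: float = 0.05,
                    max_steps: int = 100) -> List[str]:
    """Add the candidate with the smallest p-value below `enter_level`, then drop
    previously selected probe sets whose p-value exceeds `stay_level`."""
    selected: List[str] = []
    for _ in range(max_steps):  # guard against enter/remove cycling
        best, best_p = None, enter_level
        for c in candidates:
            if c in selected:
                continue
            p = fit_p_values(selected + [c])[c]
            if p < best_p:
                best, best_p = c, p
        if best is None:
            break  # no remaining probe set meets the entry level
        selected.append(best)
        p_now = fit_p_values(selected)
        selected = [s for s in selected if s == best or p_now[s] <= stay_level]
    return selected


def toy_p_values(model: List[str]) -> Dict[str, float]:
    """Toy stand-in with fixed p-values; a real callback would refit the model."""
    fixed = {"probe_a": 0.001, "probe_b": 0.005, "probe_c": 0.2}
    return {name: fixed[name] for name in model}


print(hybrid_stepwise(["probe_c", "probe_b", "probe_a"], toy_p_values))
```

Setting `stay_level` to 1.0 disables removal, which reduces the sketch to forward selection alone.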
  • ROC receiver operating characteristic
  • the selected prediction models were validated using an internal independent testing set of 48 invasive OSCC and 10 controls and an external testing set of 42 head and neck squamous cell carcinoma (HNSCC) cases and 14 controls (Gene Expression Omnibus (GEO) GSE6791, www.ncbi.nlm.nih.gov/geo) (13). CEL files from these datasets were extracted using the gcRMA algorithm. ROC curves were drawn by applying the expression results to the prediction models.
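Drawing a ROC curve from the output of a prediction model can be sketched as follows, assuming each sample has been assigned a predicted score in which higher values indicate OSCC. The function names and the example scores are hypothetical; the area under the curve is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def roc_points(case_scores: Sequence[float],
               control_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, one per score threshold."""
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = sum((c > k) + 0.5 * (c == k) for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))


# Illustrative scores only; not results from the testing sets.
cases = [0.9, 0.4]
controls = [0.6, 0.1]
print(roc_points(cases, controls), auc(cases, controls))
```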
  • Figures 2 and 7 list the selected probe sets and their corresponding biomarker genes that were differentially expressed between OSCC and controls based on the criteria described above. Included among these probe sets are transforming growth factor (TGFB1), cell signaling molecule (STAT1), immune markers (IL1β), chemokines (CXCL2, CXCL3, CXCL9), and genes encoding for extracellular matrix proteins and collagens that have previously been shown to be involved in the motility and invasion of tumor cells. Hierarchical clustering of gene expression using the selected probe sets showed that invasive OSCC and normal control formed two main clusters. About half the dysplasia tissues clustered with OSCC samples and half clustered with the controls.
  • FIG. 3 shows the top 10 SCCIGS models from the logistic regression analyses of the selected probe sets in the training data set.
  • the model with LAMC2 (probe set 207517_at, encoding laminin-γ2; SEQ ID NOS:91 and 92)
  • COL4A1 (probe set 211980_at, encoding collagen type IV α1; SEQ ID NO:113)
  • Described herein are selected probe sets, corresponding to a variety of known genes (see SEQ ID NOS:1-200), which are highly effective in distinguishing invasive OSCC and normal oral tissue. Also described herein is a list of genes that may be involved in the transformation of normal oral tissue to dysplasia, as well as a list of genes that may be involved in the transformation of oral dysplasia to invasive OSCC (see Example 3; and Figure 8).
  • Embodiments of the present invention provide prediction models that were generated using rigorous statistical analyses, and the differences in gene expression detected using microarray technology were validated not only by qRT-PCR, but also by testing against independent internal and external genome-wide gene expression datasets. The result has been to generate candidate markers and indicator gene sets that can be easily applied to the testing of biopsies or surgical margins to aid diagnosis and prognosis of OSCC.
  • the prediction models and the biomarker genes identified herein may provide utility in predicting local recurrence at surgical margins or the development of second primary cancer of OSCC patients, or for selective screening of individuals who are at high risk of OSCC. If otherwise histologically-negative margins harbor microscopic original tumors as residual disease, the gene expression profiles of such margins would more likely resemble those of the resected invasive OSCC, such that the measurement of one or more of the biomarker genes or SCCIGs identified herein, and/or application of one of the predictive models described herein, could potentially be of use for the detection of residual tumor cells.
  • the expression patterns of two pairs of genes may be particularly effective in distinguishing OSCC from normal oral tissue in independent testing sets.
  • the sensitivity and specificity were close to 100%.
  • candidate markers e.g., SCCIGS
  • Laminin binds to type IV collagen and to many cell types via cell surface laminin receptors (24). Following attachment to laminin in the basement membrane, tumor cells secrete collagenase IV that specifically breaks down type IV collagen and thereby facilitates cell spreading and migration (25). In addition, laminin fragments generated by post-translational proteolytic cleavage bind to cell surface integrins and other proteins to trigger and modulate cellular motility (26). Increased levels of laminin have been associated with a number of carcinomas (27-35). In some of these studies, laminin was associated with tumor aggressiveness, metastasis and poor prognosis.
  • results from mouse models showed that tumor cells with high levels of laminin and low levels of unoccupied laminin receptor are resistant to killing by natural cytotoxic T cells and are highly malignant (36), and that treatment with low concentrations of laminin receptor binding fragments of laminin blocked lung metastasis of hematogenously introduced tumor cells (37).
  • a large number of unoccupied laminin receptors have been observed for breast and colon cancer cells (25); no similar reports have appeared on OSCC or HNSCC cells.
  • the gene products of COL4A1 and COL4A2 are assembled into type IV collagen, which forms the scaffold of the basement membrane and integrates other extracellular molecules, including laminin, to produce a highly organized structural barrier. Collagen IV also plays an important role in the interaction of the basement membrane with cells (38, 39). Immune cells, migrating endothelial cells and metastatic tumor cells have been reported to produce and tightly regulate type IV collagen-specific collagenase (40-42).
  • Peptidyl arginine deiminases (EC 3.5.3.15) catalyze post-translational modification of proteins through conversion of arginine residues to citrullines. Although their physiological functions are not well understood, these deiminases have been implicated in the genesis of multiple sclerosis, rheumatoid arthritis, and psoriasis (43).
  • the isoform peptidyl arginine deiminase type 1 (PADI1) is present in the keratinocytes of all layers of human epidermis. It has been reported that deimination of filaggrin by PADI1 is necessary for epidermal barrier function and deimination of keratin K1 may lead to ultrastructural changes of the extracellular matrix (43).
  • PADI1 is downregulated in both dysplasia and OSCC when compared to controls. If deimination of arginine residues of proteins in the keratinocytes of oral mucosa by PADI1 forms an epidermal barrier, downregulation of PADI1 may allow the growth, expansion and movement of tumor cells.
  • qRT-PCR quantitative real-time PCR
  • each sample containing 7.5 ng purified total RNA was assayed in triplicate in 10 µl reaction volumes using the QuantiTect™ SYBR Green RT-PCR kit (Qiagen, Valencia, CA) and bioinformatically validated QuantiTect primers (Qiagen, Valencia, CA) on a 7900HT Sequence Detection System (ABI, Foster City, CA).
  • the cycling conditions were as follows: 30 minutes at 50°C, 15 minutes at 95°C, and 40 cycles of 15 seconds at 94°C, 30 seconds at 55°C, and 30 seconds at 72°C.
  • COL1A1 NM_000088
  • COL4A1 NM_0018405
  • LAMC2 NM_005562
  • PADI1
  • 80-bp amplicon spanning exons 3, 4, and 5 was amplified.
  • ACTB was used as the reference gene, for which a 146-bp amplicon spanning exons 3 and 4 was amplified.
  • Ten-point standard curves were generated using Universal Human Reference RNA (Stratagene, La Jolla, CA) for all genes except PADI1, for which Normal Adjacent Esophagus Total RNA (Ambion, Austin, TX) was used.
  • the linear correlation coefficient (R²) was 0.99 or greater for all runs.
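  • A standard-curve check of this kind can be sketched as a least-squares fit of Ct against the log10 of input amount, followed by a test of R² against the 0.99 cut-off. The dilution series and Ct values below are invented for illustration, and the function name is an assumption; the source does not give the dilution scheme.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(
    amounts_ng: Sequence[float], ct_values: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of Ct against log10(input); returns slope, intercept, R^2."""
    x = [math.log10(a) for a in amounts_ng]
    y = list(ct_values)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical ten-point, two-fold dilution series with idealized Ct values.
amounts = [100.0 / 2 ** k for k in range(10)]
cts = [30.0 - 3.3 * math.log10(a) for a in amounts]
slope, intercept, r_squared = fit_standard_curve(amounts, cts)
run_accepted = r_squared >= 0.99
```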
  • the mean threshold cycle (Ct) values were calculated from the triplicate Ct values. Mean Ct values were further normalized in relation to the mean Ct value of the ACTB gene.
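  • The normalization step can be written out as a short calculation. The sketch below assumes that "normalized in relation to" means subtracting the mean ACTB Ct from the mean target Ct (the common delta-Ct convention); the source does not state the exact formula, and the Ct values are invented.

```python
from statistics import mean
from typing import Sequence


def mean_ct(triplicate_cts: Sequence[float]) -> float:
    """Mean threshold cycle across the replicate wells of one sample."""
    return mean(triplicate_cts)


def normalized_ct(target_cts: Sequence[float], reference_cts: Sequence[float]) -> float:
    """Delta-Ct: mean target Ct minus mean reference-gene (here ACTB) Ct."""
    return mean_ct(target_cts) - mean_ct(reference_cts)


# Hypothetical triplicate readings for one sample.
delta_ct = normalized_ct([24.0, 24.2, 24.1], [18.0, 18.1, 17.9])
```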
  • dysplastic lesions and OSCC samples were combined and compared with the controls.
  • the genes that were also differentially expressed between dysplasia and cancer were excluded.
  • the resulting gene list contained genes that showed up- or downregulation relative to normal tissue as early as dysplasia.
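  • The filtering logic in the preceding three points amounts to a set difference. The sketch below assumes the two differential-expression lists have already been computed elsewhere; the function name and gene labels are placeholders, not genes from the study.

```python
from typing import Iterable, List


def early_change_genes(
    lesion_vs_control: Iterable[str], dysplasia_vs_cancer: Iterable[str]
) -> List[str]:
    """Genes altered in dysplasia plus OSCC versus controls, excluding any gene
    that also differs between dysplasia and cancer."""
    return sorted(set(lesion_vs_control) - set(dysplasia_vs_cancer))


# Placeholder gene labels for illustration only.
combined_vs_control = ["GENE_A", "GENE_B", "GENE_C"]
dysplasia_vs_oscc = ["GENE_B", "GENE_D"]
early_genes = early_change_genes(combined_vs_control, dysplasia_vs_oscc)
```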
  • Figure 6 lists the biomarker genes and probe sets that may be specific for the conversion of oral dysplasia to OSCC. Further, selected probe sets that were specific for the development of dysplasia from normal tissue were also identified.
  • included among the genes that are involved in, and specific for, the malignant transformation of oral dysplasia into invasive OSCC are genes encoding proteins having known roles in cell-ECM (extracellular matrix) and cell-cell interactions, and/or in cellular motility, migration and/or invasion, such as LAMC2 and SERPINE1 (PAI-1); in directed cellular movement, such as CXCL2, CXCL3, and CXCL9; and in immune function, such as IL1β and IFIT3.
  • cytokines and growth factors can transmit signals through the JAK/STAT pathway (44, 45).
  • EGFR over-expression has been reported in up to 90% of HNSCC tumors (46).
  • Single modality therapeutics that target and negatively regulate EGFR such as small molecule tyrosine kinase inhibitors, monoclonal antibodies, antisense therapy or immunotoxin conjugates, however, were only effective in 5-15% of patients with advanced HNSCC (47). These observations suggest that there are other proteins and pathways driving the growth of some of these tumors.
  • the results described herein are believed to be the first to show a strong association between the IFN-γ signaling pathway and OSCC, noting that IFN-γ signaling also involves the JAK/STAT pathway (44, 48).
  • the model containing LAMC2 and COL4A1 distinguished head and neck squamous cell carcinoma (HNSCC) from controls, but distinguished neither cervical cancer nor lung cancer from their respective controls (Figure 5, top panel).
  • the COL1A1 and PADI1 predictive model also performed well for HNSCC and, to a lesser extent, for lung cancer (Figure 5, bottom panel). Furthermore, these results showed that the models described herein could not only distinguish invasive cancer from controls, but could also distinguish oral dysplasia from controls.
  • the respective AUC was 0.98 for LAMC2 and COL4A1 and 0.99477 for COL1A1 and PADI1.
  • the effect observed for the model LAMC2 and COL4A1 was primarily driven by COL4A1, suggesting that COL4A1 up-regulation occurred earlier than LAMC2 up-regulation in oral carcinogenesis.
  • Zhao LP, Prentice R, Breeden L. Statistical modeling of large microarray data sets to identify stimulus-response profiles. Proc Natl Acad Sci USA 2001;98:5631-6.
  • Haslam SZ, Woodward TL. Host microenvironment in breast cancer development: epithelial-cell-stromal-cell interactions and steroid hormone action in normal and cancerous mammary gland. Breast Cancer Res 2003;5:208-15.
  • 31. Kaklamani VG, Gradishar WJ. Gene expression in breast cancer. Current Treat Options Oncol 2006;7:123-8.
  • OSCC invasive oral squamous cell carcinoma
  • OSCC can be sub-classified on the basis of gene expression (e.g., using OSCCIGSS).
  • Gene expression and cancer stage combined predicted survival of OSCC patients better than stage alone.
  • OSCC oral squamous cell carcinoma
  • the overall prognosis for advanced stage disease has not improved significantly in the past two decades (1).
  • One of the impediments to the effective management of OSCC patients is the limited ability to predict the natural history of individual lesions.
  • MATERIALS AND METHODS. Study population: As described in Chen et al., English-speaking patients 18 years of age or older were identified who had a first, primary OSCC or dysplasia and underwent surgery or biopsy between December 16th, 2003 and April 17th, 2007 at one of the three University of Washington-affiliated hospitals: University of Washington Medical Center, Harborview Medical Center and the Puget Sound Veterans Affairs Health Care System (VA). Eligible controls were patients who were scheduled to undergo surgery of the oral cavity or oropharynx for non-cancer treatment, such as tonsillectomy or sleep apnea, at the aforementioned institutions during the same time period in which the cases were recruited. All patients recruited to the study were interviewed in person using a structured lifestyle and medical history questionnaire.
  • Comorbidity scores were calculated using Adult Comorbidity Evaluation-27 Test (9,10). Patients were followed actively through phone contact and passively through review of medical records and linkage to the U.S. Social Security Death Index. If a patient had died, the death was classified as due to OSCC or not due to OSCC based on review of medical records and death certificates. All participants gave informed consent, and all study procedures were approved by the Institutional Review Boards of the Fred Hutchinson Cancer Research Center, University of Washington, and the VA.
  • the 131 probe set list was obtained by comparing the differential gene expression between 119 OSCC cases and 35 normal controls as described in the preceding Examples and in Chen et al (8).
  • Prediction model building for OSCC-specific mortality: For this analysis, a total of 150 OSCC cases were used: the 119 cases that had been used to derive the 131 probe sets in the preceding Examples (e.g., Fig. 2A) (8), plus an additional 31 cases recruited thereafter for whom vital status information and at least 4 months of follow-up were available.
  • PC principal components
  • ROC Receiver Operating Characteristics
  • the nearest 10% was used to estimate true positive and false positive rates.
  • the survivalROC package (http://faculty.washington.edu/heagerty/Software/SurvROC), available for R-project software, was used to implement these methods.
  • the Area Under the Curve (AUC) was calculated to quantify the ability of each model to predict two-year survival. One thousand bootstrap samples were generated to estimate standard errors and 95% confidence intervals for AUC estimates, and to obtain p-values for testing the null hypothesis that specific gene expression values or PCA do not add to the ability of stage to predict survival.
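  • The bootstrap step can be illustrated with a simplified sketch. It treats two-year outcome as a binary label and resamples deaths and survivors separately, whereas the study used time-dependent ROC methods from an R package; the stratified resampling, the percentile interval, the function names and the example scores are all assumptions made for illustration.

```python
import random
from statistics import stdev
from typing import List, Sequence, Tuple


def pairwise_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_auc(
    case_scores: Sequence[float],
    control_scores: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Return the bootstrap standard error and a 95% percentile interval for AUC."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates: List[float] = []
    for _ in range(n_boot):
        # Resample with replacement within each outcome group.
        cases = [rng.choice(case_scores) for _ in case_scores]
        controls = [rng.choice(control_scores) for _ in control_scores]
        estimates.append(pairwise_auc(cases, controls))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return stdev(estimates), (lower, upper)
```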
  • a jackknife leave-one-out analysis (16) was performed. Parameter estimates for the risk model were obtained excluding one subject, and the resulting risk model was used to estimate a risk score based on the excluded subject's gene expression and/or stage characteristics. This process was repeated until risk scores were assigned to each subject. ROC and AUC estimates were calculated for these jackknife risk scores as they were for the original risk scores.
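  • The leave-one-out loop described above can be sketched generically. The `fit` and `score` callbacks are hypothetical stand-ins for the risk model, which the source does not specify in this passage; the toy model below simply centers a single expression value on the training mean.

```python
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

Subject = TypeVar("Subject")
Model = TypeVar("Model")


def leave_one_out_scores(
    subjects: Sequence[Subject],
    fit: Callable[[List[Subject]], Model],
    score: Callable[[Model, Subject], float],
) -> List[float]:
    """Fit on all subjects but one, then score the excluded subject; repeat for each."""
    scores: List[float] = []
    for i, held_out in enumerate(subjects):
        training = list(subjects[:i]) + list(subjects[i + 1:])
        model = fit(training)  # parameter estimates exclude the held-out subject
        scores.append(score(model, held_out))
    return scores


# Hypothetical toy risk model: expression centered on the training-set mean.
def fit_mean(training: List[float]) -> float:
    return mean(training)


def centered_score(model: float, subject: float) -> float:
    return subject - model


jackknife_scores = leave_one_out_scores([1.0, 2.0, 3.0], fit_mean, centered_score)
```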
  • qRT-PCR was performed to validate the expression of the four genes found to be related to survival in the top two models. Sixty samples were chosen at random for testing. Each sample was assayed in triplicate in 10 µl reaction volumes using the QuantiTect SYBR Green RT-PCR kit (Qiagen, Valencia, CA) and bioinformatically validated QuantiTect primers (Qiagen, Valencia, CA) on a 7900HT Sequence Detection System (ABI, Foster City, CA).
  • the cycling conditions were as follows: 30 minute incubation at 50° C, 15 minute incubation at 95° C, and 40 cycles each of 15 seconds at 94° C, 30 seconds at 55° C, and 30 seconds at 72° C.
  • the fragments amplified included: 1) for LAMC2 (NM_005562), a 74-bp amplicon spanning exons 18 and 19; 2) for OASL (NM_003733), a 98-bp amplicon spanning exons 4 and 5; 3) for OSMR (NM_003999), a 113-bp amplicon spanning exons 13 and 14; 4) for SERPINE1 (NM_000602), a 105-bp amplicon spanning exons 3 and 4; and, for ACTB, a 146-bp amplicon spanning exons 3 and 4.
  • Figure 11 shows the results of a principal component analysis (PCA) on the 131 probe set expression data based on the samples' phenotype (normal, dysplasia or cancer).
  • the first principal component (PC), which accounted for the greatest amount of variability, captured 60.26% of the variance, whereas the second PC captured 6.31%.
  • the controls and OSCC cases were at opposite ends of the spectrum with dysplasia samples in between (Fig. 11).
  • the same group of 45 OSCC samples identified in the hierarchical cluster analysis was at one extreme on the basis of the first PC scores (Fig. 11). Although some dysplastic lesions had first PC scores that overlapped with OSCC, none reached the first PC scores of the group of 45 OSCC samples.
  • LAMC2 (207517_at) 0.59151*LAMC2
  • 62 probe sets were differentially expressed between the group of 45 patients and the remaining OSCC. Ingenuity Pathway Analysis of the 62 probe sets showed an overrepresentation of genes involved in cell migration; cell-to-cell signaling and interaction; and cellular growth and proliferation.
  • the proteins encoded by these genes reside in the extracellular matrix and are believed to function, for example, in angiogenesis, platelet aggregation and/or cell movement.
  • THBS1 and PDPN have both been ascribed a role in platelet aggregation and may be involved in tumor metastasis by facilitating tumor cell-platelet interactions and platelet-facilitated tumor cell metastasis (20). It is also known that THBS1 binds with members of the tenascin family and SPARC/osteonectin (20). In fact, tenascin C and SPARC were found to be significantly upregulated at both the gene expression and protein levels in this and other studies by our group (2,24).
  • P-cadherin (CDH3), a component of the 10th model described here, is associated with cell-to-cell signaling, and the CDH3 gene has previously been shown to be significantly downregulated in metastatic tumor cells isolated from lymph nodes (25).
  • the findings in this Example showing that the dysregulation of these genes' expression was associated with OSCC-specific survival were consistent with the non-limiting theory that tumor proliferation and metastasis may be mediated by complex interactions between extracellular matrix proteins and cell-surface receptors.
  • OASL appears to be a member of a family of Trips (Thyroid hormone-interacting proteins) and may thus be involved in signal transduction in the presence of thyroid hormone (26).
  • Oncostatin M receptor (OSMR) is a member of the IL6 cytokine family and is thought to be involved in signal transduction and proliferation (27,28).
  • This Example is believed to provide the first demonstration of an association between a gene signature and OSCC-specific survival, and in particular of the use of gene expression data to improve upon AJCC stage in predicting survival.
  • regression models that combined stage with gene expression had significantly higher AUCs than stage alone (Figure 13). Given the recent emphasis on genome-wide gene expression studies to find signatures predictive of clinical outcomes, these and related embodiments will permit integration of meaningful genetic data into clinical practice.


Abstract

Methods are provided for determining the presence or risk of squamous cell carcinoma in a subject, including head and neck carcinoma, oral squamous cell carcinoma (OSCC), and pre-neoplastic dysplasia. Specifically, the invention relates to gene expression profiling of selected squamous cell carcinoma biomarker genes or gene indicator sets. In certain embodiments, the differential expression of selected biomarker genes or gene indicator sets in a subject sample as compared to a control sample identifies the presence or risk of squamous cell carcinoma, or an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC. Methods are also provided for differentiating between squamous cell carcinoma cells and pre-neoplastic dysplastic cells in a subject. Selected oligonucleotide probes are also provided for detecting the squamous cell carcinoma biomarker genes or gene indicator sets.

Description

GENE EXPRESSION PROFILING IDENTIFIES GENES PREDICTIVE OF ORAL SQUAMOUS CELL CARCINOMA AND ITS PROGNOSIS
CROSS-REFERENCE TO RELATED APPLICATION
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 61/152,541 filed February 13, 2009, where this provisional application is incorporated herein by reference in its entirety.
STATEMENT OF GOVERNMENT INTEREST
This invention was made in part with government support under Grant No. R01 CA095419 awarded by the National Cancer Institute/National Institutes of Health, National Research Service Award T32DC00018 from the National Institute on Deafness and Other Communication Disorders, and trans-NIH K12RR023265 Career Development Programs for Clinical Researchers. The government has certain rights in this invention.
STATEMENT REGARDING SEQUENCE LISTING
The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is 360056_402PC_SEQUENCE_LISTING.txt. The text file is 883 KB, was created on July 22, 2009, and is being submitted electronically via EFS-Web.
BACKGROUND
Technical Field
The presently disclosed invention embodiments relate to compositions and methods for the detection and treatment of cancer. In particular, the present embodiments relate to identifying the presence of, or a risk for having, squamous cell carcinoma including oral squamous cell carcinoma (OSCC) and head-and-neck squamous cell carcinoma (HNSCC) in a subject, by identifying differential expression of one or more squamous cell carcinoma indicator genes as described herein.
Description of the Related Art
Squamous cell carcinoma of the oral cavity and oropharynx (OSCC) is of considerable public health significance. In the United States, it is estimated that nearly 35,000 new OSCC cases were diagnosed in 2007, and approximately 7,550 OSCC deaths are estimated to occur (see http://www.cancer.org, web site of the American Cancer Society, Atlanta, GA). World-wide, OSCC is the 6th most common cancer, with an estimated 405,000 new cases and 211,000 deaths annually (http://www-dep.iarc.fr, International Agency for Research on Cancer, Lyon, France) (1). Despite considerable advances in surgical techniques, and the use of adjuvant treatment modalities, the 5-year survival rate for OSCC patients is about 60% for U.S. Caucasians and 36% for U.S. African-Americans (http://www.cancer.org, American Cancer Society, Atlanta, GA). In addition, OSCC is often associated with loss of eating and speech function, disfigurement and psychological distress. As much as 20% of oral dysplasia undergoes malignant transformation to OSCC (2, 3).
Among post-surgical OSCC patients with histologic positive tumor margins, the likelihood of local recurrence is as high as 70 to 80%. Even among post-surgical OSCC patients with negative margins, the reported probability of recurrence is 30-40% (4), suggesting histologic examination alone is inadequate in predicting recurrence (4-6).
Clearly there is an urgent need to identify better ways to predict which patients with dysplastic precursor lesions are likely to develop OSCC, and to predict whether patients who have been surgically treated for OSCC are likely to relapse, so that high-risk patients can be selected for more rigorous treatment and follow-up. The present invention addresses these needs and provides other related advantages.
BRIEF SUMMARY
In certain embodiments the present invention provides a method for identifying a risk for having, or presence of, oral squamous cell carcinoma (OSCC) in a subject, the method comprising (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; and (b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells; wherein differential expression of the SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC.
In certain further embodiments the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells. In certain embodiments the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, and (c) the SCCIGS consisting of COL1A1 and PADI1 genes. In certain embodiments the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, (c) the SCCIGS consisting of COL1A1 and PADI1 genes, (d) the SCCIGS consisting of a C21orf81 gene, (e) the SCCIGS consisting of KRT17 and PRSS3 genes, (f) the SCCIGS consisting of COL1A2 and EST 230740_1 at genes, (g) the SCCIGS consisting of COL1A1 and XLKD1 genes, (h) the SCCIGS consisting of THY1, FLJ22671 (referred to as C2orf54 in the Affymetrix database) and HAS3 genes, (i) the SCCIGS consisting of POSTN and TIA2 (referred to as PDPN in the Affymetrix database) genes, (j) the SCCIGS consisting of MGC40368 (referred to as TCP11L2 in the Affymetrix database), GIP3 (referred to as IFI6 in the Affymetrix database) and COL27A1 genes, (k) the SCCIGS consisting of CDH3 and ELOVL6 genes, (l) the SCCIGS consisting of the COL4A1 gene and (m) the SCCIGS consisting of genes identified by 131 probe sets as set forth in Figure 2A.
In certain embodiments the step of determining a SCCIGS expression level comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level. In certain further embodiments the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of SEQ ID NOS:201-331. In certain embodiments the biological sample comprises a biopsy tissue, which in certain further embodiments is selected from an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue. In certain other embodiments the biological sample comprises one or a plurality of dysplastic cells.
In other embodiments there is provided a method for identifying a risk for having, or presence of, oral squamous cell carcinoma (OSCC) in a subject having oral epithelial dysplasia but no frank OSCC, the method comprising (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one dysplastic oral epithelial cell; and (b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of OSCC cells; wherein substantial similarity of the SCCIGS expression level in the biological sample relative to the OSCC reference SCCIGS expression levels indicates the subject has, or is at risk for having, OSCC. In certain further embodiments the squamous cell carcinoma indicator gene set (SCCIGS) comprises any one or more of the genes shown in Figure 6. In certain other embodiments the step of determining a SCCIGS expression level comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of: (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level. In certain further embodiments the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of the probes listed in Figure 9. In certain other embodiments the biological sample comprises a biopsy tissue. In certain other embodiments the subject has no detectable cancer and the biological sample comprises one or a plurality of dysplastic cells.
Turning to another embodiment, there is provided a method for identifying a risk for having, or presence of, a squamous cell carcinoma (SCC) in a subject, wherein the SCC is selected from oral SCC (OSCC) and head-and-neck SCC (HNSCC), the method comprising (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; and (b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells, wherein: if the biological sample comprises an OSCC cell then the control tissue comprises normal oral epithelium, and if the first biological sample comprises a HNSCC cell then the control tissue comprises normal epithelium from oropharynx, hypopharynx, larynx or oral cavity; and wherein differential expression of the SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC. In certain further embodiments the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells.
In certain embodiments the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, and (c) the SCCIGS consisting of COL1A1 and PADI1 genes. In certain embodiments the SCCIGS is one or more SCCIGS selected from (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, (c) the SCCIGS consisting of COL1A1 and PADI1 genes, (d) the SCCIGS consisting of a C21orf81 gene, (e) the SCCIGS consisting of KRT17 and PRSS3 genes, (f) the SCCIGS consisting of COL1A2 and EST 230740_1at genes, (g) the SCCIGS consisting of COL1A1 and XLKD1 genes, (h) the SCCIGS consisting of THY1, FLJ22671 (referred to as C2orf54 in the Affymetrix database) and HAS3 genes, (i) the SCCIGS consisting of POSTN and TIA2 (referred to as PDPN in the Affymetrix database) genes, (j) the SCCIGS consisting of MGC40368 (referred to as TCP11L2 in the Affymetrix database), GIP3 (referred to as IFI6 in the Affymetrix database) and COL27A1, (k) the SCCIGS consisting of CDH3 and ELOVL6 genes, (l) the SCCIGS consisting of the COL4A1 gene and (m) the SCCIGS consisting of genes identified by 131 probe sets as set forth in Figure 2A.
In certain embodiments the biological sample comprises an OSCC cell and the control tissue comprises normal oral epithelium. In certain embodiments the biological sample comprises a HNSCC cell and the control tissue comprises normal epithelium from oropharynx, hypopharynx, larynx or oral cavity. In certain embodiments the step of determining a SCCIGS expression level comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of: (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level. In certain further embodiments the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of SEQ ID NOS:201-331. In certain other further embodiments the biological sample comprises a biopsy tissue. In certain still further embodiments the biopsy tissue is selected from the group consisting of an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue. In certain embodiments the biological sample comprises one or a plurality of dysplastic cells.
In certain other embodiments of the present invention there is provided a method for identifying an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC, the method comprising: (a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; (b) determining that the subject has, or is at risk for having, OSCC by comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells, wherein a differentially expressed SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC; and (c) identifying within said differentially expressed SCCIGS a presence or absence of a substantially up- or down-regulated SCCIGS subset (SCCIGSS), wherein presence of the substantially up- or down-regulated SCCIGSS indicates the subject has an increased risk of OSCC-specific mortality.
In certain further embodiments the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells. In certain embodiments the SCCIGSS is one or more SCCIGSS selected from the group consisting of (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGSS consisting of a LAMC2 gene, (c) the SCCIGSS consisting of OSMR, SERPINE1 and OASL genes, (d) the SCCIGSS consisting of a SLC16A1 gene, (e) the SCCIGSS consisting of a KLF7 gene, (f) the SCCIGSS consisting of THBS1 and SLC16A1 genes, (g) the SCCIGSS consisting of a HOMER3 gene, (h) the SCCIGSS consisting of a GRP68 gene, (i) the SCCIGSS consisting of a PDPN gene, (j) the SCCIGSS consisting of an ANKRD35 gene, and (k) the SCCIGSS consisting of CDH3 and EPS8L1 genes. In certain embodiments (i) the SCCIGS is one or more SCCIGS selected from the group consisting of (a) the SCCIGS consisting of a LAMC2 gene, (b) the SCCIGS consisting of LAMC2 and COL4A1 genes, (c) the SCCIGS consisting of COL1A1 and PADM genes, (d) the SCCIGS consisting of a C21orf81 gene, (e) the SCCIGS consisting of KRT17 and PRSS3 genes, (f) the SCCIGS consisting of COL1A2 and EST 230740_1 at genes, (g) the SCCIGS consisting of COL1A1 and XLKD1 genes, (h) the SCCIGS consisting of THY1, FLJ22671 and HAS3 genes, (i) the SCCIGS consisting of POSTN and TIA2 (PDPN) genes, (j) the SCCIGS consisting of MGC40368 (TCP11L2), GIP3 (IFI6) and COL27A1 genes, (k) the SCCIGS consisting of CDH3 and ELOVL6 genes, (l) the SCCIGS consisting of the COL4A1 gene, and (m) the SCCIGS consisting of genes identified by 131 probe sets as set forth in Figure 2A, and wherein (ii) the SCCIGSS consists of one or more genes identified by a probe set as set forth in Table 2.
In certain further related embodiments at least one of the steps selected from the step of determining a SCCIGS expression level and the step of identifying a SCCIGSS comprises (a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of (i) all or a SCCIG-characteristic portion of a SCCIG transcript, (ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and (iii) a nucleic acid amplification product of one or more of (i) and (ii); and (b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level. In certain further embodiments the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from SEQ ID NOS:201-331. In certain embodiments the biological sample comprises a biopsy tissue, which in certain further embodiments is selected from an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue. In certain embodiments the biological sample comprises one or a plurality of dysplastic cells.
In certain related embodiments determining one or a plurality of SCCIGS expression levels comprises measuring one or more protein levels in the biological sample. In certain further embodiments the biological sample comprises a biological fluid, which in certain still further embodiments is selected from saliva, blood, serum, plasma and lymph. These and other aspects of the invention will be evident upon reference to the following detailed description and attached drawings. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet, are incorporated herein by reference in their entirety, as if each was incorporated individually. Aspects of the invention can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments of the invention.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Figure 1 shows the most prominently involved biological pathways in oral squamous cell carcinoma (OSCC). Top: IFN-γ signaling pathway. Bottom: JAK/STAT pathway. Analysis was performed using Ingenuity® Systems, version 4.0.
Figure 2 shows the selected biomarker genes and corresponding probe sets that were shown to be differentially expressed between OSCC and controls based on the criteria described in Example 1.
Figure 3 shows the top 10 squamous cell carcinoma indicator gene set (SCCIGS) models from the logistic regression analyses of the selected biomarker genes from Figure 2. The predictive power of these SCCIGS models was validated using internal and external (GSE6791) controls, as measured by the area under the curve (AUC). An AUC of 0.5 represents a test that is no better than chance at discriminating between cases and controls, and an AUC of 1.0 represents perfect discrimination.
Figure 4 shows qRT-PCR results comparing RNA transcripts for four genes between OSCC cases and controls (see Example 2).
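The AUC interpretation given for Figure 3 can be illustrated with a minimal sketch. It assumes that each sample has a single model score and that higher scores indicate OSCC; the function name and the score values are hypothetical and are not taken from the specification. The AUC is computed here as the fraction of case/control pairs in which the case scores higher than the control, with ties counted as one half.

```python
def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores, for illustration only.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
print(auc(cases, controls))   # 15 of 16 pairs are ranked correctly: 0.9375
print(auc([1, 2], [1, 2]))    # identical score distributions: 0.5 (chance)
```

Under this pairwise definition, a value of 0.5 corresponds to scores that do not separate cases from controls, and 1.0 corresponds to every case scoring above every control.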
Figure 5 shows the tissue specificity of a squamous cell carcinoma indicator gene set (SCCIGS) consisting of LAMC2 and COL4A1 (top) and a SCCIGS consisting of COL1A1 and PADH (bottom). The data from Example 5 are represented in box-and-whisker plots of logistic regression scores (y axis) for normal controls and cases in an internal testing set (N: normal, DYS: dysplasia, T: OSCC), GEO GSE6791 head and neck normal controls (HNN) and cases (HNT), GEO GSE6791 cervical normal controls (CN) and cases (CT), and GEO GSE6044 lung normal controls (LN), lung squamous cell carcinoma (LSCC), lung adenocarcinoma (LAD) and lung small cell cancer (LSC).
Figure 6 shows a list of biomarker genes that are differentially expressed between OSCC and dysplasia/normal controls, and which can be utilized, for example, to distinguish between frank OSCC and dysplasia.
Figure 7 shows the sequence information, including GenBank accession numbers and descriptive annotations for SEQ ID NOS:1-200, a list of differentially expressed SCC biomarker genes that may be used to identify the risk or presence of SCC in a subject.
Figure 8 shows the sequence identifiers for SEQ ID NOS:201-331, a list of exemplary Affymetrix probes (Affymetrix Corp., Santa Clara, CA) that specifically hybridize to certain of the SCC biomarker genes or gene sets described herein. The middle column shows the Affymetrix probe identifier and the far right column shows the corresponding biomarker gene to which the probe specifically hybridizes.
Figure 9 shows a selected set of exemplary Affymetrix probes that specifically hybridize to certain of the SCC biomarker genes that can be utilized to discriminate between OSCC tumor cells and dysplastic epithelial cells. The middle column shows the Affymetrix probe identifier and the far right column shows the corresponding biomarker gene to which the probe specifically hybridizes.
Figure 10 shows supervised hierarchical cluster analysis of the gene expression data. The 131 probe sets were clustered as described in the text. The bar underneath the heat map codes the samples according to tissue phenotype: normal, dysplasias and tumors. Each column in the heat map represents the expression levels for all genes in a particular sample, whereas each row represents the relative expression of a particular gene across all samples. The expression level of any gene in any given sample (relative to the mean expression level of that gene across all samples) was also recorded along a color scale (not shown) in which red represents transcription up-regulation, green represents down-regulation, and the color intensity indicates the magnitude of deviation from the mean. Cluster 1 refers to a group of probe sets which appear to be only fully downregulated in a group of 45 patients labeled with a bar at the bottom of the heat map.
Figure 11 shows Principal Component Analysis (PCA) using the 131 probe sets. The first principal component (PC) is plotted on the x-axis and captures 63.28% of the variance. The second PC is plotted on the y-axis and captures 5.66% of the variance.
Figure 12 shows survival and OSCC-specific mortality estimates in OSCC patients. The two groups were identified with hierarchical clustering analysis using the 131 differentially expressed genes in invasive OSCC as described in the text. 12A. Kaplan-Meier analysis of all-cause mortality. Vertical marks represent censored events. 12B. Cumulative incidence of OSCC-specific mortality.
Figure 13 shows Receiver Operating Characteristic Analysis of 2-year Survival Comparing the Prognostic Ability of Stage with Gene Expression Data. 13A. ROC Curves for 2-year survival for models 'stage', 'LAMC2' and 'PCA'. 13B. ROC Curves for 2-year survival for models 'stage', 'stage and LAMC2' and 'stage and PCA'. 13C. Area Under the Curve (AUC) and bootstrapped 95% Confidence Intervals for all five models.
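The Kaplan-Meier analysis named for Figure 12A can be sketched as follows. This is a generic product-limit estimator written for illustration, not the analysis code used for the figure; the function name, the follow-up times and the event flags are hypothetical. Censored subjects (event flag 0) leave the risk set without lowering the survival estimate, which is the role of the vertical censoring marks in such a plot.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each subject
    events: 1 if the subject died at that time, 0 if censored
    Returns a list of (time, survival) at each time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; 0 marks a censored subject.
months = [6, 12, 12, 20, 30]
died = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(t, round(s, 3))   # survival steps: 0.8 at 6, 0.6 at 12, 0.3 at 20
```

Subjects censored at the same time as a death are treated as still at risk at that time, which is the usual convention.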
Figure 14 shows a three-dimensional plot of the first and second principal components and the risk scores from the top Cox regression model (0.59151*LAMC2). The samples are color coded according to vital status. Diamonds (◊) are used to show the overlap between risk scores and samples from either the group of 45 (red diamonds) or the group of 74 (blue diamonds).
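The risk score described for Figure 14 is a single-term linear predictor, which can be sketched as below. The coefficient 0.59151 is the one stated for the top Cox regression model; the scale on which LAMC2 expression is measured is not stated here, so the input values and function names are hypothetical. In a Cox model, the ratio of hazards between two subjects is the exponential of the difference between their risk scores.

```python
import math

LAMC2_COEFFICIENT = 0.59151  # coefficient stated for the top Cox model


def risk_score(lamc2_expression):
    """Linear predictor of the single-gene model: 0.59151 * LAMC2."""
    return LAMC2_COEFFICIENT * lamc2_expression


def relative_hazard(expression_a, expression_b):
    """Hazard of subject A relative to subject B under the Cox model."""
    return math.exp(risk_score(expression_a) - risk_score(expression_b))


# Hypothetical expression values (units assumed to match the fitted model).
print(risk_score(9.0))
print(relative_hazard(9.0, 8.0))  # one-unit increase: exp(0.59151)
```

Under these assumptions, each one-unit increase in the LAMC2 measurement multiplies the estimated hazard by exp(0.59151), regardless of the starting value.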
DETAILED DESCRIPTION
Embodiments of the present invention relate generally to the use of highly predictive gene expression profiling, based on differentially expressed biomarker genes or gene sets, to detect the presence or risk of squamous cell carcinoma (SCC) in a subject, and in certain further embodiments to identify an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC. In certain embodiments, the methods provided herein may be used to identify a variety of head and neck squamous cell carcinomas (HNSCC), including oral squamous cell carcinomas (OSCC).
Gene expression profiling is a useful way to distinguish between cells that express different phenotypes, and may be used in particular embodiments to distinguish between cancer cells and normal cells, or in other embodiments to distinguish between different types of cancer cells, and/or in certain further embodiments to identify aggressively neoplastic OSCC cells in a method for identifying an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC. Gene expression profiling according to the methods provided herein relates generally to measurements of selected biomarker genes or gene sets shown to be differentially expressed in various types of SCC, such as HNSCC and OSCC. In certain embodiments, differentially expressed biomarker genes or gene sets that may be used to identify SCC are referred to herein as squamous cell carcinoma indicator gene sets (SCCIGS), which are exemplified in Figures 2 and 7 and detailed below. In certain embodiments, subsets of such differentially expressed biomarker genes or gene sets that may be used to identify aggressively neoplastic SCC that, as described herein, may be indicators of an increased risk of OSCC-specific mortality, are referred to herein as squamous cell carcinoma indicator gene set subsets (SCCIGSS), which are exemplified in Tables 2 and 4 and which are described in greater detail below. According to certain embodiments of the present invention, determination and comparison of the expression levels of one or more selected biomarker genes or gene sets, such as a given SCCIGS, provide novel and useful parameters for diagnosing the risk or presence of SCC in a subject. 
As such, certain embodiments of the present invention relate to methods for identifying the risk or presence of SCC in a subject by comparing the expression levels of selected biomarker genes or gene sets in a biological sample from that subject to the expression levels of those same biomarker genes or gene sets in an appropriate control, such as a normal tissue known to be free of SCC (e.g., a control tissue or a reference tissue). The presence or risk of SCC may be identified by the differential expression of the selected biomarker genes or gene sets in the subject sample as compared to the control. Certain embodiments may also include a simple genetic test based on the gene expression profile of one or more selected biomarker genes, such as a selected SCCIGS. In these and other aspects, a simple genetic test may employ selected probes or probe sets that are specific for one or more SCCIGS as provided herein, including but not limited to the Affymetrix oligonucleotide probes exemplified herein and variants thereof, to measure the gene expression levels of a SCCIGS or other biomarker gene, and to compare those levels to a reference SCCIGS expression level in an appropriate control (e.g., a control tissue or a reference tissue). Such a simple genetic test provides advantages over other previous methods in the art, which have failed to utilize rigorous statistical analyses to identify highly predictive and readily detectable sets of biomarker genes or gene sets. As described herein, also provided is a corresponding set of biomarker gene probes, for use in diagnosing the presence of SCC in a subject and/or the risk of developing SCC. 
In similar fashion and as described herein, certain embodiments further contemplate identifying substantial down-regulation (e.g., expression that is reduced in a statistically significant manner by at least 50%, 60%, 70%, 80%, 90%, 95% or more, relative to an appropriate control group) of a subset of SCCIGS, referred to as SCCIGSS, where such down-regulation may indicate an increased risk of OSCC-specific mortality relative to other OSCC cases identified according to the disclosure herein.
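The percentage thresholds in the preceding paragraph can be illustrated with a minimal sketch. It assumes expression levels on a linear (not log-transformed) scale and compares group means; the function names and the values are hypothetical. The test of statistical significance that the paragraph also requires is not shown.

```python
from statistics import mean


def percent_reduction(sample_levels, control_levels):
    """Percent by which mean sample expression falls below the control mean."""
    control_mean = mean(control_levels)
    return 100.0 * (control_mean - mean(sample_levels)) / control_mean


def is_substantially_down(sample_levels, control_levels, threshold=50.0):
    """True if the reduction meets the chosen threshold (e.g., 50%, 90%)."""
    return percent_reduction(sample_levels, control_levels) >= threshold


# Hypothetical linear-scale expression values for one gene.
sample = [20, 30]
control = [100, 100]
print(percent_reduction(sample, control))            # 75.0
print(is_substantially_down(sample, control))        # True at the 50% threshold
print(is_substantially_down(sample, control, 90.0))  # False at the 90% threshold
```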
In certain embodiments of the present invention, without wishing to be bound by any one theory, it is believed that patients who develop local recurrence and/or second primary oral tumors are those whose surgical margins or uninvolved buccal mucosa harbor molecular changes that are found in oral dysplasia or invasive OSCC. In these and related embodiments, the predictive models provided herein may be used to test biopsies of histologically normal surgical margins and clinically normal oral mucosa of OSCC patients, in order to identify a risk for having, or the presence of, local recurrence and/or second primary oral cancer.
The strong predictive power of the presently disclosed biomarker genes and/or gene sets when used according to the methods described herein may be exploited generally to differentiate between normal cells, pre-neoplastic cells (i.e., dysplasia), and SCC tumor cells, including a variety of HNSCC tumor cells, such as OSCC tumor cells. Among other uses apparent to a person skilled in the art based on the present disclosure, this predictive power may find use in a clinical setting, for example, to identify or monitor subjects having SCC, or subjects at risk for developing SCC, and may also find use in research settings, for instance, to further characterize the underlying biological bases of SCC oncogenesis and pathology.
Squamous cell carcinoma (SCC) generally includes malignant tumors of squamous epithelium (i.e., epithelium that shows squamous cell differentiation), which may occur in many different organs, including the skin, lips, mouth, esophagus, urinary bladder, prostate, lungs, vagina, and cervix. Squamous cells form the surface tissue layer (i.e., epithelium) of much of the body, and include cells of the skin and mucous membranes. SCC is thought to derive from keratinizing or malpighian epithelial cells. One hallmark of squamous cell carcinoma is the presence of keratin, or "keratin pearls," on histologic evaluation. These keratin formations typically relate to well-formed desmosome attachments and intracytoplasmic bundles of keratin tonofilaments. SCC is morphologically variable, and may appear, by way of non-limiting example, as plaques, nodules, or verrucae. SCC usually begins as surface lesions with erythema and slight elevation, often termed erythroplasia. Some early SCC lesions may appear to be pure white, and are referred to as leukoplakia, but only a small percentage of leukoplakia lesions represent carcinoma in situ or invasive carcinoma. Erythroplasia, or early red lesions, are typically asymptomatic and may represent either carcinoma in situ or invasive carcinoma. Tender, painful lesions usually are suggestive of perineural invasions. When lesions become palpable masses, symptoms such as a vague persistent sore throat or ear infection typically occur. In more advanced cases, dissemination to ipsilateral submandibular and jugulodigastric nodes is common, and a subject suspected of having SCC may present with a mass in the neck. When lymph node or remote bone and organ metastases are associated with an early oral primary lesion, often a second, more advanced primary upper aero-digestive or lung cancer is responsible for the metastases.
Head-and-neck squamous cell carcinoma (HNSCC) refers generally to a group of biologically similar squamous cell carcinomas originating from the upper aero-digestive tract, including the lip, oral cavity (mouth), nasal cavity, paranasal sinuses, pharynx, and larynx, among others. HNSCC often spreads to the lymph nodes of the neck, which may represent the first, and sometimes only, manifestation of the disease at the time of diagnosis.
HNSCCs are typically characterized by their originating tissues. For example, HNSCCs may arise from the salivary glands, which produce saliva, the fluid that keeps mucosal surfaces in the mouth and throat moist. The major salivary glands may be found in the floor of the mouth and near the jawbone. The paranasal sinuses are small hollow spaces in the bones of the head surrounding the nose. The nasal cavity is the hollow space inside the nose. HNSCCs may also originate in the pharynx. The pharynx is essentially a hollow tube common to the upper digestive and respiratory tracts, originating behind the nose, forming the throat lumen and leading to the esophagus and the trachea. The pharynx has three parts: the nasopharynx, the oropharynx, and the hypopharynx. Nasopharyngeal cancer arises in the nasopharynx, the region in which the nasal cavities and the Eustachian tubes connect with the upper part of the throat. Oropharyngeal cancer often begins in the oropharynx, the middle part of the throat that includes the soft palate, the base of the tongue, and the tonsils. The hypopharynx includes the pyriform sinuses, the posterior pharyngeal wall, and the postcricoid area. Tumors of the hypopharynx frequently have an advanced stage at diagnosis, and have the most adverse prognoses of pharyngeal tumors, tending to metastasize early due to the extensive lymphatic network around the larynx. HNSCC may also originate in the larynx, or "voice box." Such cancers may occur on the vocal folds themselves (i.e., "glottic" cancer), or on tissues above and below the true cords (i.e., "supraglottic" and "subglottic" cancers, respectively). Laryngeal cancer is strongly associated with tobacco smoking. In general, HNSCC is highly curable if detected early, usually with some form of surgery, although chemotherapy and radiation therapy may also play an important role.
SCC of the oral cavity, or mouth, may represent one particular aspect of HNSCC, and is typically referred to as oral squamous cell carcinoma (OSCC). OSCC is associated with substantial mortality and morbidity. OSCC relates generally to the formation of SCC in the area extending from the vermilion border of the lips to a plane between the junction of the hard and soft palate superiorly and the circumvallate papillae of the tongue inferiorly. This area includes, for example, the front two thirds of the tongue, the gingiva (gums), the buccal mucosa (the lining of the inside of the cheeks), the floor (bottom) of the mouth under the tongue, the hard palate (the roof of the mouth), and/or the retromolar trigone (the small area behind the wisdom teeth).
OSCC typically spreads primarily by either local extension or by the lymphatic system. The extent of tumor invasion depends upon the anatomic site, the tumor's biologic aggressiveness, and host response factors.
The lymphatic system is the most important and frequent route of metastasis in OSCC. Typically, the ipsilateral cervical lymph nodes are the primary site for metastatic deposits, but occasionally contralateral or bilateral metastatic deposits may be detected. The risk for lymphatic spread is greater for posterior lesions of the oral cavity, possibly because of delayed diagnosis or increased lymphatic drainage at those sites, or both. Cervical lymph nodes with metastatic deposits tend to appear as firm-to-hard, nontender enlargements. Once the tumor cells perforate the nodal capsule and invade the surrounding tissue, these lymph nodes often become fixed and non-mobile. Metastatic spread of tumor deposits from oral carcinoma usually occurs in an orderly pattern, beginning with the uppermost lymph nodes and spreading down the cervical chain. Because of this pattern of spread, the jugulodigastric nodes are most prone to early metastasis. Carcinomas involving the lower lip and floor of the mouth are an exception, as they tend to spread to the submental nodes. Although lymph node metastasis is not an early event, many individuals with oral cancer nonetheless present at diagnosis with nodal metastasis. Hematogenous spread of tumor cells is infrequent in the oral cavity but may occur because of direct vascular invasion or seeding from surgical manipulation.
Many patients with OSCC will present initially with highly confined localized disease stages. These patients may be treated with curative intent, usually involving surgery, radiation therapy, or both. Only about 20-40% of patients will develop a local or regional tumor recurrence. However, over subsequent years, these "cured" patients are often at higher risk for developing a second malignancy than for developing a recurrence of the initial tumor. Tumor recurrences most often occur during the first 2 years after therapy; later recurrences are rare. Second malignancies, on the other hand, may be observed at a steady rate, perhaps 3-5% per year. Thus, with sufficient follow-up time, second malignancies or other medical diseases become greater problems than recurrence of the primary disease. The methods provided herein may be utilized to monitor such tumor recurrences or second malignancies.
The term dysplasia, or dysplastic cell, refers generally to a maturation abnormality of cells within a tissue, which often involves the expansion of immature cells and a corresponding decrease in the number of mature cells at a given site. Dysplasia is often indicative of an early or preneoplastic process. The term dysplasia is typically used when the cellular abnormality is restricted to the originating tissue, as in the case of an early, in-situ neoplasm. Dysplasia is often considered the earliest form of pre-cancerous lesion recognizable in a biopsy, and dysplasia relevant to HNSCC or OSCC typically relates to dysplasia of epithelial cells. Dysplasia may be further characterized as "low grade" or "high grade." The risk of low grade dysplasia transforming into high grade dysplasia and, eventually, cancer is low. High grade dysplasia represents a more advanced progression towards malignant transformation, with increased risk of developing a carcinoma in situ.
Carcinoma in situ, meaning "cancer in place," represents generally the transformation of a neoplastic lesion to one in which cells undergo essentially no maturation, and thus may be considered cancer-like. In this state, cells are often considered to have lost their tissue identity, and have reverted to a primitive cell form that grows rapidly and without regulation. This form of cancer, however, often remains localized, and has not invaded into tissues below the surface.
Invasive carcinoma refers generally to a cancer that has invaded beyond the original tissue layer or basement membrane and may be able to spread to other parts of the body (i.e., metastasize). The molecular events involved in the development of squamous dysplasia and subsequent carcinoma are poorly understood.
According to certain embodiments, a subject may include any animal, any mammal, and particularly any human individual having, at risk for having, or suspected of having, a SCC tumor cell, such as a HNSCC tumor cell, an OSCC tumor cell, and/or a pre-neoplastic growth, such as a dysplastic cell. Such a subject may have previously undergone treatment for SCC and may be at risk for developing another case of SCC, or may be newly suspected of having SCC or SCC-related dysplasia. Subjects may be identified according to routine clinical techniques described herein and known in the art, including, for example, by clinical examination of the head and neck, skin, mouth, or other relevant tissue (see, e.g., Epstein et al., Can Fam Physician. 54:870-5, 2008; and Robinson et al., Otolaryngol Clin North Am. 39:295-306, 2006), and/or by various imaging technologies (see, e.g., Piatta et al., Acta Otorhinolaryngol Ital. 28:49-54, 2008; and Isles et al., Clin Otolaryngol. 33:210-22, 2008). Toluidine blue (vital staining) also provides a useful adjunct to clinical examination. The mechanism of vital staining is based on selective binding of the dye to dysplastic or malignant cells in the oral epithelium (e.g., Helsper, CA Cancer J. Clin. 22:172, 1972). According to non-limiting theory, it may be that toluidine blue selectively stains for acidic tissue components and thus binds more readily to DNA, which is increased in neoplastic cells. Vital staining can also help to determine the most appropriate biopsy sites and to surgically delineate margins. Diagnostic imaging evaluation, such as either computed tomography (CT) scanning or magnetic resonance imaging (MRI), may also be used to identify a subject at risk for SCC, and further to assess the extent of local and/or regional tumor spread, the depth of invasion, and the extent of lymphadenopathy.
CT is often considered superior in detecting early bone invasion and lymph node metastasis, but MRI is typically preferred for assessing the extent of soft tissue involvement and for providing a three-dimensional display of the tumor. MRI is also the preferred technique for imaging carcinoma of the nasopharynx or lesions involving paranasal sinuses or the skull base. Typical symptoms associated with SCC such as HNSCC or OSCC may include, for example, a sore on the lip or in the mouth that does not heal, a lump or thickening on the lips or gums or in the mouth, a white or red patch on the gums, tongue, tonsils, or lining of the mouth, bleeding, pain, or numbness in the lip or mouth, change in voice, loose teeth or dentures that no longer fit well, trouble chewing or swallowing or moving the tongue or jaw, swelling of jaw, and/or sore throat or feeling that something is caught in the throat.
Merely by way of non-limiting example, a subject having or suspected of having SCC may be identified by oral lesions that appear in areas of erythroplakia or leukoplakia, and which may be exophytic or ulcerated. Both the latter variants are typically indurated and firm with a rolled border. Tonsillar carcinoma in a subject usually presents as an asymmetric swelling and sore throat in which pain often radiates to the ipsilateral ear; a metastatic mass in the neck may be the first symptom. OSCC associated lesions are described in Detecting Oral Cancer, A Guide for Health Care Professionals, U.S. Department of Health and Human Services, National Institutes of Health, Bethesda, MD.
Risk factors for HNSCC and/or OSCC that may be used to identify a subject according to certain herein disclosed embodiments may include, for example, tobacco product use, heavy alcohol use, exposure to sunlight (e.g., lower lip SCC), being male, and being infected with human papillomavirus (HPV) or Epstein-Barr virus (EBV). Environmental exposures to paint fumes, plastic byproducts, wood dust, asbestos, and/or gasoline fumes have also been implicated as risk factors. Gastroesophageal reflux disease is thought to be a significant risk factor for cancer of the larynx, and especially the anterior two thirds of the vocal cords. Irritation from poorly fitting dentures also has been implicated. Additional risk factors for HNSCC or SCC are described, for example, in Napier et al. J Oral Pathol Med. 37:1-10, 2008.
Biological samples may be provided by obtaining a blood sample, biopsy specimen, tissue explant, organ culture or any other tissue or cell preparation from a subject or a biological source, including tissue extracts or lysates derived from biopsies, cell extracts or lysates, nucleic acid extracts (e.g., RNA or DNA), and/or protein extracts and/or biological fluids including body fluids. The subject or biological source may be a human or non-human animal, a primary cell culture or culture adapted cell line including but not limited to genetically engineered cell lines that may contain chromosomally integrated or episomal recombinant nucleic acid sequences, immortalized or immortalizable cell lines, somatic cell hybrid cell lines, differentiated or differentiatable cell lines, transformed cell lines and the like.
In certain preferred embodiments of the invention, the subject or biological source may be suspected of having or being at risk for having SCC, and in certain preferred embodiments of the invention the subject or biological source may be known to be free of a risk or presence of such a condition according to current art-accepted criteria with which the skilled person will be familiar.
A biological sample may include any type of cell-containing or cell- or tissue-derived sample that may be isolated, obtained, or derived from a subject and utilized to determine whether the cells from that subject show the differential SCCIGS expression profile that is characteristic of a SCC tumor cell as provided herein, or of a dysplastic cell as provided herein. In certain preferred embodiments, the expression level(s) may be determined for one or more SCC-related biomarker genes or SCCIGS as identified herein, using the methods described herein and molecular biology techniques as known in the art. A suitable biological sample (e.g., a biological sample as provided herein) typically may be suspected of comprising a SCC tumor cell or a dysplastic cell, such as a dysplastic epithelial cell. A biological sample may also include whole cells or fixed cells. Other typical sources of biological samples include cell cultures, as noted above, including but not limited to those in which gene expression states may be manipulated to explore the relationship among genes (including, e.g., SCCIGS).
Biopsy tissues may be obtained, for example, using surgical scalpels, needles, biopsy punches or other means, and typically can be performed under local anesthesia. Incisional biopsy typically refers to the removal of a representative sample of the lesion; excisional biopsy typically refers to the complete removal of the lesion, with a border of normal tissue. A clinician may obtain multiple biopsy specimens of suspicious lesions to define the extent of the primary disease and to evaluate the patient for the presence of possible synchronous second malignancies. Useful adjuncts to biopsies include vital staining, exfoliative cytology, fine needle aspiration biopsy, routine dental radiographs and other plain films, and imaging with magnetic resonance imaging (MRI) or computed tomography (CT). Biopsy tissues may include, by way of non-limiting example, excised tumors or suspected tumors, tumor-positive margin tissues, tumor-negative margin tissues, and/or close margin tissues. Margin tissues refer generally to SCC-related surgical margins, which in turn relate to the area of tissue around the clinical border of a SCC tumor that should be surgically removed to reduce the chance of tumor recurrence at the margins of skin excision. Merely by way of illustrative example, SCC surgical margins can range from about 3 mm to about 1 cm or more around the histologically established border of the SCC tumor, the size of which may be based in part on the staging by TNM classification to determine whether the tumor is considered a low-risk or high-risk tumor (see, e.g., Wittekind, Ch; Sobin, L. H. (2002). TNM classification of malignant tumours. New York: Wiley-Liss). High-grade tumors typically afford larger surgical margins, whereas low-grade tumors typically afford smaller surgical margins.
In certain embodiments contemplated herein, surgical margin tissues may be monitored for the differential expression of SCCIGS or SCC gene biomarkers after initial excision, such as for post-operative confirmation of tumor-negative margins, or during a follow-up period, such as for monitoring the potential recurrence of SCC tumor-positive cells in the surgical margin tissues (see, e.g., de Visscher et al., International Journal of Oral and Maxillofacial Surgery 31:154-157, 2002).
Examples of biological fluids include body fluids such as blood, serum and serosal fluids, plasma, lymph, urine, cerebrospinal fluid, saliva, mucosal secretions of the secretory tissues and organs, vaginal secretions, ascites fluids such as those associated with non-solid tumors, fluids of the pleural, pericardial, peritoneal, abdominal and other body cavities, and the like. Biological fluids may also include liquid solutions contacted with a subject or biological source, for example, cell and organ culture medium including cell or organ conditioned medium, lavage fluids and the like. In certain preferred embodiments the biological sample is saliva, and in certain other highly preferred embodiments the biological sample is blood or a fluid fraction thereof (e.g., serum, plasma), or lymph. In other preferred embodiments the biological sample is a cell-free liquid solution.
Certain embodiments of the present invention relate to the identification and use of one or more selected Squamous Cell Carcinoma Gene Sets (SCCIGS). Among other uses as may be apparent to a person skilled in the art based on the present disclosure, the SCCIGSs provided herein may be used to detect SCC in a subject, to identify a risk of developing SCC in a subject, and/or to monitor for the recurrence of SCC in a subject. SCCIGSs represent one or more biomarker genes, alone or in selected combinations (i.e., a biomarker gene set), that identify a risk for having, or the presence of, SCC in a subject when the expression levels of the SCCIGS in a suspected biological sample reflect differential expression (e.g., in a statistically significant manner) compared to the expression levels of the same SCCIGS in control epithelial cells that are known to be free of SCC cells.
A gene relates generally to a unit of inheritance that occupies a specific locus on a chromosome and includes transcriptional and/or translational regulatory sequences and/or a coding region (e.g., a polypeptide encoding region or a region encoding a structural RNA such as tRNA or rRNA, or a functional RNA such as a miRNA) and/or non-translated sequences (i.e., introns, 5' and 3' untranslated sequences). The individual genes in a given SCCIGS may be either over-expressed or under-expressed (i.e., statistically significantly higher or lower expression levels) in the biological sample comprising an SCC cell when compared to the control (non-cancer) cell. Typically, it is the differential expression of one or more selected SCCIGS that may identify the presence or risk of SCC in a subject.
In certain embodiments, a SCCIGS may be selected from the preferred SCCIGS models exemplified in Figure 3 (see SCCIGS models 1-10). In these and other embodiments, Figures 2 and 7 also provide a list of predictive SCC biomarker genes one or more of which may be selected to generate a suitable SCCIGS (see, e.g., SEQ ID NOS:1-200) for identifying the risk or presence of SCC in a subject.
The SCC biomarker genes and SCCIGS gene sets described herein were identified according to the exemplary processes described in Example 1 below and in Chen et al., Cancer Epidemiol Biomarkers Prev 17(8) Published Online July 30, 2008. As a brief summary of such an exemplary process, to identify potential biomarkers for early detection of invasive OSCC, the data provided herein were generated by comparing gene expression in samples of (i) incident primary OSCC, (ii) oral dysplasia, and (iii) clinically normal oral tissue from surgical patients without head and neck cancer or pre-neoplastic oral lesions (controls), using Affymetrix U133 2.0 Plus arrays. Selected differentially expressed probe sets and their corresponding biomarker genes were identified using a training set of 119 OSCC patients and 35 controls (see Figures 2 and 7; and SEQ ID NOS:1-200).
To identify certain preferred biomarker gene sets, or preferred SCCIGSs, forward and stepwise logistic regression analyses identified 10 successive combinations of genes the expression of which differentiated OSCC from controls. One preferred SCCIGS model included the LAMC2 gene, encoding laminin gamma 2 chain, and the COL4A1 gene, encoding type IV collagen, alpha 1 chain. Subsequent SCCIGS modeling without these two markers showed that differential expression of the COL1A1 gene, encoding type I collagen, alpha 1 chain, and of the PADI1 gene, encoding type 1 peptidyl arginine deiminase, can also distinguish OSCC from controls. These two models were validated using an internal independent testing set of 48 invasive OSCC and 10 controls and an external testing set of 42 head and neck squamous cell carcinoma (HNSCC) cases and 14 controls (GEO GSE6791), with sensitivity and specificity above 95% (see Figure 3). These two models were also able to distinguish oral epithelial dysplasia (n=17) from control (n=35) tissue. Differential expression of these four genes was confirmed by qRT-PCR (see Figure 4).
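As a purely illustrative sketch of how the sensitivity and specificity figures mentioned above are defined, the following computes both from true and predicted class labels. The function name and the example labels are hypothetical; they are not taken from the testing sets described herein and do not reproduce the disclosed analysis.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels.

    Labels are assumed to be 1 for a case (e.g., OSCC) and 0 for a control.
    Both classes must be present in `truth`, otherwise a division by zero occurs.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases called cases
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases called controls
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls called controls
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls called cases
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels, for illustration only.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
```

Sensitivity is the fraction of true cases that the model classifies as cases, and specificity is the fraction of true controls that it classifies as controls.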
Certain exemplary SCCIGSs that may be utilized according to the present methods include, for example, a SCCIGS that includes or consists essentially of the LAMC2 (SEQ ID NOS:18 and 19) and COL4A1 (SEQ ID NO:113) genes; a SCCIGS that includes or consists essentially of the COL1A1 (SEQ ID NOS:20 and 21) and PADI1 (SEQ ID NO:145) genes; a SCCIGS that includes or consists essentially of the C21orf81 gene (SEQ ID NOS:186 and 187); a SCCIGS that includes or consists essentially of the KRT17 (SEQ ID NO:59) and PRSS3 (SEQ ID NOS:123 and 124) genes; a SCCIGS that includes or consists essentially of the COL1A2 (SEQ ID NO:22) and EST 230740_1 at (SEQ ID NO:198) genes; a SCCIGS that includes or consists essentially of the COL1A1 and XLKD1 (SEQ ID NO:142) genes; a SCCIGS that includes or consists essentially of the THY1 (SEQ ID NOS:97 and 126), FLJ22671 (also referred to as C2orf54) (SEQ ID NO:143), and HAS3 (SEQ ID NOS:155 and 156) genes; a SCCIGS that includes or consists essentially of the POSTN (SEQ ID NOS:6 and 108) and TIA2 (also referred to as PDPN; SEQ ID NOS:264, 274, 52-55 and 150-153) genes; a SCCIGS that includes or consists essentially of the MGC40368 (also referred to as TCP11L2; SEQ ID NOS:3, 314), GIP3 (also referred to as IFI6; SEQ ID NOS:34-36, 206) and COL27A1 (SEQ ID NOS:157 and 158) genes; a SCCIGS that includes or consists essentially of the CDH3 (SEQ ID NO:24) and ELOVL6 (SEQ ID NO:109) genes; and a SCCIGS that includes or consists essentially of the COL4A1 (SEQ ID NO:113) gene (see Figures 3 and 7). Additional SCC biomarker genes providing strong predictive powers that may be utilized according to the present methods include those described in Figures 2 and 7.
Also included within the meaning of a SCC biomarker gene or SCCIGS are "variants" of the biomarker or SCC indicator genes described herein (see, e.g., SEQ ID NOS:1-200), such as splice variants, isoforms, allelic variants (same locus), homologs (different locus), and orthologs (different organism), or biological functional equivalents of such genes. Such variants may also include biomarker genes comprising polynucleotide sequences having 30, 40, 50, 60, 70, 75, 80, 85, 90, 95, 97, 98, 99% identity to a sequence set forth in SEQ ID NOS:1-200, and which are differentially expressed in a SCC cell as compared to a control cell known to be free of SCC. Such variants also encompass polynucleotide sequences that are distinguished from a reference polynucleotide by the addition (e.g., insertion), deletion or substitution of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45 or 50 nucleotides. Accordingly, the terms "polynucleotide variant" and "variant" include polynucleotides in which one or more nucleotides have been added, inserted or deleted, or replaced with different nucleotides.
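The percent identity values recited above can be illustrated with a minimal sketch. The text does not specify how identity is to be calculated, so the convention used here (matches divided by alignment length, over two sequences that have already been aligned to equal length, with gap characters counted as mismatches) is an assumption, and the function name and example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: identical, non-gap positions divided by the
    alignment length. A gap is written as '-' and never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-nucleotide sequences differing by one substitution.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```

Under this convention a single substitution in a 10-nucleotide alignment gives 90% identity. Other conventions (for example, excluding gapped columns from the denominator) would give different values for gapped alignments.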
A variety of methods known in the art and described herein may be utilized to determine an SCCIGS expression level, according to the methods provided herein. In certain aspects, measurements of SCCIGS expression levels may include determination, in a biological sample from a subject as provided herein, of ribonucleic acid (RNA) and/or protein abundances, or protein activity levels. For example, the expression levels of a biomarker gene or gene set, such as a SCCIGS, may be determined according to the RNA transcript levels of the individual genes within that gene set, such as by measuring the levels of a specific mRNA that is the transcription product of a given SCCIGS gene. RNA transcripts may include, but are not limited to, pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s), and degradation products, in addition to nucleic acid amplification products of such sequences (e.g., cDNAs).
Methods for isolating total mRNA, and for determining levels of specific mRNA transcripts therein, are well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids from a biological sample are described in detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier, N.Y. (1993). See also Ausubel et al. (Eds.), 2007 Curr. Protocols Molec. Biol., Wiley, N.Y.
RNA levels may be determined according to techniques known in the art and exemplified herein. As non-limiting examples, RNA levels can be measured by utilizing arrays, such as RNA microarray-based techniques known in the art (see, e.g., Goley et al., BMC Cancer 4:20, 2004, performing RNA microarray analysis on needle core biopsies of tumors), and described herein (see Example 1), or by utilizing quantitative reverse-transcriptase polymerase chain reactions (qRT-PCR) (see Example 2). As other examples, RNA levels can be determined by relying on other quantitative RNA assays known in the art (see, e.g., Lee et al., Analytical Biochemistry 357:299-301, 2006). RNA levels may, for instance, be determined by reverse transcribing the mRNA transcript of a given gene to form a cDNA molecule, optionally amplifying the cDNA molecule, and measuring the levels of the DNA molecule, such as by quantitative real-time PCR (qRT-PCR, see, e.g., VanGuilder et al., Biotechniques. 44:619-26, 2008).
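By way of illustration only, one widely used convention for converting qRT-PCR threshold cycle (Ct) values into a relative expression level is the comparative Ct calculation sketched below. The text above does not state which quantification scheme is used, so this choice is an assumption, as are the function name, the use of a single reference gene, the assumption of approximately 100% amplification efficiency, and the example Ct values.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample relative to a control.

    Assumed convention (comparative Ct): each target Ct is first normalized
    to a reference gene measured in the same specimen, and the template is
    assumed to double with every PCR cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalized sample
    delta_control = ct_target_control - ct_ref_control   # normalized control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold three cycles
# earlier in the sample than in the control after normalization.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
```

A value above 1 indicates higher relative expression in the sample than in the control, and a value below 1 indicates lower relative expression.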
Examples of other useful techniques for determining the amount of nucleic acid target sequences (e.g., a mRNA transcript of a biomarker gene) present in a sample based on specific hybridization of an oligonucleotide primer or probe to the target sequence include specific amplification of target nucleic acid sequences and quantification of amplification products, including but not limited to polymerase chain reaction (PCR, Gibbs et al., Nucl. Ac. Res. 17:2437, 1989), transcriptional amplification systems, strand displacement amplification and self-sustained sequence replication (3SR, Gingeras et al., J. Infect. Dis. 164:1066, 1991). Examples of other useful techniques include ligase chain reaction (e.g., Landegren et al., Science 241:1077, 1988; Nickerson et al., Proc. Natl. Acad. Sci. USA 87:8923, 1990; Barany, Proc. Natl. Acad. Sci. USA 88:189, 1991; Wu et al., Genomics 4:560, 1989), cycled probe technology and solid-phase DNA-binding assays such as those disclosed in U.S. Patent No. 6,340,566, as well as other suitable methods that will be known to those familiar with the art.
In situ hybridization (ISH) using oligonucleotide probes also represents a widely used technique to measure mRNA levels, and thereby determine gene expression levels in tissue samples. ISH may be employed in respect to distribution as well as in respect to quantification of gene expression levels of a SCC biomarker (see, e.g., Erdtmann-Vourliotis et al., Brain Research Protocols. 4:82-91, 1999).
Protein levels may also be determined according to certain embodiments, to determine the expression level(s) of an individual gene or genes in an SCCIGS, and thereby determine the SCCIGS expression level. Protein levels may be measured either directly, such as by measuring the amount of protein in an extract, cell, tissue, or other biological sample, or indirectly, such as by measuring the amount of protein activity in a biological sample.
Protein levels may be measured from cell or tissue extracts, or from whole cells or tissues. Protein extraction from cell samples may be performed according to any of a number of methodologies with which the skilled person will be familiar. For instance, depending on whether the collection of particular cell fractions is desired, cell samples may be typically lysed using one or more of a hypotonic buffer, urea, a chaotrope (e.g., guanidine-HCl) and buffers containing various detergents, such as Nonidet™ NP-40, Triton™ X-100, Tween™-20, alkyl glucosides, betaine-containing surfactants, sodium dodecyl sulfate (SDS) or other detergents recognized in the art for this purpose, and further processed to be compatible with an intended assay. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001, Cold Spring Harbor Press, Cold Spring Harbor, NY). For protein level analysis in whole cells or tissues, cells or tissues can be either placed in an appropriate analysis buffer for live analysis, or fixed with various fixation agents, such as formaldehyde, paraformaldehyde, methanol, or ethanol, among others, followed by further processing according to the requirements of the intended assay (e.g., flow cytometry, immunohistochemistry, etc.).
Methods for measuring protein levels are known in the art. For example, protein or proteomic arrays may be used in certain embodiments, such as antibody microarrays, in which antibodies specific for one or more proteins of interest are spotted onto a protein chip and are used as capture molecules to detect and quantify the proteins from biological samples, such as cell lysate solutions (see generally, Jones et al. Nature 439:168-174, 2006; and Chen et al. Curr Opin Chem Biol 10:28-34, 2006 for protein arrays). Antibodies specific for those proteins expressed by the SCC biomarker genes or gene sets described herein may be generated using techniques known in the art (see, e.g., Harlow and Lane, eds. (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; Freshney, (Ed.), Culture of Animal Cells-5th Ed., 2005, Wiley-Liss, NY; Masters (Ed.), Animal Cell Culture-3rd Edition, 2000 Oxford Univ. Press, NY).
Other examples of proteomic array based techniques that may be applied to clinical tissue samples include tissue microarrays and surface-enhanced laser desorption/ionization (SELDI-TOF) (see, e.g., Bertucci et al., Mol Cell Proteomics. 5:1772-86, 2006; and Bollard et al., Proteomics: Clinical Applications, 1:934-954, 2007). Certain aspects may employ other techniques to measure protein levels, including, but not limited to, western blotting, radio-immunoprecipitation, proteomics, and flow cytometry (see, e.g., Prinz et al. Proteomics. 8:1179-96, 2008; and Peterson et al., Toxicol Pathol. 36:117-32, 2008). Immunohistochemistry-based assessment of protein expression also provides a natural validation method of expression-profiling data that is easily performed on tissue samples (see, e.g., Sullivan et al., Clin Colorectal Cancer. 7:172-177, 2008). A person skilled in the art will appreciate that the above-described methods are merely exemplary, and that any quantitative or semiquantitative methods to determine gene expression levels may be employed to practice the methods provided herein.
Generally, whether used to detect nucleic acid levels (e.g., mRNA levels, cDNA levels, etc.) or protein levels, an array typically comprises a solid support with peptide or nucleic acid-based probes attached covalently or non-covalently to the support, but the presently contemplated embodiments need not be so limited and may also encompass assays based on fluid-phase interactions. In certain embodiments arrays typically comprise a plurality of different nucleic acid or peptide probes that are coupled to a surface of a substrate in discrete, known locations. These arrays, also described as "microarrays" or "chips," have been generally described in the art, for example, in U.S. Pat. Nos. 5,143,854, 5,445,934, 5,744,305, 5,677,195, 6,040,193, 5,424,186 and Fodor et al., Science, 251:767-777 (1991). Among other methods known in the art, these and similar arrays may be produced using mechanical synthesis methods or light directed synthesis methods, incorporating a combination of photolithographic methods and solid phase synthesis methods. Techniques for the synthesis of these arrays using mechanical synthesis methods are described, for example, in U.S. Pat. Nos. 5,384,261 and 6,040,193. Arrays may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. In certain embodiments, an array is fabricated on a planar array surface. Arrays may in other embodiments take the form of peptides or nucleic acids on beads, gels, polymeric surfaces, fibers such as fiber optics, glass or any other appropriate substrate, see U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992.
When utilizing RNA arrays or microarrays, quantitative real-time polymerase chain reaction (qRT-PCR), or other nucleic acid based protocols to determine the expression levels of a SCCIGS in a biological sample, certain embodiments may include the use of selected oligonucleotide probes.
Oligonucleotide probes refer generally to polymers composed of a multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or related structural variants or synthetic analogues thereof). Thus, while the term oligonucleotide probe typically refers to a nucleotide polymer in which the nucleotide residues and linkages between them are naturally occurring, it will be understood that the term also includes within its scope various analogues including, but not restricted to, peptide nucleic acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl ribonucleic acids, and the like.
The exact size of an oligonucleotide probe can vary depending on the particular application. An oligonucleotide is typically rather short in length, generally from about 10 to 30 nucleotide residues (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotide residues), but the term can refer to molecules of any length, including oligonucleotides from about 30 to about 100 or more nucleotide residues in length (e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more, including all integers in between). Among other features, the length of an oligonucleotide probe may affect its ability specifically to bind or hybridize to its intended target sequence (e.g., a complementary sequence, according to well-established principles of Watson-Crick base pairing), which refers generally to its ability to bind more strongly to its intended target sequence than to any other sequences in a given sample, and, thus, to discriminate between its intended target and the other sequences present in the sample. In certain preferred embodiments, an oligonucleotide probe may be about 25 nucleotide residues in length, including, for example, certain Affymetrix® probes (see Example 1).
Oligonucleotide probes may be selected or designed according to routine techniques known in the art and described herein for their ability to specifically bind or hybridize to an intended target sequence. In particular, polynucleotides or oligonucleotide probes may be designed or selected to specifically hybridize or bind to the individual genes in certain preferred SCCIGSs (see Figure 3), or may be designed or selected to specifically hybridize or bind to one or more SCC gene biomarkers, as described herein (see Figures 2 and 7; and SEQ ID NOS:1-200). A polynucleotide or oligonucleotide probe may also be selected to specifically bind or hybridize to variants, whether naturally-occurring or otherwise (e.g., allelic variants, splice variants), of the SCC biomarker genes described herein. To specifically bind to its intended target sequence, an oligonucleotide probe typically comprises a polynucleotide sequence that is complementary to at least a portion of the polynucleotide sequence of a target gene, e.g., the exemplary target genes of SEQ ID NOS:1-200. For polynucleotides, complementary or complementarity refers generally to polynucleotides related by the well-known nucleotide base-pairing rules. For example, the sequence "A-G-T," is complementary to the sequence "T-C-A." Complementarity may be "partial," in which only some of the nucleic acids' bases are matched according to the base pairing rules, or there may be "full" or "total" complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. By "corresponds to" or "corresponding to" is typically meant a polynucleotide having a nucleotide sequence that is substantially identical or complementary to all or a portion of a reference polynucleotide sequence.
In certain preferred embodiments, an oligonucleotide probe is fully complementary to at least a portion of the polynucleotide sequence of a target gene, since simultaneous consideration of the percent similarity (<90%), the length of identical sequence stretches (<20 bases), and the binding free energy (>-35 kcal/mol) may be predictive of probe specificity (see, e.g., Liebich et al., Appl Environ Microbiol. 72:1688-1691, 2006). A polynucleotide or probe is fully complementary when there are no base mismatches between the probe and the relevant portion of the target sequence.
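The distinction between partial and full complementarity can be illustrated with a minimal sketch that counts base mismatches between a probe and a target region of the same length. The function names are hypothetical, only the four unmodified DNA bases are handled, and both sequences are assumed to be written 5' to 3', so that the probe pairs with the target in antiparallel orientation.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the Watson-Crick partner strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(probe, target_region):
    """Number of positions at which the probe fails to pair with the target."""
    if len(probe) != len(target_region):
        raise ValueError("probe and target region must have the same length")
    expected = reverse_complement(target_region)
    return sum(1 for p, e in zip(probe.upper(), expected) if p != e)


def is_fully_complementary(probe, target_region):
    """True when there are no base mismatches (full complementarity)."""
    return count_mismatches(probe, target_region) == 0


# "AGT" pairs with "TCA" read 3' to 5', which is "ACT" written 5' to 3'.
partner = reverse_complement("AGT")
```

A mismatch count of zero corresponds to full complementarity, and any count greater than zero corresponds to partial complementarity over the compared region.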
In designing an oligonucleotide probe or other polynucleotide sequence that is complementary to the polynucleotide sequence of the target gene, certain characteristics of such probes or sequences may be considered to optimize the ability of the probe to specifically hybridize and detect the target sequence. For example, to avoid false positives, if there is substantial sequence information available for a given source organism (e.g., human) or cell type (e.g., oral epithelial cell), oligonucleotide probes may be chosen that are not similar to any other expressed sequences in that organism or cell type.
It is also known in the art that certain polynucleotide sequences, such as those containing inverted repeats, may be able to self-hybridize and form secondary structures that interfere with specific detection of target sequences. Typically, such sequences may be avoided to improve probe specificity.
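One simple way to flag candidate probes that contain inverted repeats is sketched below. It reports every position at which a short stretch of the sequence is followed, after a minimum loop length, by its own reverse complement. The function names, the default stem length of 4 and the minimum loop of 3 nucleotides are illustrative assumptions, not parameters taken from the present disclosure, and exact matching is used rather than any thermodynamic model.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the Watson-Crick partner strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_inverted_repeats(seq, stem=4, min_loop=3):
    """Return (i, j) pairs where seq[i:i+stem] could pair with seq[j:j+stem].

    j is always at least `min_loop` nucleotides downstream of the end of
    the first arm, so the two arms could fold back into a hairpin stem.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        partner = reverse_complement(seq[i:i + stem])
        j = seq.find(partner, i + stem + min_loop)
        while j != -1:
            hits.append((i, j))
            j = seq.find(partner, j + 1)
    return hits


# Hypothetical sequence: arm GACC, loop TTTT, arm GGTC (reverse complement of GACC).
hairpin_hits = find_inverted_repeats("GACCTTTTGGTC")
```

A candidate probe for which this search returns no hits at the chosen stem length contains no exact inverted repeat of that length, although shorter or imperfect repeats could still be present.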
As another useful feature of probe design, in certain embodiments oligonucleotide probes of "high complexity," as opposed to probes of "low complexity," may provide more specific target sequence detection. One example of a probe with low complexity includes "AAAAAAA GGAGTTTTTTTT CAAAAAACTTTTT AAAAAAGCTTT" (SEQ ID NO:332). One example of a probe with higher complexity includes "CGTGACTGA CAGCTGACTGC TAGCCATGCAAC" (SEQ ID NO:333). A fast and flexible approach to oligonucleotide probe design for genomes and gene families may be found, for example, in Feng et al. (Bioinformatics 23:1195-1202, 2007) and He et al. (Applied and Environmental Microbiology, 71:3753-3760, 2005). One example of rational oligonucleotide probe design for in situ hybridization protocols may be found in Erdtmann-Vourliotis et al. (Brain Research Protocols. 4:82-91, 1999).
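The text does not define a numerical measure of probe complexity. As an illustrative assumption, the sketch below uses two simple proxies, the Shannon entropy of the k-mer composition and the length of the longest single-nucleotide run, and applies them to the two example probes quoted above. The function and variable names are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby


def shannon_entropy(seq, k=1):
    """Shannon entropy (bits) of the k-mer composition of a sequence."""
    seq = seq.upper().replace(" ", "")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def longest_homopolymer_run(seq):
    """Length of the longest run of a single repeated nucleotide."""
    seq = seq.upper().replace(" ", "")
    return max(len(list(group)) for _, group in groupby(seq))


# The two example probes quoted above (spaces are ignored).
LOW = "AAAAAAA GGAGTTTTTTTT CAAAAAACTTTTT AAAAAAGCTTT"
HIGH = "CGTGACTGA CAGCTGACTGC TAGCCATGCAAC"

low_entropy, high_entropy = shannon_entropy(LOW), shannon_entropy(HIGH)
low_run, high_run = longest_homopolymer_run(LOW), longest_homopolymer_run(HIGH)
```

By both proxies the first probe scores as less complex than the second: its nucleotide composition has lower entropy and it contains longer single-nucleotide runs.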
Certain exemplary oligonucleotide probes that may be used in the methods described herein include the Affymetrix oligonucleotide probes described in SEQ ID NOS:201-331 (see, e.g., Figures 2 and 8). Also contemplated are polynucleotide variants of these probes that are capable of specifically hybridizing or binding to a SCC biomarker gene, including one or more genes identified as a SCCIGS. Polynucleotide variants refer to either a polynucleotide that displays substantial sequence identity with a reference polynucleotide sequence (e.g., at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% identity to SEQ ID NOS:201-331) or a polynucleotide that hybridizes with a reference sequence, or its complementary sequence, under moderate or stringent conditions that are described hereinafter.
Oligonucleotide probes may be modified according to techniques known in the art, such as to improve stability or facilitate detection. For example, oligonucleotides probes may be modified by directly attaching thereto one or more detectable molecules, as described below. Oligonucleotide probes may also be modified by attaching thereto one or more ligand molecules, such as biotin, that may be used to indirectly attach a detectable molecule, such as a detectable molecule that is bound to one or more avidin molecules.
A variety of detectable molecules may be used to render an oligonucleotide probe detectable, such as radioisotopes, fluorochromes, dyes, enzymes, nanoparticles, chemiluminescent markers, biotin, or other monomer known in the art that can be detected directly (e.g., by light emission) or indirectly (e.g., by binding of a fluorescently-labeled antibody).
Radioisotopes provide examples of detectable molecules that can be utilized in certain aspects of the present invention. Several radioisotopes can be used as detectable molecules for labeling nucleotides or proteins, including, for example, 32P, 33P, 35S, 3H, and 125I. These radioisotopes have different half-lives, types of decay, and levels of energy which can be tailored to match the needs of a particular protocol. For example, 3H is a low energy emitter which results in low background levels; however, this low energy also results in long time periods for autoradiography. Radioactively labeled ribonucleotides, deoxyribonucleotides and amino acids are commercially available. Nucleotides are available that are radioactively labeled at the first, or α, phosphate group, or the third, or γ, phosphate group. For example, both [α-32P] dATP and [γ-32P] dATP are commercially available. In addition, different specific activities for radioactively labeled nucleotides are also available commercially and can be tailored for different protocols.
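The practical effect of the different half-lives can be illustrated with a short decay calculation. The half-life values below are approximate figures from general reference sources rather than from the present disclosure, and the dictionary and function names are hypothetical.

```python
# Approximate half-lives in days (assumed reference values, not from this text).
APPROX_HALF_LIFE_DAYS = {
    "32P": 14.3,
    "33P": 25.3,
    "35S": 87.4,
    "125I": 59.4,
    "3H": 12.3 * 365.25,  # roughly 12.3 years
}


def remaining_fraction(isotope, elapsed_days):
    """Fraction of the original activity remaining after `elapsed_days`."""
    half_life = APPROX_HALF_LIFE_DAYS[isotope]
    return 0.5 ** (elapsed_days / half_life)


# Activity of a 32P-labeled probe after one half-life.
after_one_half_life = remaining_fraction("32P", 14.3)
```

On these assumed values, a 32P label loses half of its activity in about two weeks, whereas a 3H label is essentially unchanged over the same period.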
Other examples of detectable molecules that can be utilized to detect an oligonucleotide probe include fluorophores. Several fluorophores can be used for labeling nucleotides including, for example, fluorescein, tetramethylrhodamine, Texas Red, and a number of others (e.g., Haugland, Handbook of Fluorescent Probes - 9th Ed., 2002, Molec. Probes, Inc., Eugene OR; Haugland, The Handbook: A Guide to Fluorescent Probes and Labeling Technologies-10th Ed., 2005, Invitrogen, Carlsbad, CA). Non-radioactive and non-fluorescent detectable molecules are also available. As noted above, biotin can be attached directly to nucleotides and detected by specific and high affinity binding to avidin or streptavidin which has been chemically coupled to an enzyme catalyzing a colorimetric reaction (such as phosphatase, luciferase, or peroxidase). Digoxigenin labeled nucleotides can also similarly be used for non-isotopic detection of nucleic acids. Biotinylated and digoxigenin-labeled nucleotides are commercially available.
Very small particles, termed nanoparticles, also can be used to label oligonucleotide probes. These particles range from 1-1000 nm in size and include diverse chemical structures such as gold and silver particles and quantum dots. When irradiated with angled incident white light, silver or gold nanoparticles ranging from 40-120 nm will scatter monochromatic light with high intensity. The wavelength of the scattered light is dependent on the size of the particle. Four to five different particles in close proximity will each scatter monochromatic light, which when superimposed will give a specific, unique color. The particles are being manufactured by companies such as Genicon Sciences (Carlsbad, CA). Derivatized silver or gold particles can be attached to a broad array of molecules including proteins, antibodies, small molecules, receptor ligands, and nucleic acids. For example, the surface of the particle can be chemically derivatized to allow attachment to a nucleotide.
Other types of nanoparticles that can be used for detection of a detectable molecule include quantum dots. Quantum dots are fluorescing crystals 1-5 nm in diameter that are excitable by light over a large range of wavelengths. Upon excitation by light having an appropriate wavelength, these crystals emit light, such as monochromatic light, with a wavelength dependent on their chemical composition and size. Quantum dots such as CdSe, ZnSe, InP, or InAs possess unique optical properties; these and similar quantum dots are available from a number of commercial sources (e.g., NN-Labs, Fayetteville, AR; Ocean Nanotech, Fayetteville, AR; Nanoco Technologies, Manchester, UK; Sigma-Aldrich, St. Louis, MO).
Many dozens of classes of particles can be created according to the number of size classes of the quantum dot crystals. The size classes of the crystals are created either 1) by tight control of crystal formation parameters to create each desired size class of particle, or 2) by creation of batches of crystals under loosely controlled crystal formation parameters, followed by sorting according to desired size and/or emission wavelengths. Two examples of references in which quantum dots are embedded within intrinsic silicon epitaxial layers of semiconductor light emitting/detecting devices are United States Patent Nos. 5,293,050 and 5,354,707 to Chapple-Sokol et al.
In certain embodiments, oligonucleotide probes may be labeled with one or more light-emitting dyes. The light emitted by the dyes can be visible light or invisible light, such as ultraviolet or infrared light. In exemplary embodiments, the dye may be a fluorescence resonance energy transfer (FRET) dye; a xanthene dye, such as fluorescein and rhodamine; a dye that has an amino group in the alpha or beta position (such as a naphthylamine dye, 1-dimethylaminonaphthyl-5-sulfonate, 1-anilino-8-naphthalene sulfonate and 2-p-toluidinyl-6-naphthalene sulfonate); a dye that has 3-phenyl-7-isocyanatocoumarin; an acridine, such as 9-isothiocyanatoacridine and acridine orange; a pyrene, a benzoxadiazole and a stilbene; a dye that has 3-(ε-carboxypentyl)-3'-ethyl-5,5'-dimethyloxacarbocyanine (CYA); 6-carboxy fluorescein (FAM); 5&6-carboxyrhodamine-110 (R110); 6-carboxyrhodamine-6G (R6G); N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA); 6-carboxy-X-rhodamine (ROX); 6-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein (JOE); ALEXA Fluor™; Cy2; Texas Red and Rhodamine Red; 6-carboxy-2',4,7,7'-tetrachlorofluorescein (TET); 6-carboxy-2',4,4',5',7,7'-hexachlorofluorescein (HEX); 5-carboxy-2',4',5',7'-tetrachlorofluorescein (ZOE); NAN; NED; Cy3; Cy3.5; Cy5; Cy5.5; Cy7; and Cy7.5; Alexa Fluor 350; Alexa Fluor 488; Alexa Fluor 532; Alexa Fluor 546; Alexa Fluor 568; Alexa Fluor 594; or Alexa Fluor 647.
A detectable molecule can be directly attached to a nucleotide using methods well known in the art. Nucleotides can also be chemically modified or derivatized in order to attach a detectable molecule. For example, a fluorescent monomer such as a fluorescein molecule can be attached to dUTP (deoxyuridine-triphosphate) using a four-atom aminoalkynyl group. In this example, each detectable molecule may be attached to a nucleotide making a detectable molecule: nucleotide complex. Amine-reactive and thiol-reactive fluorophores are available and may be used for labeling nucleotides and biomolecules.
As one example, nucleotides may be fluorescently labeled during chemical synthesis, since incorporation of amines or thiols during nucleotide synthesis permits addition of fluorophores. Fluorescently labeled nucleotides are commercially available. For example, uridine and deoxyuridine triphosphates are available that are conjugated to ten different fluorophores that cover the spectrum. Fluorescent dyes that can be bound directly to nucleotides can also be utilized as detectable molecules. For example, FAM, JOE, TAMRA, and ROX are amine-reactive fluorescent dyes that have been attached to nucleotides and are used in automated DNA sequencing. These fluorescently labeled nucleotides, for example, ROX-ddATP, ROX-ddCTP, ROX-ddGTP and ROX-ddUTP, are commercially available.
As noted herein, the terms specifically binds or specifically hybridizes refer generally to an oligonucleotide probe or polynucleotide sequence that not only binds to its intended target gene sequence in a sample under selected hybridization conditions, but does not bind significantly to other target sequences in the sample, and thereby discriminates between its intended target and all other targets in the target pool. A probe that specifically hybridizes to its intended target sequence may also detect concentration differences under the selected hybridization conditions.
An intended target sequence refers typically to a polynucleotide or nucleic acid sequence, which refers generally to mRNA, RNA, cRNA, cDNA or DNA (i.e., polymeric forms of nucleotides of at least 10 bases in length, either ribonucleotides or deoxynucleotides or a modified form of either type of nucleotide). In certain embodiments, an oligonucleotide probe may specifically bind or specifically hybridize to at least a portion of one polynucleotide having a sequence selected from SEQ ID NOS:1-200, including variants thereof (e.g., allelic variants, splice variants, etc.) that is differentially expressed in SCC cells (e.g., OSCC) compared to control cells that are known to be free of SCC cells. In certain embodiments, an oligonucleotide probe specifically hybridizes to one or more of a polynucleotide having all or a SCCIG-characteristic portion of a SCCIG transcript (e.g., mRNA transcript of a SCC biomarker gene, or a fragment thereof), a polynucleotide having a nucleotide sequence that is fully complementary to a SCCIG-characteristic portion of a SCCIG transcript, and/or a nucleic acid amplification product of the above noted polynucleotides (e.g., cDNA of a SCC biomarker gene or SCCIGS, or fragment thereof). A nucleic acid amplification product may be obtained from a biological sample, for example, by performing RT-PCR on a sample cell extract that contains a polynucleotide, such as an mRNA transcript, having all or a SCCIG-characteristic portion of a SCCIG transcript.
A SCCIG-characteristic portion of a SCCIG transcript refers to a segment, stretch, domain, region, portion or the like of any one of the polynucleotides set forth as SEQ ID NOS:1-200 (or the full complement thereof), which comprises less than the full-length polynucleotide of the respective one of SEQ ID NOS:1-200, and which has a nucleotide sequence that is unique to that particular sequence among all polynucleotide transcript sequences found in the species from which the SCCIG set is obtained (e.g., the human transcriptome) such that an oligonucleotide probe that hybridizes specifically to the SCCIG-characteristic portion does not exhibit full complementarity to any other transcript in the subject transcriptome. Accordingly, for example, a SCCIG-characteristic portion of any one of SEQ ID NOS:1-200 may be derived from the biomarker gene sequences set forth in SEQ ID NOS:1-200, or from variants thereof. Nucleic acid hybridization conditions include those described herein and known in the art for controlled, detectable annealing of a first oligonucleotide or polynucleotide sequence to a second oligonucleotide or polynucleotide sequence (see Examples 1 and 2), and will often vary depending on the particular application. The term "hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions" refers generally to conditions for hybridization and washing. Guidance for performing hybridization reactions can be found in Sections 6.3.1-6.3.6 of Ausubel et al., ("Current Protocols in Molecular Biology", John Wiley & Sons Inc, 1994-1998, Chapter 15). Aqueous and non-aqueous methods are described in that reference, and either class of method can be used according to embodiments contemplated herein.
Low stringency conditions referred to herein may include and encompass from at least about 1% v/v to at least about 15% v/v formamide and from at least about 1 M to at least about 2 M salt for hybridization at 42°C, and at least about 1 M to at least about 2 M salt for washing at 42°C. Low stringency conditions also may include 1% Bovine Serum Albumin (BSA), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65°C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at room temperature. One embodiment of low stringency conditions includes hybridization in 6 x sodium chloride/sodium citrate (SSC) at about 45°C, followed by two washes in 0.2 x SSC, 0.1% SDS at least at 50°C (the temperature of the washes can be increased to 55°C for low stringency conditions). In certain embodiments, medium stringency conditions may include and encompass from at least about 16% v/v to at least about 30% v/v formamide and from at least about 0.5 M to at least about 0.9 M salt for hybridization at 42°C, and at least about 0.1 M to at least about 0.2 M salt for washing at 55°C. Medium stringency conditions also may include 1% Bovine Serum Albumin (BSA), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65°C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at 60-65°C. One embodiment of medium stringency conditions includes hybridizing in 6 x SSC at about 45°C, followed by one or more washes in 0.2 x SSC, 0.1% SDS at 60°C.
In certain embodiments, high stringency conditions may include and encompass from at least about 31% v/v to at least about 50% v/v formamide and from about 0.01 M to about 0.15 M salt for hybridization at 42°C, and about 0.01 M to about 0.02 M salt for washing at 55°C. High stringency conditions also may include 1% BSA, 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65°C, and (i) 0.2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 1% SDS for washing at a temperature in excess of 65°C. One embodiment of high stringency conditions includes hybridizing in 6 x SSC at about 45°C, followed by one or more washes in 0.2 x SSC, 0.1% SDS at 65°C. One embodiment of very high stringency conditions includes hybridizing in 0.5 M sodium phosphate, 7% SDS at 65°C, followed by one or more washes in 0.2 x SSC, 1% SDS at 65°C.
Other stringency conditions are well known in the art and skilled persons will recognize that various factors can be manipulated to optimize the specificity of the hybridization. Optimization of the stringency of the final washes can serve to ensure a high degree of hybridization. For detailed examples, see Ausubel et al., supra at pages 2.10.1 to 2.10.16 and Sambrook et al. (Molecular Cloning: A Laboratory Manual (3rd Edition, 2001)), and Maniatis et al. (Molecular Cloning: A Laboratory Manual (1982)). While stringent washes are typically carried out at temperatures from about 42°C to about 68°C, one skilled in the art will appreciate that other temperatures may be suitable for stringent conditions. Maximum hybridization rate typically occurs at about 20°C to 25°C below the Tm for formation of a DNA-DNA hybrid. It is well known in the art that the Tm is the melting temperature, or temperature at which two complementary polynucleotide sequences dissociate. Methods for estimating Tm are well known in the art (see Ausubel et al., supra at page 2.10.8).
In general, the Tm of a perfectly matched duplex of DNA may be predicted as an approximation by the formula: Tm = 81.5 + 16.6 (log10 M) + 0.41 (%G+C) - 0.63 (% formamide) - (600/length) wherein: M is the concentration of Na+, preferably in the range of 0.01 molar to 0.4 molar; %G+C is the sum of guanosine and cytosine bases as a percentage of the total number of bases, within the range between 30% and 75% G+C; % formamide is the percent formamide concentration by volume; length is the number of base pairs in the DNA duplex. The Tm of a duplex DNA decreases by approximately 1°C with every increase of 1% in the number of randomly mismatched base pairs. Washing is generally carried out at Tm - 15°C for high stringency, or Tm - 30°C for moderate stringency.
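As an illustration only, the approximation above can be evaluated with a short script. The following is a minimal sketch in Python; the function names (estimate_tm, mismatch_adjusted_tm, wash_temperature) are hypothetical and chosen for this example, and the script simply applies the stated formula, the approximately 1°C-per-1%-mismatch adjustment, and the Tm - 15°C and Tm - 30°C wash offsets without checking that the inputs fall within the preferred ranges.

```python
import math


def estimate_tm(na_molar, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (deg C) of a perfectly matched DNA duplex.

    Applies: Tm = 81.5 + 16.6*log10(M) + 0.41*(%G+C)
                  - 0.63*(% formamide) - 600/length
    The text prefers M in 0.01-0.4 molar and %G+C in 30-75%;
    this sketch does not enforce those ranges.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_bp)


def mismatch_adjusted_tm(tm, mismatch_percent):
    """Lower Tm by about 1 deg C per 1% randomly mismatched base pairs."""
    return tm - 1.0 * mismatch_percent


def wash_temperature(tm, stringency="high"):
    """Wash at Tm - 15 deg C (high) or Tm - 30 deg C (moderate stringency)."""
    offsets = {"high": 15.0, "moderate": 30.0}
    return tm - offsets[stringency]


if __name__ == "__main__":
    # Hypothetical inputs: 0.1 M Na+, 50% G+C, no formamide, 100 bp duplex.
    tm = estimate_tm(0.1, 50.0, 0.0, 100)
    print("Estimated Tm:", round(tm, 1))
    print("High stringency wash:", round(wash_temperature(tm, "high"), 1))
```

The wash offsets are kept in a small lookup table so that other stringency levels could be added without changing the function body.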
In one example of a hybridization procedure, a membrane (e.g., a nitrocellulose membrane or a nylon membrane) or chip containing immobilized DNA is hybridized overnight at 42°C in a hybridization buffer (50% deionized formamide, 5 x SSC, 5 x Denhardt's solution (0.1% ficoll, 0.1% polyvinylpyrrolidone and 0.1% bovine serum albumin), 0.1% SDS and 200 mg/mL denatured salmon sperm DNA) containing a labeled probe. The membrane is then subjected to two sequential medium stringency washes (i.e., 2 x SSC, 0.1% SDS for 15 min at 45°C, followed by 2 x SSC, 0.1% SDS for 15 min at 50°C), followed by two sequential higher stringency washes (i.e., 0.2 x SSC, 0.1% SDS for 12 min at 55°C followed by 0.2 x SSC and 0.1% SDS solution for 12 min at 65-68°C).
For RNA microarrays, hybridization can be performed according to the Examples provided herein (see Example 1) or per other protocols known in the art. For example, a cRNA may be hybridized to an Affymetrix U133 2.0 Plus GeneChip array and scanned using an Affymetrix GeneChip array Scanner 3000 7G per Affymetrix protocols (Affymetrix Corp., Santa Clara, CA). For PCR-based protocols, such as qRT-PCR, hybridization can be performed according to the Examples provided herein (see Example 2) or per other protocols known in the art (see, e.g., Skrzypski et al., Lung Cancer. 59:147-54, 2008; and Bustin et al., Clin Sci (Lond). 109:365-79, 2005). The use of particular hybridization temperatures and PCR-based reaction buffers according to the characteristics of one or more selected oligonucleotide probe(s) is well known in the art (see, e.g., Sambrook et al., supra). A person skilled in the art will appreciate that these and other similar protocols may be used to specifically hybridize an oligonucleotide probe to its intended target sequence, and to measure the biomarker gene or SCCIGS expression levels therefrom.
In certain embodiments, the methods of identifying a risk for having, or presence of, SCC in a subject may include comparing SCCIGS expression levels in (i) a reference sample that is from a subject known to be free from SCC, with (ii) SCCIGS expression levels in a suspected biological sample, wherein the differential expression of the SCCIGS indicates the subject has, or is at risk for having, SCC. Differential expression of a SCCIGS refers generally to a statistically significant difference, in a biological sample from a subject that is suspected of having or being at risk for having SCC, in one or more gene expression levels of SCC biomarker(s) or SCCIGS member(s) as compared to the expression levels of the same SCC biomarker(s) or SCCIGS member(s) in an appropriate cancer-free control. The statistically significant difference may relate to either an increase or a decrease in expression levels, as measured by RNA levels, protein levels, protein function, or any other relevant measure of gene expression such as those described herein. A result is typically referred to as statistically significant if it is unlikely to have occurred by chance. The significance level of a test or result relates traditionally to a frequentist statistical hypothesis testing concept. In simple cases, statistical significance may be defined as the probability of making a decision to reject the null hypothesis when the null hypothesis is actually true (a decision known as a Type I error, or "false positive determination"). This decision is often made using the p-value: if the p-value is less than the significance level, then the null hypothesis is rejected. The smaller the p-value, the more significant the result. Bayes factors may also be utilized to determine statistical significance (see, e.g., Goodman S., Ann Intern Med 130:1005-13, 1999).
In more complicated, but practically important cases, the significance level of a test or result may reflect an analysis in which the probability of making a decision to reject the null hypothesis when the null hypothesis is actually true is no more than the stated probability. This type of analysis allows for those applications in which the probability of deciding to reject may be much smaller than the significance level for some sets of assumptions encompassed within the null hypothesis. In certain exemplary embodiments, statistically significant differential expression may include situations wherein the expression level of a given SCCIGS provides at least about a 1.2X, 1.3X, 1.4X, 1.5X, 1.6X, 1.7X, 1.8X, 1.9X, 2.0X, 2.2X, 2.4X, 2.6X, 2.8X, 3.0X, 4.0X, 5.0X, 6.0X, 7.0X, 8.0X, 9.0X, 10.0X, 15.0X, 20.0X, 50.0X, 100.0X, or greater difference in expression (i.e., differential expression that may be higher or lower expression) in a suspected biological sample as compared to an appropriate control, including all integers and decimal points in between (e.g., 1.24X, 1.25X, 2.1X, 2.5X, 60.0X, 75.0X, etc.). In certain embodiments, statistically significant differential expression may include situations wherein the expression level of a given SCCIGS provides at least about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 percent (%) or greater difference in expression (i.e., differential expression that may be higher or lower) in a suspected biological sample as compared to an appropriate control, including all integers and decimal points in between. As an additional example, differential expression may also be determined by performing Z-testing, i.e., calculating an absolute Z score, as described herein and known in the art (see Example 1). Z-testing is typically utilized to identify significant differences between a sample mean and a population mean.
For example, as compared to a standard normal table (e.g., a control tissue), at a 95% confidence interval (i.e., at the 5% significance level), a Z-score with an absolute value greater than 1.96 indicates non-randomness. For a 99% confidence interval, if the absolute Z is greater than 2.58, it means that p<0.01, and the difference is even more significant; the null hypothesis can be rejected with greater confidence. In these and related embodiments, an absolute Z-score of 1.96, 2, 2.58, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more, including all decimal points in between (e.g., 10.1, 10.6, 11.2, etc.), may provide a strong measure of statistical significance. In certain embodiments, an absolute Z-score of greater than 6 may provide exceptionally high statistical significance.
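The relationship between an absolute Z-score and its two-sided p-value under a standard normal distribution can be illustrated as follows. This is a minimal sketch in Python, assuming a simple one-sample Z-test with a known population standard deviation; the function names are hypothetical, and the regression-based Z-scores of Example 1 are computed differently (by the GenePlus software), so the sketch only illustrates the 1.96 and 2.58 cut-offs discussed above.

```python
import math


def z_score(sample_mean, population_mean, population_sd, n):
    """One-sample Z statistic for a sample mean (assumes known population SD)."""
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


def two_sided_p_value(z):
    """Two-sided p-value for a Z statistic under the standard normal."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (1.96, 2.58, 6.0):
        print("|Z| =", z, "two-sided p =", two_sided_p_value(z))
```

Using the complementary error function keeps the sketch free of third-party dependencies and stays numerically stable for large absolute Z-scores such as 6.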
In certain embodiments, such as when using an Affymetrix Microarray to measure the expression levels of a SCCIGS, differential expression may also be determined by the mean expression value summarized by Affymetrix Microarray Suite 5 software (Affymetrix, Santa Clara, CA), or other similar software, typically with a scaled mean expression value of 1000.
A control tissue or reference tissue refers generally to a cell-based sample, such as an epithelial cell sample, that is known to be free of SCC cells according to currently accepted diagnostic criteria such as those described herein, and may also in certain embodiments relate to an epithelial cell-containing sample that is free of dysplastic cells. "Epithelial" cells refers generally to any one or more of many types of closely packed cells that form the epithelium covering the body (e.g., skin) and the linings of body cavities, for instance, membranous mucosal tissue covering internal organs and other internal surfaces of the body (e.g., the inside of the mouth, the respiratory tract, the gastrointestinal tract, etc.). A person skilled in the art can readily obtain a control sample and determine if it is free of SCC or dysplasia according to techniques known in the art and described herein, e.g., by using pathological, cytological, or molecular biological techniques (see, e.g., Mehrotra et al., Mol Cancer. 5:11, 2006; and Civantos et al., J Surg Oncol. 97:683-90, 2008). As one example, control tissues may be obtained from individuals undergoing tonsillectomy or oral surgery for treatment of diseases other than cancer, such as obstructive sleep apnea (see Example 1). Control tissues, or an appropriate extract thereof, may also be obtained from commercially available sources.
A control tissue or reference tissue may be either internal (i.e., from the same subject as the biological sample) or external (i.e., from a source or subject that is different from the biological sample). In certain embodiments, such as those wherein the biological sample is suspected of containing an OSCC cell, a control cell or reference cell may be obtained as, or derived from, an oral epithelial cell. In certain embodiments, such as those wherein the biological sample is suspected of containing an HNSCC cell, a control cell or reference cell may be obtained as, or derived from, a normal epithelium from the pharynx, hypopharynx, larynx, oral cavity, sinus tissue, or other appropriate control tissue.
Certain embodiments of the present invention relate to the use of the herein described SCC biomarker genes or SCCIGS that can differentiate between SCC tumor cells (e.g., OSCC tumor cells) and dysplastic cells, i.e., dysplasia, a pre-neoplastic cellular state typically considered at risk for developing into SCC carcinoma in situ or invasive carcinoma, although not necessarily destined to do so. Certain of these and related embodiments include methods for identifying the risk or presence of OSCC in a subject having oral epithelial dysplasia but no frank OSCC (i.e., OSCC is not readily apparent), as described herein, such as by comparing the expression levels of one or more selected SCCIGSs in a biological sample from that subject with the expression levels of a reference SCCIGS that is characteristic of an OSCC tumor cell, wherein the substantial similarity of the selected SCCIGS between the biological sample and the reference sample indicates the presence or risk of OSCC. A subject having no frank OSCC may include a subject having no clinically detectable SCC carcinoma, typically as determined by standard diagnostic techniques known in the art and described herein.
Substantial similarity relates generally to the lack of a statistically significant difference in the expression levels between the biological sample and the reference control. Examples of substantially similar expression levels may include situations wherein the expression level of a given SCCIGS provides less than about a 0.05X, 0.1X, 0.2X, 0.3X, 0.4X, 0.5X, 0.6X, 0.7X, 0.8X, 0.9X, 1.0X, 1.1X, 1.2X, 1.3X, or 1.4X difference in expression (i.e., differential expression that may be higher or lower expression) in a suspected biological sample as compared to an OSCC reference sample, including all decimal points in between (e.g., 0.15X, 0.25X, 0.35X, etc.). In certain embodiments, differential expression may include situations wherein the expression level of a given SCCIGS provides less than about 0.25, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50 percent (%) difference in expression (i.e., differential expression that may be higher or lower) in a suspected biological sample as compared to a reference sample, including all decimal points in between.
In certain embodiments, the particular SCC biomarker genes (e.g., SCCIG) described in Figure 6, including variants thereof, may be utilized to differentiate between SCC tumor cells and dysplastic cells, since as described herein these genes are differentially expressed between SCC tumor cells and dysplastic/control cells. Similarly, certain embodiments may include the use of oligonucleotide probes in the methods for differentiating between SCC tumor cells and dysplastic cells, such as the exemplary probes described in Figures 6 and 9, including variants thereof that can specifically hybridize to a differentially expressed SCC biomarker or SCCIG. When employing selected oligonucleotide probes to differentiate between SCC tumor cells and dysplastic cells, the oligonucleotide probe may be specifically hybridized to one or more of (i) a polynucleotide having all or a SCCIG-characteristic portion of a SCCIG transcript as provided herein (e.g., mRNA transcript of a SCC biomarker gene, or a fragment thereof), (ii) a polynucleotide having a nucleotide sequence that is fully complementary to a SCCIG-characteristic portion of a SCCIG transcript, and/or (iii) a nucleic acid amplification product of the above noted polynucleotides (e.g., cDNA of a SCC biomarker gene or SCCIGS, or fragment thereof). As noted above, a nucleic acid amplification product may be obtained from a biological sample, for example, by performing reverse transcription-polymerase chain reaction amplification (RT-PCR) of polynucleotide transcription products such as mRNAs that are present in a cell extract and that have all or a SCCIG-characteristic portion of a SCCIG transcript. A SCCIG-characteristic portion of a SCCIG transcript may be derived from the biomarker gene sequences or SCCIGS set forth in SEQ ID NOS:1-200, and in particular, in the biomarker genes referred to in Figure 6, including variants thereof that are differentially expressed in a SCC tumor cell as compared to a dysplastic cell.
* * * * * * *
All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification are incorporated herein by reference in their entireties.
Although the foregoing invention has been described in some detail to facilitate understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.
Accordingly, the described embodiments, including the Examples below, are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
EXAMPLES
EXAMPLE 1 IDENTIFICATION OF SQUAMOUS CELL CARCINOMA INDICATOR GENE SETS (SCCIGS)
To identify biomarkers or indicator gene sets for squamous cell carcinoma (SCC), tissues were obtained from patients either having or suspected of having oral SCC (OSCC) and from control patients. Eligible cases were patients with their first primary OSCC scheduled for surgical resection or biopsy between December 1, 2003 and April 17, 2007 at the University of Washington Medical Center, Harborview Medical Center and the VA Puget Sound Health Care System in Seattle, Washington. Patients with diagnosed dysplastic lesions were also enrolled at these medical centers during the same period. Eligible controls were patients who had tonsillectomy or oral surgery for treatment of diseases other than cancer, e.g., obstructive sleep apnea, at the same institutions and during the same time periods in which the OSCC cases were treated.
Among 244 eligible OSCC patients, consent was obtained from 187 patients. Of these, 171 patients gave permission for medical chart abstraction and provided sufficient tissue to yield GeneChip arrays results that passed quality control (QC) criteria (see below). Among 21 eligible dysplasia cases, 15 provided consent for the study. Of these, the GeneChip results from 11 patients passed QC checks. One dysplasia patient provided dysplasia tissues from two different sites. One OSCC patient provided one piece of cancer tissue and one piece of dysplasia tissue, and assay results from this latter tissue were grouped with the dysplasia patients. Four of the eligible patients originally believed to have OSCC had a final pathology report of dysplasia, and the results from these patients were included in the dysplasia group, and not in the OSCC group for analyses. In total, 17 dysplasia samples were used for analysis. During the case recruitment period, 47 of 55 eligible controls consented to participate. Samples from two controls failed QC checks, leaving 45 control samples for analysis. Each participant was interviewed using a structured questionnaire regarding demographic, medical, functional, quality of life, and lifestyle history, including tobacco and alcohol use. Tumor characteristics (site, stage) were obtained from medical records. This study was conducted with written informed consent and Institutional Review Office approvals.
Tumor tissue was obtained at time of resection or biopsy from patients with a primary OSCC, or dysplasia. Clinically normal tissue from the oral cavity or oropharynx was obtained from controls. For the small number of controls (~30%) with tonsillitis or tonsil hypertrophy, only mucosa tissue from the tonsillar pillar was obtained to avoid potential influence of inflammation on the results. Immediately after surgical removal, the tissue was immersed in RNALater (Applied Biosystems, Inc., Foster City, CA) for a minimum of 12 hours at 4°C before being transferred to long term storage at -80°C prior to use.
For DNA Microarray analysis, total RNA was extracted, purified, processed using a GeneChip Expression 3'-Amplification Reagents Kit (Affymetrix, Santa Clara, CA), and interrogated with an Affymetrix U133 2.0 Plus GeneChip array. Specifically, the RNA and DNA from each specimen were simultaneously extracted using the TRIzol method (Invitrogen, Carlsbad, CA). To increase DNA purity, the DNA extraction protocol was modified to include the use of a "back extraction buffer" (4 M guanidine thiocyanate, 50 mM sodium citrate, and 1 M Tris, pH 8.0). RNA was further purified with the use of an RNeasy mini kit (Qiagen, Valencia, CA) per Affymetrix (Santa Clara, CA) recommendations.
For expression array analysis, 1.0 to 2.5 μg of total RNA was used to generate biotin-labeled cRNA using the GeneChip Expression 3'-Amplification Reagents Kit (Affymetrix) per manufacturer's protocol. The cRNA was hybridized to an Affymetrix U133 2.0 Plus GeneChip array and scanned using an Affymetrix GeneChip array Scanner 3000 7G in the Fred Hutchinson Cancer Research Center's Genomics Shared Resources per Affymetrix protocols. At least one clinically normal tissue sample from a control subject was processed in tandem with every seven to eight tumor tissue samples from OSCC cases.
Two rounds of QC checks were conducted to evaluate whether to include results from each of the GeneChips. In the first round, recommendations made by Affymetrix were followed. In the second round, the "affyQCReport" and "affyPLM" software in the Bioconductor package (http://www.bioconductor.org) were used to search for poor quality chips. In total, 172 chips from 165 patients (119 OSCC patients, 35 controls and 11 dysplasia patients) passed two rounds of QC evaluation.
Preprocessing and probe set filtering were performed on the GeneChip arrays that passed QC checks. To this end, the gcRMA algorithm from Bioconductor was used to extract gene expression values and perform normalization. Next, to limit the multiple testing penalty in the statistical testing step, probe sets were eliminated that either showed no variation across the samples being compared (i.e., interquartile range (IQR) of expression levels less than 0.1 on log2 scale) or were expressed at very low magnitude (i.e., any probe set in which the maximum expression value for that probe set in any of the samples was less than 3 on log2 scale). After these criteria were applied, ~21,000 probe sets remained for differential expression analyses.
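The two filtering rules described above can be sketched in a few lines. The following Python example is illustrative only and is not the Bioconductor code used in the study; the function name keep_probe_set and the sample values are hypothetical, and the quartile convention (the "inclusive" method of the standard library) is an assumption, since the text does not state how the IQR was computed.

```python
from statistics import quantiles


def keep_probe_set(log2_values, min_iqr=0.1, min_max=3.0):
    """Return True if a probe set survives both filters.

    A probe set is dropped when its IQR across samples is below
    min_iqr (no variation) or when its maximum log2 expression
    value in any sample is below min_max (very low magnitude).
    """
    # Quartile convention is an assumption for this sketch.
    q1, _median, q3 = quantiles(log2_values, n=4, method="inclusive")
    has_variation = (q3 - q1) >= min_iqr
    is_expressed = max(log2_values) >= min_max
    return has_variation and is_expressed


if __name__ == "__main__":
    # Hypothetical log2 expression values for three probe sets.
    probe_sets = {
        "flat": [5.00, 5.01, 5.02, 5.03],    # almost no variation
        "low": [1.0, 2.0, 2.5, 2.9],         # never reaches 3 on log2 scale
        "variable": [4.0, 6.0, 8.0, 10.0],   # passes both filters
    }
    kept = [name for name, values in probe_sets.items()
            if keep_probe_set(values)]
    print(kept)
```

Applying the filters before statistical testing reduces the number of hypotheses, which is the stated purpose of limiting the multiple testing penalty.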
To examine differential gene expression and to build prediction models, the samples were divided into a training set of 119 OSCC cases and 35 controls and a testing set of 48 OSCC cases and 10 controls. The division of study subjects into training and testing sets was based on the calendar date that patients were enrolled into the study. Gene expression values from gcRMA were analyzed using a regression-based, estimating equations approach implemented in GenePlus software (7, 8). Age and sex were included as covariates in the analyses of the training set. To control type I errors, particular groups of genes were declared either "upregulated/overexpressed" or "downregulated/underexpressed" based on a fixed number of false discoveries (NFD), i.e., the number of false discoveries in a list of discovered genes is controlled at the pre-specified NFD (9). The choice of NFD, with an appropriate account for the number of genes under investigation (J), dictated the threshold for individual gene-specific p-values as NFD/J. Using NFD<1 as a statistical testing criterion, 7,604 candidate probe sets were identified as being differentially expressed between controls and OSCC cases. To build predictive models and substantially reduce the number of comparisons, this list of candidate probe sets was further narrowed using the following criteria, which retained only those probe sets that showed a significantly large difference in signal intensity between cases and controls: 1) an absolute Z-score of greater than 6 in the differential gene expression analysis, implying exceptionally high statistical significance; 2) a 1.5-fold or greater difference in gene expression between controls and cases; and 3) a mean expression value summarized by Affymetrix Microarray Suite 5.0 across samples of >300 (with the scaled mean expression value of 1000). Probe sets with such expression values are more likely to be suitable for validation by alternative methodologies such as qRT-PCR. A selected number of probe sets and their corresponding biomarker genes, including variants thereof, were selected by these three criteria (see Figures 2 and 7).
The selected probe sets were analyzed using both forward and hybrid forward-backward logistic regression procedures (SAS PROC LOGISTIC). For the one OSCC case with results from 5 replicate tissues and the one control with results from duplicate tissues, the respective average of the replicate results was used. In the forward stepwise selection, probe sets were entered into the logistic regression model one probe set at a time until no further probe set could be added at a significance level of 0.01. When the hybrid stepwise selection was adopted, the probe set with the smallest p-value and p<0.01 entered first, and the significance levels of the other selected probe sets were evaluated for possible removal if their p-values were greater than 0.05 in the current model.
The performance of the two models (i.e., results from the forward and hybrid stepwise procedures) was compared using receiver operating characteristic (ROC) curves. An ROC curve is a plot of true positive rate (sensitivity) on the Y-axis against false positive rate (1-specificity) on the X-axis for each possible value (i.e., the logistic score for each individual for a given model) representing a positive test. A model with perfect discrimination between cases and controls will have an ROC curve that passes through the upper left corner, with 100% sensitivity, 100% specificity, and an area under the curve (AUC) of 1. An AUC=0.5 represents a test that is no better than chance at discriminating between cases and controls (10-12). The selected prediction models were validated using an internal independent testing set of 48 invasive OSCC and 10 controls and an external testing set of 42 head and neck squamous cell carcinoma (HNSCC) cases and 14 controls (Gene Expression Omnibus (GEO) GSE6791, www.ncbi.nlm.nih.gov/geo) (13). CEL files from these datasets were extracted using the gcRMA algorithm. ROC curves were drawn by applying the expression results to the prediction models.
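As an illustration of the AUC described above, the following Python sketch computes it from the logistic scores of cases and controls using the standard rank-based equivalence: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from the study data.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of case and control scores."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0      # case ranked above control
            elif case == control:
                wins += 0.5      # ties count as one half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical logistic scores.
perfect = roc_auc([0.9, 0.8], [0.1, 0.2])   # every case outranks every control
partial = roc_auc([0.9, 0.3], [0.5, 0.1])   # three of four pairs ranked correctly
```

Under this definition a perfectly discriminating model gives 1 and a model that ranks cases and controls at random gives a value near 0.5, matching the interpretation in the text.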
Results:
The cases in both the training and testing sets tended to be older than the controls. Compared to controls, cases were more likely to be male, white, and current smokers. Approximately two thirds of the cases had AJCC stage III or IV disease with about 50% of the cases presenting with metastasis to the neck. Oral cavity tumors accounted for 74% and 60% of the OSCC cases in the training and testing sets, respectively. Oropharyngeal tumors accounted for 26% and 40% of the OSCC cases in the training and testing sets, respectively. Most of the dysplasia subjects were white males whose lesions were located in the oral cavity.
Figures 2 and 7 list the selected probe sets and their corresponding biomarker genes that were differentially expressed between OSCC and controls based on the criteria described above. Included among these probe sets are transforming growth factor (TGFB1), cell signaling molecule (STAT1), immune markers (IL1β), chemokines (CXCL2, CXCL3, CXCL9), and genes encoding extracellular matrix proteins and collagens that have previously been shown to be involved in the motility and invasion of tumor cells. Hierarchical clustering of gene expression using the selected probe sets showed that invasive OSCC and normal controls formed two main clusters. About half the dysplasia tissues clustered with OSCC samples and half clustered with the controls. Compared to invasive OSCC, oral dysplasia tissue appeared to have a set of genes that were not yet upregulated and another set of genes that were not yet downregulated. Figure 3 shows the top 10 SCCIGS models from the logistic regression analyses of the selected probe sets in the training data set. The model with LAMC2 (probe set 207517_at, encoding laminin-γ2; SEQ ID NOS:91 and 92) and COL4A1 (probe set 211980_at, encoding collagen type IV α1; SEQ ID NO:113) provided the most discriminating power to separate OSCC from controls (AUC=0.99952). The power to distinguish OSCC from controls was still significant if expression of only one of these two probe sets was used (AUC=0.99424 with COL4A1 alone).
After removing LAMC2 and COL4A1 from subsequent modeling, COL1A1 (probe set 202310_s_at, encoding collagen type I α1; SEQ ID NOS:20 and 21) and PADI1 (probe set 220962_s_at, encoding peptidyl arginine deiminase type 1; SEQ ID NO:145) emerged as the next set of markers that best separated OSCC from controls (AUC=0.99976).
When the expression values from the testing datasets were applied to the predictive models derived from the training dataset, the model with LAMC2 and COL4A1 provided the most discriminating power to separate OSCC from controls: AUC=0.997 in the independent, internal testing set and AUC=0.976 in the external testing set (GEO GSE6791) (see Figure 3). The model with COL1A1 and PADI1 was also strongly predictive: AUC=0.99167 in the independent, internal testing set and AUC=0.97789 in the external GEO GSE6791 data set (see Figure 3). Results from the testing of the other eight models against the internal and external datasets indicate that they also performed well in distinguishing OSCC from controls (see Figure 3).
Discussion:
Described herein are selected probe sets, corresponding to a variety of known genes (see SEQ ID NOS:1-200), which are highly effective in distinguishing invasive OSCC from normal oral tissue. Also described herein is a list of genes that may be involved in the transformation of normal oral tissue to dysplasia, as well as a list of genes that may be involved in the transformation of oral dysplasia to invasive OSCC (see Example 3; and Figure 8). Although prior studies have described global changes in gene transcription that distinguish normal oral epithelium from carcinoma, there is considerable heterogeneity among the lists of genes that have been reported, and few studies have utilized rigorous statistical testing and validation with independent datasets to produce limited combinations of genes, as described herein, with high sensitivity and specificity in distinguishing OSCC from normal oral tissue, and which provide accurate prediction models (14). Embodiments of the present invention provide prediction models that were generated using rigorous statistical analyses, and the differences in gene expression detected using microarray technology were validated not only by qRT-PCR, but also by testing against independent internal and external genome-wide gene expression datasets. The result has been to generate candidate markers and indicator gene sets that can be easily applied to the testing of biopsies or surgical margins to aid diagnosis and prognosis of OSCC.
It is believed that the prediction models and the biomarker genes identified herein may provide utility in predicting local recurrence at surgical margins or the development of second primary cancer of OSCC patients, or for selective screening of individuals who are at high risk of OSCC. If otherwise histologically-negative margins harbor microscopic original tumors as residual disease, the gene expression profiles of such margins would more likely resemble those of the resected invasive OSCC, such that the measurement of one or more of the biomarker genes or SCCIGs identified herein, and/or application of one of the predictive models described herein, could potentially be of use for the detection of residual tumor cells.
Also, for individuals who are at high risk of OSCC, their oral epithelium could contain cells that are molecularly abnormal and primed for the development of cancer. As such, the molecular profile might be more similar to that of a pre-neoplastic oral lesion than that of an invasive OSCC. The list of genes that distinguishes invasive OSCC from dysplasia and controls could be used to gauge the malignant potential of these molecular changes. Recently, p53 and eIF4E have been evaluated to augment histologic assessment of surgical margins (4, 15). eIF4E expression, but not p53 mutation and overexpression, in histologically negative surgical margins was a significant predictor of recurrence and shorter disease-free survival of HNSCC patients (16-18).
According to certain embodiments of the present invention, the expression patterns of two pairs of genes (LAMC2 and COL4A1; COL1A1 and PADI1) may be particularly effective in distinguishing OSCC from normal oral tissue in independent testing sets. The sensitivity and specificity were close to 100%. Because of the stringent criteria applied to select candidate markers (e.g., SCCIGS), there may be other probe sets among the biomarker genes and exemplary probe sets with a similar predictive property. The results provided herein were adjusted for age and sex.
Although lifestyle characteristics, such as tobacco use and infection with human papillomavirus (HPV), play an important role in OSCC development, no appreciable difference in gene expression on the genome-wide level was observed according to smoking status (i.e., former/current vs. never) or HPV status (i.e., positive vs. negative).
Laminin binds to type IV collagen and to many cell types via cell surface laminin receptors (24). Following attachment to laminin in the basement membrane, tumor cells secrete collagenase IV that specifically breaks down type IV collagen and thereby facilitates cell spreading and migration (25). In addition, laminin fragments generated by post-translational proteolytic cleavage bind to cell surface integrins and other proteins to trigger and modulate cellular motility (26). Increased levels of laminin have been associated with a number of carcinomas (27-35). In some of these studies, laminin was associated with tumor aggressiveness, metastasis and poor prognosis. Results from mouse models showed that tumor cells with high levels of laminin and low levels of unoccupied laminin receptor are resistant to killing by natural cytotoxic T cells and are highly malignant (36), and that treatment with low concentrations of laminin receptor binding fragments of laminin blocked lung metastasis of hematogenously introduced tumor cells (37). A large number of unoccupied laminin receptors have been observed for breast and colon cancer cells (25); no similar reports have appeared on OSCC or HNSCC cells. The gene products of COL4A1 and COL4A2 are assembled into type IV collagen, which forms the scaffold of the basement membrane and integrates other extracellular molecules, including laminin, to produce a highly organized structural barrier. Collagen IV also plays an important role in the interaction of basement membrane with cells (38, 39). Immune cells, migrating endothelial cells and metastatic tumor cells have been reported to produce and tightly regulate type IV collagen-specific collagenase (40-42).
Peptidyl arginine deiminases (EC 3.5.3.15) catalyze post-translational modification of proteins through conversion of arginine residues to citrullines. Although their physiological functions are not well understood, these deiminases have been implicated in the genesis of multiple sclerosis, rheumatoid arthritis, and psoriasis (43). The isoform peptidyl arginine deiminase type 1 (PADI1) is present in the keratinocytes of all layers of human epidermis. It has been reported that deimination of filaggrin by PADI1 is necessary for epidermal barrier function and that deimination of keratin K1 may lead to ultrastructural changes of the extracellular matrix (43). The expression of PADI1 is downregulated in both dysplasia and OSCC when compared to controls. If deimination of arginine residues of proteins in the keratinocytes of oral mucosa by PADI1 forms an epidermal barrier, downregulation of PADI1 may allow the growth, expansion and movement of tumor cells.
EXAMPLE 2 VALIDATION OF SQUAMOUS CELL CARCINOMA INDICATOR GENE SETS BY QRT-PCR
The expression of LAMC2, COL4A1, COL1A1, and PADI1 in OSCC compared to controls was validated by quantitative real-time PCR (qRT-PCR). Generally, qRT-PCR was performed in triplicate on a subset of 30 OSCC cases and 30 controls and bioinformatically validated.
Specifically, each sample containing 7.5 ng purified total RNA was assayed in triplicate in 10 μl reaction volumes using the QuantiTect™ SYBR Green RT-PCR kit (Qiagen, Valencia, CA) and bioinformatically validated QuantiTect primers (Qiagen, Valencia, CA) on a 7900HT Sequence Detection System (ABI, Foster City, CA). The cycling conditions were as follows: 30 minutes at 50°C, 15 minutes at 95°C, and 40 cycles of 15 seconds at 94°C, 30 seconds at 55°C, and 30 seconds at 72°C.
For COL1A1 (NM_000088), a 118-bp amplicon spanning exons 1 and 2 was amplified. For COL4A1 (NM_001845), a 119-bp amplicon spanning exons 6, 7, 8, and 9 was amplified. For LAMC2 (NM_005562), a 74-bp amplicon spanning exons 18 and 19 was amplified. For PADI1 (NM_013358), an 80-bp amplicon spanning exons 3, 4, and 5 was amplified. ACTB was used as the reference gene, for which a 146-bp amplicon spanning exons 3 and 4 was amplified.
Ten-point standard curves were generated using Universal Human Reference RNA (Stratagene, La Jolla, CA) for all genes except PADI1, for which Normal Adjacent Esophagus Total RNA (Ambion, Austin, TX) was used. The linear correlation coefficient (R²) was 0.99 or greater for all runs. The mean threshold cycle (Ct) values were calculated from the triplicate Ct values. Mean Ct values were further normalized in relation to the mean Ct value of the ACTB gene.
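The Ct averaging and normalization step can be sketched in Python as follows. The text does not state the exact normalization formula, so this sketch assumes the common ΔCt convention (mean target Ct minus mean reference Ct). The function name and the Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene normalized to the mean Ct of the reference gene.

    Assumes normalization by subtraction (delta-Ct); a lower value indicates
    higher expression of the target relative to the reference.
    """
    return mean(target_cts) - mean(reference_cts)


# Hypothetical triplicate Ct values for one sample.
lamc2_cts = [24.0, 24.2, 24.1]
actb_cts = [18.0, 18.1, 17.9]
normalized = delta_ct(lamc2_cts, actb_cts)
```

Comparing such normalized values between cases and controls is one simple way to check the direction of differential expression seen on the microarray.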
Results of qRT-PCR on LAMC2, COL4A1, COL1A1 and PADI1 confirmed the differential expression of these genes between OSCC and controls at the transcript level (see Figure 4).
EXAMPLE 3 COMPARISON OF GENE EXPRESSION PROFILES IN CONTROLS, DYSPLASTIC LESIONS AND INVASIVE CANCER
While the expression of some genes may be continuously increasing or decreasing from the moment normal oral tissue begins its oncogenic process, it is also possible that some genes get turned on or off during the conversion from dysplasia to invasive cancer. To test this hypothesis, and to identify genes that may be specific for the conversion of dysplasia to OSCC, the gene expression profiles of invasive cancers (n=167) were compared with the profiles of a combination of normal oral tissue (from 45 controls) and dysplastic lesions (n=17) using ~21,000 filtered probe sets. From those probe sets that were differentially expressed between OSCC samples and the combination of controls and dysplastic lesions, those that were differentially expressed between controls and dysplasia were further excluded, using a number of false discoveries (discussed above) of NFD=1. The resulting gene list contained the genes that were up- or downregulated in OSCC but not in dysplasia.
Conversely, the dysplastic lesions and OSCC samples were combined and compared with the controls. For those probe sets showing differential expression, the genes that were also differentially expressed between dysplasia and cancer were excluded. The resulting gene list contained genes that showed up- or downregulation relative to normal tissue as early as dysplasia. Comparison of gene expression of invasive cancer with that of normal oral tissue (from controls) and dysplasia combined using ~21,000 filtered probe sets, followed by elimination of those probe sets that were differentially expressed between dysplasia and controls, showed the differential expression of 6544 probe sets, including 3988 up-regulated and 2666 down-regulated probe sets in invasive OSCC. From the original biomarker genes and corresponding probe sets used to identify OSCC from controls, Figure 6 lists the biomarker genes and probe sets that may be specific for the conversion of oral dysplasia to OSCC. Further, selected probe sets that were specific for the development of dysplasia from normal tissue were also identified.
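The two exclusion steps described in this Example amount to set differences over lists of differentially expressed probe sets. The following Python sketch illustrates that logic only. The function names and probe set identifiers are hypothetical placeholders, and the differential expression testing itself is assumed to have been done beforehand.

```python
def conversion_specific(oscc_vs_controls_and_dysplasia, controls_vs_dysplasia):
    """Probe sets changed in OSCC but not already changed in dysplasia."""
    return set(oscc_vs_controls_and_dysplasia) - set(controls_vs_dysplasia)


def early_changes(dysplasia_and_oscc_vs_controls, dysplasia_vs_oscc):
    """Probe sets changed relative to normal tissue as early as dysplasia."""
    return set(dysplasia_and_oscc_vs_controls) - set(dysplasia_vs_oscc)


# Hypothetical lists of differentially expressed probe sets from each comparison.
late = conversion_specific(
    ["probe_A", "probe_B", "probe_C"],   # OSCC vs. (controls + dysplasia)
    ["probe_B"],                         # controls vs. dysplasia
)
early = early_changes(
    ["probe_B", "probe_D", "probe_E"],   # (dysplasia + OSCC) vs. controls
    ["probe_E"],                         # dysplasia vs. OSCC
)
```

Here `late` holds candidates specific to the dysplasia-to-OSCC conversion and `early` holds candidates that already differ from normal tissue at the dysplasia stage.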
Accordingly, a set has been identified herein of genes that are involved in, and specific for, the malignant transformation of oral dysplasia into invasive OSCC. Included genes encode for proteins having known roles in cell-ECM (extracellular matrix) and cell-cell interactions, and/or in cellular motility, migration and/or invasion, such as LAMC2 and SERPINE1 (PAI-1); for directed cellular movement, such as CXCL2, CXCL3, and CXCL9; as well as for immune function, such as IL1β and IFIT3. Among the biomarker genes and probe sets described herein, a large number of collagen genes were among the sets that may be associated with the conversion of oral tissue to dysplasia (Figure 2) and were absent among the probe sets that may be involved in the conversion of dysplasia to invasive OSCC (Figure 6). These observations suggested that collagen genes may play an important role early in the oral carcinogenesis process.
EXAMPLE 4 ANALYSIS OF BIOLOGICAL PATHWAYS IN OSCC
Biological pathway analyses and hierarchical clustering of differentially expressed genes were performed. To this end, the 7,604 probe sets that were differentially expressed between OSCC and controls (see Example 1, supra) were analyzed using Ingenuity® Pathway Analysis 4.0 software (Ingenuity® Systems, www.ingenuity.com). Using Affymetrix GeneSpring™ software GX7.3.1, hierarchical clustering of all the samples was also performed based on the expression of the SCC biomarker genes and corresponding probe sets.
Results obtained with the Ingenuity Pathway Analysis tool showed that the JAK/STAT signaling pathway and the IFN-γ signaling pathway were the two biological pathways that could be most strongly associated with the differentially expressed genes. Figure 1 shows genes that were up- or down-regulated in these two pathways in the training dataset.
A wide array of cytokines and growth factors, including epidermal growth factor receptor (EGFR), can transmit signals through the JAK/STAT pathway (44, 45). EGFR over-expression has been reported in up to 90% of HNSCC tumors (46). Single modality therapeutics that target and negatively regulate EGFR, such as small molecule tyrosine kinase inhibitors, monoclonal antibodies, antisense therapy or immunotoxin conjugates, however, were only effective in 5-15% of patients with advanced HNSCC (47). These observations suggest that there are other proteins and pathways driving the growth of some of these tumors. The results described herein are believed to be the first to show a strong association between the IFN-γ signaling pathway and OSCC, noting that IFN-γ signaling also involves the JAK/STAT pathway (44, 48).
EXAMPLE 5 COMPARISON OF THE PREDICTION MODELS IN DIFFERENT CANCER TISSUES
To test the predictive power of the LAMC2/COL4A1 and COL1A1/PADI1 indicator gene sets for other cancers, gene expression data were downloaded from GEO GSE6791 for normal and tumor cervical tissue samples, and from GSE6044 for normal and tumor lung samples (see also Example 1). These datasets were chosen because: 1) they were generated using the same Affymetrix U133 GeneChip platform utilized herein, facilitating the testing of the tissue specificity of the predictive models described herein; and 2) OSCC shares some of the same risk factors as cervical cancer and lung cancer (e.g., human papillomavirus for cervical cancer and cigarette smoking for both cervical and lung cancer). Gene expression values were extracted using gcRMA, and scores for each of the above-noted prediction models were calculated for each tissue type.
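Scoring a sample with a two-gene logistic model of the kind described above can be sketched in Python as follows. The fitted coefficients are not given in this section, so the intercept, the coefficients, and the expression values below are placeholders chosen only to make the arithmetic easy to follow; they are not the study's values.

```python
import math


def logistic_score(intercept, coefficients, expression):
    """Linear predictor of a logistic model: intercept + sum(beta_g * x_g)."""
    return intercept + sum(beta * expression[gene]
                           for gene, beta in coefficients.items())


def predicted_probability(score):
    """Convert a logistic score to a probability of being a case."""
    return 1.0 / (1.0 + math.exp(-score))


# Placeholder model and sample; none of these numbers come from the source.
coefficients = {"LAMC2": 0.5, "COL4A1": 0.25}
sample = {"LAMC2": 2.0, "COL4A1": 4.0}
score = logistic_score(-1.0, coefficients, sample)
probability = predicted_probability(score)
```

Scores computed this way for every sample in a dataset are the values that would be ranked to draw an ROC curve for that tissue type.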
Results:
The model containing LAMC2 and COL4A1 distinguished head and neck squamous cell carcinoma (HNSCC) from controls, but distinguished neither cervical cancer nor lung cancer from their respective controls (Figure 5, top panel). The COL1A1 and PADI1 predictive model also performed well for HNSCC and, to a lesser extent, for lung cancer (Figure 5, bottom panel). Furthermore, these results showed that the models described herein could not only distinguish invasive cancer from controls, but could also distinguish oral dysplasia from controls. The respective AUC was 0.98 for LAMC2 and COL4A1 and 0.99477 for COL1A1 and PADI1. The effect observed for the model LAMC2 and COL4A1 was primarily driven by COL4A1, suggesting that COL4A1 up-regulation occurred earlier than LAMC2 up-regulation in oral carcinogenesis.
Literature Cited
1. Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin 2005;55:74-108.
2. Silverman SJ, Gorsky M, Lozada F. Oral leukoplakia and malignant transformation. A follow-up study of 257 patients. Cancer 1984;53:563-568.
3. Reibel J. Prognosis of oral pre-malignant lesions: significance of clinical, histopathological, and molecular biological characteristics. Crit Rev Oral Biol Med 2003;14:47-62.
4. Upile T, Fisher C, Jerjes W et al. The uncertainty of the surgical margin in the treatment of head and neck cancer. Oral Oncol 2007;43:321-6.
5. Batsakis JG. Surgical excision margins: a pathologist's perspective. Adv Anatomic Path 1999;6:140-8.
6. Brandwein-Gensler M, Teixeira MS, Lewis CM et al. Oral squamous cell carcinoma: histologic risk assessment, but not margin status, is strongly predictive of local disease-free and overall survival. Am J Surg Pathol 2005;29:167-178.
7. Thomas JG, Olson JM, Tapscott SJ, Zhao LP. An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 2001;11:1227-36.
8. Zhao LP, Prentice R, Breeden L. Statistical modeling of large microarray data sets to identify stimulus-response profiles. Proc Natl Acad Sci USA 2001;98:5631-6.
9. Xu XL, Olson JM, Zhao LP. A regression-based method to identify differentially expressed genes in microarray time course studies and its application in an inducible Huntington's disease transgenic model. Hum Mol Genet 2002;11:1977-85.
10. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-98.
11. Griner PF, Mayewski RJ, Mushlin AI, Greenland P. Selection and interpretation of diagnostic tests and procedures. Principles and applications. Ann Intern Med 1981;94:557-92.
12. Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561-77.
13. Pyeon D, Newton MA, Lambert PF et al. Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head/neck and cervical cancers. Cancer Res 2007;67:4605-4619.
14. Choi P, Chen C. Genetic Expression Profiles and Biologic Pathway Alterations in Head and Neck Squamous Cell Carcinoma. Cancer 2005;104:1113-28.
15. Black C, Marotti J, Zarovnaya E, Paydarfar J. Critical evaluation of frozen section margins in head and neck cancer resections. Cancer 2006;107:2792-2800.
16. Nathan CO, Franklin S, Abreo FW, Nassar R, De Benedetti A, Glass J. Analysis of surgical margins with the molecular marker eIF4E: a prognostic factor in patients with head and neck cancer. J Clin Oncol 1999;17:2909-2914.
17. Nathan CO, Sanders K, Abreo FW, Nassar R, Glass J. Correlation of p53 and the protooncogene eIF4E in larynx cancers: prognostic implications. Cancer Res 2000;60:3599-604.
18. Nathan CA, Amirghahri N, Rice C, Abreo FW, Shi R, Stucker FJ. Molecular analysis of surgical margins in head and neck squamous cell carcinoma patients. Laryngoscope 2002;112:2129-40.
19. Mendez E, Cheng C, Farwell DG et al. Transcriptional expression profiles of oral squamous cell carcinomas. Cancer 2002;95:1482-94.
20. Patel V, Aldridge K, Ensley JF et al. Laminin-gamma2 overexpression in head-and-neck squamous cell carcinoma. Int J Cancer 2002;99:583-8.
21. Lindberg P, Larsson A, Nielsen BS. Expression of plasminogen activator inhibitor-1, urokinase receptor and laminin gamma-2 chain is an early coordinated event in incipient oral squamous cell carcinoma. Int J Cancer 2006;118:2948-56.
22. Ziober AF, Kirtesh RP, Faizan A et al. Identification of a Gene Signature for Rapid Screening of Oral Squamous Cell Carcinoma. Clin Cancer Res 2006;12:5960-71.
23. Gonzalez HE, Gujrati M, Frederick M et al. Identification of 9 genes differentially expressed in head and neck squamous cell carcinoma. Arch Otolaryngol Head Neck Surg 2003;129:754-9.
24. McCarthy JB, Basara ML, Palm SL, Sas DF, Furcht LT. The role of cell adhesion proteins--laminin and fibronectin--in the movement of malignant and metastatic cells. Cancer Metastasis Rev 1985;4:125-52.
25. Liotta LA, Wewer U, Rao NC et al. Biochemical mechanisms of tumor invasion and metastases. Prog Clin Biol Res 1988;256:3-16.
26. Hintermann E, Quaranta V. Epithelial cell motility on laminin-5: regulation by matrix assembly, proteolysis, integrins and erbB receptors. Matrix Biol 2004;23:75-85.
27. Yamamoto H, Itoh F, Iku S, Hosokawa M, Imai K. Expression of the gamma(2) chain of laminin-5 at the invasive front is associated with recurrence and poor prognosis in human esophageal squamous cell carcinoma. Clin Cancer Res 2001;7:896-900.
28. Kagesato Y, Mizushima H, Koshikawa N et al. Sole expression of laminin gamma 2 chain in invading tumor cells and its association with stromal fibrosis in lung adenocarcinomas. Jpn J Cancer Res 2001;92:184-192.
29. Barsky SH, Rao CN, Hyams D, Liotta LA. Characterization of a laminin receptor from human breast carcinoma tissue. Breast Cancer Res Treat 1984;4:181-8.
30. Haslam SZ, Woodward TL. Host microenvironment in breast cancer development: epithelial-cell-stromal-cell interactions and steroid hormone action in normal and cancerous mammary gland. Breast Cancer Res 2003;5:208-15.
31. Kaklamani VG, Gradishar WJ. Gene expression in breast cancer. Current Treat Options Oncol 2006;7:123-8.
32. Aishima S, Matsuura S, Terashi T et al. Aberrant expression of laminin gamma 2 chain and its prognostic significance in intrahepatic cholangiocarcinoma according to growth morphology. Mod Pathol 2004;17:938-45.
33. Soini Y, Maatta M, Salo S, Tryggvason K, Autio-Harmainen H. Expression of the laminin gamma 2 chain in pancreatic adenocarcinoma. J Pathol 1996;180:290-4.
34. Olsen J, Kirkeby LT, Brorsson MM et al. Converging signals synergistically activate the LAMC2 promoter and lead to accumulation of the laminin gamma 2 chain in human colon carcinoma cells. Biochem J 2003;371:1-21.
35. Gontero P, Banisadr S, Frea B, Brausi M. Metastasis markers in bladder cancer: a review of the literature and clinical considerations. Eur Urol 2004;46:296-311.
36. Malinoff HL, McCoy JP Jr, Varani J, Wicha MS. Metastatic potential of murine fibrosarcoma cells is influenced by cell surface laminin. Int J Cancer 1984;33:651-655.
37. Barsky SH, Rao CN, Williams JE, Liotta LA. Laminin molecular domains which alter metastasis in a murine model. J Clin Invest 1984;74:843-8.
38. Kuhn K. Basement membrane (type IV) collagen. Matrix Biol 1995;14:439-445.
39. Van Waes C, Carey TE. Overexpression of the A9 antigen/alpha 6 beta 4 integrin in head and neck cancer. Otolaryngol Clin North Am 1992;25:1117-39.
40. Bosman FT, Havenith M, Cleutjens JP. Basement membranes in cancer. Ultrastruct Pathol 1985;8:291-304.
41. Schmidt C, Pollner R, Poschl E, Kuhn K. Expression of human collagen type IV genes is regulated by transcriptional and post-transcriptional mechanisms. FEBS Lett 1992;312:174-8.
42. Schmidt C, Fischer G, Kadner H, Genersch E, Kuhn K, Poschl E. Differential effects of DNA-binding proteins on bidirectional transcription from the common promoter region of human collagen type IV genes COL4A1 and COL4A2. Biochim Biophys Acta 1993;1174:1-10.
43. Nachat R, Mechin MC, Takahara H et al. Peptidylarginine deiminase isoforms 1-3 are expressed in the epidermis and involved in the deimination of K1 and filaggrin. J Invest Dermatol 2005;124:384-93.
44. O'Shea JJ, Gadina M, Schreiber RD. Cytokine signaling in 2002: new surprises in the Jak/Stat pathway. Cell 2002;109:S121-31.
45. Rawlings JS, Rosler KM, Harrison DA. The JAK/STAT signaling pathway. J Cell Sci 2004;117:8-3.
46. Kalyankrishna S, Grandis JR. Epidermal growth factor receptor biology in head and neck cancer. J Clin Oncol 2006;24:2666-72.
47. Choong NW, Cohen EE. Epidermal growth factor receptor directed therapy in head and neck cancer. Crit Rev Oncol-Hematol 2006;57:25-43.
48. Hebenstreit D, Horejs-Hoeck J, Duschl A. JAK/STAT-dependent gene regulation by cytokines. Drug News Persp 2005;18:243-249.
49. Kondoh N, Ohkura S, Arai M et al. Gene expression signatures that can discriminate oral leukoplakia subtypes and squamous cell carcinoma. Oral Oncol 2007;43:455-462.
EXAMPLE 6 GENETIC EXPRESSION PROFILE ASSOCIATED WITH ORAL CANCER IDENTIFIES A GROUP OF PATIENTS AT HIGH-RISK OF POOR SURVIVAL
In this Example, OSCC was further sub-classified on the basis of 131 probe sets (108 known genes) which were previously found (Examples 1-5) to be differentially expressed between OSCC and normal controls; this sub-classification was shown to be associated with survival. In particular, the Example shows that 1) there were significant survival differences in cluster analysis-defined OSCC subgroups; 2) this classification was independently associated with overall and OSCC-specific survival after adjustment for potential confounders such as age, sex and stage; and 3) genetic expression data and AJCC stage combined predicted survival of OSCC patients better than AJCC stage alone. This Example provides the first description of an association between gene expression profiling and OSCC-specific survival in a manner that is prospective and predictive of clinical outcomes.
To determine whether the gene expression signature of invasive oral squamous cell carcinoma (OSCC) can sub-classify OSCC on the basis of survival, hierarchical clustering was performed on the expression of 131 genes in 119 OSCC, 35 normal and 17 dysplastic mucosae to identify cluster-defined subgroups. Multivariate Cox regression was used to ascertain the association between gene expression and survival. The top predictive models of OSCC-specific survival were determined by stepwise Cox regression, and then compared by Receiver Operating Characteristics (ROC) analysis.
As described in detail below, the 3-year overall mean survival (± SE) for a cluster of 45 OSCC patients was 38.7 ± 0.09%, compared to 69.1 ± 0.08% for the remaining patients. Multivariate analysis adjusted for age, sex and stage showed that the 45 OSCC cluster patients had worse overall and OSCC-specific survival (HR=3.31, 95% CI: 1.66, 6.58; HR=5.43, 95% CI: 2.32, 12.73, respectively). Stepwise Cox regression on the 131 probe sets revealed that a model with a term for LAMC2 (laminin, gamma 2) gene expression best identified patients with the worst OSCC-specific survival. A Cox model was also fit with a term for a principal component analysis-derived risk-score marker ('PCA'), as were two other models that combined stage with either LAMC2 or PCA. The ROC analysis Area Under the Curve for models combining stage with either LAMC2 or PCA was 0.80 or 0.82, respectively, compared to 0.70 for stage alone (p=0.013 and 0.008, respectively). This Example thus shows that OSCC can be sub-classified on the basis of gene expression (e.g., using OSCCIGSS). Gene expression and cancer stage combined predicted survival of OSCC patients better than stage alone.
As a brief background, although advances in surgical techniques and the use of adjuvant treatment modalities have led to some site-specific improvements in survival of patients with oral squamous cell carcinoma (OSCC), the overall prognosis for advanced stage disease has not improved significantly in the past two decades (1). One of the impediments to the effective management of OSCC patients is the limited ability to predict the natural history of individual lesions. Unfortunately, the current head and neck cancer staging system is inadequate for predicting survival outcomes, and there seems to be significant clinical and molecular heterogeneity within stages (2,3). However, to date, there are no molecular markers that are used clinically to stratify OSCC and other head and neck cancer patients. Recently, many studies have utilized high-throughput microarray technology in an attempt to identify the different genetic pathways involved in the carcinogenic process and to relate gene expression signatures to clinical outcomes (4-7). Gene expression profiling of OSCC would be most useful if it could add to the existing staging system to predict clinical outcomes more accurately, yet no studies to date have addressed this question. As described above, 131 probe sets were identified (corresponding to 108 known genes) which were differentially expressed between OSCC and normal oral mucosa (8).
In this Example, hierarchical clustering and principal component analyses of OSCC, dysplasia and normal oral mucosa using these 131 probe sets revealed that oral dysplasias appeared to have varied expression patterns such that some clustered with OSCC and others with normal oral mucosae. According to non-limiting theory, data described in this Example indicated that there may be a spectrum of oral carcinogenesis on the basis of these 131 probe sets, and that OSCC that were least 'dysplasia-like' in gene expression were those that were further along in the carcinogenic process and, thus, were associated with poor survival rates.
MATERIALS AND METHODS:
Study population: As described in Chen et al., English-speaking patients 18 years of age or older were identified with a first, primary OSCC or dysplasia undergoing surgery or biopsy between December 16th, 2003 and April 17th, 2007 at one of the three University of Washington-affiliated hospitals: University of Washington Medical Center, Harborview Medical Center and the Puget Sound Veterans Affairs Health Care System (VA). Eligible controls were patients who were scheduled to undergo surgery of the oral cavity or oropharynx for non-cancer treatment, such as tonsillectomy or sleep apnea, at the aforementioned institutions during the same time period in which the cases were recruited. All patients recruited to the study were interviewed in person using a structured lifestyle and medical history questionnaire. Data regarding tumor characteristics, such as stage, were abstracted from medical records. Comorbidity scores were calculated using the Adult Comorbidity Evaluation-27 Test (9,10). Patients were followed actively through phone contact and passively through review of medical records and linkage to the U.S. Social Security Death Index. If a patient had died, the death was classified as due to OSCC or not due to OSCC based on review of medical records and death certificates. All participants gave informed consent, and all study procedures were approved by the Institutional Review Boards of the Fred Hutchinson Cancer Research Center, University of Washington, and the VA.
For 150 of the 187 OSCC patients who were recruited, Affymetrix U133 2.0 Plus array data that had passed quality control criteria and at least 4 months of follow-up time were available. Array results from 17 dysplasias (11 dysplasia patients and an additional six dysplastic lesions from five OSCC patients) were also included. Out of these five OSCC patients, only one invasive tumor tissue was included among the 150 OSCC cases and 35 normal oral mucosa from controls. All samples were collected, processed, and hybridized onto Affymetrix HG-U133 Plus 2.0 oligonucleotide arrays as described above and in Chen et al (8).
Generation of the 131-probe set list: The 131-probe set list was obtained by comparing the differential gene expression between 119 OSCC cases and 35 normal controls as described in the preceding Examples and in Chen et al (8).
Hierarchical Clustering and Principal Component Analysis: Supervised hierarchical clustering analysis and principal component analysis (PCA) of the expression data from the 119 OSCCs and 35 controls used to generate the 131-probe set list in the preceding Examples (e.g., Fig. 2A), plus an additional 17 dysplasias, were performed using GeneSpring GX Software v7.3.1 (Silicon Genetics, CA).
Differential Gene Expression Among OSCC: To identify sub-groups of the 119 OSCC cases based on differential gene expression values from GC Robust Multi-array Average (GCRMA), a regression-based approach was implemented in GenePlus software (11 ). For this comparison, the number of false discoveries (NFD) was used as the type I error selection criterion (12). Gene ontology and pathway analysis for the resultant list of genes were performed using Ingenuity Pathway Analysis software version 6.
Survival Analysis: Follow-up time for analyses of survival for the 119 OSCC cases was calculated from the date of surgery to the date of death, loss-to-follow-up, or April 30, 2007, whichever came first, according to the Kaplan-Meier method. Differences between groups were assessed with the log-rank test. OSCC-specific Kaplan-Meier survival estimates were not computed because of possible informative censoring due to death from other causes. Rather, OSCC-specific cumulative mortality was estimated using methods described by Kalbfleisch and Prentice, which account for competing risk events (13,14). A Cox proportional hazards regression model was used to estimate associations with cluster-defined OSCC sub-group status adjusted for age, sex, stage and co-morbidity score. Dummy variables were created for cluster-defined OSCC sub-group status, sex, stage, and co-morbidity scores. These statistical analyses were conducted using STATA software version 9.2.
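By way of non-limiting illustration, the Kaplan-Meier product-limit estimate referred to above can be sketched as follows. This is a minimal sketch and not the STATA procedure used in this Example; the function name `kaplan_meier` and the toy follow-up data are hypothetical, and subjects censored at a given time are assumed to remain at risk for deaths occurring at that same time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death occurred.

    times  -- follow-up time per subject (e.g., months from surgery)
    events -- 1 if the subject died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, not taken from the study.
toy_times = [5, 8, 8, 12, 20, 24]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

The OSCC-specific cumulative mortality described above requires a competing-risks estimator and is not covered by this sketch.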
Prediction model building for OSCC-specific mortality: For this analysis, a total of 150 OSCC cases were used: 119 cases which had been used to derive the 131 probe sets in the preceding Examples (e.g., Fig. 2A) (8), plus an additional 31 cases that were recruited thereafter for which vital status information and at least 4 months of follow-up were obtained. Stepwise Cox proportional hazards regression was used based on the 131 probe sets previously found (preceding Examples) to be differentially expressed between OSCC cases and controls (SAS version 9.2) (8). For the stepwise regression, the significance levels for both entry and exit were each set at α=0.01. To obtain the top 10 models, ten sequential stepwise regression procedures were conducted, with each successive procedure eliminating the selected probe set(s) from the previous procedure. Individual risk scores from the top probe set Cox regression model were compared graphically to risk scores from Cox models with terms for the first and second principal components (PC) from PCA of the 131 probe sets, using Matlab version R2006b.
Comparing survival prediction models with TNM stage: To assess whether a survival model which incorporates gene expression data is better than one without it, an adapted Receiver Operating Characteristics (ROC) analysis was used (15). Risk scores were calculated for 5 models. The first three models contained the terms: 'stage'; 'gene(s) from top prediction model'; and 'PCA' (principal component analysis) - a score representing the expression of the entire 131 probe sets as summarized by the combination of the first and second Principal Components (PCs). The other two models combined the term 'stage' with either of the other two terms. For each model, ROC curves were constructed for predicting two-year all-cause survival. At each level of the model-derived risk score, the nearest 10% (using Nearest Neighbor Estimation) was used to estimate true positive and false positive rates. The survival ROC package (http://faculty.washington.edu/heagerty/Software/SurvROC), available for R-project software, was used to implement these methods. The Area Under the Curve (AUC) was calculated to quantify the ability of each model to predict two-year survival. One thousand bootstrap samples were generated to estimate standard errors and 95% confidence intervals for AUC estimates, and to obtain p-values for testing the null hypothesis that specific gene expression values or PCA do not add to the ability of stage to predict survival.
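By way of non-limiting illustration, the comparison of AUCs between two risk models by bootstrap resampling can be sketched as follows. This is a simplified sketch and not the time-dependent, nearest-neighbor ROC method used in this Example: it assumes a binary outcome (died or alive at a fixed time point) with no censoring, and it reports a percentile interval rather than a p-value. The function names and toy scores are hypothetical.

```python
import random


def auc(scores, died):
    """Probability that a subject who died has a higher risk score than a
    survivor; tied scores count as one half."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(scores_a, scores_b, died, n_boot=1000, seed=0):
    """Return the observed AUC difference (model A minus model B) and a
    95% bootstrap percentile interval for that difference."""
    rng = random.Random(seed)
    n = len(died)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        d = [died[i] for i in idx]
        if all(d) or not any(d):
            continue  # resample lacks one of the two outcome classes
        a = auc([scores_a[i] for i in idx], d)
        b = auc([scores_b[i] for i in idx], d)
        diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    observed = auc(scores_a, died) - auc(scores_b, died)
    return observed, (lower, upper)


# Hypothetical toy data, not taken from the study.
toy_died = [1, 1, 1, 0, 0, 0, 0, 0]
toy_stage_only = [3, 2, 4, 2, 1, 3, 1, 2]
toy_stage_plus_gene = [3.5, 2.9, 4.2, 2.1, 1.0, 2.8, 1.2, 2.0]
toy_observed, toy_interval = bootstrap_auc_difference(
    toy_stage_plus_gene, toy_stage_only, toy_died
)
```

Resampling subjects jointly, so that both models are scored on the same bootstrap sample, preserves the pairing between the two risk scores.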
In order to reduce the over-optimism of ROC and AUC estimates due to using the same data both to estimate and assess the predictive ability of risk scores, a jackknife leave-one-out analysis (16) was performed. Parameter estimates for the risk model were obtained excluding one subject, and the resulting risk model was used to estimate a risk score based on the excluded subject's gene expression and/or stage characteristics. This process was repeated until risk scores were assigned to each subject. ROC and AUC estimates were calculated for these jackknife risk scores as they were for the original risk scores.
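By way of non-limiting illustration, the leave-one-out procedure described above can be sketched as follows. The `fit` and `score` callables are placeholders for whatever risk model is used; the least-squares slope shown here is a hypothetical stand-in chosen only to keep the sketch self-contained, and it is not the Cox model of this Example.

```python
def leave_one_out_scores(rows, fit, score):
    """For each subject, fit the model on all other subjects and use that
    model to score the held-out subject."""
    scores = []
    for i, held_out in enumerate(rows):
        training = rows[:i] + rows[i + 1:]
        model = fit(training)
        scores.append(score(model, held_out))
    return scores


# Stand-in model (an assumption): least-squares slope through the origin.
def fit_slope(training):
    numerator = sum(x * y for x, y in training)
    denominator = sum(x * x for x, _ in training)
    return numerator / denominator


def score_slope(slope, row):
    x, _ = row  # the held-out outcome is never used when scoring
    return slope * x


# Hypothetical toy data: (predictor, outcome) pairs.
toy_rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
toy_loo = leave_one_out_scores(toy_rows, fit_slope, score_slope)
```

The resulting held-out scores can then be passed to the same ROC and AUC calculations as the original risk scores.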
Validation of LAMC2, OSMR, SERPINE1, OASL, by qRT-PCR: qRT-PCR was performed to validate the expression of the four genes found to be related to survival in the top two models. Sixty samples were chosen at random for testing. Each sample was assayed in triplicate in 10 μl reaction volumes using the QuantiTect SYBR Green RT-PCR kit (Qiagen, Valencia, CA) and bioinformatically validated QuantiTect primers (Qiagen, Valencia, CA) on a 7900HT Sequence Detection System (ABI, Foster City, CA). The cycling conditions were as follows: 30 minute incubation at 50° C, 15 minute incubation at 95° C, and 40 cycles each of 15 seconds at 94° C, 30 seconds at 55° C, and 30 seconds at 72° C. The fragments amplified included: 1) for LAMC2 (NM_005562), a 74-bp amplicon spanning exons 18 and 19; 2) for OASL (NM_003733), a 98-bp amplicon spanning exons 4 and 5; 3) for OSMR (NM_003999), a 113-bp amplicon spanning exons 13 and 14; 4) for SERPINE1 (NM_000602), a 105-bp amplicon spanning exons 3 and 4; and as the reference gene, ACTB, a 146-bp amplicon spanning exons 3 and 4. Ten-point standard curves were generated using Universal Human Reference RNA (Stratagene, La Jolla, CA) for all genes. The linear correlation coefficient (R²) was 0.99 or greater for all runs. The mean threshold cycle (Ct) values were calculated from the triplicate Ct values. Samples that had Ct values with standard deviation greater than 0.35 in their triplicate run were repeated. Mean Ct values were standardized to the mean Ct value of ACTB.
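By way of non-limiting illustration, the triplicate Ct handling described above can be sketched as follows. Two points are assumptions not stated in the text: the standard deviation is taken to be the sample standard deviation, and "standardized to" ACTB is taken to mean subtraction of the mean ACTB Ct (a delta-Ct value). The function names and Ct values are hypothetical.

```python
from statistics import mean, stdev

SD_REPEAT_THRESHOLD = 0.35  # from the text: triplicates above this SD were repeated


def summarize_triplicate(ct_values):
    """Return (mean Ct, needs_repeat) for one sample's triplicate wells."""
    return mean(ct_values), stdev(ct_values) > SD_REPEAT_THRESHOLD


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference (ACTB) Ct, or None when either
    triplicate is too variable and the run should be repeated."""
    target_mean, target_repeat = summarize_triplicate(target_cts)
    reference_mean, reference_repeat = summarize_triplicate(reference_cts)
    if target_repeat or reference_repeat:
        return None
    return target_mean - reference_mean


# Hypothetical Ct values, not taken from the study.
toy_actb = [17.0, 17.1, 16.9]
toy_good = delta_ct([24.1, 24.3, 24.2], toy_actb)
toy_noisy = delta_ct([24.0, 25.0, 24.2], toy_actb)
```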
RESULTS:
Study population: The characteristics of the study participants are shown in Table 1. In general, the OSCC cases tended to be older and male, and they were more likely to be current smokers when compared to controls. The majority of the OSCC cases had advanced stage disease (approximately two thirds with AJCC stage III and IV).
Table 1. Clinical Characteristics of Study Participants
Cases (n=150) Controls (n=35) Dysplasias (n=16)
n % n % n %
Age (years)
19-39 6 4 14 40 0 0
40-49 23 15.3 8 22.9 2 12.5
50-59 51 34 4 11.4 5 31.3
60-88 70 46.7 9 25.7 9 56.3
Gender
Male 109 72.7 25 71.4 10 62.5
Female 41 27.3 10 28.6 6 37.5
Race
White 133 88.7 24 68.6 15 93.8
Nonwhite 11 7.3 11 31.4 1 6.3
Unknown 6 4 0 0 0 0
Smoking
Never/former 77 51.3 25 71.4 8 50
Current 73 48.7 10 28.6 8 50
Alcohol use
Never/former 52 34.7 9 25.7 9 56.3
Current 96 64 26 74.3 7 43.8
Unknown 2 1.3 0 0 0 0
AJCC Stage
I 38 25.3 0 0 N/A N/A
II 16 10.7 0 0 N/A N/A
III 18 12 0 0 N/A N/A
IV 78 52 0 0 N/A N/A
Site
Oral 103 68.7 1 2.9 14 87.5
Oropharynx 47 31.3 34 97.1 2 12.5
Unknown 0 0 0 0 0 0
Tumor size
T1/T2 101 67.3 0 0 N/A N/A
T3/T4 49 32.7 0 0 N/A N/A
Nodal status
N0 71 47.3 N/A 0 N/A N/A
N1 79 52.7 N/A 0 N/A N/A
* One patient contributed to two separate samples.
Hierarchical Cluster and Principal Component Analysis: Results from a supervised hierarchical cluster analysis of the 119 OSCC cases, 35 normal controls and 17 dysplastic lesions using the 131 probe sets are shown in Figure 10. Although OSCC cases largely clustered separately from controls, 7 OSCC cases clustered with the controls. One cluster of genes (cluster 1, Figure 10) appeared to show an increasing gradient of down-regulation progressing from normal to dysplastic to invasive lesions. Notably, neither the dysplasias nor those OSCC that misclassified with the normal controls demonstrated consistent down-regulation of these genes. In particular, all members of this group of 12 probe sets (corresponding to nine genes) were completely down-regulated in a subset of 45 OSCC (cluster 1, Figure 10 and Table S2). It therefore appeared according to non-limiting theory that the gene expression signature of this group of 45 OSCC represented one end of a continuum of gene expression that was characteristic of increasingly aggressive neoplastic behavior.
Figure 11 shows the results of a principal component analysis (PCA) on the 131 probe set expression data based on the samples' phenotype (normal, dysplasia or cancer). The first principal component (PC), which accounted for the greatest amount of variability, captured 60.26% of the variance, whereas the second PC captured 6.31 %. On the basis of these two components alone, the controls and OSCC cases were at opposite ends of the spectrum with dysplasia samples in between (Fig. 11 ). In addition, the same group of 45 OSCC samples identified in the hierarchical cluster analysis was at one extreme on the basis of the first PC scores (Fig. 11 ). Although some dysplastic lesions had first PC scores that overlapped with OSCC, none reached the first PC scores of the group of 45 OSCC samples.
Differential Expression of the 45 sample sub-cluster: This cluster-defined OSCC subgroup was initially identified largely based on a qualitative analysis of the expression of a group of 12 down-regulated probe sets (Fig. 10, cluster 1). A linear regression model was used to more rigorously determine which probe sets were differentially expressed in this sub-cluster, compared to the rest of the OSCC cases. After adjusting for age and sex, 62 out of the 131 probe sets were detectably shown to be differentially expressed between these two groups (NFD = 1) (Table 2). Therefore, although the 12 down-regulated probe sets each represented a readily apparent change in expression in these 45 samples, nearly one-half of the 131 probe sets showed a distinctive signature in this sub-cluster.
Table 2. Differentially expressed probe sets in group of 45 OSCC sub-cluster out of the 131 probe set list (NFD=1).
[Table 2 appears as images in the original publication: imgf000074_0001, imgf000075_0001, imgf000076_0001, imgf000077_0001.]
11 probe sets from "cluster 1" in Figure 10 are in bold. One probe set, TCP11L2, t-complex 11 (mouse) like 2 (1553861_at), out of the 12 probe sets in "cluster 1" was not down-regulated with statistical significance.
Survival Analysis: The patient characteristics for this sub-cluster compared to those of the rest of the cases are shown in Table 3. Patients in the 45-sample sub-cluster had more advanced disease, as determined by both tumor size and nodal metastasis. The range of follow-up time for patients known to be alive at the end of the study was 10.7 to 38.7 months, with a median of 22.2 months.
Table 3. Characteristics of two cluster-defined OSCC subgroups
Group of 74 (n=74) Group of 45 (n=45)
n (%) n (%)
Age (years)
20-39 1 (1.4) 4 (8.9)
40-49 10 (13.5) 8 (17.8)
50-59 28 (37.8) 12 (26.7)
60-88 35 (47.3) 21 (46.7)
Gender
Male 55 (74.3) 29 (64.4)
Female 19 (25.7) 16 (35.6)
AJCC Stage
I 21 (28.4) 7 (15.6)
II 11 (14.9) 3 (6.7)
III 9 (12.2) 7 (15.6)
IV 33 (44.6) 28 (62.2)
Tumor Size
T1/T2 56 (75.7) 22 (48.9)
T3/T4 18 (24.3) 23 (51.1)
Nodal Status
N0 40 (54.1) 15 (33.3)
N1 34 (45.9) 30 (66.7)
Vital Status
Alive 59 (79.7) 21 (46.7)
Dead-OSCC 9 (12.2) 17 (37.8)
Dead-non OSCC 6 (8.1) 3 (6.7)
Dead-unknown cause 0 (0) 4 (8.9)
To test the hypothesis that this 45 sample sub-cluster had a more aggressive phenotype, Kaplan-Meier survival curves for overall survival were compared (Figure 12A). The 3-year mean (% ± SE) overall survival for the 45 sample sub-cluster was 38.7 ± 0.09% compared to 69.1 ± 0.08% (p=0.0001) for the other 74 samples. The estimated cumulative mortality (% ± SE) due to OSCC at 3 years for the 45 sample sub-cluster was 45.7 ± 0.09% compared to 16.8 ± 0.06% (p=0.0003) for the other 74 samples (Figure 12B). Hazard ratios (HR) were estimated for both overall and OSCC-specific mortality, adjusting for AJCC stage, age and sex. Patients from the 45 sample sub-cluster had a significantly higher rate of death, both overall (HR=3.31, 95% CI: 1.66, 6.58) and due to OSCC (HR=5.43, 95% CI: 2.32, 12.73). HR were then adjusted for tumor size, nodal status and tumor site (oral vs. oropharyngeal) separately and the results did not change appreciably. Although a higher co-morbidity score was statistically significantly associated with both mortality outcomes, it neither confounded nor improved the precision of the association with OSCC sub-cluster when included in the Cox regression model.
Prediction models: A stepwise Cox proportional hazards regression was performed based on these 131 probe sets to determine which, if any, were associated with OSCC-specific survival. Out of 150 patients, 109 were alive and 41 had died at the end of the follow-up period (Table 1). Among these, there were 27 OSCC-specific deaths and 14 deaths of either unknown or non-OSCC-specific causes. A model containing LAMC2 (laminin, gamma 2) was preferred for identifying patients with the worst OSCC-specific survival. The subsequent nine models that were identified through the stepwise approach are shown in Table 4.
Table 4. 10 preferred multivariate Cox regression models of OSCC-specific survival
Model Gene Symbol (Affymetrix Probe set ID) Model coefficients
1 LAMC2 (207517_at) 0.59151*LAMC2
2 OSMR (1554008_at) SERPINE1 (1568765_at) OASL (210797_at) 0.42485*OSMR + 0.40482*SERPINE1 0.33483*OASL
3 SLC16A1 (209900_s_at) 0.81478*SLC16A1
4 KLF7 (1555420_at) 0.60694*KLF7
5 THBS1 (201108_s_at) SLC16A1 (202235_at) 0.44241*THBS1 + 0.43257*SLC16A1
6 HOMER3 (204647_at) 0.66632*HOMER3
7 GRP68 (229055_at) 0.63313*GRP68
8 PDPN (204879_at) 0.51904*PDPN
9 ANKRD35 (231118_at) 0.58503*ANKRD35
10 CDH3 (203256_at) EPS8L1 (218779_x_at) 0.75146*CDH3 - 0.50956*EPS8L1
The 3-D plot (Figure 14) shows that the risk scores from the preferred model (0.59151*LAMC2) were highly correlated with the risk scores from models containing terms for the first and second PCs from the analysis of the 131 probe sets. In addition, those patients with the highest risk scores from either the preferred model or the PC models were mostly the ones in the cluster-defined group of 45 patients.
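By way of non-limiting illustration, a risk score of the kind listed in Table 4 is a weighted sum of probe set expression values. The sketch below uses the coefficients of models 1 and 10 as they appear in Table 4; the expression values, their scale, and the function name `risk_score` are hypothetical, and any centering or scaling applied to expression values before model fitting is not specified here.

```python
# Coefficients as listed in Table 4 for models 1 and 10.
MODELS = {
    "model_1": {"LAMC2": 0.59151},
    "model_10": {"CDH3": 0.75146, "EPS8L1": -0.50956},
}


def risk_score(coefficients, expression):
    """Linear predictor: sum of coefficient * expression over the model's genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


# Hypothetical expression values for one sample (assumed log-scale intensities).
toy_expression = {"LAMC2": 10.0, "CDH3": 8.0, "EPS8L1": 6.0}
toy_score_1 = risk_score(MODELS["model_1"], toy_expression)
toy_score_10 = risk_score(MODELS["model_10"], toy_expression)
```

Under a Cox model, a higher linear predictor corresponds to a higher estimated hazard, so samples can be ranked by this score without estimating a baseline hazard.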
Comparing survival prediction models with AJCC stage: Results from the ROC analysis for each of the 5 models described above are shown in Figure 13. The AUCs for models with either gene expression alone or in combination with stage were higher than for a model with stage alone (Fig. 13C). The differences in the AUCs between models with 'stage' plus either 'LAMC2' or 'PCA' and stage alone were statistically significant (p=0.013 and 0.008, respectively). The AUCs from the jackknife leave-one-out analyses (0.81 for the model with 'stage' and 'LAMC2' and 0.79 for the model with 'stage' and 'PCA') were virtually the same as those estimated using conventional methods.
Validation of LAMC2, OSMR, SERPINE1, OASL, by qRT-PCR: The correlation coefficients for LAMC2, OSMR, SERPINE1 and OASL between microarray and qRT-PCR expression data for the 60 samples assayed were 0.65, 0.14, 0.74, and 0.89, respectively. Thus, with the exception of OSMR, qRT-PCR results were well-correlated with those from microarray analyses.
Discussion:
In this Example, from expression levels of 131 probe sets that were previously found to be highly associated with OSCC (8), it has been shown that OSCC can be further sub-classified on the basis of gene expression signatures. Moreover, this classification was independently associated with overall and OSCC-specific survival after adjustment for potential confounders such as age, sex and stage. None of the dysplastic lesions overlapped with the group of 45 OSCC cases on the basis of the first PC (Fig. 11), suggesting that this 45 sample sub-cluster represented a more invasive phenotype. From these findings it was investigated whether there was a trend of differential expression of these 131 probe sets in OSCC, such that the varying degrees of up- or down-regulation of some genes might be of prognostic significance. The observation that the score ('PCA') that summarized the expression levels of all 131 probe sets as a combination of the first and second PCs was significantly associated with OSCC-specific survival supports this hypothesis. Another important finding of this Example is that similar results were obtained between summary measures of all 131 probe sets and the preferred model containing only one gene. The risk scores from models with each of the first two principal components and the preferred, highest-scoring gene-specific model (0.59151*LAMC2) were highly correlated (Fig. 14). This result underscored the possibility of reducing the dimensionality of the data not only to the summary of principal components, but to one single probe set, without substantial loss of information. This efficiency was important because it will be easier to implement molecular tests for clinical use if fewer molecules need to be measured. Two previous studies by Chung et al. have shown an association between microarray-derived expression data and clinical outcomes in head and neck squamous cell carcinomas (HNSCC) (4,5).
In the first study, a 582 gene set from 60 HNSCC classified the tumors into 4 different subclasses with statistically significant differences in recurrence-free survival. In a separate study using formalin-fixed tissues, these authors identified a second 950 gene signature from unsupervised analysis and a 75-gene list from supervised PCA that was predictive of recurrence. Using the Unigene ID numbers for each gene, we determined that these two previous signatures have 39 genes in common.
In contrast, comparison of the presently disclosed 131 probe sets with the gene lists from the previous two studies revealed only one gene shared in all three lists: solute carrier family 16, member 1 (SLC16A1). This gene comprises the third-highest scoring model for OSCC-specific survival according to the data presently available (Table 4).
Pramana et al. tested 42 genes with known function out of the 75-gene list from Chung et al., and showed that these genes were predictive of locoregional control in their own data set (5,6). Only two genes overlapped between the presently disclosed 131 probe set list and these 42 genes: glycine-rich protein (GRP3S) (MACF1) and collagen type V, alpha 1 (COL5A1).
There are likely to be many reasons for the lack of more substantial overlap between the presently disclosed gene lists and those of Chung et al. and Pramana et al. For example, their 950- and 75-gene lists were derived from formalin-fixed samples and a different array platform (5). In addition, the samples in the studies of Chung et al were from multiple head and neck sites, whereas the samples used to generate the data in the present Example were limited to the oral cavity and oropharynx. The end points also differed between studies, since overall and OSCC-specific survival were analyzed in the present Example, while Chung et al. examined recurrence-free survival (5,8). The statistical approaches to derive these gene signatures were also substantially different (5,8,11 ). Given all these issues and without wishing to be bound by theory, it appears that overlapping genes in particular should be further investigated for their potential generalizability in predicting clinical outcomes.
Among the 131 probe sets used in the supervised cluster analysis described here, 62 probe sets were differentially expressed between the group of 45 patients and the remaining OSCC. Ingenuity Pathway Analysis of the 62 probe sets showed an overrepresentation of genes involved in cell migration; cell-to-cell signaling and interaction; and cellular growth and proliferation. In addition, five of the genes in the 10 highest-scoring models that are presented here as predictive of OSCC-specific mortality, namely LAMC2, SERPINE1, THBS1, PDPN and CDH3, play a role in cell motility and cell-to-cell signaling, implying that expression of genes involved in the process of invasion and metastasis is an important determinant of outcome in patients with these malignancies (17-23). Specifically, the proteins encoded by these genes reside in the extracellular matrix and are believed to function, for example, in angiogenesis, platelet aggregation and/or cell movement.
For instance, THBS1 and PDPN have both been ascribed a role in platelet aggregation and may be involved in tumor metastasis by facilitating tumor cell-platelet interactions and platelet-facilitated tumor cell metastasis (20). It is also known that THBS1 binds with members of the tenascin family and SPARC/osteonectin (20). In fact, tenascin C and SPARC were found to be significantly upregulated at both the gene expression and protein levels in this and other studies by our group (2,24). In addition, P-cadherin (CDH3), a component of the 10th model described here, is associated with cell-to-cell signaling, and the CDH3 gene has previously been shown to be significantly downregulated in metastatic tumor cells isolated from lymph nodes (25). The findings in this Example showing that the dysregulation of these genes' expression was associated with OSCC-specific survival were consistent with the non-limiting theory that tumor proliferation and metastasis may be mediated by complex interactions between extracellular matrix proteins and cell-surface receptors.
The functions of other genes in the 10 top-scoring models described in this Example are less well understood. OASL appears to be a member of a family of Trips (Thyroid hormone-interacting proteins) and may thus be involved in signal transduction in the presence of thyroid hormone (26). Oncostatin M receptor (OSMR) is a member of the IL6 cytokine family and is thought to be involved in signal transduction and proliferation (27,28).
This Example is believed to provide the first demonstration of an association between a gene signature and OSCC-specific survival, and in particular of the use of gene expression data to improve upon AJCC stage in predicting survival. As described above, regression models that combined stage with gene expression had significantly higher AUCs than stage alone (Figure 13). Given the recent emphasis on genome-wide gene expression studies to find signatures predictive of clinical outcomes, these and related embodiments will permit integration of meaningful genetic data into clinical practice.
References:
1. Carvalho AL, Nishimoto IN, Califano JA, Kowalski LP. Trends in incidence and prognosis for head and neck cancer in the United States: A site-specific analysis of the SEER database. Int J Cancer 2005;114:806-16.
2. Mendez E, Cheng C, Farwell DG et al. Transcriptional expression profiles of oral squamous cell carcinomas. Cancer 2002;95:1482-94.
3. Baatenburg de Jong RJ, Hermans J, Molenaar J, Briaire JJ, le Cessie S. Prediction of survival in patients with head and neck cancer. Head Neck 2001;23:718-24.
4. Chung CH, Parker JS, Karaca G et al. Molecular classification of head and neck squamous cell carcinomas using patterns of gene expression. Cancer Cell 2004;5:489-500.
5. Chung CH, Parker JS, Ely K et al. Gene Expression Profiles Identify Epithelial-to-Mesenchymal Transition and Activation of Nuclear Factor-{kappa}B Signaling as Characteristics of a High-risk Head and Neck Squamous Cell Carcinoma. Cancer Res 2006;66:8210-8.
6. Pramana J, van den Brekel MW, Van Velthuysen ML et al. Gene expression profiling to predict outcome after chemoradiation in head and neck cancer. Int J Radiat Oncol Biol Phys 2007;69:1544-52.
7. Ginos MA, Page GP, Michalowicz BS et al. Identification of a gene expression signature associated with recurrent disease in squamous cell carcinoma of the head and neck. Cancer Res 2004;64:55-63.
8. Chen C, Mendez E, Houck J et al. Gene expression profiling identifies genes predictive of oral squamous cell carcinoma. Cancer Epidemiology, Biomarkers and Prevention, In Press 2008.
9. Piccirillo JF. Importance of comorbidity in head and neck cancer. Laryngoscope 2000;110:593-602.
10. Piccirillo JF, Creech C, Zequeira R, Anderson S, Johnston AS. Inclusion of Comorbidity into Oncology Data Registries. Journal of Registry Management 1999;26:66-70.
11. Thomas JG, Olson JM, Tapscott SJ, Zhao LP. An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 2001;11:1227-36.
12. Xu XL, Olson JM, Zhao LP. A regression-based method to identify differentially expressed genes in microarray time course studies and its application in an inducible Huntington's disease transgenic model. Hum Mol Genet 2002;11:1977-85.
13. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: John Wiley and Sons, 1980.
14. Satagopan JM, Ben-Porat L, Berwick M et al. A note on competing risks in survival data analysis. Br J Cancer 2004;91:1229-35.
15. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000;56:337-44.
16. Wasson JH, Sox HC, Neff RK, Goldman L. Clinical prediction rules. Applications and methodological standards. N Engl J Med 1985;313:793-9.
17. Schenk S, Hintermann E, Bilban M et al. Binding to EGF receptor of a laminin-5 EGF-like fragment liberated during MMP-dependent mammary gland involution. J Cell Biol 2003;161:197-209.
18. Bajou K, Masson V, Gerard RD et al. The plasminogen activator inhibitor PAI-1 controls in vivo tumor vascularization by interaction with proteases, not vitronectin. Implications for antiangiogenic strategies. J Cell Biol 2001;152:777-84.
19. Pedersen H, Brunner N, Francis D et al. Prognostic impact of urokinase, urokinase receptor, and type 1 plasminogen activator inhibitor in squamous and large cell lung cancer tissue. Cancer Res 1994;54:4671-5.
20. Bornstein P. Diversity of function is inherent in matricellular proteins: an appraisal of thrombospondin 1. J Cell Biol 1995;130:503-6.
21. Yee KO, Streit M, Hawighorst T, Detmar M, Lawler J. Expression of the type-1 repeats of thrombospondin-1 inhibits tumor growth through activation of transforming growth factor-beta. Am J Pathol 2004;165:541-52.
22. Kato Y, Sasagawa I, Kaneko M et al. Aggrus: a diagnostic marker that distinguishes seminoma from embryonal carcinoma in testicular germ cell tumors. Oncogene 2004;23:8552-6.
23. Sandler MA, Zhang JN, Westerhausen DR Jr, Billadello JJ. A novel protein interacts with the major transforming growth factor-beta responsive element in the plasminogen activator inhibitor type-1 gene. J Biol Chem 1994;269:21500-4.
24. Choi P, Jordan CD, Mendez E et al. Examination of oral cancer biomarkers by tissue microarray analysis. Arch Otolaryngol Head Neck Surg 2008;134:539-46.
25. Mendez E, Fan W, Choi P et al. Tumor-specific genetic expression profiles of metastatic oral squamous cell carcinoma. Head Neck 2007;29:803-14.
26. Lee JW, Choi HS, Gyuris J, Brent R, Moore DD. Two classes of proteins dependent on either the presence or absence of thyroid hormone for interaction with the thyroid hormone receptor. Mol Endocrinol 1995;9:243-54.
27. Mosley B, De Imus C, Friend D et al. Dual oncostatin M (OSM) receptors. Cloning and characterization of an alternative signaling subunit conferring OSM-specific receptor activation. J Biol Chem 1996;271:32635-43.
28. Gearing DP, Comeau MR, Friend DJ et al. The IL-6 signal transducer, gp130: an oncostatin M receptor and affinity converter for the LIF receptor. Science 1992;255:1434-7.

Claims

CLAIMS What is claimed is:
1. A method for identifying a risk for having, or presence of, oral squamous cell carcinoma (OSCC) in a subject, the method comprising:
(a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; and
(b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells; wherein differential expression of the SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC.
2. The method of claim 1 wherein the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells.
3. The method of claim 1 or claim 2 wherein the SCCIGS is one or more SCCIGS selected from the group consisting of:
(a) the SCCIGS consisting of a LAMC2 gene,
(b) the SCCIGS consisting of LAMC2 and COL4A1 genes, and
(c) the SCCIGS consisting of COL1A1 and PADI1 genes.
4. The method of claim 1 or claim 2 wherein the SCCIGS is one or more SCCIGS selected from the group consisting of:
(a) the SCCIGS consisting of a LAMC2 gene,
(b) the SCCIGS consisting of LAMC2 and COL4A1 genes,
(c) the SCCIGS consisting of COL1A1 and PADI1 genes,
(d) the SCCIGS consisting of a C21orf81 gene,
(e) the SCCIGS consisting of KRT17 and PRSS3 genes,
(f) the SCCIGS consisting of COL1A2 and EST 230740_1at genes,
(g) the SCCIGS consisting of COL1A1 and XLKD1 genes,
(h) the SCCIGS consisting of THY1, FLJ22671 and HAS3 genes,
(i) the SCCIGS consisting of POSTN and TIA2(PDPN) genes,
(j) the SCCIGS consisting of MGC40368(TCP11L2), GIP3(IFI6) and COL27A1 genes,
(k) the SCCIGS consisting of CDH3 and ELOVL6 genes,
(l) the SCCIGS consisting of the COL4A1 gene, and
(m) the SCCIGS consisting of genes identified by 131 probe sets as set forth in Figure 2A.
5. The method of any one of claims 1-4 wherein the step of determining a SCCIGS expression level comprises:
(a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of:
(i) all or a SCCIG-characteristic portion of a SCCIG transcript,
(ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and
(iii) a nucleic acid amplification product of one or more of (i) and (ii); and
(b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
6. The method of claim 5 wherein the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of SEQ ID NOS:201-331.
7. The method of claim 5 wherein the biological sample comprises a biopsy tissue.
8. The method of claim 7 wherein the biopsy tissue is selected from the group consisting of an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue.
9. The method of claim 5 wherein the biological sample comprises one or a plurality of dysplastic cells.
10. A method for identifying a risk for having, or presence of, oral squamous cell carcinoma (OSCC) in a subject having oral epithelial dysplasia but no frank OSCC, the method comprising:
(a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one dysplastic oral epithelial cell; and
(b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of OSCC cells; wherein substantial similarity of the SCCIGS expression level in the biological sample relative to the OSCC reference SCCIGS expression levels indicates the subject has, or is at risk for having, OSCC.
11. The method of claim 10 wherein the squamous cell carcinoma indicator gene set (SCCIGS) comprises any one or more of the genes shown in Figure 6.
12. The method of claim 10 wherein the step of determining a SCCIGS expression level comprises:
(a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of:
(i) all or a SCCIG-characteristic portion of a SCCIG transcript,
(ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and
(iii) a nucleic acid amplification product of one or more of (i) and (ii); and
(b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
13. The method of claim 12 wherein the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of the probes listed in Figure 9.
14. The method of claim 10 wherein the biological sample comprises a biopsy tissue.
15. The method of claim 10 wherein the subject has no detectable cancer and the biological sample comprises one or a plurality of dysplastic cells.
16. A method for identifying a risk for having, or presence of, a squamous cell carcinoma (SCC) in a subject, wherein the SCC is selected from oral SCC (OSCC) and head-and-neck SCC (HNSCC), the method comprising:
(a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin; and
(b) comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells, wherein: if the biological sample comprises an OSCC cell then the control tissue comprises normal oral epithelium, and if the first biological sample comprises a HNSCC cell then the control tissue comprises normal epithelium from oropharynx, hypopharynx, larynx or oral cavity; and wherein differential expression of the SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC.
17. The method of claim 16, wherein the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells.
18. The method of claim 15 or claim 16 wherein the SCCIGS is one or more SCCIGS selected from the group consisting of:
(a) the SCCIGS consisting of a LAMC2 gene,
(b) the SCCIGS consisting of LAMC2 and COL4A1 genes, and
(c) the SCCIGS consisting of COL1A1 and PADI1 genes.
19. The method of claim 15 or claim 16 wherein the SCCIGS is one or more SCCIGS selected from the group consisting of:
(a) the SCCIGS consisting of a LAMC2 gene,
(b) the SCCIGS consisting of LAMC2 and COL4A1 genes,
(c) the SCCIGS consisting of COL1A1 and PADI1 genes,
(d) the SCCIGS consisting of a C21orf81 gene,
(e) the SCCIGS consisting of KRT17 and PRSS3 genes,
(f) the SCCIGS consisting of COL1A2 and EST 230740_1at genes,
(g) the SCCIGS consisting of COL1A1 and XLKD1 genes,
(h) the SCCIGS consisting of THY1, FLJ22671 and HAS3 genes,
(i) the SCCIGS consisting of POSTN and TIA2(PDPN) genes,
(j) the SCCIGS consisting of MGC40368(TCP11L2), GIP3(IFI6) and COL27A1,
(k) the SCCIGS consisting of CDH3 and ELOVL6 genes,
(l) the SCCIGS consisting of the COL4A1 gene and
(m) the SCCIGS consisting of genes identified by 131 probe sets as set forth in Figure 2A.
20. The method of any one of claims 16-19 wherein the biological sample comprises an OSCC cell and the control tissue comprises normal oral epithelium.
21. The method of any one of claims 16-19 wherein the biological sample comprises a HNSCC cell and the control tissue comprises normal epithelium from oropharynx, hypopharynx, larynx or oral cavity.
22. The method of any one of claims 16-19 wherein the step of determining a SCCIGS expression level comprises:
(a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of:
(i) all or a SCCIG-characteristic portion of a SCCIG transcript,
(ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and
(iii) a nucleic acid amplification product of one or more of (i) and (ii); and
(b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
23. The method of claim 22 wherein the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of SEQ ID NOS:201-331.
24. The method of claim 22 wherein the biological sample comprises a biopsy tissue.
25. The method of claim 24 wherein the biopsy tissue is selected from the group consisting of an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue.
26. The method of claim 22 wherein the biological sample comprises one or a plurality of dysplastic cells.
27. A method for identifying an increased risk of oral squamous cell carcinoma (OSCC)-specific mortality in a subject having OSCC, the method comprising:
(a) determining a squamous cell carcinoma indicator gene set (SCCIGS) expression level in a biological sample from the subject that comprises at least one OSCC cell or at least one cell from an OSCC surgical margin;
(b) determining that the subject has, or is at risk for having, OSCC by comparing the SCCIGS expression level of (a) to a reference SCCIGS expression level that is characteristic of epithelial cells from a control tissue that comprises normal oral epithelium known to be free of squamous cell carcinoma cells, wherein a differentially expressed SCCIGS in the biological sample relative to the control tissue indicates the subject has, or is at risk for having, OSCC; and
(c) identifying within said differentially expressed SCCIGS a presence or absence of a substantially up- or down-regulated SCCIGS subset (SCCIGSS), wherein presence of the substantially up- or down-regulated SCCIGSS indicates the subject has an increased risk of OSCC-specific mortality.
28. The method of claim 27 wherein the SCCIGS comprises one or more SCC biomarker genes selected from SEQ ID NOS:1-200, or variants thereof that are differentially expressed in a squamous cell carcinoma cell as compared to a control tissue that comprises normal oral epithelium known to be free of squamous carcinoma cells.
29. The method of claim 27 or claim 28 wherein the SCCIGSS is one or more SCCIGSS selected from the group consisting of:
(a) the SCCIGS consisting of a LAMC2 gene,
(b) the SCCIGSS consisting of a LAMC2 gene,
(c) the SCCIGSS consisting of OSMR, SERPINE1 and OASL genes,
(d) the SCCIGSS consisting of a SLC16A1 gene,
(e) the SCCIGSS consisting of a KLF7 gene,
(f) the SCCIGSS consisting of THBS1 and SLC16A1 genes,
(g) the SCCIGSS consisting of a HOMER3 gene,
(h) the SCCIGSS consisting of a GRP68 gene,
(i) the SCCIGSS consisting of a PDPN gene,
(j) the SCCIGSS consisting of an ANKRD35 gene, and
(k) the SCCIGSS consisting of CDH3 and EPS8L1 genes.
30. The method of claim 27 or claim 28 wherein (i) the SCCIGS is one or more SCCIGS selected from the group consisting of:
(a) the SCCIGS consisting of a LAMC2 gene,
(b) the SCCIGS consisting of LAMC2 and COL4A1 genes,
(c) the SCCIGS consisting of COL1A1 and PADI1 genes,
(d) the SCCIGS consisting of a C21orf81 gene,
(e) the SCCIGS consisting of KRT17 and PRSS3 genes,
(f) the SCCIGS consisting of COL1A2 and EST 230740_1at genes,
(g) the SCCIGS consisting of COL1A1 and XLKD1 genes,
(h) the SCCIGS consisting of THY1, FLJ22671 and HAS3 genes,
(i) the SCCIGS consisting of POSTN and TIA2(PDPN) genes,
(j) the SCCIGS consisting of MGC40368(TCP11L2), GIP3(IFI6) and COL27A1 genes,
(k) the SCCIGS consisting of CDH3 and ELOVL6 genes,
(l) the SCCIGS consisting of the COL4A1 gene, and
(m) the SCCIGS consisting of genes identified by 131 probe sets as set forth in Figure 2A, and
wherein (ii) the SCCIGSS consists of one or more genes identified by a probe set as set forth in Table 2.
31. The method of any one of claims 27-30 wherein at least one of the steps selected from the step of determining a SCCIGS expression level and the step of identifying a SCCIGSS comprises:
(a) specifically hybridizing a detectable, squamous cell carcinoma indicator gene (SCCIG)-specific oligonucleotide probe to one or more of:
(i) all or a SCCIG-characteristic portion of a SCCIG transcript,
(ii) a polynucleotide having a nucleotide sequence that is fully complementary to (i), and
(iii) a nucleic acid amplification product of one or more of (i) and (ii); and
(b) detecting the SCCIG-specific probe, and thereby determining the SCCIGS expression level.
32. The method of claim 31 wherein the SCCIG-specific oligonucleotide probe has a nucleotide sequence that is selected from the group consisting of SEQ ID NOS:201-331.
33. The method of claim 31 wherein the biological sample comprises a biopsy tissue.
34. The method of claim 33 wherein the biopsy tissue is selected from the group consisting of an excised tumor, a tumor-positive margin tissue, a tumor-negative margin tissue and a close margin tissue.
35. The method of claim 31 wherein the biological sample comprises one or a plurality of dysplastic cells.
36. The method of any one of claims 1-4, 10, 16, 17, 27 and 28 wherein determining one or a plurality of SCCIGS expression levels comprises measuring one or more protein levels in the biological sample.
37. The method of claim 36 wherein the biological sample comprises a biological fluid.
38. The method of claim 37 wherein the biological fluid is selected from the group consisting of saliva, blood, serum, plasma and lymph.
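Illustrative note (not part of the claims): the comparison recited in step (b) of claims 1, 16 and 27, and the subset identification of step (c) of claim 27, can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration only: the function names, the use of a log2 fold change as the measure of differential expression, the cutoff of 1.0, and the toy expression values, none of which come from the application.

```python
import math

# Hypothetical cutoff: a gene counts as differentially expressed when its
# expression differs from the reference by at least 2-fold (|log2 FC| >= 1).
# The application does not state this threshold; it is an assumption.
MIN_ABS_LOG2_FC = 1.0


def log2_fold_changes(sample, reference):
    """Return log2(sample / reference) for every gene present in both profiles."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference
    }


def is_differentially_expressed(sample, reference, cutoff=MIN_ABS_LOG2_FC):
    """True if at least one gene of the set differs from the reference by the cutoff."""
    changes = log2_fold_changes(sample, reference)
    return any(abs(value) >= cutoff for value in changes.values())


def regulated_subset(sample, reference, cutoff=MIN_ABS_LOG2_FC):
    """Sorted names of the genes that are up- or down-regulated past the cutoff."""
    changes = log2_fold_changes(sample, reference)
    return sorted(gene for gene, value in changes.items() if abs(value) >= cutoff)


# Toy values in arbitrary units, invented for this sketch (not measured data).
normal_reference = {"LAMC2": 100.0, "COL4A1": 200.0}
biopsy_sample = {"LAMC2": 800.0, "COL4A1": 210.0}

flagged = is_differentially_expressed(biopsy_sample, normal_reference)
subset = regulated_subset(biopsy_sample, normal_reference)
```

With these toy inputs, only LAMC2 exceeds the assumed 2-fold cutoff, so the sample would be flagged as differentially expressed relative to the normal reference. A real analysis would rely on normalized microarray or PCR measurements and a statistically validated classifier rather than a fixed fold-change threshold.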
PCT/US2009/051743 2009-02-13 2009-07-24 Gene expression profiling identifies genes predictive of oral squamous cell carcinoma and its prognosis WO2010093379A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15254109P 2009-02-13 2009-02-13
US61/152,541 2009-02-13

Publications (1)

Publication Number Publication Date
WO2010093379A1 (en) 2010-08-19

Family

ID=42562010

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/051743 WO2010093379A1 (en) 2009-02-13 2009-07-24 Gene expression profiling identifies genes predictive of oral squamous cell carcinoma and its prognosis

Country Status (1)

Country Link
WO (1) WO2010093379A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120082994A1 (en) * 2003-11-03 2012-04-05 Lars Dyrskjot Andersen Expression Levels of COL4A1 and other Markers Correlating with Progression or Non-Progression of Bladder Cancer
WO2015175858A1 (en) * 2014-05-16 2015-11-19 The Research Foundation For The State University Of New York Keratin 17 as a biomarker for head and neck cancers
CN105483246A (en) * 2015-12-29 2016-04-13 北京泱深生物信息技术有限公司 Application of differential expression of gene in oral cancer diagnosis
WO2017079571A1 (en) * 2015-11-05 2017-05-11 Arphion Diagnostics Process for the indentication of patients at risk for oscc
EP3060914A4 (en) * 2013-10-23 2017-05-17 Oregon Health & Science University Methods of determining breast cancer prognosis
WO2017217935A1 (en) * 2016-06-14 2017-12-21 Agency For Science, Technology And Research Use of mi r-198 in treating and diagnosing cutaneous squamous cell carcinoma
IT201800004137A1 (en) * 2018-03-30 2019-09-30 Domenico Marina Di IN VITRO SCREENING METHOD FOR EARLY DIAGNOSIS OF TUMORS OF THE ORAL CAVITY AND RELATED KIT, BASED IN PARTICULAR ON THE ELISA ASSAY
EP3666906A1 (en) * 2018-12-11 2020-06-17 Consejo Superior De Investigaciones Científicas Methods and kits for the prognosis of squamous cell carcinomas (scc)
CN111304325A (en) * 2020-02-22 2020-06-19 深圳大学 Oral cancer marker STOML1 gene expression and application thereof
WO2020181080A1 (en) * 2019-03-06 2020-09-10 Celldex Therapeutics, Inc. Biomarkers for treatment of cancer
CN111979324A (en) * 2020-08-28 2020-11-24 中国医科大学附属口腔医院 Fusobacterium nucleatum-associated oral epithelial cell tumor biomarker and screening application
CN114990223A (en) * 2022-07-13 2022-09-02 复旦大学附属中山医院 A biomarker for the diagnosis of oral squamous cell carcinoma and its application
CN116798520A (en) * 2023-06-28 2023-09-22 复旦大学附属肿瘤医院 Method for constructing squamous cell carcinoma tissue origin site protein marker prediction model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHEN ET AL.: "Gene expression profiling identifies genes predictive of oral squamous cell carcinoma", CANCER EPIDEMIOL. BIOMARKERS PREV., vol. 17, no. 8, 31 July 2008 (2008-07-31), pages 2152 - 2162, XP055091600, DOI: doi:10.1158/1055-9965.EPI-07-2893 *
FRANZ ET AL.: "A quantitative co-localization analysis of large unspliced tenascin-C(L) and laminin-5/gamma2-chain in basement membranes of oral squamous cell carcinoma by confocal laser scanning microscopy", HISTOCHEM. CELL BIOL., vol. 126, no. 1, 13 December 2005 (2005-12-13), pages 125 - 131 *
GASPARONI ET AL.: "Prognostic value of differential expression of Laminin-5 gamma2 in oral squamous cell carcinomas: correlation with survival", ONCOL. REP., vol. 18, no. 4, October 2007 (2007-10-01), pages 793 - 800 *
PATEL ET AL.: "Laminin-gamma2 overexpression in head-and-neck squamous cell carcinoma", INT. J. CANCER, vol. 99, no. 4, 1 June 2002 (2002-06-01), pages 583 - 588 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120082994A1 (en) * 2003-11-03 2012-04-05 Lars Dyrskjot Andersen Expression Levels of COL4A1 and other Markers Correlating with Progression or Non-Progression of Bladder Cancer
EP3060914A4 (en) * 2013-10-23 2017-05-17 Oregon Health & Science University Methods of determining breast cancer prognosis
WO2015175858A1 (en) * 2014-05-16 2015-11-19 The Research Foundation For The State University Of New York Keratin 17 as a biomarker for head and neck cancers
WO2017079571A1 (en) * 2015-11-05 2017-05-11 Arphion Diagnostics Process for the indentication of patients at risk for oscc
CN105483246A (en) * 2015-12-29 2016-04-13 北京泱深生物信息技术有限公司 Application of differential expression of gene in oral cancer diagnosis
CN105483246B (en) * 2015-12-29 2019-01-22 北京泱深生物信息技术有限公司 Application of the differential expression of gene in carcinoma of mouth diagnosis
WO2017217935A1 (en) * 2016-06-14 2017-12-21 Agency For Science, Technology And Research Use of mi r-198 in treating and diagnosing cutaneous squamous cell carcinoma
US11053496B2 (en) 2016-06-14 2021-07-06 Agency For Science, Technology And Research Consequences of a defective switch in cutaneous squamous cell carcinoma
WO2019186521A1 (en) * 2018-03-30 2019-10-03 Di Domenico Marina In vitro screening method and kit for early diagnosis of oral cavity tumours
IT201800004137A1 (en) * 2018-03-30 2019-09-30 Domenico Marina Di IN VITRO SCREENING METHOD FOR EARLY DIAGNOSIS OF TUMORS OF THE ORAL CAVITY AND RELATED KIT, BASED IN PARTICULAR ON THE ELISA ASSAY
EP3666906A1 (en) * 2018-12-11 2020-06-17 Consejo Superior De Investigaciones Científicas Methods and kits for the prognosis of squamous cell carcinomas (scc)
WO2020181080A1 (en) * 2019-03-06 2020-09-10 Celldex Therapeutics, Inc. Biomarkers for treatment of cancer
CN111304325A (en) * 2020-02-22 2020-06-19 深圳大学 Oral cancer marker STOML1 gene expression and application thereof
CN111979324A (en) * 2020-08-28 2020-11-24 中国医科大学附属口腔医院 Fusobacterium nucleatum-associated oral epithelial cell tumor biomarker and screening application
CN114990223A (en) * 2022-07-13 2022-09-02 复旦大学附属中山医院 A biomarker for the diagnosis of oral squamous cell carcinoma and its application
CN116798520A (en) * 2023-06-28 2023-09-22 复旦大学附属肿瘤医院 Method for constructing squamous cell carcinoma tissue origin site protein marker prediction model

Similar Documents

Publication Publication Date Title
WO2010093379A1 (en) Gene expression profiling identifies genes predictive of oral squamous cell carcinoma and its prognosis
Chen et al. Gene expression profiling identifies genes predictive of oral squamous cell carcinoma
US10457994B2 (en) 4-miRNA signature for predicting clear cell renal cell carcinoma metastasis and prognosis
JP2015521480A (en) Methods for head and neck cancer prognosis
EP2390370B1 (en) A method for predicting the response of a tumor in a patient suffering from or at risk of developing recurrent gynecologic cancer towards a chemotherapeutic agent
JP2008521412A (en) Lung cancer prognosis judging means
MX2008011839A (en) Propagation of primary cells.
CN119220676A (en) Prostate cancer detection kit or device and detection method
ES2504242T3 (en) Breast Cancer Prognosis
Liu et al. Development of a novel serum exosomal MicroRNA nomogram for the preoperative prediction of lymph node metastasis in esophageal squamous cell carcinoma
CN109468382B (en) Application of lncRNA in diagnosis and treatment of lung adenocarcinoma
Andreasen Molecular features of adenoid cystic carcinoma with an emphasis on microRNA expression.
Haaland et al. Differential gene expression in tumor adjacent histologically normal prostatic tissue indicates field cancerization
CN111269985B (en) Application of hsa _ circRNA6448-14 in diagnosis and prognosis prediction of esophageal squamous cell carcinoma
US20170002424A1 (en) Microrna signature as an indicator of the risk of early recurrence in patients with breast cancer
US10597728B2 (en) Methylation site regulating expression of mda-9/syntenin
CN103687963A (en) A method for determining the prognosis of hepatocellular carcinoma using a multigene signature associated with metastasis
CA2504403A1 (en) Prognostic for hematological malignancy
Belbin et al. Site-specific molecular signatures predict aggressive disease in HNSCC
BR112020012280A2 (en) compositions and methods for diagnosing lung cancers using gene expression profiles
KR20210144353A (en) Method for Predicting Colorectal Cancer Prognosis Based on Single Cell Transcriptome Analysis
Bayır et al. Differentially expressed genes related to lymph node metastasis in advanced laryngeal squamous cell cancers
CN111979315A (en) Application of cyclic TP63 as a diagnostic or therapeutic target for lung squamous cell carcinoma
EP2065473A1 (en) A method to assess prognosis and to predict therapeutic success in gynecologic cancer
WO2010003772A1 (en) Method for predicting adverse response to erythropoietin in breast cancer treatment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09840154

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09840154

Country of ref document: EP

Kind code of ref document: A1
