WO1999058720A1 - Quantitative methods, systems and apparatuses for gene expression analysis - Google Patents
- Publication number
- WO1999058720A1 (application PCT/US1999/010387)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gene expression
- relatedness
- gene
- genes
- quantifying
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- This invention relates to bioinformatic methods applicable to pharmaceutical drug development. More specifically, this invention relates to methods, systems and apparatuses for the quantitative analysis, comparison, storage, and visual display of gene expression profiles. The invention further relates to quantitative methods, systems, and apparatuses for the selection of informative subsets of genes for expression analysis.
- a specific drug target such as an enzyme in a known biochemical pathway
- one or more in vitro or in vivo assays specific to the chosen target must be developed. Only after the target is chosen and specific assays developed can chemical compounds be screened for the desired activity. Once compounds are identified that have the desired activity against the chosen target in the dedicated assays, these initial lead compounds serve as the structural predicates for developing derivatives with more favorable therapeutic, pharmacokinetic, and clinical properties. The bioactivity of these derivatives is often assessed using the same dedicated assays which identified the lead compound.
- selection of the drug target presupposes knowledge of the biological pathways that are clinically relevant to the disease or pathologic process for which the drug is intended.
- the chosen target may prove to be physiologically unsuitable.
- the target may, for example, be involved in a number of related or unrelated biological pathways.
- the dedicated in vitro assays may fail to identify effects of the candidate drug on these parallel or intersecting biological pathways.
- drugs that desirably affect the target's activity in vitro may prove unacceptably toxic or present undesirable side effects when administered in vivo.
- the in vitro assay methods may themselves prove to be insufficiently sensitive, insufficiently specific, or both.
- the use of the same assays in the development of derivatives of the lead compound may compound these problems.
- the elements of the matrix are selected to represent the totality of genes that can be expressed by the host from which the immobilized DNA matrix was prepared. Specific hybridization to various DNA elements in the matrix, as recorded, e.g., by scanning laser, scanning confocal fluorescence microscopy, or PhosphorImager, indicates expression of the respective gene. The identity of the respective gene is encoded in the spatial location of the element in the matrix. The data are acquired, digitized, and stored electronically. Taken together, the data identify the subset of genes expressed by the chosen cell culture. Ashby et al., U.S. Patent No. 5,549,588
- each element of the spatially-addressable matrix consists of one or more identical cells (or clones of cells), rather than of specific nucleic acid sequences.
- the cells at each matrix position contain a recombinant construct that directs expression of a common reporter gene from a distinct transcriptional regulatory element.
- the transcriptional regulatory element may be drawn from any number of potential eukaryotic or prokaryotic organisms. A sufficient number of matrix elements, and thus of transcriptional regulatory elements, is included to provide a representative sampling of the gene expression repertoire of the chosen organism.
- Ashby et al read the matrix directly by scanning with a detection device as appropriate to, and dictated by, the reporter.
- the reporter encodes a protein that generates a fluorescent signal, such as green fluorescent protein, and is thus scanned with a fluorescence detector; in another embodiment, the reporter encodes enzymes that produce signals detectable photometrically, and is scanned with a photometer. Signals, as recorded by the scanner, indicate expression operably controlled by the respective transcriptional regulatory element, the identity of which is encoded in the spatial location of the element in the matrix.
- Each of the above-described technologic platforms for generating gene expression profiles, herein collectively termed "expression matrices", generates a large amount of information about the concurrent expression of genes in a cell under defined conditions.
- Such a gene expression profile, in its totality, captures the global gene expression state of the cell under a chosen set of environmental conditions.
- Not all genes prove equally informative. Some may have an insufficient dynamic range in expression to provide significant information, no matter what the environmental condition. Other genes may vary in expression coordinately, or cooperatively, providing redundancy in the information collected.
- the present invention solves these and other problems in the art by providing methods, systems, and apparatuses for the quantitative analysis of gene expression profiles.
- the experimental examples demonstrate that such analyses allow one to quantify and to order the relatedness of various drug treatments, permitting the identification of chemical agents that act on the identical molecular target as that affected by a reference drug; permitting the identification of chemical agents that act elsewhere in the same physiologic pathway as that of the reference drug; clarifying the mechanism of action of the reference drug; and clarifying the mechanisms of action of the chemical agents compared to the reference drug, all without the prior identification of the reference drug's molecular target or the development of a dedicated assay.
- the analyses apply equally to comparison of other cellular phenotypes, including those caused by other environmental conditions and by genotypic perturbations, including mutations.
- the invention provides a method of quantifying the relatedness of a first and second gene expression profile.
- This first method comprises the steps of: (a) generating a first and second gene expression signal, respectively, for each gene commonly represented in the first and second gene expression profiles; (b) formulating a relative expression score for each pair of first and second gene expression signals; and then (c) calculating from these pair-wise formulated relative expression scores a composite score, the composite score quantifying the relatedness of the two gene expression profiles.
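The three steps of this first method can be sketched as follows. The passage above does not fix the formulas, so this sketch assumes, purely for illustration, that the relative expression score is the natural log of the ratio of the paired signals and that the composite score is the Euclidean norm of those scores. The function names are hypothetical.

```python
import math


def relative_scores(profile_a, profile_b):
    """Step (b): one relative expression score per commonly represented gene.

    Assumption (not from the text): score = ln(signal_a / signal_b), with all
    signals positive, e.g. after background correction.
    """
    common = profile_a.keys() & profile_b.keys()  # step (a): shared genes only
    return {g: math.log(profile_a[g] / profile_b[g]) for g in common}


def composite_score(scores):
    """Step (c): reduce the pair-wise scores to a single scalar.

    Assumption: Euclidean norm, so 0.0 means the shared signals are identical
    and larger values mean less related profiles.
    """
    return math.sqrt(sum(s * s for s in scores.values()))
```

Under these assumptions a smaller composite score indicates more closely related profiles; a different choice of score would change that interpretation.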
- the invention provides a second method of quantifying the relatedness of a first and second gene expression profile, the second method particularly well-suited for comparison of gene expression profiles obtained under mild conditions.
- This second method comprises the steps of: (a) generating a first and second gene expression signal, respectively, for each gene commonly represented in the first and second gene expression profiles; and then (b) performing a linear regression on the set of paired first and second gene expression signals for the commonly represented genes; wherein the correlation coefficient of such regression quantifies the relatedness of the two gene expression profiles.
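As a minimal sketch of this second method, the code below fits an ordinary least-squares line to the paired signals and returns the Pearson correlation coefficient of that fit. The profile representation (a dictionary from gene identifier to signal) and the function name are assumptions made for illustration; whether the signals are log-transformed first is left open here.

```python
import math


def regression_relatedness(profile_a, profile_b):
    """Fit b = slope * a + intercept over shared genes; return (slope, intercept, r)."""
    genes = sorted(profile_a.keys() & profile_b.keys())
    n = len(genes)
    if n < 2:
        raise ValueError("need at least two commonly represented genes")
    xs = [profile_a[g] for g in genes]
    ys = [profile_b[g] for g in genes]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("signals must vary within each profile")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient of the regression
    return slope, intercept, r
```

A coefficient near 1 would indicate closely related profiles, and a coefficient near 0 would indicate little linear relationship.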
- the invention provides a method of ordering the relatedness of a plurality of gene expression profiles to a single preselected gene expression profile, comprising the steps of: (a) quantifying pairwise the relatedness of each of the plurality of gene expression profiles to the preselected gene expression profile; and then (b) ordering the pairwise-measured quantities.
- the invention provides methods of quantifying the relatedness of a first and a second environmental condition upon a cell, comprising the steps of: (a) obtaining from the cell, or from genotypically identical cells, a gene expression profile under each of the first and second environmental conditions; and then (b) quantifying the relatedness of the first and second gene expression profile.
- the first and second environmental conditions each comprises exposure to a chemical compound, such as a pharmaceutical agent.
- the invention further provides methods for ordering the relatedness of a plurality of environmental conditions to a single preselected environmental condition upon a cell, comprising the steps of: (a) obtaining from the cell, or from genotypically identical cells, a gene expression profile for each of the plurality of environmental conditions and for the preselected environmental condition; (b) quantifying pairwise the relatedness of each of the plurality of gene expression profiles to the preselected gene expression profile; and then (c) ordering the pairwise-measured quantities.
- the environmental conditions comprise exposure to a chemical compound.
- the invention provides methods of quantifying the relatedness of a preselected environmental condition to a defined genetic mutation of a cell, comprising the steps of: (a) obtaining a first gene expression profile from a cell bearing the defined mutation and a second gene expression profile from a wild-type cell under the preselected environmental condition; and then (b) quantifying the relatedness of said first and second gene expression profile.
- the invention further provides methods of ordering the relatedness of each of a plurality of environmental conditions to a defined genetic mutation of a cell, comprising the steps of: (a) obtaining a set of first gene expression profiles from a wild type cell under each one of the plurality of environmental conditions and a second gene expression profile from a cell having the defined mutation; (b) quantifying pairwise the relatedness of each of the first gene expression profiles to the second gene expression profile; and then (c) ordering the pairwise-measured quantities.
- the environmental conditions comprise exposure to a chemical compound, and the pair-wise quantification is performed according to one of the two methods newly presented herein.
- the invention provides methods of quantifying the relatedness of a first genetic mutation of a cell to a second genetic mutation of a cell, comprising the steps of: (a) obtaining a first gene expression profile from a cell having the first genetic mutation and a second gene expression profile from a cell having the second genetic mutation; and (b) quantifying the relatedness of the first and second gene expression profile.
- the invention further provides methods of ordering the relatedness of each of a plurality of genetic mutations to a preselected genetic mutation of a cell, comprising the steps of: (a) obtaining a set of first gene expression profiles from cells each having one of the plurality of genetic mutations and a second gene expression profile from a cell having the preselected mutation; (b) quantifying pairwise the relatedness of each of the first gene expression profiles to the second gene expression profile; and (c) ordering the pairwise-measured quantities.
- the environmental condition includes exposure of the cell to a chemical compound
- the cell is a yeast cell, preferably Saccharomyces cerevisiae, and the gene expression profile is acquired from a genome reporter matrix.
- the methods may broadly be applied, however, to any environmental condition, prokaryotic as well as eukaryotic cells, including human cells, and to gene expression profiles obtained from other types of expression matrices.
- the present invention provides systems, including computer systems, for performing the aforementioned quantitative methods.
- the invention provides a system for quantifying the relatedness of a first and second gene expression profile, comprising: (a) means for generating a first and second gene expression signal, respectively, for each gene commonly represented in the first and second gene expression profiles; (b) means for formulating a relative expression score for each pair of first and second gene expression signals; and (c) means for calculating from the pair-wise relative expression scores a composite score, the composite score serving to quantify the relatedness of the two gene expression profiles.
- the invention provides a system for quantifying the relatedness of a first and second gene expression profile, comprising: (a) means for generating a first and second gene expression signal, respectively, for each gene commonly represented in the first and second gene expression profiles; (b) means for performing a linear regression on the set of paired first and second gene expression signals for the commonly represented genes; wherein the correlation coefficient of such regression quantifies the relatedness of the two gene expression profiles.
- the invention provides a system for ordering the relatedness of a plurality of gene expression profiles to a single preselected gene expression profile, comprising: (a) means for quantifying pairwise the relatedness of each of the plurality of gene expression profiles to the preselected gene expression profile; and (b) means for ordering the pairwise-measured quantities.
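- the invention also provides computer systems for quantifying the relatedness of a first and second gene expression profile, comprising a processor, such as a digital microprocessor, programmed to: (a) generate a first and second gene expression signal, respectively, for each gene commonly represented in the first and second gene expression profiles; (b) formulate a relative expression score for each pair of first and second gene expression signals; and (c) calculate from the pair-wise relative expression scores a composite score, wherein the composite score quantifies the relatedness of the two gene expression profiles.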
- the invention provides computer systems for quantifying the relatedness of a first and second gene expression profile, comprising a processor, such as a digital microprocessor, programmed to: (a) generate a first and second gene expression signal, respectively, for each gene commonly represented in the first and second gene expression profiles; (b) perform a linear regression on the set of paired first and second gene expression signals for the commonly represented genes; wherein the correlation coefficient of such regression quantifies the relatedness of the two gene expression profiles.
- the invention additionally provides computer systems for ordering the relatedness of a plurality of gene expression profiles to a single preselected gene expression profile, comprising a processor, such as a digital microprocessor, programmed to: (a) quantify pairwise the relatedness of each of the plurality of gene expression profiles to the preselected gene expression profile; and (b) order these pairwise-measured quantities.
- apparatuses comprising a programmable digital computer, with input means and display means, capable of performing the described computational methods on input expression data and reporting the quantitative results on the associated display means.
- the present invention provides computer readable storage media storing instructions that, when executed by a computer, cause the computer to perform each of the novel methods herein described, including methods for quantifying the relatedness of a first and second gene expression profile and methods for ordering the relatedness of a plurality of gene expression profiles to a single preselected gene expression profile.
- the invention provides computer readable storage media containing data structures adapted for the methods of the present invention.
- the invention provides a computer readable storage medium containing a data structure configured to store data that quantitatively relate a first and second gene expression profile, the data structure comprising an identifier for each of the expression profiles and a scalar, the scalar quantitatively relating the first to the second gene expression profile.
- the invention further provides computer readable storage media containing a data structure configured to store data that orders the relatedness of a plurality of gene expression profiles to a single preselected gene expression profile, comprising: (a) an ordered list of scalars, each scalar quantifying pairwise the relatedness of each of the plurality of gene expression profiles to the preselected gene expression profile; and (b) identifiers that associate each scalar with its respective gene expression profile.
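The two data structures just described (a pair of profile identifiers with a scalar, and an ordered list of scalars tied to profile identifiers) might be laid out as below. This is only an illustrative sketch: the class and field names are hypothetical, and sorting from most to least related assumes a score for which larger means more related.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PairwiseRelatedness:
    """Identifiers for two profiles plus the scalar relating them."""
    first_profile_id: str
    second_profile_id: str
    score: float


@dataclass
class RelatednessRanking:
    """Ordered scalars, each tied to the profile it compares with the reference."""
    reference_id: str
    entries: List[PairwiseRelatedness] = field(default_factory=list)

    def add(self, profile_id: str, score: float) -> None:
        self.entries.append(PairwiseRelatedness(self.reference_id, profile_id, score))
        # Assumption: higher score = more related, so keep descending order.
        self.entries.sort(key=lambda e: e.score, reverse=True)

    def ordered_ids(self) -> List[str]:
        return [e.second_profile_id for e in self.entries]
```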
- samples of drug candidates may be in limiting supply, particularly when produced in small quantity by combinatorial chemistries; there may simply be too little of the agent to permit the testing of its effects on all possible genes of a given cell type. It may also, or in the alternative, be too expensive to assay each candidate agent across each expressible gene of the cell.
- the present invention provides methods for selecting informative subsets of genes for expression analysis.
- the invention provides methods of cellular phenotyping, comprising selecting no more than 20% of a cell's expressible genes for expression analysis, wherein the concurrent expression of the selected genes sufficiently defines the cell's phenotype as to permit the cell's phenotype quantitatively to be related to the phenotype of another cell.
- no more than about 20% of the cell's potentially expressible genes are selected, more preferably no more than about 15% of the cell's potentially expressible genes, even more preferably no more than about 10% of the cell's potentially expressible genes, optimally no more than about 5% of the cell's potentially expressible genes, and in the most preferred embodiments, about 1% - 5%, and even 1 - 2% of the cell's potentially expressible genes.
- Algorithms for effecting such selection, and computers, systems, networks, and other devices for effecting the methods are also presented.
- the methods of this aspect of the invention comprise selecting, from each group of genes whose expression is correlated, the gene with greatest expressive range.
- the selection is made from the set of genes commonly represented in a plurality of gene expression profiles, and each of the ranges and each of the correlations is calculated from expression data in the plurality of gene expression profiles.
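One way to picture this selection is a greedy pass over genes sorted by expressive range, keeping a gene only if it is not strongly correlated with a gene already kept. This is a sketch under stated assumptions and does not reproduce the algorithm charted in FIGS. 9 and 10: the correlation threshold, the minimum-range cutoff, the greedy grouping, and all names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def select_informative_genes(expression, threshold=0.9, min_range=0.0):
    """expression maps gene -> list of signals, one per profile.

    Genes are visited from greatest to smallest range, so the first gene
    kept from any correlated group is the one with the greatest range.
    """
    def span(gene):
        return max(expression[gene]) - min(expression[gene])

    candidates = [g for g in expression if span(g) > min_range]
    candidates.sort(key=span, reverse=True)
    selected = []
    for gene in candidates:
        correlated = any(
            abs(pearson_r(expression[gene], expression[kept])) >= threshold
            for kept in selected
        )
        if not correlated:
            selected.append(gene)
    return selected
```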
- the invention provides a system for selecting an informative subset of genes for expression analysis, comprising: means for selecting, from each group of genes whose expression is correlated, the gene with greatest expressive range.
- the selection is made from the set of genes commonly represented in a plurality of gene expression profiles, and each of the ranges and each of the correlations is calculated from expression data in the plurality of gene expression profiles.
- the invention also provides a computer system for selecting an informative subset of genes for expression analysis, comprising a processor, such as a digital microprocessor, programmed to select, from each group of genes whose expression is correlated, the gene with greatest expressive range; a computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of selecting an informative subset of genes for expression analysis, the method comprising selecting, from each group of genes whose expression is correlated, the gene with greatest expressive range; and a computer readable storage medium containing a data structure configured to store data that identifies an informative subset of genes for expression analysis, the data structure comprising a set of gene identifiers, optionally including a description of gene function.
- FIG. 1 is a flow chart describing the process by which gene expression signals suitable for the quantitative analysis of gene expression profiles are derived from signals initially acquired from a gene expression matrix, with FIG. 1A schematizing initial signal processing and FIG. 1B describing an optional subsequent correction according to an environmentally-matched control;
- FIG. 2 is a scatter plot of gene expression signals, as processed according to FIG. 1, derived from genome reporter matrices treated individually with one of two chemotherapeutic agents known to be closely related in structure and function: 50 µg/ml daunorubicin and 50 µg/ml doxorubicin (see Example 2);
- FIG. 3 plots gene expression signals derived from matrices treated individually with one of two drugs of disparate structure and disparate function: 50 µg/ml doxorubicin and 0.08 µg/ml miconazole;
- FIG. 4 plots gene expression signals derived from matrices treated individually with one of two drugs of disparate structure but similar function: 9 µg/ml mycophenolic acid and 50 µg/ml daunorubicin;
- FIG. 5 is a flow chart describing a first process for reducing sets of individual gene expression signals, prepared according to the process schematized in FIG. 1, to values that may be used quantitatively to rank the relatedness of gene expression profiles;
- FIG. 6 is a flow chart describing a second process for reducing sets of individual gene expression signals, prepared according to the process schematized in FIG. 1, to values that may be used quantitatively to rank the relatedness of gene expression profiles;
- FIG. 7 is a scatter plot of gene expression signals as processed substantially according to FIG. 1, derived from genome reporter matrices comprising 1532 separate gene expression reporters, each matrix treated individually with one of two agents known to be closely related in structure and function: 10 µg/ml Lovastatin (X axis) and 20 µg/ml Mevastatin (Y axis);
- FIG. 8 is a scatter plot of gene expression signals from a 96 gene subset of the 1532 gene expression signals presented in FIG. 7, the subset selected according to the algorithm charted in FIGS. 9 and 10;
- FIG. 9 is a flow chart schematizing the first of two major steps in an algorithm for selecting informative subsets of genes for quantitative analysis of gene expression profiles;
- FIG. 10 schematizes two full iterations of the second of two major steps in an algorithm for selecting informative subsets of genes for quantitative analysis of gene expression profiles.
- the phrase "gene expression matrix" refers to a device for acquiring data on the concurrent expression of a plurality of genes, such as is described in Lashkari et al., Proc. Natl. Acad. Sci. USA, 94, pp. 13057-62 (1997); DeRisi et al., Science, 278, pp. 680-86 (1997); Wodicka et al., Nature Biotechnology, 15, pp. 1359-67 (1997); Pietu et al., Genome Research, 6, pp. 492-503 (1996); Ashby et al., U.S. Patent No. 5,549,588.
- "Genome reporter matrix" particularly refers to the gene expression matrices of Ashby et al.
- "gene expression profile" refers to a data set, however constructed, whether stored permanently or ephemerally, in an electronic medium or otherwise, each element of which set represents a measure of the concurrent expression of a distinct and identifiable open reading frame of a cell, typically as acquired from a gene expression matrix.
- the present invention provides a method of quantifying the relatedness of a first and second gene expression profile, comprising the steps of: (a) generating, for each gene commonly represented in the first and second gene expression profiles, a first and a second gene expression signal, respectively; (b) formulating a relative expression score for each pair of first and second gene expression signals; and then (c) calculating, from the pair-wise relative expression scores, a composite score, the composite score quantifying the relatedness of the two gene expression profiles.
- a second method of quantifying the relatedness of a first and second gene expression profile comprising: (a) generating, for each gene commonly represented in the first and second gene expression profiles, a first and a second gene expression signal; and then (b) performing a linear regression on the set of paired first and second gene expression signals for the commonly represented genes; wherein the correlation coefficient of such regression quantifies the relatedness of the two gene expression profiles.
- the present invention further provides a method of ordering the relatedness of a plurality of gene expression profiles to a single preselected gene expression profile, comprising the steps of: (a) quantifying pairwise the relatedness of each of the plurality of gene expression profiles to the preselected gene expression profile, using either of the two described methods, and then (b) ordering these pairwise-measured quantities.
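The ordering method can be sketched as a pairwise scoring pass followed by a sort. The example assumes the correlation coefficient as the relatedness measure, with higher values ranking first; the function names and the dictionary-based profile representation are hypothetical, and constant profiles are not handled.

```python
import math


def correlation(profile_a, profile_b):
    """Pearson correlation over genes present in both profiles."""
    genes = sorted(profile_a.keys() & profile_b.keys())
    xs = [profile_a[g] for g in genes]
    ys = [profile_b[g] for g in genes]
    n = len(genes)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def order_by_relatedness(reference, candidates, relatedness=correlation):
    """Step (a): score each candidate against the reference.
    Step (b): order the pairwise quantities, most related first."""
    scored = [(name, relatedness(reference, profile))
              for name, profile in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Passing a different `relatedness` function lets the same ordering step be reused with another pairwise measure, provided the sort direction is adjusted to match.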
- the pairwise quantification of relatedness is performed according to one of the two methods newly described herein.
- Each of these methods may be better understood through reference to the figures, as will now be described in further detail.
- FIG. 1 is a flow chart describing the process by which gene expression signals suitable for the quantitative analysis of gene expression profiles are derived from signals initially acquired from a gene expression matrix, with FIG. 1A schematizing initial signal processing and FIG. 1B describing an optional subsequent correction according to an environmentally-matched control.
- the initial data acquisition steps, delimited by box 116, may be performed serially, as shown, or may be performed concurrently; digitization 101 may be performed by the signal acquisition device itself, by a separate analog-to-digital converter, or may be obviated by acquiring expression data directly in digital form.
- Each of the subsequent data manipulation steps may be accomplished in a programmable digital computer using techniques well known in the computer science art. Some of the steps may alternatively be accomplished using analog circuitry well known in the art. The steps may be performed in a single computing device, a series of computing devices, or distributed in parallel across multiple computing devices, as long as the temporal order of steps is observed. The process may be carried out continuously, as shown, or discontinuously, with intermediate values stored, for example, at the identified steps for subsequent processing.
- the steps shown in FIGS. 1, 5, 6, 9 and 10 may be coded in any of the higher level languages well known in the art, including but not limited to FORTRAN, BASIC, Pascal, C, C+, C++, Java™, or the like; the results shown in the Figures and presented in the Examples herein were generated using digital computers programmed in C.
- the steps shown in FIGS. 1, 5, 6, 9 and 10 may be coded directly in assembly language.
- expression data are first acquired 100 as initial expression signals of a form and in a manner appropriate to the particular gene expression matrix; for the expression matrix of Ashby et al., for example, fluorescence data may be acquired by a scanning laser.
- the initial expression signals are acquired individually for each of the physical locations of the expression matrix, also termed matrix elements. These initial expression signals represent the level of expression of each of the genes individually assayed in the matrix under a selected environmental condition.
- Initial background signals will typically also be acquired, most often concurrently, from one or more control locations on the gene expression matrix.
- the nature of such background controls depends on the nature of the physical matrix.
- those matrices that measure hybridization of fluorescently labeled or radiolabeled nucleic acids may include, as such a control, a measurement from one or more locations on the matrix that contain no nucleic acid at all, or one or more locations on the matrix containing nucleic acid that is not complementary to a known ORF, or both.
- matrices that measure expression from recombinant reporters within transformed cells (see, e.g., Ashby et al.) may include, as such a control, measurements from one or more locations on the matrix that contain cells lacking the recombinant reporter construct; that contain cells including a recombinant construct unable to express the reporter gene; that contain cells comprising a reporter construct but lacking a necessary substrate; or the like.
- although background control elements will typically be included on each matrix, background measurements may also be acquired from distinct physical matrices, or even historically by reference to earlier stored data values from similar matrices. The choice of the type and number of such controls is well within the competence of the skilled artisan.
- the initial expression signals and initial background signals are then digitized 101 and stored electronically as initial signal values and initial background values, respectively.
- Any convenient tabular, matrix, or spreadsheet format may be used to store these data, which are collectively referred to as a gene expression profile.
- the data may be stored as volatile data, such as values in random access memory. Alternatively, the data may be stored more permanently on magnetic, optical, or magnetooptical storage media, or the like.
- the initial signal value for each distinct element of the expression matrix is separately and distinctly identified, whether by its location in a corresponding multidimensional data matrix, by the appending of header information to each component of the data itself, or by other suitable means known to those skilled in the art.
- the fluorescence intensity of a single physical matrix element may be represented by a single record with multiple fields, one or more fields of which identify the physical origin of the signal, the date and time of data acquisition, an identifier for the experiment run, and/or the like.
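A record of the kind described, with fields identifying the physical origin of the signal, the acquisition time, and the experiment run, could look like the sketch below. The field names and the position-keyed index are illustrative assumptions, not a layout given in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MatrixElementRecord:
    """One initial signal value plus header fields (names are hypothetical)."""
    row: int                 # physical origin: matrix row
    column: int              # physical origin: matrix column
    gene_id: str             # gene or ORF assayed at this position
    intensity: int           # digitized fluorescence intensity
    acquired_at: datetime    # date and time of data acquisition
    experiment_id: str       # identifier for the experiment run


def index_by_position(records):
    """Identify each value by its location in the data matrix."""
    return {(r.row, r.column): r for r in records}
```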
- the dynamic range of the initial expression signals will be established by physical limits imposed by the format of the expression matrix, and in particular by the dynamic range of the expression reporter and the sensitivity range of the acquisition device.
- the analog signal may be represented as an initial signal value by digital data of varying depths, such as 8-bit, 16-bit, 32-bit, and the like, and that the greater the data depth the finer the distinction in intensity which may be encoded, but the greater the storage requirements for those data. The choice of data depth will therefore be made based upon empiric requirements which will be well understood by the skilled artisan.
- initial digitization may be performed using one data depth, with subsequent analysis proceeding with data of lesser depth. In the latter case, a simple linear transformation may be used to reduce the data depth.
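A simple linear reduction of data depth could look like the sketch below, which rescales a value from a 16-bit range to an 8-bit range. The function name `reduce_depth` and its parameters are hypothetical, and rounding to the nearest integer is an assumption; the text specifies only that the transformation is linear.

```python
def reduce_depth(value: int, from_bits: int = 16, to_bits: int = 8) -> int:
    """Linearly rescale an unsigned integer from one bit depth to a lesser one."""
    if not 0 < to_bits <= from_bits:
        raise ValueError("to_bits must be positive and no greater than from_bits")
    from_max = (1 << from_bits) - 1  # e.g. 65535 for 16-bit data
    to_max = (1 << to_bits) - 1      # e.g. 255 for 8-bit data
    if not 0 <= value <= from_max:
        raise ValueError("value lies outside the source range")
    # Integer arithmetic with round-half-up avoids floating-point surprises.
    return (value * to_max + from_max // 2) // from_max
```

The endpoints of the source range map to the endpoints of the target range, so the full dynamic range is preserved at coarser resolution.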
- a background correction 118 may preferably, but need not necessarily, be performed.
- the measured (or historical) background value is added to each initial signal value, irrespective of the input's original value. In another, one-half the measured background value is added to each input value.
- Each initial signal value is compared 102 to the initial background value. If the signal value equals or exceeds the background value, no correction is made and the variable Signal is assigned 106 the initial signal value. Alternatively, if the initial signal value is less than the background value, Signal is assigned the value of background 104.
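The comparison at step 102 and the two assignments at steps 104 and 106 can be sketched as below. The function name and the returned flag are hypothetical; the flag is an added convenience that records whether the value was raised to background, since later queries ask exactly that question.

```python
from typing import Tuple


def background_correct(initial_signal: float, background: float) -> Tuple[float, bool]:
    """Return (Signal, was_raised_to_background) for one matrix element."""
    if initial_signal >= background:
        # Signal equals or exceeds background: no correction is made (step 106).
        return initial_signal, False
    # Signal falls below background: Signal takes the background value (step 104).
    return background, True
```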
- FIGS. 2, 3, and 4 are scatter plots, each plotted point of which reports the relative expression of a distinct gene under the two identified conditions. The figures are further described below.
- the scale is logarithmic, with tick marks on the horizontal and vertical axes of these figures set at intervals of one natural log (e^1, e^2, e^3, etc.).
- much of the data lies within the square delimited by the first tick mark in each direction on the two axes. That is, all of the data within such square would be discarded from the analysis were changes of less than one natural log (approximately 2.7-fold) disregarded for the inability to distinguish such change from the standard measurement error.
- Were changes of less than two natural logs (e^2, or approximately 7.4-fold) disregarded, all data within the square delimited by the second tick mark in each direction would be eliminated from the analysis. As made evident by the Figures, most of the useful data would consequently be lost.
- the present invention makes possible the use of these data. Although the significance of small changes in expression of any one gene may be undeterminable due to the size of the standard error, the significance of the collection of changes may indeed often be determined; where prior methods focus upon the standard error as a measure of significance, the present invention instead focuses upon the standard error of the mean. On average, the collection of changes in gene expression as between two different environmental conditions may be strongly correlated, as further shown below.
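The distinction drawn above can be illustrated with the textbook relation between the spread of individual measurements and the standard error of their mean. The sketch assumes, purely for illustration, that the n per-gene changes d_1, ..., d_n are independent and share a common standard deviation s; these symbols are not taken from the disclosure.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i ,
\qquad
\mathrm{SE}(\bar{d}) = \frac{s}{\sqrt{n}}
```

Under that assumption the uncertainty of the collective measure shrinks with the square root of the number of genes: for a matrix of n = 864 reporters, the standard error of the mean would be roughly s/29.4, so a collective change much smaller than the per-gene error could still be distinguishable from zero.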
- the Signal for each matrix element is then normalized 108 to control for variance as between otherwise identical experiments, that is, as between data acquisition runs on a single expression matrix, or as between individual data acquisitions from duplicate matrices.
- the utility of normalizing expression signals was recognized in the art well before the advances which made possible highly parallel measurements of gene expression using gene expression matrices.
- individual gene expression measurements for example, by Northern blot analysis, were frequently normalized by comparing expression to that of a constitutive housekeeping gene, such as actin, probed either concurrently or serially on the same blot. In this way, variability introduced by unequal gel loading, variation in mRNA purity, or the like, could be controlled.
- the limitation of the prior approach was the possibility that the individual gene chosen as the reference standard might itself vary in expression.
- the problem is compounded in the present invention by the desire to measure the entirety of the gene expression of a cell, including that of "housekeeping genes," and by the desire to measure changes in gene expression in the presence of drugs, the effects of which cannot be predicted a priori.
- the preferred method herein is to assume the mean signal, across all genes, to be constant: normalization is thus achieved by dividing each signal by the sum of all signals, as shown at step 108 (FIG. 1A).
- the normalization step may optionally be omitted. Accordingly, normalization step 108 was omitted from the analysis of the 96 gene subset during the quantitative analyses reported in Example 5, below: the normalization step was omitted because the assumption of constant mean expression may not prove valid.
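Normalization step 108, dividing each signal by the sum of all signals, might be sketched as follows. The function name and the use of a gene-keyed dictionary are assumptions made for illustration.

```python
from typing import Dict, Mapping


def normalize(signals: Mapping[str, float]) -> Dict[str, float]:
    """Divide each background-corrected Signal by the sum of all Signals."""
    total = sum(signals.values())
    if total <= 0:
        raise ValueError("sum of signals must be positive")
    return {gene: value / total for gene, value in signals.items()}
```

After this step the signals of one profile sum to one, so two acquisitions that differ only by an overall intensity factor yield the same normalized values.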
- the logarithm of each signal value is taken; that is, Signal is assigned the logarithm of the Signal value.
- The natural logarithm is preferred, although log10 may also be used. There are three advantages to performing the comparative analysis using the logarithm of the signal value. First, conversion to logarithmic values allows equivalent fold-changes in expression levels to be assessed equivalently, whether such change is an increase or decrease in expression.
- the third advantage of using logarithmic values is that plotting the values on a logarithmic scale presents advantages in the visual display of data, as demonstrated in FIGS. 2 - 4 (see below).
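The first advantage noted above, that equal fold-changes are assessed equally whether up or down, can be checked numerically. In this sketch the function name `log_ratio` and the signal values are illustrative assumptions.

```python
import math


def log_ratio(signal_a: float, signal_b: float) -> float:
    """Difference of natural logs, i.e. the log of the expression ratio."""
    return math.log(signal_a) - math.log(signal_b)


# A two-fold increase and a two-fold decrease from the same baseline of 100.
doubling = log_ratio(200.0, 100.0)
halving = log_ratio(50.0, 100.0)
```

On the raw scale the two changes are +100 and -50, which look unequal; on the logarithmic scale they have the same magnitude and opposite sign.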
- the Signal that results from the process of FIG. 1A, concluding with step 110, is suitable for use in the quantitative analysis of gene expression profiles, as further schematized in FIGS. 5 and 6.
- a series of additional steps, as set forth in FIG. 1B, is preferably performed.
- Drugs are formulated in various solvents, including organic solvents, which themselves may variously affect gene expression.
- changes in a gene expression profile that result from introduction of a drug into a cell's culture media include changes (1) wrought by the drug, and (2) changes caused by the solvent.
- the media itself may contribute changes, as demonstrated in Example 4 and Table 7, infra.
- strain or cell-type differences may exist as between the cells assayed.
- the signal from a solvent-matched, media-matched, and preferably strain-matched control should be subtracted, as detailed in FIG. 1B.
- initial expression signals and initial background signals from a matched control expression matrix are acquired.
- for a drug formulated in methanol, for example, an otherwise identical expression matrix, such as a genome reporter matrix, would be treated with methanol alone at the identical concentration, and initial expression signals and initial background signals acquired therefrom.
- the gene's Signal from the matched control matrix (Signal mc 132) is subtracted 134 from the gene's Signal 130 as acquired from the experimental matrix.
- the first decisional query 136 asks whether the corrected Signal 134 is less than zero and if Signal mc was less than its background at step 102. If the first decisional query 136 returns true, the corrected Signal is set to zero, 138. That is, because it is impossible to determine whether the corrected Signal is real, the value is set to zero so that the Signal is discarded from subsequent analysis.
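The matched-control subtraction 134 and the first decisional query 136 might be sketched as below. Only the query actually described above is shown. The function and parameter names are hypothetical, and `mc_was_below_background` stands for the outcome of step 102 for the matched-control signal.

```python
def subtract_matched_control(signal: float, signal_mc: float,
                             mc_was_below_background: bool) -> float:
    """Subtract the matched-control Signal, zeroing indeterminate results."""
    corrected = signal - signal_mc  # step 134
    if corrected < 0 and mc_was_below_background:
        # Query 136 is true: the control value was raised to background, so a
        # negative difference may be an artifact. Set it to zero (step 138).
        return 0.0
    return corrected
```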
- FIGS. 2, 3, and 4 show scatter plots of gene expression data processed as described above, including the steps set forth in FIG. 1A and FIG. 1B.
- FIGS. 2 - 4 are derived from initial expression signals generated by genome reporter matrices (for details, see the Examples below).
- FIG. 2 plots data derived from matrices treated individually with one of two chemotherapeutic agents known to be closely related in structure and function: daunorubicin and doxorubicin.
- FIG. 3 plots data derived from matrices treated individually with one of two drugs of disparate structure and disparate function: doxorubicin, a chemotherapeutic agent, and miconazole, an antifungal agent.
- FIG. 4 plots data derived from matrices treated individually with one of two drugs of disparate structure but related function, mycophenolic acid and daunorubicin, both of which inhibit DNA synthesis.
- Each point plotted on the graphs of FIGS. 2, 3, and 4 represents the expression of a specific gene: the X coordinate plots the value as calculated from the signal obtained in the presence of one of the drugs (doxorubicin in FIG. 2, doxorubicin in FIG. 3, daunorubicin in FIG. 4), and the Y coordinate plots the value as calculated from the signal obtained in the presence of the second of the drugs (daunorubicin in FIG. 2, miconazole in FIG. 3, and mycophenolic acid in FIG. 4).
- Visual inspection of FIGS. 2, 3, and 4 demonstrates the usefulness of expression profile analysis for facilitating drug discovery, and further demonstrates that at the extremes of relatedness (unrelatedness) presented in these figures, even casual qualitative analysis of data processed as presented above proves useful.
- the expression of some genes is increased by both drugs (those points in the upper right quadrant)
- the expression of some genes is decreased by treatment with both of the drugs (those points in the lower left quadrant)
- the expression of other genes is oppositely affected by the drugs (those points in the upper left and lower right quadrants).
- FIG. 4 presents an intermediate case, in which both drugs are known to affect DNA synthesis, albeit by different mechanisms.
- This invention addresses this problem by providing a reproducible, quantitative assessment of relatedness of gene expression profiles; the invention additionally permits analysis of greater than two compounds, allowing a ranked order of gene expression profile relatedness to be generated.
- the present invention provides a method of quantifying the relatedness of a first and second gene expression profile, comprising the steps of: (a) generating, for each gene commonly represented in the first and second gene expression profiles, a first and a second gene expression signal; (b) formulating a relative expression score for each pair of said first and second gene expression signals; and then (c) calculating, from said pair-wise relative expression scores, a composite score, wherein said composite score quantifies the relatedness of the two gene expression profiles.
- a relative expression score 524 is formulated 528 separately for each gene commonly represented in the two gene expression profiles. Thereafter, a composite score is calculated 526 from the collection of all such individual gene relative expression scores, the composite score serving to quantify the relatedness of the two gene expression profiles.
- the signal for a gene under a first condition, Signal1, 500 is input. This signal has been processed as set forth in FIG. 1; as noted above, the signal has preferably been, but need not have been, corrected as set forth in FIG. 1B by subtraction of an environmentally-matched control.
- the signal for the same gene under a second condition, Signal2, 502, similarly processed as set forth in FIG. 1, is subtracted to provide a relative expression score, 504. Since the signal values input are logarithmic values, 110, the difference represents a ratio of expression.
- the artifact correction is performed using two decisional queries, 506 and 510.
- the queries may be done sequentially in any order, or may more typically be accomplished in a single line of code.
- when Score 504 is less than zero (that is, when Signal2 exceeds Signal1), there exists the possibility that Signal2 had been artificially and artefactually increased 104 during background correction, as followed by normalization, and that the true value of Signal2 is less than or equal to Signal1.
- the first decisional query 506 asks whether the relative expression score 504 is less than zero and whether Signal2 was less than its background at step 102. If the first decisional query 506 returns true, the relative expression score is set to zero, 508.
- the value is set to zero so that the score does not contribute to the composite score 526.
- when the relative expression score 504 is greater than zero (that is, when Signal1 exceeds Signal2), there exists the possibility that Signal1 was artificially and artefactually increased 104 during background correction, as followed by normalization, and that the true value of Signal1 is less than or equal to Signal2.
- the relative expression score is also set to zero 518 so that this relative score does not contribute to the composite score.
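The subtraction at 504 and the two artifact queries 506 and 510 can be sketched together as below. The names are hypothetical. The condition for the second query (Signal1 having been below its background at step 102) is inferred by symmetry with the first query and should be read as an assumption.

```python
def relative_expression_score(signal1: float, signal2: float,
                              signal1_was_below_background: bool,
                              signal2_was_below_background: bool) -> float:
    """Log-scale relative expression score with artifact correction."""
    score = signal1 - signal2  # step 504: difference of logs, i.e. a ratio
    if score < 0 and signal2_was_below_background:
        return 0.0  # query 506 true: Signal2 may have been raised artificially
    if score > 0 and signal1_was_below_background:
        return 0.0  # query 510 true (assumed symmetric condition)
    return score
```

Because the two conditions are mutually exclusive, they can be evaluated in either order or folded into a single boolean expression.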
- Each expression matrix technology has its own detection threshold below which signals cannot reliably be measured.
- the oligonucleotide hybridization platform of Lashkari et al., supra, has a different detection threshold from the cellular genome reporter matrix of Ashby et al., supra.
- Such thresholds are determined empirically. In a simple approach, one twice performs the identical experiment, whether acquisition of a no-treatment profile, or acquisition of a profile from cells identically treated by the same drug.
- the standard deviation of this distribution provides a guide for setting an appropriate threshold.
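One way to derive such a threshold from a duplicated experiment is sketched below. It assumes that the distribution in question is that of the per-gene differences in log signal between the two replicate runs, and that the threshold is a multiple of its standard deviation; the function name and the default `multiplier` are hypothetical choices rather than values given in the text.

```python
import statistics
from typing import Sequence


def replicate_threshold(replicate_a: Sequence[float],
                        replicate_b: Sequence[float],
                        multiplier: float = 2.0) -> float:
    """Threshold as a multiple of the spread between two identical experiments."""
    if len(replicate_a) != len(replicate_b):
        raise ValueError("replicates must cover the same genes in the same order")
    # Ideally every difference is zero; the observed spread reflects noise.
    differences = [a - b for a, b in zip(replicate_a, replicate_b)]
    return multiplier * statistics.stdev(differences)
```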
- the steps delimited by box 522 also remove from further consideration the direction of the change in the expression of a gene as between a first and second gene expression profile. This is of course necessarily the case for Scores set to zero 518 for failure to exceed the user-defined threshold. As for the remaining scores, the directionality is eliminated by the assignment of the absolute value 520 of any non-negative scores to Score. In measuring the relatedness of two treatments, the informational content of a gene's repression is thus treated equivalently to that of a gene's activation; only the magnitude of the relative change is used.
- in steps 506, 508, 510 and 512, together delimited by box 514, scores are set to zero when, due to background correction and normalization, it cannot be said accurately whether the direction of the relative score is real.
- in steps 516, 518, and 520, together delimited by box 522, scores are set to zero when, although not artifactual, they may not be distinguishable statistically from zero.
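The thresholding of box 522 might be sketched as follows. The function name is hypothetical, and comparing the magnitude of the score against the threshold is an assumption about how step 516 treats negative scores.

```python
def apply_threshold(score: float, threshold: float) -> float:
    """Zero scores that do not exceed the threshold; otherwise keep magnitude."""
    magnitude = abs(score)
    if magnitude > threshold:
        return magnitude  # step 520: direction of change is discarded
    return 0.0            # step 518: not distinguishable from zero
```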
- a final manipulation 524 corrects for the disparate dynamic ranges of gene expression manifested by the various genes of the organism. For example, some genes may be capable of only a two-fold change in gene expression no matter how severe the change in condition; other genes may be capable of a 200-fold change in gene expression. To prevent those genes with greater dynamic range from unduly skewing the comparative analyses, each relative expression score is divided by the log of the square root of the historical maximum expression observed for that gene over all prior experiments.
- each relative expression score is divided by the log square root of the largest signal historically output from step 108; that is, each relative expression score is divided by the log square root (one-half the log) of the largest normalized signal observed historically for that gene.
- the value for each gene will depend both upon the expression matrix technology (such as array size) and the data previously collected, and will, on occasion, change as further experiments are done.
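The dynamic-range correction 524 can be sketched as below, using the identity log(sqrt(x)) = (1/2) log(x) stated above. The function name is hypothetical, and the sketch assumes the historical maximum is expressed in units in which it exceeds 1, so that the divisor is positive; how the source handles maxima at or below 1 is not stated.

```python
import math


def dynamic_range_correct(score: float, historical_max: float) -> float:
    """Divide a relative expression score by log(sqrt(historical_max))."""
    if historical_max <= 1.0:
        # Assumption of this sketch: a non-positive divisor is rejected.
        raise ValueError("historical_max must exceed 1 for a positive divisor")
    divisor = 0.5 * math.log(historical_max)  # log(sqrt(x)) == 0.5 * log(x)
    return score / divisor
```

A gene with a large historical maximum receives a larger divisor, so a given change in its expression contributes less to the composite score than the same change in a gene with a narrow range.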
- each relative expression score is divided by the log square root of the largest signal historically output from step 108, that is, by the largest normalized signal, with the difference from the first approach lying in the value chosen to accomplish the normalization ("ΣSignals" in step 108).
- This approach is further discussed and exemplified in Example 5, below.
- each relative expression score is divided by the log square root of the largest signal historically input to step 108; that is, each relative expression score is divided by the log square root (one-half the log) of the largest unnormalized signal observed historically for that gene. This may be particularly preferred in circumstances in which normalization proves inappropriate.
- a further alternative approach is to make no correction at all, on the assumption that genes whose expression can vary the most are biologically more important, or at least more significant in assessing relatedness of environmental conditions.
- a still further alternative is to treat the various genes differently, depending upon their empirically determined significance to the analysis being performed. For example, most of the genes may be treated as above-described, dividing by the log of the square root of the historical maximum expression observed for that gene over all prior experiments. A predetermined subset of particular genes, however, may be differentially treated at this step to increase or decrease their significance in the subsequent analysis.
- the aforementioned steps, collectively delimited by box 528, are followed for each of the genes commonly represented in a first and second gene expression profile. For some expression matrices such as those that measure gene expression in prokaryotes or small eukaryotes such as yeast, all, or substantially all, open reading frames may be so compared.
- a final, scalar measure also termed a composite score, which expresses in a scalar value the relatedness of the gene expression profiles of the two conditions, may be calculated 526 by summation. The lower the resulting number, the more closely related the gene expression profiles under the two compared conditions, with complete identity giving a value of zero.
- the summation is optionally and preferably corrected
- the analyses presented in Examples 1 - 4 below were performed on gene expression profiles acquired from matrices with 864 reporters.
- the scores obtained from step 526 may optionally be normalized to express the relative expression score per 1000 genes, to permit comparisons from different sized matrices.
- the relative profile score 526 is further multiplied by the ratio of 1000 divided by the total number of genes in the matrix used.
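The summation at step 526 and the optional scaling to a per-1000-gene basis might be sketched as follows; the function and parameter names are hypothetical.

```python
from typing import Iterable, Optional


def composite_score(relative_scores: Iterable[float],
                    genes_in_matrix: Optional[int] = None) -> float:
    """Sum per-gene relative expression scores, optionally per 1000 genes."""
    total = sum(relative_scores)  # step 526
    if genes_in_matrix is not None:
        if genes_in_matrix <= 0:
            raise ValueError("genes_in_matrix must be positive")
        total *= 1000 / genes_in_matrix  # allows comparison across matrix sizes
    return total
```

For the 864-reporter matrices mentioned above, the scaling factor would be 1000/864, or about 1.157.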
- the above-described method allows one quantitatively to rank the relatedness of two gene expression profiles: the lower the resulting composite score, the more related the profiles; the more related the profiles, the more related the global gene expression state of the cells under the two distinct conditions under which the gene expression profiles were obtained.
- the environmental condition may, for example, be incubation in different media, as further demonstrated in Example 4 below.
- the two environmental conditions may comprise treatment with two different chemicals, such as pharmaceutical drug candidates, with the relatedness of the gene expression profiles, as reported by the composite score, indicating the relatedness of the action of the drugs. This aspect of the invention is demonstrated in Examples 1 - 3.
- the method may also be used quantitatively to relate a preselected environmental condition to a defined genetic mutation of a cell, comprising the steps of: (a) obtaining a first gene expression profile from a cell bearing a mutation and obtaining a second gene expression profile from a wild-type cell under a preselected environmental condition; and then (b) quantifying the relatedness of the first and second gene expression profiles.
- the environmental condition under which expression data are acquired from the wild type cell comprises exposure to a chosen chemical compound.
- mutations that mimic the effects of the drug may be identified by the quantitative relatedness of their gene expression profile to that obtained in the presence of the drug. The result is the elucidation of the mechanism of drug action through identification of all targets, direct and indirect, affected by the drug. Furthermore, the relatedness of two mutations may be determined by quantitatively relating the gene expression profile obtained from each, absent additional drug.
- the cells are preferably yeast cells, more preferably, Saccharomyces cerevisiae.
- yeast are particularly preferred for this purpose, and for other applications in which relatedness of genetic mutations is assessed, because (1) the entire genome of S. cerevisiae has been sequenced, (2) targeted deletions or insertions may readily be made by homologous recombination, and (3) many fundamental metabolic pathways are highly conserved as between yeast and humans. See, e.g., the discussion in Lashkari et al. The methods may be applied more broadly, however, whenever mutations are identified in the cells of other prokaryotic or eukaryotic organisms.
- the present invention also provides a method for ordering the relatedness of a plurality of gene expression profiles.
- a series of composite scores are obtained, each measuring the relatedness to a common index, or reference, profile. Thereafter, the composite scores are ordered, with lower scores indicating greater relatedness to the index profile.
- Such ordered rankings are presented in the Tables below.
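Ordering a set of composite scores against a common index profile might be sketched as below. The function name, the condition labels, and the numeric scores are all invented for illustration and are not results from the disclosure.

```python
from typing import List, Mapping, Tuple


def rank_by_relatedness(composite_scores: Mapping[str, float]) -> List[Tuple[str, float]]:
    """Order conditions by composite score; lower means more closely related."""
    return sorted(composite_scores.items(), key=lambda item: item[1])


# Hypothetical scores of three conditions against one index profile.
ranking = rank_by_relatedness({"condition_a": 41.0,
                               "condition_b": 7.5,
                               "condition_c": 19.0})
```

This ascending order applies to summation-based composite scores, for which identical profiles score zero.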
- the invention provides a method to order the relatedness of environmental conditions to a single preselected environmental condition upon a cell, comprising the steps of: (a) obtaining from the cell or from genotypically identical cells a gene expression profile for each of the plurality of environmental conditions and for the preselected environmental condition; (b) quantifying pairwise the relatedness of each of the plurality of gene expression profiles to the preselected gene expression profile; and
- one or more of the environmental conditions comprises exposure of the cells to a chemical compound.
- the invention also provides a method to order the relatedness of each of a plurality of environmental conditions to a defined genetic mutation of a cell, comprising the steps of: (a) obtaining a set of first gene expression profiles from a wild type cell under each one of the plurality of environmental conditions and a second gene expression profile from a cell having said defined mutation; (b) quantifying pairwise the relatedness of each of said first gene expression profiles to said second gene expression profile; and then (c) ordering the pairwise-measured quantities.
- the invention also provides a method to order the relatedness of each of a plurality of genetic mutations to a defined, or preselected, mutation of a cell, comprising the steps of: (a) obtaining a set of first gene expression profiles from cells each having one of the plurality of genetic mutations and a second gene expression profile from a cell having the preselected mutation;
- the composite score, and thus the ranking of relatedness that is provided by the procedures of FIG. 5, is weighted substantially by outliers, that is, by those genes whose expression changes substantially as between the two measured conditions. This is true notwithstanding the correction for the dynamic range of expression of the various genes, 524, and results from steps 516, 518, and 520, delimited by box 522 in FIG. 5, in which application of a threshold requirement for data inclusion reduces the contribution by genes with small changes in expression as between the measured conditions.
- An advantage of such bias is that it focuses the ranking on genes that contribute most substantially to the phenotypic change.
- FIG. 6 provides an alternative method for quantitatively relating gene expression profiles, one that instead weights the ranking of relatedness more toward the commonality of the direction of change in individual gene expression, rather than the magnitude of such change.
- the method presented in FIG. 6 presents several advantages over that set forth in FIG. 5, particularly the ability accurately to relate gene expression profiles obtained using small concentrations of pharmaceutical agents, and is now preferred for quantitating the relatedness of profiles acquired under mild treatment conditions, such as low concentrations of drug.
- the method of FIG. 5, however, remains preferable for quantitating the relatedness of gene expression profiles acquired under more severe treatment conditions, such as treatment with high concentrations of drugs.
- FIG. 2 represents as a scatter plot the relative gene expression of distinct genes in yeast cells that have been treated individually with two closely related antineoplastic chemotherapeutic drugs. As discussed above, the treatments are seen to be closely related, each affecting both the direction and the magnitude of individual gene expression equivalently: as a result, most of the points lie approximately on a line through the origin. It will be understood that identical conditions, absent background, absent noise, and absent other variation, would produce theoretically a series of expression points all of which lie exactly on a line through the origin.
- the threshold applied in steps 516, 518, and 520 may be conceptualized, in FIG. 2, as two parallel lines of identical slope equidistant from the regression line drawn through the data, somewhat akin to a confidence interval.
- the lower the threshold applied empirically in step 516 the more closely the threshold lines may be conceived to lie to the data regression line, and the greater the number of data points that lie outside; the higher the threshold applied empirically in step 516, the further the threshold lines may be conceived to lie from the data regression line, and the fewer the number of data points that lie outside. Because only those points that lie outside the threshold lines contribute to the expression profile score (compare step 518 to 520), the method set forth in FIG. 5 is affected substantially by the distance such points lie from the regression line.
- FIG. 6 schematizes this second approach to quantifying the relatedness of two gene expression profiles.
- the gene expression signal for each gene commonly represented in the first (Signal1 600) and second (Signal2 601) gene expression profile, as processed according to FIG. 1, is input.
- the Signals have been further corrected for matched controls according to the algorithm set forth in FIG. 1B.
- a manipulation 610, 611, analogous to that performed at step 524 in the earlier algorithm set forth in FIG. 5, corrects for the disparate dynamic ranges of gene expression manifested by the various genes of the organism.
- the same alternatives for adjusting for dynamic range as are set forth above with respect to step 524 apply here as well.
- Signal 600, 601 may be divided by the log square root of the maximum (normalized) signal historically output from step 108; may be divided by the log square root of the maximum signal historically input to step 108; may be divided by the log square root of the maximum (unnormalized) signal historically input to step 108; may be divided by the log of the maximal signal — either normalized or unnormalized — rather than by the log square root; may be left unaltered, making no correction for dynamic range at all; or may be adjusted individually using empirically chosen values.
- the first (Signal1 610) and second (Signal2 611) expression signal are associated 620 to provide, for each gene, two-dimensional coordinates.
- Linear regression 625 on the collection of paired data — representing the expression of all genes commonly represented in the two gene expression profiles — then provides a Score 626 that provides a quantitative measure of the relatedness of the two gene expression profiles, with higher numbers indicating a closer degree of relatedness.
- the correlation coefficient itself may be used as the score, as may be any multiple thereof.
- the scores provided in the Examples below were further derived by multiplying the correlation coefficient by 100.
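The score of steps 625 and 626 might be sketched as the Pearson correlation coefficient of the paired signals multiplied by 100. The function name and the list-of-pairs input are assumptions, and the choice of Pearson's coefficient specifically is an interpretation of "the correlation coefficient" of a linear regression.

```python
import math
from typing import Sequence, Tuple


def correlation_score(pairs: Sequence[Tuple[float, float]]) -> float:
    """100 times the Pearson correlation of (Signal1, Signal2) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two genes are required")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant signals")
    return 100.0 * sxy / math.sqrt(sxx * syy)
```

Points lying exactly on a rising line give a score of 100, and the score falls as the points scatter away from the best-fit line, so here higher values indicate closer relatedness.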
- any data structure that permits the first and second signal for each commonly represented gene to be associated for purposes of linear regression may be used, such as a single 2-dimensional matrix, a set of vectors, or the like.
- any statistical method that reports the closeness of the fit of the data to a best-fit theoretical line through the two-dimensional data may be used according to this invention for calculation of the relative profile score in steps 625 and 626.
- Those skilled in the art are both able to identify such data structures and statistical methods and to encode such calculations in a digital computer; it is the discovery that such closeness of fit permits reliable, reproducible, and ready quantitation of the relatedness of gene expression profiles that is newly described herein.
- An additional step, not described in FIG. 6, may optionally be added to the present method.
- Signal1 600 and Signal2 601 may be subjected to queries identical to those presented at 506 and 510. That is, the question may be posed whether the earlier background correction and normalization potentially precludes the definitive determination of the direction of change in expression as between the two conditions. If so, that is, if the query presented at either 506 or 510 returns true, the Signals for the gene may optionally be omitted from the linear regression.
- the method described in FIG. 6 may be used, like that set forth in FIG. 5, to assess quantitatively the relatedness of two environmental conditions on the global gene expression of a cell; to assess quantitatively the relatedness of a preselected environmental condition to a defined genetic mutation of a cell; and to quantify the relatedness of two different mutations.
- the algorithm and methods set forth in FIG. 6 may be used, like that set forth in FIG. 5, to order the relatedness of a plurality of gene expression profiles, whether acquired under disparate environmental conditions, acquired from cells bearing various mutations, or acquired from a combination thereof.
- each gene commonly represented in a first and second gene expression profile is treated identically to other genes represented in the gene expression profiles, whether the algorithm given in FIG. 5 or that given in FIG. 6 is applied.
- weighting may be done, for example, by adjusting the Signal at step 524 or at step 610, 611.
- data may be stored for any individual gene expression profile at any or all of the intermediate points in the processes described in FIGS. 1, 5, or 6.
- Data acquired from any single expression matrix may, for example, be stored as raw digitized data as obtained at step 101, as background-adjusted, normalized signals as obtained at step 108, as the log of background-adjusted, normalized signals as obtained at step 110, or as signals fully corrected for matched controls, as obtained in step 112.
- identifying drugs that effect similar changes in the gene expression profile of a target cell may identify drugs that are similarly effective in treating that pathologic state, albeit drugs of similarly unknown mechanism.
- the ability to quantitate the relatedness of gene expression profiles may obviate the present need to identify an isolated pharmaceutical target, to develop a dedicated assay, and then to screen compounds for their activity in the dedicated assay.
- the ability to quantitate the relatedness of gene expression profiles may, moreover, facilitate efforts during latter stages of drug development to narrow and focus the specificity of action of promising drug candidates.
- pharmacologically-effective derivatives of a lead compound may be identified, as above, based on quantitative relatedness of their gene expression profiles to that of a lead candidate.
- In Example 1, the relatedness of drugs to actinomycin D was assessed by quantitative comparison of a gene expression profile obtained in the presence of actinomycin D to a plurality of gene expression profiles obtained upon exposure to other pharmaceutical agents. Using either of the above-described algorithms, varying concentrations of daunorubicin, 5-FUDR, doxorubicin, 5-FU, hydroxyurea and mycophenolic acid were identified as causing quantitatively similar effects on the global gene expression of the cell, here an S. cerevisiae cell. All of these agents, like actinomycin D, are known to affect nucleic acid synthesis.
- Examples 2 and 3 similarly assess the relatedness, as measured by changes in global gene expression, of a plurality of drugs to one of two concentrations of daunorubicin, again demonstrating that the relatedness of action can be determined without foreknowledge of the structure or mechanism of the preselected reference drug.
- Example 4 demonstrates that the methods set forth herein may be used more broadly, quantitatively to relate the effects on a cell of global environmental conditions.
- the gene expression profiles that are quantitatively compared in the analyses presented in Examples 1 - 4 each contains data on the contemporaneous level of expression of over 800 different S. cerevisiae genes. These 800 genes represent a subset of the organism's expressible genes, estimated to be just slightly over 6000 in number. The results thus demonstrate that only a portion of a cell's global gene expression need be assayed for successful application of the methods described herein. Although the quantitative analysis will be increasingly robust and informative as the percentage of assessed genes increases, it is clear that the expression of fewer than all genes may be used in these analyses.
- samples of drug candidates may be in limiting supply, particularly when produced in small quantity by combinatorial chemistries; there may simply be too little of the agent to permit the testing of its effects on all possible genes of a given cell type. It may also, or in the alternative, be too expensive to assay each candidate agent across each expressible gene of the cell.
- Not all genes prove equally informative. Some may have an insufficient dynamic range in expression to provide significant information, no matter what the environmental condition. Other genes may vary in expression coordinately, or cooperatively, providing redundancy in the information collected.
- One approach to selecting informative subsets of genes for expression analysis is to choose the genes individually by known or suspected function.
- U.S. Patent No. 5,811,231 and European patent no. EP 0680517 B1 disclose, inter alia, the selection of "stress genes" particularly to identify and characterize compounds that are toxic to the cell. Such an approach, however, requires antecedent knowledge of the gene's function.
- the bias imposed by such directed selection would reduce the possibility of identifying previously unsuspected relationships; in a method useful for the identification of such unsuspected relationships, such as the methods presented herein, such directed preselection would be particularly disfavored.
- Another approach is to choose the subset entirely at random, in the hope that the subset so selected proves representative of the whole.
- the problem, clearly, is that the subset so chosen may in fact prove uninformative for describing the cellular state under one or more environmental conditions.
- FIGS. 7 and 8 demonstrate qualitatively the results of a novel alternative for selection of informative gene subsets for gene expression analysis, to be described more fully below. This novel approach predicates the selection of genes for expression analysis upon the diversity — rather than size, direction, or commonality — of their expression.
- FIG. 7 is a scatter plot of gene expression signals, processed according to FIG. 1, derived from genome reporter matrices comprising 1532 separate S. cerevisiae gene expression reporters, each matrix treated individually with one of two agents known to be closely related in structure and function: 10 µg/ml Lovastatin (X axis) and 20 µg/ml Mevastatin (Y axis). As earlier discussed with respect to FIG.
- FIG. 8 plots the gene expression signals from a 96 gene subset selected from the 1532 gene expression signals presented in FIG. 7. Although only 1 in 16 of the genes presented in FIG. 7 is selected for display in FIG. 8, the strong correlation in the two drug treatments may still be seen.
- the 96 genes in the selected subset are listed in Table 9, presented in Example 5 below. Although selected without regard to known function, the genes retained in the subset are seen to have diverse functions (the gene functions listed in the Table are drawn from the Stanford University Saccharomyces genome data base, http://genome-www.stanford.edu/Saccharomyces).
- the subset of genes displayed in FIG. 8 was selected from those displayed in FIG. 7 in a process comprising two basic algorithmic steps: in a first step, each of the genes displayed in FIG. 7 was sorted according to its maximal historical dynamic range of expression; in the second step, an iterative process eliminated from the sorted list all but the first in each group of genes whose expression is strongly correlated. The result is retention in the chosen subset of the diversity of gene response seen in the original set, with each group of correlated genes being represented in the retained subset by that one gene with greatest dynamic response.
- Examples 1 - 4 demonstrate that the measurement of the expression of 864 of the 6000 genes potentially expressible by S. cerevisiae — that is, just about 14.4% of the total number of genes potentially expressible by the cell — permits the quantitative definition of cellular phenotype, and thus the quantitative determination of the relatedness of cellular states.
- Example 5 demonstrates that it is possible to select an even smaller subset of potentially expressible genes — just 96 of 6000, or about 1.6% of potentially expressible genes — the expression of which is sufficiently informative as to permit the quantitative definition of cellular phenotype, and thus the quantitative determination of the relatedness of cellular states.
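The percentages quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `percent_of_genome` is introduced here for illustration only, and 6000 is the document's own rounded estimate of the number of expressible S. cerevisiae genes.

```python
# Rounded estimate of expressible S. cerevisiae genes, as quoted in the text
TOTAL_GENES = 6000

def percent_of_genome(n_genes, total=TOTAL_GENES):
    """Return n_genes as a percentage of the cell's expressible genes."""
    return 100.0 * n_genes / total

# Reporter counts used in Examples 1 - 4 (864) and in Example 5 (96)
for n in (864, 96):
    print(f"{n} genes = {percent_of_genome(n):.1f}% of {TOTAL_GENES}")

# Ratio of the 1532 reporters of FIG. 7 to the 96-gene subset of FIG. 8
print(f"about 1 in {1532 / 96:.0f}")
```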
- an important aspect of the present invention is to provide methods of cellular phenotyping, comprising selecting no more than 20% of a cell's expressible genes for expression analysis, wherein the concurrent expression of the selected genes sufficiently defines the cell's phenotype as to permit the cell's phenotype quantitatively to be related to the phenotype of another cell.
- no more than about 20% of the cell's potentially expressible genes are selected, more preferably no more than about 15% of the cell's potentially expressible genes, even more preferably no more than about 10% of the cell's potentially expressible genes, optimally no more than about 5% of the cell's potentially expressible genes, and in the most preferred embodiments, about 1% - 5%, and even 1 - 2% of the cell's potentially expressible genes.
- Algorithms for effecting such selection, and computers, systems, networks, and other devices for effecting the methods are also presented.
- the two basic steps in the algorithm for selecting an informative subset of expressible genes for expression analysis may be better understood by particular reference to FIGS. 9 and 10.
- the first of two major steps in the algorithm orders genes according to the dynamic range of their expression.
- historical data are used: for each gene, the maximum and minimum value of Signal 108 in the database of electronically stored gene expression profiles is determined by an appropriately formulated query (or series of queries) 900.
- gene expression data may be stored at any or all of the intermediate points in the processes described in FIGS. 1, 5, or 6.
- the Signal as output from step 108 is used. If Signal values as output from step 108 are not present in the database, the values may in certain instances be reconstructed from the values so stored; for example, if the Signal values output from step 110 are stored, the Signal as it would have been output from step 108 may be calculated by reversing step 110, that is, by exponentiation.
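The reconstruction by exponentiation might be sketched as follows. The base of the logarithm used in step 110 is not stated in this passage, so base 10 is an assumption here, and the function names are illustrative rather than taken from the source.

```python
import math

LOG_BASE = 10.0  # assumption: the text does not state the base used in step 110

def log_signal(signal, base=LOG_BASE):
    """Step 110 as described: the log of a background-adjusted, normalized signal."""
    return math.log(signal, base)

def reconstruct_signal(stored_log, base=LOG_BASE):
    """Reverse step 110 by exponentiation to recover the step-108 signal."""
    return base ** stored_log

# Illustrative (made-up) step-108 signals, stored only in their logged form
original = (0.5, 1.0, 20.0)
stored = [log_signal(s) for s in original]
recovered = [reconstruct_signal(v) for v in stored]
print(recovered)
```

The recovered values agree with the originals only to floating-point precision, which is why a tolerance-based comparison is appropriate when checking the round trip.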
- a threshold is applied 904 by comparing the Range obtained in step 902 to a value that is empirically established. If Range exceeds the threshold, the gene is retained for subsequent use; if Range fails to exceed the threshold, the gene is discarded from further analysis. As shown in step 906, that discard may readily be achieved by setting Range to a null value. For the selection shown in FIG. 8 and exemplified in Example 5, a threshold of 10 was set. That is, only those genes demonstrating at least a 10-fold change in gene expression level across the set of historical gene expression profiles stored in the database were retained in the selected subset.
- a range threshold at this step in the algorithm will be determined by empiric needs, and is well within the skill in the art. Typically, a threshold of 10-fold will provide informative subsets of appropriately reduced size. It is, however, possible to set the threshold as low as 1; that is, to eliminate the cutoff entirely. The result, all other factors held constant, will be the selection of a much larger subset of genes. Furthermore, it will be understood that the threshold that is set at this step need not be limited to whole numbers.
- the threshold may be set as low as 1 or may, preferably, be greater than 1.
- the threshold will be set at 2 or greater, more preferably at 3 or greater, even more preferably at 4, 5, 6, 7, 8, or 9 or greater, in that order, most preferably to at least 10.
- the threshold may also be greater than 10, ranging as high as 100, preferably no more than 50, more preferably no more than 25, most preferably 10 - 20.
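The first algorithmic step (query, range calculation, threshold, sort) can be sketched as below. This is an illustration under stated assumptions, not the C implementation referred to in Example 5: Range is taken to be the ratio of maximum to minimum Signal (consistent with the "10-fold" language above, but not spelled out in this passage), the database query is replaced by an in-memory dictionary, and all names and values are hypothetical.

```python
def order_by_dynamic_range(signals_by_gene, threshold=10.0):
    """Return gene names ordered from greatest to least Range.

    signals_by_gene maps a gene name to its step-108 signals across the
    stored profiles. Range is assumed to be max/min (a fold change).
    Genes whose Range does not exceed the threshold are discarded.
    """
    ranges = {}
    for gene, signals in signals_by_gene.items():
        lowest, highest = min(signals), max(signals)
        if lowest <= 0:
            continue  # fold change undefined; such genes are skipped in this sketch
        fold = highest / lowest
        if fold > threshold:  # retain only if Range exceeds the threshold
            ranges[gene] = fold
    return sorted(ranges, key=ranges.get, reverse=True)

# Made-up signals for four hypothetical genes across three stored profiles
history = {
    "gene A": [1.0, 50.0, 4.0],  # 50-fold
    "gene B": [2.0, 6.0, 3.0],   # 3-fold
    "gene C": [0.5, 9.0, 1.0],   # 18-fold
    "gene D": [1.0, 12.0, 3.0],  # 12-fold
}
print(order_by_dynamic_range(history))
```

Lowering `threshold` toward 1 retains progressively more genes, as the text notes; the ordering of the retained genes is unaffected.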
- FIG. 10 schematizes the iterative process that constitutes the second basic algorithmic step.
- FIG. 10 outlines two full iterations of the second step of the algorithm. At the left is shown the list of genes, as output from step 908, ordered from greatest to least dynamic range. Genes that were discarded at step 906 due to inadequate dynamic range are not shown.
- the first gene in the list serves as the index, or reference, gene. Taking each successive gene in the list in turn, the degree to which that gene's expression is correlated with the expression of the index gene across the set of stored gene expression profiles is calculated. If the correlation (r²) exceeds an empirically set value, the gene is discarded from the set.
- the effect of this step is to remove all genes whose expression is strongly correlated with that of the index gene, "gene 1"; the high degree of correlation implies that information contributed by the expression of these discarded genes is in large measure redundant of the information inherent in the expression values of the index gene.
- the index gene ("gene 1") is retained in the informative gene subset; as exemplified in the middle of FIG. 10, genes highly correlated therewith ("gene 3" and "gene 4") are discarded. Because the list is ordered from greatest to least expressive range, the single gene retained from the correlated group is that with the greatest dynamic range of expression.
- the first of the genes retained after gene 1 becomes the index, or reference gene. It too will be retained, as shown at the bottom of the figure.
- For each successive gene that has been retained in the list, the degree to which that gene's expression is correlated with the expression of the index gene (now "gene 2") across the set of stored gene expression profiles is calculated. If the correlation (r²) exceeds an empirically set value, the gene is discarded from the set. The next retained (uncorrelated) gene, here exemplified by "gene 6", then becomes the index gene for the next iteration.
- the process is repeated until the list is exhausted.
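The iterative elimination described for FIG. 10 can be sketched as a greedy loop over the range-ordered list. The sketch assumes the correlation is the squared Pearson coefficient computed across stored profiles; the helper names and the toy signals are hypothetical and were chosen only to reproduce the pattern described above (genes 3 and 4 fall to gene 1, gene 5 falls to gene 2, gene 6 survives).

```python
def r_squared(x, y):
    """Squared Pearson correlation of two equal-length signal lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal is treated as uncorrelated in this sketch
    return (sxy * sxy) / (sxx * syy)

def prune_correlated(ordered_genes, signals_by_gene, max_r2=0.675):
    """Keep the first gene of each correlated group in a range-ordered list."""
    remaining = list(ordered_genes)
    retained = []
    while remaining:
        index_gene = remaining.pop(0)  # the current index (reference) gene is kept
        retained.append(index_gene)
        reference = signals_by_gene[index_gene]
        # Discard every later gene whose r² with the index gene exceeds the cutoff
        remaining = [g for g in remaining
                     if r_squared(reference, signals_by_gene[g]) <= max_r2]
    return retained

# Made-up signals across five stored profiles, already ordered by dynamic range
signals = {
    "gene 1": [1, 2, 3, 4, 5],
    "gene 2": [1, 3, 1, 3, 1],
    "gene 3": [2, 4, 6, 8, 10.5],  # tracks gene 1
    "gene 4": [5, 4, 3, 2, 1],     # mirrors gene 1 (r² = 1)
    "gene 5": [2, 6, 2, 6, 2.5],   # tracks gene 2
    "gene 6": [1, 1, 4, 1, 1],
}
print(prune_correlated(list(signals), signals))
```

Because r² is used rather than r, strongly anti-correlated genes (such as "gene 4" here) are treated as redundant in the same way as positively correlated ones.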
- the correlation is preferably performed on the gene expression Signal as output from step 140 (i.e., as output from box 141).
- the number of genes retained in the final subset will be determined by the total number of genes contributing data to the database of gene expression profiles, by the range threshold applied at step 904, and by the correlation threshold applied during the iterative process schematized in FIG. 10.
- the two threshold values may be adjusted empirically to yield an informative subset containing any chosen number of genes.
- In Example 5, the range threshold and correlation threshold were adjusted empirically to provide a subset with 96 genes (equal to the number of wells of a standard microtiter plate) by setting the range threshold to 10 and the correlation threshold to 0.675.
- quantitative analyses may be performed, using just that subset of genes, according to the algorithms set forth in FIGS. 5 and 6.
- the analyses may be performed, as in Example 5, by selecting from more comprehensive gene expression profiles, or may, more usefully, be performed by acquiring prospectively gene expression profiles using just the identified subset of genes in the reporter matrix.
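Selecting the subset from more comprehensive stored profiles amounts to keeping only the chosen genes from each profile before scoring. The sketch below illustrates that restriction and a subsequent ranking. Note that the relatedness measure used here is a plain Pearson correlation, substituted for illustration only; it is not the measure of FIGS. 5 or 6, which is not reproduced in this passage. All profile names, gene names, and values are made up.

```python
import math

def restrict(profile, subset):
    """Keep only the subset genes from a comprehensive profile (gene -> signal)."""
    return [profile[gene] for gene in subset]

def pearson(x, y):
    """Pearson correlation; a stand-in for the document's relatedness measure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_by_relatedness(index_name, profiles, subset):
    """Order profile names from most to least related to the index profile."""
    reference = restrict(profiles[index_name], subset)
    scores = {name: pearson(reference, restrict(profile, subset))
              for name, profile in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up comprehensive profiles over four hypothetical genes
profiles = {
    "index":     {"g1": 1.0, "g2": 4.0, "g3": 2.0, "g4": 9.0},
    "similar":   {"g1": 1.2, "g2": 3.5, "g3": 2.5, "g4": 8.0},
    "unrelated": {"g1": 5.0, "g2": 1.0, "g3": 6.0, "g4": 2.0},
}
subset = ["g1", "g2", "g4"]  # hypothetical informative subset
print(rank_by_relatedness("index", profiles, subset))
```

As in the tables of the Examples, the index profile itself appears first in the ordered list, since it is compared against itself.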
- Example 5 demonstrates the selection of a subset of 96 genes from the 1532 genes available in our database of stored gene expression profiles.
- Although the quantitative analysis of gene expression in Example 5 was performed on the 96 gene subset using the algorithm of FIG. 6 (i.e., FIGS. 1A, 1B, and 6), the algorithm given in FIG. 5 (i.e., FIGS. 1A, 1B, and 5) may also be used. Furthermore, FIG. 8, which plots the expression data for the 96 genes from the index profile (appearing at rank 0) versus data from the profile appearing at rank 2 (20 µg/ml Mevastatin in 1% Ethanol), demonstrates that the subset so selected may also be used for the qualitative analysis of gene expression profiles.
- Replicate genome reporter matrices were prepared according to Ashby et al., which is incorporated herein by reference. Briefly, for each such matrix, recombinant constructs, each driving a fluorescent reporter from a distinct yeast promoter, were transformed individually into discrete cultures of Saccharomyces cerevisiae of identical strain background. Selection was applied to transformed cultures both to maintain the reporter and to prevent contamination by untransformed cells. Each such culture of transformed yeast was segregated and maintained in a separate spatially-addressable well of the matrix. The matrices as used contained 864 separate constructs, permitting the contemporaneous measurement of the expression levels of over eight hundred genes. Each matrix was subjected to a defined environmental condition, as specified in the entries of Tables 1 and 2. A gene expression profile was obtained from each matrix, as set forth in Ashby et al., digitized, and stored electronically.
- Tables 1 and 2 demonstrate that each of the methods described herein is able to quantitate the relatedness of gene expression profiles, and by so doing, to identify the relatedness of drug treatments.
- the algorithm of FIGS. 1A, 1B, and 5 identifies treatment with 60 µg/ml actinomycin D as the most closely related of the treatments to the reference, or index, condition, which is exposure to 80 µg/ml actinomycin D. Treatment with 40 µg/ml actinomycin D and 50 µg/ml actinomycin D follow thereafter.
- Varying concentrations of daunorubicin, 5-FUDR, doxorubicin, 5-FU, hydroxyurea and mycophenolic acid follow. All of these agents, like actinomycin D, are known to affect nucleic acid synthesis. Much less closely related are treatments with agents of disparate activity: treatment with yeast alpha factor at ranks 26 and 27, followed thereafter by Mevastatin, the latter an inhibitor of HMG-CoA reductase. At rank 31 may be found the profile generated by treating with no drug at all, the environmentally-matched control, and below that, treatment with the antifungal agents miconazole and griseofulvin, and treatment with the calcium channel blocker verapamil.
- Table 2 presents a quantitative ranking of relatedness of gene expression profiles generated using the method and algorithm of FIGS. 1A, 1B, and 6, as applied to the same set of electronically-stored gene expression profile data.
- agents that affect nucleic acid synthesis are again ranked as most closely related to treatment with 80 µg/ml actinomycin D.
- the ordered ranking of the decreasing concentrations of actinomycin D is also considered.
- the data set forth in Table 3, generated using the method set forth in FIG. 5, identify as agents that are closely related in action to daunorubicin the following: doxorubicin, actinomycin D, 5-FU, and 5-FUDR, consistent with the known activities of these agents.
- verapamil, a calcium channel blocker
- Tables 5 and 6 demonstrate the substantial advantage of the second method for quantifying relatedness of gene expression profiles at low drug concentrations.
- the first method, that set forth in FIG. 5, is unable accurately to quantitate the relatedness of gene expression profiles to that produced in the presence of only 12.5 µg/ml daunorubicin, ranking 5% Saline and 1000 µg/ml diltiazem (a calcium channel blocker) ahead of 5-FU, which itself just precedes anaerobic growth and verapamil in the ranking.
- Replicate genome reporter matrices were prepared as in Example 1 and Ashby et al., with 864 distinct elements reporting the contemporaneous expression of 864 different yeast open reading frames.
- Gene expression profiles were acquired, under the conditions shown below, for each of the matrices, digitized, and stored. Thereafter, the relatedness of each gene expression profile to that produced by incubation of cells in yeast minimal media was quantified pairwise, substantially according to the method set forth in FIGS. 1A, 1B and 5. The measures of pairwise relatedness were then ordered, with the following results, as set forth in Table 7:
- Replicate genome reporter matrices were prepared according to Ashby et al., which is incorporated herein by reference.
- the matrices as used for the analyses presented in this Example contained 1532 separate constructs, permitting the contemporaneous measurement of the expression levels of over fifteen hundred genes, about one quarter of the genes expressible by S. cerevisiae.
- Each matrix was subjected to a defined environmental condition, as specified in the individual entries in each of Tables 8 and 10.
- a gene expression profile was obtained from each matrix, as set forth in Ashby et al., digitized, and stored electronically. Thereafter, the relatedness of each gene expression profile to that produced in the presence of 10 µg/ml Lovastatin was quantified pairwise, substantially according to the method set forth in FIGS. 1A, 1B and 6, with two minor differences.
- normalization step 108 was omitted from the analysis of the 96 gene subset because the assumption of constant mean expression may not prove valid as applied to such a small percentage of the cell's genes.
- the correction for disparate dynamic range of the reporters was accomplished in steps 610 and 611 by dividing each gene by the log square root of the maximum normalized signal; however, the value used to effect normalization was in each case that value appropriate to the 1532 gene subset.
- Table 8 demonstrates, in accord with the results presented in Examples 1 - 4 above, that applying the algorithms of FIGS. 1A, 1B, and 6 to gene expression profiles containing 1532 distinct gene reporters permits quantitation of the relatedness of drugs to 10 µg/ml Lovastatin, an HMG-CoA reductase inhibitor.
- Lovastatin, an HMG-CoA reductase inhibitor.
- other drugs of the same class: Mevastatin, Fluvastatin, Simvastatin and Atorvastatin
- Drugs affecting other steps of the sterol biosynthetic pathway, such as econazole, clotrimazole, and fluconazole, appear next in the ordered list.
- Drugs with substantially different structure or mode of action, such as progesterone, nifedipine and tunicamycin, follow thereafter. A wide variety of other agents, having even lower relative profile scores, are not shown.
- the database of gene expression profiles that was used to generate Table 8 was then queried and subjected to the algorithm schematized in FIGS. 9 and 10. This algorithm is designed to identify a subset of the 1532 genes in the gene expression profiles that is, notwithstanding the reduced number of genes, sufficiently representative of the gene expression repertoire to permit quantitation of the relatedness of the gene expression profiles.
- the range threshold was empirically set to 10 and the correlation threshold to 0.675.
- the algorithms were implemented on a digital computer, with the algorithmic steps coded in C.
- this subset, selected without regard to gene function, embraces a diverse collection of genes with disparate functions.
- Table 10 confirms that informative subsets of genes may be selected that permit the quantitative analysis of gene expression profiles.
- the analysis presented in Table 10, using only the 96 genes listed in Table 9, identifies HMG-CoA reductase drugs as most closely related to Lovastatin, with drugs acting elsewhere in the same biosynthetic pathway appearing next most closely related, and with drugs that are entirely unrelated in target and effect shown as least closely related.
- Although this demonstration was performed by selecting 96 genes from among the 1532 genes for which expression data were available in the database, the identification of this informative subset would permit the subsequent, prospective acquisition of informative gene expression data from only those identified reporters, with confidence that the data so acquired would permit the quantitative analysis of gene expression profiles.
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002331510A CA2331510A1 (en) | 1998-05-12 | 1999-05-11 | Quantitative methods, systems and apparatuses for gene expression analysis |
KR1020007012657A KR20010052341A (en) | 1998-05-12 | 1999-05-11 | Quantitative methods, systems and apparatuses for gene expression analysis |
AU40751/99A AU750975B2 (en) | 1998-05-12 | 1999-05-11 | Quantitative methods, systems and apparatuses for gene expression analysis |
EP99924189A EP1076722A1 (en) | 1998-05-12 | 1999-05-11 | Quantitative methods, systems and apparatuses for gene expression analysis |
JP2000548511A JP2002514804A (en) | 1998-05-12 | 1999-05-11 | Numericalization method, system and apparatus for gene expression analysis |
IL13956799A IL139567A0 (en) | 1998-05-12 | 1999-05-11 | Quantitative methods, systems and apparatuses for gene expression analysis |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US7666898A | 1998-05-12 | 1998-05-12 | |
US09/076,668 | 1998-05-12 | ||
US29265799A | 1999-04-15 | 1999-04-15 | |
US09/292,657 | 1999-04-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999058720A1 true WO1999058720A1 (en) | 1999-11-18 |
Family
ID=26758353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/010387 WO1999058720A1 (en) | 1998-05-12 | 1999-05-11 | Quantitative methods, systems and apparatuses for gene expression analysis |
Country Status (8)
Country | Link |
---|---|
EP (1) | EP1076722A1 (en) |
JP (1) | JP2002514804A (en) |
KR (1) | KR20010052341A (en) |
CN (1) | CN1309722A (en) |
AU (1) | AU750975B2 (en) |
CA (1) | CA2331510A1 (en) |
IL (1) | IL139567A0 (en) |
WO (1) | WO1999058720A1 (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6203987B1 (en) | 1998-10-27 | 2001-03-20 | Rosetta Inpharmatics, Inc. | Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns |
WO2001029268A2 (en) * | 1999-10-18 | 2001-04-26 | Curagen Corporation | Method for identifying interacting gene products |
WO2001051667A2 (en) * | 2000-01-14 | 2001-07-19 | Integriderm, L.L.C. | Informative nucleic acid arrays and methods for making same |
WO2001066803A2 (en) * | 2000-03-09 | 2001-09-13 | Yale University | Phytomics: a genomic-based approach to herbal compositions |
EP1141412A1 (en) * | 1998-12-28 | 2001-10-10 | Rosetta Inpharmatics Inc. | Methods for drug interaction prediction using biological response profiles |
WO2001084139A1 (en) * | 2000-05-04 | 2001-11-08 | The Board Of Trustees Of The Leland Stanford Junior University | Significance analysis of microarrays |
US6468476B1 (en) | 1998-10-27 | 2002-10-22 | Rosetta Inpharmatics, Inc. | Methods for using-co-regulated genesets to enhance detection and classification of gene expression patterns |
WO2003008630A2 (en) * | 2001-07-19 | 2003-01-30 | Syngenta Limited | Methods of profiling gene expression, protein or metabolite levels |
WO2003091450A1 (en) * | 2002-04-24 | 2003-11-06 | Azign Bioscience A/S | Method for evaluating a therapeutic potential of a chemical entity |
FR2840323A1 (en) * | 2002-05-31 | 2003-12-05 | Centre Nat Rech Scient | METHOD OF ANALYZING TRANSCRIPTION VARIATIONS OF A GENE SET |
US6691042B2 (en) | 2001-07-02 | 2004-02-10 | Rosetta Inpharmatics Llc | Methods for generating differential profiles by combining data obtained in separate measurements |
US6692916B2 (en) | 1999-06-28 | 2004-02-17 | Source Precision Medicine, Inc. | Systems and methods for characterizing a biological condition or agent using precision gene expression profiles |
US6801859B1 (en) | 1998-12-23 | 2004-10-05 | Rosetta Inpharmatics Llc | Methods of characterizing drug activities using consensus profiles |
US6839635B2 (en) | 1998-12-23 | 2005-01-04 | Rosetta Inpharmatics Llc | Method and system for analyzing biological response signal data |
US6950752B1 (en) | 1998-10-27 | 2005-09-27 | Rosetta Inpharmatics Llc | Methods for removing artifact from biological profiles |
US6960439B2 (en) | 1999-06-28 | 2005-11-01 | Source Precision Medicine, Inc. | Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles |
US6964850B2 (en) | 2001-11-09 | 2005-11-15 | Source Precision Medicine, Inc. | Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles |
US7054755B2 (en) | 2000-10-12 | 2006-05-30 | Iconix Pharmaceuticals, Inc. | Interactive correlation of compound information and genomic information |
US7467118B2 (en) | 2006-01-12 | 2008-12-16 | Entelos Inc. | Adjusted sparse linear programming method for classifying multi-dimensional biological data |
US7588892B2 (en) | 2004-07-19 | 2009-09-15 | Entelos, Inc. | Reagent sets and gene signatures for renal tubule injury |
CN112687370A (en) * | 2020-12-28 | 2021-04-20 | 博奥生物集团有限公司 | Electronic prescription generation method and device and electronic equipment |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100829867B1 (en) * | 2006-12-05 | 2008-05-16 | 한국전자통신연구원 | Gene clustering method using gene expression profile |
KR100964181B1 (en) * | 2007-03-21 | 2010-06-17 | 한국전자통신연구원 | Gene expression profile clustering method and apparatus using gene vocabulary classification system |
KR101394339B1 (en) * | 2012-03-06 | 2014-05-13 | 삼성에스디에스 주식회사 | System and method for processing genome sequence in consideration of seed length |
CA2936107C (en) * | 2014-01-14 | 2022-09-13 | University Of Utah | Methods and systems for genome analysis |
CN108664769B (en) * | 2017-03-31 | 2021-09-21 | 中国科学院上海营养与健康研究所 | Drug relocation method based on cancer genome and non-specific gene tag |
CN107723343B (en) * | 2017-11-28 | 2021-03-23 | 宜昌美光硅谷生命科技股份有限公司 | Method for quantitative analysis of gene |
CN109935341B (en) * | 2019-04-09 | 2021-04-13 | 北京深度制耀科技有限公司 | Method and device for predicting new drug indication |
CN113539366B (en) * | 2020-04-17 | 2024-11-08 | 中国科学院上海药物研究所 | An information processing method and device for predicting drug targets |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1994017208A1 (en) * | 1993-01-21 | 1994-08-04 | President And Fellows Of Harvard College | Methods and diagnostic kits utilizing mammalian stress promoters to determine toxicity of a compound |
WO1997006277A1 (en) * | 1995-08-09 | 1997-02-20 | The Regents Of The University Of California | Methods for drug screening |
WO1997013877A1 (en) * | 1995-10-12 | 1997-04-17 | Lynx Therapeutics, Inc. | Measurement of gene expression profiles in toxicity determination |
WO1997022720A1 (en) * | 1995-12-21 | 1997-06-26 | Kenneth Loren Beattie | Arbitrary sequence oligonucleotide fingerprinting |
WO1998006874A1 (en) * | 1995-08-09 | 1998-02-19 | The Regents Of The University Of California | Systems for generating and analyzing stimulus-response output signal matrices |
-
1999
- 1999-05-11 KR KR1020007012657A patent/KR20010052341A/en not_active Application Discontinuation
- 1999-05-11 IL IL13956799A patent/IL139567A0/en unknown
- 1999-05-11 WO PCT/US1999/010387 patent/WO1999058720A1/en not_active Application Discontinuation
- 1999-05-11 CA CA002331510A patent/CA2331510A1/en not_active Abandoned
- 1999-05-11 CN CN99808552A patent/CN1309722A/en active Pending
- 1999-05-11 AU AU40751/99A patent/AU750975B2/en not_active Ceased
- 1999-05-11 JP JP2000548511A patent/JP2002514804A/en active Pending
- 1999-05-11 EP EP99924189A patent/EP1076722A1/en not_active Withdrawn
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1994017208A1 (en) * | 1993-01-21 | 1994-08-04 | President And Fellows Of Harvard College | Methods and diagnostic kits utilizing mammalian stress promoters to determine toxicity of a compound |
WO1997006277A1 (en) * | 1995-08-09 | 1997-02-20 | The Regents Of The University Of California | Methods for drug screening |
WO1998006874A1 (en) * | 1995-08-09 | 1998-02-19 | The Regents Of The University Of California | Systems for generating and analyzing stimulus-response output signal matrices |
WO1997013877A1 (en) * | 1995-10-12 | 1997-04-17 | Lynx Therapeutics, Inc. | Measurement of gene expression profiles in toxicity determination |
WO1997022720A1 (en) * | 1995-12-21 | 1997-06-26 | Kenneth Loren Beattie | Arbitrary sequence oligonucleotide fingerprinting |
Non-Patent Citations (3)
Title |
---|
PIETU G ET AL: "NOVEL GENE TRANSCRIPTS PREFERENTIALLY EXPRESSED IN HUMAN MUSCLES REVEALED BY QUANTITATIVE HYBRIDIZATION OF A HIGH DENSITY CDNA ARRAY", GENOME RESEARCH, vol. 6, no. 6, 1 June 1996 (1996-06-01), pages 492 - 503, XP000597086, ISSN: 1088-9051 * |
SCHENA M ET AL: "PARALLEL HUMAN GENOME ANALYSIS: MICROARRAY-BASED EXPRESSION MONITORING OF 1000 GENES", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, vol. 93, no. 20, 1 October 1996 (1996-10-01), pages 10614 - 10619, XP002912238, ISSN: 0027-8424 * |
WODICKA ET AL: "GENOME-WIDE EXPRESSION MONITORING IN SACCHAROMYCES CEREVISIAE", NATURE BIOTECHNOLOGY, vol. 15, no. 15, December 1997 (1997-12-01), pages 1359 - 1367 1367, XP002100297, ISSN: 1087-0156 * |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6203987B1 (en) | 1998-10-27 | 2001-03-20 | Rosetta Inpharmatics, Inc. | Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns |
US6950752B1 (en) | 1998-10-27 | 2005-09-27 | Rosetta Inpharmatics Llc | Methods for removing artifact from biological profiles |
US6468476B1 (en) | 1998-10-27 | 2002-10-22 | Rosetta Inpharmatics, Inc. | Methods for using-co-regulated genesets to enhance detection and classification of gene expression patterns |
US6847897B1 (en) | 1998-12-23 | 2005-01-25 | Rosetta Inpharmatics Llc | Method and system for analyzing biological response signal data |
US6839635B2 (en) | 1998-12-23 | 2005-01-04 | Rosetta Inpharmatics Llc | Method and system for analyzing biological response signal data |
US6801859B1 (en) | 1998-12-23 | 2004-10-05 | Rosetta Inpharmatics Llc | Methods of characterizing drug activities using consensus profiles |
EP1141412A1 (en) * | 1998-12-28 | 2001-10-10 | Rosetta Inpharmatics Inc. | Methods for drug interaction prediction using biological response profiles |
US6370478B1 (en) | 1998-12-28 | 2002-04-09 | Rosetta Inpharmatics, Inc. | Methods for drug interaction prediction using biological response profiles |
EP1141412A4 (en) * | 1998-12-28 | 2006-08-30 | Rosetta Inpharmatics Inc | Methods for drug interaction prediction using biological response profiles |
US7957909B2 (en) | 1999-06-28 | 2011-06-07 | Source Precision Medicine, Inc. | Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles |
US6960439B2 (en) | 1999-06-28 | 2005-11-01 | Source Precision Medicine, Inc. | Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles |
US6692916B2 (en) | 1999-06-28 | 2004-02-17 | Source Precision Medicine, Inc. | Systems and methods for characterizing a biological condition or agent using precision gene expression profiles |
WO2001029268A3 (en) * | 1999-10-18 | 2002-01-31 | Curagen Corp | Method for identifying interacting gene products |
WO2001029268A2 (en) * | 1999-10-18 | 2001-04-26 | Curagen Corporation | Method for identifying interacting gene products |
US6635423B2 (en) | 2000-01-14 | 2003-10-21 | Integriderm, Inc. | Informative nucleic acid arrays and methods for making same |
WO2001051667A3 (en) * | 2000-01-14 | 2002-07-18 | Integriderm L L C | Informative nucleic acid arrays and methods for making same |
WO2001051667A2 (en) * | 2000-01-14 | 2001-07-19 | Integriderm, L.L.C. | Informative nucleic acid arrays and methods for making same |
WO2001066803A3 (en) * | 2000-03-09 | 2002-04-18 | Univ Yale | Phytomics: a genomic-based approach to herbal compositions |
WO2001066803A2 (en) * | 2000-03-09 | 2001-09-13 | Yale University | Phytomics: a genomic-based approach to herbal compositions |
US7363165B2 (en) | 2000-05-04 | 2008-04-22 | The Board Of Trustees Of The Leland Stanford Junior University | Significance analysis of microarrays |
WO2001084139A1 (en) * | 2000-05-04 | 2001-11-08 | The Board Of Trustees Of The Leland Stanford Junior University | Significance analysis of microarrays |
US7054755B2 (en) | 2000-10-12 | 2006-05-30 | Iconix Pharmaceuticals, Inc. | Interactive correlation of compound information and genomic information |
US6691042B2 (en) | 2001-07-02 | 2004-02-10 | Rosetta Inpharmatics Llc | Methods for generating differential profiles by combining data obtained in separate measurements |
WO2003008630A3 (en) * | 2001-07-19 | 2003-07-31 | Syngenta Ltd | Methods of profiling gene expression, protein or metabolite levels |
WO2003008630A2 (en) * | 2001-07-19 | 2003-01-30 | Syngenta Limited | Methods of profiling gene expression, protein or metabolite levels |
US6964850B2 (en) | 2001-11-09 | 2005-11-15 | Source Precision Medicine, Inc. | Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles |
US8055452B2 (en) | 2001-11-09 | 2011-11-08 | Life Technologies Corporation | Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles |
US8718946B2 (en) | 2001-11-09 | 2014-05-06 | Life Technologies Corporation | Identification, monitoring and treatment of disease and characterization of biological condition using gene expression profiles |
WO2003091450A1 (en) * | 2002-04-24 | 2003-11-06 | Azign Bioscience A/S | Method for evaluating a therapeutic potential of a chemical entity |
WO2003102849A1 (en) * | 2002-05-31 | 2003-12-11 | Centre National De La Recherche Scientifique | Method for analysis of transcription variations in a set of genes |
FR2840323A1 (en) * | 2002-05-31 | 2003-12-05 | Centre Nat Rech Scient | METHOD OF ANALYZING TRANSCRIPTION VARIATIONS OF A GENE SET |
US7588892B2 (en) | 2004-07-19 | 2009-09-15 | Entelos, Inc. | Reagent sets and gene signatures for renal tubule injury |
US7467118B2 (en) | 2006-01-12 | 2008-12-16 | Entelos Inc. | Adjusted sparse linear programming method for classifying multi-dimensional biological data |
CN112687370A (en) * | 2020-12-28 | 2021-04-20 | 博奥生物集团有限公司 | Electronic prescription generation method and device and electronic equipment |
CN112687370B (en) * | 2020-12-28 | 2023-12-22 | 北京博奥晶方生物科技有限公司 | Electronic prescription generation method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
EP1076722A1 (en) | 2001-02-21 |
AU750975B2 (en) | 2002-08-01 |
CA2331510A1 (en) | 1999-11-18 |
JP2002514804A (en) | 2002-05-21 |
IL139567A0 (en) | 2002-02-10 |
KR20010052341A (en) | 2001-06-25 |
AU4075199A (en) | 1999-11-29 |
CN1309722A (en) | 2001-08-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP1076722A1 (en) | Quantitative methods, systems and apparatuses for gene expression analysis | |
AU724474B2 (en) | Methods for drug screening | |
US5777888A (en) | Systems for generating and analyzing stimulus-response output signal matrices | |
Gibson | Microarrays in ecology and evolution: a preview | |
US8521441B2 (en) | Method and computer program product for reducing fluorophore-specific bias | |
Causton et al. | Microarray gene expression data analysis: a beginner's guide | |
Lee et al. | Microarrays: an overview | |
Tanzer et al. | Global nutritional profiling for mutant and chemical mode-of-action analysis in filamentous fungi | |
US6326140B1 (en) | Systems for generating and analyzing stimulus-response output signal matrices | |
Lopez et al. | Feature extraction and signal processing for nylon DNA microarrays | |
Herzel et al. | Extracting information from cDNA arrays | |
US20030182066A1 (en) | Method and processing gene expression data, and processing programs | |
Mary-Huard et al. | Spotting effect in microarray experiments | |
Cojacaru et al. | The use of microarrays in medicine | |
Lockhart et al. | DNA arrays and gene expression analysis in the brain | |
CA2343076A1 (en) | Geometrical and hierarchical classification based on gene expression | |
AU720427B2 (en) | Systems for generating and analyzing stimulus-response output signal matrices | |
US20090143238A1 (en) | Oligonucleotide matrix and methods of use | |
Sasidharan et al. | An approach to comparing tiling array and high throughput sequencing technologies for genomic transcript mapping | |
Braam et al. | Expression profiling in cardiovascular disease using microarrays | |
Ramanathan | Microarrays and bioinformatics: microarrays and bioinformatics in identifying genes that could form the molecular basis for glutamate induced cell death and protection conferred by Vitamin E | |
Regulons | Transcriptional Profiling of Cross Pathway |
Legal Events
Code | Title | Description |
---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 99808552.9; Country of ref document: CN |
AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
ENP | Entry into the national phase | Ref document number: 2331510; Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 139567; Country of ref document: IL |
WWE | Wipo information: entry into national phase | Ref document number: 40751/99; Country of ref document: AU |
ENP | Entry into the national phase | Ref document number: 2000 548511; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 1020007012657; Country of ref document: KR |
WWE | Wipo information: entry into national phase | Ref document number: 508244; Country of ref document: NZ |
WWE | Wipo information: entry into national phase | Ref document number: 1999924189; Country of ref document: EP. Ref document number: IN/PCT/2000/786/CHE; Country of ref document: IN |
WWP | Wipo information: published in national office | Ref document number: 1999924189; Country of ref document: EP |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWP | Wipo information: published in national office | Ref document number: 1020007012657; Country of ref document: KR |
WWG | Wipo information: grant in national office | Ref document number: 40751/99; Country of ref document: AU |
WWW | Wipo information: withdrawn in national office | Ref document number: 1020007012657; Country of ref document: KR |
WWW | Wipo information: withdrawn in national office | Ref document number: 1999924189; Country of ref document: EP |