WO2001036980A2

WO2001036980A2 - A process for identifying the active site in a biological target

Info

Publication number: WO2001036980A2
Application number: PCT/GB2000/004420
Authority: WO
Inventors: Torbjörn Lundstedt; Per Andersson; Jarl Wikberg; Ruta Muceniece; Peteris Prusis
Original assignee: Melacure Therapeutics Ab; Pett, Christopher, Phineas
Priority date: 1999-11-18
Filing date: 2000-11-20
Publication date: 2001-05-25
Also published as: HK1049218A1; NZ518980A; GB9927346D0; ZA200203963B; WO2001036980A3; CA2392086A1; EP1232466A2; AU1530501A

Abstract

The invention relates to processes for characterising the sites of interaction between a Ligand Y and a Target X, comprising the steps of (1) inputting information on the chemical/physical properties of at least two targets of the type X; (2) inputting information on the chemical/physical properties of at least two ligands of the type Y; (3) inputting information on the interactions between at least two of the targets of type X with at least two of the ligands of type Y; and then correlating the information from steps 1, 2 and 3 using one or more multivariate methods in order to produce a model of the interaction between the Ligand Y and Target X, from which the sites of interaction may be identified and/or characterised.

Description

A process for identifying the active site in a biological target

The present invention relates to processes for the identification of binding site(s) in biological targets such as mRNA, rRNA, tRNA, DNA, proteins, peptides, endogenous ligands, receptors and enzymes. The invention is based on the use of multivariate methods such as experimental design, Principal Component Analysis (PCA), Soft Independent Modelling of Class Analogy (SIMCA), Principal Component Regression (PCR), Projections to Latent Structures (PLS), Multivariate Design (MVD), Statistical Molecular Design (SMD), Informative Chemical Libraries, Multivariate Quantitative Structure Activity Relationships (MQSAR) and Multivariate Characterisation (MNC). These methods have been developed and applied since the beginning of the 1980s in the design and investigation of chemical, pharmaceutical, pharmacological and biochemical systems. This invention, however, provides the use of the abovementioned methods, inter alia, in an integrated approach in order to identify the binding site (active site) or the binding sites of a ligand or ligands in different kinds of macromolecules. This is of outstanding value in the drug discovery process, both for lead-finding and for lead-optimisation.

Currently known methods for drug design include the synthesis of new compounds (referred to as "ligands" herein) followed by biological testing of these ligands. Usually the interaction(s) with the macromolecule or macromolecules of interest (i.e. the "targets") are measured, and a lead compound is identified and optimised for fulfilling the demands of a candidate drug (CD). Typically the currently known methods involve the synthesis of a huge number of compounds before a lead can be identified. A typical criterion for a ligand to be of interest is that it shows a desired activity, affinity or selectivity for a particular target.

The ligands are most often tested for their affinity to macromolecular targets that are proteins such as enzymes, hormone receptors or G-protein coupled receptors. However, the testing of these ligands with other macromolecular targets such as specific sequences of DNA is also common. Typical examples of the ligands that it is desired to test, and to design improved variants of, include organic compounds, peptides (linear or cyclic sequences of amino acids), mixtures of peptides and organic compounds, and sequences of DNA. A common procedure in the development of active compounds is to randomly synthesise a library containing a few hundred up to millions of different compounds. An HTS-assay is then used for measuring the biological activity of the compounds or the interaction with the macromolecule(s) of interest. The most promising compounds are then selected for further refinement.

Engineering of proteins is also a method in use. Through the use of DNA technologies, well known in the art, artificial proteins may be constructed. A typical example constitutes the design of an antibody. Antibodies constitute proteins with fixed and variable regions. Changing amino acids in the variable region of an existing antibody can afford new properties to the antibody. The antibodies with the desired properties are usually selected from a set of engineered antibodies, by using a suitable selection method.

Other methods well known in the art constitute the rational design of pharmaceutical entities based on knowledge of the three dimensional (3D) structure of the macromolecule that the pharmaceutical is desired to interact with. A macromolecule of interest can, for example, be crystallized and its 3D structure determined by use of crystallographic methods. It is also possible to use NMR for elucidating the 3D structure of proteins. Once the 3D structure of the macromolecule is known, a chemical entity can then be designed to fit into a suitable region of the macromolecule. A large problem is however that the determination of the 3D structure of macromolecules is difficult, expensive and not always possible (see Branden and Tooze, 1991).

There also exist methods well known in the art collectively termed QSAR methods (QSAR = Quantitative Structure Activity Relationships). Such methods analyse the relation between the structures of test compounds and their affinity to the macromolecule. The information is then used to deduce better structures. A well-known example of a QSAR method constitutes CoMFA (Cramer et al., J. Amer. Chem. Soc, 1988, 110, 5959-5967).

So called pharmacophore models are also used in drug design (see e.g. Daveu et al. 1999; de Groot et al. 1999; McGregor et al. 1999).

It is also known that protein and DNA originating from living organisms virtually always exist in variants that, although they have more or less differing amino acid sequences or DNA sequences, retain similar structural organizations and functions (Branden and Tooze, 1991). Thus, in living organisms numerous proteins exist showing homologous amino acid sequences, and which share similar structures and biological properties. Such variants of proteins very often exist within the same species. However, when one also considers all living organisms, one will find that numerous mutations have occurred during evolution that have led to the accumulation of a very large number of variants of proteins and genes that show similar structural and biological properties (Branden and Tooze, 1991).

Proteins are built from amino acid chains (generally termed primary structures) that form structural elements (motifs) such as α-helices, β-sheets, loop regions, hairpin β motifs, and the like (generally termed secondary structures), which then are used in the building of larger structures (generally termed tertiary structures or domains); these domains in turn form the overall protein structure (generally called quaternary structures) (for extensive examples and discussion on this topic, reference is made to Branden and Tooze, 1991). Integral membrane proteins often exist in a large number of homologous variants. Well known examples include tyrosine kinases, serine/threonine kinases, ion channels, G-protein coupled receptors and the steroid/thyroid hormone receptor family. For example, about 1000 different variants of the G-protein coupled receptors have been cloned and sequenced. The large group of G-protein coupled receptors constitutes a good example of homologous proteins with similar structural organization. The G-protein coupled receptors are known to be built from one single amino acid chain forming seven transmembrane α-helices, one extracellular N-terminal amino acid chain, one intracellularly located C-terminal amino acid chain, three extracellular loops and three intracellular loops (Baldwin, 1993).

The methods that are known in the art make use of information on new or known ligands, and in some cases variants thereof, and their affinity to the target. In most cases, information on a number of variants of the ligands in question is correlated with information derived from the binding of these variants with a single target molecule. In a large number of cases, information on the 3D structure of the target is used. However, the simultaneous use of chemical/physical descriptors for both the target and the ligands, in order to identify the active site of the target by applying quantitative methods, has never been done. The present invention also takes advantage of the fact that available technology allows the construction and production of modified macromolecules. This is a technology that in the case of proteins is generally referred to as protein engineering (Branden and Tooze, 1991).

Using techniques well known in the art, one or several amino acids in a protein may be exchanged for other amino acid(s) or removed, or new amino acids may be added. This is generally done by so-called directed mutagenesis techniques. For example, one specific amino acid in a protein can be exchanged for another amino acid (see e.g. Frandberg et al., 1994).

However, another approach constitutes the construction of so called chimeric proteins. A chimeric protein has incorporated or exchanged parts of the amino acid sequence(s) from another protein. For an example of the approach, see Schioth et al. (1998). The analogous procedure for the construction of chimeric DNA's can of course also be undertaken.

The use of one, two, three or more different chemical/physical properties of molecules to describe the ligands in a Multiple Linear Regression (MLR) model is well known and established in QSAR (Hansch et al.). One basic assumption in traditional QSAR is that the descriptors are independent of each other. A number of variations on this concept, introducing new descriptors and applying Stepwise Regression in order to obtain the best possible correlation, have been used in QSAR.

Since the physical/chemical descriptors for different compounds are highly unlikely to be independent of each other, they have to be handled by a different approach. The first example of handling this problem was in the investigation of "Solvent Selection for Organic Synthesis", by Carlson, Lundstedt and Albano (1985). In this paper, a multivariate characterisation of 82 solvents was made. In order to determine the number of "independent variables" describing the solvents, PCA was used (Wold, 1987; Jackson, 1991). In addition to this, different strategies for selecting solvents on the basis of diversity were suggested. In WO00/033218 A1, a similar approach was suggested, i.e. to investigate and make designs in the chemical space of drug-like compounds. Recently, an investigation was published where the binding sites of 7TM (seven transmembrane) receptors were investigated. However, the investigation did not include or even suggest a joint analysis of both the ligands and the receptors (Clementi et al., 2000).

The first example of investigating systems with several types of chemicals involved was described by Lundstedt (1986). In this work, Multivariate Design (MVD) was applied for the first time and exemplified by the investigation of scope and limitations for "The Willgerodt-Kindler Reaction". Solvents, starting material and reagents were described by multivariate characterisation. The selection for each subset was based on diversity and finally the combination of reactions for the investigation of "scope and limitations" was suggested by the use of MVD. The same principle as above was the basis for "informative chemical libraries" (Lundstedt et al., 1997, and Andersson et al., 1999) and also for describing peptides and proteins, as further discussed below.

The translation of protein and peptide sequences to a quantitative description based on chemical/physical properties is reported in the literature in several investigations. A multivariate characterisation of amino acids with physical/chemical descriptors has been carried out, followed by a PCA to determine the dimensionality (Hellberg et al. 1986; Hellberg et al. 1987; Jonsson et al. 1989; Collantes and Dunn, 1995; Sandberg et al. 1998). The characterisation made by Hellberg et al. (1986; 1987) includes experimentally as well as semi-empirically derived variables. Using Principal Component Analysis (PCA, Wold, 1987), three (or more) latent variables, so-called principal property variables, may be generated, which summarise the information from the original variables. By using these principal properties, each amino acid in a protein or peptide sequence may be quantitatively characterised, i.e. be translated to three latent variables containing physical and chemical information. This means that instead of comparing sequences with a one-letter code, a quantitative description of each sequence can be generated. This method has been used for obtaining descriptors of the ligands (the active compounds), which are then used in MQSAR to relate chemical structures to properties or biological activities (BA). By applying this approach, the important chemical properties of a ligand which are needed for binding to the active site in a target are identified (Andersson et al., 1998, and Lundstedt et al., 2000). The same principle for peptides may be applied for DNA, RNA, and proteins, and similar polymeric molecules.
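As a minimal sketch of this translation step, the Python fragment below maps each residue of a peptide to three principal-property values and concatenates them into one descriptor vector. The numeric values in `PRINCIPAL_PROPERTIES` are made-up placeholders and are not the published scales of Hellberg et al.; the names `PRINCIPAL_PROPERTIES` and `describe_sequence` are hypothetical and used only for illustration.

```python
# Placeholder principal properties: three latent variables per residue.
# ASSUMPTION: these numbers are illustrative only, NOT published values.
PRINCIPAL_PROPERTIES = {
    "A": (0.1, -1.0, 0.5),
    "G": (2.0, -4.0, 0.3),
    "K": (2.5, 1.5, -3.0),
    "F": (-4.5, 1.2, 0.4),
}


def describe_sequence(sequence):
    """Translate a one-letter sequence into a flat list of descriptors."""
    descriptors = []
    for residue in sequence:
        # Each residue contributes its three principal-property values.
        descriptors.extend(PRINCIPAL_PROPERTIES[residue])
    return descriptors


vector = describe_sequence("GAKF")
print(len(vector), vector)
```

A sequence of n residues yields 3n values under this scheme, so sequences of different length give vectors of different length.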

The analysis of MQSAR models between peptides of different length and the biological test results using PLS (Wold, 1993a) and many other calculation methods (e.g. neural networks) require a uniform matrix of descriptors where all sequences are described with the same number of variables. However, sequences of RNA, DNA, proteins and peptides are often of different length.

In order for the calculations to be able to handle this, a transform has to be made to obtain a uniform matrix, i.e. a matrix where each sequence is described with the same number of variables. A solution to the problem is to calculate auto covariances and auto cross covariances (ACC) (Wold et al. 1993b) between amino acids. A similar approach has been developed for handling branched or circular peptides as well as for branched proteins (Lundstedt et al., 2000). Similar approaches have been used for classification of DNA and RNA sequences of different length, as well as for any set of polymers composed of building blocks (Wold et al. 1998). A further improvement is to combine the ACC with OSC (Orthogonal Scatter Correction) to reduce the noise in the models (Andersson et al. 1998). These methods have so far been used for obtaining descriptors of targets and relating them to a measured biological activity or using them for comparing sequences to each other (Wold et al., 2000).
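The ACC transform can be sketched as follows. The sketch assumes one common form of the calculation, in which the term for lag l and property pair (j, k) is the sum over i of z[i][j] * z[i+l][k], divided by (n - l), for a sequence of n residues; other normalisations exist, and the cited work may differ in detail. The function name `acc_descriptors` and the toy input values are hypothetical, and the per-residue descriptors are assumed to be already scaled.

```python
def acc_descriptors(z, max_lag):
    """Auto and cross covariances of per-residue descriptors.

    z is a list with one tuple of properties per residue. The result has
    max_lag * n_props**2 values regardless of the sequence length.
    ASSUMPTION: terms are normalised by (n - lag); variants exist.
    """
    n = len(z)
    n_props = len(z[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    out = []
    for lag in range(1, max_lag + 1):
        for j in range(n_props):
            for k in range(n_props):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                out.append(total / (n - lag))
    return out


# Two toy sequences of different length (made-up values, two properties each).
short = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
longer = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.0, 0.0)]
print(len(acc_descriptors(short, 2)), len(acc_descriptors(longer, 2)))
```

Both sequences are described by the same number of variables, which is the property needed to stack them into one uniform matrix.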

The same principle as for peptides may be applied for DNA, RNA and other polymers or oligomers.

The current invention provides a novel method for identifying the interaction site, binding site or active site in a macromolecule such as mRNA, rRNA, tRNA, DNA, peptides, proteins, carbohydrates or any kind of oligomers or polymers, whether natural or synthetic. The invention relates to the use of "informative combinatorial chemistry", "informative peptide libraries", MQSAR and a chemical/physical description of the target either based on the principal properties for the building blocks of the target (i.e. amino acids or similar) or handled as chimeric target proteins or handled as mutated target proteins. Other macromolecules such as mRNA, rRNA, tRNA, DNA, peptides or enzymes can be handled in the same way.

In one embodiment, the invention relates to a process for characterising the interaction between a Ligand Y and a Target X comprising:

Step 1 Obtaining information representing one or more chemical and/or physical properties of at least two ligands of the type Y;

Step 2 Obtaining information representing one or more chemical and/or physical properties of at least two targets of the type X;

Step 3 Obtaining information representing one or more chemical and/or physical properties of the interaction between at least two of the ligands of type Y and at least two of the targets of the type X;

and processing the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and Target X from which one or more of the properties of the interaction between the Ligand Y and the Target X may be characterised.
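One way to picture the processing of Steps 1, 2 and 3 is sketched below in Python. It is an illustration under stated assumptions, not the method prescribed by the text: each measured ligand/target pair becomes one row holding the ligand descriptors followed by the target descriptors, and a single-component PLS1 model relates the rows to the measured interaction. All names (`build_rows`, `pls1_one_component`) and all numbers are hypothetical.

```python
def build_rows(ligands, targets, interactions):
    """One row per measured (ligand, target) pair: ligand block + target block."""
    rows, responses = [], []
    for lig_name, lig_desc in ligands.items():
        for tgt_name, tgt_desc in targets.items():
            key = (lig_name, tgt_name)
            if key in interactions:
                rows.append(list(lig_desc) + list(tgt_desc))
                responses.append(interactions[key])
    return rows, responses


def pls1_one_component(rows, y):
    """Single-component PLS1 on mean-centred data; returns (weights, q)."""
    n, p = len(rows), len(rows[0])
    x_mean = [sum(row[j] for row in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in rows]
    yc = [value - y_mean for value in y]
    # Weight vector: covariance of each descriptor column with the response.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores and the regression coefficient of the response on the scores.
    t = [sum(xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return w, q


# Made-up descriptors (Steps 1 and 2) and interaction data (Step 3).
ligands = {"Y1": (0.0, 0.5), "Y2": (1.0, 0.5)}
targets = {"X1": (0.25, 0.0), "X2": (0.25, 1.0)}
interactions = {("Y1", "X1"): 0.0, ("Y1", "X2"): 2.0,
                ("Y2", "X1"): 1.0, ("Y2", "X2"): 3.0}

rows, y = build_rows(ligands, targets, interactions)
weights, q = pls1_one_component(rows, y)
print([round(v, 3) for v in weights])
```

In this toy data set the largest weight falls on the second target descriptor, which is the kind of signal that could point to the part of the target most involved in the interaction. Real applications would use many more descriptors, more components and validation of the model.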

In this context, the term "characterising the interaction" includes obtaining information on, determining, predicting or estimating at least one chemical and/or physical property of the interaction or of the sites of interaction; estimating or predicting the position of the site of interaction within the Target X; estimating or predicting the position of the site of interaction within the Ligand Y; estimating or predicting the binding affinity, selectivity, activity, biological activity or avidity of the Ligand Y or Y' for Target X; estimating or predicting which subsequences, regions or parts of the Ligand Y interact with the Target X; or estimating or predicting which subsequences, regions or parts of the Target X interact with the Ligand Y.

The invention also provides a process for estimating the position of the active site in a Target X in an interaction between a Ligand Y and a Target X, or estimating one or more physical and/or chemical properties of the active site, comprising the above Steps 1, 2, and 3, and correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X from which the position of the active site or one or more physical and/or chemical properties of the active site in the Target X may be estimated.

The invention further provides a process for predicting the position of the active site in an interaction between a Ligand Y and a Target X, or predicting one or more physical and/or chemical properties of the active site, comprising: the above Steps 1, 2, and 3; Step 4, which comprises correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X; and using the model to predict the position of the active site or one or more physical and/or chemical properties of the active site.

A further embodiment of the invention provides a process performed with the aid of a programmed computer for the estimation of the position of the active site in a Target X, in an interaction between a Ligand Y and a Target X, or one or more physical and/or chemical properties of the active site, comprising the steps of:

Step 1 Inputting information representing one or more chemical and/or physical properties of at least two ligands of the type Y;

Step 2 Inputting information representing one or more chemical and/or physical properties of at least two targets of the type X;

Step 3 Inputting information representing one or more chemical and/or physical properties of the interaction between at least two of the ligands of type Y and at least two of the targets of the type X;

Step 4 Computing a model from the inputted information which describes the interaction between the Ligand Y and the Target X; and then using the model to estimate the position of the active site, or to estimate one or more physical and/or chemical properties of the active site.

The invention also provides a process for assisting in the design of a Ligand Y' which binds to a Target X, the Ligand Y' having an increased or decreased binding affinity, selectivity or avidity for the Target X compared to that of a Ligand Y, comprising the Steps 1, 2 and 3 of the invention; and then correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X from which the structure and/or one or more chemical and/or physical properties of the Ligand Y' may be estimated or predicted.

A further embodiment provides a process for estimating or predicting the binding affinity, selectivity or avidity of a Ligand Y' with a Target X, comprising Steps 1, 2 and 3 of the invention; and then correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X from which the binding affinity, selectivity or avidity of the Ligand Y' with the Target X may be estimated or predicted. In this context, the Ligand Y' is a ligand of the type Y, as herein defined.

The processes according to the invention may generally be divided into the following steps:

STEPS 1 AND 2

The first steps are to describe the target X (and ligand Y) by numbers which give a good representation of the chemical/physical properties of the target (and/or ligand, respectively). Examples of different ways to describe the targets and ligands are given below:

In some cases, the processes of the invention will make use of information which directly represents the chemical/physical properties of the targets and/or ligands. In most cases, however, the processes of the invention will make use of the information (i.e. descriptors) which indirectly represents the information on the chemical/physical properties of the targets and/or ligands, i.e. the latter information is subjected to a conversion, operation, transformation or translation process such as those described herein (e.g. principal properties, bit-vectors, PCA, ACC, etc.) before being correlated.

In the present invention, the targets X may be of any chemical nature but it is preferred that X is represented by proteins, peptides, large or small peptides, protein subunits, receptors, ion-channels, transporters, carriers, enzymes, drug binding proteins, proteins participating in cell signaling, polymers, structures (including linear, cyclic and branched structures and combinations thereof) which at least in part are composed of building blocks, DNA's, part of DNA's, DNA sequences, RNA's, part of RNA's, RNA sequences, transfer RNA's, messenger RNA's, carbohydrates, with proteins being most preferred. In the case of a protein, X may be selected so as to contain one peptide chain (i.e. being a monomeric protein; in other words being composed of one sub-unit). However, X may equally well be selected to contain several peptide chains held together by molecular interactions (e.g. the macromolecule being a multimeric protein, in other words being composed of several sub-units).

In the case of X being a protein, any chemical modification of the amino acid chain of X is allowed. Specific examples of such modification(s) include (but are not limited to) glycosylation, palmitoylation, phosphorylation, proteolytic degradation, peptide chain breaks, nicking, oxidations, or any other chemical, biochemical or biological modification(s). X can also contain non-protein moieties such as co-factors, prosthetic groups, metal atoms, and the like. It is also allowed that natural amino acids of a protein are exchanged for non-natural amino acids.

In the present invention, the molecular weight of the target X is preferably larger than 5000 g/mole, more preferably larger than 7000 g/mole, even more preferably larger than 10000 g/mole, even more preferably larger than 12000 g/mole, still even more preferably larger than 14000 g/mole, even still even more preferably larger than 17000 g/mole and most preferably larger than 20000 g/mole. However, in some specific embodiments of the invention it is preferred that the molecular weight of target X is larger than 25000 g/mole or even larger than 30000 g/mole and more. However, for most embodiments of the invention the molecular weight of X can be as low as 3000 g/mole or even as low as 2000 g/mole or even as low as 1000 g/mole or smaller, or even lower, or of any other molecular weight suited for the problem to be investigated.

In the present invention, the ligands included in Y are of any chemical nature. Thus included in Y are (but not limited to) organic compounds, chemical libraries, peptides, peptide libraries, protein subunits, proteins, receptors, ion-channels, transporters, carriers, enzymes, drug binding proteins, proteins participating in cell signaling, polymers and structures (including linear, cyclic and branched structures and combinations thereof) which at least in part are being composed of building blocks, non-peptides, organic chemical compounds, DNA's, part of DNA's, DNA sequences, RNA's, RNA sequences, part of RNA's, transfer RNA's, messenger RNA's, carbohydrates, hybrids of any of the aforementioned and the like.

Y is preferably an informative organic library or an informative peptide library. Also Y can be a set of substances taken from nature, e.g., natural substance libraries.

In some cases Y is selected to be a ligand which has the properties that are listed above for the properties of target X. In these cases the molecular weight of Y is preferably not restricted to any particular size; it may be small or it may be large. Thus it can be seen that the terms Target X and Ligand Y are essentially interchangeable, i.e. the invention is not limited to interactions between "targets" and "ligands"; it applies to any entities which are capable of interacting with one another.

Usually the molecular weight of Y is within the range 100 - 5000 g/mole. However, as mentioned, for many very important implementations of the present invention it is desired that Y is of a macromolecular nature. Thus, in these cases Y is preferably larger than 5000 g/mole, more preferably larger than 7000 g/mole, even more preferably larger than 10000 g/mole, even more preferably larger than 12000 g/mole, still even more preferably larger than 14000 g/mole, even still even more preferably larger than 17000 g/mole and most preferably larger than 20000 g/mole. However, in some specific embodiments of the invention it is preferred that the molecular weight of molecule Y is larger than 25000 g/mole or even larger than 30000 g/mole and more.

In other embodiments of the invention, it is preferred that the ligand Y is a small peptide or a low molecular weight organic compound within the range of 100-5000 g/mole, preferably below 2000 g/mole, or even more preferably below 1000 g/mole or most preferably below 850 g/mole.

The information on the properties of the targets of type X and/or the information on the properties of the ligands of type Y (i.e. descriptors of X and/or Y) may be derived, inter alia, from atom counts, measured or calculated thin layer liquid chromatography (TLC), retention times on HPLC, refractive index, isoelectric point, melting point, boiling point, molecular weight, hydrophobicity, hydrophilicity, chromatographic mobility, van der Waals volume, octanol/water partition coefficient (logP), energy of molecular orbital, heat of formation, polarizability, electronegativity, hardness, total accessible molecular surface area, polar accessible molecular surface area, nonpolar accessible molecular surface area, number of hydrogen bond donors, number of hydrogen bond acceptors, charge, IR-spectra, NMR-spectra or other spectra, HOMO, LUMO, semi-empirical calculations, ab initio calculations or 3D quantum mechanical calculations. In most cases the descriptors of X and descriptors of Y may be calculated from already known facts about X or Y (e.g. the structural formula of X or Y), rather than obtaining them by chemical or physical measurements. In some cases, however, the information on the properties may be determined experimentally. With regard to the "targets of type X" and the "ligands of type Y", at least some of the "targets of type X" should be capable of interacting with at least some of the "ligands of type Y". However, it is not a prerequisite that all targets of type X are capable of interacting with all of the ligands of type Y since a non-interacting X/Y pair might also provide useful information. It is preferred that the majority (e.g. 50%, 60%, 70%, 80%, 90% or even 100%) of the "targets of type X" are capable of interacting with the majority (e.g. 50%, 60%, 70%, 80%, 90% or even 100%) of the "ligands of type Y".
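As an illustration of descriptors calculated from already known facts rather than measured, the Python sketch below derives atom counts and a molecular weight from a simple molecular formula. The helper names `atom_counts` and `molecular_weight`, the restriction to four elements and the example formula are assumptions made for this sketch; it handles only flat formulas without brackets or charges.

```python
import re

# Standard atomic weights (g/mole) for a few common elements.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def atom_counts(formula):
    """Count atoms in a flat formula such as 'C9H8O4' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic weights of all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[symbol] * count
               for symbol, count in atom_counts(formula).items())


counts = atom_counts("C9H8O4")
weight = molecular_weight("C9H8O4")
print(counts, round(weight, 3))
```

Values obtained this way could form two columns of the descriptor table for the ligands, alongside any experimentally determined properties.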

In some embodiments of the invention, the "targets of type X" all have similar physical, chemical, biological and/or pharmacological properties. In other embodiments of the invention, the "targets of type X" share similar structural, compositional or organisational features.

In the invention, it is preferred that the targets of the type X show high diversity.

In some embodiments of the invention, the "ligands of type Y" all have similar physical, chemical, biological and/or pharmacological properties. In other embodiments of the invention, the "ligands of type Y" share similar structural, compositional or organisational features.

In this invention, it is preferred that the ligands of type Y form or are derived or are obtained from an informative library (low molecular weight organic compounds or small peptides) with as high chemical/physical diversity as possible.

Preferred targets of type X are macromolecules having a polymeric structure that show sequence (or other building block or composite building block) homologies of at least 10 %, more preferably of at least 20 % and most preferably of at least 30 %. Even more preferred are macromolecules of type X whose sequence(s) or subsequence(s) (or other building block(s) or composite building block(s)) included in the region(s) included in the analysis according to the procedures of the invention show homologies of at least 10 %, more preferably of at least 20 % and most preferably of at least 30 %. Also preferred are macromolecules of type X where the subsequences (or other building blocks or composite building blocks) comprising parts included in the analysis according to the procedures of the invention show homologies of at least 10 %, more preferably of at least 20 % and most preferably of at least 30 %. Preferred methods for calculating homologies are by using the BLAST algorithm (http:/www.bioactivesite.com/darwin2000/blast/; November 15 2000), using standard settings.
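The thresholds above can be illustrated with a much simpler measure than BLAST. The Python sketch below assumes the two sequences have already been aligned to the same length (with "-" marking gaps) and reports the percentage of identical positions; this is a stand-in for illustration only and does not reproduce BLAST scoring. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same non-gap residue.

    ASSUMPTION: inputs are pre-aligned; this is not the BLAST algorithm.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ACDEFGHK", "ACDQFG-K")
# Compare against the most preferred threshold named in the text (30 %).
print(identity, identity >= 30.0)
```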

Thus it can be seen that a "target of the type X " is, in most cases, what would generally be termed a variant of the Target X, i.e. one which shares a property or function in common with the Target X. The same applies, mutatis mutandis, to the term "ligand of the type Y".

Preferably each of the targets of the type X independently shares a structure or a DNA, RNA or amino acid sequence in common with the Target X; or a building block (as defined herein) or combination of building blocks in common with the Target X. The same applies, mutatis mutandis, to the term "ligand of the type Y" when the ligands of type Y are macromolecules.

Particularly preferred are targets of the type X which are chimeric variants of the Target X, i.e. which differ from the Target X through the exchange or addition of one or more amino acids or nucleotides, or amino acid or nucleotide sequences.

The present invention recognizes that biological macromolecules have polymeric structures, as they are composed of smaller building blocks linked together. Thus, proteins are composed of chains of amino acids linked together, while DNA is composed of chains of nucleotides linked together. As will be evident below, the present invention takes advantage of the polymeric nature of macromolecules in the analysis of the chemical and/or physical properties of X. Using such an approach it is not necessary to have any information on the positions of all atoms in X in three-dimensional space. This distinguishes the present method from e.g. molecular modelling (or any similar method), which aims to determine the position of all (or at least most of) the molecule's atoms in X in three-dimensional space. Therefore, most embodiments of the present invention exclude the use of numeric information which refers to co-ordinates of all (or at least most of) the atoms in X in three-dimensional space. Moreover, for the same reason, some embodiments of the invention exclude the use of numeric information that directly refers to co-ordinates of all (or at least most of) atoms in Y in three-dimensional space. Accordingly, some embodiments of the invention exclude the use of numeric information which directly refers to co-ordinates of all (or at least most of) atoms in both X and Y in three-dimensional space. Moreover, the present invention recognizes that the three-dimensional structure of a molecule can be described by the angles between its atoms. Therefore most embodiments of the invention exclude the use of numeric information which refers to angles between all (or at least most of) atoms in X. Moreover, for the same reason, some embodiments of the invention exclude the use of numeric information that directly refers to angles between all (or at least most of) atoms in Y. Accordingly, some embodiments of the invention exclude the use of numeric information which directly refers to angles between all (or at least most of) atoms in both X and Y.

The term "angle" in this context includes bond angles, torsion angles and dihedral angles.

For the same reason as stated in the preceding paragraph, most embodiments of the present invention exclude the use of any physical method (such as X-ray crystallography or two-dimensional NMR) that directly determines or predicts the structure of X in three dimensions, or the information derived therefrom.

In the present context, "building block" is defined as a chemical residue that can be linked together with other chemical residues so as to create a chain. Building blocks usually come in sets, where each member contains variable region(s) that bring different chemical properties to the different building blocks, and chemical groups which are used to link the building blocks together. For example, a set of building blocks could contain the eight members ("residues") a, b, c, d, e, f, g and h. A molecule of desired size and composition could then be created by linking the building blocks together, e.g.:

f-a or b-e-a-b-d-g-a-c-c-a-f-f-b, or the like, thus creating a polymer. Usually special chemical groups are present at the "start" and "end" of the chain, such as

T-b-e-a-b-d-g-a-c-c-a-f-f-b-E, or the like,

where T and E denote start and end groups respectively. However, polymers used in the invention can also exist in cyclic variants, such as:

or

T-b-e-a-b-d-h-b-c-c-a-f-f-e-E

where L is a chemical group comprising a linker so as to create a cycle, and T and E are start and end groups.

Moreover polymeric structures can exist in branched variants, such as e.g.

T-b-e-a-b-d-g-a-c-c-a-f-f-b-E I

or T-b-e-a-b-d-g-a-c-c-a-f-f-b-E I L

T-a-c-g-h-a-c-c-E

where L is a chemical group comprising a linker so as to create a branch in the molecule, and T and E are start and end groups.

(It should be noted that the above examples are given just for the sake of illustration and are not intended to limit the invention in any way. Thus, any number and order of building blocks, any number of cycles and branches, as well as placement of linkers at any position(s) are allowed).

A structure used in conjunction with the present invention can have zero, one or more cycles.

A structure used in conjunction with the present invention can have zero, one or more branches.

A structure used in conjunction with the present invention can be modified chemically by removing, adding or exchanging atom(s) within a building block.

A structure used in conjunction with the present invention can be modified chemically by removing, adding or exchanging chemical groups within a building block.

If more than one start group is present in the molecule, the start-groups can be the same or different.

If more than one end-group is present in the molecule, the end-groups can be the same or different.

If more than one linker is present in the molecule, the linkers can be the same or different.

The molecular weight of a building block is preferably less than 10000 g/mole, more preferably less than 5000 g/mole, even more preferably less than 3000 g/mole, even somewhat more preferably less than 2000 g/mole and most preferably less than 1500 g/mole. However, in many cases the molecular weight of the building blocks is quite small, such as less than 1000 g/mole, more preferably less than 600 g/mole, even more preferably less than 400 g/mole, even less than 300 g/mole, even less than 200 g/mole, even less than 100 g/mole.

In the case of a peptide or a protein, the building blocks are sets of amino acid residues. In the present context "amino acid residue" is defined as a residue of glycine, alanine, valine, leucine, isoleucine, serine, cysteine, threonine, methionine, phenylalanine, tyrosine, tryptophan, proline, histidine, lysine, arginine, aspartic acid, glutamic acid, asparagine, glutamine, or any other naturally occurring amino acid. Moreover, in the case of a peptide or an artificial protein, the building block may also include a residue having the following general structure

in which Z is hydrogen, X or -CH2X, where X is a chemical moiety of any structure with a molecular weight preferably less than 2000 g/mole, more preferably less than 1000 g/mole, even more preferably less than 600 g/mole, and most preferably less than 400 g/mole, T is a start-group of any desired structure, or a bond to another amino acid residue, and E is an end-group of any desired structure, or a bond to another amino acid residue.

In the present context natural amino acid residues are denoted by one-letter codes as defined in Branden & Tooze (1991, p. 6-7). In the case of DNA or RNA, "building block" is defined as a residue of a nucleotide, in other words e.g. deoxyadenosine 5'-phosphoric acid, deoxyguanosine 5'-phosphoric acid, deoxycytidine 5'-phosphoric acid, deoxythymidine 5'-phosphoric acid, deoxyuridine 5'-phosphoric acid. Moreover, artificial nucleotides may be used, such as deoxyinosine 5'-phosphoric acid, and the like. The invention of course recognizes that DNA and RNA generally occur in double-stranded form with matching (or possibly mismatching) base pairs.

Compounds of building blocks may include common atoms of organic compounds such as hydrogen, carbon, nitrogen, oxygen, sulphur and phosphorus. However, other atoms may also be used, e.g. silicon. Polymers used in the present invention include, for both X and Y, silicon-containing compounds (as well as other types of organo-metallic compounds) which by use of the procedures of the invention can be optimized for desired properties, e.g. for use as catalysts that can withstand harsh conditions (e.g. high temperature, high or low pH, etc.).

Both X and Y for use with the procedures of the invention include both natural and synthetic compounds.

Other important embodiments of the present invention take advantage of the use of chimeric proteins (and, in an analogous fashion, of chimeric DNAs). By use of technology well known in the art, regions of the amino acid chains of two or more homologous proteins (or regions of two or more homologous DNA's) can be exchanged so as to create new proteins (or DNA's) inheriting properties of the original proteins (or DNA's). Creating such a set of chimeric proteins (or DNA's) and using them as X's in conjunction with the procedures of the invention will create a case where the proteins (or DNA's) used are likely to show gross similarities in their three-dimensional organization. This will have the effect that any differences in the chemical and/or physical properties of the X's will depend only on local differences, such as, in the case of proteins, differences in the chemical properties of amino acid residues, rather than on differences in the overall positions of larger structural elements in the proteins. Therefore, one of the most important parts of the present invention is the use of chimeric variations of targets of type X, together with informative variations of ligands of type Y and of observed biological activity. However, it has to be stressed that inactive ligands (no observed biological activity (BA)) are as important as active ligands in the modelling and identification of the active sites. The use of proteins with single or multiple amino acid mutations as the target X is also preferred in the present invention.

The invention also includes informative peptide libraries and the use thereof for identification of the active site in any kind of biological target. Examples of such libraries are given in Examples 3 to 8 and can be used separately or in combination with each other or with any kind of peptides. Since the number of amino acids (AA) in the peptides varies, a pre-treatment with ACC of the matrix describing the properties of the different peptides is preferably made in order to obtain a uniform matrix. This is generally necessary for making the required calculations in order to identify the "active site" in the target.

Further examples of the implementation of the present invention are given below. The procedure of the invention and its specific embodiments have, in experimental investigations (including those of the specific examples of the present invention, and its amendments), been found to comprise a surprisingly effective and useful method for analysis and design of ligands of chemical and biochemical nature, something which was not known prior to the disclosure of the present invention.

Step 1a: Principal Properties (PP's)

The Principal Properties (PP's) approach makes use of descriptors of the amino acids (see Table 1 or the 5 PP's described by Sandberg et al.) and replaces each amino acid in a target or a group of targets with the corresponding descriptors (PP's).

Table 1. The z-scale used to characterise each amino acid.

No.  Name            Three-letter code  One-letter code    z1      z2      z3
1    Alanine         ALA                A                  0.07   -1.73    0.09
2    Valine          VAL                V                 -2.69   -2.53   -1.29
3    Leucine         LEU                L                 -4.19   -1.03   -0.98
4    Isoleucine      ILE                I                 -4.44   -1.68   -1.03
5    Proline         PRO                P                 -1.22    0.88    2.23
6    Phenylalanine   PHE                F                 -4.92    1.30    0.45
7    Tryptophan      TRP                W                 -4.75    3.65    0.85
8    Methionine      MET                M                 -2.49   -0.27   -0.41
9    Lysine          LYS                K                  2.84    1.41   -3.14
10   Arginine        ARG                R                  2.88    2.52   -3.44
11   Histidine       HIS                H                  2.41    1.74    1.11
12   Glycine         GLY                G                  2.23   -5.36    0.30
13   Serine          SER                S                  1.96   -1.63    0.57
14   Threonine       THR                T                  0.92   -2.09   -1.40
15   Cysteine        CYS                C                  0.71   -0.97    4.13
16   Tyrosine        TYR                Y                 -1.39    2.32    0.01
17   Asparagine      ASN                N                  3.22    1.45    0.84
18   Glutamine       GLN                Q                  2.18    0.53   -1.14
19   Aspartic acid   ASP                D                  3.64    1.13    2.36
20   Glutamic acid   GLU                E                  3.08    0.39   -0.07

A result of this is that a 330 AA long peptide (e.g. a receptor) will be described by 990 numbers reflecting its chemical/physical properties. Principal properties may also be used to describe the ligand. In an analogous manner, any polymeric structure may be described by numbers representing chemical and/or physical properties of its building blocks.
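The translation of a sequence into principal properties can be sketched in a few lines of Python. This is only an illustrative sketch: the z-values are those of Table 1, while the function name `sequence_to_z` and the example peptide are assumptions introduced here and do not come from the original disclosure.

```python
# z-scale values (z1, z2, z3) for the twenty natural amino acids, as listed in Table 1.
Z_SCALE = {
    "A": (0.07, -1.73, 0.09),   "V": (-2.69, -2.53, -1.29),
    "L": (-4.19, -1.03, -0.98), "I": (-4.44, -1.68, -1.03),
    "P": (-1.22, 0.88, 2.23),   "F": (-4.92, 1.30, 0.45),
    "W": (-4.75, 3.65, 0.85),   "M": (-2.49, -0.27, -0.41),
    "K": (2.84, 1.41, -3.14),   "R": (2.88, 2.52, -3.44),
    "H": (2.41, 1.74, 1.11),    "G": (2.23, -5.36, 0.30),
    "S": (1.96, -1.63, 0.57),   "T": (0.92, -2.09, -1.40),
    "C": (0.71, -0.97, 4.13),   "Y": (-1.39, 2.32, 0.01),
    "N": (3.22, 1.45, 0.84),    "Q": (2.18, 0.53, -1.14),
    "D": (3.64, 1.13, 2.36),    "E": (3.08, 0.39, -0.07),
}


def sequence_to_z(sequence):
    """Return the flat descriptor vector [z1, z2, z3, z1, z2, z3, ...] of a sequence."""
    descriptors = []
    for residue in sequence.upper():
        # A KeyError is raised for any code that is not one of the twenty in Table 1.
        descriptors.extend(Z_SCALE[residue])
    return descriptors


peptide = "HFRW"  # hypothetical example sequence, chosen only for illustration
vector = sequence_to_z(peptide)
print(len(vector))  # 12 numbers: 4 residues x 3 z-values
```

With this coding a sequence of 330 residues yields 3 x 330 = 990 descriptors, in agreement with the count given above.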

Step 1b: Binary coding bit vectors

A second approach for the assignment of descriptors to the target X and/or to the ligand Y is to use a binary coding and create a "bit-vector". For example, in the case of molecules that are composed of parts (or building blocks or composite building blocks) that can be systematically exchanged (i.e. such as when constructing chimeric molecules), a binary assignment may be performed as follows:

In the case of two variations of the part, a zero is used for one of the variations and a one for the other. For example, molecules that are composed of three parts A, B and C with two variations for each part (i.e. A = A1 or A2, B = B1 or B2 and C = C1 or C2) may be described by one binary number for each part as descriptor; thus three binary numbers may be used to describe the whole molecule, e.g.: A1B1C1 is described by 000; A2B1C1 is described by 100; A1B2C1 is described by 010, etc.

In the case of more variations of a part, assignments may be done essentially as is exemplified for a case with three variations of a part where three binary numbers are used for each type of A: A1 = 100; A2 = 010; A3 = 001. For four variations of A, one would use four binary numbers, etc.
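The binary assignment described above may be sketched as follows. The function names `encode_part` and `encode_molecule` and the use of 1-based variant indices are assumptions made for this illustration only.

```python
def encode_part(variant, n_variants):
    """Encode the 1-based variant index of one part as a list of bits."""
    if not 1 <= variant <= n_variants:
        raise ValueError("variant index out of range")
    if n_variants == 2:
        return [variant - 1]      # two variations: variant 1 -> 0, variant 2 -> 1
    bits = [0] * n_variants       # more variations: one bit per variation
    bits[variant - 1] = 1
    return bits


def encode_molecule(variants, n_variants_per_part):
    """Concatenate the bit codes of all parts into one bit vector."""
    vector = []
    for variant, n_variants in zip(variants, n_variants_per_part):
        vector.extend(encode_part(variant, n_variants))
    return vector


print(encode_molecule([1, 1, 1], [2, 2, 2]))  # A1B1C1 -> [0, 0, 0]
print(encode_molecule([2, 1, 1], [2, 2, 2]))  # A2B1C1 -> [1, 0, 0]
print(encode_molecule([1, 2, 1], [2, 2, 2]))  # A1B2C1 -> [0, 1, 0]
print(encode_part(2, 3))                      # A2 of three variations -> [0, 1, 0]
```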

The binary approach is a convenient way of describing DNA. It is recognized that DNA is composed of building blocks (also termed bases or nucleotides) termed adenine (A), thymine (T), cytosine (C) and guanine (G). Thus four binary numbers may be used as descriptors:

A=1000 T=0100 C=0010 G=0001

However, it may also be recognized that bases of DNA are sometimes allowed to be exchanged with other bases without affecting the functionality of the DNA. Such a case occurs in the coding of amino acids. Thus, phenylalanine is coded for by TTT and TTC while serine is coded for by TCT, TCC, TCA and TCG. Thus, the following principle may be used in binary coding of bases, where different possibilities are allowed:

A or T = 1100; A or C = 1010; A or G = 1001; T or C = 0110; A or C or G = 1011; A or T or C or G = 1111, etc.

Moreover, when an artificial base such as inosine (I) is used, hybridization does not occur. Accordingly, inosine could be described as 0000. (However, the numbers 1111 may be used if more appropriate for the problem under investigation, as inosine would not have any negative effect on hybridization, thus allowing any base to match).
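The four-bit coding of bases, including positions where several bases are allowed, may be sketched as follows. The helper names `encode_position` and `encode_sequence` are assumptions made for this illustration; the bit order A, T, C, G follows the assignment given above.

```python
BASE_ORDER = "ATCG"  # bit positions follow A=1000, T=0100, C=0010, G=0001


def encode_position(allowed_bases):
    """Return the four-bit code of a position where any of the given bases is allowed."""
    unknown = set(allowed_bases) - set(BASE_ORDER)
    if unknown:
        raise ValueError(f"unsupported base(s): {sorted(unknown)}")
    return "".join("1" if base in allowed_bases else "0" for base in BASE_ORDER)


def encode_sequence(positions):
    """Encode a list of positions, each given as a string of allowed bases."""
    return [encode_position(allowed) for allowed in positions]


print(encode_position("A"))     # 1000
print(encode_position("AT"))    # 1100
print(encode_position("ACG"))   # 1011
print(encode_position("ATCG"))  # 1111
print(encode_position(""))      # 0000, the "no effect" coding discussed for inosine

# Phenylalanine is coded for by TTT and TTC, so the third position is "T or C".
print(encode_sequence(["T", "T", "TC"]))  # ['0100', '0100', '0110']
```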

Of course a similar approach as for DNA may be used for RNA also, as well as for any other systematic variation of molecules composed of building blocks where the physical and chemical effects may be described in terms of the specific effect of one type of building block only, the same effect as one or more building blocks, or no effect of the building block, respectively. This is exemplified as follows for a part with three original building blocks A1, A2 and A3.

Part of type A1 = 1 0 0

Part of type A2 = 0 1 0

Part of type A3 = 0 0 1

If additional parts are used that may be described in terms of A1, A2 and A3, then assignments may be done based on the above, e.g.:

An additional building block A4 having "no effect" may be described by 0 0 0.

An additional building block A5 combining the effects of A1 and A2 may be described by 1 1 0.

An additional building block A6 combining the effects of A2 and A3 may be described by 0 1 1.

An additional building block A7 combining the effects of A1, A2 and A3 may be described by 1 1 1, etc.

In the above, "effect" may be interpreted as changes in the chemical and/or physical properties and for identification of interactions between molecules of type X with molecules of type Y.

The approach described above is very useful in the handling of chimeric receptors.

Step 1c: Bit vectors for the description/characterisation of structures for use as descriptors of X or descriptors of Y.

Another way of assigning descriptors to X and/or Y is to create bit vectors, counting how many times a defined structural feature occurs in a structure. The concept is illustrated for a small set of structures in Table 2.1 using the bit string variables defined in Table 2.2. The structural features defined in the bit strings are used to identify how many times they occur in the investigated structures, resulting in Table 2.3, which is a description of the structures in Table 2.1, using the descriptors in Table 2.2. The bit strings used here only serve as an example to illustrate the method. All structural functionalities that occur in the structures under investigation may be added as bit string variables. The bit strings may also be used to indicate only the presence or absence of the feature defined by the bit string, which would result in Table 2.4.

Table 2.1

Table 2.2

Table 2.3

Table 2.4

Additional operations may also be carried out on the descriptors of X and/or Y, for example, translation, PCA and ACC, as exemplified further below.
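The difference between counting a structural feature and merely recording its presence may be sketched as follows. The feature names and the three annotated structures are purely hypothetical and are not the contents of Tables 2.1 to 2.4; the sketch also assumes that the features of each structure have already been identified by some other means.

```python
from collections import Counter

# Hypothetical bit string variables (structural features).
FEATURES = ["aromatic ring", "hydroxyl", "amine"]

# Hypothetical structures, each given as the list of features found in it.
structures = {
    "S1": ["aromatic ring", "hydroxyl", "hydroxyl"],
    "S2": ["amine"],
    "S3": ["aromatic ring", "aromatic ring", "amine", "hydroxyl"],
}


def count_vector(found_features):
    """Number of times each feature in FEATURES occurs in one structure."""
    counts = Counter(found_features)
    return [counts[feature] for feature in FEATURES]


def presence_vector(found_features):
    """1 if a feature occurs at least once in the structure, otherwise 0."""
    return [1 if count > 0 else 0 for count in count_vector(found_features)]


for name, found in structures.items():
    print(name, count_vector(found), presence_vector(found))
```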

Translating protein and peptide sequences to a quantitative description

In the literature several investigations have previously characterised amino acids with physico-chemical variables (Hellberg et al. 1986; Hellberg et al. 1987; Jonsson et al. 1989; Collantes and Dunn, 1995; Sandberg et al. 1998). The characterisation made by Hellberg et al. (1986; 1987) includes experimental as well as semi-empirically derived variables. Using Principal Component Analysis (PCA), three (or more) new so-called principal property variables may be generated, which summarise the information from the original variables. By using these principal properties, each amino acid in a protein or peptide sequence may be quantitatively characterised, i.e. be translated to three (or more) variables containing physical and chemical information (see Figure 1 for an example). This means that instead of comparing sequences with a one-letter code, or a binary code, a quantitative description of each sequence is generated. The method may be used both for obtaining descriptors of X and descriptors of Y, and used in the procedures of the invention. The same principle as for peptides may be applied to DNA, RNA, proteins and organic libraries as well.

Translating protein and peptide sequences of different lengths to a uniform matrix

When comparing a set of amino acid sequences of different lengths, problems may arise because they would be characterized with a different number of variables if the amino acid residues were used as the basis for assignment of descriptors. The analysis of the biological testing results using PLS and many other calculation methods (e.g. neural networks) requires a uniform matrix of descriptors where all sequences are described with the same number of variables. A solution to the problem is to calculate auto covariances and auto cross covariances (ACC) (Wold et al. 1993b) between building blocks, e.g. amino acids, which have been translated. ACC compares one amino acid with another amino acid in the sequence that is L positions away. ACC is illustrated in Figure 2. The only restriction on L, called the "lag", is that the largest lag possible is the length of the shortest sequence in the set minus one. ACC is thereby not dependent on all sequences being of equal length, no alignment is required and neighbouring effects are taken into account. Auto covariances with lags l = 1, 2, ..., L are given by the following equation:

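$$AC_{j,l} = \sum_{i=1}^{n-l} \frac{z_{j,i}\, z_{j,i+l}}{n-l}$$

Index j is used for the scales (j = 1, 2, 3), n is the number of amino acids in a sequence, index i is the amino acid position (i = 1, 2, ..., n) and z_{j,i} is the value of scale j at position i. Cross covariances (CC) between two different scales, j and k, are calculated according to the following equation:

$$CC_{jk,l} = \sum_{i=1}^{n-l} \frac{z_{j,i}\, z_{k,i+l}}{n-l}$$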

This generates a new uniform matrix where each sequence is described by the same number of variables (see Figure 3 for an illustration of the method, Figure 3 being based on the approach shown in Figure 2), which may be used for further analysis with e.g. PCA or PLS or other calculation methods. A similar approach may be used for branched peptides. A similar approach may be used for DNA and RNA sequences of different lengths, as well as for any set of polymers composed of building blocks (or even composite building blocks).

The method may be used both for obtaining Descriptors of X and Descriptors of Y, to be used in conjunction with the procedures of the invention.

A further improvement is to combine ACC with OSC (Orthogonal Scatter Correction) to reduce the noise in the models (Andersson et al., 1998).
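The ACC transformation of sequences of different lengths into vectors of equal length may be sketched as follows. This is a minimal sketch under stated assumptions: each term is the sum of products of descriptor values that lie a given lag apart, divided by the number of such pairs (n minus the lag), and no centring or scaling of the descriptors is applied. The function name and the two-scale example values are hypothetical. OSC is not included.

```python
def auto_cross_covariances(z_matrix, max_lag):
    """Return the ACC terms of one sequence as a flat list.

    z_matrix: list with one tuple of descriptor values per residue.
    For each lag the terms are ordered over scale pairs (j, k); j == k gives
    an auto covariance and j != k gives a cross covariance.
    """
    n = len(z_matrix)
    n_scales = len(z_matrix[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    terms = []
    for lag in range(1, max_lag + 1):
        for j in range(n_scales):
            for k in range(n_scales):
                total = sum(z_matrix[i][j] * z_matrix[i + lag][k]
                            for i in range(n - lag))
                terms.append(total / (n - lag))
    return terms


# Two hypothetical sequences of different lengths, described by two scales each.
short_seq = [(1, 2), (3, 4), (5, 6)]
long_seq = [(1, 2), (3, 4), (5, 6), (0, 1), (2, 2)]

print(auto_cross_covariances(short_seq, 1))  # [9.0, 11.0, 13.0, 16.0]
print(len(auto_cross_covariances(short_seq, 2)),
      len(auto_cross_covariances(long_seq, 2)))  # both vectors have 8 terms
```

With three scales and a maximum lag L, each sequence is described by 9 x L terms irrespective of its length.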

Application of experimental design for manufacture of optimized sets of molecules for use in conjunction with the procedures of the invention

By combining parts (or building blocks or composite building blocks) from two macromolecules of type X, chimeric macromolecules may be manufactured. In the same way, parts (or building blocks or composite building blocks) of two molecules of type Y may be exchanged so as to create chimeric variants of Y. The approach is of course not limited to mixtures of two original variants of X (or Y), but may be extended to any number of original variants. However, in the case that not all chimeric variants are manufactured, the use of experimental design (Lundstedt et al. 1998; Box et al. 1978) will enhance the analysis when used in conjunction with the procedures of the invention. Experimental design is used to allow the extraction of the maximum information from the selected subset of chimeras.

The method is exemplified by a molecule that contains four parts A, B, C and D, with two possible variations each (c.f. the case shown in Example 1:1 for chimeric MC1 and MC3 receptors). Thus the molecule may be coded in a binary fashion using "-" or "+" as follows: A1 = -, A2 = +, B1 = -, B2 = +, C1 = -, C2 = +, D1 = -, D2 = +. All the 16 possible chimeric variants are shown in Table 3.1. When making a fractional factorial design where only eight chimeric receptors are manufactured, the selection should desirably be made from the ones marked "YES" in the "Manufacture" column of Table 3.1. This ensures that the best subset is selected, resulting in the best possible representation of all possible chimeric molecules when only eight molecules are generated. The molecules marked "NO" in the Manufacture column are the complementary chimeras, which together with the "YES" chimeras make up the full factorial design.

Table 3.1

The concept can be further illustrated graphically for the case where molecules are combined from only three parts. Making all possible combinations would result in a total of eight molecules, see Table 3.2, which corresponds to the full factorial design, as is further illustrated in Figure 5 (i.e. all eight possible molecules being illustrated by white and shaded circles). Making a fractional factorial design (marked in the Manufacture column of Table 3.2 as "YES", and as shaded circles in Figure 4) will cover the possible combinations as well as possible when only four receptors are manufactured. (That this is the case becomes evident when one analyzes Figure 4). It is of course equally valid to choose the "NO" chimeras of Table 3.2, as these represent the complementary part of the full factorial design.

Table 3.2

When the selected chimeras of X (and possibly Y) have been generated according to the proper experimental design and tested in order to obtain the biological activity (BA), the results are analyzed according to the procedures of the invention. The use of experimental design according to the principles of the present method (and as is further described in the literature; see Lundstedt et al. 1998) ensures that the experimental efforts will yield as much knowledge and information as possible about the investigated system. By analysing the result of the testing with a calculation method, e.g. PLS, new directions as to how new, more effective molecules of type X and/or Y should be constructed may be derived. The use of experimental design is not limited to the cases where only two variants of each part are present. It may be generalized to any case including any desired number of variants for each part, building block or composite building block.
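The selection of a half-fraction of two-level chimeras may be sketched as follows. The sketch assumes the common choice in which the level of the last part is set to the product of the levels of the other parts (D = ABC for four parts, C = AB for three parts). Whether this half or the complementary half corresponds to the chimeras marked "YES" in Tables 3.1 and 3.2 cannot be stated here, so the generator is an assumption, and the function names are hypothetical.

```python
from itertools import product


def full_factorial(n_parts):
    """All combinations of the levels -1 and +1 for n_parts two-level parts."""
    return [list(levels) for levels in product((-1, 1), repeat=n_parts)]


def half_fraction(n_parts):
    """Half-fraction in which the last part equals the product of the others."""
    runs = []
    for levels in product((-1, 1), repeat=n_parts - 1):
        last = 1
        for level in levels:
            last *= level
        runs.append(list(levels) + [last])
    return runs


selected = half_fraction(4)  # eight of the sixteen possible chimeras
complement = [run for run in full_factorial(4) if run not in selected]

for run in selected:
    print("".join("+" if level > 0 else "-" for level in run))
print(len(selected), len(complement))  # 8 8
```

In both halves every part occurs equally often in each of its two variations, which is what allows the effect of each part to be estimated from only half of the possible chimeras.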

The second step in this procedure is to describe the ligands Y in a relevant way. The approaches include those described above and also those which are used to design and describe informative chemical libraries or informative peptide libraries (Lundstedt et al. 1997 and Andersson et al. 1999).

Any other description of the Ligand Y may be used, such as any of the conventional descriptions used in QSAR and MQSAR for description of physicochemical properties of organic molecules. Examples of such useful descriptions include GRID (Goodford, 1985) and GRIND descriptors (http/www.miasrl.com/software/amanual/backgr.html of November 15, 2000).

A preferred method for the description of the Ligand Y is through the use of an informative peptide library.

Methods for design of an informative peptide library

The twenty natural amino acids (aa's) were characterised using the z-scale developed by Hellberg et al., Table 1, resulting in a description of each amino acid with three numerical variables. The twenty aa's were thereafter sorted into nine different groups according to a 2³ full factorial design with a centre point, as in Table 4.

Table 4. Amino acids sorted according to the following 2³ design.

Group  z1  z2  z3
1      -   -   -
2      +   -   -
3      -   +   -
4      +   +   -
5      -   -   +
6      +   -   +
7      -   +   +
8      +   +   +
9      0   0   0

This resulted in the groups presented in Table 5. For some of the experimental settings in the design there were no alternatives among the natural aa's. As an alternative, the aa closest to the setting was selected using a visual inspection of the score plot. These are indicated in grey, Table 5. For the center point (0;0;0), there was also no obvious alternative and therefore a number of aa's in the vicinity of the center of the structural space were selected to represent the center points.

Table 5. The resulting grouping of the aa's.

Using Table 5, the selection of peptides to include in peptide libraries ranging from di- to heptapeptides was made, as presented in Examples 3 to 8.
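A first, purely sign-based sorting of the amino acids into the eight corner settings of the 2³ design may be sketched as follows. This is an assumption-laden approximation and not the procedure that produced Table 5: the grouping described above also relied on visual inspection of the score plot and on a separately chosen set of centre-point amino acids, neither of which is reproduced here. The z-values are those of Table 1; the function and variable names are hypothetical.

```python
# z-scale values (z1, z2, z3) as listed in Table 1.
Z_SCALE = {
    "A": (0.07, -1.73, 0.09),   "V": (-2.69, -2.53, -1.29),
    "L": (-4.19, -1.03, -0.98), "I": (-4.44, -1.68, -1.03),
    "P": (-1.22, 0.88, 2.23),   "F": (-4.92, 1.30, 0.45),
    "W": (-4.75, 3.65, 0.85),   "M": (-2.49, -0.27, -0.41),
    "K": (2.84, 1.41, -3.14),   "R": (2.88, 2.52, -3.44),
    "H": (2.41, 1.74, 1.11),    "G": (2.23, -5.36, 0.30),
    "S": (1.96, -1.63, 0.57),   "T": (0.92, -2.09, -1.40),
    "C": (0.71, -0.97, 4.13),   "Y": (-1.39, 2.32, 0.01),
    "N": (3.22, 1.45, 0.84),    "Q": (2.18, 0.53, -1.14),
    "D": (3.64, 1.13, 2.36),    "E": (3.08, 0.39, -0.07),
}

# The eight corner settings in the order of Table 4 (signs of z1, z2, z3).
ALL_SETTINGS = ["---", "+--", "-+-", "++-", "--+", "+-+", "-++", "+++"]


def design_setting(z_values):
    """Map a (z1, z2, z3) triple to its corner setting, e.g. '+-+'."""
    return "".join("+" if value > 0 else "-" for value in z_values)


groups = {}
for amino_acid, z_values in Z_SCALE.items():
    groups.setdefault(design_setting(z_values), []).append(amino_acid)

empty_settings = [s for s in ALL_SETTINGS if s not in groups]

for setting in ALL_SETTINGS:
    print(setting, groups.get(setting, []))
print("settings without a natural amino acid:", empty_settings)
```

Under this simple sign rule two of the eight settings remain empty, which illustrates why a nearest-neighbour choice was needed for some of the experimental settings.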

The library may also consist of non-peptidic compounds, e.g. low molecular weight organic or inorganic compounds.

STEP 3

The third step in the procedure is to measure the interaction between the ligands of the type Y and targets of the type X. This may be measured by any means known per se. The interaction may be quantitated, for example, on the basis of binding affinity, selectivity, activity, biological activity, avidity, Km of enzyme, hybridisation or any other means which directly or indirectly provides a measure of the interaction.

Preferably the affinity or activity of the different Ligands Y (most preferably from an informative compound library) for a target X or a number of targets X is measured.

The binding affinity or biological activity may, for example, be determined using methods described by Lunec et al. (1992), Szardenings et al. (1992), Schioth et al. (1992) or other similar methods.

A very specific example of how to obtain the biological activity (BA) is the use of ligand binding methods. In such a method X and Y are usually incubated together at different concentrations. Usually the concentrations of Y are varied systematically over different assays containing the same concentration of X, and the amount of Y bound to X is then measured and related to the activity for the interaction of variants of Y with variants of X. In most cases, a third labelled molecule (the "labelled ligand") is added which also binds to X, the binding of the labelled ligand being prevented by Y. The degree to which, and the concentrations at which, variants of Y are capable of preventing the binding of the labelled ligand to variants of X are related to the activity of the interactions of the Y's with the X's. Such ligand binding methods are well known in the art; specific examples are found in Uhlen and Wikberg (1991) and Schiöth et al. (1995).

The binding approach may also be useful when X is a non-protein macromolecule, such as DNA. Methods for hybridization measurements for DNA are well known in the art.

When X includes enzymes, the capacity of variants of X to convert a substrate to a product may be measured. The influence of different concentrations of variants of Y to either inhibit or promote the conversion of the substrate to the product may then be measured and used as a measure of BA.

Other examples for use as measures of BA include quantifying second messenger elements (cAMP, cGMP, intracellular calcium concentrations, inositol triphosphate, diacylglycerol, and the like), and quantifying protein phosphorylation (including phosphorylations of tyrosine, serine and threonine). Such measurements can typically be done in organs, isolated cells, cells in culture, cell free systems, membrane preparations, and the like.

Other examples of BA include measurements of ion-channel opening and closure, single ion channel currents, membrane potentials, voltage clamping and other electrophysiological measurements.

When X and Y are both represented by macromolecules, any suitable biochemical, biophysical or pharmacological response related to the interaction of X and Y can be used as a measure of BA. A very specific example is quantifying the dimerization of tyrosine kinase receptors. In such a case, e.g., X could be one variant of a subunit of the tyrosine kinase and Y another variant of a subunit of the tyrosine kinase, and the capacity of X to interact with Y could be quantitated and used as a measure of BA. Yet another example of a measurement for use as BA is measuring the avidity. Thus X could include variants of antibodies and Y could include variants of antigens. The degree of interaction of X with Y can then be measured by using methods well known in immunology, such as by quantifying avidity.

X may also be included in a multicellular organism. The production of transgenic animals is well known in the art. Obtaining the BA may, e.g., involve the administration of a Y to transgenic animals containing different variants of X and observing any desired physiological response in the animal.

X may also be included in a viral particle or a phage. E.g. X may be included within the amino acid sequence of a capsid protein of a virus or phage, e.g. M13-phages.

In some cases it may be possible to calculate values for the interactions of variants of X with variants of Y. Also such calculated values are of direct use for the procedures of the present invention. However, this embodiment of the invention is very rare and seldom used in practice.

The method for obtaining BA according to the procedure of the invention may be used in several ways for the analysis of the interactions of X and Y, for the design of improved macromolecules X and/or for the design of improved molecules Y.

It will, of course, be appreciated that Steps 1, 2 and 3 do not have to be carried out in this order. They may be carried out in any order or even simultaneously.

STEP 4

The fourth step is to establish a mathematical model describing the observed interaction between the Ligand Y and Target X, as a function of the properties of the ligands Y and the properties of the targets X. A preferred procedure for identifying an active site in a macromolecule is based on the chimeric approach exemplified by receptors. This is the fastest and simplest route for finding the region of the target wherein the active site is located. The chimeric receptors are preferably combined in accordance with a multivariate design, factorial or fractional factorial design, in order to obtain a well balanced and informative set of combined "chimeric" receptors. This is the first real step towards informative combinatorial biology. However, the use of naturally-occurring variants of the Target X has proven to be surprisingly effective and useful in conjunction with the present invention.

Identification of the active site is done by describing the biological activity (BA) as a function of the properties of the ligands Y, the properties of the targets X, and the interaction between the ligands Y and the targets X. The interaction is defined by multiplying the descriptors capturing the properties of the ligand with the descriptors of the target where the descriptors may be principal properties or other chemical and/or physical descriptors. However, new descriptors may be generated by any function of the descriptors of the target X and ligand Y. One example of a model of the biological activity is given by the equation :

BA = f (X,Y)

This general function can, by a Taylor expansion, be approximated by a polynomial function with different degrees of complexity. In the invention, it was found that a second-order interaction model is sufficient for identification of "active sites" in a macromolecule (see equation below in matrix form)

BA = BA_average + b₁*(X) + b₂*(Y) + b₁₂*(X)*(Y)

The coefficients in the equation above may be determined by PLS but may also, if the number of measurements is large enough, be determined by PCR, MLR, NN (neural net), stepwise regression or other similar methods.
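For a small balanced design the estimation of the coefficients can be illustrated without any special software. The sketch below is not PLS: it assumes a single two-level descriptor for the target (x) and for the ligand (y), both coded -1 or +1 in a full factorial design, in which case the least-squares (MLR) estimates reduce to averages of signed responses. The activity values are hypothetical and the function names are introduced here for illustration only.

```python
def fit_interaction_model(runs):
    """Estimate BA_average, b1, b2 and b12 from a balanced two-level design.

    runs: list of (x, y, ba) tuples where x and y are coded -1 or +1 and every
    combination of levels occurs equally often.
    """
    n = len(runs)
    ba_average = sum(ba for _, _, ba in runs) / n
    b1 = sum(x * ba for x, _, ba in runs) / n        # effect of the target descriptor
    b2 = sum(y * ba for _, y, ba in runs) / n        # effect of the ligand descriptor
    b12 = sum(x * y * ba for x, y, ba in runs) / n   # target-ligand interaction
    return ba_average, b1, b2, b12


def predict(coefficients, x, y):
    """Evaluate BA = BA_average + b1*x + b2*y + b12*x*y."""
    ba_average, b1, b2, b12 = coefficients
    return ba_average + b1 * x + b2 * y + b12 * x * y


# Hypothetical activities for two target variants (x) and two ligand variants (y).
runs = [(-1, -1, 2.5), (1, -1, 3.5), (-1, 1, 5.5), (1, 1, 8.5)]
coefficients = fit_interaction_model(runs)
print(coefficients)  # (5.0, 1.0, 2.0, 0.5)
```

With many correlated descriptors and few measurements these simple averages are no longer applicable, which is the situation in which a projection method such as PLS is used instead.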

The coefficients in the equation provide the necessary information for finding the location of the binding site in the target, as well as important information about the chemical/physical properties needed for a very active ligand. The coefficients provide information about which features are important in the ligands and in the targets, and about the important features of the interaction between them.

The binding site and/or active site is identified from the interaction terms between the chemical/physical descriptors or "principal properties" of the ligands and the chemical/physical descriptors or principal properties of the target.

Biological activity (BA) is described as a function of the PP's of the ligand, the PP's of the target and the interaction term between the ligand and the target.
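When the target and the ligand are each described by several descriptors, one row of the descriptor matrix for a given target-ligand pair can be assembled as sketched below. Forming the interaction terms as all pairwise products of a target descriptor with a ligand descriptor is one possible choice and is an assumption of this sketch; the function names and the numerical values are hypothetical.

```python
def interaction_terms(x_descriptors, y_descriptors):
    """All pairwise products of a target descriptor with a ligand descriptor."""
    return [x * y for x in x_descriptors for y in y_descriptors]


def model_row(x_descriptors, y_descriptors):
    """Target descriptors, ligand descriptors and their cross terms in one row."""
    return (list(x_descriptors) + list(y_descriptors)
            + interaction_terms(x_descriptors, y_descriptors))


target = [1, -1]     # hypothetical coding of two parts of a chimeric target
ligand = [0.5, 2.0]  # hypothetical descriptors of a ligand

print(interaction_terms(target, ligand))  # [0.5, 2.0, -0.5, -2.0]
print(model_row(target, ligand))
```

Each cross term is tied to one particular target descriptor, so a large coefficient for a cross term points to the part of the target that the corresponding descriptor represents.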

The estimated correlation coefficient of the "target-ligand interaction" provides information about the position of the active site in the macromolecule as well as a description of the chemical/physical properties of the active site. This information is of outstanding value in the design of new leads as well as in lead optimisation. Mathematically this is a simple procedure which surprisingly provides information regarding the "active site".

The model is preferrably produced using one or more of multivariate methods, partial least squares methods, neural networks, multiple linear regression, non-linear regression, curve fitting, model fitting, stepwise regression and maximum likelihood methods.

STEP 5

When the region for the active site has been located, a new description of the interesting regions of the target may optionally be made with higher resolution, i.e. each AA in the interesting region is replaced by its principal properties (see Table 1). If further information regarding the target is needed, specific AA's can be exchanged by mutations in the interesting region or regions of the target. This should preferably be done by an informative design in order to ensure diversity in properties.

The model derived by the use of the present invention may be directly useful for predicting the properties of novel targets of type X as well as novel ligands of type Y. Hence the invention is particularly useful in drug design as well as in the engineering of new molecules of type X or Y (e.g. in protein engineering). The processes in accordance with the present invention may be implemented at least partially using software e.g. computer programs. It will thus be seen that when viewed from further aspects, the present invention provides computer software specifically adapted to carry out the processes hereinabove described when installed on data processing means, and a computer program element comprising computer software code portions for performing the processes hereinabove described when the program element is run on data processing means. The invention also extends to a computer software carrier comprising such software, particularly when used to operate a process of the invention. Such a computer software carrier may be a physical storage medium such as a ROM chip, CD ROM or disk, or may be a signal such as an electronic signal over wires, an optical signal or a radio signal such as to a satellite or the like.

It will further be appreciated that not all steps of the method of the invention need to be carried out by computer software and thus from a further broad aspect the present invention provides computer software and such software installed on a computer software carrier for carrying out at least one of the steps of the processes set out hereinabove.

LEGENDS TO THE FIGURES

Figure 1 Translation of physical and chemical properties of amino acids into principal properties.

Figure 2 The ACC approach.

Figure 3 Calculation of auto covariances and auto cross covariances.

Figure 4 Full factorial and fractional factorial design.

Figure 5 Generalised template for X of Example 1 comprising aligned MC1 and MC3 receptor amino acid sequences and division of template into parts A, B, C and D.

Figure 6 Generalised template for Y of Example 1 comprising aligned MSH and MS04 receptor amino acid sequences and division of template into parts α and β.

Figure 7 Molecules of type Y in Example 1.

Figure 8 Permutation testing in Example 1:1.

Figure 9 Observed versus calculated Ki for model of Example 1:1.

Figure 10 Permutation testing in Example 1:2.

Figure 11 Observed versus calculated Ki for model of Example 1:2.

Figure 12 Permutation testing in Example 1:3.

Figure 13 Observed versus calculated Ki for model of Example 1:3.

Figure 14 Variable importance in the projection (VIP) in Example 1:4.

Figure 15 Variable importance in the projection (VIP) in Example 1:6.

Figure 16 Alignment of three subtypes of human wild-type alpha-1 adrenoceptors.

Figure 17 52 positions with sequence variation, extracted from TM regions of human alpha-1 adrenoceptor subtypes and chimeras of alpha-1 adrenoceptors. Black characters on white background denote variation between 2 amino acids; white characters on black background denote variation between 3 amino acids.

Figure 18 X data set from Example 2.

Figure 19 Molecular template and details of the compounds used in Example 2.

Figure 20 Y block data from Example 2.

Figure 21 Summary of pKi values (BA) reported for alpha-1 adrenergic receptor interactions with 4-piperidyl oxazole antagonists.

Figure 22 Graph showing observed v calculated pKi for Example 2.

Figure 23 Normalised PLS regression coefficients from Example 2.1.

Figure 24 Summation of MIPs and MICs for parts A-G corresponding to TM regions 1-7 (Example 2.2).

Figure 25 The 16 peptides selected according to a two-level fractional factorial design + 3 cp (or 1 cp 2 random).

Figure 26 16 peptides selected according to a 2^(9-5) fractional factorial design + 3 cp (or 1 cp 2 random).

Figure 27 32 peptides selected according to a two-level fractional factorial design + 3 cp (or 1 cp 2 random).

Figure 28 32 peptides selected according to a 2^(15-10) fractional factorial design + 3 cp (or 1 cp 2 random).

Figure 29 32 peptides selected according to a two-level fractional factorial design + 3 cp (or 1 cp 2 random).

Figure 30 51 peptides selected according to a 2^(21-16) fractional factorial design with additional experiments added from a half fold-over + 3 cp.

EXAMPLES

The following examples are intended to illustrate but not to limit the scope of the invention.

Example 1:1

Analysis of the interaction of MSH-peptide variants with chimeric MC1 and MC3 receptors by use of the method of the invention

The analysis is divided into steps as described below, conforming essentially to the above-described steps of the invention.

i) A macromolecular template X was made by using a generalised structure of the melanocortin receptor 1 (MC1) (FEBS Lett. 1992, 309, 417-420) and melanocortin receptor 3 (MC3) (J. Biol. Chem. 1993, 268, 8246-8250). X was generalised into one entity by aligning the MC1 and MC3 receptor amino acid sequences as described earlier (J. Molecular Graphics Modelling. 1997, 15, 307-317) (Figure 5). The template thus formed was then divided into 4 parts termed A, B, C and D, as illustrated in Fig. 5. Figure 5 thus shows the aligned amino acid sequences of the MC1 and the MC3 receptors with the parts A, B, C and D of template X indicated.

ii) Thus, according to the foregoing paragraph, there are two structural variants for each part of the template X. We selected the A, B, C and D parts from the MC1 receptor sequence and termed them A₁, B₁, C₁ and D₁ (Fig. 5). In a similar fashion, parts from the MC3 receptor sequence were selected and termed A₂, B₂, C₂ and D₂ (Figure 5). Combining the different variants of each part in template X gives a total of 16 possible combinations: one being the MC1 receptor, one the MC3 receptor, and 14 being MC1/MC3 receptor chimeras. Using molecular biological techniques we had earlier made 8 of these 14 possible chimeras (Mol. Pharmacol. 1998, 54, 154-161). For the present analysis we thus had in total 10 different receptors, which by their parts could be described as A₁B₁C₁D₁ (i.e. native MC1 receptor), A₁B₁C₁D₂, A₁B₁C₂D₁, A₁B₁C₂D₂, A₁B₂C₁D₁, A₁B₂C₂D₁, A₁B₂C₂D₂, A₂B₂C₁D₁, A₂B₂C₂D₁ and A₂B₂C₂D₂ (i.e. native MC3 receptor).

iii) The template Y was made by using a generalised structure derived from two known peptides, MSH and MS04 (J. Biol. Chem. 1997, 272, 27943-27948) (Figure 6). As shown in the figure, both peptides have a common sequence in the middle, but their C- and N-termini differ. Using this central common part, the two peptides could be aligned to each other, creating the template Y (Figure 6). We then divided Y into three parts: the N-terminal part, the middle part and the C-terminal part (see Figure 6). Because both peptides have exactly the same sequence in their middle part, we neglected it in the further analysis, leaving two selected parts in Y: α (i.e. the N-terminal part) and β (i.e. the C-terminal part).

iv) Thus, according to the foregoing paragraph, there are two structural variants for each part of the template Y. Combining the different variants of each part in template Y gives a total of 4 possible combinations: one being MSH, one being MS04, and two being MSH/MS04 chimeras. All were synthesized, thus yielding MSH (α₁β₁), MS04 (α₂β₂), MS05 (α₂β₁) and MS06 (α₁β₂) (Figure 7).

v) In the present example, we used a binary representation of the data for both X and Y. To the variants with subscript 1 we assigned a value of 0, and to the variants with subscript 2 we assigned a value of 1. Thus, e.g. the MC1 receptor together with the MSH peptide could be described with six zeroes (0,0,0,0,0,0), whereas e.g. the chimeric receptor A₁B₁C₂D₁ with the MS06 peptide (α₁β₂) could be described as 0,0,1,0,0,1. An abstraction of the signals used is shown in Table 6, for all the 40 possible cases:

Table 6
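The binary coding of step v) can be sketched as follows. This is an illustration only: the function name `encode` and the dictionary `PEPTIDES` are hypothetical, and each receptor or peptide is written as a tuple of its variant subscripts (1 or 2) for parts A, B, C, D and α, β.

```python
def encode(receptor, peptide):
    """Map variant subscripts (1 or 2) for A, B, C, D, alpha, beta to 0/1."""
    return tuple(v - 1 for v in (*receptor, *peptide))


# Subscripts of (alpha, beta) for the four peptides of step iv).
PEPTIDES = {"MSH": (1, 1), "MS04": (2, 2), "MS05": (2, 1), "MS06": (1, 2)}

mc1_msh = encode((1, 1, 1, 1), PEPTIDES["MSH"])        # native MC1 with MSH
chimera_ms06 = encode((1, 1, 2, 1), PEPTIDES["MS06"])  # A1B1C2D1 with MS06
print(mc1_msh)       # (0, 0, 0, 0, 0, 0)
print(chimera_ms06)  # (0, 0, 1, 0, 0, 1)
```

Applying `encode` to each of the 10 receptors with each of the 4 peptides gives the 40 rows of six binary descriptors referred to in the text.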

vi) In order to obtain quantitative information for peptide and receptor for the purpose of deriving BA, we performed standard binding assays, using procedures essentially as described in Mol. Pharmacol. 1998, 54, 154-161, for all receptors versus all peptides included in the analysis, resulting in a data set of 40 binding constants (Ki values). In the further analysis, we used the positive logarithm of the Ki values [log₁₀(Ki)] in order to derive the BA that was used, an abstraction being shown in Table 6.

vii) We then applied the partial least squares (PLS) analysis method (Analytica Chimica Acta, 1986, 185, 1-17) to correlate the stored signals obtained in step v) with the stored signals obtained in step vi). For this purpose we used the Simca program (Umetri AB, Box 7960, SE-90719 Umea, Sweden), which was appropriately configured for use of the appropriate stored profiles, as is detailed further below. This resulted in a highly useful model of the BA.

Results.

One PLS component (see Analytica Chimica Acta 1986, 185, 1-17 for a description of PLS components) was sufficient for deriving a good model BA. The R² and Q² values for the model were 0.70 and 0.61 (see Eriksson et al., 1996, for the definition of R² and Q²). (Computations were performed using SIMCA 7.0 (see SIMCA 7.0 manual, 1998) using autofit with 7 cross-validation groups, which indicated only one significant PLS component.) Moreover, additional validation of model BA was performed by randomising the data (i.e. input signals) and calculating the corresponding R² and Q² values for each random model, by performing so-called permutation testing (see Eriksson et al., 1996, for a description of the procedure). The results, which are represented as the output abstraction shown in Figure 8, demonstrate the usefulness of the model BA. The goodness of fit was further explored by comparing predicted and actual values of the responses (i.e. predicted BA versus measured BA). The correlation results are shown as the abstraction in Figure 9. As can be seen, the correlation is good.
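To make the roles of R² and Q² concrete, the following is a minimal pure-Python sketch of a one-component PLS model for a single response, with R² computed on the fitted data and Q² computed from 7 cross-validation groups. It is not the SIMCA implementation: descriptors are only mean-centred (no unit-variance scaling), the assignment of samples to groups by index is an assumption, all function names are hypothetical, and the data are toy values rather than the binding data of Table 6.

```python
def fit_pls1(X, y):
    """Fit a one-component PLS1 model; returns (x_means, y_mean, w, b)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]  # X'y
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]                                        # unit-length weights
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]   # scores
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_means, y_mean, w, b


def predict(model, row):
    x_means, y_mean, w, b = model
    return y_mean + b * sum((row[j] - x_means[j]) * w[j] for j in range(len(w)))


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - c) ** 2 for a, c in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def q_squared(X, y, groups=7):
    """Cross-validated R2: each group is predicted from a model fitted without it."""
    y_hat = [0.0] * len(y)
    for g in range(groups):
        train = [i for i in range(len(y)) if i % groups != g]
        model = fit_pls1([X[i] for i in train], [y[i] for i in train])
        for i in range(len(y)):
            if i % groups == g:
                y_hat[i] = predict(model, X[i])
    return r_squared(y, y_hat)


# Toy data: the response depends only on the first binary descriptor.
X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
y = [1.0, 1.0, 3.0, 3.0] * 2
model = fit_pls1(X, y)
r2_fit = r_squared(y, [predict(model, row) for row in X])
q2_cv = q_squared(X, y, groups=7)
print(r2_fit, q2_cv)
```

Permutation testing, as described above, would repeat the same calculation after shuffling `y` and compare the resulting R² and Q² values with those of the unshuffled model.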

Example 1:2

Improvement of model BA of Example 1:1 by the addition of cross-terms

The model BA of Example 1:1 was improved by adding cross-terms. This was done by calculating new descriptor signals from the original descriptor signals given in Table 6 of Example 1:1 by performing all possible multiplications of two different original descriptors. The new descriptor signals thus obtained are generally referred to as cross-terms (see SIMCA 7.0 manual, 1998). The improved PLS model (i.e. improved model BA; in the following termed model BA of Example 1:2) was obtained using SIMCA autofit, had 2 significant components (see SIMCA 7.0 manual, 1998) and yielded R² and Q² values of, respectively, 0.95 and 0.66. The permutations of the new model are shown in the output abstraction of Figure 10. Figure 11 shows an output abstraction representing the comparison of the calculated BAs and the measured BAs derived by use of the model BA of Example 1:2. As seen, the correlation is excellent.

Example 1:3

Improvement of model BA of Example 1:2 by removing descriptors with low variable influence

A new model BA was created from the model of Example 1:2 by removing descriptor signals which had variable influence values lower than 0.3 (see SIMCA 7.0 manual, 1998, for the meaning of variable influence and how this is performed) and performing PLS calculations essentially as described above for Examples 1:1 and 1:2. The permutations of the new model BA (in the following termed model BA of Example 1:3) are shown by the output abstraction represented in Figure 12. Figure 13 shows the output abstraction representing a comparison of the calculated BAs derived by the use of model BA of Example 1:3 and the measured BAs. As seen, the correlation for the values is excellent.
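The cross-term expansion of Example 1:2 and the pruning step of Example 1:3 can be sketched together as below. The helper names are hypothetical, the descriptor labels simply follow the parts A, B, C, D, α and β, and the influence values passed to `prune` are placeholders rather than results from the analysis.

```python
from itertools import combinations

NAMES = ["A", "B", "C", "D", "alpha", "beta"]


def add_cross_terms(row):
    """Append all products of two different original descriptors to a row."""
    pairs = combinations(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in pairs]


def cross_term_names(names):
    """Labels in the same order as the columns built by add_cross_terms."""
    return list(names) + [f"{a}x{b}" for a, b in combinations(names, 2)]


def prune(names, influence, threshold=0.3):
    """Keep descriptors whose variable influence is not below the threshold."""
    return [n for n, v in zip(names, influence) if v >= threshold]


expanded = add_cross_terms((0, 0, 1, 0, 0, 1))  # A1B1C2D1 with MS06
labels = cross_term_names(NAMES)
print(len(expanded))                       # 6 original + 15 cross-terms
print(expanded[labels.index("Cxbeta")])    # the only non-zero cross-term here
print(prune(["A", "AxB"], [0.9, 0.1]))     # placeholder influence values
```

With six original descriptors there are 15 pairwise products, so each of the 40 cases is described by 21 signals before pruning.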

Example 1:4

Analysis of influence of parts

The model BA of Example 1:3 was used to analyze the influence and interactions of parts in X and Y. This was done by calculating the variable importance in the projection (VIP) for each descriptor of Example 1:3 (including the cross-terms retained in Example 1:3) using SIMCA 7.0 (see SIMCA 7.0 manual, 1998, p. 15-11). An output abstraction representing these influences is shown in Figure 14. As can be seen from the abstraction, the highest influence is exerted by part β of Y and part B of X. Part A of X and part α of Y are also important, while parts D and C of X are unimportant. Although part D is not important, the interaction of this part with part B (i.e. the B x D column) shows a significant effect on the responses (Figure 14).
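For orientation, a commonly used definition of VIP for a PLS model with a single response can be sketched as follows. Whether SIMCA 7.0 computes VIP in exactly this way is not stated in the text, so the formula, the function name `vip` and the example numbers should all be read as assumptions.

```python
def vip(weights, ss_explained):
    """VIP per descriptor.

    weights[a][j]   : PLS weight of descriptor j in component a
    ss_explained[a] : response sum of squares explained by component a
    """
    p = len(weights[0])
    total = sum(ss_explained)
    scores = []
    for j in range(p):
        acc = 0.0
        for w, ss in zip(weights, ss_explained):
            norm_sq = sum(v * v for v in w)       # normalise each weight vector
            acc += ss * (w[j] ** 2) / norm_sq
        scores.append((p * acc / total) ** 0.5)
    return scores


# Hypothetical two-component model with three descriptors.
example = vip(weights=[[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]], ss_explained=[3.0, 1.0])
print(example)
```

Under this definition the squared VIP values sum to the number of descriptors, so a value above 1 marks a descriptor of above-average importance.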

Example 1:5

Use of model BA of Example 1:3 for prediction of properties of new receptors

The model BA created in Example 1:3 was used to predict the ability of new variants of X to bind MSH peptides. According to Example 1:1, step ii), only the signals derived from 8 MC1/MC3 receptor chimeras were used out of 14 possible chimeras. The interaction of the remaining 6 with the MSH peptides was predicted using the model BA of Example 1:3, an output abstraction for the prediction being shown in Table 7.

Table 7

Example 1:6

Studies of the interactions of parts of X with parts of Y

For this purpose we created a new model BA (in the following termed model BA of Example 1:6) using only those cross-terms containing signals derived from parts of both X and Y, together with the signals derived from the measured BAs, and using the same PLS procedure as above. The new model showed R² and Q² values of, respectively, 0.64 and 0.61. The variable importance in the projection (VIP) for each cross-term descriptor was calculated as in Example 1:4, an output abstraction of which is shown in Figure 15. As can be seen from Figure 15, the most important interactions are between part B and part β, and between part B and part α.

Example 2

Use of principal property variables of amino acids for describing macromolecules X in the analysis of the interaction of alpha-1 adrenoceptors with 4-piperidyl oxazoles

The published data of Hamaguchi et al. (Biochemistry. 37 (1998) 5730-5737), comprising studies on human alpha-1 adrenoceptor subtypes, formed the basis for the analysis. The analysis of these data was performed in a computer essentially according to the steps of the invention, as follows:

1. The three subtypes of the human wild-type alpha-1 adrenoceptors used in the Hamaguchi study were selected and aligned, thereby creating the macromolecular template X consisting of 7 parts A, B, C, D, E, F and G (i.e. the underlined amino acid sequences of Fig. 16). The alpha-1 receptor sequences were mixed as described in the Hamaguchi study, creating 12 wild-type and chimeric receptors. From the seven parts A-G of the 12 receptors the differing amino acids were identified as shown in Fig. 17. Each of the parts A-G was assigned numbers as follows: Sequence positions that did not differ among the 12 receptors were not assigned any numbers. (Note that amino acids at positions that did not differ are omitted in Fig. 17.) For each amino acid position that differed by two amino acids or more among the 12 receptors, every amino acid was assigned 5 numbers selected from the 5 z-scale descriptors for amino acids derived by Sandberg (Sandberg et al., J. Med. Chem. 41 (1998) 2481-2491). However, for positions differing by only 2 amino acids, the 5 z-scale descriptor numbers were in an additional step merged into one number by calculating physico-chemical distances based on the two differing amino acids, as follows:

Wherein AB is the physico-chemical distance between amino acids A and B, Z_A the z-scale of amino acid A and Z_B the z-scale of amino acid B. The numbers of positions with two differing amino acids were, for parts A-G respectively, 9, 2, 5, 9, 9, 4 and 3 (41 in total). The numbers of positions with three differing amino acids were, respectively, 2, 3, 1, 2, 0, 2 and 1 (11 in total). Thus, in total 52 amino acid positions differed in the data set, which yielded in total 41 + 11*5 = 96 numbers for each receptor describing its physico-chemical properties. In total, the X data set therefore comprised a matrix of 96*12 = 1152 floating point numbers stored in the computer according to Fig. 18, hereinafter termed the X-block.
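The descriptor count above can be checked with a few lines of Python. The sketch also includes a distance function for the two-variant positions; since only the symbol definitions are given here, the Euclidean form over the five z-scales is an assumption, and the function name `z_distance` and the vectors passed to it are illustrative, not actual z-scale values.

```python
import math


def z_distance(z_a, z_b):
    """Distance between two z-scale vectors (Euclidean form assumed)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_a, z_b)))


two_variant = [9, 2, 5, 9, 9, 4, 3]    # positions with 2 amino acids, parts A-G
three_variant = [2, 3, 1, 2, 0, 2, 1]  # positions with 3 amino acids, parts A-G

# One merged number per two-variant position, five z-scales per three-variant one.
per_receptor = sum(two_variant) + 5 * sum(three_variant)
x_block_size = per_receptor * 12       # 12 wild-type and chimeric receptors

print(per_receptor, x_block_size)
print(z_distance((0, 0, 0, 0, 0), (3, 4, 0, 0, 0)))  # illustrative vectors only
```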

2. Twelve compounds comprising derivatives of 4-piperidyl oxazole modified at three positions were used. A molecular template Y for these compounds is indicated in Fig. 19, as well as each of the 12 compounds used. Each compound was coded using 24 binary descriptors comprising parts α, β and γ as shown in Fig. 20. Hereinafter the data created by these descriptors are termed the Y-block.

3. BA for the interaction of X and Y defined in steps 1 and 2 of the present Example was obtained from the literature (Biochemistry. 37 (1998) 5730-5737) and is given as the pKi values shown in Fig. 21.

4. In order to correlate X and Y with BA, first all descriptors of Y were multiplied, creating the C1 block, and all descriptors of X and Y were multiplied, creating the C12 block. Thereby four blocks of descriptors, X, Y, C1 and C12, were obtained and stored in the computer. The descriptors were then correlated to BA using PLS. In order to obtain optimal models the four descriptor blocks were scaled using scaling weights. Optimal scaling was achieved by giving a fixed scaling weight to one block and then varying the scaling weights of the other blocks systematically, using the Simplex optimisation strategy (see Lundstedt et al. Chemometrics Intelligent Laboratory Systems. 42 (1998) 3-40), until an optimal model was found. We also systematically excluded descriptors with VIPs < 0.3-0.5 until optimal models (with respect to Q² values) were obtained. The model finally obtained showed R²X = 91.5 %, R²Y = 95.6 % and Q² = 91.3 %.

PLS calculations were performed using SIMCA 7.0 software (Umetrics, Umea, Sweden). (For definitions of R²X, R²Y, Q² and VIP see SIMCA 7.0 Manual, Umetrics, Umea, Sweden.)
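The construction of the X-by-Y cross-term block and the block-wise scaling of step 4 can be sketched as below. This is only an outline under stated assumptions: the function names are hypothetical, the descriptor values and scaling weights are placeholders, and the C1 block and the Simplex search over the weights are left out.

```python
def cross_block(x_row, y_row):
    """All products of one X descriptor with one Y descriptor (one C12 row)."""
    return [x * y for x in x_row for y in y_row]


def weighted_row(blocks, weights):
    """Concatenate descriptor blocks after multiplying each by its scaling weight."""
    row = []
    for block, weight in zip(blocks, weights):
        row.extend(weight * v for v in block)
    return row


x_row = [0.5] * 96    # placeholder values for one receptor (96 descriptors)
y_row = [0, 1] * 12   # placeholder values for one compound (24 binary descriptors)

c12 = cross_block(x_row, y_row)
full = weighted_row([x_row, y_row, c12], [1.0, 1.0, 0.5])  # placeholder weights
print(len(c12), len(full))
```

One such row would be built for every receptor-compound pair with a reported pKi value, and the scaling weights would then be varied to maximise Q².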

5. A graphical representation of the derived relationships is shown in Fig. 22, the figure showing observed and predicted pKi values.

Example 2:1

Assessment of importance of the physico-chemical properties of amino acids in alpha-1 adrenoceptors for binding 4-piperidyl oxazoles using normalized PLS regression coefficients

The model of Example 2 was used to assess the importance of amino acids in the alpha-1 adrenoceptors for their binding of the 4-piperidyl oxazoles. This was achieved by assessing the PLS regression coefficients of the model. (In order to normalize the coefficients they were multiplied by the standard deviation of the corresponding descriptors.) These normalised PLS regression coefficients are illustrated in Fig. 23. As can be seen, the largest impact is exerted by amino acids in TM2 (i.e. transmembrane region 2 = Part B) and TM5 (i.e. transmembrane region 5 = Part E).

Example 2:2

Assessment of importance of trans-membrane regions in alpha-1 adrenoceptors for binding of 4-piperidyl oxazoles

In order to get an overall assessment of the importance of the TM regions, MIPs were calculated as follows:

Wherein MIP_a is the modelling importance of primary term, σ_a the standard deviation, and coeff_a the regression coefficient of variable a in the data set.

The MIPs were summed for each of the parts A-G corresponding to TM regions 1-7, the results of which are illustrated in Fig. 24A. As can be seen, TM2 and TM5 show clearly higher importance than the other TM regions for the binding of 4-piperidyl oxazoles.
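The summation of MIPs per part can be sketched as follows. Only the symbols of the MIP expression are defined in the text, so the form used here, the absolute value of the regression coefficient multiplied by the standard deviation of the descriptor, is an assumption, as is the use of the sample standard deviation. The function names and all numbers are hypothetical.

```python
from statistics import stdev


def mip(coeff, values):
    """Assumed form: |regression coefficient| times descriptor standard deviation."""
    return abs(coeff) * stdev(values)


def sum_by_part(parts, mips):
    """Add up the MIP values of all descriptors belonging to the same part."""
    totals = {}
    for part, value in zip(parts, mips):
        totals[part] = totals.get(part, 0.0) + value
    return totals


# Hypothetical descriptors: two in part B (TM2) and one in part E (TM5).
print(mip(-2.0, [0, 2]))
print(sum_by_part(["B", "B", "E"], [1.0, 2.0, 0.5]))
```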

In order to find the specificity portion of the importance of the TM regions, MICs were calculated as follows:

Wherein MIC_a is the modelling importance, σ_a the standard deviation, and coeff_a,n the regression coefficient of cross-terms in the data set. AD_n corresponds to the average deviation from the means of the cross-term partners of a, and was approximated by 0.8 · σ_n, where σ_n is the standard deviation of the cross-term partners of a.

The MICs were summed for each of the parts A-G corresponding to TM regions 1-7, the results of which are illustrated in Fig. 24B. As can be seen, TM2 and TM5 show clearly higher importance for the specificity of 4-piperidyl oxazoles binding to the alpha-1 adrenoceptors, compared with the other TM regions.

Example 3

Reference is made to the dipeptides disclosed in Figure 25.

The 16 peptides were selected according to a two-level fractional factorial design + 3 cp (or 1 cp 2 random).

Example 4

Reference is made to the tripeptides disclosed in Figure 26.

The 16 peptides were selected according to a 2^(9-5) fractional factorial design + 3 cp (or 1 cp 2 random).

Example 5

Reference is made to the tetrapeptides disclosed in Figure 27.

The 32 peptides were selected according to a two-level fractional factorial design + 3 cp (or 1 cp 2 random).

Example 6

Reference is made to the pentapeptides disclosed in Figure 28.

The 32 peptides were selected according to a 2^(15-10) fractional factorial design + 3 cp (or 1 cp 2 random).

Example 7

Reference is made to the hexapeptides disclosed in Figure 29.

The 32 peptides were selected according to a two-level fractional factorial design + 3 cp (or 1 cp 2 random).

Example 8

Reference is made to the heptapeptides disclosed in Figure 30.

The 51 peptides were selected according to a 2^(21-16) fractional factorial design with additional experiments added from a half fold-over + 3 cp.
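The run counts quoted in Examples 4, 6 and 8 follow from the usual 2^(k-p) notation, as the short sketch below illustrates. The generator used to build the small example design is hypothetical, since the generators of the actual designs are not given, and reading the "half fold-over" as half of the base design is one interpretation that reproduces the 51 peptides of Example 8.

```python
from itertools import product
from math import prod


def runs(k, p):
    """Number of runs in a two-level 2^(k-p) fractional factorial design."""
    return 2 ** (k - p)


def fractional_factorial(n_base, generators):
    """Full factorial in n_base factors plus columns defined as products of base columns."""
    design = []
    for base in product((-1, 1), repeat=n_base):
        extra = [prod(base[i] for i in gen) for gen in generators]
        design.append(list(base) + extra)
    return design


half_fraction = fractional_factorial(2, [(0, 1)])  # 2^(3-1) design with C = A*B

base_runs = runs(21, 16)                            # heptapeptide base design
heptapeptide_total = base_runs + base_runs // 2 + 3  # + half fold-over + 3 cp

print(runs(9, 5), runs(15, 10), heptapeptide_total)
print(half_fraction)
```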

REFERENCES

Adan, R.A., Oosterom, J., Toonen, R.F., Kraan, M.V., Burbach, J.P., Gispen, W.H.: Molecular pharmacology of neural melanocortin receptors. Receptors Channels. 1997, 5, 215-123.

Andersson, P.M., Sjöström, M., Lundstedt, T.: Preprocessing peptide sequences for multivariate sequence-property analysis. Chemometr. Intell. Lab. Syst. 1998, 42, 41-50.

Andersson, P.M., Linusson, A., Wold, S., Sjöström, M., Lundstedt, T., Norden, B.: 'Design of Small Libraries for Lead Exploration'. In Molecular Diversity in Drug Design (Ed. R. Lewis, P.M. Dean) Kluwer Academic Publishers, November 1999, ISBN 0-7923-5980-1.

Baldwin, J.M.: The probable arrangement of the helices in G protein-coupled receptors. EMBO Journal. 1993, 12, 1693-1703.

Bard, Y.: Nonlinear parameter estimation. Academic Press, London, 1974, ISBN 0-12-078250-2.

Bergström, A. and Wikberg, J.E.S.: Structural and pharmacological differences between cod fish and rat brain alpha-1 receptors revealed by photoaffinity labeling with 125I-APDQ. Acta Pharmacol. Toxicol. 1986, 58, 148-155.

Box, G.E.P., Hunter, J.S., Hunter, W.G.: Statistics for experimenters: An introduction to design, data analysis, and model building. John Wiley & Sons, 1978.

Branden, C. and Tooze, J.: Introduction to protein structure. Garland Publishing, New York, 1991, ISBN 0-8153-0344-0.

Bylund, DB., Eikenberg, DC, Hieble, JP., Langer, SZ., Lefkowitz, RJ., Minneman, KP., Molinoff, PB., Ruffolo Jr, RR., Trendelenburg, U.: IV. International union of pharmacology nomenclature of adrenoceptors. Pharmacol. Rev. 1994, 46, 121-136.

Carlson, R., Lundstedt, T., Albano, C.: Screening of suitable solvents in organic synthesis. Strategies for solvent selection. Acta Chem. Scand. 1985, B39(2), 79-91.

Carlson, R., Prochazka, P., Lundstedt, T.: Principal properties for synthetic screening: Ketones and aldehydes. Acta Chem. Scand. 1988, B42, 145-156.

Chothia, C. & Lesk, A.M.: The relation between the divergence of sequence and structure in proteins. EMBO J. 1986, 5, 823-826.

Clementi, M., Clementi, S., Clementi, S., Cruciani, G., Pastor, M. (2000). "Chemometric detection of binding sites of 7TM receptors QSAR" in Molecular Modelling and Prediction of Bioactivity (Eds. K. Gundertofte, F.S. Jørgensen) New York, Kluwer Academic/Plenum Publishers.

Collantes, E.R., Dunn III, W.J.: Amino acid side chain descriptors for quantitative structure-activity relationship studies of peptide analogues. J. Med. Chem. 1995, 38, 2705-2713.

Cramer III, R.D., Patterson, DE and Bunce, JD: Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J. Amer. Chem. Soc. 1988, 110, 5959-5967.

Daveu C, Bureau R.: Definition of a pharmacophore for partial agonists of serotonin 5-HT3 receptors. J Chem Inf Comput Sci. 1999, 39, 362-369.

de Groot MJ, Ackland MJ.: Novel approach to predicting P450-mediated drug metabolism: development of a combined protein and pharmacophore model for CYP2D6. J Med Chem. 1999, 42, 1515-1524.

Eriksson, L., Johansson, E. and Wold, S.: Quantitative Structure-Activity Relationship Model Validation. In: Quantitative Structure-Activity Relationships in Environmental Science-VII, Eds. F Chen et al, Proceedings of QSAR 1996, June 24-28, Elsinore Denmark, SETAC Press, Florida, US, page 381-397.

Frandberg, P-A., Muceniece, R., Prusis, P., Wikberg, JES., Chhajlani, V.: Evidence for alternate points of attachment for α-MSH and its stereoisomer [Nle⁴, D-Phe⁷]-α-MSH at the melanocortin receptor. Biochem. Biophys. Res. Commun. 1994, 202, 1266-1271.

Goodford, J. Med. Chem. (1985) 28, 849-857.

Hansch, C, Maloney, P.P., Fujita, T., Muir, R.: Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature (London) 1962, 194(178-180):1616-1626

Hansch, C, Fujita, T.: p-σ-π-Analysis. A method for the correlation of biological activity and chemical structure. J. Am. Chem. Soc. 1964, 86:1616-1626

Hellberg, S., Sjöström, M. and Wold, S.: The prediction of bradykinin potency of pentapeptides. An example of a peptide quantitative structure-activity relationship. Acta Chem. Scand. 1986, B40, 135-140.

Hellberg, S., Sjöström, M., Skagerberg, B., Wikström, C. and Wold, S.: On the design of multipositionally varied test series for quantitative structure-activity relationships. Acta Pharm. Jugoslavia, 1987, 37, 53-65.

Jackson, J.E.: A users guide to principal components, Wiley, New York, 1991.

Jensen, K. and Wirth, N.: Pascal User Manual and Report, 3rd edition, Springer-Verlag, 1985.

Jonsson, J., Eriksson, L., Hellberg, M., Sjöström, M. and Wold, S.: Multivariate parametrization of 55 coded and non-coded amino acids. Quant. Struct.-Act. Relat. 1989, 8, 204-209.

Lawrence, J.: Neural networks. Design, theory and applications. California Scientific Software Press, Nevada City, CA 95959, USA, 1993.

Lundstedt, T.: The Willgerodt-Kindler reaction, a multivariate approach. (Thesis, Umea) 1986. ISBN 91-7174-248-4

Lundstedt, T., Andersson, P.M., Clementi, S., Cruciani, G., Kettaneh, N., Linusson, A., Norden, B., Pastor, M., Sjöström, M., Wold, S.: 'Intelligent combinatorial libraries'. In Computer-assisted lead finding and optimization (Ed. H. van de Waterbeemd) Verlag Helvetica Chimica Acta, Basel, Switzerland, 1997, 191-208.

Lundstedt, T., Seifert, E., Abramo, L., Thelin, B., Nyström, A., Pettersen, J., Bergman, R.: Experimental design and optimization. Chemometrics Intelligent Laboratory Systems. 1998, 42, 3-40.

Lunec, J., Pieron, C, Thody, A.J.: MSH receptor expression and the relationship to melanogenesis and metastatic activity in B16 melanoma. Melanoma Res. (May 1992), 2(1): 5-12.

McGregor, MJ., Muskal, SM.: Pharmacophore fingerprinting. 1. Application to QSAR and focused library design. J Chem Inf Comput Sci. 1999, 39, 569-574.

Nyström, A., Andersson, P.M., Lundstedt, T.: Multivariate data analysis of topographically modified-melanotropin analogues using Auto and Cross Auto Covariances (ACC). Quant. Struct.-Act. Relat. 264-269 (2000)

Rang, H.P., Dale, M.M. and Ritter, J.M.: Pharmacology, 4th edition, Churchill Livingstone, UK, 1999, ISBN 0443 059748.

Sandberg, M., Eriksson, L., Jonsson, J., Sjöström, M. and Wold, S.: New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. J. Med. Chem. 1998, 41, 2481-2491.

Schiöth, H.B., Muceniece, R., Wikberg, J.E.S., Chhajlani, V.: Characterisation of melanocortin receptor subtypes by radioligand binding analysis. Eur. J. Pharmacol., Mol. Pharm. Sect. 1995, 288, 311-317.

Schiöth, H.B., Mutulis, F., Muceniece, R., Prusis, P., Wikberg, J.E.S.: Discovery of novel melanocortin 4 receptor selective MSH analogues. Br. J. Pharmacol. 1998, 124, 75-82.

Schiöth, H.B., Yook, P., Muceniece, R., Wikberg, JES., Szardenings, M.: Chimeric melanocortin 1/3 receptors: Identification of domains determining the specificity of MSH peptides. Mol. Pharmacol. 1998, 54, 154-161.

SIMCA 7.0 A new standard in multivariate data analysis, Manual, Edition August 21, 1998, Umetri AB, Box 7960, SE-907 19 Umea, Sweden.

Sjöström, M. and Eriksson K.: Application of statistical experimental design and PLS modelling in QSAR. In: QSAR: Chemometric method in molecular design, Methods and principles in medicinal chemistry, vol. 2. (Ed. H. Van de Waterbeemd) Verlag Chemie, Weinheim, Germany, 1995, 63-90.

Szardenings, M., Törnroth, S., Mutulis, F., Muceniece, R., Keinanen, K., Kuusinen, A., Wikberg, J.E.: Phage display selection on whole cells yields a peptide specific for melanocortin receptor 1. J. Biol. Chem. 1997, 272(44), 27943-27948.

Uhlen, S., Wikberg, J.E.S.: Delineation of rat kidney α2A- and α2B-adrenoceptors with [3H]RX821002 radioligand binding: computer modelling reveals that guanfacine is an α2A-selective compound. Eur. J. Pharmacol. 1991, 202, 235-243.

Wold, S., Esbensen, K. and Geladi, P.: Principal component analysis. In Chemometrics and intelligent laboratory systems, 1987, 2, 37-52.

Wold, S., Johansson, M., Cocchi, M.: PLS - partial least-squares projections to latent structures. In 3D QSAR in drug design; Theory, methods and application. (Ed. H. Kubinyi) ESCOM Science Publishers, Leiden, Holland, 1993a, 523-550.

Wold, S., Jonsson, M., Sjöström, M., Sandberg, S. and Rannar, S.: DNA and peptide sequences and chemical processes multivariately modelled by PCA and PLS projections to latent structures. Anal. Chim. Acta, 1993b, 227, 239-253.

Wold, S.: PLS for multivariate modelling. In: QSAR: Chemometric method in molecular design, Methods and principles in medicinal chemistry, vol. 2. (Ed. H. Van de Waterbeemd) Verlag Chemie, Weinheim, Germany, 1995, p. 195-218.

Wold, S., Sjöström, M., Andersson, P.M., Linusson, A., Edman, M., Lundstedt, T., Norden, B., Sandberg, M., Uppgard, L.: Multivariate Design and Modelling in QSAR, Combinatorial Chemistry, and Bioinformatics. In Molecular Modelling and Prediction of Bioactivity, Eds. K. Gundertofte, F.S. Jørgensen, Kluwer Academic/Plenum Publishers, New York (2000), p. 27-45.

Zaliani, A. and Gancia, E.: MS-WHIM scores for amino acids: A new 3D-description for peptide QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 1999, 39, 525-533.

Claims

1. A process for characterising the interaction between a Ligand Y and a Target X comprising:

Step 1 Obtaining information representing one or more chemical and/or physical properties of at least two ligands of the type Y;

Step 2 Obtaining information representing one or more chemical and/or physical properties of at least two targets of the type X;

Step 3 Obtaining information representing one or more chemical and/or physical properties of the interaction between at least two of the ligands of type Y and at least two of the targets of the type X;

and processing the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and Target X from which one or more of the properties of the interaction between the Ligand Y and the Target X may be identified and/or characterised.

2. A process for estimating the position of the active site in a Target X in an interaction between a Ligand Y and a Target X, or estimating one or more physical and/or chemical properties of the active site, comprising:

Step 1 Obtaining information representing one or more chemical and/or physical properties of at least two ligands of the type Y;

Step 2 Obtaining information representing one or more chemical and/or physical properties of at least two targets of the type X;

Step 3 Obtaining information representing one or more chemical and/or physical properties of the interaction between at least two of the ligands of type Y and at least two of the targets of the type X;

and correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X from which the position of the active site or one or more physical and/or chemical properties of the active site in the Target X may be estimated.

3. A process for identifying the position of the active site in an interaction between a Ligand Y and a Target X, or predicting one or more physical and/or chemical properties of the active site, comprising:

Step 1 Obtaining information representing one or more chemical and/or physical properties of at least two ligands of the type Y;

Step 2 Obtaining information representing one or more chemical and/or physical properties of at least two targets of the type X;

Step 3 Obtaining information representing one or more chemical and/or physical properties of the interaction between at least two of the ligands of type Y and at least two of the targets of the type X;

Step 4 Correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X;

and using the model to identify the position of the active site or one or more physical and/or chemical properties of the active site.

4. A process performed with the aid of a programmed computer for the estimation of the position of the active site in a Target X, in an interaction between a Ligand Y and a Target X, or one or more physical and/or chemical properties of the active site, comprising the steps of:

Step 1 Inputting information representing one or more chemical and/or physical properties of at least two ligands of the type Y;

Step 2 Inputting information representing one or more chemical and/or physical properties of at least two targets of the type X;

Step 3 Inputting information representing one or more chemical and/or physical properties of the interaction between at least two of the ligands of type Y and at least two of the targets of the type X;

Step 4 Computing or calculating a model from the inputted information which describes the interaction between the Ligand Y and the Target X;

and using the model to estimate the position of the active site, or to estimate one or more physical and/or chemical properties of the active site.

5. A process for assisting in the design of a Ligand Y' which binds to a Target X, the Ligand Y' having an increased or decreased binding affinity, selectivity or avidity for the Target X compared to that of a Ligand Y, comprising the steps of:

Step 1 Obtaining information representing one or more chemical and/or physical properties of at least two ligands of the type Y;

Step 2 Obtaining information representing one or more chemical and/or physical properties of at least two targets of the type X;

Step 3 Obtaining information representing one or more chemical and/or physical properties of the interaction between at least two of the ligands of type Y and at least two of the targets of the type X;

and correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X from which the structure and/or one or more chemical and/or physical properties of the Ligand Y' may be estimated or predicted.

6. A process for estimating or predicting the binding affinity, selectivity or avidity of a Ligand Y' with a Target X, comprising Steps 1, 2 and 3 of claim 1; and correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X from which the binding affinity, selectivity or avidity of the Ligand Y' with the Target X may be estimated or predicted.

7. A process for producing a Ligand Y' which binds to a Target X, the Ligand Y' having an increased or decreased binding affinity, selectivity or avidity for the Target X compared to that of a Ligand Y, comprising Steps 1, 2 and 3 of claim 1; and correlating the information from Steps 1, 2 and 3 in order to produce a model of the interaction between the Ligand Y and the Target X from which the structure and/or one or more properties of the Ligand Y' may be estimated or predicted; and then producing the Ligand Y' by a method known per se.

8. A process as claimed in any one of claims 1 to 4 from which the region(s) or part(s) or subsequence(s) of the Target X which interact with the Ligand Y can be estimated or predicted from the model.

9. A process as claimed in any one of claims 1 to 4 from which the region(s) or part(s) or subsequence(s) of the Ligand Y which interact with the Target X can be estimated or predicted from the model.

10. A process as claimed in any one of claims 1 to 7 which additionally comprises the step of determining experimentally part or all of the information on the chemical and/or physical properties of at least two targets of type X or part or all of the information on the chemical and/or physical properties of at least two ligands of type Y or part or all of the information on the interaction between the targets of type X and the ligands of type Y.

11. A process as claimed in any one of claims 1 to 10 which additionally comprises producing one or more targets of type X or one or more ligands of type Y.

12. A process as claimed in claim 11 which additionally comprises designing and producing one or more targets of type X or one or more ligands of type Y.

13. A process as claimed in any one of claims 1 to 12 which additionally comprises displaying or presenting part or all of the derived model or a representation thereof.

14. A process as claimed in claim 13 wherein the model is displayed or presented in the form of a table, graph or mathematical function.

15. A process as claimed in any one of claims 1 to 14 which additionally comprises the production of one or more lead compounds.

16. A process as claimed in any one of claims 1 to 14 which additionally comprises the production of one or more outliers.

17. A process as claimed in any one of claims 1 to 16 which additionally comprises the production of a further ligand of type Y with an affinity and/or selectivity for a target of type X.

18. A process as claimed in any one of claims 1 to 17 wherein the derived model is used to design a further target of type X or a further ligand of type Y.

19. A process as claimed in claim 18 wherein the further target of type X or the further ligand of type Y is subsequently produced.

20. A process as claimed in any one of claims 1 to 19 wherein the process is repeated using information on the chemical and/or physical properties of the further target of type X or the further ligand of type Y.

21. A process as claimed in claim 20 wherein the repeated method additionally makes use of information on the interactions of the further target of type X and/or ligand of type Y with one or more of the formerly-used ligands of type Y and/or targets of type X, respectively.

22. A process as claimed in any one of the previous claims, wherein the information on the properties of the targets of type X is derived, at least in part, from regions or parts or subsequences of the targets.

23. A process as claimed in any one of the previous claims, wherein the information on the properties of the ligands of type Y is derived, at least in part, from regions or parts or subsequences of the ligands.

24. A process as claimed in any one of the previous claims wherein the information in Steps 1, 2 and/or 3 comprises, at least in part, a binary descriptor or the information is represented, at least in part, in binary form.

25. A process as claimed in any one of the previous claims wherein the information comprises, at least in part, a bit vector or the information is represented, at least in part, in bit vector form.
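
As an illustration of the binary and bit vector representations of claims 24 and 25, the following is a minimal sketch that assumes each target is a chimera assembled from segments of known parent targets. The function name `segment_bits`, the parent labels and the one-bit-per-parent layout are hypothetical choices made for this example and are not prescribed by the claims.

```python
def segment_bits(segment_origins, parents):
    """Encode, for each segment, which parent it came from as a run of 0/1 bits."""
    bits = []
    for origin in segment_origins:
        # One bit per parent: 1 if this segment was taken from that parent, else 0.
        bits.extend(1 if origin == parent else 0 for parent in parents)
    return bits


# Hypothetical chimera built from four segments of two parent targets "A" and "B".
parents = ["A", "B"]
chimera = ["A", "B", "B", "A"]
bit_vector = segment_bits(chimera, parents)
print(bit_vector)
```

Every target described this way yields a vector of the same length (number of segments times number of parents), so the vectors can be stacked directly into a descriptor matrix.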

26. A process as claimed in any one of the previous claims wherein the information is represented, at least in part, by Principal Property variables.

27. A process as claimed in claim 26 wherein Principal Component Analysis is used to generate the Principal Property variables.

28. A process as claimed in claim 26 or claim 27 wherein one or more characteristics of amino acids are used as the Principal Properties.

29. A process as claimed in claim 26 wherein the z-scale is used as Principal Properties for amino acids.
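
The generation of Principal Property variables by Principal Component Analysis (claims 26 to 29) can be sketched as follows. This is an illustrative example only: the property table holds placeholder numbers, not published z-scale values, the names `first_principal_component` and `properties` are hypothetical, and power iteration is used merely to keep the sketch free of external libraries. It assumes the starting vector is not orthogonal to the leading component.

```python
def first_principal_component(rows, iterations=200):
    """Return (loading vector, scores) of the first principal component."""
    n, m = len(rows), len(rows[0])
    # Centre each property (column) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the properties.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * m
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in nxt) ** 0.5
        v = [x / norm for x in nxt]
    # The score of each building block is its projection onto the loading vector.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, scores


# Placeholder property table: one row of measured properties per building block.
properties = {
    "blockA": [1.0, 2.0, 0.5],
    "blockB": [2.0, 4.1, 0.4],
    "blockC": [3.0, 5.9, 0.6],
    "blockD": [4.0, 8.0, 0.5],
}
loading, scores = first_principal_component(list(properties.values()))
for name, score in zip(properties, scores):
    print(name, round(score, 3))
```

The score of each building block on the first component condenses several correlated properties into one variable; further components would be obtained in the same way after removing the first from the data.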

30. A process as claimed in any one of the previous claims which additionally comprises generating an unequal number of descriptors for each target of type X and then transforming said unequal numbers of descriptors into an equal number of descriptors for each target of type X.

31. A process as claimed in any one of the previous claims which additionally comprises generating an unequal number of descriptors for each ligand of type Y and then transforming said unequal numbers of descriptors into an equal number of descriptors for each ligand of type Y.

32. A process as claimed in claim 30 or claim 31 which involves the use of Auto Covariances and/or Auto Cross Covariances (ACC) and/or Auto Correlations.
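
The transformation of claims 30 to 32 can be illustrated with a minimal sketch of an auto and cross covariance (ACC) transform. It assumes each sequence is described by a fixed number of descriptor values per position and divides each lagged sum by the number of position pairs, which is one common normalisation and is an assumption here. The function name `acc_transform` and the descriptor values are hypothetical placeholders.

```python
def acc_transform(descriptors, max_lag):
    """Turn per-position descriptors (n positions, d values each) into max_lag * d * d terms."""
    n = len(descriptors)
    d = len(descriptors[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    terms = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                # j == k gives an auto covariance term, j != k a cross covariance term.
                total = sum(descriptors[i][j] * descriptors[i + lag][k]
                            for i in range(n - lag))
                terms.append(total / (n - lag))
    return terms


# Two sequences of different length, two placeholder descriptor values per position.
short_seq = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0)]
long_seq = [(0.5, 1.5), (-1.0, 0.2), (2.0, -0.3), (0.1, 0.9), (1.2, -1.1)]
short_acc = acc_transform(short_seq, max_lag=2)
long_acc = acc_transform(long_seq, max_lag=2)
print(len(short_acc), len(long_acc))
```

Both sequences end up with the same number of terms regardless of their length, which is what allows them to be placed as rows of one descriptor matrix.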

33. A process as claimed in any one of the previous claims, wherein the model is derived using one or more of multivariate methods, partial least squares methods, neural networks, multiple linear regression, non-linear regression, curve fitting, model fitting, stepwise regression and maximum likelihood methods.
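
Of the model types listed in claim 33, partial least squares can be sketched compactly. The following is a minimal single-component PLS1 fit, assuming each row of `X` holds the descriptors of one ligand/target pair and `y` holds the measured interaction for that pair. The names `pls1_one_component` and `predict` and the data are hypothetical, and a practical model would normally use several components with cross-validation.

```python
def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; return (coefficients, intercept)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in descriptor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores of each row along that direction, then regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    coef = [q * v for v in w]
    intercept = y_mean - sum(coef[j] * x_mean[j] for j in range(m))
    return coef, intercept


def predict(model, x):
    """Predict the interaction value for one descriptor row."""
    coef, intercept = model
    return intercept + sum(c * v for c, v in zip(coef, x))


# Placeholder data: three ligand/target pairs, two descriptors each.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [5.0, 6.0, 7.0]
model = pls1_one_component(X, y)
print(model, predict(model, [4.0, 8.0]))
```

In a model of this kind, descriptors with large coefficients are the ones that contribute most to the predicted interaction, which is how such a model may point towards the regions of the target or ligand involved.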

34. A process as claimed in any one of the previous claims, wherein experimental design is applied to the selection, design, manufacture or synthesis of the targets of type X and/or ligands of type Y.

35. A process as claimed in claim 34 wherein the experimental design is directed onto regions of the targets of type X and/or ligands of type Y.

36. A process as claimed in claim 34 or claim 35 wherein the experimental design is directed onto part(s) of the targets of type X and/or ligands of type Y.

37. A process as claimed in any one of the previous claims, which additionally comprises the use of cross-terms.
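
One way to picture the cross-terms of claim 37 is as products of ligand descriptors with target descriptors, appended to the descriptor row of each ligand/target pair. The sketch below makes that assumption; the function name `pair_row` and the numbers are hypothetical, and other forms of cross-term are not excluded.

```python
from itertools import product


def pair_row(ligand_desc, target_desc):
    """Build one model row: ligand terms, target terms, then all ligand x target products."""
    cross_terms = [l * t for l, t in product(ligand_desc, target_desc)]
    return list(ligand_desc) + list(target_desc) + cross_terms


# Placeholder descriptors: two for the ligand, three for the target.
row = pair_row([1.0, 2.0], [3.0, 4.0, 5.0])
print(row)
```

The product columns let a linear model capture effects that depend on a particular combination of ligand property and target property, rather than on either alone.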

38. A process as claimed in any one of the previous claims, wherein the information on the properties of the targets of type X and/or the information on the properties of the ligands of type Y is derived from atom counts, measured or calculated thin layer liquid chromatography (TLC), retention times on HPLC, refractive index, isoelectric point, melting point, boiling point, molecular weight, hydrophobicity, hydrophilicity, chromatographic mobility, van der Waals volume, octanol/water partition coefficient (logP), energy of molecular orbital, heat of formation, polarizability, electronegativity, hardness, total accessible molecular surface area, polar accessible molecular surface area, nonpolar accessible molecular surface area, number of hydrogen bond donors, number of hydrogen bond acceptors, charge, IR-spectra, NMR-spectra or other spectra, HOMO, LUMO, connectivity indices, semi-empirical calculations, ab initio calculations or 3D quantum mechanical calculations.

39. A process as claimed in any one of the previous claims, wherein the information on the interaction of the targets of type X with ligands of type Y is derived from experiments, the experiment preferably being selected from chemical, physical, biological, molecular biological, physiological, microbiological, enzymological, pharmacological and molecular pharmacological experiments.

40. A process as claimed in any one of the previous claims, wherein the information on the chemical and/or physical properties of the targets of type X is derived from at least two different targets of type X, preferably more than 3, even more preferably more than 4, still even more preferably more than 6, still even more preferably more than 9, and most preferably more than 19 different targets, and/or wherein the information on the chemical and/or physical properties of the ligands of type Y is derived from at least one ligand, preferably more than 2, even more preferably more than 3, still even more preferably more than 4, still even more preferably more than 5, and still even more preferably more than 6, preferably at least 9, more preferably at least 11 and most preferably at least 19 different ligands.

41. A process as claimed in any one of the previous claims, wherein the information on the targets of type X is derived from targets whose molecular weight is larger than 1000 g/mole, preferably larger than 2000 g/mole, larger than 3000 g/mole, larger than 5000 g/mole, larger than 7000 g/mole, larger than 10000 g/mole, larger than 12000 g/mole, larger than 14000 g/mole, larger than 17000 g/mole, larger than 20000 g/mole, larger than 25000 g/mole, and most preferably larger than 30000 g/mole; and/or the molecular weight of the ligands of type Y is within the range 100 - 5000 g/mole, or the molecular weight of the ligands is below 3000 g/mole, below 2000 g/mole, below 1000 g/mole or preferably below 800 g/mole.

42. A process as claimed in any one of the previous claims, wherein the information on the properties of the targets of type X does not include information on the three-dimensional co-ordinates of the atoms of the targets of type X or information on the angles between the atoms of the targets of type X.

43. A process as claimed in any one of the previous claims, wherein the information on the properties of the ligands of type Y does not include information on the three-dimensional co-ordinates of the atoms of the ligands of type Y or information on the angles between the atoms of the ligands of type Y.

44. A process as claimed in any one of the previous claims, wherein the targets of type X are composed of building blocks and/or the targets of type X are composed of composite building blocks.

45. A process as claimed in any one of the previous claims, wherein the ligands of type Y are composed of building blocks and/or the ligands of type Y are composed of composite building blocks.

46. A process as claimed in claim 44 or claim 45, wherein the molecular weight of the building block is less than 10000 g/mole, less than 5000 g/mole, less than 3000 g/mole, less than 2000 g/mole, less than 1500 g/mole, less than 1000 g/mole, less than 600 g/mole, less than 400 g/mole, less than 300 g/mole, less than 200 g/mole or less than 100 g/mole.

47. A process as claimed in any one of claims 44 to 46, wherein the building block is an amino acid residue, a nucleotide, a deoxyadenosine 5'-phosphoric acid, a deoxyguanosine 5'-phosphoric acid, a deoxycytidine 5'-phosphoric acid, a deoxythymidine 5'-phosphoric acid, a deoxyuridine 5'-phosphoric acid, an organic residue or a sugar residue.

48. A process as claimed in any one of claims 44 to 47, wherein the composite building block is constructed from less than 11, more preferably less than 9, even more preferably less than 6, still even more preferably less than 4, and most preferably less than 3 building blocks and/or wherein a composite building block is constructed from 16 or less, 24 or less, or 33 or less building blocks.

49. A process as claimed in any one of the previous claims, wherein the information on the physical/chemical properties of the target X is derived from one or more building blocks and/or composite building blocks within the target X.

50. A process as claimed in any one of the previous claims, wherein the information on the physical/chemical properties of the ligand Y is derived from one or more building blocks and/or composite building blocks within the ligand Y.

51. A process as claimed in any one of the previous claims, wherein the target X has a polymeric structure and/or wherein the ligand Y has a polymeric structure.

52. A method as claimed in any one of the previous claims, wherein the target X has a chimeric structure and/or wherein the ligand Y has a chimeric structure, preferably wherein target X is a chimeric protein/peptide or chimeric DNA molecule and/or ligand Y is a chimeric protein/peptide or a chimeric DNA molecule.

53. A process as claimed in any one of the previous claims, wherein the target X is one or more of synthetic or natural polymeric structures, synthetic or natural cyclic polymeric structures, synthetic or natural branched polymeric structures, peptides, polypeptides, proteins, DNA, RNA, enzymes, ion-channels, receptors, G-protein coupled receptors, tyrosine kinase receptors, serine/threonine kinase receptors, steroid hormone receptors, thyroid hormone receptors, membrane transporters, structural proteins, antibodies or carbohydrates.

54. A process as claimed in any one of the previous claims, wherein the ligand Y is selected from one or more of synthetic or natural polymeric structures, synthetic or natural cyclic polymeric structures, synthetic or natural branched polymeric structures, peptides, polypeptides, proteins, DNA, RNA, organic compounds, organic libraries, enzymes, ion-channels, receptors, G-protein coupled receptors, tyrosine kinase receptors, serine/threonine kinase receptors, steroid hormone receptors, thyroid hormone receptors, membrane transporters, structural proteins, antibodies or carbohydrates.

55. A process as claimed in any one of the previous claims, wherein the information is derived from a target of type X and/or a ligand of type Y when it is situated in a viral particle, a cell and/or a multicellular organism.

56. A process as claimed in any one of the previous claims, wherein the information on the physical/chemical properties of the targets of type X and/or the ligands of type Y is derived from one or more building blocks and/or composite building blocks within the macromolecules of type X and/or the molecules of type Y and the information is derived from atom counts, measured or calculated thin layer liquid chromatography (TLC), retention times on HPLC, refractive index, isoelectric point, melting point, boiling point, molecular weight, hydrophobicity, hydrophilicity, chromatographic mobility, van der Waals volume, octanol/water partition coefficient (logP), energy of molecular orbital, heat of formation, polarizability, electronegativity, hardness, total accessible molecular surface area, polar accessible molecular surface area, nonpolar accessible molecular surface area, number of hydrogen bond donors, number of hydrogen bond acceptors, charge, IR-spectra, NMR-spectra or other spectra, HOMO, LUMO, connectivity indices, semi-empirical calculations, ab initio calculations or 3D quantum mechanical calculations.

57. A process as claimed in any one of the previous claims, wherein the information is derived from the three dimensional structure of one or more of the building blocks and/or the angles between one or more of the atoms in one or more of the building blocks.

58. A process as claimed in any one of the previous claims, wherein the use of angles between atoms in different building blocks is excluded and/or wherein the use of distances between atoms in different building blocks is excluded.

59. A process as claimed in any one of the previous claims, wherein the use of the coordinates of the Cα atoms (in three-dimensional space) of a peptide or a protein is excluded, and/or wherein the use of psi and phi angles in a peptide or a protein is excluded.

60. A process as claimed in any one of the previous claims, wherein the use of a pharmacophore model is excluded.

61. A process as claimed in any one of the previous claims, wherein the information is derived from chimeric variations of the targets of type X and/or chimeric variations of the ligands of type Y.

62. A process as claimed in any one of the previous claims, for use in identifying outliers of type X or outliers of type Y.

63. A process as claimed in any one of the previous claims, for use in drug design.

64. A process as claimed in any one of the previous claims, for use in the design or identification of lead compounds.

65. A process as claimed in any one of the previous claims, for use in the design of ligands of type Y with improved affinity and/or selectivity for targets of type X.

66. A process as claimed in any one of the previous claims, for use in protein engineering.

67. A process as claimed in any one of the previous claims, for the design of DNA or RNA molecules.

68. A process as claimed in any one of the previous claims, for the design of artificial targets of type X and/or artificial ligands of type Y.

69. A process as claimed in any one of the previous claims, for analysis and/or in the engineering of regions and/or parts of targets of type X and/or ligands of type Y.

70. A process as claimed in any one of the previous claims, for the design of an organic compound, catalyst, pharmaceutical, drug, macromolecule being capable of binding a molecule, peptide, peptidomimetic, protein, enzyme, antibody, molecule, macromolecule, DNA, RNA or a carbohydrate.

71. A process as claimed in any one of the previous claims, for the design of a ligand of type Y being capable of binding a target of type X.

72. A process as claimed in any one of the previous claims, for the design of any one of organic compound, catalyst, pharmaceutical, drug, macromolecule being capable of binding a molecule, peptide, peptidomimetic, protein, enzyme, antibody, molecule and a macromolecule.

73. A lead, organic compound, catalyst, pharmaceutical, drug, macromolecule being capable of binding a molecule, peptide, peptidomimetic, protein, enzyme, antibody, molecule, macromolecule, DNA, RNA, carbohydrate when designed by a process comprising a process as claimed in any one of claims

74. A process as claimed in any one of claims comprising the use of an organic library.

75. A process as claimed in any one of claims operated on or performed with the aid of a digital computer.

76. Computer software specifically adapted to carry out a process as claimed in any one of the previous claims when installed on data processing means.

77. A computer program element comprising computer software code portions for performing a process as claimed in any one of claims 1 to 75 when the program element is run on data processing means.

78. A computer software carrier comprising software as claimed in claim 76.

79. A ligand whose structure and/or properties has been estimated or predicted through the use of a process as claimed in any of the claims 1 to 75.

80. Use of a process as claimed in any one of claims 1 to 75 for designing new ligands for known targets and/or for new targets.

81. A process as claimed in any one of claims 1 to 75 wherein the Target X is a 7TM receptor, preferably a melanocortin receptor.

82. A process as claimed in any one of claims 1 to 75 wherein the Ligand Y is any one of the peptides disclosed in any one of Figures 25 to 30.

83. A process as claimed in any one of claims 1 to 75 wherein the ligands of the type Y comprise the set of peptides disclosed in any one of Figures 25 to 30.