WO1992011378A1

WO1992011378A1 - A method of constructing synthetic leader sequences

Info

Publication number: WO1992011378A1
Application number: PCT/DK1991/000396
Authority: WO
Inventors: Lars Christiansen
Original assignee: Novo Nordisk A/S
Priority date: 1990-12-19
Filing date: 1991-12-18
Publication date: 1992-07-09
Also published as: AU9134891A; HU9301801D0; FI932831L; IL100408A0; AU660161B2; EP0563175A1; MX9102684A; KR930703450A; JPH06503957A; HUT68751A; NZ241011A; ZA919932B; DK300090D0; CA2098731A1; CZ119293A3; FI932831A0; PT99848A; SK62593A3; IE914433A1

Abstract

A yeast expression cloning vector comprising the following sequence 5'-SP-Xn-3'-RS-5'-Xm-(NZT)p-Xq-PS-*gene*-3' wherein SP is a DNA sequence encoding a signal peptide, Xn is a DNA sequence encoding n amino acids, wherein n is 0 or an integer of from 1 to about 10 amino acids, RS is a restriction endonuclease recognition site provided at the junction of Xn and Xm, Xm is a DNA sequence encoding m amino acids, wherein m is 0 or an integer from 1 to about 10, (NZT)p is a DNA sequence encoding Asn-Xaa-Thr, wherein p is 0 or 1, Xq is a DNA sequence encoding q amino acids, wherein q is 0 or an integer from 1 to about 10, PS is a DNA sequence encoding a peptide defining a yeast processing site, and *gene* is a DNA sequence encoding a heterologous polypeptide. The vector may be used to construct synthetic leader peptide sequences by inserting random DNA fragments in the 'RS' site, culturing a yeast cell transformed with this vector and screening the culture for secretion of the heterologous polypeptide.

Description

A METHOD OF CONSTRUCTING SYNTHETIC LEADER SEQUENCES

FIELD OF INVENTION

The present invention relates to a method of constructing synthetic leader peptide sequences for secreting heterologous polypeptides in yeast, and yeast expression vectors for use in the method.

BACKGROUND OF THE INVENTION

Yeast organisms produce a number of proteins which are synthesized intracellularly, but which have a function outside the cell. Such extracellular proteins are referred to as secreted proteins. These secreted proteins are expressed initially inside the cell in a precursor or a pre-protein form containing a presequence ensuring effective direction of the expressed product across the membrane of the endoplasmic reticulum (ER). The presequence, normally named a signal peptide, is generally cleaved off from the desired product during translocation. Once entered in the secretory pathway, the protein is transported to the Golgi apparatus. From the Golgi the protein can follow different routes that lead to compartments such as the cell vacuole or the cell membrane, or it can be routed out of the cell to be secreted to the external medium (Pfeffer, S.R. and Rothman, J.E. Ann.Rev. Biochem. 56 (1987), 829-852).

Several approaches have been suggested for the expression and secretion in yeast of proteins heterologous to yeast. European published patent application No. 88 632 describes a process by which proteins heterologous to yeast are expressed, processed and secreted by transforming a yeast organism with an expression vehicle harbouring DNA encoding the desired protein and a signal peptide, preparing a culture of the transformed organism, growing the culture and recovering the protein from the culture medium. The signal peptide may be the signal peptide of the desired protein itself, a heterologous signal peptide or a hybrid of native and heterologous signal peptide.

A problem encountered with the use of signal peptides heterologous to yeast might be that the heterologous signal peptide does not ensure efficient translocation and/or cleavage after the signal peptide.

The S. cerevisiae MFα1 (α-factor) is synthesized as a prepro form of 165 amino acids comprising a signal or prepeptide of 19 amino acids followed by a "leader" or propeptide of 64 amino acids, encompassing three N-linked glycosylation sites, followed by (LysArg(Asp/Glu, Ala)_2-3α-factor)₄ (Kurjan, J. and Herskowitz, I. Cell 30 (1982), 933-943). The signal-leader part of the preproMFα1 has been widely employed to obtain synthesis and secretion of heterologous proteins in S. cerevisiae.

Use of signal/leader peptides homologous to yeast is known from i.a. US patent specification No. 4,546,082, European published patent applications Nos. 116 201, 123 294, 123 544, 163 529, and 123 289 and DK patent application No. 3614/83.

In EP 123 289 utilization of the S. cerevisiae a-factor precursor is described, whereas WO 84/01153 indicates utilization of the Saccharomyces cerevisiae invertase signal peptide and DK 3614/83 utilization of the Saccharomyces cerevisiae PHO5 signal peptide for secretion of foreign proteins.

US patent specification No. 4,546,082, EP 116 201, 123 294, 123 544, and 163 529 describe processes by which the α-factor signal-leader from Saccharomyces cerevisiae (MFα1 or MFα2) is utilized in the secretion process of expressed heterologous proteins in yeast. By fusing a DNA sequence encoding the S. cerevisiae MFα1 signal/leader sequence at the 5' end of the gene for the desired protein, secretion and processing of the desired protein was demonstrated. EP 206783 discloses a system for the secretion of polypeptides from S. cerevisiae whereby the α-factor leader sequence has been truncated to eliminate the four α-factor peptides present on the native leader sequence so as to leave the leader peptide itself fused to a heterologous polypeptide via the α-factor processing site LysArgGluAlaGluAla. This construction is indicated to lead to efficient processing of smaller peptides (less than 50 amino acids). For the secretion and processing of larger polypeptides, the native α-factor leader sequence has been truncated to leave one or two α-factor peptides between the leader peptide and the polypeptide.

A number of secreted proteins are routed so as to be exposed to a proteolytic processing system which can cleave the peptide bond at the carboxy end of two consecutive basic amino acids. This enzymatic activity is in S. cerevisiae encoded by the KEX 2 gene (Julius, D.A. et al., Cell 37 (1984b), 1075). Processing of the product by the KEX 2 gene product is needed for the secretion of active S. cerevisiae mating factor α1 (MFα1 or α-factor) but is not involved in the secretion of active S. cerevisiae mating factor a.

Secretion and correct processing of a polypeptide intended to be secreted is obtained in some cases when culturing a yeast organism which is transformed with a vector constructed as indicated in the references given above. In many cases, however, the level of secretion is very low or there is no secretion, or the proteolytic processing may be incorrect or incomplete. It is therefore the object of the present invention to provide leader peptides which ensure a more efficient expression and/or processing of heterologous polypeptides.

SUMMARY OF THE INVENTION

It has surprisingly been found possible to replace the α-factor leader peptide by a variety of different DNA sequences, thereby obtaining secretion of a heterologous polypeptide in yeast. Based on this observation, a method has been developed by which random DNA fragments are cloned into yeast vectors downstream of a DNA sequence coding for a signal peptide and upstream of a DNA sequence coding for a heterologous polypeptide. After transformation with the vectors, yeast cells are screened for secretion of the heterologous polypeptide in question.

More specifically, the present invention relates to a method of constructing a synthetic leader peptide sequence for secreting heterologous polypeptides in yeast, the method comprising

(a) inserting a random DNA fragment into a yeast expression vector comprising the following sequence

5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3'

wherein SP is a DNA sequence encoding a signal peptide,

X_n is a DNA sequence encoding n amino acids, wherein n is 0 or an integer of from 1 to about 10 amino acids,

RS is a restriction endonuclease recognition site for insertion of random DNA fragments, which site is provided at the junction of X_n and X_m,

X_m is a DNA sequence encoding m amino acids, wherein m is 0 or an integer from 1 to about 10,

(NZT)_p is a DNA sequence encoding Asn-Xaa-Thr, wherein p is 0 or 1,

X_q is a DNA sequence encoding q amino acids, wherein q is 0 or an integer from 1 to about 10,

PS is a DNA sequence encoding a peptide defining a yeast processing site, and

*gene* is a DNA sequence encoding a heterologous polypeptide;

(b) transforming a yeast host cell with the expression vector of step (a);

(c) culturing the transformed host cell of step (b) under appropriate conditions; and

(d) screening the culture of step (c) for secretion of the heterologous polypeptide.
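The element order and the limits on n, m, p and q defined in step (a) can be summarised in a small data structure. The following is only an illustrative sketch: the class name `VectorLayout`, its `elements` method and the treatment of "about 10" as a hard upper bound of 10 are assumptions made for the example and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorLayout:
    """Element counts for 5'-SP-X_n-RS-X_m-(NZT)_p-X_q-PS-gene-3'."""
    n: int           # amino acids encoded by X_n
    m: int           # amino acids encoded by X_m
    p: int           # copies of the Asn-Xaa-Thr site (0 or 1)
    q: int           # amino acids encoded by X_q
    max_x: int = 10  # assumption: "about 10" treated as a hard upper bound

    def __post_init__(self):
        for name in ("n", "m", "q"):
            value = getattr(self, name)
            if not 0 <= value <= self.max_x:
                raise ValueError(f"{name} must be between 0 and {self.max_x}")
        if self.p not in (0, 1):
            raise ValueError("p must be 0 or 1")

    def elements(self, insert_present=False):
        """List the elements 5' to 3', dropping those of zero length."""
        parts = ["SP"]
        if self.n:
            parts.append(f"X{self.n}")
        # Before cloning the junction carries the restriction site (RS);
        # after cloning it carries the random DNA fragment (ranDNA).
        parts.append("ranDNA" if insert_present else "RS")
        if self.m:
            parts.append(f"X{self.m}")
        if self.p:
            parts.append("NZT")
        if self.q:
            parts.append(f"X{self.q}")
        parts += ["PS", "gene"]
        return parts


layout = VectorLayout(n=2, m=3, p=1, q=3)  # hypothetical values
print("-".join(layout.elements()))
print("-".join(layout.elements(insert_present=True)))
```

With `insert_present=True` the same object describes the vector obtained after step (a), in which the random DNA fragment occupies the position of the restriction site.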

In the present context, the expression "leader peptide" is understood to indicate a peptide whose function is to allow the heterologous polypeptide to be directed from the endoplasmic reticulum to the Golgi apparatus and further to a secretory vesicle for secretion into the medium (i.e. exportation of the expressed polypeptide across the cell wall or at least through the cellular membrane into the periplasmic space of the cell). The term "synthetic" used in connection with leader peptides is intended to indicate that the leader peptide constructed by the present method is one not found in nature.

The term "signal peptide" is understood to mean a presequence which is predominantly hydrophobic in nature and present as an N-terminal sequence of the precursor form of an extracellular protein expressed in yeast. The function of the signal peptide is to allow the heterologous protein to be secreted to enter the endoplasmic reticulum. The signal peptide is normally cleaved off in the course of this process. The signal peptide may be heterologous or homologous to the yeast organism producing the protein but, as explained above, a more efficient cleavage of the signal peptide may be obtained when it is homologous to the yeast organism in question.

The expression "heterologous polypeptide" is intended to indicate a polypeptide which is not produced by the host yeast organism in nature. In the method of the invention, the heterologous polypeptide is preferably one the secretion of which by transformed yeast cells may easily be detected, e.g. by established standard methods such as by immunological screening by means of antibodies reactive with the polypeptide in question (cf. for instance Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989) or by screening for a specific biological activity of the heterologous polypeptide. A positive result of the screening indicates that a leader peptide useful for the secretion of heterologous polypeptides in yeast has been constructed.

The expression "a random DNA fragment" is intended to indicate any sequence of DNA at least 3 nucleotides in length, for instance obtained by digesting genomic DNA (of any organism) with restriction endonuclease(s) or by preparing synthetic DNA, e.g. by the phosphoramidite method described by S.L. Beaucage and M.H. Caruthers, Tetrahedron Letters 22, 1981, pp. 1859-1869.

The peptide Asn-Xaa-Thr encoded by "(NZT)_p" is an asparagine-linked glycosylation site. "Xaa" denotes any one of the known amino acids except Pro.
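The definition of the Asn-Xaa-Thr site (Xaa being any amino acid except Pro) translates directly into a pattern search over a one-letter peptide sequence. The sketch below is illustrative only; the function name `find_nxt_sites` and the example peptide are hypothetical.

```python
import re

# Asn-Xaa-Thr where Xaa is any residue except Pro, as defined in the text.
# The lookahead lets overlapping sites (e.g. in "NNTT") be reported as well.
NXT_SITE = re.compile(r"(?=N[^P]T)")


def find_nxt_sites(peptide):
    """Return the 0-based position of the Asn in each Asn-Xaa-Thr site."""
    return [match.start() for match in NXT_SITE.finditer(peptide.upper())]


# Hypothetical leader fragment in one-letter code.
print(find_nxt_sites("QPSITPLWLNTTLAKR"))
```

A peptide such as "ANPTA" yields no hit, because Pro in the middle position is excluded.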

In another aspect, the present invention relates to a yeast expression cloning vector comprising the following sequence

5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' wherein SP, X_n, RS, X_m, (NZT)_p, X_q, PS and *gene* are as defined above.

This vector may be used in the construction of leader peptide sequences according to the method described above. In a further aspect, the present invention relates to a yeast expression vector comprising the following sequence

5'-SP-X_n-ranDNA-X_m-(NZT)_p-X_q-PS-*gene*-3' wherein SP, X_n, X_m, (NZT)_p, X_q, PS and *gene* are as defined above, and ranDNA is a random DNA fragment inserted in a restriction endonuclease recognition site provided at the junction of X_n and X_m.

In this vector, the leader peptide sequence (once identified by the method of the invention) will be composed of the sequence X_n-ranDNA-X_m-(NZT)_p-X_q. Such a vector may be used in the production of a heterologous polypeptide of interest.

In a still further aspect, the present invention relates to a process for producing a heterologous polypeptide in yeast, the process comprising culturing a yeast cell, which is capable of expressing a heterologous polypeptide and which is transformed with a yeast expression vector as described above including a leader peptide sequence constructed by the method of the invention, in a suitable medium to obtain expression and secretion of the heterologous polypeptide, after which the heterologous polypeptide is recovered from the medium.

DETAILED DISCLOSURE OF THE INVENTION

The length of the random DNA fragment inserted in the expression vector is not particularly critical. However, in order to be of a manageable length, the fragment preferably has a length of from 16 to about 600 base pairs. More preferably, the fragment has a length of from about 15 to about 300 base pairs. It is at present considered that a suitable length of the fragment is from about 30 to about 150 base pairs.

The random DNA fragment preferably encodes a high proportion of polar amino acids. These are selected from the group consisting of Glu, Asp, Lys, Arg, His, Thr, Ser, Asn and Gln. In the present context, the term "a high proportion of" is understood to indicate that the DNA fragment encodes a larger number of polar amino acids than do other DNA sequences of a corresponding length. Independently hereof, or in addition hereto, it may be advantageous that the fragment encodes at least one proline. In the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3', n and/or m and/or q are preferably ≥1. In particular, all of n, m and q are ≥1.
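The proportion of polar amino acids encoded by a fragment can be estimated by translating it in the relevant reading frame and counting residues from the group listed above. The sketch below assumes the standard genetic code; the function names are hypothetical, and it only computes the fraction for one fragment, leaving the comparison with "other DNA sequences of a corresponding length" to the user. The example input is the stretch that directly follows the ClaI site in Sequence Listing ID No. 1.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Glu, Asp, Lys, Arg, His, Thr, Ser, Asn and Gln in one-letter code.
POLAR = set("EDKRHTSNQ")


def translate(dna, frame=0):
    """Translate complete codons of `dna`, starting at offset `frame`."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def polar_fraction(peptide):
    """Fraction of residues (stop codons ignored) that are polar."""
    residues = [aa for aa in peptide if aa != "*"]
    if not residues:
        return 0.0
    return sum(aa in POLAR for aa in residues) / len(residues)


peptide = translate("AACACCACTTTGGCTAAGAGA")
print(peptide, round(polar_fraction(peptide), 2), "P" in peptide)
```

The final value printed reports whether the translated fragment contains at least one proline, the second property mentioned above.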

There is some evidence (cf. WO 89/02463) to support that the presence of an asparagine-linked glycosylation site in the leader sequence may confer a higher secretion efficiency to the leader peptide. In the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3', p is therefore preferably 1.

The signal peptide sequence (SP) may encode any signal peptide which ensures an effective direction of the expressed heterologous polypeptide into the secretory pathway of the cell. The signal peptide may be a naturally occurring signal peptide or functional parts thereof, or it may be a synthetic peptide. Suitable signal peptides have been found to be the α-factor signal peptide, the signal peptide of mouse salivary amylase, a modified carboxypeptidase signal peptide, the yeast BAR1 signal peptide or the Humicola lanuginosa lipase signal peptide, or a derivative thereof. The mouse salivary amylase signal sequence is described by O. Hagenbüchle et al., Nature 289, 1981, pp. 643-646. The carboxypeptidase signal sequence is described by L.A. Valls et al., Cell 48, 1987, pp. 887-897. The BAR1 signal peptide is disclosed in WO 87/02670. The H. lanuginosa lipase signal peptide is disclosed in EP 305 216.

The yeast processing site encoded by the DNA sequence PS may suitably be any paired combination of Lys and Arg, such as Lys-Arg, Arg-Lys, Lys-Lys or Arg-Arg, which permits processing of the heterologous polypeptide by the KEX2 protease of Saccharomyces cerevisiae or the equivalent protease in other yeast species (D.A. Julius et al., Cell 37, 1984, 1075 ff.). If KEX2 processing is not convenient, e.g. if it would lead to cleavage of the polypeptide product, a processing site for another protease may be selected instead comprising an amino acid combination which is not found in the polypeptide product, e.g. the processing site for FX_a, Ile-Glu-Gly-Arg (cf. Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989).
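Whether KEX2 processing "would lead to cleavage of the polypeptide product" can be screened for by looking for paired Lys/Arg residues within the product itself. The sketch below is illustrative: the function name is hypothetical, and the insulin chain sequences used to assemble the precursor B(1-29)-Ala-Ala-Lys-A(1-21) are the standard human sequences, supplied here as an assumption rather than quoted from this document.

```python
# Paired basic residues named in the text as possible processing sites.
DIBASIC = {"KR", "RK", "KK", "RR"}


def dibasic_sites(peptide):
    """Return 0-based positions at which a Lys/Arg pair begins."""
    peptide = peptide.upper()
    return [i for i in range(len(peptide) - 1) if peptide[i:i + 2] in DIBASIC]


# Assumption: standard human insulin chains, not taken from this document.
B_1_29 = "FVNQHLCGSHLVEALYLVCGERGFFYTPK"
A_1_21 = "GIVEQCCTSICSLYQLENYCN"
MI3 = B_1_29 + "AAK" + A_1_21

print(dibasic_sites(MI3))           # internal sites in the product, if any
print(dibasic_sites("LAKR" + MI3))  # with a Lys-Arg processing site in front
```

An empty list for the product alone, together with a single hit at the leader/product junction, indicates that a Lys-Arg processing site could be used without an internal dibasic site in the product; a non-empty list for the product would point towards an alternative site such as the FX_a site mentioned above.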

The heterologous protein produced by the method of the invention may be any protein which may advantageously be produced in yeast. Examples of such proteins are aprotinin, tissue factor pathway inhibitor or other protease inhibitors, insulin or insulin precursors, human or bovine growth hormone, interleukin, glucagon, tissue plasminogen activator, transforming growth factor α or β, platelet-derived growth factor, enzymes, or a functional analogue thereof. In the present context, the term "functional analogue" is meant to indicate a polypeptide with a similar function as the native protein (this is intended to be understood as relating to the nature rather than the level of biological activity of the native protein). The polypeptide may be structurally similar to the native protein and may be derived from the native protein by addition of one or more amino acids to either or both the C- and N-terminal end of the native protein, substitution of one or more amino acids at one or a number of different sites in the native amino acid sequence, deletion of one or more amino acids at either or both ends of the native protein or at one or several sites in the amino acid sequence, or insertion of one or more amino acids at one or more sites in the native amino acid sequence. Such modifications are well known for several of the proteins mentioned above.

The random DNA fragment and the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' may be prepared synthetically by established standard methods, e.g. the phosphoramidite method described by S.L. Beaucage and M.H. Caruthers, Tetrahedron Letters 22, 1981, pp. 1859-1869, or the method described by Matthes et al., EMBO Journal 3, 1984, pp. 801-805. According to the phosphoramidite method, oligonucleotides are synthesized, e.g. in an automatic DNA synthesizer, purified, annealed, ligated and cloned into the yeast expression vector. It should be noted that the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' need not be prepared in a single operation, but may be assembled from two or more oligonucleotides prepared synthetically in this fashion. The random DNA fragment or one or more parts of the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' may also be of genomic or cDNA origin, for instance obtained by preparing a genomic or cDNA library and screening for DNA sequences coding for said parts (typically SP or *gene*) by hybridization using synthetic oligonucleotide probes in accordance with standard techniques (cf. Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989). In this case, a genomic or cDNA sequence encoding a signal peptide may be joined to a genomic or cDNA sequence encoding the heterologous protein, after which the DNA sequence may be modified by the insertion of synthetic oligonucleotides encoding the sequence X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS in accordance with well-known procedures. Finally, the random DNA fragment and/or the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' may be of mixed synthetic and genomic, mixed synthetic and cDNA or mixed genomic and cDNA origin prepared by annealing fragments of synthetic, genomic or cDNA origin (as appropriate), the fragments corresponding to various parts of the entire DNA sequence, in accordance with standard techniques.
Thus, it may be envisaged that the DNA sequence encoding the signal peptide or the heterologous polypeptide may be of genomic or cDNA origin, while the sequence X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS may be prepared synthetically.

Preferred DNA constructs encoding insulin precursors are as shown in Sequence Listings ID Nos. 1-13, or suitable modifications thereof. Examples of suitable modifications of the DNA sequence are nucleotide substitutions which do not give rise to another amino acid sequence of the protein, but which may correspond to the codon usage of the yeast organism into which the DNA construct is inserted or nucleotide substitutions which do give rise to a different amino acid sequence and therefore, possibly, a different protein structure. Other examples of possible modifications are insertion of three or multiples of three nucleotides into the sequence, addition of three or multiples of three nucleotides at either end of the sequence and deletion of three or multiples of three nucleotides at either end of or within the sequence.

The recombinant expression vector carrying the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' or 5'-SP-X_n-ranDNA-X_m-(NZT)_p-X_q-PS-*gene*-3' may be any vector which is capable of replicating in yeast organisms. In the vector, either DNA sequence should be operably connected to a suitable promoter sequence. The promoter may be any DNA sequence which shows transcriptional activity in yeast and may be derived from genes encoding proteins either homologous or heterologous to yeast. The promoter is preferably derived from a gene encoding a protein homologous to yeast. Examples of suitable promoters are the Saccharomyces cerevisiae MFα1, TPI, ADH or PGK promoters.

The sequences shown above should also be operably connected to a suitable terminator, e.g. the TPI terminator (cf. T. Alber and G. Kawasaki, J. Mol. Appl. Genet. 1, 1982, pp. 419-434).

The recombinant expression vector of the invention further comprises a DNA sequence enabling the vector to replicate in yeast. Examples of such sequences are the yeast plasmid 2μ replication genes REP 1-3 and origin of replication. The vector may also comprise a selectable marker, e.g. the Schizosaccharomyces pombe TPI gene as described by P.R. Russell, Gene 40, 1985, pp. 125-130.

The procedures used to ligate the sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3', the random DNA fragment, the promoter and the terminator, respectively, and to insert them into suitable yeast vectors containing the information necessary for yeast replication, are well known to persons skilled in the art (cf., for instance, Sambrook, Fritsch and Maniatis, op.cit.).

It will be understood that the vector may be constructed either by first preparing a DNA construct containing the entire sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' and subsequently inserting this fragment into a suitable expression vector, or by sequentially inserting DNA fragments containing genetic information for the individual elements (such as the signal peptide, the sequence X_n-3'-RS-5'-X_m-(NZT)_p-X_q or the heterologous polypeptide) followed by ligation.

The yeast organism used in the method of the invention may be any suitable yeast organism which, on cultivation, produces large amounts of the heterologous polypeptide in question. Examples of suitable yeast organisms may be strains of the yeast species Saccharomyces cerevisiae, Saccharomyces kluyveri, Schizosaccharomyces pombe or Saccharomyces uvarum. The transformation of the yeast cells may for instance be effected by protoplast formation followed by transformation in a manner known per se. The medium used to cultivate the cells may be any conventional medium suitable for growing yeast organisms. The secreted heterologous protein, a significant proportion of which will be present in the medium in correctly processed form, may be recovered from the medium by conventional procedures including separating the yeast cells from the medium by centrifugation or filtration, precipitating the proteinaceous components of the supernatant or filtrate by means of a salt, e.g. ammonium sulphate, followed by purification by a variety of chromatographic procedures, e.g. ion exchange chromatography, affinity chromatography, or the like.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is further illustrated with reference to the appended drawings wherein

Fig. 1 schematically shows the construction of pMT742δ;

Fig. 2 schematically shows the construction of pLaC202;

Fig. 3 shows the DNA sequence and derived amino acid sequence at the cloning site in pLaC202 for random DNA fragments (it should be noted that the sequence is cleaved in the unique ClaI site and that ligation without insertion of random DNA will lead to a change in the reading frame);

Fig. 4 schematically shows the construction of pLSC6315D#.

The invention is further described in the following examples which are not to be construed as limiting the scope of the invention as claimed.

EXAMPLES

Plasmids and DNA materials

All expression plasmids are of the C-POT type. Such plasmids are described in EP patent application No. 171 142 and are characterized in containing the Schizosaccharomyces pombe triose phosphate isomerase gene (POT) for the purpose of plasmid selection and stabilization. A plasmid containing the POT-gene is available from a deposited E. coli strain (ATCC 39685). The plasmids furthermore contain the S. cerevisiae triose phosphate isomerase promoter and terminator (P_TPI and T_TPI). They are identical to pMT742 (M. Egel-Mitani et al., Gene 73, 1988, pp. 113-120) (see fig. 1) except for the region defined by the SphI-XbaI restriction sites encompassing the P_TPI and the coding region for signal/leader/product.

The P_TPI has been modified with respect to the sequence found in pMT742, only in order to facilitate construction work. An internal SphI restriction site has been eliminated by SphI cleavage, removal of single-stranded tails and religation. Furthermore, DNA sequences upstream of, and without any impact on, the promoter have been removed by Bal31 exonuclease treatment followed by addition of an SphI restriction site linker. This promoter construction, present on a 373 bp SphI-EcoRI fragment, is designated P_TPIδ, and when used in plasmids already described this promoter modification is indicated by the addition of a δ to the plasmid name, e.g. pMT742δ (fig. 1).

The assembly of various DNA fragments has occasionally taken place in a smaller E. coli plasmid of the pT7 type previously described (cf. WO 89/02463), only modified with respect to the P_TPI as described above, i.e. pT7δ. For the random cloning described below, genomic DNA of various origins has been employed. S. cerevisiae DNA was isolated from strain MT663 (deposited on 7 December 1990 in the Deutsche Sammlung von Mikroorganismen und Zellkulturen under the provisions of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purpose of Patent Procedure with the deposit number DSM 6278). A. oryzae DNA was isolated from strain A1560 (IFO 4177).

Finally a number of synthetic DNA fragments have been employed all of which were synthesized on an automatic DNA synthesizer (Applied Biosystems model 380A) using phosphoramidite chemistry and commercially available reagents (S.L. Beaucage and M.H. Caruthers (1981) Tetrahedron Letters 22, 1859-1869). The oligonucleotides were purified by polyacrylamide gel electrophoresis under denaturing conditions. Prior to annealing complementary pairs of such DNA single strands these were kinased by T4 polynucleotide kinase and ATP.

All other methods and materials used are common state of the art knowledge (J. Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 1989).

Example 1

Construction of pLaC202

The 490 bp SphI-ApaI fragment of pT7 1965 (cf. WO 89/02463, fig. 5) and the 179 bp HinfI-XbaI fragment of pT7.αMI3 (cf. WO 89/02463, fig. 1) were joined in the 11 kb XbaI-SphI fragment of pMT742 via the synthetic adaptor

NOR367/373:     CAACCATCGATAACACCACTTTGGCTAAGAG
            CCGGGTTGGTAGCTATTGTGGTGAAACCGATTCTCTAA

resulting in the plasmid pLaC202 (fig. 2 and 3 as well as Sequence Listing ID No. 1).
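The two strands of the adaptor NOR367/373 can be checked for complementarity in a few lines. The sketch assumes that the second strand is printed 3' to 5' beneath the first and overhangs it by four nucleotides on the left and three on the right; under that reading the overhangs appear compatible with the ApaI and HinfI ends of the two fragments named above. The variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

upper = "CAACCATCGATAACACCACTTTGGCTAAGAG"         # first strand, 5' -> 3'
lower = "CCGGGTTGGTAGCTATTGTGGTGAAACCGATTCTCTAA"  # second strand as printed

# Assumption: `lower` is written 3' -> 5', so position-by-position it should be
# the plain complement of `upper` once the two overhangs are set aside.
left_overhang = lower[:4]
paired = lower[4:4 + len(upper)]
right_overhang = lower[4 + len(upper):]

is_complementary = paired == upper.translate(COMPLEMENT)
has_cla_site = "ATCGAT" in upper  # ClaI recognition sequence

print(is_complementary, has_cla_site, left_overhang, right_overhang)
```

The check also confirms that the adaptor carries the ClaI recognition sequence ATCGAT, which provides the cloning site used in the following examples.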

This vector containing a unique ClaI site constitutes one embodiment of the random DNA cloning vector in which the product gene codes for the insulin precursor MI3 (B(1-29)-Ala-Ala-Lys-A(1-21)). The following examples concern the leaders cloned via this construct.

Example 2

Construction of pLSC6315 and pLSC5210

Total DNA was isolated from S. cerevisiae strain MT663 and digested with TaqI, HinPI or TaqI + HinPI. The digests were separated according to size on a 1% agarose gel, and fragments smaller than 600 bp were isolated from each of the three digestions. pLaC202, previously digested with ClaI and dephosphorylated with calf intestine alkaline phosphatase (CIAP) to prevent self-ligation, was mixed with the fragment pools described above and ligated. E. coli strain MT172 (MT172 = MC 1000 m⁺r- ara⁺ leuB-6; MC 1000 (cf. M. Casadaban and S. Cohen, J.Mol.Biol. 138, 1980, p. 179)) was transformed with the above ligation mixtures, and approx. 5000 Ap^R transformants were obtained for each mixture. Recombinant plasmids were prepared from each of the three types in pools encompassing all 5000 transformants. These plasmid pools were used to transform S. cerevisiae strain MT663, and the resulting TPI⁺ transformants were immunoscreened for MI3 secretion.
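The digestion and size-selection step can be mimicked in silico. The sketch below relies on the known cleavage patterns T^CGA (TaqI) and G^CGC (HinPI, also written HinP1I), both of which leave 5'-CG overhangs compatible with a ClaI-cut (AT^CGAT) vector. The function names and the toy sequence are hypothetical; only fragments bounded by a cut on both sides are returned, since only those carry two clonable ends.

```python
import re

# Recognition sequence and cut offset on the top strand.
ENZYMES = {"TaqI": ("TCGA", 1), "HinPI": ("GCGC", 1)}


def cut_positions(seq, enzymes):
    """Sorted top-strand cut positions for the named enzymes."""
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Lookahead so that overlapping recognition sites are all found.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def clonable_fragments(seq, enzymes, max_len=600):
    """Top strands of fragments with a cut at both ends and length < max_len."""
    seq = seq.upper()
    cuts = cut_positions(seq, enzymes)
    fragments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return [fragment for fragment in fragments if len(fragment) < max_len]


toy = "AAATCGAGGGGCGCTTTTCGAAA"  # hypothetical sequence, not genomic data
print(clonable_fragments(toy, ["TaqI"]))
print(clonable_fragments(toy, ["TaqI", "HinPI"]))
```

Every returned top strand begins with CG, reflecting the shared overhang that allows the pooled fragments to be ligated into the ClaI site of pLaC202.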

Among the surprisingly large number of positive transformants the eight apparently most efficient were reisolated and the plasmid content isolated therefrom.

As a result of this procedure, it is expected that most of the yeast transformants obtained have a heterogeneous population of plasmids; to obtain true clones, a step of plasmid reisolation was therefore performed. The plasmid preparations from each of the eight reisolated yeast transformants were used to transform E. coli strain MT172 to Ap^R. Plasmids from 12 E. coli transformants for each of the eight yeast isolates were individually used to transform yeast strain MT663 to TPI⁺, and MI3-secreting transformants were identified by immunoscreening.

Sequencing of the inserts of the eight isolated pLaC202 derivatives showed three different sequences, two of which, pLSC6315 and pLSC5210, most efficiently support MI3 secretion. The sequences of the cloned DNA and flanking regions are shown in Sequence Listings ID Nos. 2 and 4, respectively.

Example 3

Modifications of pLSC6315

pLSC6315 was chosen for further modification of the cloned synthetic leader sequence. pLSC6315 was digested with the ApaI endonuclease followed by treatment with the exonuclease Bal31. After phenol extraction the resulting DNA was digested with XbaI, and DNA fragments smaller than the original 367 bp ApaI-XbaI fragment were isolated. pLaC202 was digested with ClaI, and the single-stranded CG tails generated were removed, followed by XbaI digestion and isolation of the 11 kb XbaI-]ClaI[ fragment ("] [" indicates that the single-stranded tails have been trimmed off). This fragment was mixed with the pLSC6315 fragments isolated above and ligated (fig. 6).

The transformation and screening procedure described in example 2 was repeated, and pLSC6315D3 and pLSC6315D7 were isolated as plasmids supporting MI3 secretion more efficiently than the original pLSC6315 (cf. Sequence Listings ID Nos. 6 and 8, respectively).

Example 4

Construction of pLAO1-5

Total DNA from Aspergillus oryzae strain A1560 was treated as previously described for S. cerevisiae DNA in example 2, and cloning and recloning were performed exactly as described in example 2, except that the number of E. coli transformants in the first cloning was reduced to approximately 3000 per ligation mixture. This experiment resulted in the isolation of five clones of A. oryzae DNA which in the pLaC202 context mediate secretion of the insulin precursor MI3 from S. cerevisiae.

Sequencing of the inserts showed 5 different sequences, two of which (pLAO2 and pLAO5) are more efficient where MI3 secretion is concerned. The sequences of the DNA inserts in pLAO2 and pLAO5 are shown, together with the flanking regions, in Sequence Listings ID Nos. 10 and 12, respectively.

Example 5

Yeast strains harbouring plasmids as described above were grown in YPD medium (Sherman, F. et al., Methods in Yeast Genetics, Cold Spring Harbor Laboratory 1981). For each strain, 6 individual 5 ml cultures were shaken at 30°C for 60 hours, with a final OD₆₀₀ of approx. 15. After centrifugation the supernatant was removed for HPLC analysis, by which the concentration of secreted insulin precursor was measured according to the method described by Leo Snel et al. (1987) Chromatographia 24, 329-332. In table I the expression levels of insulin precursor, MI3, by use of leader sequences isolated according to the present invention, are given as a percentage of the level obtained with transformants of pMT742δ, utilizing the MFα(1) leader of S. cerevisiae.

Table I

Plasmid        MI3 expression level (% of pMT742)
pMT742         100%
pLSC6315       100%
pLSC5210       60%
pLSC6315D3     175%
pLSC6315D7     120%
pLAO2          120%
pLAO5          60%

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Novo Nordisk A/S

(ii) TITLE OF INVENTION: A Method of Constructing Synthetic Leader Sequences

(iii) NUMBER OF SEQUENCES: 13

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Novo Nordisk A/S, Patent Department

(B) STREET: Novo Alle

(C) CITY: Bagsvaerd

(E) COUNTRY: Denmark

(F) ZIP: DK-2880

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Thalsoe-Madsen, Birgit

(C) REFERENCE/DOCKET NUMBER: 3540.204-WO

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: +45 4444 8888

(B) TELEFAX: +45 4449 3256

(C) TELEX: 37304

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 335 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GAATTCATTC AAGAATACTT CAAACAAGAA GATTACAAAC TATCAATTTC ATACACAATA 60

TAAACGATTA AAAGAATGAA AGTCTTCCTG CTGCTTTCCC TCATTGGATT CTGCTGGGCC 120

CAACCATCGA TAACACCACT TTGGCTAAGA GATTCGTTAA CCAACACTTG TGCGGTTCCC 180

ACTTGGTTGA AGCTTTGTAC TTGGTTTGCG GTGAAAGAGG TTTCTTCTAC ACTCCTAAGG 240

CTGCTAAGGG TATTGTCGAA CAATGCTGTA CCTCCATCTG CTCCTTCTAC CAATTGGAAA 300

ACTACTGCAA CTAGACGCAG CCCGCAGGCT CTAGA 335

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 492 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 76..468

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 76..309

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 310..468

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

GAATTCATTC AAGAATAGTT CAAACAAGAA GATTACAAAC TATCAATTTC ATACACAATA 60

TAAACGATTA AAAGA ATG AAA GTC TTC CTG CTG CTT TCC CTC ATT GGA TTC 111

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe

-78 -75 -70

TGC TGG GCC CAA CCA TCG ATA GAT GGA ACA CAT TTT CCG AAC AAC AAT 159 Cys Trp Ala Gln Pro Ser Ile Asp Gly Thr His Phe Pro Asn Asn Asn

-65 -60 -55

GTC CCA ATA GAC ACA AGA AAA GAA GGA CTA CAG CAT GAT TAC GAT ACA 207 Val Pro Ile Asp Thr Arg Lys Glu Gly Leu Gln His Asp Tyr Asp Thr

-50 -45 -40 -35

GAA ATT TTG GAG CAC ATT GGA AGC GAT GAG TTA ATT TTG AAT GAA GAG 255 Glu Ile Leu Glu His Ile Gly Ser Asp Glu Leu Ile Leu Asn Glu Glu

-30 -25 -20

TAT GTT ATT GAA AGA ACT TTG CAA GCC ATC GAT AAC ACC ACT TTG GCT 303 Tyr Val Ile Glu Arg Thr Leu Gln Ala Ile Asp Asn Thr Thr Leu Ala

-15 -10 -5

AAG AGA TTC GTT AAC CAA CAC TTG TGC GGT TCC CAC TTG GTT GAA GCT 351 Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala

1 5 10

TTG TAC TTG GTT TGC GGT GAA AGA GGT TTC TTC TAC ACT CCT AAG GCT 399 Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala

15 20 25 30

GCT AAG GGT ATT GTC GAA CAA TGC TGT ACC TCC ATC TGC TCC TTG TAC 447 Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr

35 40 45

CAA TTG GAA AAC TAC TGC AAC TAGACGCAGC CCGCAGGCTC TAGA 492 Gln Leu Glu Asn Tyr Cys Asn

50

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 131 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe Cys Trp Ala Gln

-78 -75 -70 -65

Pro Ser Ile Asp Gly Thr His Phe Pro Asn Asn Asn Val Pro Ile Asp

-60 -55 -50

Thr Arg Lys Glu Gly Leu Gln His Asp Tyr Asp Thr Glu Ile Leu Glu

-45 -40 -35

His Ile Gly Ser Asp Glu Leu Ile Leu Asn Glu Glu Tyr Val Ile Glu

-30 -25 -20 -15

Arg Thr Leu Gln Ala Ile Asp Asn Thr Thr Leu Ala Lys Arg Phe Val

-10 -5 1

Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr Leu Val

5 10 15

Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala Ala Lys Gly Ile

20 25 30

Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn

35 40 45 50

Tyr Cys Asn
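The feature locations given for SEQ ID NO: 2 can be cross-checked against the 131-residue length stated for SEQ ID NO: 3. The sketch below assumes that locations are 1-based and inclusive, as the listing's numbering suggests; the helper name `codon_count` is hypothetical.

```python
def codon_count(start: int, end: int) -> int:
    """Number of codons in a feature spanning nucleotides start..end (1-based, inclusive)."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"feature {start}..{end} is not a whole number of codons")
    return length // 3


# Feature locations as printed for SEQ ID NO: 2.
cds = (76, 468)
sig_peptide = (76, 309)
mat_peptide = (310, 468)

total = codon_count(*cds)
leader = codon_count(*sig_peptide)
mature = codon_count(*mat_peptide)

# The leader and mature parts should add up to the whole coding sequence.
print(total, leader, mature, leader + mature == total)
```

The same check can be applied to the CDS locations of SEQ ID NOs: 4, 6, 8, 10 and 12 and the lengths stated for the corresponding protein listings.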

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 420 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 76..396

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 76..237

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 238..396

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GAATTCATTC AAGAATAGTT CAAACAAGAA GATTACAAAC TATCAATTTC ATACACAATA 60

TAAACGATTA AAAGA ATG AAA GTC TTC CTG CTG CTT TCC CTC ATT GGA TTC 111

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe

-54 -50 -45

TGC TGG GCC CAA CCA TCG CTA TTG GAG TCA CTT ACG CTC GTT GAT GTT 159 Cys Trp Ala Gln Pro Ser Leu Leu Glu Ser Leu Thr Leu Val Asp Val

-40 -35 -30

GAC GCA CTG TCG GAT ATT GAT GTA CTT GTT GAG TCT GAA ACG CTT GTG 207 Asp Ala Leu Ser Asp Ile Asp Val Leu Val Glu Ser Glu Thr Leu Val

-25 -20 -15

CTT GTC GAT AAC ACC ACT TTG GCT AAG AGA TTC GTT AAC CAA CAC TTG 255 Leu Val Asp Asn Thr Thr Leu Ala Lys Arg Phe Val Asn Gln His Leu

-10 -5 1 5

TGC GGT TCC CAC TTG GTT GAA GCT TTG TAC TTG GTT TGC GGT GAA AGA 303 Cys Gly Ser His Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg

10 15 20

GGT TTC TTC TAC ACT CCT AAG GCT GCT AAG GGT ATT GTC GAA CAA TGC 351 Gly Phe Phe Tyr Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys

25 30 35

TGT ACC TCC ATC TGC TCC TTG TAC CAA TTG GAA AAC TAC TGC AAC 396

Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn

40 45 50

TAGACGCAGC CCGCAGGCTC TAGA 420

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 107 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe Cys Trp Ala Gln

-54 -50 -45 -40

Pro Ser Leu Leu Glu Ser Leu Thr Leu Val Asp Val Asp Ala Leu Ser

-35 -30 -25

Asp Ile Asp Val Leu Val Glu Ser Glu Thr Leu Val Leu Val Asp Asn

-20 -15 -10

Thr Thr Leu Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His

-5 1 5 10

Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr

15 20 25

Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile

30 35 40

Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn

45 50

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 453 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 76..420

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 76..270

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 271..420

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GAATTCATTC AAGAATAGTT CAAACAAGAA GATTACAAAC TATCAATTTC ATACACAATA 60

TAAACGATTA AAAGA ATG AAA GTC TTC CTG CTG CTT TCC CTC ATT GGA TTC 111

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe

-65 -60 -55

TGC TGG GCC CAA CCA ATA GAC ACA AGA AAA GAA GGA CTA CAG CAT GAT 159 Cys Trp Ala Gln Pro Ile Asp Thr Arg Lys Glu Gly Leu Gln His Asp

-50 -45 -40

TAC GAT ACA GAA ATT TTG GAG CAC ATT GGA AGC GAT GAG TTA ACC CCG 207 Tyr Asp Thr Glu Ile Leu Glu His Ile Gly Ser Asp Glu Leu Thr Pro

-35 -30 -25

AAT GAA GAG TAT GTT ATT GAA AGA ACT TTG CAA GCC ATC GAT AAC ACC 255 Asn Glu Glu Tyr Val Ile Glu Arg Thr Leu Gln Ala Ile Asp Asn Thr

-20 -15 -10

ACT TTG GCT AAG AGA TTC GTT AAC CAA CAC TTG TGC GGT TCC CAC TTG 303 Thr Leu Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His Leu

-5 1 5 10

GTT GAA GCT TTG TAC TTG GTT TGC GGT GAA AGA GGT TTC TTC TAC ACT 351 Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr

15 20 25

CCT AAG GCT GCT AAG GGT ATT GTC GAA CAA TGC TGT ACC TCC ATC TGC 399 Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys

30 35 40

TCC TTG TAC CAA TTG GAA AAC TACTGCAACT AGACGCAGCC CGCAGGCTCT 450

Ser Leu Tyr Gln Leu Glu Asn

45 50

AGA 453

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe Cys Trp Ala Gln

-65 -60 -55 -50

Pro Ile Asp Thr Arg Lys Glu Gly Leu Gln His Asp Tyr Asp Thr Glu

-45 -40 -35

Ile Leu Glu His Ile Gly Ser Asp Glu Leu Thr Pro Asn Glu Glu Tyr

-30 -25 -20

Val Ile Glu Arg Thr Leu Gln Ala Ile Asp Asn Thr Thr Leu Ala Lys

-15 -10 -5

Arg Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu

1 5 10 15

Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala Ala

20 25 30

Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln

35 40 45

Leu Glu Asn

50

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 459 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 76..435

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 76..276

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 277..435

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

GAATTCATTC AAGAATAGTT CAAACAAGAA GATTACAAAC TATCAATTTC ATACACAATA 60

TAAACGATTA AAAGA ATG AAA GTC TTC CTG CTG CTT TCC CTC ATT GGA TTC 111

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe

-67 -65 -60

TGC TGG GCC CAA CCT GTC CCA ATA GAC ACA AGA AAA GAA GGA CTA CAG 159 Cys Trp Ala Gln Pro Val Pro Ile Asp Thr Arg Lys Glu Gly Leu Gln

-55 -50 -45 -40

CAT GAT TAC GAT ACA GAA ATT TTG GAG CAC ATT GGA AGC GAT GAG TTA 207 His Asp Tyr Asp Thr Glu Ile Leu Glu His Ile Gly Ser Asp Glu Leu

-35 -30 -25

ACC CCG AAT GAA GAG TAT GTT ATT GAA AGA ACT TTG CAA GCC ATC GAT 255 Thr Pro Asn Glu Glu Tyr Val Ile Glu Arg Thr Leu Gln Ala Ile Asp

-20 -15 -10

AAC ACC ACT TTG GCT AAG AGA TTC GTT AAC CAA CAC TTG TGC GGT TCC 303 Asn Thr Thr Leu Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser

-5 1 5

CAC TTG GTT GAA GCT TTG TAC TTG GTT TGC GGT GAA AGA GGT TTC TTC 351 His Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe

10 15 20 25

TAC ACT CCT AAG GCT GCT AAG GGT ATT GTC GAA CAA TGC TGT ACC TCC 399 Tyr Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser

30 35 40

ATC TGC TCC TTG TAC CAA TTG GAA AAC TAC TGC AAC TAGACGCAGC 445 Ile Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn

45 50

CCGCAGGCTC TAGA 459

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 120 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe Cys Trp Ala Gln

-67 -65 -60 -55

Pro Val Pro Ile Asp Thr Arg Lys Glu Gly Leu Gln His Asp Tyr Asp

-50 -45 -40

Thr Glu Ile Leu Glu His Ile Gly Ser Asp Glu Leu Thr Pro Asn Glu

-35 -30 -25 -20

Glu Tyr Val Ile Glu Arg Thr Leu Gln Ala Ile Asp Asn Thr Thr Leu

-15 -10 -5

Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu

1 5 10

Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys

15 20 25

Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu

30 35 40 45

Tyr Gln Leu Glu Asn Tyr Cys Asn

50

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 408 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 76..384

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 76..225

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 226..384

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GAATTCATTC AAGAATAGTT CAAACAAGAA GATTACAAAC TATCAATTTC ATACACAATA 60

TAAACGATTA AAAGA ATG AAA GTC TTC CTG CTG CTT TCC CTC ATT GGA TTC 111

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe

-50 -45 -40

TGC TGG GCC CAA CCA TCG ATC TTG GAT TAT GTT GAC TTG GGT GCG GAA 159 Cys Trp Ala Gln Pro Ser Ile Leu Asp Tyr Val Asp Leu Gly Ala Glu

-35 -30 -25

CTG ATC TCC ATT CGT GGG TAT GAT AAC CTC AAC GAC GCG ATC GAT AAC 207 Leu Ile Ser Ile Arg Gly Tyr Asp Asn Leu Asn Asp Ala Ile Asp Asn

-20 -15 -10

ACC ACT TTG GCT AAG AGA TTC GTT AAC CAA CAC TTG TGC GGT TCC CAC 255 Thr Thr Leu Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His

-5 1 5 10

TTG GTT GAA GCT TTG TAC TTG GTT TGC GGT GAA AGA GGT TTC TTC TAC 303 Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr

15 20 25

ACT CCT AAG GCT GCT AAG GGT ATT GTC GAA CAA TGC TGT ACC TCC ATC 351 Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile

30 35 40

TGC TCC TTG TAC CAA TTG GAA AAC TAC TGC AAC TAGACGCAGC CCGCAGGCTC 404

Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn

45 50

TAGA 408

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe Cys Trp Ala Gln

-50 -45 -40 -35

Pro Ser Ile Leu Asp Tyr Val Asp Leu Gly Ala Glu Leu Ile Ser Ile

-30 -25 -20

Arg Gly Tyr Asp Asn Leu Asn Asp Ala Ile Asp Asn Thr Thr Leu Ala

-15 -10 -5

Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala

1 5 10

Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala

15 20 25 30

Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr

35 40 45

Gln Leu Glu Asn Tyr Cys Asn

50

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 372 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 76..348

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 76..189

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 190..348

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

GAATTCATTC AAGAATAGTT CAAACAAGAA GATTACAAAC TATCAATTTC ATACACAATA 60

TAAACGATTA AAAGA ATG AAA GTC TTC CTG CTG CTT TCC CTC ATT GGA TTC 111

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe

-38 -35 -30

TGC TGG GCC CAA CCA TCG CAC ACT ACC ATC GGC ACC GCA ACT GAC AAA 159 Cys Trp Ala Gln Pro Ser His Thr Thr Ile Gly Thr Ala Thr Asp Lys

-25 -20 -15

AAC ATC GAT AAC ACC ACT TTG GCT AAG AGA TTC GTT AAC CAA CAC TTG 207 Asn Ile Asp Asn Thr Thr Leu Ala Lys Arg Phe Val Asn Gln His Leu

-10 -5 1 5

TGC GGT TCC CAC TTG GTT GAA GCT TTG TAC TTG GTT TGC GGT GAA AGA 255 Cys Gly Ser His Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg

10 15 20

GGT TTC TTC TAC ACT CCT AAG GCT GCT AAG GGT ATT GTC GAA CAA TGC 303 Gly Phe Phe Tyr Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys

25 30 35

TGT ACC TCC ATC TGC TCC TTG TAC CAA TTG GAA AAC TAC TGC AAC 348 Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn

40 45 50

TAGACGCAGC CCGCAGGCTC TAGA 372

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 91 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Lys Val Phe Leu Leu Leu Ser Leu Ile Gly Phe Cys Trp Ala Gln

-38 -35 -30 -25

Pro Ser His Thr Thr Ile Gly Thr Ala Thr Asp Lys Asn Ile Asp Asn

-20 -15 -10

Thr Thr Leu Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His

-5 1 5 10

Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr

15 20 25

Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile

30 35 40

Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn

45 50
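The pairing of each nucleotide listing with its protein listing can be reproduced by translating the coding region with the standard genetic code. The following is a minimal sketch, not the tool used to prepare the listing: the function name `translate` is hypothetical, and the input shown is only the first twelve codons of the coding region that the nucleotide listings share.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) into three-letter residues, stopping at a stop codon."""
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":          # stop codon ends the reading frame
            break
        residues.append(THREE_LETTER[aa])
    return " ".join(residues)


# First twelve codons of the coding region (positions 76..111 in the listings).
print(translate("ATG AAA GTC TTC CTG CTG CTT TCC CTC ATT GGA TTC"))
```

Comparing such a translation with the residue line printed under each codon row is a quick way to spot transcription errors in either line.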

Claims

1. A method of constructing a synthetic leader peptide sequence for secreting heterologous polypeptides in yeast, the method comprising

(a) inserting a random DNA fragment into a yeast expression vector comprising the following sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' wherein SP is a DNA sequence encoding a signal peptide,

(NZT) is a DNA sequence encoding Asn-Xaa-Thr, wherein p is 0 or 1,

PS is a DNA sequence encoding a peptide defining a yeast processing site, and

*gene* is a DNA sequence encoding a heterologous polypeptide;

(b) transforming a yeast host cell with the expression vector of step (a);

2. A method according to claim 1, wherein the random DNA fragment inserted in the vector is of genomic or synthetic origin.

3. A method according to claim 1 or 2, wherein the random DNA fragment has a length of from 6 to about 600 base pairs.

4. A method according to any of claims 1-3, wherein the random DNA fragment encodes a high proportion of polar amino acids.

5. A method according to any of claims 1-4, wherein the random DNA fragment encodes at least one proline.

6. A method according to claim 1, wherein n and/or m and/or q are ≥1.

7. A method according to claim 1, wherein p is 1.

8. A method according to claim 1, wherein SP is a DNA sequence encoding the α-factor signal peptide, the signal peptide of mouse salivary amylase, the carboxypeptidase signal peptide, the yeast BAR1 signal peptide, or the Humicola lanuginosa lipase signal peptide, or a derivative thereof.

9. A method according to claim 1, wherein PS is a DNA sequence encoding Lys-Arg, Arg-Lys, Lys-Lys, Arg-Arg or Ile-Glu-Gly-Arg.

10. A method according to claim 1, wherein the heterologous polypeptide is selected from the group consisting of aprotinin, tissue factor pathway inhibitor or other protease inhibitors, insulin or insulin precursors, human or bovine growth hormone, interleukin, glucagon, tissue plasminogen activator, transforming growth factor α or β, platelet-derived growth factor, enzymes, or a functional analogue thereof.

11. A yeast expression cloning vector comprising the following sequence 5'-SP-X_n-3'-RS-5'-X_m-(NZT)_p-X_q-PS-*gene*-3' wherein SP is a DNA sequence encoding a signal peptide,

RS is a restriction endonuclease recognition site provided at the junction of X_n and X_m,

(NZT) is a DNA sequence encoding Asn-Xaa-Thr, wherein p is 0 or 1,

PS is a DNA sequence encoding a peptide defining a yeast processing site, and

*gene* is a DNA sequence encoding a heterologous polypeptide.

12. A vector according to claim 11, wherein n and/or m and/or q are ≥1.

13. A vector according to claim 11, wherein p is 1.

14. A vector according to claim 11, wherein SP is a DNA sequence encoding the α-factor signal peptide, the signal peptide of mouse salivary amylase, the carboxypeptidase signal peptide or the yeast BAR1 signal peptide.

15. A vector according to claim 11, wherein PS is a DNA sequence encoding Lys-Arg, Arg-Lys, Arg-Arg, Lys-Lys or Ile-Glu-Gly-Arg.

16. A vector according to claim 11, wherein the heterologous polypeptide is selected from the group consisting of aprotinin, extrinsic pathway inhibitor or other protease inhibitors, insulin or insulin precursors, human or bovine growth hormone, interleukin, glucagon, tissue plasminogen activator, transforming growth factor α or β, platelet-derived growth factor, enzymes, or a functional analogue thereof.

17. A yeast expression vector comprising the following sequence 5'-SP-X_n-ranDNA-X_m-(NZT)_p-X_q-PS-*gene*-3' wherein SP is a DNA sequence encoding a signal peptide,

ranDNA is a random DNA fragment inserted in a restriction endonuclease recognition site provided at the junction of X_n and X_m,

(NZT) is a DNA sequence encoding Asn-Xaa-Thr, wherein p is 0 or 1,

PS is a DNA sequence encoding a peptide defining a yeast processing site, and

*gene* is a DNA sequence encoding a heterologous polypeptide, the sequence X_n-X_q encoding a leader peptide sequence.

18. A vector according to claim 17, wherein the random DNA fragment inserted in the vector is of genomic or synthetic origin.

19. A vector according to claim 17 or 18, wherein the random DNA fragment has a length of from 6 to about 600 base pairs.

20. A vector according to any of claims 17-19, wherein the random DNA fragment encodes a high proportion of polar amino acids.

21. A vector according to any of claims 17-20, wherein the random DNA fragment encodes at least one proline.

22. A vector according to claim 17, wherein n and/or m and/or q are ≥1.

23. A vector according to claim 17, wherein p is 1.

24. A vector according to claim 17, wherein SP is a DNA sequence encoding the α-factor signal peptide, the signal peptide of mouse salivary amylase, the carboxypeptidase signal peptide, the yeast BAR1 signal peptide, or the Humicola lanuginosa lipase signal peptide, or a derivative thereof.

25. A vector according to claim 17, wherein PS is a DNA sequence encoding Lys-Arg, Arg-Lys, Arg-Arg, Lys-Lys or Ile-Glu-Gly-Arg.

26. A vector according to claim 17, wherein the heterologous polypeptide is selected from the group consisting of aprotinin, tissue factor pathway inhibitor or other protease inhibitors, insulin or insulin precursors, human or bovine growth hormone, interleukin, glucagon, tissue plasminogen activator, transforming growth factor α or β, platelet-derived growth factor, enzymes, or a functional analogue thereof.

27. A yeast cell which is capable of expressing a heterologous polypeptide and which is transformed with a yeast expression vector according to any of claims 11-16.

28. A yeast cell which is capable of expressing a heterologous polypeptide and which is transformed with a yeast expression vector according to any of claims 17-26.

29. A process for producing a heterologous polypeptide in yeast, the process comprising culturing a yeast cell, which is capable of expressing a heterologous polypeptide and which is transformed with a yeast expression vector according to any of claims 17-26 including a leader peptide sequence constructed by the method of claim 1, in a suitable medium to obtain expression and secretion of the heterologous polypeptide, after which the heterologous polypeptide is recovered from the medium.
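The (NZT) and PS elements recited in the claims can be located in a candidate leader peptide with a simple pattern search. The sketch below is illustrative only and rests on several assumptions: the peptide is written in one-letter code, Xaa is taken to be any residue, only the Lys-Arg form of the processing site is searched for, and the function name `find_leader_motifs` is hypothetical. The example peptide is the end of the leader in SEQ ID NO: 3 (Ala-Ile-Asp-Asn-Thr-Thr-Leu-Ala-Lys-Arg) followed by the first six residues of the insulin precursor.

```python
import re


def find_leader_motifs(peptide: str) -> dict:
    """Return 0-based start positions of Asn-Xaa-Thr and Lys-Arg in a one-letter peptide."""
    # A lookahead is used so that overlapping Asn-Xaa-Thr matches are all reported.
    sequons = [m.start() for m in re.finditer(r"(?=N.T)", peptide)]
    processing_sites = [m.start() for m in re.finditer(r"KR", peptide)]
    return {"NZT": sequons, "PS": processing_sites}


# Leader end from SEQ ID NO: 3 plus the start of the mature sequence.
junction = "AIDNTTLAKRFVNQHL"
print(find_leader_motifs(junction))
```

A screen based on such a search would only flag where the motifs occur; whether a given leader supports secretion still has to be determined experimentally, as in the examples.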