WO1990006369A1

WO1990006369A1 - Use of HIV protease cleavage site to express mature proteins

Info

Publication number: WO1990006369A1
Application number: PCT/US1989/005009
Authority: WO
Inventors: Christine Marie Debouck
Original assignee: Smithkline Beecham Corporation
Priority date: 1988-12-09
Filing date: 1989-11-07
Publication date: 1990-06-14
Also published as: AU4838690A

Abstract

The invention discloses methods for producing mature protein products. A host cell is co-transformed with two recombinant DNA molecules and cultured. The first recombinant DNA molecule comprises a fusion protein nucleotide sequence, wherein the encoded fusion protein comprises a first portion, a protein product portion and a protease cleavage site portion therebetween. The second recombinant DNA molecule comprises a coding sequence for a protease, or an active fragment or derivative thereof, capable of cleaving the amino acids encoded by the protease cleavage site portion. Upon culturing the cell, both the fusion protein and the protease are expressed. The protease cleaves the fusion protein at the protease cleavage site to thereby produce the protein product. In preferred embodiments of the invention, the fusion protein nucleotide sequence and the protease nucleotide sequence are incorporated into two expression vectors that are then inserted into the same cell for expression and production of the mature protein product.

Description

USE OF HIV PROTEASE CLEAVAGE SITE TO EXPRESS MATURE PROTEINS

Field of Invention

This invention relates to the field of recombinant DNA techniques for production of proteins and polypeptides.

Background of Invention

There are numerous gene products of biological interest that cannot be obtained from natural sources in quantities sufficient for detailed biochemical and physical analysis. Moreover, the limited bioavailability of these molecules has made it impossible to consider their potential utilization as either pharmacological agents or targets. One solution to this problem has been the development of recombinant vector systems that are designed to achieve efficient expression of cloned genes in bacteria or other recombinant hosts. The rationale used in the design of these systems involves insertion of the gene of interest into a multicopy vector system (usually a plasmid), such that the gene is efficiently transcribed and translated in a host cell and the product of the gene of interest is produced in quantity.

Nevertheless, high-level expression of many genes in host cells has proved difficult to achieve even when placed under the control of a strong promoter. To overcome this problem, many proteins have been efficiently produced as hybrids after fusion of the gene coding for the protein with a sequence coding for a whole or part of a highly expressible protein. However, such hybrid proteins are not usually suitable for functional studies or pharmaceutical or other commercial use, unless the authentic protein sequence can be released by cleavage of the fusion protein portion from the protein portion of interest.

Fusion proteins can be cleaved by chemical means, such as cyanogen bromide. More specific cleavage of hybrid proteins can be achieved by inserting a nucleotide sequence encoding a protease cleavage site or other selectively cleavable site into the nucleotide sequence encoding the hybrid protein, between the sequences encoding the respective proteins. For example, Nagai and Thogersen, Nature, 309:810-812 (1984) reported an expression/cleavage system employing the recognition site for blood coagulation factor X_a. Plasmids were constructed comprising nucleic acid sequences encoding the 31 amino terminal residues of Lambda cII protein, a sequence encoding the tetrapeptide Ile-Glu-Gly-Arg, and the sequence encoding beta-globin. Ile-Glu-Gly-Arg is the sequence that immediately precedes the two factor X_a cleavage sites in prothrombin. The purified hybrid protein was harvested and subsequently digested with blood coagulation factor X_a, which had been activated with Russell's viper venom.

Germino and Bastia, Proc. Nat'l Acad. Sci., USA, 81:4692-4696 (1984) described a similar system using an initiator sequence, a sequence encoding 60 amino acids of chicken pro-alpha-2 collagen and a marker gene. The resulting hybrid protein was purified and subsequently cleaved by collagenase. This system, however, had the drawback of apparently non-specific proteolysis when high concentrations of collagenase were used to cleave the hybrid protein, and it was necessary to carefully titrate each batch of collagenase to ensure specific cleavage and to guard against this apparently non-specific proteolysis.

U.S. Patent 4,751,180, issued June 14, 1988 to Cousens et al, discloses methods and compositions for producing heterologous proteins in a host organism such as yeast. A DNA sequence coding for superoxide dismutase or other highly expressed protein is joined to a DNA sequence coding for a polypeptide of interest by a selectively cleavable linkage. The fusion protein thus produced can then be purified and subsequently cleaved by chemical or enzymatic means to produce the polypeptide of interest.

These methods of producing proteins can be time-consuming and costly, especially when the production of large quantities for commercial use is considered.

Cleavage of fusion proteins with certain types of proteases or chemical reagents, e.g., trypsin or cyanogen bromide, is relatively non-specific. As a result, functional proteins are difficult to produce with these methods. In the case of site-specific proteases, cleavage requires an extra processing step following production of a fusion protein. Poor solubility of the fusion proteins in the cells that produce them is also often a drawback of these methods.

It is an object of this invention to provide methods for producing mature protein products which are less time-consuming, but which can also provide protein products in mature form in large quantities.

Summary of Invention

The invention provides methods of preparing a protein product by expressing, in the same cell, a fusion protein and a protease capable of cleaving the protease cleavage site contained in the fusion protein, to generate the mature protein product.

Complete processing of the fusion protein within the same cell by methods of the invention has several advantages over the prior methods. HIV-1 protease cleavage of a fusion protein having an HIV-1 protease cleavage site to produce the selected protein product is more selective than cleavage with chemical reagents or enzymes such as trypsin. The amino acid sequence recognized by these latter methods may appear several times in the fusion protein, and the use of these methods will result in several, usually non-functional, protein fragments. The retroviral cleavage site is not commonly found in proteins; thus, the likelihood of inadvertently cleaving the protein product while cleaving the deliberate retroviral protease cleavage site is significantly reduced.
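The selectivity argument can be illustrated with a minimal sketch. It assumes a simplified trypsin rule (cleavage after Lys or Arg unless the next residue is Pro) and uses made-up carrier and product sequences; only the heptapeptide site is taken from the examples given in this description. The function and variable names are hypothetical.

```python
import re

def trypsin_like_cut_count(seq: str) -> int:
    """Count bonds after Lys (K) or Arg (R) that are not followed by Pro (P).

    Simplified rule for illustration; a C-terminal K or R has no bond to cut.
    """
    return len(re.findall(r"[KR](?=[^P])", seq))

def site_count(seq: str, site: str) -> int:
    """Count occurrences (overlaps allowed) of an exact recognition sequence."""
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i))

# Hypothetical sequences, one-letter amino acid codes (not from the source).
carrier = "MKRLTAEKGR"
product = "AKDLRTEGKW"
# Ser.Phe.Asn.Phe.Pro.Gly.Ile, one of the example cleavage sites.
site = "SFNFPGI"

fusion = carrier + site + product

print("trypsin-like cut positions:", trypsin_like_cut_count(fusion))
print("engineered site occurrences:", site_count(fusion, site))
```

In this toy fusion the short trypsin-like motif recurs throughout both the carrier and the product, whereas the seven-residue site occurs only where it was deliberately placed.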

The invention provides the efficient production of protein products in quantities sufficient for commercial or research purposes. Because fusion proteins are synthesized and processed in the same cell, the invention eliminates the need to first purify the fusion protein, release the protein product and then isolate the protein product in a second, separate purification process. With the methods of the invention, the protein product is present in cells in a mature form. There should be little or no cost for large quantities of purified protease to cleave the fusion protein, since the organism is producing the appropriate enzyme at the same time it is producing the fusion protein. The methods of the invention will result in more efficient, less time-consuming and more economical production of protein products.

In the methods of the invention, a first recombinant DNA molecule comprising a fusion protein nucleotide sequence coding for a fusion protein is provided. The fusion protein nucleotide sequence comprises a first portion, a protein product portion and a protease cleavage site portion therebetween. The first portion preferably codes for a protein that is well-expressed in the host cell, to ensure high levels of production of the fusion protein. The protease cleavage site portion codes for amino acids recognized by the protease selected for use in the invention. The protein product portion codes for a selected protein product. The amino acids of the cleavage site act as substrate for the proteolytic enzyme. Cleavage of the protease recognition site releases the protein product from its association with the first portion of the fusion protein.
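As a rough illustration of the three-part arrangement, the sketch below models a fusion protein as first portion, cleavage site and product, then splits it at the scissile bond. The class, the function and the first-portion and product sequences are hypothetical; the site and the Phe/Pro cleavage position follow the HIV cleavage site described in this document.

```python
from typing import NamedTuple, Tuple

class Fusion(NamedTuple):
    first: str    # well-expressed first portion (one-letter codes)
    site: str     # protease cleavage site portion
    product: str  # protein product portion

    @property
    def sequence(self) -> str:
        return self.first + self.site + self.product

def cleave(fusion: Fusion, scissile_offset: int) -> Tuple[str, str]:
    """Split the fusion at the scissile bond inside the cleavage site.

    scissile_offset is the number of site residues that stay on the
    N-terminal (first portion) fragment.
    """
    if not 0 <= scissile_offset <= len(fusion.site):
        raise ValueError("scissile bond must lie within the cleavage site")
    cut = len(fusion.first) + scissile_offset
    seq = fusion.sequence
    return seq[:cut], seq[cut:]

# Hypothetical first portion and product; the site is Ser.Phe.Asn.Phe.Pro.Gly.Ile.
fusion = Fusion(first="MTEQ", site="SFNFPGI", product="AKDLW")

# Cleavage between Phe and Pro leaves four site residues (S, F, N, F) upstream.
leader, released = cleave(fusion, scissile_offset=4)
print(leader, released)
```

The released fragment starts with the site residues downstream of the scissile bond, which is why a protein product may carry one or more amino acids derived from the cleavage site portion.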

A second recombinant DNA molecule comprising a coding sequence for a protease or an active fragment or derivative thereof, capable of cleaving amino acids encoded by the protease cleavage site portion is also provided. This nucleotide sequence codes for a functional protease that enzymatically releases the protein product from the fusion protein.

The first and second recombinant DNA molecules are then introduced into a cell for expression of the nucleotide sequences and production of the protein product. Next, the cells are cultured under conditions selected to express the fusion protein and the protease, whereby the protease cleaves the fusion protein at the protease cleavage site to thereby produce the mature protein product. The mature protein product may then be isolated by conventional methods.

The fusion protein nucleotide sequence and the protease nucleotide sequence may be incorporated into at least one expression vector and then introduced into the cell, or the two sequences may be incorporated into the genome of the cell for expression of the fusion protein and protease. In preferred embodiments of the invention, the first and second recombinant DNA molecules are incorporated into two expression vectors, the first of which comprises the first recombinant DNA molecule and the second of which comprises the second recombinant DNA molecule. If two expression vectors are employed, it is desirable that the two expression vectors are compatible in the same cell. The protease and cleavage site are preferably derived from HIV-1. The first and second recombinant DNA molecules may further comprise at least one expression control sequence operatively linked with the fusion protein nucleotide sequence and the protease nucleotide sequence, respectively.

The invention also provides fusion protein nucleotide sequences as described above, nucleotide sequences coding for HIV-1 protease or proteolytically active fragments or derivatives thereof, recombinant E. coli protease products and proteases specific for a cleavage site.

The invention further provides cells having incorporated therein two recombinant DNA molecules: a first recombinant DNA molecule comprising a fusion protein nucleotide sequence coding for a fusion protein, and a second recombinant DNA molecule comprising a nucleotide sequence coding for a protease capable of cleaving the fusion protein at the protease cleavage site. The fusion protein nucleotide sequences are those described herein for the production of fusion proteins. The cells of the invention preferably have two expression vectors, one of which comprises the first recombinant DNA molecule, and the other of which comprises the second recombinant DNA molecule.

Detailed Description of the Invention

The invention provides nucleotide sequences, peptides, cells and methods for producing protein products. In preferred embodiments of the invention, a retroviral protease is coupled with a retroviral protease cleavage site for production of mature protein products. Preferred retroviruses are the immunodeficiency viruses; more preferably, Human Immunodeficiency Virus-1 (HIV-1).

Retroviruses, that is, viruses within the family Retroviridae, are a large family of enveloped, icosahedral viruses of about 150 nm having a coiled nucleocapsid within the core structure and having RNA as the genetic material. The family comprises the oncoviruses, such as the sarcoma and leukemia viruses, the immunodeficiency viruses and the lentiviruses.

HIV, the causative agent of acquired immunodeficiency syndrome and related disorders, is a member of the Retroviridae family. There exist several isolates of HIV, including human T-lymphotropic virus type-III (HTLV-III), the lymphadenopathy virus (LAV) and the AIDS-associated retrovirus (ARV), which have been grouped in type 1.

Related Immunodeficiency Viruses include HIV type 2, which has been shown to be associated with AIDS in West Africa.

Other Immunodeficiency Viruses include the Simian Immunodeficiency Virus (SIV), such as SIV_mac-BK28 [see Hirsch et al, Cell, 49:307 (1987)].

Molecular characterization of the HIV genome has demonstrated that the virus exhibits the same overall gag-pol-env genetic organization as other retroviruses. HIV also contains a highly specific protease responsible for post-translational processing of HIV precursor proteins. The HIV gag region is initially translated into a polyprotein precursor of about 55 kDa that is then processed into the mature p17, p24, and p15 gag structural proteins. Similarly, the gag-pol region is believed to be translated into a larger precursor through a translational frameshift between the overlapping gag and pol reading frames. This gag-pol precursor is post-translationally processed as well, to yield the mature gag proteins and the products of the pol region, including protease, reverse transcriptase and endonuclease.

In several retroviruses, the proteolytic maturation of the gag and gag-pol polyproteins has been shown to be effected, at least in part, by a highly-specific protease that is virally encoded between the gag and pol regions. The retrovirus protease is essential to the retroviral life cycle, as indicated by the production of noninfectious, replication-deficient virions by Moloney murine leukemia virus variants mutated in the protease coding region.

The HIV protease coding region is believed to be positioned between the p15 gag and reverse transcriptase genes and to reside entirely within the pol reading frame. Determination of the amino-terminal sequences of HIV gag p24 and reverse transcriptase proteins and knowledge of the surrounding sequences have revealed a common sequence [Asn-(Phe or Tyr)*Pro-Ile] at the cleavage site, with cleavage occurring at the asterisk between (phenylalanine or tyrosine) and proline. A similar sequence (Asn-Phe-Pro-Gln-Ile) occurs at the amino terminus of the protease; see Debouck et al, Proc. Nat'l Acad. Sci., 84:8903-8906, December, 1987, which is specifically incorporated as if fully set forth herein.

The first and second recombinant DNA sequences are generally introduced into a cell for production of the protein product either by incorporation into at least one expression vector that is then introduced into the cell, or by integration into the genome of the host cell.

While, for the most part, the nucleic acid sequences of the invention will be prepared as a single entity, it should be appreciated that they may be prepared as various fragments, these fragments joined in expression cassettes and expression vectors to various untranslated regions providing for particular functions, and ultimately the coding sequences brought together at a subsequent stage. For clarity, however, the discussion will be directed primarily to the situation where the coding sequence is prepared as a single entity and then transferred to an expression vector.

The fusion protein nucleic acid sequences of the invention preferably comprise a first portion, a protease cleavage site portion and a protein product portion. The parts of the fusion protein nucleic acid sequence are preferably arranged in an order wherein the first portion is upstream of the protease cleavage site portion and the protein product portion, respectively. However, in some cases, the protein product portion may be upstream of the protease cleavage site portion and the first portion. Assembly of the various sequences comprising the parts of the fusion protein nucleic acid sequence, as well as any other nucleic acid manipulations, is performed using conventional techniques for manipulating nucleic acids.

The first portion of the nucleotide sequences of the invention is preferably chosen from genes or portions of genes that are well-expressed in the particular cell into which the nucleotide sequence will be introduced. Nucleic acid sequences encoding proteins produced in amounts of about 5% or greater of the total protein produced by the cell are considered to be well-expressed and suitable for use in the invention. The nucleic acid sequence for use in the invention may be identical to the gene encoding the well-expressed protein portion, or it may be a mutant of the gene or have one or more codons substituted. The entire gene or any portion of the gene which provides for the desired high yield of protein in the host may be employed.

For example, when the nucleotide sequence is to be expressed in E. coli, several genes may be used, such as the galactokinase gene, Rosenberg et al, Meth. Enzymol., 101:123 (1983), and the first 56 codons of this gene; the phage Lambda O protein, Rosenberg et al, Meth. Enzymol., 101:123 (1983); E. coli beta-galactosidase protein, Bartus et al, Antim. Ag. Ch., 25:622 (1984); and the influenza NS-1 protein, Young et al, Proc. Nat'l Acad. Sci., USA, 80:6105 (1983).

A preferred first portion is an E. coli galactokinase gene, more preferably the first 56 codons of the E. coli galactokinase gene. This gene and gene fragment result in high levels of expression in E. coli.

The choice of protein product portion will depend upon the identity of the selected protein product. The protein products of the invention are mature proteins or polypeptides and generally do not comprise amino acids derived from the first portion. The protein products may, however, comprise one or more amino acids derived from the protease cleavage site portion, depending on the length of the protease cleavage site portion and the exact position of cleavage within this portion. The protein product portion can code for an entire protein, part of a protein, a polypeptide, or an oligopeptide. The size and choice of protein product will depend on such factors as its use or desired characteristics. Protein products of interest include enzymes, hormones, blood clotting factors, immunoglobulins, etc., as well as fragments or parts of such molecules.

The protease cleavage site portion comprises nucleotides coding for the amino acids forming the cleavage site of the protease used in the invention. The protease cleavage site portion should be chosen to provide an enzyme-substrate pair with the protease nucleotide sequence. For example, if the protease cleaves an Ala-Leu bond, the protease cleavage site portion should include at least nucleotides coding for these amino acids. Suitable protease cleavage sites for use in the invention include those which serve as substrates for proteases from eukaryotic and prokaryotic organisms, such as chymotrypsin and pepsin. The protease cleavage site preferably comprises nucleotides coding for amino acid combinations absent from the protein product or occurring in parts of the protein product where function of the product would not be adversely affected by cleavage of the product.

Suitable proteases for use in the invention include those that are capable of exhibiting proteolytic activity within the environment of the cell in which the expression system is introduced and that are not substantially toxic for the host cell upon induction of the expression system. Proteolytically active fragments and derivatives of such proteases are also suitable for use in the invention. Proteases, such as trypsin, having cleavage sites that appear commonly and frequently in protein products are generally not suitable for use in the invention. However, these proteases may nevertheless be suitable for use in the invention in situations where the protein product is functional after cleavage of the fusion protein with the protease.

Preferred protease cleavage sites include those of retroviruses, preferably immunodeficiency virus cleavage sites, such as those of HIV and Simian Immunodeficiency Virus (SIV).

Preferred cleavage sites are selected from the group consisting of R₁.tyrosine.proline.R₂, R₁.leucine.alanine.R₂, R₁.methionine.methionine.R₂, R₁.phenylalanine.leucine.R₂, R₁.phenylalanine.proline.R₂, and R₁.leucine.phenylalanine.R₂, wherein R₁ is (X₁)_n(X₂)_o(X₃)_p and R₂ is (X₄)_r(X₅)_s, and X₁, X₂, X₃, X₄ and X₅ are independently an amino acid and n, o, p, r and s are independently 0 or 1.

X₁ is preferably selected from the group consisting of serine and threonine. X₁, X₂, X₃, X₄ and X₅ may be any amino acid generally found in proteins or any non-protein amino acid, providing that X₁, X₂, X₃, X₄ or X₅ do not substantially interfere with the ability of the protease to cleave the protease cleavage site. X₄ and X₅ are preferably hydrophobic amino acids, such as isoleucine, valine, glycine or serine.

Examples of suitable cleavage sites are:

Ser.Gly.Asn.Tyr.Pro.Ile.Val;

Ser.Phe.Asn.Phe.Pro.Gly.Ile; and

Thr.Leu.Asn.Phe.Pro.Ile.Ser.
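The general formula R₁.core.R₂ and the three example sites can be cross-checked with a short sketch. It assumes the standard one-letter amino acid codes and treats R₁ as up to three residues and R₂ as up to two residues, as the formula states; the function names and the scanning approach are illustrative assumptions, not part of the disclosure.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Core dipeptides of the preferred sites: Tyr.Pro, Leu.Ala, Met.Met,
# Phe.Leu, Phe.Pro and Leu.Phe.
CORES = {"YP", "LA", "MM", "FL", "FP", "LF"}

def to_one_letter(dotted: str) -> str:
    """Convert 'Ser.Gly.Asn' style notation to one-letter codes."""
    return "".join(THREE_TO_ONE[res.strip()] for res in dotted.split("."))

def find_candidate_sites(seq: str):
    """Return (index, R1, core, R2) for each core dipeptide in seq.

    R1 is up to three residues upstream of the core and R2 is up to two
    residues downstream, mirroring (X1)n(X2)o(X3)p and (X4)r(X5)s.
    """
    hits = []
    for i in range(len(seq) - 1):
        core = seq[i:i + 2]
        if core in CORES:
            hits.append((i, seq[max(0, i - 3):i], core, seq[i + 2:i + 4]))
    return hits

EXAMPLES = [
    "Ser.Gly.Asn.Tyr.Pro.Ile.Val",
    "Ser.Phe.Asn.Phe.Pro.Gly.Ile",
    "Thr.Leu.Asn.Phe.Pro.Ile.Ser",
]

for example in EXAMPLES:
    print(example, find_candidate_sites(to_one_letter(example)))
```

Each example resolves to a single core dipeptide with a three-residue R₁ beginning with serine or threonine and a two-residue R₂. The same scan can be run over a candidate protein product to see whether any core dipeptide occurs in it.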

As used herein, abbreviations of amino acids are those conventionally given to amino acids found in proteins, i.e., ala-alanine, arg-arginine, asp-aspartic acid, asn-asparagine, cys-cysteine, gly-glycine, glu-glutamic acid, gln-glutamine, his-histidine, ile-isoleucine, lys-lysine, met-methionine, phe-phenylalanine, pro-proline, ser-serine, thr-threonine, trp-tryptophan, tyr-tyrosine, val-valine.

The protease cleavage site portion may also include additional nucleotides flanking the nucleotides coding for the cleavage site, providing these additional nucleotides do not encode amino acids that substantially interfere with the activity of the protease or the functioning of the well-expressed gene portion or protein product portion. Examples of additional nucleotides are those coding for further amino acids near the protease cleavage site, those derived from the genetic material of the organism from which the protease cleavage site was isolated, linker sequences, or sequences tending to promote proper folding of the fusion protein produced from the nucleotide sequences of the invention.

A nucleotide sequence coding for HIV-1 protease may be obtained by restriction endonuclease digestion of a clone of HIV-1, such as the BH10 clone of the HTLV-IIIB isolate of HIV, or by isolating HIV-1 from persons infected with the virus and synthesizing single-stranded DNA from the HIV RNA with reverse transcriptase using conventional techniques. The single-stranded complementary DNA may then be used as the template for preparing a second strand to provide double-stranded cDNA containing the coding region for the HIV-1 protease. Once DNA encoding the protease region is obtained, fragments containing the nucleotide sequence coding for protease are obtained by digestion of the DNA with appropriate restriction endonucleases, such as HaeIII, and NlaIV and HaeIII, and the fragment containing the nucleotide sequence coding for protease is isolated in accordance with conventional techniques. Fragments, mutations, additions and deletions from the nucleotide sequence coding for HIV-1 protease that have proteolytic activity are also within the scope of the invention and are suitable for use in the methods of the invention.

The DNA sequences coding for the first portion, the protease cleavage site portion, the protein product portion, and the DNA sequence coding for the protease may be obtained in a variety of ways. The sequences encoding the portions may be derived from natural sources, where the messenger RNA or chromosomal DNA may be identified with appropriate probes, which are complementary to a portion of the coding or non-coding sequence. From messenger RNA, single-stranded (ss) DNA may be prepared employing reverse transcriptase in accordance with conventional techniques. The ss DNA complementary strand may then be used as the template for preparing a second strand to provide double-stranded (ds) cDNA containing the coding region for the portion. Where chromosomal DNA is employed, the region containing the coding region may be detected employing probes, restriction mapped, and by appropriate techniques isolated, substantially free of untranslated 5' and 3' regions. Where only portions of the coding sequence are obtained, the remaining portions may be provided by synthesis of adapters which can be ligated to the coding portions and provide for convenient termini for ligation to other sequences providing particular functions or properties.

The first portion, the protein product portion, the protease cleavage site portion, and the DNA sequence coding for the protease may also be prepared synthetically using conventional methods and reagents for synthesizing nucleic acids.

The first and second recombinant DNA molecules may further comprise expression control elements operatively linked with the fusion protein nucleotide sequence and the protease nucleotide sequence, respectively. The fusion protein nucleotide sequence is preferably flanked by expression control elements that provide control of transcription and translation. These may be included with the fusion protein nucleotide sequence in the first recombinant DNA molecule to form an expression cassette, which can be moved intact into a selected expression vector. Alternatively, the fusion protein nucleic acid sequence, without transcription control elements, can be moved to selected expression vectors already having transcription control elements incorporated therein.

An expression cassette containing the nucleic acid sequences of the invention preferably comprises, in an upstream position, a promoter, followed by a translation initiation signal comprising a ribosome binding site, and an initiation codon, and, in a downstream position, a transcription termination signal. However, additional transcription control elements, such as capping sequences and enhancers, may also be used when expression is done in eukaryotic hosts. The transcription and translation control elements may be ligated in any functional combination or order. The transcription and translation control elements used in any particular embodiment of the invention will be chosen with reference to the type of cell into which the expression vector will be introduced, so that an expression system will be created.

Nucleotide sequences coding for the protease also preferably form parts of an expression cassette containing the elements described above for the nucleotide sequences coding for the fusion protein, or they may be inserted into expression vectors already containing transcription control elements. If the nucleotide sequences coding for the fusion protein and the protease are directed by different sets of transcription control elements, it is preferable that both sequences are expressed in comparable amounts to avoid a relative overabundance of the fusion protein or protease.

Inducible promoters are preferred for use in the invention. Examples of suitable inducible promoters are those where induction may be as a result of a physical change, e.g., temperature; or chemical change, e.g., change in nutrient or metabolite concentration, such as glucose or tryptophan; or change in pH or ionic strength. Suitable promoters in E. coli include the Lambda P_L and P_R promoters and the E. coli P_LAC, P_TRP and P_TAC promoters; in yeast, the CUP1 metallothionein promoter; and in mammalian systems, the mouse mammary tumor virus (MMTV) promoter.

The transcription initiation elements can be separated from the transcription terminator sequence by a polylinker, which has a plurality of unique restriction sites for insertion of the fusion protein nucleic acid sequence. The polylinker will be followed by the terminator region, which may be obtained from the same wild-type gene from which the promoter region was obtained or a different wild-type gene, so long as efficient transcription initiation and termination is achieved when the two regions are used. By digestion of the expression vector with the appropriate restriction enzymes, the polylinker will be cleaved and the open reading frame nucleotide sequence coding for the fusion protein may be inserted.

The expression vector will be selected so as to have an appropriate copy number, as well as providing for stable extrachromosomal maintenance. The expression vector will usually have a marker that allows for selection in the expression host. Suitable markers include those which provide the host with an enhanced ability to utilize an essential nutrient or metabolite in short supply, or provide resistance to drugs or antibiotics, such as ampicillin or chloramphenicol.

Cells suitable for use in the invention include prokaryotic and eukaryotic cells that can be transformed to stably contain and express both the fusion protein and the protease. Suitable types of cells include bacterial, yeast and mammalian cells. E. coli cells are preferred. Introduction of at least one expression vector incorporating the fusion protein nucleotide sequence and the protease nucleotide sequence can be performed in a variety of ways, such as calcium chloride or lithium chloride treatment, calcium-polyethylene glycol-treated DNA with spheroplasts, use of liposomes, mating, or electroporation.

When one plasmid is employed for expression of both the fusion protein and the protease, the plasmid may be constructed to have two separate sets of transcription control elements, one for the fusion protein nucleic acid sequence and one for the protease nucleotide sequence; or both the fusion protein and the protease may be under the control of the same transcription control elements. When two plasmids are used, the transcription control elements may be selected to be identical in both plasmids, or they may be selected to be different in one or more transcription control elements.

When two expression vectors are used, both are inserted into the same cell. Care should be taken that the expression vectors are chosen to be compatible and stable in the same cell. For example, when E. coli cells are chosen for production of the protein product, and two plasmids are used (one for the fusion protein, one for the protease), a suitable compatible pair of plasmids is one plasmid carrying the ColE1 origin of replication and the ampicillin-resistance marker and the other plasmid carrying the incFII origin of replication and the chloramphenicol-resistance marker. Other suitable pairings of plasmids for use in E. coli can be determined by reference to Novick et al, Bacteriological Reviews, 40:168-189 (1976). Examples of other types of host cells for which two expression vectors can be used are Streptomyces and Bacillus species, and the yeast Saccharomyces cerevisiae (2-micron-based and CEN-based expression vectors), Gorman et al, Gene, 48:13-22 (1986). Stable expression of both the fusion protein and the protease using two expression vectors should also be possible with some mammalian cells.
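The pairing rule in the E. coli example can be written down as a simple check. This is a deliberately simplified sketch: it assumes that two plasmids can coexist when their origins of replication differ and that both can be selected for when their resistance markers differ. Actual compatibility should be established from the literature cited above. The class, function and plasmid names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plasmid:
    name: str    # hypothetical label
    origin: str  # origin of replication / incompatibility group
    marker: str  # selectable resistance marker
    insert: str  # what the plasmid expresses

def compatible(a: Plasmid, b: Plasmid) -> bool:
    """Simplified test: different replication origins and different markers."""
    return a.origin != b.origin and a.marker != b.marker

# The compatible pair described in the text, with made-up plasmid names.
fusion_plasmid = Plasmid("pFusion", "ColE1", "ampicillin", "fusion protein")
protease_plasmid = Plasmid("pProtease", "incFII", "chloramphenicol", "protease")

# A counter-example: same origin of replication as the fusion plasmid.
clashing_plasmid = Plasmid("pClash", "ColE1", "chloramphenicol", "protease")

print(compatible(fusion_plasmid, protease_plasmid))
print(compatible(fusion_plasmid, clashing_plasmid))
```

Using distinct markers also means that selection with both antibiotics maintains both plasmids in the same cell.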

Cells, expression vectors and transcription control elements are selected to be complementary to one another to form an expression system. Suitable expression systems in E. coli utilize, for example, influenza NS1, Young et al, Proc. Nat'l Acad. Sci., USA, 80:6105 (1983), phage Lambda O protein, Rosenberg et al, Meth. Enzymol., 101:123 (1983), the E. coli galactokinase protein, Rosenberg et al, Meth. Enzymol., 101:123 (1983) and the E. coli beta-galactosidase protein, Bartus et al, Antim. Ag. Ch., 25:622 (1984). Expression in Streptomyces includes use of E. coli galK, Brawner et al, Gene, 40:191 (1985), the genes from the Streptomyces gal operon, Fornwald et al, Proc. Nat'l Acad. Sci., USA, 84:2130 (1987), and the Streptomyces beta-galactosidase, Burnett et al, in Microbiology 1985, ed. Levin, Amer. Soc. Microbiol., Washington, D.C. (1985). In yeast, expression can be achieved with E. coli galK, Rosenberg et al, in Genetic Engineering, vol. 8, ed. Setlow, Plenum Press (1986), 3-phosphoglycerate kinase (PGK), Chen et al, Nucleic Acid Res., 12:8951 (1984) and human superoxide dismutase, Cousens et al, U.S. Patent 4,751,180, issued June 14, 1988. In mammalian cells, Schumperli et al, Proc. Nat'l Acad. Sci., USA, 79:257 (1982) have expressed E. coli galK; such a system may also be useful as an expression system for the invention.

As an alternative to the use of at least one expression vector inserted into the host cell for expression of both the fusion protein and the protease in one cell, at least one expression cassette comprising the first and second recombinant DNA molecules can be integrated into the genome of the host cell. This may be preferable if a eukaryotic expression system, such as a yeast or mammalian expression system, is employed. The expression cassette can be incorporated into the genome of the host cell by site-directed integration, e.g., by homologous recombination, or by random integration. Introduction of the expression cassette into the host cell can be accomplished in a variety of ways, such as co-precipitation, calcium chloride or lithium chloride treatment, calcium polyethylene glycol treated with DNA or electroporation. The nucleotide sequence coding for the fusion protein and the nucleotide sequence coding for the protease may be under the control of the same or different transcription control elements. Additionally, the nucleotide sequence coding for the fusion protein and the nucleotide sequence coding for the protease may be incorporated into the same cell in different manners, i.e., the first recombinant DNA molecule coding for the fusion protein may be incorporated into the genome of the host cell, and the second recombinant DNA molecule comprising the nucleotide sequence coding for the protease may be incorporated into an expression vector, or vice versa.

Host cells containing the first and second recombinant DNA molecules are then grown in an appropriate medium for the host. Where an inducible promoter is employed, the host cell may be grown to high density and the promoter turned on for expression of the fusion protein and protease. Where the promoter is not inducible, constitutive production of the fusion protein and protease will occur. Constitutive production of the protease is preferable only in expression systems where the protease is not substantially toxic to the host cell. The cells may be grown until there is no further increase in product formation or the ratio of nutrients consumed to product formation falls below a predetermined level, at which time the cells may be harvested and lysed, and the protein product obtained and purified in accordance with conventional techniques. These techniques include chromatography, electrophoresis, extraction, density gradient centrifugation, or the like.

The bacterium E. coli is the organism of choice for the production of the protein products. Cloning and expression can be obtained rapidly in E. coli; and to date, in some cases, much higher levels of gene expression have been achieved in E. coli as compared to other organisms, e.g., mammalian systems (Chinese hamster ovary cells) or yeast. Production in E. coli is readily amenable to cost-effective, large-scale fermentation and protein purification. Preferred E. coli strains are defective lysogen strains. Some such strains can be used in combination with the inducible Lambda phage P_L promoter for heat induction of the promoter, Devare et al, Cell, 36:43-49 (1984), or chemical induction using nalidixic acid, Mott et al, Proc. Nat'l Acad. Sci., 82:88-92 (1985). A more detailed description of an E. coli system for expression of the fusion protein and the protease is found in the Examples herein.

Nucleotide sequences coding for HIV-1 protease or a proteolytically active fragment or derivative thereof may be obtained by restriction endonuclease digestion of a clone of HIV-1, using the appropriate restriction endonuclease, or from persons infected with HIV-1, as described herein supra. The examples herein disclose the methods of obtaining nucleotide sequences coding for peptides having proteolytic activity towards HIV-1 cleavage sites.

Proteolytically active fragments, derivatives, additions and deletions of nucleotide sequences coding for HIV-1 protease are also within the scope of the invention.

Suitable nucleotide sequences coding for peptides having proteolytic activity towards HIV-1 cleavage sites include the following, or the complementary sequences thereof:

CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA

----+----------------------------------+----------------------+--------------------+-------------------+

GGAGTCTAGTGAGAAACCGTTGCTGGGGAGCAGTGTTATTTCTATCCCCCCGTT

CTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTG

------------------+------------------+----------------------+-----------------+----------------------------------+

GATTTCCTTCGAGATAATCTATGTCCTCGTCTACTATGTCATAATCTTCTTTACTCAAAC

CCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAG

GGTCCTTCTACCTTTGGTTTTTACTATCCCCCTTAACCTCCAAAATAGTTTCATTCTGTC

TATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGA

----------------------------+------------------+----------------------+-----------------+--------------------------+

ATACTAGTCTATGAGTATCTTTAGACACCTGTATTTCGATATCCATGTCATAATCATCCT

CCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAAT

----------------------------+------------------+----------------------+-----------------+--------------------------+

GGATGTGGACAGTTGTATTAACCTTCTTTAGACAACTGAGTCTAACCAACGTGAAATTTA

TTT

AAA;

NlaIV

GCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCAGATCACTCTTTGGCAACGACCC

---------------------------+------------------+----------------------+-----------------+-------------------------+

CGGCTATCTGTTCCTTGACATAGGAAATTGAAGGGAGTCTAGTGAGAAACCGTTGCTGGG

CTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGAT

------------------------------+------------------+----------------------+-----------------+----------------------+

GAGCAGTGTTATTTCTATCCCCCCGTTGATTTCCTTCGAGATAATCTATGTCCTCGTCTA

GATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGA

-----------------------------+------------------+----------------------+-----------------+------------------------+

CTATGTCATAATCTTCTTTACTCAAACGGTCCTTCTACCTTTGGTTTTTACTATCCCCCT

ATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACAT

---------------------------+------------------+----------------------+-----------------+-----------------------+

TAACCTCCAAAATAGTTTCATTCTGTCATACTAGTCTATGAGTATCTTTAGACACCTGTA

AAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTG

----------------------+-------------+------------------+----------------------+-----------------+------------------+

TTTCGATATCCATGTCATAATCATCCTGGATGTGGACAGTTGTATTAACCTTCTTTAGAC

TTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTA

------------------+----------------------------------+------------------+----------------------+-----------------+

AACTGAGTCTAACCAACGTGAAATTTAAAAGGGTAATCGGGATAACTCTGACATGGTCAT

Hae III

AAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGG

------------------+----------------------------------+------------------+----

TTTAATTTCGGTCCTTACCTACCGGGTTTTCAATTTGTTACC;

Hae III

GCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCAGATCACTCTTTGGCAACGACCC

------------------+----------------------------------+------------------+----------------------+-----------------+

CGGCTATCTGTTCCTTGACATAGGAAATTGAAGGGAGTCTAGTGAGAAACCGTTGCTGGG

CTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGAT

GAGCAGTGTTATTTCTATCCCCCCGTTGATTTCCTTCGAGATAATCTATGTCCTCGTCTA

GATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGA

CTATGTCATAATCTTCTTTACTCAAACGGTCCTTCTACCTTTGGTTTTTACTATCCCCCT

ATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACAT

------------------------+----------------------------------+------------------+----------------------+---------+

TAACCTCCAAAATAGTTTCATTCTGTCATACTAGTCTATGAGTATCTTTAGACACCTGTA

AAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTG

TTTCGATATCCATGTCATAATCATCCTGGATGTGGACAGTTGTATTAACCTTCTTTAGAC

TTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTA

------------------+----------------------------------+------------------+----------------------+-----------------+

AACTGAGTCTAACCAACGTGAAATTTAAAAGGGTAATCGGGATAACTCTGACATGGTCAT

Hae III

AAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGG

------------------+----------------------------------+-------------------------+

TTTAATTTCGGTCCTTACCTACCGGGTTTTCAATTTGTTACC;

Nde I

CATATGCCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA

GTATACGGAGTCTAGTGAGAAACCGTTGCTGGGGAGCAGTGTTATTTCTATCCCCCCGTT

CTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTG

GATTTCCTTCGAGATAATCTATGTCCTCGTCTACTATGTCATAATCTTCTTTACTCAAAC

CCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAG

GGTCCTTCTACCTTTGGTTTTTACTATCCCCCTTAACCTCCAAAATAGTTTCATTCTGTC

TATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGA

ATACTAGTCTATGAGTATCTTTAGACACCTGTATTTCGATATCCATGTCATAATCATCCT

CCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAAT

GGATGTGGACAGTTGTATTAACCTTCTTTAGACAACTGAGTCTAACCAACGTGAAATTTA

TTTTAGATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCA

AAAATCTAATCGGGATAACTCTGACATGGTCATTTTAATTTCGGTCCTTACCTACCGGGT

Hae III

AAAGTTAAACAATGG

------------------+-------

TTTCAATTTGTTACC.
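Because each listing above shows the coding strand with its complementary strand written underneath, transcription damage in one strand can be checked against the other. The following Python sketch illustrates that check for the first line of the first sequence; the function names are illustrative and not part of the disclosure.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, written in the same left-to-right order
    as the lower strand of the listings (i.e. 3' to 5')."""
    return strand.translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Complementary strand written 5' to 3'."""
    return complement(strand)[::-1]


# First line of the mature protease coding sequence and the strand
# printed beneath it in the listing.
top = "CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA"
bottom = "GGAGTCTAGTGAGAAACCGTTGCTGGGGAGCAGTGTTATTTCTATCCCCCCGTT"

print(complement(top) == bottom)  # True when the two strands agree
print(reverse_complement(top))    # lower strand read 5' to 3'
```

A mismatch between `complement(top)` and the printed lower strand points to an error in one of the two lines.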

The recombinant E. coli protease product is produced by recombinant DNA techniques from nucleotide sequences coding for peptides having proteolytic activity toward HIV-1 cleavage sites. The recombinant E. coli protease has proteolytic activity towards HIV-1 cleavage sites and is useful for cleaving fusion proteins as disclosed herein. Suitable methods and nucleotide sequences for preparing the recombinant E. coli protease product include those disclosed herein for preparing the protease in E. coli. The protease product may be produced in cells with or without production of a fusion protein.

Suitable peptides include those selected from the group of peptides consisting of:

ProGlnIleThrLeuTrpGlnArgProLeuValThrIleLysIleGlyGlyGln

LeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluMetSerLeu

ProGlyArgTrpLysProLysMetIleGlyGlyIleGlyGlyPheIleLysValArgGln

TyrAspGlnIleLeuIleGluIleCysGlyHisLysAlaIleGlyThrValLeuValGly

ProThrProValAsnIleIleGlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsn

Phe,

AlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeuTrpGlnArgPro

LeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAsp

AspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLysMetIleGlyGly

IleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGluIleCysGlyHis

LysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIleGlyArgAsnLeu

LeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProVal

LysLeuLysProGlyMetAspGlyProLysValLysGlnTrp,

AlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeuTrpGlnArgPro

LeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAsp

AspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLysMetIleGlyGly

IleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGluIleCysGlyHis

LysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIleGlyArgAsnLeu

LeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProVal

LysLeuLysProGlyMetAspGlyProLysValLysGlnTrp, and

MetProGlnIleThrLeuTrpGlnArgProLeuValThrIleLysIleGlyGlyGln

LeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluMetSerLeu

ProGlyArgTrpLysProLysMetIleGlyGlyIleGlyGlyPheIleLysValArgGln

TyrAspGlnIleLeuIleGluIleCysGlyHisLysAlaIleGlyThrValLeuValGly

ProThrProValAsnIleIleGlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsn

PheEnd.

Proteolytically active fragments, mutations, additions or deletions of the recombinant E. coli protease product are also within the scope of the invention.
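The correspondence between the nucleotide sequences and the peptides listed above can be illustrated with a short translation routine. The sketch below uses the standard genetic code and three-letter residue names in the style of this specification; the helper names are illustrative assumptions, and only the first 54 nucleotides of the mature protease coding sequence are translated.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter names; a stop codon is written "End".
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",
}


def translate(dna: str) -> str:
    """Translate complete codons of `dna`, reading frame starting at index 0."""
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


first_line = "CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA"
print(translate(first_line))
```

The printed string can be compared directly against the first line of the first peptide in the list.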

Example 1.

The E. coli strain used for expression was AR120, as described in Mott et al, Proc. Nat'l Acad. Sci., USA, 82:88-92 (1985). All DNA manipulations were carried out as described in Maniatis et al, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1982. E. coli AR120 cells carrying the expression plasmids were induced, and total cell extracts were prepared and analyzed by NaDodSO₄/PAGE as described in Aldovini, Debouck et al, Proc. Nat'l Acad. Sci., USA, 83:6672-6676 (1986).

HIV Protease and Activity

I. Construction of vectors for expression of HIV-1 protease in E. coli

Vectors for expression of HIV-1 protease were constructed by inserting the appropriate restriction fragment into the expression vector pOTSKF33. All HIV-1 restriction fragments were isolated from the BH10 clone of the HTLVIIIB isolate of HIV (Shaw et al, Science, 226:1165-1171 (1984)).

A. Construction of the galK translation fusion vector pOTSKF33: pOTSKF33 derives from the pAS1 expression vector of Rosenberg as disclosed in U.S. Patent 4,578,355, issued March 25, 1986, and comprises a portion of the E. coli galK gene (56 codons) flanked by a polylinker with restriction sites for fusion in any of the three translation frames (phase 1 - XmaI filled-in; phase 2 - SmaI; phase 3 - StuI), stop codons for each phase and some additional cloning sites. pOTSKF33 was constructed in several steps:

Step 1: Insertion of the polylinker region of pUC19 into the pOTS34 vector: pOTS34, a derivative of pAS1 (Devare et al, Cell, 36:43 (1984)), was digested with the BglII restriction endonuclease, treated with DNA polymerase I (Klenow) to create blunt ends and redigested with the SphI restriction endonuclease. The large 5251 base pair (bp) BglII-SphI fragment was purified and ligated to the 135 bp BglII-Sp restriction fragment isolated and purified from pUC19 (Yanisch-Perron et al, Gene, 33:103 (1985)). The resulting plasmid was called pOTS-UC.

Step 2: Insertion of the partial galK expression unit from pASK into pOTS-UC: pOTS-UC was digested with XmaI, treated with Klenow to create blunt ends and redigested with HindIII. The large fragment was purified and ligated to a 2007 bp BclI filled-HindIII fragment isolated and purified from pASK. The resulting plasmid was called pOTSKF45.

The pASK plasmid vector is a pAS1 derivative in which the entire galK coding sequence is inserted after the ATG initiation codon. This galK sequence differs from the authentic sequence (Debouck et al, Nucleic Acids Res., 13:1841 (1985)) at the 5' end:

-pASK sequence: ATG.GAT.CCG.GAA.TTC.CAA.GAA.AAA

-authentic seq: ATG.AGT.CTG.AAA.GAA.AAA...

The pASK sequence contains a BamHI site (GGATCC) at the ATG whereas the authentic sequence does not.
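The difference between the two 5' ends can be checked by scanning each for the BamHI recognition sequence. This is a minimal sketch; `find_sites` is an illustrative helper, and the two sequences are the codon strings given above with the separating dots removed.

```python
BAMHI_SITE = "GGATCC"


def find_sites(seq: str, site: str) -> list[int]:
    """Return every 0-based position at which `site` occurs in `seq`."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


pask_5prime = "ATG.GAT.CCG.GAA.TTC.CAA.GAA.AAA".replace(".", "")
authentic_5prime = "ATG.AGT.CTG.AAA.GAA.AAA".replace(".", "")

print(find_sites(pask_5prime, BAMHI_SITE))       # [2]
print(find_sites(authentic_5prime, BAMHI_SITE))  # []
```

Position 2 is the G of the ATG initiation codon, so the site overlaps the start codon, as stated in the text.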

Step 3: Introduction of a linker with translation stops into pOTSKF45: pOTSKF45 was digested with XmaI and SalI and then treated with Klenow to create blunt ends. The linker 5'-GTA.GGC.CTA.GTT.AAC.TAG-3' was then introduced between the XmaI filled and SalI filled sites by ligation with T4 DNA ligase to yield the vector pOTSKF33.
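The linker is described as supplying translation stops, and pOTSKF33 is described as having stop codons for each phase and a StuI site for phase 3 fusions. The following sketch checks the 18-mer on its own for stop codons in each of its three reading frames and for the StuI recognition sequence (AGGCCT). The function name is illustrative, and reading the linker in isolation, outside its vector context, is a simplifying assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
STUI_SITE = "AGGCCT"

linker = "GTA.GGC.CTA.GTT.AAC.TAG".replace(".", "")


def stops_by_frame(seq: str) -> dict[int, list[str]]:
    """List the stop codons met in each of the three forward reading frames."""
    found = {}
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        found[frame] = [codon for codon in codons if codon in STOP_CODONS]
    return found


print(stops_by_frame(linker))  # {0: ['TAG'], 1: ['TAG', 'TAG'], 2: ['TAA']}
print(STUI_SITE in linker)     # True
```

Each frame of the linker contains at least one stop codon, which is consistent with the stated purpose of the linker.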

B. Insertion of Sequence Coding for HIV-1 Protease into pOTSKF33

1. Construction of PRO3

A 382-bp NlaIV-Hae III fragment was isolated from BH10 using standard cloning techniques and inserted at the StuI restriction site of pOTSKF33 in the correct orientation. The NlaIV-Hae III restriction fragment has the following nucleotide sequence and corresponding translation for the reading frame of interest:

NlaIV

GCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCAGATCACTCTTTGGCAACGACCC

2253 ------------------+----------------------------------+------------------+-------------------------------+ 2312

CGGCTATCTGTTCCTTGACATAGGAAATTGAAGGGAGTCTAGTGAGAAACCGTTGCTGGG

AlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeuTrpGlnArgPro -

CTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGAT

2313 ---------------------------------+----------------------------------+------------------+----------------------+ 2372

GAGCAGTGTTATTTCTATCCCCCCGTTGATTTCCTTCGAGATAATCTATGTCCTCGTCTA

LeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAsp -

GATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGA

2373 ----------------------------+----------------------------------+------------------+--------------------------------+ 2432

CTATGTCATAATCTTCTTTACTCAAACGGTCCTTCTACCTTTGGTTTTTACTATCCCCCT

AspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLysMetIleGlyGly -

ATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACAT

2433 ---------------------------------+----------------------------------+------------------+------------------------+ 2492

TAACCTCCAAAATAGTTTCATTCTGTCATACTAGTCTATGAGTATCTTTAGACACCTGTA

IleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGluIleCysGlyHis -

AAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTG

2493 ---------------------------------+----------------------------------+------------------+------------------------+ 2552

TTTCGATATCCATGTCATAATCATCCTGGATGTGGACAGTTGTATTAACCTTCTTTAGAC

LysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIleGlyArgAsnLeu -

TTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTA

2553 ---------------------------------+----------------------------------+------------------+------------------------+ 2612

AACTGAGTCTAACCAACGTGAAATTTAAAAGGGTAATCGGGATAACTCTGACATGGTCAT

LeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProVal -

Hae III

AAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGG

2613 --------------------------------------------------+---------------------------+ 2654

TTTAATTTCGGTCCTTACCTACCGGGTTTTCAATTTGTTACC

LysLeuLysProGlyMetAspGlyProLysValLysGlnTrp -

(Note: The numbering corresponds to the numbering of Ratner et al, Nature, 313:277 (1985) plus 453 base pairs.)

2. Construction of PRO4

A 516-bp Hae III fragment was isolated from BH10 using standard cloning techniques and inserted at the XmaI filled-in restriction site of pOTSKF33 in the correct orientation. The Hae III restriction fragment has the following nucleotide sequence and corresponding translation for the reading frame of interest:

CCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAGCCCCACCATTTCTTCAGAGCAGA

21 13 ------------------+----------------------------------+------------------+-------------------------------+

GGTCCCTTAAAAGAAGTCTCGTCTGGTCTCGGTTGTCGGGGTGGTAAAGAAGTCTCGTCT

ArgGluPheSerSerGluGlnThrArgAlaAsnSerProThrIleSerSerGluGlnThr -

CCAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGTCTGGGGTAGAGACAACAACTCCC

2179 ------------------+----------------------------------+------------------+-------------------------------+ 2238

GGTCTCGGTTGTCGGGGTGGTCTTCTCTCGAAGTCCAGACCCCATCTCTGTTGTTGAGGG

ArgAlaAsnSerProThrArgArgGluLeuGlnValTrpGlyArgAspAsnAsnSerPro -

CCTCAGAAGCAGGAGCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCAGATCACTC

2239 ------------------+----------------------------------+------------------+-------------------------------+ 2298

GGAGTCTTCGTCCTCGGCTATCTGTTCCTTGACATAGGAAATTGAAGGGAGTCTAGTGAG

SerGluAlaGlyAlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeu -

TTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAG

2299 ------------------+----------------------------------+------------------+-------------------------------+ 2358

AAACCGTTGCTGGGGAGCAGTGTTATTTCTATCCCCCCGTTGATTTCCTTCGAGATAATC

TrpGlnArgProLeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAsp -

ATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAA

2359 ------------------+----------------------------------+------------------+-------------------------------+ 2418

TATGTCCTCGTCTACTATGTCATAATCTTCTTTACTCAAACGGTCCTTCTACCTTTGGTT

ThrGlyAlaAspAspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLys -

AAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAG

2419 ------------------+----------------------------------+------------------+-------------------------------+ 2478

TTTACTATCCCCCTTAACCTCCAAAATAGTTTCATTCTGTCATACTAGTCTATGAGTATC

MetIleGlyGlyIleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGlu -

AAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAA

2479 -----------------------+----------------------------------+------------------+-------------------------------+ 2538

TTTAGACACCTGTATTTCGATATCCATGTCATAATCATCCTGGATGTGGACAGTTGTATT

IleCysGlyHisLysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIle -

TTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTG

2539 --------------------------+----------------------------------+------------------+-----------------------------+ 2598

AACCTTCTTTAGACAACTGAGTCTAACCAACGTGAAATTTAAAAGGGTAATCGGGATAAC

GlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGlu -

Hae III

AGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGG

2599 --------------------------+----------------------------------+------------------+-------------+---------- 2654

TCTGACATGGTCATTTTAATTTCGGTCCTTACCTACCGGGTTTTCAATTTGTTACC

ThrValProValLysLeuLysProGlyMetAspGlyProLysValLysGlnTrp -

(Note: The numbering corresponds to the numbering of Ratner et al, Nature, 313:277 (1985) plus 453 base pairs.)

II. Expression of the HIV Protease in E. coli

To express the HIV protease gene product in E. coli, overlapping fragments encompassing the protease coding region were inserted into the pOTSKF33 bacterial expression vector, as described above in I. The fragments contained the two domains (I and II) that are highly conserved among all known retroviral proteases, as well as the region between these domains. The fragments differed only in the amount of viral sequence information they carried upstream and downstream from the conserved domains. The PRO3 and PRO4 fragments contained the proteolytic cleavage site (indicated by the asterisk) Thr-Leu-Asn-Phe*Pro, positioned downstream of domain II, from which the amino terminus of reverse transcriptase derives (Figure 1B), and another protease cleavage site, Ser-Phe-Asn-Phe*Pro, positioned approximately 20 codons upstream of conserved domain I, from which the amino terminus of HIV-1 protease derives. Proteolytic cleavage at these two sites yields a 10-kDa polypeptide which corresponds to the mature protease.

The protein products expressed by the PRO4 constructs were examined initially by NaDodSO₄/PAGE by the method described in Aldovini, Debouck et al, supra.

The PRO3 and PRO4 constructs did not produce proteins with the size expected for their encoded products (20 kDa and 25 kDa, respectively). Instead, both constructs gave rise to a protein of approximately 10 kDa that was visualized by immunoblot analysis using a polyclonal antibody specific for the protease region, produced according to the method of Aldovini, Debouck et al, supra. The 20-kDa and 25-kDa proteins expected from the PRO3 and PRO4 constructs could be observed for a brief period immediately after induction, but rapidly disappeared to give rise to the 10-kDa protein. This apparent conversion from precursor to mature 10-kDa form occurred in the presence of chloramphenicol, a strong translation inhibitor, indicating that the 10-kDa protein results from the post-translational processing of the larger precursors.
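The expected precursor sizes quoted above can be reproduced with a rough back-of-envelope estimate. The sketch below assumes an average residue mass of about 110 Da (a common rule of thumb, not a figure from this specification) and ignores the few residues contributed by polylinker sequences; the insert lengths, the 56 galK codons and the 99-residue mature protease are taken from the text.

```python
AVG_RESIDUE_DA = 110  # assumption: approximate average mass of one residue
GALK_CODONS = 56      # galK portion fused upstream of the insert in pOTSKF33


def fusion_mass_kda(insert_bp: int) -> float:
    """Approximate mass of a galK fusion whose insert is read through in frame."""
    residues = GALK_CODONS + insert_bp // 3
    return residues * AVG_RESIDUE_DA / 1000


pro3_kda = fusion_mass_kda(382)            # 382-bp NlaIV-Hae III fragment
pro4_kda = fusion_mass_kda(516)            # 516-bp Hae III fragment
mature_kda = 99 * AVG_RESIDUE_DA / 1000    # Pro 1 through Phe 99

print(round(pro3_kda), round(pro4_kda), round(mature_kda))  # 20 25 11
```

Under these assumptions the estimates agree with the 20-kDa and 25-kDa precursors expected from PRO3 and PRO4, and the mature protease comes out close to the roughly 10-kDa band observed on gels.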

III. Characterization of HIV-1 Protease Expressed in

E. coli

A. Auto-processing of the HIV Protease

The generation of a discrete 10-kDa protein by both PRO3 and PRO4 constructs is consistent with specific processing occurring at the two protease cleavage sites positioned on either side of the conserved domains. If the HIV protease product itself is responsible for this processing, then it suggests that this protease has an autocatalytic capability. To discriminate between bacterial proteolysis and HIV protease auto-processing, a small alteration was positioned within the protease coding region, far from the cleavage sites, to inactivate the protease without affecting the susceptibility of the precursor protein to bacterial proteolysis.

Plasmid PRO4-BX was derived from the PRO4 expression vector by digestion with Bcl I, treatment with DNA polymerase (Klenow) and ligation with the 8-mer oligonucleotide 5' CCTCGAGG 3'. This treatment resulted in the insertion of four codons (encoding Pro-Ser-Arg-Asp) in the protease region between the conserved domains.
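The four-codon insertion can be followed step by step on a short in-frame segment of the protease coding sequence. The sketch below assumes the enzyme cuts its TGATCA site as T^GATCA, so that filling in the four-base overhangs duplicates GATC; together with the 8-mer this adds 12 nucleotides. The segment boundaries and helper names are illustrative choices, not part of the original procedure.

```python
from itertools import product

# Standard genetic code, one-letter residues, codons in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SITE = "TGATCA"      # recognition sequence, assumed cut as T^GATCA
LINKER = "CCTCGAGG"  # 8-mer given in the text


def translate(dna: str) -> str:
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def insert_linker_at_filled_site(seq: str, linker: str) -> str:
    """Cut at T^GATCA, fill in both overhangs, and ligate the blunt linker."""
    i = seq.index(SITE)
    left = seq[:i + 5]   # ...T plus the filled-in GATC
    right = seq[i + 1:]  # filled-in GATC followed by A...
    return left + linker + right


# In-frame segment from between the conserved domains (Lys-Val-Arg-Gln-Tyr-...).
wild_type = "AAAGTAAGACAGTATGATCAGATACTCATAGAA"
mutant = insert_linker_at_filled_site(wild_type, LINKER)

print(translate(wild_type))  # KVRQYDQILIE
print(translate(mutant))     # KVRQYDPSRDQILIE
```

The mutant translation gains Pro-Ser-Arg-Asp and otherwise stays in frame, matching the four-codon insertion described above.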

Induction of this mutant derivative gave rise to a 25-kDa protein that is the size expected from the entire PRO4 coding region. No 10-kDa protein was observed in the extract. This result suggested that the small insertion destroyed the protease activity and that the processing observed truly results from the expression and autocatalytic processing of the HIV protease itself.

B. Amino Acid Sequencing of HIV-1 Protease

To demonstrate directly that the 10-kDa product from induced PRO3 and PRO4 extracts indeed corresponds to the protease protein, this product was isolated and subjected to standard amino acid sequence analysis. The results indicated that the amino terminal sequence precisely matched the sequence predicted from the proposed cleavage site located upstream of conserved domain I, Pro-Gln-Ile-Thr-Leu.

IV. Catalytic Activity of HIV-1 Protease Expressed in E. coli

A. Construction of Vectors Expressing Fusion Protein Having HIV Protease Cleavage Site

For the expression of the HIV-1 gag precursor protein in E. coli, the pOTSKF33 bacterial expression vector, as described above in I, was used. A translational fusion was created between the HIV-1 gag reading frame and the first 56 amino acids of galactokinase sequence on the expression vector. A 1286-base pair (bp) ClaI-BglII filled-in restriction endonuclease fragment, including most of the p17, all of the p24, and half of the p15 gag coding sequence, was inserted at the StuI site of the pOTSKF33 fusion vector for expression of p55 gag. This construct is known as pOTSKF33-p55. The ClaI-BglII restriction fragment was isolated from the BH10 clone of the HTLVIIIB isolate of HIV [Shaw et al, Science, 226:1165-1171 (1984)].

B. Construction of pDPT287

Plasmid pDPT287 is a derivative of the chloramphenicol-resistant pDPT270 plasmid of the incFII incompatibility group, which can co-exist with the ColE1-like pOTSKF33 derivatives in bacteria [Taylor et al, J. Bacteriol., 137:92-104 (1979)]. pDPT287 differs from pDPT270 by two deletions, one removing a small HindIII fragment between the streptomycin-resistance gene and the incFII origin of replication and the other removing a Hindi fragment between the chloramphenicol-resistance and streptomycin-resistance genes.

Plasmid pDPT287-p55 was constructed by excising the entire P_L-p55 transcriptional cassette from pOTSKF33-p55, described in IV above, on a BglII-SalI filled-in restriction endonuclease fragment and inserting it between the unique BamHI and HindIII filled-in sites of pDPT287. This deletes the streptomycin-resistance gene of the vector. The orientation of the fragment with respect to the plasmid markers did not influence the expression of the galK-p55 product.

Induction of E. coli cells carrying the gag-expressing vector pDPT287-p55 alone gave rise to a 50-kDa protein, which is the size expected for the p55 gag protein encoded by this construct. Using the more sensitive immunoblot analysis with an antiserum specific for the gag region, the same 50-kDa major protein was observed, as well as some smaller, minor polypeptides that presumably result from bacterial proteolysis or internal translation initiation.

C. Activity of HIV-1 Protease

To examine the proteolytic effect of the protein products from each of the PRO3 and PRO4 constructs on the 50-kDa gag precursor, bacteria were co-transformed with the p55-expressing vector and each of the PRO constructs separately. Both expression plasmids were induced simultaneously, and the extracts were analyzed by immunoblot with the gag-specific antiserum. Polyclonal antibodies were raised against the p55 gag and PRO1 products according to the method described in Aldovini, Debouck et al, supra. Co-induction of the pDPT287-p55 and either the PRO3 or PRO4 construct resulted in the disappearance of the large 50-kDa gag precursor and the concomitant appearance of two new proteins, one of about 24 kDa and the other of about 17 kDa. The 24-kDa product corresponds to the mature p24 gag protein, whereas the 17-kDa protein contains most of the p17 gag region fused to the first 56 amino acids of galK. Clearly, induction of both vectors within the same cell resulted in the processing of the p55 gag precursor into products with sizes consistent with cleavage at the known viral processing sites. Proper processing of purified 50-kDa gag precursor protein was also observed in vitro upon incubation with an induced PRO4 bacterial extract that only contained the mature 10-kDa protease product.

To confirm that this effect was specific for the products expressed by the PRO3 and PRO4 constructs, the co-induction experiment was repeated with two inactive PRO4 derivatives. PRO4-BX (described in IIIA) contained the four-codon insertion that eliminated protease auto-processing, and the other, PRO4-BS (described immediately below), contained a translation stop codon in the middle of the protease coding region.

Plasmid PRO4-BS was derived from the PRO4 expression vector by digestion with Bcl I, treatment with DNA polymerase (Klenow), and ligation with the 12-mer oligonucleotide 5' CATGTTAACTAG 3', which introduces stop codons in all three reading frames. This treatment resulted in the interruption of the protease coding sequence between the conserved domains.

No p55 gag processing was observed with either of these vector constructs. These results clearly demonstrate that the products of the PRO3 and PRO4 constructs are proteolytically active in both their precursor and mature forms and can carry out the maturation of the gag polyprotein precursor in addition to auto-processing.

D. Protease from PRO4 Cleaves Protein at Sites Used in Viral Infection

To examine whether the PRO4-mediated p55 gag cleavage reaction observed in E. coli occurs precisely at the sites used in viral infection, the processed 24-kDa gag protein was purified from induced cells, and its amino-terminal sequence was determined.

E. coli cells co-expressing the PRO4 and pDPT287-p55 constructs were disrupted by sonication in 50 mM Tris-HCl, pH 7.5/1 mM EDTA/1 mM dithiothreitol/1 mM phenylmethylsulfonyl fluoride. The soluble fraction was applied to a Mono Q high-performance anion-exchange column (Pharmacia) equilibrated in 10 mM Tris-HCl, pH 7.5. The putative p24 protein was found in the nonadsorbed fraction and subjected to reverse-phase HPLC [Brownlee RP-300 octyl column, 4.6 x 250 mm, equilibrated with 30% (vol/vol) acetonitrile in 0.05% trifluoroacetic acid over 45 minutes]. The p24 protein eluted in the range of 48-54% (vol/vol) acetonitrile and was approximately 90% pure as judged by Coomassie blue-stained NaDodSO₄/polyacrylamide gel.

A sample of purified protein was subjected to 10 cycles of automated Edman degradation in a Beckman 890M protein sequencer. Released phenylthiohydantoin-amino acid derivatives were analyzed by reverse-phase HPLC on a Beckman Ultrasphere ODS column (2 x 250 mm). The first 10 amino acid residues precisely matched the amino-terminal sequence of the mature p24 gag protein isolated from viral particles, as reported in Casey et al, J. Virol., 55:417-423 (1985). This demonstrates the authenticity of the p55 gag proteolytic cleavage carried out in bacteria by the 10-kDa recombinant HIV protease. The bacterially-produced protease seems to be extremely active, as relatively small amounts of the enzyme efficiently cleave high levels of the gag precursor.

Example 2.

Expression of the HIV-1 Protease Requiring No Auto-processing

A plasmid expressing HIV-1 protease not requiring auto-processing for proteolytic activity was constructed as follows:

The PRO5 construct was derived from the PRO4 construct, described in Example 1, by introduction of a translation initiation codon immediately before the first codon of the mature protease (Proline 1) and a translation termination codon immediately after the last codon of the mature protease (Phenylalanine 99). This was done by site-specific mutagenesis using oligonucleotides with the following sequences:

            * ** *
5' CCTTTCCATATGCCTCAGATC 3' for the initiation codon

                 ***
5' GCACTTTAAATTTTTAGATTAGCCC 3' for the termination codon

* indicates the nucleotides that differ from the wild-type sequence.

The mutation that created the initiation ATG also introduced a convenient NdeI restriction endonuclease site that was used to connect the ATG initiation codon directly to the ribosome binding site of pMG, a pAS derivative with a unique NdeI site at the ATG [M. Gross et al., Mol. Cell. Biol., 5:1015-1024 (1985)].
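The base changes carried by the two mutagenic oligonucleotides can be checked with a short script. The following is a minimal sketch only, not part of the original procedure: the wild-type stretch is copied from the unmodified protease coding sequence recited in the claims of this document, and the helper name `mismatch_positions` is hypothetical.

```python
# Illustrative check of the mutagenic oligonucleotides (sequences as printed above).
NDEI_SITE = "CATATG"                       # NdeI recognition sequence
INIT_OLIGO = "CCTTTCCATATGCCTCAGATC"       # introduces the initiation ATG
TERM_OLIGO = "GCACTTTAAATTTTTAGATTAGCCC"   # introduces the termination codon

# Assumption: matching wild-type stretch, copied from the unmodified
# protease sequence listed in the claims (…TGCACTTTAAATTTTCCCATTAGCCC…).
WILD_TYPE_3PRIME = "GCACTTTAAATTTTCCCATTAGCCC"


def mismatch_positions(mutant: str, wild_type: str) -> list[int]:
    """Return 0-based positions at which two equal-length sequences differ."""
    if len(mutant) != len(wild_type):
        raise ValueError("sequences must have the same length")
    return [i for i, (m, w) in enumerate(zip(mutant, wild_type)) if m != w]


# Position of the NdeI site within the initiation oligonucleotide.
ndei_index = INIT_OLIGO.find(NDEI_SITE)

# Bases changed by the termination oligonucleotide and the codon they form.
changed = mismatch_positions(TERM_OLIGO, WILD_TYPE_3PRIME)
new_codon = TERM_OLIGO[changed[0]:changed[-1] + 1]
preceding_codon = TERM_OLIGO[changed[0] - 3:changed[0]]

print("NdeI site starts at index", ndei_index)
print("changed positions:", changed)
print("new codon:", new_codon, "preceded by", preceding_codon)
```

Under these assumptions the three altered bases form a TAG stop codon directly after the TTT codon for Phenylalanine 99, which agrees with the three asterisks marked over the termination oligonucleotide.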

Induction of PRO5, using the method described in Example 1, results in the direct synthesis of the full-length mature protease without a protease precursor intermediate.
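The reading frame of the 375 bp PRO5 insert given in this example can be verified by translating it from the introduced ATG to the introduced stop codon. The sketch below is illustrative: it uses the standard genetic code, the top strand as printed in this document (with ambiguous characters from text extraction read as T, consistent with the complementary strand), and a hypothetical helper named `translate`.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Top strand of the PRO5 NdeI-HaeIII insert, 60 nucleotides per line.
PRO5_INSERT = (
    "CATATGCCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA"
    "CTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTG"
    "CCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAG"
    "TATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGA"
    "CCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAAT"
    "TTTTAGATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCA"
    "AAAGTTAAACAATGG"
)


def translate(dna: str, start: int) -> str:
    """Translate from `start` until the first stop codon (one-letter codes)."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":      # stop codon reached
            break
        protein.append(residue)
    return "".join(protein)


start = PRO5_INSERT.find("ATG")          # ATG inside the NdeI site CATATG
protein = translate(PRO5_INSERT, start)

print(len(PRO5_INSERT), "nucleotides")
print(len(protein), "residues including the initiator Met")
print(protein)
```

With the sequence as transcribed here, the open reading frame runs from the initiator methionine through Proline 1 to Phenylalanine 99 and then stops, so the product is the mature protease plus one amino-terminal methionine.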

The PRO5 insert, a 375 bp NdeI-HaeIII fragment, has the following nucleotide sequence and corresponding translation for the reading frame of interest:

  1  CATATGCCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA  60
     GTATACGGAGTCTAGTGAGAAACCGTTGCTGGGGAGCAGTGTTATTTCTATCCCCCCGTT
        MetProGlnIleThrLeuTrpGlnArgProLeuValThrIleLysIleGlyGlyGln

 61  CTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTG  120
     GATTTCCTTCGAGATAATCTATGTCCTCGTCTACTATGTCATAATCTTCTTTACTCAAAC
     LeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluMetSerLeu

121  CCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAG  180
     GGTCCTTCTACCTTTGGTTTTTACTATCCCCCTTAACCTCCAAAATAGTTTCATTCTGTC
     ProGlyArgTrpLysProLysMetIleGlyGlyIleGlyGlyPheIleLysValArgGln

181  TATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGA  240
     ATACTAGTCTATGAGTATCTTTAGACACCTGTATTTCGATATCCATGTCATAATCATCCT
     TyrAspGlnIleLeuIleGluIleCysGlyHisLysAlaIleGlyThrValLeuValGly

241  CCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAAT  300
     GGATGTGGACAGTTGTATTAACCTTCTTTAGACAACTGAGTCTAACCAACGTGAAATTTA
     ProThrProValAsnIleIleGlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsn

301  TTTTAGATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCA  360
     AAAATCTAATCGGGATAACTCTGACATGGTCATTTTAATTTCGGTCCTTACCTACCGGGT
     PheEnd

                    HaeIII
361  AAAGTTAAACAATGG  375
     TTTCAATTTGTTACC

Example 3.

Production of Mature Beta-Globin

I. Construction of a Vector Expressing the Beta-Globin Fusion Protein

The bacterial expression vector, pSKF-Pr-bglo, which produces a fusion protein consisting of the first 56 amino acids of galK, followed by a cleavage sequence for the HIV protease and ending with the full-length coding sequence for the beta-chain of human hemoglobin was constructed as follows:

A. The full-length human beta-globin coding sequence was inserted between the SmaI and XbaI restriction sites of the pOTSKF33 expression vector, described in Example 1, to create a translational fusion between the first 56 codons of galK from the vector and the beta-globin coding sequence. This construction, pSKF-bglo, contained a unique NcoI restriction site at the junction between galK and beta-globin.

B. pSKF-bglo was linearized with NcoI and an oligonucleotide with NcoI overhanging ends,

5' CATGCAGGTCAGCCAAAATTACCCGATCGTGGC 3'

encoding an HIV protease cleavage site, was inserted in the vector to create pSKF-Pr-bglo.
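The peptide encoded by the inserted oligonucleotide can be read off with a short translation script. This is a sketch under a stated assumption, namely that the reading frame is set by the ATG contained in the 5' CATG overhang; the helper name `translate_frame` is hypothetical and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Cleavage-site oligonucleotide as printed in step B.
CLEAVAGE_OLIGO = "CATGCAGGTCAGCCAAAATTACCCGATCGTGGC"


def translate_frame(dna: str, start: int) -> str:
    """Translate complete codons from `start` (one-letter codes)."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3)
    )


# Assumption: the frame begins at the ATG inside the CATG overhang (index 1).
frame_start = CLEAVAGE_OLIGO.find("ATG")
peptide = translate_frame(CLEAVAGE_OLIGO, frame_start)

print(peptide)
print("contains Tyr-Pro:", "YP" in peptide)
```

Read in this frame, the oligonucleotide encodes a stretch containing a tyrosine-proline pair, which is one of the dipeptide cores recited for the cleavage site in the claims.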

When pSKF-Pr-bglo was introduced into an E. coli cell together with a plasmid encoding the HIV protease, as described in Example 1, complete processing of the fusion protein at the galK-cleavage site junction was observed upon induction of the promoters, resulting in the over-production of mature beta-globin.

The above description and examples fully disclose the invention, including preferred embodiments thereof. Modifications of the methods described that are obvious to those skilled in the art to which the invention pertains are intended to be within the scope of the following claims:

Claims

WHAT IS CLAIMED IS:

1. A method for producing a protein product in a recombinant host cell comprising:

• culturing a host cell which has been co-transformed with two recombinant DNA molecules,

• the first of said recombinant DNA molecules comprises a fusion protein nucleotide sequence, wherein the fusion protein encoded by said coding sequence comprises a first portion, a protein product portion and a protease cleavage site portion therebetween, and

• the second of said recombinant DNA molecules comprising a coding sequence for a protease or an active fragment or derivative thereof, capable of cleaving amino acids encoded by the protease cleavage site portion,

• whereby upon culturing said cell, both said fusion protein and said protease are expressed, and said protease cleaves said fusion protein at said protease cleavage site to thereby produce said protein product.

2. The method of claim 1 further comprising the step of isolating the protein product from the culture.

3. The method of claim 1 wherein said first recombinant DNA molecule further comprises at least one expression control sequence operatively linked with said fusion protein nucleotide sequence.

4. The method of claim 1 wherein said second recombinant DNA molecule further comprises at least one expression control sequence operatively linked with said protease nucleotide sequence.

5. The method of claim 1 wherein said two recombinant DNA molecules are incorporated into at least one expression vector.

6. The method of claim 5 wherein said at least one expression vector is two expression vectors, the first of which comprises said first recombinant DNA molecule and the second of which comprises said second recombinant DNA molecule.

7. The method of claim 6 wherein said two expression vectors are compatible in the same cell.

8. The method of claim 1 wherein said two recombinant DNA molecules are incorporated into the genome of said cell.

9. The method of claim 1 wherein said protease cleavage site is a retroviral protease cleavage site.

10. The method of claim 9 wherein said retroviral protease cleavage site is an immunodeficiency virus protease cleavage site.

11. The method of claim 10 wherein said immunodeficiency virus protease cleavage site is an HIV-1 protease cleavage site.

12. The method of claim 11 wherein said cleavage site is selected from the group consisting of

R₁.tyrosine.proline.R₂,

R₁.leucine.alanine.R₂,

R₁.methionine.methionine.R₂,

R₁.phenylalanine.leucine.R₂,

R₁.phenylalanine.proline.R₂, and

R₁.leucine.phenylalanine.R₂

wherein R₁ is (X₁)_n(X₂)_o(X₃)_p and R₂ is (X₄)_r(X₅)_s, and X₁, X₂, X₃, X₄ and X₅ are independently an amino acid and n, o, p, r and s are independently 0 or 1.

13. The method of claim 12 wherein X₁ is selected from the group consisting of serine and threonine.

14. The method of claim 9 wherein said cleavage site comprises a nucleic acid sequence coding for the peptides selected from the group consisting of

Serine.Glycine.Asparagine.Tyrosine.Proline.Isoleucine.Valine,

Serine.Phenylalanine.Asparagine.Phenylalanine.Proline.Glycine.Isoleucine; and

Threonine.Leucine.Asparagine.Phenylalanine.Proline.Isoleucine.Serine.

16. The method of claim 1 wherein said protease is a retroviral protease.

17. The method of claim 16 wherein said retrovirus is an immunodeficiency virus.

18. The method of claim 17 wherein said immunodeficiency virus is HIV-1.

19. The method of claim 1 wherein said first portion is an E. coli galactokinase gene segment.

20. The method of claim 19 wherein said E. coli galactokinase gene segment is the first 56 codons of the E. coli galactokinase gene.

21. The method of claim 1 wherein said cell is an E. coli cell.

22. A method of preparing a protein product, comprising the steps of:

a. providing a first recombinant DNA molecule comprising a fusion protein nucleotide sequence, wherein the fusion protein encoded by said coding sequence comprises a first portion, a protein product portion and a protease cleavage site portion therebetween;

b. providing a second recombinant DNA molecule comprising a coding sequence for a protease or an active fragment or derivative thereof, capable of cleaving amino acids encoded by the protease cleavage site portion;

c. incorporating said first and second recombinant DNA molecules into the same cell, said cell being capable of expressing said fusion protein and said protease;

d. culturing said cell under conditions selected to express said fusion protein and said protease, whereby said protease cleaves said fusion protein at said protease cleavage site to thereby produce said protein product.

23. The method of claim 20 wherein said incorporating step comprises incorporating said first and second recombinant DNA molecules into at least one expression vector and incorporating said at least one expression vector into a cell capable of expressing said fusion protein and said protease.

24. The method of claim 23 wherein said at least one

25. The method of claim 24 wherein said two expression vectors are compatible in the same cell.

26. The method of claim 22 wherein said incorporating comprises incorporating said first and second recombinant DNA molecules into the genome of the same cell.

27. The method of claim 22 wherein said protease cleavage site is a retroviral protease cleavage site.

28. The method of claim 27 wherein said retroviral protease cleavage site is an immunodeficiency virus protease cleavage site.

29. The method of claim 28 wherein said immunodeficiency virus cleavage site is an HIV-1 protease cleavage site.

30. The method of claim 27 wherein said retroviral cleavage site is selected from the group consisting of

R₁.tyrosine.proline.R₂,

R₁.leucine.alanine.R₂,

R₁.methionine.methionine.R₂,

R₁.phenylalanine.leucine.R₂,

R₁.phenylalanine.proline.R₂, and

R₁.leucine.phenylalanine.R₂

wherein R₁ is (X₁)_n(X₂)_o(X₃)_p and R₂ is (X₄)_r(X₅)_s, and X₁, X₂, X₃, X₄ and X₅ are independently an amino acid and n, o, p, r and s are independently 0 or 1.

31. The method of claim 30 wherein X₁ is selected from the group consisting of serine and threonine.

32. The method of claim 27 wherein said cleavage site

Serine.Glycine.Asparagine.Tyrosine.Proline.Isoleucine.Valine,

Serine.Phenylalanine.Asparagine.Phenylalanine.Proline.Glycine.Isoleucine; and

Threonine.Leucine.Asparagine.Phenylalanine.Proline.Isoleucine.Serine.

33. The method of claim 22 wherein said protease is a retroviral protease.

34. The method of claim 33 wherein said retrovirus is an immunodeficiency virus.

35. The method of claim 34 wherein said immunodeficiency virus is HIV-1.

36. A fusion protein nucleotide sequence comprising a first portion, a protein product portion and a retroviral protease cleavage site portion therebetween.

37. The nucleotide sequence of claim 36 wherein said first portion is an E. coli galactokinase gene segment.

38. The method of claim 37 wherein said E. coli galactokinase gene is the first 56 codons of the E. coli galactokinase gene.

39. The nucleotide sequence of claim 36 wherein said retroviral protease cleavage site is an immunodeficiency virus protease cleavage site.

40. The method of claim 39 wherein said immunodeficiency virus protease cleavage site is an HIV-1 protease cleavage site.

41. The nucleotide sequence of claim 36 wherein said retroviral protease cleavage site comprises nucleotides coding for the amino acid sequence selected from the group consisting of

R₁.tyrosine.proline.R₂,

R₁.leucine.alanine.R₂,

R₁.methionine.methionine.R₂,

R₁.phenylalanine.leucine.R₂,

R₁.phenylalanine.proline.R₂, and

R₁.leucine.phenylalanine.R₂

wherein R₁ is (X₁)_n(X₂)_o(X₃)_p and R₂ is (X₄)_r(X₅)_s and X₁, X₂, X₃, X₄ and

42. The method of claim 41 wherein X₁ is selected from the group consisting of serine and threonine.

43. The method of claim 36 wherein said cleavage site

Serine.Glycine.Asparagine.Tyrosine.Proline.Isoleucine.Valine,

Serine.Phenylalanine.Asparagine.Phenylalanine.Proline.Glycine.Isoleucine; and

Threonine.Leucine.Asparagine.Phenylalanine.Proline.Isoleucine.Serine.

44. The nucleotide sequence of claim 36 incorporated into a plasmid.

45. The nucleotide sequence of claim 36 incorporated into a cell.

46. A nucleotide sequence coding for HIV-1 protease or a proteolytically active fragment or derivative thereof.

47. The nucleotide sequence of claim 46 wherein said nucleotide sequence comprises the sequence or the complement of said sequence

CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA
GGAGTCTAGTGAGAAACCGTTGCTGGGGAGCAGTGTTATTTCTATCCCCCCGTT

CTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTG
GATTTCCTTCGAGATAATCTATGTCCTCGTCTACTATGTCATAATCTTCTTTACTCAAAC

CCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAG
GGTCCTTCTACCTTTGGTTTTTACTATCCCCCTTAACCTCCAAAATAGTTTCATTCTGTC

TATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGA
ATACTAGTCTATGAGTATCTTTAGACACCTGTATTTCGATATCCATGTCATAATCATCCT

CCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAAT
GGATGTGGACAGTTGTATTAACCTTCTTTAGACAACTGAGTCTAACCAACGTGAAATTTA

TTT
AAA

or proteolytic fragments, mutations, additions or deletions thereof.

48. The nucleotide sequence of claim 46 wherein said sequence or the complement of said sequence is

GCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCAGATCACTCTTTGGCAACGACCC
CGGCTATCTGTTCCTTGACATAGGAAATTGAAGGGAGTCTAGTGAGAAACCGTTGCTGGG

CTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGAT
GAGCAGTGTTATTTCTATCCCCCCGTTGATTTCCTTCGAGATAATCTATGTCCTCGTCTA

GATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGA
CTATGTCATAATCTTCTTTACTCAAACGGTCCTTCTACCTTTGGTTTTTACTATCCCCCT

ATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACAT
TAACCTCCAAAATAGTTTCATTCTGTCATACTAGTCTATGAGTATCTTTAGACACCTGTA

AAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTG
TTTCGATATCCATGTCATAATCATCCTGGATGTGGACAGTTGTATTAACCTTCTTTAGAC

TTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTA
AACTGAGTCTAACCAACGTGAAATTTAAAAGGGTAATCGGGATAACTCTGACATGGTCAT

AAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGG
TTTAATTTCGGTCCTTACCTACCGGGTTTTCAATTTGTTACC

49. The nucleic acid sequence of claim 46 wherein said sequence or the complement of said sequence is

GCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCAGATCACTCTTTGGCAACGACCC
CGGCTATCTGTTCCTTGACATAGGAAATTGAAGGGAGTCTAGTGAGAAACCGTTGCTGGG

CTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGAT
GAGCAGTGTTATTTCTATCCCCCCGTTGATTTCCTTCGAGATAATCTATGTCCTCGTCTA

GATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGA
CTATGTCATAATCTTCTTTACTCAAACGGTCCTTCTACCTTTGGTTTTTACTATCCCCCT

ATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACAT
TAACCTCCAAAATAGTTTCATTCTGTCATACTAGTCTATGAGTATCTTTAGACACCTGTA

AAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTG
TTTCGATATCCATGTCATAATCATCCTGGATGTGGACAGTTGTATTAACCTTCTTTAGAC

TTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTA
AACTGAGTCTAACCAACGTGAAATTTAAAAGGGTAATCGGGATAACTCTGACATGGTCAT

AAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGG
TTTAATTTCGGTCCTTACCTACCGGGTTTTCAATTTGTTACC

or proteolytic fragments, mutations, additions or deletions thereof.

50. The nucleic acid sequence of claim 46 wherein said sequence or the complement of said sequence is

CATATGCCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAA
GTATACGGAGTCTAGTGAGAAACCGTTGCTGGGGAGCAGTGTTATTTCTATCCCCCCGTT

CTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTG
GATTTCCTTCGAGATAATCTATGTCCTCGTCTACTATGTCATAATCTTCTTTACTCAAAC

CCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAG
GGTCCTTCTACCTTTGGTTTTTACTATCCCCCTTAACCTCCAAAATAGTTTCATTCTGTC

TATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGA
ATACTAGTCTATGAGTATCTTTAGACACCTGTATTTCGATATCCATGTCATAATCATCCT

CCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAAT
GGATGTGGACAGTTGTATTAACCTTCTTTAGACAACTGAGTCTAACCAACGTGAAATTTA

TTTTAGATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCA
AAAATCTAATCGGGATAACTCTGACATGGTCATTTTAATTTCGGTCCTTACCTACCGGGT

AAAGTTAAACAATGG
TTTCAATTTGTTACC

51. A recombinant E. coli protease product selected from the group of peptides consisting of

ProGlnIleThrLeuTrpGlnArgProLeuValThrIleLysIleGlyGlyGln
LeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluMetSerLeu
ProGlyArgTrpLysProLysMetIleGlyGlyIleGlyGlyPheIleLysValArgGln
TyrAspGlnIleLeuIleGluIleCysGlyHisLysAlaIleGlyThrValLeuValGly
ProThrProValAsnIleIleGlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsn
Phe,

AlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeuTrpGlnArgPro
LeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAsp
AspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLysMetIleGlyGly
IleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGlyIleCysGlyHis
LysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIleGlyArgAsnLeu
LeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProVal
LysLeuLysProGlyMetAspGlyProLysValLysGlnTrp,

AlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeuTrpGlnArgPro
LeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAsp
AspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLysMetIleGlyGly
IleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGlyIleCysGlyHis
LysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIleGlyArgAsnLeu
LeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProVal
LysLeuLysProGlyMetAspGlyProLysValLysGlnTrp, and

MetProGlnIleThrLeuTrpGlnArgProLeuValThrIleLysIleGlyGlyGln
LeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluMetSerLeu
ProGlyArgTrpLysProLysMetIleGlyGlyIleGlyGlyPheIleLysValArgGln
TyrAspGlnIleLeuIleGluIleCysGlyHisLysAlaIleGlyThrValLeuValGly
ProThrProValAsnIleIleGlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsn
PheEnd

52. The protease product of claim 51 wherein said protease product is a proteolytically active fragment, mutation, addition or deletion of said protease product.

53. A protease specific for a cleavage site comprising the amino acid sequence selected from the group comprising

ProGlnIleThrLeuTrpGlnArgProLeuValThrIleLysIleGlyGlyGln
LeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluMetSerLeu
ProGlyArgTrpLysProLysMetIleGlyGlyIleGlyGlyPheIleLysValArgGln
TyrAspGlnIleLeuIleGluIleCysGlyHisLysAlaIleGlyThrValLeuValGly
ProThrProValAsnIleIleGlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsn
Phe,

AlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeuTrpGlnArgPro
LeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAsp
AspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLysMetIleGlyGly
IleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGlyIleCysGlyHis
LysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIleGlyArgAsnLeu
LeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProVal
LysLeuLysProGlyMetAspGlyProLysValLysGlnTrp,

AlaAspArgGlnGlyThrValSerPheAsnPheProGlnIleThrLeuTrpGlnArgPro
LeuValThrIleLysIleGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAsp
AspThrValLeuGluGluMetSerLeuProGlyArgTrpLysProLysMetIleGlyGly
IleGlyGlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGlyIleCysGlyHis
LysAlaIleGlyThrValLeuValGlyProThrProValAsnIleIleGlyArgAsnLeu
LeuThrGlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProVal
LysLeuLysProGlyMetAspGlyProLysValLysGlnTrp, and

MetProGlnIleThrLeuTrpGlnArgProLeuValThrIleLysIleGlyGlyGln
LeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluMetSerLeu
ProGlyArgTrpLysProLysMetIleGlyGlyIleGlyGlyPheIleLysValArgGln
TyrAspGlnIleLeuIleGluIleCysGlyHisLysAlaIleGlyThrValLeuValGly
ProThrProValAsnIleIleGlyArgAsnLeuLeuThrGlnIleGlyCysThrLeuAsn
PheEnd

54. The protease of claim 53 wherein said protease is a proteolytically active fragment, mutation, addition or deletion of said protease.

55. A cell having incorporated therein two recombinant DNA molecules, one of said recombinant molecules comprising a fusion protein nucleotide sequence and the second of said recombinant DNA molecules comprising a nucleotide sequence coding for a protease capable of cleaving the fusion protein at said protease cleavage site; said fusion protein nucleotide sequence encoding a first portion, a protein product portion and a protease cleavage site portion therebetween.

56. The cell of claim 55 wherein said fusion protein nucleotide sequence and said protease nucleotide sequence are incorporated into at least one expression vector within said cell.

57. The cells of claim 55 wherein said cells have two expression vectors, one of which comprises said fusion protein nucleotide sequence, and the other of which comprises said protease nucleotide sequence.

58. The cells of claim 55 wherein said cells are E. coli cells.

59. The cells of claim 55 wherein said two recombinant DNA molecules further comprise at least one expression control sequence operatively linked with said fusion protein nucleotide sequence and said protease nucleotide sequence, respectively.