
WO1998007867A2 - Metabolically engineered lactic acid bacteria and means for providing same - Google Patents

Info

Publication number
WO1998007867A2
Authority
WO
WIPO (PCT)
Prior art keywords
ala
val
leu
gly
glu
Prior art date
Application number
PCT/DK1997/000336
Other languages
French (fr)
Other versions
WO1998007867A3 (en)
Inventor
José ARNAU
Hans Israelsen
Astrid Vrang
Flemming Joergensen
Soeren Michael Madsen
Original Assignee
Bioteknologisk Institut
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bioteknologisk Institut
Priority to EP97934442A (patent EP0938566A2)
Priority to NZ334294A (patent NZ334294A)
Priority to CA002262418A (patent CA2262418A1)
Priority to AU37659/97A (patent AU721803B2)
Publication of WO1998007867A2
Publication of WO1998007867A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004: Oxidoreductases (1.)
    • C12N9/0008: Oxidoreductases (1.) acting on the aldehyde or oxo group of donors (1.2)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004: Oxidoreductases (1.)
    • C12N9/0006: Oxidoreductases (1.) acting on CH-OH groups as donors (1.1)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10: Transferases (2.)
    • C12N9/1025: Acyltransferases (2.3)
    • C12N9/1029: Acyltransferases (2.3) transferring groups other than amino-acyl groups (2.3.1)

Definitions

  • the present invention pertains to the field of lactic acid bacterial starter cultures which are useful in the production of food products, animal feed or aroma compounds, and specifically there is provided means for metabolically engineering such lactic acid bacteria which are thereby modified in their production of metabolic end products including aroma or flavour compounds and/or compounds having antimicrobial effects.
  • Lactic acid bacteria are used extensively as starter cultures in the food industry in the manufacture of fermented products including milk products such as e.g. yoghurt and cheese, meat products, bakery products, wine and vegetable products. Lactococcus lactis is one of the most commonly used lactic acid bacteria in dairy starter cultures. However, several other lactic acid bacteria such as Leuconostoc species, Lactobacillus species and Streptococcus species are also commonly used in food starter cultures. In the art, species of the obligate anaerobic bacteria belonging to Bifidobacterium, which are taxonomically different from the group of bacteria generally referred to as lactic acid bacteria, are frequently included in the group of lactic acid bacteria due to their application as dairy starter cultures. Lactic acid bacteria are also commonly used as inoculants in feedstuffs of plant and animal origin, i.a. for preservation purposes.
  • diacetyl is one essential flavour compound which is formed during fermentation of the citrate-utilizing species of e.g. Lactococcus, Leuconostoc, and Lactobacillus.
  • Diacetyl is formed by an oxidative decarboxylation (R1, Fig. 1) of α-acetolactate which is formed from two molecules of pyruvate by the action of α-acetolactate synthase (R2, Fig. 1).
  • Pyruvate is a key intermediate of several lactic acid bacterial metabolic pathways including the citrate metabolism and the degradation of lactose or glucose to lactate.
  • the pool of pyruvate in the cells is critical for the flux through the pathway leading to diacetyl, acetoin and 2,3 butylene glycol due to α-acetolactate synthase affinity for pyruvate.
  • Overproduction of α-acetolactate synthase in Lactococcus lactis as an approach for increased production of diacetyl has been disclosed by Platteuw et al. 1995.
  • An alternative metabolic engineering approach to providing an increased pool of pyruvate in lactic acid bacteria is to block one or several pyruvate degrading pathways.
  • a Lactococcus lactis mutant defective in the lactate dehydrogenase (R3, Fig. 1) has been disclosed by Gasson et al. (ref. 8, unpublished data, in Platteuw et al. 1995). Under aerobic conditions pyruvate is accumulated in this mutant leading to the formation of increased levels of acetoin and 2,3 butylene glycol.
  • PDC pyruvate dehydrogenase complex
  • the activity of PDC appears to be optimal under aerobic conditions (Snoep et al. 1992). Consequently, the pyruvate pool presumably will be increased under anaerobic conditions by partially or completely blocking the Pfl activity.
  • an increased pyruvate pool may in turn lead to an increased flux from pyruvate towards acetoin and diacetyl via the intermediate α-acetolactate.
  • Fermented foods or feed products produced by using a starter culture with reduced Pfl activity therefore may contain an increased amount of diacetyl or other products derived from conversion of α-acetolactate.
  • starter cultures with increased Pfl activity should result in enhanced production of the antimicrobially active metabolite formate and the use of such cultures in the production of feed or food products having increased shelf life can therefore be contemplated.
  • the pfl gene has been isolated from several microorganisms including Escherichia coli, Haemophilus influenzae, Clostridium pasteurianum and Streptococcus mutans.
  • the Pfl enzyme is post-translationally activated by the Pfl activase via formation of an organic free radical on a glycine residue located at the C-terminal of Pfl (Frey et al. 1994). This modification of Pfl occurs only in the absence of oxygen.
  • act encoding the Pfl activase flanks the pfl gene in E. coli, H. influenzae and C.
  • the AdhE protein of E. coli has acetaldehyde dehydrogenase activity, catalyzing the conversion of acetyl CoA to acetaldehyde (R6, Fig. 1), and ethanol dehydrogenase activity, catalyzing the conversion of acetaldehyde to ethanol (R7, Fig. 1). Additionally, the E. coli AdhE protein is responsible for the Pfl deactivase activity.
  • In Clostridium acetobutylicum, an adhE analogue, aad, has been cloned and characterized.
  • the presence of Pfl deactivase activity could not be verified for the Aad protein, since no evidence exists for the presence of Pfl in C. acetobutylicum (Nair et al. 1994).
  • Lactic acid bacteria including Lactococcus lactis species are facultatively anaerobic organisms like E. coli, indicating that the occurrence of Pfl activase and deactivase activities in these organisms is to be expected.
  • Analysis of the expression of adhE in E. coli has shown an eightfold increase under anaerobic growth (Chen and Lin 1991).
  • the facts that the regulation of expression of pfl and adhE under anaerobic conditions is similar and that expression of act in E. coli is constitutive suggest that an equilibrium is formed between activated and deactivated Pfl under anaerobic conditions.
  • If the deactivase activity of the AdhE protein is partially or completely blocked in lactic acid bacteria, an increased Pfl activity is expected to occur while, on the other hand, a reduced Pfl activity is expected to occur if the deactivase activity is overexpressed. If the Pfl activase is blocked, a decreased Pfl activity is contemplated.
  • the acetaldehyde dehydrogenase and the ethanol dehydrogenase activities of the AdhE protein are also potential targets for metabolic engineering in lactic acid bacterial food starter cultures and cultures used in feed production or as cultures for the production of aroma compounds or antimicrobially active compounds.
  • a block or modification of the ethanol dehydrogenase activity of such cultures may result in the overproduction of acetaldehyde which is an important flavour compound in yoghurt.
  • a block of the acetaldehyde dehydrogenase activity could give rise to an increased production of acetate which in turn may result in improved preservation of fermented foods or feed products in whose production such modified cultures are used.
  • modifications of starter cultures would increase the pyruvate pool and consequently, the formation of diacetyl or other compounds derived from the conversion of α-acetolactate. Increasing one or both dehydrogenase activities will most likely direct the conversion of acetyl CoA from acetate to acetaldehyde or ethanol.
  • the starting point for the invention is the achievement of the isolation and sequencing of the entire adhE and pfl genes of Lactococcus lactis.
  • the present invention provides novel means for metabolically engineering lactic acid bacteria, and lactic acid bacteria being modified by such means.
  • the invention relates in a first aspect to an isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase
  • the invention pertains to a recombinant replicon comprising the above DNA sequence and to a recombinant lactic acid bacterial cell comprising such a replicon.
  • an isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having pyruvate formate-lyase activity, subject to the limitation that the sequence is not derived from oral Streptococcus species, a recombinant replicon comprising such a DNA sequence and a recombinant lactic acid bacterial cell comprising such a replicon.
  • the invention relates to a method of producing a lactic acid bacterial metabolite, the method comprising cultivating a lactic acid bacterium comprising a DNA sequence as defined above which is modified so as to inactivate or reduce or enhance the expression of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity, or a lactic acid bacterium comprising a DNA sequence which is modified whereby its production of pyruvate formate-lyase is reduced or inhibited, or whereby the enzyme is expressed in a modified form having a reduced
  • the invention also pertains to methods of producing a food product or an animal feed, the method comprising the step of admixing to the food product or feed starting materials a starter culture of a lactic acid bacterium according to the invention and keeping the mixture under conditions allowing the starter culture to be metabolically active.
  • the facultative anaerobe Escherichia coli is capable of carrying out mixed-acid fermentation during anaerobic growth in the absence of exogenous electron acceptors.
  • a major fermentation product is ethanol which is synthesized from acetyl CoA by two consecutive NADH-dependent reductions catalyzed by a single polypeptide, AdhE, with an acetaldehyde dehydrogenase (ACDH) domain and alcohol dehydrogenase (ADH) domain. It has also been found that this polypeptide is responsible for pyruvate formate-lyase deactivase activity.
  • the present invention provides, as mentioned above, in its first aspect an isolated DNA sequence which comprises a sequence derived from a lactic acid bacterium, which sequence codes for a multi-functional polypeptide having at least one of the following enzymatic activities: (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
  • the coding sequence for the multifunctional polypeptide is also referred
  • the DNA sequence coding for the multi-functional polypeptide may be derived from any lactic acid bacterium.
  • lactic acid bacterium designates gram-positive, microaerophilic or facultatively anaerobic bacteria which ferment sugars with the production of acids including lactic acid as the predominantly produced acid, acetic acid and propionic acid.
  • the industrially most useful lactic acid bacteria are found among Lactococcus species, Streptococcus species, Lactobacillus species, Leuconostoc species and Pediococcus species.
  • the strict anaerobic Bifidobacterium species which are commonly used in the manufacture of dairy products, are included in the group of lactic acid bacteria.
  • the group of lactic acid bacteria comprises so-called mesophilic species which have optimum growth temperatures in the range of 15-30°C and which in many cases do not grow at temperatures exceeding 35-40°C.
  • Other groups of lactic acid bacteria have higher growth temperatures, in particular species for which humans and/or animals are the natural habitat, e.g. Enterococcus species, oral streptococci and pathogenic streptococci.
  • the above DNA sequence is derived from Lactococcus lactis including Lactococcus lactis subspecies lactis, Lactococcus lactis subspecies diacetylactis (also frequently referred to as Lactococcus lactis subspecies lactis biovar diacetylactis) and Lactococcus lactis subspecies cremoris.
  • the lactic acid bacterium-derived DNA sequence codes for a multifunctional polypeptide that is at least 30% identical with the gene products of the adhE gene of E. coli (FASTA, GCG Wisconsin accession No. P17547) or the aad gene of Clostridium acetobutylicum (FASTA, GCG Wisconsin accession No. P33744) or the gene product of the sequence of Table 1.4 herein (SEQ ID NO:3).
  • the identity to such other gene products is at least 40%, such as at least 50%, such as at least 60% identity or even at least 70% identity.
  • amino acid similarity indicates that a particular amino acid in a polypeptide sequence can be replaced by another amino acid having similar physical/chemical characteristics such as charge or polarity characteristics.
  • the sequence according to the invention which codes for the AdhE protein also includes such a coding sequence of lactic acid bacterial origin which hybridizes to the adhE coding sequence from L. lactis strain DB1341 under the following conditions: hybridization overnight at 65°C followed by washing the filters twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes.
  • the DNA sequence according to the invention comprises the sequence as shown herein in Table 1.4
  • acetaldehyde dehydrogenase activity whereby acetyl CoA is converted into acetaldehyde
  • ADH alcohol dehydrogenase
  • capability of converting acetyl CoA into ethanol and pyruvate formate-lyase deactivase activity.
  • the above term "mutant or variant" is used to designate any naturally occurring or constructed nucleotide modification of the above DNA sequence which still allows a polypeptide having at least one of the defined activities to be expressed by the thus modified sequence.
  • the modification may consist in one or more nucleotide substitutions in one or more codons, resulting in the translation of the same or different amino acid(s), or the modification may be in the form of the insertion or deletion of one or more nucleotides/codons.
  • the modifications can be provided by any conventional method including, where appropriate, modifications hereof, such as e.g. the use of restriction enzymes or random or site-directed mutagenesis, e.g.
  • DNA sequence according to the invention may also be provided as a synthetically produced sequence or it may be a hybrid sequence comprising in part a native sequence and in part a synthetically prepared sequence. Additionally, the above term "mutant and variant" includes any mutein of the sequence.
  • the above lactic acid bacterial DNA sequence whether in its native form or in a modified mutant or variant form may further comprise one or more sequences that regulate the expression of the coding sequence.
  • Such regulatory sequences may be located upstream and/or downstream of the coding sequence or they can be placed on a different replicon, i.e. in trans.
  • the regulatory sequences may be sequences which are natively associated with the coding sequence or they may be inserted or modified promoter sequences not natively associated with the coding sequence, which can be operably linked to the coding sequence.
  • Such sequences which are not natively associated with the coding sequence may be derived from the bacterial strain which is the source of the coding sequence or from a different organism.
  • a regulatory sequence includes a promoter/operator sequence, a ribosome binding site, a sequence coding for a gene product which either enhances or inhibits the expression of the coding sequence, such as a repressor or activator substance including e.g. an RNA sequence including an antisense RNA, a terminator sequence or a leader sequence regulating the excretion of the above multifunctional enzyme product.
  • a promoter which is derived from a different organism or from the same organism may, depending on the desired characteristics of the resulting bacterial cell, have a stronger or a weaker promoter activity than the promoter with which the coding sequence is natively associated.
  • the coding sequence is under the control of a regulatable promoter.
  • a regulatable promoter is used to describe a promoter sequence, the activity of which is dependent on physical or chemical factors present in the medium where organisms comprising the above coding sequence and its regulatory sequences are cultivated. Such factors include the cultivation temperature, the pH and/or the arginine content of the medium, a temperature shift eliciting the expression of heat shock genes, the composition of the growth medium including the ionic strength/NaCl content and the growth phase/growth rate of the host cell and stringent response.
  • a promoter sequence as defined above may further comprise sequences whereby the activity of the promoter becomes regulated.
  • such further sequences may provide a regulation by a stochastic event and may e.g. be sequences, the presence of which results in a recombinational excision of the promoter or of genes coding for substances which are positively needed for the promoter function.
  • the invention relates, as it is mentioned above, to a recombinant replicon comprising the above DNA sequence coding for the multifunctional polypeptide.
  • replicon designates a DNA sequence which is capable of autonomous replication in a lactic acid bacterium.
  • a replicon can be selected from a plasmid capable of replicating in a lactic acid bacterium, a lactic acid bacterial chromosome and a bacteriophage derived from a lactic acid bacterium.
  • the replicon may comprise further sequences including marker sequences and linker sequences for the insertion of genes coding for desirable gene products.
  • the replicon may comprise a gene coding for a lipase, a peptidase, a gene coding for a gene product involved in carbohydrate or citrate metabolism, a gene coding for a gene product involved in bacteriophage resistance or a gene coding for a lytic enzyme or a gene coding for a bacteriocin such as e.g. nisin or pediocin.
  • the gene may also be one which codes for a gene product conferring resistance to an antibiotic.
  • the gene coding for a desired gene product may be a homologous gene, i.e. a gene isolated from the same species as the host cell for the replicon, or a heterologous gene including a gene isolated from a lactic acid bacterial species which is of a species different from the host cell.
  • the invention also provides a recombinant lactic acid bacterial cell comprising the above replicon.
  • a host cell may be derived from any species of lactic acid bacteria as defined herein, such as a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
  • the above lactic acid bacterial cell is useful in starter culture compositions for the manufacturing of food products including dairy products, meat products, wine, vegetables and bakery products, or in the preservation of animal feed.
  • the present recombinant lactic acid bacterial cells are particularly useful as inoculants in field crops which are to be ensiled or as preserving agents in feedstuff components of animal origin such as waste products from the slaughtering and fish processing industries.
  • Such concentrates may be provided as starter culture compositions comprising further suitable components such as e.g. preserving agents, stabilizing agents, cryoprotectants, nutrients, bacterial growth factors or further active components including enzymes.
  • probiotically active indicates that the bacteria selected for this purpose have characteristics which enable them to colonize in the gastrointestinal tract and thereby exert a beneficial regulatory effect on the microbial flora in this habitat. Such an effect may be recognizable as an improved food or feed conversion in humans or animals to which the cells are administered, or as an increased resistance against invading pathogenic microorganisms.
  • the above lactic acid bacterial cell can also be provided in the form of a culture for the production of an aroma or antimicrobially active compound.
  • the above lactic acid bacterial cell is one wherein the DNA sequence comprising the sequence coding for the multifunctional polypeptide is modified so as to inactivate or reduce the production of or the activity of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
  • a DNA modification can be in the form of deletion, insertion or substitution of one or more nucleotides in the coding sequence possibly leading to the translation of a polypeptide having a modified amino acid composition.
  • a modified polypeptide may have lost one or more of the above enzymatic activities or it/they may be reduced.
  • An inactivation of the coding sequence may also be obtained by random or site-directed mutagenesis, e.g. using a transposable element which is integratable in the replicon comprising the coding sequence.
  • Another useful means of providing inactivated mutants is Campbell-like homologous integration as it is described in the below examples.
  • the level of production of the multi-functional polypeptide can also be reduced by modifying or regulating regulatory sequences controlling the expression of the gene coding for the polypeptide.
  • a native constitutive promoter can be replaced by a regulatable promoter, the function of which can be reduced or inhibited under appropriate conditions such as those physical and chemical promoter regulating factors as mentioned above.
  • a native promoter which is in itself regulatable by certain factors may be replaced by another regulatable promoter which is negatively regulatable by other factors present in the cultivation medium for the recombinant cell.
  • a lactic acid bacterial cell which is modified as described above in one or more of its glycolytic pathways can be characterized as a metabolically engineered cell.
  • Dependent on the type and the site of the DNA modification such a cell will be at least partially blocked in one or more of the above pathways catalyzed by the multi-functional polypeptide (R6/R7 in Fig. 1) and/or the pyruvate formate-lyase deactivase activity will be reduced or blocked.
  • a metabolically engineered cell may as a result of these modifications produce increased amounts of i.a. acetaldehyde, ethanol and/or acetate.
  • the above lactic acid bacterial cell is one wherein the DNA sequence comprising the sequence coding for the multi-functional polypeptide is modified so as to enhance the production of or the activity of at least one of its native enzymatic activities as defined above. It is contemplated that such a modification can be provided by appropriate modifications of the coding sequence itself which result in an enhanced production level of the polypeptide and/or the production of a modified polypeptide having an enhanced activity of at least one of its native activities. Such modification can be made by substitution, deletion or insertion of one or more nucleotides using any conventional methods for such DNA modifications, including random or site-directed mutagenesis followed by selection of the desired mutants.
  • a lactic acid bacterial cell having enhanced production of and/or enhanced activity of at least one of its native enzymatic activities can be provided by suitable modifications of sequences regulating the production and/or the activity of the multifunctional polypeptide.
  • one suitable modification of sequences regulating the production and/or the activity of the multifunctional polypeptide is to operably link the coding sequence to a promoter sequence having a stronger promoter activity than the native promoter for the coding sequence.
  • such an inserted promoter is regulatable by a factor as mentioned above and the expression of the polypeptide can then be enhanced by cultivating the cell in the presence of a factor which mediates a strong promoter activity.
  • an enhanced production of the AdhE polypeptide in a host cell can be obtained by using a replicon which occurs in a high copy number in that host cell.
  • such a metabolically engineered lactic acid bacterial cell having enhanced production of and/or enhanced activity of at least one of its native enzymatic activities will produce increased amounts of at least one metabolite selected from the group consisting of acetaldehyde, ethanol, formate, acetate, acetoin, diacetyl and 2,3 butylene glycol.
  • such metabolically engineered cells have a production of one or more of these metabolites which, in comparison with a wild type strain, is at least 2-fold higher such as at least 5-fold higher, e.g. at least 10-fold higher or even at least 20-fold higher.
  • the present invention relates in a still further aspect to an isolated lactic acid bacterial DNA sequence that comprises a sequence coding for a polypeptide having pyruvate formate-lyase activity, i.e. a pfl gene.
  • a DNA sequence further comprises at least one regulatory sequence operably linked to the coding sequence and regulating the production of the pyruvate formate-lyase polypeptide or coding for a gene product regulating the pyruvate formate-lyase activity of the polypeptide.
  • the gene product of pfl will also be referred to as a Pfl polypeptide.
  • regulatory sequences may be located upstream and/or downstream of the coding sequence.
  • the regulatory sequences may be sequences which are natively associated with the coding sequence or they may be inserted or modified promoter sequences not natively associated with the coding sequence, but which can be operably linked to the coding sequence.
  • Such sequences which are not natively associated with the coding sequence may be derived from the bacterial strain which is the source of the coding sequence or from a different organism.
  • regulatory sequences include a promoter sequence, a ribosome binding site, a sequence coding for a gene product which either enhances or inhibits the expression of the coding sequence, such as a repressor or activator substance including e.g. an antisense RNA, a transcription terminator sequence or a leader sequence directing the excretion of the Pfl polypeptide.
  • the coding sequence is under the control of a regulatable promoter as defined hereinbefore and being regulatable as also described above.
  • the activity of the pyruvate formate-lyase enzyme can be regulated or modulated under anaerobic conditions by the presence or absence of an activase and a deactivase, respectively.
  • the DNA sequence comprising the sequence coding for the Pfl polypeptide preferably comprises sequences coding for a pyruvate formate-lyase activase (act gene) and/or a pyruvate formate-lyase deactivase.
  • such a deactivase is a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity as defined hereinbefore.
  • the Pfl-encoding DNA sequence can be derived from any lactic acid bacterium including a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species, a Leuconostoc species and a Lactococcus species such as Lactococcus lactis including Lactococcus lactis subspecies lactis, Lactococcus lactis subspecies lactis biovar diacetylactis and Lactococcus lactis subspecies cremoris.
  • the Pfl polypeptide as encoded by the pfl gene of Lactococcus lactis subspecies lactis biovar diacetylactis strain DB1341 comprises 787 amino acids (Table 3.2 below) (SEQ ID NO: 15) and has a deduced molecular weight of 89.1 kDa.
  • This polypeptide shows considerable identity with known pfl gene products (Table 3.1).
  • the corresponding pfl gene in Lactococcus lactis subspecies lactis MG1363 differs from the DB1341 gene in only about 5% of the nucleotides.
  • the DNA sequence comprising a Pfl encoding sequence comprises the coding sequence as shown in Table 3.2 below (SEQ ID NO: 15), the sequence designated mg1363-pfl as shown in Table 3.6 (SEQ ID NO: 22) and the sequence shown in Table 5.3 (SEQ ID NOS: 36 and 38), or a DNA sequence which is a mutant or variant hereof which codes for a polypeptide having pyruvate formate-lyase activity, the term "mutant or variant" being used in the same manner as defined hereinbefore.
  • a pfl gene as defined herein encompasses any of the specific sequences as exemplified in the following and a lactic acid bacterial sequence coding for a polypeptide having the enzymatic activity of the gene products of such isolated sequences which has a DNA homology of at least 50% with the coding sequence of the pfl gene of L. lactis strains DB1341 or MG1363, such as at least 60% homology, including at least 70% homology or at least 80% homology, e.g. at least 90% homology.
  • the lactic acid bacterium-derived DNA sequence codes for a Pfl protein that is at least 30% identical with the gene products of the pfl gene of Streptococcus mutans (FASTA, GCG Wisconsin, Accession No. D50491) or the pfl gene of Hemophilus influenzae (FASTA, GCG Wisconsin, Accession Nos. U32812 and L42023) or the gene product of the sequence of Table 3.2 herein (SEQ ID NO: 15).
  • the identity to such gene products is at least 40%, such as at least 50%, such as at least 60% identity or even at least 70% identity.
  • the homology between the above gene products may also be expressed in terms of amino acid similarity in which case the similarity suitably is at least 60%, such as at least 70%, e.g. at least 80% similarity.
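The identity thresholds above (at least 30%, 40%, 50%, 60% or 70%) can be illustrated with a minimal sketch of how percent identity is counted over a pairwise alignment. This is a simplified definition for illustration only: the FASTA program of the GCG Wisconsin package computes identity over its own alignment, and the two toy sequences below are hypothetical, not taken from the patent.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the aligned columns of two gapped sequences.

    Both inputs must already be aligned (same length, '-' marks a gap).
    Columns where both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1  # identical residues (a gap never matches here)
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the alignment reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy alignment, for illustration only.
toy_a = "MKV-LAGE"
toy_b = "MRVALA-E"
identity = percent_identity(toy_a, toy_b)
print(f"identity: {identity:.1f}%")
for t in (30, 40, 50, 60, 70):
    print(f"at least {t}%: {meets_threshold(toy_a, toy_b, t)}")
```

Similarity scores, as opposed to identity, would additionally count conservative substitutions using a scoring matrix, which this sketch does not attempt.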
  • the DNA sequence coding for the Pfl polypeptide may also be a coding sequence of lactic acid bacterial origin that hybridizes to the pfl encoding sequence isolated from L. lactis strain MG1363, under the following conditions: hybridization overnight at 65°C followed by washing the filter twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes.
  • L. lactis open reading frames may be identified upstream of the coding region for the Pfl polypeptide. Such open reading frames were designated orfA and it was found that the gene products hereof have a function in the transport of formate across cell membranes. Thus, it was found that a mutant strain of L. lactis wherein the open reading frame had been disrupted showed an increased tolerance to the toxic formate analogue, hypophosphite.
  • a recombinant replicon comprising the above Pfl-encoding DNA sequence.
  • a replicon can be derived from a plasmid, a lactic acid bacterial bacteriophage or a lactic acid bacterial chromosome.
  • the invention relates to a recombinant lactic acid bacterial host cell comprising such a replicon.
  • the cell can be selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
  • the lactic acid bacterial cell may conveniently be provided in the form of a starter culture composition for use in the manufacturing of food products as described above. It is also contemplated that the above cells may be used as probiotically active cultures or as inoculants in animal feed preservation. In this connection, a particular use is as inoculants in field crops or animal waste materials which are subjected to an ensiling process.
  • the above lactic acid bacterial cell is one wherein the DNA sequence coding for pyruvate formate-lyase activity is modified whereby the production of the pyruvate formate-lyase is reduced or eliminated or whereby the enzyme is produced in a modified form having a reduced pyruvate formate-lyase activity.
  • Such a modification can, as it has been described above for a cell comprising a sequence coding for the AdhE polypeptide, be made by methods which are known per se in the art.
  • a DNA modification can e.g. be made by deletion, insertion or substitution of one or more nucleotides in the coding sequence possibly leading to the expression of a polypeptide having a modified amino acid composition.
  • An inactivation of the coding sequence can also be obtained by random or site-directed mutagenesis, e.g. by using a transposable element which is integratable in the replicon comprising the coding sequence.
  • Another possible means of providing Pfl-inactivated (pfl⁻) mutants is Campbell-like homologous integration.
  • the level of expression of the Pfl polypeptide can also be reduced by modifying or regulating regulatory sequences controlling the production of the polypeptide.
  • a native constitutive promoter can be replaced by a regulatable promoter, the function of which can be reduced or inhibited under appropriate conditions such as those physical and chemical promoter regulating factors as mentioned hereinbefore.
  • a native promoter which is in itself regulatable by certain factors may be replaced by another regulatable promoter which is negatively regulatable by other factors present in the cultivation medium for the recombinant cell .
  • a cell being modified in this manner will be a metabolically engineered cell, since under conditions where the pyruvate formate-lyase is normally metabolically active as shown in Fig. 1 such a modified cell will lack one of the major pathways whereby the pyruvate pool is normally consumed. This will result in a modification of the metabolic pathways based on pyruvate including an enhanced flux towards α-acetolactate which is a precursor substance for diacetyl, acetoin and 2,3 butylene glycol. Such a cell is particularly useful in dairy starter cultures where such flavour compounds are generally desirable.
  • the lactic acid bacterial cell according to the invention is a cell wherein the DNA sequence comprising the sequence coding for pyruvate formate-lyase is modified so that the production of the pyruvate formate-lyase is enhanced or so that the enzyme is produced in a modified form having an increased pyruvate formate-lyase activity.
  • a modification can be provided by appropriate modifications of the coding sequence itself which result in an enhanced production level of the Pfl polypeptide and/or the production of a modified polypeptide having an enhanced activity of at least one of its native activities.
  • modifications can be made by substitution, deletion or insertion of one or more nucleotides using any conventional methods for such DNA modifications, including random or site-directed mutagenesis followed by selection of the desired mutants.
  • a lactic acid bacterial cell having enhanced production of and/or enhanced activity of pyruvate formate-lyase can be provided by suitable modifications of sequences regulating the expression of the pfl gene and/or the activity of the enzyme.
  • One suitable manner whereby this can be obtained is by operably linking the coding sequence to a promoter sequence having a stronger promoter activity than the native promoter for the coding sequence.
  • such an inserted promoter is regulatable by a factor as mentioned above and the production of the polypeptide can then be enhanced by cultivating the cell in the presence of a factor which confers a strong promoter activity. It is contemplated that a thus modified lactic acid bacterial cell produces increased amounts of formate and/or acetate.
  • Enhanced production of the Pfl polypeptide may also be obtained in a host by using a replicon which occurs in a high copy number in that host cell or by chromosomal amplification.
  • a recombinant lactic acid bacterial cell comprising both the DNA sequence comprising the above sequence coding for an AdhE polypeptide, and the above sequence comprising a sequence coding for pyruvate formate-lyase, in both instances including sequences regulating the production and/or the activity of the enzyme activities.
  • the term "recombinant" implies that at least one of the coding sequences or regulatory sequences is not a naturally occurring sequence. The sequences may be located on the same replicon or they may be on separate replicons.
  • At least one of the sequences of the above cell is modified so as to modify the production of the pyruvate formate-lyase or the activity hereof, or the distribution of the amounts of end products resulting from the lactose and/or citrate metabolism of the cell.
  • a lactic acid bacterium which is metabolically engineered in accordance with the invention so that it has an enhanced production of one or more metabolites is useful in a method of producing such a metabolite or such metabolites.
  • the method comprises cultivating a lactic acid bacterium which is metabolically engineered in accordance with the invention under conditions where the metabolite is produced, and isolating the metabolite from the culture.
  • the isolation of the metabolite may be carried out according to any conventional methods of recovering the particular substance, such as e.g. distillation.
  • the lactic acid bacterial cells according to the invention are useful as food starter cultures.
  • the invention also provides a method of producing a food product, the method comprising the step of admixing to the food product starting materials a starter culture of a lactic acid bacterium as defined above and keeping the mixture under conditions allowing the starter culture to be metabolically active.
  • the use of a starter culture which is metabolically engineered in accordance with the invention will, depending on the type of metabolite modifications, result in a food product having an improved flavour and/or a product which has an improved shelf life due to an enhanced production of antimicrobially active metabolites by the starter culture.
  • Fig. 1 illustrates selected metabolic pathways in citrate fermenting lactic acid bacteria
  • Fig. 2 shows an overview of the cloned L. lactis DB1341 adhE gene (open arrow), the sequence strategy for clone 1 (box in middle) and the regions covered by the λZAP clones adhE1 and adhE3 (bottom).
  • the nucleotide position of relevant restriction sites is shown (top).
  • the position of PCR and sequencing primers is shown as small open arrows.
  • a putative transcription terminator present downstream of the stop codon is shown as a circle.
  • the rbs box shows the position of a consensus lactococcal ribosome binding site.
  • Arrows show the sequencing strategy for clone 1 (middle);
  • Fig. 3 shows an overview of the cloned L. lactis DB1341 adhE gene fragment (open arrow) .
  • the nucleotide position of relevant restriction sites is shown (top) .
  • the position of PCR and sequencing primers is shown as small open arrows.
  • a putative transcription terminator present downstream of the stop codon is shown as a circle.
  • the rbs box shows the position of a consensus lactococcal ribosome binding site.
  • the cloned PCR fragments of the L. lactis MG1363 adhE gene are shown as lines (MGadhESTART and MGadhESTOP) .
  • the PCR fragments used to clone into pSMA500 for gene inactivation in strain DB1341 are shown as open boxes (pSMAKAS4 and pSMAKAS5) ;
  • Fig. 4 is an overview of the cloned Lactococcus lactis DB1341 strain (L. lactis subspecies lactis biovar diacetylactis) pfl gene (open arrow box).
  • the nucleotide positions of relevant restriction sites are shown (top).
  • the position of PCR and sequencing primers is shown as small open arrows.
  • a putative ribosome binding site is shown (rbs box), and a transcription terminator present downstream of the stop codon is shown as a circle.
  • the pfl1 (open box) shows the fragment of the λZAP clone of the DB1341 genomic library containing a pfl gene fragment.
  • the cloned PCR fragment of the L. lactis subspecies lactis MG1363 pfl gene is shown as a line (MGpfl1).
  • a Sau3AI fragment used for gene inactivation in strain DB1341 is shown as an open box (pSMAKAS7).
  • the pfl region included in the fragment as obtained by inverse PCR from DB1341 using EcoRI digestion and primers pfl1-250 and pfl1-390 is shown as a dotted box (pflup-1);
  • Fig. 5 is a genetic map of the L. lactis MG1363 adhE locus including the orfB open reading frame. In the upper part are indicated primer sequences;
  • Fig. 6 illustrates the structure of the L. lactis OrfA protein.
  • the shadowed box at the terminal region of OrfA depicts the area covered by the internal orfA fragment used for gene inactivation.
  • the two transmembrane regions were identified using the PredictProtein server at the EMBL, Heidelberg, Germany;
  • Fig. 7 illustrates expression of orfA in L. lactis .
  • A genetic map of orfA showing the region covered by the probe (thick line below orfA) used in expression studies and in the construction of a null mutant strain.
  • B Northern blot analysis. RNA isolated from MG1363 was hybridized to the orfA probe. Lane 1: exponential culture in GM17, aerobic; lane 2: same, anaerobic; lane 3: stationary culture in GM17, aerobic; lane 4: same, anaerobic; lane 5: exponential culture in GalM17, aerobic; lane 6: same, anaerobic. The transcript size is shown in kb to the left. The autoradiogram was exposed for 14 days; Fig.
  • Fig. 9 shows a genetic map of the L. lactis MG1363 pfl gene, showing the region used as a probe in the identification of pfl homologues in other lactic acid bacteria, including the position of EcoRI sites;
  • Fig. 10 shows autoradiograms from Southern hybridization of genomic DNA from non-Lactococcus lactic acid bacteria to a L. lactis pfl probe. Lane 1: L. lactis MG1363; lane 2: Streptococcus thermophilus; lane 3: Leuconostoc mesenteroides; lane 4: Lactobacillus acidophilus. Bands are shown in kb. Filters were exposed 2 h (A) or overnight (B);
  • Fig. 11 illustrates two Sau3AI fragments including most of the L. lactis strain DB1341 adhE coding sequence used in Southern hybridization experiments with EcoRI-digested genomic DNA from non-Lactococcus lactic acid bacteria;
  • Fig. 12 illustrates detection of adhE homologues in other lactic acid bacteria by Southern hybridization experiments with EcoRI-digested genomic DNA from non-Lactococcus lactic acid bacteria. Lane 1: L. lactis MG1363; lane 2: S. thermophilus; lane 3: L. mesenteroides; lane 4: L. acidophilus. Bands are shown in kb. Filters were exposed overnight.
  • EXAMPLE 1
  • a genomic library was constructed by cloning partially Sau3AI-digested chromosomal DNA from strain DB1341 into BamHI-digested pSMA500 (Madsen et al. 1996) and transforming into E. coli MC1000 by electroporation (Sambrook et al., 1989).
  • Strain DB1341 was kindly provided by Chr. Hansen A/S, Hørsholm, Denmark.
  • the genomic library consisted of about 10,000 independent recombinant clones with an average insert size of 4 kb.
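As a rough plausibility check on the library size stated above, the probability that a given locus is represented can be sketched with the standard Clarke-Carbon estimate, P = 1 - (1 - f)^N, where f is the insert size divided by the genome size. The genome size used below (about 2,500 kb for L. lactis) is an assumption that does not appear in the text, and the estimate ignores cloning bias and the non-randomness of partial Sau3AI digestion.

```python
import math


def coverage_probability(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Probability that a given locus is present in a random library."""
    fraction = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability: float, insert_kb: float, genome_kb: float) -> int:
    """Number of independent clones needed to reach the given probability."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


N_CLONES = 10_000          # from the text: about 10,000 independent clones
INSERT_KB = 4.0            # from the text: average insert size of 4 kb
GENOME_KB = 2_500.0        # ASSUMPTION: approximate L. lactis genome size

fold_redundancy = N_CLONES * INSERT_KB / GENOME_KB
p_locus = coverage_probability(N_CLONES, INSERT_KB, GENOME_KB)
n_for_99 = clones_needed(0.99, INSERT_KB, GENOME_KB)

print(f"fold redundancy: {fold_redundancy:.0f}x")
print(f"P(locus represented): {p_locus:.7f}")
print(f"clones needed for 99% coverage: {n_for_99}")
```

Under these assumptions the library is far larger than the few thousand clones needed for 99% coverage, so failing to recover a gene would more likely reflect cloning bias than sampling.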
  • a mixed culture, containing all clones obtained, was grown in LB + erythromycin (erm, 50 µg/ml) and plasmid DNA was isolated for genetic complementation.
  • E. coli strain NZN111 (pfl⁻; ldh::Tn5; kanR) is unable to grow in the absence of O2 due to the accumulation of NADH derived from the lack of fermentative enzyme activities encoded by the pfl and ldh genes (Mat-Jan et al., 1989).
  • protein extracts of clone 1 were used in a modified "Ldh" assay (Crow and Pritchard 1977), where the pyruvate-dependent conversion of NADH to NAD is monitored, to ensure that complementation of the fermentative defects in strain NZN111 had occurred.
  • Protein extraction was carried out by adding 100 µl 100 mM MOPS buffer (pH 6.5); 2% Triton X-100 to the cell pellet from 1.5 ml stationary cultures grown in LB + erm (50 µg/ml) which had been washed in fresh ice cold LB, and frozen at -80°C for 15 min. Pellets were dissolved and transferred to Eppendorf tubes.
  • Lysozyme (5 mg) was added and samples were incubated on ice for 30 min. Subsequently, glass beads (100 µm, Sigma; 100 µl) were added and samples were vortexed for 30 sec and kept on ice for 30 sec. This step was repeated 10-15 times, and samples were centrifuged at maximum speed for 2 min. Supernatants were transferred to a new Eppendorf tube and kept at -80°C until assayed. To measure NADH oxidation, the following components were mixed in a quartz cuvette: 700 µl 100 mM MOPS, pH 6.5; 100 µl 120 mM Na-pyruvate; 50 µl 2.56 mM NADH and 50 µl H2O.
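The final concentrations in the cuvette follow from the listed volumes and stock concentrations by simple dilution. The sketch below works this out for the four components named in the text; the volume of protein extract added to start the reaction is not stated here, so the figures describe the mix before the extract is added, and the variable names are illustrative only.

```python
# (name, volume in µl, stock concentration in mM or None for plain water),
# exactly as listed in the assay description.
ASSAY_COMPONENTS = [
    ("MOPS, pH 6.5", 700.0, 100.0),
    ("Na-pyruvate", 100.0, 120.0),
    ("NADH", 50.0, 2.56),
    ("H2O", 50.0, None),
]


def final_concentrations(components):
    """Return (total volume in µl, {name: final concentration in mM})."""
    total_ul = sum(volume for _, volume, _ in components)
    final = {
        name: stock * volume / total_ul
        for name, volume, stock in components
        if stock is not None
    }
    return total_ul, final


total_ul, final_mM = final_concentrations(ASSAY_COMPONENTS)
print(f"total volume: {total_ul:.0f} µl")
for name, conc in final_mM.items():
    print(f"{name}: {conc:.3f} mM")
```

Adding a sample volume would lower each concentration proportionally; the function can be rerun with an extra water-like entry to account for it.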
  • Plasmid DNA was isolated from clone 1 and used to retransform E. coli NZN111.
  • Duplicate LB + erm plates were incubated (i) aerobically for 4 days or (ii) anaerobically for 2 days and then 2 days aerobically at 37°C. A similar number of transformants was obtained in both procedures (see Table 1.1 below). Thus, clone 1 did not result from artifact cloning and can indeed complement the defect in strain NZN111.
  • Table 1.1 Retransformation of clone 1 into E. coli NZN111
  • NZN111 competent cells were electroporated with the corresponding plasmid, and one half of the cell mixture was plated onto LB + kan + erm and incubated without O2
  • Clone 1 was further characterized by restriction enzyme analysis and included a 2.2 kb insert. Sequence analysis determined that it included a 1.7 kb fragment of an open reading frame (ORF) showing homology to the E. coli adhE gene disclosed by Goodlove et al., 1989. The sequence of the 2.2 kb insert is shown in Table 1.2 below (SEQ ID NO: 1).
  • Table 1.2. Sequence of the insert in clone 1
  • Sau3AI recognition sites are indicated above the sequence. DNA homology to the E. coli adhE starts at nucleotide position 262 (data not shown). A Sau3AI fragment with 100% homology to the 23S rRNA of L. lactis is shown doubly underlined at the top (positions 1-173). Putative expression signals functional in E. coli are shown: -35, -10 promoter regions (underlined); Shine Dalgarno (SD, doubly underlined) and putative start codon (bold, discontinuous underline). The amino acid sequence of the open reading frame is given in one-letter code. The open reading frame ends in the multiple cloning site of vector pSMA500 (doubly underlined at bottom) (Madsen et al., 1996).
  • E. coli AdhE is a multi-functional protein consisting of 890 amino acids that catalyzes the conversion of acetyl CoA into ethanol and has acetaldehyde-DHase (ACDH) and alcohol-DHase (ADH) activities. Additionally, AdhE shows Pfl deactivase activity involved in the inactivation of pyruvate formate-lyase, a key enzyme in anaerobic metabolism (Knappe et al. 1991).
  • clone 1 includes the ADH domain of a L. lactis AdhE homologue, and it contains expression signals necessary for expression in E. coli (Shine Dalgarno and -35 and -10 regions) .
  • the putative gene product of 427 amino acids is highly homologous to a number of other iron-dependent ADHs. Comparison at the protein level showed a 41.4% identity (78% similarity) with E. coli AdhE, in addition to significant homology to other ADHs of both eukaryotic and prokaryotic origin (Table 1.3).
  • Table 1.3. Homology search FASTA. GCG Wisconsin package version 8. Genetics Computer Group
  • The region of homology to AdhE corresponds to the central region, where the ADH domain is possibly located. Only homology to the best score is shown.
  • Clone adhE-1 included a 1.7 kb insert that was identical to the adhE fragment of clone 1 (position 262-2054 in Table 1.2) .
  • Clone adhE-3 contained a 4 kb insert spanning from the Sau3AI site at position 1296 in Table 1.2. This fragment could harbour the 3'-end of the L. lactis adhE gene. Sequence analysis of this clone confirmed that it included the 3'-end of the L. lactis adhE gene, which ends with a double stop codon (TAATAA, position 2854-2859 in Table 1.4 below). Downstream from this position, a possible transcription terminator was found (position 2883-2905 in Table 1.4).
  • the L. lactis adhE gene of strain DB1341 encodes a 903 amino acid long protein, as deduced from the DNA sequence (Table 1.5), with an estimated molecular weight of 98.2 kDa.
  • a putative ribosome binding site (AAAGGAG, position 127-133 in Table 1.4) is found 11 bp upstream of the start codon (de Vos and Simons 1994).
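The coordinates quoted for the adhE gene can be cross-checked against each other with a few lines of arithmetic. The sketch below is only a consistency check under the assumption that all positions refer to the same numbering of Table 1.4: it derives the start codon position from the stop codon position and the protein length, then confirms that the spacer to the ribosome binding site comes out as the stated 11 bp. The start position itself is inferred here, not quoted from the text.

```python
# Values quoted in the text (numbering of Table 1.4).
STOP_CODON_FIRST_NT = 2854     # first nucleotide of the TAATAA double stop
PROTEIN_LENGTH_AA = 903        # deduced AdhE length
RBS_START, RBS_END = 127, 133  # AAAGGAG ribosome binding site
STATED_SPACER_BP = 11          # "11 bp upstream of the start codon"


def infer_cds_start(stop_first_nt: int, n_residues: int) -> int:
    """First nucleotide of the start codon, given the stop codon position."""
    return stop_first_nt - 3 * n_residues


def spacer_length(rbs_end: int, cds_start: int) -> int:
    """Number of nucleotides strictly between the RBS and the start codon."""
    return cds_start - rbs_end - 1


cds_start = infer_cds_start(STOP_CODON_FIRST_NT, PROTEIN_LENGTH_AA)  # inferred
spacer = spacer_length(RBS_END, cds_start)

print(f"inferred start codon position: {cds_start}")
print(f"RBS length: {RBS_END - RBS_START + 1} nt")
print(f"RBS-to-start spacer: {spacer} bp (stated: {STATED_SPACER_BP} bp)")
```

The two independent statements in the text (protein length plus stop position, and RBS position plus spacer) agree on the same start position, which suggests the coordinates were transcribed correctly.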
  • [Fragment of a protein sequence alignment involving adhe_e and adhe_c; the alignment layout is not recoverable in this copy.]
  • PCR was used to characterize the adhE homologue of strain MG1363.
  • Primers adhE-mg1 and adhE-1697 were used to amplify a 1.5 kb fragment from this strain, named MGadhESTART.
  • Primers adhE-1300x and adhE-mg2 were used to amplify an overlapping 1. kb fragment, named MGadhESTOP (Fig. 3) .
  • adhemg1363: SEQ ID NO: 8; adhedb1341: SEQ ID NO: 9; adhe_ec: SEQ ID NO: 10; aad_ca: SEQ ID NO: 11
  • the complete sequence of the adhE gene of strain DB1341 is compared to the sequence obtained via PCR amplification of MG1363 adhE fragments (see Fig. 2) .
  • V T P F A V I T D D E T H V K Y 3701 CClACrr GCTGACTATC-AATTGAC-ACCTC-AAGTTGCCATTGTTGACCCTGA 3750
  • Inactivation of the adhE gene of strain DB1341 was carried out by Campbell-like integration (Leenhouts et al., 1991) of pSMA500 derivatives into the DB1341 chromosome.
  • the adhE gene of strain DB1341 was inactivated at two different positions by cloning of PCR fragments (see Fig. 2) into the integration vector pSMA500 (Madsen et al . , 1996).
  • a 706 bp internal adhE fragment was amplified from the DB1341 chromosome using primer adhP1 (position 1069-1088 in Table 1.4) and primer adhP2 (position 1775-1756 in Table 1.4).
  • the primers contain an XhoI and a BamHI recognition site, respectively, at the 5' end.
  • the PCR fragment was digested with XhoI and BamHI followed by cloning into pSMA500.
  • the resulting plasmid, pSMAKAS4 (Fig. 3), was introduced into E. coli MC1000 by electroporation (Sambrook et al . , 1989).
  • Plasmid pSMAKAS4 was purified and subsequently introduced into strain DB1341 by electroporation (Holo and Nes 1989) and transformants were selected on SGM17 plates containing 1 µg/ml erythromycin and 80 µg/ml X-gal (Madsen et al., 1996). Homologous integration leads to an adhE gene which is interrupted after amino acid residue Asp543. About 100 blue transformants were obtained, indicating that a transcriptional fusion of the adhE gene to the lacLM reporter gene of pSMA500 had occurred. Eight blue transformants were restreaked and the integration point was verified by PCR analysis. One strain, DBKAS4, was selected for further studies.
  • a 616 bp adhE fragment was amplified from the DB1341 chromosome using primer orf3P1 (position 2112-2138 in Table 1.4) and primer orf3P2 (position 2728-2708 in Table 1.4).
  • the cloning of this fragment into pSMA500 resulted in plasmid pSMAKAS5 (Fig. 3) .
  • Introduction of pSMAKAS5 into DB1341 and subsequent integration into the adhE gene leads to an adhE gene, which is interrupted after amino acid residue Ile861.
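The relation between the primer coordinates, the fragment sizes and the interruption points (Asp543 and Ile861) can be sketched as follows. The calculation assumes that the adhE coding sequence starts at position 145 of Table 1.4; that value is not quoted in the text but is what the stated stop codon position (2854) and protein length (903 residues) imply. The fragment sizes given in the text (706 bp and 616 bp) match the difference between the outermost primer coordinates, so the sketch uses that convention and ignores the restriction-site tails on the primers.

```python
CDS_START = 145  # ASSUMPTION: inferred as 2854 - 3 * 903, not quoted in the text

# Outermost genomic coordinates of each integration fragment (Table 1.4).
FRAGMENTS = {
    "pSMAKAS4": (1069, 1775),  # primers adhP1 / adhP2
    "pSMAKAS5": (2112, 2728),  # primers orf3P1 / orf3P2
}


def fragment_size(first_nt: int, last_nt: int) -> int:
    """Size as the coordinate difference, the convention the text appears to use."""
    return last_nt - first_nt


def last_complete_residue(last_nt: int, cds_start: int = CDS_START) -> int:
    """Number of the last residue whose codon lies entirely at or before last_nt."""
    return (last_nt - cds_start + 1) // 3


for name, (first_nt, last_nt) in FRAGMENTS.items():
    size = fragment_size(first_nt, last_nt)
    residue = last_complete_residue(last_nt)
    print(f"{name}: {size} bp, homology ends after residue {residue}")
```

With these assumptions the homologous region of each fragment ends one or two nucleotides into the codon following residue 543 and residue 861 respectively, in agreement with the interruption points given in the text.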
  • pSMAKAS4 and pSMAKAS5 were used also to inactivate the MG1363 adhE gene.
  • One transformant from each transformation that turned blue on X-gal plates (MGKAS4 and MGKAS5) and therefore contained a translational fusion of the lacLM reporter gene of pSMA500 to the MG1363 adhE gene, was isolated for further studies.
  • Samples of Lactococcus lactis subspecies lactis biovar diacetylactis strains DBKAS4 and DBKAS5, respectively, and of Lactococcus lactis subspecies lactis strains MGKAS4 and MGKAS5, respectively, were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession Nos. DSM 11084, DSM 11085, DSM 11081 and DSM 11082, respectively.
  • the fragment used to construct an adhE mutant strain was obtained by PCR using MG1363 DNA as template and primers adhP1-XhoI (sequence 5'-GGCCGCTCGAGGTTGAACGTGCTGGTGAAGG-3', spanning position 2657-2676 in the MG1363 adhE sequence) (SEQ ID NO: 32) and adhP2-BamHI (sequence 5'-TAGTAGGATCCGGGTCAGGTTGGACTGAGCC-3', spanning position 3363-3344 in the MG1363 adhE sequence) (SEQ ID NO: 33).
  • a 700 bp fragment was digested with XhoI and BamHI, cloned into likewise digested pSMA500 and transformed into E. coli MC1000.
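The design of the two primers given above can be checked with a short script that splits each primer into a 5' clamp, the restriction site and the gene-specific part. The recognition sequences used (XhoI: CTCGAG, BamHI: GGATCC) are the standard ones for these enzymes; the primer sequences are taken as printed, with the line-break hyphen removed, which is an assumption about the original typesetting. The helper names are illustrative only.

```python
RECOGNITION_SITES = {"XhoI": "CTCGAG", "BamHI": "GGATCC"}

# name: (sequence 5'->3', enzyme, (first, last) annealing position as quoted)
PRIMERS = {
    "adhP1-XhoI": ("GGCCGCTCGAGGTTGAACGTGCTGGTGAAGG", "XhoI", (2657, 2676)),
    "adhP2-BamHI": ("TAGTAGGATCCGGGTCAGGTTGGACTGAGCC", "BamHI", (3363, 3344)),
}


def split_primer(sequence: str, enzyme: str):
    """Split a primer into (5' clamp, restriction site, gene-specific part)."""
    site = RECOGNITION_SITES[enzyme]
    index = sequence.find(site)
    if index < 0:
        raise ValueError(f"{enzyme} site not found in primer")
    return sequence[:index], site, sequence[index + len(site):]


def span_length(positions) -> int:
    """Inclusive length of a coordinate range given in either orientation."""
    first, last = positions
    return abs(last - first) + 1


for name, (sequence, enzyme, positions) in PRIMERS.items():
    clamp, site, annealing = split_primer(sequence, enzyme)
    consistent = len(annealing) == span_length(positions)
    print(f"{name}: clamp={clamp} site={site} annealing={annealing} "
          f"({len(annealing)} nt, matches quoted span: {consistent})")
```

In both primers the gene-specific part has the same length as the quoted coordinate range, and the few extra bases 5' of the site are consistent with the usual practice of adding a clamp so that the enzyme cuts efficiently near the end of a PCR product.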
  • the new construction, pSMAKAS14, was introduced into L. lactis MG1363 via electroporation. Integration led to disruption of the resident adhE gene and one transformant that turned blue on X-gal plates (integration results in transcriptional fusion to lacLM, a reporter gene) was selected for further analysis and was named MGKAS14. This integrant should express an AdhE protein truncated at position Asp543.
  • A sample of MGKAS14 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 10 July 1997 under the accession No. DSM 11654.
  • 2. Physiological characterization of MGKAS14
  • Physiological studies of MGKAS14 were carried out by cultivating the strain in anaerobiosis in M17 medium supplemented with either glucose (GM17) or galactose (GalM17).
  • In GM17, the production of formate in the mutant strain was reduced (4.86 in MG1363 vs. 1.67 in MGKAS14)
  • the production of acetaldehyde was increased (0.52 in MG1363 vs. 0.67 in MGKAS14) .
  • No pyruvate was detected with any of the test strains.
  • λZAP genomic libraries of L. lactis strains DB1341 and MG1363 were constructed according to the manufacturer's instructions (Stratagene) using partially Sau3AI-digested chromosomal DNA (average size about 5 kb) cloned into λ vector BamHI arms. Average insert size was estimated to be 3 kb.
  • the coding sequence starts at position 80 and ends at position 2443.
  • a putative ribosome binding site is shown in bold, double underline (positions 65-71) .
  • a putative rho-independent transcriptional terminator (de Vos and Simons 1994) is found at positions 2468-2490 and is shown in bold, underline (stem) or dotted underline (loop) .
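The annotation coordinates given for the pfl sequence can be checked for internal consistency with the protein length (787 amino acids) and deduced molecular weight (89.1 kDa) stated elsewhere in the document. The sketch assumes that the end position 2443 includes the stop codon, since only then do the numbers agree; that reading is an inference, not something the text states.

```python
# Coordinates quoted in the annotation of the pfl sequence.
CDS_START, CDS_END = 80, 2443
RBS_START, RBS_END = 65, 71
TERMINATOR_START, TERMINATOR_END = 2468, 2490

# Values stated elsewhere in the document for the DB1341 Pfl polypeptide.
STATED_RESIDUES = 787
STATED_MASS_KDA = 89.1


def orf_summary(cds_start: int, cds_end: int):
    """Return (length in nt, residues excluding the stop codon, leftover nt)."""
    length_nt = cds_end - cds_start + 1
    codons, leftover = divmod(length_nt, 3)
    return length_nt, codons - 1, leftover  # ASSUMPTION: last codon is the stop


length_nt, residues, leftover = orf_summary(CDS_START, CDS_END)
rbs_spacer = CDS_START - RBS_END - 1
gap_to_terminator = TERMINATOR_START - CDS_END - 1
mean_residue_mass_da = STATED_MASS_KDA * 1000.0 / STATED_RESIDUES

print(f"coding region: {length_nt} nt, {residues} residues, leftover {leftover} nt")
print(f"RBS-to-start spacer: {rbs_spacer} bp")
print(f"stop-to-terminator gap: {gap_to_terminator} bp")
print(f"mean residue mass: {mean_residue_mass_da:.1f} Da")
```

The mean residue mass that falls out of the stated values is in the range usually seen for proteins (roughly 110 Da per residue), which gives a second, independent sanity check on the stated molecular weight.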
  • MIIMIIM smpf 1 CAGGTGACCCAAC-ATTTATTACGACTTCTATGGCTGGTATGGGAGCTGATGGACGTCACC 1270 1280 1290 1300 1310 1320
  • TGCTTTC-t-AGACGTTTACACGCGCAC ( -AAAGTATTGGAGGTTATGACGTCCTTCTT ⁇ -ATT
  • dbpfl complementary strand corresponding to nucleotides 1979-9 of SEQ ID NO: 15; hi3281: SEQ ID NO: 18
  • Table 3.4 Protein homology (FASTA. GCG Wisconsin Package Version 8. Genetics Computer Group) using the complete protein sequence derived from the L. lactis DB1341 pfl sequence shown in Table 3.2
  • the Pfl protein of Streptococcus mutans was not recorded in the searched protein databases.
  • dbpfl corresponds to amino acid residues 1-772 of SEQ ID NO: 16; pflb_e: corresponds to amino acid residues of SEQ ID NO: 14
  • ANTVDSLSAIKYAKVKTLR DENGYI YDYEVEGDFPRYGEDDDRADDIAKL
  • 11 pf lb_h - FGPGANPMHGRDQKGAVASLTSVAKLPFAYAKDGISYTFSIVPNALGKDAEAQRRNLAG 640 650 660 670 680 690
  • the highest homology value obtained when analysing the sequence from clone pfl1 corresponds to the S. mutans pfl gene (Table 3.1), i.e. about 80% at the DNA level in the region covered by the probe used for library screening, and 68.5% for the 1.1 kb pfl fragment analyzed. Sequence comparisons indicated that the fragment included in clone pfl1 encompasses 367 amino acids of the C-terminal region of the L. lactis pfl gene. Therefore, about 1.3 kb of the 5'-end of the pfl gene was lacking.
  • High stringency hybridization (washing steps at 65°C, 2 x 30 min in 2 x SSC, then 1 x 30 min in 0.1 x SSC; 0.1 % SDS) resulted in the isolation of twelve positive clones.
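The difference in stringency between the wash conditions used here (2 x SSC, then 0.1 x SSC; 0.1% SDS at 65°C) and those given earlier as a hybridization criterion (5 x SSC at room temperature, then 3 x SSC; 0.1% SDS at 65°C) can be made concrete by converting SSC strength into sodium ion concentration. The sketch assumes the commonly used recipe in which 1 x SSC is 150 mM NaCl and 15 mM trisodium citrate; the recipe is not given in the text. Lower salt at a given temperature destabilises mismatched hybrids, so it corresponds to higher stringency.

```python
# ASSUMPTION: standard SSC recipe, concentrations in mM at 1x strength.
SSC_1X_NACL_MM = 150.0
SSC_1X_TRISODIUM_CITRATE_MM = 15.0


def sodium_mM(ssc_fold: float) -> float:
    """Sodium ion concentration (mM) contributed by SSC at the given strength."""
    per_1x = SSC_1X_NACL_MM + 3.0 * SSC_1X_TRISODIUM_CITRATE_MM
    return ssc_fold * per_1x


# (protocol, wash step) -> SSC strength, as quoted in the document.
WASHES = {
    ("hybridization criterion", "first washes"): 5.0,
    ("hybridization criterion", "final wash"): 3.0,
    ("library screening", "first washes"): 2.0,
    ("library screening", "final wash"): 0.1,
}

for (protocol, step), fold in WASHES.items():
    print(f"{protocol}, {step}: {fold} x SSC = {sodium_mM(fold):.1f} mM Na+")

ratio = sodium_mM(3.0) / sodium_mM(0.1)
print(f"final-wash salt ratio (criterion / screening): {ratio:.0f}-fold")
```

On this basis the final wash used for library screening contains about thirty times less sodium than the final wash in the broader hybridization criterion, which is why only closely related sequences are expected to remain bound.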
  • Sequence analysis of pfl14 confirmed that it included a pfl fragment that lacked the Sau3AI site at position 1 in clone pfl1, but showed sequence identity from position 30 onwards in clone pfl1 (position 1372 in Table 3.2). It is therefore likely that the presence of an intact L. lactis pfl gene is toxic in E. coli and leads to plasmid rearrangement.
  • This PCR fragment was re-amplified from EcoRI-digested and religated DB1341 DNA using modified primers pfl1-250 (including an XhoI site at the 5'-end) and pfl1-390 (including a BamHI site at the 5'-end). The amplified product was digested with XhoI and BamHI and ligated into vector pGE digested with the same enzymes, and transformation of E. coli DH5α resulted in strain pflup-1.
  • the L. lactis DB1341 pfl gene encodes a 787 amino acid protein (Tables 3.2, 3.4 and 3.6) with a deduced molecular weight of 89.1 kDa.
  • E. coli DH5α strain pflup-1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11087.
  • TATO -AATGA .C-ACACTTTCCATGATTATTTGAGAGATTTCCGAAC-AAG 900
  • sequence included an open reading frame, designated orfA encoding a putative 37 kDa protein with no relevant homology to any sequence in available databases.

Abstract

The complete DNA sequences for the adhE and pfl genes of Lactococcus lactis, recombinant replicons comprising one or both of these genes or comprising mutants or variants hereof including mutants in which the genes are inactivated, and recombinant lactic acid bacteria comprising such a replicon are provided. The gene sequences and/or sequences regulating the expression of the genes can be modified to provide metabolically engineered lactic acid bacteria which have an enhanced or reduced production of one or more metabolites resulting from citrate and/or sugar fermentation. Such metabolically modified cells are useful as starter cultures in the manufacturing of food products and animal feed having improved flavour and/or shelf life, including dairy products, or they can be used directly in the manufacturing of a lactic acid bacterial metabolite.

Description

METABOLICALLY ENGINEERED LACTIC ACID BACTERIA AND MEANS FOR PROVIDING SAME
FIELD OF INVENTION
The present invention pertains to the field of lactic acid bacterial starter cultures which are useful in the production of food products, animal feed or aroma compounds, and specifically there is provided means for metabolically engineering such lactic acid bacteria which are thereby modified in their production of metabolic end products including aroma or flavour compounds and/or compounds having antimicrobial effects.
TECHNICAL BACKGROUND AND PRIOR ART
Lactic acid bacteria are used extensively as starter cultures in the food industry in the manufacture of fermented products including milk products such as e.g. yoghurt and cheese, meat products, bakery products, wine and vegetable products. Lactococcus lactis is one of the most commonly used lactic acid bacteria in dairy starter cultures. However, several other lactic acid bacteria such as Leuconostoc species, Lactobacillus species and Streptococcus species are also commonly used in food starter cultures. In the art, species of the obligate anaerobic bacteria belonging to Bifidobacterium, which are taxonomically different from the group of bacteria generally referred to as lactic acid bacteria, are frequently included in the group of lactic acid bacteria due to their application as dairy starter cultures. Lactic acid bacteria are also commonly used as inoculants in feedstuffs of plant and animal origin, i.a. for preservation purposes.
When a lactic acid bacterial starter culture is added to a substrate including milk or any other food or feed product starting material under appropriate conditions, the bacteria grow rapidly with concomitant conversion of lactose or other sugars to lactic acid/lactate and minor amounts of acetate, resulting in a pH decrease. In addition, several other metabolites are produced during the growth of lactic acid bacteria. Among these metabolites, diacetyl is one essential flavour compound which is formed during fermentation of the citrate-utilizing species of e.g. Lactococcus, Leuconostoc, and Lactobacillus. Diacetyl is formed by an oxidative decarboxylation (R1, Fig. 1) of α-acetolactate which is formed from two molecules of pyruvate by the action of α-acetolactate synthase (R2, Fig. 1).
Pyruvate is a key intermediate of several lactic acid bacterial metabolic pathways including the citrate metabolism and the degradation of lactose or glucose to lactate. The pool of pyruvate in the cells is critical for the flux through the pathway leading to diacetyl, acetoin and 2,3 butylene glycol due to the affinity of α-acetolactate synthase for pyruvate. Overproduction of α-acetolactate synthase in Lactococcus lactis as an approach for increased production of diacetyl has been disclosed by Platteuw et al. 1995.
An alternative metabolic engineering approach to providing an increased pool of pyruvate in lactic acid bacteria is to block one or several pyruvate degrading pathways. As an example hereof, a Lactococcus lactis mutant defective in the lactate dehydrogenase (R3, Fig. 1) has been disclosed by Gasson et al. (ref. 8, unpublished data, in Platteuw et al. 1995). Under aerobic conditions pyruvate is accumulated in this mutant leading to the formation of increased levels of acetoin and 2,3 butylene glycol. However, formate and ethanol were the major metabolic end products obtained under anaerobic conditions, and the formation of the latter end products in high amounts is generally undesired in fermented dairy products, which are typically produced under anaerobic conditions. The reaction whereby pyruvate is converted to formate and acetyl coenzyme A (acetyl CoA) (R4, Fig. 1) by the action of pyruvate formate-lyase (Pfl) takes place only under anaerobic conditions (Frey et al. 1994). An alternative pathway for the formation of acetyl CoA from pyruvate (R5, Fig. 1) in a lactic acid bacterium is by the activity of the pyruvate dehydrogenase complex (PDC). In contrast to Pfl, the activity of PDC appears to be optimal under aerobic conditions (Snoep et al. 1992). Consequently, the pyruvate pool will presumably be increased under anaerobic conditions by partially or completely blocking the Pfl activity. As mentioned above, an increased pyruvate pool may in turn lead to an increased flux from pyruvate towards acetoin and diacetyl via the intermediate α-acetolactate. Fermented foods or feed products produced by using a starter culture with reduced Pfl activity therefore may contain an increased amount of diacetyl or other products derived from conversion of α-acetolactate.
In contrast, starter cultures with increased Pfl activity should result in enhanced production of the antimicrobially active metabolite formate and the use of such cultures in the production of feed or food products having increased shelf life can therefore be contemplated.
The pfl gene has been isolated from several microorganisms including Escherichia coli, Haemophilus influenzae, Clostridium pasteurianum and Streptococcus mutans. The Pfl enzyme is post-translationally activated by the Pfl activase via formation of an organic free radical at a glycine residue located at the C-terminal of Pfl (Frey et al. 1994). This modification of Pfl occurs only in the absence of oxygen. Although the activation gene, act, encoding the Pfl activase flanks the pfl gene in E. coli, H. influenzae and C. pasteurianum, the act gene is transcribed from its own promoter, and the expression is essentially constitutive (Weidner et al. 1996). In contrast, the pfl expression is induced 12 to 15 fold by anaerobiosis (Sauter and Sawers 1990). The free radical enzyme, i.e. the activated Pfl, is destroyed by oxygen with concomitant fragmentation of the polypeptide chain (ref. 2 in Kessler 1992). However, in E. coli a Pfl deactivase activity has been found which under anaerobic conditions reverts the active radical form to the native non-radical form of Pfl (Kessler et al. 1992). By this activity, Pfl deactivase protects Pfl against being irreversibly destroyed by oxygen.
The AdhE protein of E. coli has acetaldehyde dehydrogenase activity, catalyzing the conversion of acetyl CoA to acetaldehyde (R6, Fig. 1), and ethanol dehydrogenase activity, catalyzing the conversion of acetaldehyde to ethanol (R7, Fig. 1). Additionally, the E. coli AdhE protein is responsible for the Pfl deactivase activity.
In the strict anaerobe Clostridium acetobutylicum, an adhE analogue, aad, has been cloned and characterized. However, the presence of Pfl deactivase activity could not be verified for the Aad protein, since no evidence exists for the presence of Pfl in C. acetobutylicum (Nair et al. 1994).
Lactic acid bacteria including Lactococcus lactis species are facultatively anaerobic organisms like E. coli, indicating that the occurrence of Pfl activase and deactivase activities in these organisms is to be expected. Analysis of the expression of adhE in E. coli has shown an eight fold increase under anaerobic growth (Chen and Lin 1991). The facts that the regulation of expression of pfl and adhE under anaerobic conditions is similar and that expression of act in E. coli is constitutive suggest that an equilibrium is formed between activated and deactivated Pfl under anaerobic conditions. If the deactivase activity of the AdhE protein is partially or completely blocked in lactic acid bacteria, an increased Pfl activity is expected to occur while, on the other hand, a reduced Pfl activity is expected to occur if the deactivase activity is overexpressed. If the Pfl activase is blocked, a decreased Pfl activity is contemplated. The acetaldehyde dehydrogenase and the ethanol dehydrogenase activities of the AdhE protein are also potential targets for metabolic engineering in lactic acid bacterial food starter cultures and cultures used in feed production or as cultures for the production of aroma compounds or antimicrobially active compounds. Thus, it can be contemplated that a block or modification of the ethanol dehydrogenase activity of such cultures may result in the overproduction of acetaldehyde which is an important flavour compound in yoghurt. Alternatively, a block of the acetaldehyde dehydrogenase activity could give rise to an increased production of acetate which in turn may result in improved preservation of fermented foods or feed products in whose production such modified cultures are used. Additionally, it is contemplated that such modifications of starter cultures would increase the pyruvate pool and consequently, the formation of diacetyl or other compounds derived from the conversion of α-acetolactate.
Increasing one or both dehydrogenase activities will most likely direct the conversion of acetyl CoA from acetate to acetaldehyde or ethanol.
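The reactions R1 to R7 referred to above can be pictured as a small directed network around pyruvate. The following Python sketch is an illustration only and is not part of the disclosure: it encodes the labelled reactions as stated in the text and reports which metabolites remain reachable from pyruvate when selected reactions are blocked. It models network topology, not flux or pool sizes, and it omits acetoin, 2,3 butylene glycol and acetate because the text does not assign reaction labels to their formation. The names `REACTIONS` and `reachable_metabolites` are hypothetical.

```python
from collections import deque

# Reactions R1-R7 as described in the text (Fig. 1).
# Each entry: label -> (substrate, products, enzyme or reaction type).
REACTIONS = {
    "R1": ("alpha-acetolactate", ["diacetyl"], "oxidative decarboxylation"),
    "R2": ("pyruvate", ["alpha-acetolactate"], "alpha-acetolactate synthase"),
    "R3": ("pyruvate", ["lactate"], "lactate dehydrogenase"),
    "R4": ("pyruvate", ["formate", "acetyl CoA"], "pyruvate formate-lyase (Pfl)"),
    "R5": ("pyruvate", ["acetyl CoA"], "pyruvate dehydrogenase complex (PDC)"),
    "R6": ("acetyl CoA", ["acetaldehyde"], "acetaldehyde dehydrogenase (AdhE)"),
    "R7": ("acetaldehyde", ["ethanol"], "alcohol dehydrogenase (AdhE)"),
}


def reachable_metabolites(start="pyruvate", blocked=()):
    """Return the metabolites reachable from `start` when the reactions
    named in `blocked` are removed from the network (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for label, (substrate, products, _enzyme) in REACTIONS.items():
            if label in blocked or substrate != current:
                continue
            for product in products:
                if product not in seen:
                    seen.add(product)
                    queue.append(product)
    return seen - {start}


if __name__ == "__main__":
    # Blocking the alcohol dehydrogenase step (R7) leaves acetaldehyde
    # reachable but removes ethanol from the reachable set.
    print(sorted(reachable_metabolites(blocked={"R7"})))
```

Blocking R4 alone removes formate from the reachable set while acetyl CoA stays reachable through R5, which mirrors the distinction drawn above between the Pfl and PDC routes.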
Based on the above analysis of the potential means of regulating the size of the pyruvate pool in lactic acid bacteria and the intracellular fluxes from this metabolic intermediate pool towards desirable end products, a novel approach has been developed for metabolically engineering lactic acid bacteria allowing the provision of useful lactic acid bacterial starter cultures either having an enhanced production of desirable flavour compounds or an increased production of antimicrobially active compounds which can be used to increase the shelf life of food or feed products.
In particular, the starting point for the invention is the achievement of the isolation and sequencing of the entire adhE and pfl genes of Lactococcus lactis. Based on these findings, it has become possible, by appropriate modifications of the genes and their expression and/or activity of one or more of the enzyme activities encoded by these genes, to provide in a goal-directed manner lactic acid bacterial starter cultures having the above desirable characteristics, including cultures of strains having reduced or enhanced production of particular metabolites.
SUMMARY OF THE INVENTION
Accordingly, the present invention provides novel means for metabolically engineering lactic acid bacteria, and lactic acid bacteria being modified by such means. Specifically, the invention relates in a first aspect to an isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
In further aspects, the invention pertains to a recombinant replicon comprising the above DNA sequence and to a recombinant lactic acid bacterial cell comprising such a replicon.
In still further aspects, there is provided an isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having pyruvate formate-lyase activity, subject to the limitation that the sequence is not derived from oral Streptococcus species, a recombinant replicon comprising such a DNA sequence and a recombinant lactic acid bacterial cell comprising such a replicon.
In another aspect, the invention relates to a method of producing a lactic acid bacterial metabolite, the method comprising cultivating a lactic acid bacterium comprising a DNA sequence as defined above which is modified so as to inactivate or reduce or enhance the expression of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity, or a lactic acid bacterium comprising a DNA sequence which is modified whereby its production of pyruvate formate-lyase is reduced or inhibited, or whereby the enzyme is expressed in a modified form having a reduced pyruvate formate-lyase activity, or wherein the DNA sequence is modified whereby the expression of pyruvate formate-lyase is enhanced or whereby the enzyme is expressed in a modified form having an increased pyruvate formate-lyase activity, and isolating the metabolite from the culture.
The invention also pertains to methods of producing a food product or an animal feed, the method comprising the step of admixing to the food product or feed starting materials a starter culture of a lactic acid bacterium according to the invention and keeping the mixture under conditions allowing the starter culture to be metabolically active.
There is also provided an isolated DNA sequence derived from a lactic acid bacterium, said sequence coding for a product having a formate transporter activity.
DETAILED DISCLOSURE OF THE INVENTION
The facultative anaerobe Escherichia coli is capable of carrying out mixed-acid fermentation during anaerobic growth in the absence of exogenous electron acceptors. In this connection, a major fermentation product is ethanol which is synthesized from acetyl CoA by two consecutive NADH-dependent reductions catalyzed by a single polypeptide, AdhE, with an acetaldehyde dehydrogenase (ACDH) domain and alcohol dehydrogenase (ADH) domain. It has also been found that this polypeptide is responsible for pyruvate formate-lyase deactivase activity.
It has now been found that a DNA sequence showing significant homology to the E. coli gene, adhE, which codes for a polypeptide showing substantial similarity with the above multi-functional E. coli AdhE polypeptide is present in lactic acid bacteria which are also facultative anaerobes, such as in Lactococcus lactis. It was therefore hypothesized that the gene product of the thus identified and isolated lactic acid bacterial DNA sequence might have similar enzymatic activities as the corresponding E. coli gene product. This was found to be the case.
Accordingly, the present invention provides, as mentioned above, in its first aspect an isolated DNA sequence which comprises a sequence derived from a lactic acid bacterium, which sequence codes for a multi-functional polypeptide having at least one of the following enzymatic activities: (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity. The coding sequence for the multifunctional polypeptide is also referred to herein as the adhE gene, and the polypeptide encoded by the gene as the AdhE polypeptide.
In accordance with the invention, the DNA sequence coding for the multi-functional polypeptide may be derived from any lactic acid bacterium. In the present context, the term "lactic acid bacterium" designates gram-positive, microaerophilic or facultatively anaerobic bacteria which ferment sugars with the production of acids including lactic acid as the predominantly produced acid, acetic acid and propionic acid. The industrially most useful lactic acid bacteria are found among Lactococcus species, Streptococcus species, Lactobacillus species, Leuconostoc species and Pediococcus species. Additionally, the strict anaerobic Bifidobacterium species, which are commonly used in the manufacture of dairy products, are included in the group of lactic acid bacteria. The group of lactic acid bacteria comprises so-called mesophilic species which have optimum growth temperatures in the range of 15-30°C and which in many cases do not grow at temperatures exceeding 35-40°C. Other groups of lactic acid bacteria have higher growth temperatures, in particular species for which humans and/or animals are the natural habitat, e.g. Enterococcus species, oral streptococci and pathogenic streptococci.
In certain preferred embodiments, the above DNA sequence is derived from Lactococcus lactis including Lactococcus lactis subspecies lactis, Lactococcus lactis subspecies diacetylactis (also frequently referred to as Lactococcus lactis subspecies lactis biovar diacetylactis) and Lactococcus lactis subspecies cremoris .
In useful embodiments of the invention, the lactic acid bacterium-derived DNA sequence codes for a multifunctional polypeptide that is at least 30% identical with the gene products of the adhE gene of E. coli (FASTA, GCG Wisconsin accession No. P17547) or the aad gene of Clostridium acetobutylicum (FASTA, GCG Wisconsin accession No. P33744) or the gene product of the sequence of Table 1.4 herein (SEQ ID NO:3). In other useful embodiments, the identity to such other gene products is at least 40%, such as at least 50%, such as at least 60% identity or even at least 70% identity. The homology between the above gene products may also be expressed in terms of amino acid similarity in which case the similarity suitably is at least 60%, such as at least 70%, e.g. at least 80% similarity. In this context, the expression "amino acid similarity" indicates that a particular amino acid in a polypeptide sequence can be replaced by another amino acid having similar physical/chemical characteristics such as charge or polarity characteristics.
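As an illustration of how identity and similarity percentages of the kind referred to above can be computed, the following Python sketch compares two sequences that have already been aligned, with gaps written as "-". It is a simplified sketch and not the FASTA program named in the text: the denominator used here (aligned, ungapped positions) is an assumption, and the grouping of amino acids by charge and polarity in `SIMILARITY_GROUPS` is a hypothetical choice, since the text does not define which replacements count as similar.

```python
# Hypothetical grouping of amino acids by charge/polarity. The text does not
# specify which substitutions count as "similar"; this grouping is an assumption.
SIMILARITY_GROUPS = [
    set("DE"),     # acidic
    set("KRH"),    # basic
    set("STNQ"),   # polar, uncharged
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FYW"),    # aromatic
]


def _similar(a, b):
    """True if the residues are identical or share a similarity group."""
    return a == b or any(a in group and b in group for group in SIMILARITY_GROUPS)


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) over the aligned
    positions in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
        similar += _similar(a, b)
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Toy peptides, not taken from any sequence in the disclosure.
    print(identity_and_similarity("MKVAD", "MRVGE"))
```

With the toy peptides shown, two of five positions are identical and two further positions fall within the same assumed group, so the similarity figure is higher than the identity figure, as it is for the thresholds given in the text.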
The sequence according to the invention which codes for the AdhE protein also includes such a coding sequence of lactic acid bacterial origin which hybridizes to the adhE coding sequence from L. lactis strain DB1341 under the following conditions: hybridization overnight at 65°C followed by washing the filters twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes.
In one specific embodiment, the DNA sequence according to the invention comprises the sequence as shown herein in Table 1.4 (SEQ ID NO:3) or the sequence designated adhemg1363 as shown in the below Table 1.8 (SEQ ID NO:12) or the sequence shown in Table 1.9 (SEQ ID NOS:28/30), or a mutant or variant hereof which codes at least in part for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
In the present context, the above term "mutant or variant" is used to designate any naturally occurring or constructed nucleotide modification of the above DNA sequence which still allows a polypeptide having at least one of the defined activities to be expressed by the thus modified sequence. Accordingly, the modification may consist in one or more nucleotide substitutions in one or more codons, resulting in the translation of the same or different amino acid(s), or the modification may be in the form of the insertion or deletion of one or more nucleotides/codons. The modifications can be provided by any conventional method including, where appropriate, modifications hereof, such as e.g. the use of restriction enzymes or random or site-directed mutagenesis, e.g. by means of transposable elements. It will be understood that the above DNA sequence according to the invention may also be provided as a synthetically produced sequence or it may be a hybrid sequence comprising in part a native sequence and in part a synthetically prepared sequence. Additionally, the above term "mutant or variant" includes any mutein of the sequence.
The above lactic acid bacterial DNA sequence whether in its native form or in a modified mutant or variant form may further comprise one or more sequences that regulate the expression of the coding sequence. Such regulatory sequences may be located upstream and/or downstream of the coding sequence or they can be placed on a different replicon, i.e. in trans. The regulatory sequences may be sequences which are natively associated with the coding sequence or they may be inserted or modified promoter sequences not natively associated with the coding sequence, which can be operably linked to the coding sequence. Such sequences which are not natively associated with the coding sequence may be derived from the bacterial strain which is the source of the coding sequence or from a different organism. In this context, a regulatory sequence includes a promoter/operator sequence, a ribosome binding site, a sequence coding for a gene product which either enhances or inhibits the expression of the coding sequence, such as a repressor or activator substance including e.g. an RNA sequence including an antisense RNA, a terminator sequence or a leader sequence regulating the excretion of the above multifunctional enzyme product. A promoter which is derived from a different organism or from the same organism may, depending on the desired characteristics of the resulting bacterial cell, have a stronger or a weaker promoter activity than the promoter with which the coding sequence is natively associated.
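The statement that a nucleotide substitution in a codon may result in the translation of the same or a different amino acid can be illustrated with a short Python sketch. It is an illustration only: it builds the standard codon assignments and classifies a single codon change. The function name `classify_substitution` and the category labels are hypothetical and do not appear in the disclosure.

```python
from itertools import product

BASES = "TCAG"
# Standard codon assignments, listed with each codon position cycling
# through T, C, A, G (third position fastest). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(original_codon, mutated_codon):
    """Classify a codon change by its effect on the encoded residue."""
    before = CODON_TABLE[original_codon.upper()]
    after = CODON_TABLE[mutated_codon.upper()]
    if before == after:
        return "synonymous"   # same amino acid (or stop) is encoded
    if after == "*":
        return "nonsense"     # an amino acid codon becomes a stop codon
    if before == "*":
        return "stop-loss"    # a stop codon becomes an amino acid codon
    return "missense"         # a different amino acid is encoded


if __name__ == "__main__":
    print(classify_substitution("GCT", "GCC"))  # Ala -> Ala
    print(classify_substitution("GAA", "GTA"))  # Glu -> Val
```

A synonymous change leaves the polypeptide unaltered, whereas missense and nonsense changes are the kind of modification that could alter or abolish one of the defined activities.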
In a useful embodiment, the coding sequence is under the control of a regulatable promoter. As used herein, the term "regulatable promoter" is used to describe a promoter sequence, the activity of which is dependent on physical or chemical factors present in the medium where organisms comprising the above coding sequence and its regulatory sequences are cultivated. Such factors include the cultivation temperature, the pH and/or the arginine content of the medium, a temperature shift eliciting the expression of heat shock genes, the composition of the growth medium including the ionic strength/NaCl content and the growth phase/growth rate of the host cell and stringent response.
A promoter sequence as defined above may further comprise sequences whereby the activity of the promoter becomes regulated. Thus, in lactic acid bacterial cultures for which it is advantageous to have a gradually decreasing activity of the coding sequence under control of the promoter sequence such further sequences may provide a regulation by a stochastic event and may e.g. be sequences, the presence of which results in a recombinational excision of the promoter or of genes coding for substances which are positively needed for the promoter function.
It has been found that in e.g. Lactococcus lactis there may be, upstream of the sequence coding for the above multifunctional polypeptide, DNA sequences coding for one or more open reading frames. Thus, such open reading frames were identified in both L. lactis strain DB1341 and strain MG1363. These open reading frames were designated orfB.
In a further aspect, the invention relates, as it is mentioned above, to a recombinant replicon comprising the above DNA sequence coding for the multifunctional polypeptide. As used herein, the term "replicon" designates a DNA sequence which is capable of autonomous replication in a lactic acid bacterium. Such a replicon can be selected from a plasmid capable of replicating in a lactic acid bacterium, a lactic acid bacterial chromosome and a bacteriophage derived from a lactic acid bacterium.
The replicon may comprise further sequences including marker sequences and linker sequences for the insertion of genes coding for desirable gene products. Thus, in useful embodiments, the replicon may comprise a gene coding for a lipase, a peptidase, a gene coding for a gene product involved in carbohydrate or citrate metabolism, a gene coding for a gene product involved in bacteriophage resistance or a gene coding for a lytic enzyme or a gene coding for a bacteriocin such as e.g. nisin or pediocin. The gene may also be one which codes for a gene product conferring resistance to an antibiotic.
The gene coding for a desired gene product may be a homologous gene, i.e. a gene isolated from the same species as the host cell for the replicon, or a heterologous gene including a gene isolated from a lactic acid bacterial species which is of a species different from the host cell.
The invention also provides a recombinant lactic acid bacterial cell comprising the above replicon. Such a host cell may be derived from any species of lactic acid bacteria as defined herein, such as a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
The above lactic acid bacterial cell is useful in starter culture compositions for the manufacturing of food products including dairy products, meat products, wine, vegetables and bakery products, or in the preservation of animal feed. In the latter context, the present recombinant lactic acid bacterial cells are particularly useful as inoculants in field crops which are to be ensiled or as preserving agents in feedstuff components of animal origin such as waste products from the slaughtering and fish processing industries.
When the cells are to be used for these purposes they are conveniently provided in the form of freeze-dried or frozen concentrates typically containing 10^9 to 10^12 colony forming units (CFUs) per g of concentrate. Such concentrates may be provided as starter culture compositions comprising further suitable components such as e.g. preserving agents, stabilizing agents, cryoprotectants, nutrients, bacterial growth factors or further active components including enzymes.
An interesting use of the above lactic acid bacterial cell is in the manufacturing of a probiotically active composition. In the present context, the term "probiotically active" indicates that the bacteria selected for this purpose have characteristics which enable them to colonize the gastrointestinal tract and thereby exert a beneficial regulatory effect on the microbial flora in this habitat. Such an effect may be recognizable as an improved food or feed conversion in humans or animals to which the cells are administered, or as an increased resistance against invading pathogenic microorganisms.
The above lactic acid bacterial cell can also be provided in the form of a culture for the production of an aroma or antimicrobially active compound.
In a particularly useful embodiment, the above lactic acid bacterial cell is one wherein the DNA sequence comprising the sequence coding for the multifunctional polypeptide is modified so as to inactivate or reduce the production of or the activity of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
Such a modification can be made by methods which are known per se in the art. Thus, as typical examples, a DNA modification can be in the form of deletion, insertion or substitution of one or more nucleotides in the coding sequence possibly leading to the translation of a polypeptide having a modified amino acid composition. Such a modified polypeptide may have lost one or more of the above enzymatic activities or it/they may be reduced. An inactivation of the coding sequence may also be obtained by random or site-directed mutagenesis, e.g. using a transposable element which is integratable in the replicon comprising the coding sequence. Another useful means of providing inactivated mutants is Campbell-like homologous integration as it is described in the below examples.
The level of production of the multi-functional polypeptide can also be reduced by modifying or regulating regulatory sequences controlling the expression of the gene coding for the polypeptide. Thus, as one example, a native constitutive promoter can be replaced by a regulatable promoter, the function of which can be reduced or inhibited under appropriate conditions such as those physical and chemical promoter regulating factors as mentioned above. Alternatively, a native promoter which is in itself regulatable by certain factors may be replaced by another regulatable promoter which is negatively regulatable by other factors present in the cultivation medium for the recombinant cell.
Generally, the term "metabolic engineering" in relation to lactic acid bacteria covers manipulations of the bacteria themselves or of the conditions under which they are cultivated whereby the production of metabolites from the fermentation of sugars or citrate is modulated quantitatively or qualitatively. Accordingly, a lactic acid bacterial cell which is modified as described above in one or more of its glycolytic pathways can be characterized as a metabolically engineered cell. Dependent on the type and the site of the DNA modification such a cell will be at least partially blocked in one or more of the above pathways catalyzed by the multi-functional polypeptide (R6/R7 in Fig. 1) and/or the pyruvate formate-lyase deactivase activity will be reduced or blocked. Accordingly, such a metabolically engineered cell may as a result of these modifications produce increased amounts of i.a. acetaldehyde, ethanol and/or acetate.
In a further useful embodiment, the above lactic acid bacterial cell is one wherein the DNA sequence comprising the sequence coding for the multi-functional polypeptide is modified so as to enhance the production of or the activity of at least one of its native enzymatic activities as defined above. It is contemplated that such a modification can be provided by appropriate modifications of the coding sequence itself which result in an enhanced production level of the polypeptide and/or the production of a modified polypeptide having an enhanced activity of at least one of its native activities. Such modification can be made by substitution, deletion or insertion of one or more nucleotides using any conventional methods for such DNA modifications, including random or site-directed mutagenesis followed by selection of the desired mutants.
Alternatively, a lactic acid bacterial cell having enhanced production of and/or enhanced activity of at least one of its native enzymatic activities can be provided by suitable modifications of sequences regulating the production and/or the activity of the multifunctional polypeptide. One suitable manner whereby this can be obtained is by operably linking the coding sequence to a promoter sequence having a stronger promoter activity than the native promoter for the coding sequence. In suitable embodiments such an inserted promoter is regulatable by a factor as mentioned above and the expression of the polypeptide can then be enhanced by cultivating the cell in the presence of a factor which mediates a strong promoter activity. It is contemplated that an enhanced production of the AdhE polypeptide in a host cell can be obtained by using a replicon which occurs in a high copy number in that host cell.
The aim is that such a metabolically engineered lactic acid bacterial cell having enhanced production of and/or enhanced activity of at least one of its native enzymatic activities produces increased amounts of at least one metabolite selected from the group consisting of acetaldehyde, ethanol, formate, acetate, acetoin, diacetyl and 2,3 butylene glycol. Thus, in preferred embodiments, such metabolically engineered cells have a production of one or more of these metabolites which, in comparison with a wild type strain, is at least 2-fold higher, such as at least 5-fold higher, e.g. at least 10-fold higher or even at least 20-fold higher.
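The fold-increase levels named above can be checked against measured values with simple arithmetic. The following Python sketch is an illustration under stated assumptions: it takes a metabolite level for the engineered strain and for the wild type strain, in the same units and under the same conditions, and reports the highest of the named thresholds that is met. The function names and the input values in the usage line are hypothetical and are not data from the disclosure.

```python
THRESHOLDS = (2, 5, 10, 20)  # fold-increase levels named in the text


def fold_increase(engineered, wild_type):
    """Ratio of the engineered strain's metabolite level to the wild type's."""
    if wild_type <= 0:
        raise ValueError("wild type level must be positive")
    return engineered / wild_type


def highest_threshold_met(engineered, wild_type):
    """Return the largest named threshold reached, or None if below 2-fold."""
    fold = fold_increase(engineered, wild_type)
    met = [threshold for threshold in THRESHOLDS if fold >= threshold]
    return max(met) if met else None


if __name__ == "__main__":
    # Hypothetical levels in arbitrary units, for illustration only.
    print(highest_threshold_met(engineered=12.0, wild_type=1.0))
```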
The present invention relates in a still further aspect to an isolated lactic acid bacterial DNA sequence that comprises a sequence coding for a polypeptide having pyruvate formate-lyase activity, i.e. a pfl gene. In useful embodiments, such a DNA sequence further comprises at least one regulatory sequence operably linked to the coding sequence and regulating the production of the pyruvate formate-lyase polypeptide or coding for a gene product regulating the pyruvate formate-lyase activity of the polypeptide. In the following, the gene product of pfl will also be referred to as a Pfl polypeptide.
Such regulatory sequences may be located upstream and/or downstream of the coding sequence. The regulatory sequences may be sequences which are natively associated with the coding sequence or they may be inserted or modified promoter sequences not natively associated with the coding sequence, but which can be operably linked to the coding sequence. Such sequences which are not natively associated with the coding sequence may be derived from the bacterial strain which is the source of the coding sequence or from a different organism. In this context, regulatory sequences include a promoter sequence, a ribosome binding site, a sequence coding for a gene product which either enhances or inhibits the expression of the coding sequence, such as a repressor or activator substance including e.g an antisense RNA, a transcription terminator sequence or a leader sequence directing the excretion of the Pfl polypeptide. In a useful embodiment , the coding sequence is under the control of a regulatable promoter as defined hereinbefore and being regulatable as also described above.
The activity of the pyruvate formate-lyase enzyme can be regulated or modulated under anaerobic conditions by the presence or absence of an activase and a deactivase, respectively. Accordingly, the DNA sequence comprising the sequence coding for the Pfl polypeptide preferably comprises sequences coding for a pyruvate formate-lyase activase (act gene) and/or a pyruvate formate-lyase deactivase. In preferred embodiments, such a deactivase is a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity as defined hereinbefore .
In accordance with the invention, the Pfl-encoding DNA sequence can be derived from any lactic acid bacterium including a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species, a Leuconostoc species and a Lactococcus species such as Lactococcus lactis including Lactococcus lactis subspecies lactis, Lactococcus lactis subspecies lactis biovar diacetylactis and Lactococcus lactis subspecies cremoris.
It has been found that the Pfl polypeptide as encoded by the pfl gene of Lactococcus lactis subspecies lactis biovar diacetylactis strain DB1341 comprises 787 amino acids (Table 3.2 below) (SEQ ID NO:15) and has a deduced molecular weight of 89.1 kDa. This polypeptide shows considerable identity with known pfl gene products (Table 3.1). Furthermore, it has been found that the corresponding pfl gene in Lactococcus lactis subspecies lactis MG1363 differs from the DB1341 gene in only about 5% of the nucleotides.
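As a rough illustration of how a deduced molecular weight follows from an amino acid sequence, the sketch below sums standard average residue masses and adds one water molecule for the free termini. The function name and the short test peptide are illustrative assumptions and are not part of the disclosure; the 787-residue sequence of Table 3.2 would be passed in the same way.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the N- and C-termini


def deduced_mass_kda(protein: str) -> float:
    """Return the deduced mass in kDa of a one-letter amino acid sequence."""
    total = sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
    return total / 1000.0


def mean_residue_mass(mass_kda: float, n_residues: int) -> float:
    """Average mass per residue in Da, a quick plausibility check."""
    return mass_kda * 1000.0 / n_residues


if __name__ == "__main__":
    # Hypothetical short peptide, used only to exercise the function.
    print(deduced_mass_kda("MKT"))
    # Values stated in the text: 787 residues, 89.1 kDa.
    print(mean_residue_mass(89.1, 787))
```

The second call gives a mean of roughly 113 Da per residue for the stated values, which is in the range expected for a typical protein.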
In specific embodiments, the DNA sequence comprising a Pfl-encoding sequence comprises the coding sequence as shown in Table 3.2 below (SEQ ID NO:15), the sequence designated mg1363-pfl as shown in Table 3.6 (SEQ ID NO:22) and the sequence shown in Table 5.3 (SEQ ID NOS:36 and 38), or a DNA sequence which is a mutant or variant hereof which codes for a polypeptide having pyruvate formate-lyase activity, the term "mutant or variant" being used in the same manner as defined hereinbefore.
In accordance with the invention, a pfl gene as defined herein encompasses any of the specific sequences as exemplified in the following and a lactic acid bacterial sequence coding for a polypeptide having the enzymatic activity of the gene products of such isolated sequences which has a DNA homology of at least 50% with the coding sequence of the pfl gene of L. lactis strains DB1341 or MG1363, such as at least 60% homology, including at least 70% homology or at least 80% homology, e.g. at least 90% homology.
In useful embodiments of the invention, the lactic acid bacterium-derived DNA sequence codes for a Pfl protein that is at least 30% identical with the gene products of the pfl gene of Streptococcus mutans (FASTA, GCG Wisconsin, Accession No. D50491) or the pfl gene of Hemophilus influenzae (FASTA, GCG Wisconsin, Accession Nos. U32812 and L42023) or the gene product of the sequence of Table 3.2 herein (SEQ ID NO: 15). In other useful embodiments, the identity to such gene products is at least 40%, such as at least 50%, such as at least 60% identity or even at least 70% identity. The homology between the above gene products may also be expressed in terms of amino acid similarity in which case the similarity suitably is at least 60%, such as at least 70%, e.g. at least 80% similarity.
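The identity thresholds above can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming two sequences that have already been aligned (gaps written as "-") and assuming that identity is counted over all alignment columns of the overlap; it is not the FASTA algorithm itself, and the function names and test strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical positions over the aligned overlap of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float,
                          thresholds=(30.0, 40.0, 50.0, 60.0, 70.0)):
    """Return the largest threshold (in percent) that the identity reaches."""
    met = [t for t in thresholds if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # Hypothetical aligned fragments, used only to exercise the functions.
    identity = percent_identity("MVPK-DY", "MVAKKDY")
    print(identity, highest_threshold_met(identity))
```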
In accordance with the invention, the DNA sequence coding for the Pfl polypeptide may also be a coding sequence of lactic acid bacterial origin that hybridizes to the pfl encoding sequence isolated from L. lactis strain MG1363, under the following conditions: hybridization overnight at 65°C followed by washing the filter twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes.
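To make the wash stringency easier to compare, the sketch below converts the SSC strengths named above into salt molarities. It assumes the conventional 20 x SSC stock (3 M NaCl, 0.3 M trisodium citrate), which the text does not state explicitly; the data structure and function names are illustrative only.

```python
# Conventional 20 x SSC stock composition in mol/l (an assumption, see above).
STOCK_20X_SSC = {"NaCl": 3.0, "trisodium citrate": 0.3}

# Wash steps as stated in the text.
WASH_STEPS = [
    {"ssc_fold": 5.0, "sds_percent": 0.0, "temperature": "room temperature",
     "minutes": 30, "repeats": 2},
    {"ssc_fold": 3.0, "sds_percent": 0.1, "temperature": "65 C",
     "minutes": 30, "repeats": 1},
]


def ssc_molarity(fold: float) -> dict:
    """Molar concentration of each SSC component at the given fold strength."""
    return {salt: conc * fold / 20.0 for salt, conc in STOCK_20X_SSC.items()}


if __name__ == "__main__":
    for step in WASH_STEPS:
        print(step["repeats"], "x", step["minutes"], "min at",
              step["temperature"], ssc_molarity(step["ssc_fold"]))
```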
It was found that, e.g. in L. lactis, open reading frames may be identified upstream of the coding region for the Pfl polypeptide. Such open reading frames were designated orfA and it was found that the gene products hereof have a function in the transport of formate across cell membranes. Thus, it was found that a mutant strain of L. lactis wherein the open reading frame had been disrupted showed an increased tolerance to the toxic formate analogue, hypophosphite.
In accordance with the invention there is also provided herein a recombinant replicon comprising the above Pfl-encoding DNA sequence. Such a replicon can be derived from a plasmid, a lactic acid bacterial bacteriophage or a lactic acid bacterial chromosome.
In one aspect the invention relates to a recombinant lactic acid bacterial host cell comprising such a replicon. The cell can be selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
The lactic acid bacterial cell may conveniently be provided in the form of a starter culture composition for use in the manufacturing of food products as described above. It is also contemplated that the above cells may be used as probiotically active cultures or as inoculants in animal feed preservation. In this connection, a particular use is as inoculants in field crops or animal waste materials which are subjected to an ensiling process.
In particularly useful embodiments, the above lactic acid bacterial cell is one wherein the DNA sequence coding for pyruvate formate-lyase activity is modified whereby the production of the pyruvate formate-lyase is reduced or eliminated or whereby the enzyme is produced in a modified form having a reduced pyruvate formate-lyase activity.
Such a modification can, as has been described above for a cell comprising a sequence coding for the AdhE polypeptide, be made by methods which are known per se in the art. Thus, as typical examples, a DNA modification can e.g. be made by deletion, insertion or substitution of one or more nucleotides in the coding sequence, possibly leading to the expression of a polypeptide having a modified amino acid composition. An inactivation of the coding sequence can also be obtained by random or site-directed mutagenesis, e.g. by using a transposable element which is integratable in the replicon comprising the coding sequence. Another possible means of providing Pfl-inactivated (pfl⁻) mutants is Campbell-like homologous integration.
The level of expression of the Pfl polypeptide can also be reduced by modifying or regulating regulatory sequences controlling the production of the polypeptide. Thus, as one example, a native constitutive promoter can be replaced by a regulatable promoter, the function of which can be reduced or inhibited under appropriate conditions such as those physical and chemical promoter regulating factors as mentioned hereinbefore. Alternatively, a native promoter which is in itself regulatable by certain factors may be replaced by another regulatable promoter which is negatively regulatable by other factors present in the cultivation medium for the recombinant cell .
A cell being modified in this manner will be a metabolically engineered cell, since under conditions where the pyruvate formate-lyase is normally metabolically active as shown in Fig. 1 such a modified cell will lack one of the major pathways whereby the pyruvate pool is normally consumed. This will result in a modification of the metabolic pathways based on pyruvate including an enhanced flux towards α-acetolactate which is a precursor substance for diacetyl, acetoin and 2,3 butylene glycol. Such a cell is particularly useful in dairy starter cultures where such flavour compounds are generally desirable.
In further useful embodiments, the lactic acid bacterial cell according to the invention is a cell wherein the DNA sequence comprising the sequence coding for pyruvate formate-lyase is modified so that the production of the pyruvate formate-lyase is enhanced or so that the enzyme is produced in a modified form having an increased pyruvate formate-lyase activity. Analogously with what is described above with respect to the modifications leading to an enhanced expression or activity of the AdhE polypeptide, it is contemplated that such a modification can be provided by appropriate modifications of the coding sequence itself which result in an enhanced production level of the Pfl polypeptide and/or the production of a modified polypeptide having an enhanced activity of at least one of its native activities. Such modifications can be made by substitution, deletion or insertion of one or more nucleotides using any conventional methods for such DNA modifications, including random or site-directed mutagenesis followed by selection of the desired mutants.
Alternatively, a lactic acid bacterial cell having enhanced production of and/or enhanced activity of pyruvate formate-lyase can be provided by suitable modifications of sequences regulating the expression of the pfl gene and/or the activity of the enzyme. One suitable manner whereby this can be obtained is by operably linking the coding sequence to a promoter sequence having a stronger promoter activity than the native promoter for the coding sequence. In suitable embodiments such an inserted promoter is regulatable by a factor as mentioned above and the production of the polypeptide can then be enhanced by cultivating the cell in the presence of a factor which confers a strong promoter activity. It is contemplated that a thus modified lactic acid bacterial cell produces increased amounts of formate and/or acetate. Enhanced production of the Pfl polypeptide may also be obtained in a host by using a replicon which occurs in a high copy number in that host cell or by chromosomal amplification.
In accordance with the invention, there is also provided a recombinant lactic acid bacterial cell comprising both the DNA sequence comprising the above sequence coding for an AdhE polypeptide, and the above sequence comprising a sequence coding for pyruvate formate-lyase, in both instances including sequences regulating the production and/or the activity of the enzyme activities. As used herein, the term "recombinant" implies that at least one of the coding sequences or regulatory sequences is not a naturally occurring sequence. The sequences may be located on the same replicon or they may be on separate replicons.
Preferably, at least one of the sequences of the above cell is modified so as to modify the production of the pyruvate formate-lyase or the activity hereof, or the distribution of the amounts of end products resulting from the lactose and/or citrate metabolism of the cell.
It will be understood that a lactic acid bacterium which is metabolically engineered in accordance with the invention so that it has an enhanced production of one or more metabolites is useful in a method of producing such a metabolite or such metabolites. In general, such a method comprises cultivating a lactic acid bacterium which is metabolically engineered in accordance with the invention under conditions where the metabolite is produced, and isolating the metabolite from the culture. The isolation of the metabolite may be carried out according to any conventional methods of recovering the particular substance, such as e.g. distillation.
As it is also mentioned above, the lactic acid bacterial cells according to the invention are useful as food starter cultures. In accordance herewith, the invention also provides a method of producing a food product, the method comprising the step of admixing to the food product starting materials a starter culture of a lactic acid bacterium as defined above and keeping the mixture under conditions allowing the starter culture to be metabolically active. Such a method where a starter culture which is metabolically engineered in accordance with the invention is used will, dependent on the type of metabolite modifications, result in a food product having an improved flavour and/or a product which has an improved shelf life due to an enhanced production of antimicrobially active metabolites by the starter culture.
The invention will now be further illustrated in the below examples and the drawing wherein:
Fig. 1 illustrates selected metabolic pathways in citrate fermenting lactic acid bacteria;
Fig. 2 shows an overview of the cloned L. lactis DB1341 adhE gene (open arrow), the sequence strategy for clone 1 (box in middle) and the regions covered by the λZAP clones adhE1 and adhE3 (bottom). The nucleotide position of relevant restriction sites is shown (top). The position of PCR and sequencing primers is shown as small open arrows. A putative transcription terminator present downstream of the stop codon is shown as a circle. The rbs box shows the position of a consensus lactococcal ribosome binding site. Arrows show the sequencing strategy for clone 1 (middle);
Fig. 3 shows an overview of the cloned L. lactis DB1341 adhE gene fragment (open arrow) . The nucleotide position of relevant restriction sites is shown (top) . The position of PCR and sequencing primers is shown as small open arrows. A putative transcription terminator present downstream of the stop codon is shown as a circle. The rbs box shows the position of a consensus lactococcal ribosome binding site. The cloned PCR fragments of the L. lactis MG1363 adhE gene are shown as lines (MGadhESTART and MGadhESTOP) . The PCR fragments used to clone into pSMA500 for gene inactivation in strain DB1341 are shown as open boxes (pSMAKAS4 and pSMAKAS5) ;
Fig. 4 is an overview of the cloned Lactococcus lactis DB1341 strain (L. lactis subspecies lactis biovar diacetylactis) pfl gene (open arrow box). The nucleotide positions of relevant restriction sites are shown (top). The position of PCR and sequencing primers is shown as small open arrows. A putative ribosome binding site is shown (rbs box), and a transcription terminator present downstream of the stop codon is shown as a circle. The plfl (open box) shows the fragment of the λZAP clone of the DB1341 genomic library containing a pfl gene fragment. The cloned PCR fragment of the L. lactis subspecies lactis MG1363 pfl fragment is shown as a line (MGpfll). A Sau3AI fragment used for gene inactivation in strain DB1341 is shown as an open box (pSMAKAS7). The pfl region included in the fragment as obtained by inverse PCR from DB1341 using EcoRI digestion and primers pfll-250 and pfll-390 is shown as a dotted box (pflup-l);
Fig. 5 is a genetic map of the L. lactis MG1363 adhE locus including the orfB open reading frame. In the upper part are indicated primer sequences;
Fig. 6 illustrates the structure of the L. lactis OrfA protein. The shadowed box at the terminal region of OrfA depicts the area covered by the internal orfA fragment used for gene inactivation. The two transmembrane regions were identified using the PredictProtein server at the EMBL, Heidelberg, Germany;
Fig. 7 illustrates expression of orfA in L. lactis. A: genetic map of orfA showing the region covered by the probe (thick line below orfA) used in expression studies and in the construction of a null mutant strain. B: Northern blot analysis. RNA isolated from MG1363 was hybridized to the orfA probe. Lane 1: exponential culture in GM17, aerobic; lane 2: same, anaerobic; lane 3: stationary culture in GM17, aerobic; lane 4: same, anaerobic; lane 5: exponential culture in GalM17, aerobic; lane 6: same, anaerobic. The transcript size is shown in kb to the left. The autoradiogram was exposed for 14 days;
Fig. 8 illustrates inhibition of growth by hypophosphite in strains of L. lactis. Strains were grown anaerobically overnight in GM17 supplemented with different concentrations of hypophosphite. At the end of the incubation period (about 18 hours), OD600 was measured. Symbols: (♦) MG1363; (▲) MG1363ΔorfA; (■) MG1363 pAK80::orfA;
Fig. 9 shows a genetic map of the L. lactis MG1363 pfl gene, showing the region used as a probe in the identification of pfl homologues in other lactic acid bacteria, including the position of EcoRI sites;
Fig. 10 shows autoradiograms from Southern hybridization of genomic DNA from non-Lactococcus lactic acid bacteria to a L. lactis pfl probe. Lane 1: L. lactis MG1363; lane 2: Streptococcus thermophilus; lane 3: Leuconostoc mesenteroides; lane 4: Lactobacillus acidophilus. Bands are shown in kb. Filters were exposed 2 h (A) or overnight (B);
Fig. 11 illustrates two Sau3AI fragments including most of the L. lactis strain DB1341 adhE coding sequence used in Southern hybridization experiments with EcoRI-digested genomic DNA from non-Lactococcus lactic acid bacteria;
Fig. 12 illustrates detection of adhE homologues in other lactic acid bacteria by Southern hybridization experiments with EcoRI-digested genomic DNA from non-Lactococcus lactic acid bacteria. Lane 1: L. lactis MG1363; lane 2: S. thermophilus; lane 3: L. mesenteroides; lane 4: L. acidophilus. Bands are shown in kb. Filters were exposed overnight.
EXAMPLE 1
Cloning of the L. lactis adhE gene
1. Construction of a L. lactis ssp. lactis biovar diacetylactis DB1341 genomic library for genetic complementation
A genomic library was constructed by cloning partially Sau3AI-digested chromosomal DNA from strain DB1341 into BamHI-digested pSMA500 (Madsen et al., 1996) and transforming into E. coli MC1000 by electroporation (Sambrook et al., 1989). Strain DB1341 was kindly provided by Chr. Hansen A/S, Hørsholm, Denmark. The genomic library consisted of about 10,000 independent recombinant clones with an average insert size of 4 kb. A mixed culture, containing all clones obtained, was grown in LB + erythromycin (erm, 50 μg/ml) and plasmid DNA was isolated for genetic complementation.
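The library size stated above can be put in perspective with a standard coverage estimate. The sketch below is a minimal illustration using the Clarke-Carbon relation P = 1 - (1 - I/G)^N; the genome size of 2,500 kb is an assumed round figure for a Lactococcus lactis chromosome and is not given in the text, and the function names are hypothetical.

```python
def fold_coverage(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_kb / genome_kb


def probability_represented(n_clones: int, insert_kb: float,
                            genome_kb: float) -> float:
    """Clarke-Carbon estimate that a given locus is present in the library."""
    fraction = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction) ** n_clones


if __name__ == "__main__":
    N_CLONES = 10_000          # from the text
    INSERT_KB = 4.0            # average insert size, from the text
    GENOME_KB = 2_500.0        # ASSUMPTION: approximate genome size
    print(fold_coverage(N_CLONES, INSERT_KB, GENOME_KB))
    print(probability_represented(N_CLONES, INSERT_KB, GENOME_KB))
```

Under that assumed genome size the library corresponds to about 16 genome equivalents, so any single gene would be expected to be represented with high probability.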
2. Genetic complementation in E. coli NZN111 using the pSMA500 library
E. coli strain NZN111 (pfl⁻; ldh::Tn5; kanR) is unable to grow in the absence of O2 due to the accumulation of NADH derived from the lack of fermentative enzyme activities encoded by the pfl and ldh genes (Mat-Jan et al., 1989).
Genetic complementation was attempted by transformation of NZN111 using 200 ng plasmid DNA from the library (see above). Transformation mixtures were plated on LB + erm (50 μg/ml) + kanamycin (kan; 50 μg/ml) and incubated at 37°C in anaerobic jars. As a control, pSMA500-transformed strain NZN111 was used. After two days, transformation plates were incubated aerobically for another two days to allow weak complementing clones to grow. A clone was identified (clone 1) in the library-transformed plates, and no growth was observed in the pSMA500 control. In a preliminary screening, protein extracts of clone 1 were used in a modified "Ldh" assay (Crow and Pritchard, 1977), where the pyruvate-dependent conversion of NADH to NAD is monitored, to ensure that complementation of the fermentative defects in strain NZN111 had occurred. Protein extraction was carried out by adding 100 μl 100 mM MOPS buffer (pH 6.5); 2% Triton X-100 to the cell pellet from 1.5 ml stationary cultures grown in LB + erm (50 μg/ml) which had been washed in fresh ice cold LB, and frozen at -80°C for 15 min. Pellets were dissolved and transferred to Eppendorf tubes. Lysozyme (5 mg) was added and samples were incubated on ice for 30 min. Subsequently, glass beads (100 μm, Sigma; 100 μl) were added and samples were vortexed for 30 sec and kept on ice for 30 sec. This step was repeated 10-15 times, and samples were centrifuged at maximum speed for 2 min. Supernatants were transferred to a new Eppendorf tube and kept at -80°C until assayed. To measure NADH oxidation, the following components were mixed in a quartz cuvette: 700 μl 100 mM MOPS, pH 6.5; 100 μl 120 mM Na-pyruvate; 50 μl 2.56 mM NADH and 50 μl H2O. The decrease in OD340 as a result of the oxidation of NADH to NAD was monitored after the addition of 100 μl sample. As a control reaction, pyruvate was omitted. No significant decrease in OD was observed in the control.
A relatively high conversion rate (approximately 2-fold as compared to the NZN111::pSMA500 control) was observed in clone 1.
Plasmid DNA was isolated from clone 1 and used to retransform E. coli NZN111. Duplicate LB + erm plates were incubated (i) aerobically for 4 days or (ii) anaerobically for 2 days and then 2 days aerobically at 37°C. A similar number of transformants was obtained in both procedures (see Table 1.1 below). Thus, clone 1 did not result from artifact cloning and can indeed complement the defect in strain NZN111.
Table 1.1. Retransformation of clone 1 into E. coli NZN111
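The assay mixture described above can be checked numerically. The sketch below adds up the stated volumes, derives the final concentrations in the cuvette and converts a change in OD340 into an NADH oxidation rate. The extinction coefficient of 6.22 mM^-1 cm^-1 for NADH at 340 nm and the 1 cm path length are standard values assumed here, not taken from the text, and all names are illustrative.

```python
# Volumes (µl) of each component as listed in the assay description.
VOLUMES_UL = {"MOPS": 700.0, "Na-pyruvate": 100.0, "NADH": 50.0,
              "H2O": 50.0, "sample": 100.0}
# Stock concentrations (mM) for the components where the text states them.
STOCKS_MM = {"MOPS": 100.0, "Na-pyruvate": 120.0, "NADH": 2.56}

NADH_EXTINCTION = 6.22  # mM^-1 cm^-1 at 340 nm (ASSUMPTION: standard value)


def total_volume_ul() -> float:
    return sum(VOLUMES_UL.values())


def final_concentration_mm(name: str) -> float:
    """Concentration of a component after dilution into the full assay volume."""
    return STOCKS_MM[name] * VOLUMES_UL[name] / total_volume_ul()


def initial_a340(path_cm: float = 1.0) -> float:
    """Expected starting absorbance contributed by NADH alone."""
    return final_concentration_mm("NADH") * NADH_EXTINCTION * path_cm


def nadh_oxidation_rate(delta_a340_per_min: float, path_cm: float = 1.0) -> float:
    """NADH oxidised in mM per minute for a given decrease in A340 per minute."""
    return abs(delta_a340_per_min) / (NADH_EXTINCTION * path_cm)


if __name__ == "__main__":
    print(total_volume_ul(), final_concentration_mm("Na-pyruvate"),
          final_concentration_mm("NADH"), initial_a340())
```

With the stated volumes the reaction is 1 ml in total, with 12 mM pyruvate and 0.128 mM NADH, so the starting absorbance from NADH is a little under 0.8 under the assumed coefficient.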
[Table 1.1 appears as an image (imgf000032_0001) in the original publication]
NZN111 competent cells were electroporated with the corresponding plasmid, and one half of the cell mixture was plated onto LB + kan + erm and incubated without O2 (anaerobic growth), and the other half was plated onto the same medium and incubated with O2 (aerobic growth). Transformants were scored after 4 days (see main text).
A sample of clone 1 in E. coli was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11093.
3. Sequence analysis of clone 1 and identification of an adhE fragment
Clone 1 was further characterized by restriction enzyme analysis and included a 2.2 kb insert. Sequence analysis determined that it included a 1.7 kb fragment of an open reading frame (ORF) showing homology to the E. coli adhE gene disclosed by Goodlove et al., 1989. The sequence of the 2.2 kb insert is shown in Table 1.2 below (SEQ ID NO:1).
Table 1.2. Sequence of the insert in clone 1
Sau3AI 1 GATCTGTCCTTAGTACGAGAGGACCGGGATGGACTTACCGCTGGTGTACC
51 AGTTGTTCCGCCAGAGCACGGCTGGATAGCTATGTAGGGAAGGGATAAGC
101 GCTGA7 GCATCTAAGTGCGAAGCCACCTC-AAGATGAGATTACCCATTCG
_7au3AI 151 AGAATTAAGAGCCCAGAGAGATGATC-AAGATGTC-AATAATTTGC-AAAAAA
201 TCTTCTTTCAGC-AAAACGGGATTTGAGTTTTTGCTCGATTTGTGGGAATT Sa U3AI 251 TAAC-AGAAAGTGATCTGTTGAAATCGCAAGCCCTCTCGGTGTACTTGCTG
301 GTATCGTTCCL^CGACTAATCC-AACATCAAC^
351 TTGAITTGaVAAAACACGTAATGCTATTGTTTTCGCTTTCCACCCTCAAGC
401 TCLAAAAATGTTC-^-AGCCΛTGC-AGα y^AATTGTTTACGATGCTG<_ -AT^
451 AAGCTGGTGC^CCGGAAGACirTATT -ATGGATTC-AAGTACCLAAGCCTT
501 GAIATGACTACCGCCTTGATTC-AAAACCGTGGACrrTGC-RAI-^ΛTCCTTGC
551 AACTGGTGGCC(_AGGAATGGTAAACGCCGC_ACrC!AAATCTGGTAACCCTT
601 C-ΛCTCGGTGTTGGAGCTGGTAATGGTGCTGTTTATGTTGATGCAACTGCA
651 AATATTGAACGTGCCGTTGAAC3ACCTTTTGCTTTCAAAACGTTTTGATAA
-35 701 TGGGATGATTTGTGCCΑCTGAAAATTC-AGCTGTTATTGATGCTΓC-?_GTTT
-10 SD
751 TGATGAATTTATTGCTAAAATGCAAGAACAAGGCGCTTATATGGTTCCT
M V P 3
801 AAAAAAGACTACAAAGCTATTGAAAGTTTCGTTTTTGTTGAACGTGCTGG K K D Y K A I E S F V F V E R A G 20
851 TGAAGGTTTTGGAGTAACTGGTCCTGTTGCCGGTCGTTCTGGTCAATGGA
E G F G V T G P V A G R S G Q I 37 901 TTGCTGAACAAGCTGGTGTC-AAAGTTCCTAAAGATAAAGATGTCCTTCTT
A E Q A G V K V P K D K D V L L 53 951 TTTGAACTTGATAAGAAAAATATTGGTGAAGCACTTTCTTCTGAAAAACT
F E L D K K N I G E A S S E K 70 1001 TTCTCCTTTGCTTTCAATCTACAAAGCTGAAACACGTGAAGAAGGAATTG
S P L S I Y K A E T R E E G I E 87 1051 AGATTGTACGTAGCTTACTTGCTTATCAAGGTGCTGGACIATAATGCTGCA
I V R S L A Y Q G A G H N A A 103
Sau3AI 1101 ATTC-AAATCGGTGC1AATGGATGATCC-ATTCGTTAAAGAATATGGCGAAAA
I Q I G A M D D P F V K E Y G E K 120 1151 AGTTGAAGCTTCTCGTATCCTCGTTAACCAACCAGATTCTATTGGTGGGG
V E A S R I L V N Q P D S I G G V 137 1201 TCGGAGATATCTATACTGATGCAATGCGTCCATC-ACTTACIA ITGGAACT
G D I Y T D A M R P S L T L G T 153
Sau3AI 1251 GGTTCATGGGGGAAAAATTCACTTTC-ACACAATTTGAGTACATACGATCT
G S G K N S L S H N L S T Y D 170 1301 ATTGAATGTTAAAAC-AGTGGCTAAACGTCGTAATCGCCCACAATGGGTTC
L N V K T V A K R R N R P Q V R 187 1351 GTTTGCCAAAAGAAAΓΓTACTACGAAAAAAATGCIAATTTCTTACTTACAA L P K E I Y Y E K N A I S Y Q 203
1401 GAATTGCCIACACGTCC-ACAAAGCTTTCATCGTTGCTGACCCTGGTATGGT
E P H V H K A F I V A D P G M V 220 1451 TAAATTTGGTTTCGTTGATAAAGTTTTGGAACAACTTGCTATCCGCCCAA
K F G F V D K V L E Q L A I R P T 237 1501 CT(_1?U.GTTGAAAC1AAGCATTTATGGCTCTGTTCAACCTGACCC-AACTTTG
Q V E T S I Y G S V Q P D P T L 253 1551 AGCGAAGC-AATTGCAATCGCTCGTCLAAATGAAAC-AATTTGAACCTGACAC
S E A I A I A R Q M K Q F E P D T 270 1601 TGTCATCTGTCTTGGTGGTGGTTCTGCTCTCGATGCCGGTAAGATTGGTC V I C L G G G S A D A G K I G R 287
1651 GTTTGATTTATGAATATGATGCTCGTGGTGAAGCTGACCTTTCTGATGAT
L I Y E Y D A R G E A D L S D D 303 1701 GOΛGTTTGAAAGAACTTTTCΑUVGAATTAGCTC^AAAATTTGTCGATAT
A S L K E L F Q E L A Q K F V D I 320 1751 TCGTAAACGTATTATTAAATTCTACC1ATCC-ACATAAAGCA(_ΛAATGGTTG
R K R I I K F Y H P H K A Q M V A 337 1801 CAATTCCTACTACTTCTGGTACTGGTTCTGAAGTGACTCCATTTGCAGTT
I P T T S G T G S E V T P F A V 353 1851 ATCACTGATGATGAAACTCATGTTAAGTACCCACTTGCTGACTACCAATT I T D D E T H V K Y P L A D Y Q L 370
1901 AACACCAC1AAGTTGCC-ATTGTTGACCCTGAGTTTGTTATGACTGTACCAA
T P Q V A I V D P E F V M T V P K 387 1951 AACGTACTGTTTCTTGGTCTGGTATTGATGCGATGTCΑCACGCGCTTGAA
R T V S S G I D A M S H A L E 403 2001 TCTTACGTTTCTGTTATGTCTTCTGACTATACAAAACCAATTTCACTTCA
S Y V S V M S S D Y T K P I S L Q 420 Sau3AI
2051 AGCGATCCCGGGTCTAGATTAGGGTAACTTTGAAAGGA (SEQ ID NO:l)
A I P G L D * (SEQ ID NO:2) 426
Sau3AI recognition sites are indicated above the sequence. DNA homology to the E. coli adhE starts at nucleotide position 262 (data not shown). A Sau3AI fragment with 100% homology to the 23S rRNA of L. lactis is shown doubly underlined at the top (positions 1-173). Putative expression signals functional in E. coli are shown: -35, -10 promoter regions (underlined); Shine Dalgarno (SD, doubly underlined) and putative start codon (bold, discontinuous underline). The amino acid sequence of the open reading frame is given in one-letter code. The open reading frame ends in the multiple cloning site of vector pSMA500 (doubly underlined at bottom) (Madsen et al., 1996).
E. coli AdhE is a multi-functional protein consisting of 890 amino acids that catalyzes the conversion of acetyl CoA into ethanol and has acetaldehyde-DHase (ACDH) and alcohol-DHase (ADH) activities. Additionally, AdhE shows Pfl deactivase activity involved in the inactivation of pyruvate formate-lyase, a key enzyme in anaerobic metabolism (Knappe et al., 1991).
As shown in the above Table 1.2 and Table 1.3 below, clone 1 includes the ADH domain of a L. lactis AdhE homologue, and it contains expression signals necessary for expression in E. coli (Shine Dalgarno and -35 and -10 regions). The putative gene product of 427 amino acids is highly homologous to a number of other iron-dependent ADHs. Comparison at the protein level showed a 41.4% identity (78% similarity) with E. coli AdhE, in addition to significant homology to other ADHs of both eukaryotic and prokaryotic origin (Table 1.3).
Table 1.3. Homology search (FASTA, GCG Wisconsin package version 8, Genetics Computer Group) using the 427 amino acid putative protein encoded by clone 1 (see also Table 1.2)
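The two annotations described in this note, Sau3AI sites and the one-letter translation, can be reproduced with a few lines of code. The sketch below is a minimal illustration and not the software used for the original analysis: it locates the Sau3AI recognition sequence GATC and translates a reading frame with the standard genetic code. The two test strings are the first 50 nucleotides of Table 1.2 and the first seven codons of the annotated open reading frame; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def sau3ai_sites(dna: str) -> list:
    """Return 1-based start positions of every GATC in the sequence."""
    dna = dna.upper()
    return [i + 1 for i in range(len(dna) - 3) if dna[i:i + 4] == "GATC"]


def translate(dna: str) -> str:
    """Translate from the first base in frame 1, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    first_line = "GATCTGTCCTTAGTACGAGAGGACCGGGATGGACTTACCGCTGGTGTACC"
    print(sau3ai_sites(first_line))
    print(translate("ATGGTTCCTAAAAAAGACTAC"))
```

Only exact, unambiguous bases are handled; a sequence containing other characters would raise a KeyError in this simplified version.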
The region of homology to AdhE corresponds to the central region, where the ADH domain is possibly located. Only homology to the best score is shown.
(Peptide) FASTA of: clone1.pep from: 1 to: 427
TRANSLATE of: clone1.seq check: 2521 from: 792 to: 2072
The best scores are: init1 initn opt
sw:adhe_ecoli P17547 escherichia coli. alcohol dehydroge.. 276 736 768
sw:adhe_cloab P33744 clostridium acetobutylicum. alcoh.. 256 600 703
sw:adh1_cloab P13604 clostridium acetobutylicum. nadph.. 256 357 279
sw:medh_bacmt P31005 bacillus methanolicus. nad-depend.. 169 224 173
sw:adh4_yeast P10127 saccharomyces cerevisiae (baker's.. 146 224 165
sw:adhf_schpo Q09669 schizosaccharomyces pombe (fission. 146 219 162
sw:yiay_ecoli P37686 escherichia coli. hypothetical 40.. 158 218 187
sw:sucd_clokl P38947 clostridium kluyveri. succinate-s.. 132 186 179
sw:adh2_zymmo P06758 zymomonas mobilis. alcohol dehydr.. 129 180 169
sw:fuco_ecoli P11549 escherichia coli. lactaldehyde re.. 141 175 147
sw:adha_cloab Q04944 clostridium acetobutylicum. nadh-.. 136 153 145
clone1.pep
sw:adhe_ecoli
ID ADHE_ECOLI STANDARD; PRT; 890 AA. AC P17547;
DE ALCOHOL DEHYDROGENASE (EC 1.1.1.1) (ADH) / ACETALDEHYDE DEHYDROGENASE . . .
SCORES Initl: 276 Initn: 736 Opt: 768
41.4% identity in 430 aa overlap 10 20 30
Clonel MVPKKDYKAIESFVFVERAGEGFGVTGPVA adhe_e GVICASEQSVVVVDSVYDAVRERFATHGGYLLQGKELKAVQDVIL- -KNG- - -ALNAAIV 250 260 270 280 290 40 50 60 70 80 90 clonel GRSGQWIAEQACJV-CVP-ΦKDVLLFEI_D-α<_NIGEALSSEKLSPLLSIYKAETREEGIEIVR
I::: III II :||:::::|: |:: : :|::: Hill |::|:|:: |:::| : adhe_e GQPAYKIAEI.AGFSVPENTKILIGEVTVVDESEPFAHEKLSPTI-AMYRAKDFEDAVEKAE 300 310 320 330 340 350 100 110 120 130 140 149 clonel SLIAYCCAGHNAAIQIGAiTDDP-FVKEYGEKVEASRILVNQPDSIGGVGDIYTDAMRPSL
:|:| I ||:: : ::: ::| |: : | : | : : : : | | | : | |:| I I : I I = I = = III adhe_e KLVAMGGIGHTSCLYTDQDNQPARVSYFGQKRIKTARILINTPASQGGIGDLYNFKLAPSL 360 370 380 390 400 410 150 160 170 180 190 200 clonel TLGTGS GKNSLSHNLSTYDLLNVKTVAKRRNRPQWVRLPKEIYYEKNAISY-LQE-LPH
III MM l|:|:|::: :|:| :: I : I I I = I I : : : : : |:| ::: adhe_e TLGCGSWGGNSISENVGP-O.LIN KTVAKRAENMLHKLPKSIYFRRGSLPIALDEVITD 420 430 440 450 460 470
210 220 230 240 250 260 clone1 VHK-AFIVADPGMVKFGFVDKVLEQLAIRPTQVETSIYGSVQPDPTLSEAIAIARQMKQF
II |:||:|: = = MM:: : I : : = I II = = : = I = = II I I I : I : I adhe_e GHKRALIVTDRFLFNNGYADQITSVL- -KAAGVETEVFFEVEADPTLSIVRKGAELANSF 480 490 500 510 520 530
270 280 290 300 310 320 clonel EPDTVICLGGGSALDAGKIGRLIYEYDARGEADLSDDASLKELFQELAQKFVDIRKRIIK
:|| = :| MilMMMM :::|| = M = Mill MMIIIM I adhe_e KPDVIIALGGGSPMDAAKIMWVMYE HPETH FEELALRFMDIRKRIYK 540 550 560 570 580
330 340 350 360 370 380 clonel FYH- PHKAQMVAIPTTSGTGSEVTPFAVITDDETHVKYPLADYQLTPQVAIVDPEFVMTV
I : I h I : I :: I I I II I I I I II II I : I II : I II I II I I : II I : = I I I I : : : II : : adhe_e FPiα«GVKA-O.IAVTTTSGTGS- TPFAVVTDDATGQ-C-'PLADYALTPDirAIV^
590 600 610 620 630 640
390 400 410 420 clonel PKRTVS SGIDAMSHALESYVSVMSSDYTKPISLQAIPGLD (SEQ ID NO: 2)
Ih :::|:||::||:|:||||::|:::: :|||: |: adhe_e PKSLCΑFGGLDAVT-lA-4EAYVSVI-ASEFSDGQA--φAL-α^ 650 660 670 680 690 700 adhe_e VHSAATIAGIAFANAFLGVCHSMAHKLGSQFHIPHGLANALLICNVIRYNANDNPTKQTA (corresponding to a. a. residues 43-762 of SEQ ID NO: 6)
710. 720 730 740 750 760
4. DNA hybridization of the DB1341 λZAP library using an adhE fragment
Sequence comparison of clone 1 with the previously cloned adhE gene indicated that the first 500 bp and the last 600 bp of the putative L. lactis adhE homologue were not present in clone 1. Therefore, a λZAP genomic library of strain DB1341 was constructed according to the manufacturer's instructions (Stratagene). The average insert size was estimated to be approx. 3 kb, with 80% recombinant clones. Approximately 2 x 10^5 pfu were screened using a 0.8 kb Sau3AI fragment (position 1296-2054 in Table 1.2) and 10 positive clones (named adhE-1 to adhE-10) were selected for characterization.
5. Sequencing of positive λZAP adhE clones
Following 'in vivo' excision of the pBK plasmid version (Stratagene) of the clones, restriction mapping and sequencing of clones adhE-1 and adhE-3 was carried out as shown in Fig. 2. Clone adhE-1 included a 1.7 kb insert that was identical to the adhE fragment of clone 1 (position 262-2054 in Table 1.2). Clone adhE-3 contained a 4 kb insert spanning from the Sau3AI site at position 1296 in Table 1.2. This fragment could harbour the 3'-end of the L. lactis adhE gene. Sequence analysis of this clone confirmed that it included the 3'-end of the L. lactis adhE gene, which ends with a double stop codon (TAATAA, position 2854-2859 in Table 1.4 below). Downstream from this position, a possible transcription terminator was found (position 2883-2905 in Table 1.4).
Samples of clones adhE-1 and adhE-3 in E. coli were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 25 July 1996 under the accession Nos. DSM 11101 and DSM 11102, respectively.
Table 1.4. Sequence of the L. lactis DB1341 adhE gene (SEQ ID NO: 3)
In this Table a putative ribosome binding site is shown in bold (position 127-133), 12 bp upstream of the putative start codon (position 145-147), deduced from homology comparisons (Figs. 2 and 3). Two adjacent stop codons (located at position 2854-2859) are shown (double underline). A putative rho-independent transcription terminator (de Vos and Simons, 1994) is also shown downstream of the stop codons at position 2883-2904 (single and dotted underline show stem and loop sequences, respectively).
AAGCTTGTTACAAAACCGTTTTCTAAACTTTTGATGAGTGTTTTTGTAAA 1 + --+ --+ + + 50
AACTATCACΛATATTGCTTGACAT TATAAAAAACTTTGTTAAACTATTC 51 + --+ + --+ + 100
ACGTAAAAGAAAGTGAATGAAGTCACAAAGGAGAACCTACAAATATGGCA 101 + + + + -+ 150
MetAla
ACTAAAAAAGCCGCTCC^GCTGC-AAAGAAAGTTTTAAGCGCTGAAGAAAA 151 + + - + - - - + + 200
ThrLysLyεAlaAlaProAlaAlaLysLysValLeuSerAlaGluGluLys AGCCGCAAAATTCC-AAGAAGCTGTTGCTTATACTGACAAATTAGTCAAAA
201 + + + + + 250
AlaAlaLysPheGlnGluAlaValAlaTyrThrAspLysLeuValLysLyB -
AAGCAα .G TGCTGTTCTTAAATTTGAAGGATATA(-ACAAAC CAAGTC 251 + --+ + + + 300
AlaGlnAlaAlaValLeuLysPheGluGlyTyrThrGlnThrGlnVal
GATACTATTGTCG roθ^TGGCTCTTGCAGCAAGαiAA(-ΑTTCTCTAGA 301 +- -+ -+ + + 350 AspThrlleValAlaAlaMetAlaLeuAlaAlaSerLysHisSerLeuGlu
ACTCGCTCATGAAGCCGTTAACGAAACTGGTCGTGGTGTTGTCGAAGACA 351 + + + + + 400
LeuAlaHisGluAlaValAsnGluThrGlyArgGlyValValGluAspLyε -
AAGATACO^AAAACCIACTTTGC-TΓCΓGAATCTGTTTATAACGC-AATTAAA
401 --+ + + --+ + 450
AspThrLysAsnHisPheAlaSerGluSerValTyrAsnAlalleLys AATGACLAAAACTGTTGGTGTC-ATTTCTGAAAACAAGGTTGCTGGATCTGT
451 + + --+ + -+ 500
AεnAspLysThrValGlyVallleSerGluAsr-LysValAlaGlySerVal
TGAAATCGCAAGCCCTCTCGGTGTACTTGCTGGTATCGTTCCAACGACTA 501 - + -+ + + + 550
GluIleAlaSerProLeuGlyValLeuAlaGlylleValProThrThrAsn -
ATCCAACΑTOΛCAGC-AATCTTTAAATCT 551 + + --+ + + 600 ProThrSerThrAlallePheLysSerLeuLeuThrAlaLysThrArg
AATGCTATTGTTTTCGCTTTCC-ACCCTCAAGCTC^AAAATGTTC-AAGCαi 601 + + + + + 650
AεnAlalleValPheAlaPheHisProGlnAlaGlnLysCysSerSerHis
TGCAGCAAAAATTGTTTACGATGCTGCAATTGAAGCTGGTGCACCGGAAG
651 --+ --+-- --+ --+ + 700
AlaAlaLysIleValTyrAspAlaAlalleGluAlaGlyAlaProGluAsp - ACTTTATTCAATGGATTGAAGTACCAAGCCTTGAC-ATGACTACCGCCTTG
701 + + + + + 750
PhelleGlnTrpIleGluValProSerLeuAεpMetThrThrAlaLeu
ATTC_AAAACCGTGGACTTG(_AACAATCCTTGαΛCrrGGTGGCCCAGGAAT 751 + + + + + 800
IleGlnAsnArgGlyLeuAlaThrlleLeuAlaThrGlyGlyProGlyMet GGTAAACGCCGCACTC^ΛTCTGGTAACCCTTCACTCGGTGTTGGAGCTG 801 + - - + + + + 850
ValAsnAlaAlaLeuLysSerGlyAsnProSerLeuGlyValGlyAlaGly - GTAATGGTGCTGTTTATGTTGATGC-AACTGCAAATATTGAACGTGCCGTT
851 - + + - - + + + 900
AεnGlyAlaValTyrValAεpAlaThrAlaAsnlleGluArgAlaVal
GAAGACCTTTTGCTTTOiAAACGTTTTGATAATGGGATGATTTGTGCCAC 901 -- --+ + + + + 950
GluAspLeuLeuLeuSerLysArgPheAspAsnGlyMetlleCysAlaThr
TGAAAATTCAGCTGTTATTGATGCTTC-AGTTTATGATGAATΓTATTGCTA
951 + + + + - + 1000 GluAsnSerAlaVal lleAspAlaSerValTyrAspGluPhel leAlaLyε -
AAATGCAAGAACIAAGGCG I ATATGGTTCCTAAAAAAGACTACAAAGCT
1001 + + + + + 1050
MetGlnGluGlnGlyAlaTyrMetValProLysLysAspTyrLysAla
ATTGAAAGTTTCGTTTTTGTTGAACGTGCTGGTGAAGGTTTTGGAGTAAC
1051 -- --+ +-- + + + 1100
IleGluSerPheValPheValGluArgAlaGlyGluGlyPheGlyValThr -
TGGTCCTGTTGCCGGTCGTTCTGGTCAATGGATTGCTGAACAAGCTGGTG 1101 + + + --+- -- + 1150
GlyProValAlaGlyArgSerGlyGlnTrpIleAlaGluGlnAlaGlyVal -
TC-AAAGTTCCTAAAGATAAAGATGTCCTT ΓTTTTGAACTTGATAAGAAA
1151 + + + + + 1200 LyεValProLyεAspLysAspValLeuLeuPheGluLeuAspLysLys
AATATTGGTGAAGC-ACTTT ΓΓCTGAAAAACTTTCTCCTTTGCTTTCAAT
1201 -- + + + +-- -+ 1250
AsnlleGlyGluAlaLeuSerSerGluLysLeuSerProLeuLeuSerlle
CTAClAAAGCrrGAAA-ACGTGAAGAAGGAATTGAGATTGTACGTAGCTTAC
1251 --+ + + --+ + 1300
TyrLysAlaGluThrArgGluGluGlylleGluIleValArgSerLeuLeu -
TTGCTTATO^GGTGCTGGACATAAlOCTGr_AATTC-W^TCGGTGCAATG 1301 + + + + + 1350
AlaTyrGlnGlyAlaGlyHisAεnAlaAlalleGlnlleGlyAlaMet
GATGATCCATTCGTTAAAGAATATGGCGAAAAAGTTGAAGCTTCTCGTAT
1351 + + + + + 1400
AspAεpProPheValLysGluTyrGlyGluLysValGluAlaSerArglle
CCTCGTTAACCAACCIAGATTCTATTGGTGGGGTCGGAGATATCTATACTG
1401 + - + + + - + 1450
LeuValAsnGlnProAspSerlleGlyGlyValGlyAspIleTyrThrAsp - ATGCAATGCGTCCATCftCTTAC-ACTTGGAACTGGTTCATGGGGGAAAAAT
1451 + + - - - + + + 1500
AlaMetArgProSerLeuThrLeuGlyThrGlySerTrpGlyLysAsn
TCIACTTTC-ACΛC-AATTΓGAGTAC-ATACGATCTATTGAATGTTAAAACAGT 1501 + + + + + 1550
SerLeuSerHisAsnLeuSerThrTyrAspLeuLeuAsnValLysThrVal
GGCTAAACGTCGTAATCGCCC-AC-^TGGGTTCGTTTGCCAAAAGAAATTT
1551 + + - - + + + 1600
AlaLysArgArgAsnArgProGlnTrpValArgLeuProLysGluIleTyr - j y
ACTACGAAAAAAATG(-AATTTCTTACTTAC-AAGAATTGCCACACGTCCAC
1601 + + + + + 1650
TyrGluLysAsnAlalleSerTyrLeuGlnGluLeuProHisValHis AAAGCTTTC-ATCGTTG rGACCCTGGTATGGTTAAATTTGGTTTCGTTGA
1651 + + + + + 1700
LysAlaPhelleValAlaAεpProGlyMetValLysPheGlyPheValAsp
TAAAGTTTTGGAACAACTTGCTATCCGCCC-AACTα^GTTGAAACAAGCA 1701 ---+ + --+ + + 1750
LysValLeuGluGlnLeuAlalleArgProThrGlnValGluThrSerlle -
TTTATGGCTCTGTTCAACCTGACCCAACTTTGAGCGAAGC-AATTGCAATC
1751 + + + + + 1800
TyrGlySerValGlnProAspProThrLeuSerGluAlalleAlal le
GCTCGTCAAATGAAAC2AATTTGAACCTGACACTGTC-ATCTGTCTTGGTGG
1801 + + - + + + 1850
AlaArgGlnMetLysGlnPheGluProAspThrVallleCysLeuGlyGly - TGGTTCTGCTCTCGATGCCGGTAAGATTGGTCGTTTGATTTATGAATATG
1851 -- + + + + + 1900
GlySerAlaLeuAspAlaGlyLysIleGlyArgLeuIleTyrGluTyrAsp -
ATGCTCGTGGTGAAGCTGACCTTTCTGATGATGCAAGTTTGAAAGAACTT 1901 + + -+ + + 1950
AlaArgGlyGluAlaAspLeuSerAspAspAlaSerLeuLysGluLeu
TTCCAAGAATTAGCTCAAAAATTTGTCGATATTCGTAAACGTATTATTAA 1951 + + + + + 2000 PheGlnGluLeuAlaGlnLysPheValAspIleArgLysArgllelleLys
ATTCTACCATCC-ACATAAAG(-A<-3tfΛ^
2001 + + + + + 2050
PheTyrHisProHisLysAlaGlnMetValAlalleProThrThrSerGly - GTACTGGTTCTGAAGTGACTCC-ATTTGCAGTTATCACTGATGATGAAACT
2051 + + + + + 2100
ThrGlySerGluValThrProPheAlaVallleThrAspAspGluThr
(.ΑTGTTAAGTACCC^CTTGCTGACTACC-AATTAACACCAαVAGTTGCCAT 2101 +- + + -+ + 2150
HisValLyεTyrProLeuAlaAspTyrGlnLeuThrProGlnValAlalle
TGTTGACCt-TGAGTTTGTTATGACTGTACCAAAACGTACTGTTTCTTGGT
2151 + --+ + + -+ 2200 ValAspProGluPheValMetThrValProLysArgThrValSerTrpSer -
CTGGTATTGATGCGATGT(- .(- -CGCGCTTGAATCTTACGTTTCTGTTATG
2201 + + + + + 2250
GlylleAspAlaMetSerHisAlaLeuGluSerTyrValSerVal et
TCTTCTGACTATAC-AAAACClAATTTCACri C-ftAGCGATCAAACTTAT π
2251 + + + + + 2300
SerSerAεpTyrThrLysProIleSerLeuGlnAlalleLysLeuIlePhe TGAAAACI GACTGAGTCTTATC-ATTATGACCCAGCGCATCCAACTAAAG
2301 + + + + - - - - + 2350
GluAsnLeuThrGluSerTyrHisTyrAspProAlaHisProThrLysGlu -
AAGGACAAAAAGCCCGCGAAAA(-ATGCAC VATGCTG<-1AACACTCGCTGGT 2351 - - + + - - - - + + - - - - + 2400
GlyGlnLysAlaArgGluAsnMetHi sAsnAlaAlaThrLeuAlaGly ATGrGCCTTCGCTAATGCrTTCCTTGGAATTAAC
2401 + + + + + 2450
MetAlaPheAlaAsnAlaPheLeuGlylleAsnHiεSerLeuAlaHiεLys AATTGGTGGTGAATTTGGACTTCCTC^TGGTCTTGCCATTGCCATCGCTA
2451 + + + + + 2500
IleGlyGlyGluPheGlyLeuProHisGlyLeuAlalleAlalleAlaMet -
TGCC-AC^TGTCIATTAAATTTAACGCTGTAACAGGAAACGTTAAACGTACC 2501 --+ + + + + 2550
ProHisVallleLysPheAsnAlaValThrGlyAsnValLysArgThr
CCTTACCC-ACGTTATGAAACATATCGTGCTCAAGAGGACTACGCTGAAAT
2551 + +- +-- --+ + 2600 ProTyrProArgTyrGluThrTyrArgAlaGlnGluAspTyrAlaGluIle
TT(_ΛCGCTTC1ATGGGATTTGCTGGTAAAGATGA C1A.GATGAAAAAGCTG
2601 + +- -+ + + 2650
SerArgPheMetGlyPheAlaGlyLysAspAspSerABpGluLysAlaVal -
TGCAAGCrCTGGTTG(^TGAACl^AAGAAAOTGACTGATAGCATTGATATT
2651 + + + - - + + 2700
GlnAlaLeuValAlaGluLeuLysLysLeuThrAspSerlleAspIle AATATC1ACCCTTTCΛGGAAATGGTATCGATAAAGCTCACCTTGAACGTGA
2701 - + + + + + 2750
AsnlleThrLeuSerGlyAsnGlylleAspLysAlaHisLeuGluArgGlu
ACTTGATAAATTGGCTGACCTTGTTTATGATGATCAATGTACTCCTGCTA 2751 + + + + + 2800
LeuAspLysLeuAlaAspLeuValTyrAspAεpGlnCyεThrProAlaAεn -
ATCCTCGTOVACCAAGAATTGATGAGATTAAACAGTTGTTGTTAGATCAA
2801 + + + + + 2850 ProArgGlnProArglleAspGluIleLysGlnLeuLeuLeuAspGln
TACTAATAATCTGTTGATAAAATTATTAAAACGCTCTGATGAATTCGTCΆ
2851 + + + ::::;-..+ 2900
TyrEndEnd (SEQ ID NO: 4)
G^G£ATTTTTTATTATAGCΠTATA<-1AAC-TATCAAAAGGTATAAATΓ-AATT
2901 + +- --+ + + 2950
TCGATATAGGCTCTTTTCACTCC-ATTGATTTATGCATTTCTATAAAAATC
2951 -+ + -- --+ -+ + 3000
AATAATTAATTAGCGATAGAAGTCGAGTTCATGCATGCTAATAATGAAAT
3001 + + + + + 3050
TG ΓΓAAATTCTΓGGTTTTTCTTTATGTTCTTTGCGAAC-ATCTTTCACAG 3051 + + + + + 3100
TTTCTTTGTTCATGAAAATTCCTCCTTATTATGGTACTATTTTGAGCCCA
3101 -- + + + -+ + 3150
AATAGTTATATAAGAATCCTAAACTTCGGATATCTTATCAAAG (SEQ ID NO: 3)
3151 +- + + + 3193

The L. lactis adhE gene of strain DB1341 encodes a 903 amino acid long protein, as deduced from the DNA sequence (Table 1.5), with an estimated molecular weight of 98.2 kDa. A putative ribosome binding site (AAAGGAG, position 127-133 in Table 1.4) is found 11 bp upstream of the start codon (de Vos and Simons, 1994).
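The coordinates quoted for Table 1.4 can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original disclosure: the helper names `codon_count` and `spacer_length` are hypothetical, and the only inputs are the positions stated in the text (ribosome binding site 127-133, start codon 145-147, double stop codon 2854-2859) plus the 13 nucleotides printed at positions 2851-2863 of Table 1.4.

```python
def codon_count(first_nt: int, last_nt: int) -> int:
    """Number of whole codons in a 1-based, inclusive nucleotide interval."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError("interval is not a whole number of codons")
    return length // 3


def spacer_length(upstream_end: int, downstream_start: int) -> int:
    """Number of nucleotides lying strictly between two features."""
    return downstream_start - upstream_end - 1


# Positions as stated in the text for Table 1.4 (1-based, inclusive).
RBS = (127, 133)
START_CODON = (145, 147)
STOP_CODONS = (2854, 2859)

# The coding region runs from the start codon to the base before the first stop.
coding_codons = codon_count(START_CODON[0], STOP_CODONS[0] - 1)
rbs_spacer = spacer_length(RBS[1], START_CODON[0])

# Nucleotides 2851-2863 as printed in Table 1.4: Tyr codon, then the two stops.
WINDOW_START = 2851
window = "TACTAATAATCTG"
offset = STOP_CODONS[0] - WINDOW_START
double_stop = window[offset:offset + 6]

print(coding_codons, rbs_spacer, double_stop)
```

Under these assumptions the interval 145-2853 holds 903 codons, which agrees with the stated protein length, and the spacer between the ribosome binding site and the start codon is 11 nucleotides.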
Homology comparisons have shown 44% identity (81% similarity) of the L. lactis AdhE to the E. coli protein and 42.4% identity (80% similarity) to the Clostridium acetobutylicum Aad protein over a fragment of approx. 750 amino acids (Tables 1.4 and 1.5). A significantly lower homology is observed in the C-terminal region of these three proteins.
Table 1.5. Protein homology search (FASTA, GCG Wisconsin package version 8, Genetics Computer Group) using the deduced sequence of the AdhE protein encoded by the L. lactis DB1341 adhE gene
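To illustrate what a percent-identity figure measures, the sketch below counts identical residues over the gap-free columns of a pairwise alignment. It is an assumption-laden illustration rather than the FASTA program's own scoring: the function name `percent_identity` is hypothetical, and FASTA may treat gaps and overlap length differently. The two 20-residue strings are taken from the aligned L. lactis DB1341 and E. coli rows of Table 1.7 (the conserved segment at alignment columns 131-150).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap_chars: str = "-.") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in gap_chars or b in gap_chars:
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Segment from Table 1.7 (alignment columns 131-150).
L_LACTIS_DB1341 = "AGIVPTTNPTSTAIFKSLLT"
E_COLI = "CGIVPTTNPTSTAIFKSLIS"

motif_identity = percent_identity(L_LACTIS_DB1341, E_COLI)
print(f"{motif_identity:.1f}% identity over {len(E_COLI)} columns")
```

This short segment is far more conserved than the proteins as a whole, so its value is not comparable to the overall figures quoted above; it only shows how the measure is computed.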
In this Table only alignment of the best two scores (E. coli AdhE and C. acetobutylicum Aad) is shown.
(Peptide) FASTA of: adhedbl341.pep from: 1 to: 904
TRANSLATE of: adhedb246.seq check: 3519 from: 145 to: 2856
The best scores are: init1 initn opt
sw:adhe_ecoli P17547 escherichia coli. alcohol dehydr..
sw:adhe_cloab P33744 clostridium acetobutylicum. alcoh.
sw:adh1_cloab P13604 clostridium acetobutylicum. nadph.
sw:sucd_clokl P38947 clostridium kluyveri. succinate-s.
sw:medh_bacmt P31005 bacillus methanolicus. nad-depend.
sw:adh2_zymmo P06758 zymomonas mobilis. alcohol dehydr.
sw:adh4_yeast P10127 saccharomyces cerevisiae (baker's
sw:dhat_citfr P45513 citrobacter freundii. 1,3-propan..
Figure imgf000043_0001
sw:eute_salty P41793 salmonella typhimurium. ethanolam...150 309 372
adhedbl341.pep sw: adhe_ecoli
ID ADHE_ECOLI STANDARD; PRT; 890 AA. AC P17547; CREATED)
LAST SEQUENCE UPDATE) LAST ANNOTATION UPDATE) (EC 1.1.1.1) (ADH) / ACETALDEHYDE DEHYDRO-
Figure imgf000044_0001
SCORES Initl: 708 Initn: 1819 Opt: 1507
44.3% identity in 757 aa overlap
10 20 30 40 50 60 adhe24 MATKKAAPAA-ααπ.SAEE-O VKFQEAVAYTD.OiV. AQAAVLKFEGYTQTQVDTIVAAMA : ||:::: I :::::||:|||:| I j adhe_e AVTNVAELNALVERVKKAQREYASFTQEQVDKIFRAAA
10 20 30
70 80 90 100 110 120 adhe24 IiAASIOiSLEIAHEAVNETGRGVVEDKDTKNHFASESVYNAIKNDKTVGVISENKVAGSVE Ml:: :: ||: ||:|:|:|.|||| Ml Mill M MM I ||:||::: |::: adhe_e I-AAADARIPLA-α-AVAESG GIVED VIKNHFASEYIYNAYKDEKTCGVLSEDDTFGTIT
40 50 60 70 80 90
130 140 150 160 170 180 adhe24 IASPLGVLAGIVPTTNPTSTAIFKSLLTAKTRNAIVFAFHPQAQKCSSHAAKIVYDAAIE 11 : 1 : 1 : = II i 11 II 11 M ! ! I! I M : lllllhh Ihh: =:MIMI Mlh adhe_e lAEPIGIICGIVPTTNPTSTAIFKSLISLIO'RNAIIFSPHPRAKDATNKAADIVLQAAIA 100 110 120 130 140 150
190 200 210 220 230 240 adhe24 AGAPEDFIQ IEVPSI-DMTTALIQNRGLATILATGGPGMVNAALKSGNPSLGVGAGNGAV I I II : I = I Ih I I :::: M I :::::: : MMMMMMI MhhMIMIhM adhe_e AGAPIOLIG IDQPSVELSNAI- HHPDINLILATGGPGMVKAAYSSGKPAIGVGAGNTPV
160 170 180 190 200 210
250 260 270 280 290 300 adhe24 YVDATANIEPWTEDLLLSICIFDNGMIC-ATENSAVIDASVYDEFIAKMQEQGAYMVPKKDY :|:||:|:|||:::|:|| | | | | : | | | : | : | : | : :||||: ::::::|:|::: |: adhe_e VIDETADIKRAVASVLMS- TFDNGVICASEQSVVVVDSVYDAVRERFATHGGYLLQGKEL 220 230 240 250 260 270
310 320 330 340 350 360 adhe24 KAIES- /FVERAGEGFGVTGPVAGRSGQWIAEQAGVKVPKDKDVLLFELDKKNIGEALSS ||::: :: ::| :::::::|::: | || I I = I 1 = = : : M = I : : : :|::: adhe_e KAVQDVIL- -KNG- - -ALNAAIVGQPAYKIAELAGFSVPENTKILIGEVTWDESEPFAH 280 290 300 310 320 330
370 380 390 400 410 419 adhe24 EKLSPLLSIYKAETREEGIEIVRSLLAYQGAGHNAAIQIGAMDDP - FVKEYGEKVEASRI IHH |::|:|:: |:::| : :|:| | || :: : ::: ::| |: :|:|::::|| adhe_e EKLSPTLAMYRAKDFEDAVEKAEKLVAMGGIGHTSCLYTDQDNQPARVSYFGQKMKTARI
340 350 360 370 380 390
420 430 440 450 460 470 479 adhe24 LVNQPDSIGGVGDIYTDAITOPSLTUΪTGSWGKNSLSHNLSTYDLLNVKTVAKRRNRPQ V |:| |:| ||:||:|: : | | || || | || | || : | : | : : : : | : | | || I II : : I adhe_e LINTPASQGGIGDLYNFKIJ^SLTLGCGS GGNSISENVGPKHLINK.IVAKRAENMLWH 400 410 420 430 440 450 480 490 500 510 520 530 adhe24 RLPKEIYYE.α iSY-LQE-LPHVHK-AFIVADPGΪ_VKFGFVDKVLEQLAIRPTQVETSI
:|||:||: -: = : Ml - = II IMhh : : |::|:= : I ::: |||:: adhe_e KLPKS I YFRRGSLP IALDEVITDGHKRALI VTDRFLFNNGYADQITSVL - - KAAGVETEV 460 470 480 490 500 510
540 550 560 570 580 590 adhe24 YGSVQPDPTLSEAIAIARQMKQFEPDTVICLGGGSALDAGKIGRLIYEYDARGEADLSDD
: :|::|llll : I : IMMM MMMMIMI :::M : | = : adhe_e FFEVEADPTLSIVRKGAELANSFKPDVIIALGGGSPMDAAKIM VMYE- - -HPETH 520 530 540 550 560
600 610 620 630 640 650 adhe24 ASLKELFQELAQKF DIRKRIIKFYH-PHKAQI1VAIPTTSGTGSEVTPFAVITDDETHVK hill MMIIIM II = 11 = I = I = : II II II II 1111 II = II I = I I adhe_e -FEELALRF-rDIRKRIYKFPKMGVKAKMIAVTTTSGTGSEVTPFAVVTDDATGQK
570 580 590 600 610
660 670 680 690 700 710 adhe24 YPLADYQLTPQVAIVDPEFVMTVPIJITVSWSGIDA SHALESYVSVMSSDYTKPISLQAI lllll|:|||::||||:::||::|h : = = I = II = = II = I = II II : : I : : : : :|||: adhe_e YPLADYALTPDMAI DANLVMD PKSLClAFGGLXiAVTHAMEAYVSVLASEFSDGQALQAL 620 630 640 650 660 670
720 730 740 750 760 770 adhe24 IO.IFENLTESYHYDPAHPTKEGQKAREN--HNAATLAG AFANAFLGINHSLAHKIGGEFG ||: I |::||| :: :|: :: ::: :: ::|: || :: :::|:|: :: | adhe_e KLLKEYLPASYHEGSKNPVARERVHSAATIAGIAFAN-AFLGVCHSMAHKLGSQFHIPHG 680 690 700 710 720 730
780 790 800 810 820 830 adhe24 LPHGLAIAIAMPHVIKFNAVTGNVKRTPYPRYETYRAQEDYAEISRFMGFAGKDDSDEKA
|:::| I adhe_e LANALLICNVIRYNANDNPTKQTAFSQYDRPQARRRYAEIADHLGLSAPGDRTAAKIEKL 740 750 760 770 780 790
adhe24 : SEQ ID NO: 5; adh_e : SEQ ID NO: 6
adhedbl341.pep sw:adhe_cloab
ID ADHE_CLOAB STANDARD; PRT; 862 AA.
AC P33744;
DT 01-FEB-1994 (REL. 28, CREATED)
DT 01-FEB-1994 (REL. 28, LAST SEQUENCE UPDATE)
DT 01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)
DE ALCOHOL DEHYDROGENASE (EC 1.1.1.1) (ADH) / ACETALDEHYDE DEHYDROGENASE . . .
SCORES Initl: 404 Initn: 1297 Opt: 1053
38.6% identity in 568 aa overlap
10 20 30 40 50 60 adhe24 MATKKAAPAAKKVLSAEE-G^AKFQEAVAYTDKLVT-^QAAVLKFEGYTQTQVDTIVAAMA i I
I • I • ; I I I ; I i • i I • I I adhe c M MKVTTVKELDEKLKVIKEAQKKFSCYSQEMVDEIFRNAA 10 20 30 70 80 90 100 110 120 adhe24 :T_AASKHSLEIAH-ΪAV_reTGRGvVED-α7rKNHFASESVYI^
Ml : ::|||::|| MIMMIII MIIIIM Ml I::|| |: h I :: adhe_c MAAIDARIELAKAAVLETGMGLVEDKVIKNHFAGEYIYNKYKDEKTCGI IERNEPYGITK 40 50 60 70 80 90
130 140 150 160 170 180 adhe 24 IAS PLGVLAGIVPTTNPTSTAI FKSLLTAKTRNAI VFAFHPQAQKCS SHAAKIVYDAAIE
MMMIMMMMMIM MM:: MUM M 11 M:: llh: MM: adhe_c IAEPIGVVAAIIPVTNPTSTTIFKSLISLIO'RNGIFFSPHPRA-KSTILAAKTILDAAVK 100 110 120 130 140 150
190 200 210 220 230 240 adhe24 AGAPEDFIQWIEVPSLDMTTALIQNRGIATII.ftTGGPG-V.VNAALKSGNPSLGVGAGNGAV
Ml I MM Ih I I::: I MM ::: I II I I I : M : M M I M : :| II : | | : :| adhe_c SGAPENIIGWIDEPSIELTQYLMQKADIT- -LATGGPSLVKSAYSSGKPAIGVGPGNTPV 160 170 180 190 200 210
250 260 270 280 290 300 adhe24 YVDATANIERAVEDLLLSKRFDNGMICATENSAVIDASVYDEFIAKMQEQGAYMVPKKDY
:|::|:|::||::::MI : I I I : I I I : I : I : : : |:|:= :::||:|||:: |:: adhe_c IIDESAHIK»LAVSSIILSKTYDNGVI(-ASEQSvTVLKSIYNKVKDEFQERGAYIIKKNEL 220 230 240 250 260 270
310 320 330 340 350 360 adhe24 IAIESFVTVERAGEGFGVTGPVAGRSGQWIAEQAGVIVPKDKDVLLFELDKKNIGEALSS
: : : :| ::| =M : : I : I : | | : | | : | | | | : : : | : | : : : : : | : : : adhe_c DKVREVIF- -KDG- - -SVNPKIVGQSAYTIAAMAGIKVPKTTRILIGEVTSLGEEEPFAH 280 290 300 310 320 330
370 380 390 400 410 419 adhe24 EKLSPLLSIYKAETREEGIEIVRSLLAYQGAGHNAAIQIGAMDDP-FVKEYGEKVEASRI
MIIMMMM:: : = ::: = M=: I l|: = M :::::: :: ::: ::: |: adhe_c EKLSPVIAMYEADNFDDAL-aCAv LINLGGLGHTSGIYADEIKARDKIDRFSSAMKTVRT 340 350 360 370 380 390
Figure imgf000046_0001
480 490 500 510 520 530 adhe24 RLPKEIYYEKNAISY-LQELPHVHK- -AFIVADPGMVKFGFVDKVLEQLAIRPTQVETSI adhe_c RvTHKVYFKFGCLQFALKDL- L-JKRAFIVTDSDPYNLNYVDSIIKILE- -HLDIDFKV 460 470 480 490 500 510
540 550 560 570 580 590 adhe24 YGSVQPDPTLSEAIAIARQMKQFEPDTVICLGGGSALDAGKIGRLIYEYDARGEADLSDD
:::| ::::|:: : :| : | | | | :| || | ::::::: | : ::: | | : : : | | : adhe_c FNKVGREADl-KTIKKATEEMSSFMPDTIIALGGTPE-^SAKLKr /LYEHPEVKFEDLAIK 520 530 540 550 560 570
600 610 620 630 640 650 adhe24 ASLKELFQELAQKFVDIRKRIIKFYHPHKAQMVAIPTTSGTGSEVTPFAVITDDETHVKY adhe_c F∞IR.O YTFPKLGKKANrLVAITTSAGSGSEvTPFALVTDNNTGN.CY^ 580 590 600 610 620 630
adhe24: corresponding to amino acid residues 1-656 of SEQ ID NO: 5; adhe_c: corresponding to amino acid residues 1-630 of SEQ ID NO: 11

6. Inverse PCR to obtain sequences upstream of the L. lactis DB1341 adhE coding sequence and cloning of PCR fragments
Inverse PCR was used to obtain additional sequences from the upstream region of the L. lactis DB1341 adhE gene. HindIII-, HpaI- or PvuII-digested genomic DNA of strain DB1341 was ligated at low concentration and PCR was carried out using primers adhE-350 and adhE-700 (or adhE-1300x) (see Fig. 2). Sequence analysis of the obtained PCR products, using primers adhE-240 (or adhE-1300x), allowed the identification of the upstream region of the adhE gene. A 0.6 kb PCR product obtained from HindIII inverse PCR amplification was subsequently cloned into pSMA500, resulting in E. coli DH5α strain adhEup-1.
A sample of adhEup-1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany, on 18 July 1996 under the accession No. DSM 11091.
Further inverse PCR was carried out using PstI-digested and religated chromosomal DNA of strain DB1341, using primers derived from the above sequence. A PCR product of about 5 kb was obtained which, in addition to the entire coding sequence of the adhE gene, comprises about 1800 bp upstream of the coding sequence. This upstream sequence includes an open reading frame, designated orfB, that encodes a putative 341 aa protein having no homology to sequences in available databases.
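The principle of inverse PCR described in this section can be sketched in a few lines. This is a toy model under stated assumptions, not the protocol itself: the function `inverse_pcr_product`, the three short sequences and both primers are hypothetical and are not taken from the DB1341 sequence or from the adhE primers named in the text. A restriction fragment is treated as self-ligated into a circle, and two primers that point outward from the known region amplify the unknown flanks across the ligation junction.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inverse_pcr_product(fragment: str, outward_left: str, outward_right: str) -> str:
    """Top-strand product amplified from the self-ligated (circular) fragment.

    outward_right is a top-strand primer extending towards the fragment's 3' end;
    outward_left is a bottom-strand primer extending towards its 5' end.
    """
    left_site = revcomp(outward_left)
    i_left = fragment.find(left_site)
    i_right = fragment.find(outward_right)
    if i_left < 0 or i_right < 0:
        raise ValueError("primer site not found in fragment")
    if i_left + len(left_site) > i_right:
        raise ValueError("primers are not oriented outward from the known region")
    # Read from the right primer to the fragment end, cross the ligation
    # junction, then continue from the fragment start to the left primer site.
    return fragment[i_right:] + fragment[:i_left + len(left_site)]


# Hypothetical toy sequences (illustration only).
UPSTREAM_UNKNOWN = "TTGACATCTA"
KNOWN_REGION = "ATGGCAACTAAAAAAGCC"
DOWNSTREAM_UNKNOWN = "GGTACCGTTA"
fragment = UPSTREAM_UNKNOWN + KNOWN_REGION + DOWNSTREAM_UNKNOWN

primer_right = KNOWN_REGION[-6:]          # extends into the downstream flank
primer_left = revcomp(KNOWN_REGION[:6])   # extends into the upstream flank

product = inverse_pcr_product(fragment, primer_right=None or primer_left, outward_right=primer_right) \
    if False else inverse_pcr_product(fragment, primer_left, primer_right)
print(product)
```

In the product the downstream flank is joined directly to the upstream flank at the ligation junction, which is why sequencing outward from the known region reveals the previously unknown upstream sequence.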
Table 1.6. DNA sequence upstream of the coding sequence of the L. lactis DB1341 adhE gene
PstI
1 CTGC-AGCTTGTTTTTTAGTACCAACAAAAAGGACTACTGCACCTTCTTGT 50
51 GAAGCGTTTTTTAC.ATAGTTGTAAGC-ATCGTC-Ϊ .C-AAGTTTTACAGTTTT 100 101 TTGAAGGTCGATAACGTGGATACCATTACGTTCTGTGAAGATGTATGGTT 150
151 TCATTTTTGGGTTCC-AACGACGAGTTTGGTGACCGAAGTGAACACCAGCT 200
201 TC-AAGAAGTTGTTTCATTGAAATAACTGACATGTTAATGTCTCCTTTTAA 250
251 AATAGTTTTTCCTCITTCATCTGTCATCCGCLMCCGC-AATACTTGCGTAC 300
301 ACT'ACGACTTTGTCGAGACGAAATGCGAGATGGTTGCATAGCAACTCTAT 350 351 (.ΛTTATAC-ATTGTTTGACCTATTTTTGCAAGTATCTATTCATGCTTCTAT 400 401 TGTTClAGTAAATCTATTTTTCTAACCACTCCrrATTATCTGAC-?UΛTTTAA 450 451 TTGTTAATTTAGGCTCTATAATCACTAAAAGAGTAAGTTTTTAAATTTTT 500 501 TTCTAAGAAAAAAATTAATATTTTTGCTGAAACCGCTTTTTTTGTGATAA 550 551 AATAATTATAGTAAATAAATTAGTTTGTGAGGAGAGAAATATGAAAGAAA 600 orfB M K E K
601 AAATCCTTTTAGGCGGCTATACAAAACGTGTATCTAAAGGCGTATATAGT 650
I L L G G Y T K R V S K G V Y S
651 GTT I TTGGACACrAAAGCTGCrGAATTATCATCATTAAATGAAGTCGC 700
V L L D T K A A E L S S L N E V A 701 TGCGGTTOΛAACCCTACΓTATATCACTCTCGATGAAAAGGGACACCTCT 750
A V Q N P T Y I T L D E K G H L Y
751 ATACTTGTGC-AGC-AGATAGTAATGGTGGAGGAATCGCCGCCTTTGATTTT 800
T C A A D S N G G G I A A F D F 801 GATGGCGAAACTGCTACTCATCTCGGAAATGTCACAACCACGGGAGCTCC 850 D G E T A T H L G N V T T T G A P
851 ACTCTGCTATGTTGCCGTGGACGAAGCGCGAC-AATTAGTTTACGGAGCGA 900
L C Y V A V D E A R Q L V Y G A N 901 ACTATCΛT πGGAGAAGTTCGTGTTTATAAGATTClAAGCTAATGGCTCA 950 Y H L G E V R V Y K I Q A N G S 951 CTCCGATTAACGGATACAGTAAAACATACCGGTTCTGGACCACGTCCTGA 1000
L R L T D T V K H T G S G P R P E 1001 ACAAGCrTAGCTt^C-ACGTTClATTATTCTGATTTGACTCCTGACGGACGAC 1050
Q A S S H V H Y S D L T P D G R L 1051 TTGTC-ACCTGTGATTTGGGAACAGATGAAGTOVCTGTTTATGATGTCATT 1100 V T C D L G T D E V T V Y D V I
1101 GGTGAAGGTAAACT<_3VATATTGCTACAATTTATCGGGCAGAAAAAGGAAT 1150
G E G K L N I A T I Y R A E K G M
1151 GGGTGCTCGTαVTATTACTTTCC-ATCCAAATGGTAAAATCGCTTATTTGG 1200
G A R H I T F H P N G K I A Y L V 1201 TTGGAGAGTTAAATTαACAATTGAAGTTTTAAGTTACAATGAAGAAAAA 1250
G E L N S T I E V L S Y N E E K
1251 GGACGCTTTGCTCGTCπC-AAACAATTAGCACCCTACCTGAAGATTATCA 1300
G R F A R L Q T I S T L P E D Y H 1301 TGGAGC-AAATGGTGTTGCTGCCATCCGTATTTCATCTGACGGTAAATTCC 1350 G A N G V A A I R I S S D G K F L
1351 TCrATACTTCTAATCGTGGAC-ATGATTCTTTGAC-AACriTACAAAGTAAGT 1400
Y T S N R G H D S L T T Y K V S 1401 CCTCTTGGTACAAAACTTGAAACTATTGGCTGGACAAATACTGAAGGTCA 1450 P L G T K L E T I G W T N T E G H 1451 TATCCC CGCGATTTTAATTTClAAC-AAAACrGAAGATTATATCATTGTCG 1500
I P R D F N F N K T E D Y I I V A 1501 CTC-ATC_?AGAATCTGATAATTTATCTCTTTTCTTGCGAGATAAAAAAACC 1550
H Q E S D N L S L F L R D K K T 1551 GGTACTTTAACTTTGGAAC-AAAAAGATTTTTACGCTCCTGAAATCACTTG 1600 G T L T L E Q K D F Y A P E I T C
1601 TGTTTTACCΛCTATAAAAATTTATTTTTTCΛC-AAAGTTTGACTGATAAAC 1650
V L P L Stop (SEQ ID NO: 27) 1651 TAAAAAAGATTGCTAATTTC CTC-ftAAGAATTAGC-?UVTClT'l l,rrCTTC 1700 1701 AGTAAAGCTTGTTAC-AAAACCGTTTTCTAAACrTTTGATGAGTGTTTTTG 1750 1751 TAAAAACTATCACAATATTGCTTGACATCTATAAAAAACTTTGTTAAACT 1800
1801 ATTCACGTAAAAGAAAGTGAATGAAGTCACAAAGGAGAACCTACAAAT (SEQ ID NO: 26)
7. Sequence of a fragment of the L. lactis strain MG1363 adhE gene
PCR was used to characterize the adhE homologue of strain MG1363. Primers adhE-mg1 and adhE-1697 were used to amplify a 1.5 kb fragment from this strain, named MGadhESTART. Primers adhE-1300x and adhE-mg2 were used to amplify an overlapping 1. kb fragment, named MGadhESTOP (Fig. 3).
The above fragments were subsequently cloned into the plasmid pGEM and transformed into E. coli DH5α, resulting in strains MGadhESTART and MGadhESTOP, respectively. Using the relevant primers, a sequence was obtained that spans position 1306-2775 shown in Table 1.2. An additional primer, adhe-mg3 (5'-CTTCTTTGGTTGGATGAGC-3') (SEQ ID NO: 7), derived from the MG1363 adhE sequence and corresponding to position 2359-2335 of the DB1341 adhE sequence (Table 1.4), was used to fill a sequence gap. Limited sequence variation was found at the DNA level (84 base changes and no insertions/deletions in the 1470 bp MG1363 adhE fragment, corresponding to 5.7% variation; Table 1.7 below), resulting in only 8 amino acid substitutions (or 1.6% variation; Table 1.7).
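The contrast between 84 base changes and only 8 amino acid substitutions reflects the fact that most of the changes are synonymous. The sketch below is an illustration only and assumes the standard genetic code; the function name `compare_coding` is hypothetical. The two 48-nucleotide strings are the in-frame stretch at positions 1351-1398 of the MG1363 and DB1341 rows of Table 1.8, and the final two lines recompute the percentages quoted in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def compare_coding(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (base changes, amino acid changes) for two in-frame, gap-free sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal in length and a multiple of 3")
    base_changes = sum(x != y for x, y in zip(seq_a, seq_b))
    aa_changes = sum(
        CODON_TABLE[seq_a[i:i + 3]] != CODON_TABLE[seq_b[i:i + 3]]
        for i in range(0, len(seq_a), 3)
    )
    return base_changes, aa_changes


# Positions 1351-1398 of Table 1.8 (position 1351 is the first base of a codon).
MG1363 = "GACGACCCATTTGTCAAAGAATACGGAATTAAAGTCGAAGCTTCTCGT"
DB1341 = "GATGATCCATTCGTTAAAGAATATGGCGAAAAAGTTGAAGCTTCTCGT"

base_changes, aa_changes = compare_coding(MG1363, DB1341)

# Percentages quoted in the text for the whole 1470 bp fragment.
dna_variation = round(100 * 84 / 1470, 1)
protein_variation = round(100 * 8 / (1470 // 3), 1)
print(base_changes, aa_changes, dna_variation, protein_variation)
```

In this stretch ten base changes produce a single amino acid difference (Glu in DB1341, Ile in MG1363), consistent with the corresponding position in Table 1.7.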
Samples of E. coli DH5α strains MGadhESTART and MGadhESTOP were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany, on 18 July 1996 under the accession Nos. DSM 11089 and DSM 11090, respectively.
Table 1.7. Multialignment of the deduced L. lactis AdhE protein from strain MG1363 (fragment, adhemg1363) and DB1341 (adhedb1341) with the E. coli (adhe_ec) and C. acetobutylicum (aad_ca) AdhE homologues
The program lineup (GCG Wisconsin package version 8, Genetics Computer Group) was used for the alignment. The consensus sequence (bold type at bottom) shows only conserved residues for all proteins. The differences between the two L. lactis AdhE proteins are shown as bold, underlined in adhemg1363.
50 adhemgl363 adhedbl34l MATKKAAPAA KKVLSAEEKA AKF.QEAVAY TDKLVKKAQA AVLK.FEGYT adhe_ec MAVTNVA ELNALVER VKKAQREYAS FT QE QVDKIFRA.. aad_ca MKVTTVK ELDEKLKV IKEAQKKFSC YS QE MVDEIFRN.. consensus M L K..Q Q. .V...F
51 100 adhemgl363 adhedbi34l QTQVDTIVAA MALAASKHSL ELAHEAVNET GRGWEDKDT KNHFASESVY adhe_ec AALAAADARI PLAKMAVAES GMGIVEDKVI KNHFASEYIY aad_ca AAMAAIDARI ELAKAAVLET GMGLVEDKVI KNHFAGEYIY consensus A.AA LA..AV.E. G.G.VKDK.. KNHFA.E..Y
101 150 adhemgl363 adhedbl341 NAIKNDKTVG VISENKVAGS VEIASPLGVL AGIVPTTNPT STAIFKSLLT adhe_ec NAYKDEKTCG VLSEDDTFGT ITIAEPIGII CGIVPTTNPT STAIFKSLIS aad_ca NKYKDEKTCG IIERNEPYGI TKIAEPIGW AAIIPVTNPT STTIFKSLIS consensus N..K..KT.G G. ..IA.P.G.. ..1.P.TNPT ST. IFKSLI .
151 200 adhemgl363 adhedbl341 AKTRNAIVFA FHPQAQKCSS HAAKIVYDAA IEAGAPEDFI QWIEVPSLDM adhe_ec LKTRNAIIFS PHPRAKDATN KAADIVLQAA IAAGAPKDLI GWIDQPSVEL aad_ca LKTRNGIFFS PHPRAKKSTI LAAKTILDAA VKSGAPENII GWIDEPSIEL consensus •KTRN HP.A AA AA ...GAP... I .WI..PS...
201 250 adhemgl363 adhedbl341 TTALIQNRGL ATILATGGPG MVNAALKSGN PSLGVGAGNG AVYVDATANI adhe_ec SNALMHHPDI NLILATGGPG MVKAAYSSGK PAIGVGAGNT PWIDETADI aad_ca TQYLMQKADI T..LATGGPS LVKSAYSSGK PAIGVGPGNT PVIIDESAHI consensus ...L LATGGP. .VK.A..SG. P.IGVG.GN. .V..D..A.I
251 300 adhemgl363 adhedbl341 ERAVEDLLLS KRFDNGMICA TENSAVIDAS VYDEFIAKMQ EQGAYMVPKK adhe_ec KRAVASVLMS KTFDNGVICA SEQSWWDS VYDAVRERFA THGGYLLQGK aad_ca KMAVSSIILS KTYDNGVICA SEQSVIVLKS IYNKVKDEFQ ERGAYIIKKN consensus ..AV. .S ...DNG.ICA .E.S S .Y. .G.Y.
301 350 adhemgl363 adhedbl341 DYKAIESFVF VERAGEGFGV TGPVAGRSGQ WIAEQAGVKV PKDKDVLLFE adhe_ec ELKAVQDVIL ..KNG...AL NAAIVGQPAY KIAELAGFSV PENTKILIGE aad_ca ELDKVREVIF ..KDG...SV NPKIVGQSAY TIAAMAGIKV PKTTRILIGE consensus G G IA..AG..V P LIGE
351 400 adhemgl363 Y QGAGHNAAIQ adhedbl341 LDKKNIGEAL SSEKLSPLLS IYKAETREEG IEIVRSLLAY QGAGHNAAIQ adhe_ec VTWDESEPF AHEKLSPTLA MYRAKDFEDA VEKAEKLVAM GGIGHTSCLY aad_ca VTSLGEEEPF AHEKLSPVLA MYEADNFDDA LKKAVTLINL GGLGHTSGIY consensus E.. ..EKLSP.L. .Y.A ... .G.GH 401 450 adhemgl363 IGAMDDP.FV KEYGIKVEAS RILVNQPDSI GGVGDIYTDA MRPSLTLGTG adhedbl341 IGAMDDP.FV KEYGEKVEAS RILVNQPDSI GGVGDIYTDA MRPSLTLGTG adhe ec TDQDNQPARV SYFGQKMKTA RILINTPASQ GGIGDLYNFK LAPSLTLGCG aad_ca ADEIKARDKI DRFSSAMKTV RTFVNIPTSQ GASGDLYNFR IPPSFTLGCG consensus R...N.P.S. G..GD.Y PS.TLG.G
451 500 adhemgl363 SWGKNSLSHN LSTYDLLNVK TVAKRRNRPQ WVRLPKEIYY EKNAISYLQE adhedbl341 SWGKNSLSHN LSTYDLLNVK TVAKRRNRPQ WVRLPKEIYY EKNAISYLQE adhe_ec SWGGNSISEN VGPKHLINKK TVAKRAENML WHKLPKSIYF RRGSLPIALD aad_ca FWGGNSVSEN VGPKHLLNIK TVAERRENML WFRVPHKVYF KFGCLQFALK consensus .WG.NS.S.N L.N.K TVA.R W...P...Y
501 550 adhemgl363 LPHVHK...A FIVADPGMVK FGFVDKVLEQ LAIRPTQVET SIYGSVQPDP adhedbl341 LPHVHK...A FIVADPGMVK FGFVDKVLEQ LAIRPTQVET SIYGSVQPDP adhe_ec EVITDGHKRA LIVTDRFLFN NGYADQITSV L..KAAGVET EVFFEVEADP aad_ca DLKDLKKKRA FIVTDSDPYN LNYVDSIIKI L..EHLDIDF KVFNKVGREA consensus IV.D D L V....
551 600 adhemgl363 TLSEAIAIAR QMNHFEPDTV ICLGGGSALD AGKIGRLIYE YDARGEADLS adhedbl341 TLSEAIAIAR QMKQFEPDTV ICLGGGSALD AGKIGRLIYE YDARGEADLS adhe_ec TLSIVRKGAE LANSFKPDVI IALGGGSPMD AAKIMWVMYE ...HPETH.. aad_ca DLKTIKKATE EMSSFMPDTI IALGGTPEMS SAKLMWVLYE ...HPEVK.. consensus • . •F• • . I• GG..... «.K. .YE ■ E.
601 650 adhemgl363 DDASLKEIFQ ELAQKFVDIR KRIIKFYH.P HKAQMVAIPT TSGTGSEVTP adhedbl341 DDASLKELFQ ELAQKFVDIR KRIIKFYH.P HKAQMVAIPT TSGTGSEVTP adhe_ec FE ELALRFMDIR KRIYKFPKMG VKAKMIAVTT TSGTGSEVTP aad_ca FE DLAIKFMDIR KRIYTFPKLG KKAMLVAITT SAGSGSEVTP consensus F. . A..F.DIR KRI..F KA...A..T ....GSEVTP
651 700 adhemgl363 FAVITDDETH VKYPLADYQL TPQVAIVDPE FVMTVPKRTV SWSGIDAMSH adhedbl34l FAVITDDETH VKYPLADYQL TPQVAIVDPE FVMTVPKRTV SWSGIDAMSH adhe_ec FAWTDDATG QKYPLADYAL TPDMAIVDAN LVMDMPKSLC AFGGLDAVTΉ aad_ca FALVTDNNTG NKYMLADYEM TPNMAIVDAE LMMKMPKGLT AYSGIDALVN consensus FA..TD..T. .KY.LADY.. TP..AIVD.. ..M..PK G.DA...
701 750 adhemgl363 ALESYVSVMS SDYTKPISLQ AIKLIFENLT ESYHYDPAHP TKEGQKAREN adhedbl341 ALESYVSVMS SDYTKPISLQ AIKLIFENLT ESYHYDPAHP TKEGQKAREN adhe_ec AMEAYVSVLA SEFSDGQALQ ALKLLKEYLP ASYHEGSKNP VARER aad_ca SIEAYTSVYA SEYTNGLALE AIRLIFKYLP EAYKNGRTNE KAREK consensus ..E.Y.SV.. S L. AI. ....L. ..Y ARE.
751 800 adhemgl363 MHNAATLAGM AFANAFLGIN HSLAHKIAGE FGLPHGLAIA IAMPHVIKFN adhedbl34l MHNAATLAGM AFANAFLGIN HSLAHKIGGE FGLPHGLAIA IAMPHVIKFN adhe_ec VHSAATIAGI AFANAFLGVC HSMAHKLGSQ FHIPHGLANA LLICNVIRYN aad_ca MAHASTMAGM ASANAFLGLC HSMAIKLSSE HNIPSGIANA LLIEEVIKFN consensus * • •*» • X • &A3 • A.ANAFLG.. HSMA.K ...P.G.A.A VI..N
801 850 adhemgl363 AVTGNVKFTP YPRYETYRAQ EDYAEISRFM GFAGKEDSDE KAVKAFVAEL adhedbl341 AVTGNVKRTP YPRYETYRAQ EDYAEISRFM GFAGKDDSDE KAVQALVAEL adhe_ec ANDNPTKQTA FSQYDRPQAR RRYAEIADHL GLSAPGDRTA AKIEKLLAWL aad_ca AVDNPVKQAP CPQYKYPNTI FRYARIADYI KLGGNTDEEK VDLLINKIHE consensus A K Y YA.I D 851 900 adhemgl363 KKLTDSIDIN ITLSGN..GV DKAHLERELD KLADLV adhedbl341 KKLTDSIDIN ITLSGN..GI DKAHLERELD KLADLVYDDQ CTPANPRQPR adhe_ec ETLKA..ELG IPKSIREAGV QEADFLANVD KLSEDAFDDQ CTGANPRYPL aad_ca LKKAL N IPTSIKDAGV LEENFYSSLD RISELALDDQ CTGANPRFPL consensus I..S G D DDQ CT.ANPR.P.
901 941 adhedbl341 IDEIKQLLLD QY* adhe_ec ISELKQILLD TYYGRDYVEG ETAAKKEAAP AKAEKKAKKS A aad_ca TSEIKEMYIN CFKKQP consensus ..E.K
adhemgl363: SEQ ID NO: 8; adhedbl341 : SEQ ID NO: 9; adhe_ec: SEQ ID NO: 10; aad_ca: SEQ ID NO: 11
Table 1.8. Alignment of the adhE sequences from L. lactis DB1341 and MG1363
The complete sequence of the adhE gene of strain DB1341 is compared to the sequence obtained via PCR amplification of MG1363 adhE fragments (see Fig. 2) .
1 50 adhemgl363 adhedbl341 AAGCTTGTTA CAAAACCGTT TTCTAAACTT TTGATGAGTG TTTTTGTAAA consensus
51 100 adhemgl363 • — adhedbl341 AACTATCACA ATATTGCTTG ACATCTATAA AAAACTTTGT TAAACTATTC consensus
101 150 adhemgl363 adhedbl341 ACGTAAAAGA AAGTGAATGA AGTCACAAAG GAGAACCTAC AAATATGGCA consensus
151 200 adhemgl363 adhedbl341 ACTAAAAAAG CCGCTCCAGC TGCAAAGAAA GTTTTAAGCG CTGAAGAAAA consensus
201 250 adhemgl363 adhedbl341 AGCCGCAAAA TTCCAAGAAG CTGTTGCTTA TACTGACAAA TTAGTCAAAA consβnsus 251 300 adhemgl363 adhedbl341 AAGCACAAGC TGCTGTTCTT AAATTTGAAG GATATACACA AACTCAAGTC consensus
301 350 adhemgl363 adhedbl341 GATACTATTG TCGCTGCAAT GGCTCTTGCA GCAAGCAAAC ATTCTCTAGA consensus 351 400 adhemgl363 adhedbl341 ACTCGCTCAT GAAGCCGTTA ACGAAACTGG TCGTGGTGTT GTCGAAGACA consensus 401 450 adhemgl363 adhedbl341 AAGATACCAA AAACCACTTT GCTTCTGAAT CTGTTTATAA CGCAATTAAA consensus
451 500 adhemgl363 adhedbl341 AATGACAAAA CTGTTGGTGT CATTTCTGAA AACAAGGTTG CTGGATCTGT consensus
501 550 adhemgl363 adhedbl341 TGAAATCGCA AGCCCTCTCG GTGTACTTGC TGGTATCGTT CCAACGACTA consensus
551 600 adhemgl363 adhedbl341 ATCCAACATC AACAGCAATC TTTAAATCTT TATTGACTGC AAAAACACGT consensus
601 650 adhemgl363 adhedbl341 AATGCTATTG TTTTCGCTTT CCACCCTCAA GCTCAAAAAT GTTCAAGCCA consensus 651 700 adhemgl363 adhedbl341 TGCAGCAAAA ATTGTTTACG ATGCTGCAAT TGAAGCTGGT GCACCGGAAG consensus
701 750 adhemgl363 adhedbl341 ACTTTATTCA ATGGATTGAA GTACCAAGCC TTGACATGAC TACCGCCTTG consensus
751 800 adhemgl363 adhedbl341 ATTCAAAACC GTGGACTTGC AACAATCCTT GCAACTGGTG GCCCAGGAAT consensus
801 850 adhemgl363 adhedbl341 GGTAAACGCC GCACTCAAAT CTGGTAACCC TTCACTCGGT GTTGGAGCTG consensus
851 900 adhemgl363 adhedbl341 GTAATGGTGC TGTTTATGTT GATGCAACTG CAAATATTGA ACGTGCCGTT consensus 901 950 adhemgl363 adhedbl341 GAAGACCTTT TGCTTTCAAA ACGTTTTGAT AATGGGATGA TTTGTGCCAC consensus
951 1000 adhemgl363 adhedbl341 TGAAAATTCA GCTGTTATTG ATGCTTCAGT TTATGATGAA TTTATTGCTA consensus 1001 1050 adhemgl363 adhedbl341 AAATGCAAGA ACAAGGCGCT TATATGGTTC CTAAAAAAGA CTACAAAGCT consensus 1051 1100 adhemgl363 adhedbl341 ATTGAAAGTT TCGTTTTTGT TGAACGTGCT GGTGAAGGTT TTGGAGTAAC consensus
1101 1150 adhemgl363 adhedbl341 TGGTCCTGTT GCCGGTCGTT CTGGTCAATG GATTGCTGAA CAAGCTGGTG consensus
1151 1200 adhemgl363 adhedbl341 TCAAAGTTCC TAAAGATAAA GATGTCCTTC TTTTTGAACT TGATAAGAAA consensus
1201 1250 adhemgl363 adhedbl341 AATATTGGTG AAGCACTTTC TTCTGAAAAA CTTTCTCCTT TGCTTTCAAT consensus
1251 1300 adhemgl363 adhedbl341 CTACAAAGCT GAAACACGTG AAGAAGGAAT TGAGATTGTA CGTAGCTTAC consensus 1301 1350 adhemgl363 TACCA AGGAGCTGGT CACAACGCTG CAATTCAAAT CGGTGCAATG adhedbl341 TTGCTTATCA AGGTGCTGGA CATAATGCTG CAATTCAAAT CGGTGCAATG consensus TA.CA AGG.GCTGG. CA.AA.GCTG CAATTCAAAT CGGTGCAATG
1351 1400 adhemgl363 GACGACCCAT TTGTCAAAGA ATACGGAATT AAAGTCGAAG CTTCTCGTAT adhedbl341 GATGATCCAT TCGTTAAAGA ATATGGCGAA AAAGTTGAAG CTTCTCGTAT consensus GA.GA.CCAT T.GT.AAAGA ATA.GG AAAGT.GAAG CTTCTCGTAT
1401 1450 adhemgl363 CCTCGTTAAC CAACCTGACT CTATCGGTGG GGTCGGAGAT ATTTATACTG adhedbl341 CCTCGTTAAC CAACCAGATT CTATTGGTGG GGTCGGAGAT ATCTATACTG consensus CCTCGTTAAC CAACC.GA.T CTAT.GGTGG GGTCGGAGAT AT.TATACTG
1451 1500 adhemgl363 ATGCAATGCG TCCATCATTG ACGCTCGGAA CTGGTTCATG GGGGAAAAAT adhedbl341 ATGCAATGCG TCCATCACTT ACACTTGGAA CTGGTTCATG GGGGAAAAAT consensus ATGCAATGCG TCCATCA.T. AC.CT.GGAA CTGGTTCATG GGGGAAAAAT
1501 1550 adhemgl363 TCACTTTCAC ACAATTTGAG TACATACGAT CTATTGAATG TTAAAACAGT adhedbl341 TCACTTTCAC ACAATTTGAG TACATACGAT CTATTGAATG TTAAAACAGT consensus TCACTTTCAC ACAATTTGAG TACATACGAT CTATTGAATG TTAAAACAGT 1551 1600 adhemgl363 GGCTAAACGT CGTAATCGCC CTCAATGGGT TCGTTTGCCA AAAGAAATTT adhedbl341 GGCTAAACGT CGTAATCGCC CACAATGGGT TCGTTTGCCA AAAGAAATTT consensus GGCTAAACGT CGTAATCGCC C.CAATGGGT TCGTTTGCCA AAAGAAATTT
1601 1650 adhemgl363 ACTACGAAAA AAATGCAATT TCTTACTTAC AAGAATTGCC ACACGTCCAC adhedbl341 ACTACGAAAA AAATGCAATT TCTTACTTAC AAGAATTGCC ACACGTCCAC consensus ACTACGAAAA AAATGCAATT TCTTACTTAC AAGAATTGCC ACACGTCCAC 1651 1700 adhemgl363 AAAGCTTTCA TTGTTGCCGA CCCTGGTATG GTTAAATTCG GTTTCGTTGA adhedbl341 AAAGCTTTCA TCGTTGCTGA CCCTGGTATG GTTAAATTTG GTTTCGTTGA consensus AAAGCTTTCA T.GTTGC.GA CCCTGGTATG GTTAAATT.G GTTTCGTTGA
1701 1750 adhemgl363 TAAAGTTTTG GAACAACTTG CTATCCGCCC AACTCAAGTT GAAACAAGCA adhedbl341 TAAAGTTTTG GAACAACTTG CTATCCGCCC AACTCAAGTT GAAACAAGCA consensus TAAAGTTTTG GAACAACTTG CTATCCGCCC AACTCAAGTT GAAACAAGCA
1751 1800 adhemgl363 TTTATGGCTC AGTCCAACCT GACCCAACTT TGAGTGAAGC AATTGCAATC adhedbl341 TTTATGGCTC TGTTCAACCT GACCCAACTT TGAGCGAAGC AATTGCAATC consensus TTTATGGCTC .GT.CAACCT GACCCAACTT TGAG.GAAGC AATTGCAATC
1801 1850 adhemgl363 GCTCGTCAAA TGAACCATTT TGAACCTGAC ACTGTCATCT GTCTTGGTGG adhedbl341 GCTCGTCAAA TGAAACAATT TGAACCTGAC ACTGTCATCT GTCTTGGTGG consensus GCTCGTCAAA TGAA.CA.TT TGAACCTGAC ACTGTCATCT GTCTTGGTGG
1851 1900 adhemgl363 TGGTTCTGCT CTCGATGCTG GTAAGATTGG TCGTTTGATT TATGAATATG adhedbl341 TGGTTCTGCT CTCGATGCCG GTAAGATTGG TCGTTTGATT TATGAATATG consensus TGGTTCTGCT CTCGATGC.G GTAAGATTGG TCGTTTGATT TATGAATATG
1901 1950 adhemgl363 ATGCTCGTGG TGAGGCTGAC CTTTCCGATG ACGCAAGTTT GAAAGAGATC adhedbl341 ATGCTCGTGG TGAAGCTGAC CTTTCTGATG ATGCAAGTTT GAAAGAACTT consensus ATGCTCGTGG TGA.GCTGAC CTTTC.GATG A.GCAAGTTT GAAAGA..T. 1951 2000 adhemgl363 TTCCAAGAGT TAGCTCAAAA ATTTGTTGAT ATTCGTAAAC GTATTATCAA adhedbl341 TTCCAAGAAT TAGCTCAAAA ATTTGTCGAT ATTCGTAAAC GTATTATTAA consensus TTCCAAGA.T TAGCTCAAAA ATTTGT.GAT ATTCGTAAAC GTATTAT.AA
2001 2050 adhemgl363 ATTCTACCAC CCACACAAAG CACAAATGGT TGCTATCCCT ACTACTTCTG adhedbl341 ATTCTACCAT CCACATAAAG CACAAATGGT TGCAATTCCT ACTACTTCTG consensus ATTCTACCA. CCACA.AAAG CACAAATGGT TGC.AT.CCT ACTACTTCTG
2051 2100 adhemgl363 GTACTGGTTC TGAAGTGACT CCATTTGCGG TTATCACTGA TGATGAAACT adhedbl341 GTACTGGTTC TGAAGTGACT CCATTTGCAG TTATCACTGA TGATGAAACT consensus GTACTGGTTC TGAAGTGACT CCATTTGC.G TTATCACTGA TGATGAAACT
2101 2150 adhemgl363 CACGTTAAAT ATCCACTTGC TGACTATCAA TTGACACCTC AAGTTGCCAT adhedbl341 CATGTTAAGT ACCCACTTGC TGACTACCAA TTAACACCAC AAGTTGCCAT consensus CA.GTTAA.T A.CCACTTGC TGACTA.CAA TT.ACACC.C AAGTTGCCAT
2151 2200 adhemgl363 TGTTGACCCT GAGTTTGTTA TGACTGTACC AAAACGTACT GTTTCTTGGT adhedbl341 TGTTGACCCT GAGTTTGTTA TGACTGTACC AAAACGTACT GTTTCTTGGT consensus TGTTGACCCT GAGTTTGTTA TGACTGTACC AAAACGTACT GTTTCTTGGT
2201 2250 adhemgl363 CTGGGATTGA TGCTATGTCA CACGCGCTTG AATCTTATGT TTCTGTCATG adhedbl341 CTGGTATTGA TGCGATGTCA CACGCGCTTG AATCTTACGT TTCTGTTATG consensus CTGG.ATTGA TGC.ATGTCA CACGCGCTTG AATCTTA.GT TTCTGT.ATG
2251 2300 adhemgl363 TCTTCTGACT ATACAAAACC AATTTCACTT CAAGCCATCA AACTCATCTT adhedbl341 TCTTCTGACT ATACAAAACC AATTTCACTT CAAGCGATCA AACTTATCTT consensus TCTTCTGACT ATACAAAACC AATTTCACTT CAAGC.ATCA AACT.ATCTT 2301 2350 adhemgl363 TGAAAACTTG ACTGAGTCTT ATCATTATGA CCCAGCTCAT CCAACCAAAG adhedbl341 TGAAAACTTG ACTGAGTCTT ATCATTATGA CCCAGCGCAT CCAACTAAAG consensus TGAAAACTTG ACTGAGTCTT ATCATTATGA CCCAGC.CAT CCAAC.AAAG
2351 2400 adhemgl363 AAGGTCAAAA AGCTCGCGAA AACATGCACA ATGCTGCAAC ACTCGCTGGT adhedbl341 AAGGACAAAA AGCCCGCGAA AACATGCACA ATGCTGCAAC ACTCGCTGGT consensus AAGG.CAAAA AGC.CGCGAA AACATGCACA ATGCTGCAAC ACTCGCTGGT
2401 2450 adhemgl363 ATGGCCTTCG CCAATGCTTT CCTTGGAATT AACCACTCAC TTGCTCATAA adhedbl341 ATGGCCTTCG CTAATGCTTT CCTTGGAATT AACCACTCAC TTGCTCATAA consensus ATGGCCTTCG C.AATGCTTT CCTTGGAATT AACCACTCAC TTGCTCATAA
2451 2500 adhemgl363 AATTGCTGGT GAATTTGGGC TTCCTCATGG TCTTGCCATT GCTATCGCTA adhedbl341 AATTGGTGGT GAATTTGGAC TTCCTCATGG TCTTGCCATT GCCATCGCTA consensus AATTG.TGGT GAATTTGG.C TTCCTCATGG TCTTGCCATT GC.ATCGCTA
2501 2550 adhemgl363 TGCCACATGT CATTAAATTT AACGCTGTAA CAGGAAACGT TAAATTTACC adhedbl341 TGCCACATGT CATTAAATTT AACGCTGTAA CAGGAAACGT TAAACGTACC consensus TGCCACATGT CATTAAATTT AACGCTGTAA CAGGAAACGT TAAA..TACC
2551 2600 adhemgl363 CCTTACCCAC GTTATGAAAC TTATCGTGCG CAAGAAGACT ACGCTGAAAT adhedbl341 CCTTACCCAC GTTATGAAAC ATATCGTGCT CAAGAGGACT ACGCTGAAAT consensus CCTTACCCAC GTTATGAAAC .TATCGTGC. CAAGA.GACT ACGCTGAAAT
2601 2650 adhemgl363 TTCACGCTTC ATGGGATTTG CTGGCAAAGA AGATTCAGAT GAAAAAGCGG adhedbl341 TTCACGCTTC ATGGGATTTG CTGGTAAAGA TGATTCAGAT GAAAAAGCTG consensus TTCACGCTTC ATGGGATTTG CTGG.AAAGA .GATTCAGAT GAAAAAGC.G
2651 2700 adhemgl363 TCAAAGCTTT TGTTGCTGAA CTTAAAAAAT TGACTGATAG TATTGATATT adhedbl341 TGCAAGCTCT GGTTGCTGAA CTTAAGAAAC TGACTGATAG CATTGATATT consensus T ..AAGCT.T .GTTGCTGAA CTTAA.AAA. TGACTGATAG .ATTGATATT
2701 2750 adhemgl363 AATATCACCC TTTCAGGAAA TGGTGTAGAT AAAGCTCACC TTGAACGTGA adhedbl341 AATATCACCC TTTCAGGAAA TGGTATCGAT AAAGCTCACC TTGAACGTGA consensus AATATCACCC TTTCAGGAAA TGGT.T.GAT AAAGCTCACC TTGAACGTGA
2751 2800 adhemgl363 GCTTGATAAA TTGGCTGACC TTGTT (SEQ ID NO: 12) adhedbl341 ACTTGATAAA TTGGCTGACC TTGTTTATGA TGATCAATGT ACTCCTGCTA consensus .CTTGATAAA TTGGCTGACC TTGTT
2801 2850 adhedbl341 ATCCTCGTCA ACCAAGAATT GATGAGATTA AACAGTTGTT GTTAGATCAA consensus 2851 2900 adhedbl341 TACTAATAAT CTGTTGATAA AATTATTAAA ACGCTCTGAT GAATTCGTCA consensus
2901 2950 adhedbl341 GAGCATTTTT TATTATAGCT TATACAACTA TCAAAAGGTA TAAATCAATT consensus 2951 3000 adhedbl341 TCGATATAGG CTCTTTTCAC TCCATTGATT TATGCATTTC TATAAAAATC consensus
3001 3050 adhedbl341 AATAATTAAT TAGCGATAGA AGTCGAGTTC ATGCATGCTA ATAATGAAAT consensus
3051 3100 adhedbl341 TGTTTTAAAT TCTGGTTTTT CTTTATGTTC TTTGCGAACA TCTTTCACAG consensus 3101 3150 adhedbl341 TTTCTTTGTT CATGAAAATT CCTCCTTATT ATGGTACTAT TTTGAGCCCA consensus
3151 3193 adhedbl341 AATAGTTATA TAAGAATCCT AAACTTCGGA TATCTTATCA AAG (SEQ ID NO: 13) consensus
8. Obtaining and sequencing the entire adhE locus from L. lactis strain MG1363
Inverse PCR was carried out on digested and religated chromosomal DNA of strain MG1363, using primers adhE-146 and adhE-MG5 (see Fig. 5). A PCR fragment was obtained which, in addition to the above fragment of the MG1363 adhE sequence, comprised a sequence of about 2.9 kb upstream of that fragment, including the 5'-end of the adhE coding sequence and an open reading frame, designated orfB, showing a high homology with the corresponding open reading frame from strain DB1341.
The entire sequence of the adhE locus of Lactococcus lactis strain MG1363 is shown in Table 1.9 below.
Table 1.9. The adhE locus of strain MG1363
1 TTTGGTGACCGAAGTGAAO.CCAGCTTCAAGAAGTTGTTTCΛTTGAAATA 50 51 ACTGAClATGTTAATGTCrrCCrTTTAAAATAGTTTTTCCTCTTTCATCTGT 100
101 CATCCGCAGCCGCAATACT rGCGTACACTACGACTTTGTCGAGACGAAAT 150 151 GCGAGATGGTTGCATAGCAACTCTCTOVTTATAC-ATTGTTTAAGCTACTT 200 201 TTGCAAGC-ATCTATTC-ATTTATTTCTTTTATCAATATGAGTAAATGAAAG 250 251 CTATCCTACCCCCCTTTCTTTTTATTCTGTTTTTTATATCTCAATGTTGT 300 301 CTGACAAATTTAACGAATATTTTTGCCTATATAATCCCCATAAGGGAGAT 350
351 TTTTACATTTTTTTCTAAGAATAAAATTAATATTTTTGCTGAAAACGCTT 400 401 TTTTTGTGATAAAATAATTATAGTAAATAAAATAGTTTGTGAGGAGAGAA 450 451 ATATGAAAGAAAAAATCCTTTTAGGCGGTTATACTAAACGTGTATCTAAA 500 orfB M K E K I L L G G Y T K R V S K 501 GGCGTTTACΑGTGTTCTATTAGATAGCAAGAAAGCTGAATTGTCGGCrTT 550
G V Y S V L L D S K K A E L S A L
Sau3AI 551 AACTGAAGTTGC-AGCGGTTCAAAATCCAACTTATATCACTCTTGATCAAA 600
T E V A A V Q N P T Y I T L D Q K 601 AAGGGCACCTCTACACTTGTGCTGCTGATGGAAATGGTGGTGGAATTGCT 650 G H L Y T C A A D G N G G G I A 651 GCCTTTGATTTCGATGGTCAAAATACAACTCACCTAGGGAATGTAACGAG 700 A F D F D G Q N T T H L G N V T S 701 TACTGGAGCCCCTTTGTGTTATGTGGCTGTTGATGAAGCACGTCAACTCG 750
T G A P L C Y V A V D E A R Q L V 751 TTTATGGTGCClAACTATClACTTGGGTGAAGTTCGTGTGTACyUAATTCAA 800 Y G A N Y H L G E V R V Y K I Q
801 GCTGATGGTTCCCTTAGATTAACCGATAC-AGTTAAACATAATGGTTCTGG 850
A D G S L R L T D T V K H N G S G 851 CCCTCGACCTGAGClAAGCAAGTTCrCATGTCC-ATTACTCTGATTTAACTC 900 P R P E Q A S S H V H Y S D L T P 901 C-AGATGGTCGTCTTGTTACTTGTGATTTAGGTACAGATGAAGTGACTGTT 950
D G R L V T C D L G T D E V T V 951 TACGATGTTATTGGTGAAGGTAAACTCAATATCGTTACGATTTATCGTGC 1000 Y D V I G E G K L N I V T I Y R A 1001 CGAAAAAGGAATGGGAGCTCGTCACATCAGCTTCCATCCTAATGGAAAAA 1050 E K G M G A R H I S F H P N G K I
1051 TTGCTTATCTCGTCGGAGAATTAAATTCAACTATTGAAGTTCTAAGCTAT 1100
A Y L V G E L N S T I E V L S Y 1101 AATGAAGAAAAAGGACGATTCGCTCGT( CLAAACAATCAGTACTTTACC 1150 N E E K G R F A R L Q T I S T L P 1151 TGAAGACTATCACGGAGCCAATGGAGTAGCTGCTATTCGAATTTCTTCTG 1200
E D Y H G A N G V A A I R I S S D 1201 ATGGTAAGTTCCTCTATGCI CTAATCGTGGG(-ΛCGACTCTTTAGCAATT 1250
G K F L Y A S N R G H D S L A I 1251 TAC-AAGGTAAGTCCTCTCGGAAC-AAAATTAGAATCTATTGGTTGGACAAA 1300 Y K V S P L G T K L E S I G W T K
1301 GACTGAATATC-ATATTCC-ACGCGATTTTAATTTTAATAAAACCGAAGATT 1350
T E Y H I P R D F N F N K T E D Y
1351 ATATCATTGTCGCTOVTα^GAATCTGATAATTTAACTCTTTTCTTGAGA 1400
I I V A H Q E S D N L T L F L R 1401 GATAAAAATAC1AGGGTCATTAACGTTAGAACAAAAAGACTTTTACGCTCC 1450
D K N T G S L T L E Q K D F Y A P
1451 TGAAATTACTTGTGTTTTACCTTTGTAAAAACTAAACTTTAGTAAATCTT 1500
E I T C V L P L Stop (SEQ ID NO: 29) 1501 GCTTTTGTTTTTTC1ACAAAGTTTTACTAAATCAGAC1AAAAAAATATTGCC 1550 1551 AAATCTTTAAAAGGATTGGαATATTTTTTTGTCTXΪAAACCCTTGCTTAT 1600
1601 AAAGCGATTTCTAAAAGTTTGATGAGTTTTTTTGTAAATTTCATCACAAT 1650 1651 ATCGCTTGACTTCTTTAAAAAACTTTGTTAAACTATTCACGTAAAAGAAA 1700 1701 GTGAATGGAATCACAAAGGAGAACGTACACATATGGCAACTAAAAAAGCC 1750 adhE M A T K K A 1751 GCTCCΑGCTGC-AAAGAAAGTTTTAAGCGCTGAAGAAAAAGCCGC-AAAATT 1800
A P A A K K V L S A E E K A A K F
Sau3AI 1801 CC-AAGGAAGTGTCGCTTATACTGATCAATTAGTCAAAAAAGCTCAAGCTG 1850 Q G S V A Y T D Q L V K K A Q A A 1851 CΑGTTCTTAAATTTGAAGGATACACACAAA rC-AAGTTGATACTATTGTT 1900
V L K F E G Y T Q T Q V D T I V 1901 GCTGt-IAATGGCTCl^GC-AGClAAGCAAAC-ATTCTCTGGAACTCGCTCACGA 1950
A A M A L A A S K H S L E L A H E 1951 AGCCGTTAATGAAACTGGCCGTGGAGTTGTTGAGGACAAAGATACAAAAA 2000 A V N E T G R G V V E D K D T K N
2001 ACC1ATTTTGCTTCTGAATCTGTTTATAATGC1AATCAAAAATGATAAAACA 2050
H F A S E S V Y N A I K N D K T 2051 GTTGGCGTTATCGCTGAAAAC-AAAGTTGCTGGTTCTGTTGAAATCGCAAG 2100 V G V I A E N K V A G S V E I A S 2101 CCCCCTTGGAGTACTTGCTGGTATTGTCCCAACAACTAATCCAACATCAA 2150
P L G V L A G I V P T T N P T S T 2151 C1AGCCATCTTTAAATCATTATTAACTGCAAAGACACGTAATGCTATTGTC 2200 A I F K S L L T A K T R N A I V 2201 TTTGCCTTTC-ACCC-ACAAGCACAAAAATGCTCAAGCCATGCGGCAAAAAT 2250
F A F H P Q A Q K C S S H A A K I 2251 TGTTTATGATGCTGCGATTGAAGCTGGTGCACCTGAAGACTTTATTCAAT 2300 V Y D A A I E A G A P E D F I Q W 2301 GGATTGAAGTACCCAGTCTTGATATGACGACTGCTTTGATTC-AAAATAGA 2350
I E V P S L D M T T A L I Q N R 2351 GGAATTGCTACAATTCTTGCAACTGGTGGTCCAGGTATGGTCAATGCCGC 2400
G I A T I L A T G G P G M V N A A 2401 GCTTAAGTCTGGTAATCCTTCACTTGGTGTAGGTGCTGGTAATGGTGCAG 2450 L K S G N P S L G V G A G N G A V
Sau3AI Sau3AI 2451 TTTATGTTGATGC1AACTGCAAATATCGATCGTGCTGTTGAAGATCTTTTG 2500
Y V D A T A N I D R A V E D L L 2501 CITT(-ΛAAACGTTTTGATAACGGAATGATTTGTGCGACTGAAAACTCTGC 2550 L S K R F D N G M I C A T E N S A
2551 AGTTATTGATGCATC-AATCTATGATGAATTTGTCGCTAAAATGCCAACGC 2600
V I D A S I Y D E F V A K M P T Q 2601 AAGGCGCTTATATGGTTCCTAAAAAAGATTAC-AAGGC-AATTGAAAGTTTT 2650 G A Y M V P K K D Y K A I E S F 2651 GTTTTCGTTGAACGTGCTGGTGAAGGTTTTGGTGTAACTGGTCCTGTTGC 2700
V F V E R A G E G F G V T G P V A 2701 TGGTCGTTCTGGTCAATGGATTGCTGAACAAGCTGGTGTTAACGTCCCTA 2750
G R S G Q W I A E Q A G V N V P K 2751 AAGATAAAGATGTTCTT riTTTGAACTTGATAAGAAAAATATTGGGGAA 2800 D K D V L L F E L D K K N I G E
2801 GCTCTTTCTTCTGAAAAACΓTTCTCC-T^ 2850
A L S S E K L S P L L S I Y K S E 2851 AACACGTGAAGAAGGAATTGAAATTGTACGTAGCTTACTTGCTTACCAAG 2900 T R E E G I E I V R S L L A Y Q G 2901 GAGCTGGTCIAC1AACGCTGCCΛTTCAAATCGGTGC-AATGGACGACCCATTT 2950
A G H N A A I Q I G A M D D P F 2951 GTC-AAAGAATACGGAATTAAAGTCGAAGCTTCTCGTATCCTCGTTAACCA 3000
V K E Y G I K V E A S R I L V N Q 3001 ACCTGACTCTATCGGTGGGGTCGGAGATATTTATACTGATGCAATGCGTC 3050 P D S I G G V G D I Y T D A M R P .
3051 CATCATTGACGCTCGGAACTGGTTCATGGGGGAAAAATTCΛCTTTCACAC 3100 S L T L G T G S W G K N S L S H Sau3AI 3101 AATTTGAGTACATACGATCTATTGAATGTTAAAACAGTGGCTAAACGTCG 3150 N L S T Y D L L N V K T V A K R R
3151 TAATCGCCCTCAATGGGTTCGTTTGCCAAAAGAAATTTACTACGAAAAAA 3200
N R P Q W V R L P K E I Y Y E K N 3201 ATGCIAATTTCTTACTTACAAGAATTGCCAC-ACGTCC1ACAAAGCITTCΑTT 3250 A I S Y L Q E L P H V H K A F I 3251 GTTGCCGACCCTGGTATGGTTAAATTCGGTTTCGTTGATAAAGTTTTGGA 3300
V A D P G M V K F G F V D K V L E 3301 AC-AACI GCTATCCGCCCAACTC-AAGTTGAAAC-AAGCATTTATGGCTCAG 3350
Q L A I R P T Q V E T S I Y G S V 3351 TCC-AACCTGACCCAACTTTGAGTGAAGCAATTGCAATCGCTCGTCAAATG 3400 Q P D P T L S E A I A I A R Q M
3401 AACC-Arri GAACCTGAC-A rGTCATCTGTCTTGGTGGTGGTTCTGCTCT 3450
N H F E P D T V I C L G G G S A L 3451 CGATGCTGGTAAGATTGGTCGTTTGATTTATGAATATGATGCTCGTGGTG 3500 D A G K I G R L I Y E Y D A R G E Sau3AI
3501 AGGCTGACCTTTCCGATGACGCAAGTTTGAAAGAGATCTTCCAAGAGTTA 3550
A D L S D D A S L K E I F Q E L
3551 GCTCAAAAATTTGTTGATATTCGTAAACGTATTATCAAATTCTACCACCC 3600
A Q K F V D I R K R I I K F Y H P 3601 ACΛCAAAG(ZAC-?UATGGTTGCTATCCC-TACTACT CTGGTACTGGTTCTG 3650
H K A Q M V A I P T T S G T G S E
3651 AAGTGACTCC-ATTTGCGGTTATCACTGATGATGAAACTCACGTTAAATAT 3700
V T P F A V I T D D E T H V K Y 3701 CClACrr GCTGACTATC-AATTGAC-ACCTC-AAGTTGCCATTGTTGACCCTGA 3750
P L A D Y Q L T P Q V A I V D P E 3751 GTTTGTTATGACTGTACC-AAAACGTACTGTTTCT GGTCTGGGATTGATG 3800 F V M T V P K R T V S W S G I D A 3801 CTATGTC.ACACGCGCTTGAATCTTATGTTTCTGTCATGTCTTCTGACTAT 3850
M S H A L E S Y V S V M S S D Y 3851 AC-AAAACCAATTTCACTTCAAGCC1ATCAAACTCATCTTTGAAAACTTGAC 3900
T K P I S L Q A I K L I F E N L T 3901 TGAGTCITATC-ATTATGACCI-ΛGCTCATCCAACCAAAGAAGGTC-AAAAAG 3950 E S Y H Y D P A H P T K E G Q K A
3951 CTCGCGAAAACATGCACAATGCTGCAACACTCGCTGGTATGGCCTTCGCC 4000
R E N M H N A A T L A G M A F A
4001 AATG ITTCCTTGGAATTAACCACTCACTTGCTCATAAAATTGCTGGTGA 4050
N A F L G I N H S L A H K I A G E 4051 ATTTGGGCTTCCTf^TGGTCl'TGCCΛTTGCTATCGCTATGCCACATGTCA 4100
F G L P H G L A I A I A M P H V I 101 TTAAATTTAACGCTGTAAC-AGGAAACGTTAAATTTACCCCTTACCCACGT 4150
K F N A V T G N V K F T P Y P R 4151 TATGAAACTTATCGTGCGC-fiAGAAGACrrACGCTGAAATTTCACGCTTCAT 4200 Y E T Y R A Q E D Y A E I S R F M
4201 GGGATTTGCTΓGGC-AAAGAAGATTCAGATGAAAAAGCGGTC-W-AGCTTTGG 4250
G F A G K E D S D E K A V K A L V
4251 TTGCTGAACrrTAAAAAATTGACTGATAGTATTGATATTAATATCACCCTT 4300
A E L K K L T D S I D I N I T L 4301 TCAGGAAATGGTGTAGATAAAGCTCATCTTGAACGTGAGCTTGATAAATT 4350
S G N G V D K A H L E R E L D K L
4351 GGCTGACCTTGTTTACGATGACCAATGTACACCTGCTAATCCACGTCAAC 400
A D L V Y D D Q C T P A N P R Q P
4401 C1AAGAATTGATGAGATTAAAC-AACTCTTGTTAGACCAATATTAATATATT 4450 R I D E I K Q L L L D Q Y Stop (SEQ ID NO: 31
4451 AATTATAGTATTTGGAACCGAACGATATCCATGCTCGCTAACCTGCTAAA 4500
4501 GCAGGAAGTCGCLAATGGTACGTC-AACCAAGAATTGATGAGATTAAACAAC 550
Sau3AI 4551 TCTTGTTAGATCAATACTAATAATCTGTTGATAAAAATAATTAAAACGCT 4600 4601 CTGATGAATTCGTCAGAGCGTTTTTTATTATAGCTTATACAACTATCAAA 4650 4651 AGGTATAAATCΛATTTCGATATAGGCTCTTITTC-ACTCCATTGATTTATAT 4700
Sau3AI 4701 TTATATAAAAATCAATAATTAATTAGCGATAGAAGTGATCC 4741 (SEQ ID NOS:28/30)
EXAMPLE 2
1. Construction of L. lactis DB1341 and MG1363 adhE mutant strains by gene inactivation
Inactivation of the adhE gene of strain DB1341 was carried out by Campbell-like integration (Leenhouts et al., 1991) of pSMA500 derivatives into the DB1341 chromosome. The adhE gene of strain DB1341 was inactivated at two different positions by cloning of PCR fragments (see Fig. 2) into the integration vector pSMA500 (Madsen et al., 1996). A 706 bp internal adhE fragment was amplified from the DB1341 chromosome using primer adhP1 (position 1069-1088 in Table 1.4) and primer adhP2 (position 1775-1756 in Table 1.4). These primers contain an XhoI and a BamHI recognition site, respectively, at the 5' end. The PCR fragment was digested with XhoI and BamHI followed by cloning into pSMA500. The resulting plasmid, pSMAKAS4 (Fig. 3), was introduced into E. coli MC1000 by electroporation (Sambrook et al., 1989).
Plasmid pSMAKAS4 was purified and subsequently introduced into strain DB1341 by electroporation (Holo and Nes, 1989), and transformants were selected on SGM17 plates containing 1 μg/ml erythromycin and 80 μg/ml X-gal (Madsen et al., 1996). Homologous integration leads to an adhE gene which is interrupted after amino acid residue Asp543. About 100 blue transformants were obtained, indicating that a transcriptional fusion of the adhE gene to the lacLM reporter gene of pSMA500 had occurred. Eight blue transformants were restreaked and the integration point was verified by PCR analysis. One strain, DBKAS4, was selected for further studies.
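The single-crossover (Campbell-like) integration described above can be pictured with a small string model. The following is only an illustrative sketch: the sequences, the `backbone` placeholder and the function name `campbell_integrate` are hypothetical and are not taken from the patent. It shows that integration of a circular plasmid carrying an internal gene fragment leaves the vector flanked by two copies of that fragment, so that the upstream gene copy ends at the 3' end of the cloned fragment and is followed by the vector.

```python
def campbell_integrate(chromosome: str, fragment: str, backbone: str) -> str:
    """Model single-crossover integration of a circular plasmid
    (fragment + backbone) at the chromosomal copy of `fragment`."""
    start = chromosome.find(fragment)
    if start < 0:
        raise ValueError("fragment has no homologous target in the chromosome")
    end = start + len(fragment)
    # After recombination the vector is flanked by two fragment copies:
    # upstream + fragment + backbone + fragment + downstream
    return chromosome[:end] + backbone + chromosome[start:]


# Hypothetical toy sequences for illustration only.
gene_5prime = "ATGGCAACTAAA"            # start of the target gene
internal_fragment = "GTTGAACGTGCTGGT"   # internal fragment cloned in the vector
gene_3prime = "GATGAGATTAAATAA"         # remainder of the target gene
chromosome = "cccc" + gene_5prime + internal_fragment + gene_3prime + "gggg"
backbone = "[lacLM-erm-vector]"         # placeholder for the vector backbone

integrant = campbell_integrate(chromosome, internal_fragment, backbone)
print(integrant)
```

In this model the first gene copy is truncated directly after the cloned fragment and reads into the vector, which corresponds to the reporter fusion and to the interruption point determined by the 3' end of the amplified fragment.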
Another integration further downstream in the adhE gene was constructed by a similar strategy. A 616 bp adhE fragment was amplified from the DB1341 chromosome using primer orf3P1 (position 2112-2138 in Table 1.4) and primer orf3P2 (position 2728-2708 in Table 1.4). The cloning of this fragment into pSMA500 resulted in plasmid pSMAKAS5 (Fig. 3). Introduction of pSMAKAS5 into DB1341 and subsequent integration into the adhE gene leads to an adhE gene which is interrupted after amino acid residue Ile861. About 400 blue transformants were obtained, which again indicated that a transcriptional fusion of the adhE gene to the lacLM reporter gene of pSMA500 had occurred. Eight blue transformants were restreaked and the integration point was verified by PCR analysis. One strain, DBKAS5, was selected for further studies.
pSMAKAS4 and pSMAKAS5 were also used to inactivate the MG1363 adhE gene. One transformant from each transformation that turned blue on X-gal plates (MGKAS4 and MGKAS5), and therefore contained a translational fusion of the lacLM reporter gene of pSMA500 to the MG1363 adhE gene, was isolated for further studies.
Samples of Lactococcus lactis subspecies lactis biovar diacetylactis strains DBKAS4 and DBKAS5, and of Lactococcus lactis subspecies lactis strains MGKAS4 and MGKAS5, were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession Nos DSM 11084, DSM 11085, DSM 11081 and DSM 11082, respectively.
A further adhE mutant strain was obtained by PCR using MG1363 DNA as template and primers adhP1-XhoI (sequence 5'-GGCCGCTCGAGGTTGAACGTGCTGGTGAAGG-3'; spanning position 2657-2676 in the MG1363 adhE sequence) (SEQ ID NO: 32) and adhP2-BamHI (sequence 5'-TAGTAGGATCCGGGTCAGGTTGGACTGAGCC-3'; spanning position 3363-3344 in the MG1363 adhE sequence) (SEQ ID NO: 33). A 700 bp fragment was digested with XhoI and BamHI, cloned into likewise digested pSMA500 and transformed into E. coli MC1000. The new construct, pSMAKAS14, was introduced into L. lactis MG1363 via electroporation. Integration led to disruption of the resident adhE gene, and one transformant that turned blue on X-gal plates (integration results in transcriptional fusion to lacLM, a reporter gene) was selected for further analysis and was named MGKAS14. This integrant should express an AdhE protein truncated at position Asp543.
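The design of the two primers can be checked with a few lines of code. The sketch below uses only the primer sequences printed above and the well-known recognition sequences of XhoI (CTCGAG) and BamHI (GGATCC); the helper names are illustrative. It splits each primer into its 5' tail (ending with the restriction site) and its 3' annealing part, and derives the template span implied by the stated coordinates.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_primer(primer: str, site: str) -> tuple[str, str]:
    """Split a primer into (5' tail ending with the site, 3' annealing part)."""
    idx = primer.find(site)
    if idx < 0:
        raise ValueError(f"site {site} not found in primer")
    cut = idx + len(site)
    return primer[:cut], primer[cut:]


# Primer sequences as given in the text (SEQ ID NO: 32 and 33).
ADHP1_XHOI = "GGCCGCTCGAGGTTGAACGTGCTGGTGAAGG"
ADHP2_BAMHI = "TAGTAGGATCCGGGTCAGGTTGGACTGAGCC"
SITES = {"XhoI": "CTCGAG", "BamHI": "GGATCC"}

tail1, anneal1 = split_primer(ADHP1_XHOI, SITES["XhoI"])
tail2, anneal2 = split_primer(ADHP2_BAMHI, SITES["BamHI"])

# The reverse primer anneals to the bottom strand, so its reverse
# complement is the top-strand sequence at positions 3344-3363.
top_strand_3344_3363 = reverse_complement(anneal2)

# Template span implied by the stated coordinates (2657 to 3363, inclusive).
template_span = 3363 - 2657 + 1

print(anneal1, len(anneal1))
print(top_strand_3344_3363, len(anneal2))
print("template span (bp):", template_span)
```

Each annealing part is 20 nucleotides long, matching the 20-position intervals quoted for the primers, and the template span derived from the coordinates is in line with the fragment size of about 700 bp stated in the text.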
A sample of MGKAS14 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 10 July 1997 under the accession No. DSM 11654.
2. Physiological characterization of MGKAS14
Physiological studies of MGKAS14 were carried out by cultivating the strain under anaerobic conditions in M17 medium supplemented with either glucose (GM17) or galactose (GalM17). The production under these conditions of the metabolites formate, acetaldehyde and pyruvate, respectively, was measured and compared with the corresponding measurements for the wild type strain, cultivated under similar conditions. In GM17, the production of formate was reduced in the mutant strain (4.86 in MG1363 vs. 1.67 in MGKAS14) and the production of acetaldehyde was increased (0.52 in MG1363 vs. 0.67 in MGKAS14). No pyruvate was detected with either of the test strains. In the GalM17 medium, the production of formate was reduced substantially in the mutant strain (39.11 in MG1363 vs. 4.39 in MGKAS14) and that of acetaldehyde increased (0.67 in MG1363 vs. 1.12 in MGKAS14). Neither strain produced pyruvate.
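The size of these changes is easier to compare as mutant/wild type ratios. The following sketch simply divides the values quoted above; the units are not stated in the text, so only the ratios are computed, and the dictionary layout is an illustrative choice.

```python
# (medium, metabolite) -> (wild type MG1363, mutant MGKAS14), values from the text.
measurements = {
    ("GM17", "formate"): (4.86, 1.67),
    ("GM17", "acetaldehyde"): (0.52, 0.67),
    ("GalM17", "formate"): (39.11, 4.39),
    ("GalM17", "acetaldehyde"): (0.67, 1.12),
}

# Ratio below 1 means reduced production in the mutant, above 1 increased.
ratios = {key: mutant / wild_type
          for key, (wild_type, mutant) in measurements.items()}

for (medium, metabolite), ratio in ratios.items():
    print(f"{medium:7s} {metabolite:13s} mutant/wild type = {ratio:.2f}")
```

On these figures, formate in the mutant falls to roughly a third of the wild type level in GM17 and to roughly a ninth in GalM17, while acetaldehyde rises by a factor of about 1.3 and 1.7, respectively.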
EXAMPLE 3
Cloning of the L. lactis pfl gene
The sequence of the pfl gene encoding pyruvate formate-lyase, a key enzyme in anaerobic metabolism, has only been reported in a few bacteria. DNA sequence homology between the different bacterial pfl genes is limited, making it difficult to clone this gene from other organisms (Table 3.1). Recently, this gene has been cloned in Streptococcus mutans (Yamamoto et al., 1996). The S. mutans pfl gene encodes a 775 amino acid protein as deduced from the published DNA sequence.
Table 3.1. Homology (DNA and protein level) of the L. lactis pfl with other bacterial pfl genes
Figure imgf000064_0001
*DNA homology through the L. lactis pfl sequence obtained. NA: not submitted to the databases; NF: not found in database searches.
1. Construction of Lactococcus lactis λZAP genomic libraries
λZAP genomic libraries of L. lactis strains DB1341 and MG1363 were constructed according to the manufacturer's instructions (Stratagene) using partially Sau3AI-digested chromosomal DNA (average size about 5 kb) cloned into λ vector BamHI arms. Average insert size was estimated to be 3 kb.
2. Screening of a λZAP genomic library of strain DB1341 with a S. mutans pfl probe
A 1 kb EcoRI fragment from the S. mutans pfl gene, encompassing positions 1190-2213 of the published S. mutans sequence (codons 298-639 of the pfl gene), was randomly labelled and used for screening the λZAP genomic library of strain DB1341 (approximately 2 x 10^5 pfu; Sambrook et al., 1989). Filters were washed at low stringency (2 x 30 min at room temperature in 5 x SSC, then 1 x 30 min at 65°C in 3 x SSC; 0.1% SDS), and two positive clones, pfl1 and pfl2, were identified. A sample of an E. coli strain transformed with clone pfl1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 25 July 1996 under the accession No. DSM 11103.
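As a rough plausibility check on the screening depth, the standard library coverage relation P = 1 - (1 - f)^N can be applied, where f is the insert size divided by the genome size and N is the number of clones screened. The sketch below uses the average insert size of 3 kb and the approximately 2 x 10^5 pfu stated in the text. The genome size of 2.5 Mb is an assumption made for illustration only and is not given in this document; the relation also assumes random, independent inserts.

```python
import math


def clones_needed(insert_bp: float, genome_bp: float, probability: float) -> int:
    """Clones required so that a given locus is present with `probability`."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log1p(-f))


def coverage_probability(insert_bp: float, genome_bp: float, n_clones: int) -> float:
    """Probability that a given locus is present among `n_clones` clones."""
    f = insert_bp / genome_bp
    return 1.0 - math.exp(n_clones * math.log1p(-f))


INSERT_BP = 3_000        # average insert size stated in the text
SCREENED_PFU = 200_000   # approximately 2 x 10^5 pfu screened
GENOME_BP = 2_500_000    # ASSUMPTION: approximate genome size, not from the text

needed_99 = clones_needed(INSERT_BP, GENOME_BP, 0.99)
p_screened = coverage_probability(INSERT_BP, GENOME_BP, SCREENED_PFU)

print("clones needed for 99% coverage:", needed_99)
print("coverage probability at screened depth:", p_screened)
```

Under these assumptions a few thousand clones would already give 99% coverage, so the number of plaques screened is far in excess of what the relation requires.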
3. Sequencing positive λZAP clones and identification of clone containing a pfl fragment
Following in vivo excision (Stratagene) and plasmid DNA isolation, sequence analysis (ALF sequenator, Pharmacia) was carried out for pfl1 using T7 and T3 primers (Stratagene). Approximately 2.1 kb was sequenced from one end of clone pfl1 (from position 1342 in Table 3.2 below), and a truncated, uninterrupted ORF spanning 1.1 kb was found that showed significant homology to other pfl genes, both at the DNA and protein level (Tables 3.3 and 3.4). A putative rho-independent transcription terminator (de Vos and Simons, 1994) is located 26 bp downstream of the stop codon (positions 2468-2490 in Table 3.2).
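The coordinates annotated for Table 3.2 (coding sequence from position 80 to position 2443, putative ribosome binding site at positions 65-71, clone pfl1 sequence starting at position 1342) allow a few simple consistency checks. The sketch below is plain arithmetic on those stated coordinates; the variable names are illustrative, and the end coordinate is taken to include the stop codon, as the translation shown in the table indicates.

```python
# Coordinates as stated for Table 3.2 (1-based, inclusive).
CDS_START, CDS_END = 80, 2443      # coding sequence, stop codon included
RBS_START, RBS_END = 65, 71        # putative ribosome binding site
PFL1_FIRST_POSITION = 1342         # first position covered by clone pfl1

cds_length = CDS_END - CDS_START + 1
if cds_length % 3 != 0:
    raise ValueError("coding sequence length is not a multiple of three")

codons = cds_length // 3
amino_acids = codons - 1           # the last codon is the stop codon

# Nucleotides between the end of the RBS and the first base of the start codon.
rbs_spacer = CDS_START - RBS_END - 1

# Part of the coding sequence carried by the truncated ORF of clone pfl1.
pfl1_orf_length = CDS_END - PFL1_FIRST_POSITION + 1

print("coding sequence length (nt):", cds_length)
print("encoded amino acids:", amino_acids)
print("RBS to start codon spacer (nt):", rbs_spacer)
print("truncated ORF in pfl1 (nt):", pfl1_orf_length)
```

The truncated ORF length obtained this way agrees with the 1.1 kb stated for clone pfl1.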
Table 3.2. Sequence of the L. lactis DB1341 pfl gene
The coding sequence starts at position 80 and ends at position 2443. A putative ribosome binding site is shown in bold, double underline (positions 65-71) . A putative rho-independent transcriptional terminator (de Vos and Simons 1994) is found at positions 2468-2490 and is shown in bold, underline (stem) or dotted underline (loop) .
E c o R I
GAATTCTGTTTGCTATTCTCAAACTGTATGATATAATGAAGTTGTAATTT 1 + + + + + 50
GAAACΆGAAAGAACAAAGGAGATTTCAAAATGAAAACCGAAGTTACGGAA 51 - + + + + + 100
MetLysThrGluValThrGlu AATATCTTTGAAC1AAGCTTGGGATGGTTTTAAAGGAACCAACTGGCGCGA 101 + + + + + 150
AsnllePheGluGlnAlaTrpAspGlyPheLysGlyThrAsnTrpArgAsp
TAAAGCAAGCGTTACTCGCTTTGTACAAGAAAACTACAAACCATATGATG 151 + + + + + 200
LysAlaSerValThrArgPheValGlnGluAsnTyr ysProTyrAspGly -
GTGATGAAAGCTTTCTTGCTGGGCCAACAGAACGTACACTTAAAGTAAAG 201 + + + + --+ 250
AspGluSerPheLeuAlaGlyProThrGluArgThrLeu ysValLys AAAATTATTGAAGATAC-AAAAAATCACTACGAAGAAGTAGGATTTCCCTT
251 -- + + + +- + 300
LysIlelleGluAspThr ysAsnHisTyrGluGluValGlyPheProPhe
CGATACTGACCGCGTAACCTCTATTGATAAAATCCCTGCTGGATATATCG 301 + --+-- + + + 350 AspThrAspArgValThrSerlleAspLysIleProAlaGlyTyrlleAsp -
ATGCTAATGATAAAGAACTTGAACTCATCTATGGGATGCAAAATAGCGAA 351 + + + + -+ 400
AlaAsriAspLysGluLeuGlu euIleTyrGly etGlnAsnSerGlu
CTTTTCCGCTTGAATTTCATGCCAAGAGGTGGACTTCGTGTTGCTGAAAA 401 + + + + + 450 euPheArgLeuAsnPheMetProArgGlyGlyLeuArgValAlaGluLys
GATTTTGACΑGAAC-ACGGTCTCTCAGTTGACCC-AGGCTTGC-ATGATGTTT
451 + + + + + 500
Ile euThrGluHisGly euSerValAspProGly euHisAspValLeu - TGTC1AC-AAACAATGACTTCTGTAAATGATGGAATCTTTCGTGCTTATACT
501 + + + ---+ + 550
SerGlnThrMetThrSerValAsnAspGlyllePheArgAlaTyrThr
TClAGαΛTTCGTAAAGCACGT(_ΛTGCTC-ATACTGTAACAGGTTTGCCAGA 551 + + + + + 600 SerAlalleArgLysAlaArgHisAlaHisThrValThrGlyLeuProAsp
TGCTTACTCTCGTGGACGTATCIA.TTGGTGTCTATGCACGTCTTGCCCTTT 601 + + + + + 650
AlaTyrSerArgGlyArgllelleGlyValTyrAlaArg euAlaLeuTyr -
ACGGTGCTGATTACCTTATGAAGGAAAAAGCAAAAGAATGGGATGCAATC 651 + + +- -+ + 700
GlyAlaAspTyr euMetLysGlu ysAlaLysGluTrpAspAlalle
ACTGAAATTAACG-^GAAAACATTCGTCTTAAAGAAGAAATTAATATGCA 701 + -+ + + + 750
ThrGluIleAsnGluGluAsnlleArgLeuLysGluGluIleAsnMetGln ATACClAAGC-TTGClAAGAAGTTGTAAACTTTGGTGCCTTATATGGTCrTG
751 + + --+ + --+ 800
TyrGlnAlaLeuGlnGluValValAsnPheGlyAlaLeuTyrGlyLeuAsp -
ATGTTTCACGTCCAGCTATGAACGTAAAAGAAGCAATCCAATGGGTTAAC 801 + + + + + 850 ValSerArgProAla etAεnValLysGluAlalleGlnTrpValAsn
ATCGCTTATATGGCAGTATGTCGTGTC-ATTAATGGAGCTGCAACTTCACT 851 --+ + + + + 900
IleAlaTyrMetAlaValCysArgVallleAsnGlyAlaAlaThrSerLeu TGGACGTGTTCCAATCGTTCI GATATCTTTGCAGAACGTGACCTTGCTC 901 --+ + + + + 950
GlyArgValProIleValLeuAspIlePheAlaGluArgAsp euAlaArg -
GTGGAACATTTACTGAACAAGAAATTCAAGAATTTGTTGATGATTTCGTT 951 + + + -+ + 1000
GlyThrPheThrGluGlnGluIleGlnGluPheValAspAspPheVal
TTGAAGCTTCGTACAATGAAATTTGCTCGTGCAGCTGCTTATGATGAACT
1001 + + + + + 1050
LeuLy sLeuArgThrMe tLy s PheAleLArgAl εiAl aAl aTy r AspG luLeu TTATTCTGGTGACCCAACATTCATCACAACATCTATGGCTGGTATGGGTA
1051 + + + + + 1100
TyrSerGlyAspProThrPhelleThrThrSerMetAlaGlyMetGlyAsn -
ATGACGGACGTCACCGTGTCACΓAAAATGGACTACCGTTTCTTGAACACA
1101 + + + +- + 1150 AspGlyArgHiεArgValThrLysMetAspTyrArgPheLeuAsnThr
CTTGATACAATCGGAAATGCTCCLAGAACCAAACTTGACAGTCCTTTGGGA
1151 -- + --+ + +-- + 1200
LeuAspThrlleGlyAsnAlaProGluProAsnLeuThrValLeuTrpAsp
TTCTAAACTTCCTTACTCATTCAAACGTTATTCAATGTCTATGAGCCACA 1201 + + + + --+ 1250
SerLyεLeuProTyrSerPheLysArgTyrSerMetSerMetSerHisLys -
AGCATTCTTCTATTCAATATGAAGGTGTTGAAACAATGGCTAAAGATGGA
1251 + + + + + 1300
HisSerSerlleGlnTyrGluGlyValGluThrMetAlaLysAspGly S a u 3 A I
TATGGCGAAATGTCATGTATCTCTTGTTGTGTCTCACCACTTGATCCAGA
1301 + --+ + -+ + 1350
TyrGlyGluMetSerCysIleSerCysCysValSerProLeuAspProGlu -
AAATGAAGAAGGACGTCATAACCTCCAATACTTTGGTGCGCGTGTAAACG 1351 +-- + + + + 1400
AsnGluGluGlyArgHisAsnLeuGlnTyrPheGlyAlaArgValAsnVal -
TCTTGAAAGCAATGTTGACTGGTTTGAACGGTGGTTATGATGACGTTCAT
1401 + + + + + 1450
LeuLysAlaMetLeuThrGlyLeuAsnGlyGlyTyrAspAspValHis AAAGATTATAAAGTATTCGACATCGAACCTGTTCGTGACGAAATTCTTGA
1451 + + --+ + --+ 1500
LysAspTyrLysValPheAspIleGluProValArgAspGluIleLeuAsp
CTATGATA<_AGTTATGGAAAACTTTGACAAATCTCTCGACTGGTTGACTG
1501 + + + + + 1550 TyrAspThrValMetGluAsnPheAspLysSerLeuAspTrpLeuThrAsp -
ATACTTATGTTGATGCAATGAATATCATTCATTACATGACTGATAAATAT
1551 + + + + + 1600
ThrTyrValAspAlaMetAsnllelleHisTyrMetThrAspLysTyr AACTATGAAGC-A.GTTC-AAATGGCCTTCTTGCCTACTAAAGTTCGTGCTAA
1601 - + + + + - - + 1650
AsnTyrGluAlaValGlnMetAlaPheLeuProThrLysValArgAlaAsn
CΛTGGGATTTGGTATCTGTGGATTCGC-AAATACAGTTGATTα.CTTTCAG 1651 + - + + - + + 1700
MetGlyPheGlylleCysGlyPheAlaAsnThrValAspSerLeuSerAla -
CAATTAAATATGCTAAAGTTAAAACATTGCGTGATGAAAATGGCTATATC
1701 +-- + + + + 1750
IleLysTyrAlaLysValLysThrLeuArgAspGluAsnGlyTyrIle S a u 3 A I
TACGATTACGAAGTAGAAGGTGATTTCCCTCGTTATGGTGAAGATGATGA
1751 + + -+ --+ + 1800
TyrAspTyrGluValGluGlyAεpPheProArgTyrGlyGluAspAspAsp
TCGTGCTGATGATATTGCTAAACTTGTCATGAAAATGTACCATGAAAAAT 1801 --+ +-- - + + --+ 1850
ArgAlaAspAspIleAlaLysLeuValMetLysMetTyrHisGluLysLeu -
TAGCTTCAC1ACΛAACTTTACAAAAATGCTGAAGCTACTGTTTCACTTTTG
1851 - + -+ + +- -+ 1900
AlaSerHisLysLeuTyrLysAsnAlaGluAlaThrValSerLeuLeu ACAATTACATCTAACGTTGCTTACTCTAAACAAACTGGTAATTCTCCAGT
1901 + + + +- + 1950
ThrlleThrSerAsnValAlaTyrSerLysGlnThrGlyAsnSerProVal
ACATAAAGGAGTATTCCTCAATGAAGATGGTACAGTAAATAAATCTAAAC
1951 ---+ + + +--- + 2000 HisLysGlyValPheLeuAsnGluAspGlyThrValAsnLysSerLysLeu -
E c o R I
TTGAATTCTTCTCACCAGGTGCTAACCCATCTAATAAAGCTAAGGGTGGT
2001 + + --+ + -+ 2050
GluPhePheSerProGlyAlaAsnProSerAsnLysAlaLysGlyGly
E c o R I TGGTTGCAAAACCTTCGCTC-ATTGGCTAAGTTGGAATTCAAAGATGCAAA 2051 +-- + + - + -- + 2100
TrpLeuGlnAsnLeuArgSerLeuAlaLysLeuGluPheLysAspAlaAsn
TGATGGTATTTCATTGACTACTCAAGTTTCACCTCGTGCACTTGGTAAAA
2101 --- + + + - + + 2150
AεpGlylleSerLeuThrThrGlnValSerProArgAlaLeuGlyLysThr - CTCGTGATGAAC-AAGTGGATAACTTGGTTCAAATTCTTGATGGATACTTC
2151 + + +- + + 2200
ArgAspGluGlnValAspAsnLeuValGlnlleLeuAspGlyTyrPhe ACACCAGGTGCTTTGATTAATGGTACTGAATTTGCAGGTCAACACGTTAA
2201 -+ + + + + 2250
ThrProGlyAlaLeuIleAsnGlyThrGluPheAlaGlyGlnHisValAsn -
CTTGAACGTAATGGACCTTAAAGATGTTTACGATAAAATCATGCGTGGTG 2251 + + + + + 2300
LeuAsnValMetAspLeuLysAspValTyrAspLysIleMetArgGlyGlu -
AAGATGTTATCGTTCGTATCTCTGGTTACTGTGTCAATACTAAATACCTC
2301 + + +-- + + 2350
AspVallleValArglleSerGlyTyrCysValAsnThrLysTyrLeu ACIACCAGAAC-AAAAACLAAGAATTAACTGAACGTGTCTTCCΑTGAAGTTCT
2351 --+ + +-- + + 2400
ThrProGluGlnLysGlnGluLeuThrGluArgValPheHisGluValLeu
TTCAAACGATGATGAAGAAGTAATGCATACTTCAAACATCTAATTCTTAA
2401 + -- + + + -+ 2450 SerAsnAspAspGluGluValMetHisThrSerAsnlleEnd (SEQ ID NO: 16)
AATTTAATGAATATTCGGTCTGTC-A.GTTTTACTGACAGACT'1"1 1"1,1 AC 2451 - +- + - - - + + + 2500
GAAAAAATTAATCTVTAATAGTTAAAAAC-TATTGTTTTTAGTTTAAGAAAG 2501 + -+ + --+ + 2550
TTAAATTTTATGCTAAAATAGATGAATGAAAATGGTAATTGGATTGACAG 2551 + + + - + + 2600 GCGGAATTGCGATGGGAAATCAACGGTGGTTGATTTTTTGATTCTGAGGG
2601 + + + + + 2650
TTAT(_y GTGATTGATGCTGACAAAGTTGTCCGTCAATTTACAAGAACCT
2651 + --+ + -+ + 2700
GGCGGAAAACTTTAC-AAGGCAATATTAGAAACTTACGGTTTAGATTTTAT 2701 + --+ + --+ + 2750
TGCTGACAATTGGAC-AGTTAAATCGTGAAAAATTAGGAGCTTTAGTTTTT
2751 + + + + + 2800
TCTΓGATTCAAAAGAGCGCGAGAAATTATCAAACTTACAAGATGAAATTAT
2801 - + + -+ + - --+ 2850
TCGTACAGAATTATATGATAGACGTGATGACTTATTAAAAAAAATGACTG 2851 + + + +- + 2900
AO GTCTGTC-AGTAAAAATTTTGATTCAAAGAGTCAAGGAAAAAATCTG
2901 + + + + --+ 2950 TCAGTAAATAAGCCAATATTTATGGATATTCCGTTATTAATTGAATACAA
2951 - + +-- + +-- + 3000
TTATACCGGATTTGATGAAATATGGTTGGTCAGCTTACCTGAAAAAATAC
3001 + + + + + 3050
AATTAGAAAGACTGATGGCAAGAAATAAGTTTACGGAAGAAGAAGCTAAA 3051 + + + + + 3100
AAACGAATTTCTTCLAC-AAATGCCATTGTC-AGAAAAACAAAAAGTCGCTGA 3101 + - + + + + 3150
TGTCATTCTGGATAATTCTGGAAAGATTGAAGCACTAAAAAAACAAATCC 3151 + + + + - + 3200 AGCGAGAACTAGCTAGGATAGAAGAACAGAAATAGAGGTGAATCGCACGA 3201 + + + - - + + 3250
AAACAGTTAATTGGAAAGGAATTTATTTATAACATGGATTGGCTGCTTTT 3251 - - + - - + + + + 3300
TTGTAGGTTC-ATCATTTTCACTCGTCATGCCTTTCTCCCCTTGTATATTC 3301 - - - + + - + + + 3350 AAGGACTGGGTGAAGCGGTGGGAATTTGAACTTTACTC-R.GGGTTACTTTT
3351 + + + + - - + 3400
TCTTTGCCAGCCTTA ( SEQ ID NO : 15 ) 3401 + 3415
Table 3.3. DNA homology (FASTA, GCG Wisconsin Package Version 8, Genetics Computer Group) using the complete L. lactis DB1341 pfl sequence shown in Table 3.2
Only the two highest scores (S. mutans and H. influenzae pfl genes, designated smpfl and hi3281, respectively) are shown.
(Nucleotide) FASTA of: dbpf1.seq from: 1 to: 3415 July 19, 1996 10:11
The best scores are: init1 initn opt
empro:smpfl D50491 Streptococcus mutans pfl... 4335
empro:hi32812 /rev U32812 Hae. influenzae focA 652
empro:ecpfl X08035 E. coli pfl (pyruvate form. 429
empro:cppflact X93463 C.pasteurianum pfl and act 309
emnew:cefl3bl2 /rev Z70683 Caenorhabditis eleg. 94
Figure imgf000070_0001
dbpf1.seq
empro:smpfl
ID SMPFL standard; DNA; PRO; 3067 BP.
AC D50491;
NI g1129081
DT 23-DEC-1995 (Rel. 46, Created)
DE Streptococcus mutans pfl gene for pyruvate formate-lyase.
SCORES Init1: 4335 Initn: 5345 Opt: 4996
71.8% identity in 2608 bp overlap
10 20 30 40 50 dbpf1. GAATTCTGTTTGCTATTCTCAAACTGTATGATATAATGAAGTTGTAATTTGA l l l l l i l l l l l M l I I M M M l smpfl AAGCAAGTTCTTTCGCTTGTGTAACCGGTTACTGTATGATAGAATATAATCGTAAATTGT 200 210 220 230 240 250 60 70 80 90 dbpf1. AACAGA AAGAACAAAGGAGATTTCAA-AATGAAAAC CGAAGTTACG
MMM MM I M! M MM Ml Ml I II II smp f 1 AAC1AGATTAACTGTTACTAGAATAGAGGGGAACT(_UTTATGGCAACTGTCAAAACTAAC 260 270 280 290 300 310
100 110 120 130 140 150 dbpf 1. GAAAATATCTTTGAACAAGCTTGGGATGGTTTTAAAGGAACCAACTGGCGCGATAAAGCA
I I MMM MM Mill II Mill II I MM smpfl ACTGACGTTTTTGAAAAAGCCTGGGAAGGCTTTAAAGGAACTGACTGGAAAGACAGAGCA 320 330 340 350 360 370
160 170 180 190 200 210 dbpf 1. AGCGTTACTCGCTTTGTACAAGAAAACTAClftAACCATATGATGGTGATGAAAGCTrTCTT
Ml II MIIMIIM Mill IIIIIM II I Mill MMM smpf 1 AGC-ATTTCTCGCriTTGTTC-AAGACAACTACACTCC-ATATGACGGAGGCGAAAGTTTTCTT 380 390 400 410 420 430
220 230 240 250 260 270 dbpf 1. GCTGGGCCAACAGAACGTACACTTAAAGTAAAGAAAATTATTGAAGATACAAAAAATCAC
II II II II MMM Mill I I II III I I Mill II Ml II smpf1 GCCGGCCCTACrGAACGTTCACTTCAClATαΛAAAGTCGTAGAAGAAACTAAAGCGCAT 440 450 460 470 480 490
280 290 300 310 320 330 dbpf1. TACGAAGAAGTAGGATTTCCCTTCGATACTGACCGCGTAACCTCTATTGATAAAATCCCT
MIMIIII I I IIIM MMM I II IIIIIM I I Mill smpf1 TACGAAGAAACACGTTTTCCAATGGATAC ACGTATTACATCTATTGCTGATATCCCA 500 510 520 530 540 550
340 350 360 370 380 390 dbpf1. GCTGGATATATCGATGCTAATGATAAAGAACTTGAACTCATCTATGGGATGCAAAATAGC
II II Mill III II III MM I II I III II Mill smpf 1 GCAGGTTATAT TGAC1AAGGAAAATGAATTGATTTTTGGTATCCAAAACGAT 560 570 580 590 600
400 410 420 430 440 450 dbpf 1. GAACTTTTCCGCTTGAATTTCATGCCΪΛGAGGTGGACTTCGTGTTGCTGAAAAGATTTTG MM MIIMIIM III II MM I IIIIIM MM smpf 1 GAACTTTTTAAGCTGAACTTCATGCCAAAAGGCGGTATTCGCATGGCTGAAACAGCTTTG 610 620 630 640 650 660
460 470 480 490 500 510 dbp f 1. ACΛGAAC-ACGGTCTCTCAGTTGACCC-AGGCTTGC TGATGTTTTGTC-ACAAACAATG - - A
I MMM Ml I Mill I I I Mill I I I II II Ml I smpf1 AAAGAACATGGTTATGAACCAGACCCTGCCGTTCATGAAATCT- -TTACCAAATATGCAA 670 680 690 700 710 720
520 530 540 550 560 570 dbpf1. CTTCTGTAAATGATGGAATCTTTCGTGCTTATACTTC-AGCAATTCGTAAAGCACGTCATG
I I M MMMIIMMM MMM Mill iMIIIMM smpf 1 CLAACCGTTAATGATGGTATCTTTCGTGCTTAC-ACTTC-AAAC-ATTCGCCGTGCACGTCATG 730 740 750 760 770 780
580 590 600 610 620 630 dbpf 1. CTCATACTGTAAC1AGGTTTGCC-AGATGCTTACTCTCGTGGACGTATCATTGGTGTCTATG
I II II II MM III I Mill Ml IIIIIM! Mill I II Mill II MM smpf1 CCCACLACTGTAACTGGTCTCCCIAGATGC-ATACTCTCGCGGACGTATTATTGGAGTTTATG 790 800 810 820 830 840 640 650 660 670 680 690 dbpf1. CΑCGTCTTGCCCTTTACGGTGCTGATTACCTTATGAAGGAAAAAGCAAAAGAATGGGATG
I IMMIII II II III I III I IIIIIM II M III I smpf 1 CCCGTCTTGCTCTCTATGGTGCTGACTACTTGATGCAAGAAAAAGTGAACGACTGGAACT 850 860 870 880 890 900
700 710 720 730 740 750 dbpf 1. CAATCACTGAAATTAACGAAGAAAAC1ATTCGTCTTAAAGAAGAAATTAATATGCAATACC
MM IMMIII I MMM III I Mill I smpf 1 ClAATTGCTGAAATTGATGAAGAATC-ftATTCGTCTTCGTGAAGAAATCAATCTTCAATATC 910 920 930 940 950 960
760 770 780 790 800 810 dbpf 1. AAGCTTTGCAAGAAGTTGTAAACTTTGGTGCCrrTATATGGTCπGATGTTTC-ACGTCCAG
I II I Mill M II MM I MMIMMMMM II I smpf1 AGGCACTTGGCGAAGTAGTGCGGTTGGGTGATCTGTATGGTCTTGATGTTCGCAAACCTG 970 980 990 1000 1010 1020
820 830 840 850 860 870 dbpf1. CTATGAACGTAAAAGAAGCAATCCAATGGGTTAACATCGCTTATATGGCAGTATGTCGTG
IIIIIM II II Mill MM Mill I MMM II II II I smpfl CTATGAATGTTAAGGAAGCTATCC-AATGGATTAATATCGCCTTTATGGCTGTCTGCCGCG 1030 1040 1050 1060 1070 1080
880 890 900 910 920 930 dbpf 1. TCATTAATGGAGCTGCAACTTCACTTGGACGTGTTCCAATCGTTCTTGATATCTTTGCAG
I II Mill MMMMMIMMIMMMIM smpf l TTATCAATGGTGCTGCAACTTCTCTTGGACGTGTCCCAATCGTTCTTGATATCΓTTGCAG 1090 1100 1110 1120 1130 1140
940 950 960 970 980 990 dbpf1. AACGTGACCTTGCTCGTGGAAC-ATTTACTGAACAAGAAATTα\AGAATTTGTTGATGATT II II MMM MMM IIIIMM IMMIII I smpf1 AACGTGACCTTGCTCGTGGCACTTTCACTGAATC1AGAAATCC1AAGAATTCGTTGATGACT 1150 1160 1170 1180 1190 1200
1000 1010 1020 1030 1040 1050 dbpf1. TCGTTTTGAAGCTTCGTAC-AATGAAATTTGCTCGTGCAGCTGCTTATGATGAACTTTATT
III! I III! I IMMIII III I smpfl TCGTTATGAAACTTCGTACGGTTAAATTTGCACGTACTAAGGCTTATGACGAACTTTATT 1210 1220 1230 1240 1250 1260
1060 1070 1080 1090 1100 1110 dbpf1. CTGGTGACCCAACATTCATCACAACATCTATGGCTGGTATGGGTAATGACGGACGTCACC
I II II II III MIIMIIM smpf 1 CAGGTGACCCAAC-ATTTATTACGACTTCTATGGCTGGTATGGGAGCTGATGGACGTCACC 1270 1280 1290 1300 1310 1320
1120 1130 1140 1150 1160 1170 dbpf 1. GTGTα-CTAAAATGGACTACCGTTT TroAACAC-ACTTGATAC_AATCGGAAATGCTCCAG
MM Mill MMMMMIMMM II II IIIIIM II II MIIMIIM smpf 1 GTGTTACTAAGATGGACTACCGTTTCTTAAATACGCTTGATAATATTGGCAATGCTCCAG 1330 1340 1350 1360 1370 1380
1180 1190 1200 1210 1220 1230 dbpf 1. AACCLAAACCTGAC-AGTCCTTTGGGATTCTAAACTTCCr-^^
MM Mill II II MMM MM I IMMIII Ml I Mill I smpfl AACCTAACTTAACCGTTCTTTGGTC_f-AGTAAATTGCCTTACT πTCCGTC^TTATTGTA 1390 1400 1410 1420 1430 1440 1240 1250 1260 1270 1280 1290 dbpf1. TGTCTATGAGCC1ACAAGC-ATTCTTCTATTCAATATGAAGGTGTTGAAACAATGGCTAAAG
MIIMIMMMMMMMMM 1111 ! 11111111111 i III MIIMIIM smpf 1 TGTCTATGAGCCACAAGCATTCTTCAATTCAATATGAAGGTGTCACAACTATGGCTAAAG 1450 1460 1470 1480 1490 1500
1300 1310 1320 1330 1340 1350 dbpf 1. ATGGATATGGCGAAATGTC1ATGTATCTCTTGTTGTGTCTCACCACTTGATCCAGAAAATG
I II Mill MMMMMIMMM M Mill II II Mill I smpf 1 AAGGTTATGGTGAAATGTCATGTATCTCATGCTGTGTATCTCCGCTTGATCCTGAAAACG 1510 1520 1530 1540 1550 1560
1360 1370 1380 1390 1400 1410 dbpf 1. AAGAAGGACGTC-ATAACCTCClftATACTTTGGTGCGCGTGTAAACGTCTTGAAAGCAATGT
MM I II II II II Mill Mill I MMM I smpf1 AAGATCGTCGCCACAATCTACAATACTTTGGTGCrrCGTGTTAACGTTCTTAAAGCACTTC 1570 1580 1590 1600 1610 1620
1420 1430 1440 1450 1460 1470 dbpf1. TGACTGGTTTGAACGGTGGTTATGATGACGTTCATAAAGATTATAAAGTATTCGACATCG
I II II! I II I! IIMI II M Mill Mill II I! II! smpf1 TTACAGGTCTTAATGGCGGTTACGACGATGTTC-AαiAAGACTACAAAGTATTTGATGTCG 1630 1640 1650 1660 1670 1680
1480 1490 1500 1510 1520 1530 dbpf1. AACCTGTTCGTGACGAAATTCTTGACTATGATAC1AGTTATGGAAAACTTTGACAAATCTC
MM! I Mill Ml I Mill I III II MM I II Mill III I I smpf 1 AACCTATCCGTGATGAAGTCCTTGATTTTGAAACGGTTAAAGCTAATTTTGAAAAAGCAC 1690 1700 1710 1720 1730 1740
1540 1550 1560 1570 1580 1590 dbpf 1. TCGACTGGTTGACTGATACTTATGTTGATGC1AATGAATATCATTCATTACATGACTGATA
I II MMMMMIMMM II II ! 11111111111111 M II MIIMIIM smpf 1 TTGATTGGTTGA(_rrGATACTTACGTGGACGCAATGAATATCATTCACTATATGACTGATA 1750 1760 1770 1780 1790 1800
1600 1610 1620 1630 1640 1650 dbpf 1. AATATAACTATGAAGCAGTTCAAATGGCCTTCTTGCCTACTAAAGTTCGTGCTAACATGG
MMIIMMMMM 11111111111111111 II II III II I! MM smpfl AATATAACTATGAAGCCGTTCAAATGGCCTTCTTACCΛACACGTGTTAAAGCCAATATGG 1810 1820 1830 1840 1850 1860
1660 1670 1680 1690 1700 1710 dbpf 1. GATTTGGTATCTGTGGATTCGC_AAATA(_AGTTGATTC-ACTTTCAGCAATTAAATATGCTA
MIIMIIM II MMM I 11111111 M I II 11 I MM! 1111111111111 smpf1 GATTTGGTATTTGCGGATTCTCTAATACAGTTGATTCATTATC1AGCTATTAAATATGCTA 1870 1880 1890 1900 1910 1920
1720 1730 1740 1750 1760 1770 dbpf1. AAGTTAAAACATTGCGTGATGAAAATGGCTATATCTACGATTACGAAGTAGAAGGTGATT
II III I I MM II II Mill II III I MM ! smpf1 CTGTAAAACCTATTCGTGATGAAGATGGTTAC-ATTTACGACTATGAAACTGTTGGTAACT 1930 1940 1950 1960 1970 1980
1780 1790 1800 1810 1820 1830 dbpf1. TCCCTCGTTATGGTGAAGATGATGATCGTGCTGATGATATTGCTAAA- -CTTGTCATGAA
IIIMIMM II MM I! II III II Ml I MM smpf1 TCCCTCGTTACGGAGAAGATGATGACCGTGTAGACTCAATCGCTGAATGGTTG-CTTGAA 1990 2000 2010 2020 2030 2040 1840 1850 1860 1870 1880 1890 dbpf1. AATGTACCATGAAAAATTAGCTTCAC1ACAAAC TTACAAAAATGCTGAAGCTACTGTTTC
I I MM I II II MM MMM II I II smpf 1 GCT - TTCCATACTCGTCTTGCACGTCATAAACTGTACAAAGATTCCGAAGCTACTGTATC 2050 2060 2070 2080 2090 2100
1900 1910 1920 1930 1940 1950 dbpf 1. ACTTTTGAClAATTAClATCrrAACGTTGCTTACTCTAAAC-AAACTGGTAATTCTCCAGTACA
I I I Mill II MM II I II I II II II II II II I II II II I II smpf 1 ATTGCTTAC1AATCACTTCTAATGTTGC-TTATTCTAAAC-AAACTGGTAATTCTCCAGTTCA 2110 2120 2130 2140 2150 2160
1960 1970 1980 1990 2000 2010 dbpf 1. TAAAGGAGTATTCCTCAATGAAGATGGTACAGTAAATAAATCTAAACTTGAATTCTTCTC
M II II I I II II MM I smpf 1 C1AAGGGTGTTTACCTC-AATGAAGATGGTTCTGTGAACTTGTCTAAAGTAGAATTCTTCTC 2170 2180 2190 2200 2210 2220
2020 2030 2040 2050 2060 2070 dbpf 1. ACC-AGGTGCTAACCCATCTAATAAAGCTAAGGGTGGTTGGTTGCAAAACCTTCGCTCATT II II MMMMIMI I MM smpf 1 ACCΛGGTGCTAACCCΑTC-AAATAAAGCTTCCGGCGGCTGGTTGC1AAAACTTGAACTCATT 2230 2240 2250 2260 2270 2280
2080 2090 2100 2110 2120 2130 dbpf 1. GGCTAAGTTGGAATTCAAAGATGCAAATGATGGTATTTC1ATTGACTACTCAAGTTTCACC
I II I II II I MMMMIMI MMM smp f 1 GAAGAAACTTGACTTTGCTCLACGC-AAATGATGGTATCTC-ATTGACAACTC-ΛAGTTTCACC 2290 2300 2310 2320 2330 2340
2140 2150 2160 2170 2180 2190 dbpf 1. TCGTGCACTTGGTAAAACTCGTGATGAAC-AAGTGGATAACTTGGTTCΛAATTCTTGATGG
II MMM II I MM II smpf1 AAAAGCTCTTGGTAAGACATTCGATGAA<_J^GTTGCTAACTTAGTAACAATTCTTGATGG 2350 2360 2370 2380 2390 2400
2200 2210 2220 2230 2240 2249 dbpf1. ATACTTCACACCAGGTGCT TTGATTAATGGTACTGAATTTGCAGGTCAACACGTTA smpf1 TT MACTMTTGAAIGGCGIGICGIGTICAACACGITTIAAICTITGIAAICI -GITTATIGGIATCTTIAAIAIGAITGITITIT 2410 2420 2430 2440 2450 2460
2250 2260 2270 2280 2290 2300 2309 dbpf l . ACΓTGAACGTAATGGACCTTAAAGATGTTTACGATAAAATCATGCGTGGTGAAGATGTTA I II I III I I MMM II I III Ml ! MM smpf1 ATGACAAGATCATGAATGGTGAAGATGTTATCGTTCGTATC TCAGGTTACTGTGTTA 2470 2480 2490 2500 2510
2310 2320 2330 2340 2350 2360 2369 dbpfl . TCGTTCGTATCTCTGGTTACΓGTGTCAATACTAAATACCTCACACCAGAACAAAAACAAG I I I I I I II III III I Ml I II I smpf1 ACACTAAATACCTTACTAAAGAACAAAAGACTGAAT TGACACAACGTGTTTTCCATG 2520 2530 2540 2550 2560 2570
2370 2380 2390 2400 2410 2420 dbpf1. AA-TTAACTGAACGTGTCTTCCA- -TGAAGTTCTTTCAAACGATGATGAAGAAGTAA- -T smpf1 A IA!GTIT!CTCITICIAIAITGIGATIGATIGCIAIGCTIACAIGIACTTIGGTTAMAClAAICAAIGTAMAGAlGTTIAIA!ACA 2580 2590 2600 2610 2620 2630 2430 2440 2450 2460 2470 dbpf1. GCATA-CTTCAAACATCTAATTCTTAAAA TTTAATGAATATTCGG- -TCTGT
I II I! Ill I II I II MM III! I MM I I smpf1 GTTTAGTTTAAAAGACCTCΑCTCATAAAAGTGAGGTCTTTACTTTGCTTTCGGGTACGAT 2640 2650 2660 2670 2680 2690
2480 2490 2500 2510 2520 2530 dbpf1. CAGTTTTACTGACAGACTTTTTTTTACGAAAAAATTAATCATAAT-AGTTAAAAACTATT
I I M i l l ! M l I I I M l 1 1 I I I I I I I I I I I I I I I smpf1 CA-AAGCAGTGAGAGCTTTTTATATTCTAAAAACTCA- -CAAATTCAGAAAAAAACAGCT 2700 2710 2720 2730 2740 2750
2540 2550 2560 2570 2580 2590 dbpf1. GTTTTTAGTTTAAGAAAGTTAAATTTTATGCTAAAATAGATGAATGAAAATGGTAATTGG smpf1 CT IT!GT IGA ITT ITI G MAAAl A IGCT MTTMTA-G MCTAlCA MATAlATATTA ITMGAMAAIAITI - -T MAAMTTAT
2760 2770 2780 2790 2800
2600 2610 2620 2630 2640 2650 dbpf1. ATTGACAGGCGGAATTGCGATGGGAAATC1AACGGTGGTTGATTTTTTGATTCTGAGGGTT smpf1 ACTCGCGA(_ΛCAOTGT(_-TCC-ACCTATCTTGATGCAGTAAAAATTAGAC-ACC1^GTCTTC 2810 2820 2830 2840 2850 2860 dbpfl.: corresponding to nucleotides 1-2653 of SEQ ID NO: 15; smpfl: SEQ ID NO: 17
dbpfl.seq /rev
empro:hi32812
ID HI32812 standard; DNA; PRO; 10817 BP.
AC U32812; L42023;
NI g1222092
DT 09-AUG-1995 (Rel. 44, Created)
DE Haemophilus influenzae focA, pflA, pflB, rspB, yaaJ, yajF, yeiG
SCORES Init1: 652 Initn: 1077 Opt: 1299
55.4% identity in 1961 bp overlap
1979 1969 1959 1949 1939 1929 1920 dbpfl . C_ATCrrC-RTTGAGGAATACTCCTTTATGTACTGGAGAATTACC-AGTTTGTTTAGAGTAAG
I II I MM MM III Ml hi3281 GTCCGAATGGTGCACC-AGCACGACGACC-ATC-AGGGGTGTTACCCGTTTTCTTACCATAAA 2730 2740 2750 2760 2770 2780
1919 1909 1899 1889 1879 1869 1860 dbpfl . C-AACGTTAGATGTAATTGTCAAAAGTGAAAC-AGTAGCrTCΛG(-ATTTTTGTAAAGTTT^
I M M I I I I I I I M l I M M M l I I ! hi3281 CTACGTTAGAAGTAATGGTTAATACΛGATTGTGTAGGCACTGCΛTTGCGGTAAGTTTTAA 2790 2800 2810 2820 2830 2840
1859 1849 1839 1829 1819 1809 dbpfl . GTGAAGCTAATTT-TTC-ATGGTACATTTTCATGAC-AA
I I I I I I M M I I I I I I I I I I I M M M l hi3281 GTTTTTGAATTTTCTTCATAAAAC-GTTCAACTAAGTCACAAGCGATGTCATCAACACGG 2850 2860 2870 2880 2890 2900
1799 1789 1779 1769 1759 dbpfl . T(_ATC-ATCTTCACCATAACGAGGGAAATCACCTTCTACTTCGTAATCG- - -TA
I M M I M M I I I I M M M ! I l l I I I I I ! hi3281 TTATC1ATTGTTACC1ATATTGTGGATATTCACCTTCGATTTCAAAGTCGATTGCTACGTTA 2910 2920 2930 2940 2950 2960 1750 1749 1739 1729 1719 dbpfl . G - -ATATAGCCATTTTCAT CACGC-AATGTTTTAACTTTAGCA
I I II MM II II III! I II MMM III hi3281 GTTGC-AAC-AACATTGCCATCTTTATCTTTGATGTCGCCIACGAACTGGTTTAACTTTCGCA 2970 2980 2990 3000 3010 3020
1709 1699 1689 1679 1669 1659 dbpfl . TATTTAATTGCTGAAAGTGAATC-AACTGTATTTGCGAATCCACAGATACCAAATCCCATG
MM III I I I I I II MMM I Ml hi 3281 TATTTGATTGCTGAAAGTGAGTC-AGCCGCIAACAGAAAGACCTGCGATACCACAAGCCATA 3030 3040 3050 3060 3070 3080
1649 1639 1629 1619 1609 1599 dbpfl . TTAGC-ACGAACTTTAGTAGGClAAGAAGGCC-ATTTGAACrrGCTTC-ATAGTTATATTTATCA
II II I I I II MM ! Ml! II MMM h i 3281 GTACGGTATACATCACGATCATGTAATGCCATTAATGCGGCTTCGTATGAATATTTATCG 3090 3100 3110 3120 3130 3140
1589 1579 1569 1559 1549 1539 dbpfl . GTCATGTAATGAATGATATTCATTGCATC-ftACATAAGTATC-AGTCAACCAGTCGAGAGAT
III II II II I II I III MM I I MM M i l l h i 3281 TGCATATAGTGGATTACGTTTAAGGCAGTCACATATTGTTTTGCCAACCAATCCATAAAG 3150 3160 3170 3180 3190 3200
1529 1519 1509 1499 1489 1479 dbpf 1. TTGTCAAAGTTTTCC-ATAACTGTATC-ATAGTCAAGAATTTCGTC-ACGAACAGGTTC - GAT
! M I III I II I I Ml III II III I I I hi 3281 CTATCCLATACGAGT(_ΛTTACTGTATCGAAATCTAATACTTCAT(-AGTAATTGGTGCAGTT 3210 3220 3230 3240 3250 3260
1469 1459 1449 1439 1429 1419 dbpfl . GTCGAATACTTTATAAT I TATGAACGTC-ATCATAACCACCGTTCAAACCΛGTC-AACAT
III I I II II MM III MM I I Ml hi 3281 TTCGGACCTACTTGCATACCTA - ATTTTTCATCGATACCGCCGTTGATTGCGTATAACAA 3270 3280 3290 3300 3310
1409 1399 1389 1379 1369 1359 dbpfl . TGCTTTC-t-AGACGTTTACACGCGCAC(-AAAGTATTGGAGGTTATGACGTCCTTCTT<-ATT
II Ml I III! Ml MM III Ml I I I II II Ml h i 3281 TGTTTTCGCTAAGTTTGCACGTGC-ACCGAAGAATTGCATTTGTTTAC - - CCACAATCAT - 3320 3330 3340 3350 3360 3370
1349 1339 1329 1319 1309 1299 dbp f 1. TTCTGGATCAAGTGGTGAGAC-AC_AAα^GAGATACATGACATTTCGCC-RTATCC-ATCri T
MM MMM I III I III I III hi3281 - TGGTGATACACAAC-ATGCGATTGCGTAGTCATCGTTGTTGAAGTCTGG 3380 3390 3400 3410 3420
1289 1279 1269 1259 1249 1239 dbpfl . AGCClATTGTTTαΛCACCTTt^TATTGAATAGAAGA-ATGCTTGTGGCr-CATAGACATT
I III! Ill I III MMM II I! I I I I I I II hi3281 ACGCATTAAATCATCGTTTTCGTATTGAACTGATGAGGTATC-AATCGATACTTTTGCACA 3430 3440 3450 3460 3470 3480
1229 1219 1209 1199 1189 1179 dbpfl . GAATAACGTTTGAATGAGTAAGGAAGTTTAGAATCCCAAAGGACTGTC1AAGTTTGGTTCT
II I III I II I MM I II MMM III hi3281 GA- -AACGTTTGAAGTTTTCLAGGTAATTGTTCAGACCLAAAGAATGGTTAAGTTTGGCTCT 3490 3500 3510 3520 3530 3540 1169 1159 1149 1139 1129 1119 dbpfl . GGAGCATTTCCGATTGTATCAAGTGTGTTCAAGAAACGGTAGTCCATTTTAGTGACACGG
Ml I I II II I I III Ml II I Ml I Ml II II hi3281 GGAGAAGTACCCATGTTGTAAAGGGTGTGTAAAATACGGAATGTATTTTTGGTTACTAAT 3550 3560 3570 3580 3590 3600
1109 1099 1089 1079 1069 1059 dbpfl . TGACGTCCGTCATTACCCATACCAGCCATAGATGTTGTGATGAATGTTGGGTCACCAGAA
Ml II I! MMM II II I I II I 1111111111111 h i 3281 GTACGACC-ATCTAAACCCΛTAC ΓGCGATGGTTTC-AGTTGCCC-ACA.TTGGGTCACCAGAG 3610 3620 3630 3640 3650 3660
1049 1039 1029 1019 1009 999 dbpfl . TAAAGTT<_ATC-ATAAGCAGCTGClACGAGC-AAATTTC-aTTGTACGAAGCTTCAAAACGAAA
I ! I! Ml II III II Ml II I MMM Ml III M hi 3281 AATAATTGATCGTATTC-AGGTGTACGTAAGAAACGAACCATACGAAGTTTCATAACTAAG 3670 3680 3690 3700 3710 3720
989 979 969 959 949 939 dbpfl . TC-ATCAA.CAAATTCTTGAATTTCTTGTTC-AGTAAATGTTCCACGAGCAAGGTCACGTTCT
! MM! III I MM ! MMM hi3281 TGGTC-AACTAATTCTTGCGCTTαGTTTC-AGTAATTTTTCCTGCTTTTAAATCACGTTCG 3730 3740 3750 3760 3770 3780
929 919 909 899 889 879 dbpfl . GCzυ^GATATCAAGAACGATTGGAACACGTCCAAGTGAAGTTGCAGCTCCA-TTAATGAC
! I Ml! II I III II! II I III MMM II! II II! hi3281 ATGTAC-ACGTCAATAAAGGTTGCGGTACGACCGAATGACLA.TTGCAGC-ACC-ATTTTGTGAT 3790 3800 3810 3820 3830 3840
869 859 849 839 829 819 dbpfl . ACGACATACTGCC-ATATAAGCGATGTTAACCCATTGGATTGCTTCΓTTTACGTTCATAGC I I I II I MM I II I MMM II MMM I II i ll hi 3281 TTTA - TTGClAGCL?ΛGATAAGαWlAGTACATCCATTGAATGGCTTCTTGAGClATTAGTTGC 3850 3860 3870 3880 3890 3900
809 799 789 779 769 759 dbpfl . TGGACGTGAAACATCAAGACCΑTATAAGGCACCAAAGTTTAC-AACr-^CTTGCAAAGCTTG
Ml MM MM MM II I I I I I II II II I h i 3281 TGGGTTAGAAATATCATAACCATAGCTTGCTGCC-A.TTTGTTTTAATTGACCTAATGCACG 3910 3920 3930 3940 3950 3960
749 739 729 719 709 699 dbpfl . GTATTGCATATTAATTTCTTCTTTAAGACGAATGTTTTCTTCGTTAATTTCAGTGATTGC
II Ml MMM I MM II Ml I II I ! hi3281 GTGTTGTTCTGCGATTTCTTClACGTAAACGAATTGTTGCπTCLAAGATTT ACGCC 3970 3980 3990 4000 4010
689 679 669 659 649 639 dbpf 1. ATCCCLATTCTTTTGCTTTTTCCTTCATAAG - GTAATCAGCACCGTAAAGGGCAAGACGTG
III MM I UNI I I Ml I I I II I I I hi 3281 ATC - - - TTCTAAATCTTTTT - GTAAAGAAGAGAATTGTGCGTATTTATCTTTCATTAAGA 4020 4030 4040 4050 4060 4070
629 619 609 599 589 579 dbpfl . C-ATAGAC_ACCAATGATACGTCCACGAGAGTAAGCATCTGGCAAACCTGTTACAGTATGAG
II MMM I MM III II I Ml I M M hi3281 AATCTACACCATAAAGTGCTACACGACGGTAGTCACCGATGATACGACCACGACCATAAG 4080 4090 4100 4110 4120 4130 569 559 549 539 529 520 dbpf1. CAT- - -GACGTGCTTTACGAATTGCTGAAGTATAAGCACGAAAGAT-TCCATCATTTACA lll ll l l l I I II II Ml M II II M M ! hi3281 CATCTGGAAGACO-GTTAATACCCCAGATTTACGGCAACGTAAAATATCTGGCGTGTAAA 4140 4150 4160 4170 4180 4190
519 509 499 489 479 dbpfl . GAAGTCA- -TTGTTTGTG ACAAAACATCATGCAAGCCTGGGTCAACTGAG
III II III II II M I III III ! I M hi 3281 CATCGAATAC1ACCTTGGTTATGTGTTTTACGGTA - TT<_ΑGTGAAGATTTTTTTC_ACrTTT 4200 4210 4220 4230 4240 4250
469 459 449 439 429 419 dbpf 1. AGACCGTGTT - - C GTCAAAATCTTTTCAGCAACACGAAGTCCACCTCTTGGCATGAAAT
II I III I II II Ml II III MM II I II I hi3281 GGATCAAGTTCACGACCATAAACTTTACAAGAAC CTTCCACCATTTTG-ATACCAC 4260 4270 4280 4290 4300
409 399 389 379 369 359 dbpfl . TCAAGCGGAAAAGTTCGCTATTT TGCATCCCATAGATGAGTTCAAGTTCTTTATC
I! I I II I I III I Ml I II II III III II hi 3281 CGAATGG(_ATAATGGC1ACGTTTTAAAGGTTCATCAGTTTGA - - AGACCAACGATTTTTTC 4310 4320 4330 4340 4350 4360
349 339 329 319 309 dbpf 1. ATTAGCATCGATATATCCAGCAGGGATTTTATCAATAGAGG - TACGCGGTCAGT - - ATC
II hi3281 TAAATCTTTGTTAATGTAACCAGGTGCGTGAGAGATAATGGTAGATGGTGTATGTTCATC 4370 4380 4390 4400 4410 4420
299 289 279 269 259 249 dbpf1. GAAGGGAAATCCTACTT- -CTTC-GTAGTGATTTTTTGTATCTTCAAT AATTTTCTT
II III I I I I II I I Ml II Ml II I II I hi3281 AAAATCTAATGGCGCGTGAGTACGGTTTTCAATTTTAATACCTTCCATCACAGATTCCCA 4430 4440 4450 4460 4470 4480
239 229 219 209 199 189 dbpf1. TACTTTAAGTGTACGTTCTGTTGGCCCAGC-AAGAAAGCTTTCATCACCATCATATGGTT
I II III III MM II II MMM MM I! MM II hi3281 AAGCTTGGTTGTTGCTTCGGTTGGACCTGCTAAGAAAG-AGTCATCGCCTTCATAAGGGG 4490 4500 4510 4520 4530 4540
179 169 159 149 139 129 dbpfl . TGTAGTTTTCTTGTACAAAGCGAGTAACGCTTGCTTTATCGCGCC-AGTTGGTTCCTTTAA
I MMM III I Ml I II I I II II III! I I II II hi3281 TATAGTTTTTTTGGATAAAGTCACGTACIATTGACATTTTCTTGCCAATCGCCACCAGCAA 4550 4560 4570 4580 4590 4600
119 109 99 89 79 69 dbpfl . AACCATCCC-AAGCTTGTTCAAAGATATTTTCCGTAACTTCGGTT^
MM Ml II II II III I I II II! I I I I II I! hi3281 AACCAGCCCACGC CA- - -ATTTTTG .TTTCATTAAGTTCTGACATAGTCATTTC 4610 4620 4630 4640 4650
59 49 39 29 19 9 dbpfl . CTTTGTTCTTTCTGTTTC-AAATTAC-AACTTCATTATATCATAC-AGTTTGAGAATAGCAAA
MMM II hi3281 CTTTGTTAATTAATAAATAAATCTTTAATGTGTTTTGGTTAAATAACGTTGGAATACACC 4660 4670 4680 4690 4700 4710
dbpfl: complementary strand corresponding to nucleotides 1979-9 of SEQ ID NO: 15; hi3281: SEQ ID NO: 18
Table 3.4. Protein homology (FASTA, GCG Wisconsin Package Version 8, Genetics Computer Group) using the complete protein sequence derived from the L. lactis DB1341 pfl sequence shown in Table 3.2
Only the alignments of the L. lactis Pfl protein (dbpfl.pep) with the four best-scoring sequences are shown.
The Pfl protein of Streptococcus mutans was not recorded in the searched protein databases.
(Peptide) FASTA of: dbpfl.pep from: 1 to: 788 July 19, 1996 09:11
The best scores are: init1 initn opt
sw:pflb_ecoli P09373 escherichia coli. formate ac. 560 1498 1502
sw:pfl3_ecoli P42632 escherichia coli. probable f. 558 1358 1487
sw:pflb_haein P43753 haemophilus influenzae. form. 545 1228 1521
sw:pfl_chlre P37836 chlamydomonas reinhardtii. f. 163 259 306
sw:fasd_ecoli P46000 escherichia coli. outer memb. 53 113 75
sw:gtf2_strdo P27470 streptococcus downei (strept. 46 110 75
sw:frap_rat P42346 rattus norvegicus (rat). fkb. 42 101 53
sw:frap_human P42345 homo sapiens (human). fkbp-r. 42 101 53
dbpfl.pep
sw:pflb_ecoli
ID PFLB_ECOLI STANDARD; PRT; 759 AA.
AC P09373;
DE FORMATE ACETYLTRANSFERASE 1 (EC 2.3.1.54) (PYRUVATE FORMATE-LYASE 1) . . . .
SCORES Init1: 560 Initn: 1498 Opt: 1502
42.2% identity in 732 aa overlap
10 20 30 40 50 59 dbpfl . M.O'EVTENIFEQAWDGFKGTNWRD ASvTRFVQENYKPYDGDESFLAGPTERTLKV-KKI
:: ||:||: ::|:::::| h h II : I I : I I I II III : || I :: :|: pflb_e SEI ^E-α<ATAWEGFTKGDWQNEvNVPJ,FIQKNYTPYEGDESFIJAGATEATTTLWDKV
10 20 30 40 50
60 70 80 90 100 110 dbpfl . IEDTK-NHYEEVGFPFDTDRVTSIDKIPAGYIDANDKELELIYGMQNSELFRLNFMPRGG :|::| - =:= MM :::| = : Ml MM I |:|::: :: :::| || pflb_e MEGVKLENRTHAPVDFDTAVASTITSHDAGYI NKQLEKIVGLQTEAPLKRALIPFGG
60 70 80 90 100 110
120 130 140 150 160 170 dbpfl . LRVAEKILTEHGLSVDPGLHDVLSQTMTSVNDGIFRAYTSAIRKARHAHTVTGLPDAYSR : : : | : : : : : : | | : : :: : : : : : : | : | : | : | | : : | : | : : : : | | | | | | | : | pflb_e I.α^IEGSCKAYNRE--J_)PMI- CIFTEYRKTHNQGVFDVYTPDILRCRKSGVLTGLPDAYGR
120 130 140 150 160 170
180 190 200 210 220 230 dbpfl . GRIIGVYARLALYGADYLMKEKAKEWDAI -TEIN EENIRLKEEINMQYQALQEW
MM I 1 = 1111 IIIIIM ==::: :::: I : :| I I : I | |: | : :| | : : pf lb_e GRI IGDYP^VALYGIDYl-M-ΦKI-AQFTSLQADLENGVNLEQTIRLREEIAEQHRALGQMK 180 190 200 210 220 230 240 250 260 270 280 290 dbpfl . NFGALYGLDVSRPAMNVKEAIQWVNIAYMAVCRVINGAATSLGRVPIVLDIFAERDLARG = = = l II Ml II MMIIIh =:|:|: = Ml |:||::: ||:: Ml I pf lb_e EMAAKYGYDI SGPATNAQEAIQWTYFGYLAAVKSQNGAAMS FGRTSTFLDVYIERDLKAG 240 250 260 270 280 290 300 310 320 330 340 350 dbpfl . TFTEQEIQEFVDDFVLKLRTMKFARAAAYDELYSGDPTFITTSMAGMGNDGRHRVTKMDY Mill l|:||::|:||l = = l 1 = = = I I I 1 = I I I I = = hh = ||| III III :: pflb_e KITEQEAQEMVDHLVMKLRMVRFLRTPEYDELFSGDPI ATESIGGMGLDGRTLVTKNSF 300 310 320 330 340 350 360 370 380 390 400 410 dbpfl . RF]-J^LDTIGNAPEPNLTVLWDSKLPYSFKRYSMSMSHKHSSIQYEGVETMAKDGYGEMS hi :|ll|:|:||::|ll Ml::: :M : I ll : I I :: pflb_e RFl-NTLYTMGPSPEPNMTILWSEiαPLNFKKFAAKVSIDTSSLQYENDDLMRPDFNNDDY 360 370 380 390 400 410 420 430 440 450 460 470 dbpfl . CISCCVSPLDPENEEGRHNLQYFGARVNVLKAMLTGLNGGYDDVHKDYKVFDIEPVRDEI pf lb_e AIACCVSPMIVGKQ MQFFGARANLAKTMLYAINGGVDEKLKMQVGPKSEPIKGDV
420 430 440 450 460
Figure imgf000080_0001
540 550 560 570 580 590 dbpfl . A VDSLSAIKYAKV TLRDENGYIYDYEVEGDFPRYGEDDDRADDIAKLVMKMYHEK-LA
: |:|:|h:|::h:| hlhl ::::: :|:: pflb_e SVAADSLSAIKYAKVKPIRDEDGl-AIDFEIEGEYPQFGNNDPRVDDI-AVDLVERFMKKIQ 530 540 550 560 570 580 600 610 620 630 640 650 dbpfl . SHKLYKIIAEATVSLLTITSNVAYSKQTGNSPVHKGVF'LNEDGTVNKSKLEFFSPGANPSN
: : Ml :| M ::: I : :: : pf lb_e KLHTYRDAIPTQSVLTITSNWYGKKTGNTP DG- - RRAGAPFGPGANPMHG
590 600 610 620 630 660 670 680 690 700 710 dbpfl . KAKGG LQNLRSIA-OjEFKl-ANDGISLTTQVSPRALGKTRDEQVDNLVQILDGYFTPGAL
::= I :::| hill I 1=1111 I = 1=1111= = = Mh :=|||| ==| pflb_e RDQ GAVASLTSVAKLPFAYA- GISYTFSIVPNALG- DEVTWrNIiAGI-MDGYFHHEAS 640 650 660 670 680 690 720 730 740 750 760 770 dbpfl . INGTEFAGQIiVNI- ^DL.ΦvYDKIMRGEDVIVRISGYCvNTKYLTPEQKQELTERVFHE
\:\:: = =1 = I h = = pflb_e IEGGQHLNVNVMNREMLLDAMENPEKYPQLTIRVSGYAVRFNSLTKEQQQDVITRTFTQS 700 710 720 730 740 750
dbpfl: corresponds to amino acid residues 1-772 of SEQ ID NO: 16; pflb_e: corresponds to amino acid residues of SEQ ID NO: 14
dbpfl.pep
sw:pfl3_ecoli
ID PFL3_ECOLI STANDARD; PRT; 746 AA.
AC P42632;
DE PROBABLE FORMATE ACETYLTRANSFERASE 3 (EC 2.3.1.54) (PYRUVATE FORMATE- . . .
SCORES Init1: 558 Initn: 1358 Opt: 1487
39.8% identity in 741 aa overlap
10 20 30 40 50 dbpfl . MKTEVTENIFEQAWDGFKGTNWRDKASVTRFVQENYKPYDGDESFLAGPTERTLKV-K pf13_e MKVDIDTSDKLYADA LGFKGTD KNEINVRDFIQHNYTPYEGDESFLAEATPATTEL E 10 20 30 40 50 60
60 70 80 90 100 110 dbpfl . KIIEDTK-NHYEEVGFPFDTDRVTSIDKIPAGYIDANDKELELIYGMQNSELFRLNFMPR j : : j : : : : : ; : : | I| I| II pf13_e KVMEGIRIENATHAPVDFDTNIATTITAHDAGYI - - -NQPLEKIVGLQTDAPLKRALHPF 70 80 90 100 110
120 130 140 150 160 170 dbpfl . GGLRVAEKILTEHGLSVDPGLHDVLSQTMTSVNDGIFRAYTSAIRKARHA..TVTGLPDAY
I I i pfl3_e GGINMIKSSFHAYGREMDSEFEYLFTDLRKTHNQGVFDVYSPDMLRCRKSGVLTGLPDGY 120 130 140 150 160 170
180 190 200 210 220 230 dbpfl . SRGRIIGVYARLALYGADYLMKEKAKEWDAI TEINEENIRLKEEINMQYQALQE
MUM I hllll :||: = h= = = = = = =1 h = ||| = lh= = =11 = pfl3_e GRGRIIGDYRRVALYGISYLVRERELQFADLQSRLEKGEDLEATIRLREELAEHRHALLQ 180 190 200 210 220 230
240 250 260 270 280 290 dbpfl . VVNFGALYGLDVSRPAMNVKEAIQWVNIAYMAVCRVINGAATSLGRVPIVLDIFAERDLA
= :: = l Ihhllll h:|hlh =lhh = 11 = 1 III h= Ml = I I! : pfl3_e IQEMAAKYGFDISRPAQNAQEAVQVrLYFAYLAAVKSQNGGAMSLGRTASFLDIYIERDFK 240 250 260 270 280 290
300 310 320 330 340 350 dbpfl . RGTFTEQEIQEFvDDFVLKLRTMKFARAAAYDELYSGDPTFITTSMAGMGNDGRHRVTKM
|:::||: lh = hl=:| = l =M h = = :|:hlllh= h = = l!l III ||| pfl3_e AGVLNEQQAQELIDHFIMKIRMVRFLRTPEFDSLFSGDPIWATEVIGGMGLDGRTLVTKN 300 310 320 330 340 350
360 370 380 390 400 410 dbpfl . DYRFI-NTLDTIGNAPEPNLTVLWDSKLPYSFKRYSMSMSHKHSSIQYEGVETMAKDGYGE
= = l = hll = | = | MMM hl|:::| I =11 = 1= =1 I hi 11= = I =! = = pfl3_e SFRYLHTLHTMGPAPEPNLTILWSEELPIAFKKYAAQVSIVTSSLQYENDDLMRTDFNSD 360 370 380 390 400 410
420 430 440 450 460 470 dbpfl . MSCISCCVSPLDPENEEGRHNLQYFGARVNVLKAMLTGLNGGYDDVHKDYKVFDIEPVRD hi Nil: ": = I = Ml I = h I = = 1 = = I II h I = = = | = = | pf13_e DYAIACCVSPMVIGKQ MQFFGARANLAKTLLYAINGGVDEKLKIQVGPKTAPLMD 420 430 440 450 460 470 480 490 500 510 520 530 dbpfl . EII-DYDTVMENFDKSI-D LTDTYVDAMNII--YMTDKYNYEAVQMAFLPTKVRANMGFGIC
= = lllhll = = = h :||h h = llhlll 11= =1 =1= II pfl3_e DVIXiYDKvϊΦS]-JDH-T!mWI_AVQYISA-- iHYMHD 480 490 500 510 520 530
540 550 560 570 580 590 dbpfl . GFANTVDSLSAIKYAKVKTLRDENGYIYDYEVEGDFPRYGEDDDRADDIAKLVMKMYHEK l = = ::|||||||lhl|::||lll :|::| = I I ::1 = |: I : I I = = = :: :| pfl3_e GLSVATDSLSAIKYARVKPIRDENGLAVDFEIDGEYPQYGNNDERVDSIACDLVERFMKK 540 550 560 570 580 590
600 610 620 630 640 650 dbpfl . I-ASH- LYIOSIAEATVSLLTITSNVAYSKQTGNSPVHKGVFLNEDGTVNKSKLEFFSPGANP
= = hi I :| hMIM = h = = !lhl I! = • = 1 = = = pf13_e IKALPTYRNAVPTQSILTITSNWYGQKTGNTP DG- -RRAGTPFAPGANPM 600 610 620 630 640
660 670 680 690 700 710 dbpfl . SNKAKGGWLQNLRSLAKLEFKDANDGISLTTQVSPRALGKTRDEQVDNLVQILDGYFTPG
= = = = I =: = l hill h hull I = I lllh = =111 =1111! = = pfl3_e HGRDRKGAVASLTSVAKLPFTYAKDGISYTFSIVPAALGKEDPVRKTNLVGLLDGYFHHE 650 660 670 680 690 700
720 730 740 750 760 770 dbpfl . ALINGTEFAGQHVNLNVMDLKDVYDKIMRGEDVIVRISGYCVNTKYLTPEQKQELTERVF
I ::|:: : :| = | |=== pfl3_e ADVEGGQHLNVNVMNREMLLDAIEHPEKYPNLTIRVSGYACASTH 710 720 730 740
dbpfl: corresponds to amino acid residues 1-770 of SEQ ID NO: 16; pfl3_e: SEQ ID NO: 19
dbpfl.pep
sw:pflb_haein
ID PFLB_HAEIN STANDARD; PRT; 769 AA.
AC P43753;
DE FORMATE ACETYLTRANSFERASE (EC 2.3.1.54) (PYRUVATE FORMATE-LYASE) . . . .
SCORES Init1: 545 Initn: 1228 Opt: 1521
42.1% identity in 781 aa overlap
10 20 30 40 50 59 dbpfl . MKTEVTENIFEQAWDGFKGTNWRDKASVTRFVQENYKPYDGDESFLAGPTERTLKV-KKI
I IN I |::|:::::| 1 = 1 = 1 hi hi hill III II I 1= = = = pflb_h SEI-NEMQ- -AAGFAGGDWQENVNVRDFIQKNYTPYEGDDSFLAGPTEATTKL ESV 10 20 30 40 50
60 70 80 90 100 110 dbpfl . IEDTK-NHYEEVGFPFDTDRVTSIDKIPAGYIDANDKELELIYGMQNSELFRLNFMPRGG
:|::| == === = ll== ==l = ==lll =1=11 I hl==! == ==ll II pflb_h MΞGIKIENRTHAPLDFDEHTPSTIISHAPGYI NKDLEKIVGLQTDEPLKRAIMPFGG 60 70 80 90 100 110
120 130 140 150 160 170 dbpfl . LRVAEKILTEHGLSVDPGLHDVLSQTMTSVNDGIFRAYTSAIRKARHAHTVTGLPDAYSR
::::| = =1 ==ll ======= === 1=1=1 =lh=l = !== ==lllll!h! pflb_h IKMVEGSCKVYGRELDPKVKKIFTEYRKTHNQGVFDVYTPDILRCRKSGVLTGLPDAYGR 120 130 140 150 160 170 180 190 200 210 220 230 dbpfl . GRIIGVYARLALYGADYLMKEKAKEWDAI TEIN-EENIRLKEEINMQYQALQEW
MM I hill hi = 111 = 1 = = = = = =::| |::|M = llh h:|l = = pflb_h GRIIGDYRRVALYGVDFLMKDKYAQFSSLQKDLEDGVNLEATIRLREEIAEQHRALGQLK 180 190 200 210 220 230
240 250 260 270 280 290 dbpfl . NFGALYGLDVSRPAMNVKEAIQWVNIAYMAVCRVINGAATSLGRVPIVLDIFAERDLARG
= = = l II hhll h = llllh =lhh = III I h I h = = = h = I II I I pflb_h QMAASYGYDISNPATNAQEAIQWMYFAYLAAIKSQNGAAMSFGRTATFIDVYIERDLKAG 240 250 260 270 280 290
300 310 320 330 340 350 dbpfl. TFTEQEIQEFVDDFVLKLRTMKFARAAAYDELYSGDPTFITTSMAGMGNDGRHRVTKMDY
= = ll 1 lhlh:|:||| ::| |:::||: |:|IN = I = - = I I I I III III = = pflb_h KITETEAQELVDHLVMKLRMVRFLRTPEYDQLFSGDPMATETIAGMGLDGRTLVTKNTF 300 310 320 330 340 350
360 370 380 390 400 410 dbpfl . FLNTLDTIGNAPEPNLTVLWDSKLPYSFKRYSMSMSHKHSSIQYEGVETMAKDGYGEMS hhll = = h = lllllhlh = = ll =llh= = = l = I hill = = I I pflb_h RILHTLYNMGTSPEPNLTIL SEQLPENFKRFCAKVSIDTSsVQYENDDLMRPDFNNDDY 360 370 380 390 400 410
420 430 440 450 460 470 dbpfl . CISCCVSPLDPENEEGRHNLQYFGARVNVLKAMLTGLNGGYDDVHKDYKVFDIEPVRDEI hllllh = = = =1 = 1111 = 1= h:| = = lll h = = = h 11 = pflb_h AIACCVSPMIVGKQ MQFFGARANLAKTLLYAINGGIDEKLGMQVGPKTAPITDEV 420 430 440 450 460
480 490 500 510 520 530 dbpfl . LDYDTVMENFDKSIii LTDTY DAMNIIHYMTDKYNYEAVQMAFLPTKVRANMGFGICGF lh!llh: = h MIN: Ihhhllll llhllh 11= =1 =h II 1 = pflb_h LDFDTVMTRMDSFMDWLAKQYVTAIiNVIHYMHDKΥSYEAALMALHDRDVYRTMACGIAGL 470 480 490 500 510 520
540 550 560 570 580 dbpfl . ANTVDSLSAIKYAKVKTLR DENGYI YDYEVEGDFPRYGEDDDRADDIAKL
= ::|| II III II II MM h = l = h I = 11 = = 1 = II = = 1 = h II II pflb_h SVAADSLSAIKYAKVKPVRGDIKDKDGNVVATNVAIDFEIEGEYPQYGNNDNRVDDIACD 530 540 550 560 570 580
590 600 610 620 630 640 dbpfl . VMKllYHEiα.MHKLYiαAEATVSLLTITSNVAYSKQTGNSPVHKGVFI-l^DGTVN SKLE
= = = = = =1 = = = I hi I =1 1 = 1111111 = 1 = 1 = 111 = 1 I! = = pf lb_h LVERFMK IQKLKTYRNAVPTQSVLTITSNWYGKKTGNTP DGRRAGAP- - 590 600 610 620 630
650 660 670 680 690 700 dbpfl . FFSPGANPSN-KAKGGVπ-QNLRSLA-OiEFIΦANDGISLTTQVSPRALGKTRDEQVDNLVQ hlllll = = = = I :::! hill I 1 = 1111 I = hlllh ::| 11 = pf lb_h - FGPGANPMHGRDQKGAVASLTSVAKLPFAYAKDGISYTFSIVPNALGKDAEAQRRNLAG 640 650 660 670 680 690
710 720 730 740 750 760 dbpfl . IliDGYFTPGALINGTEFAGQHVNI-NVMDLKDVYDKIMRGEDVIVRISGYCVNT YLTPEQ
I I pflb_h LMDGYFHHEATVEGGQHLNVNV-LNREMLLDAMENPDKYPQLTIRVSGYAVRFNSLTKEQ 700 710 720 730 740 750 770 780 dbpfl. KQELTERVFHEVLSNDDEEVMHTSNIX (SEQ ID NO: 16)
=l====hl I = pflb_h QQDVITRTFTESM (SEQ ID NO: 20) 760
dbpfl.pep
sw:pfl_chlre
ID PFL_CHLRE STANDARD; PRT; 195 AA.
AC P37836;
DE FORMATE ACETYLTRANSFERASE (EC 2.3.1.54) (PYRUVATE FORMATE-LYASE) . . .
SCORES Init1: 163 Initn: 259 Opt: 306
38.0% identity in 213 aa overlap
540 550 560 570 580 590 dbpfl. NTvDSLSAIKYA VTLRDENGYIYDYEVEGDFPRYGEDDDRADDIAKLVMiαyrYHEKLAS hlhlhlllhhlh h= = =111 = pfl_Ch GSFPKYGNDDDRVDEIAE WSTFSSKLAK
10 20 30
600 610 620 630 640 650 dbpfl . H-OiYiαiAEATVSLLTITSNVAYSKQTGNSPVHKGVFLNEDGTVNKSKLEFFSPGANP - SN
= = \ : \ = I = I = I I I I I I I = I = 1 = I h = I I I : : | I M I N I = pfl_ch QHTYRNSVPTLSVLTITSNWYGKKTGSTP DG RKKGEPFAPGANPLHG
40 50 60 70
660 670 680 690 700 710 dbpfl. KAKGGVπQNLRSl-AKLEFKDANDGISLTTQVSPRALGK-TRDEQVDNLVQILDGYFTPGA
= = I h = hl = M == Ml I = h = lh = :|:::||: lllllh h pfl_ch RDAHGALASLNSVAKLPYTMCLDGISNTFSLIPQVLGRGGEHERATNLASILDGYFANGG 80 90 100 110 120 130
720 730 740 750 760 770 dbpfl. LINGTEFAGQHVNLNVMDLKDVYDKIMRGEDVIVRISGYCVNTKYLTPEQKQELTERVFH ::: :: : ::::: : | = = = = ! = Ill 1= Ihlh h = = hli pfl_ch HHINVNVLNRSMLMDAVEHPEKY PNLTIRVSGYAVHFARLTREQQLEVIARTFH
140 150 160 170 180 190
780 dbpfl. EVLSNDDEEVMHTSNIX (corresponding to amino acid residues 535-788 of SEQ ID NO:16) pfl_ch DTM (SEQ ID NO: 21)
The highest homology value obtained when analysing the sequence from clone pfl1 corresponds to the S. mutans pfl gene (Table 3.1), i.e. about 80% at the DNA level in the region covered by the probe used for library screening, and 68.5% for the 1.1 kb pfl fragment analyzed. Sequence comparisons indicated that the fragment included in clone pfl1 encompasses 367 amino acids of the C-terminal region of the L. lactis pfl gene. Therefore, about 1.3 kb of the 5'-end of the pfl gene was lacking.
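The identity percentages quoted in the tables and in the paragraph above (for example 71.8% identity in a 2608 bp overlap) are ratios of identical columns to aligned columns. The following Python sketch shows one common way to compute such a value from two gapped, equal-length alignment rows. The function name `percent_identity` and the choice to count gapped columns in the denominator are assumptions made for illustration; the exact convention used by the GCG FASTA program is not stated in this document.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (percent identity, overlap length) for two aligned rows.

    Assumption: the overlap counts every column in which at least one
    row carries a residue, so gapped columns lower the identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    overlap = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column empty in both rows, not part of the overlap
        overlap += 1
        if a == b:
            matches += 1  # identical residues (neither can be a gap here)
    if overlap == 0:
        raise ValueError("the rows share no aligned columns")
    return 100.0 * matches / overlap, overlap


# Toy rows, not taken from Table 3.3: 5 identical columns out of 7.
identity, overlap = percent_identity("GAAC-TT", "GATCATT")
print(f"{identity:.1f}% identity in {overlap} bp overlap")
```

With this convention, a reported value such as 55.4% in a 1961 bp overlap would correspond to roughly 1086 identical columns.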
A 0.6 kb PstI-EcoRI fragment of clone pfl1, extending from the polylinker (PstI site) and including positions 1342-2003 of the sequence shown in Table 3.2, was randomly labelled and used to screen a λZAP genomic library of strain DB1341 (Sambrook et al., 1989) in order to obtain the upstream region of the pfl gene. High-stringency hybridization (washing steps at 65°C, 2 x 30 min in 2 x SSC, then 1 x 30 min in 0.1 x SSC; 0.1% SDS) resulted in the isolation of twelve positive clones.
Sequence analysis of clones pfl9, pfl10, pfl19 and pfl20 showed that they included the same pfl fragment as did clone pfl1. Restriction analysis of the above clones showed that they all contained a 460 bp Sau3AI fragment identical to that of pfl1 (position 1342-1798 in Table 1.2). Only clone pfl14 showed a different Sau3AI restriction pattern. This clone lacked the above Sau3AI fragment and had a 600 bp fragment that hybridized to the PstI-EcoRI pfl probe, suggesting that rearrangement of the insert occurred during in vivo excision of the plasmid. Sequence analysis of pfl14 confirmed that it included a pfl fragment that lacked the Sau3AI site at position 1 in clone pfl1, but showed sequence identity from position 30 onwards in clone pfl1 (position 1372 in Table 3.2). It is therefore likely that the presence of an intact L. lactis pfl gene is toxic in E. coli and leads to plasmid rearrangement.
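The restriction analysis described above can be mimicked in silico by scanning a sequence for enzyme recognition sites and measuring the distances between them. The Python sketch below is illustrative only: the recognition sequences are the standard ones for the enzymes named in this text and are not given in the document itself, and the helper names `find_sites` and `fragment_lengths` are hypothetical. All six sites are palindromic, so scanning one strand is sufficient.

```python
from typing import List

# Standard recognition sequences (5'->3'); assumed, not quoted from the source.
RECOGNITION_SITES = {
    "Sau3AI": "GATC",
    "HhaI": "GCGC",
    "EcoRI": "GAATTC",
    "PstI": "CTGCAG",
    "XhoI": "CTCGAG",
    "BamHI": "GGATCC",
}


def find_sites(seq: str, enzyme: str) -> List[int]:
    """Return 1-based start positions of every recognition site in seq."""
    site = RECOGNITION_SITES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)  # convert to 1-based numbering
        start = seq.find(site, start + 1)  # step by 1 to catch overlapping sites
    return positions


def fragment_lengths(seq: str, enzyme: str) -> List[int]:
    """Distances between consecutive site starts (internal fragments only)."""
    positions = find_sites(seq, enzyme)
    return [b - a for a, b in zip(positions, positions[1:])]


# First 50 nucleotides of the upstream sequence in Table 3.5,
# which starts at an HhaI site.
upstream_start = "GCGCCTAGATAAGAAACAGCAACAGCTAAAAGATAGGTATCAAAAGCACT"
print(find_sites(upstream_start, "HhaI"))  # [1]
```

Applied to a complete clone sequence, `fragment_lengths(seq, "Sau3AI")` would list the site-to-site distances that could be compared with an observed band such as the 460 bp Sau3AI fragment.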
4. Inverse PCR to obtain the complete pfl sequence of L. lactis DB1341
To facilitate the characterization of the 5' region of the L. lactis pfl gene from strain DB1341, inverse PCR was used. EcoRI-digested genomic DNA of strain DB1341 was religated at low concentration (Sambrook et al., 1989) and PCR was carried out using primers pfl1-250 and pfl1-390 (see Fig. 4). A 1.6 kb fragment that contained the missing 421 codons and the upstream region of the L. lactis pfl sequence (positions 1 to 1342 in Table 3.2) was amplified. This PCR fragment was re-amplified from EcoRI-digested and religated DB1341 DNA using modified primers pfl1-250 (including an XhoI site at the 5'-end) and pfl1-390 (including a BamHI site at the 5'-end). The amplified product was digested with XhoI and BamHI and ligated into vector pGE digested with the same enzymes. Transformation of E. coli DH5α resulted in strain pflup-1. The L. lactis DB1341 pfl gene encodes a 787 amino acid protein (Tables 3.2, 3.4 and 3.6) with a deduced molecular weight of 89.1 kDa.
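The logic of the inverse PCR step can be sketched as follows: a restriction fragment is circularized by self-ligation, and two primers that point away from each other in the known region then amplify the unknown flanking DNA across the ligation junction. The Python code below is a simplified model under stated assumptions: the fragment is given as a single top strand that begins with `AATTC` and ends with `G` (so that circularization regenerates an EcoRI site), the primers match exactly once, and all sequences and names (`inverse_pcr_product`, the toy primers) are hypothetical rather than the pfl1-250 and pfl1-390 primers of the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverse_pcr_product(fragment: str, outward_fwd: str, outward_rev: str) -> str:
    """Top-strand product amplified from a self-ligated (circular) fragment.

    outward_fwd anneals near the right end of the known region and extends
    towards the fragment's right end; outward_rev is the reverse complement
    of a site near the left end of the known region and extends leftwards.
    """
    fragment = fragment.upper()
    fwd_start = fragment.find(outward_fwd.upper())
    rev_site = reverse_complement(outward_rev)  # site as read on the top strand
    rev_start = fragment.find(rev_site)
    if fwd_start == -1 or rev_start == -1:
        raise ValueError("a primer does not anneal to the fragment")
    rev_end = rev_start + len(rev_site)
    if rev_end > fwd_start:
        raise ValueError("primers do not face outward")
    # Read from the forward primer to the fragment end, cross the ligation
    # junction, and continue from the fragment start to the reverse site.
    return fragment[fwd_start:] + fragment[:rev_end]


# Toy EcoRI fragment: unknown flanks surround a known central region.
rev_site, fwd_site = "ACGTTCAGAC", "TGGATCTGCA"
toy_fragment = ("AATTC" + "TTTTGGGGCC" + rev_site + "CCCTTTCCC"
                + fwd_site + "AAAACCGGTT" + "G")
product = inverse_pcr_product(toy_fragment, fwd_site, reverse_complement(rev_site))
print(len(product), "GAATTC" in product)
```

In this model the product contains both unknown flanks joined at the regenerated EcoRI site, which is why sequencing such a product can reveal the region upstream of the cloned fragment.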
A sample of E. coli DH5o. strain pflup-1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg lb, D-38 124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11087.
5. Cloning of the pfl upstream sequence from L. lactis DB1341
Inverse PCR was carried out on HhaI-digested and religated chromosomal DNA of strain DB1341, using primers derived from the above sequence (Table 3.2). The HhaI fragment spans about 1.7 kb, from position 1 to 1707 in the sequence below, which overlaps the sequence shown in Table 3.2 from position 1563 to 1750.
Table 3.5. pfl upstream sequence from L. lactis DB1341
Hhal
1 GCGCCTAGATAAGAAACAGCAACAGCTAAAAGATAGGTATCAAAAGCACT 50 51 TGATTTAAAAATAATGACTTTATCCGATTTTTTGATTCCCAACTCAGATA 100
101 AGAGACTTGCCTTATCAA(_ΑΛTTGCTTGATGAGTCTTTTGGTAAGTCGTT 150
151 TCAAGAGCTAGTTCGGGGAAAGCTCCAACAGCCTCATCAAAGATAATTGG 200
201 GCTATC.AGGAAACTGTTCAGCTGATTTTTTAAAGTTTAGATACAAATTTA 250 251 GGGGTTCGTGTTTGAATTTO ^AAAAATCTCCTCAAGTTAATAAGTTTA 300 301 TTATATCAC1AAAGTATTCTTTAGACC-AATAGTTAATGTAAATGTTTTCTT 350 351 AAGTCGTAGAGAATAAAATTCTCGGAAAAAAAGTCTAAAATCTGCTACAA 400 401 TTAAAGGGACA ΓAAGAGGATTCC-AATCCTCTTTTATCAGGAAAAGAAGG 450 451 GATAGATAGGAAAATGATTAAAAATTATGAACTATCCAACGAAAAAAAAT 500 orfA M I K N Y E L S N E K K L 501 TAATTTCAACCTCTGAAATGAAGAATTTCACCTATGTTCTCAATCCAACA 550
I S T S E M K N F T Y V L N P T 551 CGTGAAGAAATTGGGAATATTTCTGAATACTATGACTTCCCTTTTGACTA 600 R E E I G N I S E Y Y D F P F D Y
601 TTTATCAGGAATTTTGGATGACTATGAAAATGCCCGTTTTGAAACAGATG 650
L S G I L D D Y E N A R F E T D D 651 ATAATGATAATAATCTGATTCTCTTACAATATCCTCCACTCTCTAATTAT 700 N D N N L I L L Q Y P P L S N Y 01 GGAGAAGTGGCGA<ZTTTTCCATATTCTTTGGTTTGGACTAAAAATGAATC 750 G E V A T F P Y S L V W T K N E S 751 GGTTATTTTAGCACTTAATCATGAGATTGATAATGGCTTAATTTTCGAGC 800
V I L A L N H E I D N G L I F E R 801 GTGAATATGATTATAAACGCTAC-AAACΛTαΛGTTATTTTTCAAGTGATG 850 E Y D Y K R Y K H Q V I F Q V M
851 TATO=-AATGA .C-ACACTTTCCATGATTATTTGAGAGATTTCCGAAC-AAG 900
Y Q M T H T F H D Y L R D F R T R
901 GCGTCGCAGACTTGAACAGGGAATC_?UUWΛTTαΛCAAAGAACGACCAAA 950
R R R L E Q G I K N S T K N D Q I 951 TTGTTGATTTGATTGCCATTC1AAGCLAAGTTTAATTTATTTTGAAGATGCC 1000
V D L I A I Q A S L I Y F E D A
1001 TTGCACAATAATATGC-AAGTACTTCAGGATTTTATTGATTACTTGAGAGA 1050
L H N N M Q V L Q D F I D Y L R E 1051 AGATGATGAAGACGGTTTTGCTGAAAAGATTTATGATATTTTTGTCGAAA 1100 D D E D G F A E K I Y D I F V E T
1101 CAGACCAAGCTTATAC-AGAAACCAAGATTCAGCTCAAGTTACTAGAAAAT 1150
D Q A Y T E T K I Q L K L L E N 1151 CTCCGAGATTTGTTCTCAAACAATGTCTCTAATAACTTGAACATTGTCAT 1200 L R D L F S N N V S N N L N I V M 1201 GAAAAT<_ΛTGACΑTCAGCTACTTTCGTTCTAGGGATTCCTGCAGTAATTG 1250
K I M T S A T F V L G I P A V I V 1251 TTGGTTTTTACGGAATGAATGTTCCAATTCCTGGTCAAAATTTTAATTGG 1300
G F Y G M N V P I P G Q N F N W 1301 ATGGTTTGGCTTATTTTAGTTCTAGGAATTTTATTATGTGTTTGGGTCAC 1 50 M V W L I L V L G I L L C V W V T
1351 TTGGTGGTTACATAAAAAAGATATGTTATAAAATGGAGAAAAATCTCCAT 1400
W L H K K D M L Stop (SEQ ID NO: 35) 1401 TTTTTTGCTCTTTGTGAAAAAATTAATTAGTGATTGCAGATTATGAAGTT 1450 1451 AGCAATGTTTGTTAAAACTATTTTGTGAATTATTTATGAAAACGTTTTAA 1500 1501 AAAAGTATAACAGATATTAAAATAATTGGAACTGTATTAGTAAAGAATCT 1550
EcoRI 1551 GTAATTTCTCTTGAATTCTGTTTGCTATTCTCAAACTGTATGATATAATG 1600 1601 AAGTTGTAATTTGAAAC-AGAAAGAA(_-Z_AAGGAGATTTCAAAATGAAAACC 1650 pfl M K T 1651 GAAGTTACGGAAAATATCITTGAACAAGCTTGGGATGGTTTTAAAGGAAC 1700 E V T E N I F E Q A D G F K G T Hhal 1701 CϊACrGGCGCGATAAAGC-AAGCGTTACTCGCTTTGTACAAGAAAACTACA 1750
N R D K A S V T R F V Q E N Y K
Nucleotides 1-1750: SEQ ID NO: 34
The sequence included an open reading frame, designated orfA, encoding a putative 37 kDa protein with no relevant homology to any sequence in available databases.
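The reading frame annotated as orfA in Table 3.5 can be checked by translating the nucleotide sequence with the standard genetic code. The following is a minimal Python sketch, not the procedure used in this work: the function name `translate` and the variable names are hypothetical, and the test sequence is the first 36 nucleotides of orfA as printed in Table 3.5, beginning at the ATG start codon.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate reading frame 1 until a stop codon or the end of the sequence.

    An incomplete trailing codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 12 codons of orfA, copied from Table 3.5 (starting at the ATG).
ORFA_START = "ATGATTAAAAATTATGAACTATCCAACGAAAAAAAA"

print(translate(ORFA_START))  # MIKNYELSNEKK
```

The printed peptide agrees with the first twelve residues annotated beneath the nucleotide sequence in Table 3.5 (M I K N Y E L S N E K K).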
EXAMPLE 4
Characterization of L. lactis orfA encoding a putative transporter protein
In Gram-negative bacteria, the pfl gene is located downstream of an open reading frame transcribed with focA that codes for a putative membrane-bound formate transporter (Suppmann and Sawers 1994). This genetic organization is conserved in E. coli and H. influenzae but has shown great variation in streptococci (Arnau et al. 1997). In L. lactis, the orfA gene is located immediately upstream of pfl. An open reading frame that showed no homology to the L. lactis orfA is also found upstream of the pfl gene in Streptococcus mutans.
In E. coli, growth under anaerobiosis results in the synthesis of large amounts of PFL protein, about 3% of the total protein content (Suppmann and Sawers 1994). Consequently, high amounts of formate are formed intracellularly. At physiological intracellular pH in E. coli, formate (low pKa, 3.75) is dissociated, i.e. present as the charged anion, and is therefore not membrane-permeable. Thus, there is a requirement for a specific transporter to remove the excess formate from the cells.
In the following, the novel orfA gene of L. lactis and its gene product are characterized.
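The charge state of formate at a given pH follows from the Henderson-Hasselbalch relation. The sketch below is an illustration only: the pKa of 3.75 is taken from the text, whereas the intracellular pH of 7.5 is an assumed, illustrative value that is not stated in this document, and the function name is hypothetical.

```python
def ionized_fraction(ph: float, pka: float) -> float:
    """Fraction of a monoprotic acid present as the anion at the given pH.

    Derived from the Henderson-Hasselbalch relation:
    pH = pKa + log10([A-]/[HA])  =>  [A-]/([A-] + [HA]) = 1 / (1 + 10**(pKa - pH))
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


FORMATE_PKA = 3.75                 # value given in the text
ASSUMED_INTRACELLULAR_PH = 7.5     # assumption for illustration only

fraction = ionized_fraction(ASSUMED_INTRACELLULAR_PH, FORMATE_PKA)
print(f"Formate present as anion: {fraction:.4%}")
```

Under these assumptions more than 99.9% of the formate carries a negative charge, which is why passive diffusion across the membrane is negligible and a dedicated transporter is needed.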
1. The orfA gene structure, protein homology and structure
Sequence analysis of orfA (see Table 3.5 above) showed a "weak" RBS (AGG) and a consensus -10 promoter region upstream of the ATG start codon. No -35 consensus region was identified, suggesting a low expression level for this gene. The deduced protein encoded by orfA, consisting of 306 amino acids and having a size of 37 kDa, showed homology (38% identity at the C-terminus) to a 37 kDa putative lactococcal protein (Donkersloot and Thompson 1995) and, to a lesser extent, to numerous membrane-bound transporter proteins. A prediction of the structure of OrfA suggested the presence of a large intracellularly located N-terminal region followed by two transmembrane domains, Leu242 to Phe265 and Asn276 to Val294 (Fig. 6). These features are consistent with a possible role of the protein in transport across the cell membrane, although neither sequence homology nor structural similarities with the E. coli FocA protein could be identified. A molecular prediction of the FocA protein showed the presence of six transmembrane domains, but among the related proteins a certain variation in the number of these domains is found. In fact, one of these proteins, the E. coli NirC, has four and not six of these domains in its primary sequence (Suppmann and Sawers 1994).
2. Expression of orfA
RNA was isolated from aerobic and anaerobic cultures of L. lactis MG1363 grown in fermenters at 30°C. Using an orfA-specific probe (Fig. 7A), Northern blot hybridization was carried out. As shown (Fig. 7B), a low level of expression was observed under the conditions used, which is in agreement with the sequence analysis (lack of -35 region, short RBS) of the upstream region of orfA and with the level of expression expected for a gene encoding a membrane-associated protein.
No anaerobic induction was observed in GM17 or GalM17 during exponential growth. In GM17 a lower expression of orfA was detected as compared to GalM17 and virtually no expression of the gene was observed during stationary phase.
3. Construction and analysis of orfA mutant strains in L. lactis MG1363.
In order to determine whether orfA is the focA analogue in L. lactis, two mutant strains of MG1363 were constructed. A null mutation was carried out by gene disruption using an internal fragment of the orfA gene (including codons 30-168, Fig. 7A), cloned into the integrative vector pSMA500 and transformed into MG1363. One transformant (MG1363ΔorfA) that formed light blue colonies on X-gal was selected. An orfA multicopy strain was constructed by cloning the entire coding sequence and promoter region of this gene in pAK80 and transforming into MG1363. As above, a transformant giving blue colonies on X-gal was selected (MG1363 pAK80::orfA).
In E. coli, a focA null mutant strain was capable of growing at higher sodium hypophosphite concentrations than was the wild type strain. This compound is a toxic formate analogue. Thus, transport of hypophosphite into the cytosol via the FocA channel protein is deleterious for the cells (Suppmann and Sawers 1994). If the OrfA protein has a similar function in L. lactis as does FocA in E. coli, then a null mutant should show an increased resistance to hypophosphite and a strain containing multiple copies of the gene should be more sensitive to this compound than the wild type. As shown in Fig. 8, strain MG1363 showed reduced growth when the medium was supplemented with 500 mM of hypophosphite and it did not grow at 600 mM.
MG1363ΔorfA grew at 600 mM and was unable to grow at higher concentrations. The orfA multicopy strain, MG1363 pAK80::orfA, was completely unable to grow at 500 mM hypophosphite. Thus, these results indicate that OrfA may represent a formate transporter protein in L. lactis.
The mutant strains constructed included a translational fusion of the orfA gene to the lacLM reporter gene (Madsen et al. 1996). The effect of the addition of formate to the medium on the expression of orfA was studied. To exclude a possible toxic effect of the addition of formate to the medium, a dose-response curve was studied. Growth inhibition of the wild type strain was observed at formate concentrations exceeding 10 mM. Exponentially growing cultures (OD600 about 1) were used to measure β-galactosidase after the addition of 10 mM of formate to the growth medium. As shown in Table 4.1 below, similar levels of β-galactosidase were observed in MG1363ΔorfA independently of the addition of formate or the growth conditions.
Table 4.1. Analysis of orfA expression in mutant strains of L. lactis (a)

                          Aerobic                     Anaerobic
STRAIN                    +Formate     -Formate       +Formate     -Formate
MG1363ΔorfA               9.1 ± 0.3    8.2 ± 0.7      7.5 ± 0.7    6.2 ± 0.1
MG1363 pAK80::orfA        14.6 ± 0.2   16.7 ± 0.7     13.2 ± 0.1   13.2 ± 1.3

a) β-galactosidase activity in exponentially growing cultures. At OD600 about 1, formate was added (+ formate) and the cultures were incubated further for 15 min before cells were separated by centrifugation and frozen.
Higher levels were observed in all cases with the multicopy strain MG1363 pAK80::orfA. These levels, about 2-fold higher, did not correlate with the number of copies (5-10 per cell) expected in this strain. A degree of regulation of expression may exist for orfA in L. lactis to ensure an appropriate level of the OrfA protein.
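The "about 2-fold" statement can be reproduced from the mean activities listed in Table 4.1. The following Python sketch is illustrative only; the dictionary keys and variable names are hypothetical labels for the four conditions, and the error values from the table are not propagated.

```python
# Mean beta-galactosidase activities copied from Table 4.1.
null_mutant = {            # MG1363ΔorfA
    "aerobic +formate": 9.1,
    "aerobic -formate": 8.2,
    "anaerobic +formate": 7.5,
    "anaerobic -formate": 6.2,
}
multicopy = {              # MG1363 pAK80::orfA
    "aerobic +formate": 14.6,
    "aerobic -formate": 16.7,
    "anaerobic +formate": 13.2,
    "anaerobic -formate": 13.2,
}

# Ratio of multicopy strain to single-copy fusion strain for each condition.
fold_change = {
    condition: multicopy[condition] / null_mutant[condition]
    for condition in null_mutant
}

for condition, ratio in fold_change.items():
    print(f"{condition}: {ratio:.2f}-fold")
```

All four ratios fall between roughly 1.6 and 2.1, well below the 5- to 10-fold increase that would be expected if expression scaled with plasmid copy number.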
EXAMPLE 5
Isolating and characterizing the pfl gene from L. lactis subspecies lactis MG1363
1. Cloning of a fragment of the pfl gene
A pfl fragment was amplified with the above modified primers pfl1-20 and pfl1-1066 from chromosomal DNA of strain MG1363 (see Fig. 4). This fragment was digested and cloned into the vector pGEM digested with XhoI and BamHI, and transformed into E. coli strain DH5α (Stratagene), resulting in strain MGpfl-1. The fragment was sequenced using the relevant primers derived from the sequence of the DB1341 pfl fragment (see Fig. 4).
The sequence of the MG1363 pfl fragment showed 48 differences (42 base changes and a 6 bp deletion) in the 1 kb region characterized when compared to the corresponding sequence of the DB1341 pfl (Table 5.1 below). The deduced Pfl protein fragments encoded by the characterized pfl sequences of strains MG1363 and DB1341 showed high homology. Only four sequence differences are found in a 336 amino acid stretch (Table 5.1 below): two amino acid substitutions (Pro447 to Thr473 and Asn486 to Asp486 in Table 5.1) and a deletion of two adjacent residues (Asp454-Asp455) that are encoded by the DB1341 pfl gene. The latter two residues are also present in the protein encoded by the S. mutans pfl gene.
A sample of E. coli DH5α strain MGpfl-1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11088.
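Differences between two aligned sequences of this kind can be tallied column by column. The sketch below is a minimal illustration, not the software used for Table 5.1: the function name `compare_aligned` is hypothetical, gap columns are assumed to be written as "." (as in the table), and the two rows are copied from positions 101-150 of Table 5.1.

```python
def compare_aligned(seq_a: str, seq_b: str, gap_chars: str = ".-") -> tuple[int, int]:
    """Return (substitutions, gap_columns) for two pre-aligned sequences.

    Spaces used to group the bases in blocks of ten are ignored.
    """
    a = seq_a.replace(" ", "").upper()
    b = seq_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")

    substitutions = 0
    gap_columns = 0
    for base_a, base_b in zip(a, b):
        if base_a in gap_chars or base_b in gap_chars:
            gap_columns += 1          # column deleted in one of the strains
        elif base_a != base_b:
            substitutions += 1        # base change
    return substitutions, gap_columns


# Alignment rows for positions 101-150, copied from Table 5.1.
MG1363 = "..GTTCATAA AGATTATAAA GTATTCGATA TTGAACCTGT TCGTGATGAA"
DB1341 = "ACGTTCATAA AGATTATAAA GTATTCGACA TCGAACCTGT TCGTGACGAA"

print(compare_aligned(MG1363, DB1341))  # (3, 2)
```

For this 50-column window the tally is three base changes and two gap columns, matching the positions left blank in the consensus row of the table.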
Table 5.1. Homology between the DNA sequences of a fragment of the pfl gene isolated from L. lactis strain DB1341 (db1341pfl) and a fragment of the pfl gene of MG1363 (mg1363pfl)
The comparison starts at the position of the Sau3AI site in the L. lactis DB1341 pfl gene (position 1342 in Table 3.2).
50 mgl363pfl dbl341pfl GATCCAGAAA ATGAAGAAGG ACGTCATAAC CTCCAATACT TTGGTGCGCG consensus
51 100 mgl363pfl TGTTACCTGG TTTGAACGGT GGTTAC.... dbl341pfl TGTAAACGTC TTGAAAGCAA TGTTGACTGG TTTGAACGGT GGTTATGATG consensus TGTT.. CTGG TTTGAACGGT GGTTA
101 150 mgl363pfl .. GTTCATAA AGATTATAAA GTATTCGATA TTGAACCTGT TCGTGATGAA dbl341pfl ACGTTCATAA AGATTATAAA GTATTCGACA TCGAACCTGT TCGTGACGAA consensus ..GTTCATAA AGATTATAAA GTATTCGA.A T.GAACCTGT TCGTGA.GAA
151 200 mgl363pfl ATTCTTGACT ATGATACAGT TATGGAAAAC TTCGACAAAT CACTCAACTG dbl341pfl ATTCTTGACT ATGATACAGT TATGGAAAAC TTTGACAAAT CTCTCGACTG consensus ATTCTTGACT ATGATACAGT TATGGAAAAC TT.GACAAAT C.CTC.ACTG
201 250 mgl363pfl GTTGACAGAT ACTTATGTTG ATGCAATGAA TATCATTCAC TACATGACTG dbl341pfl GTTGACTGAT ACTTATGTTG ATGCAATGAA TATCATTCAT TACATGACTG consensus GTTGAC.GAT ACTTATGTTG ATGCAATGAA TATCATTCA. TACATGACTG
251 300 mgl363pfl ACAAATATAA CTATGAAGCA GTTCAAATGG CCTTCTTGCC TACTAAAGTT dbl341pfl ATAAATATAA CTATGAAGCA GTTCAAATGG CCTTCTTGCC TACTAAAGTT consensus A.AAATATAA CTATGAAGCA GTTCAAATGG CCTTCTTGCC TACTAAAGTT 301 350 mgl363pfl CGTGCTAACA TGGGATTTGG TATCTGTGGT TTCGCAAATA CAGTTGATTC dbl341pfl CGTGCTAACA TGGGATTTGG TATCTGTGGA TTCGCAAATA CAGTTGATTC consensus CGTGCTAACA TGGGATTTGG TATCTGTGG. TTCGCAAATA CAGTTGATTC 351 400 mgl363pfl ACTTTCAGCG ATTAAATATG CTAAAGTTAA AACTTTGCGT GATGAAAATG dbl34lpf1 ACTTTCAGCA ATTAAATATG CTAAAGTTAA AACATTGCGT GATGAAAATG consensus ACTTTCAGC. ATTAAATATG CTAAAGTTAA AAC.TTGCGT GATGAAAATG
401 450 mgl363pfl GCTACATCTA CGATTATGAA GTAGAAGGTG ACTTCCCACG TTATGGTGAA dbl341pfl GCTATATCTA CGATTACGAA GTAGAAGGTG ATTTCCCTCG TTATGGTGAA consensus GCTA.ATCTA CGATTA.GAA GTAGAAGGTG A.TTCCC.CG TTATGGTGAA
451 500 mgl363pfl GATGATGACC GTGCTGATGA TATCGCTAAA CTTGTCATGA AAATGTACCA dbl34lpfl GATGATGATC GTGCTGATGA TATTGCTAAA CTTGTCATGA AAATGTACCA consensus GATGATGA.C GTGCTGATGA TAT.GCTAAA CTTGTCATGA AAATGTACCA
501 550 mgl363pfl TGAAAAATTA GCTTCACACA AACTTTACAA AAATGCTGAA GCTACTGTTT dbl341pfl TGAAAAATTA GCTTCACACA AACTTTACAA AAATGCTGAA GCTACTGTTT consensus TGAAAAATTA GCTTCACACA AACTTTACAA AAATGCTGAA GCTACTGTTT
551 600 mgl363pfl CACTTTTGAC AATCACATCT AACGTTGCTT ACTCTAAACA AACTGGTAAC dbl34lpfl CACTTTTGAC AATTACATCT AACGTTGCTT ACTCTAAACA AACTGGTAAT consensus CACTTTTGAC AAT.ACATCT AACGTTGCTT ACTCTAAACA AACTGGTAA.
601 650 mgl363pfl TCTCCAGTTC ATAAAGGAGT ATTCCTCAAT GAAGATGGTA CAGTCAACAA dbl341pfl TCTCCAGTAC ATAAAGGAGT ATTCCTCAAT GAAGATGGTA CAGTAAATAA consensus TCTCCAGT.C ATAAAGGAGT ATTCCTCAAT GAAGATGGTA CAGT.AA.AA
651 700 mgl363pfl ATCTAAACTT GAATTCTTCT CACCAGGTGC TAACCCATCT AACAAAGCTA dbl341pfl ATCTAAACTT GAATTCTTCT CACCAGGTGC TAACCCATCT AATAAAGCTA consensus ATCTAAACTT GAATTCTTCT CACCAGGTGC TAACCCATCT AA.AAAGCTA
701 750 mgl363pfl AAGGTGGATG GTTGCAAAAT CTTCGTTCAT TAGCTAAATT GGAATTCAAA dbl341pfl AGGGTGGTTG GTTGCAAAAC CTTCGCTCAT TGGCTAAGTT GGAATTCAAA consensus A.GGTGG.TG GTTGCAAAA. CTTCG.TCAT T.GCTAA.TT GGAATTCAAA
751 800 mgl363pfl GATGCAAATG ACGGTATTTC ATTAACTACT CAAGTTTCTC CTCGTGCACT dbl34lpfl GATGCAAATG ATGGTATTTC ATTGACTACT CAAGTTTCAC CTCGTGCACT consensus GATGCAAATG A.GGTATTTC ATT.ACTACT CAAGTTTC.C CTCGTGCACT
801 850 mgl363pfl TGGTAAAACT CGTGATGAAC AAGTAGATAA CTTGGTTCAA ATTCTTGATG dbl341pfl TGGTAAAACT CGTGATGAAC AAGTGGATAA CTTGGTTCAA ATTCTTGATG consensus TGGTAAAACT CGTGATGAAC AAGT.GATAA CTTGGTTCAA ATTCTTGATG
851 900 mgl363pfl GATACTTCAC ACCAGGAGCT TTGATTAATG GTACTGAATT TGCAGGTCAA dbl341pfl GATACTTCAC ACCAGGTGCT TTGATTAATG GTACTGAATT TGCAGGTCAA consensus GATACTTCAC ACCAGG.GCT TTGATTAATG GTACTGAATT TGCAGGTCAA
901 950 mgl363pfl CACGTTAACT TGAACGTTAT GGACCTTAAA GATGTTTACG ATAAAATCAT dbl341pfl CACGTTAACT TGAACGTAAT GGACCTTAAA GATGTTTACG ATAAAATCAT consensus CACGTTAACT TGAACGT.AT GGACCTTAAA GATGTTTACG ATAAAATCAT 951 1000 mgl363pfl GCGTGGTGAA GATGTTATCG TTCGTATCTC TGGATACTGT GTTAACACTA dbl341pfl GCGTGGTGAA GATGTTATCG TTCGTATCTC TGGTTACTGT GTCAATACTA consensus GCGTGGTGAA GATGTTATCG TTCGTATCTC TGG.TACTGT GT.AA.ACTA
1001 1050 mgl363pfl AATACCTCAC ACCTGAACAA AAACAAGAAT TGACTGAACG TGTCTTCCAT dbl341pfl AATACcTCAC ACCAGAACAA AAACAAGAAT TAACTGAACG TGTCTTCCAT consensus AATACCTCAC ACC.GAACAA AAACAAGAAT T.ACTGAACG TGTCTTCCAT
1051 1100 mgl363pfl GAAGTACTTT CAAACGATGA TGAAGAAGTA AT (SEQ ID NO: 22) dbl341pfl GAAGTTCTTT CAAACGATGA TGAAGAAGTA ATGCATACTT CAAACATCTA consensus GAAGT.CTTT CAAACGATGA TGAAGAAGTA AT
1101 1150 dbl341pfl ATTCTTAAAA TTTAATGAAT ATTCGGTCTG TCAGTTTTAC TGACAGACTT consensus
1151 1200 dbl341pfl TTTTTTACGA AAAAATTAAT CATAATAGTT AAAAACTATT GTTTTTAGTT consensus 1201 1250 dbl341pfl TAAGAAAGTT AAATTTTATG CTAAAATAGA TGAATGAAAA TGGTAATTGG consensus
1251 1300 dbl341pfl ATTGACAGGC GGAATTGCGA KTGGGAAATC AACGGTGGTT GATTTTTTGA consensus
dbl341pfl: corresponding to nucleotides 1342-2641 of SEQ ID NO:15
Table 5.2. Multialignment of the putative Pfl protein from L. lactis strains MG1363 (partial sequence: 1) and DB1341 (2) with the deduced amino acid sequences of known cloned bacterial pfl genes
The L. lactis Pfl proteins were aligned with the following known Pfl proteins: deduced proteins of S. mutans pfl (3); E. coli pfl3 and pflb genes (Accession Nos. P42632 and P09373; 4 and 5) ; H. influenzae Pfl (6) ; C. pasteurianum Pfl (7) .
Consensus (con) shows conserved positions (bold) among all of the protein sequences. The four amino acid differences between the MG1363 and DB1341 Pfl are shown underlined and in bold at the top (1).
Figure imgf000096_0001
480
1 LPGLNG GY^VHRDYK VFDIEPVRDE 2 SCISCCVSPL DPENEEGRHN LQYFGARVNV LKAMLTGLNG GYDDVHKDYK VFDIEPVRDE 3 SCISCCVSPL DPENEDRRHN LQYFGARVNV LKALLTGLNG GYDDVHKDYK VFDVEPIRDE 4 YAIACCVSPM VIG KQ MQFFGARANL AKTLLYAING GVDEKLKIQV GPKTAPLMDD 5 YAIACCVSPM IVG KQ MQFFGARANL AKTMLYAING GVDEKLKMQV GPKSEPIKGD 6 YAIACCVSPM IVG KQ MQFFGARANL AKTLLYAING GIDEKLGMQV GPKTAPITDE 7 YAIACCVSAM RVG KD MQFFGARCNL AKCLLLAING GVDEKKGIKV VPDIEPITDE con I CCVS Q FGAR N K L NG G P
540
1 I DYDTVMEN FDKSLNWLTD TYVDAMNIIH YMTDKYNYEA VQMAFLPTKV RANMGFGICG 2 ILDYDTVMEN FDKSLDWLTD TYVDAMNIIH YMTDKYNYEA VQMAFLPTKV RANMGFGICG 3 VLDFETVKAN FEKALDWLTD TYVDAMNIIH YMTDKYNYEA VQMAFLPTRV KANMGFGICG 4 VLDYDKVMDS LDHFMDWLAV QYISALNIIH YMHDKYSYEA SLMALHDRDV YRTMACGIAG 5 VLNYDEVMER MDHFMDWLAK QYITALNIIH YMHDKYSYEA SLMALHDRDV IRTMACGIAG 6 VLDFDTVMTR MDSFMDWLAK QYVTALNVIH YMHDKYSYEA ALMALHDRDV YRTMACGIAG 7 VLDYEKVKEN YFKVLEYMAG LYVNTMNIIH FMHDKYAYEA SQMALHDTKV GRLMAFGIAG con L V Y N IH YM DKY YEA MA V M G G
600 1 FANTVDSLSA IKYAKVKTLR DEN -- ---GYIYDYE VEGDFPRYGE DDDRADDIAK
2 FANTVDSLSA IKYAKVKTLR DEN ---GYIYDYE VEGDFPRYGE DDDRADDIAK
3 FSNTVDSLSA IKYATVKPIR DED --GYIYDYE TVGNFPRYGE DDDRVDSIAE
4 LSVATDSLSA IKYARVKPIR DEN GLAVDFE IDGEYPQYGN NDERVDSIAC
5 LSVAADSLSA IKYAKVKPIR DED GLAIDFE IEGEYPQFGN NDPRVDDLAV 6 LSVAADSLSA IKYAKVKPVR GDIKDKDGNV VATNVAIDFE IEGEYPQYGN NDNRVDDIAC
7 FSVAADSLSA IRYAKVKPIR -EN GITVDFV KEGDFPKYGN DDDRVDSIAV con DSLSA IKYA VK R D G P G D R D A
660
1 LVMKMYHEKL ASHKLYKNAE ATVSLLTITS NVAYSKQTGN SPVHKGVFLN EDGTVNKSKL 2 LVMKMYHEKL ASHKLYKNAE ATVSLLTITS NVAYSKQTGN SPVHKGVFLN EDGTVNKSKL
3 WLLEAFHTRL ARHKLYKDSE ATVSLLTITS NVAYSKQTGN SPVHKGVYLN EDGSVNLSKV
4 DLVERFMKKI KALPTYRNAV PTQSILTITS NWYGQKTGN TPD GRRAG
5 DLVERFMKKI QKLHTYRDAI PTQSVLTITS NWYGKKTGN TPD - GRRAG
6 DLVERFMKKI QKLKTYRNAV PTQSVLTITS NWYGKKTGN TPD GRRAG 7 EIVEKFSDEL KKHPTYRNAK HTLSVLTITS NVMYGKKTGT TPD GRKVG con Y T S LTITS NV Y TGN P
720
1 EFFSPGANPS NKA-KGGWLQ NLRSLAKLEF KDANDGISLT TQVSPRALGK TRDEQVDNLV 2 EFFSPGANPS NKA-KGGWLQ NLRSLAKLEF KDANDGISLT TQVSPRALGK TRDEQVDNLV 3 EFFSPGANPS NKA-SGGWLQ NLNSLKKLDF AHANDGISLT TQVSPKALGK TFDEQVANLV 4 TPFAPGANPM HGRDRKGAVA SLTSVAKLPF TYAKDGISYT FSIVPAALGK EDPVRKTNLV 5 APFGPGANPM HGRDQKGAVA SLTSVAKLPF AYAKDGISYT FSIVPNALGK DDEVRKTNLA 6 APFGPGANPM HGRDQKGAVA SLTSVAKLPF AYAKDGISYT FSIVPNALGK DAEAQRRNLA 7 EPLAPGANPM HGRDMEGALA SLNSVAKVPY VCCEDGVSNT FSIVPDALGN DHDVRINNLV con PGANP G L S K DG S T P ALG NL
780
1 QILDGYFTPG ALINGTEFAG QHVNLNVMDL KDVYDKIMRG EDV---IVRI SGYCVNTKYL
2 QILDGYFTPG ALINGTEFAG QHVNLNVMDL KDVYDKIMRG EDV---IVRI SGYCVNTKYL
3 TILDGYF--- EGGG QHVNLNVMDL KDVYDKIMNG EDV---IVRI SGYCVNTKYL 4 GLLDGYFHHE ADV EGG QHLNVNVMNR EMLLDAIEHP EKYPNLTIRV SGYACASTH
5 GLMDGYFHHE ASI EGG QHLNVNVMNR EMLLDAMENP EKYPQLTIRV SGYAVRFNSL
6 GLMDGYFHHE ATV EGG QHLNVNVLNR EMLLDAMENP DKYPQLTIRV SGYAVRFNSL
7 SIMGGYF GQGA HHLNVNVLNR ETLIDAMNNP DKYPTLTIRV SGYAVNFNRL con GYF H N NV D R SGY 1 TPEQKQELTE RVFHEVLSND DEEV (SEQ ID NO:23)
2 TPEQKQELTE RVFHEVLSND DEEVMHTSNI Z (SEQ ID NO: 16)
3 TKEQKTELTQ RVFHEVLSMD DAATDLVNNK Z (SEQ ID NO: 4)
4 (SEQ ID NO:19)
5 TKEQQQDVIT RTFTQSM (SEQ ID NO: 14)
6 TKEQQQDVIT RTFTESM (SEQ ID NO: 20)
7 SKDHQKEVIS RTFHEKL (SEQ ID NO: 25) con
2. Cloning and sequencing of the entire pfl gene of L. lactis strain MG1363
The entire pfl gene sequence was obtained from L. lactis subsp. cremoris strain MG1363 using PCR. Like the pfl coding sequence of L. lactis strain DB1341, the coding sequence of MG1363 comprises 2363 bp and encodes a 787 amino acid PFL protein having a predicted molecular weight of 89.1 kDa.
Table 5.3. The complete sequence of the pfl locus of L. lactis strain MG1363
1 TTGGGCTATAAGGAAATTGTTCTGCTGATTTTTTAAAGTTTAGATATAGG 50 51 TTTAGGGGTTC TGTTTGAATTTCΛAAAAAAGTCTCCTC-AAGTTAATAAG 100 101 TTTATTATATCIACAAAGTATTATTTAGACCAACTTCCTTCAAAAAACTTT 150 151 TCGTTAAGGCTTTGAAATAAAATAATGAGAAAAAAATAGGAAAATCTGCT 200
201 ACAATTAGAAGGAGAAGAAGAGGATTTAAATCCTTTTTTATTAGGAAAAG 250 251 AAGGGATAGATAGGCTGATATGATAAAAAATTATGAACTATCCAATGAAA 300 orfA M I K N Y E L S N E K Sau3AI
301 AAAAATTGATCTCAACTTCΓGAGATGAAGAATTTCACTTATGTCCTCAAT 350
K L I S T S E M K N F T Y V L N
351 CCAACACGTGAAGAAATTGGGAATATCTC-AGAACACTATGATTTTCCTTT 400
P T R E E I G N I S E H Y D F P F 401 TGACTATCTATCTGGAATTTTAGATGACTATGAAAATGCCCGTTTTGAAA 450
D Y L S G I L D D Y E N A R F E T 451 CAGATGATAATGAO^TAATCTOATTCTTTTGC-AATATCCCGCCTTGTCC 500
D D N D N N L I L L Q Y P A L S 501 AACTATGGAGAAGTGGCCΛCrTTTCCATATTCTTTGGTTTGGACTAAGAA 550 N Y G E V A T F P Y S L V W T K N
551 TGAATCGGTTATTTTGGCCCTTAACCATGAAATTGATAATGGTCTCATTT 600 E S V I L A L N H E I D N G L I F 601 TTGAACGAGAATATGATTATAAACGCTATAAACΑCCAATTGATTTTTCAA 650
E R E Y D Y K R Y K H Q L I F Q 651 GTGATGTACCAAATGACT-ATACTTTTCATGATTATTTGAGAGACTTTAG 700 V M Y Q M T H T F H D Y L R D F R 701 AACAAGGCGCCGCCGGCTTGAAGTTGGTATC-AAAAATTCAACAAAAAATG 750
T R R R R L E V G I K N S T K N D
751 ACC1AAATTGTTGACTTAATTGCC-ATTC-AAGCGAGTTTGATTTATTTTGAA 800
Q I V D L I A I Q A S L I Y F E 801 GATGCGCTGCACAATAATATGC-AAGTTCTCCAGAATTTTATTGATTACTT 850 D A L H N N M Q V L Q N F I D Y L
851 ACGAGAAGATGATGAAGATGGTTTTGCCGAAAAAATCTATGATATTTTTG 900
R E D D E D G F A E K I Y D I F V 901 TCGAAACLAGACC-Ϊ-AGCTTATACΛGAAACCAAGATTCAGCTCAAGTTACTA 950 E T D Q A Y T E T K I Q L K L L 951 GAAAATCTCCGAGATTTGTTCTC -AAC-ATTGTCTCTAATAATTTGAATAT 1000
E N L R D L F S N I V S N N L N I 1001 CGTCATGAAAATTATGACCT(_ΛGCfΛCΛTTTGTTCTAGGTATTCCGGCGG 1050
V M K I M T S A T F V L G I P A V 1051 TTATTGTCGGCTTTTATGGAATGAATGTTCCGATTCCTGGTCAAAATTTT 1100 I V G F Y G M N V P I P G Q N F
1101 AATTGGATGGTCTGGCTCΛTTTTGGTGTTTGGAATTTTATTATGTGTTTG 1150
N W M V W L I L V F G I L L C V W
1151 GGTTACTTGGTGGCTAC-ACAAAAAAGATATGTTATGAATGGAGAAAATTT 1200
V T W W L H K K D M L Stop (SEQ ID NO: 37) 1201 CTCCGTTTTTTTATCTTTGTGAAAAAATTAATTAGTGATAATAAATCATG 1250
1251 AAGTTAGCIAATGTTTGTC-AAAGCTATTTAGTGAATTAATTATGAAAACGT 1300
1301 TTTAAAAAAGTATAACAGATATTAAAATAATTGAAACTGTATTAGTAAAG 1350
EcoRI 1351 AATCTGTAATTTCT TTGAATTCTGTTTGCTATTATCAAACTGTATGATA 1400 1401 TAATGAAGTTGTAATTTGAAACAGAAAGAACAAAGGAGATTTCAAAATGA 1450 pfl M K 1451 AAACCGAAGTTACGGAAAATATCTTTGAACAAGCTTGGGATGGTTTTAAA 1500
T E V T E N I F E Q A W D G F K 1501 GGAACTAACΓGGCGCGATAAAGCAAGCGTTACTCGCTTTGTACAAGAAAA 1550 G T N W R D K A S V T R F V Q E N
1551 CTACAAACCATATGATGGTGATGAAAGCTTTCTTGCTGGGCCAACAGAAC 1600
Y K P Y D G D E S F L A G P T E R 1601 GTACACTTAAAGTAAAGAAAATTATTGAAGATACAAAAAATCACTACGAA 1650 T L K V K K I I E D T K N H Y E 1651 GAAGTAGGATTTCCCTTTGATACTGACCGCGTAACCTCTATCGATAAAAT 1700 E V G F P F D T D R V T S I D K I 1701 TCCTGCTGGATATATTGATGCTAATGATAAAGAACTTGAACTCATCTATG 1750 P A G Y I D A N D K E L E L I Y G 1751 GGATGC-AAAATAGCGAACTTTTCCGCTTAAACTTCATGCCAAGAGGTGGT 1800
M Q N S E L F R L N F M P R G G 1801 CTTCGTGTTGCTGAAAAGATTTTGACAGAACACGGTCTTTCAGTTGACCC 1850 L R V A E K I L T E H G L S V D P
1851 AGGTTTGC-ATGATGTTTTGTClAC-AAA U-TGACTTCTGTAAATGATGGAA 1900
G L H D V L S Q T M T S V N D G I
1901 TCTTCCGTGCTTATACTTC-AGCAATTCGTAAAGCACGTCACGCTCACACT 1950
F R A Y T S A I R K A R H A H T 1951 GTAACAGGTTTGCCTGATGCATACTCTCGTGGACGTATCATCGGGGTATA 2000
V T G L P D A Y S R G R I I G V Y
2001 TGCACGTCTTGCTCTTTATGGAGCTGACTACCTTATGAAGGAAAAAGCAA 2050
A R L A L Y G A D Y L M K E K A K 2051 AAGAATGGGATGCAATCACTGAAATTAATGATGATAACATTCGTCTTAAA 2100 E W D A I T E I N D D N I R L K
2101 GAAGAAATTAACATGCAATACCAAGCTTTGCAAGAAGTTGTAAACTTTGG 2150
E E I N M Q Y Q A L Q E V V N F G 2151 TGCTTTGTATGGTCπ'GACGTTTCTCGTCCAGCGATGAACGTAAAAGAAG 2200 A L Y G L D V S R P A M N V K E A 2201 CAATCCAATGGGTTAATATTGC-ATA(_ΑTGGC-AGTTTGTCGTGTTATCAAT 2250
I Q W V N I A Y M A V C R V I N 2251 GGTGCTGC-AACTTC-ACTTGGACGTGTGCCLAATCGTTCTTGACIATCTTTGC 2300
G A A T S L G R V P I V L D I F A 2301 AGAACGTGACCTTGCΓCGTGGAACATTTACTGAGCAAGAAATCCAAGAAT 2350 E R D L A R G T F T E Q E I Q E F
2351 TTGTTGATGATTTCATTTTAAAACrTCGTAC-AATGAAATTTGCTCGTGCT 2 00
V D D F I L K L R T M K F A R A
2401 GCTGCITATGATGAACTTTATTCTGGTGACCCC-ACGTTCATCACAACATC 2450
A A Y D E L Y S G D P T F I T T S 2451 TATGGCTGGTATGGGTAATGACGGACGCCACCGTGTCACTAAAATGGACT 2500
M A G M G N D G R H R V T K M D Y
2501 ATCGTTTCTTGAACACACTTGATACAATCGGAAATGCTCCAGAACCAAAC 2550
R F L N T L D T I G N A P E P N 2551 TTGA(_AGTTCTTTGGGACTCTAAACTCCCLATATTCATTCAAACGTTATTC 2600 L T V L W D S K L P Y S F K R Y S
2601 AATGTCTATGAGTClftCAAACACTCATCTATCCAATATGAAGGTGTTGAAA 2650
M S M S H K H S S I Q Y E G V E T 2651 CAATGGCTAAAGATGGATATGGCGAAATGTCATGTATCTCTTGTTGTGTC 2700 M A K D G Y G E M S C I S C C V 2701 TCACCACTTGACCCAGAAAATGAAGAAGGACGTCATAATCTCCAATACTT 2750
S P L D P E N E E G R H N L Q Y F 2751 TGGTGCGCGTGTAAACGTCTTGAAAGCAATGTTGACTGGTTTGAACGGTG 2800 G A R V N V L K A M L T G L N G G 2801 GTTACGATGACGTTCATAAAGATTATAAAGTATTCGATATTGAACCTGTT 2850
Y D D V H K D Y K V F D I E P V 2851 CGTGATGAAATTCTTGACTATGATACAGTTATGGAAAACTTCGACAAATC 2900 R D E I L D Y D T V M E N F D K S 901 ACTC-AACTGGTTGACAGATACTTATGTTGATGCAATGAATATCATTCACT 2950
L N W L T D T Y V D A M N I I H Y 2951 AC1ATGACTGACAAATATAACTATGAAGCΛGTTC-AAATGGCCTTCTTGCCT 3000
M T D K Y N Y E A V Q M A F L P 3001 ACTAAAGTTCGTGCTAAC-ATGGGATTTGGTATCTGTGGTTTCGCAAATAC 3050 T K V R A N M G F G I C G F A N T
3051 AGTTGATTC-A.CTTTC1AGCGATTAAATATGCTAAAGTTAAAACTTTGCGTG 3100
V D S L S A I K Y A K V K T L R D 3101 ATGAAAATGGCTACATCTACGATTATGAAGTAGAAGGTGACTTCCCACGT 3150 E N G Y I Y D Y E V E G D F P R 3151 TATGGTGAAGATGATGACCGTGCTGATGATATCGCTAAACTTGTCATGAA 3200
Y G E D D D R A D D I A K L V M K 3201 AATGTACC1ATGAAAAATTAGCTTC-ACACAAACTTTACAAAAATGCTGAAG 3250
M Y H E K L A S H K L Y K N A E A 3251 CTACT'GTTTC-ACTTTTGAC-AATCACATCTAACGTTGCTTA rcrrAAACAA 3300 T V S L L T I T S N V A Y S K Q
3 01 ACTGGTAACTCTCCAGTTCATAAAGGAGTATTCCTCAATGAAGATGGTAC 3350 T G N S P V H K G V F L N E D G T
EcoRI 3351 AGTC_ΛACAAATCTAAACTTGAATT ITCTCACCAGGTGCTAACCCATCTA 3400 V N K S K L E F F S P G A N P S N
3401 ACAAAGCTAAAGGTGGATGGTTGC-AAAATCTTCGTTCIATTAGCTAAATTG 3450 K A K G G W L Q N L R S L A K L EcoRI 3451 GAATTC-AAAGATGC1AAATGACGGTATTTCATTAACTACTCAAGTTTCTCC 3500 E F K D A N D G I S L T T Q V S P
3501 TCGTGCACTTGGTAAAA rCGTGATGAACAAGTAGATAACTTGGTTCAAA 3550
R A L G K T R D E Q V D N L V Q I 3551 TTCTTGATGGATACTTCAC-ACCAGGAGCTTTGATTAATGGTACTGAATTT 3600 L D G Y F T P G A L I N G T E F 3601 GC-AGGTCLAAClACGTTAACTTGAACGTTATGGACCTTAAAGATGTTTACGA 3650
A G Q H V N L N V M D L K D V Y D 3651 TAAAATCATGCGTGGTGAAGATGTTATCGTTCGTATCTCTGGATACTGTG 3700
K I M R G E D V I V R I S G Y C V 3701 TTAACACTAAATACCTCACACCTGAACAAAAACAAGAATTGACTGAACGT 3750 N T K Y L T P E Q K Q E L T E R
3751 GTCTTCCATGAAGTACTTTCAAATGATGATGAAGAAGTAATGCACACTTC 3800 V F H E V L S N D D E E V M H T S 3801 AAATATCTAATTCTTAGTATTAAAAAATATAAGGTCTGTCAGTTCTACTG 3850
N I Stop (SEQ ID NO: 39) 3851 AC-AGACTTTTTTTCTATAAATTAATTATAATAGTTAAAAAC ATTATTTT 3900 3901 TAGTTTAAGAAAAATAAAATTTGTGCTAAAATAGATGAATGATAAAGGTA 3950 3951 ATTGGATTAACAGGCGGAATTGCGAGTGGGAAATCAACGGTGGTTGATTT 4000 4001 TTTGATTTCTGAAGGTTATCAAGTAATTGATGCTGACAAAGTTGTTCGTC 4050 4051 AGTTGClϊ^GAACCTGATGGGAAACTTTTTAATGαΛTAATGGAAACTTTC 4100 4101 GGTTC-AGATTTTACTGACGAAAATGGGAAATTAAACCGATGCAAAATTGA 4150 4151 GTGCTTAAGTTTTGCTGACCC-AAATCAACGTCAAAAATTAT 4191
Nucleotides 1-4191: SEQ ID NOS: 36/38
Homology searches using the above deduced PFL protein revealed a 79% overall protein sequence identity with the S. mutans PFL and higher than 40% with the E. coli, C. pasteurianum and H. influenzae PFL.
In the promoter region of the MG1363 pfl gene, a canonical lactococcal ribosome binding site (AAAGGAG, position +21 to +27) and -35 and -10 promoter regions (TTGCTA and TATAAT, respectively) were found. A putative rho-dependent transcription terminator was located 24 bp downstream of the pfl stop codon (position 2432 to 2445). Additionally, two sequences (FNR-1 and FNR-2) were identified that show significant homology to E. coli FNR-boxes, which have the consensus sequence TTGAT-N4-ATCAA (SEQ ID NO: 40) and are involved in regulation of the expression of pfl in E. coli. The MG1363 FNR-1 (GGAGT-N4-ATCAA) (SEQ ID NO:41) was also present in strain DB1341. FNR-2 (TTTGC-N4-ATCAA) (SEQ ID NO:42; position -36 to -23) overlaps with the -35 hexamer of the promoter region of the pfl gene.
The coding sequence of the MG1363 pfl gene showed 102 basepair changes when compared to the corresponding sequence of strain DB1341, but these changes resulted in only four amino acid changes in the PFL primary structure. The lactococcal PFL includes the conserved Gly residue at position 749, flanked by Ser and Tyr residues, which is involved in activation and deactivation of the enzyme in E. coli via free radical formation. This region is present in all PFL proteins characterized to date. The L. lactis sequence ISCCVSP is highly conserved and includes two adjacent Cys residues.
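A search for FNR-box-like sequences can be sketched as a sliding-window comparison against the two half-sites of the consensus TTGAT-N4-ATCAA, ignoring the four spacer positions. The Python code below is a hedged illustration and not the method used here: the function names and the mismatch threshold are assumptions, and the test region is a stretch copied from Table 5.3 that lies upstream of the pfl start codon and contains the -35 hexamer TTGCTA.

```python
LEFT_HALF = "TTGAT"     # consensus half-sites given in the text
RIGHT_HALF = "ATCAA"
SPACER_LEN = 4          # N4: any four nucleotides
SITE_LEN = len(LEFT_HALF) + SPACER_LEN + len(RIGHT_HALF)  # 14


def fnr_mismatches(site: str) -> int:
    """Count mismatches of a 14-mer to TTGAT-N4-ATCAA (spacer ignored)."""
    site = site.upper()
    if len(site) != SITE_LEN:
        raise ValueError(f"site must be {SITE_LEN} nt long")
    left = site[:len(LEFT_HALF)]
    right = site[-len(RIGHT_HALF):]
    return (sum(a != b for a, b in zip(left, LEFT_HALF))
            + sum(a != b for a, b in zip(right, RIGHT_HALF)))


def scan_for_fnr_boxes(sequence: str, max_mismatches: int) -> list[tuple[int, str, int]]:
    """Return (0-based offset, site, mismatches) for every window within the threshold."""
    sequence = sequence.upper()
    hits = []
    for offset in range(len(sequence) - SITE_LEN + 1):
        window = sequence[offset:offset + SITE_LEN]
        mismatches = fnr_mismatches(window)
        if mismatches <= max_mismatches:
            hits.append((offset, window, mismatches))
    return hits


# Region upstream of pfl, copied from Table 5.3 (MG1363).
PFL_PROMOTER_REGION = "GAATTCTGTTTGCTATTATCAAACTGTATGATA"

for hit in scan_for_fnr_boxes(PFL_PROMOTER_REGION, max_mismatches=3):
    print(hit)
```

With a threshold of three mismatches (an arbitrary choice for this sketch), the scan reports the window TTTGCTATTATCAA, i.e. the FNR-2 sequence TTTGC-N4-ATCAA, which has a perfect right half-site and overlaps the TTGCTA hexamer.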
EXAMPLE 6
Construction of pfl mutant strains of L . lactis strains DB1341 and MG1363 by gene inactivation and physiological characterization of pfl ' strains
A 460 bp Sau3AI internal fragment (positions 1343 to 1799 in Table 3.2) of the L. lactis DB1341 pjfl gene was cloned into BajriHI-digested pSMA500 (Madsen et al . , 1996), resulting in plasmid pSMAKAS7, and transformed into E. coli MC1000 by electroporation (Sambrook et al . , 1989). A transformant (SMAKAS7) containing the recombinant plasmid was isolated. The orientation of the pfl fragment in pSMAKAS7 was confirmed by sequenc- ing. Homologous recombination of pSMAKAS7 into the L. lactis pfl gene allows translational fusion of the reporter lacLM gene (Madsen et al . , 1996).
Plasmid pSMAKAS7 was used to transform L. lactis strains DB1341 and MG1363 by electroporation (Holo and Nes 1989) . Two single transformants were isolated (DBKAS7 and MGKAS7, respectively) . DBKAS7 became blue on X-gal plates, as expected if homologous integration at the chromosomal pfl locus had occurred, and was further characterized. Integration of pSMAKAS7 by homologous recombination into the DB1341 chromosome would result in a truncated pfl gene, where the N-terminal region of the protein (residues Met1-Asp574) would be separated from the C- terminal domain (residues Asp42 -lie778) . PCR analysis was used to confirm that DBKAS7 carries a disrupted pfl gene. The activation site of the E. coli Pfl, a glycine residue at position Gly734 flanked by serine and tyrosine is conserved in all bacterial Pfl proteins characterized (Weidner and Sawers, 1996; Yamamoto et al., 1996), including the L . lactis Pfl (position 2321-2329 of the nucleotide sequence in Table 3.2; Table 3.6). The truncated Pfl protein in strain DBKAS7 would lack an activation site.
A sample of Lactococcus lactis subspecies lactis biovar diace - tylactis strain DBKAS7 and of Lactococcus lactis subspecies lactis strains MGKAS7, respectively were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg lb, D-38 124 Braunschweig, Germany on 18 July 1996 under the accession Nos DSM 11086 and DSM 11083, respectively.
A 495 bp PCR fragment was amplified from MG1363 using primers pflP1MG1363 (5'-GGCCGCTCGA GTTGTGTCTC ACCACTTGAC CC-3' (SEQ ID NO:43); XhoI site underlined) and pflP2MG1363 (5'-TAGTAGGATC CCATCATCTT CACCATAACG TGG-3' (SEQ ID NO:44); BamHI site underlined). The fragment was cloned into XhoI+BamHI-digested pSMA500 and transformed into strain MG1363, resulting in strain MGKAS13.
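Because the underlining of the restriction sites does not survive in plain text, the following minimal sketch locates them in the two primers. The recognition sequences used (XhoI: CTCGAG, BamHI: GGATCC) are the standard ones and are not stated in this description; the function and variable names are illustrative only.

```python
# Standard recognition sequences (an assumption; not stated in the description).
SITES = {"XhoI": "CTCGAG", "BamHI": "GGATCC"}

# Primer sequences as printed for SEQ ID NO:43 and SEQ ID NO:44, spaces removed.
PRIMERS = {
    "pflP1MG1363": "GGCCGCTCGAGTTGTGTCTCACCACTTGACCC",
    "pflP2MG1363": "TAGTAGGATCCCATCATCTTCACCATAACGTGG",
}


def find_sites(seq):
    """Return {enzyme: 0-based start index} for every site present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}

for name, hits in site_map.items():
    print(name, len(PRIMERS[name]), "nt", hits)
```

In both primers the site starts after a five-nucleotide 5' extension, which is a common design for allowing the enzyme to cut close to the end of a PCR product.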
MGKAS13 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 10 July 1997 under the accession No. DSM 11653.
DBKAS7 and MGKAS13 formed blue colonies on X-gal-containing plates. Plasmid integration through homologous recombination was confirmed via PCR in both strains.
Physiological analysis of the L. lactis pfl⁻ strain
A colorimetric assay (Voges-Proskauer, VP; Westerfeld 1945) was used to study acetoin and diacetyl production in strain DBKAS7. The presence of acetoin and diacetyl in the samples results in the formation of a red colour, which is monitored by measuring OD520. Overnight cultures of strain DBKAS7 (pfl⁻) and wild type strain DB1341, grown at 30°C without aeration in GM17, were used. The VP assay was performed by mixing 200 μl bacterial culture, 100 μl 0.3 % (w/v) creatine, 100 μl 5 M NaOH, and 50 μl 5 % α-naphthol (dissolved in 2.5 M NaOH immediately before use). The mixture was incubated for 10 min at room temperature, with constant stirring to provide aeration. The reaction was stopped by adding 1 ml 4 mM DTT. After centrifugation to remove cellular debris, OD520 was measured. As shown in Table 6.1, DBKAS7 had approximately a 2-fold increase in the production of acetoin/diacetyl as compared to strain DB1341.
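As a rough cross-check of the assay mixture described above, the sketch below computes the reagent concentrations in the 450 μl reaction and after the stop solution is added. It assumes that volumes are additive, that the NaOH used to dissolve the α-naphthol contributes to the total NaOH, and that the α-naphthol percentage is w/v (the description does not say); the variable names are illustrative.

```python
# Volumes in microlitres, taken from the assay description.
VOLUMES_UL = {
    "culture": 200.0,
    "creatine (0.3 % w/v)": 100.0,
    "NaOH (5 M)": 100.0,
    "alpha-naphthol (5 % in 2.5 M NaOH)": 50.0,
}

reaction_ul = sum(VOLUMES_UL.values())   # reaction volume before stopping
stopped_ul = reaction_ul + 1000.0        # after adding 1 ml of 4 mM DTT

# Concentrations in the reaction mixture (simple dilution, C1*V1 = C2*V2).
creatine_pct = 0.3 * 100.0 / reaction_ul
naphthol_pct = 5.0 * 50.0 / reaction_ul
naoh_molar = (5.0 * 100.0 + 2.5 * 50.0) / reaction_ul

# DTT concentration once the reaction has been stopped.
dtt_mM = 4.0 * 1000.0 / stopped_ul

print(f"reaction volume: {reaction_ul:.0f} ul")
print(f"creatine: {creatine_pct:.3f} %, alpha-naphthol: {naphthol_pct:.3f} %")
print(f"NaOH: {naoh_molar:.2f} M, DTT after stop: {dtt_mM:.2f} mM")
```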
Table 6.1. Voges-Proskauer assay for aroma compounds produced by DB1341 and DBKAS7, respectively
Figure imgf000105_0001
Overnight cultures were grown at 30°C, without shaking, in GM17.
The OD600 values represent a measure for growth. The OD520 values are the results of the production of acetoin and diacetyl (Westerfeld 1945).
Thus, gene inactivation of the pfl gene in the L. lactis strain DB1341 results in an enhanced production of aroma compounds, without affecting the ability to grow.
Similar levels of formate were obtained in strain DB1341 as in MG1363, and no formate was detected in DBKAS7 under anaerobic conditions, confirming the pfl mutant phenotype in this strain.
L. lactis biovar diacetylactis strains are used as starter cultures due to their ability to produce diacetyl during milk fermentation. A mutation in the pfl gene of DB1341 should result in increased pyruvate levels under anaerobic growth. Thus, if excess pyruvate is directed towards the production of diacetyl and acetoin, a higher level of these metabolites would be expected in strain DBKAS7 grown under anaerobiosis. As shown in Table 6.2, a 7-fold increase in the production of aroma compounds was observed in strain DBKAS7 grown in GM17 and a more than 4-fold increase was detected in GalM17 as compared to the wild type strain, DB1341. This demonstrated the effect of a pfl mutation on the production of diacetyl and acetoin in a L. lactis biovar diacetylactis strain.
Table 6.2. Production of aroma compounds in the L. lactis biovar diacetylactis pfl⁻ strain, DBKAS7, as compared to the wild type strain
Figure imgf000106_0001
(a) Cell extracts from stationary culture (OD600 about 3) were assayed according to Casadaban et al. 1980. Values shown are the mean of two independent experiments.
Inactivation of the pfl gene leads to a transcriptional fusion of the lacLM reporter gene (Madsen et al., 1996). β-Galactosidase levels were measured in overnight cultures of strain MGKAS13 grown in M17 with either glucose (GM17) or galactose (GalM17) (Table 6.2). In GM17, about a 10-fold increase of β-galactosidase units was observed under anaerobic growth, which is consistent with the induction observed at RNA level. High levels of β-galactosidase were observed under anaerobic growth in the presence of galactose, and a 4-fold induction was observed under anaerobiosis in this medium, which is in agreement with the RNA studies.
Table 6.3. Characterization of the L. lactis MG1363 pfl⁻ strain, MGKAS13
Figure imgf000107_0001
MGKAS13 should not produce formate under anaerobic conditions as a result of the inactivation of the pfl gene in this strain. In strain MG1363, no formate was detected under aerobic growth in GM17, as would be expected if the lactococcal PFL is inactivated in the presence of oxygen. Relatively low levels of formate were detected under anaerobic conditions. In GalM17 an 8-fold higher amount of formate was detected in anaerobiosis.
No formate was detected in strain MGKAS13 in either of the test media, confirming that this strain carries a pfl null mutation.
EXAMPLE 7
Identification of pfl and adhE homologues in non-Lactococcus lactic acid bacteria using Lactococcus lactis pfl and adhE gene fragments as probes
1. Southern hybridization of genomic DNA from non-Lactococcus lactic acid bacteria using a L. lactis pfl gene fragment as a probe
A PCR fragment including most of the L. lactis pfl coding sequence was obtained by amplification of MG1363 genomic DNA with primers pfl89 and pfl1066 (see Fig. 4). A 2 kb DNA fragment (Fig. 9) was obtained and used as a probe in Southern hybridization experiments using EcoRI-digested total DNA from Streptococcus thermophilus ATCC 19258, Leuconostoc mesenteroides subsp. mesenteroides ATCC 10878 and Lactobacillus acidophilus ATCC 4796 (Fig. 10).
Hybridization was carried out overnight at 65°C. Filters were washed twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes. As shown in Fig. 10C, the expected EcoRI genomic fragment deduced from the L. lactis pfl sequence was detected after overnight exposure. After short exposure of the filters (Fig. 10B), hybridization was detected only in S. thermophilus, and only weak signals were detected in L. mesenteroides and L. acidophilus after longer exposure (Fig. 10C), indicating lower pfl sequence homology in these bacteria, as would be expected from their taxonomic distance to L. lactis.
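The wash conditions above can be put in terms of salt concentration with a short calculation. The sketch assumes the conventional composition of 1 x SSC (0.15 M NaCl, 0.015 M trisodium citrate), which is not spelled out in this description; the function name is illustrative.

```python
# Conventional 1 x SSC composition (an assumption; not given in the description).
NACL_M_PER_X = 0.15
CITRATE_M_PER_X = 0.015  # trisodium citrate, i.e. three Na+ per citrate


def sodium_molar(ssc_fold):
    """Approximate Na+ concentration (mol/l) of an SSC buffer of given strength."""
    return ssc_fold * (NACL_M_PER_X + 3 * CITRATE_M_PER_X)


for fold in (5, 3):
    print(f"{fold} x SSC: about {sodium_molar(fold):.3f} M Na+")
```

Both washes are comparatively high in salt, so the conditions are of moderate stringency, which fits the stated aim of detecting homologues in more distantly related species.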
2. Southern hybridisation of genomic DNA from non-Lactococcus lactic acid bacteria using L. lactis adhE gene fragment as a probe
Two Sau3AI fragments including most of the L. lactis subsp. lactis biovar diacetylactis DB1341 adhE coding sequence (Fig. 11) were used as a probe in Southern hybridization experiments using EcoRI-digested total DNA from Streptococcus thermophilus ATCC 19258, Leuconostoc mesenteroides subsp. mesenteroides ATCC 10878 and Lactobacillus acidophilus ATCC 4796.
Hybridization was carried out overnight at 65°C. Filters were washed twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes. As shown in Fig. 12, the expected EcoRI genomic fragment (about 5 kb) deduced from the L. lactis MG1363 adhE sequence was detected. Strongly hybridizing bands were also detected in S. thermophilus (5 kb) and L. mesenteroides (5 and 0.4 kb). Weaker hybridizing bands were also detected in L. acidophilus (4.2 and 2 kb, and two minor bands, 2.3 and 5 kb).
3. Conclusions
Using the above L. lactis DNA probes, preliminary restriction maps of the pfl and adhE genes in the three non-Lactococcus lactic acid bacterial species could be constructed from different restriction digests of the genomic DNA. Two strategies for the cloning of these non-Lactococcus genes can be followed: (i) cloning of DNA fragments isolated from agarose gels corresponding in size to the hybridizing bands detected in Southern analysis; (ii) PCR of conserved regions using primers derived from the corresponding L. lactis sequence.
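Strategy (ii) depends on finding stretches that are identical between related sequences. The following is only a minimal sketch of that idea, assuming two sequences that have already been aligned to equal length; the function name and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def conserved_windows(seq_a, seq_b, min_len):
    """Return (start, end) pairs, end exclusive, of runs of at least min_len
    identical, non-gap positions in two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(seq_a, seq_b)):
        if x == y and x != "-":
            if start is None:
                start = i            # a new identical run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(seq_a) - start >= min_len:
        runs.append((start, len(seq_a)))
    return runs


# Hypothetical toy alignment, for illustration only.
toy_a = "ATGGCTAAAGGTTTC"
toy_b = "ATGGCAAAAGGTATC"
windows = conserved_windows(toy_a, toy_b, min_len=5)
print(windows)
```

In practice primers would be much longer than these toy windows and would usually tolerate some degeneracy, so this only illustrates the selection step.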
REFERENCES
1. Arnau, J., F. Jørgensen, S. Madsen, A. Vrang and H. Israelsen. 1997. Cloning, expression and characterization of the Lactococcus lactis pfl gene, encoding pyruvate formate-lyase. Submitted for publication.
2. Chen, Y.-M. and C.C. Lin. 1991. Regulation of the adhE gene, which encodes ethanol dehydrogenase in Escherichia coli. J. Bacteriol. 173:8009-8013.
3. Chippaux, M., F. Casse and M.-C. Pascal. 1972. Isolation and phenotypes of mutants from Salmonella typhimurium defective in formate hydrogenlyase activity. J. Bacteriol. 110:766-768.
4. Christiansen, L. and S. Pedersen. 1981. Cloning, restriction endonuclease mapping and post-transcriptional regulation of rpsA, the structural gene for ribosomal protein S1. Mol. Gen. Genet. 181:548-551.
5. Crow, V.L. and G.G. Pritchard. 1977. Fructose 1,6-diphosphate-activated L-lactate dehydrogenase from Streptococcus lactis: kinetic properties and factors affecting activation. J. Bacteriol. 131:82-91.
6. Donkersloot, J.A. and J. Thompson. 1995. Cloning, expression, sequence analysis, and site-directed mutagenesis of the Tn5306-encoded N5-(carboxyethyl)ornithine synthase from Lactococcus lactis K1. J. Biol. Chem. 270:12226-12234.
7. Fleischmann, R.D., M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.F. Tomb, B.A. Dougherty and J.M. Merrick et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512.
8. Frey, M., M. Rothe, A.F. Volker Wagner and J. Knappe. 1994. Adenosylmethionine-dependent synthesis of the glycyl radical in pyruvate formate-lyase by abstraction of the glycine C-2 pro-S hydrogen atom. J. Biol. Chem. 269:12432-12437.
9. Goodlove, P.E., P.R. Cunningham, J. Parker and D.P. Clark. 1989. Cloning and sequence analysis of the fermentative alcohol-dehydrogenase-encoding gene of Escherichia coli. Gene 85:209-214.
10. Holo, H. and I.F. Nes. 1989. High-frequency transformation by electroporation of Lactococcus lactis subsp. cremoris grown with glycine in osmotically stabilized media. Appl. Environ. Microbiol. 55:3119-3123.
11. Kessler, D., I. Leibrecht and J. Knappe. 1991. Pyruvate-formate-lyase-deactivase and acetyl CoA reductase activities of Escherichia coli reside on a polymeric protein particle encoded by adhE. FEBS Lett. 281:59-63.
12. Kessler, D., W. Herth and J. Knappe. 1992. Ultrastructure and pyruvate formate-lyase radical quenching property of the multienzymic AdhE protein of Escherichia coli. J. Biol. Chem. 267:18073-18079.
13. Madsen, S.M., B. Albrechtsen, E.B. Hansen and H. Israelsen. 1996. Cloning and transcriptional analysis of two threonine biosynthetic genes from Lactococcus lactis MG1614. J. Bacteriol. 178:3689-3694.
14. Mat-Jan, F., K.Y. Alam and D.P. Clark. 1989. Mutants of Escherichia coli deficient in the fermentative lactate dehydrogenase. J. Bacteriol. 171:342-348.
15. Nair, R.V., G.N. Bennett and E.T. Papoutsakis. 1994. Molecular characterization of an aldehyde/alcohol dehydrogenase from Clostridium acetobutylicum ATCC 824. J. Bacteriol. 176:871-881.
16. Pecher, A., H.P. Blaschkowski, K. Knappe and A. Böck. 1982. Expression of pyruvate formate-lyase of Escherichia coli from the cloned structural gene. Arch. Microbiol. 132:365-371.
17. Platteeuw, C., J. Hugenholtz, M. Starrenburg, I. van Alen-Boerrigter and W.M. de Vos. 1995. Metabolic engineering of Lactococcus lactis: influence of the overproduction of α-acetolactate synthase in strains deficient in lactate dehydrogenase as a function of culture conditions. Appl. Environ. Microbiol. 61:3967-3971.
18. Sambrook, J., E.F. Fritsch and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
19. Sauter, M. and R.G. Sawers. 1990. Transcriptional analysis of the gene encoding pyruvate formate-lyase-activating enzyme of Escherichia coli. Mol. Microbiol. 4:355-363.
20. Sawers, G., A.F. Wagner and A. Böck. 1989. Transcription initiation at multiple promoters of the pfl gene by Eσ70-dependent transcription in vitro and heterologous expression in Pseudomonas putida in vivo. J. Bacteriol. 171:4930-4937.
21. Sawers, G. and A. Böck. 1988. Anaerobic regulation of pyruvate formate-lyase from Escherichia coli K-12. J. Bacteriol. 170:5330-5336.
22. Sawers, G. and A. Böck. 1989. Novel transcriptional control of the pyruvate formate-lyase gene: upstream regulatory sequences and multiple promoters regulate anaerobic expression. J. Bacteriol. 171:2485-2498.
23. Snoep, J.L., M.J.T. de Mattos, M.J.C. Starrenburg and J. Hugenholtz. 1992. Isolation, characterization, and physiological role of the pyruvate dehydrogenase complex and α-acetolactate synthase of Lactococcus lactis subsp. lactis bv. diacetylactis. J. Bacteriol. 174:4838-4841.
24. Suppmann, B. and G. Sawers. 1994. Isolation and characterization of hypophosphite-resistant mutants of E. coli: identification of the FocA protein, encoded by the pfl operon, as a putative formate transporter. Mol. Microbiol. 11:965-982.
25. Takahashi, S., K. Abbe and T. Yamada. 1982. Purification of pyruvate formate-lyase from Streptococcus mutans and its regulatory properties. J. Bacteriol. 149:1034-1040.
26. Varenne, S., F. Casse, M. Chippaux and M.C. Pascal. 1975. A mutant of Escherichia coli deficient in pyruvate formate-lyase. Mol. Gen. Genet. 141:181-184.
27. de Vos, W.M. and G. Simons. 1994. Gene cloning and expression systems in lactococci. In: Gasson, M. and W. de Vos (eds), Genetics and biotechnology of lactic acid bacteria, Chapman and Hall, London, pp. 52-105.
28. Weidner, G. and G. Sawers. 1996. Molecular characterization of the genes encoding formate-lyase and its activating enzyme of Clostridium pasteurianum. J. Bacteriol. 178:2440-2444.
29. Westerfeld, W.W. 1945. A colorimetric determination of blood acetoin. J. Biol. Chem. 161:495-502.
30. Wong, K.K., K.L. Suen and H.S. Kwan. 1989. Transcription of pfl is regulated by anaerobiosis, catabolite repression, pyruvate, and oxrA: pfl::Mu dA operon fusions of Salmonella typhimurium. J. Bacteriol. 171:4900-4905.
31. Yamamoto, Y., Y. Sato, S. Takahashi-Abbe, K. Abbe, T. Yamada and H. Kizaki. 1996. Cloning and sequence analysis of the pfl gene encoding pyruvate formate-lyase from Streptococcus mutans. Infect. Immun. 64:385-391.
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM
(PCT Rule 13bis)
A. The indications made below relate to the microorganism referred to in the description on page 30, lines 9-12
B. IDENTIFICATION OF DEPOSIT Further deposits are identified on an additional sheet [X]
Name of depositary institution
DSM-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH
Address of depositary institution (including postal code and country)
Mascheroder Weg 1B, D-38124 Braunschweig, Germany
Date of deposit: 18 July 1996
Accession Number: DSM 11093
C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet V~
As regards the respective Patent Offices of the respective designated states, the applicants request that a sample of the deposited microorganisms only be made available to an expert nominated by the requester until the date on which the patent is granted or the date on which the application has been refused or withdrawn or is deemed to be withdrawn.
D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)
E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)
The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")
Figure imgf000115_0001
INDICATIONS RELATING TO DEPOSITED MICROORGANISMS
(PCT Rule 13bis)
Additional sheet
In addition to the microorganism indicated on page 113 of the description, the following microorganisms have been deposited with
DSM-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH
Mascheroder Weg 1B, D-38124 Braunschweig, Germany
on the dates and under the accession numbers as stated below:
Figure imgf000116_0001
For all of the above-identified deposited microorganisms, the following additional indications apply:
As regards the respective Patent Offices of the respective designated states, the applicants request that a sample of the deposited microorganisms stated above only be made available to an expert nominated by the requester until the date on which the patent is granted or the date on which the application has been refused or withdrawn or is deemed to be withdrawn.
SEQUENCE LISTING
(1) GENERAL INFORMATION
(i) APPLICANT: Bioteknologisk Institut
(ii) TITLE OF THE INVENTION: Metabolically engineered lactic acid bacteria and means for providing same
(iii) NUMBER OF SEQUENCES: 44
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ for Windows Version 2.0
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/701,458
(B) FILING DATE: 22-AUG-1996
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: PLOUGMANN, VINGTOFT & PARTNERS A/S
(B) REGISTRATION NUMBER:
(C) REFERENCE/DOCKET NUMBER: 18383 PC 1
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: +45 33 63 93 00
(B) TELEFAX: +45 33 63 96 00
(C) TELEX:
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2088 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 792...2069
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GATCTGTCCT TAGTACGAGA GGACCGGGAT GGACTTACCG CTGGTGTACC AGTTGTTCCG 60
CCAGAGCACG GCTGGATAGC TATGTAGGGA AGGGATAAGC GCTGAAAGCA TCTAAGTGCG 120
AAGCCACCTC AAGATGAGAT TACCCATTCG AGAATTAAGA GCCCAGAGAG ATGATCAAGA 180
TGTCAATAAT TTGCAAAAAA TCTTCTTTCA GCAAAACGGG ATTTGAGTTT TTGCTCGATT 240
TGTGGGAATT TAACAGAAAG TGATCTGTTG AAATCGCAAG CCCTCTCGGT GTACTTGCTG 300
GTATCGTTCC AACGACTAAT CCAACATCAA CAGCAATCTT TAAATCTTTA TTGACTGCAA 360
AAACACGTAA TGCTATTGTT TTCGCTTTCC ACCCTCAAGC TCAAAAATGT TCAAGCCATG 420
CAGCAAAAAT TGTTTACGAT GCTGCAATTG AAGCTGGTGC ACCGGAAGAC TTTATTCAAT 480
GGATTGAAGT ACCAAGCCTT GACATGACTA CCGCCTTGAT TCAAAACCGT GGACTTGCAA 540
CAATCCTTGC AACTGGTGGC CCAGGAATGG TAAACGCCGC ACTCAAATCT GGTAACCCTT 600
CACTCGGTGT TGGAGCTGGT AATGGTGCTG TTTATGTTGA TGCAACTGCA AATATTGAAC 660
GTGCCGTTGA AGACCTTTTG CTTTCAAAAC GTTTTGATAA TGGGATGATT TGTGCCACTG 720
AAAATTCAGC TGTTATTGAT GCTTCAGTTT ATGATGAATT TATTGCTAAA ATGCAAGAAC 780
AAGGCGCTTA T ATG GTT CCT AAA AAA GAC TAC AAA GCT ATT GAA AGT TTC 830
Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu Ser Phe
1 5 10
GTT TTT GTT GAA CGT GCT GGT GAA GGT TTT GGA GTA ACT GGT CCT GTT 878
Val Phe Val Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly Pro Val
15 20 25
GCC GGT CGT TCT GGT CAA TGG ATT GCT GAA CAA GCT GGT GTC AAA GTT 926
Ala Gly Arg Ser Gly Gln Trp Ile Ala Glu Gln Ala Gly Val Lys Val
30 35 40 45
CCT AAA GAT AAA GAT GTC CTT CTT TTT GAA CTT GAT AAG AAA AAT ATT 974
Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys Asn Ile
50 55 60
GGT GAA GCA CTT TCT TCT GAA AAA CTT TCT CCT TTG CTT TCA ATC TAC 1022
Gly Glu Ala Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser Ile Tyr
65 70 75
AAA GCT GAA ACA CGT GAA GAA GGA ATT GAG ATT GTA CGT AGC TTA CTT 1070
Lys Ala Glu Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser Leu Leu
80 85 90
GCT TAT CAA GGT GCT GGA CAT AAT GCT GCA ATT CAA ATC GGT GCA ATG 1118
Ala Tyr Gln Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly Ala Met
95 100 105
GAT GAT CCA TTC GTT AAA GAA TAT GGC GAA AAA GTT GAA GCT TCT CGT 1166
Asp Asp Pro Phe Val Lys Glu Tyr Gly Glu Lys Val Glu Ala Ser Arg
110 115 120 125
ATC CTC GTT AAC CAA CCA GAT TCT ATT GGT GGG GTC GGA GAT ATC TAT 1214
Ile Leu Val Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp Ile Tyr
130 135 140
ACT GAT GCA ATG CGT CCA TCA CTT ACA CTT GGA ACT GGT TCA TGG GGG 1262
Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser Trp Gly
145 150 155
AAA AAT TCA CTT TCA CAC AAT TTG AGT ACA TAC GAT CTA TTG AAT GTT 1310
Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu Asn Val
160 165 170
AAA ACA GTG GCT AAA CGT CGT AAT CGC CCA CAA TGG GTT CGT TTG CCA 1358
Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg Leu Pro
175 180 185
AAA GAA ATT TAC TAC GAA AAA AAT GCA ATT TCT TAC TTA CAA GAA TTG 1406
Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln Glu Leu
190 195 200 205
CCA CAC GTC CAC AAA GCT TTC ATC GTT GCT GAC CCT GGT ATG GTT AAA 1454
Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met Val Lys
210 215 220
TTT GGT TTC GTT GAT AAA GTT TTG GAA CAA CTT GCT ATC CGC CCA ACT 1502
Phe Gly Phe Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg Pro Thr
225 230 235
CAA GTT GAA ACA AGC ATT TAT GGC TCT GTT CAA CCT GAC CCA ACT TTG 1550
Gln Val Glu Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro Thr Leu
240 245 250
AGC GAA GCA ATT GCA ATC GCT CGT CAA ATG AAA CAA TTT GAA CCT GAC 1598
Ser Glu Ala Ile Ala Ile Ala Arg Gln Met Lys Gln Phe Glu Pro Asp
255 260 265
ACT GTC ATC TGT CTT GGT GGT GGT TCT GCT CTC GAT GCC GGT AAG ATT 1646
Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly Lys Ile
270 275 280 285
GGT CGT TTG ATT TAT GAA TAT GAT GCT CGT GGT GAA GCT GAC CTT TCT 1694
Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp Leu Ser
290 295 300
GAT GAT GCA AGT TTG AAA GAA CTT TTC CAA GAA TTA GCT CAA AAA TTT 1742
Asp Asp Ala Ser Leu Lys Glu Leu Phe Gln Glu Leu Ala Gln Lys Phe
305 310 315
GTC GAT ATT CGT AAA CGT ATT ATT AAA TTC TAC CAT CCA CAT AAA GCA 1790
Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His Lys Ala
320 325 330
CAA ATG GTT GCA ATT CCT ACT ACT TCT GGT ACT GGT TCT GAA GTG ACT 1838
Gln Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu Val Thr
335 340 345
CCA TTT GCA GTT ATC ACT GAT GAT GAA ACT CAT GTT AAG TAC CCA CTT 1886
Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr Pro Leu
350 355 360 365
GCT GAC TAC CAA TTA ACA CCA CAA GTT GCC ATT GTT GAC CCT GAG TTT 1934
Ala Asp Tyr Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro Glu Phe
370 375 380
GTT ATG ACT GTA CCA AAA CGT ACT GTT TCT TGG TCT GGT ATT GAT GCG 1982
Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile Asp Ala
385 390 395
ATG TCA CAC GCG CTT GAA TCT TAC GTT TCT GTT ATG TCT TCT GAC TAT 2030
Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser Asp Tyr
400 405 410
ACA AAA CCA ATT TCA CTT CAA GCG ATC CCG GGT CTA GAT TAGGGTAACT TT 2081
Thr Lys Pro Ile Ser Leu Gln Ala Ile Pro Gly Leu Asp
415 420 425
GAAAGGA 2088
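The feature coordinates and the translation in this listing can be cross-checked with a few lines of code. The sketch below assumes the standard genetic code; it confirms that the stated coding locations (792...2069 for SEQ ID NO: 1 and 145...2853 for SEQ ID NO: 3) correspond to whole numbers of codons matching the stated protein lengths, and it translates the first coding row of SEQ ID NO: 1. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption:
# the listing does not name a translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) into one-letter amino acids."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def cds_codons(start, end):
    """Number of codons in a 1-based, inclusive coding location."""
    length = end - start + 1
    if length % 3:
        raise ValueError("coding region length is not a multiple of three")
    return length // 3


# First coding row of SEQ ID NO: 1 as printed (positions 792 to 830).
first_row = "ATG GTT CCT AAA AAA GAC TAC AAA GCT ATT GAA AGT TTC"

print(cds_codons(792, 2069))   # codons in the SEQ ID NO: 1 coding feature
print(cds_codons(145, 2853))   # codons in the SEQ ID NO: 3 coding feature
print(translate(first_row))
```

The codon counts agree with the lengths given for SEQ ID NO: 2 (426 amino acids) and SEQ ID NO: 4 (903 amino acids), and the translated row reads Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu Ser Phe in three-letter code.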
(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 426 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS : single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu Ser Phe Val Phe Val
1 5 10 15
Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly Pro Val Ala Gly Arg
20 25 30
Ser Gly Gln Trp Ile Ala Glu Gln Ala Gly Val Lys Val Pro Lys Asp
35 40 45
Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys Asn Ile Gly Glu Ala
50 55 60
Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser Ile Tyr Lys Ala Glu
65 70 75 80
Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser Leu Leu Ala Tyr Gln
85 90 95
Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly Ala Met Asp Asp Pro
100 105 110
Phe Val Lys Glu Tyr Gly Glu Lys Val Glu Ala Ser Arg Ile Leu Val
115 120 125
Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp Ile Tyr Thr Asp Ala
130 135 140
Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser Trp Gly Lys Asn Ser
145 150 155 160
Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu Asn Val Lys Thr Val
165 170 175
Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg Leu Pro Lys Glu Ile
180 185 190
Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln Glu Leu Pro His Val
195 200 205
His Lys Ala Phe Ile Val Ala Asp Pro Gly Met Val Lys Phe Gly Phe
210 215 220
Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg Pro Thr Gln Val Glu
225 230 235 240
Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro Thr Leu Ser Glu Ala
245 250 255
Ile Ala Ile Ala Arg Gln Met Lys Gln Phe Glu Pro Asp Thr Val Ile
260 265 270
Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly Lys Ile Gly Arg Leu
275 280 285
Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp Leu Ser Asp Asp Ala
290 295 300
Ser Leu Lys Glu Leu Phe Gln Glu Leu Ala Gln Lys Phe Val Asp Ile
305 310 315 320
Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His Lys Ala Gln Met Val
325 330 335
Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu Val Thr Pro Phe Ala
340 345 350
Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr Pro Leu Ala Asp Tyr
355 360 365
Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro Glu Phe Val Met Thr
370 375 380
Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile Asp Ala Met Ser His
385 390 395 400
Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser Asp Tyr Thr Lys Pro
405 410 415
Ile Ser Leu Gln Ala Ile Pro Gly Leu Asp
420 425
(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3185 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 145...2853
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
AAGCTTGTTA CAAAACCGTT TTCTAAACTT TTGATGAGTG TTTTTGTAAA AACTATCACA 60
ATATTGCTTG ACATCTATAA AAAACTTTGT TAAACTATTC ACGTAAAAGA AAGTGAATGA 120
AGTCACAAAG GAGAACCTAC AAAT ATG GCA ACT AAA AAA GCC GCT CCA GCT 171
Met Ala Thr Lys Lys Ala Ala Pro Ala
1 5
GCA AAG AAA GTT TTA AGC GCT GAA GAA AAA GCC GCA AAA TTC CAA GAA 219
Ala Lys Lys Val Leu Ser Ala Glu Glu Lys Ala Ala Lys Phe Gln Glu
10 15 20 25
GCT GTT GCT TAT ACT GAC AAA TTA GTC AAA AAA GCA CAA GCT GCT GTT 267
Ala Val Ala Tyr Thr Asp Lys Leu Val Lys Lys Ala Gln Ala Ala Val
30 35 40
CTT AAA TTT GAA GGA TAT ACA CAA ACT CAA GTC GAT ACT ATT GTC GCT 315
Leu Lys Phe Glu Gly Tyr Thr Gln Thr Gln Val Asp Thr Ile Val Ala
45 50 55
GCA ATG GCT CTT GCA GCA AGC AAA CAT TCT CTA GAA CTC GCT CAT GAA 363
Ala Met Ala Leu Ala Ala Ser Lys His Ser Leu Glu Leu Ala His Glu
60 65 70
GCC GTT AAC GAA ACT GGT CGT GGT GTT GTC GAA GAC AAA GAT ACC AAA 411
Ala Val Asn Glu Thr Gly Arg Gly Val Val Glu Asp Lys Asp Thr Lys
75 80 85
AAC CAC TTT GCT TCT GAA TCT GTT TAT AAC GCA ATT AAA AAT GAC AAA 459
Asn His Phe Ala Ser Glu Ser Val Tyr Asn Ala Ile Lys Asn Asp Lys
90 95 100 105
ACT GTT GGT GTC ATT TCT GAA AAC AAG GTT GCT GGA TCT GTT GAA ATC 507
Thr Val Gly Val Ile Ser Glu Asn Lys Val Ala Gly Ser Val Glu Ile
110 115 120
GCA AGC CCT CTC GGT GTA CTT GCT GGT ATC GTT CCA ACG ACT AAT CCA 555
Ala Ser Pro Leu Gly Val Leu Ala Gly Ile Val Pro Thr Thr Asn Pro
125 130 135
ACA TCA ACA GCA ATC TTT AAA TCT TTA TTG ACT GCA AAA ACA CGT AAT 603
Thr Ser Thr Ala Ile Phe Lys Ser Leu Leu Thr Ala Lys Thr Arg Asn
140 145 150
GCT ATT GTT TTC GCT TTC CAC CCT CAA GCT CAA AAA TGT TCA AGC CAT 651
Ala Ile Val Phe Ala Phe His Pro Gln Ala Gln Lys Cys Ser Ser His
155 160 165
GCA GCA AAA ATT GTT TAC GAT GCT GCA ATT GAA GCT GGT GCA CCG GAA 699
Ala Ala Lys Ile Val Tyr Asp Ala Ala Ile Glu Ala Gly Ala Pro Glu
170 175 180 185
GAC TTT ATT CAA TGG ATT GAA GTA CCA AGC CTT GAC ATG ACT ACC GCC 747
Asp Phe Ile Gln Trp Ile Glu Val Pro Ser Leu Asp Met Thr Thr Ala
190 195 200
TTG ATT CAA AAC CGT GGA CTT GCA ACA ATC CTT GCA ACT GGT GGC CCA 795
Leu Ile Gln Asn Arg Gly Leu Ala Thr Ile Leu Ala Thr Gly Gly Pro
205 210 215
GGA ATG GTA AAC GCC GCA CTC AAA TCT GGT AAC CCT TCA CTC GGT GTT 843
Gly Met Val Asn Ala Ala Leu Lys Ser Gly Asn Pro Ser Leu Gly Val
220 225 230
GGA GCT GGT AAT GGT GCT GTT TAT GTT GAT GCA ACT GCA AAT ATT GAA 891
Gly Ala Gly Asn Gly Ala Val Tyr Val Asp Ala Thr Ala Asn Ile Glu
235 240 245
CGT GCC GTT GAA GAC CTT TTG CTT TCA AAA CGT TTT GAT AAT GGG ATG 939
Arg Ala Val Glu Asp Leu Leu Leu Ser Lys Arg Phe Asp Asn Gly Met
250 255 260 265
ATT TGT GCC ACT GAA AAT TCA GCT GTT ATT GAT GCT TCA GTT TAT GAT 987
Ile Cys Ala Thr Glu Asn Ser Ala Val Ile Asp Ala Ser Val Tyr Asp
270 275 280
GAA TTT ATT GCT AAA ATG CAA GAA CAA GGC GCT TAT ATG GTT CCT AAA 1035
Glu Phe Ile Ala Lys Met Gln Glu Gln Gly Ala Tyr Met Val Pro Lys
285 290 295
AAA GAC TAC AAA GCT ATT GAA AGT TTC GTT TTT GTT GAA CGT GCT GGT 1083
Lys Asp Tyr Lys Ala Ile Glu Ser Phe Val Phe Val Glu Arg Ala Gly
300 305 310
GAA GGT TTT GGA GTA ACT GGT CCT GTT GCC GGT CGT TCT GGT CAA TGG 1131
Glu Gly Phe Gly Val Thr Gly Pro Val Ala Gly Arg Ser Gly Gln Trp
315 320 325
ATT GCT GAA CAA GCT GGT GTC AAA GTT CCT AAA GAT AAA GAT GTC CTT 1179
Ile Ala Glu Gln Ala Gly Val Lys Val Pro Lys Asp Lys Asp Val Leu
330 335 340 345
CTT TTT GAA CTT GAT AAG AAA AAT ATT GGT GAA GCA CTT TCT TCT GAA 1227
Leu Phe Glu Leu Asp Lys Lys Asn Ile Gly Glu Ala Leu Ser Ser Glu
350 355 360
AAA CTT TCT CCT TTG CTT TCA ATC TAC AAA GCT GAA ACA CGT GAA GAA 1275
Lys Leu Ser Pro Leu Leu Ser Ile Tyr Lys Ala Glu Thr Arg Glu Glu
365 370 375
GGA ATT GAG ATT GTA CGT AGC TTA CTT GCT TAT CAA GGT GCT GGA CAT 1323
Gly Ile Glu Ile Val Arg Ser Leu Leu Ala Tyr Gln Gly Ala Gly His
380 385 390
AAT GCT GCA ATT CAA ATC GGT GCA ATG GAT GAT CCA TTC GTT AAA GAA 1371
Asn Ala Ala Ile Gln Ile Gly Ala Met Asp Asp Pro Phe Val Lys Glu
395 400 405
TAT GGC GAA AAA GTT GAA GCT TCT CGT ATC CTC GTT AAC CAA CCA GAT 1419
Tyr Gly Glu Lys Val Glu Ala Ser Arg Ile Leu Val Asn Gln Pro Asp
410 415 420 425
TCT ATT GGT GGG GTC GGA GAT ATC TAT ACT GAT GCA ATG CGT CCA TCA 1467
Ser Ile Gly Gly Val Gly Asp Ile Tyr Thr Asp Ala Met Arg Pro Ser
430 435 440
CTT ACA CTT GGA ACT GGT TCA TGG GGG AAA AAT TCA CTT TCA CAC AAT 1515
Leu Thr Leu Gly Thr Gly Ser Trp Gly Lys Asn Ser Leu Ser His Asn
445 450 455
TTG AGT ACA TAC GAT CTA TTG AAT GTT AAA ACA GTG GCT AAA CGT CGT 1563
Leu Ser Thr Tyr Asp Leu Leu Asn Val Lys Thr Val Ala Lys Arg Arg
460 465 470
AAT CGC CCA CAA TGG GTT CGT TTG CCA AAA GAA ATT TAC TAC GAA AAA 1611
Asn Arg Pro Gln Trp Val Arg Leu Pro Lys Glu Ile Tyr Tyr Glu Lys
475 480 485
AAT GCA ATT TCT TAC TTA CAA GAA TTG CCA CAC GTC CAC AAA GCT TTC 1659
Asn Ala Ile Ser Tyr Leu Gln Glu Leu Pro His Val His Lys Ala Phe
490 495 500 505
ATC GTT GCT GAC CCT GGT ATG GTT AAA TTT GGT TTC GTT GAT AAA GTT 1707
Ile Val Ala Asp Pro Gly Met Val Lys Phe Gly Phe Val Asp Lys Val
510 515 520
TTG GAA CAA CTT GCT ATC CGC CCA ACT CAA GTT GAA ACA AGC ATT TAT 1755
Leu Glu Gln Leu Ala Ile Arg Pro Thr Gln Val Glu Thr Ser Ile Tyr
525 530 535
GGC TCT GTT CAA CCT GAC CCA ACT TTG AGC GAA GCA ATT GCA ATC GCT 1803
Gly Ser Val Gln Pro Asp Pro Thr Leu Ser Glu Ala Ile Ala Ile Ala
540 545 550
CGT CAA ATG AAA CAA TTT GAA CCT GAC ACT GTC ATC TGT CTT GGT GGT 1851
Arg Gln Met Lys Gln Phe Glu Pro Asp Thr Val Ile Cys Leu Gly Gly
555 560 565
GGT TCT GCT CTC GAT GCC GGT AAG ATT GGT CGT TTG ATT TAT GAA TAT 1899
Gly Ser Ala Leu Asp Ala Gly Lys Ile Gly Arg Leu Ile Tyr Glu Tyr
570 575 580 585
GAT GCT CGT GGT GAA GCT GAC CTT TCT GAT GAT GCA AGT TTG AAA GAA 1947
Asp Ala Arg Gly Glu Ala Asp Leu Ser Asp Asp Ala Ser Leu Lys Glu
590 595 600
CTT TTC CAA GAA TTA GCT CAA AAA TTT GTC GAT ATT CGT AAA CGT ATT 1995
Leu Phe Gln Glu Leu Ala Gln Lys Phe Val Asp Ile Arg Lys Arg Ile
605 610 615
ATT AAA TTC TAC CAT CCA CAT AAA GCA CAA ATG GTT GCA ATT CCT ACT 2043
Ile Lys Phe Tyr His Pro His Lys Ala Gln Met Val Ala Ile Pro Thr
620 625 630
ACT TCT GGT ACT GGT TCT GAA GTG ACT CCA TTT GCA GTT ATC ACT GAT 2091
Thr Ser Gly Thr Gly Ser Glu Val Thr Pro Phe Ala Val Ile Thr Asp
635 640 645
GAT GAA ACT CAT GTT AAG TAC CCA CTT GCT GAC TAC CAA TTA ACA CCA 2139
Asp Glu Thr His Val Lys Tyr Pro Leu Ala Asp Tyr Gln Leu Thr Pro
650 655 660 665
CAA GTT GCC ATT GTT GAC CCT GAG TTT GTT ATG ACT GTA CCA AAA CGT 2187
Gln Val Ala Ile Val Asp Pro Glu Phe Val Met Thr Val Pro Lys Arg
670 675 680
ACT GTT TCT TGG TCT GGT ATT GAT GCG ATG TCA CAC GCG CTT GAA TCT 2235
Thr Val Ser Trp Ser Gly Ile Asp Ala Met Ser His Ala Leu Glu Ser
685 690 695
TAC GTT TCT GTT ATG TCT TCT GAC TAT ACA AAA CCA ATT TCA CTT CAA 2283
Tyr Val Ser Val Met Ser Ser Asp Tyr Thr Lys Pro Ile Ser Leu Gln
700 705 710
GCG ATC AAA CTT ATC TTT GAA AAC TTG ACT GAG TCT TAT CAT TAT GAC 2331
Ala Ile Lys Leu Ile Phe Glu Asn Leu Thr Glu Ser Tyr His Tyr Asp
715 720 725
CCA GCG CAT CCA ACT AAA GAA GGA CAA AAA GCC CGC GAA AAC ATG CAC 2379
Pro Ala His Pro Thr Lys Glu Gly Gln Lys Ala Arg Glu Asn Met His
730 735 740 745
AAT GCT GCA ACA CTC GCT GGT ATG GCC TTC GCT AAT GCT TTC CTT GGA 2427
Asn Ala Ala Thr Leu Ala Gly Met Ala Phe Ala Asn Ala Phe Leu Gly
750 755 760
ATT AAC CAC TCA CTT GCT CAT AAA ATT GGT GGT GAA TTT GGA CTT CCT 2475
Ile Asn His Ser Leu Ala His Lys Ile Gly Gly Glu Phe Gly Leu Pro
765 770 775
CAT GGT CTT GCC ATT GCC ATC GCT ATG CCA CAT GTC ATT AAA TTT AAC 2523
His Gly Leu Ala Ile Ala Ile Ala Met Pro His Val Ile Lys Phe Asn
780 785 790
GCT GTA ACA GGA AAC GTT AAA CGT ACC CCT TAC CCA CGT TAT GAA ACA 2571
Ala Val Thr Gly Asn Val Lys Arg Thr Pro Tyr Pro Arg Tyr Glu Thr
795 800 805
TAT CGT GCT CAA GAG GAC TAC GCT GAA ATT TCA CGC TTC ATG GGA TTT 2619
Tyr Arg Ala Gln Glu Asp Tyr Ala Glu Ile Ser Arg Phe Met Gly Phe
810 815 820 825
GCT GGT AAA GAT GAT TCA GAT GAA AAA GCT GTG CAA GCT CTG GTT GCT 2667
Ala Gly Lys Asp Asp Ser Asp Glu Lys Ala Val Gln Ala Leu Val Ala
830 835 840
GAA CTT AAG AAA CTG ACT GAT AGC ATT GAT ATT AAT ATC ACC CTT TCA 2715
Glu Leu Lys Lys Leu Thr Asp Ser Ile Asp Ile Asn Ile Thr Leu Ser
845 850 855
GGA AAT GGT ATC GAT AAA GCT CAC CTT GAA CGT GAA CTT GAT AAA TTG 2763
Gly Asn Gly Ile Asp Lys Ala His Leu Glu Arg Glu Leu Asp Lys Leu
860 865 870
GCT GAC CTT GTT TAT GAT GAT CAA TGT ACT CCT GCT AAT CCT CGT CAA 2811
Ala Asp Leu Val Tyr Asp Asp Gln Cys Thr Pro Ala Asn Pro Arg Gln
875 880 885
CCA AGA ATT GAT GAG ATT AAA CAG TTG TTG TTA GAT CAA TAC TAATAATCT 2862
Pro Arg Ile Asp Glu Ile Lys Gln Leu Leu Leu Asp Gln Tyr
890 895 900
GTTGATAAAA TTATTAAAAC GCTCTGATCA GAGCATTTTT TATTATAGCT TATACAACTA 2922
TCAAAAGGTA TAAATCAATT TCGATATAGG CTCTTTTCAC TCCATTGATT TATGCATTTC 2982
TATAAAAATC AATAATTAAT TAGCGATAGA AGTCGAGTTC ATGCATGCTA ATAATGAAAT 3042
TGTTTTAAAT TCTGGTTTTT CTTTATGTTC TTTGCGAACA TCTTTCACAG TTTCTTTGTT 3102
CATGAAAATT CCTCCTTATT ATGGTACTAT TTTGAGCCCA AATAGTTATA TAAGAATCCT 3162
AAACTTCGGA TATCTTATCA AAG 3185
(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 903 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
Met Ala Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala
1 5 10 15
Glu Glu Lys Ala Ala Lys Phe Gln Glu Ala Val Ala Tyr Thr Asp Lys
20 25 30
Leu Val Lys Lys Ala Gln Ala Ala Val Leu Lys Phe Glu Gly Tyr Thr
35 40 45
Gln Thr Gln Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser
50 55 60
Lys His Ser Leu Glu Leu Ala His Glu Ala Val Asn Glu Thr Gly Arg
65 70 75 80
Gly Val Val Glu Asp Lys Asp Thr Lys Asn His Phe Ala Ser Glu Ser
85 90 95
Val Tyr Asn Ala Ile Lys Asn Asp Lys Thr Val Gly Val Ile Ser Glu
100 105 110
Asn Lys Val Ala Gly Ser Val Glu Ile Ala Ser Pro Leu Gly Val Leu
115 120 125
Ala Gly Ile Val Pro Thr Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys
130 135 140
Ser Leu Leu Thr Ala Lys Thr Arg Asn Ala Ile Val Phe Ala Phe His
145 150 155 160
Pro Gln Ala Gln Lys Cys Ser Ser His Ala Ala Lys Ile Val Tyr Asp
165 170 175
Ala Ala Ile Glu Ala Gly Ala Pro Glu Asp Phe Ile Gln Trp Ile Glu
180 185 190
Val Pro Ser Leu Asp Met Thr Thr Ala Leu Ile Gln Asn Arg Gly Leu
195 200 205
Ala Thr Ile Leu Ala Thr Gly Gly Pro Gly Met Val Asn Ala Ala Leu
210 215 220
Lys Ser Gly Asn Pro Ser Leu Gly Val Gly Ala Gly Asn Gly Ala Val
225 230 235 240
Tyr Val Asp Ala Thr Ala Asn Ile Glu Arg Ala Val Glu Asp Leu Leu
245 250 255
Leu Ser Lys Arg Phe Asp Asn Gly Met Ile Cys Ala Thr Glu Asn Ser
260 265 270
Ala Val Ile Asp Ala Ser Val Tyr Asp Glu Phe Ile Ala Lys Met Gln
275 280 285
Glu Gln Gly Ala Tyr Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu
290 295 300
Ser Phe Val Phe Val Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly
305 310 315 320
Pro Val Ala Gly Arg Ser Gly Gln Trp Ile Ala Glu Gln Ala Gly Val
325 330 335
Lys Val Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys
340 345 350
Asn Ile Gly Glu Ala Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser
355 360 365
Ile Tyr Lys Ala Glu Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser
370 375 380
Leu Leu Ala Tyr Gln Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly
385 390 395 400
Ala Met Asp Asp Pro Phe Val Lys Glu Tyr Gly Glu Lys Val Glu Ala
405 410 415
Ser Arg Ile Leu Val Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp
420 425 430
Ile Tyr Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser
435 440 445
Trp Gly Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu
450 455 460
Asn Val Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg
465 470 475 480
Leu Pro Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln
485 490 495
Glu Leu Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met
500 505 510
Val Lys Phe Gly Phe Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg
515 520 525
Pro Thr Gln Val Glu Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro
530 535 540
Thr Leu Ser Glu Ala Ile Ala Ile Ala Arg Gln Met Lys Gln Phe Glu
545 550 555 560
Pro Asp Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly
565 570 575
Lys Ile Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp
580 585 590
Leu Ser Asp Asp Ala Ser Leu Lys Glu Leu Phe Gln Glu Leu Ala Gln
595 600 605
Lys Phe Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His
610 615 620
Lys Ala Gln Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu
625 630 635 640
Val Thr Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr
645 650 655
Pro Leu Ala Asp Tyr Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro
660 665 670
Glu Phe Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile
675 680 685
Asp Ala Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser
690 695 700
Asp Tyr Thr Lys Pro Ile Ser Leu Gln Ala Ile Lys Leu Ile Phe Glu
705 710 715 720
Asn Leu Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu
725 730 735
Gly Gln Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly
740 745 750
Met Ala Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His
755 760 765
Lys Ile Gly Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile
770 775 780
Ala Met Pro His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys
785 790 795 800
Arg Thr Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gln Glu Asp Tyr
805 810 815
Ala Glu Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Asp Asp Ser Asp
820 825 830
Glu Lys Ala Val Gln Ala Leu Val Ala Glu Leu Lys Lys Leu Thr Asp
835 840 845
Ser Ile Asp Ile Asn Ile Thr Leu Ser Gly Asn Gly Ile Asp Lys Ala
850 855 860
His Leu Glu Arg Glu Leu Asp Lys Leu Ala Asp Leu Val Tyr Asp Asp
865 870 875 880
Gln Cys Thr Pro Ala Asn Pro Arg Gln Pro Arg Ile Asp Glu Ile Lys
885 890 895
Gln Leu Leu Leu Asp Gln Tyr
900
(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 835 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
Met Ala Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala
1 5 10 15
Glu Glu Lys Ala Ala Lys Phe Gln Glu Ala Val Ala Tyr Thr Asp Lys
20 25 30
Leu Val Lys Lys Ala Gln Ala Ala Val Leu Lys Phe Glu Gly Tyr Thr
35 40 45
Gln Thr Gln Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser
50 55 60
Lys His Ser Leu Glu Leu Ala His Glu Ala Val Asn Glu Thr Gly Arg
65 70 75 80
Gly Val Val Glu Asp Lys Asp Thr Lys Asn His Phe Ala Ser Glu Ser
85 90 95
Val Tyr Asn Ala Ile Lys Asn Asp Lys Thr Val Gly Val Ile Ser Glu
100 105 110
Asn Lys Val Ala Gly Ser Val Glu Ile Ala Ser Pro Leu Gly Val Leu
115 120 125
Ala Gly Ile Val Pro Thr Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys
130 135 140
Ser Leu Leu Thr Ala Lys Thr Arg Asn Ala Ile Val Phe Ala Phe His
145 150 155 160
Pro Gln Ala Gln Lys Cys Ser Ser His Ala Ala Lys Ile Val Tyr Asp
165 170 175
Ala Ala Ile Glu Ala Gly Ala Pro Glu Asp Phe Ile Gln Trp Ile Glu
180 185 190
Val Pro Ser Leu Asp Met Thr Thr Ala Leu Ile Gln Asn Arg Gly Leu
195 200 205
Ala Thr Ile Leu Ala Thr Gly Gly Pro Gly Met Val Asn Ala Ala Leu
210 215 220
Lys Ser Gly Asn Pro Ser Leu Gly Val Gly Ala Gly Asn Gly Ala Val
225 230 235 240
Tyr Val Asp Ala Thr Ala Asn Ile Glu Arg Ala Val Glu Asp Leu Leu
245 250 255
Leu Ser Lys Arg Phe Asp Asn Gly Met Ile Cys Ala Thr Glu Asn Ser
260 265 270
Ala Val Ile Asp Ala Ser Val Tyr Asp Glu Phe Ile Ala Lys Met Gln
275 280 285
Glu Gln Gly Ala Tyr Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu
290 295 300
Ser Phe Val Phe Val Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly
305 310 315 320
Pro Val Ala Gly Arg Ser Gly Gln Trp Ile Ala Glu Gln Ala Gly Val
325 330 335
Lys Val Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys
340 345 350
Asn Ile Gly Glu Ala Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser
355 360 365
Ile Tyr Lys Ala Glu Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser
370 375 380
Leu Leu Ala Tyr Gln Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly
385 390 395 400
Ala Met Asp Asp Pro Phe Val Lys Glu Tyr Gly Glu Lys Val Glu Ala
405 410 415
Ser Arg Ile Leu Val Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp
420 425 430
Ile Tyr Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser
435 440 445
Trp Gly Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu
450 455 460
Asn Val Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg
465 470 475 480
Leu Pro Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln
485 490 495
Glu Leu Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met
500 505 510
Val Lys Phe Gly Phe Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg
515 520 525
Pro Thr Gln Val Glu Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro
530 535 540
Thr Leu Ser Glu Ala Ile Ala Ile Ala Arg Gln Met Lys Gln Phe Glu
545 550 555 560
Pro Asp Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly
565 570 575
Lys Ile Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp
580 585 590
Leu Ser Asp Asp Ala Ser Leu Lys Glu Leu Phe Gln Glu Leu Ala Gln
595 600 605
Lys Phe Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His
610 615 620
Lys Ala Gln Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu
625 630 635 640
Val Thr Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr
645 650 655
Pro Leu Ala Asp Tyr Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro
660 665 670
Glu Phe Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile
675 680 685
Asp Ala Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser
690 695 700
Asp Tyr Thr Lys Pro Ile Ser Leu Gln Ala Ile Lys Leu Ile Phe Glu
705 710 715 720
Asn Leu Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu
725 730 735
Gly Gln Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly
740 745 750
Met Ala Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His
755 760 765
Lys Ile Gly Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile
770 775 780
Ala Met Pro His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys
785 790 795 800
Arg Thr Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gln Glu Asp Tyr
805 810 815
Ala Glu Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Asp Asp Ser Asp
820 825 830
Glu Lys Ala
835
(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 797 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Ala Val Thr Asn Val Ala Glu Leu Asn Ala Leu Val Glu Arg Val Lys
1 5 10 15
Lys Ala Gln Arg Glu Tyr Ala Ser Phe Thr Gln Glu Gln Val Asp Lys
20 25 30
Ile Phe Arg Ala Ala Ala Leu Ala Ala Ala Asp Ala Arg Ile Pro Leu
35 40 45
Ala Lys Met Ala Val Ala Glu Ser Gly Met Gly Ile Val Glu Asp Lys
50 55 60
Val Ile Lys Asn His Phe Ala Ser Glu Tyr Ile Tyr Asn Ala Tyr Lys
65 70 75 80
Asp Glu Lys Thr Cys Gly Val Leu Ser Glu Asp Asp Thr Phe Gly Thr
85 90 95
Ile Thr Ile Ala Glu Pro Ile Gly Ile Ile Cys Gly Ile Val Pro Thr
100 105 110
Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys Ser Leu Ile Ser Leu Lys
115 120 125
Thr Arg Asn Ala Ile Ile Phe Ser Pro His Pro Arg Ala Lys Asp Ala
130 135 140
Thr Asn Lys Ala Ala Asp Ile Val Leu Gln Ala Ala Ile Ala Ala Gly
145 150 155 160
Ala Pro Lys Asp Leu Ile Gly Trp Ile Asp Gln Pro Ser Val Glu Leu
165 170 175
Ser Asn Ala Leu Met His His Pro Asp Ile Asn Leu Ile Leu Ala Thr
180 185 190
Gly Gly Pro Gly Met Val Lys Ala Ala Tyr Ser Ser Gly Lys Pro Ala
195 200 205
Ile Gly Val Gly Ala Gly Asn Thr Pro Val Val Ile Asp Glu Thr Ala
210 215 220
Asp Ile Lys Arg Ala Val Ala Ser Val Leu Met Ser Lys Thr Phe Asp
225 230 235 240
Asn Gly Val Ile Cys Ala Ser Glu Gln Ser Val Val Val Val Asp Ser
245 250 255
Val Tyr Asp Ala Val Arg Glu Arg Phe Ala Thr His Gly Gly Tyr Leu
260 265 270
Leu Gln Gly Lys Glu Leu Lys Ala Val Gln Asp Val Ile Leu Lys Asn
275 280 285
Gly Ala Leu Asn Ala Ala Ile Val Gly Gln Pro Ala Tyr Lys Ile Ala
290 295 300
Glu Leu Ala Gly Phe Ser Val Pro Glu Asn Thr Lys Ile Leu Ile Gly
305 310 315 320
Glu Val Thr Val Val Asp Glu Ser Glu Pro Phe Ala His Glu Lys Leu
325 330 335
Ser Pro Thr Leu Ala Met Tyr Arg Ala Lys Asp Phe Glu Asp Ala Val
340 345 350
Glu Lys Ala Glu Lys Leu Val Ala Met Gly Gly Ile Gly His Thr Ser
355 360 365
Cys Leu Tyr Thr Asp Gln Asp Asn Gln Pro Ala Arg Val Ser Tyr Phe
370 375 380
Gly Gln Lys Met Lys Thr Ala Arg Ile Leu Ile Asn Thr Pro Ala Ser
385 390 395 400
Gln Gly Gly Ile Gly Asp Leu Tyr Asn Phe Lys Leu Ala Pro Ser Leu
405 410 415
Thr Leu Gly Cys Gly Ser Trp Gly Gly Asn Ser Ile Ser Glu Asn Val
420 425 430
Gly Pro Lys His Leu Ile Asn Lys Lys Thr Val Ala Lys Arg Ala Glu
435 440 445
Asn Met Leu Trp His Lys Leu Pro Lys Ser Ile Tyr Phe Arg Arg Gly
450 455 460
Ser Leu Pro Ile Ala Leu Asp Glu Val Ile Thr Asp Gly His Lys Arg
465 470 475 480
Ala Leu Ile Val Thr Asp Arg Phe Leu Phe Asn Asn Gly Tyr Ala Asp
485 490 495
Gln Ile Thr Ser Val Leu Lys Ala Ala Gly Val Glu Thr Glu Val Phe
500 505 510
Phe Glu Val Glu Ala Asp Pro Thr Leu Ser Ile Val Arg Lys Gly Ala
515 520 525
Glu Leu Ala Asn Ser Phe Lys Pro Asp Val Ile Ile Ala Leu Gly Gly
530 535 540
Gly Ser Pro Met Asp Ala Ala Lys Ile Met Trp Val Met Tyr Glu His
545 550 555 560
Pro Glu Thr His Phe Glu Glu Leu Ala Leu Arg Phe Met Asp Ile Arg
565 570 575
Lys Arg Ile Tyr Lys Phe Pro Lys Met Gly Val Lys Ala Lys Met Ile
580 585 590
Ala Val Thr Thr Thr Ser Gly Thr Gly Ser Glu Val Thr Pro Phe Ala
595 600 605
Val Val Thr Asp Asp Ala Thr Gly Gln Lys Tyr Pro Leu Ala Asp Tyr
610 615 620
Ala Leu Thr Pro Asp Met Ala Ile Val Asp Ala Asn Leu Val Met Asp
625 630 635 640
Met Pro Lys Ser Leu Cys Ala Phe Gly Gly Leu Asp Ala Val Thr His
645 650 655
Ala Met Glu Ala Tyr Val Ser Val Leu Ala Ser Glu Phe Ser Asp Gly
660 665 670
Gln Ala Leu Gln Ala Leu Lys Leu Leu Lys Glu Tyr Leu Pro Ala Ser
675 680 685
Tyr His Glu Gly Ser Lys Asn Pro Val Ala Arg Glu Arg Val His Ser
690 695 700
Ala Ala Thr Ile Ala Gly Ile Ala Phe Ala Asn Ala Phe Leu Gly Val
705 710 715 720
Cys His Ser Met Ala His Lys Leu Gly Ser Gln Phe His Ile Pro His
725 730 735
Gly Leu Ala Asn Ala Leu Leu Ile Cys Asn Val Ile Arg Tyr Asn Ala
740 745 750
Asn Asp Asn Pro Thr Lys Gln Thr Ala Phe Ser Gln Tyr Asp Arg Pro
755 760 765
Gln Ala Arg Arg Arg Tyr Ala Glu Ile Ala Asp His Leu Gly Leu Ser
770 775 780
Ala Pro Gly Asp Arg Thr Ala Ala Lys Ile Glu Lys Leu
785 790 795
(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CTTCTTTGGT TGGATGAGC 19
(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 490 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
Tyr Gln Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly Ala Met Asp
1 5 10 15
Asp Pro Phe Val Lys Glu Tyr Gly Ile Lys Val Glu Ala Ser Arg Ile
20 25 30
Leu Val Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp Ile Tyr Thr
35 40 45
Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser Trp Gly Lys
50 55 60
Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu Asn Val Lys
65 70 75 80
Thr Val Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg Leu Pro Lys
85 90 95
Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln Glu Leu Pro
100 105 110
His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met Val Lys Phe
115 120 125
Gly Phe Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg Pro Thr Gln
130 135 140
Val Glu Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro Thr Leu Ser
145 150 155 160
Glu Ala Ile Ala Ile Ala Arg Gln Met Asn His Phe Glu Pro Asp Thr
165 170 175
Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly Lys Ile Gly
180 185 190
Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp Leu Ser Asp
195 200 205
Asp Ala Ser Leu Lys Glu Ile Phe Gln Glu Leu Ala Gln Lys Phe Val
210 215 220
Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His Lys Ala Gln
225 230 235 240
Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu Val Thr Pro
245 250 255
Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr Pro Leu Ala
260 265 270
Asp Tyr Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro Glu Phe Val
275 280 285
Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile Asp Ala Met
290 295 300
Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser Asp Tyr Thr
305 310 315 320
Lys Pro Ile Ser Leu Gln Ala Ile Lys Leu Ile Phe Glu Asn Leu Thr
325 330 335
Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu Gly Gln Lys
340 345 350
Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly Met Ala Phe
355 360 365
Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His Lys Ile Ala
370 375 380
Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile Ala Met Pro
385 390 395 400
His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys Phe Thr Pro
405 410 415
Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gln Glu Asp Tyr Ala Glu Ile
420 425 430
Ser Arg Phe Met Gly Phe Ala Gly Lys Glu Asp Ser Asp Glu Lys Ala
435 440 445
Val Lys Ala Phe Val Ala Glu Leu Lys Lys Leu Thr Asp Ser Ile Asp
450 455 460
Ile Asn Ile Thr Leu Ser Gly Asn Gly Val Asp Lys Ala His Leu Glu
465 470 475 480
Arg Glu Leu Asp Lys Leu Ala Asp Leu Val
485 490
(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 903 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
Met Ala Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala
1 5 10 15
Glu Glu Lys Ala Ala Lys Phe Gln Glu Ala Val Ala Tyr Thr Asp Lys
20 25 30
Leu Val Lys Lys Ala Gln Ala Ala Val Leu Lys Phe Glu Gly Tyr Thr
35 40 45
Gln Thr Gln Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser
50 55 60
Lys His Ser Leu Glu Leu Ala His Glu Ala Val Asn Glu Thr Gly Arg
65 70 75 80
Gly Val Val Glu Asp Lys Asp Thr Lys Asn His Phe Ala Ser Glu Ser
85 90 95
Val Tyr Asn Ala Ile Lys Asn Asp Lys Thr Val Gly Val Ile Ser Glu
100 105 110
Asn Lys Val Ala Gly Ser Val Glu Ile Ala Ser Pro Leu Gly Val Leu
115 120 125
Ala Gly Ile Val Pro Thr Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys
130 135 140
Ser Leu Leu Thr Ala Lys Thr Arg Asn Ala Ile Val Phe Ala Phe His
145 150 155 160
Pro Gln Ala Gln Lys Cys Ser Ser His Ala Ala Lys Ile Val Tyr Asp
165 170 175
Ala Ala Ile Glu Ala Gly Ala Pro Glu Asp Phe Ile Gln Trp Ile Glu
180 185 190
Val Pro Ser Leu Asp Met Thr Thr Ala Leu Ile Gln Asn Arg Gly Leu
195 200 205
Ala Thr Ile Leu Ala Thr Gly Gly Pro Gly Met Val Asn Ala Ala Leu
210 215 220
Lys Ser Gly Asn Pro Ser Leu Gly Val Gly Ala Gly Asn Gly Ala Val
225 230 235 240
Tyr Val Asp Ala Thr Ala Asn Ile Glu Arg Ala Val Glu Asp Leu Leu
245 250 255
Leu Ser Lys Arg Phe Asp Asn Gly Met Ile Cys Ala Thr Glu Asn Ser
260 265 270
Ala Val Ile Asp Ala Ser Val Tyr Asp Glu Phe Ile Ala Lys Met Gln
275 280 285
Glu Gln Gly Ala Tyr Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu
290 295 300
Ser Phe Val Phe Val Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly
305 310 315 320
Pro Val Ala Gly Arg Ser Gly Gln Trp Ile Ala Glu Gln Ala Gly Val
325 330 335
Lys Val Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys
340 345 350
Asn Ile Gly Glu Ala Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser
355 360 365
Ile Tyr Lys Ala Glu Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser
370 375 380
Leu Leu Ala Tyr Gln Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly
385 390 395 400
Ala Met Asp Asp Pro Phe Val Lys Glu Tyr Gly Glu Lys Val Glu Ala
405 410 415
Ser Arg Ile Leu Val Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp
420 425 430
Ile Tyr Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser
435 440 445
Trp Gly Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu
450 455 460
Asn Val Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg
465 470 475 480
Leu Pro Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln
485 490 495
Glu Leu Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met
500 505 510
Val Lys Phe Gly Phe Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg
515 520 525
Pro Thr Gln Val Glu Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro
530 535 540
Thr Leu Ser Glu Ala Ile Ala Ile Ala Arg Gln Met Lys Gln Phe Glu
545 550 555 560
Pro Asp Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly
565 570 575
Lys Ile Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp
580 585 590
Leu Ser Asp Asp Ala Ser Leu Lys Glu Leu Phe Gln Glu Leu Ala Gln
595 600 605
Lys Phe Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His
610 615 620
Lys Ala Gln Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu
625 630 635 640
Val Thr Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr
645 650 655
Pro Leu Ala Asp Tyr Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro
660 665 670
Glu Phe Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile
675 680 685
Asp Ala Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser
690 695 700
Asp Tyr Thr Lys Pro Ile Ser Leu Gln Ala Ile Lys Leu Ile Phe Glu
705 710 715 720
Asn Leu Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu
725 730 735
Gly Gln Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly
740 745 750
Met Ala Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His
755 760 765
Lys Ile Gly Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile
770 775 780
Ala Met Pro His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys
785 790 795 800
Arg Thr Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gln Glu Asp Tyr
805 810 815
Ala Glu Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Asp Asp Ser Asp
820 825 830
Glu Lys Ala Val Gln Ala Leu Val Ala Glu Leu Lys Lys Leu Thr Asp
835 840 845
Ser Ile Asp Ile Asn Ile Thr Leu Ser Gly Asn Gly Ile Asp Lys Ala
850 855 860
His Leu Glu Arg Glu Leu Asp Lys Leu Ala Asp Leu Val Tyr Asp Asp
865 870 875 880
Gln Cys Thr Pro Ala Asn Pro Arg Gln Pro Arg Ile Asp Glu Ile Lys
885 890 895
Gln Leu Leu Leu Asp Gln Tyr
900
(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 891 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
Met Ala Val Thr Asn Val Ala Glu Leu Asn Ala Leu Val Glu Arg Val
1 5 10 15
Lys Lys Ala Gln Arg Glu Tyr Ala Ser Phe Thr Gln Glu Gln Val Asp
20 25 30
Lys Ile Phe Arg Ala Ala Ala Leu Ala Ala Ala Asp Ala Arg Ile Pro
35 40 45
Leu Ala Lys Met Ala Val Ala Glu Ser Gly Met Gly Ile Val Glu Asp
50 55 60
Lys Val Ile Lys Asn His Phe Ala Ser Glu Tyr Ile Tyr Asn Ala Tyr
65 70 75 80
Lys Asp Glu Lys Thr Cys Gly Val Leu Ser Glu Asp Asp Thr Phe Gly
85 90 95
Thr Ile Thr Ile Ala Glu Pro Ile Gly Ile Ile Cys Gly Ile Val Pro
100 105 110
Thr Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys Ser Leu Ile Ser Leu
115 120 125
Lys Thr Arg Asn Ala Ile Ile Phe Ser Pro His Pro Arg Ala Lys Asp
130 135 140
Ala Thr Asn Lys Ala Ala Asp Ile Val Leu Gln Ala Ala Ile Ala Ala
145 150 155 160
Gly Ala Pro Lys Asp Leu Ile Gly Trp Ile Asp Gln Pro Ser Val Glu
165 170 175
Leu Ser Asn Ala Leu Met His His Pro Asp Ile Asn Leu Ile Leu Ala
180 185 190
Thr Gly Gly Pro Gly Met Val Lys Ala Ala Tyr Ser Ser Gly Lys Pro
195 200 205
Ala Ile Gly Val Gly Ala Gly Asn Thr Pro Val Val Ile Asp Glu Thr
210 215 220
Ala Asp Ile Lys Arg Ala Val Ala Ser Val Leu Met Ser Lys Thr Phe
225 230 235 240
Asp Asn Gly Val Ile Cys Ala Ser Glu Gln Ser Val Val Val Val Asp
245 250 255
Ser Val Tyr Asp Ala Val Arg Glu Arg Phe Ala Thr His Gly Gly Tyr
260 265 270
Leu Leu Gln Gly Lys Glu Leu Lys Ala Val Gln Asp Val Ile Leu Lys
275 280 285
Asn Gly Ala Leu Asn Ala Ala Ile Val Gly Gln Pro Ala Tyr Lys Ile
290 295 300
Ala Glu Leu Ala Gly Phe Ser Val Pro Glu Asn Thr Lys Ile Leu Ile
305 310 315 320
Gly Glu Val Thr Val Val Asp Glu Ser Glu Pro Phe Ala His Glu Lys
325 330 335
Leu Ser Pro Thr Leu Ala Met Tyr Arg Ala Lys Asp Phe Glu Asp Ala
340 345 350
Val Glu Lys Ala Glu Lys Leu Val Ala Met Gly Gly Ile Gly His Thr
355 360 365
Ser Cys Leu Tyr Thr Asp Gln Asp Asn Gln Pro Ala Arg Val Ser Tyr
370 375 380
Phe Gly Gln Lys Met Lys Thr Ala Arg Ile Leu Ile Asn Thr Pro Ala
385 390 395 400
Ser Gln Gly Gly Ile Gly Asp Leu Tyr Asn Phe Lys Leu Ala Pro Ser
405 410 415
Leu Thr Leu Gly Cys Gly Ser Trp Gly Gly Asn Ser Ile Ser Glu Asn
420 425 430
Val Gly Pro Lys His Leu Ile Asn Lys Lys Thr Val Ala Lys Arg Ala
435 440 445
Glu Asn Met Leu Trp His Lys Leu Pro Lys Ser Ile Tyr Phe Arg Arg
450 455 460
Gly Ser Leu Pro Ile Ala Leu Asp Glu Val Ile Thr Asp Gly His Lys
465 470 475 480
Arg Ala Leu Ile Val Thr Asp Arg Phe Leu Phe Asn Asn Gly Tyr Ala
485 490 495
Asp Gln Ile Thr Ser Val Leu Lys Ala Ala Gly Val Glu Thr Glu Val
500 505 510
Phe Phe Glu Val Glu Ala Asp Pro Thr Leu Ser Ile Val Arg Lys Gly
515 520 525
Ala Glu Leu Ala Asn Ser Phe Lys Pro Asp Val Ile Ile Ala Leu Gly
530 535 540
Gly Gly Ser Pro Met Asp Ala Ala Lys Ile Met Trp Val Met Tyr Glu
545 550 555 560
His Pro Glu Thr His Phe Glu Glu Leu Ala Leu Arg Phe Met Asp Ile
565 570 575
Arg Lys Arg Ile Tyr Lys Phe Pro Lys Met Gly Val Lys Ala Lys Met
580 585 590
Ile Ala Val Thr Thr Thr Ser Gly Thr Gly Ser Glu Val Thr Pro Phe
595 600 605
Ala Val Val Thr Asp Asp Ala Thr Gly Gln Lys Tyr Pro Leu Ala Asp
610 615 620
Tyr Ala Leu Thr Pro Asp Met Ala Ile Val Asp Ala Asn Leu Val Met
625 630 635 640
Asp Met Pro Lys Ser Leu Cys Ala Phe Gly Gly Leu Asp Ala Val Thr
645 650 655
His Ala Met Glu Ala Tyr Val Ser Val Leu Ala Ser Glu Phe Ser Asp
660 665 670
Gly Gln Ala Leu Gln Ala Leu Lys Leu Leu Lys Glu Tyr Leu Pro Ala
675 680 685
Ser Tyr His Glu Gly Ser Lys Asn Pro Val Ala Arg Glu Arg Val His
690 695 700
Ser Ala Ala Thr Ile Ala Gly Ile Ala Phe Ala Asn Ala Phe Leu Gly
705 710 715 720
Val Cys His Ser Met Ala His Lys Leu Gly Ser Gln Phe His Ile Pro
725 730 735
His Gly Leu Ala Asn Ala Leu Leu Ile Cys Asn Val Ile Arg Tyr Asn
740 745 750
Ala Asn Asp Asn Pro Thr Lys Gln Thr Ala Phe Ser Gln Tyr Asp Arg
755 760 765
Pro Gln Ala Arg Arg Arg Tyr Ala Glu Ile Ala Asp His Leu Gly Leu
770 775 780
Ser Ala Pro Gly Asp Arg Thr Ala Ala Lys Ile Glu Lys Leu Leu Ala
785 790 795 800
Trp Leu Glu Thr Leu Lys Ala Glu Leu Gly Ile Pro Lys Ser Ile Arg
805 810 815
Glu Ala Gly Val Gln Glu Ala Asp Phe Leu Ala Asn Val Asp Lys Leu
820 825 830
Ser Glu Asp Ala Phe Asp Asp Gln Cys Thr Gly Ala Asn Pro Arg Tyr
835 840 845
Pro Leu Ile Ser Glu Leu Lys Gln Ile Leu Leu Asp Thr Tyr Tyr Gly
850 855 860
Arg Asp Tyr Val Glu Gly Glu Thr Ala Ala Lys Lys Glu Ala Ala Pro
865 870 875 880
Ala Lys Ala Glu Lys Lys Ala Lys Lys Ser Ala
885 890
(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 862 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: None
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
Met Lys Val Thr Thr Val Lys Glu Leu Asp Glu Lys Leu Lys Val Ile
1 5 10 15
Lys Glu Ala Gln Lys Lys Phe Ser Cys Tyr Ser Gln Glu Met Val Asp
20 25 30
Glu Ile Phe Arg Asn Ala Ala Met Ala Ala Ile Asp Ala Arg Ile Glu
35 40 45
Leu Ala Lys Ala Ala Val Leu Glu Thr Gly Met Gly Leu Val Glu Asp
50 55 60
Lys Val Ile Lys Asn His Phe Ala Gly Glu Tyr Ile Tyr Asn Lys Tyr
65 70 75 80
Lys Asp Glu Lys Thr Cys Gly Ile Ile Glu Arg Asn Glu Pro Tyr Gly
85 90 95
Ile Thr Lys Ile Ala Glu Pro Ile Gly Val Val Ala Ala Ile Ile Pro
100 105 110
Val Thr Asn Pro Thr Ser Thr Thr Ile Phe Lys Ser Leu Ile Ser Leu
115 120 125
Lys Thr Arg Asn Gly Ile Phe Phe Ser Pro His Pro Arg Ala Lys Lys
130 135 140
Ser Thr Ile Leu Ala Ala Lys Thr Ile Leu Asp Ala Ala Val Lys Ser
145 150 155 160
Gly Ala Pro Glu Asn Ile Ile Gly Trp Ile Asp Glu Pro Ser Ile Glu
165 170 175
Leu Thr Gln Tyr Leu Met Gln Lys Ala Asp Ile Thr Leu Ala Thr Gly
180 185 190
Gly Pro Ser Leu Val Lys Ser Ala Tyr Ser Ser Gly Lys Pro Ala Ile
195 200 205
Gly Val Gly Pro Gly Asn Thr Pro Val Ile Ile Asp Glu Ser Ala His
210 215 220
Ile Lys Met Ala Val Ser Ser Ile Ile Leu Ser Lys Thr Tyr Asp Asn
225 230 235 240
Gly Val Ile Cys Ala Ser Glu Gln Ser Val Ile Val Leu Lys Ser Ile
245 250 255
Tyr Asn Lys Val Lys Asp Glu Phe Gln Glu Arg Gly Ala Tyr Ile Ile
260 265 270
Lys Lys Asn Glu Leu Asp Lys Val Arg Glu Val Ile Phe Lys Asp Gly
275 280 285
Ser Val Asn Pro Lys Ile Val Gly Gln Ser Ala Tyr Thr Ile Ala Ala
290 295 300
Met Ala Gly Ile Lys Val Pro Lys Thr Thr Arg Ile Leu Ile Gly Glu
305 310 315 320
Val Thr Ser Leu Gly Glu Glu Glu Pro Phe Ala His Glu Lys Leu Ser
325 330 335
Pro Val Leu Ala Met Tyr Glu Ala Asp Asn Phe Asp Asp Ala Leu Lys
340 345 350
Lys Ala Val Thr Leu Ile Asn Leu Gly Gly Leu Gly His Thr Ser Gly
355 360 365
Ile Tyr Ala Asp Glu Ile Lys Ala Arg Asp Lys Ile Asp Arg Phe Ser
370 375 380
Ser Ala Met Lys Thr Val Arg Thr Phe Val Asn Ile Pro Thr Ser Gln
385 390 395 400
Gly Ala Ser Gly Asp Leu Tyr Asn Phe Arg Ile Pro Pro Ser Phe Thr
405 410 415
Leu Gly Cys Gly Phe Trp Gly Gly Asn Ser Val Ser Glu Asn Val Gly
420 425 430
Pro Lys His Leu Leu Asn Ile Lys Thr Val Ala Glu Arg Arg Glu Asn
435 440 445
Met Leu Trp Phe Arg Val Pro His Lys Val Tyr Phe Lys Phe Gly Cys
450 455 460
Leu Gln Phe Ala Leu Lys Asp Leu Lys Asp Leu Lys Lys Lys Arg Ala
465 470 475 480
Phe Ile Val Thr Asp Ser Asp Pro Tyr Asn Leu Asn Tyr Val Asp Ser
485 490 495
Ile Ile Lys Ile Leu Glu His Leu Asp Ile Asp Phe Lys Val Phe Asn
500 505 510
Lys Val Gly Arg Glu Ala Asp Leu Lys Thr Ile Lys Lys Ala Thr Glu
515 520 525
Glu Met Ser Ser Phe Met Pro Asp Thr Ile Ile Ala Leu Gly Gly Thr
530 535 540
Pro Glu Met Ser Ser Ala Lys Leu Met Trp Val Leu Tyr Glu His Pro
545 550 555 560
Glu Val Lys Phe Glu Asp Leu Ala Ile Lys Phe Met Asp Ile Arg Lys
565 570 575
Arg Ile Tyr Thr Phe Pro Lys Leu Gly Lys Lys Ala Met Leu Val Ala
580 585 590
Ile Thr Thr Ser Ala Gly Ser Gly Ser Glu Val Thr Pro Phe Ala Leu
595 600 605
Val Thr Asp Asn Asn Thr Gly Asn Lys Tyr Met Leu Ala Asp Tyr Glu
610 615 620
Met Thr Pro Asn Met Ala Ile Val Asp Ala Glu Leu Met Met Lys Met
625 630 635 640
Pro Lys Gly Leu Thr Ala Tyr Ser Gly Ile Asp Ala Leu Val Asn Ser
645 650 655
Ile Glu Ala Tyr Thr Ser Val Tyr Ala Ser Glu Tyr Thr Asn Gly Leu
660 665 670
Ala Leu Glu Ala Ile Arg Leu Ile Phe Lys Tyr Leu Pro Glu Ala Tyr
675 680 685
Lys Asn Gly Arg Thr Asn Glu Lys Ala Arg Glu Lys Met Ala His Ala
690 695 700
Ser Thr Met Ala Gly Met Ala Ser Ala Asn Ala Phe Leu Gly Leu Cys
705 710 715 720
His Ser Met Ala Ile Lys Leu Ser Ser Glu His Asn Ile Pro Ser Gly
725 730 735
Ile Ala Asn Ala Leu Leu Ile Glu Glu Val Ile Lys Phe Asn Ala Val
740 745 750
Asp Asn Pro Val Lys Gln Ala Pro Cys Pro Gln Tyr Lys Tyr Pro Asn
755 760 765
Thr Ile Phe Arg Tyr Ala Arg Ile Ala Asp Tyr Ile Lys Leu Gly Gly
770 775 780
Asn Thr Asp Glu Glu Lys Val Asp Leu Leu Ile Asn Lys Ile His Glu
785 790 795 800
Leu Lys Lys Ala Leu Asn Ile Pro Thr Ser Ile Lys Asp Ala Gly Val
805 810 815
Leu Glu Glu Asn Phe Tyr Ser Ser Leu Asp Arg Ile Ser Glu Leu Ala
820 825 830
Leu Asp Asp Gln Cys Thr Gly Ala Asn Pro Arg Phe Pro Leu Thr Ser
835 840 845
Glu Ile Lys Glu Met Tyr Ile Asn Cys Phe Lys Lys Gln Pro
850 855 860
(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1470 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
TACCAAGGAG CTGGTCACAA CGCTGCAATT CAAATCGGTG CAATGGACGA CCCATTTGTC 60
AAAGAATACG GAATTAAAGT CGAAGCTTCT CGTATCCTCG TTAACCAACC TGACTCTATC 120
GGTGGGGTCG GAGATATTTA TACTGATGCA ATGCGTCCAT CATTGACGCT CGGAACTGGT 180
TCATGGGGGA AAAATTCACT TTCACACAAT TTGAGTACAT ACGATCTATT GAATGTTAAA 240
ACAGTGGCTA AACGTCGTAA TCGCCCTCAA TGGGTTCGTT TGCCAAAAGA AATTTACTAC 300
GAAAAAAATG CAATTTCTTA CTTACAAGAA TTGCCACACG TCCACAAAGC TTTCATTGTT 360
GCCGACCCTG GTATGGTTAA ATTCGGTTTC GTTGATAAAG TTTTGGAACA ACTTGCTATC 420
CGCCCAACTC AAGTTGAAAC AAGCATTTAT GGCTCAGTCC AACCTGACCC AACTTTGAGT 480
GAAGCAATTG CAATCGCTCG TCAAATGAAC CATTTTGAAC CTGACACTGT CATCTGTCTT 540
GGTGGTGGTT CTGCTCTCGA TGCTGGTAAG ATTGGTCGTT TGATTTATGA ATATGATGCT 600
CGTGGTGAGG CTGACCTTTC CGATGACGCA AGTTTGAAAG AGATCTTCCA AGAGTTAGCT 660
CAAAAATTTG TTGATATTCG TAAACGTATT ATCAAATTCT ACCACCCACA CAAAGCACAA 720
ATGGTTGCTA TCCCTACTAC TTCTGGTACT GGTTCTGAAG TGACTCCATT TGCGGTTATC 780
ACTGATGATG AAACTCACGT TAAATATCCA CTTGCTGACT ATCAATTGAC ACCTCAAGTT 840
GCCATTGTTG ACCCTGAGTT TGTTATGACT GTACCAAAAC GTACTGTTTC TTGGTCTGGG 900
ATTGATGCTA TGTCACACGC GCTTGAATCT TATGTTTCTG TCATGTCTTC TGACTATACA 960
AAACCAATTT CACTTCAAGC CATCAAACTC ATCTTTGAAA ACTTGACTGA GTCTTATCAT 1020
TATGACCCAG CTCATCCAAC CAAAGAAGGT CAAAAAGCTC GCGAAAACAT GCACAATGCT 1080
GCAACACTCG CTGGTATGGC CTTCGCCAAT GCTTTCCTTG GAATTAACCA CTCACTTGCT 1140
CATAAAATTG CTGGTGAATT TGGGCTTCCT CATGGTCTTG CCATTGCTAT CGCTATGCCA 1200
CATGTCATTA AATTTAACGC TGTAACAGGA AACGTTAAAT TTACCCCTTA CCCACGTTAT 1260
GAAACTTATC GTGCGCAAGA AGACTACGCT GAAATTTCAC GCTTCATGGG ATTTGCTGGC 1320
AAAGAAGATT CAGATGAAAA AGCGGTCAAA GCTTTTGTTG CTGAACTTAA AAAATTGACT 1380
GATAGTATTG ATATTAATAT CACCCTTTCA GGAAATGGTG TAGATAAAGC TCACCTTGAA 1440
CGTGAGCTTG ATAAATTGGC TGACCTTGTT 1470
(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3193 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
AAGCTTGTTA CAAAACCGTT TTCTAAACTT TTGATGAGTG TTTTTGTAAA AACTATCACA 60
ATATTGCTTG ACATCTATAA AAAACTTTGT TAAACTATTC ACGTAAAAGA AAGTGAATGA 120
AGTCACAAAG GAGAACCTAC AAATATGGCA ACTAAAAAAG CCGCTCCAGC TGCAAAGAAA 180
GTTTTAAGCG CTGAAGAAAA AGCCGCAAAA TTCCAAGAAG CTGTTGCTTA TACTGACAAA 240
TTAGTCAAAA AAGCACAAGC TGCTGTTCTT AAATTTGAAG GATATACACA AACTCAAGTC 300
GATACTATTG TCGCTGCAAT GGCTCTTGCA GCAAGCAAAC ATTCTCTAGA ACTCGCTCAT 360
GAAGCCGTTA ACGAAACTGG TCGTGGTGTT GTCGAAGACA AAGATACCAA AAACCACTTT 420
GCTTCTGAAT CTGTTTATAA CGCAATTAAA AATGACAAAA CTGTTGGTGT CATTTCTGAA 480
AACAAGGTTG CTGGATCTGT TGAAATCGCA AGCCCTCTCG GTGTACTTGC TGGTATCGTT 540
CCAACGACTA ATCCAACATC AACAGCAATC TTTAAATCTT TATTGACTGC AAAAACACGT 600
AATGCTATTG TTTTCGCTTT CCACCCTCAA GCTCAAAAAT GTTCAAGCCA TGCAGCAAAA 660
ATTGTTTACG ATGCTGCAAT TGAAGCTGGT GCACCGGAAG ACTTTATTCA ATGGATTGAA 720
GTACCAAGCC TTGACATGAC TACCGCCTTG ATTCAAAACC GTGGACTTGC AACAATCCTT 780
GCAACTGGTG GCCCAGGAAT GGTAAACGCC GCACTCAAAT CTGGTAACCC TTCACTCGGT 840
GTTGGAGCTG GTAATGGTGC TGTTTATGTT GATGCAACTG CAAATATTGA ACGTGCCGTT 900
GAAGACCTTT TGCTTTCAAA ACGTTTTGAT AATGGGATGA TTTGTGCCAC TGAAAATTCA 960
GCTGTTATTG ATGCTTCAGT TTATGATGAA TTTATTGCTA AAATGCAAGA ACAAGGCGCT 1020
TATATGGTTC CTAAAAAAGA CTACAAAGCT ATTGAAAGTT TCGTTTTTGT TGAACGTGCT 1080
GGTGAAGGTT TTGGAGTAAC TGGTCCTGTT GCCGGTCGTT CTGGTCAATG GATTGCTGAA 1140
CAAGCTGGTG TCAAAGTTCC TAAAGATAAA GATGTCCTTC TTTTTGAACT TGATAAGAAA 1200
AATATTGGTG AAGCACTTTC TTCTGAAAAA CTTTCTCCTT TGCTTTCAAT CTACAAAGCT 1260
GAAACACGTG AAGAAGGAAT TGAGATTGTA CGTAGCTTAC TTGCTTATCA AGGTGCTGGA 1320
CATAATGCTG CAATTCAAAT CGGTGCAATG GATGATCCAT TCGTTAAAGA ATATGGCGAA 1380
AAAGTTGAAG CTTCTCGTAT CCTCGTTAAC CAACCAGATT CTATTGGTGG GGTCGGAGAT 1440
ATCTATACTG ATGCAATGCG TCCATCACTT ACACTTGGAA CTGGTTCATG GGGGAAAAAT 1500
TCACTTTCAC ACAATTTGAG TACATACGAT CTATTGAATG TTAAAACAGT GGCTAAACGT 1560
CGTAATCGCC CACAATGGGT TCGTTTGCCA AAAGAAATTT ACTACGAAAA AAATGCAATT 1620
TCTTACTTAC AAGAATTGCC ACACGTCCAC AAAGCTTTCA TCGTTGCTGA CCCTGGTATG 1680
GTTAAATTTG GTTTCGTTGA TAAAGTTTTG GAACAACTTG CTATCCGCCC AACTCAAGTT 1740
GAAACAAGCA TTTATGGCTC TGTTCAACCT GACCCAACTT TGAGCGAAGC AATTGCAATC 1800
GCTCGTCAAA TGAAACAATT TGAACCTGAC ACTGTCATCT GTCTTGGTGG TGGTTCTGCT 1860
CTCGATGCCG GTAAGATTGG TCGTTTGATT TATGAATATG ATGCTCGTGG TGAAGCTGAC 1920
CTTTCTGATG ATGCAAGTTT GAAAGAACTT TTCCAAGAAT TAGCTCAAAA ATTTGTCGAT 1980
ATTCGTAAAC GTATTATTAA ATTCTACCAT CCACATAAAG CACAAATGGT TGCAATTCCT 2040
ACTACTTCTG GTACTGGTTC TGAAGTGACT CCATTTGCAG TTATCACTGA TGATGAAACT 2100
CATGTTAAGT ACCCACTTGC TGACTACCAA TTAACACCAC AAGTTGCCAT TGTTGACCCT 2160
GAGTTTGTTA TGACTGTACC AAAACGTACT GTTTCTTGGT CTGGTATTGA TGCGATGTCA 2220
CACGCGCTTG AATCTTACGT TTCTGTTATG TCTTCTGACT ATACAAAACC AATTTCACTT 2280
CAAGCGATCA AACTTATCTT TGAAAACTTG ACTGAGTCTT ATCATTATGA CCCAGCGCAT 2340
CCAACTAAAG AAGGACAAAA AGCCCGCGAA AACATGCACA ATGCTGCAAC ACTCGCTGGT 2400
ATGGCCTTCG CTAATGCTTT CCTTGGAATT AACCACTCAC TTGCTCATAA AATTGGTGGT 2460
GAATTTGGAC TTCCTCATGG TCTTGCCATT GCCATCGCTA TGCCACATGT CATTAAATTT 2520
AACGCTGTAA CAGGAAACGT TAAACGTACC CCTTACCCAC GTTATGAAAC ATATCGTGCT 2580
CAAGAGGACT ACGCTGAAAT TTCACGCTTC ATGGGATTTG CTGGTAAAGA TGATTCAGAT 2640
GAAAAAGCTG TGCAAGCTCT GGTTGCTGAA CTTAAGAAAC TGACTGATAG CATTGATATT 2700
AATATCACCC TTTCAGGAAA TGGTATCGAT AAAGCTCACC TTGAACGTGA ACTTGATAAA 2760
TTGGCTGACC TTGTTTATGA TGATCAATGT ACTCCTGCTA ATCCTCGTCA ACCAAGAATT 2820
GATGAGATTA AACAGTTGTT GTTAGATCAA TACTAATAAT CTGTTGATAA AATTATTAAA 2880
ACGCTCTGAT GAATTCGTCA GAGO-TTTTT TATTATAGCT TATACAACTA TCAAAAGGTA 2940
TAAATCAATT TCGATATAGG CTCTTTTCAC TCCATTGATT TATGCATTTC TATAAAAATC 3000
AATAATTAAT TAGCGATAGA AGTCGAGTTC ATGCATGCTA ATAATGAAAT TGTTTTAAAT 3060
TCTGGTTTTT CTTTATGTTC TTTGCGAACA TCTTTCACAG TTTCTTTGTT CATGAAAATT 3120
CCTCCTTATT ATGGTACTAT TTTGAGCCCA AATAGTTATA TAAGAATCCT AAACTTCGGA 3180
TATCTTATCA AAG 3193
(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 758 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
Ser Glu Leu Asn Glu Lys Leu Ala Thr Ala Trp Glu Gly Phe Thr Lys
1 5 10 15
Gly Asp Trp Gln Asn Glu Val Asn Val Arg Asp Phe Ile Gln Lys Asn
20 25 30
Tyr Thr Pro Tyr Glu Gly Asp Glu Ser Phe Leu Ala Gly Ala Thr Glu
35 40 45
Ala Thr Thr Thr Leu Trp Asp Lys Val Met Glu Gly Val Lys Leu Glu
50 55 60
Asn Arg Thr His Ala Pro Val Asp Phe Asp Thr Ala Val Ala Ser Thr
65 70 75 80
Ile Thr Ser His Asp Ala Gly Tyr Ile Asn Lys Gln Leu Glu Lys Ile
85 90 95
Val Gly Leu Gln Thr Glu Ala Pro Leu Lys Arg Ala Leu Ile Pro Phe
100 105 110
Gly Gly Ile Lys Met Ile Glu Gly Ser Cys Lys Ala Tyr Asn Arg Glu
115 120 125
Leu Asp Pro Met Ile Lys Lys Ile Phe Thr Glu Tyr Arg Lys Thr His
130 135 140
Asn Gln Gly Val Phe Asp Val Tyr Thr Pro Asp Ile Leu Arg Cys Arg
145 150 155 160
Lys Ser Gly Val Leu Thr Gly Leu Pro Asp Ala Tyr Gly Arg Gly Arg
165 170 175
Ile Ile Gly Asp Tyr Arg Arg Val Ala Leu Tyr Gly Ile Asp Tyr Leu
180 185 190
Met Lys Asp Lys Leu Ala Gln Phe Thr Ser Leu Gln Ala Asp Leu Glu
195 200 205
Asn Gly Val Asn Leu Glu Gln Thr Ile Arg Leu Arg Glu Glu Ile Ala
210 215 220
Glu Gln His Arg Ala Leu Gly Gln Met Lys Glu Met Ala Ala Lys Tyr
225 230 235 240
Gly Tyr Asp Ile Ser Gly Pro Ala Thr Asn Ala Gln Glu Ala Ile Gln
245 250 255
Trp Thr Tyr Phe Gly Tyr Leu Ala Ala Val Lys Ser Gln Asn Gly Ala
260 265 270
Ala Met Ser Phe Gly Arg Thr Ser Thr Phe Leu Asp Val Tyr Ile Glu
275 280 285
Arg Asp Leu Lys Ala Gly Lys Ile Thr Glu Gln Glu Ala Gln Glu Met
290 295 300
Val Asp His Leu Val Met Lys Leu Arg Met Val Arg Phe Leu Arg Thr
305 310 315 320
Pro Glu Tyr Asp Glu Leu Phe Ser Gly Asp Pro Ile Trp Ala Thr Glu
325 330 335
Ser Ile Gly Gly Met Gly Leu Asp Gly Arg Thr Leu Val Thr Lys Asn
340 345 350
Ser Phe Arg Phe Leu Asn Thr Leu Tyr Thr Met Gly Pro Ser Pro Glu
355 360 365
Pro Asn Met Thr Ile Leu Trp Ser Glu Lys Leu Pro Leu Asn Phe Lys
370 375 380
Lys Phe Ala Ala Lys Val Ser Ile Asp Thr Ser Ser Leu Gln Tyr Glu
385 390 395 400
Asn Asp Asp Leu Met Arg Pro Asp Phe Asn Asn Asp Asp Tyr Ala Ile
405 410 415
Ala Cys Cys Val Ser Pro Met Ile Val Gly Lys Gln Met Gln Phe Phe
420 425 430
Gly Ala Arg Ala Asn Leu Ala Lys Thr Met Leu Tyr Ala Ile Asn Gly
435 440 445
Gly Val Asp Glu Lys Leu Lys Met Gln Val Gly Pro Lys Ser Glu Pro
450 455 460
Ile Lys Gly Asp Val Leu Asn Tyr Asp Glu Val Met Glu Arg Met Asp
465 470 475 480
His Phe Met Asp Trp Leu Ala Lys Gln Tyr Ile Thr Ala Leu Asn Ile
485 490 495
Ile His Tyr Met His Asp Lys Tyr Ser Tyr Glu Ala Ser Leu Met Ala
500 505 510
Leu His Asp Arg Asp Val Ile Arg Thr Met Ala Cys Gly Ile Ala Gly
515 520 525
Leu Ser Val Ala Ala Asp Ser Leu Ser Ala Ile Lys Tyr Ala Lys Val
530 535 540
Lys Pro Ile Arg Asp Glu Asp Gly Leu Ala Ile Asp Phe Glu Ile Glu
545 550 555 560
Gly Glu Tyr Pro Gln Phe Gly Asn Asn Asp Pro Arg Val Asp Asp Leu
565 570 575
Ala Val Asp Leu Val Glu Arg Phe Met Lys Lys Ile Gln Lys Leu His
580 585 590
Thr Tyr Arg Asp Ala Ile Pro Thr Gln Ser Val Leu Thr Ile Thr Ser
595 600 605
Asn Val Val Tyr Gly Lys Lys Thr Gly Asn Thr Pro Asp Gly Arg Arg
610 615 620
Ala Gly Ala Pro Phe Gly Pro Gly Ala Asn Pro Met His Gly Arg Asp
625 630 635 640
Gln Lys Gly Ala Val Ala Ser Leu Thr Ser Val Ala Lys Leu Pro Phe
645 650 655
Ala Tyr Ala Lys Asp Gly Ile Ser Tyr Thr Phe Ser Ile Val Pro Asn
660 665 670
Ala Leu Gly Lys Asp Asp Glu Val Arg Lys Thr Asn Leu Ala Gly Leu
675 680 685
Met Asp Gly Tyr Phe His His Glu Ala Ser Ile Glu Gly Gly Gln His
690 695 700
Leu Asn Val Asn Val Met Asn Arg Glu Met Leu Leu Asp Ala Met Glu
705 710 715 720
Asn Pro Glu Lys Tyr Pro Gln Leu Thr Ile Arg Val Ser Gly Tyr Ala
725 730 735
Val Arg Phe Asn Ser Leu Thr Lys Glu Gln Gln Gln Asp Val Ile Thr
740 745 750
Arg Thr Phe Thr Gln Ser
755
(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3412 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 80...2440
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GAATTCTGTT TGCTATTCTC AAACTGTATG ATATAATGAA GTTGTAATTT GAAACAGAAA 60
GAACAAAGGA GATTTCAAA ATG AAA ACC GAA GTT ACG GAA AAT ATC TTT GAA 112
Met Lys Thr Glu Val Thr Glu Asn Ile Phe Glu
1 5 10
CAA GCT TGG GAT GGT TTT AAA GGA ACC AAC TGG CGC GAT AAA GCA AGC 160
Gln Ala Trp Asp Gly Phe Lys Gly Thr Asn Trp Arg Asp Lys Ala Ser
15 20 25
GTT ACT CGC TTT GTA CAA GAA AAC TAC AAA CCA TAT GAT GGT GAT GAA 208
Val Thr Arg Phe Val Gln Glu Asn Tyr Lys Pro Tyr Asp Gly Asp Glu
30 35 40
AGC TTT CTT GCT GGG CCA ACA GAA CGT ACA CTT AAA GTA AAG AAA ATT 256
Ser Phe Leu Ala Gly Pro Thr Glu Arg Thr Leu Lys Val Lys Lys Ile
45 50 55
ATT GAA GAT ACA AAA AAT CAC TAC GAA GAA GTA GGA TTT CCC TTC GAT 304
Ile Glu Asp Thr Lys Asn His Tyr Glu Glu Val Gly Phe Pro Phe Asp
60 65 70 75
ACT GAC CGC GTA ACC TCT ATT GAT AAA ATC CCT GCT GGA TAT ATC GAT 352
Thr Asp Arg Val Thr Ser Ile Asp Lys Ile Pro Ala Gly Tyr Ile Asp
80 85 90
GCT AAT GAT AAA GAA CTT GAA CTC ATC TAT GGG ATG CAA AAT AGC GAA 400
Ala Asn Asp Lys Glu Leu Glu Leu Ile Tyr Gly Met Gln Asn Ser Glu
95 100 105
CTT TTC CGC TTG AAT TTC ATG CCA AGA GGT GGA CTT CGT GTT GCT GAA 448
Leu Phe Arg Leu Asn Phe Met Pro Arg Gly Gly Leu Arg Val Ala Glu
110 115 120
AAG ATT TTG ACA GAA CAC GGT CTC TCA GTT GAC CCA GGC TTG CAT GAT 496
Lys Ile Leu Thr Glu His Gly Leu Ser Val Asp Pro Gly Leu His Asp
125 130 135
GTT TTG TCA CAA ACA ATG ACT TCT GTA AAT GAT GGA ATC TTT CGT GCT 544
Val Leu Ser Gln Thr Met Thr Ser Val Asn Asp Gly Ile Phe Arg Ala
140 145 150 155
TAT ACT TCA GCA ATT CGT AAA GCA CGT CAT GCT CAT ACT GTA ACA GGT 592
Tyr Thr Ser Ala Ile Arg Lys Ala Arg His Ala His Thr Val Thr Gly
160 165 170
TTG CCA GAT GCT TAC TCT CGT GGA CGT ATC ATT GGT GTC TAT GCA CGT 640
Leu Pro Asp Ala Tyr Ser Arg Gly Arg Ile Ile Gly Val Tyr Ala Arg
175 180 185
CTT GCC CTT TAC GGT GCT GAT TAC CTT ATG AAG GAA AAA GCA AAA GAA 688
Leu Ala Leu Tyr Gly Ala Asp Tyr Leu Met Lys Glu Lys Ala Lys Glu
190 195 200
TGG GAT GCA ATC ACT GAA ATT AAC GAA GAA AAC ATT CGT CTT AAA GAA 736
Trp Asp Ala Ile Thr Glu Ile Asn Glu Glu Asn Ile Arg Leu Lys Glu
205 210 215
GAA ATT AAT ATG CAA TAC CAA GCT TTG CAA GAA GTT GTA AAC TTT GGT 784
Glu Ile Asn Met Gln Tyr Gln Ala Leu Gln Glu Val Val Asn Phe Gly
220 225 230 235
GCC TTA TAT GGT CTT GAT GTT TCA CGT CCA GCT ATG AAC GTA AAA GAA 832
Ala Leu Tyr Gly Leu Asp Val Ser Arg Pro Ala Met Asn Val Lys Glu
240 245 250
GCA ATC CAA TGG GTT AAC ATC GCT TAT ATG GCA GTA TGT CGT GTC ATT 880
Ala Ile Gln Trp Val Asn Ile Ala Tyr Met Ala Val Cys Arg Val Ile
255 260 265
AAT GGA GCT GCA ACT TCA CTT GGA CGT GTT CCA ATC GTT CTT GAT ATC 928
Asn Gly Ala Ala Thr Ser Leu Gly Arg Val Pro Ile Val Leu Asp Ile
270 275 280
TTT GCA GAA CGT GAC CTT GCT CGT GGA ACA TTT ACT GAA CAA GAA ATT 976
Phe Ala Glu Arg Asp Leu Ala Arg Gly Thr Phe Thr Glu Gln Glu Ile
285 290 295
CAA GAA TTT GTT GAT GAT TTC GTT TTG AAG CTT CGT ACA ATG AAA TTT 1024
Gln Glu Phe Val Asp Asp Phe Val Leu Lys Leu Arg Thr Met Lys Phe
300 305 310 315
GCT CGT GCA GCT GCT TAT GAT GAA CTT TAT TCT GGT GAC CCA ACA TTC 1072
Ala Arg Ala Ala Ala Tyr Asp Glu Leu Tyr Ser Gly Asp Pro Thr Phe
320 325 330
ATC ACA ACA TCT ATG GCT GGT ATG GGT AAT GAC GGA CGT CAC CGT GTC 1120
Ile Thr Thr Ser Met Ala Gly Met Gly Asn Asp Gly Arg His Arg Val
335 340 345
ACT AAA ATG GAC TAC CGT TTC TTG AAC ACA CTT GAT ACA ATC GGA AAT 1168
Thr Lys Met Asp Tyr Arg Phe Leu Asn Thr Leu Asp Thr Ile Gly Asn
350 355 360
GCT CCA GAA CCA AAC TTG ACA GTC CTT TGG GAT TCT AAA CTT CCT TAC 1216
Ala Pro Glu Pro Asn Leu Thr Val Leu Trp Asp Ser Lys Leu Pro Tyr
365 370 375
TCA TTC AAA CGT TAT TCA ATG TCT ATG AGC CAC AAG CAT TCT TCT ATT 1264
Ser Phe Lys Arg Tyr Ser Met Ser Met Ser His Lys His Ser Ser Ile
380 385 390 395
CAA TAT GAA GGT GTT GAA ACA ATG GCT AAA GAT GGA TAT GGC GAA ATG 1312
Gln Tyr Glu Gly Val Glu Thr Met Ala Lys Asp Gly Tyr Gly Glu Met
400 405 410
TCA TGT ATC TCT TGT TGT GTC TCA CCA CTT GAT CCA GAA AAT GAA GAA 1360
Ser Cys Ile Ser Cys Cys Val Ser Pro Leu Asp Pro Glu Asn Glu Glu
415 420 425
GGA CGT CAT AAC CTC CAA TAC TTT GGT GCG CGT GTA AAC GTC TTG AAA 1408
Gly Arg His Asn Leu Gln Tyr Phe Gly Ala Arg Val Asn Val Leu Lys
430 435 440
GCA ATG TTG ACT GGT TTG AAC GGT GGT TAT GAT GAC GTT CAT AAA GAT 1456
Ala Met Leu Thr Gly Leu Asn Gly Gly Tyr Asp Asp Val His Lys Asp
445 450 455
TAT AAA GTA TTC GAC ATC GAA CCT GTT CGT GAC GAA ATT CTT GAC TAT 1504
Tyr Lys Val Phe Asp Ile Glu Pro Val Arg Asp Glu Ile Leu Asp Tyr
460 465 470 475
GAT ACA GTT ATG GAA AAC TTT GAC AAA TCT CTC GAC TGG TTG ACT GAT 1552
Asp Thr Val Met Glu Asn Phe Asp Lys Ser Leu Asp Trp Leu Thr Asp
480 485 490
ACT TAT GTT GAT GCA ATG AAT ATC ATT CAT TAC ATG ACT GAT AAA TAT 1600
Thr Tyr Val Asp Ala Met Asn Ile Ile His Tyr Met Thr Asp Lys Tyr
495 500 505
AAC TAT GAA GCA GTT CAA ATG GCC TTC TTG CCT ACT AAA GTT CGT GCT 1648
Asn Tyr Glu Ala Val Gln Met Ala Phe Leu Pro Thr Lys Val Arg Ala
510 515 520
AAC ATG GGA TTT GGT ATC TGT GGA TTC GCA AAT ACA GTT GAT TCA CTT 1696
Asn Met Gly Phe Gly Ile Cys Gly Phe Ala Asn Thr Val Asp Ser Leu
525 530 535
TCA GCA ATT AAA TAT GCT AAA GTT AAA ACA TTG CGT GAT GAA AAT GGC 1744
Ser Ala Ile Lys Tyr Ala Lys Val Lys Thr Leu Arg Asp Glu Asn Gly
540 545 550 555
TAT ATC TAC GAT TAC GAA GTA GAA GGT GAT TTC CCT CGT TAT GGT GAA 1792
Tyr Ile Tyr Asp Tyr Glu Val Glu Gly Asp Phe Pro Arg Tyr Gly Glu
560 565 570
GAT GAT GAT CGT GCT GAT GAT ATT GCT AAA CTT GTC ATG AAA ATG TAC 1840
Asp Asp Asp Arg Ala Asp Asp Ile Ala Lys Leu Val Met Lys Met Tyr
575 580 585
CAT GAA AAA TTA GCT TCA CAC AAA CTT TAC AAA AAT GCT GAA GCT ACT 1888
His Glu Lys Leu Ala Ser His Lys Leu Tyr Lys Asn Ala Glu Ala Thr
590 595 600
GTT TCA CTT TTG ACA ATT ACA TCT AAC GTT GCT TAC TCT AAA CAA ACT 1936
Val Ser Leu Leu Thr Ile Thr Ser Asn Val Ala Tyr Ser Lys Gln Thr
605 610 615
GGT AAT TCT CCA GTA CAT AAA GGA GTA TTC CTC AAT GAA GAT GGT ACA 1984
Gly Asn Ser Pro Val His Lys Gly Val Phe Leu Asn Glu Asp Gly Thr
620 625 630 635
GTA AAT AAA TCT AAA CTT GAA TTC TTC TCA CCA GGT GCT AAC CCA TCT 2032
Val Asn Lys Ser Lys Leu Glu Phe Phe Ser Pro Gly Ala Asn Pro Ser
640 645 650
AAT AAA GCT AAG GGT GGT TGG TTG CAA AAC CTT CGC TCA TTG GCT AAG 2080
Asn Lys Ala Lys Gly Gly Trp Leu Gln Asn Leu Arg Ser Leu Ala Lys
655 660 665
TTG GAA TTC AAA GAT GCA AAT GAT GGT ATT TCA TTG ACT ACT CAA GTT 2128
Leu Glu Phe Lys Asp Ala Asn Asp Gly Ile Ser Leu Thr Thr Gln Val
670 675 680
TCA CCT CGT GCA CTT GGT AAA ACT CGT GAT GAA CAA GTG GAT AAC TTG 2176
Ser Pro Arg Ala Leu Gly Lys Thr Arg Asp Glu Gln Val Asp Asn Leu
685 690 695
GTT CAA ATT CTT GAT GGA TAC TTC ACA CCA GGT GCT TTG ATT AAT GGT 2224
Val Gln Ile Leu Asp Gly Tyr Phe Thr Pro Gly Ala Leu Ile Asn Gly
700 705 710 715
ACT GAA TTT GCA GGT CAA CAC GTT AAC TTG AAC GTA ATG GAC CTT AAA 2272
Thr Glu Phe Ala Gly Gln His Val Asn Leu Asn Val Met Asp Leu Lys
720 725 730
GAT GTT TAC GAT AAA ATC ATG CGT GGT GAA GAT GTT ATC GTT CGT ATC 2320
Asp Val Tyr Asp Lys Ile Met Arg Gly Glu Asp Val Ile Val Arg Ile
735 740 745
TCT GGT TAC TGT GTC AAT ACT AAA TAC CTC ACA CCA GAA CAA AAA CAA 2368
Ser Gly Tyr Cys Val Asn Thr Lys Tyr Leu Thr Pro Glu Gln Lys Gln
750 755 760
GAA TTA ACT GAA CGT GTC TTC CAT GAA GTT CTT TCA AAC GAT GAT GAA 2416
Glu Leu Thr Glu Arg Val Phe His Glu Val Leu Ser Asn Asp Asp Glu
765 770 775
GAA GTA ATG CAT ACT TCA AAC ATC TAATTCTTAA AATTTAATGA ATATTCGGTC 2470
Glu Val Met His Thr Ser Asn Ile
780 785
TGTCAGTACT GACAGACTTT TTTTTACGAA AAAATTAATC ATAATAGTTA AAAACTATTG 2530
TTTTTAGTTT AAGAAAGTTA AATTTTATGC TAAAATAGAT GAATGAAAAT GGTAATTGGA 2590
TTGACAGGCG GAATTGCGAT GGGAAATCAA CGGTGGTTGA TTTTTTGATT CTGAGGGTTA 2650
TCAAGTGATT GATGCTGACA AAGTTGTCCG TCAATTTACA AGAACCTGGC GGAAAACTTT 2710
ACAAGGCAAT ATTAGAAACT TACGGTTTAG ATTTTATTGC TGACAATTGG ACAGTTAAAT 2770
CGTGAAAAAT TAGGAGCTTT AGTTTTTTCT GATTCAAAAG AGCGCGAGAA ATTATCAAAC 2830
TTACAAGATG AAATTATTCG TACAGAATTA TATGATAGAC GTGATGACTT ATTAAAAAAA 2890
ATGACTGACA AGTCTGTCAG TAAAAATTTT GATTCAAAGA GTCAAGGAAA AAATCTGTCA 2950
GTAAATAAGC CAATATTTAT GGATATTCCG TTATTAATTG AATACAATTA TACCGGATTT 3010
GATGAAATAT GGTTGGTCAG CTTACCTGAA AAAATACAAT TAGAAAGACT GATGGCAAGA 3070
AATAAGTTTA CGGAAGAAGA AGCTAAAAAA CGAATTTCTT CACAAATGCC ATTGTCAGAA 3130
AAACAAAAAG TCGCTGATGT CATTCTGGAT AATTCTGGAA AGATTGAAGC ACTAAAAAAA 3190
CAAATCCAGC GAGAACTAGC TAGGATAGAA GAACAGAAAT AGAGGTGAAT CGCACGAAAA 3250
CAGTTAATTG GAAAGGAATT TATTTATAAC ATGGATTGGC TGCTTTTTTG TAGGTTCATC 3310
ATTTTCACTC GTCATGCCTT TCTCCCCTTG TATATTCAAG GACTGGGTGA AGCGGTGGGA 3370
ATTTGAACTT TACTCAGGGT TACTTTTTCT TTGCCAGCCT TA 3412
(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 787 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Met Lys Thr Glu Val Thr Glu Asn Ile Phe Glu Gln Ala Trp Asp Gly
1 5 10 15
Phe Lys Gly Thr Asn Trp Arg Asp Lys Ala Ser Val Thr Arg Phe Val
20 25 30
Gln Glu Asn Tyr Lys Pro Tyr Asp Gly Asp Glu Ser Phe Leu Ala Gly
35 40 45
Pro Thr Glu Arg Thr Leu Lys Val Lys Lys Ile Ile Glu Asp Thr Lys
50 55 60
Asn His Tyr Glu Glu Val Gly Phe Pro Phe Asp Thr Asp Arg Val Thr
65 70 75 80
Ser Ile Asp Lys Ile Pro Ala Gly Tyr Ile Asp Ala Asn Asp Lys Glu
85 90 95
Leu Glu Leu Ile Tyr Gly Met Gln Asn Ser Glu Leu Phe Arg Leu Asn
100 105 110
Phe Met Pro Arg Gly Gly Leu Arg Val Ala Glu Lys Ile Leu Thr Glu
115 120 125
His Gly Leu Ser Val Asp Pro Gly Leu His Asp Val Leu Ser Gln Thr
130 135 140
Met Thr Ser Val Asn Asp Gly Ile Phe Arg Ala Tyr Thr Ser Ala Ile
145 150 155 160
Arg Lys Ala Arg His Ala His Thr Val Thr Gly Leu Pro Asp Ala Tyr
165 170 175
Ser Arg Gly Arg Ile Ile Gly Val Tyr Ala Arg Leu Ala Leu Tyr Gly
180 185 190
Ala Asp Tyr Leu Met Lys Glu Lys Ala Lys Glu Trp Asp Ala Ile Thr
195 200 205
Glu Ile Asn Glu Glu Asn Ile Arg Leu Lys Glu Glu Ile Asn Met Gln
210 215 220
Tyr Gln Ala Leu Gln Glu Val Val Asn Phe Gly Ala Leu Tyr Gly Leu
225 230 235 240
Asp Val Ser Arg Pro Ala Met Asn Val Lys Glu Ala Ile Gln Trp Val
245 250 255
Asn Ile Ala Tyr Met Ala Val Cys Arg Val Ile Asn Gly Ala Ala Thr
260 265 270
Ser Leu Gly Arg Val Pro Ile Val Leu Asp Ile Phe Ala Glu Arg Asp
275 280 285
Leu Ala Arg Gly Thr Phe Thr Glu Gln Glu Ile Gln Glu Phe Val Asp
290 295 300
Asp Phe Val Leu Lys Leu Arg Thr Met Lys Phe Ala Arg Ala Ala Ala
305 310 315 320
Tyr Asp Glu Leu Tyr Ser Gly Asp Pro Thr Phe Ile Thr Thr Ser Met
325 330 335
Ala Gly Met Gly Asn Asp Gly Arg His Arg Val Thr Lys Met Asp Tyr
340 345 350
Arg Phe Leu Asn Thr Leu Asp Thr Ile Gly Asn Ala Pro Glu Pro Asn
355 360 365
Leu Thr Val Leu Trp Asp Ser Lys Leu Pro Tyr Ser Phe Lys Arg Tyr
370 375 380
Ser Met Ser Met Ser His Lys His Ser Ser Ile Gln Tyr Glu Gly Val
385 390 395 400
Glu Thr Met Ala Lys Asp Gly Tyr Gly Glu Met Ser Cys Ile Ser Cys
405 410 415
Cys Val Ser Pro Leu Asp Pro Glu Asn Glu Glu Gly Arg His Asn Leu
420 425 430
Gln Tyr Phe Gly Ala Arg Val Asn Val Leu Lys Ala Met Leu Thr Gly
435 440 445
Leu Asn Gly Gly Tyr Asp Asp Val His Lys Asp Tyr Lys Val Phe Asp
450 455 460
Ile Glu Pro Val Arg Asp Glu Ile Leu Asp Tyr Asp Thr Val Met Glu
465 470 475 480
Asn Phe Asp Lys Ser Leu Asp Trp Leu Thr Asp Thr Tyr Val Asp Ala
485 490 495
Met Asn Ile Ile His Tyr Met Thr Asp Lys Tyr Asn Tyr Glu Ala Val
500 505 510
Gln Met Ala Phe Leu Pro Thr Lys Val Arg Ala Asn Met Gly Phe Gly
515 520 525
Ile Cys Gly Phe Ala Asn Thr Val Asp Ser Leu Ser Ala Ile Lys Tyr
530 535 540
Ala Lys Val Lys Thr Leu Arg Asp Glu Asn Gly Tyr Ile Tyr Asp Tyr
545 550 555 560
Glu Val Glu Gly Asp Phe Pro Arg Tyr Gly Glu Asp Asp Asp Arg Ala
565 570 575
Asp Asp Ile Ala Lys Leu Val Met Lys Met Tyr His Glu Lys Leu Ala
580 585 590
Ser His Lys Leu Tyr Lys Asn Ala Glu Ala Thr Val Ser Leu Leu Thr
595 600 605
Ile Thr Ser Asn Val Ala Tyr Ser Lys Gln Thr Gly Asn Ser Pro Val
610 615 620
His Lys Gly Val Phe Leu Asn Glu Asp Gly Thr Val Asn Lys Ser Lys
625 630 635 640
Leu Glu Phe Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala Lys Gly
645 650 655
Gly Trp Leu Gln Asn Leu Arg Ser Leu Ala Lys Leu Glu Phe Lys Asp
660 665 670
Ala Asn Asp Gly Ile Ser Leu Thr Thr Gln Val Ser Pro Arg Ala Leu
675 680 685
Gly Lys Thr Arg Asp Glu Gln Val Asp Asn Leu Val Gln Ile Leu Asp
690 695 700
Gly Tyr Phe Thr Pro Gly Ala Leu Ile Asn Gly Thr Glu Phe Ala Gly
705 710 715 720
Gln His Val Asn Leu Asn Val Met Asp Leu Lys Asp Val Tyr Asp Lys
725 730 735
Ile Met Arg Gly Glu Asp Val Ile Val Arg Ile Ser Gly Tyr Cys Val
740 745 750
Asn Thr Lys Tyr Leu Thr Pro Glu Gln Lys Gln Glu Leu Thr Glu Arg
755 760 765
Val Phe His Glu Val Leu Ser Asn Asp Asp Glu Glu Val Met His Thr
770 775 780
Ser Asn Ile
785
(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2665 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
AAGCAAGTTC TTTCGCTTGT GTAACCGGTT ACTGTATGAT AGAATATAAT CGTAAATTGT 60
AACAGATTAA CTGTTACTAG AATAGAGGGG AACTCAATTA TGGCAACTGT CAAAACTAAC 120
ACTGACGTTT TTGAAAAAGC CTGGGAAGGC TTTAAAGGAA CTGACTGGAA AGACAGAGCA 180
AGCATTTCTC GCTTTGTTCA AGACAACTAC ACTCCATATG ACGGAGGCGA AAGTTTTCTT 240
GCCGGCCCTA CTGAACGTTC ACTTCACATC AAAAAAGTCG TAGAAGAAAC TAAAGCGCAT 300
TACGAAGAAA CACGTTTTCC AATGGATACA CGTATTACAT CTATTGCTGA TATCCCAGCA 360
GGTTATATTG ACAAGGAAAA TGAATTGATT TTTGGTATCC AAAACGATGA ACTTTTTAAG 420
CTGAACTTCA TGCCAAAAGG CGGTATTCGC ATGGCTGAAA CAGCTTTGAA AGAACATGGT 480
TATGAACCAG ACCCTGCCGT TCATGAAATC TTTACCAAAT ATGCAACAAC CGTTAATGAT 540
GGTATCTTTC GTGCTTACAC TTCAAACATT CGCCGTGCAC GTCATGCCCA CACTGTAACT 600
GGTCTCCCAG ATGCATACTC TCGCGGACGT ATTATTGGAG TTTATGCCCG TCTTGCTCTC 660
TATGGTGCTG ACTACTTGAT GCAAGAAAAA GTGAACGACT GGAACTCAAT TGCTGAAATT 720
GATGAAGAAT CAATTCGTCT TCGTGAAGAA ATCAATCTTC AATATCAGGC ACTTGGCGAA 780
GTAGTGCGGT TGGGTGATCT GTATGGTCTT GATGTTCGCA AACCTGCTAT GAATGTTAAG 840
GAAGCTATCC AATGGATTAA TATCGCCTTT ATGGCTGTCT GCCGCGTTAT CAATGGTGCT 900
GCAACTTCTC TTGGACGTGT CCCAATCGTT CTTGATATCT TTGCAGAACG TGACCTTGCT 960
CGTGGCACTT TCACTGAATC AGAAATCCAA GAATTCGTTG ATGACTTCGT TATGAAACTT 1020
CGTACGGTTA AATTTGCACG TACTAAGGCT TATGACGAAC TTTATTCAGG TGACCCAACA 1080
TTTATTACGA CTTCTATGGC TGGTATGGGA GCTGATGGAC GTCACCGTGT TACTAAGATG 1140
GACTACCGTT TCTTAAATAC GCTTGATAAT ATTGGCAATG CTCCAGAACC TAACTTAACC 1200
GTTCTTTGGT CAAGTAAATT GCCTTACTCT TTCCGTCATT ATTGTATGTC TATGAGCCAC 1260
AAGCATTCTT CAATTCAATA TGAAGGTGTC ACAACTATGG CTAAAGAAGG TTATGGTGAA 1320
ATGTCATGTA TCTCATGCTG TGTATCTCCG CTTGATCCTG AAAACGAAGA TCGTCGCCAC 1380
AATCTACAAT ACTTTGGTGC TCGTGTTAAC GTTCTTAAAG CACTTCTTAC AGGTCTTAAT 1440
GGCGGTTACG ACGATGTTCA CAAAGACTAC AAAGTATTTG ATGTCGAACC TATCCGTGAT 1500
GAAGTCCTTG ATTTTGAAAC GGTTAAAGCT AATTTTGAAA AAGCACTTGA TTGGTTGACT 1560
GATACTTACG TGGACGCAAT GAATATCATT CACTATATGA CTGATAAATA TAACTATGAA 1620
GCCGTTCAAA TGGCCTTCTT ACCAACACGT GTTAAAGCCA ATATGGGATT TGGTATTTGC 1680
GGATTCTCTA ATACAGTTGA TTCATTATCA GCTATTAAAT ATGCTACTGT AAAACCTATT 1740
CGTGATGAAG ATGGTTACAT TTACGACTAT GAAACTGTTG GTAACTTCCC TCGTTACGGA 1800
GAAGATGATG ACCGTGTAGA CTCAATCGCT GAATGGTTGC TTGAAGCTTT CCATACTCGT 1860
CTTGCACGTC ATAAACTGTA CAAAGATTCC GAAGCTACTG TATCATTGCT TACAATCACT 1920
TCTAATGTTG CTTATTCTAA ACAAACTGGT AATTCTCCAG TTCACAAGGG TGTTTACCTC 1980
AATGAAGATG GTTCTGTGAA CTTGTCTAAA GTAGAATTCT TCTCACCAGG TGCTAACCCA 2040
TCAAATAAAG CTTCCGGCGG CTGGTTGCAA AACTTGAACT CATTGAAGAA ACTTGACTTT 2100
GCTCACGCAA ATGATGGTAT CTCATTGACA ACTCAAGTTT CACCAAAAGC TCTTGGTAAG 2160
ACATTCGATG AACAAGTTGC TAACTTAGTA ACAATTCTTG ATGGTTACTT TGAAGGCGGC 2220
GGTCAACACG TTAACTTGAA CGTTATGGAT CTTAAAGATG TTTATGACAA GATCATGAAT 2280
GGTGAAGATG TTATCGTTCG TATCTCAGGT TACTGTGTTA ACACTAAATA CCTTACTAAA 2340
GAACAAAAGA CTGAATTGAC ACAACGTGTT TTCCATGAAG TTCTCTCAAT GGATGATGCA 2400
GCTACAGACT TGGTTAACAA CAAGTAAGAG TTAAACAGTT TAGTTTAAAA GACCTCACTC 2460
ATAAAAGTGA GGTCTTTACT TTGCTTTCGG GTACGATCAA AGCAGTGAGA GCTTTTTATA 2520
TTCTAAAAAC TCACAAATTC AGAAAAAAAC AGCTCTTGTG ATTTGAAAAG CTTTTAGCTA 2580
CAATAATATT ATGAAAATTA ATTATACTCG CGACACACTG TCATCCACCT ATCTTGATGC 2640
AGTAAAAATT AGACACCTTG TCTTC 2665
(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1993 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GTCCGAATGG TGCACCAGCA CGACGACCAT CAGGGGTGTT ACCCGTTTTC TTACCATAAA 60
CTACGTTAGA AGTAATGGTT AATACAGATT GTGTAGGCAC TGCATTGCGG TAAGTTTTAA 120
GTTTTTGAAT TTTCTTCATA AAACGTTCAA CTAAGTCACA AGCGATGTCA TCAACACGGT 180
TATCATTGTT ACCATATTGT GGATATTCAC CTTCGATTTC AAAGTCGATT GCTACGTTAG 240
TTGCAACAAC ATTGCCATCT TTATCTTTGA TGTCGCCACG AACTGGTTTA ACTTTCGCAT 300
ATTTGATTGC TGAAAGTGAG TCAGCCGCAA CAGAAAGACC TGCGATACCA CAAGCCATAG 360
TACGGTATAC ATCACGATCA TGTAATGCCA TTAATGCGGC TTCGTATGAA TATTTATCGT 420
GCATATAGTG GATTACGTTT AAGGCAGTCA C .TATTGTTT TGCCAACCAA TCCATAAAGC 480
TATCCATACG AGTCATTACT GTATCGAAAT CTAATACTTC ATCAGTAATT GGTGCAGTTT 540
TCGGACCTAC TTGCATACCT AATTTTTCAT CGATACCGCC GTTGATTGCG TATAACAATG 600
TTTTCGCTAA GTTTGCACGT GCACCGAAGA ATTGCATTTG TTTACCCACA ATCATTGGTG 660
ATACACAACA TGCGATTGCG TAGTCATCGT TGTTGAAGTC TGGACGCATT AAATCATCGT 720
TTTCGTATTG AACTGATGAG GTATCAATCG ATACTTTTGC ACAGAAACGT TTGAAGTTTT 780
CAGGTAATTG TTCAGACCAA AGAATGGTTA AGTTTGGCTC TGGAGAAGTA CCCATGTTGT 840
AAAGGGTGTG TAAAATACGG AATGTATTTT TGGTTACTAA TGTACGACCA TCTAAACCCA 900
TACCTGCGAT GGTTTCAGTT GCCCACATTG GGTCACCAGA GAATAATTGA TCGTATTCAG 960
GTGTACGTAA GAAACGAACC ATACGAAGTT TCATAACTAA GTGGTCAACT AATTCTTGCG 1020
CTTCAGTTTC AGTAATTTTT CCTGCTTTTA AATCACGTTC GATGTACACG TCAATAAAGG 1080
TTGCGGTACG ACCGAATGAC ATTGCAGCAC CATTTTGTGA TTTTATTGCA GCAAGATAAG 1140
CAAAGTACAT CCATTGAATG GCTTCTTGAG CATTAGTTGC TGGGTTAGAA ATATCATAAC 1200
CATAGCTTGC TGCCATTTGT TTTAATTGAC CTAATGCACG GTGTTGTTCT GCGATTTCTT 1260
CACGTAAACG AATTGTTGCT TCAAGATTTA CGCCATCTTC TAAATCTTTT TGTAAAGAAG 1320
AGAATTGTGC GTATTTATCT TTCATTAAGA AATCTACACC ATAAAGTGCT ACACGACGGT 1380
AGTCACCGAT GATACGACCA CGACCATAAG CATCTGGAAG ACCAGTTAAT ACCCCAGATT 1440
TACGGCAACG TAAAATATCT GGCGTGTAAA CATCGAATAC ACCTTGGTTA TGTGTTTTAC 1500
GGTATTCAGT GAAGATTTTT TTCACTTTTG GATCAAGTTC ACGACCATAA ACTTTACAAG 1560
AACCTTCCAC CATTTTGATA CCACCGAATG GCATAATGGC ACGTTTTAAA GGTTCATCAG 1620
TTTGAAGACC AACGATTTTT TCTAAATCTT TGTTAATGTA ACCAGGTGCG TGAGAGATAA 1680
TGGTAGATGG TGTATGTTCA TCAAAATCTA ATGGCGCGTG AGTACGGTTT TCAATTTTAA 1740
TACCTTCCAT CACAGATTCC CAAAGCTTGG TTGTTGCTTC GGTTGGACCT GCTAAGAAAG 1800
AGTCATCGCC TTCATAAGGG GTATAGTTTT TTTGGATAAA GTCACGTACA TTGACATTTT 1860
CTTGCCAATC GCCACCAGCA AAACCAGCCC ACGCCAATTT TTGCATTTCA TTAAGTTCTG 1920
ACATAGTCAT TTCCTTTGTT AATTAATAAA TAAATCTTTA ATGTGTTTTG GTTAAATAAC 1980
GTTGGAATAC ACC 1993
(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 746 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
Met Lys Val Asp Ile Asp Thr Ser Asp Lys Leu Tyr Ala Asp Ala Trp
1 5 10 15
Leu Gly Phe Lys Gly Thr Asp Trp Lys Asn Glu Ile Asn Val Arg Asp
20 25 30
Phe Ile Gln His Asn Tyr Thr Pro Tyr Glu Gly Asp Glu Ser Phe Leu
35 40 45
Ala Glu Ala Thr Pro Ala Thr Thr Glu Leu Trp Glu Lys Val Met Glu
50 55 60
Gly Ile Arg Ile Glu Asn Ala Thr His Ala Pro Val Asp Phe Asp Thr
65 70 75 80
Asn Ile Ala Thr Thr Ile Thr Ala His Asp Ala Gly Tyr Ile Asn Gln
85 90 95
Pro Leu Glu Lys Ile Val Gly Leu Gln Thr Asp Ala Pro Leu Lys Arg
100 105 110
Ala Leu His Pro Phe Gly Gly Ile Asn Met Ile Lys Ser Ser Phe His
115 120 125
Ala Tyr Gly Arg Glu Met Asp Ser Glu Phe Glu Tyr Leu Phe Thr Asp
130 135 140
Leu Arg Lys Thr His Asn Gln Gly Val Phe Asp Val Tyr Ser Pro Asp
145 150 155 160
Met Leu Arg Cys Arg Lys Ser Gly Val Leu Thr Gly Leu Pro Asp Gly
165 170 175
Tyr Gly Arg Gly Arg Ile Ile Gly Asp Tyr Arg Arg Val Ala Leu Tyr
180 185 190
Gly Ile Ser Tyr Leu Val Arg Glu Arg Glu Leu Gln Phe Ala Asp Leu
195 200 205
Gln Ser Arg Leu Glu Lys Gly Glu Asp Leu Glu Ala Thr Ile Arg Leu
210 215 220
Arg Glu Glu Leu Ala Glu His Arg His Ala Leu Leu Gln Ile Gln Glu
225 230 235 240
Met Ala Ala Lys Tyr Gly Phe Asp Ile Ser Arg Pro Ala Gln Asn Ala
245 250 255
Gln Glu Ala Val Gln Trp Leu Tyr Phe Ala Tyr Leu Ala Ala Val Lys
260 265 270
Ser Gln Asn Gly Gly Ala Met Ser Leu Gly Arg Thr Ala Ser Phe Leu
275 280 285
Asp Ile Tyr Ile Glu Arg Asp Phe Lys Ala Gly Val Leu Asn Glu Gln
290 295 300
Gln Ala Gln Glu Leu Ile Asp His Phe Ile Met Lys Ile Arg Met Val
305 310 315 320
Arg Phe Leu Arg Thr Pro Glu Phe Asp Ser Leu Phe Ser Gly Asp Pro
325 330 335
Ile Trp Ala Thr Glu Val Ile Gly Gly Met Gly Leu Asp Gly Arg Thr
340 345 350
Leu Val Thr Lys Asn Ser Phe Arg Tyr Leu His Thr Leu His Thr Met
355 360 365
Gly Pro Ala Pro Glu Pro Asn Leu Thr Ile Leu Trp Ser Glu Glu Leu
370 375 380
Pro Ile Ala Phe Lys Lys Tyr Ala Ala Gln Val Ser Ile Val Thr Ser
385 390 395 400
Ser Leu Gln Tyr Glu Asn Asp Asp Leu Met Arg Thr Asp Phe Asn Ser
405 410 415
Asp Asp Tyr Ala Ile Ala Cys Cys Val Ser Pro Met Val Ile Gly Lys
420 425 430
Gln Met Gln Phe Phe Gly Ala Arg Ala Asn Leu Ala Lys Thr Leu Leu
435 440 445
Tyr Ala Ile Asn Gly Gly Val Asp Glu Lys Leu Lys Ile Gln Val Gly
450 455 460
Pro Lys Thr Ala Pro Leu Met Asp Asp Val Leu Asp Tyr Asp Lys Val
465 470 475 480
Met Asp Ser Leu Asp His Phe Met Asp Trp Leu Ala Val Gln Tyr Ile
485 490 495
Ser Ala Leu Asn Ile Ile His Tyr Met His Asp Lys Tyr Ser Tyr Glu
500 505 510
Ala Ser Leu Met Ala Leu His Asp Arg Asp Val Tyr Arg Thr Met Ala
515 520 525
Cys Gly Ile Ala Gly Leu Ser Val Ala Thr Asp Ser Leu Ser Ala Ile
530 535 540
Lys Tyr Ala Arg Val Lys Pro Ile Arg Asp Glu Asn Gly Leu Ala Val
545 550 555 560
Asp Phe Glu Ile Asp Gly Glu Tyr Pro Gln Tyr Gly Asn Asn Asp Glu
565 570 575
Arg Val Asp Ser Ile Ala Cys Asp Leu Val Glu Arg Phe Met Lys Lys
580 585 590
Ile Lys Ala Leu Pro Thr Tyr Arg Asn Ala Val Pro Thr Gln Ser Ile
595 600 605
Leu Thr Ile Thr Ser Asn Val Val Tyr Gly Gln Lys Thr Gly Asn Thr
610 615 620
Pro Asp Gly Arg Arg Ala Gly Thr Pro Phe Ala Pro Gly Ala Asn Pro
625 630 635 640
Met His Gly Arg Asp Arg Lys Gly Ala Val Ala Ser Leu Thr Ser Val
645 650 655
Ala Lys Leu Pro Phe Thr Tyr Ala Lys Asp Gly Ile Ser Tyr Thr Phe
660 665 670
Ser Ile Val Pro Ala Ala Leu Gly Lys Glu Asp Pro Val Arg Lys Thr
675 680 685
Asn Leu Val Gly Leu Leu Asp Gly Tyr Phe His His Glu Ala Asp Val
690 695 700
Glu Gly Gly Gln His Leu Asn Val Asn Val Met Asn Arg Glu Met Leu
705 710 715 720
Leu Asp Ala Ile Glu His Pro Glu Lys Tyr Pro Asn Leu Thr Ile Arg
725 730 735
Val Ser Gly Tyr Ala Cys Ala Ser Thr His
740 745
(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 769 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
Ser Glu Leu Asn Glu Met Gln Lys Leu Ala Trp Ala Gly Phe Ala Gly
1 5 10 15
Gly Asp Trp Gln Glu Asn Val Asn Val Arg Asp Phe Ile Gln Lys Asn
20 25 30
Tyr Thr Pro Tyr Glu Gly Asp Asp Ser Phe Leu Ala Gly Pro Thr Glu
35 40 45
Ala Thr Thr Lys Leu Trp Glu Ser Val Met Glu Gly Ile Lys Ile Glu
50 55 60
Asn Arg Thr His Ala Pro Leu Asp Phe Asp Glu His Thr Pro Ser Thr
65 70 75 80
Ile Ile Ser His Ala Pro Gly Tyr Ile Asn Lys Asp Leu Glu Lys Ile
85 90 95
Val Gly Leu Gln Thr Asp Glu Pro Leu Lys Arg Ala Ile Met Pro Phe
100 105 110
Gly Gly Ile Lys Met Val Glu Gly Ser Cys Lys Val Tyr Gly Arg Glu
115 120 125
Leu Asp Pro Lys Val Lys Lys Ile Phe Thr Glu Tyr Arg Lys Thr His
130 135 140
Asn Gln Gly Val Phe Asp Val Tyr Thr Pro Asp Ile Leu Arg Cys Arg
145 150 155 160
Lys Ser Gly Val Leu Thr Gly Leu Pro Asp Ala Tyr Gly Arg Gly Arg
165 170 175
Ile Ile Gly Asp Tyr Arg Arg Val Ala Leu Tyr Gly Val Asp Phe Leu
180 185 190
Met Lys Asp Lys Tyr Ala Gln Phe Ser Ser Leu Gln Lys Asp Leu Glu
195 200 205
Asp Gly Val Asn Leu Glu Ala Thr Ile Arg Leu Arg Glu Glu Ile Ala
210 215 220
Glu Gln His Arg Ala Leu Gly Gln Leu Lys Gln Met Ala Ala Ser Tyr
225 230 235 240
Gly Tyr Asp Ile Ser Asn Pro Ala Thr Asn Ala Gln Glu Ala Ile Gln
245 250 255
Trp Met Tyr Phe Ala Tyr Leu Ala Ala He Lys Ser Gin Aεn Gly Ala
260 265 270
Ala Met Ser Phe Gly Arg Thr Ala Thr Phe He Aεp Val Tyr He Glu
275 280 285
Arg Asp Leu Lys Ala Gly Lys Ile Thr Glu Thr Glu Ala Gln Glu Leu
290 295 300
Val Asp His Leu Val Met Lys Leu Arg Met Val Arg Phe Leu Arg Thr
305 310 315 320
Pro Glu Tyr Asp Gln Leu Phe Ser Gly Asp Pro Met Trp Ala Thr Glu
325 330 335
Thr Ile Ala Gly Met Gly Leu Asp Gly Arg Thr Leu Val Thr Lys Asn
340 345 350
Thr Phe Arg Ile Leu His Thr Leu Tyr Asn Met Gly Thr Ser Pro Glu
355 360 365
Pro Asn Leu Thr Ile Leu Trp Ser Glu Gln Leu Pro Glu Asn Phe Lys
370 375 380
Arg Phe Cys Ala Lys Val Ser Ile Asp Thr Ser Ser Val Gln Tyr Glu
385 390 395 400
Asn Asp Asp Leu Met Arg Pro Asp Phe Asn Asn Asp Asp Tyr Ala Ile
405 410 415
Ala Cys Cys Val Ser Pro Met Ile Val Gly Lys Gln Met Gln Phe Phe
420 425 430
Gly Ala Arg Ala Asn Leu Ala Lys Thr Leu Leu Tyr Ala Ile Asn Gly
435 440 445
Gly Ile Asp Glu Lys Leu Gly Met Gln Val Gly Pro Lys Thr Ala Pro
450 455 460
Ile Thr Asp Glu Val Leu Asp Phe Asp Thr Val Met Thr Arg Met Asp
465 470 475 480
Ser Phe Met Asp Trp Leu Ala Lys Gln Tyr Val Thr Ala Leu Asn Val
485 490 495
Ile His Tyr Met His Asp Lys Tyr Ser Tyr Glu Ala Ala Leu Met Ala
500 505 510
Leu His Asp Arg Asp Val Tyr Arg Thr Met Ala Cys Gly Ile Ala Gly
515 520 525
Leu Ser Val Ala Ala Asp Ser Leu Ser Ala Ile Lys Tyr Ala Lys Val
530 535 540
Lys Pro Val Arg Gly Asp Ile Lys Asp Lys Asp Gly Asn Val Val Ala
545 550 555 560
Thr Asn Val Ala Ile Asp Phe Glu Ile Glu Gly Glu Tyr Pro Gln Tyr
565 570 575
Gly Asn Asn Asp Asn Arg Val Asp Asp Ile Ala Cys Asp Leu Val Glu
580 585 590
Arg Phe Met Lys Lys Ile Gln Lys Leu Lys Thr Tyr Arg Asn Ala Val
595 600 605
Pro Thr Gln Ser Val Leu Thr Ile Thr Ser Asn Val Val Tyr Gly Lys
610 615 620
Lys Thr Gly Asn Thr Pro Asp Gly Arg Arg Ala Gly Ala Pro Phe Gly
625 630 635 640
Pro Gly Ala Asn Pro Met His Gly Arg Asp Gln Lys Gly Ala Val Ala
645 650 655
Ser Leu Thr Ser Val Ala Lys Leu Pro Phe Ala Tyr Ala Lys Asp Gly
660 665 670
Ile Ser Tyr Thr Phe Ser Ile Val Pro Asn Ala Leu Gly Lys Asp Ala
675 680 685
Glu Ala Gln Arg Arg Asn Leu Ala Gly Leu Met Asp Gly Tyr Phe His
690 695 700
His Glu Ala Thr Val Glu Gly Gly Gln His Leu Asn Val Asn Val Leu
705 710 715 720
Asn Arg Glu Met Leu Leu Asp Ala Met Glu Asn Pro Asp Lys Tyr Pro
725 730 735
Gln Leu Thr Ile Arg Val Ser Gly Tyr Ala Val Arg Phe Asn Ser Leu
740 745 750
Thr Lys Glu Gln Gln Gln Asp Val Ile Thr Arg Thr Phe Thr Glu Ser
755 760 765
Met
(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 195 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
Gly Ser Phe Pro Lys Tyr Gly Asn Asp Asp Asp Arg Val Asp Glu Ile
1 5 10 15
Ala Glu Trp Val Val Ser Thr Phe Ser Ser Lys Leu Ala Lys Gln His
20 25 30
Thr Tyr Arg Asn Ser Val Pro Thr Leu Ser Val Leu Thr Ile Thr Ser
35 40 45
Asn Val Val Tyr Gly Lys Lys Thr Gly Ser Thr Pro Asp Gly Arg Lys
50 55 60
Lys Gly Glu Pro Phe Ala Pro Gly Ala Asn Pro Leu His Gly Arg Asp
65 70 75 80
Ala His Gly Ala Leu Ala Ser Leu Asn Ser Val Ala Lys Leu Pro Tyr
85 90 95
Thr Met Cys Leu Asp Gly Ile Ser Asn Thr Phe Ser Leu Ile Pro Gln
100 105 110
Val Leu Gly Arg Gly Gly Glu His Glu Arg Ala Thr Asn Leu Ala Ser
115 120 125
Ile Leu Asp Gly Tyr Phe Ala Asn Gly Gly His His Ile Asn Val Asn
130 135 140
Val Leu Asn Arg Ser Met Leu Met Asp Ala Val Glu His Pro Glu Lys
145 150 155 160
Tyr Pro Asn Leu Thr Ile Arg Val Ser Gly Tyr Ala Val His Phe Ala
165 170 175
Arg Leu Thr Arg Glu Gln Gln Leu Glu Val Ile Ala Arg Thr Phe His
180 185 190
Asp Thr Met
195
(2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1006 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TGTTACCTGG TTTGAACGGT GGTTACGTTC ATAAAGATTA TAAAGTATTC GATATTGAAC 60
CTGTTCGTGA TGAAATTCTT GACTATGATA CAGTTATGGA AAACTTCGAC AAATCACTCA 120
ACTGGTTGAC AGATACTTAT GTTGATGCAA TGAATATCAT TCACTACATG ACTGACAAAT 180
ATAACTATGA AGCAGTTCAA ATGGCCTTCT TGCCTACTAA AGTTCGTGCT AACATGGGAT 240
TTGGTATCTG TGGTTTCGCA AATACAGTTG ATTCACTTTC AGCGATTAAA TATGCTAAAG 300
TTAAAACTTT GCGTGATGAA AATGGCTACA TCTACGATTA TGAAGTAGAA GGTGACTTCC 360
CACGTTATGG TGAAGATGAT GACCGTGCTG ATGATATCGC TAAACTTGTC ATGAAAATGT 420
ACCATGAAAA ATTAGCTTCA CACAAACTTT ACAAAAATGC TGAAGCTACT GTTTCACTTT 480
TGACAATCAC ATCTAACGTT GCTTACTCTA AACAAACTGG TAACTCTCCA GTTCATAAAG 540
GAGTATTCCT CAATGAAGAT GGTACAGTCA ACAAATCTAA ACTTGAATTC TTCTCACCAG 600
GTGCTAACCC ATCTAACAAA GCTAAAGGTG GATGGTTGCA AAATCTTCGT TCATTAGCTA 660
AATTGGAATT CAAAGATGCA AATGACGGTA TTTCATTAAC TACTCAAGTT TCTCCTCGTG 720
CACTTGGTAA AACTCGTGAT GAACAAGTAG ATAACTTGGT TCAAATTCTT GATGGATACT 780
TCACACCAGG AGCTTTGATT AATGGTACTG AATTTGCAGG TCAACACGTT AACTTGAACG 840
TTATGGACCT TAAAGATGTT TACGATAAAA TCATGCGTGG TGAAGATGTT ATCGTTCGTA 900
TCTCTGGATA CTGTGTTAAC ACTAAATACC TCACACCTGA ACAAAAACAA GAATTGACTG 960
AACGTGTCTT CCATGAAGTA CTTTCAAACG ATGATGAAGA AGTAAT 1006
(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 334 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
Leu Pro Gly Leu Asn Gly Gly Tyr Val His Lys Asp Tyr Lys Val Phe
1 5 10 15
Asp Ile Glu Pro Val Arg Asp Glu Ile Leu Asp Tyr Asp Thr Val Met
20 25 30
Glu Asn Phe Asp Lys Ser Leu Asn Trp Leu Thr Asp Thr Tyr Val Asp
35 40 45
Ala Met Asn Ile Ile His Tyr Met Thr Asp Lys Tyr Asn Tyr Glu Ala
50 55 60
Val Gln Met Ala Phe Leu Pro Thr Lys Val Arg Ala Asn Met Gly Phe
65 70 75 80
Gly Ile Cys Gly Phe Ala Asn Thr Val Asp Ser Leu Ser Ala Ile Lys
85 90 95
Tyr Ala Lys Val Lys Thr Leu Arg Asp Glu Asn Gly Tyr Ile Tyr Asp
100 105 110
Tyr Glu Val Glu Gly Asp Phe Pro Arg Tyr Gly Glu Asp Asp Asp Arg
115 120 125
Ala Asp Asp Ile Ala Lys Leu Val Met Lys Met Tyr His Glu Lys Leu
130 135 140
Ala Ser His Lys Leu Tyr Lys Asn Ala Glu Ala Thr Val Ser Leu Leu
145 150 155 160
Thr Ile Thr Ser Asn Val Ala Tyr Ser Lys Gln Thr Gly Asn Ser Pro
165 170 175
Val His Lys Gly Val Phe Leu Asn Glu Asp Gly Thr Val Asn Lys Ser
180 185 190
Lys Leu Glu Phe Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala Lys
195 200 205
Gly Gly Trp Leu Gln Asn Leu Arg Ser Leu Ala Lys Leu Glu Phe Lys
210 215 220
Asp Ala Asn Asp Gly Ile Ser Leu Thr Thr Gln Val Ser Pro Arg Ala
225 230 235 240
Leu Gly Lys Thr Arg Asp Glu Gln Val Asp Asn Leu Val Gln Ile Leu
245 250 255
Asp Gly Tyr Phe Thr Pro Gly Ala Leu Ile Asn Gly Thr Glu Phe Ala
260 265 270
Gly Gln His Val Asn Leu Asn Val Met Asp Leu Lys Asp Val Tyr Asp
275 280 285
Lys Ile Met Arg Gly Glu Asp Val Ile Val Arg Ile Ser Gly Tyr Cys
290 295 300
Val Asn Thr Lys Tyr Leu Thr Pro Glu Gln Lys Gln Glu Leu Thr Glu
305 310 315 320
Arg Val Phe His Glu Val Leu Ser Asn Asp Asp Glu Glu Val
325 330
(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 776 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
Met Ala Thr Val Lys Thr Asn Thr Asp Val Phe Glu Lys Ala Trp Glu
1 5 10 15
Gly Phe Lys Gly Thr Asp Trp Lys Asp Arg Ala Ser Ile Ser Arg Phe
20 25 30
Val Gln Asp Asn Tyr Thr Pro Tyr Asp Gly Gly Glu Ser Phe Leu Ala
35 40 45
Gly Pro Thr Glu Arg Ser Leu His Ile Lys Lys Val Val Glu Glu Thr
50 55 60
Lys Ala His Tyr Glu Glu Thr Arg Phe Pro Met Asp Thr Arg Ile Thr
65 70 75 80
Ser Ile Ala Asp Ile Pro Ala Gly Tyr Ile Asp Lys Glu Asn Glu Leu
85 90 95
Ile Phe Gly Ile Gln Asn Asp Glu Leu Phe Lys Leu Asn Phe Met Pro
100 105 110
Lys Gly Gly Ile Arg Met Ala Glu Thr Ala Leu Lys Glu His Gly Tyr
115 120 125
Glu Pro Asp Pro Ala Val His Glu Ile Phe Thr Lys Tyr Ala Thr Thr
130 135 140
Val Asn Asp Gly Ile Phe Arg Ala Tyr Thr Ser Asn Ile Arg Arg Ala
145 150 155 160
Arg His Ala His Thr Val Thr Gly Leu Pro Asp Ala Tyr Ser Arg Gly
165 170 175
Arg Ile Ile Gly Val Tyr Ala Arg Leu Ala Leu Tyr Gly Ala Asp Tyr
180 185 190
Leu Met Gln Glu Lys Val Asn Asp Trp Asn Ser Ile Ala Glu Ile Asp
195 200 205
Glu Glu Ser Ile Arg Leu Arg Glu Glu Ile Asn Leu Gln Tyr Gln Ala
210 215 220
Leu Gly Glu Val Val Arg Leu Gly Asp Leu Tyr Gly Leu Asp Val Arg
225 230 235 240
Lys Pro Ala Met Asn Val Lys Glu Ala Ile Gln Trp Ile Asn Ile Ala
245 250 255
Phe Met Ala Val Cys Arg Val Ile Asn Gly Ala Ala Thr Ser Leu Gly
260 265 270
Arg Val Pro Ile Val Leu Asp Ile Phe Ala Glu Arg Asp Leu Ala Arg
275 280 285
Gly Thr Phe Thr Glu Ser Glu Ile Gln Glu Phe Val Asp Asp Phe Val
290 295 300
Met Lys Leu Arg Thr Val Lys Phe Ala Arg Thr Lys Ala Tyr Asp Glu
305 310 315 320
Leu Tyr Ser Gly Asp Pro Thr Phe Ile Thr Thr Ser Met Ala Gly Met
325 330 335
Gly Ala Asp Gly Arg His Arg Val Thr Lys Met Asp Tyr Arg Phe Leu
340 345 350
Asn Thr Leu Asp Asn Ile Gly Asn Ala Pro Glu Pro Asn Leu Thr Val
355 360 365
Leu Trp Ser Ser Lys Leu Pro Tyr Ser Phe Arg His Tyr Cys Met Ser
370 375 380
Met Ser His Lys His Ser Ser Ile Gln Tyr Glu Gly Val Thr Thr Met
385 390 395 400
Ala Lys Glu Gly Tyr Gly Glu Met Ser Cys Ile Ser Cys Cys Val Ser
405 410 415
Pro Leu Asp Pro Glu Asn Glu Asp Arg Arg His Asn Leu Gln Tyr Phe
420 425 430
Gly Ala Arg Val Asn Val Leu Lys Ala Leu Leu Thr Gly Leu Asn Gly
435 440 445
Gly Tyr Asp Asp Val His Lys Asp Tyr Lys Val Phe Asp Val Glu Pro
450 455 460
Ile Arg Asp Glu Val Leu Asp Phe Glu Thr Val Lys Ala Asn Phe Glu
465 470 475 480
Lys Ala Leu Asp Trp Leu Thr Asp Thr Tyr Val Asp Ala Met Asn Ile
485 490 495
Ile His Tyr Met Thr Asp Lys Tyr Asn Tyr Glu Ala Val Gln Met Ala
500 505 510
Phe Leu Pro Thr Arg Val Lys Ala Asn Met Gly Phe Gly Ile Cys Gly
515 520 525
Phe Ser Asn Thr Val Asp Ser Leu Ser Ala Ile Lys Tyr Ala Thr Val
530 535 540
Lys Pro Ile Arg Asp Glu Asp Gly Tyr Ile Tyr Asp Tyr Glu Thr Val
545 550 555 560
Gly Asn Phe Pro Arg Tyr Gly Glu Asp Asp Asp Arg Val Asp Ser Ile
565 570 575
Ala Glu Trp Leu Leu Glu Ala Phe His Thr Arg Leu Ala Arg His Lys
580 585 590
Leu Tyr Lys Asp Ser Glu Ala Thr Val Ser Leu Leu Thr Ile Thr Ser
595 600 605
Asn Val Ala Tyr Ser Lys Gln Thr Gly Asn Ser Pro Val His Lys Gly
610 615 620
Val Tyr Leu Asn Glu Asp Gly Ser Val Asn Leu Ser Lys Val Glu Phe
625 630 635 640
Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala Ser Gly Gly Trp Leu
645 650 655
Gln Asn Leu Asn Ser Leu Lys Lys Leu Asp Phe Ala His Ala Asn Asp
660 665 670
Gly Ile Ser Leu Thr Thr Gln Val Ser Pro Lys Ala Leu Gly Lys Thr
675 680 685
Phe Asp Glu Gln Val Ala Asn Leu Val Thr Ile Leu Asp Gly Tyr Phe
690 695 700
Glu Gly Gly Gly Gln His Val Asn Leu Asn Val Met Asp Leu Lys Asp
705 710 715 720
Val Tyr Asp Lys Ile Met Asn Gly Glu Asp Val Ile Val Arg Ile Ser
725 730 735
Gly Tyr Cys Val Asn Thr Lys Tyr Leu Thr Lys Glu Gln Lys Thr Glu
740 745 750
Leu Thr Gln Arg Val Phe His Glu Val Leu Ser Met Asp Asp Ala Ala
755 760 765
Thr Asp Leu Val Asn Asn Lys Glx
770 775
(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 740 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
Leu Phe Lys Gln Trp Glu Gly Phe Gln Asp Gly Glu Trp Thr Asn Asp
1 5 10 15
Val Asn Val Arg Asp Phe Ile Gln Lys Asn Tyr Lys Glu Tyr Thr Gly
20 25 30
Asp Lys Ser Phe Leu Lys Gly Pro Thr Glu Lys Thr Lys Lys Val Trp
35 40 45
Asp Lys Ala Val Ser Leu Ile Leu Glu Glu Leu Lys Lys Gly Ile Leu
50 55 60
Asp Val Asp Thr Glu Thr Ile Ser Gly Ile Asn Ser Phe Lys Pro Gly
65 70 75 80
Tyr Leu Asp Lys Asp Asn Glu Val Ile Val Gly Phe Gln Thr Asp Ala
85 90 95
Pro Leu Lys Arg Ile Thr Asn Pro Phe Gly Gly Ile Arg Met Ala Glu
100 105 110
Gln Ser Leu Lys Glu Tyr Gly Phe Lys Ile Ser Asp Glu Met His Asn
115 120 125
Ile Phe Thr Asn Tyr Arg Lys Thr His Asn Gln Gly Val Phe Asp Ala
130 135 140
Tyr Ser Glu Glu Thr Arg Ile Ala Arg Ser Ala Gly Val Leu Thr Gly
145 150 155 160
Leu Pro Asp Ala Tyr Gly Arg Gly Arg Ile Ile Gly Asp Tyr Arg Arg
165 170 175
Val Ala Leu Tyr Gly Ile Asp Phe Leu Ile Gln Glu Lys Lys Lys Asp
180 185 190
Leu Ser Asn Leu Lys Gly Asp Met Leu Asp Glu Leu Ile Arg Leu Arg
195 200 205
Glu Glu Val Ser Glu Gln Ile Arg Ala Leu Asp Glu Ile Lys Lys Met
210 215 220
Ala Leu Ser Tyr Gly Val Asp Ile Ser Arg Pro Ala Val Asn Ala Lys
225 230 235 240
Glu Ala Ala Gln Phe Leu Tyr Phe Gly Tyr Leu Ala Gly Val Lys Glu
245 250 255
Asn Asn Gly Ala Ala Met Ser Leu Gly Arg Thr Ser Thr Phe Leu Asp
260 265 270
Ile Tyr Ile Glu Arg Asp Leu Glu Gln Gly Leu Ile Thr Glu Asp Glu
275 280 285
Ala Gln Glu Val Ile Asp Gln Phe Ile Ile Lys Leu Arg Leu Val Arg
290 295 300
His Leu Arg Thr Pro Glu Tyr Asn Glu Leu Phe Ala Gly Asp Pro Thr
305 310 315 320
Trp Val Thr Glu Ser Ile Ala Gly Val Gly Ile Asp Gly Arg Ser Leu
325 330 335
Val Thr Lys Asn Ser Phe Arg Tyr Leu His Thr Leu Ile Asn Leu Gly
340 345 350
Ser Ala Pro Glu Pro Asn Met Thr Val Leu Trp Ser Glu Asn Leu Pro
355 360 365
Glu Ser Phe Lys Lys Phe Cys Ala Glu Met Ser Ile Leu Thr Asp Ser
370 375 380
Ile Gln Tyr Glu Asn Asp Asp Ile Met Arg Pro Ile Tyr Gly Asp Asp
385 390 395 400
Tyr Ala Ile Ala Cys Cys Val Ser Ala Met Arg Val Gly Lys Asp Met
405 410 415
Gln Phe Phe Gly Ala Arg Cys Asn Leu Ala Lys Cys Leu Leu Leu Ala
420 425 430
Ile Asn Gly Gly Val Asp Glu Lys Lys Gly Ile Lys Val Val Pro Asp
435 440 445
Ile Glu Pro Ile Thr Asp Glu Val Leu Asp Tyr Glu Lys Val Lys Glu
450 455 460
Asn Tyr Phe Lys Val Leu Glu Tyr Met Ala Gly Leu Tyr Val Asn Thr
465 470 475 480
Met Asn Ile Ile His Phe Met His Asp Lys Tyr Ala Tyr Glu Ala Ser
485 490 495
Gln Met Ala Leu His Asp Thr Lys Val Gly Arg Leu Met Ala Phe Gly
500 505 510
Ile Ala Gly Phe Ser Val Ala Ala Asp Ser Leu Ser Ala Ile Arg Tyr
515 520 525
Ala Lys Val Lys Pro Ile Arg Glu Asn Gly Ile Thr Val Asp Phe Val
530 535 540
Lys Glu Gly Asp Phe Pro Lys Tyr Gly Asn Asp Asp Asp Arg Val Asp
545 550 555 560
Ser Ile Ala Val Glu Ile Val Glu Lys Phe Ser Asp Glu Leu Lys Lys
565 570 575
His Pro Thr Tyr Arg Asn Ala Lys His Thr Leu Ser Val Leu Thr Ile
580 585 590
Thr Ser Asn Val Met Tyr Gly Lys Lys Thr Gly Thr Thr Pro Asp Gly
595 600 605
Arg Lys Val Gly Glu Pro Leu Ala Pro Gly Ala Asn Pro Met His Gly
610 615 620
Arg Asp Met Glu Gly Ala Leu Ala Ser Leu Asn Ser Val Ala Lys Val
625 630 635 640
Pro Tyr Val Cys Cys Glu Asp Gly Val Ser Asn Thr Phe Ser Ile Val
645 650 655
Pro Asp Ala Leu Gly Asn Asp His Asp Val Arg Ile Asn Asn Leu Val
660 665 670
Ser Ile Met Gly Gly Tyr Phe Gly Gln Gly Ala His His Leu Asn Val
675 680 685
Asn Val Leu Asn Arg Glu Thr Leu Ile Asp Ala Met Asn Asn Pro Asp
690 695 700
Lys Tyr Pro Thr Leu Thr Ile Arg Val Ser Gly Tyr Ala Val Asn Phe
705 710 715 720
Asn Arg Leu Ser Lys Asp His Gln Lys Glu Val Ile Ser Arg Thr Phe
725 730 735
His Glu Lys Leu
740
(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1848 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 591...1613
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
CTGCAGCTTG TTTTTTAGTA CCAACAAAAA GGACTACTGC ACCTTCTTGT GAAGCGTTTT 60
TTACATAGTT GTAAGCATCG TCAACAAGTT TTACAGTTTT TTGAAGGTCG ATAACGTGGA 120
TACCATTACG TTCTGTGAAG ATGTATGGTT TCATTTTTGG GTTCCAACGA CGAGTTTGGT 180
GACCGAAGTG AACACCAGCT TCAAGAAGTT GTTTCATTGA AATAACTGAC ATGTTAATGT 240
CTCCTTTTAA AATAGTTTTT CCTCTTTCAT CTGTCATCCG CAGCCGCAAT ACTTGCGTAC 300
ACTACGACTT TGTCGAGACG AAATGCGAGA TGGTTGCATA GCAACTCTAT CATTATACAT 360
TGTTTGACCT ATTTTTGCAA GTATCTATTC ATGCTTCTAT TGTTCAGTAA ATCTATTTTT 420
CTAACCACTC CTATTATCTG ACAAATTTAA TTGTTAATTT AGGCTCTATA ATCACTAAAA 480
GAGTAAGTTT TTAAATTTTT TTCTAAGAAA AAAATTAATA TTTTTGCTGA AACCGCTTTT 540
TTTGTGATAA AATAATTATA GTAAATAAAT TAGTTTGTGA GGAGAGAAAT ATG AAA 596
Met Lys
1
GAA AAA ATC CTT TTA GGC GGC TAT ACA AAA CGT GTA TCT AAA GGC GTA 644
Glu Lys Ile Leu Leu Gly Gly Tyr Thr Lys Arg Val Ser Lys Gly Val
5 10 15
TAT AGT GTT CTT TTG GAC ACT AAA GCT GCT GAA TTA TCA TCA TTA AAT 692
Tyr Ser Val Leu Leu Asp Thr Lys Ala Ala Glu Leu Ser Ser Leu Asn
20 25 30
GAA GTC GCT GCG GTT CAA AAC CCT ACT TAT ATC ACT CTC GAT GAA AAG 740
Glu Val Ala Ala Val Gln Asn Pro Thr Tyr Ile Thr Leu Asp Glu Lys
35 40 45 50
GGA CAC CTC TAT ACT TGT GCA GCA GAT AGT AAT GGT GGA GGA ATC GCC 788
Gly His Leu Tyr Thr Cys Ala Ala Asp Ser Asn Gly Gly Gly Ile Ala
55 60 65
GCC TTT GAT TTT GAT GGC GAA ACT GCT ACT CAT CTC GGA AAT GTC ACA 836
Ala Phe Asp Phe Asp Gly Glu Thr Ala Thr His Leu Gly Asn Val Thr
70 75 80
ACC ACG GGA GCT CCA CTC TGC TAT GTT GCC GTG GAC GAA GCG CGA CAA 884
Thr Thr Gly Ala Pro Leu Cys Tyr Val Ala Val Asp Glu Ala Arg Gln
85 90 95
TTA GTT TAC GGA GCG AAC TAT CAT CTT GGA GAA GTT CGT GTT TAT AAG 932
Leu Val Tyr Gly Ala Asn Tyr His Leu Gly Glu Val Arg Val Tyr Lys
100 105 110
ATT CAA GCT AAT GGC TCA CTC CGA TTA ACG GAT ACA GTA AAA CAT ACC 980
Ile Gln Ala Asn Gly Ser Leu Arg Leu Thr Asp Thr Val Lys His Thr
115 120 125 130
GGT TCT GGA CCA CGT CCT GAA CAA GCT AGC TCA CAC GTT CAT TAT TCT 1028
Gly Ser Gly Pro Arg Pro Glu Gln Ala Ser Ser His Val His Tyr Ser
135 140 145
GAT TTG ACT CCT GAC GGA CGA CTT GTC ACC TGT GAT TTG GGA ACA GAT 1076
Asp Leu Thr Pro Asp Gly Arg Leu Val Thr Cys Asp Leu Gly Thr Asp
150 155 160
GAA GTC ACT GTT TAT GAT GTC ATT GGT GAA GGT AAA CTC AAT ATT GCT 1124
Glu Val Thr Val Tyr Asp Val Ile Gly Glu Gly Lys Leu Asn Ile Ala
165 170 175
ACA ATT TAT CGG GCA GAA AAA GGA ATG GGT GCT CGT CAT ATT ACT TTC 1172
Thr Ile Tyr Arg Ala Glu Lys Gly Met Gly Ala Arg His Ile Thr Phe
180 185 190
CAT CCA AAT GGT AAA ATC GCT TAT TTG GTT GGA GAG TTA AAT TCA ACA 1220
His Pro Asn Gly Lys Ile Ala Tyr Leu Val Gly Glu Leu Asn Ser Thr
195 200 205 210
ATT GAA GTT TTA AGT TAC AAT GAA GAA AAA GGA CGC TTT GCT CGT CTT 1268
Ile Glu Val Leu Ser Tyr Asn Glu Glu Lys Gly Arg Phe Ala Arg Leu
215 220 225
CAA ACA ATT AGC ACC CTA CCT GAA GAT TAT CAT GGA GCA AAT GGT GTT 1316
Gln Thr Ile Ser Thr Leu Pro Glu Asp Tyr His Gly Ala Asn Gly Val
230 235 240
GCT GCC ATC CGT ATT TCA TCT GAC GGT AAA TTC CTC TAT ACT TCT AAT 1364
Ala Ala Ile Arg Ile Ser Ser Asp Gly Lys Phe Leu Tyr Thr Ser Asn
245 250 255
CGT GGA CAT GAT TCT TTG ACA ACT TAC AAA GTA AGT CCT CTT GGT ACA 1412
Arg Gly His Asp Ser Leu Thr Thr Tyr Lys Val Ser Pro Leu Gly Thr
260 265 270
AAA CTT GAA ACT ATT GGC TGG ACA AAT ACT GAA GGT CAT ATC CCT CGC 1460
Lys Leu Glu Thr Ile Gly Trp Thr Asn Thr Glu Gly His Ile Pro Arg
275 280 285 290
GAT TTT AAT TTC AAC AAA ACT GAA GAT TAT ATC ATT GTC GCT CAT CAA 1508
Asp Phe Asn Phe Asn Lys Thr Glu Asp Tyr Ile Ile Val Ala His Gln
295 300 305
GAA TCT GAT AAT TTA TCT CTT TTC TTG CGA GAT AAA AAA ACC GGT ACT 1556
Glu Ser Asp Asn Leu Ser Leu Phe Leu Arg Asp Lys Lys Thr Gly Thr
310 315 320
TTA ACT TTG GAA CAA AAA GAT TTT TAC GCT CCT GAA ATC ACT TGT GTT 1604
Leu Thr Leu Glu Gln Lys Asp Phe Tyr Ala Pro Glu Ile Thr Cys Val
325 330 335
TTA CCA CTA TAAAAATTTA TTTTTTCACA AAGTTTGACT GATAAACTAA AAAAGATTG 1662
Leu Pro Leu
340
CTAATTTCTC TCAAAGAATT AGCAATCTTT TTTTCTTCAG TAAAGCTTGT TACAAAACCG 1722
TTTTCTAAAC TTTTGATGAG TGTTTTTGTA AAAACTATCA CAATATTGCT TGACATCTAT 1782
AAAAAACTTT GTTAAACTAT TCACGTAAAA GAAAGTGAAT GAAGTCACAA AGGAGAACCT 1842
ACAAAT 1848
(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 341 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
Met Lys Glu Lys Ile Leu Leu Gly Gly Tyr Thr Lys Arg Val Ser Lys
1 5 10 15
Gly Val Tyr Ser Val Leu Leu Asp Thr Lys Ala Ala Glu Leu Ser Ser
20 25 30
Leu Asn Glu Val Ala Ala Val Gln Asn Pro Thr Tyr Ile Thr Leu Asp
35 40 45
Glu Lys Gly His Leu Tyr Thr Cys Ala Ala Asp Ser Asn Gly Gly Gly
50 55 60
Ile Ala Ala Phe Asp Phe Asp Gly Glu Thr Ala Thr His Leu Gly Asn
65 70 75 80
Val Thr Thr Thr Gly Ala Pro Leu Cys Tyr Val Ala Val Asp Glu Ala
85 90 95
Arg Gln Leu Val Tyr Gly Ala Asn Tyr His Leu Gly Glu Val Arg Val
100 105 110
Tyr Lys Ile Gln Ala Asn Gly Ser Leu Arg Leu Thr Asp Thr Val Lys
115 120 125
His Thr Gly Ser Gly Pro Arg Pro Glu Gln Ala Ser Ser His Val His
130 135 140
Tyr Ser Asp Leu Thr Pro Asp Gly Arg Leu Val Thr Cys Asp Leu Gly
145 150 155 160
Thr Asp Glu Val Thr Val Tyr Asp Val Ile Gly Glu Gly Lys Leu Asn
165 170 175
Ile Ala Thr Ile Tyr Arg Ala Glu Lys Gly Met Gly Ala Arg His Ile
180 185 190
Thr Phe His Pro Asn Gly Lys Ile Ala Tyr Leu Val Gly Glu Leu Asn
195 200 205
Ser Thr Ile Glu Val Leu Ser Tyr Asn Glu Glu Lys Gly Arg Phe Ala
210 215 220
Arg Leu Gln Thr Ile Ser Thr Leu Pro Glu Asp Tyr His Gly Ala Asn
225 230 235 240
Gly Val Ala Ala Ile Arg Ile Ser Ser Asp Gly Lys Phe Leu Tyr Thr
245 250 255
Ser Asn Arg Gly His Asp Ser Leu Thr Thr Tyr Lys Val Ser Pro Leu
260 265 270
Gly Thr Lys Leu Glu Thr Ile Gly Trp Thr Asn Thr Glu Gly His Ile
275 280 285
Pro Arg Asp Phe Asn Phe Asn Lys Thr Glu Asp Tyr Ile Ile Val Ala
290 295 300
His Gln Glu Ser Asp Asn Leu Ser Leu Phe Leu Arg Asp Lys Lys Thr
305 310 315 320
Gly Thr Leu Thr Leu Glu Gln Lys Asp Phe Tyr Ala Pro Glu Ile Thr
325 330 335
Cys Val Leu Pro Leu
340
(2) INFORMATION FOR SEQ ID NO: 28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4741 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 453...1475
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
TTTGGTGACC GAAGTGAACA CCAGCTTCAA GAAGTTGTTT CATTGAAATA ACTGACATGT 60
TAATGTCTCC TTTTAAAATA GTTTTTCCTC TTTCATCTGT CATCCGCAGC CGCAATACTT 120
GCGTACACTA CGACTTTGTC GAGACGAAAT GCGAGATGGT TGCATAGCAA CTCTCTCATT 180
ATACATTGTT TAAGCTACTT TTGCAAGCAT CTATTCATTT ATTTCTTTTA TCAATATGAG 240
TAAATGAAAG CTATCCTACC CCCCTTTCTT TTTATTCTGT TTTTTATATC TCAATGTTGT 300
CTGACAAATT TAACGAATAT TTTTGCCTAT ATAATCCCCA TAAGGGAGAT TTTTACATTT 360
TTTTCTAAGA ATAAAATTAA TATTTTTGCT GAAAACGCTT TTTTTGTGAT AAAATAATTA 420
TAGTAAATAA AATAGTTTGT GAGGAGAGAA AT ATG AAA GAA AAA ATC CTT TTA 473
Met Lys Glu Lys Ile Leu Leu
1 5
GGC GGT TAT ACT AAA CGT GTA TCT AAA GGC GTT TAC AGT GTT CTA TTA 521
Gly Gly Tyr Thr Lys Arg Val Ser Lys Gly Val Tyr Ser Val Leu Leu
10 15 20
GAT AGC AAG AAA GCT GAA TTG TCG GCT TTA ACT GAA GTT GCA GCG GTT 569
Asp Ser Lys Lys Ala Glu Leu Ser Ala Leu Thr Glu Val Ala Ala Val
25 30 35
CAA AAT CCA ACT TAT ATC ACT CTT GAT CAA AAA GGG CAC CTC TAC ACT 617
Gln Asn Pro Thr Tyr Ile Thr Leu Asp Gln Lys Gly His Leu Tyr Thr
40 45 50 55
TGT GCT GCT GAT GGA AAT GGT GGT GGA ATT GCT GCC TTT GAT TTC GAT 665
Cys Ala Ala Asp Gly Asn Gly Gly Gly Ile Ala Ala Phe Asp Phe Asp
60 65 70
GGT CAA AAT ACA ACT CAC CTA GGG AAT GTA ACG AGT ACT GGA GCC CCT 713
Gly Gln Asn Thr Thr His Leu Gly Asn Val Thr Ser Thr Gly Ala Pro
75 80 85
TTG TGT TAT GTG GCT GTT GAT GAA GCA CGT CAA CTC GTT TAT GGT GCC 761
Leu Cys Tyr Val Ala Val Asp Glu Ala Arg Gln Leu Val Tyr Gly Ala
90 95 100
AAC TAT CAC TTG GGT GAA GTT CGT GTG TAC AAA ATT CAA GCT GAT GGT 809
Asn Tyr His Leu Gly Glu Val Arg Val Tyr Lys Ile Gln Ala Asp Gly
105 110 115
TCC CTT AGA TTA ACC GAT ACA GTT AAA CAT AAT GGT TCT GGC CCT CGA 857
Ser Leu Arg Leu Thr Asp Thr Val Lys His Asn Gly Ser Gly Pro Arg
120 125 130 135
CCT GAG CAA GCA AGT TCT CAT GTC CAT TAC TCT GAT TTA ACT CCA GAT 905
Pro Glu Gln Ala Ser Ser His Val His Tyr Ser Asp Leu Thr Pro Asp
140 145 150
GGT CGT CTT GTT ACT TGT GAT TTA GGT ACA GAT GAA GTG ACT GTT TAC 953
Gly Arg Leu Val Thr Cys Asp Leu Gly Thr Asp Glu Val Thr Val Tyr
155 160 165
GAT GTT ATT GGT GAA GGT AAA CTC AAT ATC GTT ACG ATT TAT CGT GCC 1001
Asp Val Ile Gly Glu Gly Lys Leu Asn Ile Val Thr Ile Tyr Arg Ala
170 175 180
GAA AAA GGA ATG GGA GCT CGT CAC ATC AGC TTC CAT CCT AAT GGA AAA 1049
Glu Lys Gly Met Gly Ala Arg His Ile Ser Phe His Pro Asn Gly Lys
185 190 195
ATT GCT TAT CTC GTC GGA GAA TTA AAT TCA ACT ATT GAA GTT CTA AGC 1097
Ile Ala Tyr Leu Val Gly Glu Leu Asn Ser Thr Ile Glu Val Leu Ser
200 205 210 215
TAT AAT GAA GAA AAA GGA CGA TTC GCT CGT CTT CAA ACA ATC AGT ACT 1145
Tyr Asn Glu Glu Lys Gly Arg Phe Ala Arg Leu Gln Thr Ile Ser Thr
220 225 230
TTA CCT GAA GAC TAT CAC GGA GCC AAT GGA GTA GCT GCT ATT CGA ATT 1193
Leu Pro Glu Asp Tyr His Gly Ala Asn Gly Val Ala Ala Ile Arg Ile
235 240 245
TCT TCT GAT GGT AAG TTC CTC TAT GCT TCT AAT CGT GGG CAC GAC TCT 1241
Ser Ser Asp Gly Lys Phe Leu Tyr Ala Ser Asn Arg Gly His Asp Ser
250 255 260
TTA GCA ATT TAC AAG GTA AGT CCT CTC GGA ACA AAA TTA GAA TCT ATT 1289
Leu Ala Ile Tyr Lys Val Ser Pro Leu Gly Thr Lys Leu Glu Ser Ile
265 270 275
GGT TGG ACA AAG ACT GAA TAT CAT ATT CCA CGC GAT TTT AAT TTT AAT 1337
Gly Trp Thr Lys Thr Glu Tyr His Ile Pro Arg Asp Phe Asn Phe Asn
280 285 290 295
AAA ACC GAA GAT TAT ATC ATT GTC GCT CAT CAA GAA TCT GAT AAT TTA 1385
Lys Thr Glu Asp Tyr Ile Ile Val Ala His Gln Glu Ser Asp Asn Leu
300 305 310
ACT CTT TTC TTG AGA GAT AAA AAT ACA GGG TCA TTA ACG TTA GAA CAA 1433
Thr Leu Phe Leu Arg Asp Lys Asn Thr Gly Ser Leu Thr Leu Glu Gln
315 320 325
AAA GAC TTT TAC GCT CCT GAA ATT ACT TGT GTT TTA CCT TTG TAAAAACTA 1484
Lys Asp Phe Tyr Ala Pro Glu Ile Thr Cys Val Leu Pro Leu
330 335 340
AACTTTAGTA AATCTTGCTT TTGTTTTTTC ACAAAGTTTT ACTAAATCAG ACAAAAAAAT 1544
ATTGCCAAAT CTTTAAAAGG ATTGGCAATA TTTTTTTGTC TGAAACCCTT GCTTATAAAG 1604
CGATTTCTAA AAGTTTGATG AGTTTTTTTG TAAATTTCAT CACAATATCG CTTGACTTCT 1664
TTAAAAAACT TTGTTAAACT ATTCACGTAA AAGAAAGTGA ATGGAATCAC AAAGGAGAAC 1724
GTACACATAT GGCAACTAAA AAAGCCGCTC CAGCTGCAAA GAAAGTTTTA AGCGCTGAAG 1784
AAAAAGCCGC AAAATTCCAA GGAAGTGTCG CTTATACTGA TCAATTAGTC AAAAAAGCTC 1844
AAGCTGCAGT TCTTAAATTT GAAGGATACA CACAAACTCA AGTTGATACT ATTGTTGCTG 1904
CAATGGCTCT TGCAGCAAGC AAACATTCTC TGGAACTCGC TCACGAAGCC GTTAATGAAA 1964
CTGGCCGTGG AGTTGTTGAG GACAAAGATA CAAAAAACCA TTTTGCTTCT GAATCTGTTT 2024
ATAATGCAAT CAAAAATGAT AAAACAGTTG GCGTTATCGC TGAAAACAAA GTTGCTGGTT 2084
CTGTTGAAAT CGCAAGCCCC CTTGGAGTAC TTGCTGGTAT TGTCCCAACA ACTAATCCAA 2144
CATCAACAGC CATCTTTAAA TCATTATTAA CTGCAAAGAC ACGTAATGCT ATTGTCTTTG 2204
CCTTTCACCC ACAAGCACAA AAATGCTCAA GCCATGCGGC AAAAATTGTT TATGATGCTG 2264
CGATTGAAGC TGGTGCACCT GAAGACTTTA TTCAATGGAT TGAAGTACCC AGTCTTGATA 2324
TGACGACTGC TTTGATTCAA AATAGAGGAA TTGCTACAAT TCTTGCAACT GGTGGTCCAG 2384
GTATGGTCAA TGCCGCGCTT AAGTCTGGTA ATCCTTCACT TGGTGTAGGT GCTGGTAATG 2444
GTGCAGTTTA TGTTGATGCA ACTGCAAATA TCGATCGTGC TGTTGAAGAT CTTTTGCTTT 2504
CAAAACGTTT TGATAACGGA ATGATTTGTG CGACTGAAAA CTCTGCAGTT ATTGATGCAT 2564
CAATCTATGA TGAATTTGTC GCTAAAATGC CAACGCAAGG CGCTTATATG GTTCCTAAAA 2624
AAGATTACAA GGCAATTGAA AGTTTTGTTT TCGTTGAACG TGCTGGTGAA GGTTTTGGTG 2684
TAACTGGTCC TGTTGCTGGT CGTTCTGGTC AATGGATTGC TGAACAAGCT GGTGTTAACG 2744
TCCCTAAAGA TAAAGATGTT CTTCTTTTTG AACTTGATAA GAAAAATATT GGGGAAGCTC 2804
TTTCTTCTGA AAAACTTTCT CCTTTGCTTT CAATCTACAA ATCAGAAACA CGTGAAGAAG 2864
GAATTGAAAT TGTACGTAGC TTACTTGCTT ACCAAGGAGC TGGTCACAAC GCTGCCATTC 2924
AAATCGGTGC AATGGACGAC CCATTTGTCA AAGAATACGG AATTAAAGTC GAAGCTTCTC 2984
GTATCCTCGT TAACCAACCT GACTCTATCG GTGGGGTCGG AGATATTTAT ACTGATGCAA 3044
TGCGTCCATC ATTGACGCTC GGAACTGGTT CATGGGGGAA AAATTCACTT TCACACAATT 3104
TGAGTACATA CGATCTATTG AATGTTAAAA CAGTGGCTAA ACGTCGTAAT CGCCCTCAAT 3164
GGGTTCGTTT GCCAAAAGAA ATTTACTACG AAAAAAATGC AATTTCTTAC TTACAAGAAT 3224
TGCCACACGT CCACAAAGCT TTCATTGTTG CCGACCCTGG TATGGTTAAA TTCGGTTTCG 3284
TTGATAAAGT TTTGGAACAA CTTGCTATCC GCCCAACTCA AGTTGAAACA AGCATTTATG 3344
GCTCAGTCCA ACCTGACCCA ACTTTGAGTG AAGCAATTGC AATCGCTCGT CAAATGAACC 3404
ATTTTGAACC TGACACTGTC ATCTGTCTTG GTGGTGGTTC TGCTCTCGAT GCTGGTAAGA 3464
TTGGTCGTTT GATTTATGAA TATGATGCTC GTGGTGAGGC TGACCTTTCC GATGACGCAA 3524
GTTTGAAAGA GATCTTCCAA GAGTTAGCTC AAAAATTTGT TGATATTCGT AAACGTATTA 3584
TCAAATTCTA CCACCCACAC AAAGCACAAA TGGTTGCTAT CCCTACTACT TCTGGTACTG 3644
GTTCTGAAGT GACTCCATTT GCGGTTATCA CTGATGATGA AACTCACGTT AAATATCCAC 3704
TTGCTGACTA TCAATTGACA CCTCAAGTTG CCATTGTTGA CCCTGAGTTT GTTATGACTG 3764
TACCAAAACG TACTGTTTCT TGGTCTGGGA TTGATGCTAT GTCACACGCG CTTGAATCTT 3824
ATGTTTCTGT CATGTCTTCT GACTATACAA AACCAATTTC ACTTCAAGCC ATCAAACTCA 3884
TCTTTGAAAA CTTGACTGAG TCTTATCATT ATGACCCAGC TCATCCAACC AAAGAAGGTC 3944
AAAAAGCTCG CGAAAACATG CACAATGCTG CAACACTCGC TGGTATGGCC TTCGCCAATG 4004
CTTTCCTTGG AATTAACCAC TCACTTGCTC ATAAAATTGC TGGTGAATTT GGGCTTCCTC 4064
ATGGTCTTGC CATTGCTATC GCTATGCCAC ATGTCATTAA ATTTAACGCT GTAACAGGAA 4124
ACGTTAAATT TACCCCTTAC CCACGTTATG AAACTTATCG TGCGCAAGAA GACTACGCTG 4184
AAATTTCACG CTTCATGGGA TTTGCTGGCA AAGAAGATTC AGATGAAAAA GCGGTCAAAG 4244
CTTTGGTTGC TGAACTTAAA AAATTGACTG ATAGTATTGA TATTAATATC ACCCTTTCAG 4304
GAAATGGTGT AGATAAAGCT CATCTTGAAC GTGAGCTTGA TAAATTGGCT GACCTTGTTT 4364
ACGATGACCA ATGTACACCT GCTAATCCAC GTCAACCAAG AATTGATGAG ATTAAACAAC 4424
TCTTGTTAGA CCAATATTAA TATATTAATT ATAGTATTTG GAACCGAACG ATATCCATGC 4484
TCGCTAACCT GCTAAAGCAG GAAGTCGCAA TGGTACGTCA ACCAAGAATT GATGAGATTA 4544
AACAACTCTT GTTAGATCAA TACTAATAAT CTGTTGATAA AAATAATTAA AACGCTCTGA 4604
TGAATTCGTC AGAGCGTTTT TTATTATAGC TTATACAACT ATCAAAAGGT ATAAATCAAT 4664
TTCGATATAG GCTCTTTTCA CTCCATTGAT TTATATTTAT ATAAAAATCA ATAATTAATT 4724
AGCGATAGAA GTGATCC 4741
(2) INFORMATION FOR SEQ ID NO: 29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 341 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
Met Lys Glu Lys Ile Leu Leu Gly Gly Tyr Thr Lys Arg Val Ser Lys
1 5 10 15
Gly Val Tyr Ser Val Leu Leu Asp Ser Lys Lys Ala Glu Leu Ser Ala
20 25 30
Leu Thr Glu Val Ala Ala Val Gln Asn Pro Thr Tyr Ile Thr Leu Asp
35 40 45
Gln Lys Gly His Leu Tyr Thr Cys Ala Ala Asp Gly Asn Gly Gly Gly
50 55 60
Ile Ala Ala Phe Asp Phe Asp Gly Gln Asn Thr Thr His Leu Gly Asn
65 70 75 80
Val Thr Ser Thr Gly Ala Pro Leu Cys Tyr Val Ala Val Asp Glu Ala
85 90 95
Arg Gln Leu Val Tyr Gly Ala Asn Tyr His Leu Gly Glu Val Arg Val
100 105 110
Tyr Lys Ile Gln Ala Asp Gly Ser Leu Arg Leu Thr Asp Thr Val Lys
115 120 125
His Asn Gly Ser Gly Pro Arg Pro Glu Gln Ala Ser Ser His Val His
130 135 140
Tyr Ser Asp Leu Thr Pro Asp Gly Arg Leu Val Thr Cys Asp Leu Gly
145 150 155 160
Thr Asp Glu Val Thr Val Tyr Asp Val Ile Gly Glu Gly Lys Leu Asn
165 170 175
Ile Val Thr Ile Tyr Arg Ala Glu Lys Gly Met Gly Ala Arg His Ile
180 185 190
Ser Phe His Pro Asn Gly Lys Ile Ala Tyr Leu Val Gly Glu Leu Asn
195 200 205
Ser Thr Ile Glu Val Leu Ser Tyr Asn Glu Glu Lys Gly Arg Phe Ala
210 215 220
Arg Leu Gln Thr Ile Ser Thr Leu Pro Glu Asp Tyr His Gly Ala Asn
225 230 235 240
Gly Val Ala Ala Ile Arg Ile Ser Ser Asp Gly Lys Phe Leu Tyr Ala
245 250 255
Ser Asn Arg Gly His Asp Ser Leu Ala Ile Tyr Lys Val Ser Pro Leu
260 265 270
Gly Thr Lys Leu Glu Ser Ile Gly Trp Thr Lys Thr Glu Tyr His Ile
275 280 285
Pro Arg Asp Phe Asn Phe Asn Lys Thr Glu Asp Tyr Ile Ile Val Ala
290 295 300
His Gln Glu Ser Asp Asn Leu Thr Leu Phe Leu Arg Asp Lys Asn Thr
305 310 315 320
Gly Ser Leu Thr Leu Glu Gln Lys Asp Phe Tyr Ala Pro Glu Ile Thr
325 330 335
Cys Val Leu Pro Leu
340
(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4741 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 1733...4441
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
TTTGGTGACC GAAGTGAACA CCAGCTTCAA GAAGTTGTTT CATTGAAATA ACTGACATGT 60
TAATGTCTCC TTTTAAAATA GTTTTTCCTC TTTCATCTGT CATCCGCAGC CGCAATACTT 120
GCGTACACTA CGACTTTGTC GAGACGAAAT GCGAGATGGT TGCATAGCAA CTCTCTCATT 180
ATACATTGTT TAAGCTACTT TTGCAAGCAT CTATTCATTT ATTTCTTTTA TCAATATGAG 240
TAAATGAAAG CTATCCTACC CCCCTTTCTT TTTATTCTGT TTTTTATATC TCAATGTTGT 300
CTGACAAATT TAACGAATAT TTTTGCCTAT ATAATCCCCA TAAGGGAGAT TTTTACATTT 360
TTTTCTAAGA ATAAAATTAA TATTTTTGCT GAAAACGCTT TTTTTGTGAT AAAATAATTA 420
TAGTAAATAA AATAGTTTGT GAGGAGAGAA ATATGAAAGA AAAAATCCTT TTAGGCGGTT 480
ATACTAAACG TGTATCTAAA GGCGTTTACA GTGTTCTATT AGATAGCAAG AAAGCTGAAT 540
TGTCGGCTTT AACTGAAGTT GCAGCGGTTC AAAATCCAAC TTATATCACT CTTGATCAAA 600
AAGGGCACCT CTACACTTGT GCTGCTGATG GAAATGGTGG TGGAATTGCT GCCTTTGATT 660
TCGATGGTCA AAATACAACT CACCTAGGGA ATGTAACGAG TACTGGAGCC CCTTTGTGTT 720
ATGTGGCTGT TGATGAAGCA CGTCAACTCG TTTATGGTGC CAACTATCAC TTGGGTGAAG 780
TTCGTGTGTA CAAAATTCAA GCTGATGGTT CCCTTAGATT AACCGATACA GTTAAACATA 840
ATGGTTCTGG CCCTCGACCT GAGCAAGCAA GTTCTCATGT CCATTACTCT GATTTAACTC 900
CAGATGGTCG TCTTGTTACT TGTGATTTAG GTACAGATGA AGTGACTGTT TACGATGTTA 960
TTGGTGAAGG TAAACTCAAT ATCGTTACGA TTTATCGTGC CGAAAAAGGA ATGGGAGCTC 1020
GTCACATCAG CTTCCATCCT AATGGAAAAA TTGCTTATCT CGTCGGAGAA TTAAATTCAA 1080
CTATTGAAGT TCTAAGCTAT AATGAAGAAA AAGGACGATT CGCTCGTCTT CAAACAATCA 1140
GTACTTTACC TGAAGACTAT CACGGAGCCA ATGGAGTAGC TGCTATTCGA ATTTCTTCTG 1200
ATGGTAAGTT CCTCTATGCT TCTAATCGTG GGCACGACTC TTTAGCAATT TACAAGGTAA 1260
GTCCTCTCGG AACAAAATTA GAATCTATTG GTTGGACAAA GACTGAATAT CATATTCCAC 1320
GCGATTTTAA TTTTAATAAA ACCGAAGATT ATATCATTGT CGCTCATCAA GAATCTGATA 1380
ATTTAACTCT TTTCTTGAGA GATAAAAATA CAGGGTCATT AACGTTAGAA CAAAAAGACT 1440
TTTACGCTCC TGAAATTACT TGTGTTTTAC CTTTGTAAAA ACTAAACTTT AGTAAATCTT 1500
GCTTTTGTTT TTTCACAAAG TTTTACTAAA TCAGACAAAA AAATATTGCC AAATCTTTAA 1560
AAGGATTGGC AATATTTTTT TGTCTGAAAC CCTTGCTTAT AAAGCGATTT CTAAAAGTTT 1620
GATGAGTTTT TTTGTAAATT TCATCACAAT ATCGCTTGAC TTCTTTAAAA AACTTTGTTA 1680
AACTATTCAC GTAAAAGAAA GTGAATGGAA TCACAAAGGA GAACGTACAC AT ATG GCA 1738
Met Ala 1
ACT AAA AAA GCC GCT CCA GCT GCA AAG AAA GTT TTA AGC GCT GAA GAA 1786 Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala Glu Glu 5 10 15
AAA GCC GCA AAA TTC CAA GGA AGT GTC GCT TAT ACT GAT CAA TTA GTC 1834 Lys Ala Ala Lys Phe Gln Gly Ser Val Ala Tyr Thr Asp Gln Leu Val 20 25 30
AAA AAA GCT CAA GCT GCA GTT CTT AAA TTT GAA GGA TAC ACA CAA ACT 1882 Lys Lys Ala Gln Ala Ala Val Leu Lys Phe Glu Gly Tyr Thr Gln Thr 35 40 45 50
CAA GTT GAT ACT ATT GTT GCT GCA ATG GCT CTT GCA GCA AGC AAA CAT 1930 Gln Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser Lys His 55 60 65
TCT CTG GAA CTC GCT CAC GAA GCC GTT AAT GAA ACT GGC CGT GGA GTT 1978 Ser Leu Glu Leu Ala His Glu Ala Val Asn Glu Thr Gly Arg Gly Val 70 75 80
GTT GAG GAC AAA GAT ACA AAA AAC CAT TTT GCT TCT GAA TCT GTT TAT 2026 Val Glu Asp Lys Asp Thr Lys Asn His Phe Ala Ser Glu Ser Val Tyr 85 90 95
AAT GCA ATC AAA AAT GAT AAA ACA GTT GGC GTT ATC GCT GAA AAC AAA 2074 Asn Ala Ile Lys Asn Asp Lys Thr Val Gly Val Ile Ala Glu Asn Lys 100 105 110
GTT GCT GGT TCT GTT GAA ATC GCA AGC CCC CTT GGA GTA CTT GCT GGT 2122 Val Ala Gly Ser Val Glu Ile Ala Ser Pro Leu Gly Val Leu Ala Gly 115 120 125 130
ATT GTC CCA ACA ACT AAT CCA ACA TCA ACA GCC ATC TTT AAA TCA TTA 2170 Ile Val Pro Thr Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys Ser Leu 135 140 145
TTA ACT GCA AAG ACA CGT AAT GCT ATT GTC TTT GCC TTT CAC CCA CAA 2218 Leu Thr Ala Lys Thr Arg Asn Ala Ile Val Phe Ala Phe His Pro Gln 150 155 160
GCA CAA AAA TGC TCA AGC CAT GCG GCA AAA ATT GTT TAT GAT GCT GCG 2266 Ala Gln Lys Cys Ser Ser His Ala Ala Lys Ile Val Tyr Asp Ala Ala 165 170 175
ATT GAA GCT GGT GCA CCT GAA GAC TTT ATT CAA TGG ATT GAA GTA CCC 2314 Ile Glu Ala Gly Ala Pro Glu Asp Phe Ile Gln Trp Ile Glu Val Pro 180 185 190
AGT CTT GAT ATG ACG ACT GCT TTG ATT CAA AAT AGA GGA ATT GCT ACA 2362 Ser Leu Asp Met Thr Thr Ala Leu Ile Gln Asn Arg Gly Ile Ala Thr 195 200 205 210
ATT CTT GCA ACT GGT GGT CCA GGT ATG GTC AAT GCC GCG CTT AAG TCT 2410 Ile Leu Ala Thr Gly Gly Pro Gly Met Val Asn Ala Ala Leu Lys Ser 215 220 225
GGT AAT CCT TCA CTT GGT GTA GGT GCT GGT AAT GGT GCA GTT TAT GTT 2458 Gly Asn Pro Ser Leu Gly Val Gly Ala Gly Asn Gly Ala Val Tyr Val 230 235 240
GAT GCA ACT GCA AAT ATC GAT CGT GCT GTT GAA GAT CTT TTG CTT TCA 2506 Asp Ala Thr Ala Asn Ile Asp Arg Ala Val Glu Asp Leu Leu Leu Ser 245 250 255
AAA CGT TTT GAT AAC GGA ATG ATT TGT GCG ACT GAA AAC TCT GCA GTT 2554 Lys Arg Phe Asp Asn Gly Met Ile Cys Ala Thr Glu Asn Ser Ala Val 260 265 270
ATT GAT GCA TCA ATC TAT GAT GAA TTT GTC GCT AAA ATG CCA ACG CAA 2602 Ile Asp Ala Ser Ile Tyr Asp Glu Phe Val Ala Lys Met Pro Thr Gln 275 280 285 290
GGC GCT TAT ATG GTT CCT AAA AAA GAT TAC AAG GCA ATT GAA AGT TTT 2650 Gly Ala Tyr Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu Ser Phe 295 300 305
GTT TTC GTT GAA CGT GCT GGT GAA GGT TTT GGT GTA ACT GGT CCT GTT 2698 Val Phe Val Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly Pro Val 310 315 320
GCT GGT CGT TCT GGT CAA TGG ATT GCT GAA CAA GCT GGT GTT AAC GTC 2746 Ala Gly Arg Ser Gly Gln Trp Ile Ala Glu Gln Ala Gly Val Asn Val 325 330 335
CCT AAA GAT AAA GAT GTT CTT CTT TTT GAA CTT GAT AAG AAA AAT ATT 2794 Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys Asn Ile 340 345 350
GGG GAA GCT CTT TCT TCT GAA AAA CTT TCT CCT TTG CTT TCA ATC TAC 2842 Gly Glu Ala Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser Ile Tyr 355 360 365 370
AAA TCA GAA ACA CGT GAA GAA GGA ATT GAA ATT GTA CGT AGC TTA CTT 2890 Lys Ser Glu Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser Leu Leu 375 380 385
GCT TAC CAA GGA GCT GGT CAC AAC GCT GCC ATT CAA ATC GGT GCA ATG 2938 Ala Tyr Gln Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly Ala Met 390 395 400
GAC GAC CCA TTT GTC AAA GAA TAC GGA ATT AAA GTC GAA GCT TCT CGT 2986 Asp Asp Pro Phe Val Lys Glu Tyr Gly Ile Lys Val Glu Ala Ser Arg 405 410 415
ATC CTC GTT AAC CAA CCT GAC TCT ATC GGT GGG GTC GGA GAT ATT TAT 3034 Ile Leu Val Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp Ile Tyr 420 425 430
ACT GAT GCA ATG CGT CCA TCA TTG ACG CTC GGA ACT GGT TCA TGG GGG 3082 Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser Trp Gly 435 440 445 450
AAA AAT TCA CTT TCA CAC AAT TTG AGT ACA TAC GAT CTA TTG AAT GTT 3130 Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu Asn Val 455 460 465
AAA ACA GTG GCT AAA CGT CGT AAT CGC CCT CAA TGG GTT CGT TTG CCA 3178 Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg Leu Pro 470 475 480
AAA GAA ATT TAC TAC GAA AAA AAT GCA ATT TCT TAC TTA CAA GAA TTG 3226 Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln Glu Leu 485 490 495
CCA CAC GTC CAC AAA GCT TTC ATT GTT GCC GAC CCT GGT ATG GTT AAA 3274 Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met Val Lys 500 505 510
TTC GGT TTC GTT GAT AAA GTT TTG GAA CAA CTT GCT ATC CGC CCA ACT 3322 Phe Gly Phe Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg Pro Thr 515 520 525 530
CAA GTT GAA ACA AGC ATT TAT GGC TCA GTC CAA CCT GAC CCA ACT TTG 3370 Gln Val Glu Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro Thr Leu 535 540 545
AGT GAA GCA ATT GCA ATC GCT CGT CAA ATG AAC CAT TTT GAA CCT GAC 3418 Ser Glu Ala Ile Ala Ile Ala Arg Gln Met Asn His Phe Glu Pro Asp 550 555 560
ACT GTC ATC TGT CTT GGT GGT GGT TCT GCT CTC GAT GCT GGT AAG ATT 3466 Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly Lys Ile 565 570 575
GGT CGT TTG ATT TAT GAA TAT GAT GCT CGT GGT GAG GCT GAC CTT TCC 3514 Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp Leu Ser 580 585 590
GAT GAC GCA AGT TTG AAA GAG ATC TTC CAA GAG TTA GCT CAA AAA TTT 3562 Asp Asp Ala Ser Leu Lys Glu Ile Phe Gln Glu Leu Ala Gln Lys Phe 595 600 605 610
GTT GAT ATT CGT AAA CGT ATT ATC AAA TTC TAC CAC CCA CAC AAA GCA 3610 Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His Lys Ala 615 620 625
CAA ATG GTT GCT ATC CCT ACT ACT TCT GGT ACT GGT TCT GAA GTG ACT 3658 Gln Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu Val Thr 630 635 640
CCA TTT GCG GTT ATC ACT GAT GAT GAA ACT CAC GTT AAA TAT CCA CTT 3706 Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr Pro Leu 645 650 655
GCT GAC TAT CAA TTG ACA CCT CAA GTT GCC ATT GTT GAC CCT GAG TTT 3754 Ala Asp Tyr Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro Glu Phe 660 665 670
GTT ATG ACT GTA CCA AAA CGT ACT GTT TCT TGG TCT GGG ATT GAT GCT 3802 Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile Asp Ala 675 680 685 690
ATG TCA CAC GCG CTT GAA TCT TAT GTT TCT GTC ATG TCT TCT GAC TAT 3850 Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser Asp Tyr 695 700 705
ACA AAA CCA ATT TCA CTT CAA GCC ATC AAA CTC ATC TTT GAA AAC TTG 3898 Thr Lys Pro Ile Ser Leu Gln Ala Ile Lys Leu Ile Phe Glu Asn Leu 710 715 720
ACT GAG TCT TAT CAT TAT GAC CCA GCT CAT CCA ACC AAA GAA GGT CAA 3946 Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu Gly Gln 725 730 735
AAA GCT CGC GAA AAC ATG CAC AAT GCT GCA ACA CTC GCT GGT ATG GCC 3994 Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly Met Ala 740 745 750
TTC GCC AAT GCT TTC CTT GGA ATT AAC CAC TCA CTT GCT CAT AAA ATT 4042 Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His Lys Ile 755 760 765 770
GCT GGT GAA TTT GGG CTT CCT CAT GGT CTT GCC ATT GCT ATC GCT ATG 4090 Ala Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile Ala Met 775 780 785
CCA CAT GTC ATT AAA TTT AAC GCT GTA ACA GGA AAC GTT AAA TTT ACC 4138 Pro His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys Phe Thr 790 795 800
CCT TAC CCA CGT TAT GAA ACT TAT CGT GCG CAA GAA GAC TAC GCT GAA 4186 Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gln Glu Asp Tyr Ala Glu 805 810 815
ATT TCA CGC TTC ATG GGA TTT GCT GGC AAA GAA GAT TCA GAT GAA AAA 4234 Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Glu Asp Ser Asp Glu Lys 820 825 830
GCG GTC AAA GCT TTG GTT GCT GAA CTT AAA AAA TTG ACT GAT AGT ATT 4282 Ala Val Lys Ala Leu Val Ala Glu Leu Lys Lys Leu Thr Asp Ser Ile 835 840 845 850
GAT ATT AAT ATC ACC CTT TCA GGA AAT GGT GTA GAT AAA GCT CAT CTT 4330 Asp Ile Asn Ile Thr Leu Ser Gly Asn Gly Val Asp Lys Ala His Leu 855 860 865
GAA CGT GAG CTT GAT AAA TTG GCT GAC CTT GTT TAC GAT GAC CAA TGT 4378 Glu Arg Glu Leu Asp Lys Leu Ala Asp Leu Val Tyr Asp Asp Gln Cys 870 875 880
ACA CCT GCT AAT CCA CGT CAA CCA AGA ATT GAT GAG ATT AAA CAA CTC 4426 Thr Pro Ala Asn Pro Arg Gln Pro Arg Ile Asp Glu Ile Lys Gln Leu 885 890 895
TTG TTA GAC CAA TAT TAATATATTA ATTATAGTAT TTGGAACCGA ACGATATCCA T 4482 Leu Leu Asp Gln Tyr 900
GCTCGCTAAC CTGCTAAAGC AGGAAGTCGC AATGGTACGT CAACCAAGAA TTGATGAGAT 4542
TAAACAACTC TTGTTAGATC AATACTAATA ATCTGTTGAT AAAAATAATT AAAACGCTCT 4602
GATGAATTCG TCAGAGCGTT TTTTATTATA GCTTATACAA CTATCAAAAG GTATAAATCA 4662
ATTTCGATAT AGGCTCTTTT CACTCCATTG ATTTATATTT ATATAAAAAT CAATAATTAA 4722
TTAGCGATAG AAGTGATCC 4741
(2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 903 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
Met Ala Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala
1 5 10 15
Glu Glu Lys Ala Ala Lys Phe Gln Gly Ser Val Ala Tyr Thr Asp Gln
20 25 30
Leu Val Lys Lys Ala Gln Ala Ala Val Leu Lys Phe Glu Gly Tyr Thr
35 40 45
Gln Thr Gln Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser
50 55 60
Lys His Ser Leu Glu Leu Ala His Glu Ala Val Asn Glu Thr Gly Arg
65 70 75 80
Gly Val Val Glu Asp Lys Asp Thr Lys Asn His Phe Ala Ser Glu Ser
85 90 95
Val Tyr Asn Ala Ile Lys Asn Asp Lys Thr Val Gly Val Ile Ala Glu
100 105 110
Asn Lys Val Ala Gly Ser Val Glu Ile Ala Ser Pro Leu Gly Val Leu
115 120 125
Ala Gly Ile Val Pro Thr Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys
130 135 140
Ser Leu Leu Thr Ala Lys Thr Arg Asn Ala Ile Val Phe Ala Phe His
145 150 155 160
Pro Gln Ala Gln Lys Cys Ser Ser His Ala Ala Lys Ile Val Tyr Asp
165 170 175
Ala Ala Ile Glu Ala Gly Ala Pro Glu Asp Phe Ile Gln Trp Ile Glu
180 185 190
Val Pro Ser Leu Asp Met Thr Thr Ala Leu Ile Gln Asn Arg Gly Ile
195 200 205
Ala Thr Ile Leu Ala Thr Gly Gly Pro Gly Met Val Asn Ala Ala Leu
210 215 220
Lys Ser Gly Asn Pro Ser Leu Gly Val Gly Ala Gly Asn Gly Ala Val
225 230 235 240
Tyr Val Asp Ala Thr Ala Asn Ile Asp Arg Ala Val Glu Asp Leu Leu
245 250 255
Leu Ser Lys Arg Phe Asp Asn Gly Met Ile Cys Ala Thr Glu Asn Ser
260 265 270
Ala Val Ile Asp Ala Ser Ile Tyr Asp Glu Phe Val Ala Lys Met Pro
275 280 285
Thr Gln Gly Ala Tyr Met Val Pro Lys Lys Asp Tyr Lys Ala Ile Glu
290 295 300
Ser Phe Val Phe Val Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly
305 310 315 320
Pro Val Ala Gly Arg Ser Gly Gln Trp Ile Ala Glu Gln Ala Gly Val
325 330 335
Asn Val Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys
340 345 350
Asn Ile Gly Glu Ala Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser
355 360 365
Ile Tyr Lys Ser Glu Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser
370 375 380
Leu Leu Ala Tyr Gln Gly Ala Gly His Asn Ala Ala Ile Gln Ile Gly
385 390 395 400
Ala Met Asp Asp Pro Phe Val Lys Glu Tyr Gly Ile Lys Val Glu Ala
405 410 415
Ser Arg Ile Leu Val Asn Gln Pro Asp Ser Ile Gly Gly Val Gly Asp
420 425 430
Ile Tyr Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser
435 440 445
Trp Gly Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu
450 455 460
Asn Val Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gln Trp Val Arg
465 470 475 480
Leu Pro Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln
485 490 495
Glu Leu Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met
500 505 510
Val Lys Phe Gly Phe Val Asp Lys Val Leu Glu Gln Leu Ala Ile Arg
515 520 525
Pro Thr Gln Val Glu Thr Ser Ile Tyr Gly Ser Val Gln Pro Asp Pro
530 535 540
Thr Leu Ser Glu Ala Ile Ala Ile Ala Arg Gln Met Asn His Phe Glu
545 550 555 560
Pro Asp Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly
565 570 575
Lys Ile Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp
580 585 590
Leu Ser Asp Asp Ala Ser Leu Lys Glu Ile Phe Gln Glu Leu Ala Gln
595 600 605
Lys Phe Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His
610 615 620
Lys Ala Gln Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu
625 630 635 640
Val Thr Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr
645 650 655
Pro Leu Ala Asp Tyr Gln Leu Thr Pro Gln Val Ala Ile Val Asp Pro
660 665 670
Glu Phe Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile
675 680 685
Asp Ala Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser
690 695 700
Asp Tyr Thr Lys Pro Ile Ser Leu Gln Ala Ile Lys Leu Ile Phe Glu
705 710 715 720
Asn Leu Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu
725 730 735
Gly Gln Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly
740 745 750
Met Ala Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His
755 760 765
Lys Ile Ala Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile
770 775 780
Ala Met Pro His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys
785 790 795 800
Phe Thr Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gln Glu Asp Tyr
805 810 815
Ala Glu Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Glu Asp Ser Asp
820 825 830
Glu Lys Ala Val Lys Ala Leu Val Ala Glu Leu Lys Lys Leu Thr Asp
835 840 845
Ser Ile Asp Ile Asn Ile Thr Leu Ser Gly Asn Gly Val Asp Lys Ala
850 855 860
His Leu Glu Arg Glu Leu Asp Lys Leu Ala Asp Leu Val Tyr Asp Asp
865 870 875 880
Gln Cys Thr Pro Ala Asn Pro Arg Gln Pro Arg Ile Asp Glu Ile Lys
885 890 895
Gln Leu Leu Leu Asp Gln Tyr
900
(2) INFORMATION FOR SEQ ID NO: 32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Other
(B) LOCATION: 1...31
(D) OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
GGCCGCTCGA GGTTGAACGT GCTGGTGAAG G 31
(2) INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Other
(B) LOCATION: 1...31
(D) OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
TAGTAGGATC CGGGTCAGGT TGGACTGAGC C 31
(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1750 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 464...1378
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
GCGCCTAGAT AAGAAACAGC AACAGCTAAA AGATAGGTAT CAAAAGCACT TGATTTAAAA 60
ATAATGACTT TATCCGATTT TTTGATTCCC AACTCAGATA AGAGACTTGC CTTATCAACA 120
ATTGCTTGAT GAGTCTTTTG GTAAGTCGTT TCAAGAGCTA GTTCGGGGAA AGCTCCAACA 180
GCCTCATCAA AGATAATTGG GCTATCAGGA AACTGTTCAG CTGATTTTTT AAAGTTTAGA 240
TACAAATTTA GGGGTTCGTG TTTGAATTTC AAAAAAAATC TCCTCAAGTT AATAAGTTTA 300
TTATATCACA AAGTATTCTT TAGACCAATA GTTAATGTAA ATGTTTTCTT AAGTCGTAGA 360
GAATAAAATT CTCGGAAAAA AAGTCTAAAA TCTGCTACAA TTAAAGGGAC ACTAAGAGGA 420
TTCCAATCCT CTTTTATCAG GAAAAGAAGG GATAGATAGG AAA ATG ATT AAA AAT 475
Met Ile Lys Asn 1
TAT GAA CTA TCC AAC GAA AAA AAA TTA ATT TCA ACC TCT GAA ATG AAG 523 Tyr Glu Leu Ser Asn Glu Lys Lys Leu Ile Ser Thr Ser Glu Met Lys 5 10 15 20
AAT TTC ACC TAT GTT CTC AAT CCA ACA CGT GAA GAA ATT GGG AAT ATT 571 Asn Phe Thr Tyr Val Leu Asn Pro Thr Arg Glu Glu Ile Gly Asn Ile 25 30 35
TCT GAA TAC TAT GAC TTC CCT TTT GAC TAT TTA TCA GGA ATT TTG GAT 619 Ser Glu Tyr Tyr Asp Phe Pro Phe Asp Tyr Leu Ser Gly Ile Leu Asp 40 45 50
GAC TAT GAA AAT GCC CGT TTT GAA ACA GAT GAT AAT GAT AAT AAT CTG 667 Asp Tyr Glu Asn Ala Arg Phe Glu Thr Asp Asp Asn Asp Asn Asn Leu 55 60 65
ATT CTC TTA CAA TAT CCT CCA CTC TCT AAT TAT GGA GAA GTG GCG ACT 715 Ile Leu Leu Gln Tyr Pro Pro Leu Ser Asn Tyr Gly Glu Val Ala Thr 70 75 80
TTT CCA TAT TCT TTG GTT TGG ACT AAA AAT GAA TCG GTT ATT TTA GCA 763 Phe Pro Tyr Ser Leu Val Trp Thr Lys Asn Glu Ser Val Ile Leu Ala 85 90 95 100
CTT AAT CAT GAG ATT GAT AAT GGC TTA ATT TTC GAG CGT GAA TAT GAT 811 Leu Asn His Glu Ile Asp Asn Gly Leu Ile Phe Glu Arg Glu Tyr Asp 105 110 115
TAT AAA CGC TAC AAA CAT CAA GTT ATT TTT CAA GTG ATG TAT CAA ATG 859 Tyr Lys Arg Tyr Lys His Gln Val Ile Phe Gln Val Met Tyr Gln Met 120 125 130
ACA CAC ACT TTC CAT GAT TAT TTG AGA GAT TTC CGA ACA AGG CGT CGC 907 Thr His Thr Phe His Asp Tyr Leu Arg Asp Phe Arg Thr Arg Arg Arg 135 140 145
AGA CTT GAA CAG GGA ATC AAA AAT TCA ACA AAG AAC GAC CAA ATT GTT 955 Arg Leu Glu Gln Gly Ile Lys Asn Ser Thr Lys Asn Asp Gln Ile Val 150 155 160
GAT TTG ATT GCC ATT CAA GCA AGT TTA ATT TAT TTT GAA GAT GCC TTG 1003 Asp Leu Ile Ala Ile Gln Ala Ser Leu Ile Tyr Phe Glu Asp Ala Leu 165 170 175 180
CAC AAT AAT ATG CAA GTA CTT CAG GAT TTT ATT GAT TAC TTG AGA GAA 1051 His Asn Asn Met Gln Val Leu Gln Asp Phe Ile Asp Tyr Leu Arg Glu 185 190 195
GAT GAT GAA GAC GGT TTT GCT GAA AAG ATT TAT GAT ATT TTT GTC GAA 1099 Asp Asp Glu Asp Gly Phe Ala Glu Lys Ile Tyr Asp Ile Phe Val Glu 200 205 210
ACA GAC CAA GCT TAT ACA GAA ACC AAG ATT CAG CTC AAG TTA CTA GAA 1147 Thr Asp Gln Ala Tyr Thr Glu Thr Lys Ile Gln Leu Lys Leu Leu Glu 215 220 225
AAT CTC CGA GAT TTG TTC TCA AAC AAT GTC TCT AAT AAC TTG AAC ATT 1195 Asn Leu Arg Asp Leu Phe Ser Asn Asn Val Ser Asn Asn Leu Asn Ile 230 235 240
GTC ATG AAA ATC ATG ACA TCA GCT ACT TTC GTT CTA GGG ATT CCT GCA 1243 Val Met Lys Ile Met Thr Ser Ala Thr Phe Val Leu Gly Ile Pro Ala 245 250 255 260
GTA ATT GTT GGT TTT TAC GGA ATG AAT GTT CCA ATT CCT GGT CAA AAT 1291 Val Ile Val Gly Phe Tyr Gly Met Asn Val Pro Ile Pro Gly Gln Asn 265 270 275
TTT AAT TGG ATG GTT TGG CTT ATT TTA GTT CTA GGA ATT TTA TTA TGT 1339 Phe Asn Trp Met Val Trp Leu Ile Leu Val Leu Gly Ile Leu Leu Cys 280 285 290
GTT TGG GTC ACT TGG TGG TTA CAT AAA AAA GAT ATG TTA TAAAATGGAG AA 1390 Val Trp Val Thr Trp Trp Leu His Lys Lys Asp Met Leu 295 300 305
AAATCTCCAT TTTTTTGCTC TTTGTGAAAA AATTAATTAG TGATTGCAGA TTATGAAGTT 1450
AGCAATGTTT GTTAAAACTA TTTTGTGAAT TATTTATGAA AACGTTTTAA AAAAGTATAA 1510
CAGATATTAA AATAATTGGA ACTGTATTAG TAAAGAATCT GTAATTTCTC TTGAATTCTG 1570
TTTGCTATTC TCAAACTGTA TGATATAATG AAGTTGTAAT TTGAAACAGA AAGAACAAAG 1630
GAGATTTCAA AATGAAAACC GAAGTTACGG AAAATATCTT TGAACAAGCT TGGGATGGTT 1690
TTAAAGGAAC CAACTGGCGC GATAAAGCAA GCGTTACTCG CTTTGTACAA GAAAACTACA 1750
(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 305 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
Met Ile Lys Asn Tyr Glu Leu Ser Asn Glu Lys Lys Leu Ile Ser Thr
1 5 10 15
Ser Glu Met Lys Asn Phe Thr Tyr Val Leu Asn Pro Thr Arg Glu Glu
20 25 30
Ile Gly Asn Ile Ser Glu Tyr Tyr Asp Phe Pro Phe Asp Tyr Leu Ser
35 40 45
Gly Ile Leu Asp Asp Tyr Glu Asn Ala Arg Phe Glu Thr Asp Asp Asn
50 55 60
Asp Asn Asn Leu Ile Leu Leu Gln Tyr Pro Pro Leu Ser Asn Tyr Gly
65 70 75 80
Glu Val Ala Thr Phe Pro Tyr Ser Leu Val Trp Thr Lys Asn Glu Ser
85 90 95
Val Ile Leu Ala Leu Asn His Glu Ile Asp Asn Gly Leu Ile Phe Glu
100 105 110
Arg Glu Tyr Asp Tyr Lys Arg Tyr Lys His Gln Val Ile Phe Gln Val
115 120 125
Met Tyr Gln Met Thr His Thr Phe His Asp Tyr Leu Arg Asp Phe Arg
130 135 140
Thr Arg Arg Arg Arg Leu Glu Gln Gly Ile Lys Asn Ser Thr Lys Asn
145 150 155 160
Asp Gln Ile Val Asp Leu Ile Ala Ile Gln Ala Ser Leu Ile Tyr Phe
165 170 175
Glu Asp Ala Leu His Asn Asn Met Gln Val Leu Gln Asp Phe Ile Asp
180 185 190
Tyr Leu Arg Glu Asp Asp Glu Asp Gly Phe Ala Glu Lys Ile Tyr Asp
195 200 205
Ile Phe Val Glu Thr Asp Gln Ala Tyr Thr Glu Thr Lys Ile Gln Leu
210 215 220
Lys Leu Leu Glu Asn Leu Arg Asp Leu Phe Ser Asn Asn Val Ser Asn
225 230 235 240
Asn Leu Asn Ile Val Met Lys Ile Met Thr Ser Ala Thr Phe Val Leu
245 250 255
Gly Ile Pro Ala Val Ile Val Gly Phe Tyr Gly Met Asn Val Pro Ile
260 265 270
Pro Gly Gln Asn Phe Asn Trp Met Val Trp Leu Ile Leu Val Leu Gly
275 280 285
Ile Leu Leu Cys Val Trp Val Thr Trp Trp Leu His Lys Lys Asp Met
290 295 300
Leu
305
(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4191 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 270...1184
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
TTGGGCTATA AGGAAATTGT TCTGCTGATT TTTTAAAGTT TAGATATAGG TTTAGGGGTT 60
CATGTTTGAA TTTCAAAAAA AGTCTCCTCA AGTTAATAAG TTTATTATAT CACAAAGTAT 120
TATTTAGACC AACTTCCTTC AAAAAACTTT TCGTTAAGGC TTTGAAATAA AATAATGAGA 180
AAAAAATAGG AAAATCTGCT ACAATTAGAA GGAGAAGAAG AGGATTTAAA TCCTTTTTTA 240
TTAGGAAAAG AAGGGATAGA TAGGCTGAT ATG ATA AAA AAT TAT GAA CTA TCC 293
Met Ile Lys Asn Tyr Glu Leu Ser 1 5
AAT GAA AAA AAA TTG ATC TCA ACT TCT GAG ATG AAG AAT TTC ACT TAT 341 Asn Glu Lys Lys Leu Ile Ser Thr Ser Glu Met Lys Asn Phe Thr Tyr 10 15 20
GTC CTC AAT CCA ACA CGT GAA GAA ATT GGG AAT ATC TCA GAA CAC TAT 389 Val Leu Asn Pro Thr Arg Glu Glu Ile Gly Asn Ile Ser Glu His Tyr 25 30 35 40
GAT TTT CCT TTT GAC TAT CTA TCT GGA ATT TTA GAT GAC TAT GAA AAT 437 Asp Phe Pro Phe Asp Tyr Leu Ser Gly Ile Leu Asp Asp Tyr Glu Asn 45 50 55
GCC CGT TTT GAA ACA GAT GAT AAT GAC AAT AAT CTG ATT CTT TTG CAA 485 Ala Arg Phe Glu Thr Asp Asp Asn Asp Asn Asn Leu Ile Leu Leu Gln 60 65 70
TAT CCC GCC TTG TCC AAC TAT GGA GAA GTG GCC ACT TTT CCA TAT TCT 533 Tyr Pro Ala Leu Ser Asn Tyr Gly Glu Val Ala Thr Phe Pro Tyr Ser 75 80 85
TTG GTT TGG ACT AAG AAT GAA TCG GTT ATT TTG GCC CTT AAC CAT GAA 581 Leu Val Trp Thr Lys Asn Glu Ser Val Ile Leu Ala Leu Asn His Glu 90 95 100
ATT GAT AAT GGT CTC ATT TTT GAA CGA GAA TAT GAT TAT AAA CGC TAT 629 Ile Asp Asn Gly Leu Ile Phe Glu Arg Glu Tyr Asp Tyr Lys Arg Tyr 105 110 115 120
AAA CAC CAA TTG ATT TTT CAA GTG ATG TAC CAA ATG ACT CAT ACT TTT 677 Lys His Gln Leu Ile Phe Gln Val Met Tyr Gln Met Thr His Thr Phe 125 130 135
CAT GAT TAT TTG AGA GAC TTT AGA ACA AGG CGC CGC CGG CTT GAA GTT 725 His Asp Tyr Leu Arg Asp Phe Arg Thr Arg Arg Arg Arg Leu Glu Val 140 145 150
GGT ATC AAA AAT TCA ACA AAA AAT GAC CAA ATT GTT GAC TTA ATT GCC 773 Gly Ile Lys Asn Ser Thr Lys Asn Asp Gln Ile Val Asp Leu Ile Ala 155 160 165
ATT CAA GCG AGT TTG ATT TAT TTT GAA GAT GCG CTG CAC AAT AAT ATG 821 Ile Gln Ala Ser Leu Ile Tyr Phe Glu Asp Ala Leu His Asn Asn Met 170 175 180
CAA GTT CTC CAG AAT TTT ATT GAT TAC TTA CGA GAA GAT GAT GAA GAT 869 Gln Val Leu Gln Asn Phe Ile Asp Tyr Leu Arg Glu Asp Asp Glu Asp 185 190 195 200
GGT TTT GCC GAA AAA ATC TAT GAT ATT TTT GTC GAA ACA GAC CAA GCT 917 Gly Phe Ala Glu Lys Ile Tyr Asp Ile Phe Val Glu Thr Asp Gln Ala 205 210 215
TAT ACA GAA ACC AAG ATT CAG CTC AAG TTA CTA GAA AAT CTC CGA GAT 965 Tyr Thr Glu Thr Lys Ile Gln Leu Lys Leu Leu Glu Asn Leu Arg Asp 220 225 230
TTG TTC TCA AAC ATT GTC TCT AAT AAT TTG AAT ATC GTC ATG AAA ATT 1013 Leu Phe Ser Asn Ile Val Ser Asn Asn Leu Asn Ile Val Met Lys Ile 235 240 245
ATG ACC TCA GCA ACA TTT GTT CTA GGT ATT CCG GCG GTT ATT GTC GGC 1061 Met Thr Ser Ala Thr Phe Val Leu Gly Ile Pro Ala Val Ile Val Gly 250 255 260
TTT TAT GGA ATG AAT GTT CCG ATT CCT GGT CAA AAT TTT AAT TGG ATG 1109 Phe Tyr Gly Met Asn Val Pro Ile Pro Gly Gln Asn Phe Asn Trp Met 265 270 275 280
GTC TGG CTC ATT TTG GTG TTT GGA ATT TTA TTA TGT GTT TGG GTT ACT 1157 Val Trp Leu Ile Leu Val Phe Gly Ile Leu Leu Cys Val Trp Val Thr 285 290 295
TGG TGG CTA CAC AAA AAA GAT ATG TTA TGAATGGAGA AAATTTCTCC GTTTTTT 1211 Trp Trp Leu His Lys Lys Asp Met Leu 300 305
TATCTTTGTG AAAAAATTAA TTAGTGATAA TAAATCATGA AGTTAGCAAT GTTTGTCAAA 1271
GCTATTTAGT GAATTAATTA TGAAAACGTT TTAAAAAAGT ATAACAGATA TTAAAATAAT 1331
TGAAACTGTA TTAGTAAAGA ATCTGTAATT TCTCTTGAAT TCTGTTTGCT ATTATCAAAC 1391
TGTATGATAT AATGAAGTTG TAATTTGAAA CAGAAAGAAC AAAGGAGATT TCAAAATGAA 1451
AACCGAAGTT ACGGAAAATA TCTTTGAACA AGCTTGGGAT GGTTTTAAAG GAACTAACTG 1511
GCGCGATAAA GCAAGCGTTA CTCGCTTTGT ACAAGAAAAC TACAAACCAT ATGATGGTGA 1571
TGAAAGCTTT CTTGCTGGGC CAACAGAACG TACACTTAAA GTAAAGAAAA TTATTGAAGA 1631
TACAAAAAAT CACTACGAAG AAGTAGGATT TCCCTTTGAT ACTGACCGCG TAACCTCTAT 1691
CGATAAAATT CCTGCTGGAT ATATTGATGC TAATGATAAA GAACTTGAAC TCATCTATGG 1751
GATGCAAAAT AGCGAACTTT TCCGCTTAAA CTTCATGCCA AGAGGTGGTC TTCGTGTTGC 1811
TGAAAAGATT TTGACAGAAC ACGGTCTTTC AGTTGACCCA GGTTTGCATG ATGTTTTGTC 1871
ACAAACAATG ACTTCTGTAA ATGATGGAAT CTTCCGTGCT TATACTTCAG CAATTCGTAA 1931
AGCACGTCAC GCTCACACTG TAACAGGTTT GCCTGATGCA TACTCTCGTG GACGTATCAT 1991
CGGGGTATAT GCACGTCTTG CTCTTTATGG AGCTGACTAC CTTATGAAGG AAAAAGCAAA 2051
AGAATGGGAT GCAATCACTG AAATTAATGA TGATAACATT CGTCTTAAAG AAGAAATTAA 2111
CATGCAATAC CAAGCTTTGC AAGAAGTTGT AAACTTTGGT GCTTTGTATG GTCTTGACGT 2171
TTCTCGTCCA GCGATGAACG TAAAAGAAGC AATCCAATGG GTTAATATTG CATACATGGC 2231
AGTTTGTCGT GTTATCAATG GTGCTGCAAC TTCACTTGGA CGTGTGCCAA TCGTTCTTGA 2291
CATCTTTGCA GAACGTGACC TTGCTCGTGG AACATTTACT GAGCAAGAAA TCCAAGAATT 2351
TGTTGATGAT TTCATTTTAA AACTTCGTAC AATGAAATTT GCTCGTGCTG CTGCTTATGA 2411
TGAACTTTAT TCTGGTGACC CCACGTTCAT CACAACATCT ATGGCTGGTA TGGGTAATGA 2471
CGGACGCCAC CGTGTCACTA AAATGGACTA TCGTTTCTTG AACACACTTG ATACAATCGG 2531
AAATGCTCCA GAACCAAACT TGACAGTTCT TTGGGACTCT AAACTCCCAT ATTCATTCAA 2591
ACGTTATTCA ATGTCTATGA GTCACAAACA CTCATCTATC CAATATGAAG GTGTTGAAAC 2651
AATGGCTAAA GATGGATATG GCGAAATGTC ATGTATCTCT TGTTGTGTCT CACCACTTGA 2711
CCCAGAAAAT GAAGAAGGAC GTCATAATCT CCAATACTTT GGTGCGCGTG TAAACGTCTT 2771
GAAAGCAATG TTGACTGGTT TGAACGGTGG TTACGATGAC GTTCATAAAG ATTATAAAGT 2831
ATTCGATATT GAACCTGTTC GTGATGAAAT TCTTGACTAT GATACAGTTA TGGAAAACTT 2891
CGACAAATCA CTCAACTGGT TGACAGATAC TTATGTTGAT GCAATGAATA TCATTCACTA 2951
CATGACTGAC AAATATAACT ATGAAGCAGT TCAAATGGCC TTCTTGCCTA CTAAAGTTCG 3011
TGCTAACATG GGATTTGGTA TCTGTGGTTT CGCAAATACA GTTGATTCAC TTTCAGCGAT 3071
TAAATATGCT AAAGTTAAAA CTTTGCGTGA TGAAAATGGC TACATCTACG ATTATGAAGT 3131
AGAAGGTGAC TTCCCACGTT ATGGTGAAGA TGATGACCGT GCTGATGATA TCGCTAAACT 3191
TGTCATGAAA ATGTACCATG AAAAATTAGC TTCACACAAA CTTTACAAAA ATGCTGAAGC 3251
TACTGTTTCA CTTTTGACAA TCACATCTAA CGTTGCTTAC TCTAAACAAA CTGGTAACTC 3311
TCCAGTTCAT AAAGGAGTAT TCCTCAATGA AGATGGTACA GTCAACAAAT CTAAACTTGA 3371
ATTCTTCTCA CCAGGTGCTA ACCCATCTAA CAAAGCTAAA GGTGGATGGT TGCAAAATCT 3431
TCGTTCATTA GCTAAATTGG AATTCAAAGA TGCAAATGAC GGTATTTCAT TAACTACTCA 3491
AGTTTCTCCT CGTGCACTTG GTAAAACTCG TGATGAACAA GTAGATAACT TGGTTCAAAT 3551
TCTTGATGGA TACTTCACAC CAGGAGCTTT GATTAATGGT ACTGAATTTG CAGGTCAACA 3611
CGTTAACTTG AACGTTATGG ACCTTAAAGA TGTTTACGAT AAAATCATGC GTGGTGAAGA 3671
TGTTATCGTT CGTATCTCTG GATACTGTGT TAACACTAAA TACCTCACAC CTGAACAAAA 3731
ACAAGAATTG ACTGAACGTG TCTTCCATGA AGTACTTTCA AATGATGATG AAGAAGTAAT 3791
GCACACTTCA AATATCTAAT TCTTAGTATT AAAAAATATA AGGTCTGTCA GTTCTACTGA 3851
CAGACTTTTT TTCTATAAAT TAATTATAAT AGTTAAAAAC TATTATTTTT AGTTTAAGAA 3911
AAATAAAATT TGTGCTAAAA TAGATGAATG ATAAAGGTAA TTGGATTAAC AGGCGGAATT 3971
GCGAGTGGGA AATCAACGGT GGTTGATTTT TTGATTTCTG AAGGTTATCA AGTAATTGAT 4031
GCTGACAAAG TTGTTCGTCA GTTGCAAGAA CCTGATGGGA AACTTTTTAA TGCAATAATG 4091
GAAACTTTCG GTTCAGATTT TACTGACGAA AATGGGAAAT TAAACCGATG CAAAATTGAG 4151
TGCTTAAGTT TTGCTGACCC AAATCAACGT CAAAAATTAT 4191
(2) INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 305 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
Met Ile Lys Asn Tyr Glu Leu Ser Asn Glu Lys Lys Leu Ile Ser Thr
1 5 10 15
Ser Glu Met Lys Asn Phe Thr Tyr Val Leu Asn Pro Thr Arg Glu Glu
20 25 30
Ile Gly Asn Ile Ser Glu His Tyr Asp Phe Pro Phe Asp Tyr Leu Ser
35 40 45
Gly Ile Leu Asp Asp Tyr Glu Asn Ala Arg Phe Glu Thr Asp Asp Asn
50 55 60
Asp Asn Asn Leu Ile Leu Leu Gln Tyr Pro Ala Leu Ser Asn Tyr Gly
65 70 75 80
Glu Val Ala Thr Phe Pro Tyr Ser Leu Val Trp Thr Lys Asn Glu Ser
85 90 95
Val Ile Leu Ala Leu Asn His Glu Ile Asp Asn Gly Leu Ile Phe Glu
100 105 110
Arg Glu Tyr Asp Tyr Lys Arg Tyr Lys His Gln Leu Ile Phe Gln Val
115 120 125
Met Tyr Gln Met Thr His Thr Phe His Asp Tyr Leu Arg Asp Phe Arg
130 135 140
Thr Arg Arg Arg Arg Leu Glu Val Gly Ile Lys Asn Ser Thr Lys Asn
145 150 155 160
Asp Gln Ile Val Asp Leu Ile Ala Ile Gln Ala Ser Leu Ile Tyr Phe
165 170 175
Glu Asp Ala Leu His Asn Asn Met Gln Val Leu Gln Asn Phe Ile Asp
180 185 190
Tyr Leu Arg Glu Asp Asp Glu Asp Gly Phe Ala Glu Lys Ile Tyr Asp
195 200 205
Ile Phe Val Glu Thr Asp Gln Ala Tyr Thr Glu Thr Lys Ile Gln Leu
210 215 220
Lys Leu Leu Glu Asn Leu Arg Asp Leu Phe Ser Asn Ile Val Ser Asn
225 230 235 240
Asn Leu Asn Ile Val Met Lys Ile Met Thr Ser Ala Thr Phe Val Leu
245 250 255
Gly Ile Pro Ala Val Ile Val Gly Phe Tyr Gly Met Asn Val Pro Ile
260 265 270
Pro Gly Gln Asn Phe Asn Trp Met Val Trp Leu Ile Leu Val Phe Gly
275 280 285
Ile Leu Leu Cys Val Trp Val Thr Trp Trp Leu His Lys Lys Asp Met
290 295 300
Leu
305
(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4191 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 1447...3807
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TTGGGCTATA AGGAAATTGT TCTGCTGATT TTTTAAAGTT TAGATATAGG TTTAGGGGTT 60
CATGTTTGAA TTTCAAAAAA AGTCTCCTCA AGTTAATAAG TTTATTATAT CACAAAGTAT 120
TATTTAGACC AACTTCCTTC AAAAAACTTT TCGTTAAGGC TTTGAAATAA AATAATGAGA 180
AAAAAATAGG AAAATCTGCT ACAATTAGAA GGAGAAGAAG AGGATTTAAA TCCTTTTTTA 240
TTAGGAAAAG AAGGGATAGA TAGGCTGATA TGATAAAAAA TTATGAACTA TCCAATGAAA 300
AAAAATTGAT CTCAACTTCT GAGATGAAGA ATTTCACTTA TGTCCTCAAT CCAACACGTG 360
AAGAAATTGG GAATATCTCA GAACACTATG ATTTTCCTTT TGACTATCTA TCTGGAATTT 420
TAGATGACTA TGAAAATGCC CGTTTTGAAA CAGATGATAA TGACAATAAT CTGATTCTTT 480
TGCAATATCC CGCCTTGTCC AACTATGGAG AAGTGGCCAC TTTTCCATAT TCTTTGGTTT 540
GGACTAAGAA TGAATCGGTT ATTTTGGCCC TTAACCATGA AATTGATAAT GGTCTCATTT 600
TTGAACGAGA ATATGATTAT AAACGCTATA AACACCAATT GATTTTTCAA GTGATGTACC 660
AAATGACTCA TACTTTTCAT GATTATTTGA GAGACTTTAG AACAAGGCGC CGCCGGCTTG 720
AAGTTGGTAT CAAAAATTCA ACAAAAAATG ACCAAATTGT TGACTTAATT GCCATTCAAG 780
CGAGTTTGAT TTATTTTGAA GATGCGCTGC ACAATAATAT GCAAGTTCTC CAGAATTTTA 840
TTGATTACTT ACGAGAAGAT GATGAAGATG GTTTTGCCGA AAAAATCTAT GATATTTTTG 900
TCGAAACAGA CCAAGCTTAT ACAGAAACCA AGATTCAGCT CAAGTTACTA GAAAATCTCC 960
GAGATTTGTT CTCAAACATT GTCTCTAATA ATTTGAATAT CGTCATGAAA ATTATGACCT 1020
CAGCAACATT TGTTCTAGGT ATTCCGGCGG TTATTGTCGG CTTTTATGGA ATGAATGTTC 1080
CGATTCCTGG TCAAAATTTT AATTGGATGG TCTGGCTCAT TTTGGTGTTT GGAATTTTAT 1140
TATGTGTTTG GGTTACTTGG TGGCTACACA AAAAAGATAT GTTATGAATG GAGAAAATTT 1200
CTCCGTTTTT TTATCTTTGT GAAAAAATTA ATTAGTGATA ATAAATCATG AAGTTAGCAA 1260
TGTTTGTCAA AGCTATTTAG TGAATTAATT ATGAAAACGT TTTAAAAAAG TATAACAGAT 1320
ATTAAAATAA TTGAAACTGT ATTAGTAAAG AATCTGTAAT TTCTCTTGAA TTCTGTTTGC 1380
TATTATCAAA CTGTATGATA TAATGAAGTT GTAATTTGAA ACAGAAAGAA CAAAGGAGAT 1440
TTCAAA ATG AAA ACC GAA GTT ACG GAA AAT ATC TTT GAA CAA GCT TGG 1488 Met Lys Thr Glu Val Thr Glu Asn Ile Phe Glu Gln Ala Trp 1 5 10
GAT GGT TTT AAA GGA ACT AAC TGG CGC GAT AAA GCA AGC GTT ACT CGC 1536 Asp Gly Phe Lys Gly Thr Asn Trp Arg Asp Lys Ala Ser Val Thr Arg 15 20 25 30
TTT GTA CAA GAA AAC TAC AAA CCA TAT GAT GGT GAT GAA AGC TTT CTT 1584 Phe Val Gln Glu Asn Tyr Lys Pro Tyr Asp Gly Asp Glu Ser Phe Leu 35 40 45
GCT GGG CCA ACA GAA CGT ACA CTT AAA GTA AAG AAA ATT ATT GAA GAT 1632 Ala Gly Pro Thr Glu Arg Thr Leu Lys Val Lys Lys Ile Ile Glu Asp 50 55 60
ACA AAA AAT CAC TAC GAA GAA GTA GGA TTT CCC TTT GAT ACT GAC CGC 1680 Thr Lys Asn His Tyr Glu Glu Val Gly Phe Pro Phe Asp Thr Asp Arg 65 70 75
GTA ACC TCT ATC GAT AAA ATT CCT GCT GGA TAT ATT GAT GCT AAT GAT 1728 Val Thr Ser Ile Asp Lys Ile Pro Ala Gly Tyr Ile Asp Ala Asn Asp 80 85 90
AAA GAA CTT GAA CTC ATC TAT GGG ATG CAA AAT AGC GAA CTT TTC CGC 1776 Lys Glu Leu Glu Leu Ile Tyr Gly Met Gln Asn Ser Glu Leu Phe Arg 95 100 105 110
TTA AAC TTC ATG CCA AGA GGT GGT CTT CGT GTT GCT GAA AAG ATT TTG 1824 Leu Asn Phe Met Pro Arg Gly Gly Leu Arg Val Ala Glu Lys Ile Leu 115 120 125
ACA GAA CAC GGT CTT TCA GTT GAC CCA GGT TTG CAT GAT GTT TTG TCA 1872 Thr Glu His Gly Leu Ser Val Asp Pro Gly Leu His Asp Val Leu Ser 130 135 140
CAA ACA ATG ACT TCT GTA AAT GAT GGA ATC TTC CGT GCT TAT ACT TCA 1920 Gin Thr Met Thr Ser Val Asn Asp Gly He Phe Arg Ala Tyr Thr Ser 145 150 155
GCA ATT CGT AAA GCA CGT CAC GCT CAC ACT GTA ACA GGT TTG CCT GAT 1968 Ala He Arg Lys Ala Arg His Ala His Thr Val Thr Gly Leu Pro Asp 160 165 170
GCA TAC TCT CGT GGA CGT ATC ATC GGG GTA TAT GCA CGT CTT GCT CTT 2016 Ala Tyr Ser Arg Gly Arg He He Gly Val Tyr Ala Arg Leu Ala Leu 175 180 185 190
TAT GGA GCT GAC TAC CTT ATG AAG GAA AAA GCA AAA GAA TGG GAT GCA 2064 Tyr Gly Ala Aεp Tyr Leu Met Lys Glu Lys Ala Lys Glu Trp Asp Ala 195 200 205
ATC ACT GAA ATT AAT GAT GAT AAC ATT CGT CTT AAA GAA GAA ATT AAC 2112 He Thr Glu He Asn Aεp Aεp Asn He Arg Leu Lys Glu Glu He Asn 210 215 220
ATG CAA TAC CAA GCT TTG CAA GAA GTT GTA AAC TTT GGT GCT TTG TAT 2160 Met Gin Tyr Gin Ala Leu Gin Glu Val Val Asn Phe Gly Ala Leu Tyr 225 230 235 GGT CTT GAC GTT TCT CGT CCA GCG ATG AAC GTA AAA GAA GCA ATC CAA 2208 Gly Leu Aεp Val Ser Arg Pro Ala Met Aεn Val Lyε Glu Ala He Gin 240 245 250
TGG GTT AAT ATT GCA TAC ATG GCA GTT TGT CGT GTT ATC AAT GGT GCT 2256 Trp Val Asn Ile Ala Tyr Met Ala Val Cys Arg Val Ile Asn Gly Ala 255 260 265 270
GCA ACT TCA CTT GGA CGT GTG CCA ATC GTT CTT GAC ATC TTT GCA GAA 2304 Ala Thr Ser Leu Gly Arg Val Pro Ile Val Leu Asp Ile Phe Ala Glu 275 280 285
CGT GAC CTT GCT CGT GGA ACA TTT ACT GAG CAA GAA ATC CAA GAA TTT 2352 Arg Asp Leu Ala Arg Gly Thr Phe Thr Glu Gln Glu Ile Gln Glu Phe 290 295 300
GTT GAT GAT TTC ATT TTA AAA CTT CGT ACA ATG AAA TTT GCT CGT GCT 2400 Val Asp Asp Phe Ile Leu Lys Leu Arg Thr Met Lys Phe Ala Arg Ala 305 310 315
GCT GCT TAT GAT GAA CTT TAT TCT GGT GAC CCC ACG TTC ATC ACA ACA 2448 Ala Ala Tyr Asp Glu Leu Tyr Ser Gly Asp Pro Thr Phe Ile Thr Thr 320 325 330
TCT ATG GCT GGT ATG GGT AAT GAC GGA CGC CAC CGT GTC ACT AAA ATG 2496 Ser Met Ala Gly Met Gly Asn Asp Gly Arg His Arg Val Thr Lys Met 335 340 345 350
GAC TAT CGT TTC TTG AAC ACA CTT GAT ACA ATC GGA AAT GCT CCA GAA 2544 Asp Tyr Arg Phe Leu Asn Thr Leu Asp Thr Ile Gly Asn Ala Pro Glu 355 360 365
CCA AAC TTG ACA GTT CTT TGG GAC TCT AAA CTC CCA TAT TCA TTC AAA 2592 Pro Asn Leu Thr Val Leu Trp Asp Ser Lys Leu Pro Tyr Ser Phe Lys 370 375 380
CGT TAT TCA ATG TCT ATG AGT CAC AAA CAC TCA TCT ATC CAA TAT GAA 2640 Arg Tyr Ser Met Ser Met Ser His Lys His Ser Ser Ile Gln Tyr Glu 385 390 395
GGT GTT GAA ACA ATG GCT AAA GAT GGA TAT GGC GAA ATG TCA TGT ATC 2688 Gly Val Glu Thr Met Ala Lys Asp Gly Tyr Gly Glu Met Ser Cys Ile 400 405 410
TCT TGT TGT GTC TCA CCA CTT GAC CCA GAA AAT GAA GAA GGA CGT CAT 2736 Ser Cys Cys Val Ser Pro Leu Asp Pro Glu Asn Glu Glu Gly Arg His 415 420 425 430
AAT CTC CAA TAC TTT GGT GCG CGT GTA AAC GTC TTG AAA GCA ATG TTG 2784 Asn Leu Gln Tyr Phe Gly Ala Arg Val Asn Val Leu Lys Ala Met Leu 435 440 445
ACT GGT TTG AAC GGT GGT TAC GAT GAC GTT CAT AAA GAT TAT AAA GTA 2832 Thr Gly Leu Asn Gly Gly Tyr Asp Asp Val His Lys Asp Tyr Lys Val 450 455 460
TTC GAT ATT GAA CCT GTT CGT GAT GAA ATT CTT GAC TAT GAT ACA GTT 2880 Phe Asp Ile Glu Pro Val Arg Asp Glu Ile Leu Asp Tyr Asp Thr Val 465 470 475
ATG GAA AAC TTC GAC AAA TCA CTC AAC TGG TTG ACA GAT ACT TAT GTT 2928 Met Glu Asn Phe Asp Lys Ser Leu Asn Trp Leu Thr Asp Thr Tyr Val 480 485 490
GAT GCA ATG AAT ATC ATT CAC TAC ATG ACT GAC AAA TAT AAC TAT GAA 2976 Asp Ala Met Asn Ile Ile His Tyr Met Thr Asp Lys Tyr Asn Tyr Glu 495 500 505 510
GCA GTT CAA ATG GCC TTC TTG CCT ACT AAA GTT CGT GCT AAC ATG GGA 3024 Ala Val Gln Met Ala Phe Leu Pro Thr Lys Val Arg Ala Asn Met Gly 515 520 525
TTT GGT ATC TGT GGT TTC GCA AAT ACA GTT GAT TCA CTT TCA GCG ATT 3072 Phe Gly Ile Cys Gly Phe Ala Asn Thr Val Asp Ser Leu Ser Ala Ile 530 535 540
AAA TAT GCT AAA GTT AAA ACT TTG CGT GAT GAA AAT GGC TAC ATC TAC 3120 Lys Tyr Ala Lys Val Lys Thr Leu Arg Asp Glu Asn Gly Tyr Ile Tyr 545 550 555
GAT TAT GAA GTA GAA GGT GAC TTC CCA CGT TAT GGT GAA GAT GAT GAC 3168 Asp Tyr Glu Val Glu Gly Asp Phe Pro Arg Tyr Gly Glu Asp Asp Asp 560 565 570
CGT GCT GAT GAT ATC GCT AAA CTT GTC ATG AAA ATG TAC CAT GAA AAA 3216 Arg Ala Asp Asp Ile Ala Lys Leu Val Met Lys Met Tyr His Glu Lys 575 580 585 590
TTA GCT TCA CAC AAA CTT TAC AAA AAT GCT GAA GCT ACT GTT TCA CTT 3264 Leu Ala Ser His Lys Leu Tyr Lys Asn Ala Glu Ala Thr Val Ser Leu 595 600 605
TTG ACA ATC ACA TCT AAC GTT GCT TAC TCT AAA CAA ACT GGT AAC TCT 3312 Leu Thr Ile Thr Ser Asn Val Ala Tyr Ser Lys Gln Thr Gly Asn Ser 610 615 620
CCA GTT CAT AAA GGA GTA TTC CTC AAT GAA GAT GGT ACA GTC AAC AAA 3360 Pro Val His Lys Gly Val Phe Leu Asn Glu Asp Gly Thr Val Asn Lys 625 630 635
TCT AAA CTT GAA TTC TTC TCA CCA GGT GCT AAC CCA TCT AAC AAA GCT 3408 Ser Lys Leu Glu Phe Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala 640 645 650
AAA GGT GGA TGG TTG CAA AAT CTT CGT TCA TTA GCT AAA TTG GAA TTC 3456 Lys Gly Gly Trp Leu Gln Asn Leu Arg Ser Leu Ala Lys Leu Glu Phe 655 660 665 670
AAA GAT GCA AAT GAC GGT ATT TCA TTA ACT ACT CAA GTT TCT CCT CGT 3504 Lys Asp Ala Asn Asp Gly Ile Ser Leu Thr Thr Gln Val Ser Pro Arg 675 680 685
GCA CTT GGT AAA ACT CGT GAT GAA CAA GTA GAT AAC TTG GTT CAA ATT 3552 Ala Leu Gly Lys Thr Arg Asp Glu Gln Val Asp Asn Leu Val Gln Ile 690 695 700
CTT GAT GGA TAC TTC ACA CCA GGA GCT TTG ATT AAT GGT ACT GAA TTT 3600 Leu Asp Gly Tyr Phe Thr Pro Gly Ala Leu Ile Asn Gly Thr Glu Phe 705 710 715
GCA GGT CAA CAC GTT AAC TTG AAC GTT ATG GAC CTT AAA GAT GTT TAC 3648 Ala Gly Gln His Val Asn Leu Asn Val Met Asp Leu Lys Asp Val Tyr 720 725 730
GAT AAA ATC ATG CGT GGT GAA GAT GTT ATC GTT CGT ATC TCT GGA TAC 3696 Asp Lys Ile Met Arg Gly Glu Asp Val Ile Val Arg Ile Ser Gly Tyr 735 740 745 750
TGT GTT AAC ACT AAA TAC CTC ACA CCT GAA CAA AAA CAA GAA TTG ACT 3744 Cys Val Asn Thr Lys Tyr Leu Thr Pro Glu Gln Lys Gln Glu Leu Thr 755 760 765
GAA CGT GTC TTC CAT GAA GTA CTT TCA AAT GAT GAT GAA GAA GTA ATG 3792 Glu Arg Val Phe His Glu Val Leu Ser Asn Asp Asp Glu Glu Val Met 770 775 780
CAC ACT TCA AAT ATC TAATTCTTAG TATTAAAAAA TATAAGGTCT GTCAGTTCTA C 3848 His Thr Ser Asn Ile 785
TGACAGACTT TTTTTCTATA AATTAATTAT AATAGTTAAA AACTATTATT TTTAGTTTAA 3908
GAAAAATAAA ATTTGTGCTA AAATAGATGA ATGATAAAGG TAATTGGATT AACAGGCGGA 3968
ATTGCGAGTG GGAAATCAAC GGTGGTTGAT TTTTTGATTT CTGAAGGTTA TCAAGTAATT 4028
GATGCTGACA AAGTTGTTCG TCAGTTGCAA GAACCTGATG GGAAACTTTT TAATGCAATA 4088
ATGGAAACTT TCGGTTCAGA TTTTACTGAC GAAAATGGGA AATTAAACCG ATGCAAAATT 4148
GAGTGCTTAA GTTTTGCTGA CCCAAATCAA CGTCAAAAAT TAT 4191
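The pairing of codons and three-letter residues in the listing above can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the function and variable names are illustrative and do not come from the application. It translates the final codons of the open reading frame and stops at the TAA stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; the listing itself does not name a translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> list[str]:
    """Translate DNA (spaces allowed) into three-letter residues up to the first stop."""
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the reading frame
            break
        residues.append(ONE_TO_THREE[aa])
    return residues


# Last coding codons of the listing followed by the TAA stop codon.
print(" ".join(translate("CAC ACT TCA AAT ATC TAA TTC")))
```

Running the same function over any row of the listing gives a quick consistency check between the nucleotide line and the residue line beneath it.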
(2) INFORMATION FOR SEQ ID NO:39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 787 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
Met Lys Thr Glu Val Thr Glu Asn Ile Phe Glu Gln Ala Trp Asp Gly
1 5 10 15
Phe Lys Gly Thr Asn Trp Arg Asp Lys Ala Ser Val Thr Arg Phe Val
20 25 30
Gln Glu Asn Tyr Lys Pro Tyr Asp Gly Asp Glu Ser Phe Leu Ala Gly
35 40 45
Pro Thr Glu Arg Thr Leu Lys Val Lys Lys Ile Ile Glu Asp Thr Lys
50 55 60
Asn His Tyr Glu Glu Val Gly Phe Pro Phe Asp Thr Asp Arg Val Thr
65 70 75 80
Ser Ile Asp Lys Ile Pro Ala Gly Tyr Ile Asp Ala Asn Asp Lys Glu
85 90 95
Leu Glu Leu Ile Tyr Gly Met Gln Asn Ser Glu Leu Phe Arg Leu Asn
100 105 110
Phe Met Pro Arg Gly Gly Leu Arg Val Ala Glu Lys Ile Leu Thr Glu
115 120 125
His Gly Leu Ser Val Asp Pro Gly Leu His Asp Val Leu Ser Gln Thr
130 135 140
Met Thr Ser Val Asn Asp Gly Ile Phe Arg Ala Tyr Thr Ser Ala Ile
145 150 155 160
Arg Lys Ala Arg His Ala His Thr Val Thr Gly Leu Pro Asp Ala Tyr
165 170 175
Ser Arg Gly Arg Ile Ile Gly Val Tyr Ala Arg Leu Ala Leu Tyr Gly
180 185 190
Ala Asp Tyr Leu Met Lys Glu Lys Ala Lys Glu Trp Asp Ala Ile Thr
195 200 205
Glu Ile Asn Asp Asp Asn Ile Arg Leu Lys Glu Glu Ile Asn Met Gln
210 215 220
Tyr Gln Ala Leu Gln Glu Val Val Asn Phe Gly Ala Leu Tyr Gly Leu
225 230 235 240
Asp Val Ser Arg Pro Ala Met Asn Val Lys Glu Ala Ile Gln Trp Val
245 250 255
Asn Ile Ala Tyr Met Ala Val Cys Arg Val Ile Asn Gly Ala Ala Thr
260 265 270
Ser Leu Gly Arg Val Pro Ile Val Leu Asp Ile Phe Ala Glu Arg Asp
275 280 285
Leu Ala Arg Gly Thr Phe Thr Glu Gln Glu Ile Gln Glu Phe Val Asp
290 295 300
Asp Phe Ile Leu Lys Leu Arg Thr Met Lys Phe Ala Arg Ala Ala Ala
305 310 315 320
Tyr Asp Glu Leu Tyr Ser Gly Asp Pro Thr Phe Ile Thr Thr Ser Met
325 330 335
Ala Gly Met Gly Asn Asp Gly Arg His Arg Val Thr Lys Met Asp Tyr
340 345 350
Arg Phe Leu Asn Thr Leu Asp Thr Ile Gly Asn Ala Pro Glu Pro Asn
355 360 365
Leu Thr Val Leu Trp Asp Ser Lys Leu Pro Tyr Ser Phe Lys Arg Tyr
370 375 380
Ser Met Ser Met Ser His Lys His Ser Ser Ile Gln Tyr Glu Gly Val
385 390 395 400
Glu Thr Met Ala Lys Asp Gly Tyr Gly Glu Met Ser Cys Ile Ser Cys
405 410 415
Cys Val Ser Pro Leu Asp Pro Glu Asn Glu Glu Gly Arg His Asn Leu
420 425 430
Gln Tyr Phe Gly Ala Arg Val Asn Val Leu Lys Ala Met Leu Thr Gly
435 440 445
Leu Asn Gly Gly Tyr Asp Asp Val His Lys Asp Tyr Lys Val Phe Asp
450 455 460
Ile Glu Pro Val Arg Asp Glu Ile Leu Asp Tyr Asp Thr Val Met Glu
465 470 475 480
Asn Phe Asp Lys Ser Leu Asn Trp Leu Thr Asp Thr Tyr Val Asp Ala
485 490 495
Met Asn Ile Ile His Tyr Met Thr Asp Lys Tyr Asn Tyr Glu Ala Val
500 505 510
Gln Met Ala Phe Leu Pro Thr Lys Val Arg Ala Asn Met Gly Phe Gly
515 520 525
Ile Cys Gly Phe Ala Asn Thr Val Asp Ser Leu Ser Ala Ile Lys Tyr
530 535 540
Ala Lys Val Lys Thr Leu Arg Asp Glu Asn Gly Tyr Ile Tyr Asp Tyr
545 550 555 560
Glu Val Glu Gly Asp Phe Pro Arg Tyr Gly Glu Asp Asp Asp Arg Ala
565 570 575
Asp Asp Ile Ala Lys Leu Val Met Lys Met Tyr His Glu Lys Leu Ala
580 585 590
Ser His Lys Leu Tyr Lys Asn Ala Glu Ala Thr Val Ser Leu Leu Thr
595 600 605
Ile Thr Ser Asn Val Ala Tyr Ser Lys Gln Thr Gly Asn Ser Pro Val
610 615 620
His Lys Gly Val Phe Leu Asn Glu Asp Gly Thr Val Asn Lys Ser Lys
625 630 635 640
Leu Glu Phe Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala Lys Gly
645 650 655
Gly Trp Leu Gln Asn Leu Arg Ser Leu Ala Lys Leu Glu Phe Lys Asp
660 665 670
Ala Asn Asp Gly Ile Ser Leu Thr Thr Gln Val Ser Pro Arg Ala Leu
675 680 685
Gly Lys Thr Arg Asp Glu Gln Val Asp Asn Leu Val Gln Ile Leu Asp
690 695 700
Gly Tyr Phe Thr Pro Gly Ala Leu Ile Asn Gly Thr Glu Phe Ala Gly
705 710 715 720
Gln His Val Asn Leu Asn Val Met Asp Leu Lys Asp Val Tyr Asp Lys
725 730 735
Ile Met Arg Gly Glu Asp Val Ile Val Arg Ile Ser Gly Tyr Cys Val
740 745 750
Asn Thr Lys Tyr Leu Thr Pro Glu Gln Lys Gln Glu Leu Thr Glu Arg
755 760 765
Val Phe His Glu Val Leu Ser Asn Asp Asp Glu Glu Val Met His Thr
770 775 780
Ser Asn Ile
785
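Most alignment and database tools expect the one-letter amino acid code rather than the three-letter code used in this listing. The sketch below shows one way to convert a row of SEQ ID NO: 39; the helper name `to_one_letter` is hypothetical, and the input is the first row of the sequence (residues 1 to 16) written with the standard three-letter codes.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_row: str) -> str:
    """Convert whitespace-separated three-letter residues to a one-letter string.

    Raises KeyError on any token that is not a standard residue code, which
    also flags position numbers or garbled codes left in the input.
    """
    return "".join(THREE_TO_ONE[token] for token in three_letter_row.split())


first_row = "Met Lys Thr Glu Val Thr Glu Asn Ile Phe Glu Gln Ala Trp Asp Gly"
print(to_one_letter(first_row))
```

Because unknown tokens raise an error instead of being skipped, the function doubles as a sanity check when a listing has been copied from a scanned document.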
(2) INFORMATION FOR SEQ ID NO: 40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Other
(B) LOCATION: 6...9
(D) OTHER INFORMATION: Unknown
(A) NAME/KEY: Other
(B) LOCATION: 1...14
(D) OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: TTGATNNNNA TCAA 14
(2) INFORMATION FOR SEQ ID NO: 41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Other
(B) LOCATION: 6...9
(D) OTHER INFORMATION: Unknown
(A) NAME/KEY: Other
(B) LOCATION: 1...14
(D) OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: GGAGTNNNNA TCAA 14
(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Other
(B) LOCATION: 6...9
(D) OTHER INFORMATION: Unknown
(A) NAME/KEY: Other
(B) LOCATION: 1...14
(D) OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: TTTGCNNNNA TCAA 14
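SEQ ID NOs: 40 to 42 each carry four unspecified bases (N) at positions 6 to 9, so each sequence describes a family of 14-mers rather than a single oligonucleotide. As a minimal sketch, assuming N means "any of A, C, G or T", such a pattern can be turned into a regular expression and searched against a target. The target string below is invented purely for illustration and is not taken from the application.

```python
import re

# The three degenerate 14-mers from the listing (spaces removed).
PATTERNS = {
    "SEQ ID NO: 40": "TTGATNNNNATCAA",
    "SEQ ID NO: 41": "GGAGTNNNNATCAA",
    "SEQ ID NO: 42": "TTTGCNNNNATCAA",
}


def find_sites(pattern: str, target: str) -> list[int]:
    """Return 0-based start positions where the pattern matches the target.

    Each N is treated as a wildcard for A, C, G or T. A lookahead is used so
    that overlapping matches are also reported.
    """
    regex = "".join("[ACGT]" if base == "N" else base for base in pattern.upper())
    return [m.start() for m in re.finditer(f"(?=({regex}))", target.upper())]


# Hypothetical target sequence, for illustration only.
target = "ACGTTGATACGTATCAAGG"
for name, pattern in PATTERNS.items():
    print(name, find_sites(pattern, target))
```

Each N position admits four bases, so one pattern covers 4^4 = 256 distinct 14-mers.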
(2) INFORMATION FOR SEQ ID NO: 43:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Other
(B) LOCATION: 1...32
(D) OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: GGCCGCTCGA GTTGTGTCTC ACCACTTGAC CC 32
(2) INFORMATION FOR SEQ ID NO: 44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: Other
(B) LOCATION: 1...33
(D) OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: TAGTAGGATC CCATCATCTT CACCATAACG TGG 33
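The two primers of SEQ ID NOs: 43 and 44 can be inspected with a few lines of code. The sketch below locates the hexamers CTCGAG (the XhoI recognition sequence) and GGATCC (the BamHI recognition sequence) in the primers and checks where the remaining 3' portions line up with fragments of the coding sequence listed earlier in this section. Whether these sites were placed there for cloning is an inference this part of the listing does not confirm, and all names in the code are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PRIMER_43 = "GGCCGCTCGAGTTGTGTCTCACCACTTGACCC"   # SEQ ID NO: 43
PRIMER_44 = "TAGTAGGATCCCATCATCTTCACCATAACGTGG"  # SEQ ID NO: 44
XHOI_SITE = "CTCGAG"
BAMHI_SITE = "GGATCC"

# Fragments copied from the coding sequence earlier in the listing:
# the first ten codons of the row ending at nucleotide 2736 and the last
# ten codons of the row ending at nucleotide 3168.
FWD_REGION = "TCTTGTTGTGTCTCACCACTTGACCCAGAA"
REV_REGION = "GACTTCCCACGTTATGGTGAAGATGATGAC"

xhoi_at = PRIMER_43.find(XHOI_SITE)
bamhi_at = PRIMER_44.find(BAMHI_SITE)

# Portion of each primer that follows the six-base site.
fwd_anneal = PRIMER_43[xhoi_at + len(XHOI_SITE):]
rev_anneal = reverse_complement(PRIMER_44[bamhi_at + len(BAMHI_SITE):])

print("XhoI site offset in primer 43:", xhoi_at)
print("BamHI site offset in primer 44:", bamhi_at)
print("forward 3' portion found at:", FWD_REGION.find(fwd_anneal))
print("reverse 3' portion (reverse complement) found at:", REV_REGION.find(rev_anneal))
```

A result of -1 from `str.find` would mean the primer portion does not occur in the fragment, so the same check can be reused for other primer and template pairs.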

Claims

1. An isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
2. A DNA sequence according to claim 1 further comprising sequences regulating the expression of the coding sequence and/or the activity of its gene product.
3. A DNA sequence according to claim 1 which is derived from a lactic acid bacterium selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
4. A DNA sequence according to claim 3 which is derived from Lactococcus lactis.
5. A DNA sequence according to claim 1 coding for a polypeptide which is at least 30% identical with a polypeptide which is selected from the group consisting of the gene product of the adhE gene of E. coli as recorded in FASTA, GCG Wisconsin under the accession No. P17547, the gene product of the aad gene of Clostridium acetobutylicum as recorded in FASTA, GCG Wisconsin under the accession No. P33744 and of the DNA sequence of SEQ ID NO: 3.
6. A DNA sequence according to claim 1 which comprises the coding sequence of SEQ ID NO: 3 or SEQ ID NO: 30, or a mutant or variant hereof which codes for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
7. A recombinant replicon comprising the DNA sequence of claim 1.
8. A replicon according to claim 7 which is selected from a plasmid capable of replicating in a lactic acid bacterium and a lactic acid bacterial chromosome.
9. A recombinant lactic acid bacterial cell comprising the replicon of claim 7.
10. A lactic acid bacterial cell according to claim 9 which is selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
11. A lactic acid bacterial cell according to claim 9 which is in the form of a starter culture composition for the production of a food product or an animal feed, or in the form of a culture for the production of an aroma or antimicrobially active compound.
12. A lactic acid bacterial cell according to claim 9 wherein the DNA sequence comprising the sequence coding for the multifunctional polypeptide is modified so as to inactivate or reduce the production of or the activity of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
13. A lactic acid bacterial cell according to claim 12 wherein said modification of the DNA sequence results in the cell producing increased amounts of a metabolite selected from the group consisting of acetaldehyde, acetate and ethanol.
14. A lactic acid bacterial cell according to claim 9 wherein the DNA sequence comprising the sequence coding for the multifunctional polypeptide is modified so as to enhance the production of or the activity of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
15. A lactic acid bacterial cell according to claim 14 wherein said modification of the DNA sequence results in the cell producing an increased amount of a metabolite selected from the group consisting of acetaldehyde, ethanol, formate, acetate, α-acetolactate, acetoin, diacetyl and 2,3 butylene glycol.
16. An isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having pyruvate formate-lyase activity, subject to the limitation that the sequence is not derived from an oral Streptococcus species.
17. A DNA sequence according to claim 16 comprising at least one regulatory sequence regulating the expression of the pyruvate formate-lyase polypeptide or coding for a gene product regulating the pyruvate formate-lyase activity of the polypeptide.
18. A DNA sequence according to claim 17 wherein the regulating gene product is selected from a pyruvate formate-lyase activase and a pyruvate formate-lyase deactivase.
19. A DNA sequence according to claim 18 wherein the deactivase is a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity as defined in claim 1.
20. A DNA sequence according to claim 16 which is derived from a lactic acid bacterium selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
21. A DNA sequence according to claim 20 which is derived from Lactococcus lactis.
22. A DNA sequence according to claim 16 which comprises the coding sequence of SEQ ID NO: 15 or SEQ ID NO: 30, or a mutant or variant hereof which codes for a polypeptide having pyruvate formate-lyase activity.
23. A recombinant replicon comprising the DNA sequence of claim 16.
24. A replicon according to claim 23 which is selected from a plasmid capable of replicating in a lactic acid bacterium and a lactic acid bacterial chromosome.
25. A recombinant lactic acid bacterial cell comprising the replicon of claim 23.
26. A lactic acid bacterial cell according to claim 25 which is selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
27. A lactic acid bacterial cell according to claim 25 which is in the form of a starter culture composition for the production of a food product or an animal feed.
28. A lactic acid bacterial cell according to claim 25 wherein the DNA sequence is modified whereby its production of pyruvate formate-lyase is reduced or inhibited or whereby the enzyme is produced in a modified form having a reduced pyruvate formate-lyase activity.
29. A lactic acid bacterial cell according to claim 28 wherein said modification of the DNA sequence results in that the cell produces increased amounts of a metabolite selected from the group consisting of α-acetolactate, acetoin, diacetyl and 2,3 butylene glycol.
30. A lactic acid bacterial cell according to claim 25 wherein the DNA sequence is modified whereby its production of pyruvate formate-lyase is enhanced or whereby the enzyme is produced in a modified form having an increased pyruvate formate-lyase activity.
31. A lactic acid bacterial cell according to claim 30 wherein said modification of the DNA sequence results in the cell producing increased amounts of formate.
32. A recombinant lactic acid bacterial cell comprising the DNA sequence of claim 1 and the DNA sequence of claim 16.
33. A recombinant lactic acid bacterial cell according to claim 32 wherein at least one of said DNA sequences is modified so as to modify the expression of pyruvate formate-lyase or the activity hereof.
34. A method of producing a lactic acid bacterial metabolite, the method comprising cultivating a lactic acid bacterium according to any of claims 12, 15, 30 or 32 under conditions where the metabolite is produced and isolating the metabolite from the culture.
36. A method of producing an animal feed, the method comprising the step of admixing to the feed starting materials a starter culture of a lactic acid bacterium according to claim 9 or 26 and keeping the mixture under conditions allowing the starter culture to be metabolically active.
37. An isolated DNA sequence derived from a lactic acid bacterium, said sequence coding for a product having a formate transporter activity.
38. A DNA sequence according to claim 37 which is the open reading frame orfA isolated from Lactococcus lactis strain DB1341 where it is located upstream of the pfl gene (SEQ ID NO: 34).
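Claim 5 sets a threshold of at least 30% identity between polypeptides. The claims in this section do not say how identity is to be computed (alignment program, gap treatment, denominator), so the following is only a sketch under stated assumptions: the two sequences are already aligned to equal length, '-' marks a gap, and identity is the number of identical residue columns divided by the number of aligned columns. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumptions of this sketch: '-' denotes a gap, columns where both
    sequences have a gap are ignored, and the denominator is the number of
    remaining aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """True if the pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments, for illustration only.
a = "MKTEV-TEN"
b = "MKSEVATQN"
print(round(percent_identity(a, b), 2), meets_threshold(a, b))
```

Other conventions (for example dividing by the length of the shorter sequence) give different percentages for the same alignment, so the convention needs to be fixed before comparing values against the claimed threshold.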
PCT/DK1997/000336 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same WO1998007867A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP97934442A EP0938566A2 (en) 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same
NZ334294A NZ334294A (en) 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same
CA002262418A CA2262418A1 (en) 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same
AU37659/97A AU721803B2 (en) 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US70145896A 1996-08-22 1996-08-22
US08/701,458 1996-08-22

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US98109797A A-371-Of-International 1997-12-17 1997-12-17
US10/267,989 Continuation US20030199035A1 (en) 1997-12-17 2002-10-10 Metabolically engineered lactic acid bacteria and means for providing same

Publications (2)

Publication Number Publication Date
WO1998007867A2 true WO1998007867A2 (en) 1998-02-26
WO1998007867A3 WO1998007867A3 (en) 1998-05-22

Family

ID=24817464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK1997/000336 WO1998007867A2 (en) 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same

Country Status (5)

Country Link
EP (1) EP0938566A2 (en)
AU (1) AU721803B2 (en)
CA (1) CA2262418A1 (en)
NZ (1) NZ334294A (en)
WO (1) WO1998007867A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999027107A3 (en) * 1997-11-20 1999-08-12 Genencor Int Gram-positive microorganism formate transporters
WO2014183117A1 (en) * 2013-05-10 2014-11-13 Richard Carpenter Compositions comprising a mixture of bacteria comprising pedoiococcus and lactobacillus an methods for decreasing the effects of alcohols
CN110088279A (en) * 2016-12-15 2019-08-02 株式会社钟化 Novel host cell and used its target protein manufacturing method
CN111471660A (en) * 2020-03-12 2020-07-31 广州辉园苑医药科技有限公司 Acetaldehyde dehydrogenase recombinant gene, lactobacillus vector and application thereof
CN112601808A (en) * 2018-08-10 2021-04-02 协和发酵生化株式会社 Microorganism producing eicosapentaenoic acid and process for producing eicosapentaenoic acid

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
DOROTHEA KESSLER ET AL.: "Pyruvate-formate-lyase-deactivase and acetyl-CoA reductase activities of Escherichia coli reside on a polymeric protein particle encoded by adhE" FEBS LETTERS, vol. 281, no. 1,2, 9 April 1991, AMSTERDAM NL, pages 59-63, XP002045430 cited in the application *
GERHARD WEIDNER & GARY SAWERS: "Molecular characterization of the genes encoding pyruvate formate-lyase and its activating enzyme of Clostridium pasteurianum" JOURNAL OF BACTERIOLOGY, vol. 178, no. 8, April 1996, pages 2440-2444, XP002055590 *
JOACHIM KNAPPE ET AL.: "A radical-chemical route to acetyl-CoA: the anaerobically induced pyruvate formate-lyase system of Escherichia coli" FEMS MICROBIOLOGY REVIEWS, vol. 75, no. 4, August 1990, pages 383-398, XP002045432 *
JOSÉ ARNAU ET AL.: "Cloning, expression, and characterization of the Lactococcus lactis pfl gene, encoding pyruvate formate-lyase" JOURNAL OF BACTERIOLOGY, vol. 179, no. 18, September 1997, pages 5884-5891, XP002055591 *
RAMESH V. NAIR ET AL.: "Molecular characterization of an aldehyde/alcohol dehydrogenase gene from Clostridium acetobutylicum ATCC 824" JOURNAL OF BACTERIOLOGY, vol. 176, no. 3, February 1994, pages 871-885, XP002045431 cited in the application *
VALERIE M. MARSHALL ET AL: "Threonine aldolase and alcohol dehydrogenase activities in Lactobacillus bulgaricus and Lactobacillus acidophilus and their contribution to flavour production in fermented milks" JOURNAL DAIRY RESEARCH, vol. 50, no. 3, 1983, pages 375-379, XP000677656 *
WENGANG YANG ET AL: "Entamoeba histolytica has an alcohol dehydrogenase homologous to the multifunctional adhE gene product of Escherichia coli" MOLECULAR AND BIOCHEMICAL PARASITOLOGY, vol. 64, no. 2, 1994, pages 253-260, XP000646353 cited in the application *
YASUHITO YAMAMOTO ET AL.: "Cloning and sequence analysis of the pfl gene encoding pyruvate formate-lyase from Streptococcus mutans" INFECTION AND IMMUNITY, vol. 64, no. 2, February 1996, WASHINGTON US, pages 385-391, XP002045550 cited in the application *
YU-MEI CHEN ET AL.: "Regulation of the adhE gene, which encodes ethanol dehydrogenase in Escherichia coli" JOURNAL OF BACTERIOLOGY, vol. 173, no. 24, December 1991, pages 8009-8013, XP002045433 cited in the application *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999027107A3 (en) * 1997-11-20 1999-08-12 Genencor Int Gram-positive microorganism formate transporters
US6458557B1 (en) 1997-11-20 2002-10-01 Genencor International, Inc. Enhancing growth in gram-positive microorganisms using formate supplementation and inactivation of formate-associated transport proteins
WO2014183117A1 (en) * 2013-05-10 2014-11-13 Richard Carpenter Compositions comprising a mixture of bacteria comprising pedoiococcus and lactobacillus an methods for decreasing the effects of alcohols
JP2016521287A (en) * 2013-05-10 2016-07-21 バイオウィッシュ テクノロジーズ インコーポレイテッド Composition comprising a mixture of bacteria comprising Pediococcus and Lactobacillus, and method for reducing the effects of alcohol
US10130664B2 (en) 2013-05-10 2018-11-20 BiOWiSH Technologies, Inc. Compositions and methods for decreasing the effects of alcohol
CN110088279A (en) * 2016-12-15 2019-08-02 株式会社钟化 Novel host cell and used its target protein manufacturing method
CN112601808A (en) * 2018-08-10 2021-04-02 协和发酵生化株式会社 Microorganism producing eicosapentaenoic acid and process for producing eicosapentaenoic acid
CN111471660A (en) * 2020-03-12 2020-07-31 广州辉园苑医药科技有限公司 Acetaldehyde dehydrogenase recombinant gene, lactobacillus vector and application thereof
CN111471660B (en) * 2020-03-12 2023-11-24 广州辉园苑医药科技有限公司 Acetaldehyde dehydrogenase recombinant gene, lactic acid bacteria carrier and application thereof

Also Published As

Publication number Publication date
CA2262418A1 (en) 1998-02-26
WO1998007867A3 (en) 1998-05-22
NZ334294A (en) 2000-02-28
AU3765997A (en) 1998-03-06
EP0938566A2 (en) 1999-09-01
AU721803B2 (en) 2000-07-13

Similar Documents

Publication Publication Date Title
Taguchi et al. D-lactate dehydrogenase is a member of the D-isomer-specific 2-hydroxyacid dehydrogenase family. Cloning, sequencing, and expression in Escherichia coli of the D-lactate dehydrogenase gene of Lactobacillus plantarum
Söhling et al. Molecular analysis of the anaerobic succinate degradation pathway in Clostridium kluyveri
JP4187774B2 (en) Novel alcohol aldehyde dehydrogenase
Arnau et al. Cloning, expression, and characterization of the Lactococcus lactis pfl gene, encoding pyruvate formate-lyase
CA2326405C (en) Novel genetically modified lactic acid bacteria having modified diacetyl reductase activities
Priefert et al. Identification and molecular characterization of the Alcaligenes eutrophus H16 aco operon genes involved in acetoin catabolism
US6261827B1 (en) Method for increasing hemoprotein production in filamentous fungi
KR20160133308A (en) Yeast cell having acid tolerant property, method for preparing the yeast cell and use thereof
Jobin et al. Expression of the Oenococcus oeni trxA gene is induced by hydrogen peroxide and heat shock
Denayrolles et al. Cloning and sequence analysis of the gene encoding Lactococcus lactis malolactic enzyme: relationships with malic enzymes
US7160708B2 (en) Fatty alcohol oxidase genes and proteins from Candida tropicalis and methods relating thereto
US5958747A (en) Aspergillus oryzae 5-aminolevulinic acid synthases and nucleic acids encoding same
WO1998007867A2 (en) Metabolically engineered lactic acid bacteria and means for providing same
US20030199035A1 (en) Metabolically engineered lactic acid bacteria and means for providing same
Sahara et al. Cloning, sequencing, and expression of a gene encoding the monomeric isocitrate dehydrogenase of the nitrogen-fixing bacterium, Azotobacter vinelandii
US7541168B2 (en) Recombinant cyclopentanone monooxygenase [cpmo]
CN116670295A (en) Amycolatopsis strain for producing vanillin with suppressed formation of vanillic acid
Suzuki et al. Differential expression in Escherichia coli of the Vibrio sp. strain ABE-1 icdI and icdII genes encoding structurally different isocitrate dehydrogenase isozymes
Kirimura et al. Cloning and expression of Aspergillus niger icdA gene encoding mitochondrial NADP+-specific isocitrate dehydrogenase
van Beilen Alkane oxidation by Pseudomonas oleovorans: genes and proteins
EP0385451B1 (en) Cytochrome C gene derived from hydrogen bacterium
US5866391A (en) Aspergillus porphobilinogen synthases and nucleic acids encoding same
Tani et al. Two acyl-CoA dehydrogenases of Acinetobacter sp. strain M-1 that uses very long-chain n-alkanes
EP1614745B1 (en) Process for producing glucose dehydrogenase
Snoep et al. Catabolism of Branched-Chain

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 08981097

Country of ref document: US

AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE ES FI FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase in:

Ref country code: CA

Ref document number: 2262418

Kind code of ref document: A

Format of ref document f/p: F

Ref document number: 2262418

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 334294

Country of ref document: NZ

WWE Wipo information: entry into national phase

Ref document number: 1997934442

Country of ref document: EP

NENP Non-entry into the national phase in:

Ref country code: JP

Ref document number: 1998510286

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1997934442

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1997934442

Country of ref document: EP
