
WO2018127544A1 - USE OF RADICAL S-ADENOSYL METHIONINE (SAM) ENZYMES FOR INTRODUCING α-KETO-β3-AMINO ACIDS INTO (POLY)PEPTIDES - Google Patents

Publication number
WO2018127544A1
Authority
WO
WIPO (PCT)
Prior art keywords
amino acid
seq
group
rsam
acid sequence
Prior art date
Application number
PCT/EP2018/050225
Other languages
French (fr)
Inventor
Jörn PIEL
Brandon MORINAKA
Original Assignee
Eth Zurich
Priority date
Filing date
Publication date
Application filed by ETH Zurich
Publication of WO2018127544A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/52: Genes encoding for enzymes or proenzymes
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y: ENZYMES
    • C12Y205/00: Transferases transferring alkyl or aryl groups, other than methyl groups (2.5)
    • C12Y205/01: Transferases transferring alkyl or aryl groups, other than methyl groups (2.5) transferring alkyl or aryl groups, other than methyl groups (2.5.1)
    • C12Y205/01006: Methionine adenosyltransferase (2.5.1.6), i.e. adenosylmethionine synthetase

Definitions

  • the present invention relates to the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. Furthermore, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising said XYG motif using a radical S-adenosyl methionine (rSAM) enzyme.
  • rSAM radical S-adenosyl methionine
  • β-amino acids are widely distributed in nature and are present in natural product-derived drugs such as penicillin, taxol and cocaine.
  • the installation of β-amino acids in peptides offers, e.g., greater stability (for example against protease-mediated degradation) and can confer structural features distinct from all L-amino acid-containing peptides or natural products, resulting in unique functions and bioactivities.
  • Known methods to incorporate β-amino acids into peptides generally rely on total synthesis, for example, by condensation of monomers in solid-phase synthesis.
  • a second approach relies on in vitro translation, which usually suffers from low incorporation efficiency.
  • ribosomally synthesized peptides and proteins are composed of α-amino acids.
  • No post-translational modifying enzymes are known that incorporate β-amino acids into ribosomal products.
  • Biosynthetic routes to β-amino acid-containing peptides typically utilize non-ribosomal peptide synthetases (NRPS) that act on free amino acids or peptide residues.
  • NRPS non-ribosomal peptide synthetases
  • These enzymes are very large multimodular enzymes and are difficult to manipulate for bioengineering or biotechnological applications.
  • the loading of amino acids onto NRPS usually occurs specifically for certain amino acids.
  • the NRPS-type machinery is limited to small peptides with typically less than 15 residues. Therefore, incorporation of β-amino acids into proteins by NRPSs is not possible.
  • the above objective is solved by the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine.
  • radical S-adenosyl methionine (rSAM) enzymes have the capacity to incorporate different types of α-keto-β3-amino acids into diverse (poly)peptide substrates that comprise one or more of amino acid motif XYG.
  • rSAM enzymes are monomeric enzymes of up to 50 kDa and can be overexpressed in high yields in many cells, e.g. in E. coli (see below).
  • peptides generated by NRPSs require huge enzyme complexes and are typically quite specific to result in either a single product or a few closely related analogs only.
  • Preferred rSAM enzymes for use in the present invention were identified by their surrogate substrate Nif11 precursor peptides within ribosomally synthesized and post-translationally modified peptide (RiPP) natural product gene clusters.
  • Gene clusters containing NHLP- or Nif11-type precursors are considered a new natural product family (see Freeman et al. Science 338, 387-90 (2012) and Haft et al. BMC Biology 8, 70 (2010)) termed "proteusins".
  • proteusin biosynthetic gene clusters are widespread in bacteria, their biosynthetic end products and functions are currently unknown with the exception of the cytotoxic pore-forming polytheonamides (see Hamada et al. Tetrahedron Lett.
  • α-keto-β3-amino acid-comprising (poly)peptide products can conveniently be generated according to the present invention by in vivo co-expression of a (poly)peptide substrate with the rSAM enzymes in bacteria and also other organisms, e.g. yeast, plant cells, mammalian cells or insect cells.
  • the rSAM enzymes described herein also introduce a keto functionality that is not present in proteinogenic amino acids.
  • the introduction of a keto functionality into ribosomally synthesized (poly)peptides is of interest, e.g.
  • non-selective chemical oxidation such as sodium periodate to oxidize all side chains of serine
  • introduction of non-canonical amino acids by an unnatural amino acid mutagenesis strategy based on recoding of stop codons.
  • the proteusin cluster, e.g. from Pleurocapsa sp. PCC 7319, containing genes for two substrates (e.g. plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117)), an rSAM SPASM protein (e.g. plpX (SEQ ID NOs: 118 and 119)), and optionally an associated protein (e.g. plpY (SEQ ID NOs: 120 and 121)), can be cloned into expression vectors for expression in E. coli.
  • the associated protein e.g.
  • plpY is only optional, e.g. to increase the enzymatic efficiency of the rSAM and is not required for practicing the present invention.
  • the substrate peptides e.g. plpA2 and plpA3
  • the rSAM enzyme and the optional associated protein e.g. plpX and plpY
  • the substrate peptide genes can be individually transformed with the rSAM enzyme and the optional associated protein (e.g. plpX and plpY), e.g. into E. coli, and protein overexpression can be carried out in standard medium.
  • purification e.g. Ni-affinity purification of the N-terminally hexahistidine-tagged (NHis6) peptides, and in vitro cleavage of leader, e.g. with a protease
  • the core (poly)peptide comprising the α-keto-β3-amino acid(s) can be analyzed, e.g. by MALDI-MS, and optionally the α-keto-β-amino acid(s) can be reduced to the corresponding β-amino acid(s) using standard chemical methods, e.g. using sodium borohydride.
  • the rSAM enzyme for use in the present invention excises a structure from the XYG motif that has a mass of C8H9NO, as determined by high resolution mass spectrometry, and is attributed to the tyrosine residue.
  • the structures of the transformed substrates are products containing the α-keto-β3-amino acid.
  • the XYG motif is conserved among a wide range of precursors, e.g.
  • precursors carrying more than one XYG motif can be converted to products with several α-keto-β3-amino acids after expression of the rSAM enzyme in E. coli.
  • β-amino acid refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has its amino group bonded to the β-carbon rather than the α-carbon.
  • α-keto-β3-amino acid refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has a keto group at the α-carbon and its amino group bonded to the β3-carbon rather than the α-carbon.
  • polypeptide as used herein, is meant to encompass peptides, polypeptides, oligopeptides and proteins that comprise two or more amino acids linked covalently through peptide bonds. The term does not refer to a specific length of the product.
  • the term (poly)peptide includes (poly)peptides with post-translational modifications, for example, glycosylations, acetylations, phosphorylations and the like, as well as (poly)peptides comprising non-natural or non-conventional amino acids and functional derivatives as described below.
  • non-natural or non-conventional amino acid refers to naturally occurring or naturally not occurring unnatural amino acids or chemical amino acid analogues, e.g. D-amino acids, α,α-disubstituted amino acids, N-alkyl amino acids, homo-amino acids, dehydroamino acids, aromatic amino acids (other than phenylalanine, tyrosine and tryptophan), and ortho-, meta- or para-aminobenzoic acid.
  • Non-conventional amino acids also include compounds which have an amine and carboxyl functional group separated in a 1,3 or larger substitution pattern, such as β-alanine, γ-aminobutyric acid, Freidinger lactam, the bicyclic dipeptide (BTD), aminomethyl benzoic acid and others well known in the art.
  • BTD bicyclic dipeptide
  • Statine-like isosteres, hydroxyethylene isosteres, reduced amide bond isosteres, thioamide isosteres, urea isosteres, carbamate isosteres, thioether isosteres, vinyl isosteres and other amide bond isosteres known to the art may also be used.
  • a non-limiting list of non-conventional amino acids which may be comprised in the (poly)peptide and their standard abbreviations (in brackets) is as follows: α-aminobutyric acid (Abu), L-N-methylalanine (Nmala), α-amino-α-methylbutyrate (Mgabu), L-N-methylarginine (Nmarg), aminocyclopropane carboxylate (Cpro), L-N-methylasparagine (Nmasn), L-N-methylaspartic acid (Nmasp), aminoisobutyric acid (Aib), L-N-methylcysteine (Nmcys), aminonorbornyl carboxylate (Norb), L-N-methylglutamine (Nmgln), L-N-methylglutamic acid (Nmglu), cyclohexylalanine (Chexa), L-N-methylhistidine (Nmhis), cyclopentylalanine
  • Nmaabu D-α-methylleucine (Dmleu), α-naphthylalanine (Anap), D-α-methyllysine (Dmlys), N-benzylglycine (Nphe), D-α-methylmethionine (Dmmet), N-(2-carbamylethyl)glycine (Ngln), D-α-methylornithine (Dmorn), N-(carbamylmethyl)glycine (Nasn), D-α-methylphenylalanine (Dmphe), N-(2-carboxyethyl)glycine (Nglu), D-α-methylproline (Dmpro), N-(carboxymethyl)glycine (Nasp), D-α-methylserine (Dmser), N-cyclobutylglycine (Ncbut), D-α-methylthreonine (Dmthr), N-cycloheptylglycine (
  • the rSAM enzyme for use in the present invention is a peptide radical SAM maturase, preferably a Nif11-class peptide radical SAM maturase 3.
  • the Nif11-class peptide radical SAM maturase 3 belongs to the conserved protein domain family rSAM_nif11_3 (nif11-class peptide radical SAM maturase 3; IPR026482) or the rad_SAM_trio family (radical SAM GDL-associated; IPR023820) as defined by the National Center for Biotechnology Information (NCBI).
  • rSAM_nif11_3 are radical SAM enzymes that often occur co-clustered together with Nif11-related ribosomal natural product (RNP) precursors described by TIGRFAMs model TIGR03798.
  • RNP ribosomal natural product
  • rad_SAM_trio are radical SAM enzymes that often occur co-clustered together with DUF1843-domain RNP precursors carrying a YX10GDL motif and described by Pfam model PF08898 (see Haft et al. Nucleic Acids Res. 31, 371-3 (2003); Haft and Basu, J. Bacteriol. 193, 2745-2755 (2011); NCBI database on
  • the rSAM enzyme for use in the present invention comprises (A) an amino acid sequence according to Formula (I) (SEQ ID NO: 1) or (II) (SEQ ID NO: 2)
  • X1-X20 and Z1-Z20 each denote amino acids
  • X1 is selected from the group consisting of Y, H, F and W, preferably Y and H;
  • X2 is selected from the group consisting of Y, R and H;
  • X3 is selected from the group consisting of R, K and Q, preferably R;
  • X4 is selected from the group consisting of I, T and V, preferably I and T;
  • X5 is selected from the group consisting of R, S and K, preferably R and S;
  • X6 is selected from the group consisting of H, Y, F and W, preferably H and Y;
  • X7 is selected from the group consisting of A and S, preferably A;
  • X8 is selected from the group consisting of V, I and L, preferably V;
  • X9 is selected from the group consisting of W, Y and F, preferably W;
  • X10 is selected from the group consisting of E, Q, D and K, preferably E;
  • X11 is selected from the group consisting of I, L, V and M, preferably I and L;
  • X12 is selected from the group consisting of T and S, preferably T;
  • X13 is selected from the group consisting of L, M, I and V, preferably L;
  • X14 is selected from the group consisting of K, R, E and Q, preferably K;
  • X15 is C;
  • X16 is selected from the group consisting of N and D, preferably N;
  • X17 is selected from the group consisting of L, M, I and V, preferably L;
  • X18 is selected from the group consisting of A and S, preferably A;
  • X20 is selected from the group consisting of S, Q, E and K, preferably S and Q;
  • Z1 is selected from the group consisting of T, D, T, E and N, preferably T and D;
  • Z2 is selected from the group consisting of R, P, N and A, preferably R, P and A;
  • Z3 is selected from the group consisting of R, Q, K and L, preferably R;
  • Z5 is selected from the group consisting of A and S, preferably A;
  • Z6 is selected from the group consisting of R, K and Q, preferably R;
  • Z7 is selected from the group consisting of Y, F, H and W, preferably Y;
  • Z8 is selected from the group consisting of L, M, I and V, preferably L;
  • Z9 is selected from the group consisting of F, H, S and Y, preferably H, F and S;
  • Z10 is selected from the group consisting of D, E and A, preferably D and E;
  • Z11 is selected from the group consisting of D, S and T, preferably T;
  • Z12 is selected from the group consisting of D, E and N, preferably D;
  • Z13 is selected from the group consisting of Y, F, L and M, preferably Y, F and L;
  • Z14 is selected from the group consisting of K, Q, R and E, preferably Q and K;
  • Z15 is selected from the group consisting of R, K and Q, preferably R;
  • Z16 is selected from the group consisting of Y, F and W, preferably Y and F;
  • Z17 is selected from the group consisting of V, I and L, preferably V;
  • Z19 is selected from the group consisting of V, I and L, preferably V;
  • Z20 is selected from the group consisting of H and Y, preferably H; or
  • (C) a functional fragment and/or functional derivative of (A) or (B), preferably a functional fragment of at least 10 amino acids, more preferably at least 15 amino acids of (A) or (B).
  • the percentage identity of related amino acid molecules can be determined with the assistance of known methods. In general, special computer programs are employed that use algorithms adapted to accommodate the specific needs of this task. Preferred methods for determining identity begin with the generation of the largest degree of identity among the sequences to be compared. Preferred computer programs for determining the identity among two amino acid sequences comprise, but are not limited to, TBLASTN, BLASTP, BLASTX, TBLASTX (Altschul et al., J. Mol.
  • the BLAST programs can be obtained from the National Center for Biotechnology Information (NCBI) and from other sources (BLAST handbook, Altschul et al., NCB NLM NIH Bethesda, MD 20894).
  • NCBI National Center for Biotechnology Information
  • the ClustalW program can be obtained from
  • the term "functional derivative" of a (poly)peptide of the present invention is meant to include any (poly)peptide or fragment thereof that has been chemically or genetically modified in its amino acid sequence, e.g. by addition, substitution and/or deletion of amino acid residue(s) and/or has been chemically modified in at least one of its atoms and/or functional chemical groups, e.g. by additions, deletions, rearrangement, oxidation, reduction, etc. as long as the derivative still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
  • a "functional fragment" of the invention is one that forms part of a polypeptide or derivative of the invention and still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
  • the amino acid sequences of Formula (I) and (II) are based on the sequences disclosed by TIGRFAMs TIGR04103 (Formula (I) above) and TIGR03913 (Formula (II) above) with the specific preferred amino acids defined for X1-X20 and Z1-Z20.
  • the rSAM enzyme for use in the present invention comprises at least one motif selected from the group consisting of
  • motif CXXXCXXC (SEQ ID NO: 3), wherein X is any natural amino acid and wherein the motif CXXXCXXC (SEQ ID NO: 3) is preferably comprised in an N-terminal radical SAM domain;
  • motif CX9-15GX4C (SEQ ID NO: 6) reads on a motif consisting of amino acid C, 9 to 15 natural amino acids, amino acid G, 4 natural amino acids and amino acid C.
  • the rSAM enzyme for use in the present invention comprises
  • rSAM enzyme further comprises
  • the rSAM enzyme for use in the present invention comprises
  • the rSAM enzyme for use in the present invention comprises an amino acid sequence selected from the group of
  • sequences listed in any of SEQ ID NOs: 12 to 54 preferably SEQ ID NOs: 39 and 40, or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 12 to 54;
  • amino acid sequence having an amino acid sequence identity of at least 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 55 to 113.
  • the present invention is directed to a use of a recombinant vector comprising a nucleic acid encoding an rSAM enzyme as defined above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a viral or episomal vector, more preferably a vector selected from the group consisting of lentivirus vector, adenovirus vector, baculovirus vector, bacterial vector and yeast vector.
  • the viral vector is a lentivirus vector (see for example System Biosciences, Mountain View, CA, USA), adenovirus vector (see for example ViraPower Adenoviral Expression System, Life Technologies, Carlsbad, CA, USA), baculovirus vector (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.), bacterial vector (see for example Novagen, Darmstadt, Germany) or yeast vector (see for example ATCC Manassas, Virginia).
  • Vector construction including the operable linkage of a coding sequence with a promoter and other expression control sequences, is within the ordinary skill in the art.
  • a host cell expressing an rSAM enzyme as defined above, preferably comprising a recombinant vector as defined above, in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a host cell selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae cells (see for example Methods in Enzymology, 350, 248, 2002), and Pichia pastoris cells (see for example Pichia Expression Kit Instruction Manual, Invitrogen Corporation, Carlsbad, Calif.); bacterial cells, preferably E. coli cells, preferably BL21(DE3), K-12 and derivatives (see for example Applied Microbiology and Biotechnology, 72, 211, 2006), and Bacillus subtilis cells, preferably 1012 wild type, 168 Marburg or WB800N (see for example Westers et al., (2004) Mol. Cell. Res. Volume 1694, Issues 1-3 P:299-310); plant cells, preferably Nicotiana tabacum, and Physcomitrella patens (see e.g. Lau and Sun, Biotechnol Adv.
  • NIH-3T3 mammalian cells see for example Sambrook and Russell, 2001
  • insect cells preferably sf9 insect cells (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.).
  • rSAM-associated protein plpY (SEQ ID NO: 121) is optional but can significantly contribute to the efficiency of rSAM enzyme activity in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG.
  • the present invention is also directed to an rSAM-associated protein comprising an amino acid sequence selected from the group consisting of (a) SEQ ID NO: 121, (b) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with SEQ ID NO: 121, and (c) a functional fragment and/or functional derivative of (a) or (b), preferably a functional fragment of at least 30 amino acids, more preferably at least 45 amino acids of (a) or (b).
  • the rSAM-associated protein is preferably for use in combination with an rSAM enzyme as described above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG. More preferably, the rSAM-associated protein is expressed by a recombinant vector and/or host cell also expressing the rSAM enzyme as described above.
  • the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising the steps of:
  • polypeptide substrate in step (iii), preferably both, optionally together with the rSAM-associated protein of optional step (ii), of the above method are provided in the form of a host cell, more preferably are all co-expressed in the host cell as defined above.
  • the host cell for use in step (i) of the above method is an E. coli host cell.
  • step (iv) is followed by step (v), wherein the keto-functionality resulting from step (iv) is reduced chemically, preferably by sodium borohydride, or is converted to an imine, preferably the methoxyamine.
  • E. coli expressing an rSAM enzyme as defined above is cultured in a rich medium such as TB medium, LB or YT medium.
  • a culture in the above medium, preferably in an Erlenmeyer flask or ultra-yield flask, is inoculated with an overnight culture at a concentration of preferably about 1:100.
  • the culture is grown at about 37°C and shaken at, e.g. about 250 RPM until, e.g. an OD600 of about 1.2-2.0.
  • the culture is cooled, e.g. on ice, and induced with, e.g. IPTG (preferably about 1 mM final concentration).
  • the culture is shaken at, e.g.
  • the cells are collected by centrifugation, lysed, and the substrate subjected to purification, preferably Ni-affinity purification.
  • the product(s) are verified, e.g. by mass spectrometry of the full length or digested precursors, e.g. NHis-precursors.
  • the keto-functionality may be reduced (e.g. by sodium borohydride) or converted to the imine, preferably the methoxyamine.
  • the present invention is directed to isolated and purified nucleic acids encoding the (poly)peptides for use in the present invention.
  • Fig. 1 a) plp gene cluster encoding precursors (plpA1, A2, and A3), rSAM epimerase (plpD), rSAM excision enzyme (plpX) and associated protein (plpY). b) Protein sequences for core peptides of precursors PlpA2 (SEQ ID NO: 122), PlpA3 (SEQ ID NO: 123), PlpA3-9 (SEQ ID NO: 124) and PcpA (SEQ ID NO: 125).
  • Fig. 3 MS2 spectra results for 1 (SEQ ID NO: 122). ⁇ Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
  • Fig. 4 MS2 spectra for 2 (SEQ ID NO: 122). ⁇ Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
  • Fig. 5 MS2 spectra for 3 (SEQ ID NO: 126). ⁇ Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
  • Fig. 6 MS2 spectra for 4 (SEQ ID NO: 126). ⁇ Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
  • Fig. 7 Sodium borohydride reduction of a mixture of 1 (SEQ ID NO: 122) and 2 (SEQ ID NO:
  • Fig. 8 HMBC spectra for product 5 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β-amino acid.
  • Fig. 9 HMBC spectra for product 6 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β-amino acid.
  • Fig. 10 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with methionine.
  • Fig. 11 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with methionine.
  • Fig. 12 Reaction catalyzed by PlpX.
  • Fig. 13 Results for all PlpA3-Fx mutants. Shown is the peptide fragment (SEQ ID NOs: 128 to 143) affected by mutation and the respective detection of conversion.
  • the gene was co-expressed with either of the two precursor genes plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117) located upstream.
  • the translated precursors contain, in addition to an N-terminal leader region of the Nif11 family, predicted core regions of 25 and 23 aa, respectively (Fig. 1b).
  • the Nif11 precursor genes (plpA1 and plpA2) were individually cloned with N-terminal His6-tags and a Factor Xa site at the interface of the leader and core and inserted into pACYCDuet-1.
  • the rSAM gene (plpX) was cloned into MCSII of pRSFDuet-1 and constructs were transformed into E. coli BL21(DE3) for protein expression (Fig. 2). Under these conditions, transformation of the precursors was not observed. More detailed analysis of the plp cluster (Fig. 1a) revealed a small conserved gene, plpY (SEQ ID NOs: 120 and 121), located downstream of plpX, an architecture also preserved in other clusters with plpX homologs.
  • HMBC Heteronuclear Multiple Bond Correlation
  • PlpA3-Fx was used to investigate the origin of the β-amino acid moiety by feeding of various 13C-labeled amino acids to E. coli expression cultures.
  • labels of [1-13C]Met, [U-13C]Met, [1-13C]Tyr, and [U-13C]Tyr were detected by MS in the peptide products.
  • NMR-based characterization of the purified core fragment 6 revealed enhancements of carbon signals that were consistent with Met remaining fully intact (Fig. 10), while only C1 of Tyr is retained and accounts for the amide carbonyl in the product (Fig. 11).
  • PCC 7327 was co-expressed with its cognate excisase gene partners pcpX and pcpY. The excision reaction was detected at two of the three YG motifs. With a translated core containing 64 aa and three predicted YG motifs, the pcp pathway generates a giant natural product that may exceed the size of all specialized metabolites reported to date.
  • Factor Xa protease was purchased from Merck (USA). Restriction enzymes and GluC were purchased from New England Biolabs (USA). Thermo Scientific (USA) Phusion® DNA polymerase and T4 DNA ligase were used for all PCRs and ligations, respectively. DNA primers were obtained from Microsynth (Switzerland) or Thermo Scientific (USA). Antibiotics (chloramphenicol for pACYCDuet-1 and kanamycin for pRSFDuet-1) were used at a concentration of 25 mg/mL in solid and liquid medium.
  • Expression vectors containing NHis6-precursor genes were constructed as follows. Mini-preps derived from previously reported plasmids (plpA1-Fx, plpA2-Fx, and plpA3-Fx in pET-28b) (see Morinaka, B. I. et al. Angew. Chem. Int. Ed. 53, 8503-8507 (2014)) containing NHis6-precursor genes containing a Factor Xa site (IDGR) at the interface of leader and core peptide were digested with NcoI and EcoRI.
  • IDGR Factor Xa site
  • the precursor peptide inserts were gel-purified, ligated into MCSI of pACYCDuet-1 to give pA1FxACYC, pA2FxACYC, and pA3FxACYC. These precursor constructs were sequence verified. Constructs for the excision enzyme (PlpX) and associated protein (PlpY) were constructed as follows. The gene for the excision enzyme was amplified by PCR (primers PlpX_F,
  • ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA (SEQ ID NO: 145)) and products were gel-purified, digested with NdeI and XhoI and ligated into MCSII of pRSFDuet-1 to give plasmid pXRSF.
  • the gene for the associated protein was amplified by PCR (PlpY_F, GCGAACTCATGAACTCTAATCAAATACCAAATAAA (SEQ ID NO: 146) and PlpY_R, GCGCAGCTGTTATGTCAGAAAATTGCT (SEQ ID NO: 147)), gel-purified, digested with BspHI and SalI, ligated into MCSI of pXRSF (cut with NcoI and SalI) to give pXYRSF, and the insert confirmed by sequencing.
  • Precursor constructs were transformed and expressed in E. coli BL21(DE3) cells alone and with pXRSF or pXYRSF. Proteins containing a Factor Xa cleavage site are denoted with an 'Fx'.
  • TB medium (30 mL) containing appropriate antibiotics was inoculated with 300 µL overnight culture grown in LB. The cells were grown at 37°C at 250 rpm until an OD600 of ~1.6-2.0, cooled on ice for 30 min, induced with IPTG (1 mM final concentration), then shaken for 24 hours (250 rpm, 16 °C). The cells were collected by centrifugation (3,220 x g, 10 min). Proteins were purified using Ni-NTA resin (Macherey-Nagel (Germany)) according to the manufacturer's protocol. 10% glycerol was added to the lysis, wash and elution buffers.
  • Proteins were adsorbed using 0.5 mL Ni-NTA resin, and eluted with 2.5 mL (250 mM imidazole, 50 mM sodium phosphate, 300 mM NaCl, and 10% (v/v) glycerol, pH 8). Elution fractions were desalted on a PD-10 column, digested with Factor Xa or trypsin, and analyzed by LC-MS and MALDI.
  • LC-MS conditions: column: Kinetex C18-XB, 2.6 µm, 150 x 4.6 mm; flow rate: 1.0 mL/min; mobile phase/gradient: 95:5 A/B for 5 minutes ramped to 40:60 A/B over 30 minutes.
  • SEQ ID NO: 12 (70% TIGR03913, >WP_052261552)
  • SEQ ID NO: 15 (70% TIGR03913, >OCW56221)
  • SEQ ID NO: 19 (70% TIGR03913, >WP_050043969)
  • SEQ ID NO: 20 (70% TIGR03913, >WP_020539729)
  • SEQ ID NO: 24 (70% TIGR03913, >WP_062765523)
  • SEQ ID NO: 26 (70% TIGR04103, >WP_020737613)
  • SEQ ID NO: 28 (70% TIGR04103, >KIG18351)
  • SEQ ID NO: 29 (70% TIGR04103, >WP_006974883)
  • SEQ ID NO: 30 (70% TIGR04103, >WP_012234464)
  • SEQ ID NO: 32 (70% TIGR04103, >WP_010607032)
  • SEQ ID NO: 33 (70% TIGR04103, >WP_010607027)
  • SEQ ID NO: 34 (70% TIGR04103, >WP_054014533)
  • SEQ ID NO: 38 (70% TIGR04103, >SEA53645)
  • SEQ ID NO: 45 (70% TIGR04103, >WP_015929579)
  • SEQ ID NO: 46 (70% TIGR04103, >SFE54945)
  • SEQ ID NO: 48 (70% TIGR04103, >SFE67100)
  • SEQ ID NO: 50 (70% TIGR04103, >WP_006972642)
  • SEQ ID NO: 52 (70% TIGR04103, >WP_006969608)
  • SEQ ID NO: 54 (70% TIGR04103, >AKV02060)
  • SEQ ID NO: 56 (80% TIGR03913, >WP_052261552)
  • SEQ ID NO: 70 (80% TIGR03913, >WP_006753568)
  • SEQ ID NO: 72 (80% TIGR03913, >WP_062765523)
  • SEQ ID NO: 82 (80% TIGR04103, >WP_012234464)
  • SEQ ID NO: 83 (80% TIGR04103, >WP_002625456)
  • SEQ ID NO: 90 (80% TIGR04103, >WP_002708735) MSTQLRQRRTYAVWEITLKCNLACQHCGSRAGEARQDELSTAEALDLVQQMAEAGIGEVTLIGGEAFLRKD

Abstract

The present invention relates to the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. Furthermore, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising said XYG motif using a radical S-adenosyl methionine (rSAM) enzyme.

Description

USE OF RADICAL S-ADENOSYL METHIONINE (SAM) ENZYMES FOR INTRODUCING α-KETO-β3-AMINO ACIDS INTO (POLY)PEPTIDES
The present invention relates to the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. Furthermore, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising said XYG motif using a radical S-adenosyl methionine (rSAM) enzyme.
β-amino acids are widely distributed in nature and are present in natural product-derived drugs such as penicillin, taxol and cocaine. The installation of β-amino acids in peptides offers, e.g., greater stability (for example against protease-mediated degradation) and can confer structural features distinct from all L-amino acid-containing peptides or natural products, resulting in unique functions and bioactivities. Known methods to incorporate β-amino acids into peptides generally rely on total synthesis, for example, by condensation of monomers in solid-phase synthesis. A second approach relies on in vitro translation, which usually suffers from low incorporation efficiency.
Substantially all known ribosomally synthesized peptides and proteins are composed of α-amino acids. No post-translational modifying enzymes are known that incorporate β-amino acids into ribosomal products. Biosynthetic routes to β-amino acid-containing peptides typically utilize non-ribosomal peptide synthetases (NRPS) that act on free amino acids or peptide residues. These are very large multimodular enzymes that are difficult to manipulate for bioengineering or biotechnological applications. For example, the loading of amino acids onto NRPS usually occurs specifically for certain amino acids. Furthermore, the NRPS-type machinery is limited to small peptides with typically fewer than 15 residues. Therefore, incorporation of β-amino acids into proteins by NRPSs is not possible.
Czekster et al. (Czekster et al., JACS 2016, 138, 5194-5197) report that E. coli elongation factor Tu (EF-Tu) and phenylalanyl-tRNA synthetase collaborate with erythromycin-resistant E. coli ribosomes to incorporate β3-Phe analogs into full-length dihydrofolate reductase (DHFR) in vivo. However, this system is rather complex and limited to specific β-amino acid substrates as well as to target proteins that can be modified with the β-amino acid. US 2003/113882 A1 discloses a method for producing enantiomerically pure β-amino acids from α-amino acids. For this conversion a lysine 2,3-aminomutase catalyst is required. However, US 2003/113882 A1 is silent on the incorporation of β3-amino acids into peptides or proteins.
In summary, there is a need for methods for the incorporation of diverse β3-amino acids into peptides or proteins. It is the objective of the present invention to provide new means and methods for introducing β3-amino acids into peptides and polypeptides.
In a first aspect, the above objective is solved by the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine.
It was surprisingly found that radical S-adenosyl methionine (rSAM) enzymes have the capacity to incorporate different types of α-keto-β3-amino acids into diverse (poly)peptide substrates that comprise one or more of amino acid motif XYG. rSAM enzymes are monomeric enzymes of up to 50 kDa and can be overexpressed in high yields in many cells, e.g. in E. coli (see below). In contrast, peptides generated by NRPSs require huge enzyme complexes, which are typically quite specific and yield either a single product or only a few closely related analogs.
Preferred rSAM enzymes for use in the present invention were identified by their surrogate substrate Nif11 precursor peptides within ribosomally synthesized and post-translationally modified peptide (RiPP) natural product gene clusters. Gene clusters containing NHLP- or Nif11-type precursors are considered a new natural product family (see Freeman et al. Science 338, 387-90 (2012) and Haft et al. BMC Biology 8, 70 (2010)) termed "proteusins". Although proteusin biosynthetic gene clusters are widespread in bacteria, their biosynthetic end products and functions are currently unknown with the exception of the cytotoxic pore-forming polytheonamides (see Hamada et al. Tetrahedron Lett. 35, 719-720 (1994) and Freeman et al. Science 338, 387-90 (2012)). In view of previous in silico studies (see Haft et al. J. Bacteriol. 193, 2745-2755 (2011)) and a search of genomic databases in cyanobacteria, a proteusin pathway containing one or two Nif11 precursor peptides (containing an XYG motif in the core peptide) as the substrate(s), an rSAM enzyme annotated as SPASM domain, and a small associated protein were surprisingly identified.
For example, α-keto-β3-amino acid-comprising (poly)peptide products can conveniently be generated according to the present invention by in vivo co-expression of a (poly)peptide substrate with the rSAM enzymes in bacteria and also other organisms, e.g. yeast, plant cells, mammalian cells or insect cells. In addition to introducing β3-amino acid topology into the backbone of ribosomal (poly)peptide products, the rSAM enzymes described herein also introduce a keto functionality that is not present in proteinogenic amino acids. The introduction of a keto functionality into ribosomally synthesized (poly)peptides is of interest, e.g. for site-selective modification in a wide range of applications, and is usually carried out in the prior art by (i) non-selective chemical oxidation, such as with sodium periodate to oxidize all side chains of serine, or by (ii) introduction of non-canonical amino acids by an unnatural amino acid mutagenesis strategy based on recoding of stop codons.
In a preferred example for the use of an rSAM according to the present invention, the proteusin cluster, e.g. from Pleurocapsa sp. PCC 7319, containing genes for two substrates (e.g. plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117)), an rSAM SPASM protein (e.g. plpX (SEQ ID NOs: 118 and 119)), and optionally an associated protein (e.g. plpY (SEQ ID NOs: 120 and 121)) can be cloned into expression vectors for expression in E. coli. The associated protein, e.g. plpY, is only optional, e.g. to increase the enzymatic efficiency of the rSAM, and is not required for practicing the present invention. The substrate peptides (e.g. plpA2 and plpA3) can be cloned into an expression vector, e.g. with an N-terminal His6 tag for purification by affinity chromatography. The rSAM enzyme and the optional associated protein (e.g. plpX and plpY) can be cloned into a second expression vector for convenient co-transformation, e.g. with a substrate-containing vector. The substrate peptide genes can be individually transformed with the rSAM enzyme and the optional associated protein (e.g. plpX and plpY), e.g. into E. coli, and protein overexpression can be carried out in standard medium. Following purification, e.g. Ni-affinity purification of the N-terminally hexahistidine-tagged (NHis6) peptides, and in vitro cleavage of the leader, e.g. with a protease, the core (poly)peptide comprising the α-keto-β3-amino acid(s) can be analyzed, e.g. by MALDI-MS, and optionally the α-keto-β-amino acid(s) can be reduced to the corresponding β-amino acid(s) using standard chemical methods, e.g. using sodium borohydride.
Also, the introduction of α-keto-β3-amino acids into (poly)peptides using the rSAM enzymes according to the present invention can be carried out in vitro.
Without wishing to be bound by theory, it is assumed that the rSAM enzyme for use in the present invention excises a structure from the XYG motif that has a mass of C8H9NO, as determined by high resolution mass spectrometry, and is attributed to the tyrosine residue. The structures of the transformed substrates are products containing the α-keto-β3-amino acid. The XYG motif is conserved among a wide range of precursors, e.g. from related cyanobacterial gene clusters, from bacteria of the genera Thiothrix, Pseudoalteromonas, Frankia, Burkholderia, or Nitrospirillum, and serves as a general tyrosine excision site for placement of α-keto-β3-amino acids. Also, precursors carrying more than one XYG motif can be converted to products with several α-keto-β3-amino acids after expression of the rSAM enzyme in E. coli.
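The motif search and the mass of the excised C8H9NO fragment can be illustrated with a short sketch. It is not part of the disclosed method: the function names, the demo peptide and the restriction of X to the 20 standard residues are assumptions made only for illustration, and the atomic masses are rounded standard monoisotopic values.

```python
import re

# Monoisotopic atomic masses in Da (standard values, rounded to 6 decimals).
MONO = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}

# Elemental composition of the excised fragment named in the text (C8H9NO).
EXCISED = {"C": 8, "H": 9, "N": 1, "O": 1}

# Assumption: X is limited to the 20 standard residues in one-letter code.
# The lookahead lets overlapping motifs be reported as well.
XYG = re.compile(r"(?=([ACDEFGHIKLMNPQRSTVWY]YG))")


def find_xyg(seq):
    """Return (0-based start, motif) for every XYG occurrence in seq."""
    return [(m.start(), m.group(1)) for m in XYG.finditer(seq.upper())]


def excised_mass():
    """Monoisotopic mass of the C8H9NO fragment in Da."""
    return sum(MONO[element] * count for element, count in EXCISED.items())


def expected_mass_loss(seq):
    """Mass loss if every XYG motif were converted (idealized assumption)."""
    return len(find_xyg(seq)) * excised_mass()


if __name__ == "__main__":
    demo = "MAYGKLVYGA"  # hypothetical core peptide, not from the source
    print(find_xyg(demo))
    print(round(expected_mass_loss(demo), 4))
```

Comparing the observed mass difference between unmodified and modified peptide with a multiple of the fragment mass gives a quick estimate of how many motifs were converted, under the stated assumption that no other modification contributes.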
The term β-amino acid, as used herein, refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has its amino group bonded to the β-carbon rather than the α-carbon.
The term α-keto-β3-amino acid, as used herein, refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has a keto group at the α-carbon and its amino group bonded to the β-carbon rather than the α-carbon.
The term (poly)peptide, as used herein, is meant to encompass peptides, polypeptides, oligopeptides and proteins that comprise two or more amino acids linked covalently through peptide bonds. The term does not refer to a specific length of the product. The term (poly)peptide includes (poly)peptides with post-translational modifications, for example, glycosylations, acetylations, phosphorylations and the like, as well as (poly)peptides comprising non-natural or non-conventional amino acids and functional derivatives as described below.
The term non-natural or non-conventional amino acid refers to naturally occurring or naturally not occurring unnatural amino acids or chemical amino acid analogues, e.g. D-amino acids, α,α-disubstituted amino acids, N-alkyl amino acids, homo-amino acids, dehydroamino acids, aromatic amino acids (other than phenylalanine, tyrosine and tryptophan), and ortho-, meta- or para-aminobenzoic acid. Non-conventional amino acids also include compounds which have an amine and carboxyl functional group separated in a 1,3 or larger substitution pattern, such as β-alanine, γ-aminobutyric acid, Freidinger lactam, the bicyclic dipeptide (BTD), aminomethyl benzoic acid and others well known in the art. Statine-like isosteres, hydroxyethylene isosteres, reduced amide bond isosteres, thioamide isosteres, urea isosteres, carbamate isosteres, thioether isosteres, vinyl isosteres and other amide bond isosteres known to the art may also be used. A non-limiting list of non-conventional amino acids which may be comprised in the (poly)peptide and their standard abbreviations (in brackets) is as follows: α-aminobutyric acid (Abu), L-N-methylalanine (Nmala), α-amino-α-methylbutyrate (Mgabu), L-N-methylarginine (Nmarg), aminocyclopropane carboxylate (Cpro), L-N-methylasparagine (Nmasn), L-N-methylaspartic acid (Nmasp), aminoisobutyric acid (Aib), L-N-methylcysteine (Nmcys), aminonorbornyl carboxylate (Norb), L-N-methylglutamine (Nmgln), L-N-methylglutamic acid (Nmglu), cyclohexylalanine (Chexa), L-N-methylhistidine (Nmhis), cyclopentylalanine (Cpen), L-N-methylisoleucine (Nmile), L-N-methylleucine (Nmleu), L-N-methyllysine (Nmlys), L-N-methylmethionine (Nmmet), L-N-methylnorleucine (Nmnle), L-N-methylnorvaline (Nmnva), L-N-methylornithine (Nmorn), L-N-methylphenylalanine (Nmphe), L-N-methylproline (Nmpro), L-N-methylserine (Nmser), L-N-methylthreonine (Nmthr), L-N-methyltryptophan (Nmtrp), D-ornithine (Dorn), L-N-methyltyrosine (Nmtyr), L-N-methylvaline (Nmval), L-N-methylethylglycine (Nmetg), L-N-methyl-t-butylglycine (Nmtbug), L-norleucine (Nle), L-norvaline (Nva), α-methyl-aminoisobutyrate (Maib), α-methyl-γ-aminobutyrate (Mgabu), D-α-methylalanine (Dmala), α-methylcyclohexylalanine (Mchexa), D-α-methylarginine (Dmarg), α-methylcyclopentylalanine (Mcpen), D-α-methylasparagine (Dmasn), α-methyl-α-naphthylalanine (Manap), D-α-methylaspartate (Dmasp), α-methylpenicillamine (Mpen), D-α-methylcysteine (Dmcys), N-(4-aminobutyl)glycine (Nglu), D-α-methylglutamine (Dmgln), N-(2-aminoethyl)glycine (Naeg), D-α-methylhistidine (Dmhis), N-(3-aminopropyl)glycine (Norn), D-α-methylisoleucine (Dmile), N-amino-α-methylbutyrate (Nmaabu), D-α-methylleucine (Dmleu), α-naphthylalanine (Anap), D-α-methyllysine (Dmlys), N-benzylglycine (Nphe), D-α-methylmethionine (Dmmet), N-(2-carbamylethyl)glycine (Ngln), D-α-methylornithine (Dmorn), N-(carbamylmethyl)glycine (Nasn), D-α-methylphenylalanine (Dmphe), N-(2-carboxyethyl)glycine (Nglu), D-α-methylproline (Dmpro), N-(carboxymethyl)glycine (Nasp), D-α-methylserine (Dmser), N-cyclobutylglycine (Ncbut), D-α-methylthreonine (Dmthr), N-cycloheptylglycine (Nchep), D-α-methyltryptophan (Dmtrp), N-cyclohexylglycine (Nchex), D-α-methyltyrosine (Dmty), N-cyclodecylglycine (Ncdec), D-α-methylvaline (Dmval), N-cyclododecylglycine (Ncdod), D-N-methylalanine (Dnmala), N-cyclooctylglycine (Ncoct), D-N-methylarginine (Dnmarg), N-cyclopropylglycine (Ncpro), D-N-methylasparagine (Dnmasn), N-cycloundecylglycine (Ncund), D-N-methylaspartate (Dnmasp), N-(2,2-diphenylethyl)glycine (Nbhm), D-N-methylcysteine (Dnmcys), N-(3,3-diphenylpropyl)glycine (Nbhe), D-N-methylglutamine (Dnmgln), N-(3-guanidinopropyl)glycine (Narg), D-N-methylglutamate (Dnmglu), N-(1-hydroxyethyl)glycine (Ntbx), D-N-methylhistidine (Dnmhis), N-(hydroxyethyl)glycine (Nser), D-N-methylisoleucine (Dnmile), N-(imidazolylethyl)glycine (Nhis), D-N-methylleucine (Dnmleu), N-(3-indolylethyl)glycine (Nhtrp), D-N-methyllysine (Dnmlys), N-methyl-γ-aminobutyrate (Nmgabu), N-methylcyclohexylalanine (Nmchexa), D-N-methylmethionine (Dnmmet), D-N-methylornithine (Dnmorn), N-methylcyclopentylalanine (Nmcpen), N-methylglycine (Nala), D-N-methylphenylalanine (Dnmphe), N-methylaminoisobutyrate (Nmaib), D-N-methylproline (Dnmpro), N-(1-methylpropyl)glycine (Nile), D-N-methylserine (Dnmser), N-(2-methylpropyl)glycine (Nleu), D-N-methylthreonine (Dnmthr), D-N-methyltryptophan (Dnmtrp), N-(1-methylethyl)glycine (Nval), D-N-methyltyrosine (Dnmtyr), N-methyl-α-naphthylalanine (Nmanap), D-N-methylvaline (Dnmval), N-methylpenicillamine (Nmpen), γ-aminobutyric acid (Gabu), N-(p-hydroxyphenyl)glycine (Nhtyr), L-t-butylglycine (Tbug), N-(thiomethyl)glycine (Ncys), L-ethylglycine (Etg), penicillamine (Pen), L-homophenylalanine (Hphe), L-α-methylalanine (Mala), L-α-methylarginine (Marg), L-α-methylasparagine (Masn), L-α-methylaspartate (Masp), L-α-methyl-t-butylglycine (Mtbug), L-α-methylcysteine (Mcys), L-methylethylglycine (Metg), L-α-methylglutamine (Mgln), L-α-methylglutamate (Mglu), L-α-methylhistidine (Mhis), L-α-methylhomophenylalanine (Mhphe), L-α-methylisoleucine (Mile), N-(2-methylthioethyl)glycine (Nmet), L-α-methylleucine (Mleu), L-α-methyllysine (Mlys), L-α-methylmethionine (Mmet), L-α-methylnorleucine (Mnle), L-α-methylnorvaline (Mnva), L-α-methylornithine (Morn), L-α-methylphenylalanine (Mphe), L-α-methylproline (Mpro), L-α-methylserine (Mser), L-α-methylthreonine (Mthr), L-α-methyltryptophan (Mtrp), L-α-methyltyrosine (Mtyr), L-α-methylvaline (Mval), L-N-methylhomophenylalanine (Nmhphe), N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine (Nnbhm), N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine (Nnbhe), 1-carboxy-1-(2,2-diphenylethylamino)cyclopropane (Nmbc), L-O-methyl serine (Omser), L-O-methyl homoserine (Omhser).
In a preferred embodiment, the rSAM enzyme for use in the present invention is a peptide radical SAM maturase, preferably a Nif11-class peptide radical SAM maturase 3. The Nif11-class peptide radical SAM maturase 3 (IPR026482) belongs to the conserved protein domain family rSAM_nif11_3 (Nif11-class peptide radical SAM maturase 3; IPR026482) or the rad_SAM_trio family (radical SAM GDL-associated; IPR023820) as defined by the National Center for Biotechnology Information (NCBI). Members of the rSAM_nif11_3 family are radical SAM enzymes that often occur co-clustered together with Nif11-related ribosomal natural product (RNP) precursors described by TIGRFAMs model TIGR03798. Members of the rad_SAM_trio family are radical SAM enzymes that often occur co-clustered together with DUF1843-domain RNP precursors carrying a YX10GDL motif and described by Pfam model PF08898 (see Haft et al. Nucleic Acids Res. 31, 371-3 (2003); Haft and Basu, J. Bacteriol. 193, 2745-2755 (2011); NCBI database on "rSAM_nif11_3" and "rad_SAM_trio").
In a preferred embodiment, the rSAM enzyme for use in the present invention comprises (A) an amino acid sequence according to Formula (I) (SEQ ID NO: 1) or (II) (SEQ ID NO: 2)
Formula (I): X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20,
Formula (II): Z1-Z2-Z3-Z4-Z5-Z6-Z7-Z8-Z9-Z10-Z11-Z12-Z13-Z14-Z15-Z16-Z17-Z18-Z19-Z20,
wherein X1-X20 and Z1-Z20 each denote amino acids and
X1 is selected from the group consisting of Y, H, F and W, preferably Y and H;
X2 is selected from the group consisting of Y, R and H;
X3 is selected from the group consisting of R, K and Q, preferably R;
X4 is selected from the group consisting of I, T and V, preferably I and T;
X5 is selected from the group consisting of R, S and K, preferably R and S;
X6 is selected from the group consisting of H, Y, F and W, preferably H and Y;
X7 is selected from the group consisting of A and S, preferably A;
X8 is selected from the group consisting of V, I and L, preferably V;
X9 is selected from the group consisting of W, Y and F, preferably W;
X10 is selected from the group consisting of E, Q, D and K, preferably E;
X11 is selected from the group consisting of I, L, V and M, preferably I and L;
X12 is selected from the group consisting of T and S, preferably T;
X13 is selected from the group consisting of L, M, I and V, preferably L;
X14 is selected from the group consisting of K, R, E and Q, preferably K;
X15 is C;
X16 is selected from the group consisting of N and D, preferably N;
X17 is selected from the group consisting of L, M, I and V, preferably L;
X18 is selected from the group consisting of A and S, preferably A;
X20 is selected from the group consisting of S, Q, E and K, preferably S and Q;
Z1 is selected from the group consisting of T, D, T, E and N, preferably T and D;
Z2 is selected from the group consisting of R, P, N and A, preferably R, P and A;
Z3 is selected from the group consisting of R, Q, K and L, preferably R;
Z4 is P;
Z5 is selected from the group consisting of A and S, preferably A;
Z6 is selected from the group consisting of R, K and Q, preferably R;
Z7 is selected from the group consisting of Y, F, H and W, preferably Y;
Z8 is selected from the group consisting of L, M, I and V, preferably L;
Z9 is selected from the group consisting of F, H, S and Y, preferably H, F and S;
Z10 is selected from the group consisting of D, E and A, preferably D and E;
Z11 is selected from the group consisting of D, S and T, preferably T;
Z12 is selected from the group consisting of D, E and N, preferably D;
Z13 is selected from the group consisting of Y, F, L and M, preferably Y, F and L;
Z14 is selected from the group consisting of K, Q, R and E, preferably Q and K;
Z15 is selected from the group consisting of R, K and Q, preferably R;
Z16 is selected from the group consisting of Y, F and W, preferably Y and F;
Z17 is selected from the group consisting of V, I and L, preferably V;
Z19 is selected from the group consisting of V, I and L, preferably V; and
Z20 is selected from the group consisting of H and Y, preferably H; or
(B) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with at least one of the amino acid sequences of Formula (I) or (II), more preferably an amino acid sequence having at least 14, 16, 18 or 19 of the 20 amino acids of Formula (I) or (II); or
(C) a functional fragment and/or functional derivative of (A) or (B), preferably a functional fragment of at least 10 amino acids, more preferably at least 15 amino acids of (A) or (B).
The percentage identity of related amino acid molecules can be determined with the assistance of known methods. In general, special computer programs are employed that use algorithms adapted to accommodate the specific needs of this task. Preferred methods for determining identity begin with the generation of the largest degree of identity among the sequences to be compared. Preferred computer programs for determining the identity among two amino acid sequences comprise, but are not limited to, TBLASTN, BLASTP, BLASTX, TBLASTX (Altschul et al., J. Mol. Biol., 215, 403-410, 1990), or ClustalW (Larkin MA et al., Bioinformatics, 23, 2947-2948, 2007). The BLAST programs can be obtained from the National Center for Biotechnology Information (NCBI) and from other sources (BLAST handbook, Altschul et al., NCBI NLM NIH Bethesda, MD 20894). The ClustalW program can be obtained from http://www.clustal.org.
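As a rough illustration of what a percentage identity expresses, the following sketch scores two sequences that have already been aligned, for example by one of the programs named above. It does not perform the alignment itself. The function name and the convention used for the denominator (all alignment columns except those where both sequences have a gap) are assumptions; alignment programs may report identity using other conventions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are ignored; every other
    column counts toward the denominator (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information
        columns += 1
        if a == b:
            matches += 1  # identical residues (a gap never equals a residue)

    if columns == 0:
        raise ValueError("alignment contains no scorable columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(percent_identity("ACDE", "ACDF"))
```

A value returned by such a function can then be compared against the thresholds stated in the text (at least 70, 80, 90 or 95 %).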
The term "functional derivative" of a (poly)peptide of the present invention is meant to include any (poly)peptide or fragment thereof that has been chemically or genetically modified in its amino acid sequence, e.g. by addition, substitution and/or deletion of amino acid residue(s) and/or has been chemically modified in at least one of its atoms and/or functional chemical groups, e.g. by additions, deletions, rearrangement, oxidation, reduction, etc. as long as the derivative still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
In this context a "functional fragment" of the invention is one that forms part of a (poly)peptide or derivative of the invention and still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
The amino acid sequences of Formula (I) and (II) are based on the sequences disclosed by TIGRFAMs TIGR04103 (Formula (I) above) and TIGR03913 (Formula (II) above) with the specific preferred amino acids defined for X1-X20 and Z1-Z20.
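The position-wise definition of Formula (I) lends itself to a simple check of how many positions of a candidate 20-residue segment fall within the allowed groups. The sketch below is illustrative only: the function names and the test segment are hypothetical, and position X19, for which no group is given in the listing above, is treated as unconstrained and left unscored, which is an assumption.

```python
# Allowed residues for X1..X20 of Formula (I), taken from the listing above.
# None marks X19, which has no group in the listing (assumed unconstrained).
FORMULA_I = [
    "YHFW", "YRH", "RKQ", "ITV", "RSK", "HYFW", "AS", "VIL", "WYF", "EQDK",
    "ILVM", "TS", "LMIV", "KREQ", "C", "ND", "LMIV", "AS", None, "SQEK",
]


def count_matches(segment):
    """Number of defined Formula (I) positions satisfied by a 20-mer."""
    if len(segment) != len(FORMULA_I):
        raise ValueError("segment must be exactly 20 residues long")
    return sum(
        1
        for residue, allowed in zip(segment.upper(), FORMULA_I)
        if allowed is not None and residue in allowed
    )


def best_window(sequence):
    """Slide a 20-residue window over sequence; return (matches, start).

    Returns None if the sequence is shorter than 20 residues. On ties the
    first (left-most) window is kept.
    """
    width = len(FORMULA_I)
    best = None
    for start in range(len(sequence) - width + 1):
        score = count_matches(sequence[start:start + width])
        if best is None or score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    # Hypothetical segment built from preferred residues; "G" fills X19.
    print(count_matches("YYRIRHAVWEITLKCNLAGS"))
```

The resulting count can be read against the "at least 14, 16, 18 or 19 of the 20 amino acids" criterion, keeping in mind that this sketch scores at most 19 positions. An analogous table could be written for Formula (II).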
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises at least one motif selected from the group consisting of
(i) motif CXXXCXXC (SEQ ID NO: 3), wherein X is any natural amino acid and wherein the motif CXXXCXXC (SEQ ID NO: 3) is preferably comprised in an N-terminal radical SAM domain;
(ii) motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4), wherein X is any natural amino acid;
(iii) motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5), wherein X is any natural amino acid; and
(iv) motif CX9-15GX4C (SEQ ID NO: 6) and CX2CX5CX3CX14-18C (SEQ ID NO: 7), wherein X is any natural amino acid and the integers denote the number of X(s), and wherein the motifs CX9-15GX4C (SEQ ID NO: 6) and CX2CX5CX3CX14-18C (SEQ ID NO: 7) are preferably comprised in a C-terminal SPASM domain.
For example, motif CX9-15GX4C (SEQ ID NO: 6) reads on a motif consisting of amino acid C, 9 to 15 natural amino acids, amino acid G, 4 natural amino acids and amino acid C.
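The motif notation above maps directly onto regular expressions, as the following sketch shows. The helper names are hypothetical, "any natural amino acid" is assumed to mean the 20 standard residues in one-letter code, and the demo sequence is the beginning of the partial sequence shown for SEQ ID NO: 90 in the sequence listing of this document.

```python
import re

# Assumption: "any natural amino acid" = the 20 standard residues.
AA = "[ACDEFGHIKLMNPQRSTVWY]"


def motif_to_regex(motif):
    """Translate a motif of literal residues and X placeholders to a regex."""
    return "".join(AA if ch == "X" else ch for ch in motif)


PATTERNS = {
    "SEQ ID NO: 3": re.compile(motif_to_regex("CXXXCXXC")),
    "SEQ ID NO: 4": re.compile(motif_to_regex("EXTXXCXXXCXXCGXRXXXXRXXEL")),
    "SEQ ID NO: 5": re.compile(motif_to_regex("CXLXCXHCGSRAGXXXXXE")),
    # Ranged motifs: the integers give the number of X positions.
    "SEQ ID NO: 6": re.compile(f"C{AA}{{9,15}}G{AA}{{4}}C"),
    "SEQ ID NO: 7": re.compile(
        f"C{AA}{{2}}C{AA}{{5}}C{AA}{{3}}C{AA}{{14,18}}C"
    ),
}


def scan(sequence):
    """Return {motif name: 0-based start of first hit} for motifs found."""
    hits = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(sequence.upper())
        if match is not None:
            hits[name] = match.start()
    return hits


if __name__ == "__main__":
    fragment = "MSTQLRQRRTYAVWEITLKCNLACQHCGSRAGEARQDELSTAE"
    print(scan(fragment))
```

Because the fragment is truncated, the absence of the SPASM-domain motifs (SEQ ID NOs: 6 and 7) in this demo says nothing about the full-length protein.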
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises
(i) an amino acid sequence according to Formula (I) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (I);
(ii) a motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4), wherein X is any natural amino acid;
and wherein the rSAM enzyme further comprises
(iii) a motif CXXGXXXXXXXXXGXXKXCP (SEQ ID NO: 8) and/or GXCXXCXXXXXCXXXCXXXXXXXXXXXGXNPXCXXR (SEQ ID NO: 9), wherein X is any natural amino acid.
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises
(i) an amino acid sequence according to Formula (II) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (II);
(ii) a motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5), wherein X is any natural amino acid;
and wherein the rSAM enzyme further comprises
(iii) a motif CXAGXXXXXEADGXXKXCPXL (SEQ ID NO: 10) and/or CXXCYYXXXCXXGCXWXXXXLXGXXGXNPXCXXR (SEQ ID NO: 11), wherein X is any natural amino acid.
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises an amino acid sequence selected from the group of
(i) sequences listed in any of SEQ ID NOs: 12 to 54, preferably SEQ ID NOs: 39 and 40, or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 12 to 54; and
(ii) sequences listed in any of SEQ ID NOs: 55 to 113, preferably SEQ ID NOs: 93 and 94, or an amino acid sequence having an amino acid sequence identity of at least 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 55 to 113.
In another aspect, the present invention is directed to a use of a recombinant vector comprising a nucleic acid encoding an rSAM enzyme as defined above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a viral or episomal vector, more preferably a vector selected from the group consisting of lentivirus vector, adenovirus vector, baculovirus vector, bacterial vector and yeast vector.
The selection of a suitable vector and expression control sequences as well as vector construction are within the ordinary skill in the art. Preferably, the viral vector is a lentivirus vector (see for example System Biosciences, Mountain View, CA, USA), adenovirus vector (see for example ViraPower Adenoviral Expression System, Life Technologies, Carlsbad, CA, USA), baculovirus vector (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.), bacterial vector (see for example Novagen, Darmstadt, Germany) or yeast vector (see for example ATCC Manassas, Virginia). Vector construction, including the operable linkage of a coding sequence with a promoter and other expression control sequences, is within the ordinary skill in the art.
A further preferred embodiment is directed to a use of a host cell expressing an rSAM enzyme as defined above, preferably comprising a recombinant vector as defined above, in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a host cell selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae cells (see for example Methods in Enzymology, 350, 248, 2002), and Pichia pastoris cells (see for example Pichia Expression Kit Instruction Manual, Invitrogen Corporation, Carlsbad, Calif.); bacterial cells, preferably E. coli cells, preferably BL21(DE3), K-12 and derivatives (see for example Applied Microbiology and Biotechnology, 72, 211, 2006), and Bacillus subtilis cells, preferably 1012 wild type, 168 Marburg or WB800N (see for example Westers et al., (2004) Mol. Cell. Res. Volume 1694, Issues 1-3 P:299-310); plant cells, preferably Nicotiana tabacum, and Physcomitrella patens (see e.g. Lau and Sun, Biotechnol Adv. 2009 27(6):1015-22); NIH-3T3 mammalian cells (see for example Sambrook and Russell, 2001); and insect cells, preferably Sf9 insect cells (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.).
The use of the rSAM-associated protein plpY (SEQ ID NO: 121) is optional but can significantly contribute to the efficiency of rSAM enzyme activity in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG. Therefore, the present invention is also directed to an rSAM-associated protein comprising an amino acid sequence selected from the group consisting of (a) SEQ ID NO: 121, (b) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with SEQ ID NO: 121, and (c) a functional fragment and/or functional derivative of (a) or (b), preferably a functional fragment of at least 30 amino acids, more preferably at least 45 amino acids of (a) or (b).
The rSAM-associated protein is preferably for use in combination with an rSAM enzyme as described above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG. More preferably, the rSAM-associated protein is expressed by a recombinant vector and/or host cell also expressing the rSAM enzyme as described above.
In another aspect, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising the steps of:
(i) providing a radical S-adenosyl methionine (rSAM) enzyme as defined above, and/or a host cell as defined above,
(ii) optionally providing an rSAM-associated protein as defined above,
(iii) providing at least one (poly)peptide substrate of interest comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, and (iv) contacting the enzyme and/or host cell of (i) with the substrate of (iii), and optionally the rSAM-associated protein of (ii), under conditions suitable for the enzymatic introduction of at least one a-keto^3-amino acid into the substrate.
In a preferred embodiment, at least one of the enzyme in step (i) and the (poly)peptide substrate in step (iii), preferably both, optionally together with the rSAM-associated protein of optional step (ii), of the above method are provided in the form of a host cell, and more preferably are all co-expressed in the host cell as defined above.
In a further preferred embodiment, the host cell for use in step (i) of the above method is an E. coli host cell. In addition, it is preferred that step (iv) is followed by step (v), wherein the keto-functionality resulting from step (iv) is reduced chemically, preferably by sodium borohydride, or is converted to an imine, preferably the methoxyamine.
As an example of the above method, E. coli expressing an rSAM enzyme as defined above is cultured in a rich medium such as TB, LB or YT medium. A culture in the above medium, preferably in an Erlenmeyer flask or ultra-yield flask, is inoculated with an overnight culture at a dilution of preferably about 1:100. The culture is grown at about 37°C and shaken at, e.g. about 250 RPM until, e.g. an OD600 of about 1.2-2.0 is reached. The culture is cooled, e.g. on ice, and induced with, e.g. IPTG (preferably about 1 mM final concentration). The culture is shaken at, e.g. about 16°C at about 250 RPM for, e.g. 24 hours and the cells are collected by centrifugation, lysed, and the substrate subjected to purification, preferably Ni-affinity purification. Following purification, the product(s) are verified, e.g. by mass spectrometry of the full length or digested precursors, e.g. NHis-precursors. The keto-functionality may be reduced (e.g. by sodium borohydride) or converted to the imine, preferably the methoxyamine.
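The 1:100 inoculation and the 1 mM final IPTG concentration mentioned above reduce to simple dilution arithmetic (C1·V1 = C2·V2). The Python sketch below works through it; the culture volume and the 1 M IPTG stock concentration are assumptions chosen for illustration, since the paragraph above specifies neither, and the small volume added by the stock is neglected.

```python
def inoculum_volume_ml(culture_ml: float, dilution: float = 100.0) -> float:
    """Volume of overnight culture for a 1:dilution inoculation."""
    return culture_ml / dilution


def stock_volume_ul(final_mM: float, culture_ml: float, stock_mM: float) -> float:
    """Stock volume (µL) giving final_mM in culture_ml (C1*V1 = C2*V2).

    The volume contributed by the stock itself is neglected.
    """
    return final_mM * culture_ml * 1000.0 / stock_mM


culture_ml = 30.0        # hypothetical culture volume
iptg_stock_mM = 1000.0   # assumed 1 M IPTG stock (not stated in the text)

print(inoculum_volume_ml(culture_ml), "mL overnight culture")
print(stock_volume_ul(1.0, culture_ml, iptg_stock_mM), "µL IPTG stock")
```

The same two functions scale directly to larger flasks by changing `culture_ml`.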
In a further aspect, the present invention is directed to isolated and purified nucleic acids encoding the (poly)peptides for use in the present invention.
The following Figures and Examples serve to illustrate the invention and are not intended to limit the scope of the invention as described in the appended claims.
Fig. 1: a) plp gene cluster encoding precursors (plpA1, A2, and A3), rSAM epimerase (plpD), rSAM excision enzyme (plpX) and associated protein (plpY). b) Protein sequences for core peptides of precursors PlpA2 (SEQ ID NO: 122), PlpA3 (SEQ ID NO: 123), PlpA3-9 (SEQ ID NO: 124) and PcpA (SEQ ID NO: 125).
Fig. 2: Detection of PlpX activity by coexpression experiments in E. coli. HPLC chromatograms indicating starting material (SM). New products 1-4 are only detected in coexpression experiments with PlpX and Y. The inlays are extracted mass spectra (t = 27.6 - 28.6 min for PlpA2-Fx and t = 21.3 - 22.5 min for PlpA3-Fx) from the corresponding LC-MS chromatograms. In all cases the [M+H]2+ ions are shown.
Fig. 3: MS2 spectra results for 1 (SEQ ID NO: 122). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 4: MS2 spectra for 2 (SEQ ID NO: 122). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 5: MS2 spectra for 3 (SEQ ID NO: 126). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 6: MS2 spectra for 4 (SEQ ID NO: 126). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 7: Sodium borohydride reduction of a mixture of 1 (SEQ ID NO: 122) and 2 (SEQ ID NO: 122).
Fig. 8: HMBC spectra for product 5 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β3-amino acid.
Fig. 9: HMBC spectra for product 6 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β3-amino acid.
Fig. 10: 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with methionine.
Fig. 11: 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with methionine.
Fig. 12: Reaction catalyzed by PlpX.
Fig. 13: Results for all PlpA3-Fx mutants. Shown is the peptide fragment (SEQ ID NOs: 128 to 143) affected by mutation and the respective detection of conversion.
Example 1: The function of plpX - a new rSAM
To investigate the function of plpX (SEQ ID NOs: 118 and 119) encoding the new rSAM, the gene was co-expressed with either of the two precursor genes plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117) located upstream. The translated precursors contain, in addition to an N-terminal leader region of the Nif11 family, predicted core regions of 25 and 23 aa, respectively (Fig. 1b).
The Nif11 precursor genes (plpA1 and plpA2) were individually cloned with N-terminal His6-tags and a Factor Xa site at the interface of the leader and core and inserted into pACYCDuet-1. The rSAM gene (plpX) was cloned into MCSII of pRSFDuet-1 and constructs were transformed into E. coli BL21(DE3) for protein expression (Fig. 2). Under these conditions, transformation of the precursors was not observed. More detailed analysis of the plp cluster (Fig. 1a) revealed a small conserved gene, plpY (SEQ ID NOs: 120 and 121), located downstream of plpX, an architecture also preserved in other clusters with plpX homologs. When plpY was included in E. coli co-expressions, two additional broad peaks were observed in the LC-MS chromatograms that overlapped with the unmodified PlpA2-Fx (products 1 (SEQ ID NO: 122) and 2 (SEQ ID NO: 122)) and PlpA3-Fx (products 3 (SEQ ID NO: 126) and 4 (SEQ ID NO: 126)) (Fig. 2).
Example 2: Characterization of the products
For initial characterization of the products, a combination of high resolution and tandem mass spectrometric (MS2) experiments was performed (Figs. 3-6). These data showed that the mass loss could be attributed to a loss of C8H9NO comprising four degrees of unsaturation and localized the modification to the tyrosine residues (Y21 and Y6, respectively) in the PlpA2 and PlpA3 core, which are part of a conserved YG core motif in gene clusters encoding PlpX homologs. Based on the information above, and assuming that the remaining atoms were covalently bonded, it was proposed that an extraordinary excision of a tyramine equivalent from the backbone had taken place. In this scenario, excision of tyramine is accompanied by C-C bond formation between C1-Leu/Met (PlpA2/PlpA3) and C1-Tyr, resulting in formation of the corresponding α-keto-β3-amino moiety. This functional group is known to be reactive under reductive conditions and indeed, in the presence of sodium borohydride a mass shift in the HRMS spectrum for the corresponding alcohol was observed (Fig. 7).
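The size of the mass shifts discussed above can be estimated from the elemental formulae. The following Python sketch computes the monoisotopic mass of the excised C8H9NO unit, the resulting m/z shift for a doubly charged ion, and the gain of H2 expected when the ketone is reduced to the alcohol. The numbers are calculated from standard monoisotopic atomic masses and are not values reported in this description; the function names are hypothetical.

```python
import re

# Standard monoisotopic atomic masses (Da), rounded to six decimals.
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C8H9NO'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_shift(delta_mass: float, charge: int) -> float:
    """Shift in m/z produced by a neutral mass change at a given charge."""
    return delta_mass / charge


excised = monoisotopic_mass("C8H9NO")   # tyramine equivalent lost on excision
reduction = monoisotopic_mass("H2")     # gained on ketone-to-alcohol reduction

print(f"loss of C8H9NO: {excised:.4f} Da")
print(f"m/z shift at z = 2: {mz_shift(excised, 2):.4f}")
print(f"gain on reduction: {reduction:.4f} Da")
```

Comparing such calculated shifts with the observed high-resolution spectra is one way to check that a peak corresponds to a single or a double excision event.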
To obtain more detailed structural information by NMR, the co-expression product of PlpA3-Fx + PlpXY was digested simultaneously with trypsin and chymotrypsin, which cleanly provided a peptide fragment (Ala-1 to Trp-12) composed of the two isomeric products (5 and 6 (SEQ ID NO: 127)). Each product was purified to homogeneity by reversed phase HPLC and the samples were analyzed by NMR. Heteronuclear two- and three-bond correlations observed in the Heteronuclear Multiple Bond Correlation (HMBC) spectra showed cross peaks to the newly formed ketone and amide carbonyls, which possess characteristic chemical shifts of δ ~195 and ~160 ppm, respectively, that have been reported in other natural products (Figs. 8 and 9 and Fukuhara, K. et al., Org. Lett. 17, 2646-2648 (2015)).
PlpA3-Fx was used to investigate the origin of the β-amino acid moiety by feeding of various 13C-labeled amino acids to E. coli expression cultures. For individual feeding experiments, labels of [1-13C]Met, [U-13C]Met, [1-13C]Tyr, and [U-13C]Tyr were detected by MS in the peptide products. NMR-based characterization of the purified core fragment 6 revealed enhancements of carbon signals that were consistent with Met remaining fully intact (Fig. 10), while only C1 of Tyr is retained and accounts for the amide carbonyl in the product (Fig. 11). These data confirm that PlpX catalyzes an extraordinary reaction involving excision of almost the entire Tyr moiety (only excluding the carbonyl) and reconnection of the remaining protein sections (Fig. 12). This modification, representing a non-canonical splicing process, is unprecedented for proteins.
To obtain further insights into the prevalence and distribution of this modification, a large-scale sequence analysis of taxonomically diverse Cyanobacteria was performed that comprised 436 combined published and newly sequenced organisms. Homologs of the rSAM genes were detected in 53 of these 436 genomes, all of them located in proteusin gene clusters, suggesting the generation of β-amino acid products. All precursors contained at least one YG motif and further YG copies were identified in many of the cores. To test whether these sites collectively direct multiple tyrosine excision events, pcpA (Fig. 1b) from Pleurocapsa sp. PCC 7327 was co-expressed with its cognate excisase gene partners pcpX and pcpY. The excision reaction was detected at two of the three YG motifs. With a translated core containing 64 aa and three predicted YG motifs, the pcp pathway generates a giant natural product that may exceed the size of all specialized metabolites reported to date.
The introduction of an α-keto-β3-amino amide by PlpX has wide-ranging potential applications in drug discovery, chemical biology, and synthetic biology. To determine the versatility of PlpX, a series of mutations at the site converted to the α-keto-β3-amino amide in PlpA3-Fx (M5) was created. The results show that PlpX is very promiscuous and can convert nearly every type of residue (Fig. 13).
Example 3: Materials and Methods
General Experimental Procedures. Factor Xa protease was purchased from Merck (USA). Restriction enzymes and GluC were purchased from New England Biolabs (USA). Thermo Scientific (USA) Phusion® DNA polymerase and T4 DNA ligase were used for all PCRs and ligations, respectively. DNA primers were obtained from Microsynth (Switzerland) or Thermo Scientific (USA). Antibiotics (chloramphenicol for pACYCDuet-1 and kanamycin for pRSFDuet-1) were used at a concentration of 25 mg/mL in solid and liquid medium. Protino® Ni-NTA resin, Nucleospin plasmid and gel purification kits were purchased from Macherey-Nagel (Germany). LC-MS experiments were performed on a Dionex Ultimate 3000 UHPLC coupled to a Thermo Scientific (USA) Q Exactive mass spectrometer. LC-MS measurements were carried out using solvents A (water + 0.1% formic acid) and B (acetonitrile + 0.1% formic acid). All HPLC columns were purchased from Phenomenex (USA). NMR spectra were acquired using a Bruker (USA) 500 MHz Avance III equipped with a 5 mm TCI cryoprobe. 13C-labeled amino acids used in labeling experiments were purchased from Cambridge Isotope Labs (USA).
Cloning of precursor and excision enzyme constructs for protein expression. Expression vectors containing NHis6-precursor genes were constructed as follows. Mini-preps derived from previously reported plasmids (plpA1-Fx, plpA2-Fx, and plpA3-Fx in pET-28b) (see Morinaka, B. I. et al. Angew. Chem. Int. Ed. 53, 8503-8507 (2014)) containing NHis6-precursor genes with a Factor Xa site (IDGR) at the interface of leader and core peptide were digested with NcoI and EcoRI. The precursor peptide inserts were gel-purified and ligated into MCSI of pACYCDuet-1 to give pA1FxACYC, pA2FxACYC, and pA3FxACYC. These precursor constructs were sequence verified. Constructs for the excision enzyme (PlpX) and associated protein (PlpY) were constructed as follows. The gene for the excision enzyme was amplified by PCR (primers PlpX_F, CTCGCATATGACTAAAAAATACAGACGAGTTAGTTAT (SEQ ID NO: 144) and PlpX_R, ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA (SEQ ID NO: 145)) and products were gel-purified, digested with NdeI and XhoI and ligated into MCSII of pRSFDuet-1 to give plasmid pXRSF. Following sequence verification, the gene for the associated protein was amplified by PCR (PlpY_F, GCGAACTCATGAACTCTAATCAAATACCAAATAAA (SEQ ID NO: 146) and PlpY_R, GCGCAGCTGTTATGTCAGAAAATTGCT (SEQ ID NO: 147)), gel-purified, digested with BspHI and SalI, ligated into MCSI of pXRSF (cut with NcoI and SalI) to give pXYRSF, and the insert confirmed by sequencing. Precursor constructs were transformed and expressed in E. coli BL21(DE3) cells alone and with pXRSF or pXYRSF. Proteins containing a Factor Xa cleavage site are denoted with an 'Fx'.
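A quick consistency check on a cloning design of this kind is to confirm that each primer carries the recognition site of the enzyme later used to digest its PCR product. The Python sketch below does this for the PlpX primer pair; the primer strings are written without the spacing artifacts of the printed text, the recognition sequences are the standard ones for NdeI and XhoI, and the helper name is hypothetical.

```python
# Standard recognition sequences of the two enzymes used for the plpX insert.
RECOGNITION_SITES = {
    "NdeI": "CATATG",
    "XhoI": "CTCGAG",
}

# PlpX primers (SEQ ID NOs: 144 and 145), whitespace removed.
PRIMERS = {
    "PlpX_F": "CTCGCATATGACTAAAAAATACAGACGAGTTAGTTAT",
    "PlpX_R": "ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA",
}


def has_site(primer: str, enzyme: str) -> bool:
    """True if the primer contains the enzyme's recognition sequence."""
    # Tolerate spaces or hyphens left over from typesetting.
    cleaned = primer.replace(" ", "").replace("-", "").upper()
    return RECOGNITION_SITES[enzyme] in cleaned


print(has_site(PRIMERS["PlpX_F"], "NdeI"))
print(has_site(PRIMERS["PlpX_R"], "XhoI"))
```

The check is purely textual: it says nothing about whether the site is unique in the amplicon or whether enough flanking bases are present for efficient cutting.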
Cloning of PlpA3-Fx mutants. Mutants were constructed using the New England Biolabs (USA) Q5 site-directed mutagenesis kit according to the manufacturer's protocol. pA3FxACYC was used as a template. The identity of constructs was verified by sequencing. Each precursor was expressed alone and with pXYRSF in BL21(DE3).
Protein expression and purification of precursors. TB medium (30 mL) containing appropriate antibiotics was inoculated with 300 μL overnight culture grown in LB. The cells were grown at 37°C at 250 rpm until an OD600 of ~1.6-2.0, cooled on ice for 30 min, induced with IPTG (1 mM final concentration), then shaken for 24 hours (250 rpm, 16 °C). The cells were collected by centrifugation (3,220 x g, 10 min). Proteins were purified using Ni-NTA resin (Macherey-Nagel (Germany)) according to the manufacturer's protocol. 10% glycerol was added to the lysis, wash and elution buffers. Proteins were adsorbed using 0.5 mL Ni-NTA resin, and eluted with 2.5 mL (250 mM imidazole, 50 mM sodium phosphate, 300 mM NaCl, and 10% (v/v) glycerol, pH 8). Elution fractions were desalted on a PD-10 column, digested with Factor Xa or trypsin, and analyzed by LC-MS and MALDI. LC-MS conditions: column: Kinetex C18-XB, 2.6 μm, 150 x 4.6 mm; flow rate: 1.0 mL/min; mobile phase/gradient: 95:5 A/B for 5 minutes ramped to 40:60 A/B over 30 minutes.
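The LC-MS gradient described above can be written as a piecewise-linear function of time. The Python sketch below returns the percentage of solvent B at a given time point. It assumes that the 30-minute ramp starts after the 5-minute hold (so that 60 % B is reached at 35 minutes) and that the composition is held at 60 % B afterwards; the text leaves both points open, and the function name is hypothetical.

```python
def percent_b(t_min: float,
              hold_min: float = 5.0,
              ramp_min: float = 30.0,
              start_b: float = 5.0,
              end_b: float = 60.0) -> float:
    """Percent solvent B at time t_min for a hold-then-linear-ramp gradient.

    Assumption: isocratic hold at start_b for hold_min minutes, then a
    linear ramp to end_b lasting ramp_min minutes, then held at end_b.
    """
    if t_min <= hold_min:
        return start_b
    if t_min >= hold_min + ramp_min:
        return end_b
    fraction = (t_min - hold_min) / ramp_min
    return start_b + (end_b - start_b) * fraction


for t in (0, 5, 20, 35):
    print(t, "min:", percent_b(t), "% B")
```

Such a function is convenient for estimating the solvent composition at which a given peak elutes from its retention time.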
Purification of excision products 5 and 6 from NHis6-PlpA3-Fx + PlpXY. The same procedure as above was carried out with 1 L TB medium in an Ultra Yield Flask™ (Thomson, USA). The NHis6-precursor was bound with 6 mL Ni-NTA resin and eluted with 30 mL elution buffer. The elution fraction was buffer exchanged using a PD-10 column into 100 mM sodium phosphate, 2 mM CaCl2, pH 8 (0.22 μm PES-membrane filtered). To the buffer exchanged elution were added trypsin (1:100, trypsin/precursor (m/m)) and chymotrypsin (1:20 chymotrypsin/precursor (m/m)), and the mixture was incubated overnight at 37°C. The peptide digest was desalted by C18 SPE (Strata, C18-E, 2 g, Phenomenex, USA) to give 39 mg of digested peptide. This material was subjected to reversed phase HPLC (column: Phenyl-Hexyl, 5 μm, 10 x 250 mm; flow rate: 4.5 mL/min; column temperature: 75°C) to give products A (1.7 mg) and B (1.8 mg) which were composed of residues 1-12 of the core peptide. Samples were dissolved in 500 μL DMSO-d6.
Feeding experiments with 13C-labeled amino acids. Carried out as above except that 150 mL TB medium in 500 mL Erlenmeyer flasks was used. Four separate experiments were carried out with one labeled amino acid each. At the time of induction, 13C-labeled amino acids (1-13C Met, 21 mg; 1-13C Tyr, 25.6 mg; U-13C Met, 24.6 mg; U-13C Tyr, 23.7 mg) were directly added to the medium. The labeled NHis6-precursors were bound with 1 mL Ni-NTA resin and eluted with 6 mL elution buffer. The elution was desalted and purified as above to give the core peptide fragments for each labeling experiment, which were subjected to LC-MS and NMR analysis.
Reduction of excision products with sodium borohydride. A portion of the PlpA2 core (100 μg) was dissolved in 1:1 MeOH/H2O (500 μL), a solution of sodium borohydride in H2O (500 μL, 1 mg/mL) was added, and the mixture was left standing for 10 minutes. The reaction was quenched by neutralization with acetic acid and subjected to LC-MS analysis. See Fig. 7.
Sequence listing
SEQ ID NO: 1 (Formula I, claim 3)
SEQ ID NO: 2 (Formula II, claim 3)
SEQ ID NO: 3 CXXXCXXC (X is any natural amino acid)
SEQ ID NO: 4 EXTXXCXXXCXXCGXRXXXXRXXEL (X is any natural amino acid)
SEQ ID NO: 5 CXLXCXHCGSRAGXXXXXE (X is any natural amino acid)
SEQ ID NO: 6 CX9-15GX4C (X is any natural amino acid and the integers denote the number of X(s))
SEQ ID NO: 7 CX2CX5CX3CX14-18C (X is any natural amino acid and the integers denote the number of X(s))
SEQ ID NO: 8 CXXGXXXXXXXXXGXXKXCP (X is any natural amino acid)
SEQ ID NO: 9 GXCXXCXXXXXCXXXCXXXXXXXXXXXGXNPXCXXR (X is any natural amino acid)
SEQ ID NO: 10 CXAGXXXXXEADGXXKXCPXL (X is any natural amino acid)
SEQ ID NO: 11 CXXCYYXXXCXXGCXWXXXXLXGXXGXNPXCXXR (X is any natural amino acid)
SEQ ID NO: 12 (70% TIGR03913, >WP_052261552)
MAATRFLSKSDVETRRPVYVVWETTLACNLKCKHCGSRAGTPRKDELSTEDAFGLIDELADLGTREITLIGGELF
LRKDWLKLVERISSKGILCTMQSGGFHLTRERLKDAKEAGLAAIGISIDGLEKTHNSLRGVRTSFQHALRSLEAA
RDLHITSNVNTTITSKNIDELELLYEVLKGFHVRNWQFQIVVAMGNAADEETLLFQPYQITKVLDSVARIRTSAS
RVGMLVQASNALGYFGPYEWLWERGSDTDSHWSSCGAGQSTMGIEADGTIKACPSLATEDFGVTPSEYDTL
QDLWIGSDRIRFNETRDPPSGSICGACYYWAACKGGCSWATHSLTGTVGENPYCHYRAICLSELGLKEKIRKVR
DAPGSSFDRGEFEVVIEDEEGKTLKSDDPRKHRIIEKISAGREQGASEEALKLCTSCHQFAWEYEKQCPFCSGTK
FVSTRALKSIKSATKFL
SEQ ID NO: 13 (70% TIGR03913, >WP_035619696)
MDKTSTRYRVRDDYETATPVHVVWEITLACNLKCSHCGSRAGKVRPGELTTEQCFGVIDSLKRLGTREISIIGGE
AFLRKDWLEIIERIHQSGIECSMQSGAYNLNEERIIDAKKAGINNIGVSIDGMPDTHNKIRGRRDSFEHAVNCL
QLLKKHNITSSVNTVITKRSKNELNELLDILIENDVKNWQIQLAVAMGNAVDNSDELIVQPYELIDFYDDLIVIYR
KALAHNILIQAGNNIGYFGPYEHIWRQGNEKYYTGCSAGHTGIGIEADGKIKGCPSLPTSAYTGGNVKDMDLE
DIWKYSEEMVFSRYRNKEELWGGCKGCYYESSCLAGCTWTSHVLFGKRGNNPFCHHRALELKKKGLKERIRKI
QEAPGVSFDMGLFEIIVEDENGVIVEIQSPNSETPVLVPDYSARIPRIPKALKLCNGCDNYVYEEETVCSFCNAD
VQKVNDEYAAKMERAKQTLEKLELLMMK
SEQ ID NO: 14 (70% TIGR03913, >WP_063775385)
MRSARDRKTHVPVHVVWEITLACNLKCGHCGSRAGKRRANELSTAECLDVVRQLAAVGTREITLIGGEAYLRK
DWLEIAAEIARLGMHCGLQTGARGLTRERIAAAYAAGVRAIGISLDGLRDLHDELRGVKGSHDQAIQAIKWVS
EVGIEPGVNSQINLRSMRELDGIFDEIVAAGAKYWQVQLTVAMGNAVDNSEMLLQPHQIVDVVDKLAELYH
RGRDVGLRLLPGNSIGYFGRHEAYWRSLTDDVTHWGGCTAGETTLGLEADGTIKSCPSLPKSHFAGGETRTSTI
EEALKALESRNVRRDGNRGRSFCGSCYYWNVCRGGCTWVSHVLEGRRGDNPYCYYRATTLARRGLRERIVKV
ADAPDEPFAVGRFEVRLEHEDGSRAPRSIGLDDKPRKRGRLGLCQSCYEYMSLGERHCPHCGEVNRPKLMVD
AKLELDVQIALDEIERHGRSILELAAAAGQDPQAAE
SEQ ID NO: 15 (70% TIGR03913, >OCW56221)
MPVHVVWEITLACNLACGHCGSRAGARRPDELTTAECFDIIRQLREAGTREITLIGGEAYLRKDWLEIAAEISRL
GMLCGLQTGARGLTKARIEAAYDAGVRAIGVSIDGPRDIHDRLRGVAGSHDQAMRAIADIAETGIRPGVNTQI
NALSAPHLWDIYRAIRDAGARSWQTQLTVAMGNAVDNAEMLLQPHLIVDVVEDLYAIFEDGLLHDFRVLPG
NSIGYFGRHEAQWRSITEAAEPWQGCTAGETTLGLEADGTIKGCPSLTRESYSGGNSRTVSIAEAIGNLADRTV
RRDGNPGGGFCATCYYYDFCQAGCTWVTHSLTGRRGDNPFCVYRAGKLRDAGLRERIVKTAEAPDTPFAIGK
FDIVLETLDGKPGPATVADTAPRSVGATHLVVCEQCGQFIANTEPVCVHCHAEQRAAARSRLERVQDVHNLL
AEIDSHSHGIHALVDEISAR
SEQ ID NO: 16 (70% TIGR03913, >WP_060978522)
MVPQAELPHARFFGSADQRRRVPVSVVWELTRACDLSCCHCGSRAGRRAREELSSVECLDLVDQLAELGARD
IGLIGGEVYLRRDWLEIVRRIRQAGMDCSVQTGGRFFTPATLDAAIAAGVMSIGVSLDGVGQTHDLQRGVPG
SFEAALALLHLVANRPIMASVNTQINRHTMQQLPELLDVLIAAKVSNWQLALTVAMGNAADNDDLLLQPYDL
LALFPLLASLHDRAAENGIVLQPSNNIGYFGPYEEKLRLIGDIAAHWTGCSAGDNVLGIEADGNIKGCPGLSRDY
VGGNVRDEPLRVIWDRAERLAFTHRATKSDLWGYCAQCYYAEICMAGCTWTAHSLMGRPGNNPMCHYRA
LEFARRGKRERLVKVADAPGHAFDYGRHALVVEDRPLSDQAVLS
SEQ ID NO: 17 (70% TIGR03913, >SAL02149)
MELKISHHDAPVRVLRPSDRGGRTPVYAVWELTLQCNLACSHCGSRAGKKRTGELSTSEALALVEDLRGLGVR
ELALIGGEAYLRGDWIEIVRRARDLGIRPVLQSGGYGFTATLAQRAKEAGLAALGISVDGLEPIHDDIRGVVGSY
RAAFEALNAAREVGLTVTANTQIHAKNWRSLPDIYTDLRKAQIAAWQLQLTVAMGNAADNDDLLLQPFHLY
ELFPVIAALTDMAKTENVQIYPGNNIGFFGPYEHLWRGPHRFDNHYKGCQAGINTIGIEADGTIKGCPSLATSR
YSEGNVRVVPIKEIWNGNLAFPLNRDWDLKLWGFCKTCYYAKVCRGGCNWTSDSLFGKPGNNPYCHHRVLS
LKKLGKRERVVKVSEADKSSFGTGLFKLIEESW
SEQ ID NO: 18 (70% TIGR03913, >WP_043629091)
MGDAGGRARPARPNRYLSGPDVRGHQLVHAVWELTLACNLKCRHCGSRAGSVRPEEMTTQECLGVVRQLH
ELGAREVTLIGGEAYLRKDWVEIVRSISDAGMECTLQTGAWQLTGTRIAQAADAGLVACGVSIDGLAPLHDHL
RGRPGSFDAAIDALGRLREHGIRTSVNTQITAAVIPQLRDLFREFLATGVRNWQVQLTVAMGRAADNHDLLM
QPYQMNELMPLLAELHAAGVAKGLLLQPGNNIGYFGPYEAQLRGSGTASMHWAGCFAGRNVLGIEADGTIK
GCPSLPTVTYAGGNVRDSSIAEIWASASQLSFARKSRGKEMWGFCASCYYADVCEGGCTWTSHSLLGRPGNN
PYCHYRASELKKQGKRERVVRRLPAPGTSFDHGRFEIVVEPDPDTAGEHRAAGASPQERSRPGPPPAPGDAG
SHLPSLVLCRACDCHVFAGTDFCPHCGADVPASQREYESALADARHAASRLARAIGSLGPELPATTTPRA
SEQ ID NO: 19 (70% TIGR03913, >WP_050043969)
MDQAPDDTPRRRLSLSDQLDCVPVYVVWELTLACNLKCIHCGSRAGHKRAKELTTDECVDVIRQLAELGTREI
SVIGGEAFIRRDWLTII AIRDHGIDCTMQTGGYKLSGDMIRSAADAGLLG LGVSVDGLEPLHDRLRGVRGSYR
EALRVLDDCRRLGLTASVNTQITSAVMSELPQVMEIIIDAGAKYWQVQLTVAMGNAVDHDDILLQPYDLTTL
MPLLAELHLKGRERNLVLLPGNNVGYFGPYDVLWRGPDRGYYSGCPAGQNVIGLEADGTVKGCPSLATDRYG
AGDVRSASIAELWATHPALQFNRNRGTDDLWGFCRDCYYAEACRGGCTWTADSLLGRRGNNPYCHHRVLT
LAERGIRERIVKIEDAPALPFATGRFALIEEPLPAGAPHA
SEQ ID NO: 20 (70% TIGR03913, >WP_020539729)
MSVEEDGLARRVARERDFRDKVPVHVVWELTLACNLKCLHCGSRAGPRRPAELTTEEALDLVAQLAGLGTRE
LTVIGGEAYLRNDWVEIIRRATELGMSCSMQTGARALTPARLRAGADAGLRGIGVSIDGLRDLHDEVRGVPG
AYANAFKVLRDAREAGLRVSVNTQIGARTIGELPALMDELVAAGVTHWQVQLTVAMGNAADHDELLLQPYR
LAELMPLLARLHHEGQRHGLLMVPGNNIGYFGPYEHLWRNASSMSGHWSGCEAGHTALGIEADGTVKGCP
SLPTSAYTAGNVRDLSVADMWRDSPALGFRRGRAGADELWGHCGTCYYADVCDAGCTWTAHSLLGRPGN
NPYCHHRVLELAKKGIRERIVKVADASAAPFGIGEFALVEEAIPADPRPADPRPAVPGRAERAPGDRGVPELRL
CGDCAHFVMAGDAACPFCSADVAEAEERARAVHRRRTELMDQVKALMRQET
SEQ ID NO: 21 (70% TIGR03913, >KYF96555)
MSRAVRDRETDDFRRNVPVHAVWEITLACDLKCRHCGSRAGARRDRELDTAECLEVVAALARLGTRELSLIGG
EAYLRSDWIDIIRAARAARMRCAVQTGGRNLTEARLAAAVGAGLQAVGVSIDGLAPLHDELRGVPGSFSRAID
AVRRARAHGIAASVNTQIGARTMDDLPALLDAIAAAGATHWQIQLTVAMGNAVDNDAVLLQPYQLLELMPL
LDRLYREGLDRGLLMVVGNNIGYFGPFEHRLRGVGDESVHWTGCAAGQNVIGLEADGTVKGCPSLRTSGYA
GGNVRDLRLEDIWNHAEEIHFGRLATVDALWGFCRTCYYADVCRGGCTWTADSLFGRPGNNPYCHHRALEL
DRRSLRERVVKKREAPGAPFAIGEFELIVEPVPGREGASPAPPVTS
SEQ ID NO: 22 (70% TIGR03913, >WP_013570078)
MLLSEPRKMETPPRYLTNQDFQRYIPVHVVWEITLACDLKCMHCGSRAGHRRPDELTTQECLDVVDALARLG
TREVTLIGGEAYLRKDWTQIIRRCADHGIYCATQTGGRNFTQQRLQEAIDAGMNGLGVSLDGLEPLHDRLRG
VPGSFRQALDTLQRAHDAGLSISVNTQIGAEIIPQLPELMEIILGAGAKQWQIQITVAMGNAVDHPELLLQPH
RVLELMPMLARLYQEAAERGMLMVTGNNIGYFGPYEHIWRGFGDERVHWSGCNAGHTGLALEADGTVKG
CPSLATVGFSGGNVRDLTLEHIWNHSKEIHFGRLRSIEDLWGFCRTCYYAEVCRGGCTWTSHSLLGKPGNNPY
CHYRALELEKQGLRERVVKIRDAAPDSFAVGEFELITERIEDGTVAASSVSQKQLVQLGLSNDERWSPREGRVP
PTLKLCRACNEYVWPHEVDCPHCGENIAASLAQYQEDTRRRREAMDAVRALIESYRAAETSVVPST
SEQ ID NO: 23 (70% TIGR03913, >AAY91426)
MGWCRAPPGAQGTLMSDRLPARYLSETDLKRFVPVHVVWEITLACDLKCLHCGSRAGHRRPGELNTQECLS
VIDSIAALGTREVTLIGGEAYLRKDWTRLIQAIHDHGMYVAIQTGGRNLTPAKMQAAVDAGLNGVGVSLDGL
APLHDAVRNVPGSFDKALDTLRRAKQAGLKVSVNTQIGAATLPDLPALMELIIDAGASHWQIQLTVAMGNAV
DHPELLLQPYQLLEVMPLLARLYREGAERGLLMNVGNNIGYYGPYEHMWRGFGDDRVHWSGCAAGQTVLA
LEADGTVKGCPSLATVGFSGGNVRSMSLHDIWHYSEGIHFGRLRSVDDLWGYCRTCYYNDVCRGGCTWTSH
SLLGKPGNNPYCHYRTLDLAKKGLRERIVKLEDAGPASFSVGRFDLITERIDTGEAVSSVNDSGQVIKLAWVNQ
GQASPEEGRIPPRLALCRSCLEYIHAHESTCPHCNADVAAAEARHQEDRLRQQALINTLHQLLGTPQEQPRL
SEQ ID NO: 24 (70% TIGR03913, >WP_062765523)
MPDGQTPAERAIPRARSLDDIVDLVPVHVVWEITLACNLKCQHCGSRAGHVRKGELSTAECLDLVDQMARLG
TREVTLIGGEAYLRRDWLEILARVRAHGILCLIQTGGRNLTDKRLEAAIAAGINGIGVSIDGLAPLHDRLRGVPGS
FDQAMSALTRAKAAGLSISVNTQIGAETMEDLPELMDRIIAAGATHWQIQMTVAMGNAVDNPDILLQPYRLI
ELMPLLARLYREGARRGLTMVVGNNIGYFGPYESLWRGFGNEAVHWTGCAAGQNVIGIEADGTIKGCPSLAT
HGYAAGNIRDLALDDIWRNQEAMAFGRTRSVEKDLWGYCRSCYYADVCRAGCTWTSDSLLGRRGNNPYCH
YRVRDLAKKGRRERVVKIREAGPESFAVGEFALIEEAIPPEELAALPPDRLPGFSATDGRPVHIAPEDDDDPARA
AGARGRIPPHLDLCRSCHEYVWPHETDCPHCGADIAAAAAAYAARLAEVQALAARIRARLDEAAPGI
SEQ ID NO: 25 (70% TIGR03913, >WP_014249603)
MDQPLPSPDLSARSPEQLPEQPASVPARYCFDDDFRKLVPVLVVWETTLACNLKCQHCGSRAGRPRPDELTTE
EALDLVDRLAALGTREISLIGGEAYLRKDWVEIIRRCRSHGIRTAVQTGGRNLTDRRLDEAVAAGLQAIGVSIDG
LPDLHDRVRGVSGSYDQAMSALRRAKDRGLAVSVNTQIGPETPDHLPELMNRIIEAGATHWQIQFTVAMGN
AVDNPDLLLQPHRLLDVMPLLARLYREGLDRGLLMVVGNNVGYYGPYEHIWRGLGDDRMHWTGCAAGQI
GIGIEADGTLKGCPSLATSLYAAGNIREMSVEDIWRHADRMQFGRLRSVDELWGYCRTCYYADICRAGCTWT
SESLLGKRGNNPYCHYRVLDLAKHGLRERVVKIKDAPNEAFAIGEFALIQEPIPGAESPGLPERDPAKVHRHEYE
RSLEGGVVPPSLTLCRACNQYVWPHETDCPHCGADVAAAAAQHGIDSARRRALIRETQRLLDEARAAKLEAA
GAPSPATAAGD
SEQ ID NO: 26 (70% TIGR04103, >WP_020737613)
MIEPTEIPVRALMPADLMEAKPIYAVWELTMKCDQPCQHCGSRAGAARDAELSTEEVLEVAASLARLGCREV
ALIGGEAYLREDLAEIVSFLARSGMRVIMQTGGRAFTAERAKALRAAGLTGLGVSVDGPAHIHDELRGNVGSH
AAAIRALDNARAAGLITTANTQINRLNAHLLRETCAELRSHGIQTWQVQITVPMGRAADHPEWILEPWRVVE
VIDTLAAIQREALETHVSGVPFNVFANNNIGYFGPHEQLLRSRPGGGDAHWRGCRGGINAIGIESDGTVKACP
SVPTVPYAGGNVRERGLEHIWEGSAEVRFARDRDASELWGHCATCYYADECRAGCSWTAHCTLGRRGNNPF
CYHRVTQLKRRGIRERLVMKQRAPHVPYDHGVFELVEEAWDAPPPPPPEPVVPRNARRRLAVV
SEQ ID NO: 27 (70% TIGR04103, >SFD76092)
MNSPARALRPDDLRQPRPIYVVWETTLRCDHECAHCGSRAGDARPDELSTEELLEVADALVRLGSREVTLIGG
EAYLRGDCYRLIEHMTKAGIRVTMQTGGRGLTQDRCRKLREAGLAAIGVSVDGPEAAHDTLRASPGSHAAAL
KGIRNAREAGLLVTSNSQINRLNKDVLRETAELLADAGVAVWRAQMTAPMGRAADRPDWLLEPYMVLEVID
TLADIQRWAQRRAADRGIPWERAFHVRLGNNLGYFGPHEQLLRTRPGSPDSYWQGCSAGKFVMGIESDGTI
KGCPSLPTAPYTGGNVKTAALADIWNEAPEIAFARDRGTSELWGFCKSCYYAEVCRAGCSFTAHSAIGKRGNN
PFCYYRATQMKRKGLRERVVLRQAAPGDPYDFGQYEIVEEPWDSAPPRAAVRLPVLAG
SEQ ID NO: 28 (70% TIGR04103, >KIG18351)
MSSSRTIREHEVDQPRPIYTVWEITLRCDHACAHCGSRAGPVRDDELDTAELLAVADALVELGSREVTLIGGEA
YLRSDVYQLVEHLAKAGVRVTMQTGGRGLTAARAQRLRDAGLAAVGVSIDGTAAVHDRLRASPGSHDAAM
RAIEHARAAGMVVTSNSQINQLNMHELPAIAAELEAAGVLVWRGQLTAPMGRAADHPEWIVQPYMVLEIID
TLAQIQAGASARAQARGASEMESFRVTLGNNLGYYGPHEPLLRSRPDRRDRFFPGCQAGRYVLGIESDGTVK
GCPSLPTAPYQGGNVRELSLEQIWDSEAIRFTRDRSTDELWGHCASCYYADVCRAGCSFTSHSTLGRRGNNPF
CYYRADKLRKQGLREVIVHARAAPGSPYDFGGFELREQPWSDLPPPAGRRSLPVVTS
SEQ ID NO: 29 (70% TIGR04103, >WP_006974883)
MAGRRLDVLSTDFFPAYVVWELTLRCDLACRHCGSRAGPARPVELTTSEAVAVAEELGRMGAREVVLIGGEA
YLHEGFLEVVEALARAGVVPVMTTGGRGVDEALARAMAEAGLRRVSVSIDGLEPTHDRMRGFRGSFAAALA
ALDHCAAAGLSISANTNLNRLNWGDLEALYEQHLRGRVRSWQLQITTPLGRAADRTAMIFQPFDLLELLPRVA
ALKRRAFAEGVLILPGNNLGYFGPEEGLLRSQTPEGTDHWQGCQAGRFVMGIESDGAVKGCPSLQTAAYVG
GKVLERGLAEIWNEAPQLAFTRERSVEDLWGYCRGCVFAKTCLGGCSFTAHAVFGRPGNNPYCHYRARDFAK
RGLRERLVPTEAAEGTPFDNGLFEVVEEPLDAPDPEAALEPRELVQITRRPGSTPAQTPRDPAGGGA
SEQ ID NO: 30 (70% TIGR04103, >WP_012234464)
MWRSAQRARARARARARARARARAQVIIHPPRDRKCTGEQEPAAEFAVGSGMALRRLDVVPGEYFPAYVV
WELTLRCDQPCRHCGSRAGAARPSELGTDEALGVVRQLAAMGAREVVLIGGEAYLHDGFLEIIAALKAAGVRP
TMTTGGRGITAEIAAQLKEAGLHSVSVSVDGLERAHDLIRKAPGSHGSALAALGHLRSAGLLTAANTNLNRVN
QGDLEALYDLLREQGIKAWQVQITAALGRAADRPAMLLQPYDLLDVLPRIAELKRRAFRDGITVMPGNNLGYF
GPEEALLRSLREGGRDHFRGCQAGKLVLGIESDGAVKGCPSLQSDAYVGGDLRGRALQEIWDEAPRLAFARA
RTADDLWGFCRSCAFAEVCMGGCTFTAHALFGRPGNNPYCHFRARTLAAQGKRERLVPAEPAAGRPFDNGL
FELVLEDLDGPEPGLDSPEQLVQLTRKPRPSS
SEQ ID NO: 31 (70% TIGR04103, >WP_002625456)
MSMLRRLDVTRPDHHPAYVVWELTLRCDQPCTHCGSRAGTQRPDELSTAEALDVVRQLREMRTREVVLIGG
EAYLHPGFLDIIRALKEAGIRPGLTTGGRGMTEALARQVAEAGLYAASVSIDGLEPTHDLMRAAPGSFASATAA
LGFLSAAGVRVAVNTNFNRLNQADLEPLYEHLKGLGPRAWQLQITAPLGRAADRPALLLQPWDLLDLLPRIAA
LKQRAFADGITLMPGNNLGYFGPEEGVLRSPGPDASDHWRGCMAGRYVMGIESNGAVKGCPSLQTAHYVG
GNLRERPLRDLWDNAPPLAFTRTRTVEDLWGFCRTCPFASTCMAGCSFTAHALFGRPGNNPYCHYRARTLAK
QGVRERLVPQAPAPGKPFDHGLFDLVVEPLDAPDPRPPTPRMLVKRLKWPEAHTPQAVGARTDPTG
SEQ ID NO: 32 (70% TIGR04103, >WP_010607032)
MTNTLENAGVKVREYDRSTYAVWEITLKCNLACSHCGSRAGDKRADELSTTEAFDLIKQMADLGVKEVTLIGG
EAYLRPDWLMIASEIKNKGMRVTMTTGGYGISRGTAKRMAQAGIEAVSVSIDGLEDEHNSIRGKSDSWSQCF
TTLSQFKDLGVHTGVNTTVTRKSAKDLPLLYEKLIDVGVKNWRIQLAVPMGNAADNNEMLMQPYELLDLYPL
LGLLSVRGRKDDLIIQPGNNVGYFGPYERLLRGTFMQESKYSFYTGCVAGQGAIGIEADGKIKGCPSLPSEEYTG
GNIRERTLKDIYENAPELNFNSQEMDDASIAHLWGNCKGCKYAKLCRAGCNWTAHVFFGKRGNNPYCHHRA
LTLAASGLRERFQQRVAASGIPFDHGVFEIYEESISATSNDNVNRFTIQGINFPPSWLATDSNLRERLNSEKLNAI
HQYRALGLAKAV
SEQ ID NO: 33 (70% TIGR04103, >WP_010607027)
MIVLVARYAPPIRLELLGVKNMSVNMDSAGIKIKKEARQTYAVWEITLKCNLACSHCGSRAGDSRVNELSTTEA
LDLVHQMADLGIKEVSLIGGEAFMRPDWLMIAAEITRLGMKASMTTGGFGISEGTAKRMKQAGISTVSVSID
GLEKEHDLLRGKVGAWKQCFLTIERLTNVGINVGCNTQINRYSAKQLPLLYQKLVDVGARAWQLQLTVPMG
NAADNDEMLLQPNELLDVFPLINFLSVRGRRDGLAVQAGNNIGYFGPYERQLRDNKSTHSEWAFYRGCGAG
QNTLGIEADGSIKGCPSLPTNAYTGGNIRERSLRDIYENTDELRFNDINKPEDATKHLWGECATCEFAKVCRGG
CNWTSHVFFGKRGNNPYCHHRAVKMAVRGKQERFFIREKASGDPFDHGVFDLVVEDFKPLDPQDTSVFSLA
QAQFPENWLEADPNLVRRLFTERGLVMKQYVDSGIVPKEESPWFDKTQREALMSSAVPA
SEQ ID NO: 34 (70% TIGR04103, >WP_054014533)
MTDLTNRSDIRINAAYRQTYAVWEITLKCNLACNHCGSRAGDARVDELSTSEALDLVAQMADLGIKEVTLIGG
EAFMRPDWLQIAAEITKKGMKATMTTGGYGISLGTAKRMKEAGIAAVSLSIDGMERSHDLLRGKQGAWQKC
FETIAHLREAGIPVGCNSQVNRESIAELPSLYEELLKAGISAWQLALTVPMGNAVENSHILLQPYELLDVFPLLAY
LSKRGNSEGIRVHMGNNIGYFGPYERLLKEPIASEAKWAFTRGCSAGQNAIGIEADGSIKGCPSLPSAEYTGGNI
RDRKLQDIYQNSPELRINDITTPEDATRHLWGECSSCEFASVCRAGCHWTAHVFFGKRGNNPYCHHRALKKA
AKNQRERFYVKQAAPGKPFDHGVFAIHDELCIVNEDSGQFRIDDMAIPDRWQEDGLDLLALIREEKASAIESYR
SLVN
SEQ ID NO: 35 (70% TIGR04103, >WP_055410774)
MRGRRRTYAVWELTLACNLACGHCGSRAGARRPAELSTAQALDVVAQLDAVGIDEVTLIGGEAFLRRDWLTI
AAEITRRGMGCTVTTGGYRLSAAMARGLRAAGVTQCSVSVDGMTATHDRLRGRVGSWESCFRTMERLRSA
GVEATCNTQINRLTAPELPRLYQRLRSAGVVAWQWQLTVPMGNAADHADLLLQPVELLEVFPVLARIARRAS
QDGVRIHAGNNVGYYGPYERLLRSPEGSAFWTGCQAGLSTLGIESDGTIKGCPSLPTRDYAGGNILDRSLTDLL
RDAPELGINLTAGTAAAAENLWGFCRGCTYADVCRGGCTWTAHTFFGRPGNNPYCHHRALTHQRAGRRER
LVQTAPAPGEPFDHGLFSLLEEDLDTPWPSTERPCLTAADIRWPADWVD
SEQ ID NO: 36 (70% TIGR04103, >WP_047858470)
MPPPAVQTRTAYAVWELTLKCNLACGHCGSRAGDSRKNELSREEALDLVRQLAEVGIQEVTIEGGEAFLRPD
WLDIARAITDHGMLCTMTTGGYGLSRETARRMKEAGIAHVSVSVDGLEATHDRIRGRKGSFRFCFETLGHFRE
VGLPFSSNTQVNRLSAPELPALYERLRDAGIRAWQVQLTGPMGNGTDNAWMLLQPAELPDLYRMLARVAL
RVREESRLSLVPGNDVGYFGPYDDLLFSSSGAKVWAGCKAGLSVLGIHADGGIKACPTLPSEFVGGNIRQQPL
ADILETRELTFNVDAGTPEGIAHLWGHCASCRYAEACRGGCSQRAHVLFNRRGNNPYCHHRSLRLAESGVRE
RVVRAAPGTGLPFDHGVFELVEEPLESPWPSDDPHHFTYERVEWPPGWEAFPLPGV
SEQ ID NO: 37 (70% TIGR04103, >WP_002708735)
MSTQLRQRRTYAVWEITLKCNLACQHCGSRAGEARQDELSTAEALDLVQQMAEAGIGEVTLIGGEAFLRKD
WLEIAAEINRCGMICTLTTGGYGISAELARRIKQAGLASVSVSIDGMEASHDAQRGKAGSWKFAFESLQHLRH
AGVPITANSQANRLSAPEFPLFYEKLVEVGVGGWQIAMTVPMGNAADNSWLLLQPAELLVLHPMLAYLARR
GRREGLIMQPGNNVGYYGPYEKLLRSYGSDNDWAFWRGCKAGLALIGIEADGTIKGCPSLPTNAYAGGNIRR
HSLRDIVLNAEKMQINMSTGTEQGTDHMWGFCKSCEYAELCRGGCSWTSHVFFDKRGNNPYCHYRSLVHA
AHGIREDLRIKRNAFGLPFDNGEFEISEKALGAAWSGNEEQRLTPDRIQWPEQWLQEDSELKGFIQNEIDHNI
GNMRNYLGLTRKHKLAV
SEQ ID NO: 38 (70% TIGR04103, >SEA53645)
MTDTQTTTQYTPGERSCYAVWEITLKCNLACSHCGSRAGDARVNELSTAEALDLVHQMADVGITEVTLIGGE
AFLRSDWLEIAAEIVKCGMICSMTTGGYGINLTTAQRMKAAGINQVSVSVDGMRHTHDRLRGKIGSWKYAFE
TMGHLREAGIPFGANTQVNRHSAPEFPLLFQALIDAGAKAWQIQMTVPMGNAADNSDILLQPDELLLFHPLL
ANLAKRGYPQGFYVQPGNNYGYYGPYDRMLRGFGKPTEWSFWQGCFAGLRTIGIEADGTIKGCPSLPTAAYS
GGTIRDASLATILTERDELTFNLSAGTPAATDHLWGFCKTCDFAELCRGACNWTAHVFFNRRGNNPYCYHRSL
VNAANGVRERFALRKAASGLPFDNGIFDLFAEDALSTAAADPMRFTSDKIQWPQAWLAENPKLSSALQHEVE
QNVRDMRNARLSAQI
SEQ ID NO: 39 (70% TIGR04103, >WP_012628459)
MSTQEDYYRTRYAVWELTLKCNLACQHCGSRAGQPRTQELTTAEALDLVQQLAQIGIREVTLIGGEAFLRPD
WLEIAAAIARAGMICNLTTGGYGLSLQLAQAMQRAGIAAVSVSIDGLETTHDRLRGKQGAWHSAFRTMQHL
RQVGVPFACNTQINRLSAPELPRIYEQIRDAGVYAWQIQLTVPMGHAADHWEILLQPCELLDLFPLLAQIAQW
AAQEGVRLYPGNNVGYYGPYESLLRGGGHPGAVWQGCGAGLNTLGIEADGTIKACPSLPTSAYAGGNIRDQ
PLASM MAQSEALRFNFNAGLPEGTAHLWGFCQ.TCEFAALCRGGCNWTAHVFFGRRG NPYCHHRALN LA
RQGLRERLALNIPAPGLPFDHGQFLLFQEPLNAPWPEPDPLYFTADQVQWSSSWTEQPVKVC
SEQ ID NO: 40 (70% TIGR04103, >WP_019503880)
MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEEAFNLVRQLADVGIKEVTLIGGEAFMRSDWLE
IAKAVTEAGMICGMTTGGFGVSLETARKMKEAGIKTVSVSIDGGIPETHDRQRGKKGAWHSAFRTMSHLKEV
GIYFGCNTQINRLSASEFPIIYERIRDAGARAWQIQLTVPMGNAADNADMLLQPYELLDIYPMLARVAKRAKQ
EGVRIQAGN IGYYG PYERLLRGSDEWTFWQGCGAGLNTLGIEADGKIKGCPSLPTAAYTGGNIRDRPLREIV
EQTEELKFNLKAGTEQGTDHMWGFCKTCEFAELCRGGCSWTAHVFFDRRGNNPYCHHRALKQAQKDIRERF
YLKVKAKGNPFDNGEFVIIEEPFNAPLPENDLLHFNSDHIQWPENWQNSESAYALAK
SEQ ID NO: 41 (70% TIGR04103, >WP_015143117)
MTYRRTSYAVWEITLKCNLACSHCGSRAGHTRAKELSTQEALDLVRQMADVGIIEVTLIGGEAFLRPDWLQIA
EAITKAGMLCSMTTGGYGISLETARKMKAAGIASVSVSIDGLEETHDRLRGRKGSWQAAFKTMSHLREVGIFF
GCNTQINRLSAPEFPLIYERIRDAGARAWQIQLTVPMGRAADNANILLQPYELLDLYPMIARVARRARQEGVQ
IQPGNNIGYYGPYERLLRGRGSDSEWAFWQGCAAGLSTLGIEADGAIKGCPSLPTSAYTGGNIREHSLREIVEES
EQLRFNLGAGTSQGTAHLWGFCQTCEFSELCRGGCTWTAHVFFNRRGNNPYCHHRALFQAEQGIRERVVPK
VEAQGLPFDNGEFELIEEPIDAPLPENDPLHFTSDLVQWSASWQEESESIGAVVD
SEQ ID NO: 42 (70% TIGR04103, >WP_047157009)
MNYRISYAVWEITLKCNLACQHCGSRAGHTRTKELSTEEALDMVKQLAEVGITEVTLIGGEAFLRPDWLEIAKA
ITDAGMMCSMTTGGFGITLDTARRMKEAGIRVVSVSVDGLEGTHDRLRGRKGSWQWAFKTIGNLRQVGIFV
GCNTQINRLSAPEFPQIYERIRDAGVFAWQIQLTVPMGNAADNSEILMQPYELLDLYPMIAHVAKRAYKEGV
QVQPGNNIGYYGPYERLLRGQGKDNPWAFWQGCNAGLSTLGIEADGAIKGCPSLPTSVYTGGNIRDYSLRKII
EETEELRFNLGADTPQGTEHLWGFCKGCEFAQLCRGGCSWTAHVFFDKRGNNPYCHHRALTQAKQGIRERV
ELKYRAEGNPFDNGEFALIEEPINAPWPENDPLHFTRDHIQWHGIWQKENKSTPELVAVSK
SEQ ID NO: 43 (70% TIGR04103, >WP_013652855)
MDETVRLARLYDGDPAKLLRTWAGDSPPRIVVWELTLACDLGCRHCGSRAGKARRDELDTQEALDVVRQLA
DLGVAEVILIGGEVYLRDDWFLIAAAVTQAGMTCSLVTGGRGFDAGVVDEALAAGVRIVGVSIDGLPATHDRL
RGVPGSYEAAIATARRIAATGRLTLSVNTQINRLSLPELRAVAERVVELGAVAWQIQLTVALGRAADRPDLLLQ
PWHLLELFPQLVAIKKEILEPGGVQLFPGNNIGYFGPFEAELRYGGDAGHTWMGCGAGRAALGLEADGKLKG
CPSLPTVPYTGGNVRDTPIAELWAHAPEISALGRRTTDDLWGFCGTCRHAAVCKAGCTWTAHALFGRPGNN
PYCHHRAWSLAQTGLRERVVLVEKAPGRPFDHGRYEIVVEPLDAETPDEERLAPVPRARAAALFGLRADAPSA
WSGEDLVEATRSARR
SEQ ID NO: 44 (70% TIGR04103, >WP_068184797)
MPHPPTYDGNPQQLARWRPEHDAPPSIAVWEITLRCDLGCCHCGSRAARARPDELSTTEALDLVRQFADLGL
KEVTLIGGEFYMRDDWDRIAAEINRCGMLCSIVTGARQMTAERVSRAVAAGVGKISISIDGLERTHDAIRGSK
GSWKAATAAARRISDSGIDLSVNTQMNRLTMPELPAVADMLVDIGARSWMVILTAAMGRAADHPSLLLQP
YHLLYLFPLLADIKREKLDPNGIAFFPGNNVGYFGPLAETLRYGSELGHMWGGCGAGDSTLGIEADGRIKGCPS
LPTSDYVRGNIRERPLREIAAELKREKTEAPTQLWGFCQSCRYAARCKGGCTWTSHVLFGRPGNNPFCHFRAL
TMAEAGLVERLEPVAVAPGKPFDFGHCRIVEAPFGPDIESDPLIGMTKLSQVFGLNAGAAGLWSKQELSNTLE
YRQSDKT
SEQ ID NO: 45 (70% TIGR04103, >WP_015929579)
MHRPSDEPTYDGDPRSLARWRPGGSAPPSHAVWEITLRCDLGCRHCGSRAGRARRDELSTDAALDVVAQLA
DLGLREVTLIGGEFYLREDWDRIAAAITRRGMLCSIVTGARQMTRARIARAVAAGVGKISLSIDGLEQTHDSVR
GSAGSWQAAVTAGRRIASSGIDLSVNTQINRLTMPELPGVADLLVEIGARSWMVILTAAMGRAADRRALML
QPYHLLHLFPLLAAIKRERLDPAGIAFFPANNIGYFGPLAETLRYGAEGGHAWAGCDAGVASLGIEADGRLKGC
PSLPSADYTMGNVRDHSLAQLWAKRTPNRPIAAAEDLWGFCWTCPHATRCRGGCTWTSHVLFGRRGNNPF
CHYRALALAERGFAEAIEPVSVAPGEPFDFGRHRIVELPLPTALTDDPVIERTLASHVFGLRPGTASVWSPDERE
EAVI
SEQ ID NO: 46 (70% TIGR04103, >SFE54945)
MSLRDNRRRLPVVASLPRPDDRRALTRVAGEAPRPRYAVWELTLKCDQKCIHCGSRAGVARHGELTTAEALA
LVTDLRALGIGEITLIGGEAYLRDDFILIARAIRNAGMDCTMTTGGLNLGEARVAALAEAGIRSVSVSIDGTQAA
HDALRGVPGSFDRAFAALARLRAAGVGRAVNTQINRLTLPTLEALQERLIAEKIGGWQLQITAPFGNAADHPEI
LLQPFMMLEVFAVIERLIARGAPHGLRLFPANNLGYFGPLEAELRGRQKAGGHYKGCIAGRHALGIEADGTIKG
CPSLGGPANVAGNVRERPLREIWEHAPELQFTRVRTVDDLWGYCRDCYYNDVCMAGCSATSEPLLGRPGNN
PYCHHRALELGRAGLRERIEAVAPAPGVPFDHGLYRVVREALDPELAARGPVAVDDPRVGRDVQPFGPGAPV
G
SEQ ID NO: 47 (70% TIGR04103, >SFE62827)
MGIREERRRLPIVALAPPTRSRRALPIAAGQAPVPRIVVWEFTSACDQHCAHCGPRSGKRRPDELTTEEALRLV
DELAAAGVGEVTLIGGEAYLRPDVLRIVRAIRERGMSCTMTTGGYSLTREIAEALVEAGVQSVSVSIDGLAACH
DALRGRPNSFARAFAALRHLKAAGSQISANTQLNAKTLPDLEGLLELLAAEGIHSWQVQVTMAHGAAADHPE
ILLQPYQMIAAYEVVERLLARCEALGIRLYPGNSLGYFGPLEHRLRRNSTQRGHYFGCQAGISGAAVSSHGEVK
SCPSLGEEGVGGSWREHGFAALWERAPEIVYMRQRTRAELWGLCASCYYAAVCMGGCTSMSEPLLGRPGN
NPMCHHRALELDRQGLRERIEPVRPAPGQPFDHGLFRLILEHKDPELRALHGPLTIEEPRRSRVDEPRGPGSPL
A
SEQ ID NO: 48 (70% TIGR04103, >SFE67100)
MLPPPRGAVQRPALAVWEFTRACDQRCKACGPRAGVARPDELTTDEALRLVDELAELGVGEVALIGGEAYLR
ADVLWVIRRIRERGMSCSLTTGGLGLTQTRAEALVEAGLQLVSVSIDGLEASHDALRGTPGGWRRCFEALAHS
RRAGARIAANTQINRLTWRELLPLCDLLADAGAEVWQMFLTMPHGNAADHPELLLQPFELLELFPELERVIAR
CAARRIRFWPGNNLGYFGPLEGKLRRLQQEDGHYKGCSAGRTG LGIEADGTIKSCPSVGGAVNAGGNWRDH
GLRALWERAPEIRYVEQRGLDSLWGYCRECYYADTCMGGCTAMSEPLLGRPGNNPYCHHRALEMDRMGLR
ERVEQVAGADDQAFAHGLFRVVREPKAAGAG PVTIEGPRTGREAAFFGPGAPLAVAGADEP
SEQ ID NO: 49 (70% TIGR04103, >SFF23105)
MSLSDVRRRLPVVASLPAPANRWLTHEDRREAKAPRWAVWELTLACDQHCAHCGPRAGHKRPDELSTEECL
KVVRELAELGCGEVVLIGGEAYLRNDFILIIRAIREAGMACTMTTGGLNLTQERAEAMIEAGIGSVTFSIDGLEAT
HDRLRGVQGSWQRAFAAMRRIRAAGGKIASNTQINALTRHELLPLFELLADEGIHSWQLQITVPHGNAADHP
EILLQPHMFIDIFATLEQVLDRCEARKVRLWPGNNLGYFGPLERRLRQSQRKHWRGCTAGVSVMGIESDGAIK
NCPSLGGGTNIGGNWRVHGVKKVWEESYQLGYIRARTVDDLWGYCRECYYAETCMAGCTAAAEPLLGRPG
NNPFCHHRALMMDRAGLRERIEQIRGAGGKSFDNGLFRVIREHVDPELREKHGPVAIEEPRVSRLEEPYGAGH
TVAL
SEQ ID NO: 50 (70% TIGR04103, >WP_006972642)
MRLKEVRKRLPVVDSLPKGRGRRFRTHEAEGPVPRPALAVWEFTLACDHRCLHCGPRAGEARPNELTTDEAL
QLVDELAEAGVGEVVLIGGEAYLRNDFLLVIRRIRERGMTCTMTTGGLGLTKTRAEAMVEAGIQSVSVSIDGLE
AAHDKLRNRPGSWEHAFEALRNLRNAGSRVAVNSQINQINLGDHIHLLELIADEGVHSWQLQITVAHGNAA
DNADIILQPYMFLELFDQLDAIIDRAFERRVRIWPANNLGYFGPFEHKLRKSQKAHYRGCSAGRSTIGIESDGNI
KNCPSLGGPANIGGSWREHGLAKIWKEAAEITYIRRRTVDDLWGYCRECYYAETCMSGCTAANEPLLGRPGN
NPFCHHRALEMDRMGMRERIEPFIPAKGVPFDNGLFRLIREWKDPARREAEGPVEVTEPRVSRLIDEMGSGR
AIRMDELVDGRAPFELKDH
SEQ ID NO: 51 (70% TIGR04103, >WP_053236092)
MTHGSIRDPRVTPIGETGLRYVVWELTLRCDLACRHCGSRAG KAREDELSTDEALDVVRQLASMGAREVVLIG
GEAYLRDDWTIVARAIADAGMRCAMVSGGRGLDATRARAAREAGVASVSISIDGIGATHDVQRGLDGAFES
ARVAMRNLRDAGVTLQANTQVNRLSYPELDAILDLLVEERATGWQLAMTVPMGRAADRPDWLLQPHELLE
VYPKLAALAERGARHGVLFFPGNNIGYFGPHEATLRGRGITDDVAWGGCIAGKHAMGIESDGSIKGCPSLPSA
DWVGGTAREASLREIWEQTRELRYVRDRELPGALWGECARCYYASVCGGGCTWTAHTFFGRPGNNPYCHH
RALEMRARGERERLVRVEAAPGAPFDHGRWEIVVEPWVEGEGVARVERPSKRLRVL
SEQ ID NO: 52 (70% TIGR04103, >WP_006969608)
MSTSSDSPARRGPSLPVLDGRGRSDGKLRLPLAEQTRECDTLARPEYAVWELTLRCDLACRHCGSRAGKARPD
ELSTEEALELVTQMAEMGVQETTVIGGEAYLRADWHRIARALTDAGISTTMTTGGRGLDPERVALAKAAGIQ
SVSVSIDGLEAEHDYQRNLKGSYAAAMAALDNLAAAGIPRSVNTQLNGSNLRDIEALLEVIATKGIHSWQIQIT
VAMGRAADHPELLLQPWQMIELMPMAARIARRCRELGIRLWPGSNVGYFGPYEALLRWDHPDGHQTGCE
AGTRTLGIEANGDIKGCPSLPTADYVGANVRDHSLRAIWERSSALRFNRERGTEELWGRCASCYYAPICKAGCT
WTGHVLFGRRGNNPYCHHRALELLREGRRERVELREAASGDPFDHAIYELIEEPWPEPELSRARAVAESGEG
WIR
SEQ ID NO: 53 (70% TIGR04103, >KIG11737)
MSRPSLPIVDASRPKPRVRLPLAPGVRSCDDVRPEYAVWEVTLRCDLACRHCGSRAGHARADELDTEEALDLV
TQMAALGVKETTIIGGEAYLRDDWHLIAAALVNAGIRCTMTTGGRGLTAERVEIAKRAGIESVSVSIDGLAQAH
DHLRALHGSHAAAMRALDHLRAAGIPRSVNTQLNGYNLREIEPLLDQLTAREIHSWQVQITVAMGRAADHPE
LLLQPWQMLELMPLVARLARRCDELGIRLWPGSNIGYFGPYEQLLRWDHRDGHQTGCDAGTRTLGIEANGD
IKGCPSLPSNEYVGGNVREHSLREIWERADALRFNRERRVDELWGRCAGCYYADECKAGCTWTGHVLFGRR
GNNPYCHHRALELLREGRRERLELHTPAPGEPFDHGLYRVIEEPWPAELIDRAREVAASGVGWIS
SEQ ID NO: 54 (70% TIGR04103, >AKV02060)
MELVACSDPTVSIRDPLLAAERAELLRAKRPRVGLPTISKPRRPLPVLSEPRQRDRSIRPRHAVWEITLRCDQAC
RHCGSRAGVERPNELTTEECLDLVRQIAELGVMEVTLIGGEAYLRPDFVQIVRAIRSHGMHCTMTTGGRGLSP
TLAREAAAAGLGSASVSIDGAEETHDRLRGAKGSHRDAIAAMRALREAGVRLTVNTQINRLSLVDLPSILEMLV
REGAEAWQIMLTVAMGRAADEPDVLLQPYDLLDLFPLLDTLAARCEEHGVRLYPGNNLGYFGPYESRLRGTLP
RGHGTSCSAGRGTLGIESDGLVKGCPSLPSEQWGGGTVRDHSLVDLWERASALRYTRDRTVEDLTGFCRTCY
YADICRAGCTWTTSVLFGRPGDNPYCHHRALERDREGLRERLVRRQPAPGEPFDHGIFELIVEPVPGEKEANS
SEQ ID NO: 55 (70% TIGR04103, >AKU99181)
MEKFDPLTAKARAKELQRERPKRALPIAPTAPLGVRHRPLREPRDVDRRYRPIYAVWEITLACDLACRHCGSRA
GRERPDELDTKEALDLVGQMASLGVKEVTLIGGEAYLRGDWLDIVRAIRAHGMVATMTSGGRGLTPELVAQ
AHEAGLVGASISLDGDEVTHDRLRGVKGSYRAAIEALRALRERNMRVSCNSQINRLSVPYLDFILESIAAIGVHS
WQIQLTVPMGRAADEPDVLLQPYDLLELFPRLAELKKRCDELAVRMLPGNNIGYFGPYESTLRGYHVSGHAGS
CGAGRATLGIEANGAIKGCPSLPTEHWTGGNVRDASLLDIWERAEPLRYTRDRTVDDLWGFCRTCYYAEECAS
GCTWTSFVTLGKAGNNPYCHHRALELEKRGKRERVVRVQSAPGEPFDHGVFALVEEDLESGSETL
SEQ ID NO: 56 (80% TIGR03913, >WP_052261552)
MAATRFLSKSDVETRRPVYVVWETTLACNLKCKHCGSRAGTPRKDELSTEDAFGLIDELADLGTREITLIGGELF
LRKDWLKLVERISSKGILCTMQSGGFHLTRERLKDAKEAGLAAIGISIDGLEKTHNSLRGVRTSFQHALRSLEAA
RDLHITSNVNTTITSKNIDELELLYEVLKGFHVRNWQFQIVVAMGNAADEETLLFQPYQITKVLDSVARIRTSAS
RVGMLVQASNALGYFGPYEWLWERGSDTDSHWSSCGAGQSTMGIEADGTIKACPSLATEDFGVTPSEYDTL
QDLWIGSDRIRFNETRDPPSGSICGACYYWAACKGGCSWATHSLTGTVGENPYCHYRAICLSELGLKEKIRKVR
DAPGSSFDRGEFEVVIEDEEGKTLKSDDPRKHRIIEKISAGREQGASEEALKLCTSCHQFAWEYEKQCPFCSGTK
FVSTRALKSIKSATKFL
SEQ ID NO: 57 (80% TIGR03913, >WP_046757912)
MEALSSKRYRVRDDFKTARPVYVVWEITLACNLKCTHCGSRAGKVRPGELTTEQCFEVVDSLKKLGTREISIIGG
EAFLRKDWLDIIRRIDGHGMECSMQTGAYNLTEKRITDAKEAGIKNIGVSIDGLPDVHNEIRGRKDSFEQAITCL
GLLKKHNIVSSVNTTITKKNKNQLNELLDILIANGVKNWQVQLVVAMGNAVDHSDELIIQPYELIEFYDNLIQIY
RRALANNVLVQAGNNIGYFGPYEHIWRQGSVGYYSGCGAGHTALGIEADGVVKGCPSLPTTDYTGGNIKNM
SIEDIWNYSEEMFFSRYRNKNEMWGGCKGCYYETSCLAGCSWTSHVLFGKRGNNPFCHHRALELHKKGQKE
RIRKIKEAEGTSFDYGLYEVVVEDFNGNIIEVQTPTSTPTIVNEYTTRIPRDPEPLKLCQSCKNHIYESSEDCDFCG
ENFEASTAKYEKNLAHAKQVLIRLENLINLDKVNDE
SEQ ID NO: 58 (80% TIGR03913, >WP_024981470)
MEKTSTRYRVRDDYKTATPVHVVWEITLACNLKCSHCGSRAGKVRPGELTTEQCFDVIDSLKRLGTREITIIGGE
AFLRKDWLDIIQRIHQSGMECSMQSGAYNLTEQRIIDAKKAGIRNIGVSLDGLPATHNKIRGRSDSFDHVINCL
QLLKQHNIPSSVNTVVTKRSKNELKELLDVLIENGVKNWQIQLAVAMGNAVDNSEELIVQPYELIDFYDKLIVIY
RKALANNVLIQAGNNIGYFGPYEHIWRQGNEKYYTGCSAGHTGIGIEADGKIKGCPSLPTTEYTGGNVKDMKL
EDIWKYSEEMVFSRYRNKEELWGGCKGCYYESSCLAGCTWTSHVLFGKRGNNPFCHYRALELKKKGLHERIRK
IQEAPGLSFDIGLFEIVVENEKGEIVEVQSPNDVAPISVLSNSDRIPRIPKALKMCNGCDNYVYEEEVICTFCKSDI
QKVNEEYAQKLQRAIVSLEKLELLMMK
SEQ ID NO: 59 (80% TIGR03913, >WP_063775385)
MRSARDRKTHVPVHVVWEITLACNLKCGHCGSRAGKRRANELSTAECLDVVRQLAAVGTREITLIGGEAYLRK
DWLEIAAEIARLGMHCGLQTGARGLTRERIAAAYAAGVRAIGISLDGLRDLHDELRGVKGSHDQAIQAIKWVS
EVGIEPGVNSQINLRSMRELDGIFDEIVAAGAKYWQVQLTVAMGNAVDNSEMLLQPHQIVDVVDKLAELYH
RGRDVGLRLLPGNSIGYFGRHEAYWRSLTDDVTHWGGCTAGETTLGLEADGTIKSCPSLPKSHFAGGETRTSTI
EEALKALESRNVRRDGNRGRSFCGSCYYWNVCRGGCTWVSHVLEGRRGDNPYCYYRATTLARRGLRERIVKV
ADAPDEPFAVGRFEVRLEHEDGSRAPRSIGLDDKPRKRGRLGLCQSCYEYMSLGERHCPHCGEVNRPKLMVD
AKLELDVQIALDEIERHGRSILELAAAAGQDPQAAE
SEQ ID NO: 60 (80% TIGR03913, >OCW56221)
MPVHVVWEITLACNLACGHCGSRAGARRPDELTTAECFDIIRQLREAGTREITLIGGEAYLRKDWLEIAAEISRL
GMLCGLQTGARGLTKARIEAAYDAGVRAIGVSIDGPRDIHDRLRGVAGSHDQAMRAIADIAETGIRPGVNTQI
NALSAPHLWDIYRAIRDAGARSWQTQLTVAMGNAVDNAEMLLQPHLIVDVVEDLYAIFEDGLLHDFRVLPG
NSIGYFGRHEAQWRSITEAAEPWQGCTAG ETTLGLEADGTIKGCPSLTRESYSGGNSRTVSIAEAIGNLADRTV
RRDGNPGGGFCATCYYYDFCQAGCTWVTHSLTGRRGDNPFCVYRAGKLRDAGLRERIVKTAEAPDTPFAIGK
FDIVLETLDGKPGPATVADTAPRSVGATHLVVCEQCGQFIANTEPVCVHCHAEQRAAARSRLERVQDVHNLL
AEIDSHSHGIHALVDEISAR
SEQ ID NO: 61 (80% TIGR03913, >AKI02186)
MDAPLRYRTQDDTGSTTPVHVVWEITLACNLSCGHCGSRAGSRRPNELTTLECFDIIHQLRDAGTREITLIGGE
AYLRKDWLEIAAEITRSGILCGMQTGARGLTRPRVQAAYDAGIRAIGVSIDGPKDMHDRLRGFDGSYDQAMQ
AIGYIAETGIRPGVNTQI NVLSAPYLWEIYGEILKAGAKSWQTQLTVAMGNAVDSAHILLQPYQIIDVIDDLYSIY
EDGLLNDFRLLPGNSIGYFGKYEAHWRSITRTAEAWQGCTAGTTTLGLEADGTIKGCPSLSKDSYSGGVSREVS
LAEAISNLSGRTVKRDGNPGRGFCKTCYYYEFCQGGCTWVTHSLTGERGDNPYCYYRASKLRKAGLRESIVKTA
EAAQTPFAIGKFAILIETLEGKPDSVTIEDDAPKPDHSTHLVICENCEEFIGNNEPVCVHCNTEVRSAAQTRLAHT
QEINNLVAEIEMHSRQIQSVIDTIGDR
SEQ ID NO: 62 (80% TIGR03913, >WP_060978522)
MVPQAELPHARFFGSADQRRRVPVSVVWELTRACDLSCCHCGSRAGRRAREELSSVECLDLVDQLAELGARD
IGLIGGEVYLRRDWLEIVRRIRQAGMDCSVQTGGRFFTPATLDAAIAAGVMSIGVSLDGVGQTHDLQRGVPG
SFEAALALLHLVANRPIMASVNTQINRHTMQQLPELLDVLIAAKVSNWQLALTVAMGNAADNDDLLLQPYDL
LALFPLLASLHDRAAENGIVLQPSNNIGYFGPYEEKLRLIGDIAAHWTGCSAGDNVLGIEADGNIKGCPGLSRDY
VGGNVRDEPLRVIWDRAERLAFTHRATKSDLWGYCAQCYYAEICMAGCTWTAHSLMGRPGNNPMCHYRA
LEFARRGKRERLVKVADAPGHAFDYGRHALVVEDRPLSDQAVLS
SEQ ID NO: 63 (80% TIGR03913, >SAL02149)
MELKISHHDAPVRVLRPSDRGGRTPVYAVWELTLQCNLACSHCGSRAGKKRTGELSTSEALALVEDLRGLGVR
ELALIGGEAYLRGDWIEIVRRARDLGIRPVLQSGGYGFTATLAQRAKEAGLAALGISVDGLEPIHDDIRGVVGSY
RAAFEALNAAREVGLTVTANTQIHAKNWRSLPDIYTDLRKAQIAAWQLQLTVAMGNAADNDDLLLQPFHLY
ELFPVIAALTDMAKTENVQIYPGNNIGFFGPYEHLWRGPHRFDNHYKGCQAGINTIGIEADGTIKGCPSLATSR
YSEGNVRVVPIKEIWNGNLAFPLNRDWDLKLWGFCKTCYYAKVCRGGCNWTSDSLFGKPGNNPYCHHRVLS
LKKLG KRE R VVKVSE ADKSSFGTG LFKLI EESW
SEQ ID NO: 64 (80% TIGR03913, >WP_043629091)
MGDAGGRARPARPNRYLSGPDVRGHQLVHAVWELTLACNLKCRHCGSRAGSVRPEEMTTQECLGVVRQLH
ELGAREVTLIGGEAYLRKDWVEIVRSISDAGMECTLQTGAWQLTGTRIAQAADAGLVACGVSIDGLAPLHDHL
RGRPGSFDAAIDALGRLREHGIRTSVNTQITAAVIPQLRDLFREFLATGVRNWQVQLTVAMGRAADNHDLLM
QPYQMNELMPLLAELHAAGVAKGLLLQPGNNIGYFGPYEAQLRGSGTASMHWAGCFAGRNVLGIEADGTIK
GCPSLPTVTYAGGNVRDSSIAEIWASASQLSFARKSRGKEMWGFCASCYYADVCEGGCTWTSHSLLGRPGNN
PYCHYRASELKKQGKRERVVRRLPAPGTSFDHGRFEIVVEPDPDTAGEHRAAGASPQERSRPGPPPAPGDAG
SHLPSLVLCRACDCHVFAGTDFCPHCGADVPASQREYESALADARHAASRLARAIGSLGPELPATTTPRA
SEQ ID NO: 65 (80% TIGR03913, >WP_050043969)
MDQAPDDTPRRRLSLSDQLDCVPVYVVWELTLACNLKCIHCGSRAGHKRAKELTTDECVDVIRQLAELGTREI
SVIGGEAFIRRDWLTII AIRDHGIDCTMQTGGYKLSGDMIRSAADAGLLG LGVSVDGLEPLHDRLRGVRGSYR
EALRVLDDCRRLGLTASVNTQITSAVMSELPQVMEIIIDAGAKYWQVQLTVAMGNAVDHDDILLQPYDLTTL
MPLLAELHLKGRERNLVLLPGNNVGYFGPYDVLWRGPDRGYYSGCPAGQNVIGLEADGTVKGCPSLATDRYG
AGDVRSASIAELWATHPALQFNRNRGTDDLWGFCRDCYYAEACRGGCTWTADSLLGRRGNNPYCHHRVLT
LAERGIRERIVKIEDAPALPFATGRFALIEEPLPAGAPHA
SEQ ID NO: 66 (80% TIGR03913, >WP_020539729)
MSVEEDGLARRVARERDFRDKVPVHVVWELTLACNLKCLHCGSRAGPRRPAELTTEEALDLVAQLAGLGTRE
LTVIGGEAYLRNDWVEIIRRATELGMSCSMQTGARALTPARLRAGADAGLRGIGVSIDGLRDLHDEVRGVPG
AYANAFKVLRDAREAGLRVSVNTQIGARTIGELPALMDELVAAGVTHWQVQLTVAMGNAADHDELLLQPYR
LAELMPLLARLHHEGQRHGLLMVPGNNIGYFGPYEHLWRNASSMSGHWSGCEAGHTALGIEADGTVKGCP
SLPTSAYTAGNVRDLSVADMWRDSPALGFRRGRAGADELWGHCGTCYYADVCDAGCTWTAHSLLGRPGN
NPYCHHRVLELAKKGIRERIVKVADASAAPFGIGEFALVEEAIPADPRPADPRPAVPGRAERAPGDRGVPELRL
CGDCAHFVMAGDAACPFCSADVAEAEERARAVHRRRTELMDQVKALMRQET
SEQ ID NO: 67 (80% TIGR03913, >KYF96555)
MSRAVRDRETDDFRRNVPVHAVWEITLACDLKCRHCGSRAGARRDRELDTAECLEVVAALARLGTRELSLIGG
EAYLRSDWIDIIRAARAARMRCAVQTGGRNLTEARLAAAVGAGLQAVGVSIDGLAPLHDELRGVPGSFSRAID
AVRRARAHGIAASVNTQIGARTMDDLPALLDAIAAAGATHWQIQLTVAMGNAVDNDAVLLQPYQLLELMPL
LDRLYREGLDRGLLMVVGNNIGYFGPFEHRLRGVGDESVHWTGCAAGQNVIGLEADGTVKGCPSLRTSGYA
GGNVRDLRLEDIWNHAEEIHFGRLATVDALWGFCRTCYYADVCRGGCTWTADSLFGRPGNNPYCHHRALEL
DRRSLRERVVKKREAPGAPFAIGEFELIVEPVPGREGASPAPPVTS
SEQ ID NO: 68 (80% TIGR03913, >WP_013570078)
MLLSEPRKMETPPRYLTNQDFQRYIPVHVVWEITLACDLKCMHCGSRAGHRRPDELTTQECLDVVDALARLG
TREVTLIGGEAYLRKDWTQIIRRCADHGIYCATQTGGRNFTQQRLQEAIDAGMNGLGVSLDGLEPLHDRLRG
VPGSFRQALDTLQRAHDAGLSISVNTQIGAEIIPQLPELMEIILGAGAKQWQIQITVAMGNAVDHPELLLQPH
RVLELMPMLARLYQEAAERGMLMVTGNNIGYFGPYEHIWRGFGDERVHWSGCNAGHTGLALEADGTVKG
CPSLATVGFSGGNVRDLTLEHIWNHSKEIHFGRLRSIEDLWGFCRTCYYAEVCRGGCTWTSHSLLGKPGNNPY
CHYRALELEKQGLRERVVKIRDAAPDSFAVGEFELITERIEDGTVAASSVSQKQLVQLGLSNDERWSPREGRVP
PTLKLCRACNEYVWPHEVDCPHCGENIAASLAQYQEDTRRRREAMDAVRALIESYRAAETSVVPST
SEQ ID NO: 69 (80% TIGR03913, >AAY91426)
MGWCRAPPGAQGTLMSDRLPARYLSETDLKRFVPVHVVWEITLACDLKCLHCGSRAGHRRPGELNTQECLS
VIDSIAALGTREVTLIGGEAYLRKDWTRLIQAIHDHGMYVAIQTGGRNLTPAKMQAAVDAGLNGVGVSLDGL
APLHDAVRNVPGSFDKALDTLRRAKQAGLKVSVNTQIGAATLPDLPALMELIIDAGASHWQIQLTVAMGNAV
DHPELLLQPYQLLEVMPLLARLYREGAERGLLMNVGNNIGYYGPYEHMWRGFGDDRVHWSGCAAGQTVLA
LEADGTVKGCPSLATVGFSGGNVRSMSLHDIWHYSEGIHFGRLRSVDDLWGYCRTCYYNDVCRGGCTWTSH
SLLGKPGNNPYCHYRTLDLAKKGLRERIVKLEDAGPASFSVGRFDLITERIDTGEAVSSVNDSGQVIKLAWVNQ
GQASPEEGRIPPRLALCRSCLEYIHAHESTCPHCNADVAAAEARHQEDRLRQQALINTLHQLLGTPQEQPRL
SEQ ID NO: 70 (80% TIGR03913, >WP_006753568)
MDENQVRPVRFLSEQDYERCVPVHVVWEITLACDLKCLHCGSRAGHRRTNELSTGECLEVIGALARLGTREVS
MIGGEAYLRKDWAQLIKAIRSHGMYCAVQTGGRNLTPARLAQAVEAGLNGLGVSLDGLAPLHDKVRNVPGS
FDRAVDTLKRARAHGLAISVNTQIGSATMRDLPALMDSIIEIGATHWQIQLTVAMGNAVDHDELLLQPYQLEE
LMPLLADLYKRGLDRGLLMNVGNNIGYFGPHEHLWRGFGDERVHWTGCAAGQTVIALEADGTVKGCPSLAT
VGFAGGNVRDLSLEEIWRTSEAIHFGRLRSVDDLWGFCRTCYYADVCRGGCTWTSHSLLGKPGNNPYCHYRV
LELKKQGLRERIEKIEDAAPASFAVGRFDLVTERISDGTPVSSISRSGQTVELAWKHKGKRAPEVGRVPPRLVVC
RACNSYVHQHESRCPHCGADIAAAERAYEHDAQRRHALIQEVERLLS
SEQ ID NO: 71 (80% TIGR03913, >WP_043628614)
MSDSQEKRPARYLSREDFERYNPVHVVWEITLACDLKCLHCGSRAGHRRPSELSTAECLQVIDALAKLGTREITL
IGGEAYLRKDWTQLIRAIRGHGIYCATQTGGRNLTPAKLQEAVDAGLNGVGVSLDGLAPLHDKLRNVPGSFDK
ASDALRRAKAAGLAVSVNTQIGAATMPDLPELMDHIIELGATHWQIQLTVAMGNAVDNDEVLLQPYRVLEL
MPLLARLYQEGLERGLLMTIGNNIGYYGPYEHIWRGFGDDRVHWAGCGAGQTVMALEADGTVKGCPSLAT
VGFAGGNVRDMALEDIWRHSEGIHFGRLRSVDDLWGYCRGCYYNDVCRGGCTWTSHSLLGKPGNNPYCHY
RALDLQQRGLRERIVKLQDAAQDSFAVGRFDLITEEIASGKVVSQISRSGQVIELSWKNRGKKAPETGRPPARL
ALCSNCRQYIHQHETTCPHCKGDVIAAERLHRQKMAERDEAIQRLRSLLGEVS
SEQ ID NO: 72 (80% TIGR03913, >WP_062765523)
MPDGQTPAERAIPRARSLDDIVDLVPVH VWEITLACNLKCQHCGSRAGHVRKGELSTAECLDLVDQMARLG
TREVTLIGGEAYLRRDWLEILARVRAHGILCLIQTGGRNLTDKRLEAAIAAGINGIGVSIDGLAPLHDRLRGVPGS
FDQAMSALTRAKAAGLSISVNTQIGAETMEDLPELMDRIIAAGATHWQIQMTVAMGNAVDNPDILLQPYRLI
ELMPLLARLYREGARRGLTMVVGNNIGYFGPYESLWRGFGNEAVHWTGCAAGQNVIGIEADGTIKGCPSLAT
HGYAAGNIRDLALDDIWRNQEAMAFGRTRSVEKDLWGYCRSCYYADVCRAGCTWTSDSLLGRRGNNPYCH
YRVRDLAKKGRRERVVKIREAGPESFAVGEFALIEEAIPPEELAALPPDRLPGFSATDGRPVHIAPEDDDDPARA
AGARGRIPPHLDLCRSCHEYVWPHETDCPHCGADIAAAAAAYAARLAEVQALAARIRARLDEAAPGI
SEQ ID NO: 73 (80% TIGR03913, >WP_014249603)
MDQPLPSPDLSARSPEQLPEQPASVPARYCFDDDFRKLVPVLVVWETTLACNLKCQHCGSRAGRPRPDELTTE
EALDLVDRLAALGTREISLIGGEAYLRKDWVEIIRRCRSHGIRTAVQTGGRNLTDRRLDEAVAAGLQAIGVSIDG
LPDLHDRVRGVSGSYDQAMSALRRAKDRGLAVSVNTQIGPETPDHLPELMNRIIEAGATHWQIQFTVAMGN
AVDNPDLLLQPHRLLDVMPLLARLYREGLDRGLLMVVGNNVGYYGPYEHIWRGLGDDRMHWTGCAAGQI
GIGIEADGTLKGCPSLATSLYAAGNIREMSVEDIWRHADRMQFGRLRSVDELWGYCRTCYYADICRAGCTWT
SESLLGKRGNNPYCHYRVLDLAKHGLRERVVKIKDAPNEAFAIGEFALIQEPIPGAESPGLPERDPAKVHRHEYE
RSLEGGVVPPSLTLCRACNQYVWPHETDCPHCGADVAAAAAQHGIDSARRRALIRETQRLLDEARAAKLEAA
GAPSPATAAGD
SEQ ID NO: 74 (80% TIGR03913, >WP_045586236)
MDQTIPGLVPARSPSEELPQDHVPARFCDAEDYRRLVPVHVVWEITLACNLKCQHCGSRAGRPRPDELDTGE
ALDLVDRLAALGTREISLIGGEAYLRRDWLEIVRRCRSHGMRTSMQTGARNLTDARIDAAAEAGLQAIGVSID
GMPELHDRVRGVPGSYEQAIGALRRAKARGLAVSANTQIGPETPDHLPAIMDAIIEAGATHWQIQFTVAMG
NAVDNPDULQPHRUEVMPLLARLYREGLDRGLLLVMGNNVGYYGPYERLWRGFGDESQHWSGCSAGQTG
IGIEADGTIKGCPSLATSLYASGNIRDMTLEDIWRLSDRMAFARTRSVDELWGYCRTCYYADACRAGCTWTSE
SLLGKRGNNPYCHYRVLDLAKHGLRERVVKIKDAPKEAFAIGEFALITEPIPGADSPGLPERDPAKVHRHSCERS
AEGGVVPPSLTLCRSCNQYIWPHETDCPHCGADVAAAAARHDIDSARRRALIRETQRLLDEARAAKAAAKEA
VKEASTVAGAVSPS
SEQ ID NO: 75 (80% TIGR03913, >WP_004273211)
METSASALPHAPQEAPVRFRRMEDHHNLVPVHVVWEITLACNLKCQHCGSRAGRPRADELTTAEALDLVDQ
LAALGTREMTLIGGEAYLRRDWIDIVRRCREHGMRTAIQTGARNLTDARLEQAVDAGLQGLGVSIDGLPDLH
DRVRGVPGSYDQAISALRRAKVLGLDVSVNTQIGPETPAHLPDLMDRIIEAGATHWQIQLTVAMGNAVDNP
DLLLQPYQLIDVMPLLARLYQEGVERGLLMIVGNNIGYFGPYERMWRGYGDETMHWTGCAAGQTTIGIEAD
GTIKGCPSLATSLYSAGHVRDMTVEDIWRHTERISFGRLRSVEEMWGYCRTCYYADACRAGCTWTSESLLGRR
GNNPYCHHRVLDLAKHGLRERVVKVKEAPPESFAIGEFALITEAIPGVADAPPLPARDPAHVHIHPRERAAGG
GTTPPKLEVCRGCDQYIWPHEEACPHCGSDVAAEAARHAEDTARRRALINQAKRFLAGVRTASVAE
SEQ ID NO: 76 (80% TIGR03913, >WP_056720094)
MVGKPVRYRVDSDYTELVPVHVVWEVTLACNLKCQHCGSRAGRPRSDELNTAEALELVDHLAALGTRELTLIG
GEAYLRKDWIELIRRSRDHGIRTAIQTGARNLTDKRLDEAAEAGLQGVGVSIDGLPELHDRVRGVPGSYDQAID
ALKRAKARGMAVSVNTQIGAETAEHLPELMDRIIEAGATHWQIQLTVAMGNAVDNPELLLQPYKLIEVMPLL
ARLYREGEARGLLMVVGNNVGYFGPYEHIWRGFGDDADHWNGCSAGQTGIGIEADGTIKGCPSLATSLYAA
GNVREMSVGDIWRHSEKMSFGRLRDVEELWGYCRTCYYADVCRAGCTWTSESLLGKRGNNPYCHHRVLDL
AKHGLRERVVKIREAPQESFAVGEFALITEAIPGAEVTPLPVRDPAAVHVARRERSAAGGALPPSLKPCRSCQQ
YIWPHEVTCPHCDADVATAATAHEAEQTRRRALIADATRILAKAAQARAARLETGA
SEQ ID NO: 77 (80% TIGR04103, >WP_020737613)
MIEPTEIPVRALMPADLMEAKPIYAVWELTMKCDQPCQHCGSRAGAARDAELSTEEVLEVAASLARLGCREV
ALIGGEAYLREDLAEIVSFLARSGMRVIMQTGGRAFTAERAKALRAAGLTGLGVSVDGPAHIHDELRGNVGSH
AAAIRALDNARAAGLITTANTQINRLNAHLLRETCAELRSHGIQTWQVQITVPMGRAADHPEWILEPWRVVE
VIDTLAAIQREALETHVSGVPFNVFANNNIGYFGPHEQLLRSRPGGGDAHWRGCRGGINAIGIESDGTVKACP
SVPTVPYAGGNVRERGLEHIWEGSAEVRFARDRDASELWGHCATCYYADECRAGCSWTAHCTLGRRGNNPF
CYHRVTQLKRRGIRERLVMKQRAPHVPYDHGVFELVEEAWDAPPPPPPEPVVPRNARRRLAVV
SEQ ID NO: 78 (80% TIGR04103, >SFD76092)
MNSPARALRPDDLRQPRPIYVVWETTLRCDHECAHCGSRAGDARPDELSTEELLEVADALVRLGSREVTLIGG
EAYLRGDCYRLIEHMTKAGIRVTMQTGGRGLTQDRCRKLREAGLAAIGVSVDGPEAAHDTLRASPGSHAAAL
KGIRNAREAGLLVTSNSQINRLNKDVLRETAELLADAGVAVWRAQMTAPMGRAADRPDWLLEPYMVLEVID
TLADIQRWAQRRAADRGIPWERAFHVRLGNNLGYFGPHEQLLRTRPGSPDSYWQGCSAGKFVMGIESDGTI
KGCPSLPTAPYTGGNVKTAALADIWNEAPEIAFARDRGTSELWGFCKSCYYAEVCRAGCSFTAHSAIGKRGNN
PFCYYRATQMKRKGLRERVVLRQAAPGDPYDFGQYEIVEEPWDSAPPRAAVRLPVLAG
SEQ ID NO: 79 (80% TIGR04103, >KIG18351)
MSSSRTIREHEVDQPRPIYTVWEITLRCDHACAHCGSRAGPVRDDELDTAELLAVADALVELGSREVTLIGGEA
YLRSDVYQLVEHLAKAGVRVTMQTGGRGLTAARAQRLRDAGLAAVGVSIDGTAAVHDRLRASPGSHDAAM
RAIEHARAAGMVVTSNSQINQLNMHELPAIAAELEAAGVLVWRGQLTAPMGRAADHPEWIVQPYMVLEIID
TLAQIQAGASARAQARGASEMESFRVTLGNNLGYYGPHEPLLRSRPDRRDRFFPGCQAGRYVLGIESDGTVK
GCPSLPTAPYQGGNVRELSLEQIWDSEAIRFTRDRSTDELWGHCASCYYADVCRAGCSFTSHSTLGRRGNNPF
CYYRADKLRKQGLREVIVHARAAPGSPYDFGGFELREQPWSDLPPPAGRRSLPWTS
SEQ ID NO: 80 (80% TIGR04103, >WP_006974888)
MVSPRSIRPEERDTPRPVYVVWEITLRCDHACAHCGSRAGPVREDELSTEELFEVADSLARLGAREVTLIGGEA
YLRSDVYALIAHLRGHGLRVTMQTGGRGLTEGRARKLEEAGLAAVGVSVDGTAETHDTLRASPGSHAAAIQAI
HNARAAGLLVTSNSQINRLNMHELPAIAAELEAAGALVWRAQLTAPMGRAADRPGWIVQPYMVLDIIDTLA
EIQAGAQARASAAGLDPMRAFRVTLGNNLGYYGRHEGKLRSRPDRSDRYFTGCQAGRFVMGIESDGVIKGC
PSLPTAPYVGGNVRDLDIETIWADAAPIRFTRDRDTSELWGHCASCYYADTCMAGCSFTSHSTLGRRGNNPFC
YYRADKLRREGLREVLVHAEAAPNSPYDFGRFELREQPWSDPVPAPTRKRSLPVV
SEQ ID NO: 81 (80% TIGR04103, >WP_006974883)
MAGRRLDVLSTDFFPAYVVWELTLRCDLACRHCGSRAGPARPVELTTSEAVAVAEELGRMGAREVVLIGGEA
YLHEGFLEVVEALARAGVVPVMTTGGRGVDEALARAMAEAGLRRVSVSIDGLEPTHDRMRGFRGSFAAALA
ALDHCAAAGLSISANTNLNRLNWGDLEALYEQHLRGRVRSWQLQITTPLGRAADRTAMIFQPFDLLELLPRVA
ALKRRAFAEGVLILPGNNLGYFGPEEGLLRSQTPEGTDHWQGCQAGRFVMGIESDGAVKGCPSLQTAAYVG
GKVLERGLAEIWNEAPQLAFTRERSVEDLWGYCRGCVFAKTCLGGCSFTAHAVFGRPGNNPYCHYRARDFAK
RGLRERLVPTEAAEGTPFDNGLFEVVEEPLDAPDPEAALEPRELVQITRRPGSTPAQTPRDPAGGGA
SEQ ID NO: 82 (80% TIGR04103, >WP_012234464)
MWRSAQRARARARARARARARARAQVIIHPPRDRKCTGEQEPAAEFAVGSGMALRRLDVVPGEYFPAYVV
WELTLRCDQPCRHCGSRAGAARPSELGTDEALGVVRQLAAMGAREVVLIGGEAYLHDGFLEIIAALKAAGVRP
TMTTGGRGITAEIAAQLKEAGLHSVSVSVDGLERAHDLIRKAPGSHGSALAALGHLRSAGLLTAANTNLNRVN
QGDLEALYDLLREQGIKAWQVQITAALGRAADRPAMLLQPYDLLDVLPRIAELKRRAFRDGITVMPGNNLGYF
GPEEALLRSLREGGRDHFRGCQAGKLVLGIESDGAVKGCPSLQSDAYVGGDLRGRALQEIWDEAPRLAFARA
RTADDLWGFCRSCAFAEVCMGGCTFTAHALFGRPGNNPYCHFRARTLAAQGKRERLVPAEPAAGRPFDNGL
FELVLEDLDGPEPGLDSPEQLVQLTRKPRPSS
SEQ ID NO: 83 (80% TIGR04103, >WP_002625456)
MSMLRRLDVTRPDHHPAYVVWELTLRCDQPCTHCGSRAGTQRPDELSTAEALDVVRQLREMRTREVVLIGG
EAYLHPGFLDIIRALKEAGIRPGLTTGGRGMTEALARQVAEAGLYAASVSIDGLEPTHDLMRAAPGSFASATAA
LGFLSAAGVRVAVNTNFNRLNQADLEPLYEHLKGLGPRAWQLQITAPLGRAADRPALLLQPWDLLDLLPRIAA
LKQRAFADGITLMPGNNLGYFGPEEGVLRSPGPDASDHWRGCMAGRYVMGIESNGAVKGCPSLQTAHYVG
GNLRERPLRDLWDNAPPLAFTRTRTVEDLWGFCRTCPFASTCMAGCSFTAHALFGRPGNNPYCHYRARTLAK
QGVRERLVPQAPAPGKPFDHGLFDLVVEPLDAPDPRPPTPRMLVKRLKWPEAHTPQAVGARTDPTG
SEQ ID NO: 84 (80% TIGR04103, >WP_044192718)
MVLRRIDTAPEDFHPAYVVWELTLACDQPCTHCGSRAGTARAGELSTEEALGVVAQLAAMRAREVVLIGGEA
YLHPGFLDIIRALKAAGLRPMLTTGGRGITAELAGEMARAGLHGASVSVDGLEETHDLMRAARGSFASATAAL
GHLKAAGVRTAANTNLNRLNQGDLEELYAHLKAQGIGAWQLQITAPLGRAADRPDLLLQPWDLIELLPRIARL
KEQAYRERILIMPGNNLGYFGTEEALLRSVQAGGRDHWRGCQAGRYVLGIESNGAVKGCPSLQTAHYVGGN
LREQPLERIWNDSSELAFTRARTVEDLWGFCRTCPFAEVCMGGCSFTAHSLFGRPGNNPYCHYRAKTLASRG
QRERLLPKAPAPGRPFDHGLYEIVPESLDAPDPKPELKREYVKRRRWP
SEQ ID NO: 85 (80% TIGR04103, >WP_010607032)
MTNTLENAGVKVREYDRSTYAVWEITLKCNLACSHCGSRAGDKRADELSTTEAFDLIKQMADLGVKEVTLIGG
EAYLRPDWLMIASEIKNKGMRVTMTTGGYGISRGTAKRMAQAGIEAVSVSIDGLEDEHNSIRGKSDSWSQCF
TTLSQFKDLGVHTGVNTTVTRKSAKDLPLLYEKLIDVGVKNWRIQLAVPMGNAADNNEMLMQPYELLDLYPL
LGLLSVRGRKDDLIIQPGNNVGYFGPYERLLRGTFMQESKYSFYTGCVAGQGAIGIEADGKIKGCPSLPSEEYTG
GNIRERTLKDIYENAPELNFNSQEMDDASIAHLWGNCKGCKYAKLCRAGCNWTAHVFFGKRGNNPYCHHRA
LTLAASGLRERFQQRVAASGIPFDHGVFEIYEESISATSNDNVNRFTIQGINFPPSWLATDSNLRERLNSEKLNAI
HQYRALGLAKAV
SEQ ID NO: 86 (80% TIGR04103, >WP_010607027)
MIVLVARYAPPIRLELLGVKNMSVNMDSAGIKIKKEARQTYAVWEITLKCNLACSHCGSRAGDSRVNELSTTEA
LDLVHQMADLGIKEVSLIGGEAFMRPDWLMIAAEITRLGMKASMTTGGFGISEGTAKRMKQAGISTVSVSID
GLEKEHDLLRGKVGAWKQCFLTIERLTNVGINVGCNTQINRYSAKQLPLLYQKLVDVGARAWQLQLTVPMG
NAADNDEMLLQPNELLDVFPLINFLSVRGRRDGLAVQAGNNIGYFGPYERQLRDNKSTHSEWAFYRGCGAG
QNTLGIEADGSIKGCPSLPTNAYTGGNIRERSLRDIYENTDELRFNDINKPEDATKHLWGECATCEFAKVCRGG
CNWTSHVFFGKRGNNPYCHHRAVKMAVRGKQERFFIREKASGDPFDHGVFDLVVEDFKPLDPQDTSVFSLA
QAQFPENWLEADPNLVRRLFTERGLVMKQYVDSGIVPKEESPWFDKTQREALMSSAVPA
SEQ ID NO: 87 (80% TIGR04103, >WP_054014533)
MTDLTNRSDIRINAAYRQTYAVWEITLKCNLACNHCGSRAGDARVDELSTSEALDLVAQMADLGIKEVTLIGG
EAFMRPDWLQIAAEITKKGMKATMTTGGYGISLGTAKRMKEAGIAAVSLSIDGMERSHDLLRGKQGAWQKC
FETIAHLREAGIPVGCNSQVNRESIAELPSLYEELLKAGISAWQLALTVPMGNAVENSHILLQPYELLDVFPLLAY
LSKRGNSEGIRVHMGNNIGYFGPYERLLKEPIASEAKWAFTRGCSAGQNAIGIEADGSIKGCPSLPSAEYTGGNI
RDRKLQDIYQNSPELRINDITTPEDATRHLWGECSSCEFASVCRAGCHWTAHVFFGKRGNNPYCHHRALKKA
AKNQRERFYVKQAAPGKPFDHGVFAIHDELCIVNEDSGQFRIDDMAIPDRWQEDGLDLLALIREEKASAIESYR
SLVN
SEQ ID NO: 88 (80% TIGR04103, >WP_055410774)
MRGRRRTYAVWELTLACNLACGHCGSRAGARRPAELSTAQALDVVAQLDAVGIDEVTLIGGEAFLRRDWLTI
AAEITRRGMGCTVTTGGYRLSAAMARGLRAAGVTQCSVSVDGMTATHDRLRGRVGSWESCFRTMERLRSA
GVEATCNTQINRLTAPELPRLYQRLRSAGVVAWQWQLTVPMGNAADHADLLLQPVELLEVFPVLARI ARRAS
QDGVRIHAGNNVGYYG PYERLLRSPEGSAFWTGCQAGLSTLGIESDGTIKGCPSLPTRDYAGGNILDRSLTDLL
RDAPELGINLTAGTAAAAENLWGFCRGCTYADVCRGGCTWTAHTFFGRPGNNPYCHHRALTHQRAGRRER
LVQTAPAPGEPFDHGLFSLLEEDLDTPWPSTERPCLTAADIRWPADWVD
SEQ ID NO: 89 (80% TIGR04103, >WP_047858470)
MPPPAVQTRTAYAVWELTLKCNLACGHCGSRAGDSRKNELSREEALDLVRQLAEVGIQEVTIEGGEAFLRPD
WLDIARAITDHGMLCTMTTGGYGLSRETARRMKEAGIAHVSVSVDGLEATHDRIRGRKGSFRFCFETLGHFRE
VGLPFSSNTQVNRLSAPELPALYERLRDAGIRAWQVQLTGPMGNGTDNAWMLLQPAELPDLYRMLARVAL
RVREESRLSLVPGNDVGYFGPYDDLLFSSSGAKVWAGCKAGLSVLGIHADGGIKACPTLPSEFVGGNIRQQPL
ADILETRELTFNVDAGTPEGIAHLWGHCASCRYAEACRGGCSQRAHVLFNRRGNNPYCHHRSLRLAESGVRE
RVVRAAPGTGLPFDHGVFELVEEPLESPWPSDDPHHFTYERVEWPPGWEAFPLPGV
SEQ ID NO: 90 (80% TIGR04103, >WP_002708735)
MSTQLRQRRTYAVWEITLKCNLACQHCGSRAGEARQDELSTAEALDLVQQMAEAGIGEVTLIGGEAFLRKD
WLEIAAEINRCGMICTLTTGGYGISAELARRIKQAGLASVSVSIDGMEASHDAQRGKAGSWKFAFESLQHLRH
AGVPITANSQANRLSAPEFPLFYEKLVEVGVGGWQIAMTVPMGNAADNSWLLLQPAELLVLHPMLAYLARR
GRREGLIMQPGNNVGYYGPYEKLLRSYGSDNDWAFWRGCKAGLALIGIEADGTIKGCPSLPTNAYAGGNIRR
HSLRDIVLNAEKMQINMSTGTEQGTDHMWGFCKSCEYAELCRGGCSWTSHVFFDKRGNNPYCHYRSLVHA
AHGIREDLRIKRNAFGLPFDNGEFEISEKALGAAWSGNEEQRLTPDRIQWPEQWLQEDSELKGFIQNEIDHNI
GNMRNYLGLTRKHKLAV
SEQ ID NO: 91 (80% TIGR04103, >SEA53645)
MTDTQTTTQYTPGERSCYAVWEITLKCNLACSHCGSRAGDARVNELSTAEALDLVHQMADVGITEVTUGGE
AFLRSDWLEIAAEIVKCGMICSMTTGGYGINLTTAQRMKAAGINQVSVSVDGMRHTHDRLRGKIGSWKYAFE
TMGHLREAGIPFGANTQVNRHSAPEFPLLFQALIDAGAKAWQIQMTVPMGNAADNSDILLQPDELLLFHPLL
ANLAKRGYPQGFYVQPGNNYGYYGPYDRMLRGFGKPTEWSFWQGCFAGLRTIGIEADGTIKGCPSLPTAAYS
GGTIRDASLATILTERDELTFNLSAGTPAATDHLWGFCKTCDFAELCRGACNWTAHVFFNRRGNNPYCYHRSL
VNAANGVRERFALRKAASGLPFDNGIFDLFAEDALSTAAADPMRFTSDKIQWPQAWLAENPKLSSALQHEVE
QNVRDMRNARLSAQI
SEQ ID NO: 92 (80% TIGR04103, >WP_012628459)
MSTQEDYYRTRYAVWELTLKCNLACQHCGSRAGQPRTQELTTAEALDLVQQLAQIGIREVTLIGGEAFLRPD
WLEIAAAIARAGMICNLTTGGYGLSLQLAQAMQRAGIAAVSVSIDGLETTHDRLRGKQGAWHSAFRTMQHL
RQVGVPFACNTQINRLSAPELPRIYEQIRDAGVYAWQIQLTVPMGHAADHWEILLQPCELLDLFPLLAQIAQW
AAQEGVRLYPGNNVGYYGPYESLLRGGGHPGAVWQGCGAGLNTLGIEADGTIKACPSLPTSAYAGGNIRDQ
PLASM MAQSEALRFNFNAGLPEGTAHLWGFCQ.TCEFAALCRGGCNWTAHVFFGRRG NPYCHHRALN LA
RQGLRERLALNIPAPGLPFDHGQFLLFQEPLNAPWPEPDPLYFTADQVQWSSSWTEQPVKVC
SEQ ID NO: 93 (80% TIGR04103, >WP_019501033)
MSYSRKSYAVWEITLNCNLACQHCGSRAGHARNAELTTSEALDLVRQMSDIGITEVTIIGGEAFMRPDWLEIA
QVISQAGMVCSMTTGGFGISLDMARRMKDAGISAVSISVDGLEGTHDRLRGRVGSWASCFRTMGHFREVG
LFFGCNTQINRYSAPELPLVYEKILEAGAKAWQIQLTVPMGNAVDNSAMLLQPYELLDLYPMLAEIAKKAKND
GLSLQPGNNIGYYG PYERLLRGRGDAWGFWQGCAAGLSTLGIEADGAIKGCPSLPTNAYTGGNVRDRTLREI
VENSAELRFNIGADTDRLWGFCETCEFAKLCRGGCTWTSHVFFDRRGNNPYCHHRALTQASRGIRERVVLKT
KAPGLPFDNGMFEMVAEPTSTELNSDDPLAFSSDRIQWPTNWHQEEVMTNS
SEQ ID NO: 94 (80% TIGR04103, >WP_019503880)
MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEEAFNLVRQLADVGIKEVTLIGGEAFMRSDWLE
IAKAVTEAGMICGMTTGGFGVSLETARKMKEAGIKTVSVSIDGGIPETHDRQRGKKGAWHSAFRTMSHLKEV
GIYFGCNTQINRLSASEFPIIYERIRDAGARAWQIQLTVPMGNAADNADMLLQPYELLDIYPMLARVAKRAKQ
EGVRIQAGNNIGYYG PYERLLRGSDEWTFWQGCGAGLNTLGIEADGKIKGCPSLPTAAYTGGNIRDRPLREIV
EQTEELKFNLKAGTEQGTDHMWGFCKTCEFAELCRGGCSWTAHVFFDRRGNNPYCHHRALKQAQKDIRERF
YLKVKAKGNPFDNGEFVIIEEPFNAPLPENDLLHFNSDHIQWPENWQNSESAYALAK
SEQ ID NO: 95 (80% TIGR04103, >WP_015143117)
MTYRRTSYAVWEITLKCNLACSHCGSRAGHTRAKELSTQEALDLVRQMADVGIIEVTLIGGEAFLRPDWLQIA
EAITKAGMLCSMTTGGYGISLETARKMKAAGIASVSVSIDGLEETHDRLRGRKGSWQAAFKTMSHLREVGIFF
GCNTQINRLSAPEFPLIYERIRDAGARAWQIQLTVPMGRAADNANILLQPYELLDLYPMIARVARRARQEGVQ
IQPGNNIGYYGPYERLLRGRGSDSEWAFWQGCAAGLSTLGIEADGAIKGCPSLPTSAYTGGNIREHSLREIVEES
EQLRFNLGAGTSQGTAHLWGFCQTCEFSELCRGGCTWTAHVFFNRRGNNPYCHHRALFQAEQGIRERVVPK
VEAQGLPFDNGEFELIEEPIDAPLPENDPLHFTSDLVQWSASWQEESESIGAVVD
SEQ ID NO: 96 (80% TIGR04103, >WP_017748600)
MTYHRKSYAVWEITLKCNLACSHCGSRAGNARSEELSTEEALDLVRQMAEVGIKEVTLIGGEAFLRPDWLEIAK
AISEAGMLCGMTTGGYGISLETAGKMKAAGIRTVSVSIDGLEETHDRLRGRKGSWKYAFKTMSHLREVGIAFG
CNTQINRLSAPEFPCIYECIRDAGARAWQIQLTVPMGNAADNADILLQPHELLDIYPMLARVARRAYQEGVQL
QAGNNIGYYGPDERLLRGRGSEHEFSFWQGCGAGLSTLGIEADGAIKGCPSLPTTAYTGGNIRERSLYDIIENSA
ELRLNLGAGTPEGTKHLWGFCKTCEYAELCRGGCSWTAHVFFDRRGNNPYCHHRALVHEERGLRERVIPKVR
AQGLPFDNGEFELIVEPINTPLPENDPLNFSADRIQWSESWQNKPEVSYSLAEQ
SEQ ID NO: 97 (80% TIGR04103, >WP_046277258)
MAYRRTSYAVWEITLKCNLACSHCGSRAGQAREQELSTHEALDLVQQMAEVGITEVTLIGGEAFLRPDWLQI
AEAINRAGMRCTMTTGGYGISLETAEKMHRAGIATVSISVDGLEATHDRLRGRPGSWQWAFKTMGHLKQV
GIPFGCNTQINRLSAPEFPRIYEKIRDAGARAWQIQLTVPMGNAADNSEILLQPCELLAVYPMLARVAQQAKK
DGVQIQPGNNIGYYGPYERLLRGRGQEDDWTFWQGCNAGLSTLGIEADGAIKGCPSLPTKAYTGGNIRERPL
REIIEATEELRFNLNAGTAEGMAHLWGFCQTCEYAELCRGGCTWTAHVFFNRRGNNPYCHHRALAHADQGK
RERVVPKVQAVGLPFDNGEFELVEEPLNAPWVESDPLHFTADQIQWPEHWHEQPSKLLSHK
SEQ ID NO: 98 (80% TIGR04103, >WP_017740073)
MSKEYQRISYAVWEITLKCNLACSHCGSRAGQARTKELSTEEALDLVQQLAEVGIKEVTLIGGEAFLRPDWLEIA
KAITQAGMLCGMTTGGYGISLDMARRMKEAGISMVSVSTDGMEATHDHLRGRKGSWKSGLRTMGYLKEV
GIPFGCNTQINRLSAPEFPLIYEHIRDAGACAWQIQLTVPMGNAADNADILLQPSELLDIYPMLARVTQRANRE
GVKVRAGNNIGYYGPYERLLRGGGNEWTFWQGCGAGLSTLGLEADGAIKGCPSLPTAAYTGGNIRERSLREII
EQTEELRFNLGADTPQGTEHLWGFCKTCKFAELCRGGCTWTAHVFFNRRGNNPYCHHRALEQAKHGIRERV
YPKVRAQGLPFDNGEFALIEEPLDAPWPLDDPLHFTADRIQWSNSWQEEPEYTYSLAR
SEQ ID NO: 99 (80% TIGR04103, >WP_070390932)
MTYRRISYAVWEITLKCNLACQHCGSRAGHTRAKELSTTEALDMVRQLADVGITEVTLIGGEAFLRPDWLDIA
KAITSAGMVCGMTTGGFGISLDTARRMKDAGIRVVSVSVDGLEATHDRLRGRKGSWQWAFKTMSHLKEAGI
PFGCNTQINRLSAPEFPQIYERIRDAGVFAWQIQLTVPMGNAADNSEILLQPYELLDVYPMIARVARRAFREG
VKVQAGNNIGYYGPYERLLRGRGEDNPWAFWQGCNAGLSSLGIEADGAIKGCPSLPTSAYTGGNIRDHSLREI
IEETEELRFNLGADTPKGTDHLWGFCKSCEFAQLCRGGCSWTAHVFFDRRGNNPYCHHRALTQEKQGIRERV
EIKHRAEGNPFDNGEFVLIEEPIDAPWPDNDPLHFSADRILWPKGWQEQEELVPSLASSSS
SEQ ID NO: 100 (80% TIGR04103, >WP_013652855)
MDETVRLARLYDGDPAKLLRTWAGDSPPRIVVWELTLACDLGCRHCGSRAGKARRDELDTQEALDVVRQLA
DLGVAEVILIGGEVYLRDDWFLIAAAVTQAGMTCSLVTGGRGFDAGVVDEALAAGVRIVGVSIDGLPATHDRL
RGVPGSYEAAIATARRIAATGRLTLSVNTQINRLSLPELRAVAERVVELGAVAWQIQLTVALGRAADRPDLLLQ
PWHLLELFPQLVAIKKEILEPGGVQLFPGNNIGYFGPFEAELRYGGDAGHTWMGCGAGRAALGLEADGKLKG
CPSLPTVPYTGGNVRDTPIAELWAHAPEISALGRRTTDDLWGFCGTCRHAAVCKAGCTWTAHALFGRPGNN
PYCHHRAWSLAQTGLRERVVLVEKAPGRPFDHGRYEIVVEPLDAETPDEERLAPVPRARAAALFGLRADAPSA
WSGEDLVEATRSARR
SEQ ID NO: 101 (80% TIGR04103, >WP_068184797)
MPHPPTYDGNPQQLARWRPEHDAPPSIAVWEITLRCDLGCCHCGSRAARARPDELSTTEALDLVRQFADLGL
KEVTLIGGEFYMRDDWDRIAAEINRCGMLCSIVTGARQMTAERVSRAVAAGVGKISISIDGLERTHDAIRGSK
GSWKAATAAARRISDSGIDLSVNTQMNRLTMPELPAVADMLVDIGARSWMVILTAAMGRAADHPSLLLQP
YHLLYLFPLLADIKREKLDPNGIAFFPGNNVGYFGPLAETLRYGSELGHMWGGCGAGDSTLGIEADGRIKGCPS
LPTSDYVRGNIRERPLREIAAELKREKTEAPTQLWGFCQSCRYAARCKGGCTWTSHVLFGRPGNNPFCHFRAL
TMAEAGLVERLEPVAVAPGKPFDFGHCRIVEAPFGPDIESDPLIGMTKLSQVFGLNAGAAGLWSKQELSNTLE
YRQSDKT
SEQ ID NO: 102 (80% TIGR04103, >WP_015929579)
MHRPSDEPTYDGDPRSLARWRPGGSAPPSHAVWEITLRCDLGCRHCGSRAGRARRDELSTDAALDVVAQLA
DLGLREVTLIGGEFYLREDWDRIAAAITRRGMLCSIVTGARQMTRARIARAVAAGVGKISLSIDGLEQTHDSVR
GSAGSWQAAVTAGRRIASSGIDLSVNTQINRLTMPELPGVADLLVEIGARSWMVILTAAMGRAADRRALML
QPYHLLHLFPLLAAIKRERLDPAGIAFFPANNIGYFGPLAETLRYGAEGGHAWAGCDAGVASLGIEADGRLKGC
PSLPSADYTMGNVRDHSLAQLWAKRTPNRPIAAAEDLWGFCWTCPHATRCRGGCTWTSHVLFGRRGNNPF
CHYRALALAERGFAEAIEPVSVAPGEPFDFGRHRIVELPLPTALTDDPVIERTLASHVFGLRPGTASVWSPDERE
EAVI
SEQ ID NO: 103 (80% TIGR04103, >SFE54945)
MSLRDNRRRLPVVASLPRPDDRRALTRVAGEAPRPRYAVWELTLKCDQKCIHCGSRAGVARHGELTTAEALA
LVTDLRALGIGEITLIGGEAYLRDDFILIARAIRNAGMDCTMTTGGLNLGEARVAALAEAGIRSVSVSIDGTQAA
HDALRGVPGSFDRAFAALARLRAAGVGRAVNTQINRLTLPTLEALQERLIAEKIGGWQLQITAPFGNAADHPEI
LLQPFMMLEVFAVIERLIARGAPHGLRLFPANNLGYFGPLEAELRGRQKAGGHYKGCIAGRHALGIEADGTIKG
CPSLGGPANVAGNVRERPLREIWEHAPELQFTRVRTVDDLWGYCRDCYYNDVCMAGCSATSEPLLGRPGNN
PYCHHRALELGRAGLRERIEAVAPAPGVPFDHGLYRVVREALDPELAARGPVAVDDPRVGRDVQPFGPGAPV
G
SEQ ID NO: 104 (80% TIGR04103, >SFE62827)
MGIREERRRLPIVALAPPTRSRRALPIAAGQAPVPRIVVWEFTSACDQHCAHCGPRSGKRRPDELTTEEALRLV
DELAAAGVGEVTLIGGEAYLRPDVLRIVRAIRERGMSCTMTTGGYSLTREIAEALVEAGVQSVSVSIDGLAACH
DALRGRPNSFARAFAALRHLKAAGSQISANTQLNAKTLPDLEGLLELLAAEGIHSWQVQVTMAHGAAADHPE
ILLQPYQMIAAYEVVERLLARCEALGIRLYPGNSLGYFGPLEHRLRRNSTQRGHYFGCQAGISGAAVSSHGEVK
SCPSLGEEGVGGSWREHGFAALWERAPEIVYMRQRTRAELWGLCASCYYAAVCMGGCTSMSEPLLGRPGN
NPMCHHRALELDRQGLRERIEPVRPAPGQPFDHGLFRLILEHKDPELRALHGPLTIEEPRRSRVDEPRGPGSPL
A
SEQ ID NO: 105 (80% TIGR04103, >SFE67100)
MLPPPRGAVQRPALAVWEFTRACDQRCKACGPRAGVARPDELTTDEALRLVDELAELGVGEVALIGGEAYLR
ADVLWVIRRIRERGMSCSLTTGGLGLTQTRAEALVEAGLQLVSVSIDGLEASHDALRGTPGGWRRCFEALAHS
RRAGARIAANTQINRLTWRELLPLCDLLADAGAEVWQMFLTMPHGNAADHPELLLQPFELLELFPELERVIAR
CAARRIRFWPGNNLGYFGPLEGKLRRLQQEDGHYKGCSAGRTG LGIEADGTIKSCPSVGGAVNAGGNWRDH
GLRALWERAPEIRYVEQRGLDSLWGYCRECYYADTCMGGCTAMSEPLLGRPGNNPYCHHRALEMDRMGLR
ERVEQVAGADDQAFAHGLFRVVREPKAAGAG PVTIEGPRTGREAAFFGPGAPLAVAGADEP
SEQ ID NO: 106 (80% TIGR04103, >SFF23105)
MSLSDVRRRLPVVASLPAPANRWLTHEDRREAKAPRWAVWELTLACDQHCAHCGPRAGHKRPDELSTEECL
KVVRELAELGCGEVVLIGGEAYLRNDFILIIRAIREAGMACTMTTGGLNLTQERAEAMIEAGIGSVTFSIDGLEAT
HDRLRGVQGSWQRAFAAMRRIRAAGGKIASNTQINALTRHELLPLFELLADEGIHSWQLQITVPHGNAADHP
EILLQPHMFIDIFATLEQVLDRCEARKVRLWPGNNLGYFGPLERRLRQSQRKHWRGCTAGVSVMGIESDGAIK
NCPSLGGGTNIGGNWRVHGVKKVWEESYQLGYIRARTVDDLWGYCRECYYAETCMAGCTAAAEPLLGRPG
NNPFCHHRALMMDRAGLRERIEQIRGAGGKSFDNGLFRVIREHVDPELREKHGPVAIEEPRVSRLEEPYGAGH
TVAL
SEQ ID NO: 107 (80% TIGR04103, >WP_006972642)
MRLKEVRKRLPVVDSLPKGRGRRFRTHEAEGPVPRPALAVWEFTLACDHRCLHCGPRAGEARPNELTTDEAL
QLVDELAEAGVGEVVLIGGEAYLRNDFLLVIRRIRERGMTCTMTTGGLGLTKTRAEAMVEAGIQSVSVSIDGLE
AAHDKLRNRPGSWEHAFEALRNLRNAGSRVAVNSQINQINLGDHIHLLELIADEGVHSWQLQITVAHGNAA
DNADIILQPYMFLELFDQLDAIIDRAFERRVRIWPANNLGYFGPFEHKLRKSQKAHYRGCSAGRSTIGIESDGNI
KNCPSLGGPANIGGSWREHGLAKIWKEAAEITYIRRRTVDDLWGYCRECYYAETCMSGCTAANEPLLGRPGN
NPFCHHRALEMDRMGMRERIEPFIPAKGVPFDNGLFRLIREWKDPARREAEGPVEVTEPRVSRLIDEMGSGR
AIRMDELVDGRAPFELKDH
SEQ ID NO: 108 (80% TIGR04103, >KIG15048)
MRLKDARKRLPVVTSPLPKGRGRKFMTNEDAPRPALAVWEFTLACDHRCLHCGPRAGEPRADELSTDEALRL
VDDLAEAGVGEVVLIGGEAYLRNDFLLVIRRIRERGMTCTMTTGGLGVTKTRAEAMVEAGIQSVSVSIDGLEP
AHDRLRNREGSWKRAFEALANLRAAGAKVSVNSQINQVNFGDHEPLLELIAKAGAHSWQLQITVAHGNAAD
NADIILQPYRFLELFEQLDRIADRAHELKVRIWPANNLGYFGPFEAKLRRYQKLHYRGCSAGKSTIGIESNGMLK
NCPSLGGPANVAGSWREHGFDPIWQGAPEMTYIRRRTIDDLWGYCRECYYATTCMSGCTAANEPLLGRPG
NNPFCHHRAIEMDRMDMRERIEPVAAAQGVPFDNGLFRIIREHKDPARREAEGPIEITEPRVSRLVEEMGSGR
PIRADELPDGRVPFEK
SEQ ID NO: 109 (80% TIGR04103, >WP_053236092)
MTHGSIRDPRVTPIGETGLRYVVWELTLRCDLACRHCGSRAGKAREDELSTDEALDVVRQLASMGAREVVLIG
GEAYLRDDWTIVARAIADAGMRCAMVSGGRGLDATRARAAREAGVASVSISIDGIGATHDVQRGLDGAFES
ARVAMRNLRDAGVTLQANTQVNRLSYPELDAILDLLVEERATGWQLAMTVPMGRAADRPDWLLQPHELLE
VYPKLAALAERGARHGVLFFPGNNIGYFGPHEATLRGRGITDDVAWGGCIAGKHAMGIESDGSIKGCPSLPSA
DWVGGTAREASLREIWEQTRELRYVRDRELPGALWGECARCYYASVCGGGCTWTAHTFFGRPGNNPYCHH
RALEMRARGERERLVRVEAAPGAPFDHGRWEIVVEPWVEGEGVARVERPSKRLRVL
SEQ ID NO: 110 (80% TIGR04103, >WP_006969608)
MSTSSDSPARRGPSLPVLDGRGRSDGKLRLPLAEQTRECDTLARPEYAVWELTLRCDLACRHCGSRAGKARPD
ELSTEEALELVTQMAEMGVQETTVIGGEAYLRADWHRIARALTDAGISTTMTTGGRGLDPERVALAKAAGIQ
SVSVSIDGLEAEHDYQRNLKGSYAAAMAALDNLAAAGIPRSVNTQLNGSNLRDIEALLEVIATKGIHSWQIQIT
VAMGRAADHPELLLQPWQMIELMPMAARIARRCRELGIRLWPGSNVGYFGPYEALLRWDHPDGHQTGCE
AGTRTLGIEANGDIKGCPSLPTADYVGANVRDHSLRAIWERSSALRFNRERGTEELWGRCASCYYAPICKAGCT
WTGHVLFGRRGNNPYCHHRALELLREGRRERVELREAASGDPFDHAIYELIEEPWPEPELSRARAVAESGEG
WIR
SEQ ID NO: 111 (80% TIGR04103, >KIG11737)
MSRPSLPIVDASRPKPRVRLPLAPGVRSCDDVRPEYAVWEVTLRCDLACRHCGSRAGHARADELDTEEALDLV
TQMAALGVKETTIIGGEAYLRDDWHLIAAALVNAGIRCTMTTGGRGLTAERVEIAKRAGIESVSVSIDGLAQAH
DHLRALHGSHAAAMRALDHLRAAGIPRSVNTQLNGYNLREIEPLLDQLTAREIHSWQVQITVAMGRAADHPE
LLLQPWQMLELMPLVARLARRCDELGIRLWPGSNIGYFGPYEQLLRWDHRDGHQTGCDAGTRTLGIEANGD
IKGCPSLPSNEYVGGNVREHSLREIWERADALRFNRERRVDELWGRCAGCYYADECKAGCTWTGHVLFGRR
GNNPYCHHRALELLREGRRERLELHTPAPGEPFDHGLYRVIEEPWPAELIDRAREVAASGVGWIS
SEQ ID NO: 112 (80% TIGR04103, >AKV02060)
MELVACSDPTVSIRDPLLAAERAELLRAKRPRVGLPTISKPRRPLPVLSEPRQRDRSIRPRHAVWEITLRCDQAC
RHCGSRAGVERPNELTTEECLDLVRQIAELGVMEVTLIGGEAYLRPDFVQIVRAIRSHGMHCTMTTGGRGLSP
TLAREAAAAGLGSASVSIDGAEETHDRLRGAKGSHRDAIAAMRALREAGVRLTVNTQINRLSLVDLPSILEMLV
REGAEAWQIMLTVAMGRAADEPDVLLQPYDLLDLFPLLDTLAARCEEHGVRLYPGNNLGYFGPYESRLRGTLP
RGHGTSCSAGRGTLGIESDGLVKGCPSLPSEQWGGGTVRDHSLVDLWERASALRYTRDRTVEDLTGFCRTCY
YADICRAGCTWTTSVLFGRPGDNPYCHHRALERDREGLRERLVRRQPAPGEPFDHGIFELIVEPVPGEKEANS
SEQ ID NO: 113 (80% TIGR04103, >AKU99181)
MEKFDPLTAKARAKELQRERPKRALPIAPTAPLGVRHRPLREPRDVDRRYRPIYAVWEITLACDLACRHCGSRA
GRERPDELDTKEALDLVGQMASLGVKEVTLIGGEAYLRGDWLDIVRAIRAHGMVATMTSGGRGLTPELVAQ
AHEAGLVGASISLDGDEVTHDRLRGVKGSYRAAIEALRALRERNMRVSCNSQINRLSVPYLDFILESIAAIGVHS
WQIQLTVPMGRAADEPDVLLQPYDLLELFPRLAELKKRCDELAVRMLPGNNIGYFGPYESTLRGYHVSGHAGS
CGAGRATLGIEANGAIKGCPSLPTEHWTGGNVRDASLLDIWERAEPLRYTRDRTVDDLWGFCRTCYYAEECAS
GCTWTSFVTLGKAGNNPYCHHRALELEKRGKRERVVRVQSAPGEPFDHGVFALVEEDLESGSETL
SEQ ID NO: 114 (80% TIGR04103, >AKF04999)
MSAAERDALLRAPRPRSLPLAPMAEPARRRLPLADDARAIDRRVRPIYAVWEITLACDLACRHCGSRAGRDRP
DELSTEQCLDLVDQMAELGVKEVSLIGGEAYLRDDWTDIIRRIRARGMMAILTTGGRGITPERARDAAAAGLQ
SASVSIDGDEATHDRLRGVVGSYRAALEAMKNLRAAGVQVSANTQINRLSAPDLPSVLETIADHGAHSWQIQ
LTVAMGRAADEPEVLLQPYDLLEVFPMLARLAERCRERGVRLWPGNNVGYFGPYEHALRGTLPKGHMYSCG
AGRSTLGVEADGSIKGCPSLPTLSWTGGNIRDAKLVDIWERGGTAMRYTRDRTVDDLWGYCRTCYYADECRA
GCTWTGFVLFGRAGNNPYCHHRALEMQRAGKRERVVKVESAPGQPFDHARFELIVEDVSGEETRDAETV
SEQ ID NO: 115 (PlpA2 gene, Gene ID: 2509711952)
ATGTCTATCGAAAACGCAAAATCATTTTATGAACGAGTATCTACGGATAAGCAATTTCGTACTCAACTCGA
AAATACAGCTTCAGCAGAAGAACGTCAGAAAATTATCCAAGCTGCTGGTTTTGAATTTACAAATCAAGAA
TGGGAAATTGCTAAAGAGCAGATTTTAGCAACATCTGAATCTAATAATGGTGAATTAAGTGAAGCAGAAT
TAACGGCGGTAAGTGGTGGAGTTGATTTATCTATTTTTGAGTTATTAGATGAAGAACCATTATTTCCTATC
AGACCTCTGTATGGTTTACCAATATAG
SEQ ID NO: 116 (PlpA2 protein)
MSIENAKSFYERVSTDKQFRTQLENTASAEERQKIIQAAGFEFTNQEWEIAKEQILATSESNNGELSEAELTAVS
GGVDLSIFELLDEEPLFPIRPLYGLPI
SEQ ID NO: 117 (PlpA3, Gene ID: 2509711953)
ATGTCTATTGAAAGTGCAAAAGCTTTTTACCAAAGAATGACCGATGATGCATCCTTTCGCACACCATTTGA
AGCAGAATTATCTAAAGAAGAACGCCAACAACTAATTAAAGACTCTGGCTATGATTTCACTGCCGAGGAA
TGGCAGCAAGCGATGACAGAAATTCAAGCTGCTAGGTCTAATGAGGAATTGAATGAAGAAGAACTTGAA
GCGATCGCAGGTGGTGCTGTAGCAGCAATGTATGGCGTAGTTTTTCCTTGGGATAATGAATTTCCTTGGC
CTAGGTGGGGGGGATAA
SEQ ID NO: 118 (PlpA3 protein)
MSIESAKAFYQRMTDDASFRTPFEAELSKEERQQLIKDSGYDFTAEEWQQAMTEIQAARSNEELNEEELEAIA
GGAVAAMYGVVFPWDNEFPWPRWGG
SEQ ID NO: 119 (PlpX gene, Gene ID: 2509711951)
ATGACTAAAAAATACAGACGAGTTAGTTATGCAGTTTGGGAAATTACCTTGAAATGCAATCTAGCTTGTA
GTCACTGTGGTTCGAGAGCAGGGCAGGCAAGAACCAAGGAACTATCTACAGAAGAAGCTTTTAATCTGG
TTCGGCAACTAGCCGATGTAGGAATCAAAGAGGTTACTCTAATCGGTGGCGAAGCCTTTATGCGCTCTGA
TTGGCTAGAAATTGCCAAGGCTGTTACTGAGGCAGGGATGATCTGCGGTATGACTACAGGTGGATTTGG
TGTCAGTTTGGAAACTGCCAGAAAAATGAAAGAAGCTGGAATTAAAACAGTTTCTGTATCTATCGATGGT
GGCATACCAGAAACCCACGATCGCCAGCGAGGGAAAAAAGGTGCTTGGCATTCTGCTTTTAGAACCATG
AGCCATCTAAAAGAAGTCGGCATCTATTTTGGCTGCAATACCCAGATAAACCGTTTATCTGCCTCTGAATT
CCCAATAATTTACGAACGAATAAGGGACGCTGGAGCAAGAGCTTGGCAGATTCAATTAACTGTACCTATG
GGTAATGCGGCAGATAATGCAGACATGTTATTGCAACCATACGAACTATTAGATATTTATCCCATGTTAGC
TCGTGTTGCTAAACGAGCTAAACAGGAAGGTGTTCGCATACAGGCGGGAAATAATATTGGCTATTATGGC
CCTTATGAAAGACTGCTGCGTGGTAGTGATGAATGGACATTTTGGCAGGGTTGCGGAGCGGGTTTAAAT
ACCTTGGGTATCGAAGCTGATGGCAAAATTAAAGGTTGTCCTTCTTTACCTACGGCTGCTTATACGGGCG
GTAATATCCGCGATCGCCCTTTAAGAGAAATAGTCGAACAGACTGAAGAGCTTAAATTTAATCTGAAGGC
TGGGACTGAACAGGGCACAGACCACATGTGGGGATTTTGTAAAACCTGTGAATTTGCTGAACTCTGTCGA
GGTGGTTGTTCTTGGACGGCTCATGTCTTCTTTGATCGCCGTGGGAATAATCCCTACTGCCATCATCGTGC
TTTGAAACAGGCACAAAAAGACATCAGAGAAAGATTCTATTTAAAAGTAAAAGCAAAAGGGAATCCTTTT
GATAATGGGGAATTTGTCATTATAGAAGAACCTTTCAACGCACCTTTGCCAGAGAACGATTTGCTTCATTT
TAATAGCGATCACATTCAGTGGCCAGAAAACTGGCAAAATTCTGAATCTGCTTACGCTTTAGCAAAGTAA
SEQ ID NO: 120 (PlpX protein)
MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEEAFNLVRQLADVGIKEVTLIGGEAFMRSDWLE
IAKAVTEAGMICGMTTGGFGVSLETARKMKEAGIKTVSVSIDGGIPETHDRQRGKKGAWHSAFRTMSHLKEV
GIYFGCNTQINRLSASEFPIIYERIRDAGARAWQIQLTVPMGNAADNADMLLQPYELLDIYPMLARVAKRAKQ
EGVRIQAGNNIGYYGPYERLLRGSDEWTFWQGCGAGLNTLGIEADGKIKGCPSLPTAAYTGGNIRDRPLREIV
EQTEELKFNLKAGTEQGTDHMWGFCKTCEFAELCRGGCSWTAHVFFDRRGNNPYCHHRALKQAQKDIRERF
YLKVKAKGNPFDNGEFVIIEEPFNAPLPENDLLHFNSDHIQWPENWQNSESAYALAK
SEQ ID NO: 121 (PlpY gene, Gene ID: 2509711950)
TCATGTCAGAAAATTGCTAATTTCAGCAATTTCTAAACTTTGTTTTGCTTTAATTAATGCTTTGATAAAAGCT
TGCTTGTCTTGCCAACCCTGACGAGGTAAAACCGAGCTGGAATCATCTGATTTTTGAGCTGCTGTGGCTAC
TTTATTTGGTATTTGATTAGAGTTCAT
SEQ ID NO: 122 (PlpY protein)
MNSNQIPNKVATAAQKSDDSSSVLPRQGWQDKQAFIKALIKAKQSLEIAEISNFLT
SEQ ID NO: 123 (PlpA2 core peptide)
VDLSIFELLDEEPLFPIRPLYGLPI
SEQ ID NO: 124 (PlpA3 core peptide)
AVAAMYGVVFPWDNEFPWPRWGG
SEQ ID NO: 125 (PlpA3-9 core peptide)
AVAAMYGVV
SEQ ID NO: 126 (PcpA core peptide)
VTGGSGIYGPIQAMYGAVVGDPKPGKDWGWRFPSPLPKPSPIPSPWKPPVDVQPMYGWVSNDS
SEQ ID NO: 127 (PlpA3 core peptide)
AVAAMYGVVFPWDNEFPWPR
SEQ ID NO: 128 (PlpA3 digested)
AVAAMYGVVFPW
SEQ ID NO: 129 (PlpA3 mutated M5G)
AVAAGYGVVFPWDN EFPWPR
SEQ ID NO: 130 (PlpA3 mutated M5A)
AVAAAYGVVFPWDNEFPWPR
SEQ ID NO: 131 (PlpA3 mutated M5V)
AVAAVYGVVFPWDNEFPWPR
SEQ ID NO: 132 (PlpA3 mutated M5L)
AVAALYGVVFPWDNEFPWPR
SEQ ID NO: 133 (PlpA3 mutated M5S)
AVAASYGVVFPWDNEFPWPR
SEQ ID NO: 134 (PlpA3 mutated M5P)
AVAAPYGVVFPWDNEFPWPR
SEQ ID NO: 135 (PlpA3 mutated M5E)
AVAAEYGVVFPWDNEFPWPR
SEQ ID NO: 136 (PlpA3 mutated M5Q)
AVAAQYGVVFPWDNEFPWPR
SEQ ID NO: 137 (PlpA3 mutated M5F)
AVAAFYGVVFPWDNEFPWPR
SEQ ID NO: 138 (PlpA3 mutated M5K)
AVAAKYGVVFPWDNEFPWPR
SEQ ID NO: 139 (PlpA3 mutated 4A5)
AVAAAMYGVVFPWDNEFPWPR
SEQ ID NO: 140 (PlpA3 mutated 7A8)
AVAAMYGAVVFPWDNEFPWPR
SEQ ID NO: 141 (PlpA3 mutated A4d)
AVAMYGVVFPWDNEFPWPR
SEQ ID NO: 142 (PlpA3 mutated V8d)
AVAAMYGVFPWDNEFPWPR
SEQ ID NO: 143 (PlpA3 mutated Y6F)
AVAAMFGVVFPWDNEFPWPR
SEQ ID NO: 144 (PlpA3 mutated Y6W)
AVAAMWGVVFPWDNEFPWPR
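
The mutant names in SEQ ID NOs: 129 to 144 appear to follow a simple convention relative to the 20-residue PlpA3 core peptide of SEQ ID NO: 127: a name such as M5G replaces residue 5 (Met) with Gly, 4A5 inserts Ala between residues 4 and 5, and A4d deletes residue 4. The Python sketch below reproduces several of the listed sequences under that reading. The function name `apply_mutation` is hypothetical and is not part of the application; the naming convention itself is inferred from the listed sequences.

```python
import re

# PlpA3 core peptide as listed under SEQ ID NO: 127
CORE = "AVAAMYGVVFPWDNEFPWPR"


def apply_mutation(seq: str, name: str) -> str:
    """Apply one mutation, using the naming convention inferred from the listing.

    'A4d' -> delete residue 4 (which must be A)
    '4A5' -> insert A between residues 4 and 5
    'M5G' -> substitute residue 5 (which must be M) with G
    Positions are 1-based.
    """
    m = re.fullmatch(r"([A-Z])(\d+)d", name)
    if m:  # deletion
        old, pos = m.group(1), int(m.group(2))
        if seq[pos - 1] != old:
            raise ValueError(f"residue {pos} is {seq[pos - 1]}, not {old}")
        return seq[:pos - 1] + seq[pos:]

    m = re.fullmatch(r"(\d+)([A-Z])(\d+)", name)
    if m:  # insertion between two adjacent positions
        left, res, right = int(m.group(1)), m.group(2), int(m.group(3))
        if right != left + 1:
            raise ValueError("insertion positions must be adjacent")
        return seq[:left] + res + seq[left:]

    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if m:  # substitution
        old, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if seq[pos - 1] != old:
            raise ValueError(f"residue {pos} is {seq[pos - 1]}, not {old}")
        return seq[:pos - 1] + new + seq[pos:]

    raise ValueError(f"unrecognised mutation name: {name}")


if __name__ == "__main__":
    for name in ("M5G", "4A5", "7A8", "A4d", "V8d", "Y6W"):
        print(name, apply_mutation(CORE, name))
```

The residue check in the substitution and deletion branches guards against applying a name to the wrong parent peptide.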
SEQ ID NO: 145 (PlpX_F primer)
CTCGCATATGACTAAAAAATACAGACGAGTTAGTTAT
SEQ ID NO: 146 (PlpX_R primer)
ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA
SEQ ID NO: 147 (PlpY_F primer)
GCGAACTCATGAACTCTAATCAAATACCAAATAAA
SEQ ID NO: 148 (PlpY_R primer)
GCGCAGCTGTTATGTCAGAAAATTGCT
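
The gene and protein entries in this listing are related by the standard genetic code, and a reverse primer anneals as the reverse complement of the coding strand. As a consistency check, the following Python sketch translates the PlpA2 gene (SEQ ID NO: 115) and compares the result with the PlpA2 protein (SEQ ID NO: 116). It assumes the standard codon table; the function names `translate` and `reverse_complement` are hypothetical helpers, not part of the application.

```python
# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# SEQ ID NO: 115 (PlpA2 gene)
PLPA2_GENE = (
    "ATGTCTATCGAAAACGCAAAATCATTTTATGAACGAGTATCTACGGATAAGCAATTTCGTACTCAACTCGA"
    "AAATACAGCTTCAGCAGAAGAACGTCAGAAAATTATCCAAGCTGCTGGTTTTGAATTTACAAATCAAGAA"
    "TGGGAAATTGCTAAAGAGCAGATTTTAGCAACATCTGAATCTAATAATGGTGAATTAAGTGAAGCAGAAT"
    "TAACGGCGGTAAGTGGTGGAGTTGATTTATCTATTTTTGAGTTATTAGATGAAGAACCATTATTTCCTATC"
    "AGACCTCTGTATGGTTTACCAATATAG"
)

# SEQ ID NO: 116 (PlpA2 protein)
PLPA2_PROTEIN = (
    "MSIENAKSFYERVSTDKQFRTQLENTASAEERQKIIQAAGFEFTNQEWEIAKEQILATSESNNGELSEAELTAVS"
    "GGVDLSIFELLDEEPLFPIRPLYGLPI"
)

if __name__ == "__main__":
    print(translate(PLPA2_GENE) == PLPA2_PROTEIN)
    # Last 24 bases of the PlpX gene (SEQ ID NO: 119), reverse-complemented
    print(reverse_complement("TCTGCTTACGCTTTAGCAAAGTAA"))
```

The reverse complement of the last 24 bases of the PlpX gene corresponds to the 3' portion of the PlpX_R primer (SEQ ID NO: 146) once the extraction spaces in that primer are removed.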

Claims
1. Use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more of amino acid motif XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine.
2. Use according to claim 1, wherein the rSAM enzyme is a nif11-class peptide radical SAM maturase 3.
3. Use according to claim 1 or 2, wherein the rSAM enzyme comprises
(A) an amino acid sequence according to Formula (I) (SEQ ID NO: 1) or (II) (SEQ ID NO: 2)
Formula (I): X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20,
Formula (II): Z1-Z2-Z3-Z4-Z5-Z6-Z7-Z8-Z9-Z10-Z11-Z12-Z13-Z14-Z15-Z16-Z17-Z18-Z19-Z20, wherein X1-X20 and Z1-Z20 each denote amino acids and
X1 is selected from the group consisting of Y, H, F and W, preferably Y and H;
X2 is selected from the group consisting of Y, R and H;
X3 is selected from the group consisting of R, K and Q, preferably R;
X4 is selected from the group consisting of I, T and V, preferably I and T;
X5 is selected from the group consisting of R, S and K, preferably R and S;
X6 is selected from the group consisting of H, Y, F and W, preferably H and Y;
X7 is selected from the group consisting of A and S, preferably A;
X8 is selected from the group consisting of V, I and L, preferably V;
X9 is selected from the group consisting of W, Y and F, preferably W;
X10 is selected from the group consisting of E, Q, D and K, preferably E;
X11 is selected from the group consisting of I, L, V and M, preferably I and L;
X12 is selected from the group consisting of T and S, preferably T;
X13 is selected from the group consisting of L, M, I and V, preferably L;
X14 is selected from the group consisting of K, R, E and Q, preferably K;
X15 is C;
X16 is selected from the group consisting of N and D, preferably N;
X17 is selected from the group consisting of L, M, I and V, preferably L;
X18 is selected from the group consisting of A and S, preferably A;
X20 is selected from the group consisting of S, Q, E and K, preferably S and Q;
Z1 is selected from the group consisting of T, D, T, E and N, preferably T and D;
Z2 is selected from the group consisting of R, P, N and A, preferably R, P and A;
Z3 is selected from the group consisting of R, Q, K and L, preferably R;
Z4 is P;
Z5 is selected from the group consisting of A and S, preferably A;
Z6 is selected from the group consisting of R, K and Q, preferably R;
Z7 is selected from the group consisting of Y, F, H and W, preferably Y;
Z8 is selected from the group consisting of L, M, I and V, preferably L;
Z9 is selected from the group consisting of F, H, S and Y, preferably H, F and S;
Z10 is selected from the group consisting of D, E and A, preferably D and E;
Z11 is selected from the group consisting of D, S and T, preferably T;
Z12 is selected from the group consisting of D, E and N, preferably D;
Z13 is selected from the group consisting of Y, F, L and M, preferably Y, F and L;
Z14 is selected from the group consisting of K, Q, R and E, preferably Q and K;
Z15 is selected from the group consisting of R, K and Q, preferably R;
Z16 is selected from the group consisting of Y, F and W, preferably Y and F;
Z17 is selected from the group consisting of V, I and L, preferably V;
Z19 is selected from the group consisting of V, I and L, preferably V; and
Z20 is selected from the group consisting of H and Y, preferably H;
or
(B) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with at least one of the amino acid sequences of Formula (I) or (II), more preferably an amino acid sequence having at least 14, 16, 18 or 19 of the 20 amino acids of Formula (I) or (II);
(C) a functional fragment and/or functional derivative of (A) or (B), preferably a functional fragment of at least 10 amino acids, more preferably at least 15 amino acids of (A) or (B).
4. Use according to any of claims 1 to 3, wherein the rSAM enzyme comprises at least one motif selected from the group consisting of
(i) motif CXXXCXXC (SEQ ID NO: 3);
(ii) motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4);
(iii) motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5); and
(iv) motif CX9-15GX4C (SEQ ID NO: 6) and CX2CX5CX3CX14-18C (SEQ ID NO: 7),
wherein X is any natural amino acid and wherein the integers denote the number of X(s).
5. Use according to any of claims 1 to 4, wherein the rSAM enzyme comprises
(i) an amino acid sequence according to Formula (I) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (I);
(ii) a motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4), wherein X is any natural amino acid;
and wherein the rSAM enzyme further comprises
(iii) a motif CXXGXXXXXXXXXGXXKXCP (SEQ ID NO: 8) and/or
GXCXXCXXXXXCXXXCXXXXXXXXXXXGXNPXCXXR (SEQ ID NO: 9), wherein X is any natural amino acid.
6. Use according to any of claims 1 to 4, wherein the rSAM enzyme comprises
(i) an amino acid sequence according to Formula (II) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (II);
(ii) a motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5), wherein X is any natural amino acid; and wherein the rSAM enzyme further comprises
(iii) a motif CXAGXXXXXEADGXXKXCPXL (SEQ ID NO: 10) and/or
CXXCYYXXXCXXGCXWXXXXLXGXXGXNPXCXXR (SEQ ID NO: 11), wherein X is any natural amino acid.
7. Use according to any of claims 1 to 6, wherein the rSAM enzyme comprises an amino acid sequence selected from the group consisting of
(i) sequences listed in any of SEQ ID NOs: 12 to 54, preferably SEQ ID NOs: 39 and 40, or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 12 to 54; and
(ii) sequences listed in any of SEQ ID NOs: 55 to 113, preferably SEQ ID NOs: 93 and 94 or an amino acid sequence having an amino acid sequence identity of at least 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 55 to 113.
8. Use of a recombinant vector comprising a nucleic acid encoding an rSAM enzyme as defined in any of claims 1 to 7 in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a viral or episomal vector, more preferably a vector selected from the group consisting of lentivirus vector, adenovirus vector, baculovirus vector, bacterial vector and yeast vector.
9. Use of a host cell expressing an rSAM enzyme as defined in any of claims 1 to 7, preferably comprising a recombinant vector as defined in claim 8, in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a host cell selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae cells, and Pichia pastoris cells; bacterial cells, preferably E. coli cells, and Bacillus subtilis cells; plant cells, preferably Nicotiana tabacum, and Physcomitrella patens; NIH-3T3 mammalian cells; and insect cells, preferably sf9 insect cells.
10. Use of a radical S-adenosyl methionine (rSAM) enzyme according to any of claims 1 to 7, a vector and/or host cell according to any of claims 8 or 9 in combination with an rSAM-associated protein comprising an amino acid sequence selected from the group consisting of
(a) SEQ ID NO: 121,
(b) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with SEQ ID NO: 121, and
(c) a functional fragment and/or functional derivative of (a) or (b), preferably a functional fragment of at least 30 amino acids, more preferably at least 45 amino acids of (a) or (b).
11. Method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising the steps of:
(i) providing a radical S-adenosyl methionine (rSAM) enzyme as defined in any of claims 1 to 7, and/or a host cell as defined in claim 9,
(ii) optionally providing an rSAM-associated protein as defined in claim 10,
(iii) providing at least one (poly)peptide substrate of interest comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, and
(iv) contacting the enzyme and/or host cell of (i) with the substrate of (iii) and optionally the rSAM-associated protein of (ii) under conditions suitable for the enzymatic introduction of at least one α-keto-β3-amino acid into the substrate.
12. Method according to claim 11, wherein at least one of the enzyme in step (i) and the (poly)peptide substrate in step (iii), preferably both, optionally together with the rSAM-associated protein of optional step (ii), are provided in the form of a host cell, more preferably are all co-expressed in the host cell as defined in claim 9.
13. Method according to claim 11 or 12, wherein the host cell for use in step (i) of the above method is an E. coli host cell.
14. Method according to any of claims 11 to 13, wherein step (iv) is followed by step (v), wherein the keto-functionality resulting from step (iv) is reduced chemically, preferably by sodium borohydride, or is converted to an imine, preferably the methoxyamine.
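
The sequence motifs recited in claims 1 and 4 can be checked against a candidate sequence with ordinary regular expressions. The Python sketch below is illustrative only: it treats X as any one-letter amino acid code (so non-natural residues are not covered), reads a number or range after X as a repeat count, and uses the hypothetical helper names `motif_to_regex` and `find_xyg`. The test sequences are the PlpX N-terminus (from SEQ ID NO: 120) and the PlpA2 and PlpA3 core peptides (SEQ ID NOs: 123 and 127).

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert claim-style motif notation into a regular expression.

    'X'     -> any single residue
    'X4'    -> exactly 4 residues
    'X9-15' -> between 9 and 15 residues
    """
    def repl(m):
        lo, hi = m.group(1), m.group(2)
        if lo is None:
            return "[A-Z]"
        if hi is None:
            return "[A-Z]{%s}" % lo
        return "[A-Z]{%s,%s}" % (lo, hi)

    return re.sub(r"X(?:(\d+)(?:-(\d+))?)?", repl, motif)


def find_xyg(peptide: str):
    """Return (1-based position of Y, motif) for every XYG occurrence."""
    # The lookahead allows overlapping matches to be reported as well.
    return [(m.start() + 2, m.group(1))
            for m in re.finditer(r"(?=([A-Z]YG))", peptide)]


PLPX_NTERM = "MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEE"  # start of SEQ ID NO: 120
PLPA2_CORE = "VDLSIFELLDEEPLFPIRPLYGLPI"                   # SEQ ID NO: 123
PLPA3_CORE = "AVAAMYGVVFPWDNEFPWPR"                        # SEQ ID NO: 127

if __name__ == "__main__":
    for motif in ("CXXXCXXC", "CXLXCXHCGSRAGXXXXXE"):
        hit = re.search(motif_to_regex(motif), PLPX_NTERM)
        print(motif, hit.group() if hit else None)
    print(find_xyg(PLPA2_CORE))
    print(find_xyg(PLPA3_CORE))
```

A regular-expression match only shows that the motif is present; it says nothing about enzymatic activity, which the claims define functionally.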
PCT/EP2018/050225 2017-01-06 2018-01-05 USE OF RADICAL S-ADENOSYL METHIONINE (SAM) ENZYMES FOR INTRODUCING α-KETO-β3-AMINO ACIDS INTO (POLY)PEPTIDES WO2018127544A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP17150498.8 2017-01-06
EP17150498 2017-01-06

Publications (1)

Publication Number Publication Date
WO2018127544A1 (en) 2018-07-12

Family

ID=57796156

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/050225 WO2018127544A1 (en) 2017-01-06 2018-01-05 USE OF RADICAL S-ADENOSYL METHIONINE (SAM) ENZYMES FOR INTRODUCING α-KETO-β3-AMINO ACIDS INTO (POLY)PEPTIDES

Country Status (1)

Country Link
WO (1) WO2018127544A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030113882A1 (en) 1998-11-24 2003-06-19 Wisconsin Alumni Research Foundation Methods for the preparation of beta-amino acids
WO2007047680A2 (en) * 2005-10-14 2007-04-26 Cargill, Incorporated Increasing the activity of radical s-adenosyl methionine (sam) enzymes

Non-Patent Citations (21)

* Cited by examiner, † Cited by third party
Title
"NCBI", Database accession no. rad_SAM_trio
"NCBI", Database accession no. rSAM_nif11_3
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
ALTSCHUL ET AL., NCB NLM NIH BETHESDA
APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 72, 2006, pages 211
CLARISSA MELO CZEKSTER ET AL: "In Vivo Biosynthesis of a β-Amino Acid-Containing Protein", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 138, no. 16, 27 April 2016 (2016-04-27), US, pages 5194 - 5197, XP055369605, ISSN: 0002-7863, DOI: 10.1021/jacs.6b01023 *
CZEKSTER ET AL., JACS, vol. 138, 2016, pages 5194 - 5197
DATABASE EMBL [online] 23 November 2015 (2015-11-23), "Labilithrix luteola Radical SAM, Pyruvate-formate lyase-activating enzyme like protein", XP002778916, Database accession no. AKU99181 *
DATABASE GenPept [online] 27 October 2016 (2016-10-27), "radical SAM/SPASM domain-containing protein [Ruegeria sp. ANG-S4]", XP002778915, retrieved from NCBI Database accession no. WP_052261552 *
FREEMAN ET AL., SCIENCE, vol. 338, 2012, pages 387 - 390
FUKUHARA, K. ET AL., ORG. LETT., vol. 17, 2015, pages 2646 - 2648
HAFT ET AL., BMC BIOLOGY, vol. 8, 2010, pages 70
HAFT ET AL., J. BACTERIOL., vol. 193, 2011, pages 2745 - 2755
HAFT ET AL., NUCLEIC ACIDS RES., vol. 31, 2003, pages 371 - 373
HAFT; BASU, J. BACTERIOL., vol. 193, 2011, pages 2745 - 2755
HAMADA ET AL., TETRAHEDRON LETT., vol. 35, 1994, pages 719 - 720
LARKIN MA ET AL., BIOINFORMATICS, vol. 23, 2007, pages 2947 - 2948
LAU; SUN, BIOTECHNOL ADV., vol. 27, no. 6, 2009, pages 1015 - 1022
METHODS IN ENZYMOLOGY, vol. 350, 2002, pages 248
MORINAKA, B. I. ET AL., ANGEW. CHEM. INT. ED., vol. 53, 2014, pages 8503 - 8507
WESTERS ET AL., MOL. CELL. RES., vol. 1694, no. 1-3, 2004, pages 299 - 310

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4159743A1 (en) * 2021-09-30 2023-04-05 ETH Zurich Methods for preparing pyridazine compounds
WO2023052526A1 (en) * 2021-09-30 2023-04-06 Eth Zurich Methods for preparing pyridazine compounds

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18701105; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18701105; Country of ref document: EP; Kind code of ref document: A1)