WO2018127544A1 - USE OF RADICAL S-ADENOSYL METHIONINE (SAM) ENZYMES FOR INTRODUCING α-KETO-β3-AMINO ACIDS INTO (POLY)PEPTIDES - Google Patents
- Publication number
- WO2018127544A1 (international application PCT/EP2018/050225)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- amino acid
- seq
- group
- rsam
- acid sequence
Concepts (machine-extracted)
- 108090000765 processed proteins & peptides Proteins 0.000 title claims abstract description 71
- 102000004190 Enzymes Human genes 0.000 title claims abstract description 61
- 108090000790 Enzymes Proteins 0.000 title claims abstract description 61
- MEFKEPWMEQBLKI-AIRLBKTGSA-N S-adenosyl-L-methioninate Chemical compound O[C@@H]1[C@H](O)[C@@H](C[S+](CC[C@H](N)C([O-])=O)C)O[C@H]1N1C2=NC=NC(N)=C2N=C1 MEFKEPWMEQBLKI-AIRLBKTGSA-N 0.000 title claims abstract description 14
- 229960001570 ademetionine Drugs 0.000 title claims abstract description 14
- 102000004196 processed proteins & peptides Human genes 0.000 title claims description 29
- 150000001413 amino acids Chemical group 0.000 claims abstract description 101
- 238000000034 method Methods 0.000 claims abstract description 31
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 claims abstract description 29
- 239000000758 substrate Substances 0.000 claims abstract description 25
- 239000004471 Glycine Substances 0.000 claims abstract description 15
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 claims abstract description 13
- 125000001493 tyrosinyl group Chemical group [H]OC1=C([H])C([H])=C(C([H])=C1[H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 claims abstract description 13
- 108090000623 proteins and genes Proteins 0.000 claims description 59
- 210000004027 cell Anatomy 0.000 claims description 42
- 102000004169 proteins and genes Human genes 0.000 claims description 40
- 239000013598 vector Substances 0.000 claims description 29
- 229910052727 yttrium Inorganic materials 0.000 claims description 26
- 229910052731 fluorine Inorganic materials 0.000 claims description 20
- 229910052739 hydrogen Inorganic materials 0.000 claims description 16
- 239000012634 fragment Substances 0.000 claims description 14
- 241000588724 Escherichia coli Species 0.000 claims description 13
- 102000016812 Radical SAM Human genes 0.000 claims description 9
- 108050006523 Radical SAM Proteins 0.000 claims description 9
- 229910052717 sulfur Inorganic materials 0.000 claims description 8
- 229910000033 sodium borohydride Inorganic materials 0.000 claims description 7
- 239000012279 sodium borohydride Substances 0.000 claims description 7
- 240000004808 Saccharomyces cerevisiae Species 0.000 claims description 6
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 claims description 6
- 230000002829 reductive effect Effects 0.000 claims description 6
- 241000238631 Hexapoda Species 0.000 claims description 5
- 230000001580 bacterial effect Effects 0.000 claims description 5
- 229910052757 nitrogen Inorganic materials 0.000 claims description 4
- 108020004707 nucleic acids Proteins 0.000 claims description 4
- 102000039446 nucleic acids Human genes 0.000 claims description 4
- 150000007523 nucleic acids Chemical class 0.000 claims description 4
- 241000196324 Embryophyta Species 0.000 claims description 3
- 241000713666 Lentivirus Species 0.000 claims description 3
- GMPKIPWJBDOURN-UHFFFAOYSA-N Methoxyamine Chemical compound CON GMPKIPWJBDOURN-UHFFFAOYSA-N 0.000 claims description 3
- 230000002255 enzymatic effect Effects 0.000 claims description 3
- 150000002466 imines Chemical class 0.000 claims description 3
- 210000004962 mammalian cell Anatomy 0.000 claims description 3
- 241000701161 unidentified adenovirus Species 0.000 claims description 3
- 241000701447 unidentified baculovirus Species 0.000 claims description 3
- 244000063299 Bacillus subtilis Species 0.000 claims description 2
- 235000014469 Bacillus subtilis Nutrition 0.000 claims description 2
- 241000235058 Komagataella pastoris Species 0.000 claims description 2
- 244000061176 Nicotiana tabacum Species 0.000 claims description 2
- 235000002637 Nicotiana tabacum Nutrition 0.000 claims description 2
- 241000195887 Physcomitrella patens Species 0.000 claims description 2
- 229910052698 phosphorus Inorganic materials 0.000 claims description 2
- 230000003612 virological effect Effects 0.000 claims description 2
- 210000005253 yeast cell Anatomy 0.000 claims description 2
- 229940024606 amino acid Drugs 0.000 description 83
- 235000001014 amino acid Nutrition 0.000 description 58
- 229940088598 enzyme Drugs 0.000 description 43
- 125000003275 alpha amino acid group Chemical group 0.000 description 34
- 235000018102 proteins Nutrition 0.000 description 32
- 239000002243 precursor Substances 0.000 description 28
- 239000000047 product Substances 0.000 description 26
- 150000001576 beta-amino acids Chemical class 0.000 description 13
- 238000002474 experimental method Methods 0.000 description 12
- 230000014509 gene expression Effects 0.000 description 12
- DZGWFCGJZKJUFP-UHFFFAOYSA-N tyramine Chemical compound NCCC1=CC=C(O)C=C1 DZGWFCGJZKJUFP-UHFFFAOYSA-N 0.000 description 12
- 238000004895 liquid chromatography mass spectrometry Methods 0.000 description 8
- 239000002609 medium Substances 0.000 description 8
- 238000001228 spectrum Methods 0.000 description 8
- 238000000746 purification Methods 0.000 description 7
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 6
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 6
- 238000006243 chemical reaction Methods 0.000 description 6
- 230000000694 effects Effects 0.000 description 6
- 108091008053 gene clusters Proteins 0.000 description 6
- 229930014626 natural product Natural products 0.000 description 6
- 150000003254 radicals Chemical class 0.000 description 6
- 229960003732 tyramine Drugs 0.000 description 6
- GDAZZZLHHSQPOQ-UHFFFAOYSA-N 2-methyl-3-phenyloxaziridine Chemical compound CN1OC1C1=CC=CC=C1 GDAZZZLHHSQPOQ-UHFFFAOYSA-N 0.000 description 5
- 108010074860 Factor Xa Proteins 0.000 description 5
- 229910052799 carbon Inorganic materials 0.000 description 5
- 125000000468 ketone group Chemical group 0.000 description 5
- 239000013615 primer Substances 0.000 description 5
- 239000011347 resin Substances 0.000 description 5
- 229920005989 resin Polymers 0.000 description 5
- 239000000126 substance Substances 0.000 description 5
- 101001069755 Bacterium symbiont subsp. Theonella swinhoei (strain pTSMAC1) Polytheonamide B Proteins 0.000 description 4
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 4
- 238000005481 NMR spectroscopy Methods 0.000 description 4
- 108090000631 Trypsin Proteins 0.000 description 4
- 102000004142 Trypsin Human genes 0.000 description 4
- 239000002253 acid Substances 0.000 description 4
- -1 aromatic amino acids Chemical class 0.000 description 4
- 238000010828 elution Methods 0.000 description 4
- 239000013604 expression vector Substances 0.000 description 4
- 238000010348 incorporation Methods 0.000 description 4
- BDAGIHXWWSANSR-UHFFFAOYSA-N methanoic acid Natural products OC=O BDAGIHXWWSANSR-UHFFFAOYSA-N 0.000 description 4
- 230000004048 modification Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 4
- 210000003705 ribosome Anatomy 0.000 description 4
- 239000012588 trypsin Substances 0.000 description 4
- QTBSBXVTEAMEQO-UHFFFAOYSA-N Acetic acid Chemical compound CC(O)=O QTBSBXVTEAMEQO-UHFFFAOYSA-N 0.000 description 3
- WEVYAHXRMPXWCK-UHFFFAOYSA-N Acetonitrile Chemical compound CC#N WEVYAHXRMPXWCK-UHFFFAOYSA-N 0.000 description 3
- 241000894006 Bacteria Species 0.000 description 3
- 108090000317 Chymotrypsin Proteins 0.000 description 3
- SNDPXSYFESPGGJ-UHFFFAOYSA-N L-norVal-OH Natural products CCCC(N)C(O)=O SNDPXSYFESPGGJ-UHFFFAOYSA-N 0.000 description 3
- OKKJLVBELUTLKV-UHFFFAOYSA-N Methanol Chemical compound OC OKKJLVBELUTLKV-UHFFFAOYSA-N 0.000 description 3
- 108010033276 Peptide Fragments Proteins 0.000 description 3
- 102000007079 Peptide Fragments Human genes 0.000 description 3
- 108010019477 S-adenosyl-L-methionine-dependent N-methyltransferase Proteins 0.000 description 3
- 208000005392 Spasm Diseases 0.000 description 3
- 230000001851 biosynthetic effect Effects 0.000 description 3
- 238000012512 characterization method Methods 0.000 description 3
- 239000003795 chemical substances by application Substances 0.000 description 3
- 229960002376 chymotrypsin Drugs 0.000 description 3
- 230000004186 co-expression Effects 0.000 description 3
- 239000012149 elution buffer Substances 0.000 description 3
- RAXXELZNTBOGNW-UHFFFAOYSA-N imidazole Natural products C1=CNC=N1 RAXXELZNTBOGNW-UHFFFAOYSA-N 0.000 description 3
- 238000000338 in vitro Methods 0.000 description 3
- 108010000785 non-ribosomal peptide synthase Proteins 0.000 description 3
- 101150051209 pip gene Proteins 0.000 description 3
- 239000013612 plasmid Substances 0.000 description 3
- 230000009467 reduction Effects 0.000 description 3
- GAUBNQMYYJLWNF-UHFFFAOYSA-N 3-(Carboxymethylamino)propanoic acid Chemical compound OC(=O)CCNCC(O)=O GAUBNQMYYJLWNF-UHFFFAOYSA-N 0.000 description 2
- OSWFIVFLDKOXQC-UHFFFAOYSA-N 4-(3-methoxyphenyl)aniline Chemical compound COC1=CC=CC(C=2C=CC(N)=CC=2)=C1 OSWFIVFLDKOXQC-UHFFFAOYSA-N 0.000 description 2
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 2
- ALYNCZNDIQEVRV-UHFFFAOYSA-N 4-aminobenzoic acid Chemical compound NC1=CC=C(C(O)=O)C=C1 ALYNCZNDIQEVRV-UHFFFAOYSA-N 0.000 description 2
- 101000728229 Asticcacaulis excentricus (strain ATCC 15261 / DSM 4724 / KCTC 12464 / NCIMB 9791 / VKM B-1370 / CB 48) Astexin-1 Proteins 0.000 description 2
- 101000728234 Asticcacaulis excentricus (strain ATCC 15261 / DSM 4724 / KCTC 12464 / NCIMB 9791 / VKM B-1370 / CB 48) Astexin-2 Proteins 0.000 description 2
- 101000728232 Asticcacaulis excentricus (strain ATCC 15261 / DSM 4724 / KCTC 12464 / NCIMB 9791 / VKM B-1370 / CB 48) Astexin-3 Proteins 0.000 description 2
- 101000761079 Burkholderia thailandensis (strain ATCC 700388 / DSM 13276 / CIP 106301 / E264) Capistruin Proteins 0.000 description 2
- 101000979117 Curvularia clavata Nonribosomal peptide synthetase Proteins 0.000 description 2
- 241000192700 Cyanobacteria Species 0.000 description 2
- FCKYPQBAHLOOJQ-UHFFFAOYSA-N Cyclohexane-1,2-diaminetetraacetic acid Chemical compound OC(=O)CN(CC(O)=O)C1CCCCC1N(CC(O)=O)CC(O)=O FCKYPQBAHLOOJQ-UHFFFAOYSA-N 0.000 description 2
- IAZDPXIOMUYVGZ-WFGJKAKNSA-N Dimethyl sulfoxide Chemical compound [2H]C([2H])([2H])S(=O)C([2H])([2H])[2H] IAZDPXIOMUYVGZ-WFGJKAKNSA-N 0.000 description 2
- ULGZDMOVFRHVEP-RWJQBGPGSA-N Erythromycin Chemical compound O([C@@H]1[C@@H](C)C(=O)O[C@@H]([C@@]([C@H](O)[C@@H](C)C(=O)[C@H](C)C[C@@](C)(O)[C@H](O[C@H]2[C@@H]([C@H](C[C@@H](C)O2)N(C)C)O)[C@H]1C)(C)O)CC)[C@H]1C[C@@](C)(OC)[C@@H](O)[C@H](C)O1 ULGZDMOVFRHVEP-RWJQBGPGSA-N 0.000 description 2
- 101001056191 Escherichia coli Microcin J25 Proteins 0.000 description 2
- 101100383920 Fragaria ananassa MCSI gene Proteins 0.000 description 2
- 229930194542 Keto Natural products 0.000 description 2
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 description 2
- LRQKBLKVPFOOQJ-YFKPBYRVSA-N L-norleucine Chemical compound CCCC[C@H]([NH3+])C([O-])=O LRQKBLKVPFOOQJ-YFKPBYRVSA-N 0.000 description 2
- 108091005804 Peptidases Proteins 0.000 description 2
- 102000008153 Peptide Elongation Factor Tu Human genes 0.000 description 2
- 108010049977 Peptide Elongation Factor Tu Proteins 0.000 description 2
- 241000179975 Pleurocapsa sp. Species 0.000 description 2
- 239000004365 Protease Substances 0.000 description 2
- 102100037486 Reverse transcriptase/ribonuclease H Human genes 0.000 description 2
- 101001138028 Rhodococcus jostii Lariatin Proteins 0.000 description 2
- 108010022394 Threonine synthase Proteins 0.000 description 2
- XSQUKJJJFZCRTK-UHFFFAOYSA-N Urea Chemical compound NC(N)=O XSQUKJJJFZCRTK-UHFFFAOYSA-N 0.000 description 2
- 150000007513 acids Chemical class 0.000 description 2
- 238000007792 addition Methods 0.000 description 2
- 238000001261 affinity purification Methods 0.000 description 2
- 125000003277 amino group Chemical group 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 239000003242 anti bacterial agent Substances 0.000 description 2
- 229940088710 antibiotic agent Drugs 0.000 description 2
- UCMIRNVEIXFBKS-UHFFFAOYSA-N beta-alanine Chemical compound NCCC(O)=O UCMIRNVEIXFBKS-UHFFFAOYSA-N 0.000 description 2
- 239000000872 buffer Substances 0.000 description 2
- 150000007942 carboxylates Chemical class 0.000 description 2
- 238000005119 centrifugation Methods 0.000 description 2
- 238000003776 cleavage reaction Methods 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- ZPUCINDJVBIVPJ-LJISPDSOSA-N cocaine Chemical compound O([C@H]1C[C@@H]2CC[C@@H](N2C)[C@H]1C(=O)OC)C(=O)C1=CC=CC=C1 ZPUCINDJVBIVPJ-LJISPDSOSA-N 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 102000004419 dihydrofolate reductase Human genes 0.000 description 2
- 235000019253 formic acid Nutrition 0.000 description 2
- 125000000524 functional group Chemical group 0.000 description 2
- BTCSSZJGUNDROE-UHFFFAOYSA-N gamma-aminobutyric acid Chemical compound NCCCC(O)=O BTCSSZJGUNDROE-UHFFFAOYSA-N 0.000 description 2
- 238000001052 heteronuclear multiple bond coherence spectrum Methods 0.000 description 2
- 238000004128 high performance liquid chromatography Methods 0.000 description 2
- 238000001727 in vivo Methods 0.000 description 2
- BPHPUYQFMNQIOC-NXRLNHOXSA-N isopropyl beta-D-thiogalactopyranoside Chemical compound CC(C)S[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O BPHPUYQFMNQIOC-NXRLNHOXSA-N 0.000 description 2
- 238000002372 labelling Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 229960001913 mecysteine Drugs 0.000 description 2
- 229930182817 methionine Natural products 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 230000035772 mutation Effects 0.000 description 2
- 230000003647 oxidation Effects 0.000 description 2
- 238000007254 oxidation reaction Methods 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- 229960001639 penicillamine Drugs 0.000 description 2
- 229920001184 polypeptide Polymers 0.000 description 2
- 229930182852 proteinogenic amino acid Natural products 0.000 description 2
- 238000004007 reversed phase HPLC Methods 0.000 description 2
- FSYKKLYZXJSNPZ-UHFFFAOYSA-N sarcosine Chemical compound C[NH2+]CC([O-])=O FSYKKLYZXJSNPZ-UHFFFAOYSA-N 0.000 description 2
- 230000007017 scission Effects 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- JQWHASGSAFIOCM-UHFFFAOYSA-M sodium periodate Chemical compound [Na+].[O-]I(=O)(=O)=O JQWHASGSAFIOCM-UHFFFAOYSA-M 0.000 description 2
- 239000001488 sodium phosphate Substances 0.000 description 2
- 229910000162 sodium phosphate Inorganic materials 0.000 description 2
- 239000007858 starting material Substances 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- RYFMWSXOAZQYPI-UHFFFAOYSA-K trisodium phosphate Chemical compound [Na+].[Na+].[Na+].[O-]P([O-])([O-])=O RYFMWSXOAZQYPI-UHFFFAOYSA-K 0.000 description 2
- CWLQUGTUXBXTLF-RXMQYKEDSA-N (2r)-1-methylpyrrolidine-2-carboxylic acid Chemical compound CN1CCC[C@@H]1C(O)=O CWLQUGTUXBXTLF-RXMQYKEDSA-N 0.000 description 1
- YAXAFCHJCYILRU-RXMQYKEDSA-N (2r)-2-(methylamino)-4-methylsulfanylbutanoic acid Chemical compound CN[C@@H](C(O)=O)CCSC YAXAFCHJCYILRU-RXMQYKEDSA-N 0.000 description 1
- XLBVNMSMFQMKEY-SCSAIBSYSA-N (2r)-2-(methylamino)pentanedioic acid Chemical compound CN[C@@H](C(O)=O)CCC(O)=O XLBVNMSMFQMKEY-SCSAIBSYSA-N 0.000 description 1
- GDFAOVXKHJXLEI-GSVOUGTGSA-N (2r)-2-(methylamino)propanoic acid Chemical compound CN[C@H](C)C(O)=O GDFAOVXKHJXLEI-GSVOUGTGSA-N 0.000 description 1
- SCIFESDRCALIIM-SECBINFHSA-N (2r)-2-(methylazaniumyl)-3-phenylpropanoate Chemical compound CN[C@@H](C(O)=O)CC1=CC=CC=C1 SCIFESDRCALIIM-SECBINFHSA-N 0.000 description 1
- CYZKJBZEIFWZSR-ZCFIWIBFSA-N (2r)-3-(1h-imidazol-5-yl)-2-(methylamino)propanoic acid Chemical compound CN[C@@H](C(O)=O)CC1=CN=CN1 CYZKJBZEIFWZSR-ZCFIWIBFSA-N 0.000 description 1
- CZCIKBSVHDNIDH-LLVKDONJSA-N (2r)-3-(1h-indol-3-yl)-2-(methylamino)propanoic acid Chemical compound C1=CC=C2C(C[C@@H](NC)C(O)=O)=CNC2=C1 CZCIKBSVHDNIDH-LLVKDONJSA-N 0.000 description 1
- AKCRVYNORCOYQT-RXMQYKEDSA-N (2r)-3-methyl-2-(methylazaniumyl)butanoate Chemical compound C[NH2+][C@H](C(C)C)C([O-])=O AKCRVYNORCOYQT-RXMQYKEDSA-N 0.000 description 1
- LNSMPSPTFDIWRQ-GSVOUGTGSA-N (2r)-4-amino-2-(methylamino)-4-oxobutanoic acid Chemical compound CN[C@@H](C(O)=O)CC(N)=O LNSMPSPTFDIWRQ-GSVOUGTGSA-N 0.000 description 1
- NTWVQPHTOUKMDI-RXMQYKEDSA-N (2r)-5-(diaminomethylideneamino)-2-(methylamino)pentanoic acid Chemical compound CN[C@@H](C(O)=O)CCCNC(N)=N NTWVQPHTOUKMDI-RXMQYKEDSA-N 0.000 description 1
- KSZFSNZOGAXEGH-SCSAIBSYSA-N (2r)-5-amino-2-(methylamino)-5-oxopentanoic acid Chemical compound CN[C@@H](C(O)=O)CCC(N)=O KSZFSNZOGAXEGH-SCSAIBSYSA-N 0.000 description 1
- OZRWQPFBXDVLAH-RXMQYKEDSA-N (2r)-5-amino-2-(methylamino)pentanoic acid Chemical compound CN[C@@H](C(O)=O)CCCN OZRWQPFBXDVLAH-RXMQYKEDSA-N 0.000 description 1
- KSPIYJQBLVDRRI-NTSWFWBYSA-N (2r,3s)-3-methyl-2-(methylazaniumyl)pentanoate Chemical compound CC[C@H](C)[C@@H](NC)C(O)=O KSPIYJQBLVDRRI-NTSWFWBYSA-N 0.000 description 1
- BVAUMRCGVHUWOZ-ZETCQYMHSA-N (2s)-2-(cyclohexylazaniumyl)propanoate Chemical compound OC(=O)[C@H](C)NC1CCCCC1 BVAUMRCGVHUWOZ-ZETCQYMHSA-N 0.000 description 1
- LDUWTIUXPVCEQF-LURJTMIESA-N (2s)-2-(cyclopentylamino)propanoic acid Chemical compound OC(=O)[C@H](C)NC1CCCC1 LDUWTIUXPVCEQF-LURJTMIESA-N 0.000 description 1
- NVXKJPGRZSDYPK-JTQLQIEISA-N (2s)-2-(methylamino)-4-phenylbutanoic acid Chemical compound CN[C@H](C(O)=O)CCC1=CC=CC=C1 NVXKJPGRZSDYPK-JTQLQIEISA-N 0.000 description 1
- HOKKHZGPKSLGJE-VKHMYHEASA-N (2s)-2-(methylamino)butanedioic acid Chemical compound CN[C@H](C(O)=O)CC(O)=O HOKKHZGPKSLGJE-VKHMYHEASA-N 0.000 description 1
- FPDYKABXINADKS-LURJTMIESA-N (2s)-2-(methylazaniumyl)hexanoate Chemical compound CCCC[C@H](NC)C(O)=O FPDYKABXINADKS-LURJTMIESA-N 0.000 description 1
- HCPKYUNZBPVCHC-YFKPBYRVSA-N (2s)-2-(methylazaniumyl)pentanoate Chemical compound CCC[C@H](NC)C(O)=O HCPKYUNZBPVCHC-YFKPBYRVSA-N 0.000 description 1
- WTDHSXGBDZBWAW-QMMMGPOBSA-N (2s)-2-[cyclohexyl(methyl)azaniumyl]propanoate Chemical compound OC(=O)[C@H](C)N(C)C1CCCCC1 WTDHSXGBDZBWAW-QMMMGPOBSA-N 0.000 description 1
- IUYZJPXOXGRNNE-ZETCQYMHSA-N (2s)-2-[cyclopentyl(methyl)amino]propanoic acid Chemical compound OC(=O)[C@H](C)N(C)C1CCCC1 IUYZJPXOXGRNNE-ZETCQYMHSA-N 0.000 description 1
- AXDLCFOOGCNDST-VIFPVBQESA-N (2s)-3-(4-hydroxyphenyl)-2-(methylamino)propanoic acid Chemical compound CN[C@H](C(O)=O)CC1=CC=C(O)C=C1 AXDLCFOOGCNDST-VIFPVBQESA-N 0.000 description 1
- XKZCXMNMUMGDJG-AWEZNQCLSA-N (2s)-3-[(6-acetylnaphthalen-2-yl)amino]-2-aminopropanoic acid Chemical compound C1=C(NC[C@H](N)C(O)=O)C=CC2=CC(C(=O)C)=CC=C21 XKZCXMNMUMGDJG-AWEZNQCLSA-N 0.000 description 1
- LNSMPSPTFDIWRQ-VKHMYHEASA-N (2s)-4-amino-2-(methylamino)-4-oxobutanoic acid Chemical compound CN[C@H](C(O)=O)CC(N)=O LNSMPSPTFDIWRQ-VKHMYHEASA-N 0.000 description 1
- XJODGRWDFZVTKW-LURJTMIESA-N (2s)-4-methyl-2-(methylamino)pentanoic acid Chemical compound CN[C@H](C(O)=O)CC(C)C XJODGRWDFZVTKW-LURJTMIESA-N 0.000 description 1
- KSZFSNZOGAXEGH-BYPYZUCNSA-N (2s)-5-amino-2-(methylamino)-5-oxopentanoic acid Chemical compound CN[C@H](C(O)=O)CCC(N)=O KSZFSNZOGAXEGH-BYPYZUCNSA-N 0.000 description 1
- OZRWQPFBXDVLAH-YFKPBYRVSA-N (2s)-5-amino-2-(methylamino)pentanoic acid Chemical compound CN[C@H](C(O)=O)CCCN OZRWQPFBXDVLAH-YFKPBYRVSA-N 0.000 description 1
- RHMALYOXPBRJBG-WXHCCQJTSA-N (2s)-6-amino-2-[[(2s)-2-[[(2s)-2-[[(2s)-2-[[(2s)-6-amino-2-[[(2s)-2-[[(2s)-2-[[2-[[(2s,3r)-2-[[(2s)-2-[[2-[[2-[[(2r)-2-amino-3-phenylpropanoyl]amino]acetyl]amino]acetyl]amino]-3-phenylpropanoyl]amino]-3-hydroxybutanoyl]amino]acetyl]amino]propanoyl]amino]- Chemical compound C([C@@H](C(=O)N[C@@H]([C@H](O)C)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(N)=O)NC(=O)CNC(=O)CNC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 RHMALYOXPBRJBG-WXHCCQJTSA-N 0.000 description 1
- LJRDOKAZOAKLDU-UDXJMMFXSA-N (2s,3s,4r,5r,6r)-5-amino-2-(aminomethyl)-6-[(2r,3s,4r,5s)-5-[(1r,2r,3s,5r,6s)-3,5-diamino-2-[(2s,3r,4r,5s,6r)-3-amino-4,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-6-hydroxycyclohexyl]oxy-4-hydroxy-2-(hydroxymethyl)oxolan-3-yl]oxyoxane-3,4-diol;sulfuric ac Chemical compound OS(O)(=O)=O.N[C@@H]1[C@@H](O)[C@H](O)[C@H](CN)O[C@@H]1O[C@H]1[C@@H](O)[C@H](O[C@H]2[C@@H]([C@@H](N)C[C@@H](N)[C@@H]2O)O[C@@H]2[C@@H]([C@@H](O)[C@H](O)[C@@H](CO)O2)N)O[C@@H]1CO LJRDOKAZOAKLDU-UDXJMMFXSA-N 0.000 description 1
- HNSDLXPSAYFUHK-UHFFFAOYSA-N 1,4-bis(2-ethylhexyl) sulfosuccinate Chemical compound CCCCC(CC)COC(=O)CC(S(O)(=O)=O)C(=O)OCC(CC)CCCC HNSDLXPSAYFUHK-UHFFFAOYSA-N 0.000 description 1
- WAAJQPAIOASFSC-UHFFFAOYSA-N 2-(1-hydroxyethylamino)acetic acid Chemical compound CC(O)NCC(O)=O WAAJQPAIOASFSC-UHFFFAOYSA-N 0.000 description 1
- UEQSFWNXRZJTKB-UHFFFAOYSA-N 2-(2,2-diphenylethylamino)acetic acid Chemical compound C=1C=CC=CC=1C(CNCC(=O)O)C1=CC=CC=C1 UEQSFWNXRZJTKB-UHFFFAOYSA-N 0.000 description 1
- PIINGYXNCHTJTF-UHFFFAOYSA-N 2-(2-azaniumylethylamino)acetate Chemical compound NCCNCC(O)=O PIINGYXNCHTJTF-UHFFFAOYSA-N 0.000 description 1
- XCDGCRLSSSSBIA-UHFFFAOYSA-N 2-(2-methylsulfanylethylamino)acetic acid Chemical compound CSCCNCC(O)=O XCDGCRLSSSSBIA-UHFFFAOYSA-N 0.000 description 1
- STMXJQHRRCPJCJ-UHFFFAOYSA-N 2-(3,3-diphenylpropylamino)acetic acid Chemical compound C=1C=CC=CC=1C(CCNCC(=O)O)C1=CC=CC=C1 STMXJQHRRCPJCJ-UHFFFAOYSA-N 0.000 description 1
- DHGYLUFLENKZHH-UHFFFAOYSA-N 2-(3-aminopropylamino)acetic acid Chemical compound NCCCNCC(O)=O DHGYLUFLENKZHH-UHFFFAOYSA-N 0.000 description 1
- OGAULEBSQQMUKP-UHFFFAOYSA-N 2-(4-aminobutylamino)acetic acid Chemical compound NCCCCNCC(O)=O OGAULEBSQQMUKP-UHFFFAOYSA-N 0.000 description 1
- KGSVNOLLROCJQM-UHFFFAOYSA-N 2-(benzylamino)acetic acid Chemical compound OC(=O)CNCC1=CC=CC=C1 KGSVNOLLROCJQM-UHFFFAOYSA-N 0.000 description 1
- IVCQRTJVLJXKKJ-UHFFFAOYSA-N 2-(butan-2-ylazaniumyl)acetate Chemical compound CCC(C)NCC(O)=O IVCQRTJVLJXKKJ-UHFFFAOYSA-N 0.000 description 1
- KQLGGQARRCMYGD-UHFFFAOYSA-N 2-(cyclobutylamino)acetic acid Chemical compound OC(=O)CNC1CCC1 KQLGGQARRCMYGD-UHFFFAOYSA-N 0.000 description 1
- DICMQVOBSKLBBN-UHFFFAOYSA-N 2-(cyclodecylamino)acetic acid Chemical compound OC(=O)CNC1CCCCCCCCC1 DICMQVOBSKLBBN-UHFFFAOYSA-N 0.000 description 1
- NPLBBQAAYSJEMO-UHFFFAOYSA-N 2-(cycloheptylazaniumyl)acetate Chemical compound OC(=O)CNC1CCCCCC1 NPLBBQAAYSJEMO-UHFFFAOYSA-N 0.000 description 1
- OQMYZVWIXPPDDE-UHFFFAOYSA-N 2-(cyclohexylazaniumyl)acetate Chemical compound OC(=O)CNC1CCCCC1 OQMYZVWIXPPDDE-UHFFFAOYSA-N 0.000 description 1
- PNKNDNFLQNMQJL-UHFFFAOYSA-N 2-(cyclooctylazaniumyl)acetate Chemical compound OC(=O)CNC1CCCCCCC1 PNKNDNFLQNMQJL-UHFFFAOYSA-N 0.000 description 1
- DXQCCQKRNWMECV-UHFFFAOYSA-N 2-(cyclopropylazaniumyl)acetate Chemical compound OC(=O)CNC1CC1 DXQCCQKRNWMECV-UHFFFAOYSA-N 0.000 description 1
- PRVOMNLNSHAUEI-UHFFFAOYSA-N 2-(cycloundecylamino)acetic acid Chemical compound OC(=O)CNC1CCCCCCCCCC1 PRVOMNLNSHAUEI-UHFFFAOYSA-N 0.000 description 1
- HEPOIJKOXBKKNJ-UHFFFAOYSA-N 2-(propan-2-ylazaniumyl)acetate Chemical compound CC(C)NCC(O)=O HEPOIJKOXBKKNJ-UHFFFAOYSA-N 0.000 description 1
- FUOOLUPWFVMBKG-UHFFFAOYSA-N 2-Aminoisobutyric acid Chemical compound CC(C)(N)C(O)=O FUOOLUPWFVMBKG-UHFFFAOYSA-N 0.000 description 1
- AWEZYTUWDZADKR-UHFFFAOYSA-N 2-[(2-amino-2-oxoethyl)azaniumyl]acetate Chemical compound NC(=O)CNCC(O)=O AWEZYTUWDZADKR-UHFFFAOYSA-N 0.000 description 1
- MNDBDVPDSHGIHR-UHFFFAOYSA-N 2-[(3-amino-3-oxopropyl)amino]acetic acid Chemical compound NC(=O)CCNCC(O)=O MNDBDVPDSHGIHR-UHFFFAOYSA-N 0.000 description 1
- YDBPFLZECVWPSH-UHFFFAOYSA-N 2-[3-(diaminomethylideneamino)propylamino]acetic acid Chemical compound NC(=N)NCCCNCC(O)=O YDBPFLZECVWPSH-UHFFFAOYSA-N 0.000 description 1
- XFDUHJPVQKIXHO-UHFFFAOYSA-N 3-aminobenzoic acid Chemical compound NC1=CC=CC(C(O)=O)=C1 XFDUHJPVQKIXHO-UHFFFAOYSA-N 0.000 description 1
- 101800000535 3C-like proteinase Proteins 0.000 description 1
- 101800002396 3C-like proteinase nsp5 Proteins 0.000 description 1
- AOKCDAVWJLOAHG-UHFFFAOYSA-N 4-(methylamino)butyric acid Chemical compound C[NH2+]CCCC([O-])=O AOKCDAVWJLOAHG-UHFFFAOYSA-N 0.000 description 1
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 description 1
- KXDHJXZQYSOELW-UHFFFAOYSA-M Carbamate Chemical compound NC([O-])=O KXDHJXZQYSOELW-UHFFFAOYSA-M 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- 108020004705 Codon Proteins 0.000 description 1
- LVZWSLJZHVFIQJ-UHFFFAOYSA-N Cyclopropane Chemical compound C1CC1 LVZWSLJZHVFIQJ-UHFFFAOYSA-N 0.000 description 1
- HTJDQJBWANPRPF-UHFFFAOYSA-N Cyclopropylamine Chemical compound NC1CC1 HTJDQJBWANPRPF-UHFFFAOYSA-N 0.000 description 1
- 238000010485 C−C bond formation reaction Methods 0.000 description 1
- AHLPHDHHMVZTML-SCSAIBSYSA-N D-Ornithine Chemical compound NCCC[C@@H](N)C(O)=O AHLPHDHHMVZTML-SCSAIBSYSA-N 0.000 description 1
- 150000008574 D-amino acids Chemical class 0.000 description 1
- 102000012410 DNA Ligases Human genes 0.000 description 1
- 108010061982 DNA Ligases Proteins 0.000 description 1
- 108020001019 DNA Primers Proteins 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 108010016626 Dipeptides Proteins 0.000 description 1
- 241000672609 Escherichia coli BL21 Species 0.000 description 1
- 241001198387 Escherichia coli BL21(DE3) Species 0.000 description 1
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 1
- IMROMDMJAWUWLK-UHFFFAOYSA-N Ethenol Chemical group OC=C IMROMDMJAWUWLK-UHFFFAOYSA-N 0.000 description 1
- 241000187809 Frankia Species 0.000 description 1
- 101000946053 Homo sapiens Lysosomal-associated transmembrane protein 4A Proteins 0.000 description 1
- DGAQECJNVWCQMB-PUAWFVPOSA-M Ilexoside XXIX Chemical compound C[C@@H]1CC[C@@]2(CC[C@@]3(C(=CC[C@H]4[C@]3(CC[C@@H]5[C@@]4(CC[C@@H](C5(C)C)OS(=O)(=O)[O-])C)C)[C@@H]2[C@]1(C)O)C)C(=O)O[C@H]6[C@@H]([C@H]([C@@H]([C@H](O6)CO)O)O)O.[Na+] DGAQECJNVWCQMB-PUAWFVPOSA-M 0.000 description 1
- SNDPXSYFESPGGJ-BYPYZUCNSA-N L-2-aminopentanoic acid Chemical compound CCC[C@H](N)C(O)=O SNDPXSYFESPGGJ-BYPYZUCNSA-N 0.000 description 1
- GDFAOVXKHJXLEI-UHFFFAOYSA-N L-N-Boc-N-methylalanine Natural products CNC(C)C(O)=O GDFAOVXKHJXLEI-UHFFFAOYSA-N 0.000 description 1
- QWCKQJZIFLGMSD-VKHMYHEASA-N L-alpha-aminobutyric acid Chemical compound CC[C@H](N)C(O)=O QWCKQJZIFLGMSD-VKHMYHEASA-N 0.000 description 1
- 150000008575 L-amino acids Chemical class 0.000 description 1
- JTTHKOPSMAVJFE-VIFPVBQESA-N L-homophenylalanine Chemical compound OC(=O)[C@@H](N)CCC1=CC=CC=C1 JTTHKOPSMAVJFE-VIFPVBQESA-N 0.000 description 1
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 1
- QIVBCDIJIAJPQS-VIFPVBQESA-N L-tryptophane Chemical compound C1=CC=C2C(C[C@H](N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-VIFPVBQESA-N 0.000 description 1
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 1
- 108091000076 Lysine 2,3-aminomutase Proteins 0.000 description 1
- 102100034728 Lysosomal-associated transmembrane protein 4A Human genes 0.000 description 1
- CYZKJBZEIFWZSR-LURJTMIESA-N N(alpha)-methyl-L-histidine Chemical compound CN[C@H](C(O)=O)CC1=CNC=N1 CYZKJBZEIFWZSR-LURJTMIESA-N 0.000 description 1
- CZCIKBSVHDNIDH-NSHDSACASA-N N(alpha)-methyl-L-tryptophan Chemical compound C1=CC=C2C(C[C@H]([NH2+]C)C([O-])=O)=CNC2=C1 CZCIKBSVHDNIDH-NSHDSACASA-N 0.000 description 1
- WRUZLCLJULHLEY-UHFFFAOYSA-N N-(p-hydroxyphenyl)glycine Chemical compound OC(=O)CNC1=CC=C(O)C=C1 WRUZLCLJULHLEY-UHFFFAOYSA-N 0.000 description 1
- VKZGJEWGVNFKPE-UHFFFAOYSA-N N-Isobutylglycine Chemical compound CC(C)CNCC(O)=O VKZGJEWGVNFKPE-UHFFFAOYSA-N 0.000 description 1
- SCIFESDRCALIIM-UHFFFAOYSA-N N-Me-Phenylalanine Natural products CNC(C(O)=O)CC1=CC=CC=C1 SCIFESDRCALIIM-UHFFFAOYSA-N 0.000 description 1
- HOKKHZGPKSLGJE-GSVOUGTGSA-N N-Methyl-D-aspartic acid Chemical compound CN[C@@H](C(O)=O)CC(O)=O HOKKHZGPKSLGJE-GSVOUGTGSA-N 0.000 description 1
- NTWVQPHTOUKMDI-YFKPBYRVSA-N N-Methyl-arginine Chemical compound CN[C@H](C(O)=O)CCCN=C(N)N NTWVQPHTOUKMDI-YFKPBYRVSA-N 0.000 description 1
- GDFAOVXKHJXLEI-VKHMYHEASA-N N-methyl-L-alanine Chemical compound C[NH2+][C@@H](C)C([O-])=O GDFAOVXKHJXLEI-VKHMYHEASA-N 0.000 description 1
- XLBVNMSMFQMKEY-BYPYZUCNSA-N N-methyl-L-glutamic acid Chemical compound CN[C@H](C(O)=O)CCC(O)=O XLBVNMSMFQMKEY-BYPYZUCNSA-N 0.000 description 1
- YAXAFCHJCYILRU-YFKPBYRVSA-N N-methyl-L-methionine Chemical compound C[NH2+][C@H](C([O-])=O)CCSC YAXAFCHJCYILRU-YFKPBYRVSA-N 0.000 description 1
- SCIFESDRCALIIM-VIFPVBQESA-N N-methyl-L-phenylalanine Chemical compound C[NH2+][C@H](C([O-])=O)CC1=CC=CC=C1 SCIFESDRCALIIM-VIFPVBQESA-N 0.000 description 1
- AKCRVYNORCOYQT-YFKPBYRVSA-N N-methyl-L-valine Chemical compound CN[C@@H](C(C)C)C(O)=O AKCRVYNORCOYQT-YFKPBYRVSA-N 0.000 description 1
- CWLQUGTUXBXTLF-YFKPBYRVSA-N N-methylproline Chemical compound CN1CCC[C@H]1C(O)=O CWLQUGTUXBXTLF-YFKPBYRVSA-N 0.000 description 1
- 101150054880 NASP gene Proteins 0.000 description 1
- 241001538234 Nala Species 0.000 description 1
- 108010038807 Oligopeptides Proteins 0.000 description 1
- 102000015636 Oligopeptides Human genes 0.000 description 1
- 229930012538 Paclitaxel Natural products 0.000 description 1
- 229930182555 Penicillin Natural products 0.000 description 1
- JGSARLDLIJGVTE-MBNYWOFBSA-N Penicillin G Chemical compound N([C@H]1[C@H]2SC([C@@H](N2C1=O)C(O)=O)(C)C)C(=O)CC1=CC=CC=C1 JGSARLDLIJGVTE-MBNYWOFBSA-N 0.000 description 1
- 108010004478 Phenylalanine-tRNA Ligase Proteins 0.000 description 1
- 102100029354 Phenylalanine-tRNA ligase, mitochondrial Human genes 0.000 description 1
- 241000235648 Pichia Species 0.000 description 1
- 241000519590 Pseudoalteromonas Species 0.000 description 1
- 108010077895 Sarcosine Proteins 0.000 description 1
- 238000012300 Sequence Analysis Methods 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- 241000190807 Thiothrix Species 0.000 description 1
- QIVBCDIJIAJPQS-UHFFFAOYSA-N Tryptophan Natural products C1=CC=C2C(CC(N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-UHFFFAOYSA-N 0.000 description 1
- 101100296672 Yersinia enterocolitica pcp gene Proteins 0.000 description 1
- 230000021736 acetylation Effects 0.000 description 1
- 238000006640 acetylation reaction Methods 0.000 description 1
- 238000001042 affinity chromatography Methods 0.000 description 1
- DLAMVQGYEVKIRE-UHFFFAOYSA-N alpha-(methylamino)isobutyric acid Chemical compound CNC(C)(C)C(O)=O DLAMVQGYEVKIRE-UHFFFAOYSA-N 0.000 description 1
- 150000001412 amines Chemical group 0.000 description 1
- 125000000539 amino acid group Chemical group 0.000 description 1
- QCTBMLYLENLHLA-UHFFFAOYSA-N aminomethylbenzoic acid Chemical compound NCC1=CC=C(C(O)=O)C=C1 QCTBMLYLENLHLA-UHFFFAOYSA-N 0.000 description 1
- 229960003375 aminomethylbenzoic acid Drugs 0.000 description 1
- RWZYAGGXGHYGMB-UHFFFAOYSA-N anthranilic acid Chemical compound NC1=CC=CC=C1C(O)=O RWZYAGGXGHYGMB-UHFFFAOYSA-N 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 235000009582 asparagine Nutrition 0.000 description 1
- 229960001230 asparagine Drugs 0.000 description 1
- 229940000635 beta-alanine Drugs 0.000 description 1
- 125000002619 bicyclic group Chemical group 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 125000001314 canonical amino-acid group Chemical group 0.000 description 1
- 239000004202 carbamide Substances 0.000 description 1
- 125000002915 carbonyl group Chemical group [*:2]C([*:1])=O 0.000 description 1
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 239000003054 catalyst Substances 0.000 description 1
- 125000003636 chemical group Chemical group 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 229960005091 chloramphenicol Drugs 0.000 description 1
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 1
- 229960003920 cocaine Drugs 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000009833 condensation Methods 0.000 description 1
- 230000005494 condensation Effects 0.000 description 1
- 108091036078 conserved sequence Proteins 0.000 description 1
- 230000009089 cytolysis Effects 0.000 description 1
- 231100000433 cytotoxic Toxicity 0.000 description 1
- 230000001472 cytotoxic effect Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 230000029087 digestion Effects 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 238000007876 drug discovery Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 229960003276 erythromycin Drugs 0.000 description 1
- AEOCXXJPGCBFJA-UHFFFAOYSA-N ethionamide Chemical compound CCC1=CC(C(N)=S)=CC=N1 AEOCXXJPGCBFJA-UHFFFAOYSA-N 0.000 description 1
- 238000004896 high resolution mass spectrometry Methods 0.000 description 1
- NBZBKCUXIYYUSX-UHFFFAOYSA-N iminodiacetic acid Chemical compound OC(=O)CNCC(O)=O NBZBKCUXIYYUSX-UHFFFAOYSA-N 0.000 description 1
- 238000000126 in silico method Methods 0.000 description 1
- 230000006698 induction Effects 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 150000002500 ions Chemical class 0.000 description 1
- 229960000318 kanamycin Drugs 0.000 description 1
- 229930027917 kanamycin Natural products 0.000 description 1
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 description 1
- 229930182823 kanamycin A Natural products 0.000 description 1
- 150000002576 ketones Chemical class 0.000 description 1
- 150000003951 lactams Chemical class 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 238000011068 loading method Methods 0.000 description 1
- 239000012139 lysis buffer Substances 0.000 description 1
- 238000004949 mass spectrometry Methods 0.000 description 1
- 238000001819 mass spectrum Methods 0.000 description 1
- 238000000816 matrix-assisted laser desorption--ionisation Methods 0.000 description 1
- 238000001906 matrix-assisted laser desorption--ionisation mass spectrometry Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000001404 mediated effect Effects 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 239000002207 metabolite Substances 0.000 description 1
- 239000000178 monomer Substances 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- XJODGRWDFZVTKW-ZCFIWIBFSA-N n-methylleucine Chemical compound CN[C@@H](C(O)=O)CC(C)C XJODGRWDFZVTKW-ZCFIWIBFSA-N 0.000 description 1
- 238000006386 neutralization reaction Methods 0.000 description 1
- 239000003592 new natural product Substances 0.000 description 1
- 238000000655 nuclear magnetic resonance spectrum Methods 0.000 description 1
- 230000002018 overexpression Effects 0.000 description 1
- 229960001592 paclitaxel Drugs 0.000 description 1
- 229940049954 penicillin Drugs 0.000 description 1
- 101150031287 petH gene Proteins 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 230000026731 phosphorylation Effects 0.000 description 1
- 238000006366 phosphorylation reaction Methods 0.000 description 1
- 238000003752 polymerase chain reaction Methods 0.000 description 1
- 230000004481 post-translational protein modification Effects 0.000 description 1
- 230000001323 posttranslational effect Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 108020001580 protein domains Proteins 0.000 description 1
- 230000008707 rearrangement Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 239000007320 rich medium Substances 0.000 description 1
- 229910052708 sodium Inorganic materials 0.000 description 1
- 239000011734 sodium Substances 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000010532 solid phase synthesis reaction Methods 0.000 description 1
- 239000000243 solution Substances 0.000 description 1
- 239000002904 solvent Substances 0.000 description 1
- RCINICONZNJXQF-MZXODVADSA-N taxol Chemical compound O([C@@H]1[C@@]2(C[C@@H](C(C)=C(C2(C)C)[C@H](C([C@]2(C)[C@@H](O)C[C@H]3OC[C@]3([C@H]21)OC(C)=O)=O)OC(=O)C)OC(=O)[C@H](O)[C@@H](NC(=O)C=1C=CC=CC=1)C=1C=CC=CC=1)O)C(=O)C1=CC=CC=C1 RCINICONZNJXQF-MZXODVADSA-N 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 150000003568 thioethers Chemical class 0.000 description 1
- 238000006257 total synthesis reaction Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 238000001195 ultra high performance liquid chromatography Methods 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
- 108010064245 urinary gonadotropin fragment Proteins 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- 125000000391 vinyl group Chemical group [H]C([*])=C([H])[H] 0.000 description 1
- 229920002554 vinyl polymer Polymers 0.000 description 1
- 239000013603 viral vector Substances 0.000 description 1
- 239000011534 wash buffer Substances 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/52—Genes encoding for enzymes or proenzymes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y205/00—Transferases transferring alkyl or aryl groups, other than methyl groups (2.5)
- C12Y205/01—Transferases transferring alkyl or aryl groups, other than methyl groups (2.5) transferring alkyl or aryl groups, other than methyl groups (2.5.1)
- C12Y205/01006—Methionine adenosyltransferase (2.5.1.6), i.e. adenosylmethionine synthetase
Definitions
- the present invention relates to the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more XYG amino acid motifs, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. Furthermore, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising said XYG motif using a radical S-adenosyl methionine (rSAM) enzyme.
- rSAM radical S-adenosyl methionine
- β-amino acids are widely distributed in nature and are present in natural product-derived drugs such as penicillin, taxol and cocaine.
- the installation of β-amino acids in peptides offers, e.g., greater stability (for example against protease-mediated degradation) and can confer structural features distinct from those of peptides or natural products containing only L-amino acids, resulting in unique functions and bioactivities.
- Known methods to incorporate β-amino acids into peptides generally rely on total synthesis, for example by condensation of monomers in solid-phase synthesis.
- a second approach relies on in vitro translation, which usually suffers from low incorporation efficiency.
- ribosomally synthesized peptides and proteins are composed of α-amino acids.
- No post-translational modifying enzymes are known that incorporate β-amino acids into ribosomal products.
- Biosynthetic routes to β-amino acid-containing peptides typically utilize non-ribosomal peptide synthetases (NRPS) that act on free amino acids or peptide residues.
- NRPS non-ribosomal peptide synthetases
- These enzymes are very large multimodular enzymes and are difficult to manipulate for bioengineering or biotechnological applications.
- NRPS modules usually load only specific amino acids.
- the NRPS-type machinery is limited to small peptides, typically with fewer than 15 residues. Therefore, incorporation of β-amino acids into proteins by NRPSs is not possible.
- the above objective is solved by the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more XYG amino acid motifs, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine.
- radical S-adenosyl methionine (rSAM) enzymes have the capacity to incorporate different types of α-keto-β3-amino acids into diverse (poly)peptide substrates that comprise one or more XYG amino acid motifs.
- rSAM enzymes are monomeric enzymes of up to 50 kDa and can be overexpressed in high yields in many cells, e.g. in E. coli (see below).
- peptides generated by NRPSs require huge enzyme complexes and are typically quite specific to result in either a single product or a few closely related analogs only.
- Preferred rSAM enzymes for use in the present invention were identified by their surrogate substrates, the Nif11 precursor peptides, within ribosomally synthesized and post-translationally modified peptide (RiPP) natural product gene clusters.
- Gene clusters containing NHLP- or Nif11-type precursors are considered a new natural product family, termed "proteusins" (see Freeman et al. Science 338, 387-90 (2012) and Haft et al. BMC Biology 8, 70 (2010)).
- although proteusin biosynthetic gene clusters are widespread in bacteria, their biosynthetic end products and functions are currently unknown, with the exception of the cytotoxic pore-forming polytheonamides (see Hamada et al. Tetrahedron Lett.
- α-keto-β3-amino acid-containing (poly)peptide products can conveniently be generated according to the present invention by in vivo co-expression of a (poly)peptide substrate with the rSAM enzyme in bacteria and also in other organisms, e.g. yeast, plant cells, mammalian cells or insect cells.
- the rSAM enzymes described herein also introduce a keto functionality that is not present in proteinogenic amino acids.
- the introduction of a keto functionality into ribosomally synthesized (poly)peptides is of interest, e.g.
- non-selective chemical oxidation such as sodium periodate to oxidize all side chains of serine
- introduction of non-canonical amino acids by an unnatural amino acid mutagenesis strategy based on recoding of stop codons.
- the proteusin cluster, e.g. from Pleurocapsa sp. PCC 7319, containing genes for two substrates (e.g. plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117)), an rSAM SPASM protein (e.g. plpX (SEQ ID NOs: 118 and 119)), and optionally an associated protein (e.g. plpY (SEQ ID NOs: 120 and 121)) can be cloned into expression vectors for expression in E. coli.
- the associated protein, e.g. plpY, is optional, serving e.g. to increase the enzymatic efficiency of the rSAM enzyme, and is not required for practicing the present invention.
- the substrate peptides e.g. plpA2 and plpA3
- the rSAM enzyme and the optional associated protein, e.g. plpX and plpY
- the substrate peptide genes can be individually transformed together with the genes for the rSAM enzyme and the optional associated protein (e.g. plpX and plpY), e.g. into E. coli, and protein overexpression can be carried out in standard medium.
- purification e.g. Ni-affinity purification of the N-terminally hexahistidine-tagged (NHis6) peptides, and in vitro cleavage of leader, e.g. with a protease
- the core (poly)peptide comprising the α-keto-β3-amino acid(s) can be analyzed, e.g. by MALDI-MS, and optionally the α-keto-β3-amino acid(s) can be reduced to the corresponding β-amino acid(s) using standard chemical methods, e.g. using sodium borohydride.
- the rSAM enzyme for use in the present invention excises from the XYG motif a fragment with the elemental composition C8H9NO, as determined by high-resolution mass spectrometry, which is attributed to the tyrosine residue.
- the transformed substrates are products containing the α-keto-β3-amino acid.
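When checking MS data for this excision, the expected mass loss per converted XYG motif can be computed from the C8H9NO composition. The sketch below is a minimal illustration, assuming standard monoisotopic element masses and a neutral (uncharged) precursor mass; `expected_product_mass` is a hypothetical helper, not a procedure from the source.

```python
# Monoisotopic element masses in Da (standard reference values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Fragment lost per converted XYG motif, as reported from high-resolution MS.
EXCISED = {"C": 8, "H": 9, "N": 1, "O": 1}


def formula_mass(counts):
    """Monoisotopic mass of an elemental composition such as {"C": 8, "H": 9, ...}."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


def expected_product_mass(precursor_mass, n_excisions):
    """Hypothetical helper: neutral monoisotopic precursor mass minus one
    C8H9NO unit per excised tyrosine-derived fragment."""
    return precursor_mass - n_excisions * formula_mass(EXCISED)


print(f"{formula_mass(EXCISED):.4f}")  # 135.0684
```

Charge states and adducts are deliberately left out; for ESI or MALDI data the observed m/z would still need to be converted to a neutral mass first.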
- the XYG motif is conserved among a wide range of precursors, e.g.
- precursors carrying more than one XYG motif can be converted to products with several α-keto-β3-amino acids after expression of the rSAM enzyme in E. coli.
- β-amino acid refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has its amino group bonded to the β-carbon rather than the α-carbon.
- α-keto-β3-amino acid refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has a keto group at the α-carbon and its amino group bonded to the β-carbon rather than the α-carbon.
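Candidate substrate sites can be located by scanning a precursor sequence for XYG motifs. This is a minimal sketch assuming one-letter sequences and restricting X to the 20 standard residues (the text also allows non-natural residues at X); `find_xyg_motifs` is a hypothetical name, and a motif hit does not by itself predict that the enzyme will convert that site.

```python
import re

# 20 standard one-letter codes stand in for "any amino acid" at position X.
STANDARD = "ACDEFGHIKLMNPQRSTVWY"

# Zero-width lookahead so that overlapping motifs (e.g. "AYGYG") are all reported.
XYG = re.compile(rf"(?=[{STANDARD}]YG)")


def find_xyg_motifs(seq):
    """Return 0-based start positions of every XYG motif in seq."""
    return [m.start() for m in XYG.finditer(seq.upper())]


print(find_xyg_motifs("MAYGKLYGA"))  # [1, 5]
```

The lookahead matters because the G of one motif can serve as the X of the next, so a plain non-overlapping search would undercount such precursors.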
- polypeptide as used herein, is meant to encompass peptides, polypeptides, oligopeptides and proteins that comprise two or more amino acids linked covalently through peptide bonds. The term does not refer to a specific length of the product.
- the term (poly)peptide includes (poly)peptides with post-translational modifications, for example glycosylations, acetylations, phosphorylations and the like, as well as (poly)peptides comprising non-natural or non-conventional amino acids and functional derivatives as described below.
- non-natural or non-conventional amino acid refers to naturally occurring or naturally not occurring unnatural amino acids or chemical amino acid analogues, e.g. D-amino acids, α,α-disubstituted amino acids, N-alkyl amino acids, homo-amino acids, dehydroamino acids, aromatic amino acids (other than phenylalanine, tyrosine and tryptophan), and ortho-, meta- or para-aminobenzoic acid.
- Non-conventional amino acids also include compounds which have an amine and a carboxyl functional group separated in a 1,3 or larger substitution pattern, such as β-alanine, γ-aminobutyric acid, Freidinger lactam, the bicyclic dipeptide (BTD), aminomethyl benzoic acid and others well known in the art.
- BTD bicyclic dipeptide
- Statine-like isosteres, hydroxyethylene isosteres, reduced amide bond isosteres, thioamide isosteres, urea isosteres, carbamate isosteres, thioether isosteres, vinyl isosteres and other amide bond isosteres known to the art may also be used.
- a non-limiting list of non-conventional amino acids which may be comprised in the (poly)peptide, with their standard abbreviations in brackets, is as follows: α-aminobutyric acid (Abu), L-N-methylalanine (Nmala), α-amino-α-methylbutyrate (Mgabu), L-N-methylarginine (Nmarg), aminocyclopropane-carboxylate (Cpro), L-N-methylasparagine (Nmasn), L-N-methylaspartic acid (Nmasp), aminoisobutyric acid (Aib), L-N-methylcysteine (Nmcys), aminonorbornyl-carboxylate (Norb), L-N-methylglutamine (Nmgln), L-N-methylglutamic acid (Nmglu), cyclohexylalanine (Chexa), L-N-methylhistidine (Nmhis), cyclopentylalanine
- Nmaabu D-α-methylleucine (Dmleu), α-naphthylalanine (Anap), D-α-methyllysine (Dmlys), N-benzylglycine (Nphe), D-α-methylmethionine (Dmmet), N-(2-carbamylethyl)glycine (Ngln), D-α-methylornithine (Dmorn), N-(carbamylmethyl)glycine (Nasn), D-α-methylphenylalanine (Dmphe), N-(2-carboxyethyl)glycine (Nglu), D-α-methylproline (Dmpro), N-(carboxymethyl)glycine (Nasp), D-α-methylserine (Dmser), N-cyclobutylglycine (Ncbut), D-α-methylthreonine (Dmthr), N-cycloheptylglycine (
- the rSAM enzyme for use in the present invention is a peptide radical SAM maturase, preferably a Nif11-class peptide radical SAM maturase 3.
- the Nif11-class peptide radical SAM maturase 3 belongs to the conserved protein domain family rSAM_nif11_3 (Nif11-class peptide radical SAM maturase 3; IPR026482) or the rad_SAM_trio family (radical SAM GDL-associated; IPR023820) as defined by the National Center for Biotechnology Information (NCBI).
- rSAM_nif11_3 enzymes are radical SAM enzymes that often occur co-clustered with Nif11-related ribosomal natural product (RNP) precursors described by TIGRFAMs model TIGR03798.
- RNP ribosomal natural product
- rad_SAM_trio enzymes are radical SAM enzymes that often occur co-clustered with DUF1843-domain RNP precursors carrying a YX10GDL motif and described by Pfam model PF08898 (see Haft et al. Nucleic Acids Res. 31, 371-3 (2003); Haft and Basu, J. Bacteriol. 193, 2745-2755 (2011); NCBI database on
- the rSAM enzyme for use in the present invention comprises (A) an amino acid sequence according to Formula (I) (SEQ ID NO: 1) or (II) (SEQ ID NO: 2)
- X1-X20 and Z1-Z20 each denote amino acids
- X1 is selected from the group consisting of Y, H, F and W, preferably Y and H;
- X2 is selected from the group consisting of Y, R and H;
- X3 is selected from the group consisting of R, K and Q, preferably R;
- X4 is selected from the group consisting of I, T and V, preferably I and T;
- X5 is selected from the group consisting of R, S and K, preferably R and S;
- X6 is selected from the group consisting of H, Y, F and W, preferably H and Y;
- X7 is selected from the group consisting of A and S, preferably A;
- X8 is selected from the group consisting of V, I and L, preferably V;
- X9 is selected from the group consisting of W, Y and F, preferably W;
- X10 is selected from the group consisting of E, Q, D and K, preferably E;
- X11 is selected from the group consisting of I, L, V and M, preferably I and L;
- X12 is selected from the group consisting of T and S, preferably T;
- X13 is selected from the group consisting of L, M, I and V, preferably L;
- X14 is selected from the group consisting of K, R, E and Q, preferably K;
- X15 is C;
- X16 is selected from the group consisting of N and D, preferably N;
- X17 is selected from the group consisting of L, M, I and V, preferably L;
- X18 is selected from the group consisting of A and S, preferably A;
- X20 is selected from the group consisting of S, Q, E and K, preferably S and Q;
- Z1 is selected from the group consisting of T, D, E and N, preferably T and D;
- Z2 is selected from the group consisting of R, P, N and A, preferably R, P and A;
- Z3 is selected from the group consisting of R, Q, K and L, preferably R;
- Z5 is selected from the group consisting of A and S, preferably A;
- Z6 is selected from the group consisting of R, K and Q, preferably R;
- Z7 is selected from the group consisting of Y, F, H and W, preferably Y;
- Z8 is selected from the group consisting of L, M, I and V, preferably L;
- Z9 is selected from the group consisting of F, H, S and Y, preferably H, F and S;
- Z10 is selected from the group consisting of D, E and A, preferably D and E;
- Z11 is selected from the group consisting of D, S and T, preferably T;
- Z12 is selected from the group consisting of D, E and N, preferably D;
- Z13 is selected from the group consisting of Y, F, L and M, preferably Y, F and L;
- Z14 is selected from the group consisting of K, Q, R and E, preferably Q and K;
- Z15 is selected from the group consisting of R, K and Q, preferably R;
- Z16 is selected from the group consisting of Y, F and W, preferably Y and F;
- Z17 is selected from the group consisting of V, I and L, preferably V;
- Z19 is selected from the group consisting of V, I and L, preferably V;
- Z20 is selected from the group consisting of H and Y, preferably H; or
- (C) a functional fragment and/or functional derivative of (A) or (B), preferably a functional fragment of at least 10 amino acids, more preferably at least 15 amino acids of (A) or (B).
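The per-position constraints of Formulas (I) and (II) map naturally onto a dictionary of allowed residue sets. The sketch below transcribes only a few positions from the list above (not the full formula) and assumes the residues at those positions have already been read off an alignment of a candidate sequence to the formula; `ALLOWED` and `violations` are hypothetical names.

```python
# Subset of variable positions and their allowed residues, transcribed from
# the list above; all other positions are simply not checked here.
ALLOWED = {
    "X1": set("YHFW"),
    "X2": set("YRH"),
    "X3": set("RKQ"),
    "X15": set("C"),
    "Z1": set("TDEN"),
}


def violations(observed):
    """observed: hypothetical mapping position -> one-letter residue.

    Returns the sorted positions whose residue lies outside the allowed set;
    positions absent from ALLOWED are ignored.
    """
    return sorted(
        pos for pos, aa in observed.items()
        if pos in ALLOWED and aa.upper() not in ALLOWED[pos]
    )


print(violations({"X1": "A", "X15": "C", "Z1": "T"}))  # ['X1']
```

Using sets keeps each membership test constant-time and makes it easy to extend the table to the remaining positions once they are transcribed.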
- the percentage identity of related amino acid molecules can be determined with the assistance of known methods. In general, special computer programs are employed that use algorithms adapted to accommodate the specific needs of this task. Preferred methods for determining identity begin with the generation of the largest degree of identity among the sequences to be compared. Preferred computer programs for determining the identity among two amino acid sequences comprise, but are not limited to, TBLASTN, BLASTP, BLASTX, TBLASTX (Altschul et al., J. Mol.
- the BLAST programs can be obtained from the National Center for Biotechnology Information (NCBI) and from other sources (BLAST handbook, Altschul et al., NCBI NLM NIH Bethesda, MD 20894).
- NCBI National Center for Biotechnology Information
- the ClustalW program can be obtained from
- the term "functional derivative" of a (poly)peptide of the present invention is meant to include any (poly)peptide or fragment thereof that has been chemically or genetically modified in its amino acid sequence, e.g. by addition, substitution and/or deletion of amino acid residue(s) and/or has been chemically modified in at least one of its atoms and/or functional chemical groups, e.g. by additions, deletions, rearrangement, oxidation, reduction, etc. as long as the derivative still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
- a "functional fragment" of the invention is one that forms part of a (poly)peptide or derivative of the invention and still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
- the amino acid sequences of Formula (I) and (II) are based on the sequences disclosed by TIGRFAMs TIGR04103 (Formula (I) above) and TIGR03913 (Formula (II) above), with the specific preferred amino acids defined for X1-X20 and Z1-Z20.
- the rSAM enzyme for use in the present invention comprises at least one motif selected from the group consisting of
- motif CXXXCXXC (SEQ ID NO: 3), wherein X is any natural amino acid and wherein the motif CXXXCXXC (SEQ ID NO: 3) is preferably comprised in an N-terminal radical SAM domain;
- motif CX9-15GX4C (SEQ ID NO: 6) reads on a motif consisting of amino acid C, 9 to 15 natural amino acids, amino acid G, 4 natural amino acids and amino acid C.
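Both motifs translate directly into regular expressions. This is a minimal sketch, assuming the 20 standard one-letter codes represent "any natural amino acid"; `motif_hits` is a hypothetical helper. Note that `finditer` returns non-overlapping matches and the `{9,15}` range is matched greedily, so it reports where motifs start rather than enumerating every possible spacing.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues stand in for "any natural amino acid"

# CXXXCXXC (SEQ ID NO: 3), typical of the N-terminal radical SAM domain.
CXXXCXXC = re.compile(rf"C[{AA}]{{3}}C[{AA}]{{2}}C")
# CX9-15GX4C (SEQ ID NO: 6).
CX9_15GX4C = re.compile(rf"C[{AA}]{{9,15}}G[{AA}]{{4}}C")


def motif_hits(seq):
    """Return 0-based start positions of non-overlapping matches for each motif."""
    seq = seq.upper()
    return {
        "CXXXCXXC": [m.start() for m in CXXXCXXC.finditer(seq)],
        "CX9-15GX4C": [m.start() for m in CX9_15GX4C.finditer(seq)],
    }


print(motif_hits("MCAAACAACK"))  # {'CXXXCXXC': [1], 'CX9-15GX4C': []}
```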
- the rSAM enzyme for use in the present invention comprises
- rSAM enzyme further comprises
- the rSAM enzyme for use in the present invention comprises
- the rSAM enzyme for use in the present invention comprises an amino acid sequence selected from the group of
- sequences listed in any of SEQ ID NOs: 12 to 54, preferably SEQ ID NOs: 39 and 40, or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 %, with the amino acid sequences in any of SEQ ID NOs: 12 to 54;
- amino acid sequence having an amino acid sequence identity of at least 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 55 to 113.
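The identity thresholds above (70, 80, 90 or 95 %) presuppose an alignment, e.g. from BLASTP or ClustalW. As a minimal sketch, assuming the two sequences are already aligned to equal length with "-" for gaps, identity can be counted column by column. The denominator used here (all columns not gapped in both sequences) is one common convention and an assumption; the text does not fix one, and BLAST reports its own definition.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical columns in a pre-computed pairwise alignment.

    Columns that are gaps in both sequences are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the alignment reaches the given identity threshold in percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

Because the result depends on both the alignment and the denominator, values should only be compared when they were computed the same way.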
- the present invention is directed to a use of a recombinant vector comprising a nucleic acid encoding an rSAM enzyme as defined above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more XYG amino acid motifs, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a viral or episomal vector, more preferably a vector selected from the group consisting of lentivirus vector, adenovirus vector, baculovirus vector, bacterial vector and yeast vector.
- the viral vector is a lentivirus vector (see for example System Biosciences, Mountain View, CA, USA), adenovirus vector (see for example ViraPower Adenoviral Expression System, Life Technologies, Carlsbad, CA, USA), baculovirus vector (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.), bacterial vector (see for example Novagen, Darmstadt, Germany) or yeast vector (see for example ATCC Manassas, Virginia).
- Vector construction including the operable linkage of a coding sequence with a promoter and other expression control sequences, is within the ordinary skill in the art.
- a host cell expressing an rSAM enzyme as defined above, preferably comprising a recombinant vector as defined above, in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more XYG amino acid motifs, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a host cell selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae cells (see for example Methods in Enzymology, 350, 248, 2002), and Pichia pastoris cells (see for example Pichia Expression Kit Instruction Manual, Invitrogen Corporation, Carlsbad, Calif.); bacterial cells, preferably E. coli cells, preferably BL21(DE3), K-12 and derivatives (see for example Applied Microbiology and Biotechnology, 72, 211, 2006), and Bacillus subtilis cells, preferably 1012 wild type, 168 Marburg or WB800N (see for example Westers et al., (2004) Mol. Cell. Res. Volume 1694, Issues 1-3 P:299-310); plant cells, preferably Nicotiana tabacum, and Physcomitrella patens (see e.g. Lau and Sun, Biotechnol Adv.
- NIH-3T3 mammalian cells (see for example Sambrook and Russell, 2001)
- insect cells preferably sf9 insect cells (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.).
- the rSAM-associated protein plpY (SEQ ID NO: 121) is optional but can significantly increase the efficiency of the rSAM enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more XYG amino acid motifs.
- the present invention is also directed to an rSAM-associated protein comprising an amino acid sequence selected from the group consisting of (a) SEQ ID NO: 121, (b) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with SEQ ID NO: 121, and (c) a functional fragment and/or functional derivative of (a) or (b), preferably a functional fragment of at least 30 amino acids, more preferably at least 45 amino acids of (a) or (b).
- the rSAM-associated protein is preferably for use in combination with an rSAM enzyme as described above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more XYG amino acid motifs. More preferably, the rSAM-associated protein is expressed by a recombinant vector and/or host cell also expressing the rSAM enzyme as described above.
- the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising the steps of:
- polypeptide substrate in step (iii), preferably both, optionally together with the rSAM-associated protein of optional step (ii), of the above method are provided in the form of a host cell, more preferably are all co-expressed in the host cell as defined above.
- the host cell for use in step (i) of the above method is an E. coli host cell.
- step (iv) is followed by step (v), wherein the keto functionality resulting from step (iv) is reduced chemically, preferably by sodium borohydride, or is converted to an imine, preferably with methoxyamine.
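Either derivatization in step (v) shifts the product mass by a fixed amount per keto group, which gives a quick MS check that every installed ketone reacted. The sketch below assumes standard carbonyl chemistry (reduction of C=O to CH-OH adds H2; condensation with methoxyamine, R2C=O + H2N-OCH3 to R2C=N-OCH3 + H2O, adds CH3N); these net compositions and the helper names are illustrative assumptions, not values stated in the source.

```python
# Monoisotopic element masses in Da.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Assumed net elemental change per keto group for each derivatization.
SHIFTS = {
    "NaBH4 reduction": {"H": 2},                  # C=O -> CH-OH
    "methoxyamine oxime": {"C": 1, "H": 3, "N": 1},  # + CH5NO - H2O
}


def mass(counts):
    """Monoisotopic mass of an elemental composition."""
    return sum(MONO[el] * n for el, n in counts.items())


def shifted_mass(product_mass, n_keto, reaction):
    """Expected neutral mass after derivatizing n_keto keto groups."""
    return product_mass + n_keto * mass(SHIFTS[reaction])


for name, delta in SHIFTS.items():
    print(f"{name}: +{mass(delta):.4f} Da per keto group")
```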
- E. coli expressing an rSAM enzyme as defined above is cultured in a rich medium such as TB medium, LB or YT medium.
- a culture in the above medium, preferably in an Erlenmeyer flask or ultra-yield flask, is inoculated with an overnight culture at a dilution of preferably about 1:100.
- the culture is grown at about 37°C and shaken at, e.g., about 250 RPM until, e.g., an OD600 of about 1.2-2.0.
- the culture is cooled, e.g. on ice, and induced with, e.g., IPTG (preferably about 1 mM final concentration).
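The volumes for the culture steps above follow from simple dilution arithmetic. This is a minimal sketch: it reads 1:100 as culture volume divided by 100, and the 1 M IPTG stock is an assumption, since the text gives only the final concentration. All function names are hypothetical.

```python
def inoculum_volume_ml(culture_volume_ml, dilution=100):
    """Overnight-culture volume for a 1:dilution inoculation, read here as
    culture_volume / dilution (about 1:100 in the protocol above)."""
    return culture_volume_ml / dilution


def ready_to_induce(od600, low=1.2, high=2.0):
    """True when OD600 lies in the induction window given above (about 1.2-2.0)."""
    return low <= od600 <= high


def stock_volume_ml(final_mM, culture_volume_ml, stock_mM=1000.0):
    """C1*V1 = C2*V2 for the inducer. The 1 M (1000 mM) IPTG stock is an
    assumption, and the small volume added by the stock itself is ignored."""
    return final_mM * culture_volume_ml / stock_mM


print(inoculum_volume_ml(500))    # 5.0  (mL overnight culture for 500 mL)
print(ready_to_induce(1.5))       # True
print(stock_volume_ml(1.0, 500))  # 0.5  (mL of 1 M IPTG for 1 mM final)
```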
- the culture is shaken at, e.g.
- the cells are collected by centrifugation, lysed, and the substrate subjected to purification, preferably Ni-affinity purification.
- the product(s) are verified, e.g. by mass spectrometry of the full length or digested precursors, e.g. NHis-precursors.
- the keto functionality may be reduced (e.g. by sodium borohydride) or converted to an imine, preferably with methoxyamine.
- the present invention is directed to isolated and purified nucleic acids encoding the (poly)peptides for use in the present invention.
- Fig. 1 a) plp gene cluster encoding precursors (plpA1, A2 and A3), rSAM epimerase (plpD), rSAM excision enzyme (plpX) and associated protein (plpY). b) Protein sequences for core peptides of precursors PlpA2 (SEQ ID NO: 122), PlpA3 (SEQ ID NO: 123), PlpA3-9 (SEQ ID NO: 124) and PcpA (SEQ ID NO: 125).
- Fig. 3 MS2 spectra for 1 (SEQ ID NO: 122). 'Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
- Fig. 4 MS2 spectra for 2 (SEQ ID NO: 122). 'Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
- Fig. 5 MS2 spectra for 3 (SEQ ID NO: 126). 'Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
- Fig. 6 MS2 spectra for 4 (SEQ ID NO: 126). 'Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
- Fig. 7 Sodium borohydride reduction of a mixture of 1 (SEQ ID NO: 122) and 2 (SEQ ID NO:
- Fig. 8 HMBC spectra for product 5 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β-amino acid.
- Fig. 9 HMBC spectra for product 6 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β-amino acid.
- Fig. 10 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with methionine.
- Fig. 11 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with tyrosine.
- Fig. 12 Reaction catalyzed by PlpX.
- Fig. 13 Results for all PlpA3-Fx mutants. Shown is the peptide fragment (SEQ ID NOs: 128 to 143) affected by mutation and the respective detection of conversion.
- the gene was co-expressed with either of the two precursor genes plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117) located upstream.
- in addition to an N-terminal leader region of the Nif11 family, the translated precursors contain predicted core regions of 25 and 23 aa, respectively (Fig. 1b).
- the Nif11 precursor genes (plpA1 and plpA2) were individually cloned with N-terminal His6 tags and a Factor Xa site at the interface of leader and core, and inserted into pACYCDuet-1.
- the rSAM gene (plpX) was cloned into MCSII of pRSFDuet-1, and the constructs were transformed into E. coli BL21(DE3) for protein expression (Fig. 2). Under these conditions, conversion of the precursors was not observed. More detailed analysis of the plp cluster (Fig. 1a) revealed a small conserved gene, plpY (SEQ ID NOs: 120 and 121), located downstream of plpX, an architecture that is also preserved in other clusters containing plpX homologs.
- HMBC: heteronuclear multiple bond correlation
- PlpA3-Fx was used to investigate the origin of the β-amino acid moiety by feeding various 13C-labeled amino acids to E. coli expression cultures.
- labels from [1-13C]Met, [U-13C]Met, [1-13C]Tyr, and [U-13C]Tyr were detected by MS in the peptide products.
- NMR-based characterization of the purified core fragment 6 revealed enhancements of carbon signals that were consistent with Met remaining fully intact (Fig. 10), while only C1 of Tyr is retained and accounts for the amide carbonyl in the product (Fig. 11).
- PCC 7327 was co-expressed with its cognate excisase gene partners pcpX and pcpY. The excision reaction was detected at two of the three YG motifs. With a translated core of 64 aa and three predicted YG motifs, the pcp pathway generates a giant natural product that may exceed the size of all specialized metabolites reported to date.
- Factor Xa protease was purchased from Merck (USA). Restriction enzymes and GluC were purchased from New England Biolabs (USA). Phusion® DNA polymerase and T4 DNA ligase from Thermo Scientific (USA) were used for all PCRs and ligations, respectively. DNA primers were obtained from Microsynth (Switzerland) or Thermo Scientific (USA). Antibiotics (chloramphenicol for pACYCDuet-1 and kanamycin for pRSFDuet-1) were used at a concentration of 25 µg/mL in solid and liquid medium.
- Expression vectors containing NHis6-precursor genes were constructed as follows. Mini-preps of previously reported plasmids (plpA1-Fx, plpA2-Fx, and plpA3-Fx in pET-28b; see Morinaka, B. I. et al. Angew. Chem. Int. Ed. 53, 8503-8507 (2014)), which contain NHis6-precursor genes with a Factor Xa site (IDGR) at the interface of leader and core peptide, were digested with NcoI and EcoRI.
- IDGR: Factor Xa site
- the precursor peptide inserts were gel-purified, ligated into MCSI of pACYCDuet-1 to give pAlFxACYC, pA2FxACYC, and pA3FxACYC. These precursor constructs were sequence verified. Constructs for the excision enzyme (PlpX) and associated protein (PlpY) were constructed as follows. The gene for the excision enzyme was amplified by PCR (primers PlpX_F,
- ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA (SEQ ID NO: 145)) and the products were gel-purified, digested with NdeI and XhoI, and ligated into MCSII of pRSFDuet-1 to give plasmid pXRSF.
- the gene for the associated protein was amplified by PCR (primers PlpY_F, GCGAACTCATGAACTCTAATCAAATACCAAATAAA (SEQ ID NO: 146), and PlpY_R, GCGCAGCTGTTATGTCAGAAAATTGCT (SEQ ID NO: 147)), gel-purified, digested with BspHI and SalI, and ligated into MCSI of pXRSF (cut with NcoI and SalI) to give pXYRSF; the insert was confirmed by sequencing.
- Precursor constructs were transformed and expressed in E. coli BL21(DE3) cells alone and with pXRSF or pXYRSF. Proteins containing a Factor Xa cleavage site are denoted with an 'Fx'.
- TB medium (30 mL) containing the appropriate antibiotics was inoculated with 300 µL of an overnight culture grown in LB. The cells were grown at 37 °C and 250 rpm to an OD600 of approximately 1.6-2.0, cooled on ice for 30 min, induced with IPTG (1 mM final concentration), and then shaken for 24 h (250 rpm, 16 °C). The cells were collected by centrifugation (3,220 × g, 10 min). Proteins were purified using Ni-NTA resin (Macherey-Nagel, Germany) according to the manufacturer's protocol, with 10% glycerol added to the lysis, wash and elution buffers.
- Proteins were adsorbed onto 0.5 mL Ni-NTA resin and eluted with 2.5 mL elution buffer (250 mM imidazole, 50 mM sodium phosphate, 300 mM NaCl, and 10% (v/v) glycerol, pH 8). Elution fractions were desalted on a PD-10 column, digested with Factor Xa or trypsin, and analyzed by LC-MS and MALDI.
- LC-MS conditions: column, Kinetex C18-XB, 2.6 µm, 150 × 4.6 mm; flow rate, 1.0 mL/min; mobile phase/gradient, 95:5 A/B for 5 min, then ramped to 40:60 A/B over 30 min.
- SEQ ID NO: 12 (70% TIGR03913, >WP_052261552)
- SEQ ID NO: 15 (70% TIGR03913, >OCW56221)
- SEQ ID NO: 19 (70% TIGR03913, >WP_050043969)
- SEQ ID NO: 20 (70% TIGR03913, >WP_020539729)
- SEQ ID NO: 24 (70% TIGR03913, >WP_062765523)
- SEQ ID NO: 26 (70% TIGR04103, >WP_020737613)
- SEQ ID NO: 28 (70% TIGR04103, >KIG18351)
- SEQ ID NO: 29 (70% TIGR04103, >WP_006974883)
- SEQ ID NO: 30 (70% TIGR04103, >WP_012234464)
- SEQ ID NO: 32 (70% TIGR04103, >WP_010607032)
- SEQ ID NO: 33 (70% TIGR04103, >WP_010607027)
- SEQ ID NO: 34 (70% TIGR04103, >WP_054014533)
- SEQ ID NO: 38 (70% TIGR04103, >SEA53645)
- SEQ ID NO: 45 (70% TIGR04103, >WP_015929579)
- SEQ ID NO: 46 (70% TIGR04103, >SFE54945)
- SEQ ID NO: 48 (70% TIGR04103, >SFE67100)
- SEQ ID NO: 50 (70% TIGR04103, >WP_006972642)
- SEQ ID NO: 52 (70% TIGR04103, >WP_006969608)
- SEQ ID NO: 54 (70% TIGR04103, >AKV02060)
- SEQ ID NO: 56 (80% TIGR03913, >WP_052261552)
- SEQ ID NO: 70 (80% TIGR03913, >WP_006753568)
- SEQ ID NO: 72 (80% TIGR03913, >WP_062765523)
- SEQ ID NO: 82 (80% TIGR04103, >WP_012234464)
- SEQ ID NO: 83 (80% TIGR04103, >WP_002625456)
- SEQ ID NO: 90 (80% TIGR04103, >WP_002708735) MSTQLRQRRTYAVWEITLKCNLACQHCGSRAGEARQDELSTAEALDLVQQMAEAGIGEVTLIGGEAFLRKD
Abstract
The present invention relates to the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more XYG amino acid motifs, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. Furthermore, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising said XYG motif using a radical S-adenosyl methionine (rSAM) enzyme.
Description
USE OF RADICAL S-ADENOSYL METHIONINE (SAM) ENZYMES FOR INTRODUCING α-KETO-β3-AMINO ACIDS INTO (POLY)PEPTIDES
The present invention relates to the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more XYG amino acid motifs, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. Furthermore, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising said XYG motif using a radical S-adenosyl methionine (rSAM) enzyme.
β-Amino acids are widely distributed in nature and are present in natural product-derived drugs such as penicillin, taxol and cocaine. Installing β-amino acids in peptides offers, e.g., greater stability (for example, against protease-mediated degradation) and can confer structural features distinct from those of peptides or natural products composed entirely of L-α-amino acids, resulting in unique functions and bioactivities. Known methods for incorporating β-amino acids into peptides generally rely on total synthesis, for example, by condensation of monomers in solid-phase synthesis. A second approach relies on in vitro translation, which usually suffers from low incorporation efficiency.
Substantially all known ribosomally synthesized peptides and proteins are composed of α-amino acids. No post-translational modifying enzymes are known that incorporate β-amino acids into ribosomal products. Biosynthetic routes to β-amino acid-containing peptides typically use non-ribosomal peptide synthetases (NRPSs) that act on free amino acids or peptide residues. These are very large, multimodular enzymes that are difficult to manipulate for bioengineering or biotechnological applications. For example, amino acid loading onto an NRPS is usually specific for certain amino acids. Furthermore, the NRPS-type machinery is limited to small peptides, typically with fewer than 15 residues. Therefore, β-amino acids cannot be incorporated into proteins by NRPSs.
Czekster et al. (JACS 2016, 138, 5194-5197) report that E. coli elongation factor Tu (EF-Tu) and phenylalanyl-tRNA synthetase collaborate with erythromycin-resistant E. coli ribosomes to incorporate β3-Phe analogs into full-length dihydrofolate reductase (DHFR) in vivo. However, this system is rather complex and is limited both to specific β-amino acid substrates and to target proteins that can be modified with the β-amino acid.
US 2003/113882 A1 discloses a method for producing enantiomerically pure β-amino acids from α-amino acids, which requires a lysine 2,3-aminomutase catalyst. However, US 2003/113882 A1 is silent on the incorporation of β3-amino acids into peptides or proteins.
In summary, there is a need for methods for incorporating diverse β3-amino acids into peptides or proteins. It is the objective of the present invention to provide new means and methods for introducing β3-amino acids into peptides and polypeptides.
In a first aspect, the above objective is solved by the use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more XYG amino acid motifs, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine.
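As a minimal illustration of the motif definition above, the following Python sketch scans a one-letter protein sequence for candidate XYG sites. It assumes standard one-letter codes only (a non-natural X cannot be represented this way); the function name `find_xyg_motifs` and the example sequence are hypothetical and not taken from the patent.

```python
from typing import List, Tuple


def find_xyg_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based Tyr position, motif) for every XYG site in `sequence`.

    X may be any residue, so a Tyr-Gly pair only counts when some residue
    precedes the Tyr; a Tyr at the very N-terminus is therefore skipped.
    """
    seq = sequence.upper()
    hits = []
    for i in range(1, len(seq) - 1):
        if seq[i] == "Y" and seq[i + 1] == "G":
            hits.append((i + 1, seq[i - 1:i + 2]))
    return hits


if __name__ == "__main__":
    # Hypothetical core peptide with two XYG sites.
    print(find_xyg_motifs("AKYGLSYGF"))  # [(3, 'KYG'), (7, 'SYG')]
```

A hit marks a candidate site only; the description also reports a precursor in which excision was detected at two of its three YG motifs.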
It was surprisingly found that radical S-adenosyl methionine (rSAM) enzymes are capable of incorporating different types of α-keto-β3-amino acids into diverse (poly)peptide substrates that comprise one or more XYG amino acid motifs. rSAM enzymes are monomeric enzymes of up to 50 kDa and can be overexpressed in high yields in many cell types, e.g. in E. coli (see below). In contrast, peptides generated by NRPSs require huge enzyme complexes, which are typically so specific that they yield only a single product or a few closely related analogs.
Preferred rSAM enzymes for use in the present invention were identified through their surrogate substrates, Nif11 precursor peptides, encoded within ribosomally synthesized and post-translationally modified peptide (RiPP) natural product gene clusters. Gene clusters containing NHLP- or Nif11-type precursors are considered a new natural product family, termed "proteusins" (see Freeman et al. Science 338, 387-90 (2012) and Haft et al. BMC Biology 8, 70 (2010)). Although proteusin biosynthetic gene clusters are widespread in bacteria, their biosynthetic end products and functions are currently unknown, with the exception of the cytotoxic pore-forming polytheonamides (see Hamada et al. Tetrahedron Lett. 35, 719-720 (1994) and Freeman et al. Science 338, 387-90 (2012)). Based on previous in silico studies (see Haft et al. J. Bacteriol. 193, 2745-2755 (2011)) and a search of cyanobacterial genomic databases, a proteusin pathway was surprisingly identified that contains one or two Nif11 precursor peptides (with an XYG motif in the core peptide) as the substrate(s), an rSAM enzyme annotated as a SPASM domain protein, and a small associated protein.
For example, α-keto-β3-amino acid-containing (poly)peptide products can conveniently be generated according to the present invention by in vivo co-expression of a (poly)peptide substrate with the rSAM enzymes in bacteria and also in other organisms, e.g. yeast, plant cells, mammalian cells or insect cells. In addition to introducing β3-amino acid topology into the backbone of ribosomal (poly)peptide products, the rSAM enzymes described herein also introduce a keto functionality that is not present in proteinogenic amino acids. Introducing a keto functionality into ribosomally synthesized (poly)peptides is of interest, e.g., for site-selective modification in a wide range of applications. In the prior art, this is usually achieved by (i) non-selective chemical oxidation, e.g. with sodium periodate, which oxidizes all serine side chains, or (ii) introduction of non-canonical amino acids by an unnatural amino acid mutagenesis strategy based on stop codon recoding.
In a preferred example of the use of an rSAM enzyme according to the present invention, the proteusin cluster, e.g. from Pleurocapsa sp. PCC 7319, containing genes for two substrates (e.g. plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117)), an rSAM SPASM protein (e.g. plpX (SEQ ID NOs: 118 and 119)), and optionally an associated protein (e.g. plpY (SEQ ID NOs: 120 and 121)) can be cloned into expression vectors for expression in E. coli. The associated protein, e.g. plpY, is optional, e.g. for increasing the enzymatic efficiency of the rSAM enzyme, and is not required for practicing the present invention. The substrate peptides (e.g. plpA2 and plpA3) can be cloned into an expression vector, e.g. with an N-terminal His6 tag for purification by affinity chromatography. The rSAM enzyme and the optional associated protein (e.g. plpX and plpY) can be cloned into a second expression vector for convenient co-transformation, e.g. with a substrate-containing vector. The substrate peptide genes can be individually co-transformed with the rSAM enzyme and the optional associated protein (e.g. plpX and plpY), e.g. into E. coli, and protein overexpression can be carried out in standard medium. Following purification, e.g. Ni-affinity purification of the N-terminally hexahistidine-tagged (NHis6) peptides, and in vitro cleavage of the leader, e.g. with a protease, the core (poly)peptide comprising the α-keto-β3-amino acid(s) can be analyzed, e.g. by MALDI-MS. Optionally, the α-keto-β3-amino acid(s) can be reduced to the corresponding β-amino acid(s) using standard chemical methods, e.g. with sodium borohydride.
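The optional derivatizations described in this document change the product mass predictably, which can help when reading MALDI-MS data. The sketch below assumes that sodium borohydride reduces each ketone to a secondary alcohol (net +H2) and that condensation with methoxyamine (H2N-OCH3) gives an O-methyl oxime with loss of water (net +CH3N). Monoisotopic element masses are used; the constant and function names (`REDUCTION_SHIFT`, `METHOXIME_SHIFT`, `derivatized_mass`) are hypothetical.

```python
# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_mass(composition: dict) -> float:
    """Sum monoisotopic masses for an elemental composition like {"C": 1, "H": 3}."""
    return sum(MONO[element] * count for element, count in composition.items())


# C=O -> CH-OH: two hydrogens are added (assumed NaBH4 outcome).
REDUCTION_SHIFT = formula_mass({"H": 2})
# C=O + H2N-OCH3 -> C=N-OCH3 + H2O: net addition of CH3N.
METHOXIME_SHIFT = formula_mass({"C": 1, "H": 3, "N": 1})


def derivatized_mass(product_mass: float, n_keto: int, mode: str) -> float:
    """Expected monoisotopic mass after derivatizing all n_keto keto groups."""
    shifts = {"reduce": REDUCTION_SHIFT, "methoxyamine": METHOXIME_SHIFT}
    if mode not in shifts:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(shifts)}")
    return product_mass + n_keto * shifts[mode]


if __name__ == "__main__":
    print(round(REDUCTION_SHIFT, 4))   # 2.0157
    print(round(METHOXIME_SHIFT, 4))   # 29.0265
```

Incomplete derivatization would show up as intermediate masses, so in practice one would compare against every value from 0 to `n_keto` derivatized groups.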
Also, the introduction of α-keto-β3-amino acids into (poly)peptides using the rSAM enzymes according to the present invention can be carried out in vitro.
Without wishing to be bound by theory, it is assumed that the rSAM enzyme for use in the present invention excises from the XYG motif a structure with the elemental composition C8H9NO, as determined by high-resolution mass spectrometry, which is attributed to the tyrosine residue. The transformed substrates are products containing the α-keto-β3-amino acid. The XYG motif is conserved among a wide range of precursors, e.g. from related cyanobacterial gene clusters and from bacteria of the genera Thiothrix, Pseudoalteromonas, Frankia, Burkholderia, or Nitrospirillum, and serves as a general tyrosine excision site for the placement of α-keto-β3-amino acids. Also, precursors carrying more than one XYG motif can be converted to products with several α-keto-β3-amino acids after expression of the rSAM enzyme in E. coli.
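A worked example of the mass bookkeeping implied above: if each conversion removes C8H9NO, the product is lighter than the precursor by about 135.068 Da per converted XYG site. The same composition follows from the labeling result that only C1 of Tyr is retained: a Tyr residue (C9H9NO2) minus the retained carbonyl (CO) leaves C8H9NO. The Python sketch below lists expected masses for zero to n conversions, to allow for partial conversion of multi-motif precursors. Monoisotopic element masses are assumed; the function name and the 2500.0 Da precursor mass are hypothetical.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_mass(composition: dict) -> float:
    """Monoisotopic mass (Da) of an elemental composition."""
    return sum(MONO[element] * count for element, count in composition.items())


TYR_RESIDUE = {"C": 9, "H": 9, "N": 1, "O": 2}   # tyrosine residue within a chain
RETAINED_CO = {"C": 1, "O": 1}                   # Tyr C1 carbonyl kept in the product
EXCISED = {el: TYR_RESIDUE[el] - RETAINED_CO.get(el, 0) for el in TYR_RESIDUE}
EXCISION_LOSS = formula_mass(EXCISED)            # composition C8H9NO


def expected_product_masses(precursor_mass: float, n_sites: int) -> dict:
    """Map the number of converted XYG sites (0..n_sites) to expected monoisotopic mass."""
    return {k: precursor_mass - k * EXCISION_LOSS for k in range(n_sites + 1)}


if __name__ == "__main__":
    print(EXCISED)                  # {'C': 8, 'H': 9, 'N': 1, 'O': 1}
    print(round(EXCISION_LOSS, 4))  # 135.0684
    # Hypothetical precursor fragment of 2500.0 Da with three XYG sites.
    for k, mass in expected_product_masses(2500.0, 3).items():
        print(k, round(mass, 4))
```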
The term β-amino acid, as used herein, refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has its amino group bonded to the β-carbon rather than the α-carbon.
The term α-keto-β3-amino acid, as used herein, refers to any non-natural or non-conventional amino acid, preferably to any proteinogenic amino acid, more preferably to any of the 20 standard amino acids, that has a keto group at the α-carbon and its amino group bonded to the β-carbon rather than the α-carbon.
The term (poly)peptide, as used herein, is meant to encompass peptides, polypeptides, oligopeptides and proteins that comprise two or more amino acids linked covalently through peptide bonds. The term does not refer to a specific length of the product. The term (poly)peptide includes (poly)peptides with post-translational modifications, for example, glycosylations, acetylations, phosphorylations and the like, as well as (poly)peptides comprising non-natural or non-conventional amino acids and functional derivatives as described below.
The term non-natural or non-conventional amino acid refers to naturally occurring or non-naturally occurring unnatural amino acids or chemical amino acid analogues, e.g. D-amino acids, α,α-disubstituted amino acids, N-alkyl amino acids, homo-amino acids, dehydroamino acids, aromatic amino acids (other than phenylalanine, tyrosine and tryptophan), and ortho-, meta- or para-aminobenzoic acid. Non-conventional amino acids also include compounds which have an amine and a carboxyl functional group separated in a 1,3 or larger substitution pattern, such as β-alanine, γ-aminobutyric acid, Freidinger lactam, the bicyclic dipeptide (BTD), aminomethyl benzoic acid and others well known in the art. Statine-like isosteres, hydroxyethylene isosteres, reduced amide bond isosteres, thioamide isosteres, urea isosteres, carbamate isosteres, thioether isosteres, vinyl isosteres and other amide bond isosteres known in the art may also be used. A non-limiting list of non-conventional amino acids which may be comprised in the (poly)peptide, with their standard abbreviations in brackets, is as follows: α-aminobutyric acid (Abu), L-N-methylalanine (Nmala), α-amino-α-methylbutyrate (Mgabu), L-N-methylarginine (Nmarg), aminocyclopropane-carboxylate (Cpro), L-N-methylasparagine (Nmasn), L-N-methylaspartic acid (Nmasp), aminoisobutyric acid (Aib), L-N-methylcysteine (Nmcys), aminonorbornyl-carboxylate (Norb), L-N-methylglutamine (Nmgln), L-N-methylglutamic acid (Nmglu), cyclohexylalanine (Chexa), L-N-methylhistidine (Nmhis), cyclopentylalanine (Cpen), L-N-methylisoleucine (Nmile), L-N-methylleucine (Nmleu), L-N-methyllysine (Nmlys), L-N-methylmethionine (Nmmet), L-N-methylnorleucine (Nmnle), L-N-methylnorvaline (Nmnva), L-N-methylornithine (Nmorn), L-N-methylphenylalanine (Nmphe), L-N-methylproline (Nmpro), L-N-methylserine (Nmser), L-N-methylthreonine (Nmthr), L-N-methyltryptophan (Nmtrp), D-ornithine (Dorn), L-N-methyltyrosine (Nmtyr), L-N-methylvaline (Nmval), L-N-methylethylglycine (Nmetg), L-N-methyl-t-butylglycine (Nmtbug), L-norleucine (Nle), L-norvaline (Nva), α-methylaminoisobutyrate (Maib), α-methyl-γ-aminobutyrate (Mgabu), D-α-methylalanine (Dmala), α-methylcyclohexylalanine (Mchexa), D-α-methylarginine (Dmarg), α-methylcyclopentylalanine (Mcpen), D-α-methylasparagine (Dmasn), α-methyl-α-naphthylalanine (Manap), D-α-methylaspartate (Dmasp), α-methylpenicillamine (Mpen), D-α-methylcysteine (Dmcys), N-(4-aminobutyl)glycine (Nglu), D-α-methylglutamine (Dmgln), N-(2-aminoethyl)glycine (Naeg), D-α-methylhistidine (Dmhis), N-(3-aminopropyl)glycine (Norn), D-α-methylisoleucine (Dmile), N-amino-α-methylbutyrate (Nmaabu), D-α-methylleucine (Dmleu), α-naphthylalanine (Anap), D-α-methyllysine (Dmlys), N-benzylglycine (Nphe), D-α-methylmethionine (Dmmet), N-(2-carbamylethyl)glycine (Ngln), D-α-methylornithine (Dmorn), N-(carbamylmethyl)glycine (Nasn), D-α-methylphenylalanine (Dmphe), N-(2-carboxyethyl)glycine (Nglu), D-α-methylproline (Dmpro), N-(carboxymethyl)glycine (Nasp), D-α-methylserine (Dmser), N-cyclobutylglycine (Ncbut), D-α-methylthreonine (Dmthr), N-cycloheptylglycine (Nchep), D-α-methyltryptophan (Dmtrp), N-cyclohexylglycine (Nchex), D-α-methyltyrosine (Dmty), N-cyclodecylglycine (Ncdec), D-α-methylvaline (Dmval), N-cyclododecylglycine (Ncdod), D-N-methylalanine (Dnmala), N-cyclooctylglycine (Ncoct), D-N-methylarginine (Dnmarg), N-cyclopropylglycine (Ncpro), D-N-methylasparagine (Dnmasn), N-cycloundecylglycine (Ncund), D-N-methylaspartate (Dnmasp), N-(2,2-diphenylethyl)glycine (Nbhm), D-N-methylcysteine (Dnmcys), N-(3,3-diphenylpropyl)glycine (Nbhe), D-N-methylglutamine (Dnmgln), N-(3-guanidinopropyl)glycine (Narg), D-N-methylglutamate (Dnmglu), N-(1-hydroxyethyl)glycine (Nthr), D-N-methylhistidine (Dnmhis), N-(hydroxyethyl)glycine (Nser), D-N-methylisoleucine (Dnmile), N-(imidazolylethyl)glycine (Nhis), D-N-methylleucine (Dnmleu), N-(3-indolylethyl)glycine (Nhtrp), D-N-methyllysine (Dnmlys), N-methyl-γ-aminobutyrate (Nmgabu), N-methylcyclohexylalanine (Nmchexa), D-N-methylmethionine (Dnmmet), D-N-methylornithine (Dnmorn), N-methylcyclopentylalanine (Nmcpen), N-methylglycine (Nala), D-N-methylphenylalanine (Dnmphe), N-methylaminoisobutyrate (Nmaib), D-N-methylproline (Dnmpro), N-(1-methylpropyl)glycine (Nile), D-N-methylserine (Dnmser), N-(2-methylpropyl)glycine (Nleu), D-N-methylthreonine (Dnmthr), D-N-methyltryptophan (Dnmtrp), N-(1-methylethyl)glycine (Nval), D-N-methyltyrosine (Dnmtyr), N-methyl-α-naphthylalanine (Nmanap), D-N-methylvaline (Dnmval), N-methylpenicillamine (Nmpen), γ-aminobutyric acid (Gabu), N-(p-hydroxyphenyl)glycine (Nhtyr), L-t-butylglycine (Tbug), N-(thiomethyl)glycine (Ncys), L-ethylglycine (Etg), penicillamine (Pen), L-homophenylalanine (Hphe), L-α-methylalanine (Mala), L-α-methylarginine (Marg), L-α-methylasparagine (Masn), L-α-methylaspartate (Masp), L-α-methyl-t-butylglycine (Mtbug), L-α-methylcysteine (Mcys), L-methylethylglycine (Metg), L-α-methylglutamine (Mgln), L-α-methylglutamate (Mglu), L-α-methylhistidine (Mhis), L-α-methylhomophenylalanine (Mhphe), L-α-methylisoleucine (Mile), N-(2-methylthioethyl)glycine (Nmet), L-α-methylleucine (Mleu), L-α-methyllysine (Mlys), L-α-methylmethionine (Mmet), L-α-methylnorleucine (Mnle), L-α-methylnorvaline (Mnva), L-α-methylornithine (Morn), L-α-methylphenylalanine (Mphe), L-α-methylproline (Mpro), L-α-methylserine (Mser), L-α-methylthreonine (Mthr), L-α-methyltryptophan (Mtrp), L-α-methyltyrosine (Mtyr), L-α-methylvaline (Mval), L-N-methylhomophenylalanine (Nmhphe), N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine (Nnbhm), N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine (Nnbhe), 1-carboxy-1-(2,2-diphenylethylamino)cyclopropane (Nmbc), L-O-methylserine (Omser), L-O-methylhomoserine (Omhser).
In a preferred embodiment, the rSAM enzyme for use in the present invention is a peptide radical SAM maturase, preferably a Nif11-class peptide radical SAM maturase 3. The Nif11-class peptide radical SAM maturase 3 belongs to the conserved protein domain family rSAM_nif11_3 (Nif11-class peptide radical SAM maturase 3; IPR026482) or the rad_SAM_trio family (radical SAM GDL-associated; IPR023820) as defined by the National Center for Biotechnology Information (NCBI). Members of the rSAM_nif11_3 family are radical SAM enzymes that often occur co-clustered with Nif11-related ribosomal natural product (RNP) precursors described by TIGRFAMs model TIGR03798. Members of the rad_SAM_trio family are radical SAM enzymes that often occur co-clustered with DUF1843-domain RNP precursors carrying a YX10GDL motif and described by Pfam model PF08898 (see Haft et al. Nucleic Acids Res. 31, 371-3 (2003); Haft and Basu, J. Bacteriol. 193, 2745-2755 (2011); NCBI database entries for "rSAM_nif11_3" and "rad_SAM_trio").
In a preferred embodiment, the rSAM enzyme for use in the present invention comprises (A) an amino acid sequence according to Formula (I) (SEQ ID NO: 1) or (II) (SEQ ID NO: 2)
Formula (I): X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20,
Formula (II): Z1-Z2-Z3-Z4-Z5-Z6-Z7-Z8-Z9-Z10-Z11-Z12-Z13-Z14-Z15-Z16-Z17-Z18-Z19-Z20,
wherein X1-X20 and Z1-Z20 each denote amino acids and
X1 is selected from the group consisting of Y, H, F and W, preferably Y and H;
X2 is selected from the group consisting of Y, R and H;
X3 is selected from the group consisting of R, K and Q, preferably R;
X4 is selected from the group consisting of I, T and V, preferably I and T;
X5 is selected from the group consisting of R, S and K, preferably R and S;
X6 is selected from the group consisting of H, Y, F and W, preferably H and Y;
X7 is selected from the group consisting of A and S, preferably A;
X8 is selected from the group consisting of V, I and L, preferably V;
X9 is selected from the group consisting of W, Y and F, preferably W;
X10 is selected from the group consisting of E, Q, D and K, preferably E;
X11 is selected from the group consisting of I, L, V and M, preferably I and L;
X12 is selected from the group consisting of T and S, preferably T;
X13 is selected from the group consisting of L, M, I and V, preferably L;
X14 is selected from the group consisting of K, R, E and Q, preferably K;
X15 is C;
X16 is selected from the group consisting of N and D, preferably N;
X17 is selected from the group consisting of L, M, I and V, preferably L;
X18 is selected from the group consisting of A and S, preferably A;
X20 is selected from the group consisting of S, Q, E and K, preferably S and Q;
Z1 is selected from the group consisting of T, D, E and N, preferably T and D;
Z2 is selected from the group consisting of R, P, N and A, preferably R, P and A;
Z3 is selected from the group consisting of R, Q, K and L, preferably R;
Z4 is P;
Z5 is selected from the group consisting of A and S, preferably A;
Z6 is selected from the group consisting of R, K and Q, preferably R;
Z7 is selected from the group consisting of Y, F, H and W, preferably Y;
Z8 is selected from the group consisting of L, M, I and V, preferably L;
Z9 is selected from the group consisting of F, H, S and Y, preferably H, F and S;
Z10 is selected from the group consisting of D, E and A, preferably D and E;
Z11 is selected from the group consisting of D, S and T, preferably T;
Z12 is selected from the group consisting of D, E and N, preferably D;
Z13 is selected from the group consisting of Y, F, L and M, preferably Y, F and L;
Z14 is selected from the group consisting of K, Q, R and E, preferably Q and K;
Z15 is selected from the group consisting of R, K and Q, preferably R;
Z16 is selected from the group consisting of Y, F and W, preferably Y and F;
Z17 is selected from the group consisting of V, I and L, preferably V;
Z19 is selected from the group consisting of V, I and L, preferably V; and
Z20 is selected from the group consisting of H and Y, preferably H; or
(B) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with at least one of the amino acid sequences of Formula (I) or (II), more preferably an amino acid sequence having at least 14, 16, 18 or 19 of the 20 amino acids of Formula (I) or (II); or
(C) a functional fragment and/or functional derivative of (A) or (B), preferably a functional fragment of at least 10 amino acids, more preferably at least 15 amino acids of (A) or (B).
The percentage identity of related amino acid molecules can be determined with the assistance of known methods. In general, special computer programs are employed that use algorithms adapted to the specific needs of this task. Preferred methods for determining identity begin by aligning the sequences to be compared so as to generate the largest degree of identity between them. Preferred computer programs for determining the identity between two amino acid sequences include, but are not limited to, TBLASTN, BLASTP, BLASTX, TBLASTX (Altschul et al., J. Mol. Biol., 215, 403-410, 1990), or ClustalW (Larkin MA et al., Bioinformatics, 23, 2947-2948, 2007). The BLAST programs can be obtained from the National Center for Biotechnology Information (NCBI) and from other sources (BLAST handbook, Altschul et al., NCBI NLM NIH Bethesda, MD 20894). The ClustalW program can be obtained from http://www.clustal.org.
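To make the "at least 14, 16, 18 or 19 of the 20 amino acids" criterion of (B) concrete, the sketch below scores 20-residue windows against the allowed residues of Formula (I) and reports the best window as a percentage; 14, 16, 18 and 19 matches correspond to 70, 80, 90 and 95 %. This is a simple position-by-position comparison, not the BLAST or ClustalW alignment named above. Position X19 is not specified in this text, so the sketch treats it as unconstrained; that choice, the function names and the test window are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Allowed residues per position of Formula (I); None = not specified in the text (assumed: any).
FORMULA_I: List[Optional[str]] = [
    "YHFW", "YRH", "RKQ", "ITV", "RSK", "HYFW", "AS", "VIL", "WYF", "EQDK",   # X1-X10
    "ILVM", "TS", "LMIV", "KREQ", "C", "ND", "LMIV", "AS", None, "SQEK",      # X11-X20
]


def count_matches(window: str, formula: List[Optional[str]]) -> int:
    """Count positions of a window whose residue is allowed by the formula."""
    if len(window) != len(formula):
        raise ValueError(f"window length {len(window)} != formula length {len(formula)}")
    return sum(
        1 for residue, allowed in zip(window.upper(), formula)
        if allowed is None or residue in allowed
    )


def best_window(sequence: str, formula: List[Optional[str]]) -> Tuple[int, int, float]:
    """Slide the formula along `sequence`; return (1-based start, matches, percent)."""
    n = len(formula)
    if len(sequence) < n:
        raise ValueError("sequence shorter than formula")
    best_start, best_hits = 0, -1
    for start in range(len(sequence) - n + 1):
        hits = count_matches(sequence[start:start + n], formula)
        if hits > best_hits:  # keep the first window with the highest score
            best_start, best_hits = start, hits
    return best_start + 1, best_hits, 100.0 * best_hits / n
```

For example, `best_window("MM" + "YYRIRHAVWEITLKCNLAGS" + "MM", FORMULA_I)` returns `(3, 20, 100.0)` for this made-up sequence. The same approach applies to Formula (II) with a second list of allowed residues.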
The term "functional derivative" of a (poly)peptide of the present invention is meant to include any (poly)peptide or fragment thereof that has been chemically or genetically modified in its amino acid sequence, e.g. by addition, substitution and/or deletion of amino acid residue(s) and/or has been chemically modified in at least one of its atoms and/or functional chemical groups, e.g. by additions, deletions, rearrangement, oxidation, reduction, etc. as long as the derivative still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
In this context, a "functional fragment" of the invention is one that forms part of a (poly)peptide or derivative of the invention and still has at least some rSAM activity to a measurable extent, e.g. of at least about 1 to 10%, preferably 10 to 50% rSAM activity of the original unmodified (poly)peptide of the invention.
The amino acid sequences of Formulas (I) and (II) are based on the sequences disclosed by TIGRFAMs TIGR04103 (Formula (I) above) and TIGR03913 (Formula (II) above), with the specific preferred amino acids defined for X1-X20 and Z1-Z20.
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises at least one motif selected from the group consisting of
(i) motif CXXXCXXC (SEQ ID NO: 3), wherein X is any natural amino acid and wherein the motif CXXXCXXC (SEQ ID NO: 3) is preferably comprised in an N-terminal radical SAM domain;
(ii) motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4), wherein X is any natural amino acid;
(iii) motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5), wherein X is any natural amino acid; and
(iv) motifs CX9-15GX4C (SEQ ID NO: 6) and CX2CX5CX3CX14-18C (SEQ ID NO: 7), wherein X is any natural amino acid and the integers denote the number of X(s), and wherein the motifs CX9-15GX4C (SEQ ID NO: 6) and CX2CX5CX3CX14-18C (SEQ ID NO: 7) are preferably comprised in a C-terminal SPASM domain.
For example, motif CX9-15GX4C (SEQ ID NO: 6) reads on a motif consisting of amino acid C, 9 to 15 natural amino acids, amino acid G, 4 natural amino acids and amino acid C.
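The motif notation above maps directly onto regular expressions: each X is any residue, and a number or range after X is a repeat count. The following Python sketch performs that translation so a sequence can be screened with the standard `re` module. The function name `motif_to_regex` and the toy sequence are hypothetical, and treating X as any character assumes the input contains only one-letter amino acid codes.

```python
import re

# One token: X with a count ("X4") or range ("X9-15"), a bare X, or a fixed residue.
_TOKEN = re.compile(r"X(\d+)(?:-(\d+))?|X|[ACDEFGHIKLMNPQRSTVWY]")


def motif_to_regex(motif: str) -> str:
    """Translate patent-style motif notation (e.g. 'CX9-15GX4C') into a regex string."""
    parts = []
    pos = 0
    while pos < len(motif):
        m = _TOKEN.match(motif, pos)
        if m is None:
            raise ValueError(f"cannot parse {motif[pos:]!r} in motif {motif!r}")
        token = m.group(0)
        if token == "X":
            parts.append(".")                      # any single residue
        elif token.startswith("X"):
            low, high = m.group(1), m.group(2)     # fixed count or min-max range
            parts.append(f".{{{low}}}" if high is None else f".{{{low},{high}}}")
        else:
            parts.append(token)                    # conserved residue
        pos = m.end()
    return "".join(parts)


if __name__ == "__main__":
    print(motif_to_regex("CX9-15GX4C"))  # C.{9,15}G.{4}C
    print(motif_to_regex("CXXXCXXC"))    # C...C..C
    hit = re.search(motif_to_regex("CXXXCXXC"), "MACAAACAACK")
    print(hit.start() + 1 if hit else None)  # 3
```

The same function handles the other motifs in this section, e.g. `YX10GDL` becomes `Y.{10}GDL`.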
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises
(i) an amino acid sequence according to Formula (I) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (I);
(ii) a motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4), wherein X is any natural amino acid;
and wherein the rSAM enzyme further comprises
(iii) a motif CXXGXXXXXXXXXGXXKXCP (SEQ ID NO: 8) and/or GXCXXCXXXXXCXXXCXXXXXXXXXXXGXNPXCXXR (SEQ ID NO: 9), wherein X is any natural amino acid.
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises
(i) an amino acid sequence according to Formula (II) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (II);
(ii) a motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5), wherein X is any natural amino acid; and wherein the rSAM enzyme further comprises
(iii) a motif CXAGXXXXXEADGXXKXCPXL (SEQ ID NO: 10) and/or CXXCYYXXXCXXGCXWXXXXLXGXXGXNPXCXXR (SEQ ID NO: 11), wherein X is any natural amino acid.
In a further preferred embodiment, the rSAM enzyme for use in the present invention comprises an amino acid sequence selected from the group of
(i) sequences listed in any of SEQ ID NOs: 12 to 54, preferably SEQ ID NOs: 39 and 40, or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 12 to 54; and
(ii) sequences listed in any of SEQ ID NOs: 55 to 113, preferably SEQ ID NOs: 93 and 94, or an amino acid sequence having an amino acid sequence identity of at least 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 55 to 113.
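The identity thresholds (70, 80, 90 or 95 %) presuppose an alignment and a counting convention, neither of which is specified here. As a hedged sketch, assuming the two sequences have already been aligned (for example by a standard pairwise aligner) and that identity is counted over ungapped columns, the percentage and the threshold test could be computed as follows; the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that have already been aligned.

    Assumed convention: identical positions divided by the number of aligned
    columns in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the percent identity reaches the threshold (e.g. 70, 80, 90, 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    query = "ACDEFGHIKL"
    reference = "ACDEFGHIKV"
    print(percent_identity(query, reference))  # 90.0
    for t in (70, 80, 90, 95):
        print(t, meets_identity_threshold(query, reference, t))
```

Other conventions, such as dividing by the full length of the reference sequence, can give noticeably different values for gapped alignments, so the convention should be fixed before comparing against a threshold.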
In another aspect, the present invention is directed to the use of a recombinant vector comprising a nucleic acid encoding an rSAM enzyme as defined above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. The vector is preferably a viral or episomal vector, more preferably a vector selected from the group consisting of lentivirus vector, adenovirus vector, baculovirus vector, bacterial vector and yeast vector.
The selection of a suitable vector and expression control sequences is within the ordinary skill in the art. Preferably, the vector is a lentivirus vector (see for example System Biosciences, Mountain View, CA, USA), adenovirus vector (see for example ViraPower Adenoviral Expression System, Life Technologies, Carlsbad, CA, USA), baculovirus vector (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.), bacterial vector (see for example Novagen, Darmstadt, Germany) or yeast vector (see for example ATCC, Manassas, Virginia). Vector construction, including the operable linkage of a coding sequence with a promoter and other expression control sequences, is within the ordinary skill in the art.
In a further preferred embodiment, the present invention is directed to the use of a host cell expressing an rSAM enzyme as defined above, preferably comprising a recombinant vector as defined above, in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine. The host cell is preferably selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae cells (see for example Methods in Enzymology, 350, 248, 2002) and Pichia pastoris cells (see for example Pichia Expression Kit Instruction Manual, Invitrogen Corporation, Carlsbad, Calif.); bacterial cells, preferably E. coli cells, preferably BL21(DE3), K-12 and derivatives (see for example Applied Microbiology and Biotechnology, 72, 211, 2006), and Bacillus subtilis cells, preferably 1012 wild type, 168 Marburg or WB800N (see for example Westers et al. (2004) Mol. Cell. Res. 1694(1-3):299-310); plant cells, preferably Nicotiana tabacum and Physcomitrella patens (see e.g. Lau and Sun, Biotechnol. Adv. 2009, 27(6):1015-22); NIH-3T3 mammalian cells (see for example Sambrook and Russell, 2001); and insect cells, preferably Sf9 insect cells (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.).
The use of the rSAM-associated protein PlpY (SEQ ID NO: 121) is optional but can significantly contribute to the efficiency of rSAM enzyme activity in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more amino acid motifs XYG. Therefore, the present invention is also directed to an rSAM-associated protein comprising an amino acid sequence selected from the group consisting of (a) SEQ ID NO: 121, (b) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with SEQ ID NO: 121, and (c) a functional fragment and/or functional derivative of (a) or (b), preferably a functional fragment of at least 30 amino acids, more preferably at least 45 amino acids of (a) or (b).
The rSAM-associated protein is preferably for use in combination with an rSAM enzyme as described above in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more amino acid motifs XYG. More preferably, the rSAM-associated protein is expressed by a recombinant vector and/or host cell also expressing the rSAM enzyme as described above.
In another aspect, the present invention relates to a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising the steps of:
(i) providing a radical S-adenosyl methionine (rSAM) enzyme as defined above, and/or a host cell as defined above,
(ii) optionally providing an rSAM-associated protein as defined above,
(iii) providing at least one (poly)peptide substrate of interest comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, and
(iv) contacting the enzyme and/or host cell of (i) with the substrate of (iii), and optionally the rSAM-associated protein of (ii), under conditions suitable for the enzymatic introduction of at least one α-keto-β3-amino acid into the substrate.
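Step (iii) requires substrates carrying XYG motifs. A minimal Python sketch for listing the candidate tyrosine positions in a core peptide is shown below; the function name and the example peptide are hypothetical, and the sketch only checks the sequence pattern, not whether a given site is actually converted (in Example 2, excision was detected at two of the three YG motifs of PcpA).

```python
import re


def find_xyg_sites(sequence: str) -> list[int]:
    """Return 1-based positions of Tyr residues that sit inside an XYG motif.

    X may be any residue, so a Tyr qualifies when it has a preceding residue
    and is followed by Gly. The zero-width lookahead also reports motifs that
    overlap, e.g. both Tyr residues in 'GYGYG'.
    """
    return [m.start() + 2 for m in re.finditer(r"(?=.YG)", sequence)]


if __name__ == "__main__":
    substrate = "AMYGWKYGYG"  # hypothetical core peptide, not from the patent
    print(find_xyg_sites(substrate))  # [3, 7, 9]
    print(find_xyg_sites("YGAA"))     # [] because an N-terminal Tyr has no X
```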
In a preferred embodiment of the above method, at least one of the enzyme of step (i) and the (poly)peptide substrate of step (iii), preferably both, optionally together with the rSAM-associated protein of optional step (ii), is provided in the form of a host cell; more preferably, all of them are co-expressed in the host cell as defined above.
In a further preferred embodiment, the host cell for use in step (i) of the above method is an E. coli host cell. In addition, it is preferred that step (iv) is followed by step (v), wherein the keto functionality resulting from step (iv) is reduced chemically, preferably by sodium borohydride, or is converted to an imine, preferably the methoxyamine derivative.
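The modifications in steps (iv) and (v) have predictable monoisotopic mass changes, which is how the products are recognized by MS in the Examples. Assuming that each converted site loses exactly C8H9NO (the tyramine equivalent described in Example 2), that sodium borohydride reduction adds H2 to the ketone, and that condensation with methoxyamine (CH5NO) releases one H2O, the expected shifts can be sketched as below. Element masses are standard monoisotopic values; the names are illustrative.

```python
from typing import Optional

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of an elemental composition such as {'C': 8, 'H': 9}."""
    return sum(MONO[element] * count for element, count in formula.items())


# Per converted XYG site: excision of the tyramine equivalent C8H9NO.
EXCISION = -formula_mass({"C": 8, "H": 9, "N": 1, "O": 1})
# Ketone -> alcohol with sodium borohydride: net +H2.
REDUCTION = formula_mass({"H": 2})
# Ketone + methoxyamine (CH5NO) -> O-methyl oxime + H2O: net +CH3N.
METHOXYAMINE = formula_mass({"C": 1, "H": 3, "N": 1})

DERIVATIVES = {None: 0.0, "alcohol": REDUCTION, "methoxyamine": METHOXYAMINE}


def expected_shift(n_sites: int, derivative: Optional[str] = None) -> float:
    """Neutral mass change (Da) relative to the unmodified precursor."""
    return n_sites * (EXCISION + DERIVATIVES[derivative])


def mz_shift(delta_mass: float, charge: int) -> float:
    """m/z change of an [M+zH]z+ ion; the added protons cancel in a difference."""
    return delta_mass / charge


if __name__ == "__main__":
    for derivative in (None, "alcohol", "methoxyamine"):
        delta = expected_shift(1, derivative)
        print(derivative, round(delta, 4), round(mz_shift(delta, 2), 4))
```

Because the protons cancel, a doubly charged ion moves by half the neutral shift, about -67.53 m/z units per excision event.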
As an example of the above method, E. coli expressing an rSAM enzyme as defined above is cultured in a rich medium such as TB, LB or YT medium. A culture in the above medium, preferably in an Erlenmeyer flask or Ultra Yield flask, is inoculated with an overnight culture at a dilution of preferably about 1:100. The culture is grown at about 37°C with shaking at, e.g., about 250 rpm until, e.g., an OD600 of about 1.2-2.0 is reached. The culture is cooled, e.g. on ice, and induced with, e.g., IPTG (preferably about 1 mM final concentration). The culture is shaken at, e.g., about 16°C and about 250 rpm for, e.g., 24 hours; the cells are then collected by centrifugation and lysed, and the substrate is subjected to purification, preferably Ni-affinity purification. Following purification, the product(s) are verified, e.g. by mass spectrometry of the full-length or digested precursors, e.g. NHis-precursors. The keto functionality may be reduced (e.g. by sodium borohydride) or converted to the imine, preferably the methoxyamine derivative.
In a further aspect, the present invention is directed to isolated and purified nucleic acids encoding the (poly)peptides for use in the present invention.
The following Figures and Examples serve to illustrate the invention and are not intended to limit the scope of the invention as described in the appended claims.
Fig. 1: a) plp gene cluster encoding precursors (plpA1, A2, and A3), rSAM epimerase (plpD), rSAM excision enzyme (plpX) and associated protein (plpY). b) Protein sequences for core peptides of precursors PlpA2 (SEQ ID NO: 122), PlpA3 (SEQ ID NO: 123), PlpA3-9 (SEQ ID NO: 124) and PcpA (SEQ ID NO: 125).
Fig. 2: Detection of PlpX activity by coexpression experiments in E. coli. HPLC chromatograms indicating starting material (SM). New products 1-4 are only detected in coexpression experiments with PlpX and PlpY. The insets are extracted mass spectra (t = 27.6-28.6 min for PlpA2-Fx and t = 21.3-22.5 min for PlpA3-Fx) from the corresponding LC-MS chromatograms. In all cases the [M+2H]2+ ions are shown.
Fig. 3: MS2 spectra for 1 (SEQ ID NO: 122). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 4: MS2 spectra for 2 (SEQ ID NO: 122). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 5: MS2 spectra for 3 (SEQ ID NO: 126). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 6: MS2 spectra for 4 (SEQ ID NO: 126). '0 Ex' indicates no excision. '-1 Tya' indicates loss of 'tyramine' (C8H9NO) from the corresponding fragment.
Fig. 7: Sodium borohydride reduction of a mixture of 1 (SEQ ID NO: 122) and 2 (SEQ ID NO: 122).
Fig. 8: HMBC spectra for product 5 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β3-amino acid.
Fig. 9: HMBC spectra for product 6 (SEQ ID NO: 127) showing key HMBC correlations to the keto and amide carbonyls of the α-keto-β3-amino acid.
Fig. 10: 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with methionine.
Fig. 11: 13C spectra for product 6 (SEQ ID NO: 127) from feeding experiments with tyrosine.
Fig. 12: Reaction catalyzed by PlpX.
Fig. 13: Results for all PlpA3-Fx mutants. Shown is the peptide fragment (SEQ ID NOs: 128 to 143) affected by mutation and the respective detection of conversion.
Example 1: The function of plpX - a new rSAM
To investigate the function of plpX (SEQ ID NOs: 118 and 119), which encodes the new rSAM enzyme, the gene was co-expressed with either of the two precursor genes plpA2 (SEQ ID NOs: 114 and 115) and plpA3 (SEQ ID NOs: 116 and 117) located upstream. In addition to an N-terminal leader region of the Nif11 family, the translated precursors contain predicted core regions of 25 and 23 aa, respectively (Fig. 1b).
The Nif11 precursor genes (plpA1 and plpA2) were individually cloned with N-terminal His6 tags and a Factor Xa site at the interface of leader and core and inserted into pACYCDuet-1. The rSAM gene (plpX) was cloned into MCSII of pRSFDuet-1, and the constructs were transformed into E. coli BL21(DE3) for protein expression (Fig. 2). Under these conditions, no modification of the precursors was observed. More detailed analysis of the plp cluster (Fig. 1a) revealed a small conserved gene, plpY (SEQ ID NOs: 120 and 121), located downstream of plpX, an architecture also preserved in other clusters with plpX homologs. When plpY was included in the E. coli co-expressions, two additional broad peaks that overlapped with the unmodified PlpA2-Fx (products 1 and 2, SEQ ID NO: 122) and PlpA3-Fx (products 3 and 4, SEQ ID NO: 126) were observed in the LC-MS chromatograms (Fig. 2).
Example 2: Characterization of the products
For initial characterization of the products, a combination of high-resolution and tandem mass spectrometric (MS2) experiments was performed (Figs. 3-6). These data showed that the mass loss could be attributed to a loss of C8H9NO comprising four degrees of unsaturation, and localized the modification to the tyrosine residues (Y21 and Y6, respectively) in the PlpA2 and PlpA3 cores, which are part of a YG core motif conserved in gene clusters encoding PlpX homologs. Based on this information, and assuming that the lost atoms were covalently bonded, it was proposed that an extraordinary excision of a tyramine equivalent from the backbone had taken place. In this scenario, excision of tyramine is followed by concomitant C-C bond formation between C1 of Leu/Met (PlpA2/PlpA3) and C1 of Tyr, resulting in formation of the corresponding α-keto-β3-amino moiety. This functional group is known to be reactive under reductive conditions, and indeed, in the presence of sodium borohydride, a mass shift corresponding to the alcohol was observed in the HRMS spectrum (Fig. 7).
To obtain more detailed structural information by NMR, the co-expression product of PlpA3-Fx + PlpXY was digested simultaneously with trypsin and chymotrypsin, which was found to cleanly provide a peptide fragment (Ala-1 to Trp-12) composed of the two isomeric products 5 and 6 (SEQ ID NO: 127). Each product was purified to homogeneity by reversed-phase HPLC, and the samples were analyzed by NMR. Heteronuclear two- and three-bond correlations observed in the Heteronuclear Multiple Bond Correlation (HMBC) spectra showed cross peaks to the newly formed ketone and amide carbonyls, which possess characteristic chemical shifts of δ ~195 and ~160 ppm, respectively, as reported for other natural products (Figs. 8 and 9; Fukuhara, K. et al., Org. Lett. 17, 2646-2648 (2015)).
PlpA3-Fx was used to investigate the origin of the β-amino acid moiety by feeding various 13C-labeled amino acids to E. coli expression cultures. In individual feeding experiments, labels from [1-13C]Met, [U-13C]Met, [1-13C]Tyr and [U-13C]Tyr were detected by MS in the peptide products. NMR-based characterization of the purified core fragment 6 revealed enhancements of carbon signals that were consistent with Met remaining fully intact (Fig. 10), while only C1 of Tyr is retained and accounts for the amide carbonyl in the product (Fig. 11). These data confirm that PlpX catalyzes an extraordinary reaction involving excision of almost the entire Tyr moiety (all except the carbonyl) and reconnection of the remaining protein sections (Fig. 12). This modification, representing a non-canonical splicing process, is unprecedented for proteins.
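The labeling logic can be made explicit. Under the interpretation above (Met fully intact, only C1 retained from each excised Tyr) and assuming complete incorporation without metabolic scrambling, each feed should raise the peptide mass by a predictable multiple of the 13C-12C mass difference. The sketch below is idealized; the table name, function name and residue counts passed to it are hypothetical inputs, not values from the patent.

```python
# Mass difference between 13C and 12C (Da).
C13_SHIFT = 1.0033548378

# 13C atoms contributed per residue by each feed, assuming Met stays intact and
# only C1 of an excised Tyr remains. Tuple: (per Met, per intact Tyr, per excised Tyr)
LABELS_PER_RESIDUE = {
    "[1-13C]Met": (1, 0, 0),
    "[U-13C]Met": (5, 0, 0),
    "[1-13C]Tyr": (0, 1, 1),
    "[U-13C]Tyr": (0, 9, 1),
}


def label_shift(feed: str, n_met: int, n_tyr_intact: int, n_tyr_excised: int) -> float:
    """Idealized mass increase (Da): complete labeling, no isotope scrambling."""
    per_met, per_tyr, per_excised = LABELS_PER_RESIDUE[feed]
    n_labels = per_met * n_met + per_tyr * n_tyr_intact + per_excised * n_tyr_excised
    return n_labels * C13_SHIFT


if __name__ == "__main__":
    # Hypothetical fragment: one Met and one excised Tyr, no other Tyr.
    for feed in LABELS_PER_RESIDUE:
        print(feed, round(label_shift(feed, 1, 0, 1), 4))
```

For such a fragment, [U-13C]Tyr would add about 1.003 Da, whereas retention of an intact Tyr would add about 9.03 Da; this difference is what makes the uniformly labeled Tyr feed diagnostic.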
To obtain further insights into the prevalence and distribution of this modification, a large-scale sequence analysis of taxonomically diverse Cyanobacteria was performed, comprising 436 combined published and newly sequenced organisms. Homologs of the rSAM genes were detected in 53 of these 436 genomes, all of them located in proteusin gene clusters, suggesting the generation of β-amino acid products. All precursors contained at least one YG motif, and further YG copies were identified in many of the cores. To test whether these sites collectively direct multiple tyrosine excision events, pcpA (Fig. 1b) from Pleurocapsa sp. PCC 7327 was co-expressed with its cognate excisase gene partners pcpX and pcpY. The excision reaction was detected at two of the three YG motifs. With a translated core containing 64 aa and three predicted YG motifs, the pcp pathway generates a giant natural product that may exceed the size of all specialized metabolites reported to date.
The introduction of an α-keto-β3-amino amide by PlpX has wide-ranging potential applications in drug discovery, chemical biology, and synthetic biology. To determine the versatility of PlpX, a series of mutations was created at the residue that is converted to the α-keto-β3-amino amide in PlpA3-Fx (M5). The results show that PlpX is very promiscuous and can convert nearly every type of residue at this position (Fig. 13).
Example 3: Materials and Methods
General Experimental Procedures. Factor Xa protease was purchased from Merck (USA). Restriction enzymes and GluC were purchased from New England Biolabs (USA). Thermo Scientific (USA) Phusion® DNA polymerase and T4 DNA ligase were used for all PCRs and ligations, respectively. DNA primers were obtained from Microsynth (Switzerland) or Thermo Scientific (USA). Antibiotics (chloramphenicol for pACYCDuet-1 and kanamycin for pRSFDuet-1) were used at a concentration of 25 µg/mL in solid and liquid medium. Protino® Ni-NTA resin and NucleoSpin plasmid and gel purification kits were purchased from Macherey-Nagel (Germany). LC-MS experiments were performed on a Dionex UltiMate 3000 UHPLC coupled to a Thermo Scientific (USA) Q Exactive mass spectrometer. LC-MS measurements were carried out using solvents A (water + 0.1% formic acid) and B (acetonitrile + 0.1% formic acid). All HPLC columns were purchased from Phenomenex (USA). NMR spectra were acquired using a Bruker (USA) 500 MHz Avance III spectrometer equipped with a 5 mm TCI cryoprobe. 13C-labeled amino acids used in labeling experiments were purchased from Cambridge Isotope Labs (USA).
Cloning of precursor and excision enzyme constructs for protein expression. Expression vectors containing NHis6-precursor genes were constructed as follows. Mini-preps of previously reported plasmids (plpA1-Fx, plpA2-Fx and plpA3-Fx in pET-28b; see Morinaka, B. I. et al. Angew. Chem. Int. Ed. 53, 8503-8507 (2014)), carrying NHis6-precursor genes with a Factor Xa site (IDGR) at the interface of leader and core peptide, were digested with NcoI and EcoRI. The precursor peptide inserts were gel-purified and ligated into MCSI of pACYCDuet-1 to give pA1FxACYC, pA2FxACYC and pA3FxACYC. These precursor constructs were sequence-verified. Constructs for the excision enzyme (PlpX) and associated protein (PlpY) were constructed as follows. The gene for the excision enzyme was amplified by PCR (primers PlpX_F, CTCGCATATGACTAAAAAATACAGACGAGTTAGTTAT (SEQ ID NO: 144) and PlpX_R, ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA (SEQ ID NO: 145)), and the products were gel-purified, digested with NdeI and XhoI and ligated into MCSII of pRSFDuet-1 to give plasmid pXRSF. Following sequence verification, the gene for the associated protein was amplified by PCR (PlpY_F, GCGAACTCATGAACTCTAATCAAATACCAAATAAA (SEQ ID NO: 146) and PlpY_R, GCGCAGCTGTTATGTCAGAAAATTGCT (SEQ ID NO: 147)), gel-purified, digested with BspHI and SalI, and ligated into MCSI of pXRSF (cut with NcoI and SalI) to give pXYRSF, and the insert was confirmed by sequencing. Precursor constructs were transformed and expressed in E. coli BL21(DE3) cells alone and together with pXRSF or pXYRSF. Proteins containing a Factor Xa cleavage site are denoted with 'Fx'.
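A quick sanity check on this cloning design is to confirm that each amplification primer carries the site of the enzyme used to digest its PCR product: NdeI for PlpX_F, XhoI for PlpX_R and BspHI for PlpY_F. The sketch below uses the standard recognition sequences; BspHI (T^CATGA) and NcoI (C^CATGG) leave the same CATG overhang, which is why a BspHI-cut insert can be ligated into an NcoI-cut vector. The helper name `find_sites` is hypothetical.

```python
# Recognition sequences (5'->3') of enzymes named in the cloning steps.
# All five are palindromic, so scanning one strand is sufficient.
SITES = {
    "NdeI": "CATATG",
    "XhoI": "CTCGAG",
    "BspHI": "TCATGA",
    "NcoI": "CCATGG",
    "SalI": "GTCGAC",
}


def find_sites(primer: str) -> dict[str, list[int]]:
    """Map enzyme name -> 1-based start positions of its site in the primer."""
    seq = "".join(primer.split()).upper()  # drop whitespace left by text extraction
    hits = {}
    for enzyme, site in SITES.items():
        positions = [i + 1 for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    primers = {
        "PlpX_F": "CTCGCATATGACTAAAAAATACAGACGAGTTAGTTAT",
        "PlpX_R": "ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA",
        "PlpY_F": "GCGAACTCATGAACTCTAATCAAATACCAAATAAA",
    }
    for name, seq in primers.items():
        print(name, find_sites(seq))
```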
Cloning of PlpA3-Fx mutants. Mutants were constructed using the New England Biolabs (USA) Q5 site-directed mutagenesis kit according to the manufacturer's protocol, with pA3FxACYC as the template. The identity of the constructs was verified by sequencing. Each precursor was expressed alone and together with pXYRSF in BL21(DE3).
Protein expression and purification of precursors. TB medium (30 mL) containing the appropriate antibiotics was inoculated with 300 µL of an overnight culture grown in LB. The cells were grown at 37°C and 250 rpm to an OD600 of ~1.6-2.0, cooled on ice for 30 min, induced with IPTG (1 mM final concentration), and then shaken for 24 hours (250 rpm, 16°C). The cells were collected by centrifugation (3,220 x g, 10 min). Proteins were purified using Ni-NTA resin (Macherey-Nagel, Germany) according to the manufacturer's protocol, with 10% glycerol added to the lysis, wash and elution buffers. Proteins were adsorbed to 0.5 mL Ni-NTA resin and eluted with 2.5 mL elution buffer (250 mM imidazole, 50 mM sodium phosphate, 300 mM NaCl and 10% (v/v) glycerol, pH 8). Elution fractions were desalted on a PD-10 column, digested with Factor Xa or trypsin, and analyzed by LC-MS and MALDI. LC-MS conditions: column: Kinetex C18-XB, 2.6 µm, 150 x 4.6 mm; flow rate: 1.0 mL/min; mobile phase/gradient: 95:5 A/B for 5 minutes, ramped to 40:60 A/B over 30 minutes.
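The analytical gradient can be expressed as a piecewise-linear function of time, which is convenient for estimating the solvent composition at which a peak elutes. A minimal sketch, assuming the final 40:60 A/B composition is simply held after 35 min (the text does not describe wash or re-equilibration steps); the function name is hypothetical:

```python
def percent_b(t_min: float) -> float:
    """Solvent B (%) at time t (min) for the analytical LC-MS gradient.

    Hold 95:5 A/B for 5 min, then ramp linearly to 40:60 A/B over 30 min.
    Assumption: after 35 min the final composition is held.
    """
    hold_end, ramp_min, b_start, b_end = 5.0, 30.0, 5.0, 60.0
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= hold_end:
        return b_start
    if t_min >= hold_end + ramp_min:
        return b_end
    return b_start + (b_end - b_start) * (t_min - hold_end) / ramp_min


if __name__ == "__main__":
    for t in (0, 5, 20, 35):
        print(t, percent_b(t))
```

The ramp corresponds to 55 % B over 30 min, i.e. roughly 1.83 % B per minute.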
Purification of excision products 5 and 6 from NHis6-PlpA3-Fx + PlpXY. The procedure described above was carried out with 1 L TB medium in an Ultra Yield Flask™ (Thomson, USA). The NHis6-precursor was bound to 6 mL Ni-NTA resin and eluted with 30 mL elution buffer. The elution fraction was buffer-exchanged on a PD-10 column into 100 mM sodium phosphate, 2 mM CaCl2, pH 8 (filtered through a 0.22 µm PES membrane). Trypsin (1:100 trypsin/precursor (m/m)) and chymotrypsin (1:20 chymotrypsin/precursor (m/m)) were added to the buffer-exchanged elution fraction, and the mixture was incubated overnight at 37°C. The peptide digest was desalted by C18 SPE (Strata C18-E, 2 g, Phenomenex, USA) to give 39 mg of digested peptide. This material was subjected to reversed-phase HPLC (column: Phenyl-Hexyl, 5 µm, 10 x 250 mm; flow rate: 4.5 mL/min; column temperature: 75°C) to give products A (1.7 mg) and B (1.8 mg), which were composed of residues 1-12 of the core peptide. Samples were dissolved in 500 µL DMSO-d6.
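The protease amounts follow directly from the stated mass ratios. As a hedged illustration (the text gives the ratios but not the precursor mass, so the 40 mg used below is hypothetical, as is the function name):

```python
# Enzyme:precursor mass ratios (m/m) stated for the double digest, as 1:N.
RATIO_DENOMINATORS = {"trypsin": 100, "chymotrypsin": 20}


def protease_amounts(precursor_mg: float) -> dict[str, float]:
    """Protease mass (mg) to add for a given precursor mass (mg)."""
    if precursor_mg <= 0:
        raise ValueError("precursor mass must be positive")
    return {enzyme: precursor_mg / n for enzyme, n in RATIO_DENOMINATORS.items()}


if __name__ == "__main__":
    print(protease_amounts(40.0))  # {'trypsin': 0.4, 'chymotrypsin': 2.0}
```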
Feeding experiments with 13C-labeled amino acids. Feeding experiments were carried out as above, except that 150 mL TB medium in 500 mL Erlenmeyer flasks was used. Four separate experiments were carried out, each with one labeled amino acid. At the time of induction, the 13C-labeled amino acids ([1-13C]Met, 21 mg; [1-13C]Tyr, 25.6 mg; [U-13C]Met, 24.6 mg; [U-13C]Tyr, 23.7 mg) were added directly to the medium. The labeled NHis6-precursors were bound to 1 mL Ni-NTA resin and eluted with 6 mL elution buffer. The eluate was desalted and purified as above to give the core peptide fragments for each labeling experiment, which were subjected to LC-MS and NMR analysis.
Reduction of excision products with sodium borohydride. A portion of the PlpA2 core (100 µg) was dissolved in 1:1 MeOH/H2O (500 µL), a solution of sodium borohydride in H2O (500 µL, 1 mg/mL) was added, and the mixture was left standing for 10 minutes. The reaction was quenched by neutralization with acetic acid and subjected to LC-MS analysis (Fig. 7).
Sequence listing
SEQ ID NO: 1 (Formula I, claim 3)
SEQ ID NO: 2 (Formula II, claim 3)
SEQ ID NO: 3 CXXXCXXC (X is any natural amino acid)
SEQ ID NO: 4 EXTXXCXXXCXXCGXRXXXXRXXEL (X is any natural amino acid)
SEQ ID NO: 5 CXLXCXHCGSRAGXXXXXE (X is any natural amino acid)
SEQ ID NO: 6 CX9-15GX4C (X is any natural amino acid and the integers denote the number of X(s))
SEQ ID NO: 7 CX2CX5CX3CX14-18C (X is any natural amino acid and the integers denote the number of X(s))
SEQ ID NO: 8 CXXGXXXXXXXXXGXXKXCP (X is any natural amino acid)
SEQ ID NO: 9 GXCXXCXXXXXCXXXCXXXXXXXXXXXGXNPXCXXR (X is any natural amino acid)
SEQ ID NO: 10 CXAGXXXXXEADGXXKXCPXL (X is any natural amino acid)
SEQ ID NO: 11 CXXCYYXXXCXXGCXWXXXXLXGXXGXNPXCXXR (X is any natural amino acid)
SEQ ID NO: 12 (70% TIGR03913, >WP_052261552)
MAATRFLSKSDVETRRPVYVVWETTLACNLKCKHCGSRAGTPRKDELSTEDAFGLIDELADLGTREITLIGGELF
LRKDWLKLVERISSKGILCTMQSGGFHLTRERLKDAKEAGLAAIGISIDGLEKTHNSLRGVRTSFQHALRSLEAA
RDLHITSNVNTTITSKNIDELELLYEVLKGFHVRNWQFQIVVAMGNAADEETLLFQPYQITKVLDSVARIRTSAS
RVGMLVQASNALGYFGPYEWLWERGSDTDSHWSSCGAGQSTMGIEADGTIKACPSLATEDFGVTPSEYDTL
QDLWIGSDRIRFNETRDPPSGSICGACYYWAACKGGCSWATHSLTGTVGENPYCHYRAICLSELGLKEKIRKVR
DAPGSSFDRGEFEVVIEDEEGKTLKSDDPRKHRIIEKISAGREQGASEEALKLCTSCHQFAWEYEKQCPFCSGTK
FVSTRALKSIKSATKFL
SEQ ID NO: 13 (70% TIGR03913, >WP_035619696)
MDKTSTRYRVRDDYETATPVHVVWEITLACNLKCSHCGSRAGKVRPGELTTEQCFGVIDSLKRLGTREISIIGGE
AFLRKDWLEIIERIHQSGIECSMQSGAYNLNEERIIDAKKAGINNIGVSIDGMPDTHNKIRGRRDSFEHAVNCL
QLLKKHNITSSVNTVITKRSKNELNELLDILIENDVKNWQIQLAVAMGNAVDNSDELIVQPYELIDFYDDLIVIYR
KALAHNILIQAGNNIGYFGPYEHIWRQGNEKYYTGCSAGHTGIGIEADGKIKGCPSLPTSAYTGGNVKDMDLE
DIWKYSEEMVFSRYRNKEELWGGCKGCYYESSCLAGCTWTSHVLFGKRGNNPFCHHRALELKKKGLKERIRKI
QEAPGVSFDMGLFEIIVEDENGVIVEIQSPNSETPVLVPDYSARIPRIPKALKLCNGCDNYVYEEETVCSFCNAD
VQKVN D EYAAKM E RAKQTLE KLELLM M K
SEQ ID NO: 14 (70% TIGR03913, >WP_063775385)
MRSARDRKTHVPVHVVWEITLACNLKCGHCGSRAGKRRANELSTAECLDVVRQLAAVGTREITLIGGEAYLRK
DWLEIAAEIARLGMHCGLQTGARGLTRERIAAAYAAGVRAIGISLDGLRDLHDELRGVKGSHDQAIQAIKWVS
EVGIEPGVNSQINLRSMRELDGIFDEIVAAGAKYWQVQLTVAMGNAVDNSEMLLQPHQIVDVVDKLAELYH
RGRDVGLRLLPGNSIGYFGRHEAYWRSLTDDVTHWGGCTAGETTLGLEADGTIKSCPSLPKSHFAGGETRTSTI
EEALKALESRNVRRDGNRGRSFCGSCYYWNVCRGGCTWVSHVLEGRRGDNPYCYYRATTLARRGLRERIVKV
ADAPDEPFAVGRFEVRLEHEDGSRAPRSIGLDDKPRKRGRLG LCQSCYEYMSLGERHCPHCGEVNRPKLMVD
AKLELDVQIALDEIERHGRSILELAAAAGQDPQAAE
SEQ ID NO: 15 (70% TIGR03913, >OCW56221)
MPVHVVWEITLACNLACGHCGSRAGARRPDELTTAECFDIIRQLREAGTREITLIGGEAYLRKDWLEIAAEISRL
GMLCGLQTGARGLTKARIEAAYDAGVRAIGVSIDGPRDIHDRLRGVAGSHDQAMRAIADIAETGIRPGVNTQI
NALSAPHLWDIYRAIRDAGARSWQTQLTVAMGNAVDNAEMLLQPHLIVDVVEDLYAIFEDGLLHDFRVLPG
NSIGYFGRHEAQWRSITEAAEPWQGCTAGETTLGLEADGTIKGCPSLTRESYSGGNSRTVSIAEAIGNLADRTV
RRDGNPGGGFCATCYYYDFCQAGCTWVTHSLTGRRGDNPFCVYRAGKLRDAGLRERIVKTAEAPDTPFAIGK
FDIVLETLDGKPGPATVADTAPRSVGATHLVVCEQCGQFIANTEPVCVHCHAEQRAAARSRLERVQDVHNLL
AEIDSHSHGIHALVDEISAR
SEQ ID NO: 16 (70% TIGR03913, >WP_060978522)
MVPQAELPHARFFGSADQRRRVPVSVVWELTRACDLSCCHCGSRAGRRAREELSSVECLDLVDQLAELGARD
IGLIGGEVYLRRDWLEIVRRIRQAGMDCSVQTGGRFFTPATLDAAIAAGVMSIGVSLDGVGQTHDLQRGVPG
SFEAALALLHLVANRPIMASVNTQINRHTMQQLPELLDVLIAAKVSNWQLALTVAMGNAADNDDLLLQPYDL
LALFPLLASLHDRAAENGIVLQPSNNIGYFGPYEEKLRLIGDIAAHWTGCSAGDNVLGIEADGNIKGCPGLSRDY
VGGNVRDEPLRVIWDRAERLAFTHRATKSDLWGYCAQCYYAEICMAGCTWTAHSLMGRPGNNPMCHYRA
LEFARRGKRERLVKVADAPGHAFDYGRHALVVEDRPLSDQAVLS
SEQ ID NO: 17 (70% TIGR03913, >SAL02149)
MELKISHHDAPVRVLRPSDRGGRTPVYAVWELTLQCNLACSHCGSRAGKKRTGELSTSEALALVEDLRGLGVR
ELALIGGEAYLRGDWIEIVRRARDLGIRPVLQSGGYGFTATLAQRAKEAGLAALGISVDGLEPIHDDIRGVVGSY
RAAFEALNAAREVGLTVTANTQIHAKNWRSLPDIYTDLRKAQIAAWQLQLTVAMGNAADNDDLLLQPFHLY
ELFPVIAALTDMAKTENVQIYPGNNIGFFGPYEHLWRGPHRFDNHYKGCQAGINTIGIEADGTIKGCPSLATSR
YSEGNVRVVPIKEIWNGNLAFPLNRDWDLKLWGFCKTCYYAKVCRGGCNWTSDSLFGKPGNNPYCHHRVLS
LKKLGKRERVVKVSEADKSSFGTGLFKLIEESW
SEQ ID NO: 18 (70% TIGR03913, >WP_043629091)
MGDAGGRARPARPNRYLSGPDVRGHQLVHAVWELTLACNLKCRHCGSRAGSVRPEEMTTQECLGVVRQLH
ELGAREVTLIGGEAYLRKDWVEIVRSISDAGMECTLQTGAWQLTGTRIAQAADAGLVACGVSIDGLAPLHDHL
RGRPGSFDAAIDALGRLREHGIRTSVNTQITAAVIPQLRDLFREFLATGVRNWQVQLTVAMGRAADNHDLLM
QPYQMNELMPLLAELHAAGVAKGLLLQPGNNIGYFGPYEAQLRGSGTASMHWAGCFAGRNVLGIEADGTIK
GCPSLPTVTYAGGNVRDSSIAEIWASASQLSFARKSRGKEMWGFCASCYYADVCEGGCTWTSHSLLGRPGNN
PYCHYRASELKKQGKRERVVRRLPAPGTSFDHGRFEIVVEPDPDTAGEHRAAGASPQERSRPGPPPAPGDAG
SHLPSLVLCRACDCHVFAGTDFCPHCGADVPASQREYESALADARHAASRLARAIGSLGPELPATTTPRA
SEQ ID NO: 19 (70% TIGR03913, >WP_050043969)
MDQAPDDTPRRRLSLSDQLDCVPVYVVWELTLACNLKCIHCGSRAGHKRAKELTTDECVDVIRQLAELGTREI
SVIGGEAFIRRDWLTII AIRDHGIDCTMQTGGYKLSGDMIRSAADAGLLG LGVSVDGLEPLHDRLRGVRGSYR
EALRVLDDCRRLGLTASVNTQITSAVMSELPQVMEIIIDAGAKYWQVQLTVAMGNAVDHDDILLQPYDLTTL
MPLLAELHLKGRERNLVLLPGNNVGYFGPYDVLWRGPDRGYYSGCPAGQNVIGLEADGTVKGCPSLATDRYG
AGDVRSASIAELWATHPALQFNRNRGTDDLWGFCRDCYYAEACRGGCTWTADSLLGRRGNNPYCHHRVLT
LAERGIRERIVKIEDAPALPFATGRFALIEEPLPAGAPHA
SEQ ID NO: 20 (70% TIGR03913, >WP_020539729)
MSVEEDGLARRVARERDFRDKVPVHVVWELTLACNLKCLHCGSRAGPRRPAELTTEEALDLVAQLAGLGTRE
LTVIGGEAYLRNDWVEIIRRATELGMSCSMQTGARALTPARLRAGADAGLRGIGVSIDGLRDLHDEVRGVPG
AYANAFKVLRDAREAGLRVSVNTQIGARTIGELPALMDELVAAGVTHWQVQLTVAMGNAADHDELLLQPYR
LAELMPLLARLHHEGQRHGLLMVPGNNIGYFGPYEHLWRNASSMSGHWSGCEAGHTALGIEADGTVKGCP
SLPTSAYTAGNVRDLSVADMWRDSPALGFRRGRAGADELWGHCGTCYYADVCDAGCTWTAHSLLGRPGN
NPYCHHRVLELAKKGIRERIVKVADASAAPFGIGEFALVEEAIPADPRPADPRPAVPGRAERAPGDRGVPELRL
CGDCAHFVMAGDAACPFCSADVAEAEERARAVHRRRTELMDQVKALMRQET
SEQ ID NO: 21 (70% TIGR03913, >KYF96555)
MSRAVRDRETDDFRRNVPVHAVWEITLACDLKCRHCGSRAGARRDRELDTAECLEVVAALARLGTRELSLIGG
EAYLRSDWIDIIRAARAARMRCAVQTGGRNLTEARLAAAVGAGLQAVGVSIDGLAPLHDELRGVPGSFSRAID
AVRRARAHGIAASVNTQIGARTMDDLPALLDAIAAAGATHWQIQLTVAMGNAVDNDAVLLQPYQLLELMPL
LDRLYREGLDRGLLMVVGNNIGYFGPFEHRLRGVGDESVHWTGCAAGQNVIGLEADGTVKGCPSLRTSGYA
GGNVRDLRLEDIWNHAEEIHFGRLATVDALWGFCRTCYYADVCRGGCTWTADSLFGRPGNNPYCHHRALEL
DRRSLRERVVKKREAPGAPFAIGEFELIVEPVPGREGASPAPPVTS
SEQ ID NO: 22 (70% TIGR03913, >WP_013570078)
MLLSEPRKMETPPRYLTNQDFQRYIPVHVVWEITLACDLKCMHCGSRAGHRRPDELTTQECLDVVDALARLG
TREVTLIGGEAYLRKDWTQIIRRCADHGIYCATQTGGRNFTQQRLQEAIDAGMNGLGVSLDGLEPLHDRLRG
VPGSFRQALDTLQRAHDAGLSISVNTQIGAEIIPQLPELMEIILGAGAKQWQIQITVAMGNAVDHPELLLQPH
RVLELMPMLARLYQEAAERGMLMVTGNNIGYFGPYEHIWRGFGDERVHWSGCNAGHTGLALEADGTVKG
CPSLATVGFSGGNVRDLTLEHIWNHSKEIHFGRLRSIEDLWGFCRTCYYAEVCRGGCTWTSHSLLGKPGNNPY
CHYRALELEKQGLRERVVKIRDAAPDSFAVGEFELITERIEDGTVAASSVSQKQLVQLGLSNDERWSPREGRVP
PTLKLCRACNEYVWPHEVDCPHCGENIAASLAQYQEDTRRRREAMDAVRALIESYRAAETSVVPST
SEQ ID NO: 23 (70% TIGR03913, >AAY91426)
MGWCRAPPGAQGTLMSDRLPARYLSETDLKRFVPVHVVWEITLACDLKCLHCGSRAGHRRPGELNTQECLS
VIDSIAALGTREVTLIGGEAYLRKDWTRLIQAIHDHGMYVAIQTGGRNLTPAKMQAAVDAGLNGVGVSLDGL
APLHDAVRNVPGSFDKALDTLRRAKQAGLKVSVNTQIGAATLPDLPALMELIIDAGASHWQIQLTVAMGNAV
DHPELLLQPYQLLEVMPLLARLYREGAERGLLMNVGNNIGYYGPYEHMWRGFGDDRVHWSGCAAGQTVLA
LEADGTVKGCPSLATVGFSGGNVRSMSLHDIWHYSEGIHFGRLRSVDDLWGYCRTCYYNDVCRGGCTWTSH
SLLGKPGNNPYCHYRTLDLAKKGLRERIVKLEDAGPASFSVGRFDLITERIDTGEAVSSVNDSGQVIKLAWVNQ
GQASPEEGRIPPRLALCRSCLEYIHAHESTCPHCNADVAAAEARHQEDRLRQQALINTLHQLLGTPQEQPRL
SEQ ID NO: 24 (70% TIGR03913, >WP_062765523)
MPDGQTPAERAIPRARSLDDIVDLVPVHVVWEITLACNLKCQHCGSRAGHVRKGELSTAECLDLVDQMARLG
TREVTLIGGEAYLRRDWLEILARVRAHGILCLIQTGGRNLTDKRLEAAIAAGINGIGVSIDGLAPLHDRLRGVPGS
FDQAMSALTRAKAAGLSISVNTQIGAETMEDLPELMDRIIAAGATHWQIQMTVAMGNAVDNPDILLQPYRLI
ELMPLLARLYREGARRGLTMVVGNNIGYFGPYESLWRGFGNEAVHWTGCAAGQNVIGIEADGTIKGCPSLAT
HGYAAGNIRDLALDDIWRNQEAMAFGRTRSVEKDLWGYCRSCYYADVCRAGCTWTSDSLLGRRGNNPYCH
YRVRDLAKKGRRERVVKIREAGPESFAVGEFALIEEAIPPEELAALPPDRLPGFSATDGRPVHIAPEDDDDPARA
AGARGRIPPHLDLCRSCHEYVWPHETDCPHCGADIAAAAAAYAARLAEVQALAARIRARLDEAAPGI
SEQ ID NO: 25 (70% TIGR03913, >WP_014249603)
MDQPLPSPDLSARSPEQLPEQPASVPARYCFDDDFRKLVPVLVVWETTLACNLKCQHCGSRAGRPRPDELTTE
EALDLVDRLAALGTREISLIGGEAYLRKDWVEIIRRCRSHGIRTAVQTGGRNLTDRRLDEAVAAGLQAIGVSIDG
LPDLHDRVRGVSGSYDQAMSALRRAKDRGLAVSVNTQIGPETPDHLPELMNRIIEAGATHWQIQFTVAMGN
AVDNPDLLLQPHRLLDVMPLLARLYREGLDRGLLMVVGNNVGYYGPYEHIWRGLGDDRMHWTGCAAGQI
GIGIEADGTLKGCPSLATSLYAAGNIREMSVEDIWRHADRMQFGRLRSVDELWGYCRTCYYADICRAGCTWT
SESLLGKRGNNPYCHYRVLDLAKHGLRERVVKIKDAPNEAFAIGEFALIQEPIPGAESPGLPERDPAKVHRHEYE
RSLEGGVVPPSLTLCRACNQYVWPHETDCPHCGADVAAAAAQHGIDSARRRALIRETQRLLDEARAAKLEAA
GAPSPATAAGD
SEQ ID NO: 26 (70% TIGR04103, >WP_020737613)
MIEPTEIPVRALMPADLMEAKPIYAVWELTMKCDQPCQHCGSRAGAARDAELSTEEVLEVAASLARLGCREV
ALIGGEAYLREDLAEIVSFLARSGMRVIMQTGGRAFTAERAKALRAAGLTGLGVSVDGPAHIHDELRGNVGSH
AAAIRALDNARAAGLITTANTQINRLNAHLLRETCAELRSHGIQTWQVQITVPMGRAADHPEWILEPWRVVE
VIDTLAAIQREALETHVSGVPFNVFANNNIGYFGPHEQLLRSRPGGGDAHWRGCRGGINAIGIESDGTVKACP
SVPTVPYAGGNVRERGLEHIWEGSAEVRFARDRDASELWGHCATCYYADECRAGCSWTAHCTLGRRGNNPF
CYHRVTQLKRRGIRERLVMKQRAPHVPYDHGVFELVEEAWDAPPPPPPEPVVPRNARRRLAVV
SEQ ID NO: 27 (70% TIGR04103, >SFD76092)
MNSPARALRPDDLRQPRPIYVVWETTLRCDHECAHCGSRAGDARPDELSTEELLEVADALVRLGSREVTLIGG
EAYLRGDCYRLIEHMTKAGIRVTMQTGGRGLTQDRCRKLREAGLAAIGVSVDGPEAAHDTLRASPGSHAAAL
KGIRNAREAGLLVTSNSQINRLNKDVLRETAELLADAGVAVWRAQMTAPMGRAADRPDWLLEPYMVLEVID
TLADIQRWAQRRAADRGIPWERAFHVRLGNNLGYFGPHEQLLRTRPGSPDSYWQGCSAGKFVMGIESDGTI
KGCPSLPTAPYTGGNVKTAALADIWNEAPEIAFARDRGTSELWGFCKSCYYAEVCRAGCSFTAHSAIGKRGNN
PFCYYRATQMKRKGLRERVVLRQAAPGDPYDFGQYEIVEEPWDSAPPRAAVRLPVLAG
SEQ ID NO: 28 (70% TIGR04103, >KIG18351)
MSSSRTIREHEVDQPRPIYTVWEITLRCDHACAHCGSRAGPVRDDELDTAELLAVADALVELGSREVTLIGGEA
YLRSDVYQLVEHLAKAGVRVTMQTGGRGLTAARAQRLRDAGLAAVGVSIDGTAAVHDRLRASPGSHDAAM
RAIEHARAAGMVVTSNSQINQLNMHELPAIAAELEAAGVLVWRGQLTAPMGRAADHPEWIVQPYMVLEIID
TLAQIQAGASARAQARGASEMESFRVTLGNNLGYYGPHEPLLRSRPDRRDRFFPGCQAGRYVLGIESDGTVK
GCPSLPTAPYQGGNVRELSLEQIWDSEAIRFTRDRSTDELWGHCASCYYADVCRAGCSFTSHSTLGRRGNNPF
CYYRADKLRKQGLREVIVHARAAPGSPYDFGGFELREQPWSDLPPPAGRRSLPVVTS
SEQ ID NO: 29 (70% TIGR04103, >WP_006974883)
MAGRRLDVLSTDFFPAYVVWELTLRCDLACRHCGSRAGPARPVELTTSEAVAVAEELGRMGAREVVLIGGEA
YLHEGFLEVVEALARAGVVPVMTTGGRGVDEALARAMAEAGLRRVSVSIDGLEPTHDRMRGFRGSFAAALA
ALDHCAAAGLSISANTNLNRLNWGDLEALYEQHLRGRVRSWQLQITTPLGRAADRTAMIFQPFDLLELLPRVA
ALKRRAFAEGVLILPGNNLGYFGPEEGLLRSQTPEGTDHWQGCQAGRFVMGIESDGAVKGCPSLQTAAYVG
GKVLERGLAEIWNEAPQLAFTRERSVEDLWGYCRGCVFAKTCLGGCSFTAHAVFGRPGNNPYCHYRARDFAK
RGLRERLVPTEAAEGTPFDNGLFEVVEEPLDAPDPEAALEPRELVQITRRPGSTPAQTPRDPAGGGA
SEQ ID NO: 30 (70% TIGR04103, >WP_012234464)
MWRSAQRARARARARARARARARAQVIIHPPRDRKCTGEQEPAAEFAVGSGMALRRLDVVPGEYFPAYVV
WELTLRCDQPCRHCGSRAGAARPSELGTDEALGVVRQLAAMGAREVVLIGGEAYLHDGFLEIIAALKAAGVRP
TMTTGGRGITAEIAAQLKEAGLHSVSVSVDGLERAHDLIRKAPGSHGSALAALGHLRSAGLLTAANTNLNRVN
QGDLEALYDLLREQGIKAWQVQITAALGRAADRPAMLLQPYDLLDVLPRIAELKRRAFRDGITVMPGNNLGYF
GPEEALLRSLREGGRDHFRGCQAGKLVLGIESDGAVKGCPSLQSDAYVGGDLRGRALQEIWDEAPRLAFARA
RTADDLWGFCRSCAFAEVCMGGCTFTAHALFGRPGNNPYCHFRARTLAAQGKRERLVPAEPAAGRPFDNGL
FELVLEDLDGPEPGLDSPEQLVQLTRKPRPSS
SEQ ID NO: 31 (70% TIGR04103, >WP_002625456)
MSMLRRLDVTRPDHHPAYVVWELTLRCDQPCTHCGSRAGTQRPDELSTAEALDVVRQLREMRTREVVLIGG
EAYLHPGFLDIIRALKEAGIRPGLTTGGRGMTEALARQVAEAGLYAASVSIDGLEPTHDLMRAAPGSFASATAA
LGFLSAAGVRVAVNTNFNRLNQADLEPLYEHLKGLGPRAWQLQITAPLGRAADRPALLLQPWDLLDLLPRIAA
LKQRAFADGITLMPGNNLGYFGPEEGVLRSPGPDASDHWRGCMAGRYVMGIESNGAVKGCPSLQTAHYVG
GNLRERPLRDLWDNAPPLAFTRTRTVEDLWGFCRTCPFASTCMAGCSFTAHALFGRPGNNPYCHYRARTLAK
QGVRERLVPQAPAPGKPFDHGLFDLVVEPLDAPDPRPPTPRMLVKRLKWPEAHTPQAVGARTDPTG
SEQ ID NO: 32 (70% TIGR04103, >WP_010607032)
MTNTLENAGVKVREYDRSTYAVWEITLKCNLACSHCGSRAGDKRADELSTTEAFDLIKQMADLGVKEVTLIGG
EAYLRPDWLMIASEIKNKGMRVTMTTGGYGISRGTAKRMAQAGIEAVSVSIDGLEDEHNSIRGKSDSWSQCF
TTLSQFKDLGVHTGVNTTVTRKSAKDLPLLYEKLIDVGVKNWRIQLAVPMGNAADNNEMLMQPYELLDLYPL
LGLLSVRGRKDDLIIQPGNNVGYFGPYERLLRGTFMQESKYSFYTGCVAGQGAIGIEADGKIKGCPSLPSEEYTG
GNIRERTLKDIYENAPELNFNSQEMDDASIAHLWGNCKGCKYAKLCRAGCNWTAHVFFGKRGNNPYCHHRA
LTLAASGLRERFQQRVAASGIPFDHGVFEIYEESISATSNDNVNRFTIQGINFPPSWLATDSNLRERLNSEKLNAI
HQYRALGLAKAV
SEQ ID NO: 33 (70% TIGR04103, >WP_010607027)
MIVLVARYAPPIRLELLGVKNMSVNMDSAGIKIKKEARQTYAVWEITLKCNLACSHCGSRAGDSRVNELSTTEA
LDLVHQMADLGIKEVSLIGGEAFMRPDWLMIAAEITRLGMKASMTTGGFGISEGTAKRMKQAGISTVSVSID
GLEKEHDLLRGKVGAWKQCFLTIERLTNVGINVGCNTQINRYSAKQLPLLYQKLVDVGARAWQLQLTVPMG
NAADNDEMLLQPNELLDVFPLINFLSVRGRRDGLAVQAGNNIGYFGPYERQLRDNKSTHSEWAFYRGCGAG
QNTLGIEADGSIKGCPSLPTNAYTGGNIRERSLRDIYENTDELRFNDINKPEDATKHLWGECATCEFAKVCRGG
CNWTSHVFFGKRGNNPYCHHRAVKMAVRGKQERFFIREKASGDPFDHGVFDLVVEDFKPLDPQDTSVFSLA
QAQFPENWLEADPNLVRRLFTERGLVMKQYVDSGIVPKEESPWFDKTQREALMSSAVPA
SEQ ID NO: 34 (70% TIGR04103, >WP_054014533)
MTDLTNRSDIRINAAYRQTYAVWEITLKCNLACNHCGSRAGDARVDELSTSEALDLVAQMADLGIKEVTLIGG
EAFMRPDWLQIAAEITKKGMKATMTTGGYGISLGTAKRMKEAGIAAVSLSIDGMERSHDLLRGKQGAWQKC
FETIAHLREAGIPVGCNSQVNRESIAELPSLYEELLKAGISAWQLALTVPMGNAVENSHILLQPYELLDVFPLLAY
LSKRGNSEGIRVHMGNNIGYFGPYERLLKEPIASEAKWAFTRGCSAGQNAIGIEADGSIKGCPSLPSAEYTGGNI
RDRKLQDIYQNSPELRINDITTPEDATRHLWGECSSCEFASVCRAGCHWTAHVFFGKRGNNPYCHHRALKKA
AKNQRERFYVKQAAPGKPFDHGVFAIHDELCIVNEDSGQFRIDDMAIPDRWQEDGLDLLALIREEKASAIESYR
SLVN
SEQ ID NO: 35 (70% TIGR04103, >WP_055410774)
MRGRRRTYAVWELTLACNLACGHCGSRAGARRPAELSTAQALDVVAQLDAVGIDEVTLIGGEAFLRRDWLTI
AAEITRRGMGCTVTTGGYRLSAAMARGLRAAGVTQCSVSVDGMTATHDRLRGRVGSWESCFRTMERLRSA
GVEATCNTQINRLTAPELPRLYQRLRSAGVVAWQWQLTVPMGNAADHADLLLQPVELLEVFPVLARIARRAS
QDGVRIHAGNNVGYYGPYERLLRSPEGSAFWTGCQAGLSTLGIESDGTIKGCPSLPTRDYAGGNILDRSLTDLL
RDAPELGINLTAGTAAAAENLWGFCRGCTYADVCRGGCTWTAHTFFGRPGNNPYCHHRALTHQRAGRRER
LVQTAPAPGEPFDHGLFSLLEEDLDTPWPSTERPCLTAADIRWPADWVD
SEQ ID NO: 36 (70% TIGR04103, >WP_047858470)
MPPPAVQTRTAYAVWELTLKCNLACGHCGSRAGDSRKNELSREEALDLVRQLAEVGIQEVTIEGGEAFLRPD
WLDIARAITDHGMLCTMTTGGYGLSRETARRMKEAGIAHVSVSVDGLEATHDRIRGRKGSFRFCFETLGHFRE
VGLPFSSNTQVNRLSAPELPALYERLRDAGIRAWQVQLTGPMGNGTDNAWMLLQPAELPDLYRMLARVAL
RVREESRLSLVPGNDVGYFGPYDDLLFSSSGAKVWAGCKAGLSVLGIHADGGIKACPTLPSEFVGGNIRQQPL
ADILETRELTFNVDAGTPEGIAHLWGHCASCRYAEACRGGCSQRAHVLFNRRGNNPYCHHRSLRLAESGVRE
RVVRAAPGTGLPFDHGVFELVEEPLESPWPSDDPHHFTYERVEWPPGWEAFPLPGV
SEQ ID NO: 37 (70% TIGR04103, >WP_002708735)
MSTQLRQRRTYAVWEITLKCNLACQHCGSRAGEARQDELSTAEALDLVQQMAEAGIGEVTLIGGEAFLRKD
WLEIAAEINRCGMICTLTTGGYGISAELARRIKQAGLASVSVSIDGMEASHDAQRGKAGSWKFAFESLQHLRH
AGVPITANSQANRLSAPEFPLFYEKLVEVGVGGWQIAMTVPMGNAADNSWLLLQPAELLVLHPMLAYLARR
GRREGLIMQPGNNVGYYGPYEKLLRSYGSDNDWAFWRGCKAGLALIGIEADGTIKGCPSLPTNAYAGGNIRR
HSLRDIVLNAEKMQINMSTGTEQGTDHMWGFCKSCEYAELCRGGCSWTSHVFFDKRGNNPYCHYRSLVHA
AHGIREDLRIKRNAFGLPFDNGEFEISEKALGAAWSGNEEQRLTPDRIQWPEQWLQEDSELKGFIQNEIDHNI
GNMRNYLGLTRKHKLAV
SEQ ID NO: 38 (70% TIGR04103, >SEA53645)
MTDTQTTTQYTPGERSCYAVWEITLKCNLACSHCGSRAGDARVNELSTAEALDLVHQMADVGITEVTLIGGE
AFLRSDWLEIAAEIVKCGMICSMTTGGYGINLTTAQRMKAAGINQVSVSVDGMRHTHDRLRGKIGSWKYAFE
TMGHLREAGIPFGANTQVNRHSAPEFPLLFQALIDAGAKAWQIQMTVPMGNAADNSDILLQPDELLLFHPLL
ANLAKRGYPQGFYVQPGNNYGYYGPYDRMLRGFGKPTEWSFWQGCFAGLRTIGIEADGTIKGCPSLPTAAYS
GGTIRDASLATILTERDELTFNLSAGTPAATDHLWGFCKTCDFAELCRGACNWTAHVFFNRRGNNPYCYHRSL
VNAANGVRERFALRKAASGLPFDNGIFDLFAEDALSTAAADPMRFTSDKIQWPQAWLAENPKLSSALQHEVE
QNVRDMRNARLSAQI
SEQ ID NO: 39 (70% TIGR04103, >WP_012628459)
MSTQEDYYRTRYAVWELTLKCNLACQHCGSRAGQPRTQELTTAEALDLVQQLAQIGIREVTLIGGEAFLRPD
WLEIAAAIARAGMICNLTTGGYGLSLQLAQAMQRAGIAAVSVSIDGLETTHDRLRGKQGAWHSAFRTMQHL
RQVGVPFACNTQINRLSAPELPRIYEQIRDAGVYAWQIQLTVPMGHAADHWEILLQPCELLDLFPLLAQIAQW
AAQEGVRLYPGNNVGYYGPYESLLRGGGHPGAVWQGCGAGLNTLGIEADGTIKACPSLPTSAYAGGNIRDQ
PLASM MAQSEALRFNFNAGLPEGTAHLWGFCQ.TCEFAALCRGGCNWTAHVFFGRRG NPYCHHRALN LA
RQGLRERLALNIPAPGLPFDHGQFLLFQEPLNAPWPEPDPLYFTADQVQWSSSWTEQPVKVC
SEQ ID NO: 40 (70% TIGR04103, >WP_019503880)
MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEEAFNLVRQLADVGIKEVTLIGGEAFMRSDWLE
IAKAVTEAGMICGMTTGGFGVSLETARKMKEAGIKTVSVSIDGGIPETHDRQRGKKGAWHSAFRTMSHLKEV
GIYFGCNTQINRLSASEFPIIYERIRDAGARAWQIQLTVPMGNAADNADMLLQPYELLDIYPMLARVAKRAKQ
EGVRIQAGNNIGYYGPYERLLRGSDEWTFWQGCGAGLNTLGIEADGKIKGCPSLPTAAYTGGNIRDRPLREIV
EQTEELKFNLKAGTEQGTDHMWGFCKTCEFAELCRGGCSWTAHVFFDRRGNNPYCHHRALKQAQKDIRERF
YLKVKAKGNPFDNGEFVIIEEPFNAPLPENDLLHFNSDHIQWPENWQNSESAYALAK
SEQ ID NO: 41 (70% TIGR04103, >WP_015143117)
MTYRRTSYAVWEITLKCNLACSHCGSRAGHTRAKELSTQEALDLVRQMADVGIIEVTLIGGEAFLRPDWLQIA
EAITKAGMLCSMTTGGYGISLETARKMKAAGIASVSVSIDGLEETHDRLRGRKGSWQAAFKTMSHLREVGIFF
GCNTQINRLSAPEFPLIYERIRDAGARAWQIQLTVPMGRAADNANILLQPYELLDLYPMIARVARRARQEGVQ
IQPGNNIGYYGPYERLLRGRGSDSEWAFWQGCAAGLSTLGIEADGAIKGCPSLPTSAYTGGNIREHSLREIVEES
EQLRFNLGAGTSQGTAHLWGFCQTCEFSELCRGGCTWTAHVFFNRRGNNPYCHHRALFQAEQGIRERVVPK
VEAQGLPFDNGEFELIEEPIDAPLPENDPLHFTSDLVQWSASWQEESESIGAVVD
SEQ ID NO: 42 (70% TIGR04103, >WP_047157009)
MNYRISYAVWEITLKCNLACQHCGSRAGHTRTKELSTEEALDMVKQLAEVGITEVTLIGGEAFLRPDWLEIAKA
ITDAGMMCSMTTGGFGITLDTARRMKEAGIRVVSVSVDGLEGTHDRLRGRKGSWQWAFKTIGNLRQVGIFV
GCNTQINRLSAPEFPQIYERIRDAGVFAWQIQLTVPMGNAADNSEILMQPYELLDLYPMIAHVAKRAYKEGV
QVQPGNNIGYYGPYERLLRGQGKDNPWAFWQGCNAGLSTLGIEADGAIKGCPSLPTSVYTGGNIRDYSLRKII
EETEELRFNLGADTPQGTEHLWGFCKGCEFAQLCRGGCSWTAHVFFDKRGNNPYCHHRALTQAKQGIRERV
ELKYRAEGNPFDNGEFALIEEPINAPWPENDPLHFTRDHIQWHGIWQKENKSTPELVAVSK
SEQ ID NO: 43 (70% TIGR04103, >WP_013652855)
MDETVRLARLYDGDPAKLLRTWAGDSPPRIVVWELTLACDLGCRHCGSRAGKARRDELDTQEALDVVRQLA
DLGVAEVILIGGEVYLRDDWFLIAAAVTQAGMTCSLVTGGRGFDAGVVDEALAAGVRIVGVSIDGLPATHDRL
RGVPGSYEAAIATARRIAATGRLTLSVNTQINRLSLPELRAVAERVVELGAVAWQIQLTVALGRAADRPDLLLQ
PWHLLELFPQLVAIKKEILEPGGVQLFPGNNIGYFGPFEAELRYGGDAGHTWMGCGAGRAALGLEADGKLKG
CPSLPTVPYTGGNVRDTPIAELWAHAPEISALGRRTTDDLWGFCGTCRHAAVCKAGCTWTAHALFGRPGNN
PYCHHRAWSLAQTGLRERVVLVEKAPGRPFDHGRYEIVVEPLDAETPDEERLAPVPRARAAALFGLRADAPSA
WSGEDLVEATRSARR
SEQ ID NO: 44 (70% TIGR04103, >WP_068184797)
MPHPPTYDGNPQQLARWRPEHDAPPSIAVWEITLRCDLGCCHCGSRAARARPDELSTTEALDLVRQFADLGL
KEVTLIGGEFYMRDDWDRIAAEINRCGMLCSIVTGARQMTAERVSRAVAAGVGKISISIDGLERTHDAIRGSK
GSWKAATAAARRISDSGIDLSVNTQMNRLTMPELPAVADMLVDIGARSWMVILTAAMGRAADHPSLLLQP
YHLLYLFPLLADIKREKLDPNGIAFFPGNNVGYFGPLAETLRYGSELGHMWGGCGAGDSTLGIEADGRIKGCPS
LPTSDYVRGNIRERPLREIAAELKREKTEAPTQLWGFCQSCRYAARCKGGCTWTSHVLFGRPGNNPFCHFRAL
TMAEAGLVERLEPVAVAPGKPFDFGHCRIVEAPFGPDIESDPLIGMTKLSQVFGLNAGAAGLWSKQELSNTLE
YRQSDKT
SEQ ID NO: 45 (70% TIGR04103, >WP_015929579)
MHRPSDEPTYDGDPRSLARWRPGGSAPPSHAVWEITLRCDLGCRHCGSRAGRARRDELSTDAALDVVAQLA
DLGLREVTLIGGEFYLREDWDRIAAAITRRGMLCSIVTGARQMTRARIARAVAAGVGKISLSIDGLEQTHDSVR
GSAGSWQAAVTAGRRIASSGIDLSVNTQINRLTMPELPGVADLLVEIGARSWMVILTAAMGRAADRRALML
QPYHLLHLFPLLAAIKRERLDPAGIAFFPANNIGYFGPLAETLRYGAEGGHAWAGCDAGVASLGIEADGRLKGC
PSLPSADYTMGNVRDHSLAQLWAKRTPNRPIAAAEDLWGFCWTCPHATRCRGGCTWTSHVLFGRRGNNPF
CHYRALALAERGFAEAIEPVSVAPGEPFDFGRHRIVELPLPTALTDDPVIERTLASHVFGLRPGTASVWSPDERE
EAVI
SEQ ID NO: 46 (70% TIGR04103, >SFE54945)
MSLRDNRRRLPVVASLPRPDDRRALTRVAGEAPRPRYAVWELTLKCDQKCIHCGSRAGVARHGELTTAEALA
LVTDLRALGIGEITLIGGEAYLRDDFILIARAIRNAGMDCTMTTGGLNLGEARVAALAEAGIRSVSVSIDGTQAA
HDALRGVPGSFDRAFAALARLRAAGVGRAVNTQINRLTLPTLEALQERLIAEKIGGWQLQITAPFGNAADHPEI
LLQPFMMLEVFAVIERLIARGAPHGLRLFPANNLGYFGPLEAELRGRQKAGGHYKGCIAGRHALGIEADGTIKG
CPSLGGPANVAGNVRERPLREIWEHAPELQFTRVRTVDDLWGYCRDCYYNDVCMAGCSATSEPLLGRPGNN
PYCHHRALELGRAGLRERIEAVAPAPGVPFDHGLYRVVREALDPELAARGPVAVDDPRVGRDVQPFGPGAPV
G
SEQ ID NO: 47 (70% TIGR04103, >SFE62827)
MGIREERRRLPIVALAPPTRSRRALPIAAGQAPVPRIVVWEFTSACDQHCAHCGPRSGKRRPDELTTEEALRLV
DELAAAGVGEVTLIGGEAYLRPDVLRIVRAIRERGMSCTMTTGGYSLTREIAEALVEAGVQSVSVSIDGLAACH
DALRGRPNSFARAFAALRHLKAAGSQISANTQLNAKTLPDLEGLLELLAAEGIHSWQVQVTMAHGAAADHPE
ILLQPYQMIAAYEVVERLLARCEALGIRLYPGNSLGYFGPLEHRLRRNSTQRGHYFGCQAGISGAAVSSHGEVK
SCPSLGEEGVGGSWREHGFAALWERAPEIVYMRQRTRAELWGLCASCYYAAVCMGGCTSMSEPLLGRPGN
NPMCHHRALELDRQGLRERIEPVRPAPGQPFDHGLFRLILEHKDPELRALHGPLTIEEPRRSRVDEPRGPGSPL
A
SEQ ID NO: 48 (70% TIGR04103, >SFE67100)
MLPPPRGAVQRPALAVWEFTRACDQRCKACGPRAGVARPDELTTDEALRLVDELAELGVGEVALIGGEAYLR
ADVLWVIRRIRERGMSCSLTTGGLGLTQTRAEALVEAGLQLVSVSIDGLEASHDALRGTPGGWRRCFEALAHS
RRAGARIAANTQINRLTWRELLPLCDLLADAGAEVWQMFLTMPHGNAADHPELLLQPFELLELFPELERVIAR
CAARRIRFWPGNNLGYFGPLEGKLRRLQQEDGHYKGCSAGRTGLGIEADGTIKSCPSVGGAVNAGGNWRDH
GLRALWERAPEIRYVEQRGLDSLWGYCRECYYADTCMGGCTAMSEPLLGRPGNNPYCHHRALEMDRMGLR
ERVEQVAGADDQAFAHGLFRVVREPKAAGAG PVTIEGPRTGREAAFFGPGAPLAVAGADEP
SEQ ID NO: 49 (70% TIGR04103, >SFF23105)
MSLSDVRRRLPVVASLPAPANRWLTHEDRREAKAPRWAVWELTLACDQHCAHCGPRAGHKRPDELSTEECL
KVVRELAELGCGEVVLIGGEAYLRNDFILIIRAIREAGMACTMTTGGLNLTQERAEAMIEAGIGSVTFSIDGLEAT
HDRLRGVQGSWQRAFAAMRRIRAAGGKIASNTQINALTRHELLPLFELLADEGIHSWQLQITVPHGNAADHP
EILLQPHMFIDIFATLEQVLDRCEARKVRLWPGNNLGYFGPLERRLRQSQRKHWRGCTAGVSVMGIESDGAIK
NCPSLGGGTNIGGNWRVHGVKKVWEESYQLGYIRARTVDDLWGYCRECYYAETCMAGCTAAAEPLLGRPG
NNPFCHHRALMMDRAGLRERIEQIRGAGGKSFDNGLFRVIREHVDPELREKHGPVAIEEPRVSRLEEPYGAGH
TVAL
SEQ ID NO: 50 (70% TIGR04103, >WP_006972642)
MRLKEVRKRLPVVDSLPKGRGRRFRTHEAEGPVPRPALAVWEFTLACDHRCLHCGPRAGEARPNELTTDEAL
QLVDELAEAGVGEVVLIGGEAYLRNDFLLVIRRIRERGMTCTMTTGGLGLTKTRAEAMVEAGIQSVSVSIDGLE
AAHDKLRNRPGSWEHAFEALRNLRNAGSRVAVNSQINQINLGDHIHLLELIADEGVHSWQLQITVAHGNAA
DNADIILQPYMFLELFDQLDAIIDRAFERRVRIWPANNLGYFGPFEHKLRKSQKAHYRGCSAGRSTIGIESDGNI
KNCPSLGGPANIGGSWREHGLAKIWKEAAEITYIRRRTVDDLWGYCRECYYAETCMSGCTAANEPLLGRPGN
NPFCHHRALEMDRMGMRERIEPFIPAKGVPFDNGLFRLIREWKDPARREAEGPVEVTEPRVSRLIDEMGSGR
AIRMDELVDGRAPFELKDH
SEQ ID NO: 51 (70% TIGR04103, >WP_053236092)
MTHGSIRDPRVTPIGETGLRYVVWELTLRCDLACRHCGSRAGKAREDELSTDEALDVVRQLASMGAREVVLIG
GEAYLRDDWTIVARAIADAGMRCAMVSGGRGLDATRARAAREAGVASVSISIDGIGATHDVQRGLDGAFES
ARVAMRNLRDAGVTLQANTQVNRLSYPELDAILDLLVEERATGWQLAMTVPMGRAADRPDWLLQPHELLE
VYPKLAALAERGARHGVLFFPGNNIGYFGPHEATLRGRGITDDVAWGGCIAGKHAMGIESDGSIKGCPSLPSA
DWVGGTAREASLREIWEQTRELRYVRDRELPGALWGECARCYYASVCGGGCTWTAHTFFGRPGNNPYCHH
RALEMRARGERERLVRVEAAPGAPFDHGRWEIVVEPWVEGEGVARVERPSKRLRVL
SEQ ID NO: 52 (70% TIGR04103, >WP_006969608)
MSTSSDSPARRGPSLPVLDGRGRSDGKLRLPLAEQTRECDTLARPEYAVWELTLRCDLACRHCGSRAGKARPD
ELSTEEALELVTQMAEMGVQETTVIGGEAYLRADWHRIARALTDAGISTTMTTGGRGLDPERVALAKAAGIQ
SVSVSIDGLEAEHDYQRNLKGSYAAAMAALDNLAAAGIPRSVNTQLNGSNLRDIEALLEVIATKGIHSWQIQIT
VAMGRAADHPELLLQPWQMIELMPMAARIARRCRELGIRLWPGSNVGYFGPYEALLRWDHPDGHQTGCE
AGTRTLGIEANGDIKGCPSLPTADYVGANVRDHSLRAIWERSSALRFNRERGTEELWGRCASCYYAPICKAGCT
WTGHVLFGRRGNNPYCHHRALELLREGRRERVELREAASGDPFDHAIYELIEEPWPEPELSRARAVAESGEG
WIR
SEQ ID NO: 53 (70% TIGR04103, >KIG11737)
MSRPSLPIVDASRPKPRVRLPLAPGVRSCDDVRPEYAVWEVTLRCDLACRHCGSRAGHARADELDTEEALDLV
TQMAALGVKETTIIGGEAYLRDDWHLIAAALVNAGIRCTMTTGGRGLTAERVEIAKRAGIESVSVSIDGLAQAH
DHLRALHGSHAAAMRALDHLRAAGIPRSVNTQLNGYNLREIEPLLDQLTAREIHSWQVQITVAMGRAADHPE
LLLQPWQMLELMPLVARLARRCDELGIRLWPGSNIGYFGPYEQLLRWDHRDGHQTGCDAGTRTLGIEANGD
IKGCPSLPSNEYVGGNVREHSLREIWERADALRFNRERRVDELWGRCAGCYYADECKAGCTWTGHVLFGRR
GNNPYCHHRALELLREGRRERLELHTPAPGEPFDHGLYRVIEEPWPAELIDRAREVAASGVGWIS
SEQ ID NO: 54 (70% TIGR04103, >AKV02060)
MELVACSDPTVSIRDPLLAAERAELLRAKRPRVGLPTISKPRRPLPVLSEPRQRDRSIRPRHAVWEITLRCDQAC
RHCGSRAGVERPNELTTEECLDLVRQIAELGVMEVTLIGGEAYLRPDFVQIVRAIRSHGMHCTMTTGGRGLSP
TLAREAAAAGLGSASVSIDGAEETHDRLRGAKGSHRDAIAAMRALREAGVRLTVNTQINRLSLVDLPSILEMLV
REGAEAWQIMLTVAMGRAADEPDVLLQPYDLLDLFPLLDTLAARCEEHGVRLYPGNNLGYFGPYESRLRGTLP
RGHGTSCSAGRGTLGIESDGLVKGCPSLPSEQWGGGTVRDHSLVDLWERASALRYTRDRTVEDLTGFCRTCY
YADICRAGCTWTTSVLFGRPGDNPYCHHRALERDREGLRERLVRRQPAPGEPFDHGIFELIVEPVPGEKEANS
SEQ ID NO: 55 (70% TIGR04103, >AKU99181)
MEKFDPLTAKARAKELQRERPKRALPIAPTAPLGVRHRPLREPRDVDRRYRPIYAVWEITLACDLACRHCGSRA
GRERPDELDTKEALDLVGQMASLGVKEVTLIGGEAYLRGDWLDIVRAIRAHGMVATMTSGGRGLTPELVAQ
AHEAGLVGASISLDGDEVTHDRLRGVKGSYRAAIEALRALRERNMRVSCNSQINRLSVPYLDFILESIAAIGVHS
WQIQLTVPMGRAADEPDVLLQPYDLLELFPRLAELKKRCDELAVRMLPGNNIGYFGPYESTLRGYHVSGHAGS
CGAGRATLGIEANGAIKGCPSLPTEHWTGGNVRDASLLDIWERAEPLRYTRDRTVDDLWGFCRTCYYAEECAS
GCTWTSFVTLGKAGNNPYCHHRALELEKRGKRERVVRVQSAPGEPFDHGVFALVEEDLESGSETL
SEQ ID NO: 56 (80% TIGR03913, >WP_052261552)
MAATRFLSKSDVETRRPVYVVWETTLACNLKCKHCGSRAGTPRKDELSTEDAFGLIDELADLGTREITLIGGELF
LRKDWLKLVERISSKGILCTMQSGGFHLTRERLKDAKEAGLAAIGISIDGLEKTHNSLRGVRTSFQHALRSLEAA
RDLHITSNVNTTITSKNIDELELLYEVLKGFHVRNWQFQIVVAMGNAADEETLLFQPYQITKVLDSVARIRTSAS
RVGMLVQASNALGYFGPYEWLWERGSDTDSHWSSCGAGQSTMGIEADGTIKACPSLATEDFGVTPSEYDTL
QDLWIGSDRIRFNETRDPPSGSICGACYYWAACKGGCSWATHSLTGTVGENPYCHYRAICLSELGLKEKIRKVR
DAPGSSFDRGEFEVVIEDEEGKTLKSDDPRKHRIIEKISAGREQGASEEALKLCTSCHQFAWEYEKQCPFCSGTK
FVSTRALKSIKSATKFL
SEQ ID NO: 57 (80% TIGR03913, >WP_046757912)
MEALSSKRYRVRDDFKTARPVYVVWEITLACNLKCTHCGSRAGKVRPGELTTEQCFEVVDSLKKLGTREISIIGG
EAFLRKDWLDIIRRIDGHGMECSMQTGAYNLTEKRITDAKEAGIKNIGVSIDGLPDVHNEIRGRKDSFEQAITCL
GLLKKHNIVSSVNTTITKKNKNQLNELLDILIANGVKNWQVQLVVAMGNAVDHSDELIIQPYELIEFYDNLIQIY
RRALANNVLVQAGNNIGYFGPYEHIWRQGSVGYYSGCGAGHTALGIEADGVVKGCPSLPTTDYTGGNIKNM
SIEDIWNYSEEMFFSRYRNKNEMWGGCKGCYYETSCLAGCSWTSHVLFGKRGNNPFCHHRALELHKKGQKE
RIRKIKEAEGTSFDYGLYEVVVEDFNGNIIEVQTPTSTPTIVNEYTTRIPRDPEPLKLCQSCKNHIYESSEDCDFCG
ENFEASTAKYEKNLAHAKQVLIRLENLINLDKVNDE
SEQ ID NO: 58 (80% TIGR03913, >WP_024981470)
MEKTSTRYRVRDDYKTATPVHVVWEITLACNLKCSHCGSRAGKVRPGELTTEQCFDVIDSLKRLGTREITIIGGE
AFLRKDWLDIIQRIHQSGMECSMQSGAYNLTEQRIIDAKKAGIRNIGVSLDGLPATHNKIRGRSDSFDHVINCL
QLLKQHNIPSSVNTVVTKRSKNELKELLDVLIENGVKNWQIQLAVAMGNAVDNSEELIVQPYELIDFYDKLIVIY
RKALANNVLIQAGNNIGYFGPYEHIWRQGNEKYYTGCSAGHTGIGIEADGKIKGCPSLPTTEYTGGNVKDMKL
EDIWKYSEEMVFSRYRNKEELWGGCKGCYYESSCLAGCTWTSHVLFGKRGNNPFCHYRALELKKKGLHERIRK
IQEAPGLSFDIGLFEIVVENEKGEIVEVQSPNDVAPISVLSNSDRIPRIPKALKMCNGCDNYVYEEEVICTFCKSDI
QKVNEEYAQKLQRAIVSLEKLELLMMK
SEQ ID NO: 59 (80% TIGR03913, >WP_063775385)
MRSARDRKTHVPVHVVWEITLACNLKCGHCGSRAGKRRANELSTAECLDVVRQLAAVGTREITLIGGEAYLRK
DWLEIAAEIARLGMHCGLQTGARGLTRERIAAAYAAGVRAIGISLDGLRDLHDELRGVKGSHDQAIQAIKWVS
EVGIEPGVNSQINLRSMRELDGIFDEIVAAGAKYWQVQLTVAMGNAVDNSEMLLQPHQIVDVVDKLAELYH
RGRDVGLRLLPGNSIGYFGRHEAYWRSLTDDVTHWGGCTAGETTLGLEADGTIKSCPSLPKSHFAGGETRTSTI
EEALKALESRNVRRDGNRGRSFCGSCYYWNVCRGGCTWVSHVLEGRRGDNPYCYYRATTLARRGLRERIVKV
ADAPDEPFAVGRFEVRLEHEDGSRAPRSIGLDDKPRKRGRLGLCQSCYEYMSLGERHCPHCGEVNRPKLMVD
AKLELDVQIALDEIERHGRSILELAAAAGQDPQAAE
SEQ ID NO: 60 (80% TIGR03913, >OCW56221)
MPVHVVWEITLACNLACGHCGSRAGARRPDELTTAECFDIIRQLREAGTREITLIGGEAYLRKDWLEIAAEISRL
GMLCGLQTGARGLTKARIEAAYDAGVRAIGVSIDGPRDIHDRLRGVAGSHDQAMRAIADIAETGIRPGVNTQI
NALSAPHLWDIYRAIRDAGARSWQTQLTVAMGNAVDNAEMLLQPHLIVDVVEDLYAIFEDGLLHDFRVLPG
NSIGYFGRHEAQWRSITEAAEPWQGCTAGETTLGLEADGTIKGCPSLTRESYSGGNSRTVSIAEAIGNLADRTV
RRDGNPGGGFCATCYYYDFCQAGCTWVTHSLTGRRGDNPFCVYRAGKLRDAGLRERIVKTAEAPDTPFAIGK
FDIVLETLDGKPGPATVADTAPRSVGATHLVVCEQCGQFIANTEPVCVHCHAEQRAAARSRLERVQDVHNLL
AEIDSHSHGIHALVDEISAR
SEQ ID NO: 61 (80% TIGR03913, >AKI02186)
MDAPLRYRTQDDTGSTTPVHVVWEITLACNLSCGHCGSRAGSRRPNELTTLECFDIIHQLRDAGTREITLIGGE
AYLRKDWLEIAAEITRSGILCGMQTGARGLTRPRVQAAYDAGIRAIGVSIDGPKDMHDRLRGFDGSYDQAMQ
AIGYIAETGIRPGVNTQINVLSAPYLWEIYGEILKAGAKSWQTQLTVAMGNAVDSAHILLQPYQIIDVIDDLYSIY
EDGLLNDFRLLPGNSIGYFGKYEAHWRSITRTAEAWQGCTAGTTTLGLEADGTIKGCPSLSKDSYSGGVSREVS
LAEAISNLSGRTVKRDGNPGRGFCKTCYYYEFCQGGCTWVTHSLTGERGDNPYCYYRASKLRKAGLRESIVKTA
EAAQTPFAIGKFAILIETLEGKPDSVTIEDDAPKPDHSTHLVICENCEEFIGNNEPVCVHCNTEVRSAAQTRLAHT
QEINNLVAEIEMHSRQIQSVIDTIGDR
SEQ ID NO: 62 (80% TIGR03913, >WP_060978522)
MVPQAELPHARFFGSADQRRRVPVSVVWELTRACDLSCCHCGSRAGRRAREELSSVECLDLVDQLAELGARD
IGLIGGEVYLRRDWLEIVRRIRQAGMDCSVQTGGRFFTPATLDAAIAAGVMSIGVSLDGVGQTHDLQRGVPG
SFEAALALLHLVANRPIMASVNTQINRHTMQQLPELLDVLIAAKVSNWQLALTVAMGNAADNDDLLLQPYDL
LALFPLLASLHDRAAENGIVLQPSNNIGYFGPYEEKLRLIGDIAAHWTGCSAGDNVLGIEADGNIKGCPGLSRDY
VGGNVRDEPLRVIWDRAERLAFTHRATKSDLWGYCAQCYYAEICMAGCTWTAHSLMGRPGNNPMCHYRA
LEFARRGKRERLVKVADAPGHAFDYGRHALVVEDRPLSDQAVLS
SEQ ID NO: 63 (80% TIGR03913, >SAL02149)
MELKISHHDAPVRVLRPSDRGGRTPVYAVWELTLQCNLACSHCGSRAGKKRTGELSTSEALALVEDLRGLGVR
ELALIGGEAYLRGDWIEIVRRARDLGIRPVLQSGGYGFTATLAQRAKEAGLAALGISVDGLEPIHDDIRGVVGSY
RAAFEALNAAREVGLTVTANTQIHAKNWRSLPDIYTDLRKAQIAAWQLQLTVAMGNAADNDDLLLQPFHLY
ELFPVIAALTDMAKTENVQIYPGNNIGFFGPYEHLWRGPHRFDNHYKGCQAGINTIGIEADGTIKGCPSLATSR
YSEGNVRVVPIKEIWNGNLAFPLNRDWDLKLWGFCKTCYYAKVCRGGCNWTSDSLFGKPGNNPYCHHRVLS
LKKLG KRE R VVKVSE ADKSSFGTG LFKLI EESW
SEQ ID NO: 64 (80% TIGR03913, >WP_043629091)
MGDAGGRARPARPNRYLSGPDVRGHQLVHAVWELTLACNLKCRHCGSRAGSVRPEEMTTQECLGVVRQLH
ELGAREVTLIGGEAYLRKDWVEIVRSISDAGMECTLQTGAWQLTGTRIAQAADAGLVACGVSIDGLAPLHDHL
RGRPGSFDAAIDALGRLREHGIRTSVNTQITAAVIPQLRDLFREFLATGVRNWQVQLTVAMGRAADNHDLLM
QPYQMNELMPLLAELHAAGVAKGLLLQPGNNIGYFGPYEAQLRGSGTASMHWAGCFAGRNVLGIEADGTIK
GCPSLPTVTYAGGNVRDSSIAEIWASASQLSFARKSRGKEMWGFCASCYYADVCEGGCTWTSHSLLGRPGNN
PYCHYRASELKKQGKRERVVRRLPAPGTSFDHGRFEIVVEPDPDTAGEHRAAGASPQERSRPGPPPAPGDAG
SHLPSLVLCRACDCHVFAGTDFCPHCGADVPASQREYESALADARHAASRLARAIGSLGPELPATTTPRA
SEQ ID NO: 65 (80% TIGR03913, >WP_050043969)
MDQAPDDTPRRRLSLSDQLDCVPVYVVWELTLACNLKCIHCGSRAGHKRAKELTTDECVDVIRQLAELGTREI
SVIGGEAFIRRDWLTII AIRDHGIDCTMQTGGYKLSGDMIRSAADAGLLG LGVSVDGLEPLHDRLRGVRGSYR
EALRVLDDCRRLGLTASVNTQITSAVMSELPQVMEIIIDAGAKYWQVQLTVAMGNAVDHDDILLQPYDLTTL
MPLLAELHLKGRERNLVLLPGNNVGYFGPYDVLWRGPDRGYYSGCPAGQNVIGLEADGTVKGCPSLATDRYG
AGDVRSASIAELWATHPALQFNRNRGTDDLWGFCRDCYYAEACRGGCTWTADSLLGRRGNNPYCHHRVLT
LAERGIRERIVKIEDAPALPFATGRFALIEEPLPAGAPHA
SEQ ID NO: 66 (80% TIGR03913, >WP_020539729)
MSVEEDGLARRVARERDFRDKVPVHVVWELTLACNLKCLHCGSRAGPRRPAELTTEEALDLVAQLAGLGTRE
LTVIGGEAYLRNDWVEIIRRATELGMSCSMQTGARALTPARLRAGADAGLRGIGVSIDGLRDLHDEVRGVPG
AYANAFKVLRDAREAGLRVSVNTQIGARTIGELPALMDELVAAGVTHWQVQLTVAMGNAADHDELLLQPYR
LAELMPLLARLHHEGQRHGLLMVPGNNIGYFGPYEHLWRNASSMSGHWSGCEAGHTALGIEADGTVKGCP
SLPTSAYTAGNVRDLSVADMWRDSPALGFRRGRAGADELWGHCGTCYYADVCDAGCTWTAHSLLGRPGN
NPYCHHRVLELAKKGIRERIVKVADASAAPFGIGEFALVEEAIPADPRPADPRPAVPGRAERAPGDRGVPELRL
CGDCAHFVMAGDAACPFCSADVAEAEERARAVHRRRTELMDQVKALMRQET
SEQ ID NO: 67 (80% TIGR03913, >KYF96555)
MSRAVRDRETDDFRRNVPVHAVWEITLACDLKCRHCGSRAGARRDRELDTAECLEVVAALARLGTRELSLIGG
EAYLRSDWIDIIRAARAARMRCAVQTGGRNLTEARLAAAVGAGLQAVGVSIDGLAPLHDELRGVPGSFSRAID
AVRRARAHGIAASVNTQIGARTMDDLPALLDAIAAAGATHWQIQLTVAMGNAVDNDAVLLQPYQLLELMPL
LDRLYREGLDRGLLMVVGNNIGYFGPFEHRLRGVGDESVHWTGCAAGQNVIGLEADGTVKGCPSLRTSGYA
GGNVRDLRLEDIWNHAEEIHFGRLATVDALWGFCRTCYYADVCRGGCTWTADSLFGRPGNNPYCHHRALEL
DRRSLRERVVKKREAPGAPFAIGEFELIVEPVPGREGASPAPPVTS
SEQ ID NO: 68 (80% TIGR03913, >WP_013570078)
MLLSEPRKMETPPRYLTNQDFQRYIPVHVVWEITLACDLKCMHCGSRAGHRRPDELTTQECLDVVDALARLG
TREVTLIGGEAYLRKDWTQIIRRCADHGIYCATQTGGRNFTQQRLQEAIDAGMNGLGVSLDGLEPLHDRLRG
VPGSFRQALDTLQRAHDAGLSISVNTQIGAEIIPQLPELMEIILGAGAKQWQIQITVAMGNAVDHPELLLQPH
RVLELMPMLARLYQEAAERGMLMVTGNNIGYFGPYEHIWRGFGDERVHWSGCNAGHTGLALEADGTVKG
CPSLATVGFSGGNVRDLTLEHIWNHSKEIHFGRLRSIEDLWGFCRTCYYAEVCRGGCTWTSHSLLGKPGNNPY
CHYRALELEKQGLRERVVKIRDAAPDSFAVGEFELITERIEDGTVAASSVSQKQLVQLGLSNDERWSPREGRVP
PTLKLCRACNEYVWPHEVDCPHCGENIAASLAQYQEDTRRRREAMDAVRALIESYRAAETSVVPST
SEQ ID NO: 69 (80% TIGR03913, >AAY91426)
MGWCRAPPGAQGTLMSDRLPARYLSETDLKRFVPVHVVWEITLACDLKCLHCGSRAGHRRPGELNTQECLS
VIDSIAALGTREVTLIGGEAYLRKDWTRLIQAIHDHGMYVAIQTGGRNLTPAKMQAAVDAGLNGVGVSLDGL
APLHDAVRNVPGSFDKALDTLRRAKQAGLKVSVNTQIGAATLPDLPALMELIIDAGASHWQIQLTVAMGNAV
DHPELLLQPYQLLEVMPLLARLYREGAERGLLMNVGNNIGYYGPYEHMWRGFGDDRVHWSGCAAGQTVLA
LEADGTVKGCPSLATVGFSGGNVRSMSLHDIWHYSEGIHFGRLRSVDDLWGYCRTCYYNDVCRGGCTWTSH
SLLGKPGNNPYCHYRTLDLAKKGLRERIVKLEDAGPASFSVGRFDLITERIDTGEAVSSVNDSGQVIKLAWVNQ
GQASPEEGRIPPRLALCRSCLEYIHAHESTCPHCNADVAAAEARHQEDRLRQQALINTLHQLLGTPQEQPRL
SEQ ID NO: 70 (80% TIGR03913, >WP_006753568)
MDENQVRPVRFLSEQDYERCVPVHVVWEITLACDLKCLHCGSRAGHRRTNELSTGECLEVIGALARLGTREVS
MIGGEAYLRKDWAQLIKAIRSHGMYCAVQTGGRNLTPARLAQAVEAGLNGLGVSLDGLAPLHDKVRNVPGS
FDRAVDTLKRARAHGLAISVNTQIGSATMRDLPALMDSIIEIGATHWQIQLTVAMGNAVDHDELLLQPYQLEE
LMPLLADLYKRGLDRGLLMNVGNNIGYFGPHEHLWRGFGDERVHWTGCAAGQTVIALEADGTVKGCPSLAT
VGFAGGNVRDLSLEEIWRTSEAIHFGRLRSVDDLWGFCRTCYYADVCRGGCTWTSHSLLGKPGNNPYCHYRV
LELKKQGLRERIEKIEDAAPASFAVGRFDLVTERISDGTPVSSISRSGQTVELAWKHKGKRAPEVGRVPPRLVVC
RACNSYVHQHESRCPHCGADIAAAERAYEHDAQRRHALIQEVERLLS
SEQ ID NO: 71 (80% TIGR03913, >WP_043628614)
MSDSQEKRPARYLSREDFERYNPVHVVWEITLACDLKCLHCGSRAGHRRPSELSTAECLQVIDALAKLGTREITL
IGGEAYLRKDWTQLIRAIRGHGIYCATQTGGRNLTPAKLQEAVDAGLNGVGVSLDGLAPLHDKLRNVPGSFDK
ASDALRRAKAAGLAVSVNTQIGAATMPDLPELMDHIIELGATHWQIQLTVAMGNAVDNDEVLLQPYRVLEL
MPLLARLYQEGLERGLLMTIGNNIGYYGPYEHIWRGFGDDRVHWAGCGAGQTVMALEADGTVKGCPSLAT
VGFAGGNVRDMALEDIWRHSEGIHFGRLRSVDDLWGYCRGCYYNDVCRGGCTWTSHSLLGKPGNNPYCHY
RALDLQQRGLRERIVKLQDAAQDSFAVGRFDLITEEIASGKVVSQISRSGQVIELSWKNRGKKAPETGRPPARL
ALCSNCRQYIHQHETTCPHCKGDVIAAERLHRQKMAERDEAIQRLRSLLGEVS
SEQ ID NO: 72 (80% TIGR03913, >WP_062765523)
MPDGQTPAERAIPRARSLDDIVDLVPVHVVWEITLACNLKCQHCGSRAGHVRKGELSTAECLDLVDQMARLG
TREVTLIGGEAYLRRDWLEILARVRAHGILCLIQTGGRNLTDKRLEAAIAAGINGIGVSIDGLAPLHDRLRGVPGS
FDQAMSALTRAKAAGLSISVNTQIGAETMEDLPELMDRIIAAGATHWQIQMTVAMGNAVDNPDILLQPYRLI
ELMPLLARLYREGARRGLTMVVGNNIGYFGPYESLWRGFGNEAVHWTGCAAGQNVIGIEADGTIKGCPSLAT
HGYAAGNIRDLALDDIWRNQEAMAFGRTRSVEKDLWGYCRSCYYADVCRAGCTWTSDSLLGRRGNNPYCH
YRVRDLAKKGRRERVVKIREAGPESFAVGEFALIEEAIPPEELAALPPDRLPGFSATDGRPVHIAPEDDDDPARA
AGARGRIPPHLDLCRSCHEYVWPHETDCPHCGADIAAAAAAYAARLAEVQALAARIRARLDEAAPGI
SEQ ID NO: 73 (80% TIGR03913, >WP_014249603)
MDQPLPSPDLSARSPEQLPEQPASVPARYCFDDDFRKLVPVLVVWETTLACNLKCQHCGSRAGRPRPDELTTE
EALDLVDRLAALGTREISLIGGEAYLRKDWVEIIRRCRSHGIRTAVQTGGRNLTDRRLDEAVAAGLQAIGVSIDG
LPDLHDRVRGVSGSYDQAMSALRRAKDRGLAVSVNTQIGPETPDHLPELMNRIIEAGATHWQIQFTVAMGN
AVDNPDLLLQPHRLLDVMPLLARLYREGLDRGLLMVVGNNVGYYGPYEHIWRGLGDDRMHWTGCAAGQI
GIGIEADGTLKGCPSLATSLYAAGNIREMSVEDIWRHADRMQFGRLRSVDELWGYCRTCYYADICRAGCTWT
SESLLGKRGNNPYCHYRVLDLAKHGLRERVVKIKDAPNEAFAIGEFALIQEPIPGAESPGLPERDPAKVHRHEYE
RSLEGGVVPPSLTLCRACNQYVWPHETDCPHCGADVAAAAAQHGIDSARRRALIRETQRLLDEARAAKLEAA
GAPSPATAAGD
SEQ ID NO: 74 (80% TIGR03913, >WP_045586236)
MDQTIPGLVPARSPSEELPQDHVPARFCDAEDYRRLVPVHVVWEITLACNLKCQHCGSRAGRPRPDELDTGE
ALDLVDRLAALGTREISLIGGEAYLRRDWLEIVRRCRSHGMRTSMQTGARNLTDARIDAAAEAGLQAIGVSID
GMPELHDRVRGVPGSYEQAIGALRRAKARGLAVSANTQIGPETPDHLPAIMDAIIEAGATHWQIQFTVAMG
NAVDNPDULQPHRUEVMPLLARLYREGLDRGLLLVMGNNVGYYGPYERLWRGFGDESQHWSGCSAGQTG
IGIEADGTIKGCPSLATSLYASGNIRDMTLEDIWRLSDRMAFARTRSVDELWGYCRTCYYADACRAGCTWTSE
SLLGKRGNNPYCHYRVLDLAKHGLRERVVKIKDAPKEAFAIGEFALITEPIPGADSPGLPERDPAKVHRHSCERS
AEGGVVPPSLTLCRSCNQYIWPHETDCPHCGADVAAAAARHDIDSARRRALIRETQRLLDEARAAKAAAKEA
VKEASTVAGAVSPS
SEQ ID NO: 75 (80% TIGR03913, >WP_004273211)
METSASALPHAPQEAPVRFRRMEDHHNLVPVHVVWEITLACNLKCQHCGSRAGRPRADELTTAEALDLVDQ
LAALGTREMTLIGGEAYLRRDWIDIVRRCREHGMRTAIQTGARNLTDARLEQAVDAGLQGLGVSIDGLPDLH
DRVRGVPGSYDQAISALRRAKVLGLDVSVNTQIGPETPAHLPDLMDRIIEAGATHWQIQLTVAMGNAVDNP
DLLLQPYQLIDVMPLLARLYQEGVERGLLMIVGNNIGYFGPYERMWRGYGDETMHWTGCAAGQTTIGIEAD
GTIKGCPSLATSLYSAGHVRDMTVEDIWRHTERISFGRLRSVEEMWGYCRTCYYADACRAGCTWTSESLLGRR
GNNPYCHHRVLDLAKHGLRERVVKVKEAPPESFAIGEFALITEAIPGVADAPPLPARDPAHVHIHPRERAAGG
GTTPPKLEVCRGCDQYIWPHEEACPHCGSDVAAEAARHAEDTARRRALINQAKRFLAGVRTASVAE
SEQ ID NO: 76 (80% TIGR03913, >WP_056720094)
MVGKPVRYRVDSDYTELVPVHVVWEVTLACNLKCQHCGSRAGRPRSDELNTAEALELVDHLAALGTRELTLIG
GEAYLRKDWIELIRRSRDHGIRTAIQTGARNLTDKRLDEAAEAGLQGVGVSIDGLPELHDRVRGVPGSYDQAID
ALKRAKARGMAVSVNTQIGAETAEHLPELMDRIIEAGATHWQIQLTVAMGNAVDNPELLLQPYKLIEVMPLL
ARLYREGEARGLLMVVGNNVGYFGPYEHIWRGFGDDADHWNGCSAGQTGIGIEADGTIKGCPSLATSLYAA
GNVREMSVGDIWRHSEKMSFGRLRDVEELWGYCRTCYYADVCRAGCTWTSESLLGKRGNNPYCHHRVLDL
AKHGLRERVVKIREAPQESFAVGEFALITEAIPGAEVTPLPVRDPAAVHVARRERSAAGGALPPSLKPCRSCQQ
YIWPHEVTCPHCDADVATAATAHEAEQTRRRALIADATRILAKAAQARAARLETGA
SEQ ID NO: 77 (80% TIGR04103, >WP_020737613)
MIEPTEIPVRALMPADLMEAKPIYAVWELTMKCDQPCQHCGSRAGAARDAELSTEEVLEVAASLARLGCREV
ALIGGEAYLREDLAEIVSFLARSGMRVIMQTGGRAFTAERAKALRAAGLTGLGVSVDGPAHIHDELRGNVGSH
AAAIRALDNARAAGLITTANTQINRLNAHLLRETCAELRSHGIQTWQVQITVPMGRAADHPEWILEPWRVVE
VIDTLAAIQREALETHVSGVPFNVFANNNIGYFGPHEQLLRSRPGGGDAHWRGCRGGINAIGIESDGTVKACP
SVPTVPYAGGNVRERGLEHIWEGSAEVRFARDRDASELWGHCATCYYADECRAGCSWTAHCTLGRRGNNPF
CYHRVTQLKRRGIRERLVMKQRAPHVPYDHGVFELVEEAWDAPPPPPPEPVVPRNARRRLAVV
SEQ ID NO: 78 (80% TIGR04103, >SFD76092)
MNSPARALRPDDLRQPRPIYVVWETTLRCDHECAHCGSRAGDARPDELSTEELLEVADALVRLGSREVTLIGG
EAYLRGDCYRLIEHMTKAGIRVTMQTGGRGLTQDRCRKLREAGLAAIGVSVDGPEAAHDTLRASPGSHAAAL
KGIRNAREAGLLVTSNSQINRLNKDVLRETAELLADAGVAVWRAQMTAPMGRAADRPDWLLEPYMVLEVID
TLADIQRWAQRRAADRGIPWERAFHVRLGNNLGYFGPHEQLLRTRPGSPDSYWQGCSAGKFVMGIESDGTI
KGCPSLPTAPYTGGNVKTAALADIWNEAPEIAFARDRGTSELWGFCKSCYYAEVCRAGCSFTAHSAIGKRGNN
PFCYYRATQMKRKGLRERVVLRQAAPGDPYDFGQYEIVEEPWDSAPPRAAVRLPVLAG
SEQ ID NO: 79 (80% TIGR04103, >KIG18351)
MSSSRTIREHEVDQPRPIYTVWEITLRCDHACAHCGSRAGPVRDDELDTAELLAVADALVELGSREVTLIGGEA
YLRSDVYQLVEHLAKAGVRVTMQTGGRGLTAARAQRLRDAGLAAVGVSIDGTAAVHDRLRASPGSHDAAM
RAIEHARAAGMVVTSNSQINQLNMHELPAIAAELEAAGVLVWRGQLTAPMGRAADHPEWIVQPYMVLEIID
TLAQIQAGASARAQARGASEMESFRVTLGNNLGYYGPHEPLLRSRPDRRDRFFPGCQAGRYVLGIESDGTVK
GCPSLPTAPYQGGNVRELSLEQIWDSEAIRFTRDRSTDELWGHCASCYYADVCRAGCSFTSHSTLGRRGNNPF
CYYRADKLRKQGLREVIVHARAAPGSPYDFGGFELREQPWSDLPPPAGRRSLPVVTS
SEQ ID NO: 80 (80% TIGR04103, >WP_006974888)
MVSPRSIRPEERDTPRPVYVVWEITLRCDHACAHCGSRAGPVREDELSTEELFEVADSLARLGAREVTLIGGEA
YLRSDVYALIAHLRGHGLRVTMQTGGRGLTEGRARKLEEAGLAAVGVSVDGTAETHDTLRASPGSHAAAIQAI
HNARAAGLLVTSNSQINRLNMHELPAIAAELEAAGALVWRAQLTAPMGRAADRPGWIVQPYMVLDIIDTLA
EIQAGAQARASAAGLDPMRAFRVTLGNNLGYYGRHEGKLRSRPDRSDRYFTGCQAGRFVMGIESDGVIKGC
PSLPTAPYVGGNVRDLDIETIWADAAPIRFTRDRDTSELWGHCASCYYADTCMAGCSFTSHSTLGRRGNNPFC
YYRADKLRREGLREVLVHAEAAPNSPYDFGRFELREQPWSDPVPAPTRKRSLPVV
SEQ ID NO: 81 (80% TIGR04103, >WP_006974883)
MAGRRLDVLSTDFFPAYVVWELTLRCDLACRHCGSRAGPARPVELTTSEAVAVAEELGRMGAREVVLIGGEA
YLHEGFLEVVEALARAGVVPVMTTGGRGVDEALARAMAEAGLRRVSVSIDGLEPTHDRMRGFRGSFAAALA
ALDHCAAAGLSISANTNLNRLNWGDLEALYEQHLRGRVRSWQLQITTPLGRAADRTAMIFQPFDLLELLPRVA
ALKRRAFAEGVLILPGNNLGYFGPEEGLLRSQTPEGTDHWQGCQAGRFVMGIESDGAVKGCPSLQTAAYVG
GKVLERGLAEIWNEAPQLAFTRERSVEDLWGYCRGCVFAKTCLGGCSFTAHAVFGRPGNNPYCHYRARDFAK
RGLRERLVPTEAAEGTPFDNGLFEVVEEPLDAPDPEAALEPRELVQITRRPGSTPAQTPRDPAGGGA
SEQ ID NO: 82 (80% TIGR04103, >WP_012234464)
MWRSAQRARARARARARARARARAQVIIHPPRDRKCTGEQEPAAEFAVGSGMALRRLDVVPGEYFPAYVV
WELTLRCDQPCRHCGSRAGAARPSELGTDEALGVVRQLAAMGAREVVLIGGEAYLHDGFLEIIAALKAAGVRP
TMTTGGRGITAEIAAQLKEAGLHSVSVSVDGLERAHDLIRKAPGSHGSALAALGHLRSAGLLTAANTNLNRVN
QGDLEALYDLLREQGIKAWQVQITAALGRAADRPAMLLQPYDLLDVLPRIAELKRRAFRDGITVMPGNNLGYF
GPEEALLRSLREGGRDHFRGCQAGKLVLGIESDGAVKGCPSLQSDAYVGGDLRGRALQEIWDEAPRLAFARA
RTADDLWGFCRSCAFAEVCMGGCTFTAHALFGRPGNNPYCHFRARTLAAQGKRERLVPAEPAAGRPFDNGL
FELVLEDLDGPEPGLDSPEQLVQLTRKPRPSS
SEQ ID NO: 83 (80% TIGR04103, >WP_002625456)
MSMLRRLDVTRPDHHPAYVVWELTLRCDQPCTHCGSRAGTQRPDELSTAEALDVVRQLREMRTREVVLIGG
EAYLHPGFLDIIRALKEAGIRPGLTTGGRGMTEALARQVAEAGLYAASVSIDGLEPTHDLMRAAPGSFASATAA
LGFLSAAGVRVAVNTNFNRLNQADLEPLYEHLKGLGPRAWQLQITAPLGRAADRPALLLQPWDLLDLLPRIAA
LKQRAFADGITLMPGNNLGYFGPEEGVLRSPGPDASDHWRGCMAGRYVMGIESNGAVKGCPSLQTAHYVG
GNLRERPLRDLWDNAPPLAFTRTRTVEDLWGFCRTCPFASTCMAGCSFTAHALFGRPGNNPYCHYRARTLAK
QGVRERLVPQAPAPGKPFDHGLFDLVVEPLDAPDPRPPTPRMLVKRLKWPEAHTPQAVGARTDPTG
SEQ ID NO: 84 (80% TIGR04103, >WP_044192718)
MVLRRIDTAPEDFHPAYVVWELTLACDQPCTHCGSRAGTARAGELSTEEALGVVAQLAAMRAREVVLIGGEA
YLHPGFLDIIRALKAAGLRPMLTTGGRGITAELAGEMARAGLHGASVSVDGLEETHDLMRAARGSFASATAAL
GHLKAAGVRTAANTNLNRLNQGDLEELYAHLKAQGIGAWQLQITAPLGRAADRPDLLLQPWDLIELLPRIARL
KEQAYRERILIMPGNNLGYFGTEEALLRSVQAGGRDHWRGCQAGRYVLGIESNGAVKGCPSLQTAHYVGGN
LREQPLERIWNDSSELAFTRARTVEDLWGFCRTCPFAEVCMGGCSFTAHSLFGRPGNNPYCHYRAKTLASRG
QRERLLPKAPAPGRPFDHGLYEIVPESLDAPDPKPELKREYVKRRRWP
SEQ ID NO: 85 (80% TIGR04103, >WP_010607032)
MTNTLENAGVKVREYDRSTYAVWEITLKCNLACSHCGSRAGDKRADELSTTEAFDLIKQMADLGVKEVTLIGG
EAYLRPDWLMIASEIKNKGMRVTMTTGGYGISRGTAKRMAQAGIEAVSVSIDGLEDEHNSIRGKSDSWSQCF
TTLSQFKDLGVHTGVNTTVTRKSAKDLPLLYEKLIDVGVKNWRIQLAVPMGNAADNNEMLMQPYELLDLYPL
LGLLSVRGRKDDLIIQPGNNVGYFGPYERLLRGTFMQESKYSFYTGCVAGQGAIGIEADGKIKGCPSLPSEEYTG
GNIRERTLKDIYENAPELNFNSQEMDDASIAHLWGNCKGCKYAKLCRAGCNWTAHVFFGKRGNNPYCHHRA
LTLAASGLRERFQQRVAASGIPFDHGVFEIYEESISATSNDNVNRFTIQGINFPPSWLATDSNLRERLNSEKLNAI
HQYRALGLAKAV
SEQ ID NO: 86 (80% TIGR04103, >WP_010607027)
MIVLVARYAPPIRLELLGVKNMSVNMDSAGIKIKKEARQTYAVWEITLKCNLACSHCGSRAGDSRVNELSTTEA
LDLVHQMADLGIKEVSLIGGEAFMRPDWLMIAAEITRLGMKASMTTGGFGISEGTAKRMKQAGISTVSVSID
GLEKEHDLLRGKVGAWKQCFLTIERLTNVGINVGCNTQINRYSAKQLPLLYQKLVDVGARAWQLQLTVPMG
NAADNDEMLLQPNELLDVFPLINFLSVRGRRDGLAVQAGNNIGYFGPYERQLRDNKSTHSEWAFYRGCGAG
QNTLGIEADGSIKGCPSLPTNAYTGGNIRERSLRDIYENTDELRFNDINKPEDATKHLWGECATCEFAKVCRGG
CNWTSHVFFGKRGNNPYCHHRAVKMAVRGKQERFFIREKASGDPFDHGVFDLVVEDFKPLDPQDTSVFSLA
QAQFPENWLEADPNLVRRLFTERGLVMKQYVDSGIVPKEESPWFDKTQREALMSSAVPA
SEQ ID NO: 87 (80% TIGR04103, >WP_054014533)
MTDLTNRSDIRINAAYRQTYAVWEITLKCNLACNHCGSRAGDARVDELSTSEALDLVAQMADLGIKEVTLIGG
EAFMRPDWLQIAAEITKKGMKATMTTGGYGISLGTAKRMKEAGIAAVSLSIDGMERSHDLLRGKQGAWQKC
FETIAHLREAGIPVGCNSQVNRESIAELPSLYEELLKAGISAWQLALTVPMGNAVENSHILLQPYELLDVFPLLAY
LSKRGNSEGIRVHMGNNIGYFGPYERLLKEPIASEAKWAFTRGCSAGQNAIGIEADGSIKGCPSLPSAEYTGGNI
RDRKLQDIYQNSPELRINDITTPEDATRHLWGECSSCEFASVCRAGCHWTAHVFFGKRGNNPYCHHRALKKA
AKNQRERFYVKQAAPGKPFDHGVFAIHDELCIVNEDSGQFRIDDMAIPDRWQEDGLDLLALIREEKASAIESYR
SLVN
SEQ ID NO: 88 (80% TIGR04103, >WP_055410774)
MRGRRRTYAVWELTLACNLACGHCGSRAGARRPAELSTAQALDVVAQLDAVGIDEVTLIGGEAFLRRDWLTI
AAEITRRGMGCTVTTGGYRLSAAMARGLRAAGVTQCSVSVDGMTATHDRLRGRVGSWESCFRTMERLRSA
GVEATCNTQINRLTAPELPRLYQRLRSAGVVAWQWQLTVPMGNAADHADLLLQPVELLEVFPVLARIARRAS
QDGVRIHAGNNVGYYGPYERLLRSPEGSAFWTGCQAGLSTLGIESDGTIKGCPSLPTRDYAGGNILDRSLTDLL
RDAPELGINLTAGTAAAAENLWGFCRGCTYADVCRGGCTWTAHTFFGRPGNNPYCHHRALTHQRAGRRER
LVQTAPAPGEPFDHGLFSLLEEDLDTPWPSTERPCLTAADIRWPADWVD
SEQ ID NO: 89 (80% TIGR04103, >WP_047858470)
MPPPAVQTRTAYAVWELTLKCNLACGHCGSRAGDSRKNELSREEALDLVRQLAEVGIQEVTIEGGEAFLRPD
WLDIARAITDHGMLCTMTTGGYGLSRETARRMKEAGIAHVSVSVDGLEATHDRIRGRKGSFRFCFETLGHFRE
VGLPFSSNTQVNRLSAPELPALYERLRDAGIRAWQVQLTGPMGNGTDNAWMLLQPAELPDLYRMLARVAL
RVREESRLSLVPGNDVGYFGPYDDLLFSSSGAKVWAGCKAGLSVLGIHADGGIKACPTLPSEFVGGNIRQQPL
ADILETRELTFNVDAGTPEGIAHLWGHCASCRYAEACRGGCSQRAHVLFNRRGNNPYCHHRSLRLAESGVRE
RVVRAAPGTGLPFDHGVFELVEEPLESPWPSDDPHHFTYERVEWPPGWEAFPLPGV
SEQ ID NO: 90 (80% TIGR04103, >WP_002708735)
MSTQLRQRRTYAVWEITLKCNLACQHCGSRAGEARQDELSTAEALDLVQQMAEAGIGEVTLIGGEAFLRKD
WLEIAAEINRCGMICTLTTGGYGISAELARRIKQAGLASVSVSIDGMEASHDAQRGKAGSWKFAFESLQHLRH
AGVPITANSQANRLSAPEFPLFYEKLVEVGVGGWQIAMTVPMGNAADNSWLLLQPAELLVLHPMLAYLARR
GRREGLIMQPGNNVGYYGPYEKLLRSYGSDNDWAFWRGCKAGLALIGIEADGTIKGCPSLPTNAYAGGNIRR
HSLRDIVLNAEKMQINMSTGTEQGTDHMWGFCKSCEYAELCRGGCSWTSHVFFDKRGNNPYCHYRSLVHA
AHGIREDLRIKRNAFGLPFDNGEFEISEKALGAAWSGNEEQRLTPDRIQWPEQWLQEDSELKGFIQNEIDHNI
GNMRNYLGLTRKHKLAV
SEQ ID NO: 91 (80% TIGR04103, >SEA53645)
MTDTQTTTQYTPGERSCYAVWEITLKCNLACSHCGSRAGDARVNELSTAEALDLVHQMADVGITEVTLIGGE
AFLRSDWLEIAAEIVKCGMICSMTTGGYGINLTTAQRMKAAGINQVSVSVDGMRHTHDRLRGKIGSWKYAFE
TMGHLREAGIPFGANTQVNRHSAPEFPLLFQALIDAGAKAWQIQMTVPMGNAADNSDILLQPDELLLFHPLL
ANLAKRGYPQGFYVQPGNNYGYYGPYDRMLRGFGKPTEWSFWQGCFAGLRTIGIEADGTIKGCPSLPTAAYS
GGTIRDASLATILTERDELTFNLSAGTPAATDHLWGFCKTCDFAELCRGACNWTAHVFFNRRGNNPYCYHRSL
VNAANGVRERFALRKAASGLPFDNGIFDLFAEDALSTAAADPMRFTSDKIQWPQAWLAENPKLSSALQHEVE
QNVRDMRNARLSAQI
SEQ ID NO: 92 (80% TIGR04103, >WP_012628459)
MSTQEDYYRTRYAVWELTLKCNLACQHCGSRAGQPRTQELTTAEALDLVQQLAQIGIREVTLIGGEAFLRPD
WLEIAAAIARAGMICNLTTGGYGLSLQLAQAMQRAGIAAVSVSIDGLETTHDRLRGKQGAWHSAFRTMQHL
RQVGVPFACNTQINRLSAPELPRIYEQIRDAGVYAWQIQLTVPMGHAADHWEILLQPCELLDLFPLLAQIAQW
AAQEGVRLYPGNNVGYYGPYESLLRGGGHPGAVWQGCGAGLNTLGIEADGTIKACPSLPTSAYAGGNIRDQ
PLASM MAQSEALRFNFNAGLPEGTAHLWGFCQ.TCEFAALCRGGCNWTAHVFFGRRG NPYCHHRALN LA
RQGLRERLALNIPAPGLPFDHGQFLLFQEPLNAPWPEPDPLYFTADQVQWSSSWTEQPVKVC
SEQ ID NO: 93 (80% TIGR04103, >WP_019501033)
MSYSRKSYAVWEITLNCNLACQHCGSRAGHARNAELTTSEALDLVRQMSDIGITEVTIIGGEAFMRPDWLEIA
QVISQAGMVCSMTTGGFGISLDMARRMKDAGISAVSISVDGLEGTHDRLRGRVGSWASCFRTMGHFREVG
LFFGCNTQINRYSAPELPLVYEKILEAGAKAWQIQLTVPMGNAVDNSAMLLQPYELLDLYPMLAEIAKKAKND
GLSLQPGNNIGYYGPYERLLRGRGDAWGFWQGCAAGLSTLGIEADGAIKGCPSLPTNAYTGGNVRDRTLREI
VENSAELRFNIGADTDRLWGFCETCEFAKLCRGGCTWTSHVFFDRRGNNPYCHHRALTQASRGIRERVVLKT
KAPGLPFDNGMFEMVAEPTSTELNSDDPLAFSSDRIQWPTNWHQEEVMTNS
SEQ ID NO: 94 (80% TIGR04103, >WP_019503880)
MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEEAFNLVRQLADVGIKEVTLIGGEAFMRSDWLE
IAKAVTEAGMICGMTTGGFGVSLETARKMKEAGIKTVSVSIDGGIPETHDRQRGKKGAWHSAFRTMSHLKEV
GIYFGCNTQINRLSASEFPIIYERIRDAGARAWQIQLTVPMGNAADNADMLLQPYELLDIYPMLARVAKRAKQ
EGVRIQAGNNIGYYGPYERLLRGSDEWTFWQGCGAGLNTLGIEADGKIKGCPSLPTAAYTGGNIRDRPLREIV
EQTEELKFNLKAGTEQGTDHMWGFCKTCEFAELCRGGCSWTAHVFFDRRGNNPYCHHRALKQAQKDIRERF
YLKVKAKGNPFDNGEFVIIEEPFNAPLPENDLLHFNSDHIQWPENWQNSESAYALAK
SEQ ID NO: 95 (80% TIGR04103, >WP_015143117)
MTYRRTSYAVWEITLKCNLACSHCGSRAGHTRAKELSTQEALDLVRQMADVGIIEVTLIGGEAFLRPDWLQIA
EAITKAGMLCSMTTGGYGISLETARKMKAAGIASVSVSIDGLEETHDRLRGRKGSWQAAFKTMSHLREVGIFF
GCNTQINRLSAPEFPLIYERIRDAGARAWQIQLTVPMGRAADNANILLQPYELLDLYPMIARVARRARQEGVQ
IQPGNNIGYYGPYERLLRGRGSDSEWAFWQGCAAGLSTLGIEADGAIKGCPSLPTSAYTGGNIREHSLREIVEES
EQLRFNLGAGTSQGTAHLWGFCQTCEFSELCRGGCTWTAHVFFNRRGNNPYCHHRALFQAEQGIRERVVPK
VEAQGLPFDNGEFELIEEPIDAPLPENDPLHFTSDLVQWSASWQEESESIGAVVD
SEQ ID NO: 96 (80% TIGR04103, >WP_017748600)
MTYHRKSYAVWEITLKCNLACSHCGSRAGNARSEELSTEEALDLVRQMAEVGIKEVTLIGGEAFLRPDWLEIAK
AISEAGMLCGMTTGGYGISLETAGKMKAAGIRTVSVSIDGLEETHDRLRGRKGSWKYAFKTMSHLREVGIAFG
CNTQINRLSAPEFPCIYECIRDAGARAWQIQLTVPMGNAADNADILLQPHELLDIYPMLARVARRAYQEGVQL
QAGNNIGYYGPDERLLRGRGSEHEFSFWQGCGAGLSTLGIEADGAIKGCPSLPTTAYTGGNIRERSLYDIIENSA
ELRLNLGAGTPEGTKHLWGFCKTCEYAELCRGGCSWTAHVFFDRRGNNPYCHHRALVHEERGLRERVIPKVR
AQGLPFDNGEFELIVEPINTPLPENDPLNFSADRIQWSESWQNKPEVSYSLAEQ
SEQ ID NO: 97 (80% TIGR04103, >WP_046277258)
MAYRRTSYAVWEITLKCNLACSHCGSRAGQAREQELSTHEALDLVQQMAEVGITEVTLIGGEAFLRPDWLQI
AEAINRAGMRCTMTTGGYGISLETAEKMHRAGIATVSISVDGLEATHDRLRGRPGSWQWAFKTMGHLKQV
GIPFGCNTQINRLSAPEFPRIYEKIRDAGARAWQIQLTVPMGNAADNSEILLQPCELLAVYPMLARVAQQAKK
DGVQIQPGNNIGYYGPYERLLRGRGQEDDWTFWQGCNAGLSTLGIEADGAIKGCPSLPTKAYTGGNIRERPL
REIIEATEELRFNLNAGTAEGMAHLWGFCQTCEYAELCRGGCTWTAHVFFNRRGNNPYCHHRALAHADQGK
RERVVPKVQAVGLPFDNGEFELVEEPLNAPWVESDPLHFTADQIQWPEHWHEQPSKLLSHK
SEQ ID NO: 98 (80% TIGR04103, >WP_017740073)
MSKEYQRISYAVWEITLKCNLACSHCGSRAGQARTKELSTEEALDLVQQLAEVGIKEVTLIGGEAFLRPDWLEIA
KAITQAGMLCGMTTGGYGISLDMARRMKEAGISMVSVSTDGMEATHDHLRGRKGSWKSGLRTMGYLKEV
GIPFGCNTQINRLSAPEFPLIYEHIRDAGACAWQIQLTVPMGNAADNADILLQPSELLDIYPMLARVTQRANRE
GVKVRAGNNIGYYGPYERLLRGGGNEWTFWQGCGAGLSTLGLEADGAIKGCPSLPTAAYTGGNIRERSLREII
EQTEELRFNLGADTPQGTEHLWGFCKTCKFAELCRGGCTWTAHVFFNRRGNNPYCHHRALEQAKHGIRERV
YPKVRAQGLPFDNGEFALIEEPLDAPWPLDDPLHFTADRIQWSNSWQEEPEYTYSLAR
SEQ ID NO: 99 (80% TIGR04103, >WP_070390932)
MTYRRISYAVWEITLKCNLACQHCGSRAGHTRAKELSTTEALDMVRQLADVGITEVTLIGGEAFLRPDWLDIA
KAITSAGMVCGMTTGGFGISLDTARRMKDAGIRVVSVSVDGLEATHDRLRGRKGSWQWAFKTMSHLKEAGI
PFGCNTQINRLSAPEFPQIYERIRDAGVFAWQIQLTVPMGNAADNSEILLQPYELLDVYPMIARVARRAFREG
VKVQAGNNIGYYGPYERLLRGRGEDNPWAFWQGCNAGLSSLGIEADGAIKGCPSLPTSAYTGGNIRDHSLREI
IEETEELRFNLGADTPKGTDHLWGFCKSCEFAQLCRGGCSWTAHVFFDRRGNNPYCHHRALTQEKQGIRERV
EIKHRAEGNPFDNGEFVLIEEPIDAPWPDNDPLHFSADRILWPKGWQEQEELVPSLASSSS
SEQ ID NO: 100 (80% TIGR04103, >WP_013652855)
MDETVRLARLYDGDPAKLLRTWAGDSPPRIVVWELTLACDLGCRHCGSRAGKARRDELDTQEALDVVRQLA
DLGVAEVILIGGEVYLRDDWFLIAAAVTQAGMTCSLVTGGRGFDAGVVDEALAAGVRIVGVSIDGLPATHDRL
RGVPGSYEAAIATARRIAATGRLTLSVNTQINRLSLPELRAVAERVVELGAVAWQIQLTVALGRAADRPDLLLQ
PWHLLELFPQLVAIKKEILEPGGVQLFPGNNIGYFGPFEAELRYGGDAGHTWMGCGAGRAALGLEADGKLKG
CPSLPTVPYTGGNVRDTPIAELWAHAPEISALGRRTTDDLWGFCGTCRHAAVCKAGCTWTAHALFGRPGNN
PYCHHRAWSLAQTGLRERVVLVEKAPGRPFDHGRYEIVVEPLDAETPDEERLAPVPRARAAALFGLRADAPSA
WSGEDLVEATRSARR
SEQ ID NO: 101 (80% TIGR04103, >WP_068184797)
MPHPPTYDGNPQQLARWRPEHDAPPSIAVWEITLRCDLGCCHCGSRAARARPDELSTTEALDLVRQFADLGL
KEVTLIGGEFYMRDDWDRIAAEINRCGMLCSIVTGARQMTAERVSRAVAAGVGKISISIDGLERTHDAIRGSK
GSWKAATAAARRISDSGIDLSVNTQMNRLTMPELPAVADMLVDIGARSWMVILTAAMGRAADHPSLLLQP
YHLLYLFPLLADIKREKLDPNGIAFFPGNNVGYFGPLAETLRYGSELGHMWGGCGAGDSTLGIEADGRIKGCPS
LPTSDYVRGNIRERPLREIAAELKREKTEAPTQLWGFCQSCRYAARCKGGCTWTSHVLFGRPGNNPFCHFRAL
TMAEAGLVERLEPVAVAPGKPFDFGHCRIVEAPFGPDIESDPLIGMTKLSQVFGLNAGAAGLWSKQELSNTLE
YRQSDKT
SEQ ID NO: 102 (80% TIGR04103, >WP_015929579)
MHRPSDEPTYDGDPRSLARWRPGGSAPPSHAVWEITLRCDLGCRHCGSRAGRARRDELSTDAALDVVAQLA
DLGLREVTLIGGEFYLREDWDRIAAAITRRGMLCSIVTGARQMTRARIARAVAAGVGKISLSIDGLEQTHDSVR
GSAGSWQAAVTAGRRIASSGIDLSVNTQINRLTMPELPGVADLLVEIGARSWMVILTAAMGRAADRRALML
QPYHLLHLFPLLAAIKRERLDPAGIAFFPANNIGYFGPLAETLRYGAEGGHAWAGCDAGVASLGIEADGRLKGC
PSLPSADYTMGNVRDHSLAQLWAKRTPNRPIAAAEDLWGFCWTCPHATRCRGGCTWTSHVLFGRRGNNPF
CHYRALALAERGFAEAIEPVSVAPGEPFDFGRHRIVELPLPTALTDDPVIERTLASHVFGLRPGTASVWSPDERE
EAVI
SEQ ID NO: 103 (80% TIGR04103, >SFE54945)
MSLRDNRRRLPVVASLPRPDDRRALTRVAGEAPRPRYAVWELTLKCDQKCIHCGSRAGVARHGELTTAEALA
LVTDLRALGIGEITLIGGEAYLRDDFILIARAIRNAGMDCTMTTGGLNLGEARVAALAEAGIRSVSVSIDGTQAA
HDALRGVPGSFDRAFAALARLRAAGVGRAVNTQINRLTLPTLEALQERLIAEKIGGWQLQITAPFGNAADHPEI
LLQPFMMLEVFAVIERLIARGAPHGLRLFPANNLGYFGPLEAELRGRQKAGGHYKGCIAGRHALGIEADGTIKG
CPSLGGPANVAGNVRERPLREIWEHAPELQFTRVRTVDDLWGYCRDCYYNDVCMAGCSATSEPLLGRPGNN
PYCHHRALELGRAGLRERIEAVAPAPGVPFDHGLYRVVREALDPELAARGPVAVDDPRVGRDVQPFGPGAPV
G
SEQ ID NO: 104 (80% TIGR04103, >SFE62827)
MGIREERRRLPIVALAPPTRSRRALPIAAGQAPVPRIVVWEFTSACDQHCAHCGPRSGKRRPDELTTEEALRLV
DELAAAGVGEVTLIGGEAYLRPDVLRIVRAIRERGMSCTMTTGGYSLTREIAEALVEAGVQSVSVSIDGLAACH
DALRGRPNSFARAFAALRHLKAAGSQISANTQLNAKTLPDLEGLLELLAAEGIHSWQVQVTMAHGAAADHPE
ILLQPYQMIAAYEVVERLLARCEALGIRLYPGNSLGYFGPLEHRLRRNSTQRGHYFGCQAGISGAAVSSHGEVK
SCPSLGEEGVGGSWREHGFAALWERAPEIVYMRQRTRAELWGLCASCYYAAVCMGGCTSMSEPLLGRPGN
NPMCHHRALELDRQGLRERIEPVRPAPGQPFDHGLFRLILEHKDPELRALHGPLTIEEPRRSRVDEPRGPGSPL
A
SEQ ID NO: 105 (80% TIGR04103, >SFE67100)
MLPPPRGAVQRPALAVWEFTRACDQRCKACGPRAGVARPDELTTDEALRLVDELAELGVGEVALIGGEAYLR
ADVLWVIRRIRERGMSCSLTTGGLGLTQTRAEALVEAGLQLVSVSIDGLEASHDALRGTPGGWRRCFEALAHS
RRAGARIAANTQINRLTWRELLPLCDLLADAGAEVWQMFLTMPHGNAADHPELLLQPFELLELFPELERVIAR
CAARRIRFWPGNNLGYFGPLEGKLRRLQQEDGHYKGCSAGRTGLGIEADGTIKSCPSVGGAVNAGGNWRDH
GLRALWERAPEIRYVEQRGLDSLWGYCRECYYADTCMGGCTAMSEPLLGRPGNNPYCHHRALEMDRMGLR
ERVEQVAGADDQAFAHGLFRWREPKAAGAGPVTIEGPRTGREAAFFGPGAPLAVAGADEP
SEQ ID NO: 106 (80% TIGR04103, >SFF23105)
MSLSDVRRRLPVVASLPAPANRWLTHEDRREAKAPRWAVWELTLACDQHCAHCGPRAGHKRPDELSTEECL
KVVRELAELGCGEVVLIGGEAYLRNDFILIIRAIREAGMACTMTTGGLNLTQERAEAMIEAGIGSVTFSIDGLEAT
HDRLRGVQGSWQRAFAAMRRIRAAGGKIASNTQINALTRHELLPLFELLADEGIHSWQLQITVPHGNAADHP
EILLQPHMFIDIFATLEQVLDRCEARKVRLWPGNNLGYFGPLERRLRQSQRKHWRGCTAGVSVMGIESDGAIK
NCPSLGGGTNIGGNWRVHGVKKVWEESYQLGYIRARTVDDLWGYCRECYYAETCMAGCTAAAEPLLGRPG
NNPFCHHRALMMDRAGLRERIEQIRGAGGKSFDNGLFRVIREHVDPELREKHGPVAIEEPRVSRLEEPYGAGH
TVAL
SEQ ID NO: 107 (80% TIGR04103, >WP_006972642)
MRLKEVRKRLPVVDSLPKGRGRRFRTHEAEGPVPRPALAVWEFTLACDHRCLHCGPRAGEARPNELTTDEAL
QLVDELAEAGVGEVVLIGGEAYLRNDFLLVIRRIRERGMTCTMTTGGLGLTKTRAEAMVEAGIQSVSVSIDGLE
AAHDKLRNRPGSWEHAFEALRNLRNAGSRVAVNSQINQINLGDHIHLLELIADEGVHSWQLQITVAHGNAA
DNADIILQPYMFLELFDQLDAIIDRAFERRVRIWPANNLGYFGPFEHKLRKSQKAHYRGCSAGRSTIGIESDGNI
KNCPSLGGPANIGGSWREHGLAKIWKEAAEITYIRRRTVDDLWGYCRECYYAETCMSGCTAANEPLLGRPGN
NPFCHHRALEMDRMGMRERIEPFIPAKGVPFDNGLFRLIREWKDPARREAEGPVEVTEPRVSRLIDEMGSGR
AIRMDELVDGRAPFELKDH
SEQ ID NO: 108 (80% TIGR04103, >KIG15048)
MRLKDARKRLPVVTSPLPKGRGRKFMTNEDAPRPALAVWEFTLACDHRCLHCGPRAGEPRADELSTDEALRL
VDDLAEAGVGEVVLIGGEAYLRNDFLLVIRRIRERGMTCTMTTGGLGVTKTRAEAMVEAGIQSVSVSIDGLEP
AHDRLRNREGSWKRAFEALANLRAAGAKVSVNSQINQVNFGDHEPLLELIAKAGAHSWQLQITVAHGNAAD
NADIILQPYRFLELFEQLDRIADRAHELKVRIWPANNLGYFGPFEAKLRRYQKLHYRGCSAGKSTIGIESNGMLK
NCPSLGGPANVAGSWREHGFDPIWQGAPEMTYIRRRTIDDLWGYCRECYYATTCMSGCTAANEPLLGRPG
NNPFCHHRAIEMDRMDMRERIEPVAAAQGVPFDNGLFRIIREHKDPARREAEGPIEITEPRVSRLVEEMGSGR
PIRADELPDGRVPFEK
SEQ ID NO: 109 (80% TIGR04103, >WP_053236092)
MTHGSIRDPRVTPIGETGLRYVVWELTLRCDLACRHCGSRAGKAREDELSTDEALDVVRQLASMGAREVVLIG
GEAYLRDDWTIVARAIADAGMRCAMVSGGRGLDATRARAAREAGVASVSISIDGIGATHDVQRGLDGAFES
ARVAMRNLRDAGVTLQANTQVNRLSYPELDAILDLLVEERATGWQLAMTVPMGRAADRPDWLLQPHELLE
VYPKLAALAERGARHGVLFFPGNNIGYFGPHEATLRGRGITDDVAWGGCIAGKHAMGIESDGSIKGCPSLPSA
DWVGGTAREASLREIWEQTRELRYVRDRELPGALWGECARCYYASVCGGGCTWTAHTFFGRPGNNPYCHH
RALEMRARGERERLVRVEAAPGAPFDHGRWEIVVEPWVEGEGVARVERPSKRLRVL
SEQ ID NO: 110 (80% TIGR04103, >WP_006969608)
MSTSSDSPARRGPSLPVLDGRGRSDGKLRLPLAEQTRECDTLARPEYAVWELTLRCDLACRHCGSRAGKARPD
ELSTEEALELVTQMAEMGVQETTVIGGEAYLRADWHRIARALTDAGISTTMTTGGRGLDPERVALAKAAGIQ
SVSVSIDGLEAEHDYQRNLKGSYAAAMAALDNLAAAGIPRSVNTQLNGSNLRDIEALLEVIATKGIHSWQIQIT
VAMGRAADHPELLLQPWQMIELMPMAARIARRCRELGIRLWPGSNVGYFGPYEALLRWDHPDGHQTGCE
AGTRTLGIEANGDIKGCPSLPTADYVGANVRDHSLRAIWERSSALRFNRERGTEELWGRCASCYYAPICKAGCT
WTGHVLFGRRGNNPYCHHRALELLREGRRERVELREAASGDPFDHAIYELIEEPWPEPELSRARAVAESGEG
WIR
SEQ ID NO: 111 (80% TIGR04103, >KIG11737)
MSRPSLPIVDASRPKPRVRLPLAPGVRSCDDVRPEYAVWEVTLRCDLACRHCGSRAGHARADELDTEEALDLV
TQMAALGVKETTIIGGEAYLRDDWHLIAAALVNAGIRCTMTTGGRGLTAERVEIAKRAGIESVSVSIDGLAQAH
DHLRALHGSHAAAMRALDHLRAAGIPRSVNTQLNGYNLREIEPLLDQLTAREIHSWQVQITVAMGRAADHPE
LLLQPWQMLELMPLVARLARRCDELGIRLWPGSNIGYFGPYEQLLRWDHRDGHQTGCDAGTRTLGIEANGD
IKGCPSLPSNEYVGGNVREHSLREIWERADALRFNRERRVDELWGRCAGCYYADECKAGCTWTGHVLFGRR
GNNPYCHHRALELLREGRRERLELHTPAPGEPFDHGLYRVIEEPWPAELIDRAREVAASGVGWIS
SEQ ID NO: 112 (80% TIGR04103, >AKV02060)
MELVACSDPTVSIRDPLLAAERAELLRAKRPRVGLPTISKPRRPLPVLSEPRQRDRSIRPRHAVWEITLRCDQAC
RHCGSRAGVERPNELTTEECLDLVRQIAELGVMEVTLIGGEAYLRPDFVQIVRAIRSHGMHCTMTTGGRGLSP
TLAREAAAAGLGSASVSIDGAEETHDRLRGAKGSHRDAIAAMRALREAGVRLTVNTQINRLSLVDLPSILEMLV
REGAEAWQIMLTVAMGRAADEPDVLLQPYDLLDLFPLLDTLAARCEEHGVRLYPGNNLGYFGPYESRLRGTLP
RGHGTSCSAGRGTLGIESDGLVKGCPSLPSEQWGGGTVRDHSLVDLWERASALRYTRDRTVEDLTGFCRTCY
YADICRAGCTWTTSVLFGRPGDNPYCHHRALERDREGLRERLVRRQPAPGEPFDHGIFELIVEPVPGEKEANS
SEQ ID NO: 113 (80% TIGR04103, >AKU99181)
MEKFDPLTAKARAKELQRERPKRALPIAPTAPLGVRHRPLREPRDVDRRYRPIYAVWEITLACDLACRHCGSRA
GRERPDELDTKEALDLVGQMASLGVKEVTLIGGEAYLRGDWLDIVRAIRAHGMVATMTSGGRGLTPELVAQ
AHEAGLVGASISLDGDEVTHDRLRGVKGSYRAAIEALRALRERNMRVSCNSQINRLSVPYLDFILESIAAIGVHS
WQIQLTVPMGRAADEPDVLLQPYDLLELFPRLAELKKRCDELAVRMLPGNNIGYFGPYESTLRGYHVSGHAGS
CGAGRATLGIEANGAIKGCPSLPTEHWTGGNVRDASLLDIWERAEPLRYTRDRTVDDLWGFCRTCYYAEECAS
GCTWTSFVTLGKAGNNPYCHHRALELEKRGKRERVVRVQSAPGEPFDHGVFALVEEDLESGSETL
SEQ ID NO: 114 (80% TIGR04103, >AKF04999)
MSAAERDALLRAPRPRSLPLAPMAEPARRRLPLADDARAIDRRVRPIYAVWEITLACDLACRHCGSRAGRDRP
DELSTEQCLDLVDQMAELGVKEVSLIGGEAYLRDDWTDIIRRIRARGMMAILTTGGRGITPERARDAAAAGLQ
SASVSIDGDEATHDRLRGVVGSYRAALEAMKNLRAAGVQVSANTQINRLSAPDLPSVLETIADHGAHSWQIQ
LTVAMGRAADEPEVLLQPYDLLEVFPMLARLAERCRERGVRLWPGNNVGYFGPYEHALRGTLPKGHMYSCG
AGRSTLGVEADGSIKGCPSLPTLSWTGGNIRDAKLVDIWERGGTAMRYTRDRTVDDLWGYCRTCYYADECRA
GCTWTGFVLFGRAGNNPYCHHRALEMQRAGKRERVVKVESAPGQPFDHARFELIVEDVSGEETRDAETV
SEQ ID NO: 115 (PlpA2 gene, Gene ID: 2509711952)
ATGTCTATCGAAAACGCAAAATCATTTTATGAACGAGTATCTACGGATAAGCAATTTCGTACTCAACTCGA
AAATACAGCTTCAGCAGAAGAACGTCAGAAAATTATCCAAGCTGCTGGTTTTGAATTTACAAATCAAGAA
TGGGAAATTGCTAAAGAGCAGATTTTAGCAACATCTGAATCTAATAATGGTGAATTAAGTGAAGCAGAAT
TAACGGCGGTAAGTGGTGGAGTTGATTTATCTATTTTTGAGTTATTAGATGAAGAACCATTATTTCCTATC
AGACCTCTGTATGGTTTACCAATATAG
SEQ ID NO: 116 (PlpA2 protein)
MSIENAKSFYERVSTDKQFRTQLENTASAEERQKIIQAAGFEFTNQEWEIAKEQILATSESNNGELSEAELTAVS
GGVDLSIFELLDEEPLFPIRPLYGLPI
SEQ ID NO: 117 (PlpA3, Gene ID: 2509711953)
ATGTCTATTGAAAGTGCAAAAGCTTTTTACCAAAGAATGACCGATGATGCATCCTTTCGCACACCATTTGA
AGCAGAATTATCTAAAGAAGAACGCCAACAACTAATTAAAGACTCTGGCTATGATTTCACTGCCGAGGAA
TGGCAGCAAGCGATGACAGAAATTCAAGCTGCTAGGTCTAATGAGGAATTGAATGAAGAAGAACTTGAA
GCGATCGCAGGTGGTGCTGTAGCAGCAATGTATGGCGTAGTTTTTCCTTGGGATAATGAATTTCCTTGGC
CTAGGTGGGGGGGATAA
SEQ ID NO: 118 (PlpA3 protein)
MSIESAKAFYQRMTDDASFRTPFEAELSKEERQQLIKDSGYDFTAEEWQQAMTEIQAARSNEELNEEELEAIA
GGAVAAMYGVVFPWDNEFPWPRWGG
SEQ ID NO: 119 (PlpX gene, Gene I D: 2509711951)
ATGACTAAAAAATACAGACGAGTTAGTTATGCAGTTTGGGAAATTACCTTGAAATGCAATCTAGCTTGTA
GTCACTGTGGTTCGAGAGCAGGGCAGGCAAGAACCAAGGAACTATCTACAGAAGAAGCTTTTAATCTGG
TTCGGCAACTAGCCGATGTAGGAATCAAAGAGGTTACTCTAATCGGTGGCGAAGCCTTTATGCGCTCTGA
TTGGCTAGAAATTGCCAAGGCTGTTACTGAGGCAGGGATGATCTGCGGTATGACTACAGGTGGATTTGG
TGTCAGTTTGGAAACTGCCAGAAAAATGAAAGAAGCTGGAATTAAAACAGTTTCTGTATCTATCGATGGT
GGCATACCAGAAACCCACGATCGCCAGCGAGGGAAAAAAGGTGCTTGGCATTCTGCTTTTAGAACCATG
AGCCATCTAAAAGAAGTCGGCATCTATTTTGGCTGCAATACCCAGATAAACCGTTTATCTGCCTCTGAATT
CCCAATAATTTACGAACGAATAAGGGACGCTGGAGCAAGAGCTTGGCAGATTCAATTAACTGTACCTATG
GGTAATGCGGCAGATAATGCAGACATGTTATTGCAACCATACGAACTATTAGATATTTATCCCATGTTAGC
TCGTGTTGCTAAACGAGCTAAACAGGAAGGTGTTCGCATACAGGCGGGAAATAATATTGGCTATTATGGC
CCTTATGAAAGACTGCTGCGTGGTAGTGATGAATGGACATTTTGGCAGGGTTGCGGAGCGGGTTTAAAT
ACCTTGGGTATCGAAGCTGATGGCAAAATTAAAGGTTGTCCTTCTTTACCTACGGCTGCTTATACGGGCG
GTAATATCCGCGATCGCCCTTTAAGAGAAATAGTCGAACAGACTGAAGAGCTTAAATTTAATCTGAAGGC
TGGGACTGAACAGGGCACAGACCACATGTGGGGATTTTGTAAAACCTGTGAATTTGCTGAACTCTGTCGA
GGTGGTTGTTCTTGGACGGCTCATGTCTTCTTTGATCGCCGTGGGAATAATCCCTACTGCCATCATCGTGC
TTTGAAACAGGCACAAAAAGACATCAGAGAAAGATTCTATTTAAAAGTAAAAGCAAAAGGGAATCCTTTT
GATAATGGGGAATTTGTCATTATAGAAGAACCTTTCAACGCACCTTTGCCAGAGAACGATTTGCTTCATTT
TAATAGCGATCACATTCAGTGGCCAGAAAACTGGCAAAATTCTGAATCTGCTTACGCTTTAGCAAAGTAA
SEQ ID NO: 120 (PlpX protein)
MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEEAFNLVRQLADVGIKEVTLIGGEAFMRSDWLE
IAKAVTEAGMICGMTTGGFGVSLETARKMKEAGIKTVSVSIDGGIPETHDRQRGKKGAWHSAFRTMSHLKEV
GIYFGCNTQINRLSASEFPIIYERIRDAGARAWQIQLTVPMGNAADNADMLLQPYELLDIYPMLARVAKRAKQ
EGVRIQAGNNIGYYGPYERLLRGSDEWTFWQGCGAGLNTLGIEADGKIKGCPSLPTAAYTGGNIRDRPLREIV
EQTEELKFNLKAGTEQGTDHMWGFCKTCEFAELCRGGCSWTAHVFFDRRGNNPYCHHRALKQAQKDIRERF
YLKVKAKGNPFDNGEFVIIEEPFNAPLPENDLLHFNSDHIQWPENWQNSESAYALAK
SEQ ID NO: 121 (PlpY gene, Gene ID: 2509711950)
TCATGTCAGAAAATTGCTAATTTCAGCAATTTCTAAACTTTGTTTTGCTTTAATTAATGCTTTGATAAAAGCT
TGCTTGTCTTGCCAACCCTGACGAGGTAAAACCGAGCTGGAATCATCTGATTTTTGAGCTGCTGTGGCTAC
TTTATTTGGTATTTGATTAGAGTTCAT
SEQ ID NO: 122 (PlpY protein)
MNSNQIPNKVATAAQKSDDSSSVLPRQGWQDKQAFIKALIKAKQSLEIAEISNFLT
SEQ ID NO: 123 (PlpA2 core peptide)
VDLSIFELLDEEPLFPIRPLYGLPI
SEQ ID NO: 124 (PlpA3 core peptide)
AVAAMYGVVFPWDNEFPWPRWGG
SEQ ID NO: 125 (PlpA3-9 core peptide)
AVAAMYGVV
SEQ ID NO: 126 (PcpA core peptide)
VTGGSG IYGPIQAMYGAVVGDPKPG KDWGWRFPSPLPKPSPI PSPWKPPV DVQPMYGWVSN DS
SEQ ID NO: 127 (PlpA3 core peptide)
AVAAMYGVVFPWDNEFPWPR
SEQ ID NO: 128 (PlpA3 digested)
AVAAMYGVVFPW
SEQ ID NO: 129 (PlpA3 mutated M5G)
AVAAGYGVVFPWDN EFPWPR
SEQ ID NO: 130 (PlpA3 mutated M5A)
AVAAAYGVVFPWDNEFPWPR
SEQ ID NO: 131 (PlpA3 mutated M5V)
AVAAVYGVVFPWDNEFPWPR
SEQ ID NO: 132 (PlpA3 mutated M5L)
AVAALYGVVFPWDNEFPWPR
SEQ ID NO: 133 (PlpA3 mutated M5S)
AVAASYGVVFPWDNEFPWPR
SEQ ID NO: 134 (PlpA3 mutated M5P)
AVAAPYGVVFPWDNEFPWPR
SEQ ID NO: 135 (PlpA3 mutated M5E)
AVAAEYGVVFPWDNEFPWPR
SEQ ID NO: 136 (PlpA3 mutated M5Q)
AVAAQYGVVFPWDNEFPWPR
SEQ ID NO: 137 (PlpA3 mutated M5F)
AVAAFYGVVFPWDNEFPWPR
SEQ ID NO: 138 (PlpA3 mutated M5K)
AVAAKYGVVFPWDNEFPWPR
SEQ ID NO: 139 (PlpA3 mutated 4A5)
AVAAAMYGVVFPWDNEFPWPR
SEQ ID NO: 140 (PlpA3 mutated 7A8)
AVAAMYGAVVFPWDNEFPWPR
SEQ ID NO: 141 (PlpA3 mutated A4d)
AVAMYGVVFPWDNEFPWPR
SEQ ID NO: 142 (PlpA3 mutated V8d)
AVAAMYGVFPWDNEFPWPR
SEQ ID NO: 143 (PlpA3 mutated Y6F)
AVAAMFGVVFPWDNEFPWPR
SEQ ID NO: 144 (PlpA3 mutated Y6W)
AVAAMWGVVFPWDNEFPWPR
SEQ ID NO: 145 (PlpX_F primer)
CTCGCATATGACTAAAAAATACAGACGAGTTAGTTAT
SEQ ID NO: 146 (PlpX_R primer)
ATCTCTCGAGTTACTTTGCTAAAGCGTAAGCAGA
SEQ ID NO: 147 (PlpY_F primer)
GCGAACTCATGAACTCTAATCAAATACCAAATAAA
SEQ ID NO: 148 (PlpY_R primer)
GCGCAGCTGTTATGTCAGAAAATTGCT
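The gene and protein pairs in the listing can be cross-checked by translation. The following is a minimal sketch, not part of the application: it assumes the standard genetic code (NCBI translation table 1), the helper names `translate` and `reverse_complement` are hypothetical, and the sequences are SEQ ID NOs: 117, 118, 121, 122 and 124 transcribed with extraction spaces removed. In this sketch, SEQ ID NO: 117 translates directly into SEQ ID NO: 118, whereas SEQ ID NO: 121 yields SEQ ID NO: 122 only after reverse complementation, which suggests the PlpY gene is listed as the antisense strand.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in the usual TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# SEQ ID NO: 117 (PlpA3 gene) and SEQ ID NO: 118 (PlpA3 protein)
PLPA3_GENE = (
    "ATGTCTATTGAAAGTGCAAAAGCTTTTTACCAAAGAATGACCGATGATGCATCCTTTCGCACACCATTTGA"
    "AGCAGAATTATCTAAAGAAGAACGCCAACAACTAATTAAAGACTCTGGCTATGATTTCACTGCCGAGGAA"
    "TGGCAGCAAGCGATGACAGAAATTCAAGCTGCTAGGTCTAATGAGGAATTGAATGAAGAAGAACTTGAA"
    "GCGATCGCAGGTGGTGCTGTAGCAGCAATGTATGGCGTAGTTTTTCCTTGGGATAATGAATTTCCTTGGC"
    "CTAGGTGGGGGGGATAA"
)
PLPA3_PROTEIN = ("MSIESAKAFYQRMTDDASFRTPFEAELSKEERQQLIKDSGYDFTAEEWQQAMTEIQAARSN"
                 "EELNEEELEAIAGGAVAAMYGVVFPWDNEFPWPRWGG")
PLPA3_CORE = "AVAAMYGVVFPWDNEFPWPRWGG"  # SEQ ID NO: 124

# SEQ ID NO: 121 (PlpY gene, as listed) and SEQ ID NO: 122 (PlpY protein)
PLPY_GENE_LISTED = (
    "TCATGTCAGAAAATTGCTAATTTCAGCAATTTCTAAACTTTGTTTTGCTTTAATTAATGCTTTGATAAAAGCT"
    "TGCTTGTCTTGCCAACCCTGACGAGGTAAAACCGAGCTGGAATCATCTGATTTTTGAGCTGCTGTGGCTAC"
    "TTTATTTGGTATTTGATTAGAGTTCAT"
)
PLPY_PROTEIN = "MNSNQIPNKVATAAQKSDDSSSVLPRQGWQDKQAFIKALIKAKQSLEIAEISNFLT"

# Gene encodes the listed protein followed by a stop codon.
print(translate(PLPA3_GENE) == PLPA3_PROTEIN + "*")
# The listed PlpY strand must be reverse complemented before translation.
print(translate(reverse_complement(PLPY_GENE_LISTED)) == PLPY_PROTEIN + "*")
# The PlpA3 core peptide is the C-terminal segment of the precursor.
print(PLPA3_PROTEIN.endswith(PLPA3_CORE))
```

The compact 64-letter string is simply the standard code table read in TCAG order, which avoids typing out every codon by hand.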
Claims
1. Use of a radical S-adenosyl methionine (rSAM) enzyme in a method for introducing at least one α-keto-β3-amino acid into (poly)peptide substrates comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine.
2. Use according to claim 1, wherein the rSAM enzyme is a nif11-class peptide radical SAM maturase 3.
3. Use according to claim 1 or 2, wherein the rSAM enzyme comprises
(A) an amino acid sequence according to Formula (I) (SEQ ID NO: 1) or (II) (SEQ ID NO : 2)
Formula (I): X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20,
Formula (II): Z1-Z2-Z3-Z4-Z5-Z6-Z7-Z8-Z9-Z10-Z11-Z12-Z13-Z14-Z15-Z16-Z17-Z18-Z19-Z20, wherein X1-X20 and Z1-Z20 each denote amino acids and
X1 is selected from the group consisting of Y, H, F and W, preferably Y and H;
X2 is selected from the group consisting of Y, R and H;
X3 is selected from the group consisting of R, K and Q, preferably R;
X4 is selected from the group consisting of I, T and V, preferably I and T;
X5 is selected from the group consisting of R, S and K, preferably R and S;
X6 is selected from the group consisting of H, Y, F and W, preferably H and Y;
X7 is selected from the group consisting of A and S, preferably A;
X8 is selected from the group consisting of V, I and L, preferably V;
X9 is selected from the group consisting of W, Y and F, preferably W;
X10 is selected from the group consisting of E, Q, D and K, preferably E;
X11 is selected from the group consisting of I, L, V and M, preferably I and L;
X12 is selected from the group consisting of T and S, preferably T;
X13 is selected from the group consisting of L, M, I and V, preferably L;
X14 is selected from the group consisting of K, R, E and Q, preferably K;
X15 is C;
X16 is selected from the group consisting of N and D, preferably N;
X17 is selected from the group consisting of L, M, I and V, preferably L;
X18 is selected from the group consisting of A and S, preferably A;
X20 is selected from the group consisting of S, Q, E and K, preferably S and Q;
Z1 is selected from the group consisting of T, D, T, E and N, preferably T and D;
Z2 is selected from the group consisting of R, P, N and A, preferably R, P and A;
Z3 is selected from the group consisting of R, Q, K and L, preferably R;
Z4 is P;
Z5 is selected from the group consisting of A and S, preferably A;
Z6 is selected from the group consisting of R, K and Q, preferably R;
Z7 is selected from the group consisting of Y, F, H and W, preferably Y;
Z8 is selected from the group consisting of L, M, I and V, preferably L;
Z9 is selected from the group consisting of F, H, S and Y, preferably H, F and S;
Z10 is selected from the group consisting of D, E and A, preferably D and E;
Z11 is selected from the group consisting of D, S and T, preferably T;
Z12 is selected from the group consisting of D, E and N, preferably D;
Z13 is selected from the group consisting of Y, F, L and M, preferably Y, F and L;
Z14 is selected from the group consisting of K, Q, R and E, preferably Q and K;
Z15 is selected from the group consisting of R, K and Q, preferably R;
Z16 is selected from the group consisting of Y, F and W, preferably Y and F;
Z17 is selected from the group consisting of V, I and L, preferably V;
Z19 is selected from the group consisting of V, I and L, preferably V; and
Z20 is selected from the group consisting of H and Y, preferably H;
or
(B) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with at least one of the amino acid sequences of Formula (I) or (II), more preferably an amino acid sequence having at least 14, 16, 18 or 19 of the 20 amino acids of Formula (I) or (II);
(C) a functional fragment and/or functional derivative of (A) or (B), preferably a functional fragment of at least 10 amino acids, more preferably at least 15 amino acids of (A) or (B).
4. Use according to any of claims 1 to 3, wherein the rSAM enzyme comprises at least one motif selected from the group consisting of
(i) motif CXXXCXXC (SEQ ID NO: 3);
(ii) motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4);
(iii) motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5); and
(iv) motif CX9-15GX4C (SEQ ID NO: 6) and CX2CX5CX3CX14-18C (SEQ ID NO: 7),
wherein X is any natural amino acid and wherein the integers denote the number of X(s).
5. Use according to any of claims 1 to 4, wherein the rSAM enzyme comprises
(i) an amino acid sequence according to Formula (I) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (I);
(ii) a motif EXTXXCXXXCXXCGXRXXXXRXXEL (SEQ ID NO: 4), wherein X is any natural amino acid;
and wherein the rSAM enzyme further comprises
(iii) a motif CXXGXXXXXXXXXGXXKXCP (SEQ ID NO: 8) and/or
GXCXXCXXXXXCXXXCXXXXXXXXXXXGXNPXCXXR (SEQ ID NO: 9), wherein X is any natural amino acid.
6. Use according to any of claims 1 to 4, wherein the rSAM enzyme comprises
(i) an amino acid sequence according to Formula (II) or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequence of Formula (II);
(ii) a motif CXLXCXHCGSRAGXXXXXE (SEQ ID NO: 5), wherein X is any natural amino acid; and wherein the rSAM enzyme further comprises
(iii) a motif CXAGXXXXXEADGXXKXCPXL (SEQ ID NO: 10) and/or
CXXCYYXXXCXXGCXWXXXXLXGXXGXNPXCXXR (SEQ ID NO: 11), wherein X is any natural amino acid.
7. Use according to any of claims 1 to 6, wherein the rSAM enzyme comprises an amino acid sequence selected from the group consisting of
(i) sequences listed in any of SEQ ID NOs: 12 to 54, preferably SEQ ID NOs: 39 and 40, or an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 12 to 54; and
(ii) sequences listed in any of SEQ ID NOs: 55 to 113, preferably SEQ ID NOs: 93 and 94, or an amino acid sequence having an amino acid sequence identity of at least 80 %, preferably at least 90 or 95 % with the amino acid sequences in any of SEQ ID NOs: 55 to 113.
8. Use of a recombinant vector comprising a nucleic acid encoding an rSAM enzyme as defined in any of claims 1 to 7 in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a viral or episomal vector, more preferably a vector selected from the group consisting of lentivirus vector, adenovirus vector, baculovirus vector, bacterial vector and yeast vector.
9. Use of a host cell expressing an rSAM enzyme as defined in any of claims 1 to 7, preferably comprising a recombinant vector as defined in claim 8, in a method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, preferably a host cell selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae cells, and Pichia pastoris cells; bacterial cells, preferably E. coli cells, and Bacillus subtilis cells; plant cells, preferably Nicotiana tabacum, and Physcomitrella patens; NIH-3T3 mammalian cells; and insect cells, preferably Sf9 insect cells.
10. Use of a radical S-adenosyl methionine (rSAM) enzyme according to any of claims 1 to 7, a vector and/or host cell according to any of claims 8 or 9 in combination with an rSAM-associated protein comprising an amino acid sequence selected from the group consisting of
(a) SEQ ID NO: 121,
(b) an amino acid sequence having an amino acid sequence identity of at least 70 or 80 %, preferably at least 90 or 95 % with SEQ ID NO: 121, and
(c) a functional fragment and/or functional derivative of (a) or (b), preferably a functional fragment of at least 30 amino acids, more preferably at least 45 amino acids of (a) or (b).
11. Method for introducing at least one α-keto-β3-amino acid into (poly)peptides comprising the steps of:
(i) providing a radical S-adenosyl methionine (rSAM) enzyme as defined in any of claims 1 to 7, and/or a host cell as defined in claim 9,
(ii) optionally providing an rSAM-associated protein as defined in claim 10,
(iii) providing at least one (poly)peptide substrate of interest comprising one or more amino acid motifs XYG, wherein X is any natural or non-natural amino acid, Y is tyrosine and G is glycine, and
(iv) contacting the enzyme and/or host cell of (i) with the substrate of (iii) and optionally the rSAM-associated protein of (ii) under conditions suitable for the enzymatic introduction of at least one α-keto-β3-amino acid into the substrate.
12. Method according to claim 11, wherein at least one of the enzyme in step (i) and the (poly)peptide substrate in step (iii), preferably both, optionally together with the rSAM-associated protein of optional step (ii), are provided in the form of a host cell, more preferably are all co-expressed in the host cell as defined in claim 9.
13. Method according to claim 11 or 12, wherein the host cell for use in step (i) of the above method is an E. coli host cell.
14. Method according to any of claims 11 to 13, wherein step (iv) is followed by step (v), wherein the keto-functionality resulting from step (iv) is reduced chemically, preferably by sodium borohydride, or is converted to an imine, preferably the methoxyamine.
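The substrate motif of claim 1 and the enzyme features of claims 3 to 5 lend themselves to a simple computational screen. The following is a minimal sketch under stated assumptions, not a method described in the application: the helper names are hypothetical, the claim wildcard "X" is mapped to the regular-expression "." and only SEQ ID NOs: 3, 4, 5 and 8 are encoded, position X19 of Formula (I) is left unscored because its allowed residues are not stated in the claim text as reproduced here, and the per-position count is not a formal sequence-identity calculation. The test enzyme is PlpX (SEQ ID NO: 120).

```python
import re
from typing import Optional


def find_xyg(peptide: str) -> list[tuple[int, str]]:
    """Return (1-based position of X, tripeptide) for every X-Tyr-Gly motif (claim 1)."""
    # A zero-width lookahead also reports overlapping occurrences.
    return [(m.start() + 1, peptide[m.start():m.start() + 3])
            for m in re.finditer(r"(?=.YG)", peptide)]


# Motifs of claims 4 and 5, written exactly as in the claims ("X" = any residue).
MOTIFS = {
    "SEQ ID NO: 3": "CXXXCXXC",
    "SEQ ID NO: 4": "EXTXXCXXXCXXCGXRXXXXRXXEL",
    "SEQ ID NO: 5": "CXLXCXHCGSRAGXXXXXE",
    "SEQ ID NO: 8": "CXXGXXXXXXXXXGXXKXCP",
}


def motif_hits(protein: str) -> dict[str, Optional[int]]:
    """Map each motif to the 1-based start of its first match, or None."""
    hits: dict[str, Optional[int]] = {}
    for name, motif in MOTIFS.items():
        m = re.search(motif.replace("X", "."), protein)
        hits[name] = m.start() + 1 if m else None
    return hits


# Formula (I) of claim 3: allowed residues per position (X19 not scored here).
FORMULA_I = {1: "YHFW", 2: "YRH", 3: "RKQ", 4: "ITV", 5: "RSK", 6: "HYFW",
             7: "AS", 8: "VIL", 9: "WYF", 10: "EQDK", 11: "ILVM", 12: "TS",
             13: "LMIV", 14: "KREQ", 15: "C", 16: "ND", 17: "LMIV", 18: "AS",
             20: "SQEK"}


def best_formula_i_window(protein: str) -> tuple[int, int]:
    """Return (positions satisfied, 1-based start) of the best 20-residue window."""
    best = (0, 0)
    for start in range(len(protein) - 19):
        window = protein[start:start + 20]
        score = sum(window[pos - 1] in allowed for pos, allowed in FORMULA_I.items())
        if score > best[0]:
            best = (score, start + 1)
    return best


# SEQ ID NO: 120 (PlpX protein)
PLPX = (
    "MTKKYRRVSYAVWEITLKCNLACSHCGSRAGQARTKELSTEEAFNLVRQLADVGIKEVTLIGGEAFMRSDWLE"
    "IAKAVTEAGMICGMTTGGFGVSLETARKMKEAGIKTVSVSIDGGIPETHDRQRGKKGAWHSAFRTMSHLKEV"
    "GIYFGCNTQINRLSASEFPIIYERIRDAGARAWQIQLTVPMGNAADNADMLLQPYELLDIYPMLARVAKRAKQ"
    "EGVRIQAGNNIGYYGPYERLLRGSDEWTFWQGCGAGLNTLGIEADGKIKGCPSLPTAAYTGGNIRDRPLREIV"
    "EQTEELKFNLKAGTEQGTDHMWGFCKTCEFAELCRGGCSWTAHVFFDRRGNNPYCHHRALKQAQKDIRERF"
    "YLKVKAKGNPFDNGEFVIIEEPFNAPLPENDLLHFNSDHIQWPENWQNSESAYALAK"
)

print(find_xyg("AVAAMYGVVFPWDNEFPWPRWGG"))  # PlpA3 core, SEQ ID NO: 124
print(motif_hits(PLPX))
print(best_formula_i_window(PLPX))
```

In this sketch, the PlpA3 core has a single XYG (MYG, X at position 5, consistent with the M5 numbering of SEQ ID NOs: 129 to 138), and PlpX contains motifs SEQ ID NOs: 3, 4, 5 and 8 and satisfies all 19 scored positions of Formula (I) in the window starting at residue 5.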
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17150498.8 | 2017-01-06 | ||
EP17150498 | 2017-01-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018127544A1 (en) | 2018-07-12 |
Family
ID=57796156
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2018/050225 WO2018127544A1 (en) | 2017-01-06 | 2018-01-05 | USE OF RADICAL S-ADENOSYL METHIONINE (SAM) ENZYMES FOR INTRODUCING α-KETO-β3-AMINO ACIDS INTO (POLY)PEPTIDES |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018127544A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4159743A1 (en) * | 2021-09-30 | 2023-04-05 | ETH Zurich | Methods for preparing pyridazine compounds |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030113882A1 (en) | 1998-11-24 | 2003-06-19 | Wisconsin Alumni Research Foundation | Methods for the preparation of beta-amino acids |
WO2007047680A2 (en) * | 2005-10-14 | 2007-04-26 | Cargill, Incorporated | Increasing the activity of radical s-adenosyl methionine (sam) enzymes |
- 2018-01-05 WO PCT/EP2018/050225 (WO2018127544A1): active application filing
Non-Patent Citations (21)
Title |
---|
"NCBI", Database accession no. rad_SAM_trio |
"NCBI", Database accession no. rSAM_nif11_3 |
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410 |
ALTSCHUL ET AL., NCB NLM NIH BETHESDA |
APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 72, 2006, pages 211 |
CLARISSA MELO CZEKSTER ET AL: "In Vivo Biosynthesis of a β-Amino Acid-Containing Protein", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 138, no. 16, 27 April 2016 (2016-04-27), US, pages 5194 - 5197, XP055369605, ISSN: 0002-7863, DOI: 10.1021/jacs.6b01023 * |
CZEKSTER ET AL., JACS, vol. 138, 2016, pages 5194 - 5197 |
DATABASE EMBL [online] 23 November 2015 (2015-11-23), "Labilithrix luteola Radical SAM, Pyruvate-formate lyase-activating enzyme like protein", XP002778916, Database accession no. AKU99181 * |
DATABASE GenPept [online] 27 October 2016 (2016-10-27), "radical SAM/SPASM domain-containing protein [Ruegeria sp. ANG-S4]", XP002778915, retrieved from NCBI Database accession no. WP_052261552 * |
FREEMAN ET AL., SCIENCE, vol. 338, 2012, pages 387 - 390 |
FUKUHARA, K. ET AL., ORG. LETT., vol. 17, 2015, pages 2646 - 2648 |
HAFT ET AL., BMC BIOLOGY, vol. 8, 2010, pages 70 |
HAFT ET AL., J. BACTERIOL., vol. 193, 2011, pages 2745 - 2755 |
HAFT ET AL., NUCLEIC ACIDS RES., vol. 31, 2003, pages 371 - 373 |
HAFT; BASU, J. BACTERIOL., vol. 193, 2011, pages 2745 - 2755 |
HAMADA ET AL., TETRAHEDRON LETT., vol. 35, 1994, pages 719 - 720 |
LARKIN MA ET AL., BIOINFORMATICS, vol. 23, 2007, pages 2947 - 2948 |
LAU; SUN, BIOTECHNOL ADV., vol. 27, no. 6, 2009, pages 1015 - 1022 |
METHODS IN ENZYMOLOGY, vol. 350, 2002, pages 248 |
MORINAKA, B. I. ET AL., ANGEW. CHEM. INT. ED., vol. 53, 2014, pages 8503 - 8507 |
WESTERS ET AL., MOL. CELL. RES., vol. 1694, no. 1-3, 2004, pages 299 - 310 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4159743A1 (en) * | 2021-09-30 | 2023-04-05 | ETH Zurich | Methods for preparing pyridazine compounds |
WO2023052526A1 (en) * | 2021-09-30 | 2023-04-06 | Eth Zurich | Methods for preparing pyridazine compounds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18701105; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 18701105; Country of ref document: EP; Kind code of ref document: A1 |