WO2013166312A1 - Biofuel production enzymes and uses thereof - Google Patents

Publication number
WO2013166312A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
ascomycota
node
amino acid
acid sequence
Prior art date
Application number
PCT/US2013/039306
Other languages
French (fr)
Inventor
Julio M. Fernandez
Raul Perez-Jimenez
Original Assignee
The Trustees Of Columbia University In The City Of New York
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Trustees Of Columbia University In The City Of New York
Publication of WO2013166312A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/24: Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402: Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S- glycosyl compounds (3.2.1)
    • C12N9/2405: Glucanases
    • C12N9/2434: Glucanases acting on beta-1,4-glucosidic bonds
    • C12N9/2437: Cellulases (3.2.1.4; 3.2.1.74; 3.2.1.91; 3.2.1.150)

Definitions

  • Industrial enzymes have many applications, including detergents, textile production, pharmaceuticals, and biofuel production. Industrial enzymes can be limited by their stability across a wide range of temperatures and pH levels, and currently there is no reliable method to broaden the range of use without simultaneously affecting the enzyme's activity. A common practice involves randomly inserting mutations in existing enzymes and screening for variants that exhibit the desired characteristics; however, due to the enormous combinatorial possibilities, this can become costly and work-intensive and does not guarantee success.
  • the invention relates to a modified method to predictably alter and optimize enzymes, mainly by identifying and resurrecting suitable ancestral strains. By examining the molecular and evolutionary biology involved in the enzyme chemistry, this method establishes a fast and economically efficient system to develop and/or improve enzymes, especially industrial enzymes used in harsh environments such as high temperature or low pH.
  • the ancestral cellulases can be classified as cellobiohydrolases II. Modern cellobiohydrolases from fungi are used in the biofuel industry to produce cellulosic ethanol.
  • the invention is based, at least in part, on the discovery of the resurrection of four ancestral cellulase enzymes from fungi.
  • These enzymes are useful for the production of cellulosic ethanol for biofuels.
  • Some aspects of the present invention provide for ancestral fungal cellulases. Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels. In some embodiments, ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose. Some aspects of the present invention provide for microorganisms that express an ancestral cellulase. Microorganisms are useful for the production of cellulosic ethanol for biofuels. In some embodiments, microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 2.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 3.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 10.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 11.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to any one of the amino acid sequences of SEQ ID NO: 1,
  • SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO:
  • SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 122, SEQ ID NO: 123, SEQ
  • SEQ ID NO: 179, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ
  • SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, SEQ ID NO: 232, or SEQ ID NO: 233.
  • the signal peptide of the isolated polypeptide is removed.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 31.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 32.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 39.
  • the present invention provides for an isolated polypeptide comprising at least about 90% identity to the amino acid sequence of SEQ ID NO: 40.
  • the present invention provides for an isolated polypeptide comprising about 90% identity to any one of the amino acid sequences of SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, or SEQ ID NO: 59.
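The identity thresholds recited above can be illustrated with a minimal Python sketch. The text does not specify how percent identity is computed, so this sketch assumes two sequences that have already been aligned (gaps written as "-") and counts identical residues over the columns where neither sequence has a gap. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns are skipped. Other conventions (for example,
    dividing by the full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True when the pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MKTAYIAKQR"  # hypothetical toy sequence
    variant = "MKTAYIAKQK"    # differs at one of ten positions
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only covers the final counting step.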
  • the present invention provides for a nucleic acid encoding a polypeptide of the present invention.
  • the present invention provides for a recombinant
  • the present invention provides for a recombinant
  • the recombinant microorganism is a fungus.
  • the recombinant microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes.
  • the recombinant microorganism is a yeast.
  • the recombinant microorganism is a bacterium.
  • the recombinant microorganism is Saccharomyces cerevisiae.
  • the recombinant microorganism is selected from the group consisting of Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium,
  • Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp. In some
  • the recombinant microorganism is selected from the group consisting of E. coli sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., and Streptomyces sp.
  • the present invention provides for a method for the production of cellulosic ethanol, comprising adding an isolated polypeptide of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing.
  • the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof.
  • the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • the isolated polypeptide and recombinant microorganism are added simultaneously.
  • carbohydrate polymers are depolymerized.
  • the present invention provides for a method for the production of cellulosic ethanol, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing.
  • the method further comprises adding a polypeptide of the present invention, or a combination thereof.
  • the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • the isolated polypeptide and recombinant microorganism are added simultaneously.
  • carbohydrate polymers are depolymerized.
  • the present invention provides for a method for cellulose processing, comprising adding a polypeptide of the present invention, or a combination thereof, to a source material of cellulose.
  • the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof.
  • the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • the isolated polypeptide and recombinant microorganism are added simultaneously.
  • carbohydrate polymers are depolymerized.
  • the present invention provides for a method for cellulose processing, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose.
  • the method further comprises adding a polypeptide of the present invention, or a combination thereof.
  • the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • the isolated polypeptide and recombinant microorganism are added simultaneously.
  • carbohydrate polymers are depolymerized.
  • FIG. 1 is a phylogenetic tree of fungal cellulases.
  • FIG. 2 is a phylogenetic tree of fungal cellulases.
  • FIG. 3 is a phylogenetic tree of fungal cellulases obtained using BEAST.
  • FIG. 4 is a phylogenetic tree of fungal cellulases obtained using MrBayes.
  • Cellulases are enzymes that can catalyze the hydrolysis of the β-1,4 glucosidic bonds in cellulose, the predominant component of plant matter. In nature, cellulases facilitate microbial conversion of insoluble cellulose contained within biomass into soluble sugars (EA Bayer et al. Current Opinion in Structural Biology, 8:548-557, 1998).
  • Cellobiohydrolases from fungi can be used in the biofuel industry to produce cellulosic ethanol. Before the sugars in lignocellulosic biomass, such as wood, can be fermented into ethanol, the lignin that encapsulates the cellulose, and the structural conformation of the cellulose within it, can be addressed with either acid or enzyme hydrolysis (PC Badger, In: J. Janick and A. Whipkey (eds.), Trends in new crops and new uses. ASHS Press, Alexandria, VA. 2002).
  • the sequences in List #3 are new ancestral sequences for the specified nodes. They contain restriction sites at the termini.
  • the DNA sequence gctagc encodes a restriction site for the NheI enzyme and the sequence ggtacc encodes a restriction site for the restriction enzyme KpnI.
  • the protein sequences are in capital letters. These sequences are depicted in Appendix 3.
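As an illustration of the terminal restriction sites described above, the following Python sketch reports where the NheI (gctagc) and KpnI (ggtacc) recognition sequences occur in a DNA string and whether they sit at either end. The example sequence is hypothetical, and placing NheI at the 5' end and KpnI at the 3' end is an assumption made for the example; the text only says the sites are at the termini.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition sequence, as given in the text
KPNI_SITE = "GGTACC"  # KpnI recognition sequence, as given in the text


def terminal_sites(dna: str) -> dict:
    """Report occurrences of the NheI and KpnI sites and whether they are terminal."""
    seq = dna.upper()
    report = {}
    for name, site in (("NheI", NHEI_SITE), ("KpnI", KPNI_SITE)):
        positions = [
            i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site
        ]
        report[name] = {
            "at_5prime_end": seq.startswith(site),
            "at_3prime_end": seq.endswith(site),
            "positions": positions,  # 0-based start indices
        }
    return report


if __name__ == "__main__":
    # Hypothetical insert: NheI site, a short coding stretch, KpnI site.
    example = "gctagcATGAAAACCGCTggtacc"
    for enzyme, info in terminal_sites(example).items():
        print(enzyme, info)
```

An internal occurrence of either site (a position other than the two ends) would matter for cloning, since the enzyme would also cut inside the coding sequence; the `positions` list makes such cases visible.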
  • the sequences used in the trees and the sequence alignment are shown in List #4, the phylogenetic trees are shown in FIGS. 3 and 4, and the resurrected sequences for each tree are shown in List #5 (BEAST tree) and List #6 (MrBayes tree). These sequences present a reconstruction of fungal cellulase enzymes to be used in the production of bioethanol as a green-fuel source.
  • an "ancestral cellulase molecule" refers to an ancestral cellulase protein, or a fragment thereof.
  • An “ancestral cellulase molecule” can also refer to a nucleic acid (including, for example, genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA) which encodes a polypeptide corresponding to an ancestral cellulase protein, or fragment thereof.
  • an ancestral cellulase molecule comprises the amino acid sequence shown in SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO:
  • SEQ ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194, SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 210, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214, SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO:
  • an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof.
  • the ancestral cellulase molecules of the invention can be obtained from various sources and can be produced according to various techniques known in the art.
  • a nucleic acid that encodes an ancestral cellulase molecule can be obtained by synthetic or semi-synthetic methods, by screening DNA libraries, or by amplification from a natural source.
  • An ancestral cellulase molecule can include a fragment or portion of an ancestral cellulase protein.
  • An ancestral cellulase molecule can include a variant of the above described examples, such as a fragment thereof.
  • an ancestral cellulase molecule comprises a variant of an ancestral cellulase protein or polypeptide encoded by an ancestral cellulase nucleic acid sequence wherein the variant has an amino acid identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137,
  • Such variants can include those having at least from about 46% to about 50% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167,
  • protein variants can include amino acid sequence modifications.
  • amino acid sequence modifications fall into one or more of three classes: substitutional, insertional or deletional variants.
  • Insertions can include amino and/or carboxyl terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Insertions ordinarily will be smaller insertions than those of amino or carboxyl terminal fusions, for example, on the order of one to four residues. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence.
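The three classes of sequence modification (substitutional, insertional, deletional) can be made concrete with a short Python sketch. It assumes the reference and the variant have already been aligned, with "-" marking gaps; the function name, the position convention, and the toy sequences are hypothetical choices for illustration.

```python
def classify_differences(aligned_ref: str, aligned_var: str) -> dict:
    """Sort the differences between two aligned protein sequences into three classes.

    Positions are 1-based and refer to the ungapped reference. An insertion
    is reported at the reference position it follows.
    """
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    changes = {"substitutions": [], "insertions": [], "deletions": []}
    ref_pos = 0
    for r, v in zip(aligned_ref.upper(), aligned_var.upper()):
        if r != "-":
            ref_pos += 1
        if r == "-" and v == "-":
            continue  # column carries no information
        if r == "-":
            changes["insertions"].append((ref_pos, v))
        elif v == "-":
            changes["deletions"].append((ref_pos, r))
        elif r != v:
            changes["substitutions"].append((ref_pos, r, v))
    return changes


if __name__ == "__main__":
    # Hypothetical aligned pair: one substitution, one insertion, one deletion.
    ref = "MKT-AYIA"
    var = "MRTGA-IA"
    print(classify_differences(ref, var))
```

For the pair shown, the sketch reports K2R as a substitution, a G inserted after reference position 3, and the deletion of Y at reference position 5.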
  • variants ordinarily are prepared by site-specific mutagenesis of nucleotides in the DNA encoding the protein, thereby producing DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture.
  • an ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide.
  • Signal peptides are polypeptide sequences variable in length and amino acid composition found at the amino-terminus of some proteins. Signal peptides direct the secretion of polypeptide molecules through a prokaryotic or eukaryotic cell membrane.
  • Signal peptides have a tripartite structure consisting of a hydrophobic core flanked by a positively charged n-region and a neutral but polar c-region on either side (Tuteja, R., (2005) Arch. Biochem. Biophys. 441:107-111).
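The tripartite structure described above can be illustrated with a toy Python heuristic. It is only a sketch and not a substitute for dedicated signal peptide predictors: it scans the N-terminal residues for the most hydrophobic window using the Kyte-Doolittle hydropathy scale (a choice made here, not one named in the text) and then counts the net charge of the residues preceding that window. The window size, search limit, function names, and example sequence are all hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(protein: str, window: int = 8,
                            search_limit: int = 30) -> tuple:
    """Return (start, mean hydropathy) of the most hydrophobic N-terminal window."""
    region = protein.upper()[:search_limit]
    if len(region) < window:
        raise ValueError("sequence shorter than the window")
    best_start, best_score = 0, float("-inf")
    for start in range(len(region) - window + 1):
        segment = region[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if score > best_score:  # strict comparison keeps the earliest best window
            best_start, best_score = start, score
    return best_start, best_score


def n_region_charge(protein: str, core_start: int) -> int:
    """Net charge (K/R minus D/E) of the residues before the hydrophobic core."""
    n_region = protein.upper()[:core_start]
    positive = sum(1 for aa in n_region if aa in "KR")
    negative = sum(1 for aa in n_region if aa in "DE")
    return positive - negative


if __name__ == "__main__":
    # Hypothetical sequence: charged start, hydrophobic stretch, polar tail.
    example = "MKKTLLALLAVLAGASAQDSEEK"
    start, score = most_hydrophobic_window(example)
    print(start, round(score, 2), n_region_charge(example, start))
```

A positive n-region charge followed by a strongly hydrophobic window is consistent with the layout described in the text, but a real analysis would rely on the prediction tools the document lists.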
  • Signal peptide sequences can be identified by various methods, known to one of skill in the art. For example, signal peptide sequences within a polypeptide sequence can be identified using various prediction tools including, but not limited to, Phobius (http://phobius.sbc.su.se/), Predotar (http://urgi.versailles.inra.fr/predotar/predotar.html), SignalP
  • an ancestral cellulase molecule comprises a protein or polypeptide encoded by a nucleic acid sequence encoding an ancestral cellulase protein, such as the sequences shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154,
  • an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof.
  • the nucleic acid can be any type of nucleic acid, including genomic DNA, complementary DNA (cDNA), synthetic or semi-synthetic DNA, as well as any form of corresponding RNA.
  • a nucleic acid encoding an ancestral cellulase protein can comprise a recombinant nucleic acid encoding such a protein.
  • the nucleic acid can be a non-naturally occurring nucleic acid created artificially (such as by assembling, cutting, ligating or amplifying sequences).
  • Restriction enzymes can be used to cut nucleic acid sequences in a sequence specific manner, as is known in the art. Restriction enzyme recognition sequences can be added to the ends of a nucleic acid sequence encoding an ancestral cellulase protein (e.g. SEQ ID NOS: 60, 61, 62 and 63).
  • the nucleic acid sequence of a restriction enzyme site can encode amino acids. Amino acids encoded by a restriction enzyme site can form part of the sequence of an ancestral cellulase protein, or may encode additional amino acids at the ends of a polypeptide sequence of an ancestral cellulase protein. Nucleic acid sequences can be double-stranded or single-stranded.
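To show how a restriction site can contribute amino acids, the following Python sketch translates the two sites named earlier in the document using the standard genetic code: gctagc (NheI) reads as Ala-Ser and ggtacc (KpnI) reads as Gly-Thr when translated in frame. The flanked example sequence and the function name are hypothetical, and reading the sites in frame with the coding sequence is an assumption of the example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    seq = dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    print(translate("gctagc"))  # NheI site read in frame
    print(translate("ggtacc"))  # KpnI site read in frame
    # Hypothetical construct: NheI site, two codons, KpnI site.
    print(translate("gctagcATGAAAggtacc"))
```

The last call shows the site-encoded residues appearing at both ends of the translated product, which is the situation the text describes as additional amino acids at the ends of the polypeptide.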
  • the invention further provides for nucleic acids that are complementary to an ancestral cellulase molecule.
  • Complementary nucleic acids can hybridize to the nucleic acid sequence described above under stringent hybridization conditions.
  • stringent hybridization conditions include temperatures above 30°C, above 35°C, in excess of 42°C, and/or salinity of less than about 500 mM, or less than 200 mM.
  • Hybridization conditions can be adjusted by the skilled artisan via modifying the temperature, salinity and/or the concentration of other reagents such as SDS or SSC.
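Complementarity between two nucleic acid strands can be sketched in Python as a reverse-complement check. This covers only perfect Watson-Crick complementarity of DNA strands; it does not model temperature, salinity, or any other aspect of hybridization stringency. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, preserving case."""
    return dna.translate(_COMPLEMENT)[::-1]


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """True when strand_b is the exact reverse complement of strand_a."""
    return strand_a.upper() == reverse_complement(strand_b).upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(are_complementary("ATGC", "GCAT"))
    # The NheI and KpnI sites named in the document are palindromic:
    print(reverse_complement("GCTAGC"), reverse_complement("GGTACC"))
```

The palindrome check at the end is a side observation: each of the two restriction sites equals its own reverse complement.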
  • an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing.
  • an ancestral cellulase molecule can be added as an isolated recombinant protein.
  • an ancestral cellulase molecule can be added as an isolated modified recombinant protein.
  • an ancestral cellulase protein, or fragment thereof can be modified by removal of the signal peptide.
  • an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167
  • an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or fragment thereof.
  • an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an amino acid sequence of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149
  • an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant protein, or by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or a combination thereof.
  • the recombinant protein and the recombinant microorganism can be added sequentially, in any order, or simultaneously.
  • the invention utilizes conventional molecular biology, microbiology, and recombinant DNA techniques available to one of ordinary skill in the art. Such techniques are well known to the skilled worker and are explained fully in the literature. See, e.g., Maniatis, Fritsch & Sambrook, "DNA Cloning: A Practical Approach,” Volumes I and II (D. N. Glover, ed., 1985); “Oligonucleotide Synthesis” (M. J. Gait, ed., 1984); “Nucleic Acid Hybridization” (B. D. Hames & S. J. Higgins, eds., 1985); “Transcription and Translation” (B. D. Hames & S. J.
  • an ancestral cellulase e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 16
  • 229, 230, 231, 232, or 233) in several ways, which include, but are not limited to, isolating the protein via biochemical means or expressing a nucleotide sequence encoding the protein of interest by genetic engineering methods.
  • the invention provides for ancestral cellulase molecules that are encoded by nucleotide sequences.
  • the ancestral cellulase molecule can be a polypeptide encoded by a nucleic acid (including genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA).
  • an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof.
  • the ancestral cellulase molecules of the invention can be obtained from various sources and can be produced according to various techniques known in the art.
  • the ancestral cellulase molecule of the invention can be produced via recombinant DNA technology and such recombinant nucleic acids can be prepared by conventional techniques, including chemical synthesis, genetic engineering, enzymatic techniques, or a combination thereof.
  • a nucleic acid that encodes an ancestral cellulase molecule can be obtained by screening DNA libraries, or by amplification from a natural source.
  • a nucleic acid amplified from a natural source is modified by various mutagenesis methods known in the art to obtain the ancestral cellulase molecules of the invention.
  • an ancestral cellulase molecule can be "codon-optimized," as known in the art.
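Codon optimization can be illustrated, in its simplest form, as back-translating a protein with one preferred codon per amino acid. The Python sketch below uses a small preferred-codon table that is entirely hypothetical: every entry is a valid codon for its amino acid under the standard genetic code, but the choices are not the codon usage of any particular host. Real codon optimization typically weighs further factors that this sketch ignores.

```python
# Hypothetical preferred codons (one per residue); not a real host's usage table.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAG", "T": "ACT", "A": "GCT",
    "S": "TCT", "G": "GGT", "L": "TTG",
}


def back_translate(protein: str, table: dict = PREFERRED_CODON,
                   stop: str = "TAA") -> str:
    """Back-translate a protein using one preferred codon per residue."""
    codons = []
    for aa in protein.upper():
        if aa not in table:
            raise ValueError(f"no preferred codon defined for residue {aa!r}")
        codons.append(table[aa])
    return "".join(codons) + stop


if __name__ == "__main__":
    print(back_translate("MKTA"))  # hypothetical four-residue peptide
```

Extending the table to all 20 amino acids with values drawn from a published codon usage table for the chosen expression host would be the natural next step.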
  • An ancestral cellulase molecule can be a fragment of ancestral cellulase protein.
  • the ancestral cellulase protein fragment can encompass any portion of at least about 8 consecutive amino acids of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154,
  • the fragment can comprise at least about 10 consecutive amino acids, at least about 20 consecutive amino acids, at least about 30 consecutive amino acids, at least about 40 consecutive amino acids, at least about 50 consecutive amino acids, at least about 60 consecutive amino acids, at least about 70 consecutive amino acids, at least about 80 consecutive amino acids, at least about 90 consecutive amino acids, at least about 100 consecutive amino acids, at least about 110 consecutive amino acids, at least about 120 consecutive amino acids, at least about 130 consecutive amino acids, at least about 140 consecutive amino acids, at least about 150 consecutive amino acids, at least about 200 consecutive amino acids, at least about 250 consecutive amino acids, at least about 300 consecutive amino acids, at least about 350 consecutive amino acids, or at least about 400 consecutive amino acids of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
  • Fragments include all possible amino acid lengths between about 8 and about 400 amino acids, for example, lengths between about 10 and about 400 amino acids, between about 15 and about 400 amino acids, between about 20 and about 400 amino acids, between about 35 and about 400 amino acids, between about 40 and about 400 amino acids, between about 50 and about 400 amino acids, between about 70 and about 400 amino acids, between about 100 and about 400 amino acids, between about 200 and about 400 amino acids, between about 300 and about 400 amino acids, or between about 350 and about 400 amino acids.
  • a fragment of a nucleic acid sequence that comprises an ancestral cellulase molecule can encompass any portion of at least about 8 consecutive nucleotides. In one embodiment, the fragment can comprise at least about 10 nucleotides, at least about 15 nucleotides, at least about 20 nucleotides, or at least about 30 nucleotides.
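The notion of a fragment as any run of consecutive residues above a minimum length can be sketched in Python as follows. The generator yields every contiguous fragment of at least `min_len` residues (8 by default, matching the smallest length mentioned in the text), and a companion function gives the count in closed form. Function names and the example sequence are hypothetical.

```python
from typing import Iterator, Optional


def fragments(sequence: str, min_len: int = 8,
              max_len: Optional[int] = None) -> Iterator[str]:
    """Yield every contiguous fragment with length between min_len and max_len."""
    n = len(sequence)
    upper = n if max_len is None else min(max_len, n)
    for length in range(min_len, upper + 1):
        for start in range(n - length + 1):
            yield sequence[start:start + length]


def fragment_count(n: int, min_len: int = 8) -> int:
    """Number of contiguous fragments of length >= min_len in a sequence of length n."""
    if n < min_len:
        return 0
    k = n - min_len + 1
    return k * (k + 1) // 2


if __name__ == "__main__":
    toy = "ACDEFGHIKL"  # hypothetical 10-residue sequence
    found = list(fragments(toy))
    print(len(found), fragment_count(len(toy)))
    print(found[0], found[-1])
```

The same functions work unchanged on nucleotide strings, since they only slice the input.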
  • the ancestral cellulase molecules can be recombinant enzymes, and can be produced in a variety of ways known in the art.
  • polypeptides e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150
  • the nucleic acid is expressed in an expression cassette, for example, to achieve overexpression in a cell.
  • the nucleic acids of the invention can be an RNA, cDNA, cDNA-like, or a DNA of interest in an expressible format, such as an expression cassette, which can be expressed from a natural promoter or an entirely
  • the nucleic acid of interest can encode a protein, and may or may not include introns. Any recombinant expression system can be used, including, but not limited to, the recombinant microorganisms of the invention, as well as other bacterial, fungal, mammalian, insect, or plant cell expression systems.
  • Nucleic acid sequences comprising an ancestral cellulase molecule that encode a polypeptide can be synthesized, in whole or in part, using chemical methods known in the art.
  • an ancestral cellulase molecule can be produced using chemical methods to synthesize its amino acid sequence, such as by direct peptide synthesis using solid-phase techniques. Protein synthesis can either be performed using manual techniques or by automation. Automated synthesis can be achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer).
  • fragments of an ancestral cellulase molecule can be separately synthesized and combined using chemical methods to produce a full-length molecule.
  • Host cells transformed with a nucleic acid sequence encoding an ancestral cellulase molecule such as, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
  • the polypeptide produced by a transformed cell can be secreted or contained intracellularly depending on the sequence and/or the vector used.
  • Methods for protein production by recombinant technology in different host systems are well known in the art (Sambrook, et al., "Molecular Cloning: a Laboratory Manual” (2001); Gellissen, G., “Novel Microbial and Eukaryotic Expression Systems” (2005)).
  • Expression vectors containing a nucleic acid sequence encoding an ancestral cellulase molecule can be designed to contain signal sequences which direct secretion of soluble polypeptide molecules encoded by an ancestral cellulase molecule, through a prokaryotic or eukaryotic cell membrane.
  • An ancestral cellulase molecule can be produced as an extracellular enzyme that is secreted into the culture medium, from which it can easily be recovered and isolated.
  • the spent culture medium of the production host can be used as such, or the host cells can be removed therefrom, and/or it can be concentrated, filtrated or fractionated. It can also be dried.
  • an ancestral cellulase molecule, or fragment thereof can be modified by removal of the signal peptide which can allow the polypeptide molecules to be contained intracellularly.
  • An isolated polypeptide of the present invention includes, but is not limited to, culture medium containing the polypeptide from which cells and cell debris have been removed.
  • the polypeptides can be isolated e.g. by adding anionic and/or cationic polymers to the spent culture medium to enhance precipitation of cells, cell debris and other unwanted enzymes.
  • the medium can be filtrated using an inorganic filtering agent and a filter to remove the precipitants formed.
  • the filtrate can be further processed using a semi-permeable membrane to remove excess of salts, sugars and metabolic products.
  • a synthetic peptide can be substantially purified via high performance liquid chromatography (HPLC).
  • the composition of a synthetic ancestral cellulase molecule can be confirmed by amino acid analysis or sequencing. Additionally, any portion of an amino acid sequence comprising a protein encoded by an ancestral cellulase molecule can be altered during direct synthesis and/or combined using chemical methods with sequences from other proteins to produce a variant polypeptide or a fusion protein.
  • the invention further encompasses methods for using a protein or polypeptide encoded by a nucleic acid sequence of an ancestral cellulase molecule.
  • the polypeptide can be modified, such as by glycosylations and/or acetylations and/or chemical reaction or coupling, and can contain one or several non-natural or synthetic amino acids.
  • An example of an ancestral cellulase molecule comprises the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148,
  • Some aspects of the present invention provide for recombinant microorganisms that express a nucleic acid encoding an ancestral cellulase enzyme (e.g., the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154,
  • microorganisms can include both prokaryotic and eukaryotic microorganisms, such as bacteria and yeast.
  • the microorganism is a fungus.
  • the microorganism is from the phylum Basidiomycota, from the phylum
  • the microorganism is a yeast. In yet another embodiment, the microorganism is a bacterium. In another embodiment, the microorganism is E. coli sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., or Streptomyces sp.
  • the microorganism is Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., or Penicillium sp.
  • a microorganism is a eukaryotic or prokaryotic microorganism.
  • a microorganism is a yeast, such as Saccharomyces cerevisiae.
  • a microorganism is a bacterium, such as a Gram-positive bacterium or a Gram-negative bacterium.
  • microorganisms may be used according to the present invention.
  • other organisms from the genera Achaetomium, Acremonium, Aspergillus,
  • Botrytis, Chaetomium, Chrysosporium, Collybia, Fomes, Fusarium, Humicola, Hypocrea, Lentinus, Melanocarpus, Myceliophthora, Myriococcum, Neurospora, Penicillium,
  • Phanerochaete, Phlebia, Pleurotus, Podospora, Polyporus, Pycnoporus, Rhizoctonia,
  • Additional organisms include, but are not limited to Acetobacter aceti, Achromobacter, Acidiphilium,
  • chrysanthemi, Gluconobacter, Gluconacetobacter, Haloarcula, Humicola insolens, Kitasatospora setae, Klebsiella, Klebsiella oxytoca, Kluyveromyces, Kluyveromyces fragilis, Kluyveromyces lactis, Kocuria, Lactlactis, Lactobacillus, Lactobacillus fermentum, Lactobacillus sake, Lactococcus, Lactococcus lactis, Leuconostoc, Methylocystis,
  • Microbacterium imperiale, Micrococcus lysodeikticus, Microlunatus, Mucor javanicus, Mycobacterium, Myrothecium, Nitrobacter, Nitrosomonas, Nocardia, Papaya carica,
  • Pediococcus, Pediococcus halophilus, Penicillium, Penicillium camemberti, Penicillium citrinum, Penicillium emersonii, Penicillium roqueforti, Penicillium lilacinum, Penicillium multicolor, Paracoccus pantotrophus, Propionibacterium, Pseudomonas, Pseudomonas fluorescens, Pseudomonas denitrificans, Pyrococcus, Pyrococcus furiosus, Pyrococcus horikoshii, Rhizobium, Rhizomucor miehei, Rhizomucor pusillus Lindt, Rhizopus, Rhizopus delemar, Rhizopus japonicus, Rhizopus niveus, Rhizopus oryzae, Rhizopus oligosporus, Rhodococcus, Saccharomyces cerevisiae, Sclerotinia libertiana
  • a recombinant microorganism may be engineered to secrete an ancestral cellulase molecule into the culture media, such as by incorporating a signal peptide or an autotransporter domain into the ancestral cellulase molecule.
  • ancestral cellulase molecules can be fused with any combination of signal peptides and/or autotransporter domains found in secreted proteins, as is known in the art.
  • ancestral cellulase molecules can be designed to maximize their secretion into the culture media, and may also include the use of many different linker sequences that fuse signal peptides, ancestral cellulase molecules, and autotransporters and that improve the efficiency of secretion or the cell surface presentation.
  • an ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide.
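The fusion and deletion of a signal peptide described above can be pictured at the sequence level. The following is a minimal sketch, not the applicants' construct design: the signal peptide and linker strings are hypothetical placeholders, the enzyme string is only a ten-residue block taken from the sequence listing, and the signal peptide length is assumed to be known in advance.

```python
def fuse(signal_peptide: str, linker: str, enzyme: str) -> str:
    """Concatenate signal peptide, linker and enzyme, N- to C-terminus."""
    return signal_peptide + linker + enzyme


def strip_signal_peptide(precursor: str, signal_length: int) -> str:
    """Return the sequence left after removing the first signal_length residues."""
    if not 0 <= signal_length <= len(precursor):
        raise ValueError("signal_length must lie within the precursor sequence")
    return precursor[signal_length:]


# Hypothetical placeholder sequences, for illustration only.
SIGNAL = "MKTLLAAAVA"   # made-up signal peptide
LINKER = "GGS"          # made-up flexible linker
ENZYME = "QAQASVWGQC"   # one ten-residue block from the listing, not a full enzyme

secreted_form = fuse(SIGNAL, LINKER, ENZYME)
intracellular_form = strip_signal_peptide(fuse(SIGNAL, "", ENZYME), len(SIGNAL))
```

In practice the deletion would be made in the coding nucleic acid rather than in the protein string; the sketch only shows which residues each variant retains.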
  • an ancestral cellulase molecule is purified from the culture media. In other embodiments, an ancestral cellulase molecule is not purified from the culture media.
  • any other recombinant expression system can be used to obtain an isolated ancestral cellulase molecule.
  • Bacterial Expression Systems: One skilled in the art understands that expression of desired protein products in prokaryotes is most often carried out in E. coli with vectors that contain constitutive or inducible promoters.
  • Some non-limiting examples of bacterial cells for transformation include E. coli sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., Streptomyces sp., and the E. coli strains DH5 or MC1061/p3 (Invitrogen Corp., San Diego, Calif.), which can be transformed using standard procedures practiced in the art; colonies can then be screened for the appropriate plasmid expression.
  • a number of expression vectors can be selected.
  • Non-limiting examples of such vectors include multifunctional E. coli cloning and expression vectors such as BLUESCRIPT (Stratagene).
  • Some E. coli expression vectors (also known in the art as fusion vectors) are designed to add a number of amino acid residues, usually to the N-terminus of the expressed recombinant protein.
  • Such fusion vectors can serve three functions: 1) to increase the solubility of the desired recombinant protein; 2) to increase expression of the recombinant protein of interest; and 3) to aid in recombinant protein purification by acting as a ligand in affinity purification.
  • vectors that direct the expression of high levels of fusion protein products that are readily purified may also be used.
  • fusion expression vectors include pGEX, which fuses glutathione S-transferase (GST) to the desired protein; pcDNA 3.1/V5-His A, B & C (Invitrogen Corp, Carlsbad, CA), which fuse 6x-His to the recombinant proteins of interest; pMAL (New England Biolabs, MA), which fuses maltose E binding protein to the target recombinant protein; the E. coli expression vector pUR278 (Ruther et al., (1983) EMBO J. 2:1791), wherein the coding sequence may be ligated individually into the vector in frame with the lac Z coding region in order to generate a fusion protein; and pIN vectors (Inouye et al., (1985) Nucleic Acids Res. 13:3101-3109; Van Heeke et al., (1989) J. Biol. Chem.
  • Fusion proteins generated by the likes of the above-mentioned vectors are generally soluble and can be purified easily from lysed cells via adsorption and binding of the fusion protein to an affinity matrix.
  • fusion proteins can be purified from lysed cells via adsorption and binding to a matrix of glutathione agarose beads subsequently followed by elution in the presence of free glutathione.
  • the pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target can be released from the GST moiety.
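The release of a target from a fusion partner at a protease site can be sketched as a string operation. The example below is an illustration under stated assumptions rather than a description of any specific vector: it uses the commonly cited thrombin motif LVPR|GS and factor Xa motif IEGR| (cut marked by the bar), a made-up tag sequence, and a ten-residue block from the sequence listing as the target. The function names are hypothetical.

```python
import re

# Recognition motifs; the cut falls immediately after the matched text.
# Only the canonical IEGR form of the factor Xa site is modeled here.
CUT_AFTER = {
    "thrombin": re.compile(r"LVPR(?=GS)"),  # cuts between R and G of LVPRGS
    "factor_xa": re.compile(r"IEGR"),       # cuts after R of IEGR
}


def cut_positions(sequence: str, protease: str) -> list[int]:
    """Return the index after each recognized cleavage motif."""
    return [m.end() for m in CUT_AFTER[protease].finditer(sequence)]


def release_target(fusion: str, protease: str) -> str:
    """Return the portion downstream of the first cleavage site."""
    cuts = cut_positions(fusion, protease)
    if not cuts:
        raise ValueError(f"no {protease} site found")
    return fusion[cuts[0]:]


TAG = "MTAGTAGTAG"      # hypothetical stand-in for the fusion partner
TARGET = "QAQASVWGQC"   # ten-residue block from the listing, for illustration

released_xa = release_target(TAG + "IEGR" + TARGET, "factor_xa")
released_thrombin = release_target(TAG + "LVPRGS" + TARGET, "thrombin")
```

Note that thrombin leaves the two residues GS on the N-terminus of the released target, whereas factor Xa leaves none; a target containing an internal motif would also be cut, which this sketch does not handle.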
  • an ancestral cellulase molecule is not purified from the culture media.
  • Plant and Insect Expression Systems: Other suitable cell lines, in addition to microorganisms such as bacteria (e.g., E. coli and B. subtilis) transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors containing coding sequences for an ancestral cellulase molecule, may alternatively be used to produce the molecule of interest.
  • a non-limiting example includes plant cell systems infected with recombinant virus expression vectors (for example, tobacco mosaic virus, TMV; cauliflower mosaic virus, CaMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing coding sequences for an ancestral cellulase molecule.
  • sequences encoding an ancestral cellulase molecule can be driven by any of a number of promoters.
  • viral promoters such as the 35S and 19S promoters of CaMV can be used alone or in combination with the omega leader sequence from tobacco mosaic virus (TMV).
  • plant promoters, such as the small subunit of RUBISCO or heat shock promoters, can be used. These constructs can be introduced into plant cells by direct DNA transformation or by pathogen-mediated transfection.
  • an insect system also can be used to express an ancestral cellulase molecule.
  • Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae.
  • Sequences encoding an ancestral cellulase molecule can be cloned into a non-essential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter.
  • Successful insertion of the nucleic acid sequences of an ancestral cellulase molecule will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein.
  • the recombinant viruses can then be used to infect S. frugiperda cells or Trichoplusia larvae in which an ancestral cellulase molecule can be expressed.
  • a fungal system also can be used to express an ancestral cellulase molecule.
  • Fungi can be transformed with recombinant fungal expression vectors containing coding sequences for an ancestral cellulase molecule.
  • Some non-limiting examples of fungi for transformation include Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
  • fungi from the subkingdom Dikarya, from the phylum Basidiomycota, from the phylum Ascomycota, or from the class Sordariomycetes can be transformed with recombinant fungal expression vectors containing coding sequences for an ancestral cellulase molecule.
  • Mammalian Expression Systems: Mammalian cells can also contain an expression vector (for example, one that harbors a nucleotide sequence encoding an ancestral cellulase molecule) for expression of a desired product.
  • Expression vectors containing such a nucleic acid sequence linked to at least one regulatory sequence in a manner that allows expression of the nucleotide sequence in a host cell can be introduced via methods known in the art.
  • the vector can be a recombinant DNA or RNA vector, and includes DNA plasmids or viral vectors.
  • a number of viral-based expression systems can be used to express an ancestral cellulase molecule in mammalian host cells (e.g., adeno-associated virus, retrovirus, adenovirus, lentivirus or alphavirus).
  • Regulatory sequences are well known in the art, and can be selected to direct the expression of a protein or polypeptide of interest (such as an ancestral cellulase molecule) in an appropriate host cell as described in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990).
  • Non-limiting examples of regulatory sequences include: polyadenylation signals, promoters, enhancers, and other expression control elements. Practitioners in the art understand that designing an expression vector can depend on factors, such as the choice of host cell to be transfected and/or the type and/
  • Enhancer regions which are those sequences found upstream or downstream of the promoter region in non-coding DNA regions, are also known in the art to be important in optimizing expression. If needed, origins of replication from viral sources can be employed, such as if a prokaryotic host is utilized for introduction of plasmid DNA. However, in eukaryotic organisms, chromosome integration is a common mechanism for DNA replication.
  • a gene that encodes a selectable marker (for example, resistance to antibiotics or drugs, such as ampicillin, neomycin, G418, and hygromycin) can be introduced into host cells along with the gene of interest in order to identify and select clones that stably express a gene encoding a protein of interest.
  • the gene encoding a selectable marker can be introduced into a host cell on the same plasmid as the gene of interest or can be introduced on a separate plasmid. Cells containing the gene of interest can be identified by drug selection wherein cells that have incorporated the selectable marker gene will survive in the presence of the drug. Cells that have not incorporated the gene for the selectable marker die. Surviving cells can then be screened for the production of the desired protein molecule (for example, an ancestral cellulase molecule).
  • a host cell strain can be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed ancestral cellulase molecule in the desired fashion.
  • modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation.
  • Post-translational processing which cleaves a "prepro" form of the polypeptide also can be used to facilitate correct insertion, folding and/or function.
  • Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities can be chosen to ensure the correct modification and processing of the foreign protein.
  • An exogenous nucleic acid can be introduced into a cell via a variety of techniques known in the art, such as lipofection, microinjection, calcium phosphate or calcium chloride precipitation, DEAE-dextran-mediated transfection, or electroporation. Electroporation is carried out at an appropriate voltage and capacitance to result in entry of the DNA construct(s) into cells of interest. Other methods used to transfect cells can also include modified calcium phosphate precipitation, polybrene precipitation, liposome fusion, and receptor-mediated gene delivery.
  • a host cell strain which modulates the expression of the inserted sequences, or modifies and processes the nucleic acid in a specific fashion desired also may be chosen. Such modifications (for example, glycosylation and other post-translational modifications) and processing (for example, cleavage) of protein products may be important for the function of the protein.
  • Different host cell strains have characteristic and specific mechanisms for the post-translational processing and modification of proteins and gene products. As such, appropriate host systems or cell lines can be chosen to ensure the correct modification and processing of the foreign protein expressed, such as an ancestral cellulase molecule.
  • eukaryotic host cells possessing the cellular machinery for proper processing of the primary transcript, glycosylation, and phosphorylation of the gene product may be used.
  • Non-limiting examples of host cells include E. coli sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., Streptomyces sp., Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
  • Various culturing parameters can be used with respect to the host cell being cultured. Appropriate culture conditions for host cells are well known in the art or can be determined by the skilled artisan (see, for example, Madigan M. et al., "Brock Biology of Microorganisms", 2012). Cell culturing conditions can vary according to the type of host cell selected. Commercially available medium can be utilized.
  • Cells suitable for culturing can contain introduced expression vectors, such as plasmids or viruses.
  • the expression vector constructs can be introduced via transformation, microinjection, transfection, lipofection, electroporation, or infection.
  • the expression vectors can contain coding sequences, or portions thereof, encoding the proteins for expression and production.
  • Expression vectors containing sequences encoding the produced proteins and polypeptides, as well as the appropriate transcriptional and translational control elements, can be generated using methods well known to and practiced by those skilled in the art. These methods include synthetic techniques, in vitro recombinant DNA techniques, and in vivo genetic recombination which are described in J.
  • An ancestral cellulase molecule (such as, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161,
  • a purified ancestral cellulase molecule can be separated from other compounds that normally associate with ancestral cellulase molecules in the cell, such as certain proteins, carbohydrates, or lipids, using methods practiced in the art.
  • the cell culture medium or cell lysate is centrifuged to remove particulate cells and cell debris.
  • the desired polypeptide molecule (for example, an ancestral cellulase molecule) is isolated or purified away from contaminating soluble proteins and polypeptides by suitable purification techniques.
  • Non-limiting purification methods for proteins include: size exclusion chromatography; affinity chromatography; ion exchange chromatography; ethanol precipitation; reverse phase HPLC; chromatography on a resin, such as silica, or an anion exchange resin, e.g., DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate precipitation; gel filtration using, e.g., Sephadex G-75, Sepharose; and the like.
  • Other additives such as protease inhibitors (e.g., PMSF) can be used to inhibit proteolytic degradation during purification.
  • Purification procedures that can select for carbohydrates can also be used, e.g., ion-exchange soft gel chromatography, or HPLC using cation- or anion-exchange resins, in which the more acidic fraction(s) is/are collected.
  • Some aspects of the present invention provide for ancestral fungal cellulases.
  • Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels.
  • ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose.
  • microorganisms that express an ancestral cellulase. Microorganisms are useful for the production of cellulosic ethanol for biofuels. In some embodiments, microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
  • the production of cellulosic ethanol biofuels from cellulosic materials can be performed by various techniques known in the art. See, for example, Canilha L. et al., (2012) J. Biomed. Biotech., 2012:989572; U.S. Patent No. 8,318,473; U.S. Patent No. 8,409,836; U.S. Patent Application Publication No. 20110171710A1, which are incorporated by reference in their entireties.
  • the starting material for the production of cellulosic biofuels can be cellulosic materials (i.e. any material comprising lignocellulose, cellulose, hemicellulose, or a combination thereof).
  • a source material of cellulose can be any cellulosic material.
  • cellulosic materials include, but are not limited to, fruits, plants, vegetables, woods, grasses, inedible parts of plants, byproducts of lawn and tree maintenance, corn stover, Panicum virgatum, Miscanthus grass species, wood chips, sugarcane residues, sugarcane bagasse, straw, pulp and paper residues, waste paper, textile fibers (e.g., cotton, linen, hemp, jute) and cellulosic fibers (e.g., modal, viscose, lyocell).
  • During the production of cellulosic ethanol from cellulosic material, cellulosic materials can be processed by various techniques known in the art. Most of the carbohydrates in cellulosic material are in the form of lignocellulose, which can comprise cellulose, hemicellulose, pectin and/or lignin.
  • Cellulosic material can be pre-treated by physical and/or chemical means. Pre-treatment can make the cellulose fraction more accessible to hydrolysis.
  • Cellulose and/or hemicellulose comprising the cellulosic materials can then be hydrolysed into sugars (e.g., glucose).
  • ancestral cellulases can be used for the hydrolysis of cellulose.
  • the carbohydrate polymers of cellulose are depolymerized by an ancestral cellulase.
  • Sugars made available by hydrolysis can be used by microorganisms to produce ethanol by fermentation.
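The hydrolysis and fermentation steps above imply an upper bound on ethanol yield that follows from stoichiometry alone: each anhydroglucose unit of cellulose (C6H10O5) takes up one water molecule to give glucose (C6H12O6), and fermentation converts each glucose into two ethanol (C2H5OH) and two CO2. The sketch below works through that arithmetic; the function name and the two efficiency parameters are illustrative assumptions, and real process yields are lower than the theoretical maximum.

```python
# Standard atomic masses in g/mol.
C, H, O = 12.011, 1.008, 15.999

ANHYDROGLUCOSE = 6 * C + 10 * H + 5 * O   # repeating unit of cellulose
GLUCOSE = 6 * C + 12 * H + 6 * O
ETHANOL = 2 * C + 6 * H + 1 * O


def theoretical_ethanol_g(cellulose_g: float,
                          hydrolysis_efficiency: float = 1.0,
                          fermentation_efficiency: float = 1.0) -> float:
    """Grams of ethanol obtainable from a given mass of pure cellulose.

    Both efficiencies are fractions between 0 and 1; the defaults give the
    stoichiometric maximum.
    """
    glucose_g = cellulose_g * (GLUCOSE / ANHYDROGLUCOSE) * hydrolysis_efficiency
    return glucose_g * (2 * ETHANOL / GLUCOSE) * fermentation_efficiency


max_yield = theoretical_ethanol_g(1000.0)              # per kg of cellulose
derated = theoretical_ethanol_g(1000.0, 0.8, 0.9)      # hypothetical efficiencies
```

With these masses, one gram of cellulose gives about 1.11 g of glucose and at most about 0.57 g of ethanol.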
  • the present invention provides a method for the production of cellulosic ethanol from a source material of cellulose.
  • Cellulase enzymes can be added to cellulosic materials by various techniques.
  • ancestral cellulases are added to cellulosic materials as an isolated polypeptide.
  • recombinant microorganisms that express ancestral cellulases are added to cellulosic materials.
  • microorganisms that do not express cellulase enzymes can be genetically modified to express ancestral cellulases.
  • Microorganisms can be modified in a variety of ways, such as, but not limited to, to express cellulases, to express large volumes of cellulases, to express modified cellulases, and to express ancestral cellulases.
  • ancestral cellulases can be added to cellulosic materials by addition of isolated polypeptides and by addition of recombinant microorganisms that express ancestral cellulases. Isolated polypeptides and recombinant microorganisms can be added simultaneously or sequentially, in any order.
  • Ancestral cellulases needed for the hydrolysis of the cellulosic material according to the invention may be added in an enzymatically effective amount either simultaneously e.g. in the form of an enzyme mixture, or sequentially, or in combination with any microorganism of the present invention, or in combination with microorganisms that mediate fermentation.
  • any combination of the ancestral cellulase molecules comprising an amino acid sequence having about 90% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166
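A percent-identity threshold such as the one above can be checked on a pair of aligned sequences. The following is a minimal sketch under one common convention, namely identical residues divided by aligned columns in which neither sequence has a gap; the source does not state which convention or alignment tool it intends, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns containing an alignment gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Two ten-residue blocks from the sequence listing, used only as toy input.
identity = percent_identity("QACASQWGQC", "QAQASVWGQC")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which can shift a borderline pair across the threshold.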
  • the tree is shown in FIG. 1.
  • the ancestral sequences are listed as SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, and 63.
  • Bayesian analysis is a method of data analysis that combines prior information with the observed data to produce its result. This method of analysis was used in tandem with phylogenetic studies in this technology.
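The way Bayesian analysis combines prior information with data can be shown with Bayes' rule applied to a single site: the posterior probability of each candidate residue is proportional to its prior probability times the likelihood of the observed data given that residue. The sketch below is a generic illustration with made-up numbers; it is not the phylogenetic procedure used to reconstruct the sequences in this document, which operates over whole trees and substitution models.

```python
def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Apply Bayes' rule: posterior is prior times likelihood, normalized to sum to 1."""
    unnormalized = {state: prior[state] * likelihood[state] for state in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all candidate states have zero posterior mass")
    return {state: value / total for state, value in unnormalized.items()}


# Hypothetical numbers for three candidate residues at one ancestral site.
prior = {"V": 0.5, "I": 0.3, "L": 0.2}
likelihood = {"V": 0.8, "I": 0.1, "L": 0.1}

site_posterior = posterior(prior, likelihood)
most_probable = max(site_posterior, key=site_posterior.get)
```

Here the data sharpen a moderate prior preference for V into a posterior of roughly 0.89.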
  • the sequences used in the trees and the sequence alignment are listed as SEQ ID NOS: 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, and 121.
  • the phylogenetic trees are shown in FIGS. 3 and 4, and the resurrected sequences for each tree are listed as SEQ ID NOS: 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, and 177 (BEAST tree), and SEQ ID NOS: 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195,
  • AAALPVEERQ ACASVWGQCG GQGWSGPTCC ASGSTCWQN PYYSQCLPGS ATTTTTSSST TSRSSSTTST SSTTTTSPPT TTPSTASYSG NPFAGVQLWA NAYYASEVHT LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPTLMAGT LADIRAANKA GANPPYAGQF VVYDLPDRDC AAAASNGEFS IADNGVANYK AYIDAIRAQL VEYSDIRIIL VIEPDSLANM VTNMNVAKCA NAQSAYLECT NYALKQLNLP HVAMYLDAGH AGWLGWPANL QPAATLFAKV YKDAGKPAAV RGLATNVANY NAWSIASPPS YTQGDPNYDE KHYINALAPL LSANGWPDAH FIVDQGRSGK QPTGQQEWGD WCNVIGTGFG VRPTTNTGSS LEDAFV
  • AAALPVEERQ ACASVWGQCG GQGWSGPTCC ASGSTCVVSN PYYSQCLPGS ATPTTTSSST TSRSSSTTSR SSTTTTSPPT TTPSTASYSG NPFAGVNLWA NAYYASEVHT LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPTLMAGT LADIRAANKA GGNPPYAGQF VVYDLPDRDC AAAASNGEFS IADGGVAKYK AYIDAIRAQL VEYSDIRIIL VIEPDSLANM VTNMGVPKCA NAQSAYLECT NYAVKQLNLP HVAMYLDAGH AGWLGWPANL QPAATLFAKV YKDAGKPAAL RGLATNVANY NAWNITSPPS YTQGNPNYDE KHYINALAPL LSANGWPDAH FIVDQGRSGK QPTGQQEWGD WCNVIGTGFG VRPTANTGSS LVDAFV
  • FRMLRYLSIV AATAILTGVE AQQSVWGQCG GQGWSGATSC AAGSTCSTQN PYYAQCIPGT ATSTTSSTTT SSTSASTTTT TTTTTTTAST TTTTTAAASG NPFSGYQLYA NPYYSSEVHT LAIPSLTDGS LAAAATKAAE IPSFVWLDTA AKVPTLMGTY LANIEAANKA GASPPIAGIF VVYDLPDRDC AAAASNGEYT VANNGVANYK AYIDSIVAQL KAYPDVHTIL IIEPDSLANM VTNLSTAKCA EAQSAYYECV NYALINLNLP NVAMYIDAGH AGWLGWSANL SPAAQLFATV YKNASSPAAL RGLATNVANY NAWSISSPPS YTSGDSNYDE KLYINALSPL LTSNGWPNAH FIMDTSRNGV QPTKQQAWGD WCNVIGTGFG VQPTTNTGDP LEDAFV
  • AAQAPVWGQC GGTGWTGPTC CASGSTCWQ NPYYSQCLPG STTSSTTTTS TTTSSPTSST TTTTSTTTPP TTSPTTTTPP PGSTSTPASG NPFAGYQLYL SPYYAAEVAA LAAPNITDPA LKAKAASVAN IPTFTWFDTV AKVPDLGTYL ADASALQKSS GQKPYAVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKT YIDSIAAQLK KYSDVRWAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNLVG VYMYLDAGHA GWLGWPANLS PAAQLFAQLY KNAGSPSFVR GLATNVANYN ALSAASPDSY TQGNPNYDEI HYINALAPML SSQGFPPAHF IVDQGRSGVQ NIRQQQWGDW CNVKGAGFGT RPTTNTGSSL IDAIVWVKPG GESDGTSDTS
  • AAQAPVWGQC GGTGWTGPTT CASGSTCWQ NPYYSQCLPG STTSSTTTTS TTTSSPTSST TTTTSTTTPP TGSPTTTTPP PGSTSTPAAG NPFVGYQLYL SPYYAAEVAA LAASNITDPT LKAKAASVAN IPTFTWFDVV AKVPDLGTYL ADASALQKSS GQKPYLVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKT YIDQIAAQIK KYPDVRVVAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNSVG VYMYLDAGHA GWLGWPANLS PAAQLFAQLY KNAGSPSFVR GLATNVANYN ALSAASPDPI TQGNPNYDEI HYINALAPML SSQGFPPAHF IVDQGRSGVQ NIRQQQWGDW CNVKGAGFGT RPTTNTGSSL IDAIVWVKPG GES
  • QAQASVWGQC GGQGWSGPTC CASGSTCWQ NPYYSQCLPG STTTSTTTTS TTTTSSTSST TTTTSTTTPP TTSPTTTTPP
  • AASTTASASG NPFSGYQLYA NPYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFVWLDTA AKVPTMGTYL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ANNGVANYKA YIDSIRAQLV KYSDVRIILV IEPDSLANMV TNLNVAKCAN AQSAYLECVN YALKQLNLPN VAMYLDAGHA GWLGWPANLS PAAQLFAKVY KNAGSPAAVR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYINALAPLL SSNGFPDAHF IVDTGRNGVQ PTRQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GESDGTSD
  • QACASQWGQC GGQGWSGPTC CASGSTCWQ NPYYSQCLPG STTTSTTTRS STTTSSVSST TTSTSTTTPP TGSPTTTTPP AASGTASYSG NPFAGVQLWA NAYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPLMAGTL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ADNGVANYKA YIDAIRAQLV EYSDIRIILV IEPDSLANMV TNMNVAKCAN AQSAYLECTN YAVKQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KNAGKPAALR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYIQALAPLL SSNGWPDAHF IVDQGRSGKQ PTGQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GECDGTSDTT
  • QACASQWGQC GGQGWSGPTC CASGSTCVVS NPYYSQCLPG SATTSSSSTA SSTTSSVRST TTSTSTTTPP TGSPTTTTPP APSGGATYTG NPFAGVNLWA NAYYASEVSS LAIPSLSDGA LATAAAKVAK VPTFQWMDTA AKVPLMDGTL ADIRRANKAG GNPPYAGQFV VYNLPDRDCA AAASNGELSI ADGGVAKYKA YIDAIRAMLV KYSDIRIILV IEPDSLANMV TNMGVPKCAN AQAAYLECTN YAVTQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KDAGKPKALR GLATNVSNYN AWNITSPPSY TQGNPNYDEK HYIEALAPLL SSNGWPDAKF IVDQGRSGKQ PTGQQEWGDW CNAIGTGFGV RPTANTGSSL VDAFVWVKPG GESDGTSDTT AARY
  • SEQ ID NO: 62 (AsCA node #41) gctagcQAQASVWGQC GGQGWSGPTC CASGSTCVVQ NPYYSQCLPG STTTSTTTTS TTTTSSTSST TTTTSTTTPP TTSPTTTTPP AASTTASASG NPFSGYQLYA NPYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFVWLDTA AKVPTMGTYL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ANNGVANYKA YIDSIRAQLV KYSDVRIILV IEPDSLANMV TNLNVAKCAN AQSAYLECVN YALKQLNLPN VAMYLDAGHA GWLGWPANLS PAAQLFAKVY KNAGSPAAVR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYINALAPLL SSNGFPDAHF IVDTGRNGVQ PTRQQEWGDW CNVIGTGFGV
  • SEQ ID NO: 64 (tr_A9FHT2_Bacteria_Sorangium_cellulosum) AC GDGG-GGDTS GTGGGSSGVG APTSSGVGAG TPTSSGNVDP TTTSSGNVDP
  • NQPTGQSQWG DWCNVKNTGF GVRPTTDTGD ELVDAFVWVK PGGESDGTSD TSAERYDAHC
  • GLA-DALKPA PEAGQWFQAY FEQLLTNANP PF SEQ ID NO: 71 (tr_B2ABX7_Ascomycota_Podospora_anserina)
  • SEQ ID NO: 80 (sp_Q9C1S9_Ascomycota_Humicola_insolens)
  • VQPTAQQAWG DWCNLIGTGF GVRPTTNTGD ALEDAFVWIK PGGEGDGTSD TTAARYDFHC
  • SEQ ID NO: 90 (tr_G7XQ80_Ascomycota_Aspergillus_kawachii) QTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA T-SATTLKTT TSTTTA AMTTT TATSSPAASA SP TTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLT-
  • KQPTGQQAWG DWCNVINTGF GVRPTTSTGD DLVDAFVWVK PGGESDGTSD SSATRYDAHC
  • KQPTGQQAWG DWCNVINTGF GERPTTDTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC
  • GYS-DALQPA PEAGTWFQAY FVQLLTNANP AF SEQ ID NO: 92 (sp_P46236_Ascomycota_Fusarium_oxysporum)
  • SEQ ID NO: 93 (sp_Q0CFP1_Ascomycota_Aspergillus_terreus)
  • QTLWGQC GGIGWTGPTN
  • CVAGAACSTQ NPYYAQCLPG TSTTLTTTTR VTTTTTSTTS
  • GDMVAKASAV AKVPSFQWLD TAAKVPTMAD TLADIAKANQ
  • AGASPAYAGL FWYDLPDRD CAAAASNGEY SIADNGVANY KAYIDAIKAQ LVANSDTRIL LVVEPDSLAN LVTNMNVAKC ANAHDAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWSAN LQPAATLFAN VYSNAGKPAS LRGLATNVAN YNAWTIASAP SYTQGDSNYD
  • EKLYVQALSP LLSSAGW-DA HFITDQSRSG KQPTGQNAWG DWCNVIGTGF GTRPTTDTGL DIEDALVWVK PGGECDGTSN TTAARYDYHC GLS-DALQPS PEAGTWFQAY FVQLLTNANP AF
  • VQPTKQQAWG DWCNVIGTGF GTPFTTDTGD ALQDAFIWVK PGGECDGTSD TSSPRYDAHC
  • VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC
  • VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC
  • GYS-GALQPA PEAGTWFQAY FVQLLTNANP AL SEQ ID NO: 99 (sp_Q4WFK4_Ascomycota_Neosartorya_fumigata) QTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG A--STTLTTT TAATTT SQTTT KPTTTGPTTS AP TVT ASGPFSGYQL YANPYYSSEV HTLAMPSLP-
  • SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD
  • SEQ ID NO: 100 (sp_Q5B2E8_Ascomycota_Emericella_nidulans) QTLYGQC GGSGWTGATS CVAGAACSTL NQWYAQCLPA --ATTTSTTL TTTTSS VTTTS NPGSTTTT-- --SSVTVTAT ASGPFSGYQL YVNPYYSSEV QSIAIPSLT-
  • TSVGTTSPAT TPTTKPST TAA ASGPFSGYQL YANPYYSSEV HTLALPSLT-
  • VQPTKQQAWG DWCNVIGTGF GVPPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC
  • SEQ ID NO: 102 (sp_A1DJQ7_Ascomycota_Neosartorya_fischeri) QTVWGQC GGQGWSGPTN CVAGAACSTL NPYYAQCIPG ATATSTT LSTTTT TQTTT KPTTTGPTTS AP TVT ASGPFSGYQL YANPYYSSEV HTLAMPSLP-
  • SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGASPPIAGI FWYDLPDRD
  • CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA LRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKNAGF-DA HFIMDTSRNG VQPTKQSAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYS-DALQPA PEAGTWFQAY FEQLLTNANP SF
  • QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PP--PTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLS- PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGAN--YAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGW-DA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDG
  • GSA-SSMKP- SEQ ID NO: 106 (gi_345565889_Ascomycota_Arthrobotrys_oligospora) ⁇ LWGQC
  • GGIGWTGATN CVAGAACSTL NPYYAQCLSA AATTPRTTTT PATTTR--TT
  • SEQ ID NO: 115 (tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus) SPIYGQC GGTGWTGATT CASGSTCVFS NPYYSQCLPG A-TTTTTSPQ PTTTTT TTTTN SGGGNPTTTT SAPGSTSTPD -AGPFVGYTL YLSPYYAAEV QA-AAGNITD
  • CAAAASNGEF SIANNGLANY ETYIDQLAAQ IQQYPDVRVV AVIEPDSLAN LVTNLNVAKC
  • IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQLVAQ IKKYPDVRVV AVIEPDSLAN
  • VYKNAGSPAA LRGLATNVAN YNAWSISTCP
  • SYTQGDSNCD EKRYINALAP LLKAQGFPDA
  • VYKNAGSPAA LRGLATNVAN YNAWSISTCP
  • SYTQGDSVCD EKRYINALAP LLKAQGFPDA
  • VYKDAGSPAA LRGLATNVAN YNAWSISTCP
  • SYTQGDQNCD EKRYINALAP LLKAQGFPDA
  • VKTTSSTSVG TSSATTSTTT TPTTTTTTTTTTTT ASTTATTTAA ASGPFSGYQL


Abstract

The invention discloses enzymes for use in biofuel production. Some aspects of the present invention provide for ancestral fungal cellulases. Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels. In some embodiments, ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose. Some aspects of the present invention provide for microorganisms that express an ancestral cellulase. Microorganisms are useful for the production of cellulosic ethanol for biofuels. In some embodiments, microorganisms can be used for the hydrolysis and/or fermentation of cellulose.

Description

BIOFUEL PRODUCTION ENZYMES AND USES THEREOF
[0001] All patents, patent applications and publications cited herein are hereby incorporated by reference in their entirety. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art as known to those skilled therein as of the date of the invention described and claimed herein.
[0002] This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.
BACKGROUND OF THE INVENTION
[0003] Industrial enzymes have many applications in detergents, textile production, pharmaceuticals, biofuel production, as well as many other applications. Industrial enzymes can be limited by their stability at a wide range of temperatures and pH levels, and currently there is no reliable method to broaden the range of use without simultaneously affecting the enzyme's activity. A common practice involves randomly inserting mutations in existing enzymes and screening for variants that exhibit the desired characteristics; however, due to the enormous combinatorial possibilities, this can become costly and work-intensive and does not guarantee success. The invention relates to a modified method to predictably alter and optimize enzymes, mainly by identifying and resurrecting suitable ancestral strains. By examining the molecular and evolutionary biology involved in the enzyme chemistry, this method establishes a fast and economically efficient system to develop and/or improve enzymes, especially industrial enzymes used in harsh environments such as high temperature or low pH.
[0004] The ancestral cellulases can be classified as cellobiohydrolases II. Modern cellobiohydrolases from fungi are used in the biofuel industry to produce cellulosic ethanol.
SUMMARY OF THE INVENTION
[0005] The invention is based, at least in part, on the discovery of the resurrection of four ancestral cellulase enzymes from fungi.
[0006] These enzymes are useful for the production of cellulosic ethanol for biofuels.
[0007] Some aspects of the present invention provide for ancestral fungal cellulases. Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels. In some embodiments, ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose. Some aspects of the present invention provide for microorganisms that express an ancestral cellulase. Microorganisms are useful for the production of cellulosic ethanol for biofuels. In some embodiments, microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
[0008] In one aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 2.
[0009] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 3.
[0010] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 10.
[0011] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 11.
[0012] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to any one of the amino acid sequences of SEQ ID NO: 1,
SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO:
9, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ
ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO:
22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27,
SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 122, SEQ ID NO: 123, SEQ
ID NO: 124, SEQ ID NO: 125, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ
ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 133, SEQ
ID NO: 134, SEQ ID NO: 135, SEQ ID NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ
ID NO: 139, SEQ ID NO: 140, SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, SEQ
ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 147, SEQ ID NO: 148, SEQ
ID NO: 149, SEQ ID NO: 150, SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ
ID NO: 154, SEQ ID NO: 155, SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ
ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 163, SEQ
ID NO: 164, SEQ ID NO: 165, SEQ ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ
ID NO: 169, SEQ ID NO: 170, SEQ ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 173, SEQ
ID NO: 174, SEQ ID NO: 175, SEQ ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ
ID NO: 179, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ
ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ
ID NO: 194, SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, SEQ
ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 203, SEQ
ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ
ID NO: 209, SEQ ID NO: 210, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ
ID NO: 214, SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 218, SEQ
ID NO: 219, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 223, SEQ
ID NO: 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, SEQ ID NO: 228, SEQ
ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, SEQ ID NO: 232, or SEQ ID NO: 233.
[0013] In some embodiments the signal peptide of the isolated polypeptide is removed.
[0014] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 31.
[0015] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 32.
[0016] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NO: 39.
[0017] In another aspect, the present invention provides for an isolated polypeptide comprising at least about 90% identity to the amino acid sequence of SEQ ID NO: 40.
[0018] In another aspect, the present invention provides for an isolated polypeptide comprising about 90% identity to any one of the amino acid sequences of SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, or SEQ ID NO: 59.
[0019] In another aspect, the present invention provides for a nucleic acid encoding a polypeptide of the present invention.
[0020] In another aspect, the present invention provides for a recombinant microorganism, wherein said microorganism expresses a nucleic acid of the present invention. In another aspect, the present invention provides for a recombinant microorganism, wherein said microorganism expresses a nucleic acid encoding a polypeptide of the present invention, or a combination thereof. In some embodiments, the recombinant microorganism is a fungus. In some embodiments, the recombinant microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes. In some embodiments, the recombinant microorganism is a yeast. In some embodiments, the recombinant microorganism is a bacterium. In some embodiments, the recombinant microorganism is Saccharomyces cerevisiae. In some embodiments, the recombinant microorganism is selected from the group consisting of Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp. In some embodiments, the recombinant microorganism is selected from the group consisting of Escherichia coli, Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., and Streptomyces sp.
[0021] In another aspect, the present invention provides for a method for the production of cellulosic ethanol, comprising adding an isolated polypeptide of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing. In some embodiments, the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
[0022] In another aspect, the present invention provides for a method for the production of cellulosic ethanol, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing. In some embodiments, the method further comprises adding a polypeptide of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
[0023] In another aspect, the present invention provides for a method for cellulose processing, comprising adding a polypeptide of the present invention, or a combination thereof, to a source material of cellulose. In some embodiments, the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
[0024] In another aspect, the present invention provides for a method for cellulose processing, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose. In some embodiments, the method further comprises adding a polypeptide of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
BRIEF DESCRIPTION OF THE FIGURES
[0025] FIG. 1 is a phylogenetic tree of fungal cellulases.
[0026] FIG. 2 is a phylogenetic tree of fungal cellulases.
[0027] FIG. 3 is a phylogenetic tree of fungal cellulases obtained using BEAST.
[0028] FIG. 4 is a phylogenetic tree of fungal cellulases obtained using MrBayes.
DETAILED DESCRIPTION OF THE INVENTION
[0029] Cellulases are enzymes that can catalyze the hydrolysis of the β-1,4 glucosidic bonds in cellulose, the predominant component of plant matter. In nature, cellulases facilitate microbial conversion of insoluble cellulose contained within biomass into soluble sugars (EA Bayer et al. Current Opinion in Structural Biology, 8:548-557, 1998).
[0030] Cellobiohydrolases from fungi can be used in the biofuel industry to produce cellulosic ethanol. Before the sugars in lignocellulosic biomass, such as wood, can be fermented into ethanol, the lignin that encapsulates the cellulose, and the unique structural conformation of the cellulose within it, can be addressed with either acid or enzyme hydrolysis (PC Badger, In: J. Janick and A. Whipkey (eds.), Trends in new crops and new uses. ASHS Press, Alexandria, VA. 2002).
[0031] The industry needs highly stable and active cellulases that can withstand harsh conditions in ethanol production and processing, such as high temperature, low pH, and/or substrate pretreatment. Previous work demonstrated that ancestral enzymes from the thioredoxin superfamily were more thermally stable, more pH resistant, and more active than their modern relatives. The organisms that carried these enzymes lived on the primordial Earth, in an environment that was much hotter and more acidic than today. Consequently, their enzymes should have a higher thermal and acidic stability than their modern counterparts (R Perez-Jimenez et al. Nature Structural & Molecular Biology, 18: 592-596, 2011).
[0032] Previous methods for novel enzyme production include directed evolution, with its time-consuming trial-and-error approach, yet in the last two decades methods based on statistical theory have been developed to computationally reconstruct ancestral protein sequences (MF Cole & EA Gaucher, Journal of Molecular Evolution, 72(2): 193-203, 2011).
[0033] With rising oil prices and increased emphasis on negating the threats of global warming, additional sources of green technologies are being investigated for the generation of biofuels and textiles. There is an increasing effort to produce highly stable and highly active cellulases to overcome the limitations imposed by the industrial process of ethanol production. These limitations are, for instance, high temperature, low pH, or substrate pretreatment. The field of paleoenzymology relates to how ancient organisms subsisted and the biomolecules that supported the environment. In previous work, the inventors demonstrated that ancestral enzymes from the thioredoxin superfamily were more thermally stable, more pH resistant, and more active than their modern relatives.
List #1 - List of ancestral cellulases
[0034] Building on this, ancestral sequence reconstruction methods were applied to resurrect cellobiohydrolases going back in time up to ~1200 Myr. A set of 31 modern enzymes from two different fungal phyla, Basidiomycota and Ascomycota, was used. Most fungal cellulases used in industry belong to these phyla. The sequences were aligned using MUSCLE software. Further corrections by hand were necessary to obtain a suitable alignment. The alignment was then used to construct a phylogeny using MrBayes software. Inconsistencies were corrected using the literature and by running additional tests using MEGA and PAUP software. The final tree was used to reconstruct the most probable ancestral sequence for each node of the tree using PAML software. A set of 30 ancestral sequences, four of which can be experimentally tested, was finally obtained. The tree is shown in FIG. 1. The ancestral sequences are listed in Appendix 1.
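The final reconstruction step described above can be illustrated with a minimal Python sketch. It assumes that, for a given node, a posterior probability is available for each candidate amino acid at each site (the kind of per-site output a reconstruction program reports), and it simply picks the most probable residue per site. The function name, the data layout, and the toy probabilities are hypothetical and are not taken from the patent or from PAML itself.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per alignment site, mapping a one-letter
# amino acid code to its posterior probability at that site.
SitePosterior = Dict[str, float]


def most_probable_sequence(posteriors: List[SitePosterior]) -> Tuple[str, float]:
    """Return the most probable residue at each site, plus the mean of the
    per-site maximum posteriors as a simple accuracy estimate for the node."""
    if not posteriors:
        raise ValueError("at least one site is required")
    residues = []
    best_probs = []
    for site in posteriors:
        # Pick the amino acid with the highest posterior probability.
        residue, prob = max(site.items(), key=lambda item: item[1])
        residues.append(residue)
        best_probs.append(prob)
    return "".join(residues), sum(best_probs) / len(best_probs)


# Toy three-site node with made-up probabilities (illustration only).
example_node = [
    {"Q": 0.90, "E": 0.10},
    {"T": 0.55, "S": 0.45},
    {"V": 0.98, "I": 0.02},
]
sequence, accuracy = most_probable_sequence(example_node)
print(sequence, round(accuracy, 2))
```

Sites where the winning probability is low (the second site in the toy data) are the positions where the reconstructed residue is least certain.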
[0035] The cellulases that will be resurrected correspond to:
• Last common ancestor of Dikarya (1208 Myr): DiCA
• Last common ancestor of Basidiomycota (966 Myr): BasCA
• Last common ancestor of Ascomycota (1144 Myr): AsCA
• Last common ancestor of Sordariomycetes (433 Myr): SorCA
[0036] Divergence times were obtained from the literature at http://www.timetree.org.
[0037] The amino acid sequences in bolded text in Appendix 1 can be experimentally tested.
List #2 - List of ancestral cellulases excluding signal peptide
[0038] The amino acid sequences in bolded text in Appendix 2 can be experimentally tested. The sequences in List #2 were constructed in the same way as described above, but in this instance, the signal peptide was removed from all sequences. The signal peptide is a natural addition used in cells to direct proteins to specific compartments. This signal peptide is not critical for function, in that it is always cleaved once the target has been reached. However, sequences without the signal peptide can be more useful for industrial applications. The names of the sequences remain the same, as they represent the same ancestors. The phylogenetic tree is shown in FIG. 2.
[0039] Last common ancestor of Dikarya (1208 Myr): DiCA
[0040] Last common ancestor of Basidiomycota (966 Myr): BasCA
[0041] Last common ancestor of Ascomycota (1144 Myr): AsCA
[0042] Last common ancestor of Sordariomycetes (433 Myr): SorCA
List #3 - New ancestral sequences with no signal peptide with restriction sites
[0043] The sequences in List #3 are new ancestral sequences for the specified nodes. They contain restriction sites at the termini. The DNA sequence gctagc is the restriction site for the NheI enzyme and the sequence ggtacc is the restriction site for the KpnI enzyme. The protein sequences are in capital letters. These sequences are depicted in Appendix 3.
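As a rough illustration of how such flanked constructs might be handled computationally, the Python sketch below checks for the two restriction sites named in the text and returns the portion between them. It assumes, hypothetically, that the NheI site sits at the start of the construct and the KpnI site at the end; the text says only that the sites are at the termini. The function name and the toy construct are invented for this example and are not sequences from Appendix 3.

```python
NHEI_SITE = "gctagc"  # NheI recognition sequence, as given in the text
KPNI_SITE = "ggtacc"  # KpnI recognition sequence, as given in the text


def strip_restriction_sites(construct: str) -> str:
    """Return the part of the construct between a leading NheI site and a
    trailing KpnI site. The site order is an assumption of this sketch."""
    lowered = construct.lower()
    if len(construct) < len(NHEI_SITE) + len(KPNI_SITE):
        raise ValueError("construct is too short to carry both sites")
    if not (lowered.startswith(NHEI_SITE) and lowered.endswith(KPNI_SITE)):
        raise ValueError("construct is not flanked by NheI and KpnI sites")
    return construct[len(NHEI_SITE):len(construct) - len(KPNI_SITE)]


# Made-up toy construct: lowercase sites around an uppercase payload.
toy_construct = "gctagc" + "QTVWGQC" + "ggtacc"
print(strip_restriction_sites(toy_construct))
```

Keeping the sites in lowercase and the payload in capitals, as the text describes, makes the boundary between them easy to see.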
[0044] The sequences in Lists #5-6 were constructed in the same way as described above, but in this instance, the phylogeny was expanded to incorporate more sequences of modern cellulases. A total of 58 sequences are now part of the phylogenetic tree. In addition, different software platforms, MrBayes and BEAST, were used to build trees using Bayesian analysis. Both programs use Bayesian MCMC analysis but differ in the procedure used to estimate posterior probability. Bayesian analysis is a method of data analysis that incorporates prior information when generating a result. This method of analysis was used in tandem with phylogenetic studies in this technology. The sequences used in the trees and the sequence alignment are shown in List #4, the phylogenetic trees are shown in FIGS. 3 and 4, and the resurrected sequences for each tree are shown in List #5 (BEAST tree) and List #6 (MrBayes tree). These sequences present a reconstruction of fungal cellulase enzymes to be used in the production of bioethanol as a green-fuel source.
List #4 - List of sequences used for the reconstruction including sequence alignment obtained using MUSCLE software
[0045] These sequences are depicted in Appendix 4.
List #5 - List of ancestral cellulases identified using PAML and BEAST tree
[0046] These sequences are depicted in Appendix 5.
[0047] Overall accuracy of the 56 ancestral sequences:
0.74109 0.73826 0.97835 0.97835 0.97835 0.94117 0.94529 0.94578
0.92510 0.94465 0.94629 0.94601 0.94854 0.92166 0.97835 0.92600
0.94225 0.96071 0.96111 0.96784 0.96017 0.96374 0.95726 0.97405
0.96833 0.96265 0.95336 0.97883 0.97716 0.97988 0.98803 0.94760
0.95237 0.94720 0.94774 0.94221 0.96879 0.96964 0.97211 0.97245
0.96504 0.96520 0.99716 0.99644 0.93515 0.93299 0.99472 0.99382
0.94897 0.96626 0.99577 0.99653 0.99683 0.99713 0.73208 0.91673
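The 56 values above can be summarized with a short Python sketch. It assumes that each value is the overall accuracy reported for one ancestral node, and that the fused token on the last row is two values, 0.99683 and 0.99713. The 0.90 cut-off used to flag weaker nodes is an arbitrary choice for illustration and does not come from the patent.

```python
# Accuracy values transcribed from the table above, eight per row.
# The last row assumes the fused token splits into 0.99683 and 0.99713.
accuracies = [
    0.74109, 0.73826, 0.97835, 0.97835, 0.97835, 0.94117, 0.94529, 0.94578,
    0.92510, 0.94465, 0.94629, 0.94601, 0.94854, 0.92166, 0.97835, 0.92600,
    0.94225, 0.96071, 0.96111, 0.96784, 0.96017, 0.96374, 0.95726, 0.97405,
    0.96833, 0.96265, 0.95336, 0.97883, 0.97716, 0.97988, 0.98803, 0.94760,
    0.95237, 0.94720, 0.94774, 0.94221, 0.96879, 0.96964, 0.97211, 0.97245,
    0.96504, 0.96520, 0.99716, 0.99644, 0.93515, 0.93299, 0.99472, 0.99382,
    0.94897, 0.96626, 0.99577, 0.99653, 0.99683, 0.99713, 0.73208, 0.91673,
]

# Hypothetical threshold for flagging less certain reconstructions.
THRESHOLD = 0.90
low_confidence = [value for value in accuracies if value < THRESHOLD]

print("nodes:", len(accuracies))
print("lowest:", min(accuracies), "highest:", max(accuracies))
print("below threshold:", len(low_confidence))
```

Under these assumptions, only three values fall below the cut-off.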
List #6 - List of ancestral cellulases identified using PAML and MrBayes tree
[0048] These sequences are depicted in Appendix 6.
Molecules of the invention
[0049] The singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise.
[0050] The term "about" is used herein to mean approximately, in the region of, roughly, or around. When the term "about" is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term "about" is used herein to modify a numerical value above and below the stated value by a variance of 20%.
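As a worked illustration of this definition, the Python sketch below computes the range covered by "about" for a stated value. It assumes the 20% variance is relative to the stated value, which is one possible reading of the definition rather than the only one. The function name and the example value are hypothetical.

```python
def about_range(value: float, variance: float = 0.20) -> tuple:
    """Return the (lower, upper) bounds implied by 'about value', assuming
    the variance is applied as a fraction of the stated value."""
    delta = abs(value) * variance
    return value - delta, value + delta


# Example: "about 90" under the relative-variance assumption.
lower, upper = about_range(90.0)
print(lower, upper)
```

For a quantity that cannot exceed 100, such as a percent identity, a reader would additionally cap the upper bound at 100; that cap is not part of the definition above.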
[0051] As used herein, an "ancestral cellulase molecule" refers to an ancestral cellulase protein, or a fragment thereof. An "ancestral cellulase molecule" can also refer to a nucleic acid (including, for example, genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA) which encodes a polypeptide corresponding to an ancestral cellulase protein, or fragment thereof. For example, an ancestral cellulase molecule comprises the amino acid sequence shown in SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124, SEQ ID NO: 125, SEQ
ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130, SEQ
ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ
ID NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, SEQ
ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, SEQ ID NO: 144, SEQ ID NO: 145, SEQ
ID NO: 146, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, SEQ
ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, SEQ ID NO: 155, SEQ
ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ
ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 165, SEQ
ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 170, SEQ
ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 173, SEQ ID NO: 174, SEQ ID NO: 175, SEQ
ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 179, SEQ ID NO: 180, SEQ
ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ
ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194, SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 210, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214, SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 218, SEQ ID NO: 219, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, SEQ ID NO: 232, or SEQ ID NO: 233, or comprises a nucleic acid sequence encoding the amino acid sequence shown in SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124, SEQ ID NO: 125, SEQ
ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130, SEQ
ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ
ID NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, SEQ
ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, SEQ ID NO: 144, SEQ ID NO: 145, SEQ
ID NO: 146, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, SEQ
ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, SEQ ID NO: 155, SEQ
ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ
ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 165, SEQ
ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 170, SEQ
ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 173, SEQ ID NO: 174, SEQ ID NO: 175, SEQ
ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 179, SEQ ID NO: 180, SEQ
ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194, SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 210, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214, SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 218, SEQ ID NO: 219, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, SEQ ID NO: 232, or SEQ ID NO: 233. For example, an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof. The ancestral cellulase molecules of the invention can be obtained from various sources and can be produced according to various techniques known in the art. For example, a nucleic acid that encodes an ancestral cellulase molecule can be obtained by synthetic or semi-synthetic methods, by screening DNA libraries, or by amplification from a natural source. An ancestral cellulase molecule can include a fragment or portion of an ancestral cellulase protein. An ancestral cellulase molecule can include a variant of the above described examples, such as a fragment thereof. Such a variant can comprise a naturally-occurring variant due to allelic variations between individuals (e.g., polymorphisms), mutated alleles, or alternative splicing forms. 
In one embodiment, an ancestral cellulase molecule comprises a variant of an ancestral cellulase protein or polypeptide encoded by an ancestral cellulase nucleic acid sequence wherein the variant has an amino acid identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233 of about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99%. 
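To make the notion of amino acid identity concrete, the Python sketch below computes a percent identity between two sequences that have already been aligned to the same length. It assumes one common convention: columns where either sequence has a gap are skipped, and identity is the share of matching residues among the remaining columns. Other conventions use a different denominator, and the patent does not state which one applies. The function name and the toy fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned columns where neither sequence
    has a gap. Both inputs must already be aligned to equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (illustration only): 7 matches in 9 ungapped columns.
identity = percent_identity("QTVWGQCGG-", "QTLYGQCGGS")
print(round(identity, 1))
```

A variant would then be compared against a threshold such as the "about 90%" used in the text, with "about" interpreted as defined in the specification.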
Such variants can include those having at least from about 46% to about 50% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233 or having at least from about 50.1% to about 55% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138,
139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156,
157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 55.1% to about 60% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 60.1% to about 65% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157,
158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 65.1% to about 70% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 70.1% to about 75% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 
206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 75.1% to about 80% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 80.1% to about 85% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 85.1% to about 90% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 90.1% to about 95% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 95.1% to about 97% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233, or having at least from about 97.1% to about 99% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233. In another embodiment, an ancestral cellulase molecule can be a fragment of an ancestral cellulase protein.
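As an illustration of how a percent identity figure of the kind recited above might be computed, the following is a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identical residues over the aligned columns in which neither sequence has a gap. The function name and the choice of denominator are assumptions made for illustration; other conventions for percent identity exist, and this section does not specify one.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs are assumed to be rows of the same pairwise alignment,
    so they must have equal length; '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical 10-residue sequences differing at one position.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```

Under this convention the two hypothetical sequences above differ at one of ten compared positions, so the variant would fall within the "about 85.1% to about 90%" band.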
[0052] According to the invention, protein variants can include amino acid sequence modifications. For example, amino acid sequence modifications fall into one or more of three classes: substitutional, insertional or deletional variants. Insertions can include amino and/or carboxyl terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Insertions ordinarily will be smaller insertions than those of amino or carboxyl terminal fusions, for example, on the order of one to four residues. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence. These variants ordinarily are prepared by site-specific mutagenesis of nucleotides in the DNA encoding the protein, thereby producing DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture. In one embodiment, an ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide.
[0053] Signal peptides are polypeptide sequences, variable in length and amino acid composition, found at the amino-terminus of some proteins. Signal peptides direct the secretion of polypeptide molecules through a prokaryotic or eukaryotic cell membrane. Signal peptides have a tripartite structure consisting of a hydrophobic core flanked by a positively charged n-region on one side and a neutral but polar c-region on the other (Tuteja, R., (2005) Arch. Biochem. Biophys. 441:107-111). Signal peptide sequences can be identified by various methods known to one of skill in the art. For example, signal peptide sequences within a polypeptide sequence can be identified using various prediction tools including, but not limited to, Phobius (http://phobius.sbc.su.se/), Predotar (http://urgi.versailles.inra.fr/predotar/predotar.html), SignalP (www.cbs.dtu.dk/services/SignalP/), and TargetP (www.cbs.dtu.dk/services/TargetP/).
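The tripartite structure and the signal peptide deletion described above can be illustrated with a short Python sketch. It is not a substitute for the prediction tools listed: it only locates the most hydrophobic window near the amino-terminus using the Kyte-Doolittle hydropathy scale, and trims a sequence at a cleavage position that is assumed to come from an external predictor. The function names, the window size, the 35-residue search region, and the toy sequence are all hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(protein: str, window: int = 8, n_terminal: int = 35) -> int:
    """Return the start index of the most hydrophobic window near the N-terminus."""
    region = protein[:n_terminal]
    if len(region) < window:
        raise ValueError("sequence is shorter than the scanning window")

    def score(start: int) -> float:
        return sum(KYTE_DOOLITTLE[aa] for aa in region[start:start + window])

    # max() returns the first start index with the highest summed hydropathy.
    return max(range(len(region) - window + 1), key=score)


def remove_signal_peptide(protein: str, cleavage_after: int) -> str:
    """Drop the first `cleavage_after` residues (cleavage site supplied by the caller)."""
    if not 0 <= cleavage_after <= len(protein):
        raise ValueError("cleavage position lies outside the sequence")
    return protein[cleavage_after:]


# Toy sequence (hypothetical): charged n-region, leucine-rich core, polar c-region.
toy = "MKKKLLLLLLLLSSASAQDE"
core_start = most_hydrophobic_window(toy)
mature = remove_signal_peptide(toy, 17)
```

In practice the cleavage position would be taken from a dedicated predictor such as SignalP rather than chosen by hand.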
[0054] In one embodiment, an ancestral cellulase molecule comprises a protein or polypeptide encoded by a nucleic acid sequence encoding an ancestral cellulase protein, such as the sequences shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233. In another embodiment, the polypeptide can be modified, such as by glycosylations and/or acetylations and/or chemical reaction or coupling, and can contain one or several non- natural or synthetic amino acids.
[0055] In one embodiment, an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof. The nucleic acid can be any type of nucleic acid, including genomic DNA, complementary DNA (cDNA), synthetic or semi-synthetic DNA, as well as any form of corresponding RNA. For example, a nucleic acid encoding an ancestral cellulase protein can comprise a recombinant nucleic acid encoding such a protein. The nucleic acid can be a non-naturally occurring nucleic acid created artificially (such as by assembling, cutting, ligating or amplifying sequences). Restriction enzymes can be used to cut nucleic acid sequences in a sequence specific manner, as is known in the art. Restriction enzyme recognition sequences can be added to the ends of a nucleic acid sequence encoding an ancestral cellulase protein (e.g. SEQ ID NOS: 60, 61, 62 and 63). The nucleic acid sequence of a restriction enzyme site can encode amino acids. Amino acids encoded by a restriction enzyme site can form part of the sequence of an ancestral cellulase protein, or may encode additional amino acids at the ends of a polypeptide sequence of an ancestral cellulase protein. Nucleic acid sequences can be double-stranded or single-stranded.
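To illustrate how a restriction enzyme recognition sequence added to the end of a coding sequence can itself encode amino acids, the following Python sketch translates in-frame sites with the standard genetic code. The BamHI (GGATCC), NdeI (CATATG) and XhoI (CTCGAG) recognition sequences are well known, but their use here, the nine-nucleotide open reading frame, and the function name are illustrative assumptions and are not taken from SEQ ID NOS: 60-63.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


BAMHI = "GGATCC"  # reads in frame as Gly-Ser
XHOI = "CTCGAG"   # reads in frame as Leu-Glu

# Hypothetical short open reading frame flanked by in-frame sites.
orf = "ATGAAACGT"
flanked_protein = translate(BAMHI + orf + XHOI)
```

Because both sites are six nucleotides long and placed in frame, each contributes exactly two additional residues at the corresponding end of the encoded polypeptide.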
[0056] The invention further provides for nucleic acids that are complementary to an ancestral cellulase molecule. Complementary nucleic acids can hybridize to the nucleic acid sequence described above under stringent hybridization conditions. Non-limiting examples of stringent hybridization conditions include temperatures above 30°C, above 35°C, in excess of 42°C, and/or salinity of less than about 500 mM, or less than 200 mM. Hybridization conditions can be adjusted by the skilled artisan via modifying the temperature, salinity and/or the concentration of other reagents such as SDS or SSC.
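The notion of a complementary nucleic acid can be made concrete with a small Python sketch that computes the reverse complement of a DNA strand and checks whether two strands are perfectly complementary. It models only Watson-Crick base pairing; it does not model hybridization temperature, salinity, or partial mismatches, and the function names and example sequence are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def is_perfect_complement(strand_a: str, strand_b: str) -> bool:
    """True if strand_b (5' to 3') pairs with strand_a over its full length."""
    return strand_b.upper() == reverse_complement(strand_a).upper()


probe = reverse_complement("ATGGCC")
```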
[0057] In one embodiment, an ancestral cellulase molecule, according to the methods described herein, can be added to a source material of cellulose for cellulose processing. In one embodiment, an ancestral cellulase molecule can be added as an isolated recombinant protein. In another embodiment, the molecule can be added as an isolated modified recombinant protein. For example, an ancestral cellulase protein, or fragment thereof, can be modified by removal of the signal peptide. In another embodiment, an isolated polypeptide comprising about 90% identity to the amino acid sequence of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, or a combination thereof, can be added to a source material of cellulose for cellulose processing.
[0058] In one embodiment, an ancestral cellulase molecule, according to the methods described herein, can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or fragment thereof. In another embodiment, an ancestral cellulase molecule, according to the methods described herein, can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an amino acid sequence of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, or a combination thereof.
[0059] In one embodiment, an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant protein, or by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or a combination thereof. For example, the recombinant protein and the recombinant microorganism can be added sequentially, in any order, or simultaneously.
[0060] The invention utilizes conventional molecular biology, microbiology, and recombinant DNA techniques available to one of ordinary skill in the art. Such techniques are well known to the skilled worker and are explained fully in the literature. See, e.g., Maniatis, Fritsch & Sambrook, "DNA Cloning: A Practical Approach," Volumes I and II (D. N. Glover, ed., 1985); "Oligonucleotide Synthesis" (M. J. Gait, ed., 1984); "Nucleic Acid Hybridization" (B. D. Hames & S. J. Higgins, eds., 1985); "Transcription and Translation" (B. D. Hames & S. J. Higgins, eds., 1984); "Immobilized Cells and Enzymes" (IRL Press, 1986); B. Perbal, "A Practical Guide to Molecular Cloning" (1984); and Sambrook, et al., "Molecular Cloning: a Laboratory Manual" (2001).
[0061] One skilled in the art can obtain an ancestral cellulase (e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233) in several ways, which include, but are not limited to, isolating the protein via biochemical means or expressing a nucleotide sequence encoding the protein of interest by genetic engineering methods.
[0062] The invention provides for ancestral cellulase molecules that are encoded by nucleotide sequences. The ancestral cellulase molecule can be a polypeptide encoded by a nucleic acid (including genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA). For example, an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof. The ancestral cellulase molecules of the invention can be obtained from various sources and can be produced according to various techniques known in the art. The ancestral cellulase molecule of the invention can be produced via recombinant DNA technology and such recombinant nucleic acids can be prepared by conventional techniques, including chemical synthesis, genetic engineering, enzymatic techniques, or a combination thereof. For example, a nucleic acid that encodes an ancestral cellulase molecule can be obtained by screening DNA libraries, or by amplification from a natural source. In one embodiment, a nucleic acid amplified from a natural source is modified by various mutagenesis methods known in the art to obtain the ancestral cellulase molecules of the invention. In some embodiments, an ancestral cellulase molecule can be "codon-optimized," as known in the art.
[0063] An ancestral cellulase molecule can be a fragment of an ancestral cellulase protein. For example, the ancestral cellulase protein fragment can encompass any portion of at least about 8 consecutive amino acids of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233.
The fragment can comprise at least about 10 consecutive amino acids, at least about 20 consecutive amino acids, at least about 30 consecutive amino acids, at least about 40 consecutive amino acids, at least about 50 consecutive amino acids, at least about 60 consecutive amino acids, at least about 70 consecutive amino acids, at least about 80 consecutive amino acids, at least about 90 consecutive amino acids, at least about 100 consecutive amino acids, at least about 110 consecutive amino acids, at least about 120 consecutive amino acids, at least about 130 consecutive amino acids, at least about 140 consecutive amino acids, at least about 150 consecutive amino acids, at least about 200 consecutive amino acids, at least about 250 consecutive amino acids, at least about 300 consecutive amino acids, at least about 350 consecutive amino acids, or at least about 400 consecutive amino acids of SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233.
Fragments include all possible amino acid lengths between about 8 and about 400 amino acids, for example, lengths between about 10 and about 400 amino acids, between about 15 and about 400 amino acids, between about 20 and about 400 amino acids, between about 35 and about 400 amino acids, between about 40 and about 400 amino acids, between about 50 and about 400 amino acids, between about 70 and about 400 amino acids, between about 100 and about 400 amino acids, between about 200 and about 400 amino acids, between about 300 and about 400 amino acids, or between about 350 and about 400 amino acids.
[0064] In one embodiment, a fragment of a nucleic acid sequence that comprises an ancestral cellulase molecule can encompass any portion of at least about 8 consecutive nucleotides. In one embodiment, the fragment can comprise at least about 10 nucleotides, at least about 15 nucleotides, at least about 20 nucleotides, or at least about 30 nucleotides.
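The fragment definitions above, namely any run of consecutive residues or nucleotides of at least a stated minimum length, can be enumerated with a short Python sketch. The generator name, the optional upper bound, and the ten-residue example sequence are hypothetical and serve only to show what "any portion of at least about 8 consecutive amino acids" covers.

```python
from typing import Iterator, Optional


def iter_fragments(sequence: str, min_len: int = 8,
                   max_len: Optional[int] = None) -> Iterator[str]:
    """Yield every contiguous fragment whose length lies in [min_len, max_len]."""
    n = len(sequence)
    upper = n if max_len is None else min(max_len, n)
    for length in range(min_len, upper + 1):
        # Slide a window of the current length along the sequence.
        for start in range(n - length + 1):
            yield sequence[start:start + length]


# Hypothetical 10-residue sequence: three 8-mers, two 9-mers, one 10-mer.
fragments = list(iter_fragments("ACDEFGHIKL"))
```

The same generator applies to nucleotide sequences, since it treats the input only as a string of consecutive positions.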
Recombinant proteins
[0065] The ancestral cellulase molecules can be recombinant enzymes, and can be produced in a variety of ways known in the art. One skilled in the art understands that polypeptides (e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, and the like) can be obtained in several ways, which include but are not limited to, expressing a nucleotide sequence encoding the protein of interest, or fragment thereof, by genetic engineering methods.
[0066] In one embodiment, the nucleic acid is expressed in an expression cassette, for example, to achieve overexpression in a cell. The nucleic acids of the invention can be an RNA, cDNA, cDNA-like, or a DNA of interest in an expressible format, such as an expression cassette, which can be expressed from a natural promoter or an entirely heterologous promoter. The nucleic acid of interest can encode a protein, and may or may not include introns. Any recombinant expression system can be used, including, but not limited to, the recombinant microorganisms of the invention, as well as other bacterial, fungal, mammalian, insect, or plant cell expression systems.
[0067] Nucleic acid sequences comprising an ancestral cellulase molecule that encode a polypeptide can be synthesized, in whole or in part, using chemical methods known in the art. Alternatively, an ancestral cellulase molecule can be produced using chemical methods to synthesize its amino acid sequence, such as by direct peptide synthesis using solid-phase techniques. Protein synthesis can either be performed using manual techniques or by automation. Automated synthesis can be achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer). Optionally, fragments of an ancestral cellulase molecule can be separately synthesized and combined using chemical methods to produce a full-length molecule.
[0068] Host cells transformed with a nucleic acid sequence encoding an ancestral cellulase molecule (such as, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233) can be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The polypeptide produced by a transformed cell can be secreted or contained intracellularly depending on the sequence and/or the vector used. Methods for protein production by recombinant technology in different host systems are well known in the art (Sambrook, et al., "Molecular Cloning: a Laboratory Manual" (2001); Gellissen, G., "Novel Microbial and Eukaryotic Expression Systems" (2005)). Expression vectors containing a nucleic acid sequence encoding an ancestral cellulase molecule can be designed to contain signal sequences which direct secretion of soluble polypeptide molecules encoded by an ancestral cellulase molecule, through a prokaryotic or eukaryotic cell membrane. An ancestral cellulase molecule can be produced as an extracellular enzyme that is secreted into the culture medium, from which it can easily be recovered and isolated. 
The spent culture medium of the production host can be used as such, or the host cells can be removed therefrom, and/or it can be concentrated, filtrated or fractionated. It can also be dried. In some embodiments an ancestral cellulase molecule, or fragment thereof, can be modified by removal of the signal peptide which can allow the polypeptide molecules to be contained intracellularly.
[0069] An isolated polypeptide of the present invention includes, but is not limited to, culture medium containing the polypeptide from which cells and cell debris have been removed. Conveniently, the polypeptides can be isolated, e.g., by adding anionic and/or cationic polymers to the spent culture medium to enhance precipitation of cells, cell debris and other unwanted enzymes. The medium can be filtered using an inorganic filtering agent and a filter to remove the precipitates formed. The filtrate can be further processed using a semi-permeable membrane to remove excess salts, sugars and metabolic products.
[0070] A synthetic peptide can be substantially purified via high performance liquid chromatography (HPLC). The composition of a synthetic ancestral cellulase molecule can be confirmed by amino acid analysis or sequencing. Additionally, any portion of an amino acid sequence comprising a protein encoded by an ancestral cellulase molecule can be altered during direct synthesis and/or combined using chemical methods with sequences from other proteins to produce a variant polypeptide or a fusion protein.
[0071] The invention further encompasses methods for using a protein or polypeptide encoded by a nucleic acid sequence of an ancestral cellulase molecule. In another embodiment, the polypeptide can be modified, such as by glycosylations and/or acetylations and/or chemical reaction or coupling, and can contain one or several non-natural or synthetic amino acids. An example of an ancestral cellulase molecule comprises the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233. In certain embodiments, the invention encompasses variants of a protein encoded by an ancestral cellulase molecule.
Microorganisms of the invention
[0072] Some aspects of the present invention provide for recombinant microorganisms that express a nucleic acid encoding an ancestral cellulase enzyme (e.g., the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233). Such microorganisms can include both prokaryotic and eukaryotic microorganisms, such as bacteria and yeast. In one embodiment, the microorganism is a fungus. In another embodiment, the microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes. In a further embodiment, the microorganism is a yeast. In yet another embodiment, the microorganism is a bacterium. In another embodiment, the microorganism is Escherichia coli, Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., or Streptomyces sp. In a further embodiment, the microorganism is Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., or Penicillium sp.
[0073] Any microorganism may be utilized according to the present invention. In certain aspects, a microorganism is a eukaryotic or prokaryotic microorganism. In certain aspects, a microorganism is a yeast, such as Saccharomyces cerevisiae. In certain aspects, a microorganism is a bacterium, such as a gram-positive bacterium or a gram-negative bacterium.
[0074] Other microorganisms may be used according to the present invention, for example, organisms from the genera Achaetomium, Acremonium, Aspergillus, Botrytis, Chaetomium, Chrysosporium, Collybia, Fomes, Fusarium, Humicola, Hypocrea, Lentinus, Melanocarpus, Myceliophthora, Myriococcum, Neurospora, Penicillium, Phanerochaete, Phlebia, Pleurotus, Podospora, Polyporus, Pycnoporus, Rhizoctonia, Scytalidium, Thermoascus, Thielavia, Trametes and Trichoderma. Additional organisms include, but are not limited to, Acetobacter aceti, Achromobacter, Acidiphilium, Acinetobacter, Actinomadura, Actinoplanes, Aeropyrum pernix, Agrobacterium, Alcaligenes, Ananas comosus (M), Arthrobacter, Aspergillus niger, Aspergillus oryzae, Aspergillus melleus, Aspergillus pulverulentus, Aspergillus saitoi, Aspergillus sojae, Aspergillus usamii, Bacillus alcalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus lentus, Bacillus licheniformis, Bacillus macerans, Bacillus stearothermophilus, Bacillus subtilis, Bifidobacterium, Brevibacillus brevis, Burkholderia cepacia, Candida cylindracea, Candida rugosa, Carica papaya (L), Cellulosimicrobium, Cephalosporium, Chaetomium erraticum, Chaetomium gracile, Clostridium, Clostridium butyricum, Clostridium acetobutylicum, Clostridium thermocellum, Corynebacterium (glutamicum), Corynebacterium efficiens, Escherichia coli, Enterococcus, Erwinia chrysanthemi, Gluconobacter, Gluconacetobacter, Haloarcula, Humicola insolens, Kitasatospora setae, Klebsiella, Klebsiella oxytoca, Kluyveromyces, Kluyveromyces fragilis, Kluyveromyces lactis, Kocuria, Lactlactis, Lactobacillus, Lactobacillus fermentum, Lactobacillus sake, Lactococcus, Lactococcus lactis, Leuconostoc, Methylocystis, Methanolobus siciliae, Methanogenium organophilum, Methanobacterium bryantii, Microbacterium imperiale, Micrococcus lysodeikticus, Microlunatus, Mucor javanicus, Mycobacterium, Myrothecium, Nitrobacter, Nitrosomonas, Nocardia, Papaya carica, Pediococcus, Pediococcus halophilus, Penicillium, Penicillium camemberti, Penicillium citrinum, Penicillium emersonii, Penicillium roqueforti, Penicillium lilacinum, Penicillium multicolor, Paracoccus pantotrophus, Propionibacterium, Pseudomonas, Pseudomonas fluorescens, Pseudomonas denitrificans, Pyrococcus, Pyrococcus furiosus, Pyrococcus horikoshii, Rhizobium, Rhizomucor miehei, Rhizomucor pusillus Lindt, Rhizopus, Rhizopus delemar, Rhizopus japonicus, Rhizopus niveus, Rhizopus oryzae, Rhizopus oligosporus, Rhodococcus, Saccharomyces cerevisiae, Sclerotinia libertiana, Sphingobacterium multivorum, Sphingobium, Sphingomonas, Streptococcus, Streptococcus thermophilus Y-1, Streptomyces, Streptomyces griseus, Streptomyces lividans, Streptomyces murinus, Streptomyces rubiginosus, Streptomyces violaceoruber, Streptoverticillium mobaraense, Tetragenococcus, Thermus, Thiosphaera pantotropha, Trametes, Trichoderma, Trichoderma longibrachiatum, Trichoderma reesei, Trichoderma viride, Trichosporon penicillatum, Vibrio alginolyticus, Xanthomonas, yeast, Zygosaccharomyces rouxii, Zymomonas, and Zymomonas mobilis. The organisms can be utilized as recombinant microorganisms provided herein, and can be utilized according to the various methods of the present invention.
[0075] In certain embodiments, a recombinant microorganism may be engineered to secrete an ancestral cellulase molecule into the culture media, such as by incorporating a signal peptide or an autotransporter domain into the ancestral cellulase molecule. In some embodiments, ancestral cellulase molecules can be fused with any combination of signal peptides and/or autotransporter domains found in secreted proteins as is known in the art. In some embodiments, ancestral cellulase molecules can be designed to maximize their secretion into the culture media, and the design may also include the use of many different linker sequences that fuse signal peptides, ancestral cellulase molecules, and autotransporters to improve the efficiency of secretion or of cell surface presentation. In some embodiments, an ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide. In some embodiments, an ancestral cellulase molecule is purified from the culture media. In other embodiments, an ancestral cellulase molecule is not purified from the culture media.
Expression Systems
[0076] In addition to recombinant microorganisms of the invention, as detailed above, any other recombinant expression system can be used to obtain an isolated ancestral cellulase molecule.
[0077] Bacterial Expression Systems. One skilled in the art understands that expression of desired protein products in prokaryotes is most often carried out in E. coli with vectors that contain constitutive or inducible promoters. Some non-limiting examples of bacterial cells for transformation include E. coli sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., Streptomyces sp., and the E. coli strains DH5 or MC1061/p3 (Invitrogen Corp., San Diego, Calif.), which can be transformed using standard procedures practiced in the art, and colonies can then be screened for the appropriate plasmid expression. In bacterial systems, a number of expression vectors can be selected. Non-limiting examples of such vectors include multifunctional E. coli cloning and expression vectors such as BLUESCRIPT (Stratagene). Some E. coli expression vectors (also known in the art as fusion vectors) are designed to add a number of amino acid residues, usually to the N-terminus of the expressed recombinant protein. Such fusion vectors can serve three functions: 1) to increase the solubility of the desired recombinant protein; 2) to increase expression of the recombinant protein of interest; and 3) to aid in recombinant protein purification by acting as a ligand in affinity purification. In some instances, vectors that direct the expression of high levels of fusion protein products that are readily purified may also be used. Some non-limiting examples of fusion expression vectors include pGEX, which fuses glutathione S-transferase (GST) to the desired protein; pcDNA 3.1/V5-His A, B & C (Invitrogen Corp., Carlsbad, CA), which fuse 6x-His to the recombinant proteins of interest; pMAL (New England Biolabs, MA), which fuses maltose-binding protein (MalE) to the target recombinant protein; the E. coli expression vector pUR278 (Ruther et al., (1983) EMBO J. 2:1791), wherein the coding sequence may be ligated individually into the vector in frame with the lac Z coding region in order to generate a fusion protein; and pIN vectors (Inouye et al., (1985) Nucleic Acids Res. 13:3101-3109; Van Heeke et al., (1989) J. Biol. Chem. 264:5503-5509). Fusion proteins generated by vectors such as those mentioned above are generally soluble and can be purified easily from lysed cells via adsorption and binding of the fusion protein to an affinity matrix. For example, fusion proteins can be purified from lysed cells via adsorption and binding to a matrix of glutathione agarose beads, followed by elution in the presence of free glutathione. In addition, the pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target can be released from the GST moiety. In other embodiments, an ancestral cellulase molecule is not purified from the culture media.
[0078] Plant and Insect Expression Systems. Other suitable cell lines, in addition to microorganisms such as bacteria (e.g., E. coli and B. subtilis) transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors containing coding sequences for an ancestral cellulase molecule may alternatively be used to produce the molecule of interest. A non-limiting example includes plant cell systems infected with recombinant virus expression vectors (for example, tobacco mosaic virus, TMV; cauliflower mosaic virus, CaMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing coding sequences for an ancestral cellulase molecule. If plant expression vectors are used, the expression of sequences encoding an ancestral cellulase molecule can be driven by any of a number of promoters. For example, viral promoters such as the 35S and 19S promoters of CaMV can be used alone or in combination with the omega leader sequence from tobacco mosaic virus TMV. Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock promoters, can be used. These constructs can be introduced into plant cells by direct DNA transformation or by pathogen-mediated transfection.
[0079] In another embodiment, an insect system also can be used to express an ancestral cellulase molecule. For example, in one such system Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. Sequences encoding an ancestral cellulase molecule can be cloned into a non-essential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the nucleic acid sequences of an ancestral cellulase molecule will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein. The recombinant viruses can then be used to infect S. frugiperda cells or Trichoplusia larvae in which an ancestral cellulase molecule can be expressed.
[0080] Fungal Expression Systems. In another embodiment, a fungal system also can be used to express an ancestral cellulase molecule. Fungi can be transformed with recombinant fungal expression vectors containing coding sequences for an ancestral cellulase molecule. Some non-limiting examples of fungi for transformation include Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp. In some embodiments, fungi from the subkingdom Dikarya, from the phylum Basidiomycota, from the phylum Ascomycota, or from the class Sordariomycetes can be transformed with recombinant fungal expression vectors containing coding sequences for an ancestral cellulase molecule.
[0081] Mammalian Expression Systems. Mammalian cells can also contain an expression vector (for example, one that harbors a nucleotide sequence encoding an ancestral cellulase molecule) for expression of a desired product. Expression vectors containing such a nucleic acid sequence linked to at least one regulatory sequence in a manner that allows expression of the nucleotide sequence in a host cell can be introduced via methods known in the art. The vector can be a recombinant DNA or RNA vector, and includes DNA plasmids or viral vectors. A number of viral-based expression systems can be used to express an ancestral cellulase molecule in mammalian host cells (e.g., adeno-associated virus, retrovirus, adenovirus, lentivirus or alphavirus).
[0082] Expression of recombinant proteins. Regulatory sequences are well known in the art, and can be selected to direct the expression of a protein or polypeptide of interest (such as an ancestral cellulase molecule) in an appropriate host cell as described in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Non-limiting examples of regulatory sequences include: polyadenylation signals, promoters, enhancers, and other expression control elements. Practitioners in the art understand that designing an expression vector can depend on factors such as the choice of host cell to be transfected and/or the type and/or amount of desired protein to be expressed.
[0083] Enhancer regions, which are those sequences found upstream or downstream of the promoter region in non-coding DNA regions, are also known in the art to be important in optimizing expression. If needed, origins of replication from viral sources can be employed, such as if a prokaryotic host is utilized for introduction of plasmid DNA. However, in eukaryotic organisms, chromosome integration is a common mechanism for DNA replication.
[0084] A gene that encodes a selectable marker (for example, resistance to antibiotics or drugs, such as ampicillin, neomycin, G418, and hygromycin) can be introduced into host cells along with the gene of interest in order to identify and select clones that stably express a gene encoding a protein of interest. The gene encoding a selectable marker can be introduced into a host cell on the same plasmid as the gene of interest or can be introduced on a separate plasmid. Cells containing the gene of interest can be identified by drug selection wherein cells that have incorporated the selectable marker gene will survive in the presence of the drug. Cells that have not incorporated the gene for the selectable marker die. Surviving cells can then be screened for the production of the desired protein molecule (for example, an ancestral cellulase molecule).
[0085] A host cell strain can be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed ancestral cellulase molecule in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves a "prepro" form of the polypeptide also can be used to facilitate correct insertion, folding and/or function. Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities can be chosen to ensure the correct modification and processing of the foreign protein.
[0086] An exogenous nucleic acid can be introduced into a cell via a variety of techniques known in the art, such as lipofection, microinjection, calcium phosphate or calcium chloride precipitation, DEAE-dextran-mediated transfection, or electroporation. Electroporation is carried out at an appropriate voltage and capacitance to result in entry of the DNA construct(s) into cells of interest. Other methods used to transfect cells can also include modified calcium phosphate precipitation, polybrene precipitation, liposome fusion, and receptor-mediated gene delivery.
[0087] A host cell strain that modulates the expression of the inserted sequences, or that modifies and processes the nucleic acid in a specific desired fashion, also may be chosen. Such modifications (for example, glycosylation and other post-translational modifications) and processing (for example, cleavage) of protein products may be important for the function of the protein. Different host cell strains have characteristic and specific mechanisms for the post-translational processing and modification of proteins and gene products. As such, appropriate host systems or cell lines can be chosen to ensure the correct modification and processing of the foreign protein expressed, such as an ancestral cellulase molecule. Thus, eukaryotic host cells possessing the cellular machinery for proper processing of the primary transcript, glycosylation, and phosphorylation of the gene product may be used. Non-limiting examples of host cells include E. coli sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., Streptomyces sp., Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
[0088] Various culturing parameters can be used with respect to the host cell being cultured. Appropriate culture conditions for host cells are well known in the art or can be determined by the skilled artisan (see, for example, Madigan M. et al., "Brock Biology of Microorganisms", 2012). Cell culturing conditions can vary according to the type of host cell selected. Commercially available medium can be utilized.
[0089] Cells suitable for culturing can contain introduced expression vectors, such as plasmids or viruses. The expression vector constructs can be introduced via transformation, microinjection, transfection, lipofection, electroporation, or infection. The expression vectors can contain coding sequences, or portions thereof, encoding the proteins for expression and production. Expression vectors containing sequences encoding the produced proteins and polypeptides, as well as the appropriate transcriptional and translational control elements, can be generated using methods well known to and practiced by those skilled in the art. These methods include synthetic techniques, in vitro recombinant DNA techniques, and in vivo genetic recombination, which are described in J. Sambrook et al., 201, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. and in F. M. Ausubel et al., 1989, Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.
Purification of recombinant proteins
[0090] An ancestral cellulase molecule (such as, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, or 233) can be purified from any cell which expresses the polypeptide, including those which have been transfected with expression constructs that express an ancestral cellulase molecule. A purified ancestral cellulase molecule can be separated from other compounds which normally associate with the ancestral cellulase molecules, in the cell, such as certain proteins, carbohydrates, or lipids, using methods practiced in the art. For protein recovery, isolation and/or purification, the cell culture medium or cell lysate is centrifuged to remove particulate cells and cell debris. The desired polypeptide molecule (for example, an ancestral cellulase molecule) is isolated or purified away from contaminating soluble proteins and polypeptides by suitable purification techniques. 
Non-limiting purification methods for proteins include: size exclusion chromatography; affinity chromatography; ion exchange chromatography; ethanol precipitation; reverse phase HPLC; chromatography on a resin, such as silica, or an anion exchange resin, e.g., DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate precipitation; gel filtration using, e.g., Sephadex G-75, Sepharose; and the like. Other additives, such as protease inhibitors (e.g., PMSF), can be used to inhibit proteolytic degradation during purification. Purification procedures that can select for carbohydrates can also be used, e.g., ion-exchange soft gel chromatography, or HPLC using cation- or anion-exchange resins, in which the more acidic fraction(s) is/are collected.
Cellulases and cellulosic biofuels
[0091] Some aspects of the present invention provide for ancestral fungal cellulases. Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels. In some embodiments, ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose. Some aspects of the present invention provide for microorganisms that express an ancestral cellulase. Microorganisms are useful for the production of cellulosic ethanol for biofuels. In some embodiments, microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
[0092] The production of cellulosic ethanol biofuels from cellulosic materials can be performed by various techniques known in the art. See, for example, Canilha L. et al., (2012) J. Biomed. Biotech., 2012:989572; U.S. Patent No. 8,318,473; U.S. Patent No. 8,409,836; U.S. Patent Application Publication No. 20110171710A1, which are incorporated by reference in their entireties. The starting material for the production of cellulosic biofuels can be cellulosic materials (i.e., any material comprising lignocellulose, cellulose, hemicellulose, or a combination thereof). A source material of cellulose can be any cellulosic material. Examples of cellulosic materials include, but are not limited to, fruits, plants, vegetables, woods, grasses, inedible parts of plants, byproducts of lawn and tree maintenance, corn stover, Panicum virgatum, Miscanthus grass species, wood chips, sugarcane residues, sugarcane bagasse, straw, pulp and paper residues, waste paper, textile fibers (e.g., cotton, linen, hemp, jute) and cellulosic fibers (e.g., modal, viscose, lyocell).
[0093] During the production of cellulosic ethanol from cellulosic material, cellulosic materials can be processed by various techniques known in the art. Most of the carbohydrates in cellulosic material are in the form of lignocellulose, which can comprise cellulose, hemicellulose, pectin and/or lignin. Cellulosic material can be pre-treated by physical and/or chemical means. Pre-treatment can make the cellulose fraction more accessible to hydrolysis. Cellulose and/or hemicellulose comprising the cellulosic materials can then be hydrolysed into sugars (e.g., glucose). In some embodiments, ancestral cellulases can be used for the hydrolysis of cellulose. In other embodiments, the carbohydrate polymers of cellulose are depolymerized by an ancestral cellulase. Sugars made available by hydrolysis can be used by microorganisms to produce ethanol by fermentation.
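As a rough illustration of the hydrolysis and fermentation steps described above, the following minimal sketch computes the theoretical (stoichiometric) ethanol mass obtainable from a given mass of cellulose. It assumes complete conversion unless yields are supplied; the function name, the yield parameters and the rounded molar masses are illustrative assumptions and are not taken from the specification.

```python
# Standard molar masses in g/mol, rounded to two decimals.
ANHYDROGLUCOSE = 162.14  # one C6H10O5 repeat unit of cellulose
GLUCOSE = 180.16         # C6H12O6, released by hydrolysis (adds one H2O)
ETHANOL = 46.07          # C2H5OH, two formed per glucose fermented


def theoretical_ethanol_kg(cellulose_kg, hydrolysis_yield=1.0,
                           fermentation_yield=1.0):
    """Stoichiometric ethanol mass from cellulose (hypothetical helper).

    Hydrolysis:    (C6H10O5)n + n H2O -> n C6H12O6
    Fermentation:  C6H12O6 -> 2 C2H5OH + 2 CO2
    The two yield arguments scale the ideal result (1.0 = complete).
    """
    glucose_kg = cellulose_kg * (GLUCOSE / ANHYDROGLUCOSE) * hydrolysis_yield
    return glucose_kg * (2 * ETHANOL / GLUCOSE) * fermentation_yield


if __name__ == "__main__":
    print(round(theoretical_ethanol_kg(1.0), 3))
```

Under these assumptions each kilogram of cellulose gives about 1.11 kg of glucose, so incomplete hydrolysis directly limits the ethanol that fermentation can deliver.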
[0094] In some aspects, the present invention provides a method for the production of cellulosic ethanol from a source material of cellulose. Cellulase enzymes can be added to cellulosic materials by various techniques. In one embodiment, ancestral cellulases are added to cellulosic materials as an isolated polypeptide. In another embodiment, recombinant microorganisms that express ancestral cellulases are added to cellulosic materials. For example, microorganisms that do not express cellulase enzymes can be genetically modified to express ancestral cellulases. Microorganisms can be modified in a variety of ways, such as, but not limited to, to express cellulases, to express large volumes of cellulases, to express modified cellulases, and to express ancestral cellulases. In a further embodiment, ancestral cellulases can be added to cellulosic materials by addition of isolated polypeptides and by addition of recombinant microorganisms that express ancestral cellulases. Isolated polypeptides and recombinant microorganisms can be added simultaneously or sequentially, in any order.
[0095] Ancestral cellulases needed for the hydrolysis of the cellulosic material according to the invention may be added in an enzymatically effective amount either simultaneously, e.g., in the form of an enzyme mixture, or sequentially, or in combination with any microorganism of the present invention, or in combination with microorganisms that mediate fermentation. Any combination of the ancestral cellulase molecules comprising an amino acid sequence having about 90% identity to SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, or a fragment thereof, may be used together with any combination of other enzymes, for example hemicellulases, proteases, amylases, laccases, lipases, pectinases, esterases and/or peroxidases. Another enzyme treatment may be carried out before, during or after the cellulase treatment.
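To make the "about 90% identity" criterion concrete, the sketch below computes percent identity between two sequences that have already been aligned to equal length. The function name and the choice of denominator (columns in which neither sequence has a gap) are assumptions made for illustration; other conventions exist, and this paragraph does not state which one applies. The toy peptides are not sequences from the listing.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither row has a gap.

    Hypothetical helper: both inputs must come from the same pairwise or
    multiple alignment, so they have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy 10-residue peptides differing at one position.
    print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))
```

A candidate would then be compared against each listed sequence in turn and retained if any comparison meets the chosen threshold.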
[0096] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Exemplary methods and materials are described below, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention.
[0097] All publications and other references mentioned herein are incorporated by reference in their entirety, as if each individual publication or reference were specifically and individually indicated to be incorporated by reference. Publications and references cited herein are not admitted to be prior art.
EXAMPLES
EXAMPLE 1 - Reconstruction Of Ancestral Cellulase Sequences
[0098] There is an increasing effort to produce highly stable and highly active cellulases to overcome the limitations imposed by the industrial process of ethanol production. These limitations include, for instance, high temperature, low pH and substrate pretreatment. In previous work, the inventors demonstrated that ancestral enzymes from the thioredoxin superfamily were more thermally stable, more pH resistant and more active than their modern relatives.
[0099] Building on this work, ancestral sequence reconstruction methods were applied to resurrect cellobiohydrolases going back in time up to ~1200 Myr. A set of 31 modern enzymes was used from two fungal phyla, Basidiomycota and Ascomycota. Most fungal cellulases used in industry belong to these phyla. The sequences were aligned using MUSCLE software, and further corrections by hand were necessary to obtain a suitable alignment. The alignment was then used to construct a phylogeny using MrBayes software. Inconsistencies were corrected using the literature and by running additional tests with MEGA and PAUP software. The final tree was used to reconstruct the most probable ancestral sequence for each node of the tree using PAML software. A set of 30 ancestral sequences, four of which will be experimentally tested, was finally obtained. The tree is shown in FIG. 1. The ancestral sequences are listed as SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, and 63.
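The final step of this workflow, taking the most probable residue at each site of an ancestral node, can be sketched as follows. The sketch assumes that per-site posterior probabilities for one node have already been extracted from the reconstruction output into plain dictionaries; that data layout, the function name and the example numbers are all hypothetical and only illustrate the selection rule.

```python
def most_probable_sequence(site_posteriors):
    """Pick the highest-posterior residue at each site of one ancestral node.

    site_posteriors: list of dicts mapping residue -> posterior probability,
    one dict per alignment site (hypothetical, illustrative layout).
    Returns the sequence and the posterior of each chosen residue.
    """
    residues = []
    confidences = []
    for posteriors in site_posteriors:
        residue, prob = max(posteriors.items(), key=lambda item: item[1])
        residues.append(residue)
        confidences.append(prob)
    return "".join(residues), confidences


if __name__ == "__main__":
    # Made-up posteriors for a three-site node.
    node = [
        {"M": 0.99, "L": 0.01},
        {"K": 0.62, "R": 0.35, "Q": 0.03},
        {"A": 0.80, "S": 0.20},
    ]
    sequence, confidence = most_probable_sequence(node)
    print(sequence, min(confidence))
```

Keeping the per-site posteriors alongside the sequence makes it easy to flag weakly supported positions, such as the second site in this toy example, before a sequence is synthesized.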
[0100] The cellulases that will be resurrected correspond to:
• Last common ancestor of Dikarya (1208 Myr): DiCA
• Last common ancestor of Basidiomycota (966 Myr): BasCA
• Last common ancestor of Ascomycota (1144 Myr): AsCA
• Last common ancestor of Sordariomycetes (433 Myr): SorCA
[0101] Divergence times were obtained from the literature at http://www.timetree.org.
EXAMPLE 2 - Reconstruction Of Ancestral Cellulase Sequences using an increased number of sequences
[0102] The reconstructed sequences were obtained in the same way as described above in Example 1, but in this instance the phylogeny was expanded to incorporate more sequences of modern cellulases. A total of 58 sequences are now part of the phylogenetic tree. In addition, two different software platforms, MrBayes and BEAST, were used to build trees by Bayesian analysis. Both programs use Bayesian MCMC analysis but differ in the procedure used to estimate posterior probability. Bayesian analysis combines prior information with the observed data to produce its result. This method of analysis was used in tandem with phylogenetic studies in this technology. The sequences used in the trees and the sequence alignment are listed as SEQ ID NOS: 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, and 121. The phylogenetic trees are shown in FIGS. 3 and 4, and the resurrected sequences for each tree are listed as SEQ ID NOS: 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, and 177 (BEAST tree) and SEQ ID NOS: 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, and 233 (MrBayes tree). These sequences present a reconstruction of fungal cellulase enzymes to be used in the production of bioethanol as a green-fuel source.
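For orientation, the posterior probability that Bayesian phylogenetic programs estimate can be written with Bayes' rule. The symbols are notation chosen here for illustration and do not appear in the specification: D is the sequence alignment, \tau a tree topology with branch lengths, and \theta the substitution-model parameters.

```latex
P(\tau, \theta \mid D) \;=\; \frac{P(D \mid \tau, \theta)\, P(\tau, \theta)}{P(D)},
\qquad
P(D) \;=\; \sum_{\tau} \int P(D \mid \tau, \theta)\, P(\tau, \theta)\, d\theta
```

Because the normalizing term P(D) sums over all trees and cannot be computed directly for realistic data sets, MCMC sampling is used to draw trees in proportion to their posterior probability.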
Data set for the tree shown in FIG. 3
[0103] ((((tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum:0.14832,
tr_Q96V98_Neocallimasti_Orpinomyces:0.10181):0.48220,
tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus:0.86216):0.16134,
((tr_Q9Y894_Basidiomycota_Volvariella_volvacea:0.73410,
((tr_C4B8I1_Basidiomycota_Coniophora_puteana:0.24027,
tr_G4TC42_Basidiomycota_Piriformospora_indica:0.42299):0.10387,
(((tr_Q6E5B1_Basidiomycota_Volvariella_volvacea:0.14128,
tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus:0.16400):0.03301,
((tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea:0.00000,
gi_169852726_Ascomycota_Coprinopsis_cinerea:0.00000):0.31528,
tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju:0.22551):0.06861):0.02141,
((((tr_Q02321_Basidiomycota_Phanerochaete_chrysos:0.12648,
tr_B2ZZ24_Basidiomycota_Irpex_lacteus:0.11924):0.07358,
tr_A8CED8_Basidiomycota_Polyporus_arcularius:0.12372):0.05135,
tr_F8Q7V9_Basidiomycota_Serpula_lacrymans:0.20628):0.03719,
tr_Q96VU2_Basidiomycota_Lentinula_edodes:0.16802):0.02828):0.06485):0.05505):0.06209,
((((tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis:0.17800,
tr_G4MM92_Ascomycota_Magnaporthe_oryzae:0.22361):0.09673,
((((sp_Q9C1S9_Ascomycota_Humicola_insolens:0.15506,
tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum:0.12926):0.04646,
(tr_B2ABX7_Ascomycota_Podospora_anserina:0.14978,
tr_Q2GMP2_Ascomycota_Chaetomium_globosum:0.10020):0.04691):0.03115,
(tr_G2QW39_Ascomycota_Thielavia_terrestris:0.16912,
(gi_367023495_Ascomycota_Myceliophthora_thermophila:0.00000,
tr_G2QA39_Ascomycota_Thielavia_heterothallica:0.00000):0.12077):0.04611):0.02753,
(tr_F8MDR2_Ascomycota_Neurospora_tetrasperma:0.00652,
tr_Q872J7_Ascomycota_Neurospora_crassa:0.00221):0.19589):0.07279):0.06599,
(((sp_P46236_Ascomycota_Fusarium_oxysporum:0.12432,
tr_I1RIJ1_Ascomycota_Gibberella_zeae:0.08747):0.21910,
((gi_302405457_Ascomycota_Verticillium_albo-atrum:0.00734,
tr_G2XB72_Ascomycota_Verticillium_dahliae:0.00466):0.25016,
(tr_E3Q540_Ascomycota_Colletotrichum_graminicola:0.00000,
gi_310790274_Ascomycota_Glomerella_graminicola:0.00000):0.19747):0.05590):0.06455,
((((tr_H9C5T1_Ascomycota_Hypocrea_orientalis:0.00000,
tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum:0.00930):0.01458,
(((sp_P07987_Ascomycota_Hypocrea_jecorina:0.00000,
tr_Q7LSP2_Ascomycota_Trichoderma_koningii:0.00224):0.00000,
tr_Q6UJX9_Acomycota_Trichoderma_viride:0.00224):0.00704,
tr_Q66PN1_Ascomycota_Trichoderma_parceramosum:0.02019):0.00417):0.11581,
tr_G9NFV6_Ascomycota_Hypocrea_atroviridis:0.09265):0.10551,
tr_F2VRZ0_Ascomycota_Phialophora_sp:0.23080):0.05114):0.02534):0.09892,
(gi_345565889_Ascomycota_Arthrobotrys_oligospora:0.36409,
((((sp_A1CCN4_Ascomycota_Aspergillus_clavatus:0.18558,
tr_G2XV25_Ascomycota_Botryotinia_fuckeliana:0.21037):0.02494,
(tr_F1CHI2_Ascomycota_Penicillium_decumbens:0.13951,
(sp_Q0CFP1_Ascomycota_Aspergillus_terreus:0.12080,
(sp_A1DJQ7_Ascomycota_Neosartorya_fischeri:0.02061,
sp_Q4WFK4_Ascomycota_Neosartorya_fumigata:0.01653):0.07979):0.03080):0.03517):0.03621,
((tr_B8MHF4_Ascomycota_Talaromyces_stipitatus:0.05529,
((tr_B5TMG4_Acomycota_Penicillium_funiculosum:0.01343,
tr_O93837_Ascomycota_Acremonium_cellulolyticus:0.00000):0.03069,
tr_B6QMM6_Ascomycota_Penicillium_marneffei:0.06648):0.01340):0.12914,
tr_Q8NIB5_Ascomycota_Talaromyces_emersonii:0.19811):0.05037):0.02493,
((sp_Q5B2E8_Ascomycota_Emericella_nidulans:0.10337,
(tr_G7XQ80_Ascomycota_Aspergillus_kawachii:0.04951,
sp_A2QYR9_Ascomycota_Aspergillus_niger:0.03798):0.06732):0.03638,
tr_Q2U2I8_Ascomycota_Aspergillus_oryzae:0.35624):0.04475):0.04225):0.04942):0.14331):0.34731):0.31315,
tr_A9FHT2_Bacteria_Sorangium_cellulosum:0.32971);
[0104] Tree with node labels:
((((4_tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum,3_tr_Q96V98_Neocallimasti_Orpinomyces)62
,2_tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus)61
,((48_tr_Q9Y894_Basidiomycota_Volvariella_volvacea,((51_tr_C4B8I1_Basidiomycota_Coniophora_puteana,47_tr_G4TC42_Basidiomycota_Piriformospora_indica)66
,(((57_tr_Q6E5B1_Basidiomycota_Volvariella_volvacea,52_tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus)69
,((54_tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea,44_gi_169852726_Ascomycota_Coprinopsis_cinerea)71
,49_tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju)70)68
,((((50_tr_Q02321_Basidiomycota_Phanerochaete_chrysos,55_tr_B2ZZ24_Basidiomycota_Irpex_lacteus)75
,53_tr_A8CED8_Basidiomycota_Polyporus_arcularius)74
,58_tr_F8Q7V9_Basidiomycota_Serpula_lacrymans)73
,56_tr_Q96VU2_Basidiomycota_Lentinula_edodes)72)67)65)64
,((((13_tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis,6_tr_G4MM92_Ascomycota_Magnaporthe_oryzae)79
,((((17_sp_Q9C1S9_Ascomycota_Humicola_insolens,16_tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum)83
,(8_tr_B2ABX7_Ascomycota_Podospora_anserina,7_tr_Q2GMP2_Ascomycota_Chaetomium_globosum)84)82
,(12_tr_G2QW39_Ascomycota_Thielavia_terrestris,(40_gi_367023495_Ascomycota_Myceliophthora_thermophila,11_tr_G2QA39_Ascomycota_Thielavia_heterothallica)86)85)81
,(10_tr_F8MDR2_Ascomycota_Neurospora_tetrasperma,9_tr_Q872J7_Ascomycota_Neurospora_crassa)87)80)78
,(((29_sp_P46236_Ascomycota_Fusarium_oxysporum,23_tr_I1RIJ1_Ascomycota_Gibberella_zeae)90
,((42_gi_302405457_Ascomycota_Verticillium_albo-atrum,14_tr_G2XB72_Ascomycota_Verticillium_dahliae)92
,(15_tr_E3Q540_Ascomycota_Colletotrichum_graminicola,41_gi_310790274_Ascomycota_Glomerella_graminicola)93)91)89
,((((20_tr_H9C5T1_Ascomycota_Hypocrea_orientalis,21_tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum)97
,(((18_sp_P07987_Ascomycota_Hypocrea_jecorina,19_tr_Q7LSP2_Ascomycota_Trichoderma_koningii)100
,45_tr_Q6UJX9_Acomycota_Trichoderma_viride)99
,46_tr_Q66PN1_Ascomycota_Trichoderma_parceramosum)98)96
,22_tr_G9NFV6_Ascomycota_Hypocrea_atroviridis)95
,31_tr_F2VRZ0_Ascomycota_Phialophora_sp)94)88)77
,(43_gi_345565889_Ascomycota_Arthrobotrys_oligospora,((((33_sp_A1CCN4_Ascomycota_Aspergillus_clavatus,25_tr_G2XV25_Ascomycota_Botryotinia_fuckeliana)105
,(32_tr_F1CHI2_Ascomycota_Penicillium_decumbens,(30_sp_Q0CFP1_Ascomycota_Aspergillus_terreus,(39_sp_A1DJQ7_Ascomycota_Neosartorya_fischeri,36_sp_Q4WFK4_Ascomycota_Neosartorya_fumigata)108)107)106)104
,((38_tr_B8MHF4_Ascomycota_Talaromyces_stipitatus,((35_tr_B5TMG4_Acomycota_Penicillium_funiculosum,34_tr_O93837_Ascomycota_Acremonium_cellulolyticus)112,26_tr_B6QMM6_Ascomycota_Penicillium_marneffei)111)110,24_tr_Q8NIB5_Ascomycota_Talaromyces_emersonii)109)103
,((37_sp_Q5B2E8_Ascomycota_Emericella_nidulans,(27_tr_G7XQ80_Ascomycota_Aspergillus_kawachii,28_sp_A2QYR9_Ascomycota_Aspergillus_niger)115)114,5_tr_Q2U2I8_Ascomycota_Aspergillus_oryzae)113)102)101)76)63)60
,1_tr_A9FHT2_Bacteria_Sorangium_cellulosum)59;
[0105] Nodes 59 to 115 are ancestral.
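The data sets above are written in a parenthetical (Newick-style) notation, in which each pair of parentheses encloses one ancestral node and each leaf carries a branch length after a colon. As an illustration only, the following minimal Python sketch parses such a string and counts leaves and ancestral (internal) nodes. The names `Node`, `parse_newick`, `leaves`, and `count_internal` are hypothetical helpers introduced here and are not part of the disclosed method; the example string is the Neocallimastigomycota clade taken from the FIG. 3 data set, and the parser assumes unquoted taxon names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""                   # taxon name (leaf) or node label (internal)
    length: Optional[float] = None   # branch length, if one is given
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse an unquoted Newick string into a tree of Node objects."""
    s = "".join(text.split())        # whitespace and line breaks carry no meaning
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def read_until(stops: str) -> str:
        # Consume characters up to (not including) the next stop character.
        nonlocal pos
        start = pos
        while s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":            # internal node: parse its children first
            pos += 1
            while True:
                node.children.append(parse_node())
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
        node.name = read_until(",():;")
        if s[pos] == ":":            # optional branch length
            pos += 1
            node.length = float(read_until(",();"))
        return node

    root = parse_node()
    if s[pos] != ";":
        raise ValueError("unbalanced parentheses")
    return root


def leaves(node: Node) -> List[Node]:
    """Return the leaf nodes (extant sequences) below a node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def count_internal(node: Node) -> int:
    """Count ancestral (internal) nodes, including the node itself."""
    if not node.children:
        return 0
    return 1 + sum(count_internal(child) for child in node.children)


# Clade copied from the FIG. 3 data set (three extant sequences).
CLADE = """((tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum:0.14832,
tr_Q96V98_Neocallimasti_Orpinomyces:0.10181):0.48220,
tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus:0.86216);"""

tree = parse_newick(CLADE)
print(len(leaves(tree)), count_internal(tree))  # 3 2
```

Applied to a complete data set, the same count would be expected to give 58 leaves; for a fully bifurcating tree that implies 57 ancestral nodes, which is consistent with the node numbers 59 to 115 stated for FIG. 3.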
Data set for the tree shown in FIG. 4
[0106] (tr_A9FHT2_Bacteria_Sorangium_cellulosum:
0.315097,(((tr_G4TC42_Basidiomycota_Piriformospora_indica:
0.696259,(((((tr_Q9Y894_Basidiomycota_Volvariella_volvacea:
0.765873,tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju:0.247644):
0.029651,((((tr_Q02321_Basidiomycota_Phanerochaete_chrys:
0.129750,tr_B2ZZ24_Basidiomycota_Irpex_lacteus:0.117206):
0.067731,tr_A8CED8_Basidiomycota_Polyporus_arcularius:0.130269):
0.041234,(tr_C4B8I1_Basidiomycota_Coniophora_puteana:
0.300685,tr_F8Q7V9_Basidiomycota_Serpula_lacrymans:0.152715):0.076409):
0.039837,tr_Q96VU2_Basidiomycota_Lentinula_edodes:0.162517):0.039347):
0.024521,tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus:0.180124):
0.030126,tr_Q6E5B1_Basidiomycota_Volvariella_volvacea:0.122701):
0.349094,tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea:0.000004):0.000004):
0.000004,(((((((((tr_Q2U2I8_Ascomycota_Aspergillus_oryzae:
0.360028,(tr_G7XQ80_Ascomycota_Aspergillus_kawachii:
0.042486,sp_A2QYR9_Ascomycota_Aspergillus_niger:0.042641):0.050992):
0.047464,sp_Q5B2E8_Ascomycota_Emericella_nidulans:0.091170):
0.049844,gi_345565889_Ascomycota_Arthrobotrys_oligospora:0.379980):
0.037226,((sp_Q0CFP1_Ascomycota_Aspergillus_terreus:
0.113465,(tr_F1CHI2_Ascomycota_Penicillium_decumbens:
0.122187,sp_A1CCN4_Ascomycota_Aspergillus_clavatus:0.201311):0.043444):
0.030675,(sp_Q4WFK4_Ascomycota_Neosartorya_fumigata:
0.018654,sp_A1DJQ7_Ascomycota_Neosartorya_fischeri:0.018172):0.080620):
0.033350):0.025088,tr_Q8NIB5_Ascomycota_Talaromyces_emersonii:0.202884):
0.027539,(tr_B8MHF4_Ascomycota_Talaromyces_stipitatus:
0.057020,((tr_O93837_Ascomycota_Acremonium_cellulolyticus:
0.000004,tr_B5TMG4_Acomycota_Penicillium_funiculosum:0.013427):
0.029062,tr_B6QMM6_Ascomycota_Penicillium_marneffei:0.068832):0.013034):
0.130673):0.077560,(((((tr_G4MM92_Ascomycota_Magnaporthe_oryzae:0.234522,tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis:0.169297):
0.084951,((((tr_Q2GMP2_Ascomycota_Chaetomium_globosum:
0.098682,tr_B2ABX7_Ascomycota_Podospora_anserina:0.151560):
0.044216,(tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum:
0.132291,sp_Q9C1S9_Ascomycota_Humicola_insolens:0.152435):0.047285):
0.030908,((tr_G2QA39_Ascomycota_Thielavia_heterothallica:
0.000004,gi_367023495_Ascomycota_Myceliophthora_thermophila:0.000004):
0.120318,tr_G2QW39_Ascomycota_Thielavia_terrestris:0.168944):0.049852):
0.021734,(tr_Q872J7_Ascomycota_Neurospora_crassa:
0.002119,tr_F8MDR2_Ascomycota_Neurospora_tetrasperma:0.006620):
0.200980):0.083349):0.054273,(((tr_G2XB72_Ascomycota_Verticillium_dahliae:
0.004339,gi_302405457_Ascomycota_Verticillium_albo-atrum:0.007681):
0.245610,(tr_E3Q540_Ascomycota_Colletotrichum_graminicola:
0.000004,gi_310790274_Ascomycota_Glomerella_graminicola:0.000004):
0.200648):0.056802,(tr_I1RIJ1_Ascomycota_Gibberella_zeae:
0.086766,sp_P46236_Ascomycota_Fusarium_oxysporum:0.124144):0.222234):
0.060478):0.061418,((((sp_P07987_Ascomycota_Hypocrea_jecorina:
0.000004,tr_Q7LSP2_Ascomycota_Trichoderma_koningii:
0.002243,tr_Q6UJX9_Acomycota_Trichoderma_viride:0.002241):
0.006895,(tr_H9C5T1_Ascomycota_Hypocrea_orientalis:
0.000004,tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum:0.009305):
0.018631):0.004472,tr_Q66PN1_Ascomycota_Trichoderma_parceramosum:
0.016011):0.117974,tr_G9NFV6_Ascomycota_Hypocrea_atroviridis:0.091196):
0.105817):0.030437,tr_F2VRZ0_Ascomycota_Phialophora_sp:0.218653):
0.084991):0.052613,tr_G2XV25_Ascomycota_Botryotinia_fuckeliana:0.147396):
0.629782,gi_169852726_Ascomycota_Coprinopsis_cinerea:0.000004):0.000004):
0.628371,(tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus:
0.896155,(tr_Q96V98_Neocallimasti_Orpinomyces:
0.110954,tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum:0.139779):
0.463386):0.212739):0.301278);
[0107] Tree with node labels:
(1_tr_A9FHT2_Bacteria_Sorangium_cellulosum,(((47_tr_G4TC42_Basidiomycota_Piriformospora_indica,(((((48_tr_Q9Y894_Basidiomycota_Volvariella_volvacea,49_tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju)67
,((((50_tr_Q02321_Basidiomycota_Phanerochaete_chrys,55_tr_B2ZZ24_Basidiomycota_Irpex_lacteus)71
,53_tr_A8CED8_Basidiomycota_Polyporus_arcularius)70
,(51_tr_C4B8I1_Basidiomycota_Coniophora_puteana,58_tr_F8Q7V9_Basidiomycota_Serpula_lacrymans)72)69
,56_tr_Q96VU2_Basidiomycota_Lentinula_edodes)68)66
,52_tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus)65
,57_tr_Q6E5B1_Basidiomycota_Volvariella_volvacea)64
,54_tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea)63)62
,(((((((((5_tr_Q2U2I8_Ascomycota_Aspergillus_oryzae,(27_tr_G7XQ80_Ascomycota_Aspergillus_kawachii,28_sp_A2QYR9_Ascomycota_Aspergillus_niger)82)81,37_sp_Q5B2E8_Ascomycota_Emericella_nidulans)80
,43_gi_345565889_Ascomycota_Arthrobotrys_oligospora)79
,((30_sp_Q0CFP1_Ascomycota_Aspergillus_terreus,(32_tr_F1CHI2_Ascomycota_Penicillium_decumbens,33_sp_A1CCN4_Ascomycota_Aspergillus_clavatus)85)84
,(36_sp_Q4WFK4_Ascomycota_Neosartorya_fumigata,39_sp_A1DJQ7_Ascomycota_Neosartorya_fischeri)86)83)78
,24_tr_Q8NIB5_Ascomycota_Talaromyces_emersonii)77
,(38_tr_B8MHF4_Ascomycota_Talaromyces_stipitatus,((34_tr_O93837_Ascomycota_Acremonium_cellulolyticus,35_tr_B5TMG4_Acomycota_Penicillium_funiculosum)89,26_tr_B6QMM6_Ascomycota_Penicillium_marneffei)88)87)76
,(((((6_tr_G4MM92_Ascomycota_Magnaporthe_oryzae,13_tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis)94
,((((7_tr_Q2GMP2_Ascomycota_Chaetomium_globosum,8_tr_B2ABX7_Ascomycota_Podospora_anserina)98
,(16_tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum,17_sp_Q9C1S9_Ascomycota_Humicola_insolens)99)97
,((11_tr_G2QA39_Ascomycota_Thielavia_heterothallica,40_gi_367023495_Ascomycota_Myceliophthora_thermophila)101
,12_tr_G2QW39_Ascomycota_Thielavia_terrestris)100)96
,(9_tr_Q872J7_Ascomycota_Neurospora_crassa,10_tr_F8MDR2_Ascomycota_Neurospora_tetrasperma)102)95)93
,(((14_tr_G2XB72_Ascomycota_Verticillium_dahliae,42_gi_302405457_Ascomycota_Verticillium_albo-atrum)105
,(15_tr_E3Q540_Ascomycota_Colletotrichum_graminicola,41_gi_310790274_Ascomycota_Glomerella_graminicola)106)104
,(23_tr_I1RIJ1_Ascomycota_Gibberella_zeae,29_sp_P46236_Ascomycota_Fusarium_oxysporum)107)103)92
,((((18_sp_P07987_Ascomycota_Hypocrea_jecorina,19_tr_Q7LSP2_Ascomycota_Trichoderma_koningii,45_tr_Q6UJX9_Acomycota_Trichoderma_viride)111
,(20_tr_H9C5T1_Ascomycota_Hypocrea_orientalis,21_tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum)112)110
,46_tr_Q66PN1_Ascomycota_Trichoderma_parceramosum)109
,22_tr_G9NFV6_Ascomycota_Hypocrea_atroviridis)108)91
,31_tr_F2VRZ0_Ascomycota_Phialophora_sp)90)75
,25_tr_G2XV25_Ascomycota_Botryotinia_fuckeliana)74
,44_gi_169852726_Ascomycota_Coprinopsis_cinerea)73)61
,(2_tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus,(3_tr_Q96V98_Neocallimasti_Orpinomyces,4_tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum)114)113)60)59;
[0108] Nodes 59 to 114 are ancestral.
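The ancestral node ranges stated for the two trees can be cross-checked with a simple count. As a sketch, assume a rooted tree with n leaves and m ancestral nodes, where ancestral node i has c_i children (these symbols are introduced here for illustration only). Every node other than the root is the child of exactly one ancestral node, so:

```latex
\[
\sum_{i=1}^{m} c_i = n + m - 1
\quad\Longrightarrow\quad
m = n - 1 - \sum_{i=1}^{m} \left(c_i - 2\right)
\]
```

With n = 58 leaves, a fully bifurcating tree (every c_i = 2) gives m = 57, consistent with nodes 59 to 115 for FIG. 3. In the FIG. 4 tree, node 111 has three children (leaves 18, 19, and 45), which contributes one to the sum and gives m = 56, consistent with nodes 59 to 114.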
Appendix 1
SEQ ID NO: 1 (node #32) -
FRMAHLLGLL AALLPILALA ACAPVWGQCG GGGWSGPTCC GGGSTCVGQG PYYSGCLPGT PTTTGDPTTT SSGNVDPTTT SSGGVTDPPT TTPDITPPSG NPFAGYQLYL DPYYVK VDA LAIPQVTDPA LKAKMEKVKQ IQTFFWLDRI EKIKELLGAY LDDALKLQKE LCQPPYAALI VVYDLPDRDC FAEASNGELH LDQNGMQRYK EYIDPIKQQL VKYSGQRIVA VIEPDSLPNI VTNLGGK CD EAQAAYKDCV AYALKQLNLP HVYQYLDAGH AGWLGWPDNL K GAKLFAQV YKAAGSPANV RGLATNVANY NQLSYASPES YDQQDPPYGE FDYVDALAPM LSAQGLGDKH FIIDQGRNGV GNIKRQDWGY WCNNKGAGFG QRPKANPGGA LLDAFV
SEQ ID NO: 2 (DiCA: node #33) -
FRMAHLLALL AALLPVFAVA AQAPVWGQCG GTGWSGPTCC ASGSTCWQN PYYSQCLPGS ATTTTSPTST SSTSSSTTTT SSSGTTSPPT TTPTTTPASG NPFAGYQLYL NPYYAAEVAA LAIPNITDPA LKAKAAAVAK IPTFVWLDTV AKVPDLLGTY LADASALQKA SGQPPYAAQI VVYDLPDRDC AAAASNGEFS IANNGMANYK AYIDSIRAQL VKYSDVRIVA VIEPDSLANM VTNLNVAKCA NAQTAYKECV TYALKQLNLP HVYMYLDAGH AGWLGWPANL SPAAQLFAQV YKNAGSPASV RGLATNVANY NALSAASPPS YTQGDPNYDE IHYINALAPM LSSQGFPDAH FIVDQGRSGV QNIKRQEWGD WCNVKGAGFG TRPTTNTGSS LIDAFV
SEQ ID NO: 3 (BasCA: node #34) -
FKMATLLALL AALLPAFWA AQAPVWGQCG GTGWTGPTCC VSGSTCWQN PYYSQCLPGS ATTTTSPTST SSTSSSTTTT SSSGTTSPPT TTPTTTPASG NPFAGYQLYL SPYYAAEVAA LAAPNITDPA LKAKAAAVAN IPTFTWFDTV AKVPDLLGTY LADASALQKA SGQKPYAAQI VVYDLPDRDC AAAASNGEFS IANNGMANYK TYIDSIAAQL KKYSDVRWA VIEPDSLANM VTNLNVAKCA NAQTAYKECV TYALKQLNLV GVYMYLDAGH AGWLGWPANL SPAAQLFAQM YKNAGSPSFV RGLATNVANY NALSAASPPS YTQGNPNYDE IHYINALAPM LSSQGFPPAH FIVDQGRSGV QNIKRQQWGD WCNVKGAGFG TRPTTNTGSS LIDAIV
SEQ ID NO: 4 (node #35) -
FKFATLLALL AALLPAFVVA AQAPVWGQCG GTGWTGPTTC VSGSTCVVQN PYYSQCLPGS ATTTTSPTST SSTSSSTTTT SSSGTGSPST TTPTSTPAAG NPFVGYQLYL SPYYAAEVAA LAASNITDPT LKAKAASVAN IPTFTWFDVV AKVPDLLGTY LADASALQKS SGQKPYLVQI VVYDLPDRDC AAAASNGEFS IANNGMANYK TYIDQIAAQI KKYPDVRVVA VIEPDSLANM VTNLNVAKCA NAQTAYKECV TYALKQLNSV GVYMYLDAGH AGWLGWPANL SPAAQLFAQM YKNAGSPSFV RGLATNVANY NALSAASPDP ITQGNPNYDE IHYINALAPM LSSQGFPPAH FIVDQGRSGV QNIKRQQWGD WCNVKGAGFG TRPTTNTGSS LIDAIV
SEQ ID NO: 5 (node #36) -
FKFATLLALL AALLPAFVVA AQAPVWGQCG GTGWTGPTTC VSGSVCTVQN PYYSQCLPGS ATTTTSPSST SSPSSSPTTT SSSGTGSPST TTPTSTPAAG NPFVGYQIYL SPYYAAEVAA LAASAITDPT LKAKAASVAN IPTFTWFDVV AKVPDLLGTY LADASALQKS SGQKPQLVQI VVYDLPDRDC AAAASNGEFS IANNGMANYK NYIDQIAAQI KKYPDVRVVA VIEPDSLANM VTNLNVAKCA NAQTAYKECV TYALKQLNSV GVYMYMDAGH AGWLGWPANL SPAAQLFAQM YKNAGSPSFI RGLATNVANY NALSAASPDP ITQGNPNYDE IHYINALAPM LSSQGFPPAH FIVDQGRSGV QNIKRQQWGD WCNVKGAGFG TRPTTNTGSS LIDAIV
SEQ ID NO: 6 (node #37)
FKFATLLALL AALLPAFVVA AQAPVWGQCG GIGWTGPTTC VSGSVCTVQN PYYSQCLPGS ATTTTSPSST SSPSSSPTST SSSGTGSPSG TTPTSTPAAG NPFVGYQIYL SPYYAAEVAA LAASAITDPT LKAKAASVAN IPTFTWLDSV AKVPDLLGTY LADASALQKS SGQKPQLVQI VVYDLPDRDC AAKASNGEFS IANNGQANYQ NYIDQIVAQI KKFPDVRVVA VIEPDSLANL VTNLNVQKCA NAQTTYKACV TYALKQLASV GVYMYMDAGH AGWLGWPANL SPAAQLFAQM YKNAGSSSFI RGLATNVANY NALSAASPDP ITQGNPNYDE IHYINALAPM LSSQGFPPAH FIVDQGRSGV QNIKRQQWGD WCNIKGAGFG TRPTTNTGSS LIDAIV
SEQ ID NO: 7 (node #38) -
MKSAAFLAAL AALLPAYVVA GQAPVWGQCG GIGWTGPTTC VSGSVCTVQN PYYSQCLPGS ATTTTSTSST SSQSSSPTST SSSTTGGPSG TTPTSTPSAN NPWTGYQIYL SPYYANEVAA LAAKAITDPT LSAKAASVAN IPTFTWLDSV AKIPDLLGTY LADASALGKS SGQKPQLVQI VVYDLPDRDC AAKASNGEFS IANNGQANYQ NYIDQIVAQI KQFPDVRVVA VIEPDSLANL VTNLNVQKCA NAKTTYLACV NYALKQLASV GVYMYMDAGH AGWLGWPANL SPAAQLFAQV YKNAGKSPFI KGLATNVANY NALSAASPDP ITQGNPNYDE IHYINALAPM LQSAGFPPAT FIVDQGRSGV QNIKRQQWGD WCNIKGAGFG TRPTTNTGSS LIDSIV
SEQ ID NO: 8 (node #39)
FKFATLLALL VALLPAFVVA AQAPVWGQCG GTGWTGPTTC VSGSVCTVIN PYYHQCLPGS ATTTTSPSST SSPSSSPTTT SSSGTGSPST TTPTSTPAAG NPFVGYQIYL SPYYAAEVAA LAASAISDPT LKAKAASVAN IPTFTWFDVV AKVPDLLGTY LADASAIQQS TGRKPQLVQI VVYDLPDRDC AAAASNGEFS IANNGMANYK NYIDQIAAQI KKYPDVRVVA VIEPDSLANM VTNLNVAKCA GAQAAYKEGV TYALKQLNSV GVYMYMDAGH AGWLGWPANL SPAAQLFAQM YKNAGSPSFI RGLATNVANY NALSAASPDP VTQGNANYDE IHYINALAPM LSSQGFPPAH FIVDQGRSGV QNIKRQQWGD WCNVKGAGFG TRPTTNTGSS LIDAIV
SEQ ID NO: 9 (node #40) -
LKFATLLALL VALLPAFVVA AQAPVWGQCG GTGWTGPTTC VSGSVCTVIN PWYHQCLPGS ATTTTSPSST SSPSSSPTTS SSSGTGSPST TTPTSTPATG NPFVGYQIYL SPYYAAEVEA LAASAISDPT LKAKALKVKE IPTFTWFDVV AKVPDLLGTY LADASAIQQS TGRKPQLVQI VVYDLPDRDC AAAASNGEFS IANGGMAKYK NYIDQIAAQI KKYPDVRVVA VIEPDSLANM VTNLNVAKCA GAQAAYKEGV TYALKQLNAV GVYMYMDAGH AGWLGWPANL SPAAQLFAQM YKNAGRSSFI RGLATNVANY NALSAASPDP VTQGNANYDE IHYINALAPM LRSQGWPDAH FIVDQGRSGV QNIKRQQWGN WCNVKGAGFG MRPTTNTGSS LIDAIV
SEQ ID NO: 10 (AsCA: node #41) -
FRMAHLLALL AATLPVSAVQ AQASVWGQCG GQGWSGPTCC ASGSTCWQN PYYSQCLPGS ATTTTSSTST TSTSSSTTTT SSTTTTSPPT TTPTTASASG NPFAGYQLYA NPYYASEVHT LAIPSLTDGA LAAKASAVAK VPSFVWLDTA AKVPTLMGTY LADIRAANKA GANPPYAGQF VVYDLPDRDC AAAASNGEFS IANNGVANYK AYIDSIRAQL VKYSDVRIIL VIEPDSLANM VTNLNVAKCA NAQSAYLECV NYALKQLNLP HVAMYLDAGH AGWLGWPANL SPAAQLFAKV YKNAGSPAAV RGLATNVANY NAWSIASPPS YTQGDPNYDE KHYINALAPL LSSNGFPDAH FIVDQGRSGV QPTKQQEWGD WCNVIGTGFG VRPTTNTGSS LEDAFV
SEQ ID NO: 11 (SorCA: node #42) --
FLAAALLATL AAALPVEERQ ACASVWGQCG GQGWSGPTCC ASGSTCWQN PYYSQCLPGS ATTTTTSSST TSRSSSTTST SSTTTTSPPT TTPSTASYSG NPFAGVQLWA NAYYASEVHT LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPTLMAGT LADIRAANKA GANPPYAGQF VVYDLPDRDC AAAASNGEFS IADNGVANYK AYIDAIRAQL VEYSDIRIIL VIEPDSLANM VTNMNVAKCA NAQSAYLECT NYALKQLNLP HVAMYLDAGH AGWLGWPANL QPAATLFAKV YKDAGKPAAV RGLATNVANY NAWSIASPPS YTQGDPNYDE KHYINALAPL LSANGWPDAH FIVDQGRSGK QPTGQQEWGD WCNVIGTGFG VRPTTNTGSS LEDAFV
SEQ ID NO: 12 (node #43) -
FLAAALLATL AAALPVEERQ ACASVWGQCG GQGWSGPTCC ASGSTCVVSN PYYSQCLPGS ATPTTTSSST TSRSSSTTSR SSTTTTSPPT TTPSTASYSG NPFAGVNLWA NAYYASEVHT LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPTLMAGT LADIRAANKA GGNPPYAGQF VVYDLPDRDC AAAASNGEFS IADGGVAKYK AYIDAIRAQL VEYSDIRIIL VIEPDSLANM VTNMGVPKCA NAQSAYLECT NYAVKQLNLP HVAMYLDAGH AGWLGWPANL QPAATLFAKV YKDAGKPAAL RGLATNVANY NAWNITSPPS YTQGNPNYDE KHYINALAPL LSANGWPDAH FIVDQGRSGK QPTGQQEWGD WCNVIGTGFG VRPTANTGSS LVDAFV
SEQ ID NO: 13 (node #44) -
FLAAALLATL AAAVPVEERQ ACASVWGQCG GQGWSGPTCC ASGSTCVVSN PYYSQCLPGS ATPSSTASST TSRSSSTTSR SSTTTTSPPT TTTGTATYSG NPFAGVNLWA NAYYASEVST LAIPSLTDGA LATAAAAVAK VPSFMWLDTA AKVPLLMDGT LADIRAANKA GGNPPYAGQF VVYDLPDRDC AAAASNGEFS IADGGVAKYK AYIDAIRAML VEYSDIRIIL VIEPDSLANM VTNMGVPKCA NAQSAYLECT NYAVTQLNLP HVAMYLDAGH AGWLGWPANL QPAATLFAKV YKDAGKPKAL RGLATNVANY NAWNITSPPS YTQGNPNYDE KHYINALAPL LSANGWPDAH FIVDQGRSGK QPTGQQEWGD WCNVIGTGFG VRPTANTGSS LVDAFV
SEQ ID NO: 14 (node #45) -
LVTLATLATL AASVPVEERQ ACSSVWGQCG GQNWSGPTCC ASGSTCVFSN DYYSQCLPGA ASSSSRASST TSRASSTTSR SSTTTTPPPG STTGTATYSG NPFAGVTPWA NAYYASEVST LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLLMEQT LADIRAANKN GGNPPYAGQF VVYDLPDRDC AALASNGEYS IADGGVAKYK NYIDTIRQIV VEYSDIRILL VIEPDSLANL VTNLGTPKCA NAQSAYLECI NYAVTQLNLP NVAMYLDAGH AGWLGWPANQ DPAAQLFANV YKNASSPRAL RGLATNVANY NAWNITSPPS YTQGNAVYNE KLYIHAIGPL LANHGWSNAF FITDQGRSGK QPTGQQEWGD WCNVIGTGFG IRPSANTGDS LLDAFV
SEQ ID NO: 15 (node #46) -
LTTLATLATL AASVPLEERQ ACSSVWGQCG GQNWSGPTCC ASGSTCVYSN DYYSQCLPGA ASSSSRASST TSRASSTTSR SSATTTPPPG STTGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLLMEQT LADIRTANKN GGNPPYAGQF VVYDLPDRDC AALASNGEYS IADGGVAKYK NYIDTIRQIV VEYSDIRTLL VIEPDSLANL VTNLGTPKCA NAQSAYLECI NYAVTQLNLP NVAMYLDAGH AGWLGWPANQ DPAAQLFANV YKNASSPRAL RGLATNVANY NGWNITSPPS YTQGNAVYNE KLYIHAIGPL LANHGWSNAF FITDQGRSGK QPTGQQQWGD WCNVIGTGFG IRPSANTGDS LLDSFV
SEQ ID NO: 16 (node #47) -
LTTLATLATL AASVPLEERQ ACSSVWGQCG GQNWSGPTCC ASGSTCVYSN DYYSQCLPGA ASSSSRASST TSRVSSTTSR SSATTTPPPG STTGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLLMEQT LADIRTANKN GGNPPYAGQF VVYDLPDRDC AALASNGEYS IADGGVAKYK NYIDTIRQIV VEYSDIRTLL VIEPDSLANL VTNLGTPKCA NAQSAYLECI NYAVTQLNLP NVAMYLDAGH AGWLGWPANQ DPAAQLFANV YKNASSPRAL RGLATNVANY NGWNITSPPS YTQGNAVYNE KLYIHAIGPL LANHGWSNAF FITDQGRSGK QPTGQQQWGD WCNVIGTGFG IRPSANTGDS LLDSFV
SEQ ID NO: 17 (node #48) -
LTTLATLATL AASVPLEERQ ACSSVWGQCG GQNWSGPTCC ASGSTCVYSN DYYSQCLPGA ASSSSRAAST TSRVSPTTSR SSSATTPPPG STTGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLLMEQT LADIRTANKN GGNPPYAGQF VVYDLPDRDC AALASNGEYS IADGGVAKYK NYIDTIRQIV VEYSDIRTLL VIEPDSLANL VTNLGTPKCA NAQSAYLECI NYAVTQLNLP NVAMYLDAGH AGWLGWPANQ DPAAQLFANV YKNASSPRAL RGLATNVANY NGWNITSPPS YTQGNAVYNE KLYIHAIGPL LANHGWSNAF FITDQGRSGK QPTGQQQWGD WCNVIGTGFG IRPSANTGDS LLDSFV
SEQ ID NO: 18 (node #49)
LTTLATLATL AASVPLEERQ ACSSVWGQCG GQNWSGPTCC ASGSTCVYSN DYYSQCLPGA ASSSSRAAST TSRVSPTTSR SSSATTPPPG STTGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLLMEQT LADIRTANKN GGNPPYAGQF VVYDLPDRDC AALASNGEYS IADGGVAKYK NYIDTIRQIV VEYSDIRTLL VIEPDSLANL VTNLGTPKCA NAQSAYLECI NYAVTQLNLP NVAMYLDAGH AGWLGWPANQ DPAAQLFANV YKNASSPRAL RGLATNVANY NGWNITSPPS YTQGNAVYNE KLYIHAIGPL LANHGWSNAF FITDQGRSGK QPTGQQQWGD WCNVIGTGFG IRPSANTGDS LLDSFV
SEQ ID NO: 19 (node #50) -
FLAAALLATA AFAVPVEERQ ACASQWGQCG GQGWSGPTCC ASGSTCVVSN PYYSQCLPGS ATPSSTASST TSSSTSTTSR SSTTVTSPPT TTTGGATYTG NPFAGVNLWA NAYYASEVST LAIPSLSDGA LATAAAKVAK VPTFMWMDTA AKVPLLMDGT LADIRRANKA GGNPPYAGQF VVYNLPDRDC AAAASNGELS IADGGVAKYK AYIDAIRAML VKYSDIRIIL VIEPDSLANM VTNMGVPKCA NAQAAYLECT NYAVTQLNLP HVAMYLDAGH AGWLGWPANL QPAATLFAKV YKDAGKPKAL RGLATNVSNY NAWNITSPPS YTQGNPNYDE KHYIDALAPL LSANGWPDAK FIVDQGRSGK QPTGQQEWGD WCNAIGTGFG VRPTANTGSS LVDAFV
SEQ ID NO: 20 (node #51) -
FLSAALLSTA AFAAPLEERQ ACASQWGQCG GQGWSGPTCC PSGTTCQLQN AWYSQCLPGA APPPATAASS TRPATTSSVR STTVVTNPPT TTVGGATYTG NPFAGVNQWA NAYYRSEVSS LAVPSLSDGP LATAAAKVAD VPTFQWMDTT AKVPLLIDGA LADIRRANAA GGNPPYAGIF VVYNLPDRDC AAAASNGELS IANDGINKYK AYIDSIRAVL LKYNDIRTLL VIEPDSLANM VTNMGVAKCS NAAAAYKECT KYAVQQLDLP HVAQYLDAGH AGWLGWPANI GPAATIFTDI YKEAGKPKSL RGLATNVSNY NAWNATSPAP YTSPNPNYDE KHYVDAFAPL LRQNGWPDAK FIIDQGRSGK QPTGQQEWGH WCNALGTGFG LRPTSNTGHP DVDAFV
SEQ ID NO: 21 (node #52) -
FLAAALAATA LAALPVEERQ NCASVWGQCG GTGWSGPTCC ASGSTCVEQN PYYSQCLPGS QVTTTTSSST TSRSSSTTST SSTTTTSPPT TTPSTASYSG NPFAGVQLWA NAYYASEVHN LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPTLMAGT LADIRAANKA GANPPYAGQF VVYDLPDRDC AAAASNGEFS IADNGVANYK AYIDAIRAQL VEYSDIRIIL VIEPDSLANM VTNMNVAKCA NAQSAYLECT NYALKQLNLP HVAMYLDAGH AGWLGWPANL QPAATLFAKV YKDAGKPAAV RGLATNVANY NAWSIASPPS YTQGDPNYDE KHYIQALAPL LSANGWPDAH FIVDQGRSGK QPTGQQEWGD WCNVIGTGFG VRPTTNTGSD LEDAFV
SEQ ID NO: 22 (node #53) -
FLAAALAASA LAAPVVEERQ NCGSVWSQCG GIGWSGPTCC ASGSTCVEQN PYYSQCLPGS QVTTTTSSST TSRSSSTTST STTTTTSPPV TTPSTASYSG NPFSGVQLWA NDYYASEVHN LAIPSMTDGA LAAKASAVAK VPSFQWLDRN VTVDTLMAGT LSEIRAANKA GANPPYAGQF VVYDLPDRDC AAAASNGEFS IANNGAANYK TYIDAIRALL IEYSDIRIIL VIEPDSLANM VTNMNVAKCA NAESTYRELT VYALKQLNLP HVAMYLDAGH AGWLGWPANI QPAATLFAKI YKDAGKPAAV RGLATNVANY NAWSIASPPS YTQGDPNYDE KHYIQALAPL LNANGFPPAH FIVDTGRNGK QPTGQQEWGD WCNVIGTGFG VRPTTNTGSD LEDAFV
SEQ ID NO: 23 (node #54) -
FLAAALAASA LAAPVVEERQ NCGSVWSQCG GIGWSGPTCC ASGSTCVEQN PYYSQCLPGS QVTTTTSSST TSRSSSTTST STSSTTSPPV TTPSTASYSG NPFSGVQLWA NDYYASEVHN LAIPSMTDGA LAAKASAVAK VPSFQWLDRN VTVDTLMAGT LSEIRAANKA GANPPYAAQF VVYDLPDRDC AAAASNGEFS IANNGAANYK TYIDAIRKLL IEYSDIRIIL VIEPDSLANM VTNMNVAKCA NAASTYRELT VYALKQLNLP HVAMYLDAGH AGWLGWPANI QPAATLFAKI YKDAGKPAAV RGLATNVANY NAWSIASPPS YTSPNPNYDE KHYIEAFAPL LNAAGFPPAQ FIVDTGRNGK QPTGQQEWGD WCNVIGTGFG VRPTTNTGSD LVDAFV
SEQ ID NO: 24 (node #55) -
FLTAALAAAA LAAPVVEERQ NCGSVWSQCG GIGWNGPTCC ASGSTCVEQN DWYSQCLPGS QVTTTTSSST TSRSSSTTST STSSTTPPPV TTPSTASYSG NPFSGVQLWA NDYYRSEVHN LAIPSMTDGA LAAKASAVAE VPSFQWLDRN VTVDTLMVGT LSEIRAANKA GANPPYAAQF VVYDLPDRDC AAAASNGEFS IANNGAANYK SYIDAIRKLL IEYSDIRIIL VIEPDSLANM VTNMNVAKCS NAASTYRELT VYALKQLNLP HVAMYLDAGH AGWLGWPANI QPAAELFAKI YKDAGKPAAV RGLATNVANY NAWSIASPPS YTSPNPNYDE KHYIEAFAPL LNAAGFPPAQ FIVDTGRNGK QPTGQQEWGD WCNVIGTGFG VRPTANTGHE LVDAFV
SEQ ID NO: 25 (node #56) -
FLTAALAAAA LAAPVVEERQ NCGSVWGQCG GIGYNGPTCC QSGSTCVEQN DWYSQCLPGS QVTTTTSSST TSRSSSTTST STSSTRPPPV TTPSTASYNG NPFSGVQLWA NNYYRSEVHT LAIPSMTDPA LAAKASAVAE VPSFQWLDRN VTVDTLLVGT LSEIRAANQA GANPPYAAQF VVYDLPDRDC AAAASNGEWS IANNGANNYK SYIDRIRELL IEYSDIRTIL VIEPDSLANM VTNMNVAKCS NAASTYRELT VYALKQLNLP HVAMYMDAGH AGWLGWPANI QPAAELFAKI YKDAGKPAAV RGLATNVANY NAWSIASPPS YTSPNPNYDE KHYIEAFAPL LNARGFPPAQ FIVDTGRNGK QPTGQQEWGH WCNVIGTGFG VRPTANTGHE LVDAFV
SEQ ID NO: 26 (node #57) -
FRMRHLLSLL AATLLLSAVQ AQQSVWGQCG GQGWSGPTSC AAGSTCSTQN PYYAQCIPGS ATTTTSSTTT TSTSTSTTTT TTTTTTSPPT TTPTTASASG NPFSGYQLYA NPYYASEVHT LAIPSLTDGS LAAKASAVAK VPSFVWLDTA AKVPTLMGTY LADIQAANKA GANPPIAGIF VVYDLPDRDC AAAASNGEYS IANNGVANYK AYIDSIRAQL VKYSDVHTIL VIEPDSLANM VTNLNVAKCA NAQSAYLECV NYALKQLNLP NVAMYLDAGH AGWLGWPANL SPAAQLFAKV YKNAGSPAAV RGLATNVANY NAWSISSPPS YTQGDPNYDE KHYINALAPL LTSNGFPDAH FIMDTSRNGV QPTKQQAWGD WCNVIGTGFG VRPTTNTGDP LEDAFV
SEQ ID NO: 27 (node #58) -
FRMLRYLSIV AATAILTGVE AQQSVWGQCG GQGWSGATSC AAGSTCSTQN PYYAQCIPGT ATSTTSSTTT SSTSASTTTT TTTTTTTAST TTTTTAAASG NPFSGYQLYA NPYYSSEVHT LAIPSLTDGS LAAAATKAAE IPSFVWLDTA AKVPTLMGTY LANIEAANKA GASPPIAGIF VVYDLPDRDC AAAASNGEYT VANNGVANYK AYIDSIVAQL KAYPDVHTIL IIEPDSLANM VTNLSTAKCA EAQSAYYECV NYALINLNLP NVAMYIDAGH AGWLGWSANL SPAAQLFATV YKNASSPAAL RGLATNVANY NAWSISSPPS YTSGDSNYDE KLYINALSPL LTSNGWPNAH FIMDTSRNGV QPTKQQAWGD WCNVIGTGFG VQPTTNTGDP LEDAFV
SEQ ID NO: 28 (node #59) -
FRMRHLLSLL AATLLLSAVQ AQQSVWGQCG GQGWSGPTSC AAGSTCSTQN PYYAQCIPGS ATTTTSSTTT TSTTTSTTTT TTTTTTSPPT TTPTTASASG NPFSGYQLYA NPYYASEVHT LAIPSLTDSS LAPKASAVAK VPSFVWLDTA AKVPTLMGTY LADIQAKNKA GANPPIAGIF VVYDLPDRDC AAAASNGEYS IANNGVANYK AYIDSIRAQL VKYSDVHTIL VIEPDSLANM VTNLNVAKCA NAQSAYLECV NYALKQLNLP NVAMYLDAGH AGWLGWPANL SPAAQLFAKV YKNAGSPAAV RGLATNVANY NAWSISSCPS YTQGDPNCDE KHYINALAPL LKENGFPDAH FIMDTSRNGV QPTKQQAWGD WCNVIGTGFG VRPTTNTGDP LEDAFV
SEQ ID NO: 29 (node #60) -
FRMRHLASLL AATLLLSAVQ AQQTVWGQCG GQGWSGPTSC VAGSTCSTQN PYYAQCIPGS ATTSTSTTTT TSTTTSTTTT TTTTTTSPTT TTPTTASASG NPFSGYQLYA NPYYASEVHT LAIPSLTDSS LAPKASAVAK VPSFVWLDTA AKVPTLMGTY LADIQAKNKA GANPPIAGIF VVYDLPDRDC AALASNGEYS IANNGVANYK AYIDSIRAQL VKYSDVHTIL VIEPDSLANL VTNLNVAKCA NAQSAYLECV DYALKQLNLP NVAMYLDAGH AGWLGWPANL GPAAQLFAKV YKNAGSPAAV RGLATNVANY NAWSISTCPS YTQGDPNCDE KRYINALAPL LKENGFPDAH FIMDTSRNGV QPTKQQAWGD WCNVIGTGFG VRPTTNTGDP LQDAFV
SEQ ID NO: 30 (node #61) -
FRMKHLASSI ALTLLLPAVQ AQQTVWGQCG GQGWSGPTSC VAGAACSTLN PYYAQCIPGA TATSTLTTTT ASTTTSTTTT TPTTTTGPTT SAPTTVTASG NPFSGYQLYA NPYYSSEVHT LAMPSLPDSS LQPKASAVAE VPSFVWLDVA AKVPTLMGTY LADIQAKNKA GANPPIAGIF VVYDLPDRDC AALASNGEYS IANNGVANYK AYIDAIRAQL VKYSDVHTIL VIEPDSLANL VTNLNVAKCA NAQSAYLECV DYALKQLNLP NVAMYLDAGH AGWLGWPANL GPAATLFAKV YTDAGSPAAV RGLATNVANY NAWSLSTCPS YTQGDPNCDE KKYINAMAPL LKEAGFPDAH FIMDTSRNGV QPTKQNAWGD WCNVIGTGFG VRPSTNTGDP LQDAFV
Appendix 2
SEQ ID NO: 31 (DiCA: node #33)
AAQAPVWGQC GGTGWSGPTC CASGSTCWQ NPYYSQCLPG STTSSTTTTS TTTSSSTSST TTTTSTTTPP TTSPTTTTPP PASTTTPASG NPFAGYQLYL NPYYAAEVAS LAIPNITDPA LKAKAASVAK IPTFVWLDTV AKVPDLGTYL ADASALQKAS GQPPYAVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKA YIDSIRAQLV KYSDVRIVAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNLPN VYMYLDAGHA GWLGWPANLS PAAQLFAQVY KNAGSPAAVR GLATNVANYN ALSAASPDSY TQGDPNYDEI HYINALAPML SSQGFPDAHF IVDTGRNGVQ NIRQQEWGDW CNVKGAGFGT RPTTNTGSSL IDAFVWVKPG GESDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LVKNANPPL
SEQ ID NO: 32 (BasCA: node #34)
AAQAPVWGQC GGTGWTGPTC CASGSTCWQ NPYYSQCLPG STTSSTTTTS TTTSSPTSST TTTTSTTTPP TTSPTTTTPP PGSTSTPASG NPFAGYQLYL SPYYAAEVAA LAAPNITDPA LKAKAASVAN IPTFTWFDTV AKVPDLGTYL ADASALQKSS GQKPYAVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKT YIDSIAAQLK KYSDVRWAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNLVG VYMYLDAGHA GWLGWPANLS PAAQLFAQLY KNAGSPSFVR GLATNVANYN ALSAASPDSY TQGNPNYDEI HYINALAPML SSQGFPPAHF IVDQGRSGVQ NIRQQQWGDW CNVKGAGFGT RPTTNTGSSL IDAIVWVKPG GESDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFET LVKNANPPL
SEQ ID NO: 33 (node #35)
AAQAPVWGQC GGTGWTGPTT CASGSTCWQ NPYYSQCLPG STTSSTTTTS TTTSSPTSST TTTTSTTTPP TGSPTTTTPP PGSTSTPAAG NPFVGYQLYL SPYYAAEVAA LAASNITDPT LKAKAASVAN IPTFTWFDVV AKVPDLGTYL ADASALQKSS GQKPYLVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKT YIDQIAAQIK KYPDVRVVAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNSVG VYMYLDAGHA GWLGWPANLS PAAQLFAQLY KNAGSPSFVR GLATNVANYN ALSAASPDPI TQGNPNYDEI HYINALAPML SSQGFPPAHF IVDQGRSGVQ NIRQQQWGDW CNVKGAGFGT RPTTNTGSSL IDAIVWVKPG GESDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFET LVKNANPPL
SEQ ID NO: 34 (node #36)
AAQAPVWGQC GGTGWTGPTT CVSGSVCTVQ NDYYSQCLPG STTSSTTTTS TTTSSPTSST TTTTSTSTPP TGSPTTTTPP PGSTSTPAAG NPFVGYQIYL SPYYAAEVAA LAASAITDPT LKAKAASVAN IPTFTWFDVV AKVPDLGTYL ADASALQKSS GQKPQLVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKN YIDQIAAQIK KYPDVRVVAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNSVG VYMYMDAGHA GWLGWPANLS PAAQLFAQMY KNAGSPSFIR GLATNVANYN ALSAASPDPI TQGNPNYDEI HYINALAPML SSQGFPPAHF IVDQGRSGVQ NIRQQQWGDW CNVKGAGFGT RPTTNTGSSL IDAIVWVKPG GESDGTSDTS APRYDSHCGL SDAKQPAPEA GTWFQAYFET LVKNANPPL
SEQ ID NO: 35 (node #37)
AAQAPVWGQC GGIGWTGPTT CVSGSVCTKQ NDYYSQCLPG STTSSTTTTS APTSSPTSSA SSSTSTSTPP TGSTTTNSPP PGSTSTPAAG NPFVGYQIYL SPYYAAEVAA LAANAITDPT LKAKAASVAN IPTFTWLDSV AKVPDLGTYL ADASALQKSS GQKPQLVQIV VYDLPDRDCA AKASNGEFSI ADNGQANYQN YIDQIVAQIK KFPDVRVVAV IEPDSLANLV TNLNVQKCAN AQTTYKACVT YALKQLASVG VYMYMDAGHA GWLGWPANLS PAAQLFAQMY KNAGSSSFIR GLATNVANYN ALSAASPDPI TQGDPNYDEI HYINALAPML SQQGFPPAHF IVDQGRSGQQ NIRQQQWGDW CNIKGAGFGT RPTTNTGSSL IDAIVWVKPG GESDGTSNSS SPRFDSTCSL SDATQPAPEA GTWFQAYFET LVSKANPPL
SEQ ID NO: 36 (node #38)
AGQAPVWGQC GGIGWTGPTT CVSGSVCTKQ NDYYSQCLPG STTSSTTTTS APTSSPTSSP SSSSSTSSAP TGSTTTNSPP PGSTSTPSAN NPWTGYQIYL SPYYANEVAA LAAKAITDPT LAAKAASVAN IPTFTWLDSV AKIPDLGTYL ADASALGKSS GQKPQLVQIV VYDLPDRDCA AKASNGEFSI ADNGQANYQN YIDQIVAQIK QFPDVRVVAV IEPDSLANLV TNLNVQKCAN AKTTYLACVN YALKQLASVG VYMYMDAGHA GWLGWPANLS PAAQLFAQVY KNAGKSPFIK GLATNVANYN ALSAASPDPI TQGDPNYDEI HYINALAPML QQAGFPPATF IVDQGRSGQQ NIRQQQWGDW CNIKGAGFGT RPTTNTGSSL IDSIVWVKPG GESDGTSNSS SPRFDSTCSL SDATQPAPEA GTWFQAYFET LVSKANPPL
SEQ ID NO: 37 (node #39)
AAQAPVWGQC GGTGWTGPTT CVSGSVCTVI NDYYHQCLPG STTSSTTTTS TTTSSPTSST TTTTSTSTPP TGSPTTTTPP PGSTSTPAAG NPFVGYQIYL SPYYAAEVAA LAASAISDPT LKAKAASVAN IPTFTWFDVV AKVPDLGTYL ADASAIQQST GRKPQLVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKN YIDQIAAQIK KYPDVRVVAV IEPDSLANMV TNLNVAKCAG AQAAYKEGVT YALKQLNSVG VYMYMDAGHA GWLGWPANLS PAAQLFAQMY KNAGSPSFIR GLATNVANYN ALSAASPDPV TQGNANYDEI HYINALAPML SSQGFPPAHF IVDQGRSGVQ NIRQQQWGDW CNVKGAGFGT RPTTNTGSSL IDAIVWVKPG GESDGTSDTS APRYDSHCGL SDAKKPAPEA GTWFQAYFET LVKNANPPL
SEQ ID NO: 38 (node #40)
AAQAPVWGQC GGTGWTGPTT CVSGSVCTVI NQWYHQCLPG STTSSTTTTS TTSSRPTSST TTTTSTSTPP TSSPTTTTPT PGSTSTPATG NPFVGYQIYL SPYYAAEVEA LAASAISDPT LKAKALKVKE IPTFTWFDVV AKVPDLGTYL ADASAIQQST GRKPQLVQIV VYDLPDRDCA AAASNGEFSI ANGGMAKYKN YIDRIAAQIK KYPDVRVVAV IEPDSLANMV TNLNVAKCAG AQAAYKEGVT YALKQLSAVG VYMYMDAGHA GWLGWPANLS PAAQLFAQMY KNAGRSSFIR GLATNVANYN ALSAASPDPV TQGNANYDEI HYINALAPML RSQGWPDAQF IVDQGRSGVQ NIRQQQWGNW CNVKGAGFGM RPTTNTGSSL IDAIVWIKPG GESDGTSDTS APRYDTHCGL SDAKKPAPEA GTWFQAYFVN LVKNANPPL
SEQ ID NO: 39 (AsCA node #41)
QAQASVWGQC GGQGWSGPTC CASGSTCWQ NPYYSQCLPG STTTSTTTTS TTTTSSTSST TTTTSTTTPP TTSPTTTTPP AASTTASASG NPFSGYQLYA NPYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFVWLDTA AKVPTMGTYL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ANNGVANYKA YIDSIRAQLV KYSDVRIILV IEPDSLANMV TNLNVAKCAN AQSAYLECVN YALKQLNLPN VAMYLDAGHA GWLGWPANLS PAAQLFAKVY KNAGSPAAVR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYINALAPLL SSNGFPDAHF IVDTGRNGVQ PTRQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GESDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LLTNANPPF
SEQ ID NO: 40 (SorCA node #42)
QACASVWGQC GGQGWSGPTC CASGSTCWQ NPYYSQCLPG STTTSTTTRS STTTSSTSST TTSTSTTTPP TTSPTTTTPP AASSTASYSG NPFSGVQLWA NAYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPLMAGTL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ANNGVANYKA YIDAIRAQLV EYSDIRIILV IEPDSLANMV TNMNVAKCAN AQSAYLECTN YALKQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KNAGKPAAVR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYIQALAPLL SSNGFPDAHF IVDTGRNGKQ PTGQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GECDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LLTNANPPF
SEQ ID NO: 41 (node #43)
QACASQWGQC GGQGWSGPTC CASGSTCWQ NPYYSQCLPG STTTSTTTRS STTTSSVSST TTSTSTTTPP TGSPTTTTPP AASGTASYSG NPFAGVQLWA NAYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPLMAGTL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ADNGVANYKA YIDAIRAQLV EYSDIRIILV IEPDSLANMV TNMNVAKCAN AQSAYLECTN YAVKQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KNAGKPAALR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYIQALAPLL SSNGWPDAHF IVDQGRSGKQ PTGQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GECDGTSDTT APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LLTNANPPF
SEQ ID NO: 42 (node #44)
QACASQWGQC GGQGWSGPTC CASGSTCWS NPYYSQCLPG SATTSSTTRA SSTTSSVSST TTSTSTTTPP TGSPTTTTPP APSGTASYSG NPFAGVNLWA NAYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPLMAGTL ADIRAANKAG GNPPYAGQFV VYDLPDRDCA AAASNGEFSI ADGGVAKYKA YIDAIRAQLV EYSDIRIILV IEPDSLANMV TNMGVPKCAN AQSAYLECTN YAVKQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KNAGKPAALR GLATNVANYN AWNITSPPSY TQGNPNYDEK HYIQALAPLL SSNGWPDAHF IVDQGRSGKQ PTGQQEWGDW CNVIGTGFGV RPTANTGSSL VDAFVWVKPG GECDGTSDTT APRYDHHCGS SDALQPAPEA GTWFQAYFEQ LLTNANPPF
SEQ ID NO: 43 (node #45)
QACASQWGQC GGQGWSGPTC CASGSTCVVS NPYYSQCLPG SATTSSSTRA SSTTSSVSST TTSTSTTTPP TGSPTTTTPP APSGTATYSG NPFAGVNLWA NAYYASEVSS LAIPSLTDGA LATAAAAVAK VPSFQWLDTA AKVPLMDGTL ADIRAANKAG GNPPYAGQFV VYDLPDRDCA AAASNGEFSI ADGGVAKYKA YIDAIRAMLV EYSDIRIILV IEPDSLANMV TNMGVPKCAN AQSAYLECTN YAVTQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KNAGKPKALR GLATNVANYN AWNITSPPSY TQGNPNYDEK HYIQALAPLL SSNGWPDAHF IVDQGRSGKQ PTGQQEWGDW CNVIGTGFGV RPTANTGSSL VDAFVWVKPG GECDGTSDTT APRYDHHCGS SDALQPAPEA GTWFQAYFEQ LLTNANPSF
SEQ ID NO: 44 (node #46)
QACSSVWGQC GGQNWSGPTC CASGSTCVFS NDYYSQCLPG AASSSSSTRA SSTTSRVSST TTRSSTTTPP PGSSTTSAPP VGSGTATYSG NPFAGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLMEQTL ADIRAANKNG GNPPYAGQFV VYDLPDRDCA ALASNGEYSI ADGGVAKYKN YIDTIRGIVV EYSDIRILLV IEPDSLANLV TNLGTPKCAN AQSAYLECIN YAVTQLNLPN VAMYLDAGHA GWLGWPANQD PAAQLFANVY KNASSPRALR GLATNVANYN AWNITSPPSY TQGNAVYNEK LYIHAIGPLL ANHGWSNAFF ITDQGRSGKQ PTGQQEWGDW CNVIGTGFGI RPSANTGDSL LDAFVWVKPG GECDGTSDSS APRFDSHCAS PDALQPAPQA GAWFQAYFVQ LLTNANPSF
SEQ ID NO: 45 (node #47)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG AASSSSSTRA SSTTSRVSST TSRSSSATPP PGSSTTRVPP VGSGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLMEQTL ADIRTANKNG GNPPYAGQFV VYDLPDRDCA ALASNGEYSI ADGGVAKYKN YIDTIRQIVV EYSDIRTLLV IEPDSLANLV TNLGTPKCAN AQSAYLECIN YAVTQLNLPN VAMYLDAGHA GWLGWPANQD PAAQLFANVY KNASSPRALR GLATNVANYN GWNITSPPSY TQGNAVYNEK LYIHAIGPLL ANHGWSNAFF ITDQGRSGKQ PTGQQQWGDW CNVIGTGFGI RPSANTGDSL LDSFVWVKPG GECDGTSDSS APRFDSHCAL PDALQPAPQA GAWFQAYFVQ LLTNANPSF
SEQ ID NO: 46 (node #48)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG AASSSSSTRA SSTTSRVSST TSRSSSATPP PGSTTTRVPP VGSGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLMEQTL ADIRTANKNG GNPPYAGQFV VYDLPDRDCA ALASNGEYSI ADGGVAKYKN YIDTIRQIVV EYSDIRTLLV IEPDSLANLV TNLGTPKCAN AQSAYLECIN YAVTQLNLPN VAMYLDAGHA GWLGWPANQD PAAQLFANVY KNASSPRALR GLATNVANYN GWNITSPPSY TQGNAVYNEK LYIHAIGPLL ANHGWSNAFF ITDQGRSGKQ PTGQQQWGDW CNVIGTGFGI RPSANTGDSL LDSFVWVKPG GECDGTSDSS APRFDSHCAL PDALQPAPQA GAWFQAYFVQ LLTNANPSF
SEQ ID NO: 47 (node #49)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG AASSSSSTRA ASTTSRVSPT TSRSSSATPP PGSTTTRVPP VGSGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLMEQTL ADIRTANKNG GNPPYAGQFV VYDLPDRDCA ALASNGEYSI ADGGVAKYKN YIDTIRQIVV EYSDIRTLLV IEPDSLANLV TNLGTPKCAN AQSAYLECIN YAVTQLNLPN VAMYLDAGHA GWLGWPANQD PAAQLFANVY KNASSPRALR GLATNVANYN GWNITSPPSY TQGNAVYNEK LYIHAIGPLL ANHGWSNAFF ITDQGRSGKQ PTGQQQWGDW CNVIGTGFGI RPSANTGDSL LDSFVWVKPG GECDGTSDSS APRFDSHCAL PDALQPAPQA GAWFQAYFVQ LLTNANPSF
SEQ ID NO: 48 (node #50)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG AASSSSSTRA ASTTSRVSPT TSRSSSATPP PGSTTTRVPP VGSGTATYSG NPFVGVTPWA NAYYASEVSS LAIPSLTDGA MATAAAAVAK VPSFMWLDTL DKTPLMEQTL ADIRTANKNG GNPPYAGQFV VYDLPDRDCA ALASNGEYSI ADGGVAKYKN YIDTIRQIVV EYSDIRTLLV IEPDSLANLV TNLGTPKCAN AQSAYLECIN YAVTQLNLPN VAMYLDAGHA GWLGWPANQD PAAQLFANVY KNASSPRALR GLATNVANYN GWNITSPPSY TQGNAVYNEK LYIHAIGPLL ANHGWSNAFF ITDQGRSGKQ PTGQQQWGDW CNVIGTGFGI RPSANTGDSL LDSFVWVKPG GECDGTSDSS APRFDSHCAL PDALQPAPQA GAWFQAYFVQ LLTNANPSF
SEQ ID NO: 49 (node #51)
QACASQWGQC GGQGWSGPTC CASGSTCVVS NPYYSQCLPG SATTSSSSTA SSTTSSVRST TTSTSTTTPP TGSPTTTTPP APSGGATYTG NPFAGVNLWA NAYYASEVSS LAIPSLSDGA LATAAAKVAK VPTFQWMDTA AKVPLMDGTL ADIRRANKAG GNPPYAGQFV VYNLPDRDCA AAASNGELSI ADGGVAKYKA YIDAIRAMLV KYSDIRIILV IEPDSLANMV TNMGVPKCAN AQAAYLECTN YAVTQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KDAGKPKALR GLATNVSNYN AWNITSPPSY TQGNPNYDEK HYIEALAPLL SSNGWPDAKF IVDQGRSGKQ PTGQQEWGDW CNAIGTGFGV RPTANTGSSL VDAFVWVKPG GESDGTSDTT AARYDHHCGS ADALKPAPEA GTWFQAYFEQ LLTNANPSF
SEQ ID NO: 50 (node #52)
QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG PATTAASSTR PATTSSVRST TPPTTTVAPP PGPPTTTAPP APPGGATYTG NPFAGVNQWA NAYYRSEVSS LAVPSLSDGP LATAAAKVAD VPTFQWMDTT AKVPLIDGAL ADIRRANAAG GNPPYAGIFV VYNLPDRDCA AAASNGELSI ANDGINKYKA YIDSIRAVLL KYNDIRTLLV IEPDSLANMV TNMGVAKCSN AAAAYKECTK YAVQQLDLPH VAQYLDAGHA GWLGWPANIG PAATIFTDIY KEAGKPKSLR GLATNVSNYN AWNATSPAPY TSPNPNYDEK HYVDAFAPLL RQNGWPDAKF IIDQGRSGKQ PTGQQEWGHW CNALGTGFGL RPTSNTGHPD VDAFVWVKPG GEADGTSDTT AVRYDHFCGS ASSMKPAPEA GTWFQAYFEQ LLRNANPSF
SEQ ID NO: 51 (node #53)
QNCGSVWSQC GGIGWSGPTC CASGSTCVEQ NPYYSQCLPG STTTSTTTRS STTSSSTSST TTSTSTTTSP TTSPTTTIPG GASSTASYSG NPFSGVQLWA NDYYASEVHS LAIPSMTDGA LAAKASAVAK VPSFQWLDRN VTVDLMAGTL SEIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ANNGAANYKT YIDAIRALLI EYSDIRIILV IEPDSLANMV TNMNVAKCAN AESTYRELTV YALKQLNLPN VAMYLDAGHA GWLGWPANIQ PAATLFAKIY KNAGKPAAVR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYIQALAPLL SSNGFPPAHF IVDTGRNGKQ PTGQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GECDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LLTNANPPF
SEQ ID NO: 52 (node #54)
QNCGSVWSQC GGIGWSGPTC CASGSTCVEQ NPYYSQCLPG STTTSTTTRS STTSSSTSSS TTSTSTTTSP PTSPTTTIPG GASSTASYSG NPFSGVQLWA NDYYASEVHN LAIPSMTDGA LAAKASAVAK VPSFQWLDRN VTVDLMAGTL SEIRAANKAG ANPPYAAQFV VYDLPDRDCA AAASNGEFSI ANNGAANYKT YIDAIRKLLI EYSDIRIILV IEPDSLANMV TNMNVAKCAN AASTYRELTV YALKQLNLPH VAMYLDAGHA GWLGWPANIQ PAATLFAKIY KDAGKPAAVR GLATNVANYN AWSIASPPSY TSPNPNYDEK HYIEAFAPLL NSAGFPPAQF IVDTGRNGKQ PTGQQEWGDW CNVIGTGFGV RPTTNTGSSL VDAFVWVKPG GESDGTSDTS AARYDYHCGL SDALQPAPEA GQWFQAYFEQ LLTNANPPF
SEQ ID NO: 53 (node #55)
QNCGSVWSQC GGIGWSGPTC CASGSTCVEQ NDWYSQCLPG STTTSTTTRS STTSSSTSSS TTSTSTTTSP PTSPTTTIPG GASSTASYSG NPFSGVQLWA NDYYRSEVHN LAIPSMTDGA LAAKASAVAE VPSFQWLDRN VTVDLMVGTL SEIRAANKAG ANPPYAAQFV VYDLPDRDCA AAASNGEFSI ANNGAANYKT YIDAIRKLLI EYSDIRIILV IEPDSLANMV TNMNVAKCSN AASTYRELTV YALKQLNLPH VAMYLDAGHA GWLGWPANIQ PAAELFAKIY KDAGKPAAVR GLATNVANYN AWSIASPPSY TSPNPNYDEK HYIEAFAPLL NSAGFPPAQF IVDTGRNGKQ PTGQQEWGDW CNVIGTGFGV RPTANTGHEL VDAFVWVKPG GESDGTSDTS AARYDYHCGL SDALQPAPEA GQWFQAYFEQ LLTNANPPF
SEQ ID NO: 54 (node #56)
QNCSSVWGQC GGIGYNGPTC CQSGSTCVEQ NDWYSQCLPG STTTSTTSTS STSTSSTSSS TTSTSTTTSP PTSPTTTIPG GASSTASYNG NPFSGVQLWA NNYYRSEVHT LAIPSMTDPA LAAKASAVAE VPSFQWLDRN VTVDLLVGTL SEIRAANQAG ANPPYAAQFV VYDLPDRDCA AAASNGEWSI ANNGANNYKR YIDRIRELLI QYSDIRTILV IEPDSLANMV TNMNVAKCSN AASTYRELTV YALKQLNLPH VAMYMDAGHA GWLGWPANIQ PAAELFAKIY KDAGKPAAVR GLATNVANYN AWSIASPPSY TSPNPNYDEK HYIEAFAPLL NSQGFPPAQF IVDTGRNGKQ PTGQQEWGHW CNVIGTGFGV RPTANTGHEL VDAFVWVKPG GESDGTSDTS AARYDYHCGL SDALQPAPEA GQWFQAYFEQ LLTNANPPF
SEQ ID NO: 55 (node #57)
QAQQSVWGQC GGQGWSGPTS CAAGSTCSTQ NPYYAQCIPG STATSTTTTS TTTTTSTSTT TTTTTTTTPP TTTPTTTTPP AATTTASASG NPFSGYQLYA NPYYASEVHS LAIPSLTDGS LAAKASAVAK VPSFVWLDTA AKVPTMGTYL ADIQAANKAG ANPPIAGIFV VYDLPDRDCA AAASNGEYSI ANNGVANYKA YIDSIRAQLV KYSDVHTILV IEPDSLANMV TNLNVAKCAN AQSAYLECVN YALKQLNLPN VAMYLDAGHA GWLGWPANLS PAAQLFAKVY KNAGSPAAVR GLATNVANYN AWSISSPPSY TQGDPNYDEK HYINALAPLL TSNGFPDAHF IMDTSRNGVQ PTKQQAWGDW CNVIGTGFGV RPTTNTGDPL EDAFVWVKPG GESDGTSNTS APRYDYHCGY SDALQPAPEA GTWFQAYFEQ LLTNANPPF
SEQ ID NO: 56 (node #58)
EAQQSVWGQC GGQGWSGATS CAAGSTCSTQ NPYYAQCIPG STATSTTLVT TTSTTSVGTT SSTTTTTTSP TTTTTTTTST TATTTAAASG NPFSGYQLYA NPYYSSEVHT LAIPSLTDGS LAAAATKAAE IPSFVWLDTA AKVPTMGTYL ANIEAANKAG ASPPIAGIFV VYDLPDRDCA AAASNGEYTV ANNGVANYKA YIDSIVAQLK AYPDVHTILI IEPDSLANMV TNLSTAKCAE AQSAYYECVN YALINLNLPN VAMYIDAGHA GWLGWSANLS PAAQLFATVY KNASSPAALR GLATNVANYN AWSISSPPSY TSGDSNYDEK LYINALSPLL TSNGWPNAHF IMDTSRNGVQ PTKQQAWGDW CNVIGTGFGV QPTTNTGDPL EDAFVWVKPG GESDGTSNSS ATRYDYHCGY SDALQPAPEA GTWFQAYFVQ LLTNANPPL
SEQ ID NO: 57 (node #59)
QAQQSVWGQC GGQGWSGPTS CAAGSTCSTQ NPYYAQCIPG STATSTTTTS TTTTTTTSTT TTTTTTTTPP TTTPTTTTPP AATTTASASG NPFSGYQLYA NPYYASEVHS LAIPSLTDSS LAPKASAVAK VPSFVWLDTA AKVPTMGTYL ADIQAKNKAG ANPPIAGIFV VYDLPDRDCA AAASNGEYSI ANNGVANYKA YIDSIRAQLV KYSDVHTILV IEPDSLANMV TNLNVAKCAN AQSAYLECVN YALKQLNLPN VAMYLDAGHA GWLGWPANLS PAAQLFAKVY KNAGSPAAVR GLATNVANYN AWSISSCPSY TQGDPNCDEK HYINALAPLL KENGFPDAHF IMDTSRNGVQ PTKQQAWGDW CNVIGTGFGV RPTTNTGDPL EDAFVWVKPG GESDGTSNTS SPRYDYHCGY SDALQPAPEA GTWFQAYFEQ LLTNANPPF
SEQ ID NO: 58 (node #60)
QAQQTVWGQC GGQGWSGPTS CVAGSTCSTQ NPYYAQCIPG STATSTTTTS TTTTTTTSTT TTTTTTTTPP TTGPTTTAPP AATTTASASG NPFSGYQLYA NPYYASEVHS LAIPSLTDSS LAPKASAVAK VPSFVWLDTA AKVPTMGTYL ADIQAKNKAG ANPPIAGIFV VYDLPDRDCA ALASNGEYSI ANNGVANYKA YIDSIRAQLV KYSDVHTILV IEPDSLANLV TNLNVAKCAN AQSAYLECVD YALKQLNLPN VAMYLDAGHA GWLGWPANLG PAAQLFAKVY KNAGSPAAVR GLATNVANYN AWSISTCPSY TQGDPNCDEK RYINALAPLL KENGFPDAHF IMDTSRNGVQ PTKQQAWGDW CNVIGTGFGV RPTTNTGDPL QDAFVWVKPG GESDGTSNTS SPRYDAHCGY SDALQPAPEA GTWFQAYFEQ LLTNANPSF
SEQ ID NO: 59 (node #61)
QAQQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG STATSATATS TTTTTTTSTT TTTQTTTKPT TTGPTTSAPT AATTTVTASG NPFSGYQLYA NPYYSSEVHT LAMPSLPDSS LQPKASAVAE VPSFVWLDVA AKVPTMGTYL ADIQAKNKAG ANPPIAGIFV VYDLPDRDCA ALASNGEYSI ANNGVANYKA YIDAIRAQLV KYSDVHTILV IEPDSLANLV TNLNVAKCAN AQSAYLECVD YALKQLNLPN VAMYLDAGHA GWLGWPANLG PAATLFAKVY TDAGSPAAVR GLATNVANYN AWSLSTCPSY TQGDPNCDEK KYINAMAPLL KEAGFPDAHF IMDTSRNGVQ PTKQNAWGDW CNVIGTGFGV RPSTNTGDPL QDAFVWIKPG GESDGTSNSS SPRYDAHCGY SDALQPAPEA GTWFQAYFEQ LLTNANPSF
Appendix 3
SEQ ID NO: 60 (DiCA: node #33) gctagcAAQAPVWGQC GGTGWSGPTC CASGSTCVVQ NPYYSQCLPG STTSSTTTTS TTTSSSTSST TTTTSTTTPP TTSPTTTTPP PASTTTPASG NPFAGYQLYL NPYYAAEVAS LAIPNITDPA LKAKAASVAK IPTFVWLDTV AKVPDLGTYL ADASALQKAS GQPPYAVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKA YIDSIRAQLV KYSDVPJVAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNLPN VYMYLDAGHA GWLGWPANLS PAAQLFAQVY KNAGSPAAVR GLATNVANYN ALSAASPDSY TQGDPNYDEI HYINALAPML SSQGFPDAHF IVDTGRNGVQ NIRQQEWGDW CNVKGAGFGT RPTTNTGSSL IDAFVWVKPG GESDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LVKNANPPLggtacc
SEQ ID NO: 61 (BasCA node #34) gctagcAAQAPVWGQC GGTGWTGPTC CASGSTCVVQ NPYYSQCLPG STTSSTTTTS TTTSSPTSST TTTTSTTTPP TTSPTTTTPP PGSTSTPASG NPFAGYQLYL SPYYAAEVAA LAAPNITDPA LKAKAASVAN IPTFTWFDTV AKVPDLGTYL ADASALQKSS GQKPYAVQIV VYDLPDRDCA AAASNGEFSI ANNGMANYKT YIDSIAAQLK KYSDVRVVAV IEPDSLANMV TNLNVAKCAN AQTAYKEGVT YALKQLNLVG VYMYLDAGHA GWLGWPANLS PAAQLFAQLY KNAGSPSFVR GLATNVANYN ALSAASPDSY TQGNPNYDEI HYINALAPML SSQGFPPAHF IVDQGRSGVQ NIRQQQWGDW CNVKGAGFGT RPTTNTGSSL IDAFVWVKPG GESDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFET LVKNANPPLggtacc
SEQ ID NO: 62 (AsCA node #41) gctagcQAQASVWGQC GGQGWSGPTC CASGSTCVVQ NPYYSQCLPG STTTSTTTTS TTTTSSTSST TTTTSTTTPP TTSPTTTTPP AASTTASASG NPFSGYQLYA NPYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFVWLDTA AKVPTMGTYL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ANNGVANYKA YIDSIRAQLV KYSDVRIILV IEPDSLANMV TNLNVAKCAN AQSAYLECVN YALKQLNLPN VAMYLDAGHA GWLGWPANLS PAAQLFAKVY KNAGSPAAVR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYINALAPLL SSNGFPDAHF IVDTGRNGVQ PTRQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GESDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LLTNANPPFggtacc
SEQ ID NO: 63 (SorCA node #42) gctagcQACASVWGQC GGQGWSGPTC CASGSTCVVQ NPYYSQCLPG STTTSTTTRS STTTSSTSST TTSTSTTTPP TTSPTTTTPP AASSTASYSG NPFSGVQLWA NAYYASEVHS LAIPSLTDGA LAAKASAVAK VPSFQWLDTA AKVPLMAGTL ADIRAANKAG ANPPYAGQFV VYDLPDRDCA AAASNGEFSI ANNGVANYKA YIDAIRAQLV EYSDIRIILV IEPDSLANMV TNMNVAKCAN AQSAYLECTN YALKQLNLPN VAMYLDAGHA GWLGWPANLQ PAATLFAKVY KNAGKPAAVR GLATNVANYN AWSIASPPSY TQGDPNYDEK HYIQALAPLL SSNGFPDAHF IVDTGRNGKQ PTGQQEWGDW CNVIGTGFGV RPTTNTGSSL EDAFVWVKPG GECDGTSDTS APRYDYHCGL SDALQPAPEA GTWFQAYFEQ LLTNANPPFggtacc
Appendix 4
SEQ ID NO: 64 (tr_A9FHT2_Bacteria_Sorangium_cellulosum)
AC GDGG-GGDTS GTGGGSSGVG APTSSGVGAG TPTSSGNVDP TTTSSGNVDP
TTTSSPTTTS SGNVDPTTSA ASGGGNCSPA VSGPFADAHL FVDPGYVKKV DS-SIAQVTD TALKAKMEKV KQIQTAFWLD RIEAIKELPA YLDAALKLQN ELCEP-VTAL IWYDLPNRD CFAEASNGEL HLDQNGTQRY REYIAPIKQI LAAHSGQRIA AVIEPDSLPN IATNLGGKRC DETTASYRDN VAHTLKELNM PHVYQYIDAA HSGWLGWPDN QKKGAKIFAE VIKAAGSPAN VRGFATNVAN YTQLSYTAES YDQQDNPCFG EFDYVDAMAS ALSAEGLGDK HFI IDTSRNG VGNI-REDWG YWCNNKGAGM GQRPKANGGA TNLDAFVWVK PPGDSDGVGQ EGQPRYDLFC GKE-NADTRA PQAGQWFHEY FVECVKNANP AL
SEQ ID NO: 65 (tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus)
NDCNVDWGVL NGQEWIDKNR CNGGGYCKFE SLGYP-CCNG C DV YYTDND—GR
WGVENKCNGY QQPRTTTTTR TTTRTTTTQR PSDFFEN-TL YSNFKFQGEV QS-SIQKLS-
GDMAKKAEKV KYVPTAVWL- AWEGAPRVPQ YLDDAGS KTW FVLYMIPTRD
CNANASVG— —GSATLEKY KGYI DNIYNT FNQYPNSKIV MILEPDTIGN LVTA-NNANC
MNVQNLHKQG LAYAISKFGQ KNVRVYLDAA HGAWL—SSH ADKTAQVIKE ILNNAGS-GK
LRGITTNVSN YQTVN D EYSYQMRLNS ALQNLGVRDL HYI IDTSRNG
ANIAQQNQSG TWCNFKGAGL GARPQANSSK PLLDAYMWIK TPGEADGSS- -SGSRADPVC GRW-DSLQGA PDAGSWFHDY FVMLLQNANP PF
SEQ ID NO: 66 (tr_Q96V98_Neocallimasti_Orpinomyces)
L GYPCC SSSDYVDSDG VENGNWCGIP DPT CWAE RPCCTTTTTV EYTDSD—GK
WGVENPVPTK TQGPTPTSGS DP-TPGSQLT LSGPFSGVEF FLNPYYVAEV DA-AIAQMSN
SSLKAKAEKM KTYSNAIWLD TIKNMQQLET NLKGALAQ-Q TGSKK-VLTV FWYDLPGRD
CHALASNGEL LANDSDAQRY KTYIDVIEEK LKYYKSQPVV LIIEPDSLAN LVTNLNTPAC
RDSEQYYLDG HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAAKVYSK VI-SSGSPGK
VRGFTDNVAN YTPWEDPSRG PDTEWNPCPD EKRYLEAMHK DFKAAGISSV YFVSDTSRNG
HK-TDRKHPG EWCNQTGVGI GARPQANSSM DYLDAFYWIK PLGESDGTSD TSAARYDGYC
GHE-TAMKPA PEAGQWFQKH FEQGLENANP PL
SEQ ID NO: 67 (tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum)
M GYPCC SGNEYTDDDG VENGNWCGIA DPVYESCWSE SCCSSPNAEV WYTDES—GK
WGVENGIPKG TPTPI DDE— PYE I SGPFKGVEF YINPYYVDEV DG-AIAQMTD
SSLIAKAEKM KTFSNAIWLD TIKNMQSLET NLQGAQSQHQ SSGKD-ILTV FWYDLPGRD CHALASNGEL LANDGDLARY KSYI DVIEGH LKKYNTQPVV LIVEPDSLAN LVTNLSTPAC
ADSEKYYLEG HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAGKVYAK VI-SSGSPGK
VRGFTDNVSN YTPWEDPSRG PETEWNPCPD EKRYLEAMHK DFKAAGIQSV YFVCDTSRNG
KK-VDRKHPG EWCNQTGVGV GARPKASSGM DYLDAFYWIK PLGESDGTSD ENAVRYDGYC
GHE-TAMKPA PEAGQWFQKH FEQGIKNANP PL
SEQ ID NO: 68 (tr_Q2U2I8_Ascomycota_Aspergillus_oryzae)
SASWHHL TDSSFTDRVC CISDGDQQSL KPVTTVPSPE FQSSVNSKQA LVALSP--LL
FSAATALPQA SVTPSPSS-S VPSGPAPTAT AGGPFEGYDL YVNPYYKSEV ESLAIPSMT-
GSLAEKASAA ANVPSFHWLD TTDKVPQMGE FLEDIKTKNA AGANPPTAGI FWYDLPDRD
CAALASNGEF LISDGGVEKY KAYIDSIREQ VEKYSDTQI I LVIEPDSLAN LVTNLNVQKC
A AQDAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN IGPAAELYAS VYKNASSPAA
VRGLATNVAN YNAFSIDSCP SYTQGSTVCD EKTYINNFAP QLKSAGF-DA HFIVDTGRNG
NQPTGQSQWG DWCNVKNTGF GVRPTTDTGD ELVDAFVWVK PGGESDGTSD TSAERYDAHC
GYA-DALTPA PEAGTWFQAY FEQLVENANP SL
SEQ ID NO: 69 (tr_G4MM92_Ascomycota_Magnaporthe_oryzae)
QACAAQWGQC GGQDYTGPTC CQSGSTCVVS NQWYSQCLPG STTTSRTSTS SSSSTSRTSS
STSRPPTTPT SVPPTITTTT TPSGPGTTAS FTGPFAGVNL FPNKFYSSEV HTLAIPSLT-
GSLVAKASAV AQVPSFQWLD IAAKVETMPG ALADVRAANA AGGN--YAAQ LVVYDLPDRD
CAAAASNGEF SIADGGVVKY KAYI DAIRKQ LLAYSDVRTI LVIEPDSLAN MVTNMGVPKC
AGAKDAYLEC TIYAVKQLNL PHVAMYLDGG HAGWLGWPAN LQPAADLFGK LYADAGKPSQ
LRGMATNVAN YNAWDLTTAP SYTTPNPNFD EKKYI SAFAP LLAAKGW-SA HFI IDQGRSG
KQPTGQKEWG HWCNQQGVGF GRRPSANTGS ELADAFVWIK PGGECDGVSD PTAPRFDHFC
GTDYGAMSDA PQAGQWFQKY FEMLLTNANP PL
SEQ ID NO: 70 (tr_Q2GMP2_Ascomycota_Chaetomium_globosum)
QNCATLWGQC GGNGWNGATC CASGSTCTKQ NDWYSQCLPG GGTTTKPTST STSTSTSSRS
TSTSQVSSST SSPPVVTNTS IPGGASSTAS YTGPFSGVQM WANDYYRSEV HTLAMPSLT-
GAMATKAAKV AEVPSYQWMD RNVTVDTFSG TLAQIRAANQ AGASPPYAGI FWYDLPDRD
CAAAASNGEW SIANGGAANY KAYIKRIREL I IQYSDIRML LVIEPDSLAN MVTNMGVAKC
AGAASTYKEL TIHALKELNL PNVAMYLDAG HAGWLGWPAN IQPAADLFAT LYKDAGRPAA
VRGLATNVAN YNAWSVSSAP AYTSPNPNYD EKHYVEAFSP LLTAAGF-PA HFITDTGRSG
KQPTGQLEWG HWCNAVGTGF GQRPSANTGH DLLDAFVWIK PGGECDGTSD TTAARYDHNC
GLA-DALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 71 (tr_B2ABX7_Ascomycota_Podospora_anserina)
QNCGSVWSQC GGQGWTGATC CASGSTCVAQ NQWYSQCLPG STTAQAPSST RTTTSSSSRP
TSSSINVPTT TTSAGASVTV PPGGASSTAS YSGPFLGVQQ WANSYYSSEV HTLAIPSLT-
GPMATKAAAV AKVPSFQWMD RNVTVDTFSG TLADIRAANR AGANPPYAGI FWYDLPDRD
CAAAASNGEW AIADGGAAKY KAYI DRIRHH LVQYSDIRTI LVIEPDSLAN MVTNMNVPKC
QGAANTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAELFAG IYKDAGRPTS
LRGLATNVAN YNGWSLSSAP SYTTPNPNFD EKRFVQAFSP LLTAAGF-PA HFITDTGRSG
KQPTGQLEWG HWCNAIGTGF GPRPTTDTGL DIEDAFVWIK PGGECDGTSD TTAARYDHHC
GFA-DALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 72 (tr_Q872J7_Ascomycota_Neurospora_crassa)
QNCGSAWSQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK VSSTTSKVTS
SSAAQPITTT TAPSVPTT-T IAGGASSTAS FTGPFLGVQG WANSYYSSEI YNHAIPSMT-
GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH FWYDLPDRD
CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN LVTNTNVAKC
A AASAYREC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD IYKAAGKPKS
VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGF-PA QFIVDTGRSG
KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD TSATRYDYHC
GLS-DALKPA PEAGQWFQAY FEQLLKNANP AF
SEQ ID NO: 73 (tr_F8MDR2_Ascomycota_Neurospora_tetrasperma)
QNCGSAWTQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK VSSTTTKVTS
SSAAQPITTT TAPSVPTT-T VAGGASSTAS FTGPFLGVQG WANSYYSSEI YNHAIPSMT-
GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH FWYDLPDRD
CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN LVTNTNVAKC
A AASAYKEC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD IYKAAGKPKS
VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGF-PA QFIVDTGRSG
KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD TSATRYDYHC
GLS-DALKPA PEAGQWFQAY FEQLLKNANP AF
SEQ ID NO: 74 (tr_G2QA39_Ascomycota_Thielavia_heterothallica)
QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR STSTSSSTTR
SGSS-SSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV HNLAIPSMT-
GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ LWYDLPDRD
CAAAASNGEF SIANGGAANY RSYI DAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGF-PA RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD TSAARYDYHC
GLS-DALQPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 75 (tr_G2QW39_Ascomycota_Thielavia_terrestris)
QNCGSVWSQC GGIGWSGATC CASGNTCVEL NPYYSQCLPN SSKTTSTTTR SSTTSHSSGP
TSTSTTTTSS PWTTPPSTS IPGGASSTAS WSGPFSGVQM WANDYYASEV SSLAIPSMT-
GAMATKAAEV AKVPSFQWLD RNVTIDTFAH TLSQIRAANQ KGANPPYAGI FWYDLPDRD
CAAAASNGEF S IAN GAANY KTYI DAIRSL VIQYSDIRII FVIEPDSLAN MVTNLNVAKC
ANAESTYKEL TVYALQQLNL PNVAMYLDAG HAGWLGWPAN IQPAANLFAE IYTSAGKPAA
VRGLATNVAN YNGWSLATPP SYTQGDPNYD ESHYVQALAP LLTANGF-PA HFITDTGRNG
KQPTGQRQWG DWCNVIGTGF GVRPTTNTGL DIEDAFVWVK PGGECDGTSN TTSPRYDYHC GLS-DALQPA PEAGTWFQAY FEQLLTNANP PF
SEQ ID NO: 76 (tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis)
QSCGSQWSQC GGIGWGGATC CASGSTCVRQ NDYYFQCIPG SPTTTTSRSG PTTTTRAPGP
STTVTRGTTT TGGNGGTPTS GGGGAGTTAS FTGPFQGVNL WVNDYYASEI STLAIPSLS-
GAMATKAAAV AKVPSFEWFD IAAKVGTMPH TLNAIRAANK AGGN--FAAQ FVVYDLPDRD
CAAAASNGEY SIVDGGVAKY KAYI DS IRAQ LVSFSDIRTI LVVEPDSLAN MVTNLNVPKC
ANAQAAYREC TLYAIKQLNL PNVAMYLDGG HAGWLGWPAN LGPAADLFGK LYVDAGKPSQ
LRGMATNVAN YNSWNLTSAP AYTSPNPNYD ERHYVEAFHP LLAAKGW-NA HFITDQGRSG
KQPTGQLEWG HWCNAMGTGF GMRPSANTGL EIQDAFVWIK PGGECDGTSD TTAARFDRFC GMA-DALKPA PEAGQWFQAY FVQLLTNANP PF
SEQ ID NO: 77 (tr_G2XB72_Ascomycota_Verticillium_dahliae)
QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR PATTSSVRST
TVVNPPTTTV APPPGTTVAP PP GGAT YTGPFAGVNQ WANAYYRSEV SSLAVPSLS-
GPLATAAAKV ADVPTFQWMD TTAKVPLIDG ALADIRRANA AGGN-YAGI FVVYNLPDRD
CAAAASNGEL SIANDGINKY KAYIDSIRAV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC
SNAAAAYKEC TKYAVQQLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGRPKS
LRGLATNVSN YNAWNATSPA PYTSPNPNYD EKHYVDAFAP LLRQNGW-DA KFI IDQGRSG
KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK PGGEADGTSD TTAVRYDHFC GSA-SSMKPA PEAGTWFQAY FEQLLRNANP SF
SEQ ID NO: 78 (tr_E3Q540_Ascomycota_Colletotrichum_graminicola)
QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP
SRTSTVTGSV STTSAGTGTT PP—PTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLS-
PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGAN--YAGE FVVYNLPDRD
CAAAASNGEL SIADGGVAKY KQYI DDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC
AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA
LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGW-DA KFIVDQGRSG
KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC
GKA-DALKPA PEAGTWFQAY FEQLLINANP AF
SEQ ID NO: 79 (tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum)
QSCSSVWGQC GGINYNGPTC CQSGSVCTYL NDWYSQCIPG Q--GTTSTTA RTTSTSTTST
SSVRPTTSNT PVTTAPPTTT IPGGASSTAS YNGPFSGVQL WANTYYSSEV HTLAIPSLS-
PELAAKAAKV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ RGANPPYAGI FWYDLPDRD
CAAAASNGEW S IANNGANNY KRYI DRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVQKC
SNAASTYKEL TVYALKQLNL PHVAMYMDAG HAGWLGWPAN IQPAAELFAQ IYRDAGRPAA
VRGLATNVAN YNAWSIASPP SYTSPNPNYD EKHYIEAFAP LLRNQGF-DA KFIVDTGRNG
KQPTGQLEWG HWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD TSAARYDYHC
GLS-DALTPA PEAGQWFQAY FEQLLINANP PF
SEQ ID NO: 80 (sp_Q9C1S9_Ascomycota_Humicola_insolens)
QNCAPTWGQC GGIGFNGPTC CQSGSTCVKQ NDWYSQCLPG S-TTSTTSTS SSSTTSRATS
TTRTGGVTSI TTAPTRTV-T IPGGATTTAS YNGPFEGVQL WANNYYRSEV HTLAI PQITD
PALRAAASAV AEVPSFQWLD RNVTVDTLVE TLSEIRAANQ AGANPPYAAQ IWYDLPDRD
CAAAASNGEW AIANNGANNY KGYINRIREI LISFSDVRTI LVIEPDSLAN MVTNMNVAKC
SGAASTYREL TIYALKQLDL PHVAMYMDAG HAGWLGWPAN IQPAAELFAK IYEDAGKPRA
VRGLATNVAN YNAWSISSPP PYTSPNPNYD EKHYIEAFRP LLEARGF-PA QFIVDQGRSG
KQPTGQKEWG HWCNAIGTGF GMRPTANTGH QYVDAFVWVK PGGECDGTSD TTAARYDYHC
GLE-DALKPA PEAGQWFQAY FEQLLRNANP PF
SEQ ID NO: 81 (sp_P07987_Ascomycota_Hypocrea_jecorina)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG -AASSSSSTR AASTTS--RV
SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLT-
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGN--YAGQ FVVYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALP-DALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 82 (tr_Q7LSP2_Ascomycota_Trichoderma_koningii)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG -AASSSSSTR AASTTS--RV
SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLT-
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGG —YAGQ FWYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGR LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC
ALP-DALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 83 (tr_H9C5T1_Ascomycota_Hypocrea_orientalis)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG -AASSSSSTR ASSTTA—RA SSTT-SRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLT- GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ TLADIRTANK NGGN—YAGQ FWYDLPDRD CAALASNGEY SIADGGVDKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK PGGECDGTSD SSAPRFDSHC ALP-DALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 84 (tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG -AASSSSSTR ASSTTARASS
TT SRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLT-
GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ TLADIRTANK NGGN—YAGQ FWYDLPDRD CAALASNGEY SIADGGVDKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK PGGECDGTSD SSAPRFDSHC ALP-DALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 85 (tr_G9NFV6_Ascomycota_Hypocrea_atroviridis)
QACASVWGQC GGQGWSGATC CASGSSCVVS NPYYSQCLPG S SS SSTLASSTRA
SSTTVRSSST TPPPSSST-P PPPVGSGTAT YQGPFSGINP WANSFYAQEV SSSAIPSLS-
GAMATAAAAA AKVPSFMWLD TLSKTSLLSS TLSDIRAANK AGG —YAGQ FWYDLPDRD
CAAAASNGEY SIADNGVANY KNYIDTIVGI LKTYSDIRTI LVIEPDSLAN LVTNLSVAKC
SNAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAS VYKNASSPRA
VRGLATNVAN YNGWNITSAP SYTQGNSVYN EQLYIHAISP LLTQQGWSNT YFITDQGRSG
KQPTGQQAWG DWCNVIGTGF GIRPSANTGD SLLDAFTWIK PGGECDGTSN TSATRYDYHC GLS-DALQPA PEAGSWFQAY FVQLLTNANP SF
SEQ ID NO: 86 (tr_I1RIJ1_Ascomycota_Gibberella_zeae)
QSCSNVWSQC GGQNWSGTPC CTSGNKCVKV NDFYSQCQPG S—SPTSTIV SATTTK ATTTG SGGSVTSP-- PPV ATNPFSGVDL WANNYYRSEV STLAIPKLS-
GAMATAAAKV ADVPSFQWMD TYDHISFMEE SLADIRKANK AGGN--YAGQ FVVYDLPDRD
CAAAASNGEY SLDKDGKNKY KAYIAKIKGI LQDYSDTRI I LVIEPDSLAN MVTNMNVPKC
ANAASAYKEL TIHALKELNL PNVSMYIDAG HGGWLGWPAN LPPAAQLYGQ LYKDAGKPSR
LRGLVTNVSN YNAWKLSSKP DYTESNPNYD EQKYIHALSP LLEQEGWPGA KFIVDQGRSG
KQPTGQKAWG DWCNAPGTGF GLRPSANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GID-GAVKPA PEAGTWFQAY FEQLLKNANP SF
SEQ ID NO: 87 (tr_Q8NIB5_Ascomycota_Talaromyces_emersonii)
QSLWGQC GGSSWTGATS CAAGATCSTI NPYYAQCVPA T-TTLTTTTK PTSTGG AAPTT PPPTTTGTTT SP-VVTRPAS ASGPFEGYQL YANPYYASEV ISLAIPSLS-
SELVPKASEV AKVPSFVWLD QAAKVPSMGD YLKDIQSQNA AGADPPIAGI FWYDLPDRD
CAAAASNGEF SIANNGVALY KQYIDSIREQ LTTYSDVHTI LVIEPDSLAN VVTNLNVPKC
ANAQDAYLEC INYAITQLDL PNVAMYLDAG HAGWLGWQAN LAPAAQLFAS VYKNASSPAS
VRGLATNVAN YNAWSISRCP SYTQGDANCD EEDYVNALGP LFQEQGF-PA YFI IDTSRNG
VRPTKQSQWG DWCNVIGTGF GVRPTTDTGN PLEDAFVWVK PGGESDGTSN TTSPRYDYHC
GLS-DALQPA PEAGTWFQAY FEQLLTNANP LF
SEQ ID NO: 88 (tr_G2XV25_Ascomycota_Botryotinia_fuckeliana)
GAAYAQC GGQGWSGATT CVSGYTCVVN NAYYSQCLPG SAVTTTATTA PTATTPTTII
TSTT-KATTT TGGSSATT-- TAA VAGPFSGKAL YANPYYASEI SASAIPSLT-
GAMATKAAAV AKVPTFYWLD TAAKVPLMGT YLANIRALNK AGANPPVAGT FWYDLPDRD
CAAAASNGEY SIADGGLVKY KAYIDSIVAL LKTYSDVSVI LVIEPDSLAN LVTNLSVAKC
SNAQAAYLEG TEYAIAQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFGQ IYKAAGSPAA VRGLATNVAN YNAWTSTTCP SYTSGDSNCN EKLYINALAP LLTAQGF-PA HFIMDTSRNG
VQPTAQQAWG DWCNLIGTGF GVRPTTNTGD ALEDAFVWIK PGGEGDGTSD TTAARYDFHC
GLA-DALKPA PEAGTWFQAY FAQLLTNANP SF
SEQ ID NO: 89 (tr_B6QMM6_Ascomycota_Penicillium_marneffei)
QSVWGQC GGQGYTGATS CAAGSTCSTQ NPYYAQCIPA TATSTTL VKTTSSTSVG
TTSAPTTTTT KATTTKASTT AT TAA ASGPFSGYQL YANPYYSSEV HTLAIPSLT-
GTLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGATPPIAGI FWYDLPDRD
CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAHPDVHTI LI IEPDSLAN MVTNLSTAKC
TEAQPAYYEC VNYALINLNL PNVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASSPAA
LRGLVTNVAN YNAWSISSAP SYTSGDSNYD EQLYVNALSP LLTSNGWPNA HFIMDTSRNG
VQPTQQKAWG DWCNLIGTGF GVAPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC
GNS-DSLQPA PEAGSWFQAY FVQLLTNANP PL
SEQ ID NO: 90 (tr_G7XQ80_Ascomycota_Aspergillus_kawachii)
QTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA T-SATTLKTT TSTTTA AMTTT TATSSPAASA SP TTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLT-
GSLQAAATAA AKVPSFVWLD TADKVPTMAD YLADIKSQNS AGASPPIAGQ FWYDLPDRD
CAALASNGEY SIADNGVEHY KAYI DS IREV LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC
ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA
VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLEAQGF-DA HFIVDTGRNG
KQPTGQQAWG DWCNVINTGF GVRPTTSTGD DLVDAFVWVK PGGESDGTSD SSATRYDAHC
GYS-DALQPA PEAGTWFQAY FVQLLTNANP AF
SEQ ID NO: 91 (sp_A2QYR9_Ascomycota_Aspergillus_niger)
QTLWGQC GGQGYSGATS CVAGATCATV NEYYAQCTPA -AGTSSATTL KTTTSS
TTAAVTTTTT TQSPTGSA-- SPTTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLT-
GSLQAAATAA AKVPSFVWLD TAAKVPTMGD YLADIQSQNA AGANPPIAGQ FWYDLPDRD
CAALASNGEY SIADNGVEHY KSYIDSIREI LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC
ANAESAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA
VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLQAQGF-DA HFIVDTGRNG
KQPTGQQAWG DWCNVINTGF GERPTTDTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC
GYS-DALQPA PEAGTWFQAY FVQLLTNANP AF
SEQ ID NO: 92 (sp_P46236_Ascomycota_Fusarium_oxysporum)
QSCSNVWAQC GGQNWSGTPC CTSGNKCVKL NDFYSQCQPG SAEPSST AAGPSS—TT
ATKTTATGGS STTAGGSVTS AP PA ASDPYAGVDL WANNYYRSEV MNLAVPKLS-
GAKATAAAKV ADVPSFQWMD TYDHISLMED TLADIRKANK AGGK--YAGQ FVVYDLPNRD
CAAAASNGEY SLDKDGANKY KAYIAKIKGI LQNYSDTKVI LVIEPDSLAN LVTNLNVDKC
AKAESAYKEL TVYAIKELNL PNVSMYLDAG HGGWLGWPAN IGPAAKLYAQ IYKDAGKPSR
VRGLVTNVSN YNGWKLSTKP DYTESNPNYD EQRYINAFAP LLAQEGWSNV KFIVDQGRSG
KQPTGQKAQG DWCNAKGTGF GLRPSTNTGD ALADAFVWVK PGGESDGTSD TSAARYDYHC
GLD-DALKPA PEAGTWFQAY FEQLLDNANP SF
SEQ ID NO: 93 (sp_Q0CFP1_Ascomycota_Aspergillus_terreus)
QTLWGQC GGIGWTGPTN CVAGAACSTQ NPYYAQCLPG TSTTLTTTTR VTTTTTSTTS
KSSSTGSTTT TKSTGTTTTS GS-STTITSA PSGPFSGYQL YANPYYSSEV HTLAMPSLA-
SSLLPAASAA AKVPSFTWLD TAAKVPTMGT YLADIKAKNA AGANPPIAAQ FWYDLPDRD
CAALASNGEY SIANGGVANY KKYI DAIRAQ LLNYPDVHTI LVIEPDSLAN LVTNLNVAKC
ANAQSAYLEC VNYALIQLNL PNVAMYIDAG HAGWLGWPAN IGPAAQLFAG VYKDAGAPAA
LRGLATNVAN YNAFSISTCP SYTSGDANCD ENRYINAIAP LLKDQGW-DA HFIVDTGRNG
VQPTKQNAWG DWCNVIGTGF GVRPTTNTGN SLVDAFVWVK PGGESDGTSD SSSARYDAHC
GYS-DALQPA PEAGTWFQAY FEQLLKNANP AF
SEQ ID NO: 94 (tr_F2VRZ0_Ascomycota_Phialophora_sp)
QNCASEWGQC GGTGFTGASC CASGSTCTQQ NEYYSQCVPG S TG QIASTPAATV
VGSATSSPSQ MTAPAASA— SGTTS YSGPFEGVQM WANAYYASEV LNLAVPSLS-
GDMVAKASAV AKVPSFQWLD TAAKVPTMAD TLADIAKANQ AGASPAYAGL FWYDLPDRD CAAAASNGEY SIADNGVANY KAYI DAIKAQ LVANSDTRIL LVVEPDSLAN LVTNMNVAKC ANAHDAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWSAN LQPAATLFAN VYSNAGKPAS LRGLATNVAN YNAWTIASAP SYTQGDSNYD EKLYVQALSP LLSSAGW-DA HFITDQSRSG KQPTGQNAWG DWCNVIGTGF GTRPTTDTGL DIEDALVWVK PGGECDGTSN TTAARYDYHC GLS-DALQPS PEAGTWFQAY FVQLLTNANP AF
SEQ ID NO: 95 (tr_F1CHI2_Ascomycota_Penicillium_decumbens)
QTVWGQC GGIGYSGPTS CVAGSSCSTQ NSYYAQCLPG SGGGAATTTT TAGQTTKTTM
ATTTTTSTKT SAGSGGSTTT AP PAS NSGPFKGYQP YVNPYYASEV QSLAIPSLA-
ASLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYI DS IRAQ LLKYPDVHTI LVIEPDSLAN LVTNMNVAKC SGAHDAYLEC TDYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAADLFAS VYKNAGSPAA
VRGLATNVAN YNAWSISTCP SYTQGDQNCD EKRYINALAP LLRANGF-DA HFIMDTSRNG
VQPTKQQAWG DWCNVIGTGF GTPFTTDTGD ALQDAFIWVK PGGECDGTSD TSSPRYDAHC
GYS-DALKPA PEAGTWFQAY FEQLLVNANP SF
SEQ ID NO: 96 (sp_A1CCN4_Ascomycota_Aspergillus_clavatus)
QTMWGQC GGAGWSGATD CVAGGVCSTQ NAYYAQCLPG ATTATTL STTSKG--TT
TTTTSSTTST GGGSSSTTTS TSAGPTVTGS PSGPFSGYQQ YANPYYSSEV HTLAIPSMT-
GALAVKASAV ADVPSFVWLD VAAKVPTMGT YLENIRAKNK AGANPPVAGI FWYDLPDRD
CAALASNGEY AIADGGIAKY KAYI DAIRAQ LLKYPDVHTI LVIEPDSLAN LITNINVAKC
SGAKDAYLEC INYALKQLNL PNVAMYIDAG HGGWLGWDAN IGPAAEMYAK VYKDADAPAA
LRGLAVNVAN YNAWTIDTCP SYTQGNKNCD EKRYIHALYP LLKAAGW-DA RFIMDTGRNG
VQPTKQQAQG DWCNVIGTGF GIRPSSETGD DLLDAFVWVK PGAESDGTSD TTAARYDAHC
GYT-DALKPA PEAGQWFQAY FEQLLTNANP AF
SEQ ID NO: 97 (tr_O93837_Ascomycota_Acremonium_cellulolyticus)
QSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATSTTL VKTTSS TSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLT-
GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD
CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LI IEPDSLAN MVTNLSTAKC
AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS
LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG
VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC
GYS-DALQPA PEAGTWFQAY FVQLLTNANP AL
SEQ ID NO: 98 (tr_B5TMG4_Ascomycota_Penicillium_funiculosum)
QSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATSTTL VKTTSS TSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLT-
GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD
CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LI IEPDSLAN MVTNLSTAKC
AEAQSAYYEC VNYALIKPHL AHVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS
LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPDA HFIMDTSRNG
VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC
GYS-GALQPA PEAGTWFQAY FVQLLTNANP AL
SEQ ID NO: 99 (sp_Q4WFK4_Ascomycota_Neosartorya_fumigata)
QTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG A--STTLTTT TAATTT SQTTT KPTTTGPTTS AP TVT ASGPFSGYQL YANPYYSSEV HTLAMPSLP-
SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD
CAALASNGEY S IAN GVANY KAYI DAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC
ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA
VRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGF-DA HFIMDTSRNG
VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN STSPRYDAHC GYS-DALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 100 (sp_Q5B2E8_Ascomycota_Emericella_nidulans)
QTLYGQC GGSGWTGATS CVAGAACSTL NQWYAQCLPA --ATTTSTTL TTTTSS VTTTS NPGSTTTT-- --SSVTVTAT ASGPFSGYQL YVNPYYSSEV QSIAIPSLT-
GSLAPAATAA AKVPSFVWLD VAAKVPTMAT YLADIRSQNA AGANPPIAGQ FWYDLPDRD
CAALASNGEF AISDGGVQHY KDYIDSIREI LVEYSDVHVI LVIEPDSLAN LVTNLNVAKC
ANAQSAYLEC TNYAVTQLNL PNVAMYLDAG HAGWLGWPAN LQPAANLYAG VYSDAGSPAA
LRGLATNVAN YNAWAIDTCP SYTQGNSVCD EKDYINALAP LLRAQGF-DA HFITDTGRNG
KQPTGQQAWG DWCNVIGTGF GARPSTNTGD SLLDAFVWVK PGGESDGTSD TSAARYDAHC GYS-DALQPA PEAGTWFQAY FVQLLQNANP SF
SEQ ID NO: 101 (tr_B8MHF4_Ascomycota_Talaromyces_stipitatus)
VWGQC GGQGWTGATI CAAGATCSAI NSYYAQCTPA AAASTTL VTKTSS
TSVGTTSPAT TPTTKPST— TAA ASGPFSGYQL YANPYYSSEV HTLALPSLT-
GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIQAANK AGASPPIAGI FWYDLPDRD
CAAAASNGEY TVANNGVANY KAYIDSIVKQ LKAYPDVHTI LI IEPDSLAN MVTNLSTAKC
SEAQAAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWPAN LSPAAQLFAQ VYKNASSPAS
LRGLATNVAN YNAWSLSSAP SYTSGDSNYD EQLYINALSP LLTQNGWPNA HFIMDTSRNG
VQPTKQQAWG DWCNVIGTGF GVPPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC
GYS-DALQPA PEAGTWFQAY FVQLLTNANP SL
SEQ ID NO: 102 (sp_A1DJQ7_Ascomycota_Neosartorya_fischeri)
QTVWGQC GGQGWSGPTN CVAGAACSTL NPYYAQCIPG ATATSTT LSTTTT TQTTT KPTTTGPTTS AP TVT ASGPFSGYQL YANPYYSSEV HTLAMPSLP-
SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGASPPIAGI FWYDLPDRD CAALASNGEY S IANNGVANY KAYI DAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA LRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKNAGF-DA HFIMDTSRNG VQPTKQSAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYS-DALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 103 (gi_367023495_Ascomycota_Myceliophthora_thermophila)
QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR STSTSSSTTR
SGSS-SSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV HNLAIPSMT-
GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ LWYDLPDRD
CAAAASNGEF SIANGGAANY RSYI DAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC
SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA
VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGF-PA RFIVDTGRNG
KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD TSAARYDYHC GLS-DALQPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 104 (gi_310790274_Ascomycota_Glomerella_graminicola)
QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PP--PTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLS- PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGAN--YAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYI DDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGW-DA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC GKA-DALKPA PEAGTWFQAY FEQLLINANP AF
SEQ ID NO: 105 (gi_302405457_Ascomycota_Verticillium_albo-atrum)
QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR PATTSSIRST
TVVNPPTTTV APPPGTTVAP PPAPPPGGAT YTGPFAGVNQ WANAYYRSEV SSLAVPSLS-
GPLATAAAKV ADVPTFQWMD TTAKVPLIDG ALADIRRANA AGGN--YAGI FWYNLPDRD
CAAAASNGEL SIANDGINKY KAYIDSIRTV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC
SNAAAAYKEC TKYAVQKLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGKPKS
LRGLATNVSN YNAWNASSPA PYTSPNPNYD EKHYVDAFAP LLRQNGW-DA KFI IDQGRSG
KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK PGGEADGTSD TTAVRYDHFC
GSA-SSMKP-
SEQ ID NO: 106 (gi_345565889_Ascomycota_Arthrobotrys_oligospora) LWGQC GGIGWTGATN CVAGAACSTL NPYYAQCLSA AATTPRTTTT PATTTR--TT
TPATTPRTTT PATTPATTTT TPGGVGNPVT INGPFAGRKQ HYNAYYSSEI YNIAVPSLVA ASLTAAAKAV ATVPTFVWFD TIDKLSQLEG HLNDIRAKRA TGED--TLGI FWYDLPDRD CAALASNGEL SIANNGVNIY KTYI DPMVAI FKRYPDI PLA LVIEPDSVAN MITNMGTAKC ANAKSAYEEC I SYAVQKLNL PNIAMYLDAG HAGWLGWPDN LSKSGPYYAN IYKNAGSPAS FRGLATNVAN YNAWSISTCP PYTQGASICD EKRYINAFGP LLRQNGW-DA HFIVDQGRSG KQPTGQGQWG DWCNAIGTGF GIRPDTATND ALLDAFVWIK PGGECDGTSD TSAVRYDSHC GSS-SSLKPA PEAGTWFQAY FEQLLVNANP SF
SEQ ID NO: 107 (gi_169852726_Ascomycota_Coprinopsis_cinerea) RPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG S QP PVTTQPPV-- VVPTT SQPPVWPTN PP--GGTPVP STGPFEGYDI YLSPYYAEEV EA-AAAMIDD
PVLKAKALKV KEIPTFIWFD WRKTPDLGR YLADATAIQQ RTGRK-QLVQ IWYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ IYKDAGRSAF
IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGW-DA KFIVDQGRSG VQNI-RQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD TSAPRYDTHC GKS-DSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 108 (tr_Q6UJX9_Ascomycota_Trichoderma_viride)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG -AASSSSSTR AASTTS--RV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLT- GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGN--YAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAPSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALP-DALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 109 (tr_Q66PN1_Ascomycota_Trichoderma_parceramosum)
QACSSVWGQC GGQNWSGPTC CAAGSTCVYS NDYYSQCPPG -AASSSSSTR ASSTTN--RV SSTT-STSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLT- GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGN--YAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTI LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAITQLNL PNIAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPSA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSSNTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALP-DALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 110 (tr_G4TC42_Basidiomycota_Piriformospora_indica) AGQWGQC GGNGYTGPTQ CPSGWVCTPV SPWYYQCLQG TRSSSSSSSS SRSTSS--SS
STRSTSTSSS STRSTSTSTA TTSGSTVIPT ATGPFSGKTV WLSTYYAAEV DS-AADQVSD
ATLKAKILKV KEI PTFTWLD TIAKVATLDD YLPAASG KIFQ LWYDLPNRD
CHANASNGEL FFDQGGAAKY QGYIDGIAAA VKRNPSTTVI AVIEPDSLAN LVTNLSDPRC
SAAADGYKSS TTYALKTLAA AGVYMYMDAG HAGWLGWPAN ISPAADLFVT MWTNGGKSPF
IRGLATNVAN YNALTAASPD PATQGNANYD ETHYINALAP MLRTKGW-NA QFIVDQGRSG
VQNI-RSAWG NWCNIKGAGF GLRPTTNTGN QYIDAIVWIK PGGESDGTSN TSATRYDTMC GGP-DAKIPA PEAGQWFQAY FVDLVNNANP AF
SEQ ID NO: 111 (tr_Q9Y894_Basidiomycota_Volvariella_volvacea) QRPWGQC GGPGWTGPTC CVTGCTCPVT ND-YSQCLPG --TTTTTPGP PSTTTT PTSGG TPPPNNAATT TA TTA VNGPVTGWQP FLTPYYAGEV AAPLAPDIDT
PALSTKAAAV ANI PTFNWFD T-AKGPDLGA YLGMF LGNQ IWYDLPDRD
CAALRWRRME SLASQTMGST TTRATS INWL LRSRNTLESL QVIEPDS ATRC
IWPQ L CGVYMYLDAG HAGWLGWPAN LNPAAQLFSQ LYRDAGSPQY
VRGLATNVAN YNALCTPPRP SHTRQSQLCR SSLHQRALTP AVQSGGF-PA HFIVDQGRSG
VQNI-RQQWG DWYDQHACGY CIRPASNYYH RFIPHRRHC- LGQTRRRVNP AAKAAGDSTC SLT-DAPQPA PQAGTWFQAY FGTLVPAANP TE
SEQ ID NO: 112 (tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju) VGEWGQC GGINYTGSTT CDAGLVCNVI NDYYHQCLP- TPDA-- --GPYIGYQI YLSPYYADEV AA-AVSAISN
PALAAKAASV ANI PTFIWFD WAKVPTLGT YLADALSIQQ STGRN-QLVQ IWYDLPDRD
CAALASNGEF SIANNGLANY KNYVDQIVAQ LSEYPQIRVV AVVEPDSLAN MVTNLNVPKC
AGAQAAYTEG VTYALQKLNT VGVYSYVDAG HAGWLGWPAN LGPAAQLFAN LYTNAGSPSF
FRGLATNVAN YNLLNAPSPD PVTSPNANYD EIHYINALAP ELSSRGF-PA HFIVDQGRSA
VQGI-RGAWG DWCNVDNAGF GTRPTTSTGS SLIDAIVWVK PGGESDGTSD TSAVRYDGHC GLA-SAKKPA PEAHSSFQAY FEMLVANAVP AL
SEQ ID NO: 113 (tr_Q02321_Basidiomycota_Phanerochaete_chrys) ASEWGQC GGIGWTGPTT CVSGTTCTVL NPYYSQCLPG S--TTSVITS HSSSVS--SV
SSHSGSSTST SSPTGPTGTN PP PPP SANPWTGFQI FLSPYYANEV AA-AAKQITD
PTLSSKAASV ANIPTFTWLD SVAKIPDLGT YLASASALGK STGTK-QLVQ IVIYDLPDRD
CAAKASNGEF SIANNGQANY ENYIDQIVAQ IQQFPDVRVV AVIEPDSLAN LVTNLNVQKC
ANAKTTYLAC VNYALTNLAK VGVYMYMDAG HAGWLGWPAN LSPAAQLFTQ VWQNAGKSPF
IKGLATNVAN YNALQAASPD PITQGNPNYD EIHYINALAP LLQQAGW-DA TFIVDQGRSG
VQNI-RQQWG DWCNIKGAGF GTRPTTNTGS QFIDSIVWVK PGGECDGTSN SSSPRYDSTC
SLP-DAAQPA PEAGTWFQAY FQTLVSAANP PL
SEQ ID NO: 114 (tr_C4B8I1_Basidiomycota_Coniophora_puteana) VAAYGQC GGQDWTGATA CASGTACTKV NDYYYQCLPG SS GSSVSGGSGS
GSTSAPSPTS TVPTSTSSTA PSSTSTSSAA SSDPYTGYQI FLNPEYASEV QA-AIPSITD SAVAAKALKV AEVPVFFWLD QVAKVPDLET YLAAADKQGK SSGQK-QLLQ IWYDLPDRD CAANASNGEF SISDDGQAKY ENYIDQIVAI VKKYPDVRVV AVVEPDSMGN LVTNMDLPKC SAAAPTYKTC INYAIAQLSS AGVYMYVDAG HAGWLGWPNN LAPAAQLFGE LYETSGKSAY FRGLATNVAN YNALNTSSPD PCTQNAPNYD EMLYINALSP LLQQQGF-SA QFIVDQGRSG VQNI-RNAWG DWCNIKGAGF GIRPTTDTGS PLIDSIVWVK PGGECDGTSN SSAPRYDSTC SLS-DSLQPA PEAGTWFQQY FEALVTNAVP SL
SEQ ID NO: 115 (tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus) SPIYGQC GGTGWTGATT CASGSTCVFS NPYYSQCLPG A-TTTTTSPQ PTTTTT TTTTN SGGGNPTTTT SAPGSTSTPD -AGPFVGYTL YLSPYYAAEV QA-AAGNITD
ATQKAKAASI ANI PTFTWFD VIAKTSQLGT YLADASAKQK SSGQK-YIVQ IWYDLPDRD
CAAAASNGEF SIANNGLANY ETYIDQLAAQ IQQYPDVRVV AVIEPDSLAN LVTNLNVAKC
SNAQTAYKAG VTYAMQQLNK VGVYMYLDAG HAGWLGWPAN LTPAAQLFAS LYKSAGSPSF
VRGLATNVAN YNALSAASPD PITQGNSNYD EIHYINALGP MLSSQGF-PA HFIVDQGRAG
VQNI-RQQWG DWCNVAGAGF GTRPTTNTGS SLIDAWWVK PGGECDGTSD TSAARYDYHC
GLS-DALQPA PEAGTWFQAY FAALVKNANP PL
SEQ ID NO: 116 (tr_A8CED8_Basidiomycota_Polyporus_arcularius) APVYGQC GGIGWSGATT CVSGSVCTKQ NDYYSQCLPG -AASSAPTSP PTTSAP SSTPV STPPTGTTGS AP SSTP AAGPFVGVTP FLSPYYAAEV AA-AADAITD
STLKAKAASV AKI PTFTWLD SVAKVPDLGT YLADASALQK SSGQP-QVVQ IWYDLPDRD CAAKASNGEF SIADGGQAKY YDYIDQIVAQ IKKFPDVRVI AVIEPDSLAN LVTNLNVQKC ANAQTTYKAC VTYALNQLAS VGVYQYMDAG HAGWLGWPAN IQPAAQLFAD MFKSANSSKF
VRGLATNVAN YNALSAASPD PITQGDPNYD ELHYINALGP MLAQQGF-PA QFVVDQGRSG
QQNL-RQQWG DWCNIKGAGF GTRPTTNTGS SLIDAIVWVK PGGESDGTSN SSSPRFDSTC
SLS-DATQPA PEAGTWFQTY FETLVSKANP PL
SEQ ID NO: 117 (tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea) RPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG S QP PVTTQPPV-- VVPTT SQPPVWPTN PP--GGTPVP STGPFEGYDI YLSPYYAEEV EA-AAAMIDD
PVLKAKALKV KEI PTFIWFD WRKTPDLGR YLADATAIQQ RTGRK-QLVQ IWYDLPDRD
CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN MVTNMNVAKC
RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ IYKDAGRSAF
IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGW-DA KFIVDQGRSG
VQNI-RQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD TSAPRYDTHC
GKS-DSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 118 (tr_B2ZZ24_Basidiomycota_Irpex_lacteus) AQTWAQC GGIGFTGPTT CVAGSVCTKQ NDYYSQCIPG S TT PTSAPT SAPTS QPSQPSSTSS APSGPSSTPT PSAPWTGYQI YLSPYYANEV AA-AAKAITD
PTLAAKAASV ANI PNFTWLD SVSKIADLKT YLADASALGK SSGQK-QLLQ IWYDLPDRD
CAAKASNGEF SIADNGLANY QNYIDQIVAA VKQFPDVRVV AVIEPDSLAN LVTNLNVQKC
ANAKSTYLTA VNYALKQLSS VGVYQYMDAG HAGWLGWPAN LTPAAQLFAQ VYSDAGKSPF
IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP ALQSAGF-PA TFIVDQGRSG
QQNH-RQQWG DWCNIKGAGF GTRPTTNTGS SLIDSIVWVK PGGESDGTSN SSSPRFDSTC
SLS-DATQPA PEAGTWFQAY FETLVSKANP PL
SEQ ID NO: 119 (tr_Q96VU2_Basidiomycota_Lentinula_edodes) LYGQC GGIGWSGATT CVSGATCTW NAYYSQCLPG -SASAP PTSTS SIGTGTTTSS APGSTGTTTP AAGPFTGYEI YLSPYYANEI AA-AVTQISD
PTTAAAAAKV ANI PTFIWLD QVAKVPDLGT YLADASAKQK SEGKN-YLVQ IWYDLPDRD
CAALASNGEF TIADNGEANY HDYIDQIVAQ IKQYPDVHVV AVIEPDSLAN LVTNLSVAKC
ANAQTTYLEC VTYAMQQLSA VGVTMYLDAG HAGWLGWPAN LSPAAQLFTS LYSNAGSPSG
VRGLATNVAN YNALVATTPD PITQGDPNYD EMLYIEALAP LL GSFPA HFIVDQGRSG
VQDI-RQQWG DWCNVLGAGF GTQPTTNTGS SLIDSIVWVK PGGECDGTSN TSSPRYDAHC
GLP-DATPNA PEAGTWFQAY FETLVEKANP PL
SEQ ID NO: 120 (tr_Q6E5B1_Basidiomycota_Volvariella_volvacea) SPLYGQC GGNGWTGPKT CVSGATCTVI NDWYWQCLPG NG PTS SSPTS TPTTTTTT-- --GGPQPTVP AAGPYTGYEI YLSPYYAAEA QA-AAAQISD
ATQKAKALKV AQIPTFTWFD VIAKTSTLGD YLAEASALGK SSGKK-YLVQ IWYDLPDRD
CAALASNGEF SIANNGLNNY KGYIDQLVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVSKC
ANAQTAYKAG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LNPAAQLFSQ LYRDAGSPQY
VRGLATNVAN YNALSASSPD PVTQGNPNYD ELHYINALAP ALQSGGF-PA HFIVDQGRSG
VQNI-RQQWG DWCNVKGAGF GQRPTLSTGS SLIDAIVWIK PGGECDGTTN TSSPRYDSHC GLS-DATPNA PEAGQWFQAY FETLVRNASP PL
SEQ ID NO: 121 (tr_F8Q7V9_Basidiomycota_Serpula_lacrymans) ASLYGQC GGVGWTGATT CDSGSSCQEI NSYYSQCLPG STTVPTT PTTQPA SASGT TTSSAPTS-- TATG AAGPFTGYEI YLSPYYVAEV QA-AVANITD
SALQAKASKV ANI PNFTWLD EVAKVPTLGT YLADADALAK SSGNE-QLLQ IWYDLPDRD
CAAAASNGEF SIANNGQANY FNYIDQIVAQ IQKYPGVRVV AIIEPDSMAN LVTNLSVAKC
ANAASTYKAC VQYALEQLAT VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ TYQAAGSSPF
FRGLATNVAN YNALTTTSPD PITQGDANYD ELLYIQALSP LLIEQGF-PA QFIVDQGRSG
VQNI-RSAWG DWCNVKGAGF GTQPTTNTGS SLIDAIAWIK PGGECDGTSD SSSLRYDPHC SLS-DALQPA PEAGTWFQTY FEQLVSNANP AL
Appendix 5
SEQ ID NO: 122 node #59 QDCRPGYAAC GGGGWGGDTS GTGGGSCGVG APTYSGCGAG TPTSSGNVDP TTTSSGNVDP TTTSSPTTTS SQNVDPTTSA ASGGGNCSPA VSGPFADAHL FVDPGYVKKV DSLSIAQVTD TALKAKMEKV KQIQTAFWLD RIEAIKELPA YLDAALKLQQ ELCEPPVTAL IVVYDLPNRD CFAEASNGEL HLDQNGTQRY REYIDPIKQI LAKYSGQRIV AVIEPDSLPN IVTNLGGKRC DETEASYRDG VAYTLKELNM PHVYQYI DAA HSGWLGWPDN QKKGAKIFAE VIKAAGSPAN VRGFATNVAN YTQLSYTAES YDQQDNPCFG EFDYVDAMAS ALSAEGLGDK HFI I DTSRNG VGNIDREDWG YWCNNKGAGM GQRPKANGGA TNLDAFVWVK PPGDSDGVGQ EGQPRYDLFC GKEYNADKRA PQAGQWFHEY FVECVKNANP PL
SEQ ID NO: 123 node #60 QDCRPGYAAC GGSGWTGDTS CTGGGSCGVA NPTYSGCWAG SPTSTTNTDP
TYTDSDPVGP TGTSNPVPTS SQPPVPTTTS APGGGGTPLA VSGPFEGVDL YLNPYYVEEV
DALAIAQITD SALKAKAEKV KQIPTAIWLD AIENIQELPA YLDDALALQQ RTGKKPVLVL
IVVYDLPNRD CHAAASNGEL HLDDGGMERY KDYIDPIEQK LKKYPDQRIV AVIEPDSLAN
LVTNLNAAKC RDAEASYKEG VAYALKKLGM PHVYMYI DAG HAGWLGWNDN QEKAAKVFAE
VIKNAGSPAK VRGFATNVSN YTPLSDTTRG PDTQGNPCFD EFRYI DAMAS ALRAEGLSDA
HFI I DTSRNG VKNIDRQEWG NWCNNKGAGL GARPKANSGA SNLDAFVWIK PPGESDGTSD
ESAPRYDTYC GKEYDAHKPA PEAGQWFHEY FVQLVKNANP PL
SEQ ID NO: 124 node #61 QNCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP PVTTQPPVSS SSTSTWPTT SQPPVWPTN PPSGGGTPVP STGPFEGYDI YLSPYYAEEV EALAAAMIDD PVLKAKALKV KEIPTFIWFD WRKTPDLGR YLADATAIQQ RTGRKPQLVQ IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ
IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 125 node #62 QNCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP PVTTQPPVSS SSTSTWPTT SQPPVWPTN PPSGGGTPVP STGPFEGYDI YLSPYYAEEV EALAAAMIDD PVLKAKALKV KEIPTFIWFD WRKTPDLGR YLADATAIQQ RTGRKPQLVQ IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ
IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 126 node #63 QNCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP PVTTQPPVSS SSTSTWPTT SQPPVWPTN PPSGGGTPVP STGPFEGYDI YLSPYYAEEV EALAAAMIDD PVLKAKALKV KEIPTFIWFD WRKTPDLGR YLADATAIQQ RTGRKPQLVQ IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ
IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 127 node #64 QNCSPLYGQC GGTGWTGATT CVSGATCTVI NDWYSQCLPG SATTTTTSSP
PTTTAPPVSS SSTSASTPTS SPPTTTTTTS APGGSSSTVP AAGPFTGYEI YLSPYYAAEV
QAPAAAQISD PTLKAKALKV ANIPTFTWFD VVAKTPDLGT YLADASALQK SSGKKPYLVQ
IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQLVAQ IKKYPDVRVV AVIEPDSLAN
LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ
LYKDAGSPSF VRGLATNVAN YNALSATSPD PVTQGNPNYD ELHYINALAP LLQSQGFFPA
HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTLNTGS SLIDAIVWIK PGGECDGTSD
TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVKNANP PL
SEQ ID NO: 128 node #65 QNCSPLYGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTSSP
PTTTAPPVSS SSTSASTSTS SPPTTPTTTS APSGSSSTTP AAGPFTGYEI YLSPYYAAEV QAPAAAQISD PTLKAKAAKV ANIPTFTWFD VVAKTPDLGT YLADASALQK SSGKKPYLVQ IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQLVAQ IKKYPDVRVV AVIEPDSLAN
LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ
LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD EIHYINALAP LLQSQGFFPA
HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVKNANP PL
SEQ ID NO: 129 node #66 QNCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSP
PTTTAPPVSS SSTSASTSTS SPPTTPTTTS APSGSSSTTP AAGPFTGYEI YLSPYYAAEV
AAPAVAQISD PTLAAKAAKV ANIPTFTWFD VVAKVPDLGT YLADASALQK SSGKKPQLVQ
IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN
LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ
LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD EIHYINALAP LLQSQGFFPA
HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 130 node #67 QNCAPLWGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTSSP PTTTAPPVSS SSTSASTSTS SPPTTPTTTS APGGSSSTTP AAGPFTGYQI YLSPYYAAEV AAPAVAQISD PTLAAKAAKV ANIPTFTWFD VVAKVPDLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ LKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSAPSPD PITQGNPNYD EIHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 131 node #68 QNCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSP PTTSAPGVSS SSTSASTSTS STPTTPTTTS APSGSSSTTP AAGPFTGYEI YLSPYYAAEV AAPAVAQISD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKEC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGDPNYD EIHYINALAP LLQQQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 132 node #69 QNCAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTSP
PTTSAPGSSS SSTSASSSTS STPTTPTTTS APSGSSSTTP AAGPFTGYEI YLSPYYAAEV
AAPAVAQITD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGQKPQLVQ
IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN
LVTNLNVAKC ANAQTTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ
LYKNAGSSPF VRGLATNVAN YNALSATSPD PITQGDPNYD EIHYINALAP LLQQQGFFPA
QFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN
SSSPRYDSHC SLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 133 node #70 QNCAPVYGQC GGIGWTGATT CVSGSTCTKQ NDYYSQCLPG SAASTTVTSP PTTSAPGSSV SSTSASSTTS STPTTPTTTS APSGSSSTTP AAGPFTGYQI YLSPYYAAEV AAPAAAAITD PTLAAKAASV ANIPTFTWLD SVAKVPDLGT YLADASALQK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAQTTYKAC VTYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSTC SLSYDATQPA PEAGTWFQAY FETLVSKANP PL
SEQ ID NO: 134 node #71 QNCAPVWGQC GGIGWTGPTT CVSGSTCTKQ NDYYSQCLPG SAATTTVTTS PTSSASGSSV SSHSGSSTTS SSPTTPTTTS APSGPSSTPP AAGPWTGYQI YLSPYYANEV AAPAAKAITD PTLAAKAASV ANIPTFTWLD SVAKI PDLGT YLADASALGK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY QNYIDQIVAQ IKQFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAKTTYLAC VNYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ VYQNAGKSPF IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQAGFFPA TFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDSIVWVK PGGECDGTSN SSSPRYDSTC SLSYDATQPA PEAGTWFQAY FETLVSKANP PL
SEQ ID NO: 135 node #72 QNCASLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATSTTVPSS PTTSAAGSSS SSTSASSSTS TTPTTPTSTS APSGSSSTTA AAGPFTGYEI YLSPYYAAEV QAPAVANITD SALAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADADALAK SSGQKPQLLQ IVVYDLPDRD CAAKASNGEF SIADNGQANY ENYIDQIVAQ IKKYPDVRVV AVIEPDSMAN LVTNLNVAKC ANAATTYKAC VTYALEQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYQTAGSSPF FRGLATNVAN YNALSTTSPD PITQGDPNYD ELLYINALSP LLQQQGFFPA QFIVDQGRSG VQNIGRSAWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSYDALQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 136 node #73 QNCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP
PVTTQPPVSS SSTSTVVPTT SQPPVWPTN PPSGGGTPVP STGPFEGYDI YLSPYYAEEV
EALAAAMIDD PVLKAKALKV KEIPTFIWFD VVRKTPDLGR YLADATAIQQ RTGRKPQLVQ
IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN
MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ
IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA
KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD
TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 137 node #74 QNCGSVYAQC GGQGWSGATT CVSGSTCVVL NAYYSQCLPG
SATTTTSTTA PTTTTPTTTT TSTTTTATTT TPPPTATTTT PPSGASTTAA
ASGPFSGYQL YANPYYASEV SALAIPSLTD GAMATKAAAV AKVPTFVWLD
TAAKVPTMGT YLADIRALNK AGANPPIAGQ FVVYDLPDRD CAAAASNGEY
SIADGGVAKY KAYIDSIVAQ LKTYSDVRI I LVIEPDSLAN LVTNLNVAKC
ANAQAAYLEG INYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAQ
IYKNAGSPAA VRGLATNVAN YNAWSITTCP SYTQGDSNYD EKLYINALAP
LLTAQGWPDA HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD
ALEDAFVWIK PGGESDGTSD TTAARYDYHC GLSYDALKPA PEAGTWFQAY
FVQLLTNANP SF
SEQ ID NO: 138 node #75 QNCASVWGQC GGQGWTGATS CVSGSTCVVL NPYYSQCLPG SATTTTSTTT
PTTTTSTTTT TSTTTTSTTT TPPPTATTTT PPSGASTTAT ASGPFSGYQL YANPYYASEV STLAI PSLTD GAMATKAAAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGQ FVVYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LKTYSDVRTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAN VYKNAGSPAA LRGLATNVAN YNAWSITTCP SYTQGDSNYD EKLYINALAP LLTAQGWPDA HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK
PGGESDGTSD TTAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 139 node #76 QNCQSVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTSTTT
PTTTTSTTTT TSTTTTSTTT APSSATTTAT ASGPFSGYQL YANPYYSSEV HTLAI PSLTD GSLAAKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGI FVVYDLPDRD CAAAASNGEY S IANNGVANY KAYIDSIRAQ LKTYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNYD EKLYINALAP LLTAQGWPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSD TSAARYDYHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 140 node #77 QNCQSVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT
PTTTTSTTTT TSTTTTSTTT TPTTTTTTTT APSSVTTTAT ASGPFSGYQL YANPYYSSEV
HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI
FVVYDLPDRD CAAAASNGEY S IANNG ANY KAYIDSIRAQ LVTYSDVHTI LVIEPDSLAN
LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS
VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSD
TSAARYDYHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 141 node #78 QNCQTVWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT TTTTTSTTTT TSTTTTSTTT TPTTTTTTTT APSSVTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FVVYDLPDRD CAALASNGEY S IANNG ANY KAYIDSIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLQDAFVWVK PGGESDGTSD TSAARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 142 node #79 QNCQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT
TTTTTSTTTT TSATTTSTTT TPTTTTTTTT APSSVTTTAT ASGPFSGYQL YANPYYSSEV
HSLAIPSLTD GSLAPAATAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI
FVVYDLPDRD CAALASNGEY S IAN GVANY KAYIDSIRAI LVKYSDVHTI LVIEPDSLAN
LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS
VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSVCD EKRYINALAP LLKAQGFPDA
HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD
TSAARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 143 node #80 QACQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTSTTT
TTTTSSTTTT TSAATATTTT TPTSTTTTTS APSSVTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAPAATAA AKVPSFVWLD TAAKVPTMGT YLADIRSQNA AGANPPIAGQ FVVYDLPDRD CAALASNGEF SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAANLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGNSVCD EKQYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD TSAARYDAHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 144 node #81 QACQTLWGQC GGQGYTGATS CVAGATCSTL NPYYAQCTPA TASTTTSTTT TTTTSSTTTT TSAATATTTT TATSTPSTSS APSSPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKQYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 145 node #82 QACQTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA TASTTTATTT TTTTSSTTTT TTAAVATTTT TATSTPSASA SPSSPTTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLTD GSLQAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGVEHY KAYI DS IREI LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLQAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP AF
SEQ ID NO: 146 node #83 QNCQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCLPG TATTTTTTTT TTTTTTSTTT TSTTTTSTTT TPTTTTTTTS APSSVTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FVVYDLPDRD CAALASNGEY S IANNGVANY KAYI DAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLQDAFVWVK PGGESDGTSD TSSARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 147 node #84 QACQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT
TTTTTTSTTT TSTTTTSTTT TPSTGTTTTS APSSTTITAT PSGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT
YLADIKAKNA AGANPPIAGI FVVYDLPDRD CAALASNGEY SIANGGVANY
KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC
INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFAS VYKDAGSPAA
LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD SLQDAFVWVK
PGGESDGTSD TSSARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 148 node #85 QACQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATATTTTTT TTTTTTSTTT TTTTTTSTTT SAGSGSTTTS APASTTITAS PSGPFSGYQL YANPYYSSEV HTLAIPSLAD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNA AGANPPIAGI FVVYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC SGAQDAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAELFAS
VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDQNCD EKRYINALAP LLKAQGFPDA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTDTGD ALQDAFVWVK PGGESDGTSD
TSSARYDAHC GYSYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 149 node #86 QACQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG AATATTTTTT TATTTTSTTT TSTTTTQTTT KPTTTGPTTS APSSVTTTVT ASGPFSGYQL YANPYYSSEV HTLAMPSLPD SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY S IANNGVANY KAYI DAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA LRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGFPDA HFIMDTSRNG VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 150 node #87 QACQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VTTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIQAANK AGASPPIAGI FVVYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LI IEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWPAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SL
SEQ ID NO: 151 node #88 QACQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL
VKTTSSTSVG TSSATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL
YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT
YLANIEAANK AGASPPIAGI FVVYDLPDRD CAAAASNGEY TVANNGVANY
KAYIDSIVAQ LKAYPDVHTI LI IEPDSLAN MVTNLSTAKC AEAQSAYYEC
VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASSPAS
LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK
PGGESDGTSN SSATRYDYHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SL
SEQ ID NO: 152 node #89 QACQSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATTATSTTL VKTTSSTSVG TSSATTSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FVVYDLPDRD CAAAASNGEY TVAN GVA Y KAYIDSIVAQ LKAYPDVHTI LI IEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP AL
SEQ ID NO: 153 node #90 QSCASVWGQC GGQGWSGATC CASGSTCWQ NDYYSQCLPG SATTTTSSTR
PTTTTSSTTT TSTTTSSTST TPPPAATTTT PPSGASGTAT YSGPFSGVQL WANSYYASEV STLAIPSLSD GAMATKAAAV AKVPSFQWLD TAAKVPTMSS TLADIRAANK AGANPPYAGQ FVVYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LVTYSDIRTI LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGDSNYD EKLYIHALSP LLTAQGWSDA HFITDQSRSG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 154 node #91 QACASVWGQC GGQGWSGATC CASGSTCWQ NDYYSQCLPG SATTTTSSTR
STTTTSSTTT SSTTTSSTST TPPPAATTTT PPSGASGTAT YSGPFSGVNL WANSYYASEV STLAIPSLSD GAMATKAAAV AKVPSFQWLD TAAKVPTMSS TLADIRAANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAI LVTYSDIRTI LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGNSNYD EKLYIHALSP LLTQQGWSDA HFITDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 155 node #92 QSCASVWGQC GGQGWSGATC CASGSTCWQ NDYYSQCLPG SATTTTSSTR
STTTTSSTTT SSTTTSTTST TTPPATTTTT PPGGASGTAT YTGPFSGVNL WANSYYRSEV
STLAIPSLSD GAMATKAAAV AKVPSFQWLD TAAKVPTMSG TLADIRAANK AGGNPPYAGQ
FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRAI LVTYSDIRTI LVIEPDSLAN
MVTNMNVAKC ANAQSAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAK
IYKDAGKPAA LRGLATNVAN YNAWNISSAP SYTQPNPNYD EKHYIEAFSP LLTAQGWSDA
HFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD
TTAARYDYHC GLSYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 156 node #93 QSCGSVWGQC GGQGWSGATC CASGSTCWQ NDYYSQCLPG SATTTTSSTR
STTTTSSTTS SSTTTSTTST TAPPATTTTT TPGGASSTAS YTGPFSGVNL WANNYYRSEV
HTLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVDTMSG TLADIRAANK AGGNPPYAGQ
FVVYDLPDRD CAAAASNGEF SIADGGVAKY KAYI DAIRKL LVTYSDIRTI LVIEPDSLAN
MVTNMNVAKC ANAQSAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAELFAK
IYKDAGKPAA LRGLATNVAN YNAWNISSAP SYTSPNPNYD EKHYIEAFSP LLTAQGWSNA
HFIVDQGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD
TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 157 node #94 QSCGSQWGQC GGQGWSGATC CASGSTCWQ NDYYSQCLPG STTTTTSSTS
STTTTSSTSS STTTTPTTST TAPPATTTTT TPGGAGTTAS FTGPFSGVNL WANNYYASEV
HTLAIPSLTD GAMATKAAAV AKVPSFQWLD IAAKVDTMPG TLADIRAANK AGGNPPYAAQ
FVVYDLPDRD CAAAASNGEF SIADGGVAKY KAYI DAIRKQ LVSYSDIRTI LVIEPDSLAN
MVTNMNVPKC ANAQAAYREC TIYAIKQLNL PNVAMYLDGG HAGWLGWPAN LQPAADLFGK
LYADAGKPSQ LRGMATNVAN YNAWNLTSAP SYTSPNPNYD EKHYIEAFSP LLAAKGWSNA
HFIVDQGRSG KQPTGQQEWG HWCNAMGTGF GMRPSANTGS ELVDAFVWIK PGGECDGTSD
TTAARFDHFC GMSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 158 node #95 QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDYYSQCLPG SSTTTTSSTR
STSTTSSTTS SSTSTSTTST TAPPVPTTTT IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAI PSMTD GAMATKAAAV AKVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ FVVYDLPDRD CAAAASNGEF SIANGGVANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 159 node #96 QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPG SSTTTTSSTR STSTTSSTTS SSTSTSTTST TTPPVPTTTT IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSMTD GAMATKAAAV AKVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ FVVYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 160 node #97 QNCGSVWGQC GGIGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTSTTT
STSTTSSTTS TSTSTSTTST TTPPVPTTTT IPGGASSTAS YTGPFSGVQL WANNYYRSEV
HTLAIPSLTD GAMATKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI
FVVYDLPDRD CAAAASNGEW SIANGGAANY KAYI DRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK
IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA
HFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 161 node #98 QNCGSVWGQC GGNGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTPTST
STSTSSSSRS TSTSTSTSST TTPPVATTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV
HTLAIPSLTD GAMATKAAAV AEVPSFQWMD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI
FVVYDLPDRD CAAAASNGEW SIANGGAANY KAYI DRIREL LIQYSDIRTI LVIEPDSLAN
MVTNMNVAKC AGAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK
IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYVEAFSP LLTAAGFSPA
HFITDTGRSG KQPTGQLEWG HWCNAIGTGF GQRPTANTGH DLVDAFVWIK PGGECDGTSD TTAARYDHHC GLAYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 162 node #99 QNCASVWGQC GGIGYNGPTC CQSGSTCVKQ NDWYSQCLPG SSTTTTSTTT
STSTTSSTTS TSTSTSTTST TTAPAPTTTT IPGGASSTAS YNGPFSGVQL WANNYYRSEV HTLAI PSLTD PALAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FVVYDLPDRD CAAAASNGEW S IAN GAN Y KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC SNAASTYKEL IYALKQLNL PHVAMYMDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSPP SYTSPNPNYD EKHYIEAFSP LLTAQGFSPA QFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 163 node #100 QNCGSVWSQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPN SSTTTSTSTR STSTSSSTTS SSTSTSTTST TAPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSMTD GAMATKAAAV AKVPSFQWLD RNVTIDTMAQ TLSQIRAANK AGANPPYAGQ FVVYDLPDRD CAAAASNGEF SIANGGAANY KAYI DAIRKL IIQYSDIRII LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAE IYKDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRNG KQPTGQQQWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALQPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 164 node #101 QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR
STSTSSSTTR SGSSTSSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV
HNLAIPSMTD GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ
LVVYDLPDRD CAAAASNGEF SIANGGAANY RSYIDAIRKH IIEYSDIRII LVIEPDSMAN
MVTNMNVAKC SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG
IYNDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGFSPA
RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD
TSAARYDYHC GLSYDALQPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 165 node #102 QNCGSAWSQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK
VSSTTSKVTS SSAAQPITTT TAPSVPTTTT IAGGASSTAS FTGPFLGVQG WANSYYSSEI
YNHAIPSMTD GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH
FVVYDLPDRD CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN
LVTNTNVAKC ANAASAYKEC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD
IYKAAGKPKS VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGFSPA
QFIVDTGRSG KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD
TSATRYDYHC GLSYDALKPA PEAGQWFQAY FEQLLKNANP AF
SEQ ID NO: 166 node #103 QSCASVWGQC GGQGWSGATC CASGSTCVVQ NDFYSQCLPG SATTTTSSTR
STTTTSVTTT SSTTTATTST STPPATTVTT PPAGPSGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATAAAKV AKVPSFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN MVTNMNVPKC ANAQSAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAQLFAK IYKDAGKPSA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFSP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLAYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 167 node #104 QACASQWGQC GGQGWSGPTC CASGSTCVVQ NAFYSQCLPG SATTATSSTR STTTTSVTST SSTSTATTSV STPPATTVTT PPAGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FVVYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAVKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAK
IYKDAGKPKA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFAP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLAYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 168 node #105 QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR PATTSSVRST TVVNPPTTTV APPPGTTVAP PPAPPPGGAT YTGPFAGVNQ
WANAYYRSEV SSLAVPSLSD GPLATAAAKV ADVPTFQWMD TTAKVPLIDG
ALADIRRANA AGGNPPYAGI FVVYNLPDRD CAAAASNGEL SIANDGINKY
KAYIDSIRAV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC SNAAAAYKEC
TKYAVQQLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGKPKS
LRGLATNVSN YNAWNASSPA PYTSPNPNYD EKHYVDAFAP LLRQNGWSDA
KFIIDQGRSG KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK
PGGEADGTSD TTAVRYDHFC GSAYSSMKPA PEAGTWFQAY FEQLLRNANP SF
SEQ ID NO: 169 node #106 QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV
RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPAGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FVVYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC GKAYDALKPA PEAGTWFQAY FEQLLINANP AF
SEQ ID NO: 170 node #107 QSCSNVWSQC GGQNWSGTPC CTSGNKCVKL NDFYSQCQPG SATSATSSTT
SATTTSVTTT ATKTTATTTS STTSGTSVTS APAGPSGPPA ATDPFSGVDL WANNYYRSEV
STLAIPKLSD GAMATAAAKV ADVPSFQWMD TYDHISLMEE TLADIRKANK AGGNPPYAGQ
FVVYDLPDRD CAAAASNGEY SLDKDGANKY KAYIAKIKGI LQNYSDTRII LVIEPDSLAN
MVTNMNVPKC ANAESAYKEL TIYAIKELNL PNVSMYLDAG HGGWLGWPAN LPPAAQLYAQ
IYKDAGKPSR LRGLVTNVSN YNAWKLSSKP DYTESNPNYD EQRYINAFSP LLAQEGWSNA
KFIVDQGRSG KQPTGQKAWG DWCNAPGTGF GLRPSANTGD ALVDAFVWVK PGGESDGTSD
TSAARYDYHC GLDYDALKPA PEAGTWFQAY FEQLLKNANP SF
SEQ ID NO: 171 node #108 QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SAATSTSSTR
SSTTTSSTRA SSTTTSSSST TPPPSSTTTP PPPVGSGTAT YSGPFSGVNP WANSYYASEV
SSLAIPSLSD GAMATAAAAV AKVPSFMWLD TLAKTPLMSS TLADIRAANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADNGVAKY KNYIDTIRAI LVTYSDIRTI LVIEPDSLAN
LVTNLSVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAN
VYKNASSPRA LRGLATNVAN YNGWNITSAP SYTQGNSVYN EKLYIHAISP LLTQQGWSNA
YFITDQGRSG KQPTGQQAWG DWCNVIGTGF GIRPSANTGD SLLDAFVWVK PGGECDGTSD TSAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 172 node #109 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FVVYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTI LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 173 node #110 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR
ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP
WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ
TLADIRTANK NGGNPPYAGQ FVVYDLPDRD CAALASNGEY SIADGGVAKY
KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC
INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA
FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK
PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 174 node #111 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR
AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV
SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ
FVVYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN
LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN
VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA
FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 175 node #112 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTARARA SSTTSSRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ TLADIRTANK NGGNPPYAGQ FVVYDLPDRD CAALASNGEY SIADGGVDKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 176 node #113 LDCRPGYAAC GGSEWTDDNG CEGGGWCGIA NPTYPSCWAG SPTSTTNTEV
SYTDSDPVGK WGVENPVPTS SQPPTPTTTS APGGGGSPLA VSGPFEGVDL YLNPYYVEEV
DALAIAQMSD SALKAKAEKV KQIPTAIWLD TIENMQQLPT YLDDALALQQ TSGKKPVLVV
FVVYDLPNRD CHAAASNGEL LLNDGGLQRY KDYIDAIEEK LKKYPNQRIV LIIEPDSLAN
LVTNLNTAKC RDAEASYKEG LAYAIKKFGL PHVYMYLDAG HAGWLGWNDN REKAAKVFAE
VIKNAGSPGK VRGFTTNVSN YTPLNDTSRG PDTQGNPCFD EFRYLDAMNS ALKAAGISDV
HFIIDTSRNG VKNIDRKQSG NWCNQKGAGL GARPKANSST SYLDAFVWIK PPGESDGTSD
ESAPRYDPYC GREYDAMKPA PEAGQWFHEY FVQLLKNANP PL
SEQ ID NO: 177 node #114 LDCRPGYPCC SGSEYVDDDG VENGNWCGIA DPTYESCWAE SPCSTTNTEV
EYTDSDPVGK WGVENPVPTG TQTPTPTSGS DPGTPGSPLT ISGPFKGVEF
YLNPYYVDEV DALAIAQMSD SSLKAKAEKM KTFSNAIWLD TIKNMQQLET
NLKGALAQHQ TSGKKPVLTV FVVYDLPGRD CHALASNGEL LANDGDLQRY
KSYIDVIEEK LKKYNSQPVV LIIEPDSLAN LVTNLNTPAC RDSEQYYLEG
HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAAKVYAK VIKSSGSPGK
VRGFTDNVSN YTPWEDPSRG PDTEWNPCPD EKRYLEAMHK DFKAAGISSV
YFVSDTSRNG HKNVDRKHPG EWCNQTGVGI GARPKANSSM DYLDAFYWIK
PLGESDGTSD ESAARYDGYC GHEYTAMKPA PEAGQWFQKH FEQGLKNANP PL
Overall accuracy of the 56 ancestral sequences: 0.74109 0.73826 0.97835 0.97835 0.97835 0.94117 0.94529 0.94578 0.92510 0.94465 0.94629 0.94601 0.94854 0.92166 0.97835 0.92600 0.94225 0.96071 0.96111 0.96784 0.96017 0.96374 0.95726 0.97405 0.96833 0.96265 0.95336 0.97883 0.97716 0.97988 0.98803 0.94760 0.95237 0.94720 0.94774 0.94221 0.96879 0.96964 0.97211 0.97245 0.96504 0.96520 0.99716 0.99644 0.93515 0.93299 0.99472 0.99382
0.94897 0.96626 0.99577 0.99653 0.99683 0.99713 0.73208 0.91673
Appendix 6
SEQ ID NO: 178 node #59 QDCRPGYAAC GGGGWGGDTS CTGGGWCGVG NPTYSGCWAG TPTSSGNVDP
TTTSSGNVGP TTTSNPVPTS SQPPDPTTSA APGGGGCPLA VSGPFEGADL YLDPYYVKKV
DALAIAQVTD TALKAKMEKV KQIPTAFWLD RIENIQELPA YLDDALKLQQ ELGKKPVLVL
IVVYDLPNRD CFAEASNGEL HLDDGGMQRY KEYIDPIKQK LKKYSGQRIV AVIEPDSLPN
LVTNLGGKKC RDTEASYKEG VAYALKKLGM PHVYQYIDAG HAGWLGWPDN QKKGAKVFAE
VIKAAGSPAN VRGFATNVAN YTPLSYTARG YDQQGNPCFG EFRYVDAMAS ALRAEGLGDK
HFIIDTSRNG VKNIDRKDWG YWCNNKGAGL GQRPKANGGA TNLDAFVWIK PPGESDGVGD
EGQPRYDLYC GKEYDADKPA PQAGQWFHEY FVQLVKNANP PL
SEQ ID NO: 179 node #60 QDCRPGYAAC GGSGWTGDTS CTGGGWCGVA NPTYSGCWAG SPTSSTNTDP
TTTSSGNVGP TSTSNPVPTS SQPPTPTTSS APGGGGSPLA VSGPFEGADL YLNPYYVEEV
DALAIAQVTD SALKAKAEKV KQIPTAIWLD TIENIQELPA YLDDALALQQ ESGKKPVLVL
IVVYDLPNRD CHAEASNGEL HLDDGGMQRY KEYIDPIEQK LKKYSGQRIV AVIEPDSLAN
LVTNLNTAKC RDTEASYKEG VAYALKKLGM PHVYMYIDAG HAGWLGWNDN QEKAAKVFAE
VIKSAGSPAN VRGFATNVAN YTPLSDTARG PDTQGNPCFD EFRYVDAMAS ALRAEGLSDV
HFIIDTSRNG VKNIDRKDWG NWCNNKGAGL GARPKANSGA TNLDAFVWIK PPGESDGTSD
ESAPRYDPYC GKEYDADKPA PEAGQWFHEY FVQLVKNANP PL
SEQ ID NO: 180 node #61 QDCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP
PVTTQPPVSS SSTSTVVPTT SQPPVVVPTN PPGGGGTPVP STGPFEGYDI YLSPYYAEEV
EALAAAMIDD PVLKAKALKV KEIPTFIWFD VVRKTPDLGR YLADATAIQQ RTGRKPQLVQ
IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN
MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ
IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA
KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD
TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 181 node #62 QDCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP PVTTQPPVSS SSTSTVVPTT SQPPVVVPTN PPGGGGTPVP STGPFEGYDI YLSPYYAEEV EALAAAMIDD PVLKAKALKV KEIPTFIWFD VVRKTPDLGR YLADATAIQQ RTGRKPQLVQ IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 182 node #63 QDCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP
PVTTQPPVSS SSTSTVVPTT SQPPVVVPTN PPGGGGTPVP STGPFEGYDI YLSPYYAEEV EALAAAMIDD PVLKAKALKV KEIPTFIWFD VVRKTPDLGR YLADATAIQQ RTGRKPQLVQ IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 183 node #64 QDCSPLYGQC GGTGWTGATT CVSGATCTVI NDWYSQCLPG SATTTTTTSP PTTTAPGVSS SSTSASTPTS SPPTTPTTTS APGGSSSTVP AAGPFTGYEI YLSPYYAAEV QALAAAQISD PTLKAKALKV ANIPTFTWFD WAKTPDLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQLVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKDAGSPSF VRGLATNVAN YNALSATSPD PVTQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTLNTGS SLIDAIVWIK PGGECDGTSN TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVKNANP PL
SEQ ID NO: 184 node #65 QDCSPLYGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSP PTTTAPGVSS SSTSASTSTS SPPTTPTTTS APGGSSSTTP AAGPFTGYQI YLSPYYAAEV QALAAAQISD PTLKAKAAKV ANIPTFTWFD WAKTPDLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQLVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ
LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA
HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN
TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVKNANP PL
SEQ ID NO: 185 node #66 QDCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSP PTTTAPGVSS SSTSASTSTS SPPTTPTTTS APGGSSSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANI PTFTWFD WAKVPDLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 186 node #67 QDCAPLWGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSP
PTTTAPGVSS SSTSASTSTS SPPTTPTTTS APGGSSSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANI PTFTWFD WAKVPDLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ LKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSAPSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 187 node #68 QDCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTVTSP
PTTSAPGVSS SSTSASTSTS STPTTPTTTS APGGSSSTTP AAGPFTGYQI YLSPYYAAEV
AALAVAQISD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGKKPQLVQ
IVVYDLPDRD CAALASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN
LVTNLNVAKC ANAQTTYKEC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ
LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGDPNYD ELHYINALAP LLQQQGFFPA
HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN
TSSPRYDSHC GLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 188 node #69 QDCAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTSP PTTSAPGVSS SSTSASSSTS STPTTPTTTS APSGSSSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQITD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSATSPD PITQGDPNYD ELHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSYDATQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 189 node #70 QDCAPVYGQC GGIGWTGATT CVSGSTCTKQ NDYYSQCLPG SATTTTVTSP PTTSAPGVSS SSTSASSTTS STPTTPTTTS APSGSSSTTP AAGPFTGYQI YLSPYYAAEV AALAAAQITD PTLAAKAASV ANIPTFTWLD SVAKVPDLGT YLADASALQK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAQTTYKAC VTYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSAASPD PITQGDPNYD ELHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSTC SLSYDATQPA PEAGTWFQAY FETLVSKANP PL
SEQ ID NO: 190 node #71 QDCAPVWGQC GGIGWTGPTT CVSGSTCTKQ NDYYSQCLPG SATTTTVTTS
PTSSASGVSS SSTSASSTTS SSPTTPTTTS APSGSSSTPP AAGPWTGYQI
YLSPYYANEV AALAAKQITD PTLAAKAASV ANIPTFTWLD SVAKIPDLGT
YLADASALGK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY
QNYIDQIVAQ IKQFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAKTTYLAC
VNYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGKSPF
IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQAGFFPA
TFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDSIVWVK
PGGECDGTSN SSSPRYDSTC SLSYDATQPA PEAGTWFQAY FETLVSKANP PL
SEQ ID NO: 191 node #72 QDCAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTSP PTTSAAGVSS SSTSASSSTS TTPTTPTSTS APSGSSSTTA AAGPFTGYQI YLSPYYAAEV QALAVANITD SALAAKAAKV ANI PTFTWLD QVAKVPDLGT YLADADALAK SSGQKPQLLQ
IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSMAN
LVTNLNVAKC ANAATTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ
LYKTAGSSPF FRGLATNVAN YNALSTTSPD PITQGDPNYD ELLYINALSP LLQQQGFFPA
QFIVDQGRSG VQNIGRSAWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN
SSSPRYDSHC SLSYDALQPA PEAGTWFQAY FETLVSNANP PL
SEQ ID NO: 192 node #73 QDCRPLYAQC GGTGWTGETT CVSGAVCEVI NQWYHQCLPG SATTTTSTQP
PVTTQPPVSS SSTSTVVPTT SQPPVVVPTN PPGGGGTPVP STGPFEGYDI YLSPYYAEEV
EALAAAMIDD PVLKAKALKV KEIPTFIWFD VVRKTPDLGR YLADATAIQQ RTGRKPQLVQ
IVVYDLPDRD CAAAASNGEF SLADGGMEKY KDYVDRLASE IRKYPDVRIV AVIEPDSLAN
MVTNMNVAKC RGAEAAYKEG VIYALRQLSA LGVYSYVDAG HAGWLGWNAN LAPSARLFAQ
IYKDAGRSAF IRGLATNVSN YNALSATTRD PVTQGNDNYD ELRFINALAP LLRNEGWSDA
KFIVDQGRSG VQNIGRQEWG NWCNVYGAGF GMRPTLNTPS SAIDAIVWIK PGGEADGTSD
TSAPRYDTHC GKSYDSHKPA PEAGTWFQEY FVNLVKNANP PL
SEQ ID NO: 193 node #74 QACASVYAQC GGQGWTGATT CVSGSTCWL NPYYSQCLPG SATTTTSTTT
PTTTTPSTTT TSTTTTATTT TPPPATTTTS APGGASTTAT ASGPFSGYQL YANPYYASEV
STLAIPSLTD GAMATKAAAV AKVPTFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGQ
FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRAQ LKTYSDVRII LVIEPDSLAN
LVTNMNVAKC ANAQAAYLEG INYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAQ
IYKNAGSPAA VRGLATNVAN YNAWSITTCP SYTQGDSNYD EKLYINALAP LLTQQGWPDA
HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWIK PGGESDGTSD
TTAARYDYHC GLSYDALKPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 194 node #75 QACASVWGQC GGQGWTGATS CVSGSTCWL NPYYSQCLPG SATTTTSTTT
PTTTTSSTTT TSTTTTSTTT TPPPATTTTS APGGASTTAT ASGPFSGYQL YANPYYASEV STLAIPSLTD GAMATKAAAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGQ FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRAQ LVTYSDVRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAN VYKNAGSPAA LRGLATNVAN YNAWSITTCP SYTQGDSNYD EKLYINALAP LLTQQGWPDA HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGESDGTSD TTAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 195 node #76 QACQSVWGQC GGQGWTGATS CVAGSTCSTL NPYYAQCLPA TATTTTSTTT PTTTTSSTTT TSTTTTSTTT TPTTTTTTTS APSGATTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKATAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGI FVVYDLPDRD CAAAASNGEY S IAN GVANY KAYI DS IRAQ LVTYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNYD EKLYINALAP LLTQQGWPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGESDGTSD TSAARYDYHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 196 node #77 QACQSVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT
PTTTTSSTTT TSTTTTSTTT TPTTTTTTTS APSGATTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKATAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI FVVYDLPDRD CAAAASNGEY S IANNGVANY KAYIDSIRAQ LVTYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKEQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGESDGTSD TSAARYDYHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 197 node #78 QACQTVWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT
TTTTTSSTTT TSTTTTSTTT TPTTTTTTTS APSGATTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKATAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI FVVYDLPDRD CAALASNGEY S IANNGVANY KAYIDSIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKEQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALQDAFVWVK PGGESDGTSD TSAARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 198 node #79 QACQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT
TTTTTSSTTT TSATTTSTTT TPTTTTTTTS APSGATTTAT ASGPFSGYQL YANPYYSSEV
HSLAIPSLTD GSLAPAATAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI
FVVYDLPDRD CAALASNGEY S IAN GVANY KAYIDSIRAI LVKYSDVHTI LVIEPDSLAN
LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS
VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSVCD EKRYINALAP LLKEQGFPDA
HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD
TSAARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 199 node #80 QACQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTSTTT TTTTSSSTTT TSAATTTTTT TPTSTTTTTS APSSATTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAPAATAA AKVPSFVWLD TAAKVPTMGT YLADIRSQNA AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGVEHY KAYI DS IREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGNSVCD EKQYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD TSAARYDAHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 200 node #81 QACQTLWGQC GGQGYTGATS CVAGAACSTL NPYYAQCTPA TASTTTSTTT
TTTTSSSTTT TSAATTTTTT TATSTPSTTS APSSPTTTAT ASGPFSGYQL YVNPYYSSEV
QSLAIPSLTD GSLAAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ
FVVYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN
LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKQYINALAP LLKAQGFPDA
HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD
TSAARYDAHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 201 node #82 QACQTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA TASTTTSTTT TTTTSSSTTT TTAAVTTTTT TATSTPSASA SPSSPTTTAS ASGPFSGYQL YVNPYYSSEV ASLAI PSLTD GSLQAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLQAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP AF
SEQ ID NO: 202 node #83 QACQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCLPG TATTTTTTTT
TTTTTTSTTT TSTTTTSTTT TPTTTTTTTS APSGATITAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI
FVVYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN
LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS
VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKEQGFPDA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALQDAFVWVK PGGESDGTSD
TSSARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 203 node #84 QACQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT
TTTTTTSTTT TSTTTTSTTT TASTGTTTTS APSGATITAT PSGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI
FVVYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN
LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFAS
VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKEQGFPDA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALQDAFVWVK PGGESDGTSD
TSSARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 204 node #85 QACQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT
TTTTTTSTTT TTTTTTSTTT SAGSGTTTTS APSGATITAS PSGPFSGYQL YANPYYSSEV
HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI
FVVYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN
LVTNLNVAKC SGAQDAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAELFAS
VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDQNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTDTGD ALQDAFVWVK PGGESDGTSD TSSARYDAHC GYSYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 205 node #86 QACQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG AATATTTTTT
TSTTTTSTTT TSTTTTQTTT KPTTTGPTTS APSGATITVT ASGPFSGYQL YANPYYSSEV HTLAMPSLPD SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY S IAN GVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VY DAGSPAA LRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGFPDA HFIMDTSRNG VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYSYDALQPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 206 node #87 QACQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL
VTTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV
HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIQAANK AGASPPIAGI
FVVYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN
MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWPAN LSPAAQLFAT
VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTQNGWPNA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN
SSATRYDYHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SL
SEQ ID NO: 207 node #88 QACQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL
VKTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV
HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI
FVVYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN
MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT
VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA
HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN
SSATRYDYHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP SL
SEQ ID NO: 208 node #89 QACQSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATTATSTTL VKTTSSTSVG TSTATTSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FVVYDLPDRD CAAAASNGEY TVAN GVA Y KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC GYSYDALQPA PEAGTWFQAY FVQLLTNANP AL
SEQ ID NO: 209 node #90 QACASVWGQC GGQGWTGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR
PTTTTSSTTT SSTTTSSTST TPPPATTTTS APGGASGTAT YSGPFSGVQL
WANSYYASEV STLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVPTMSS
TLADIRAANK AGANPPYAGQ FVVYDLPDRD CAAAASNGEY SIADGGVAKY
KAYIDSIRAQ LVTYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC
INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA
LRGLATNVAN YNAWNITSAP SYTQGDSNYD EKLYIQALSP LLTQQGWPDA
HFITDQSRSG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK
PGGECDGTSD TTAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 210 node #91 QACASVWGQC GGQGWSGATC CASGSTCWQ NDYYSQCLPG SATTTTSSTR STTTTSSTTT SSTTTSSTST TPPPGTTTTS APGGASGTAT YSGPFSGVNL WANSYYASEV STLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVPTMSS TLADIRAANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRAI LVTYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGNSNYD EKLYIQALSP LLTQQGWPNA HFITDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 211 node #92 QACASVWGQC GGQGWSGATC CASGSTCWQ NDYYSQCLPG SATTTTSSTR STTTTSSTTT SSTTTSTTST TTPPGTTTTS APGGASGTAT YTGPFSGVNL WANSYYRSEV STLAI PSLTD GAMATKAAAV AKVPSFQWLD TAAKVPTMSG TLADIRAANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRAI LVTYSDIRII LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAK IYKDAGKPAA LRGLATNVAN YNAWNITSAP SYTQPNPNYD EKHYIEAFSP LLTQEGWPNA HFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 212 node #93 QACGSVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR
STTTTSSTTS SSTTTSTTST TTPPGTTTTS APGGASSTAS YTGPFSGVNL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVDTMSG TLADIRAANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRKL LVTYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAELFAK IYKDAGKPAA LRGLATNVAN YNAWNITSAP SYTSPNPNYD EKHYIEAFSP LLTAEGWPNA HFIVDQGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 213 node #94 QACGSQWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR
STTTTSSTSS STTTTSTTST TTPPGTTTTS APGGAGTTAS FTGPFSGVNL WANNYYSSEV
HTLAIPSLTD GAMATKAAAV AKVPSFQWLD IAAKVDTMPG TLADIRAANK AGGNPPYAAQ
FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRKQ LVTYSDIRTI LVIEPDSLAN
MVTNMNVPKC ANAQAAYREC TIYAIKQLNL PNVAMYLDGG HAGWLGWPAN LQPAADLFGK
LYADAGKPSQ LRGMATNVAN YNAWNLTSAP SYTSPNPNYD EKHYIEAFSP LLAAKGWPNA
HFIVDQGRSG KQPTGQQEWG HWCNAMGTGF GMRPSANTGH ELVDAFVWIK PGGECDGTSD
TTAARFDHFC GMSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 214 node #95 QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDYYSQCLPG SSTTTTSSTR
STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV
HTLAIPSMTD GAMATKAAAV AKVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ
FVVYDLPDRD CAAAASNGEF SIANGGVANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN
MVTNMNVAKC ANAASAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK
IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFPPA
HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 215 node #96 QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPG SSTTTTSSTR STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSMTD GAMATKAAAV AEVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ FVVYDLPDRD CAAAASNGEF SIANGGAANY KAYI DAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFPPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 216 node #97 QNCGSVWGQC GGIGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTSSTT
STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGQ FVVYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFPPA HFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 217 node #98 QNCGSVWGQC GGNGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTPSST STTTSSSSRS TSTSTSTTST TTPPVATTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWMD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FVVYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC AGAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYVEAFSP LLTAAGFPPA HFITDTGRSG KQPTGQLEWG HWCNAIGTGF GQRPTANTGH DLVDAFVWIK PGGECDGTSD TTAARYDHHC GLAYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 218 node #99 QNCGSVWGQC GGIGYNGPTC CQSGSTCVKQ NDWYSQCLPG SSTTTTSSTT
STTTTSSTTS SSTSTSTTST TTAPAPTTTT IPGGASSTAS YNGPFSGVQL WANNYYRSEV
HTLAIPSLTD PALAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGQ
FVVYDLPDRD CAAAASNGEW SIANNGANNY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN
MVTNMNVAKC SNAASTYKEL TIYALKQLNL PHVAMYMDAG HAGWLGWPAN IQPAAELFAK
IYKDAGRPAA VRGLATNVAN YNAWSISSPP SYTSPNPNYD EKHYIEAFSP LLTAQGFPPA
QFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD
TTAARYDYHC GLSYDALKPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 219 node #100 QNCGSVWSQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPN SSTTTSTSTR
STTTSSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV
HTLAIPSMTD GAMATKAAAV AEVPSFQWLD RNVTIDTMAQ TLSQIRAANK AGANPPYAGQ
FVVYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL IIQYSDIRII LVIEPDSLAN
MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAE
IYKDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLTAAGFPPA
HFIVDTGRNG KQPTGQQQWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSYDALQPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 220 node #101 QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR
STSTSSSTTR SGSSTSSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL
FANDYYRSEV HNLAIPSMTD GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ
TLSQVRALNK AGANPPYAAQ LVVYDLPDRD CAAAASNGEF SIANGGAANY
RSYIDAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC SNAASTYHEL
TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA
VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGFPPA
RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK
PGGESDGTSD TSAARYDYHC GLSYDALQPA PEAGQWFQAY FEQLLTNANP PF
SEQ ID NO: 221 node #102 QNCGSAWSQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK
VSSTTSKVTS SSAAQPITTT TAPSVPTTTT IAGGASSTAS FTGPFLGVQG WANSYYSSEI
YNHAIPSMTD GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH
FVVYDLPDRD CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN
LVTNTNVAKC ANAASAYKEC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD
IYKAAGKPKS VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGFPPA
QFIVDTGRSG KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD
TSATRYDYHC GLSYDALKPA PEAGQWFQAY FEQLLKNANP AF
SEQ ID NO: 222 node #103 QACASVWGQC GGQGWSGATC CASGSTCWQ NDFYSQCLPG SATTATSSTR STTTTSVTTT SSTTTATTST STPPGTTVTS APGGPSGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATAAAKV AKVPSFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAQLFAK
IYKDAGKPAA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFSP LLTQEGWPNA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLAYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 223 node #104 QACASQWGQC GGQGWSGPTC CASGSTCWQ NAFYSQCLPG SATTATSSTR STTTTSVTST SSTTTATTSV STPPGTTVTS PPGGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FVVYNLPDRD CAAAASNGEL SIADGGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAVKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAK
IYKDAGKPKA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFAP LLTQEGWPDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLAYDALKPA PEAGTWFQAY FEQLLTNANP SF
SEQ ID NO: 224 node #105 QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR
PATTSSVRST TVVNPPTTTV APPPGTTVAP PPAPPPGGAT YTGPFAGVNQ WANAYYRSEV SSLAVPSLSD GPLATAAAKV ADVPTFQWMD TTAKVPLIDG ALADIRRANA AGGNPPYAGI FVVYNLPDRD CAAAASNGEL SIANDGINKY KAYIDSIRAV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC SNAAAAYKEC TKYAVQQLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGKPKS
LRGLATNVSN YNAWNASSPA PYTSPNPNYD EKHYVDAFAP LLRQNGWPDA KFIIDQGRSG KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK PGGEADGTSD TTAVRYDHFC GSAYSSMKPA PEAGTWFQAY FEQLLRNANP SF
SEQ ID NO: 225 node #106 QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV
RTSSTPWSP SRTSTVTGSV STTSAGTGTT PPGGPTGGAT YTGPFVGVNL WANSYYASEI
STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE
FVVYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK
VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWPDA
KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC GKAYDALKPA PEAGTWFQAY FEQLLINANP AF
SEQ ID NO: 226 node #107 QSCSNVWSQC GGQNWSGTPC CTSGNKCVKL NDFYSQCQPG SATSATSSTT
SATTTSVTTT ATKTTATTSS STTSGTSVTS APGGPSGPPA ATDPFSGVDL WANNYYRSEV STLAIPKLSD GAMATAAAKV ADVPSFQWMD TYDHISLMEE TLADIRKANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SLDKDGANKY KAYIAKIKGI LQNYSDTRII LVIEPDSLAN MVTNMNVPKC ANAESAYKEL TIYAIKELNL PNVSMYLDAG HGGWLGWPAN LPPAAQLYAQ IYKDAGKPSR LRGLVTNVSN YNAWKLSSKP DYTESNPNYD EQRYINAFSP LLAQEGWPNA KFIVDQGRSG KQPTGQKAWG DWCNAPGTGF GLRPSANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GLDYDALKPA PEAGTWFQAY FEQLLKNANP SF
SEQ ID NO: 227 node #108 QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTTTSSTR
SSTTTSSTRA SSTTTSSSST TPPPGSTTTS APPVGSGTAT YSGPFSGVNP WANSYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLAKTPLMSS TLADIRAANK AGGNPPYAGQ FVVYDLPDRD CAAAASNGEY SIADGGVAKY KNYIDTIRAI LVTYSDIRTI LVIEPDSLAN LVTNLSVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSAP SYTQGNSVYN EKLYIHAISP LLTQQGWSNA YFITDQGRSG KQPTGQQAWG DWCNVIGTGF GIRPSANTGD SLLDAFVWVK PGGECDGTSD TSAARYDYHC GLSYDALQPA PEAGTWFQAY FVQLLTNANP SF
SEQ ID NO: 228 node #109 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FVVYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTI LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 229 node #110 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FVVYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 230 node #111 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FVVYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 231 node #112 QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR
ASSTTARARA SSTTSSRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP
WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ
TLADIRTANK NGGNPPYAGQ FVVYDLPDRD CAALASNGEY SIADGGVDKY
KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC
INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA
FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK
PGGECDGTSD SSAPRFDSHC ALPYDALQPA PQAGAWFQAY FVQLLTNANP SF
SEQ ID NO: 232 node #113 QDCRPGYAAC GGSEWTDDNS CTGGGWCGVA NPTYSGCWAG SPTSSTNTDV TYTDSDNVGR WGVENPVPTS SQPPTPTTSS APGGGGSPLA VSGPFEGADL YLNPYYVEEV DALAIAQVTD SALKAKAEKV KQIPTAIWLD TIENIQQLPA YLDDALALQQ ESGKKPVLVV FVVYDLPNRD CHAEASNGEL HLDDGGLQRY KEYIDPIEQK LKKYSGQRIV LIIEPDSLAN LVTNLNTAKC RDTEASYKEG VAYAIKKFGM PHVYMYLDAG HAGWLGWNDN QEKAAKVFAE VIKSAGSPGK VRGFTTNVAN YTPLSDTARG PDTQGNPCFD EFRYLDAMAS ALRAEGISDV HFIIDTSRNG VKNIDRKHSG NWCNNKGAGL GARPKANSGA TNLDAFVWIK PPGESDGTSD ESAPRYDPYC GKEYDAMKPA PEAGQWFHEY FVQLLKNANP PL
SEQ ID NO: 233 node #114 LDCRPGYPCC SGSEYTDDDG VENGNWCGIA DPTYESCWAE SPCSSTNTEV EYTDSDNVGK WGVENPVPTG TQTPTPTSGS DPGTPGSPLT ISGPFKGVEF YLNPYYVDEV DALAIAQMTD SSLKAKAEKM KTFSNAIWLD TIKNMQQLET NLQGALAQHQ TSGKKPVLTV FVVYDLPGRD CHALASNGEL LANDGDLQRY KTYIDVIEEK LKKYNSQPVV LIIEPDSLAN LVTNLNTPAC RDSEQYYLEG HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAAKVYAK VIKSSGSPGK VRGFTDNVAN YTPWEDPSRG PDTEWNPCPD EKRYLEAMHK DFKAAGISSV YFVSDTSRNG HKNVDRKHPG EWCNQTGVGI GARPKANSGM DYLDAFYWIK PLGESDGTSD ESAARYDGYC GHEYTAMKPA PEAGQWFQKH FEQGLKNANP PL
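
The listing above gives each entry as "SEQ ID NO: <number> node #<number>" followed by residues in space-separated blocks of ten, sometimes wrapped over several lines. The following is a minimal sketch of how such text could be read into contiguous sequences for downstream use. The function name parse_listing and the regular expression are illustrative assumptions, not part of the application, and the sketch does not repair any residue that was garbled during text extraction.

```python
import re

# Hypothetical helper: matches headers of the form "SEQ ID NO: 220 node #101".
ENTRY_RE = re.compile(r"SEQ ID NO:\s*(\d+)\s+node\s+#(\d+)\s+")


def parse_listing(text):
    """Return {seq_id: (node_number, residues)} for every header found in text.

    Whitespace (block separators and line wraps) inside the residue run is
    dropped, so each sequence comes back as one contiguous string.
    """
    entries = {}
    matches = list(ENTRY_RE.finditer(text))
    for i, m in enumerate(matches):
        # The residues of one entry run until the next header or the end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        residues = "".join(text[m.end():end].split())
        entries[int(m.group(1))] = (int(m.group(2)), residues)
    return entries
```

Because headers are located by pattern rather than by line start, the sketch also handles an entry whose header follows the previous sequence on the same line.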

Claims

What is claimed:
1. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 2.
2. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 3.
3. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 10.
4. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 11.
5. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 1, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124, SEQ ID NO: 125, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ ID NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, SEQ ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, SEQ ID NO: 155, SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 165, SEQ ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 170, SEQ ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 173, SEQ ID NO: 174, SEQ ID NO: 175, SEQ ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 179, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194, SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 210, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214, SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 218, SEQ ID NO: 219, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, SEQ ID NO: 232, or SEQ ID NO: 233.
6. The isolated polypeptide of claim 1, 2, 3, 4, or 5, wherein the signal peptide is removed.
7. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 31.
8. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 32.
9. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 39.
10. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 40.
11. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, or SEQ ID NO: 59.
12. A nucleic acid encoding a polypeptide of claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11.
13. A recombinant microorganism, wherein said microorganism expresses a nucleic acid of claim 12.
14. A recombinant microorganism, wherein said microorganism expresses a nucleic acid encoding a polypeptide of claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or a combination thereof.
15. The recombinant microorganism of claim 13 or 14, wherein the microorganism is a fungus.
16. The recombinant microorganism of claim 15, wherein the microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes.
17. The recombinant microorganism of claim 15, wherein the microorganism is a yeast.
18. The recombinant microorganism of claim 13 or 14, wherein the microorganism is a bacterium.
19. The recombinant microorganism of claim 17, wherein the microorganism is Saccharomyces cerevisiae.
20. The recombinant microorganism of claim 15, wherein the microorganism is selected from the group consisting of Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
21. The recombinant microorganism of claim 18, wherein the microorganism is selected from the group consisting of E. coli sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., and Streptomyces sp.
22. A method for the production of cellulosic ethanol, comprising adding an isolated polypeptide of claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or a combination thereof, to a source material of cellulose for cellulose processing.
23. A method for the production of cellulosic ethanol, comprising adding a recombinant microorganism of claim 13, 14, 15, 16, 17, 18, or a combination thereof, to a source material of cellulose for cellulose processing.
24. A method for cellulose processing, comprising adding a polypeptide of claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or a combination thereof, to a source material of cellulose.
25. A method for cellulose processing, comprising adding a recombinant microorganism of claim 13, 14, 15, 16, 17, 18, or a combination thereof, to a source material of cellulose.
26. The method of claim 22 and 24, further comprising adding a recombinant microorganism of claim 13, 14, 15, 16, 17, 18, or a combination thereof.
27. The method of claim 23 and 25, further comprising adding a polypeptide of claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or a combination thereof.
28. The method of claim 26 and 27, wherein the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
29. The method of claim 26 and 27, wherein the isolated polypeptide and recombinant microorganism are added simultaneously.
30. The method of claim 22, 23, 24, or 25, wherein carbohydrate polymers are depolymerized.
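
Claims 1 to 5 and 7 to 11 recite amino acid sequences "about 90% identical" to a listed sequence. The claims as reproduced here do not state how identity is to be computed, so the following is only an illustrative sketch under stated assumptions: a global (Needleman-Wunsch) alignment with a flat match/mismatch/gap score, and identity taken as identical columns divided by total alignment columns. The function name percent_identity and the scoring values are hypothetical; practical comparisons typically use substitution matrices and affine gap penalties, which can give different figures.

```python
def percent_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of two sequences over one optimal global alignment.

    Assumed definition: identical aligned columns / all alignment columns * 100.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal path, counting columns and identical columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 100.0
```

With these assumptions, a ten-residue sequence that differs from its reference by one substitution or one terminal deletion scores 90% identity. The dynamic-programming table is quadratic in sequence length, which is acceptable for sequences of the size listed here.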
PCT/US2013/039306 2012-05-02 2013-05-02 Biofuel production enzymes and uses thereof WO2013166312A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201261641786P 2012-05-02 2012-05-02
US61/641,786 2012-05-02
US201261651199P 2012-05-24 2012-05-24
US61/651,199 2012-05-24

Publications (1)

Publication Number Publication Date
WO2013166312A1 (en) 2013-11-07

Family

ID=49514893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/039306 WO2013166312A1 (en) 2012-05-02 2013-05-02 Biofuel production enzymes and uses thereof

Country Status (1)

Country Link
WO (1) WO2013166312A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015042543A3 (en) * 2013-09-20 2015-05-14 The Trustees Of Columbia University In The City Of New York Biofuel production enzymes and uses thereof
WO2017151957A1 (en) * 2016-03-02 2017-09-08 Novozymes A/S Cellobiohydrolase variants and polynucleotides encoding same
US11286508B2 (en) * 2017-12-19 2022-03-29 Universidad Del País Vasco Ancestral cellulases and uses thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6127160A (en) * 1996-03-14 2000-10-03 Japan As Represented By Director General Of Agency Of Industrial Science And Technology Protein having cellulase activities and process for producing the same
US20030082595A1 (en) * 2001-08-03 2003-05-01 Bo Jiang Nucleic acids of aspergillus fumigatus encoding industrial enzymes and methods of use
WO2006074005A2 (en) * 2004-12-30 2006-07-13 Genencor International, Inc. Variant hypocrea jecorina cbh2 cellulases
US20060218671A1 (en) * 2005-01-06 2006-09-28 Novozymes, Inc. Polypeptides having cellobiohydrolase activity and polynucleotides encoding same
US20090325240A1 (en) * 2008-02-29 2009-12-31 Henry Daniell Production and use of plant degrading materials
US20110159544A1 (en) * 2009-12-30 2011-06-30 Roal Oy Method for treating cellulosic material and CBHII/CEL6A enzymes useful therein

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6127160A (en) * 1996-03-14 2000-10-03 Japan As Represented By Director General Of Agency Of Industrial Science And Technology Protein having cellulase activities and process for producing the same
US20030082595A1 (en) * 2001-08-03 2003-05-01 Bo Jiang Nucleic acids of aspergillus fumigatus encoding industrial enzymes and methods of use
WO2006074005A2 (en) * 2004-12-30 2006-07-13 Genencor International, Inc. Variant hypocrea jecorina cbh2 cellulases
US20060205042A1 (en) * 2004-12-30 2006-09-14 Wolfgang Aehle Novel variant hypocrea jecorina CBH2 cellulases
US20060218671A1 (en) * 2005-01-06 2006-09-28 Novozymes, Inc. Polypeptides having cellobiohydrolase activity and polynucleotides encoding same
US20090325240A1 (en) * 2008-02-29 2009-12-31 Henry Daniell Production and use of plant degrading materials
US20110159544A1 (en) * 2009-12-30 2011-06-30 Roal Oy Method for treating cellulosic material and CBHII/CEL6A enzymes useful therein

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015042543A3 (en) * 2013-09-20 2015-05-14 The Trustees Of Columbia University In The City Of New York Biofuel production enzymes and uses thereof
WO2017151957A1 (en) * 2016-03-02 2017-09-08 Novozymes A/S Cellobiohydrolase variants and polynucleotides encoding same
CN109415712A (en) * 2016-03-02 2019-03-01 诺维信公司 Cellobiohydrolase variant and the polynucleotides for encoding them
US10738293B2 (en) 2016-03-02 2020-08-11 Novozymes A/S Cellobiohydrolase variants and polynucleotides encoding same
US11053489B2 (en) 2016-03-02 2021-07-06 Novozymes A/S Cellobiohydrolase variants and polynucleotides encoding same
US11286508B2 (en) * 2017-12-19 2022-03-29 Universidad Del País Vasco Ancestral cellulases and uses thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13784295; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13784295; Country of ref document: EP; Kind code of ref document: A1)
