WO2012033457A1

WO2012033457A1 - Polypeptides

Info

Publication number: WO2012033457A1
Application number: PCT/SE2011/051086
Authority: WO
Inventors: Marie-Francoise Gorwa-Grauslund; Nadia Skorupa Parachin
Original assignee: C5 Ligno Technologies In Lund Ab
Priority date: 2010-09-10
Filing date: 2011-09-08
Publication date: 2012-03-15

Abstract

The invention relates to two polypeptides having xylose reductase (XR) activity, genes encoding said polypeptides, hosts harbouring said genes as well as the use of said genes encoding said polypeptides for the construction of microorganisms for xylose and/or arabinose utilisation.

Description

POLYPEPTIDES

FIELD OF INVENTION

The invention relates to two polypeptides having xylose reductase (XR) activity, genes encoding said polypeptides, hosts harbouring said genes as well as the use of said genes encoding said polypeptides for the construction of microorganisms for xylose and/or arabinose utilisation.

BACKGROUND OF INVENTION

Lignocellulosic biomass is the most abundant renewable resource on earth and the least costly feedstock available. Its utilization avoids arbitrary land-usage and additional CO₂ emission, thus being sustainable. Lignocellulose is composed of cellulose (35-50%), hemicellulose (20-35%) and lignin (15-25%), with the percentage of each component varying according to the raw material. The cellulose fraction is composed of the hexose sugar glucose whereas the hemicellulose fraction is composed of both hexose and pentose sugars. Xylose and arabinose are the most common pentose sugars in lignocellulosic raw material, which makes them an excellent substrate for the production of fuels, bulk and platform chemicals.

Baker's yeast Saccharomyces cerevisiae is a GRAS (generally recognized as safe) microorganism that has been used for baking and brewing throughout recorded human history. It is currently used as an industrial production microorganism in large scale facilities up to 6000 m³. However, S. cerevisiae cannot naturally ferment xylose and arabinose. To create pentose consuming S. cerevisiae strains these have to be engineered with one or several of the initial xylose and arabinose catabolic pathways (Fig. 1A-D) (Hahn-Hagerdal B, Karhumaa K, Jeppsson M, Gorwa-Grauslund MF (2007) Metabolic engineering for pentose utilization in Saccharomyces cerevisiae. Adv Biochem Eng Biotechnol 108: 147-77).

There are two catabolic pathways that can be expressed in S. cerevisiae to enable xylose conversion to xylulose. One is an isomerization pathway (Fig. 1A), most frequently found in bacteria but also present in some fungi. In this pathway xylose is directly converted to xylulose by xylose isomerase (XI), EC 5.3.1.5. The other is a reductive/oxidative pathway (Fig. 1B), consisting of xylose reductase (XR), EC 1.1.1.21, and xylitol dehydrogenase (XDH), EC 1.1.1.175, most frequently from Pichia stipitis, which convert xylose to xylitol and xylitol to xylulose, respectively. In both pathways xylulose enters the pentose phosphate pathway (PPP) by its conversion to xylulose-5-phosphate by xylulokinase (XK), EC 2.7.1.17. As for arabinose consumption, two metabolic pathways can also be introduced in S. cerevisiae. The first one is most frequently present in bacteria and consists of L-arabinose isomerase (AI), EC 5.3.1.4, L-ribulokinase (RK), EC 2.7.1.16, and L-ribulose-5-phosphate 4-epimerase (RPE), EC 5.1.3.4 (Fig. 1C). For the second one, most frequently present in fungi, an aldose reductase (AR, EC 1.1.1.21) reduces L-arabinose to L-arabitol. Then L-arabitol is oxidised by L-arabitol dehydrogenase (LAD, EC 1.1.1.12) to L-xylulose, which is reduced to xylitol by L-xylulose reductase (LXR, EC 1.1.1.10). In the second pathway, L-arabinose converges with the D-xylose pathway at the level of the chiral compound xylitol (Fig. 1D).

Xylose reductase from P. stipitis is able to reduce xylose and arabinose to xylitol and arabitol, respectively. This has been considered a drawback when the xylose reductive/oxidative pathway is combined with the arabinose isomerization pathway because arabinose isomerase is inhibited by the alcohols formed during the reduction reaction. Thus a xylose reductase with minimized activity towards arabinose is advantageous for the combination of xylose and arabinose catabolic pathways.

Both XR-XDH and XI encoding genes have been heterologously expressed in S. cerevisiae. Expression of XR and XDH encoding genes from P. stipitis resulted in higher specific ethanol productivity than when expressing the XI encoding gene, as a result of a significantly higher xylose consumption rate (Karhumaa K, Garcia Sanchez R, Hahn-Hagerdal B, Gorwa-Grauslund MF (2007). Comparison of the xylose reductase-xylitol dehydrogenase and the xylose isomerase pathways for xylose fermentation by recombinant Saccharomyces cerevisiae. Microbial Cell Factories 6:5). However, the reductive/oxidative catabolic pathway also resulted in xylitol formation as a by-product, probably due to a cofactor imbalance between XR and XDH. In contrast, only 4% of the consumed xylose was converted to xylitol using the isomerization pathway. However, heterologous overexpression of XI genes in S. cerevisiae is difficult, resulting in low activity and low sugar uptake rates in most cases. Thus pentose consumption in recombinant strains of S. cerevisiae could benefit from the isolation of novel enzymes within the xylose and arabinose catabolic pathway(s).

Soil is one of the most complex environments on earth and is composed of an immense diversity of microorganisms. For instance, it has been reported that one gram of soil harbors approximately ten billion bacteria from 4000-7000 different species. In order to fully access soil diversity and avoid cultivation steps, its entire genetic material can be extracted and cloned into suitable vectors. This method has been named metagenomics and has emerged as a powerful tool for isolation of novel and previously unknown biocatalysts (Handelsman, J. 2004. Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev 68, 669-85). Metagenomes have so far been used for isolation of industrially important hydrolytic enzymes such as amylases, cellulases and xylanases. However, metagenomes have never been used to screen for efficient biocatalysts involved in the conversion of xylose or arabinose into products such as fuels, bulk chemicals and platform chemicals.

SUMMARY OF THE INVENTION

The invention relates to the isolation of polypeptides having XR activity, wherein said polypeptides have the ability to reduce a number of compounds. The invention also relates to the use of these polypeptides to improve xylose and/or arabinose conversion in host cells as well as host cells harbouring one or both of the mentioned enzymes/polypeptides.

In a first aspect the invention relates to an isolated polypeptide having xylose reductase (XR) activity but that has minimized/negligible activity towards arabinose, wherein said polypeptide has at least 60 % identity to the amino acid sequences shown in SEQ ID NO:2 or 4, respectively, and is able to reduce xylose to xylitol. The new XRs have a higher affinity for xylose compared to Pichia stipitis XR and/or higher affinity for the cofactor NAD(P)H.

In a second aspect the invention relates to an isolated host cell selected from the group consisting of bacteria, yeast and filamentous fungi and comprising at least one nucleotide sequence selected from the group consisting of SEQ ID NO: 1 or 3 or a nucleotide sequence having at least 60 % identity to the nucleotide sequence shown in SEQ ID NO: 1 or 3.

The polypeptides defined above may be used alone or in combination with other polypeptides in several processes for the production of fuels, bulk chemicals and platform chemicals from lignocellulosic material, such as lignocellulosic feedstock, wherein there is a need for xylose and arabinose conversion. Examples of biofuels, bulk and platform chemicals include ethanol, butanol, isobutanol, lactate, 1,4-diacids (succinate, fumaric, malic), glycerol, sorbitol, mannitol, sugar alcohols (such as xylitol and arabi(ni)tol), L-ascorbic acid, xylitol, hydrogen gas, 2,5-furan dicarboxylic acid, 3-hydroxypropionic acid, aspartic acid, glutaric acid, glutamic acid, itaconic acid, levulinic acid, and 3-hydroxybutyrolactone, fatty acids, fatty-derived molecules, isoprenoids, isoprenoid-derived molecules, alkanes, isopentanol and isoamylacetate.

The different aspects mentioned above create new strains that solve a number of technical problems. In particular, the polypeptides enable a more efficient utilisation of xylose and/or arabinose, which results in higher yield (gram product per gram substrate) and/or productivity of the specific product. This is achieved through reduced by-product formation, such as arabitol formation.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. The xylose and arabinose oxidative/reductive (B and D) and isomerization (A and C) utilization pathways.

Figure 2. Domains identified in the amino acid sequences of the xylose reductases encoded by the P. stipitis XYL1 (A), metAR1 (B) and metAR2 (C) genes. Conserved domains of each amino acid sequence were obtained by blastp search (http://blast.ncbi.nlm.nih.gov/Blast). The approximate lengths of the amino acid sequences are indicated above the sequences.

Abbreviations: ADH - alcohol dehydrogenase, NADB - NAD binding domain.

Figure 3. Aerobic growth of E. coli recombinant strains overexpressing xylose reductases isolated from the metagenome library: metAR1 (■) and metAR2 (▲) with positive (♦) control in SM3 media supplemented with xylose (30 g/l).

Figure 4. Ratio of arabinose and xylose reductase specific activity in S. cerevisiae strains overexpressing the XR(K270R) gene from P. stipitis, the metAR1 gene and the metAR2 gene.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

In the context of the present application and invention the following definitions apply:

The terms "nucleic acid" and "polynucleotide" refer to RNA or DNA that is linear or branched, single or double stranded, or a hybrid thereof. The terms also encompass RNA/DNA hybrids. These terms also encompass untranslated sequence located at both the 3' and 5' ends of the coding region of the gene: at least about 1000 nucleotides of sequence upstream from the 5' end of the coding region and at least about 200 nucleotides of sequence downstream from the 3' end of the coding region of the gene. Less common bases, such as inosine, 5-methylcytosine, 6-methyladenine, hypoxanthine, and others can also be used for antisense, dsRNA, and ribozyme pairing. For example, polynucleotides that contain C-5 propyne analogues of uridine and cytidine have been shown to bind RNA with high affinity and to be potent antisense inhibitors of gene expression. Other modifications, such as modification to the phosphodiester backbone, or the 2'-hydroxy in the ribose sugar group of the RNA can also be made. The antisense polynucleotides and ribozymes can consist entirely of ribonucleotides, or can contain mixed ribonucleotides and deoxyribonucleotides. The polynucleotides of the invention may be produced by any means, including genomic preparations, cDNA preparations, in vitro synthesis, RT-PCR, and in vitro or in vivo transcription.

The term "vector" as used herein refers to a nucleic acid molecule, either single- or double-stranded, which is isolated or constructed using naturally occurring DNA sequences or which is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature. The term vector is synonymous with the term "expression cassette" when the vector contains the control sequences required for expression of a coding sequence of the present invention. The term vector also includes the term "integration nucleic acid construct" or "integration fragment" when the construct is to be used to integrate the construct/fragment into the genome of a host.

The term "control sequences" is defined herein to include all components which are necessary or advantageous for the expression of a polynucleotide encoding a polypeptide of the present invention. Each control sequence may be native or foreign to the nucleotide sequence encoding the polypeptide. Such control sequences include, but are not limited to, polyadenylation sequence, pro-peptide sequence, promoter, and transcription terminator. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the nucleotide sequence encoding a polypeptide.

The term "host cell", as used herein, includes any cell type which is susceptible to transformation, transfection, transduction, and the like with a nucleic acid.

In the present context, amino acid names and atom names are used as defined by the Protein DataBank (PDB) (www.pdb.org), which is based on the IUPAC nomenclature (IUPAC Nomenclature and Symbolism for Amino Acids and Peptides (residue names, atom names etc.), Eur J Biochem., 138, 9-37 (1984) together with their corrections in Eur J Biochem., 152, 1 (1985)). The term "amino acid" is intended to indicate an amino acid from the group consisting of alanine (Ala or A), cysteine (Cys or C), aspartic acid (Asp or D), glutamic acid (Glu or E), phenylalanine (Phe or F), glycine (Gly or G), histidine (His or H), isoleucine (Ile or I), lysine (Lys or K), leucine (Leu or L), methionine (Met or M), asparagine (Asn or N), proline (Pro or P), glutamine (Gln or Q), arginine (Arg or R), serine (Ser or S), threonine (Thr or T), valine (Val or V), tryptophan (Trp or W) and tyrosine (Tyr or Y), or derivatives thereof.

To determine the percent identity of two amino acid sequences or of two nucleic acids, the sequences are aligned for optimal comparison purposes. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., percent identity = number of identical positions / total number of positions (e.g., overlapping positions) × 100). In one embodiment, the two sequences are the same length. In another embodiment, the comparison is across the entirety of the reference sequence (e.g., across the entirety of one of SEQ ID NO: 1-4). The percent identity between two sequences can be determined using techniques similar to those described below, with or without allowing gaps. In calculating percent identity, typically exact matches are counted. The determination of percent identity between two sequences can be accomplished using a mathematical algorithm. A non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (1990) J. Mol. Biol. 215:403. BLAST nucleotide searches can be performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to delta-endotoxin-like nucleic acid molecules of the invention. BLAST protein searches can be performed with the BLASTX program, score=50, wordlength=3, to obtain amino acid sequences homologous to delta-endotoxin protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-Blast can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, and PSI-Blast programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used. Alignment may also be performed manually by inspection.
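The percent identity formula stated above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-"); the function name, the `count_gaps` switch and the toy sequences are hypothetical and are not taken from the application or from SEQ ID NO: 1-4.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = True) -> float:
    """Percent identity = identical positions / total positions x 100.

    Both inputs must be rows of the same pairwise alignment, so they
    must have equal length. Gaps are written as '-'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    identical = 0
    total = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both rows is ignored
        if not count_gaps and (x == "-" or y == "-"):
            continue  # denominator restricted to overlapping positions
        total += 1
        if x == y:  # only exact matches are counted
            identical += 1

    if total == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / total


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    row_a = "MKV-LA"
    row_b = "MKVALS"
    print(percent_identity(row_a, row_b))                    # gaps counted as positions
    print(percent_identity(row_a, row_b, count_gaps=False))  # overlapping positions only
    print(percent_identity(row_a, row_b) >= 60.0)            # compare against a 60 % threshold
```

The `count_gaps` switch reflects the statement that identity may be determined "with or without allowing gaps"; which denominator applies in a given comparison depends on the program and parameters actually used.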

Another non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the ClustalW algorithm (Higgins et al. (1994) Nucleic Acids Res. 22:4673-4680). ClustalW compares sequences and aligns the entirety of the amino acid or DNA sequence, and thus can provide data about the sequence conservation of the entire amino acid sequence. The ClustalW algorithm is used in several commercially available DNA/amino acid analysis software packages, such as the ALIGNX module of the Vector NTI Program Suite (Invitrogen Corporation, Carlsbad, Calif.). After alignment of amino acid sequences with ClustalW, the percent amino acid identity can be assessed. A non-limiting example of a software program useful for analysis of ClustalW alignments is GENEDOC™. GENEDOC™ (Karl Nicholas) allows assessment of amino acid (or DNA) similarity and identity between multiple proteins. Another non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Myers and Miller (1988) CABIOS 4:11-17. Such an algorithm is incorporated into the ALIGN program (version 2.0), which is part of the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys, Inc., 9685 Scranton Rd., San Diego, Calif., USA). When utilizing the ALIGN program for comparing amino acid sequences, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used.

Unless otherwise stated, GAP Version 10, which uses the algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48(3):443-453, will be used to determine sequence identity or similarity using the following parameters: % identity and % similarity for a nucleotide sequence using GAP Weight of 50 and Length Weight of 3, and the nwsgapdna.cmp scoring matrix; % identity or % similarity for an amino acid sequence using GAP weight of 8 and length weight of 2, and the BLOSUM62 scoring program. Equivalent programs may also be used. By "equivalent program" is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by GAP Version 10.
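As an illustration of the Needleman and Wunsch global alignment named above, the following Python sketch fills the dynamic-programming matrix and traces back one optimal alignment. It is a simplified teaching example, not the GAP program: it assumes a flat match/mismatch score and a linear gap penalty instead of the BLOSUM62 or nwsgapdna.cmp matrices and the GAP weight/length weight parameters, and all names and default values in it are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). The scoring values are
    illustrative assumptions, not the parameters used by GAP.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        # substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    row_a, row_b, total = needleman_wunsch("ACGT", "AGT")
    print(row_a)
    print(row_b)
    print(total)
```

The two aligned rows returned here have equal length, so identical positions can be counted column by column to obtain a percent identity. Values obtained this way will generally differ from those reported by GAP Version 10, because the scoring scheme is different.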

Xylose reductase

The invention relates to an isolated polypeptide having aldose reductase activity but that has negligible arabinose reduction ability, wherein said polypeptide has at least 60 % identity to the amino acid sequence shown in SEQ ID NO:2 or 4, such as 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99 % identity to the amino acid sequence shown in SEQ ID NO:2 or 4.

In another embodiment the invention relates to a nucleotide sequence having at least 60 % identity to the nucleotide sequence shown in SEQ ID NO:1 or 3, such as 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99 % identity to the nucleotide sequence shown in SEQ ID NO:1 or 3.

In a further embodiment the invention relates to a vector or vectors comprising the nucleotide sequence(s) as disclosed above or a host cell comprising the nucleotide sequence(s), such as bacteria, yeast as well as filamentous fungi or other host cells as defined later in the application under "isolated host cell".

In another embodiment the invention relates to the use of the polypeptide(s) having aldose reductase activity as disclosed above, the nucleotide sequence as disclosed above, the vector(s) as disclosed above or the host as disclosed above for the production of fuels, bulk chemicals and platform chemicals from lignocellulosic feedstock.

The nucleotide sequences

The nucleotide sequences disclosed above may be obtained by standard cloning procedures used in genetic engineering to relocate the DNA sequence from its natural location to a different site where it will be reproduced. The cloning procedures may involve excision and isolation of a desired DNA fragment comprising the DNA sequence encoding the polypeptide of interest, insertion of the fragment into a vector molecule, and incorporation of the recombinant vector into a host cell where multiple copies or clones of the DNA sequence will be replicated. An isolated DNA sequence may be manipulated in a variety of ways to provide for expression of the polypeptide of interest. Manipulation of the DNA sequence prior to its insertion into a vector may be desirable or necessary depending on the expression vector. The techniques for modifying DNA sequences utilizing recombinant DNA methods are well known in the art. The nucleotide sequence to be introduced into the DNA of the host cell may be integrated in vectors comprising the nucleotide sequence operably linked to one or more control sequences that direct the expression of the coding sequence in a suitable host cell under conditions compatible with the control sequences. A nucleotide sequence encoding a polypeptide may be manipulated in a variety of ways to provide for expression of the polypeptide.

The control sequence may be an appropriate promoter sequence, a nucleotide sequence which is recognized by a host cell for expression of the nucleotide sequence. The promoter sequence contains transcriptional control sequences, which mediate the expression of the polypeptide. The promoter may be any nucleotide sequence which shows transcriptional activity in the host cell of choice including native, mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell. The promoter may be a weak or a strong promoter that is constitutive or regulated in the host to be used.

Examples of suitable promoters for directing the transcription of the nucleic acid constructs of the present invention in bacteria are described in "Useful proteins from recombinant bacteria" in Scientific American, 1980, 242: 74-94; and in Sambrook J et al., 1989 Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, supra. The promoters may be obtained from Escherichia coli, Zymomonas sp. and Klebsiella sp. Other examples include lactic acid bacteria such as Lactobacillus sp. as well as Lactococcus sp. Suitable constitutive and inducible promoters are widely available and include promoters from the T5, T7, T3 and SP6 phages, and the trp, lpp, and lac operons.

Examples of suitable promoters for directing the transcription of the nucleic acid constructs of the present invention in a yeast host are promoters obtained for example from the genes for Saccharomyces cerevisiae enolase (ENO1), S. cerevisiae galactokinase (GAL1), S. cerevisiae alcohol dehydrogenase 2 (ADH2), S. cerevisiae glyceraldehyde-3-phosphate dehydrogenase (TDH1), S. cerevisiae glyceraldehyde-3-phosphate dehydrogenase (TDH3) (Bitter and Egan. Expression of heterologous genes in Saccharomyces cerevisiae from vectors utilizing the glyceraldehyde-3-phosphate dehydrogenase gene promoter. (1984) Gene 32: 263-274), S. cerevisiae alcohol dehydrogenase 1 (ADH1), S. cerevisiae 3-phosphoglycerate kinase (PGK1) or S. cerevisiae cytochrome C (CYC1) (Karhumaa et al. Investigation of limiting metabolic steps in the utilization of xylose by recombinant Saccharomyces cerevisiae using metabolic engineering. 2005 Yeast 22(5):359-68). Another example of a yeast promoter is the constitutive truncated HXT7 promoter (Hauf et al. Enzym Microb Technol (2000) 26:688-698). Other suitable vectors and promoters for use in yeast expression are further described in EP A-73,657 to Hitzeman, which is hereby incorporated by reference.

In a filamentous fungal host cell, promoters may be obtained from the genes for Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase, Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate isomerase, Aspergillus nidulans acetamidase, and Fusarium oxysporum trypsin-like protease (WO 96/00787), as well as the NA2-tpi promoter (a hybrid of the promoters from the genes for Aspergillus niger neutral alpha-amylase and Aspergillus oryzae triose phosphate isomerase), and mutant, truncated, and hybrid promoters thereof.

The polypeptides

The disclosed polypeptides may be partly or completely synthesized as long as the activity remains.

The polypeptides may also be hybrid polypeptides. The term "hybrid enzyme" or "hybrid polypeptide" is intended to mean for example that the polypeptides are linked to other amino acid residues or polypeptides as long as the activity remains or is increased. Additionally, the polypeptides may be exposed to mutations and/or substitutions as long as the mutations and/or substitutions increase the expression as well as the stability and activity of said polypeptides.

The vectors

The above disclosed vectors may comprise a DNA sequence encoding the polypeptide, a promoter, and transcriptional and translational stop signals as well as other DNA sequences. The vector comprises various DNA and control sequences known to a person skilled in the art, which may be joined together to produce a vector which may include one or more convenient restriction sites to allow for insertion or substitution of the DNA sequence encoding the polypeptide at such sites. Alternatively, the DNA sequence of the present invention may be expressed by inserting the DNA sequence or a DNA construct comprising the sequence into an appropriate vector for expression. In creating the vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression and possibly secretion.

The vector may be any vector (e.g., a plasmid, virus or an integration vector), which can be conveniently subjected to recombinant DNA procedures and can bring about the expression of the DNA sequence. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vectors may be linear or closed circular plasmids. The vector may contain any means for assuring self-replication.

Alternatively, the vector may be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. The vector system may be a single vector or plasmid or two or more vectors or plasmids, which together contain the total DNA to be introduced into the genome of the host cell, or a transposon. The vector may also be an integration vector comprising solely the gene or part of the gene to be integrated.

The vectors of the present invention may contain one or more selectable markers, which permit easy selection of transformed cells. A selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs and the like.

Useful expression vectors for eukaryotic hosts include, for example, vectors comprising control sequences from SV40, bovine papilloma virus, adenovirus and cytomegalovirus. Specific vectors are, e.g., pCDNA3.1(+)Hyg (Invitrogen, Carlsbad, Calif., U.S.A.) and pCI-neo (Stratagene, La Jolla, Calif., U.S.A.). Useful expression vectors for yeast cells include, for example, the 2 μ (micron) plasmid and derivatives thereof, the YIp, YEp and YCp vectors described by Gietz and Sugino (1988. "New yeast vectors constructed with in vitro mutagenized yeast genes lacking six-base pair restriction sites", Gene 74:527-534), the vectors described in Mumberg et al (Mumberg, Muller and Funk. 1995. "Yeast vectors for the controlled expression of heterologous proteins in different genetic backgrounds." Gene 156:419-422), the YEplac-HXT vector (Karhumaa et al., 2005. Investigation of limiting metabolic steps in the utilization of xylose by recombinant Saccharomyces cerevisiae using metabolic engineering. Yeast 22(5):359-68), the POT1 vector (U.S. Pat. No. 4,931,373), the pJS037 vector described in Okkels, Ann. New York Acad. Sci. 782, 202-207, 1996, and the pPICZ A, B or C vectors (Invitrogen). Useful vectors for insect cells include pVL941, pBG311 (Cate et al., "Isolation of the Bovine and Human Genes for Mullerian Inhibiting Substance and Expression of the Human Gene in Animal Cells", Cell, 45, pp. 685-98 (1986)), pBluebac 4.5 and pMelbac (both available from Invitrogen). Useful expression vectors for bacterial hosts include known bacterial plasmids, such as plasmids from E. coli, including pBR322, pET3a and pET12a (both from Novagen Inc., Wis., U.S.A.), wider host range plasmids, such as RP4, phage DNAs, e.g., the numerous derivatives of phage lambda, e.g., NM989, and other DNA phages, such as M13 and filamentous single stranded DNA phages. Examples of suitable viral vectors are adenoviral vectors, adeno-associated viral vectors, retroviral vectors, lentiviral vectors, herpes vectors and cytomegaloviral vectors.

The vectors of the present invention may contain an element(s) that permits stable integration of the vector into the host cell genome or autonomous replication of the vector in the cell independent of the genome of the cell. The vectors of the present invention may be integrated into the host cell genome when introduced into a host cell. For integration, the vector may rely on the DNA sequence encoding the polypeptide of interest or any other element of the vector for stable integration of the vector into the genome by homologous or non homologous recombination.

Alternatively, the vector may contain additional DNA sequences for directing integration by homologous recombination into the genome of the host cell. The additional DNA sequences enable the vector to be integrated into the host cell genome at a precise location(s) in the chromosome(s). To increase the likelihood of integration at a precise location, the integrational elements should preferably contain a sufficient number of nucleotides, such as 100 to 1,500 base pairs, preferably 400 to 1,500 base pairs, and most preferably 800 to 1,500 base pairs, which are highly homologous with the corresponding target sequence to enhance the probability of homologous recombination. The integrational elements may be any sequence that is homologous with the target sequence in the genome of the host cell. Furthermore, the integrational elements may be non-encoding or encoding DNA sequences. On the other hand, the vector may be integrated into the genome of the host cell by non-homologous recombination. These DNA sequences may be any sequence that is homologous with a target sequence in the genome of the host cell, and, furthermore, may be non-encoding or encoding sequences. More than one copy of a DNA sequence encoding a polypeptide of interest may be inserted into the host cell to amplify expression of the DNA sequence.

An isolated host cell

The above disclosed host cells comprise the nucleotide sequence as defined above either in a vector, such as an expression vector, or alternatively have the nucleotide sequence integrated into the genome, e.g. by homologous or heterologous recombination. The nucleotide sequence may be present as a single copy or multiple copies.

The host cell may be any appropriate prokaryotic or eukaryotic cell, e.g., a bacterial cell, such as a Gram-positive or Gram-negative bacterium, a yeast cell or a filamentous fungal cell. Any suitable host cell may be used for the maintenance and production of the vector of the invention, such as a eukaryotic or prokaryotic cell, for example bacteria or fungi (including yeast). The host cell may be a host cell belonging to a GMP (Good Manufacturing Practice) certified cell-line, such as a mammalian cell-line.

Examples of bacterial host cells include Escherichia coli, Zymomonas sp. and Klebsiella sp. Other examples include lactic acid bacteria such as Lactobacillus sp. as well as Lactococcus sp.

Examples of suitable yeast host cells include Saccharomyces sp., e.g. S. cerevisiae, S. bayanus or S. carlsbergensis, Schizosaccharomyces sp. such as Sch. pombe, Kluyveromyces sp. such as K. lactis, Pichia sp. such as P. stipitis, P. pastoris or P. methanolica, Hansenula sp. such as H. polymorpha, Candida sp. such as C. shehatae, or Yarrowia sp. Examples of S. cerevisiae strains are DBY746, AH22, S150-2B, GPY55-15Ba, CEN.PK, USM21, TMB3500, TMB 3400, VTT-A-63015, VTT-A-85068, VTT-C-79093 and their derivatives, as well as Saccharomyces sp. 1400, 424A (LNH-ST), 259A (LNH-ST) and derivatives thereof.

Examples of suitable filamentous fungal host cells include Aspergillus sp., e.g. A. oryzae, A. niger or A. nidulans, Fusarium sp., Hypocrea (formerly Trichoderma) sp. or Penicillium sp.

Use

The invented polypeptides, nucleotide sequences, vectors or host cells may be used in the production of biofuels, bulk chemicals and platform chemicals, such as ethanol, butanol, isobutanol, lactate, 1,4-diacids (succinic, fumaric, malic), glycerol, sorbitol, mannitol, L-ascorbic acid, xylitol, hydrogen gas, 2,5-furan dicarboxylic acid, 3-hydroxypropionic acid, aspartic acid, glutaric acid, glutamic acid, itaconic acid, levulinic acid, 3-hydroxybutyrolactone, fatty acids, fatty-derived molecules, isoprenoids, isoprenoid-derived molecules, alkanes, isopentanol and isoamylacetate. The process concept for the conversion of lignocellulosic feedstock to a fuel, bulk chemical or platform chemical, e.g. ethanol, may include a pre-treatment or fractionation step. In this step the chopped raw material is either exposed to neutral, acidic or alkaline pH at high temperature, with or without added air/oxygen, so that the hemicellulose fraction is partially hydrolysed to monomeric and oligomeric sugars, rendering the cellulose fraction susceptible to hydrolysis, or exposed to an organic solvent such as acetone, ethanol or similar at high temperature, so that the lignin fraction is dissolved and extracted, rendering the cellulose and hemicellulose fractions susceptible to hydrolysis. The hydrolysis of the pretreated and fractionated material may be performed with concentrated or diluted acids or with cellulolytic and hemicellulolytic enzyme mixtures.

In processes where the hydrolysis of the raw material is performed with hydrolytic enzymes, different fermentation modes may be applied. In so-called separate hydrolysis and fermentation (SHF), the fermentation takes place after hemicellulose hydrolysis. Alternatively, the raw material may be hydrolyzed and fermented at the same time, in so-called simultaneous saccharification and fermentation (SSF), mainly to reduce the number of reactors and to avoid product inhibition. If the process involves co-fermentation of both the hexose and pentose fractions, it is called simultaneous saccharification and co-fermentation (SSCF) (Hamelinck CN, van Hooijdonk G, Faaij APC. 2005. Ethanol from lignocellulosic biomass: techno-economic performance in short-, middle- and long-term. Biomass & Bioenergy 28:384-410).

The following examples are intended to illustrate, but not to limit, the invention in any manner, shape, or form, either explicitly or implicitly.

EXAMPLES

EXAMPLE 1

Isolation of novel xylose reductase from soil metagenome

Construction of the soil metagenome library

Soil DNA was extracted from the upper soil layer in a garden in Southern Sweden. DNA extraction was performed using the PowerSoil™ DNA Isolation Kit (MO BIO, Carlsbad, California, USA). Isolated DNA was restricted with BamHI and MboI. Subsequently, agarose gel electrophoresis was performed to purify DNA fragments ranging from 2.0 to 6.0 kb. Next, a ligation was set up with vector pRSETB (Invitrogen™, Carlsbad, California, USA) that had been restricted with BamHI and treated with alkaline phosphatase in order to prevent plasmid self-ligation. Bacterial transformation was performed with ElectroTen-Blue competent cells (Stratagene®, La Jolla, California, USA) according to the manufacturer's instructions.

Construction of screening strain and library screening

The gene XYL2 from P. stipitis, which codes for XDH, was excised from YIpOB09 (Bettiga, M., Bengtsson, O., Hahn-Hagerdal, B. and Gorwa-Grauslund, M. F. 2009. Arabinose and xylose fermentation by recombinant Saccharomyces cerevisiae expressing a fungal pentose utilization pathway. Microb Cell Fact 8, 40) by restriction with BglII and cloned into pRSFDuet™ (Novagen®, Darmstadt, Germany) restricted with the same enzyme. After sequencing, the plasmid with the XYL2 gene in the correct orientation was introduced into E. coli strain HB101 (supE44, hsdS20(rB⁻ mB⁻), recA13, ara-14, proA2, lacY1, galK2, rpsL20, xyl-5, mtl-1, leuB6, thi-1). The soil metagenome library was then transferred to the screening strain.

Screening for novel enzymes able to reduce xylose was performed by detecting colonies growing on SM3 medium (Rothen, S. A., Sauer, M., Sonnleitner, B. and Witholt, B. 1998. Growth characteristics of Escherichia coli HB101 [pGEc47] on defined medium. Biotechnology and Bioengineering 58, 92-100) supplemented with 11 g/L xylose, 100 μg/mL ampicillin, 30 μg/mL kanamycin and 1 mM IPTG at 37°C after about 5 days. Plasmids were extracted from the growing colonies and sequenced. Sequence analyses revealed the presence of two different reductases, which were named metAR1 and metAR2. Neither of the aldose reductases encoded by metAR1 and metAR2 had any sequence similarity with previously described xylose/arabinose reductases, and more particularly neither had a domain in common with P. stipitis XR (Fig. 2). While XYL1 has a conserved aldo-keto reductase domain frequently found in other xylose/arabinose reductases, metAR1 had a conserved domain of the carboxymuconolactone decarboxylase family (Fig. 2B). In metAR2, conserved domains for alcohol dehydrogenase (ADH) and NADB were found (Fig. 2C). The latter is also found in several dehydrogenases of metabolic pathways such as glycolysis, and in many other redox enzymes (Lesk, A. M. 1995. NAD-binding domains of dehydrogenases. Curr Opin Struct Biol 5:775-83).

EXAMPLE 2

Growth on xylose and enzyme kinetics of metAR1 and metAR2

Construction of E. coli strains overexpressing metAR1 and metAR2

The genes encoding metAR1 and metAR2 were subcloned into the E. coli expression vector pRSETB (Invitrogen™, Carlsbad, California, USA). For cloning into the E. coli expression vector, the gene metAR1 was amplified with primers metAR1F: 5' ATCGGATCCATGCGTGCCTATTTCCTGCAAGC 3' and metAR1R: 5' CCAGAATTCTCAGATGCGCACCACGATCTTC 3', which contain restriction sites for BamHI and EcoRI respectively. Cloning of the metAR2 gene followed exactly the same procedure as described for metAR1 but with primers metAR2F: 5' ATGGATCCATGTCTTTGCAGAACCTGAGAG 3' and metAR2R: 5' TTTGAATTCCTAAACGGCCAAGGCCGCCTC 3'. Each forward primer carries the BamHI site (GGATCC) and each reverse primer the EcoRI site (GAATTC). After PCR, the amplified genes and plasmid pRSETB were restricted and ligated. Plasmids with insert, obtained after bacterial transformation, were sequenced and named pRSETB-metAR1 and pRSETB-metAR2. These were introduced into the screening strain, which had XYL2 overexpressed. In addition, the S. cerevisiae reductase YPR1 cloned into the E. coli expression vector pRSETB (Skorupa Parachin, N., Carlquist, M. and Gorwa-Grauslund, M. F. 2009. Comparison of engineered Saccharomyces cerevisiae and engineered Escherichia coli for the production of an optically pure keto alcohol. Appl Microbiol Biotechnol 84, 487-97) was used to transform the screening strain in order to obtain a positive control.
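The restriction sites in the four primers can be located with a short script. The following is only an illustrative sketch: the dictionary keys are labels chosen here for the four primer sequences printed above, the recognition sequences GGATCC (BamHI) and GAATTC (EcoRI) are the standard ones for these enzymes, and the helper name find_sites is hypothetical.

```python
# Primer sequences as printed in the text (5' to 3').
# The keys are labels chosen for this sketch.
PRIMERS = {
    "metAR1F": "ATCGGATCCATGCGTGCCTATTTCCTGCAAGC",
    "metAR1R": "CCAGAATTCTCAGATGCGCACCACGATCTTC",
    "metAR2F": "ATGGATCCATGTCTTTGCAGAACCTGAGAG",
    "metAR2R": "TTTGAATTCCTAAACGGCCAAGGCCGCCTC",
}

# Standard recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based start position} for each site present in the primer."""
    primer = primer.upper()
    return {
        enzyme: primer.find(site)
        for enzyme, site in SITES.items()
        if site in primer
    }


# Map every primer to the sites it carries.
SITE_MAP = {name: find_sites(seq) for name, seq in PRIMERS.items()}

if __name__ == "__main__":
    for name, hits in SITE_MAP.items():
        print(name, hits)
```

Each forward primer should report only a BamHI site and each reverse primer only an EcoRI site, which matches the directional cloning into BamHI/EcoRI-restricted pRSETB described above.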

Aerobic growth on xylose

E. coli strains were pre-grown in LB medium at 180 rpm and 30°C until the end of the exponential phase. The strains were then transferred to 50 mL of SM3 medium supplemented with 30 g/L xylose and 1 mM IPTG at a starting optical density (OD) of 0.1. Aerobic growth in liquid xylose medium was followed over time (Fig. 3). The positive control strain grew to the same extent as the strain overexpressing metAR1 but with a longer lag phase, while the strain overexpressing metAR1 reached stationary phase before the control strain. The recombinant E. coli strain overexpressing metAR2 reached the highest final OD and reached stationary phase at about the same time as the strain overexpressing metAR1.

Affinity for xylose and NADPH

S. cerevisiae strains containing the novel reductases and the positive control were constructed and grown overnight on YNB medium (6.2 g/L YNB w/o amino acids, Difco, Becton, Dickinson and Company, Sparks, MD, USA) supplemented with 20 g/L glucose. Cell cultures were then harvested by centrifugation at 5,000 × g for 5 min and washed once with distilled water. Wet cells were suspended (0.5 mg/mL) in Y-PER yeast protein extraction reagent (Pierce, Rockford, IL, USA) in a 2-mL Eppendorf tube and incubated on a turning table at room temperature for 50 min. Cell debris was spun down for 5 min at 16,100 × g (Hermle Labortechnik Z160M table centrifuge, Wehingen, Germany). Protein concentration was determined with Coomassie Protein Assay Reagent (Pierce, Rockford, Illinois, USA). Bovine serum albumin (BSA) was used as standard. Enzyme kinetics of metAR1 and metAR2 were determined in crude extracts as previously described (Rizzi, M., Erlemann, P., Buithanh, N. A. and Dellweg, H. 1988. Xylose Fermentation by Yeasts. 4. Purification and Kinetic Studies of Xylose Reductase from Pichia stipitis. Applied Microbiology and Biotechnology 29, 148-154) but with varying concentrations of xylose and NADPH (Table 8). The initial rates were fitted by unconstrained nonlinear optimization in MATLAB 2009a to eqn (1), which describes the initial rate for a two-substrate reaction following a compulsory-order ternary-complex mechanism (Cornish-Bowden A. 2004. Fundamentals of Enzyme Kinetics: Portland Press):

$$v = \frac{V_{max}[A][B]}{K_{iA}K_{mB} + K_{mB}[A] + K_{mA}[B] + [A][B]} \qquad (1)$$

where Vmax is the maximum velocity, [A] and [B] are the concentrations of NADPH and xylose, respectively, KmA and KmB are the Michaelis constants for NADPH and xylose, respectively, and KiA is the dissociation constant of NADPH.
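The rate law of eqn (1) and the least-squares objective that a fitting routine would minimise can be written down in a few lines. This is a minimal sketch and not the MATLAB procedure used above: the function names, the parameter values and the synthetic observations are all hypothetical and serve only to show the shape of the calculation.

```python
def ternary_complex_rate(a, b, vmax, kia, kma, kmb):
    """Initial rate for a compulsory-order ternary-complex mechanism, eqn (1).

    a, b : concentrations of NADPH ([A]) and xylose ([B])
    vmax : maximum velocity
    kia  : dissociation constant of NADPH (KiA)
    kma  : Michaelis constant for NADPH (KmA)
    kmb  : Michaelis constant for xylose (KmB)
    """
    return vmax * a * b / (kia * kmb + kmb * a + kma * b + a * b)


def sum_squared_residuals(params, observations):
    """Objective for a fit: sum of squared differences between measured and model rates.

    params       : (vmax, kia, kma, kmb)
    observations : iterable of (a, b, measured_rate) tuples
    """
    vmax, kia, kma, kmb = params
    return sum(
        (v - ternary_complex_rate(a, b, vmax, kia, kma, kmb)) ** 2
        for a, b, v in observations
    )


# Hypothetical parameter set (NOT taken from the document), used only to
# generate noise-free synthetic observations for the demonstration below.
TRUE_PARAMS = (1.0, 10.0, 20.0, 200.0)
SYNTHETIC = [
    (a, b, ternary_complex_rate(a, b, *TRUE_PARAMS))
    for a in (5.0, 20.0, 100.0)
    for b in (50.0, 200.0, 1000.0)
]

if __name__ == "__main__":
    # The objective is zero at the parameters that generated the data ...
    print(sum_squared_residuals(TRUE_PARAMS, SYNTHETIC))
    # ... and positive for a different guess.
    print(sum_squared_residuals((1.0, 10.0, 40.0, 200.0), SYNTHETIC))
```

At saturating [A] the expression reduces to the single-substrate form Vmax[B]/(KmB + [B]), so the rate at [B] = KmB approaches Vmax/2. Any unconstrained minimiser could be applied to sum_squared_residuals to estimate the four parameters from measured initial rates.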

The affinity for NADPH (KmA) was about the same for XYL1 (K270R) and for metAR1, while metAR2 had about 10 times higher affinity (Table 1). The reductases metAR1 and metAR2 also had about 2.3 and 4.4 times higher affinity for xylose (KmB), respectively, than XYL1 (K270R) (Table 1).

Table 1. Estimated kinetic parameters for NADPH-dependent xylose reduction from cell extracts of recombinant S. cerevisiae strains overexpressing XYL1 (K270R) (control), metAR1 and metAR2. KmA and KmB are the Michaelis constants of NADPH and xylose respectively.

| Gene present | KmA (μM) | KmB (μM) |
| --- | --- | --- |
| XYL1 (K270R) | 25.8 ± 9.1 | 468 ± 28 |
| metAR1 | 21.2 ± 12.7 | 202 ± 56 |
| metAR2 | 2.6 ± 1.5 | 107 ± 32 |
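The fold differences quoted in the text follow directly from the mean values in Table 1, since a lower Michaelis constant corresponds to a higher apparent affinity. The sketch below recomputes them; it uses only the tabulated means and ignores the reported uncertainties, and the names TABLE_1, CONTROL and fold_lower_km are illustrative.

```python
# Mean Michaelis constants from Table 1 (μM); the ± values are ignored here.
TABLE_1 = {
    "XYL1 (K270R)": {"KmA": 25.8, "KmB": 468.0},
    "metAR1": {"KmA": 21.2, "KmB": 202.0},
    "metAR2": {"KmA": 2.6, "KmB": 107.0},
}
CONTROL = "XYL1 (K270R)"


def fold_lower_km(gene: str, constant: str) -> float:
    """Ratio of the control's Km to the given enzyme's Km (higher = higher affinity)."""
    return TABLE_1[CONTROL][constant] / TABLE_1[gene][constant]


# Fold differences rounded to one decimal place.
FOLDS = {
    (gene, constant): round(fold_lower_km(gene, constant), 1)
    for gene in ("metAR1", "metAR2")
    for constant in ("KmA", "KmB")
}

if __name__ == "__main__":
    for (gene, constant), fold in FOLDS.items():
        print(f"{gene} {constant}: {fold}x")
```

The ratios come out at about 1.2 (metAR1) and 9.9 (metAR2) for NADPH, and about 2.3 (metAR1) and 4.4 (metAR2) for xylose, consistent with the statements above.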

EXAMPLE 3

Novel xylose reductases have negligible arabinose reductase activity

Xylose and arabinose reductase activity assays

For the determination of specific enzyme activity, cell extracts of S. cerevisiae strains overexpressing the novel reductase genes, metAR1 and metAR2, and the positive control XYL1 from P. stipitis were prepared as described above in Example 2. Xylose reductase (XR) activity was measured in crude cell extracts at 30°C as previously described (Rizzi, M., Erlemann, P., Buithanh, N. A. and Dellweg, H. 1988. Xylose Fermentation by Yeasts. 4. Purification and Kinetic Studies of Xylose Reductase from Pichia stipitis. Applied Microbiology and Biotechnology 29, 148-154). Arabinose reductase activity was measured in exactly the same way as xylose reductase activity but with 3.5 M arabinose as substrate. Protein concentration was determined with Coomassie protein assay reagent (Pierce, Rockford, USA) according to the manufacturer's instructions. Enzyme assays were performed in three biological replicates. Figure 4 shows the results of the enzyme activity measurements. The most common xylose reductase, from P. stipitis, had about the same reducing activity for xylose and arabinose, while the novel reductases metAR1 and metAR2 had negligible arabinose reductase activity (Fig. 4).

Claims

1. An isolated polypeptide having xylose reductase activity but negligible arabinose activity, wherein said polypeptide has at least 60 % identity to the amino acid sequence shown in SEQ ID NO:2 or 4.

2. The isolated polypeptide according to claim 1, wherein said polypeptide has at least 70 % identity to the amino acid sequence shown in SEQ ID NO:2 or 4.

3. The isolated polypeptide according to claim 2, wherein said polypeptide has at least 80 % identity to the amino acid sequence shown in SEQ ID NO:2 or 4.

4. The isolated polypeptide according to claim 3, wherein said polypeptide has at least 90 % identity to the amino acid sequence shown in SEQ ID NO:2 or 4.

5. A nucleotide sequence encoding the polypeptide according to any of claims 1-4.

6. The nucleotide sequence according to claim 4, having at least 60 % identity to the nucleotide sequence shown in SEQ ID NO:1 or 3.

7. A vector comprising a nucleotide sequence as defined in claims 5-6.

8. A host cell comprising a nucleotide sequence as defined in claims 5-6 or a vector as defined in claim 7.

9. The host cell according to claim 8, wherein said host cell is selected from the group Escherichia coli, Zymomonas mobilis, Lactococcus sp., Lactobacillus sp., Saccharomyces sp., Schizosaccharomyces sp., Kluyveromyces sp., Pichia sp., Hansenula sp., Candida sp., Yarrowia sp., Dekkera sp., Aspergillus sp., e.g. A. oryzae, A. niger, or A. nidulans, Fusarium sp., Hypocrea (formerly Trichoderma) sp., or Penicillium sp.

10. Use of the polypeptide having xylose reductase activity according to any of claims 1-4, the nucleotide sequence according to claims 5-6, the vector according to claim 7 or the host cell according to claims 8-9 for the production of fuels, bulk chemicals and platform chemicals from lignocellulosic feedstock.