Overview

Structural Survey of Antigen Recognition by Synthetic Human Antibodies

Sachdev S. Sidhu²
School of Pharmacy, University of Waterloo, Waterloo, Ontario N2G 1C5, Canada
²Correspondence: sachdev.sidhu{at}uwaterloo.ca
¹These authors contributed equally to this work.

Abstract

Synthetic antibody libraries have been used extensively to isolate and optimize antibodies. To generate these libraries, the immunological diversity and the antibody framework(s) that support it outside of the binding regions are carefully designed or chosen to ensure favorable functional and biophysical properties. In particular, minimalist, single-framework synthetic libraries pioneered by our group have yielded a vast trove of antibodies to a broad array of antigens. Here, we review their systematic and iterative development to provide insights into the design principles that make them a powerful tool for drug discovery. In addition, the ongoing accumulation of crystal structures of antigen-binding fragment (Fab)–antigen complexes generated with synthetic antibodies enables a deeper understanding of the structural determinants of antigen recognition and of how immunoglobulin sequence diversity is used, which can assist in developing new strategies for antibody and library optimization. Toward this end, we also survey the structural landscape of a comprehensive and unbiased set of 50 distinct complexes derived from these libraries and compare it to a similar set of natural antibodies, with the goal of better understanding how each achieves molecular recognition and whether opportunities exist for iterative improvement of synthetic libraries. From this survey, we conclude that, despite the minimalist strategies used to design these synthetic antibody libraries, their overall structural interaction landscapes are highly similar to those of natural repertoires. We also found, however, some key differences that can help guide the iterative design of new synthetic libraries through the introduction of positionally tailored diversity.

INTRODUCTION

Monoclonal antibodies (Abs) are not only invaluable tools in biological research but are also increasingly used as therapeutic agents (Alfaleh et al. 2020). Abs can be generated through traditional hybridoma technology, through a variety of single B-cell cloning technologies that rely on animal immunization (Lee et al. 2014; Pedrioli and Oxenius 2021), or through in vitro display technologies (e.g., phage, yeast, mRNA, or bacterial display) (Valldorf et al. 2022) that isolate Ab fragments from combinatorial libraries by in vitro selection on purified proteins, virus-like particles, or even antigen-expressing cells (Nixon et al. 2019). For in vitro methods, the starting repertoire plays a crucial role in the performance of the library and the characteristics of the Abs that can be isolated from it, whether that repertoire is natural, either naive (McCafferty et al. 1990; Marks et al. 1991) or immunized (Shen et al. 2007), or synthetic (Barbas et al. 1992) or semi-synthetic. In particular, Ab libraries with paratopes constructed entirely from synthetic diversity have proven to be versatile tools for testing hypotheses regarding sequence diversity and function (Fellouse et al. 2004, 2006), for analyzing biomolecular interactions (Lee et al. 2004a,b; Bond et al. 2005), and for the selection and optimization of therapeutic Abs (Nelson et al. 2018; Pavlovic et al. 2018; Nabhan et al. 2023).

Because synthetic Ab libraries are completely naive, they are not subject to restrictions imposed by self-tolerance, which culls autoreactive B-cell clones from immunorepertoires to mitigate the risk of autoimmunity. Protein drug targets that are highly conserved across species can pose challenges in eliciting an immune response when used as antigens for hybridoma technologies that rely on immunization, because they are not recognized as foreign (Zhou et al. 2009). Synthetic libraries, on the other hand, suffer from no such constraints. From a more fundamental point of view, the ability to precisely control the design of synthetic diversity also offers a straightforward approach for facile characterization and continual, iterative improvement of synthetic library designs based on empirical observations.

Despite the enormous potential of synthetic Ab libraries, the field developed more slowly than phage display applications that use natural repertoires, largely because precisely designed synthetic repertoires are considerably more difficult to develop than natural repertoires are to clone and exploit. Constructing libraries from natural sources requires knowledge of immunoglobulin gene sequences and an understanding of the fundamental principles of Ab structure and function, so that the in vivo repertoire can be efficiently transferred to an in vitro display system using molecular biology techniques (Vaughan et al. 1996; Lai and Lim 2023). Synthetic repertoires, by contrast, are generated by chemical synthesis of degenerate oligonucleotides, and introducing this combinatorial diversity into Ab complementarity-determining regions (CDRs) requires not only molecular techniques to capture the diversity but also detailed knowledge of Ab structure and function and of the positional diversity that can generate effective Abs.

To contextualize this decades-long effort, we provide here an overview of the design principles and considerations that have been used to generate high-quality synthetic Ab libraries and discuss the advantages of the single-framework approach, highlighting the use of this framework in approved therapeutic Abs. We trace the systematic assessment of diversity and the concurrent evolution of library design from the earliest minimalist libraries built on a single framework to the present, spanning nearly 30 years of development. During this period, a growing number of crystal structures of synthetic antigen-binding fragments (Fabs) in complex with their target antigens have been published, providing exquisite detail on how they achieve molecular recognition. Close inspection of these details, and comparison with those of natural Ab complexes, can in turn yield insights and new design strategies to further improve synthetic repertoires (Burkovitz and Ofran 2016; Moreno et al. 2022). Toward this aim, we also surveyed the structural landscape of both natural and synthetic Abs by analyzing the Fab–antigen interfaces of 50 distinct complexes of each type to characterize the paratope of each Ab. By comparing the two sets, we sought to contrast their key features, including paratope size, framework, residue and CDR usage, and CDR length, sequence, and conformational diversity, to determine whether synthetic Abs, designed in part using minimalist strategies (single framework, fixed CDRs, restricted diversity), effectively recapitulate the usage of natural immune repertoires and whether they eliminate or propagate developability liabilities found in natural Abs. Similarities and differences between the two are discussed to understand how diversity is used.
We also discuss preconceptions in design, limitations of this survey, and ways that synthetic diversity can be refined and suggest hypotheses that can inform and improve the design of next-generation synthetic Ab libraries.

SYNTHETIC ANTIBODY LIBRARIES

Principles of Library Design

Synthetic Ab libraries are constructed by introducing designed combinatorial diversity (using chemically synthesized, degenerate oligonucleotides) into the CDRs that are specifically involved in antigen recognition (Fellouse and Sidhu 2006). A major consideration in devising a synthetic library is therefore the nature and valency of the Ab fragment used as the phage-displayed framework into which synthetic diversity is introduced. Most commonly, the entire Fab, the single-chain variable fragment (scFv) consisting of the light- and heavy-chain variable domains (VL and VH), or the VH domain alone is displayed in either a monovalent or a bivalent format, a choice that can critically influence the properties of the Abs selected (Lee et al. 2004b; Sidhu et al. 2004). Beyond the basic molecular architecture of the displayed Ab, the framework (the portion of the Ab that is not diversified but instead serves as a stable scaffold for displaying diverse paratopes) is a key decision point in the type of library to be made or used. Many successful libraries have used a single framework consisting of a highly stable VH/VL pair (Silacci et al. 2005; Yang et al. 2009; Persson et al. 2013), whereas others have relied on multiple VH and VL frameworks in diverse combinations (Knappik et al. 2000; Rothe et al. 2008; Prassler et al. 2011; Tiller et al. 2013). The single-framework option simplifies library design and analysis but is thought to limit the diversity presented for selection. Conversely, the presumed diversity gains of libraries built with multiple VH/VL pairs may be negated by unfavorable framework combinations. Although frameworks generally provide structural stability for CDRs, different frameworks can exhibit differences in VH/VL packing, tilt angles, and interface area (Teplyakov et al. 2016), which can influence solubility, production yields, immunogenicity, and in vivo stability, rendering some Abs poorly expressed, nonfunctional, aggregation-prone, or difficult to engineer (Tiller et al. 2013). In contrast, reducing library complexity by limiting the number of frameworks can facilitate the use of optimized frameworks and downstream engineering.

Because the diversity of synthetic libraries is now most often introduced into the CDRs in a tailored fashion using mutagenic degenerate (Lee et al. 2004a,b; Sidhu et al. 2004; Fellouse et al. 2005) or trinucleotide (Virnekas et al. 1994; Knappik et al. 2000; Rothe et al. 2008; Persson et al. 2013) oligonucleotides, there is an opportunity to devise innovative design strategies, with the fundamental aim of generating a library that can yield high-affinity, specific Abs targeting a variety of epitopes on virtually any antigen. Perhaps the most widely adopted design philosophy has been to emulate natural repertoires, an approach increasingly informed by efforts to capture the diversity of the human immune repertoire using high-throughput sequencing technologies (Glanville et al. 2009; Briney et al. 2019; Marks and Deane 2020). In these efforts, it has proven critical to balance knowledge of the diversity of naive repertoires found in the peripheral B-cell compartment with knowledge of the diversity of functional, mature Abs, as revealed by structural surveys of natural Abs in complex with their antigens.

An appreciation of the liabilities that may impede development, even for natural Abs (Jain et al. 2017; Raybould et al. 2019), has spurred alternative design strategies to reduce the presence of residues that can adversely affect Ab function and properties (e.g., poor solubility, glycosylation sites, reactive cysteines, residues prone to oxidation or isomerization, etc.) and increase the likelihood of selecting Abs with favorable biophysical properties (high expression, solubility, and stability).

Finally, perhaps the most crucial property of a synthetic library is the choice of CDR residues targeted for diversification and the allowable diversity at each position. Although a sufficient number of diversified CDR positions, and of amino acids available at each position, is required to generate a diverse library, theoretical diversity that far exceeds the cellular transformation efficiency cannot be fully captured during library construction and is effectively lost. Thus, diversification strategies should be designed judiciously to focus diversity on positions that frequently participate in binding and to limit residues that can compromise folding or binding or introduce developability liabilities. In short, successful designs incorporate just enough diversity, in the right positions and of the right kinds, to enable the selection of stable, easily developable Abs that recognize cognate antigens with high affinity and specificity.
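The trade-off between theoretical and captured diversity can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes every variant is equally likely and uses a Poisson approximation for random sampling, and the 10 diversified positions and 10^10 transformants are hypothetical numbers, not the parameters of any library discussed here.

```python
import math

def theoretical_diversity(choices_per_position):
    """Number of distinct DNA sequences encoded by a design: the product of
    the number of codons allowed at each diversified position."""
    return math.prod(choices_per_position)

def expected_unique_clones(theoretical, transformants):
    """Expected number of distinct variants captured when `transformants`
    clones are drawn uniformly at random, with replacement, from
    `theoretical` equally likely variants (Poisson approximation)."""
    return theoretical * -math.expm1(-transformants / theoretical)

def coverage(theoretical, transformants):
    """Fraction of the theoretical diversity expected to be present."""
    return expected_unique_clones(theoretical, transformants) / theoretical

# Hypothetical design with 10 diversified positions.
binary = theoretical_diversity([2] * 10)   # 2 codons per position: 1024 sequences
nnk = theoretical_diversity([32] * 10)     # 32 codons per position: 2**50, about 1.1e15
transformants = 1e10                       # assumed achievable library size

print(binary, round(coverage(binary, transformants), 6))   # nearly complete coverage
print(f"{nnk:.2e}", f"{coverage(nnk, transformants):.1e}")  # well under 0.001% coverage
```

Under these assumptions, a two-codon code at 10 positions is sampled essentially completely, whereas 32 codons per position at the same 10 positions would be represented at well under 0.001% of its theoretical size.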

Trastuzumab Single-Framework Libraries

Based on the principles discussed above, numerous synthetic libraries have been developed and used with great success to develop Abs with high affinity and specificity against diverse proteins (Knappik et al. 2000; Silacci et al. 2005; Rothe et al. 2008; Yang et al. 2009; Shi et al. 2010; Persson et al. 2013; Tiller et al. 2013). Although a comprehensive survey of all synthetic Abs and the libraries from which they originated is beyond the scope of this overview, we have focused instead on a library design that was originally developed at Genentech using minimalist design principles (Fellouse et al. 2004, 2005, 2006, 2007; Sidhu et al. 2004; Persson et al. 2013), which, over the past two decades, has been used to develop specific, high-affinity Abs to a multitude of targets for a variety of diagnostic and therapeutic applications by our group and others (Paduch et al. 2013; Hornsby et al. 2015; Miersch et al. 2015, 2021). This library design is based on a single, optimized framework, which greatly simplifies the selection, analysis, and engineering of developable Abs suited for therapeutic purposes.

Synthetic Ab libraries at Genentech were developed with a single human framework derived from the consensus sequences of the most abundant human subclasses; namely, VH subgroup III and VL κ subgroup I (Kabat et al. 1991). This framework was originally used for the humanization of murine Abs (Carter et al. 1992; Presta et al. 1993, 1997; Werther et al. 1996), and of the 15 Abs that Genentech has had approved as therapeutics (including those developed in collaboration with others) (Lu et al. 2020; Wang et al. 2021), nine are based on this framework (Table 1). Specifically, the anti-ErbB2 Ab humanized 4D5 or trastuzumab (Carter et al. 1992) was chosen as the library scaffold because the trastuzumab Fab is well expressed in bacterial and mammalian cells and had been displayed previously on phage (Garrard and Henner 1993). Furthermore, the VH subgroup III Fab possesses a Protein-A-binding site that can facilitate engineering of the Fab on phage and purification from bacteria (Starovasnik et al. 1999), and high-resolution crystal structures of the Fab, alone (Eigenbrot et al. 1993) or in complex with antigen (Cho et al. 2003), were available to provide structural details to inform engineering efforts.

Table 1.

Genentech's approved antibodies in 4D5 framework

Using this framework, multiple libraries were developed that differed in the portion of the Ab displayed for selection, the positions diversified, and the nature of diversification at different positions, with the aims of exploring the basic principles of natural immune repertoire evolution and iteratively improving library design (Table 2). The earliest iterations of the library explored monovalent and bivalent scFv versions of the 4D5 framework, introducing diversity at solvent-accessible positions using degenerate oligonucleotides that emulated the diversity observed in natural repertoires (Sidhu et al. 2004). A similar approach was taken in the subsequent iteration, which instead used monovalent and bivalent Fab formats and explored different CDR-H3 degeneracies and modest CDR-H3 length diversity. These libraries yielded Abs with subnanomolar affinities, providing strong proof of concept that a single-framework library with restricted diversity could achieve affinities comparable to those of Abs from natural sources (Lee et al. 2004a).

Table 2.

An overview of minimalist libraries based on the 4D5 framework

To further explore restricted diversity with bivalent Fab display, tetranomial diversity (four amino acids per position) was introduced into the heavy-chain CDRs (Fellouse et al. 2004), with some length diversity in CDR-H3. These results showed that specific binding clones could be obtained from restricted-diversity libraries, although naive clones exhibited affinities only in the micromolar range. However, affinity maturation of naive clones, with tetranomial diversity introduced at solvent-exposed positions of the light-chain CDRs, yielded specific clones with low nanomolar affinities whose paratopes were dominated by Tyr, suggesting the importance of this residue in mediating recognition (Fellouse et al. 2006). Natural Abs are known to exhibit strong biases for particular amino acids within the CDR loops, including Tyr and Ser, with Tyr side chains making a disproportionate number of antigen contacts relative to other residues (Mian et al. 1991; Padlan 1994; Davies and Cohen 1996).

Based on these findings, extreme restriction of diversity was explored by randomizing similar solvent-exposed positions in the heavy-chain CDRs with only Tyr and Ser. This confirmed that specific clones could be obtained with affinities in the low nanomolar range (Fellouse et al. 2005). From this foundation, binary code diversity was expanded into CDR-L3, which resulted in further improvements in affinity (Fellouse et al. 2005). Further expansion of diversity into nonparatope residues that were not solvent-exposed but could provide structural diversity for CDRs, together with the use of tetranomial diversity in CDR-H3, enabled the isolation of clones with single-digit nanomolar affinities, comparable to the affinities obtained from natural sources (Fellouse et al. 2006). Systematic addition of tailored diversity and additional refinement to CDRs L3 and H3 resulted in subsequent iterations of the library from which specific, high (single-digit to subnanomolar) affinities could be routinely obtained (Fellouse et al. 2007; Birtalan et al. 2008; Persson et al. 2013; Hanna et al. 2020; Nilvebrant et al. 2021).
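Highly restricted codes like these can be encoded with single degenerate codons. The sketch below expands IUPAC degenerate codons and translates them with the standard genetic code. The codons shown (TMT for Tyr/Ser, KMT for a Tyr/Ala/Asp/Ser tetranomial set, and NNK for comparison) are illustrative choices; the text above does not specify which codons, or which four residues, each library used.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (subset sufficient for common library codons).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "N": "ACGT",
}

# Standard genetic code; codons enumerated with bases in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand(degenerate_codon):
    """List every concrete codon encoded by a three-letter degenerate codon."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate_codon))]

def encoded_residues(degenerate_codon):
    """Count codons per encoded amino acid (one-letter code)."""
    counts = {}
    for codon in expand(degenerate_codon):
        aa = CODON_TABLE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return counts

for name in ("TMT", "KMT", "NNK"):
    residues = encoded_residues(name)
    print(name, len(expand(name)), "codons ->", "".join(sorted(residues)))
# TMT 2 codons -> SY
# KMT 4 codons -> ADSY
# NNK 32 codons -> all 20 amino acids plus "*" (the TAG stop)
```

Restricting a position to two or four codons in this way keeps theoretical diversity small enough to be sampled well and avoids the stop codon that NNK carries.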

STRUCTURAL ANALYSIS OF SYNTHETIC ANTIBODIES

The current, state-of-the-art synthetic Ab libraries (Fellouse et al. 2004, 2005, 2006; Persson et al. 2013) were designed on two basic assumptions, both directly informed by minimalist design principles that reflect the dominant functional usage of natural immune repertoires rather than the full diversity of human repertoires. First, these libraries would yield Abs that interact with the antigen in a manner similar to that of natural Abs, preferentially using the same CDR positions and the same types of amino acids to engage the antigen, in conformations similar to those observed in nature. Second, by limiting or altogether eliminating residues in natural repertoires that can impede therapeutic development, the developability of Abs isolated from the library would be considered and ensured at the design stage.

Now, after more than two decades of successful Ab development from synthetic libraries and the elucidation of many high-resolution Ab–antigen structures, we can examine whether these assumptions hold and whether the design principles are generally effective or require further optimization. Importantly, we can now explore whether synthetic Abs form unexpected interactions, whether there are positional preferences that can be exploited or conformational limitations imposed by the use of a single framework, and whether some design elements are underused and could be dispensed with in the next generation of synthetic libraries. In this section, we address these questions by comparing published structures of 50 synthetic and 50 natural Ab–antigen complexes. In doing so, we provide a broad structural survey of the interactions and conformations formed by single-framework synthetic Abs, compare these with those formed by natural Abs, and illustrate iterative design principles that can help optimize future libraries.

Data Set of Structures of Synthetic and Natural Ab–Antigen Complexes

To obtain a general overview of the structural details of antigen recognition by synthetic and natural Abs, two data sets were compiled, each containing 50 structures from one of the classes (synthetic or natural) (Table 3). To ensure a meaningful and consistent analysis of the Ab–antigen interactions, the structures had to meet the following criteria: (1) X-ray crystal structures with resolution of <3.5 Å, (2) an Ab–antigen interface of >500 Å² on each side, (3) Abs in Fab format bound to a protein antigen, and (4) Abs containing a κ VL. Although this last restriction limits the scope of the study to Abs bearing a κ VL, it offers an opportunity to probe theoretical questions of relevance to the field of Ab structure and function and provides practical insight into the principles of library design. To minimize potential bias, the 50 structures in each data set corresponded to 50 unique Abs with no significant similarity in CDR sequences; apart from the shared κ VL (and, for the synthetic set, the common trastuzumab framework), the natural Abs sampled a variety of frameworks.
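Criteria (1) to (4) amount to a simple filter over per-structure metadata. The record type, field names, and example entries below are hypothetical (they are not SAbDab or PDB field names), and the sketch assumes interface areas have already been computed for both sides of each complex.

```python
from dataclasses import dataclass

@dataclass
class ComplexRecord:
    """Hypothetical per-structure summary; field names are illustrative."""
    pdb_id: str
    method: str            # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution_A: float    # resolution in Å
    ab_buried_A2: float    # interface area on the Ab side, Å²
    ag_buried_A2: float    # interface area on the antigen side, Å²
    ab_format: str         # e.g. "Fab"
    protein_antigen: bool
    light_chain: str       # "kappa" or "lambda"

def meets_criteria(r: ComplexRecord) -> bool:
    """Criteria (1)-(4): X-ray at <3.5 Å, >500 Å² per side, Fab + protein antigen, κ VL."""
    return (
        r.method == "X-RAY DIFFRACTION" and r.resolution_A < 3.5
        and r.ab_buried_A2 > 500 and r.ag_buried_A2 > 500
        and r.ab_format == "Fab" and r.protein_antigen
        and r.light_chain == "kappa"
    )

candidates = [
    ComplexRecord("demo1", "X-RAY DIFFRACTION", 2.1, 820.0, 790.0, "Fab", True, "kappa"),
    ComplexRecord("demo2", "X-RAY DIFFRACTION", 3.8, 900.0, 880.0, "Fab", True, "kappa"),
    ComplexRecord("demo3", "ELECTRON MICROSCOPY", 3.0, 700.0, 690.0, "Fab", True, "kappa"),
    ComplexRecord("demo4", "X-RAY DIFFRACTION", 2.5, 650.0, 640.0, "Fab", True, "lambda"),
]
selected = [r.pdb_id for r in candidates if meets_criteria(r)]
print(selected)  # ['demo1']
```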

Table 3.

List of PDB structures selected for the comparison of synthetic and natural Abs

To compile the synthetic Ab set, the Protein Data Bank (PDB) was manually searched to identify structures that met the above criteria and were derived from trastuzumab-based synthetic libraries designed by our group and others that used similar strategies for library construction (Paduch et al. 2013; Hornsby et al. 2015; Miersch et al. 2015, 2021). The resulting data set contained 50 unique Abs bound to 37 different antigens, including diverse proteins from humans, yeast, bacteria, and viruses (Fig. 1A; Table 3).

Figure 1.

Alignment of VH and VL sequences for the antibody (Ab) data sets. Alignments are shown for synthetic Abs (A) and natural Abs (B). Only positions identified as important for antigen recognition, based on a minimum 0.5% contribution to the paratope in synthetic and/or natural Abs, are shown. Positions with ≥90% or 70%–90% conservation across the 50 structures in each data set are shaded dark or light gray, respectively. Complementarity-determining regions (CDRs) according to the existing IMGT definition are indicated by bold lines above the position numbers, and positions are numbered according to the IMGT nomenclature (Lefranc et al. 2003). Bold lines below the position numbers indicate alternate CDRs, defined based on the analysis in Figure 2 as continuous stretches of amino acids containing positions that contribute to antigen interaction (minimum 0.5% contribution to the paratope) in >5% of structures analyzed from the synthetic and/or natural Ab sets. Asterisks indicate residues outside the CDR definitions that nevertheless contribute to antigen interaction in >5% of structures analyzed.

For the assembly of a complementary natural Ab data set, the Structural Antibody Database (SAbDab) was searched to compile a list of structures of human Abs in complex with protein antigens, restricted to those that, like trastuzumab, were of the γ1κ (IgG1, κ) isotype but otherwise allowing a diversity of heavy-chain frameworks, as shown in Table 3 (far right). The list was then manually curated to retain only natural Abs isolated from human B cells, excluding all Abs generated by synthetic methods, humanized Abs, and Abs of unclear origin. The final set contained 50 unique Abs bound to 36 unique antigens, with a maximum of two Abs allowed per antigen to ensure a diverse set of interactions (Fig. 1B; Table 3). For example, although numerous structures are available of Abs in complex with the receptor-binding domain of SARS-CoV-2, only two of these Abs were chosen for analysis. Overall, eight of the 50 natural Abs possessed the same VH subgroup III and VL κ subgroup I frameworks as trastuzumab, whereas 10 additional VH III and 12 additional VL κ I domains were partnered with other frameworks. All other VH subgroups were represented at least once, with VH subgroup III being the most abundant, followed by subgroup I. Similarly, all other VL subgroups were represented at least once in the set of natural Abs, with VL subgroups I and III being equally abundant.
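The per-antigen cap described above can be sketched as a single pass over an ordered candidate list. The identifiers below are placeholders, and ordering candidates by curator preference is an assumption; the text does not state how the two Abs per antigen were chosen.

```python
from collections import defaultdict

def cap_per_antigen(candidates, max_per_antigen=2):
    """Keep at most `max_per_antigen` Abs per antigen, in input order.

    `candidates` is a list of (ab_id, antigen) pairs, assumed to be already
    curated (natural human origin) and ordered by preference."""
    kept, counts = [], defaultdict(int)
    for ab_id, antigen in candidates:
        if counts[antigen] < max_per_antigen:
            kept.append((ab_id, antigen))
            counts[antigen] += 1
    return kept

# Hypothetical pool: several Abs against the same receptor-binding domain.
pool = [("ab1", "RBD"), ("ab2", "RBD"), ("ab3", "RBD"), ("ab4", "antigenX"), ("ab5", "RBD")]
print(cap_per_antigen(pool))  # [('ab1', 'RBD'), ('ab2', 'RBD'), ('ab4', 'antigenX')]
```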

Analysis of Ab Residues Involved in Antigen Interaction

To characterize the Ab residues involved in antigen interaction, we determined the structural paratope for each Ab. The structural paratope is often defined as the solvent-accessible surface area (SASA) that is buried upon complex formation with the antigen (Richards 1977; Reis et al. 2022), although an interatomic distance cutoff of <4.0 Å between Ab and antigen residues is also used to identify interacting residues (McConkey et al. 2003; Akbar et al. 2021). The distance-based definition provides more detailed information about individual Ab–antigen contacts in the interface, but it can exclude elements of the molecular surface that are nonetheless important for understanding how the paratope contributes to Ab properties. Given the ease of analyzing SASA, the fact that it captures a more comprehensive view of the paratope, and its widespread use by others (Chen et al. 2013; Mitternacht 2016; Hebditch and Warwicker 2019; Myung et al. 2023), the SASA-based definition, expressed below, was used exclusively:

$$\text{Paratope} = \sum_{i \in \text{Fab}} \Delta\text{SASA}_i = \sum_{i \in \text{Fab}} \left( \text{SASA}_i^{\text{Fab}} - \text{SASA}_i^{\text{Fab–Ag}} \right)$$

where $\text{SASA}_i^{\text{Fab}}$ and $\text{SASA}_i^{\text{Fab–Ag}}$ are the SASA of Fab residue $i$ calculated in the absence and in the presence of the antigen, respectively. The relative contribution of residue $i$ to the structural paratope is $\Delta\text{SASA}_i$ expressed as a percentage of this total.

Each structure was analyzed using the GETAREA webserver (https://curie.utmb.edu/getarea.html) (von Freyberg et al. 1993), which also enables calculation of SASA per residue. Residue-level SASA calculation is a validated analytical method (Fraczkiewicz and Braun 1998) that has been widely used and cited as a viable means of performing these calculations (Xu and Zhang 2009; Al Mughram et al. 2021; Sraphet and Javadi 2022) and is superior to solvent-excluded surface-based methods (Cai et al. 2011).

When the asymmetric unit contained multiple Fab–antigen complexes, a representative complex was chosen for analysis, and any crystallographic contacts between complexes were ignored. The calculation was first performed on each complete PDB file, yielding the SASA of each Fab residue in the presence of the antigen. The PDB file was then edited to remove the antigen coordinates and analyzed again in the same way to calculate SASA in the absence of antigen. The structural paratope was then calculated as the change in SASA of each Fab residue upon complexation with the antigen.
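The two-pass procedure above amounts to subtracting two per-residue SASA tables. The sketch below assumes the per-residue SASA values (for example, from GETAREA output) have already been parsed into Python dictionaries; the parsing step, the residue keys, and the numbers are hypothetical. It also computes each residue's relative contribution, the quantity thresholded at 0.5% in the analyses that follow.

```python
def buried_sasa(sasa_free, sasa_complex):
    """Per-residue ΔSASA (Å²) of the Fab on antigen binding.

    Both dicts map a residue key, e.g. ("H", 108), to SASA computed without
    and with antigen coordinates. Small negative values from numerical noise
    are clamped to zero."""
    return {res: max(0.0, sasa_free[res] - sasa_complex[res]) for res in sasa_free}

def paratope(sasa_free, sasa_complex):
    """Return total paratope size (Å²) and each residue's relative contribution (%)."""
    delta = buried_sasa(sasa_free, sasa_complex)
    total = sum(delta.values())
    relative = {res: (100.0 * d / total if total else 0.0) for res, d in delta.items()}
    return total, relative

# Hypothetical per-residue SASA values (Å²) for four Fab residues.
free = {("H", 107): 120.0, ("H", 108): 80.0, ("L", 38): 50.0, ("H", 10): 60.0}
bound = {("H", 107): 40.0, ("H", 108): 60.0, ("L", 38): 30.0, ("H", 10): 60.0}
size, rel = paratope(free, bound)
print(size)  # 120.0
print({res: round(pct, 1) for res, pct in rel.items()})
```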

This analysis revealed that the synthetic and natural Ab sets matched well in terms of the size range of structural paratopes. For the synthetic Abs, sizes ranged from 555 to 1422 Å², with an average of 913 Å², whereas for the natural Abs, sizes ranged from 536 to 1388 Å², with an average of 833 Å². Figure 2 summarizes the relative contribution of each Ab position to the structural paratope, averaged over the 50 structures in each of the synthetic and natural sets. Positions are not shown if they did not contribute to antigen interaction or if they were present in only a small number of structures because of CDR length variability. Although CDR definitions have been devised based on structural and sequence analysis of Abs and Ab complexes, it has been acknowledged that these definitions (Kabat, Chothia, IMGT) are simplifications that do not always accurately capture interface residues and should be taken as approximations of the paratope (Sela-Culang et al. 2013; Dondelinger et al. 2018). Thus, to compare loop lengths and paratope positions more accurately for the purpose of devising strategies for engineering functional Abs, the CDRs were defined here as continuous stretches containing amino acids that are found in the paratope, according to the SASA analysis, in at least some of the structures. Similar approaches have been used by others to circumvent disparities between CDRs and paratopes (Kunik et al. 2012; Stave and Lindpaintner 2013), and ultimately, our analyses selected the same residues in the natural and synthetic Ab sets (see Fig. 3).
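The position-based CDR definition can be expressed as grouping runs of qualifying positions. This sketch assumes a stretch is a run of consecutive positions (in IMGT order, among positions occupied in the data set) that each meet the ≥5% contributing-residue threshold used in Figure 2; whether the published analysis allowed gaps within a stretch is not stated, so treat this as one possible reading. The input values are hypothetical.

```python
def paratope_stretches(positions, threshold=5.0):
    """Group consecutive qualifying positions into stretches.

    `positions` is a list of (imgt_position, contributing_pct) pairs in
    sequence order, restricted to positions occupied in the data set.
    A position qualifies if contributing_pct >= threshold."""
    stretches, current = [], []
    for pos, pct in positions:
        if pct >= threshold:
            current.append(pos)
        elif current:
            stretches.append(current)
            current = []
    if current:
        stretches.append(current)
    return stretches

# Hypothetical VL values (contributing residues %), not data from this survey.
vl = [(26, 1.0), (27, 6.0), (28, 30.0), (38, 40.0), (39, 0.0),
      (54, 2.0), (55, 20.0), (56, 35.0), (66, 25.0), (67, 3.0)]
print(paratope_stretches(vl))  # [[27, 28, 38], [55, 56, 66]]
```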

Figure 2.

Relative contributions of antibody (Ab) residues to the structural paratope for the analyzed synthetic and natural Abs. Data are plotted for light-chain variable (VL) domains (A) and heavy-chain variable (VH) domains (B). At each position (x-axis), the black bars (left y-axis) show the percentage of contributing residues (contributing residues %), and the white bars (right y-axis) show the average relative contribution to the binding interface (average contribution %). The contributing residue percentage was calculated by dividing the number of structures with ≥0.5% relative contribution to the structural paratope at position X by the total number of structures available for position X (i.e., containing a residue at that position). The average contribution percentage was obtained by calculating the relative contribution to the structural paratope at position X for each structure and then averaging these values. Note that the relative contribution values for each position are not normally distributed but are strongly positively skewed, with a large proportion of values equal to 0 and standard deviations exceeding the averages. Thus, the averages shown do not represent the most frequently observed values at a given position but rather provide a basis for comparison between the synthetic and natural data sets. Only positions with contributing residues ≥5% and with residues present in ≥5% of structures are shown. Complementarity-determining regions (CDRs) are indicated and are defined as continuous stretches of amino acids containing positions that contribute to the structural paratope in ≥5% of structures analyzed. The total number of structures that contain a residue at a given position is indicated in parentheses. Positions are numbered according to the IMGT nomenclature (Lefranc et al. 2003).

Figure 3.

Relative contributions of complementarity-determining regions (CDRs) to the structural paratope for the analyzed synthetic and natural antibodies (Abs). The CDRs were defined based on the analysis described in Figure 2, and the positions assigned to each CDR are indicated on the x-axis. For each CDR (x-axis), the black bars (left y-axis) show the percentage of contributing residues (contributing residues %), and the white bars (right y-axis) show the average relative contribution to the structural paratope (average contribution %). The contributing residue percentage was obtained by counting the total number of residues with ≥0.5% relative contribution to the structural paratope within a given CDR across the 50 structures and dividing it by the total number of residues within the same CDR across the 50 structures (values are shown above the bars for clarity). The average contribution percentage was obtained by calculating the relative contribution of a given CDR to the structural paratope in each structure and then averaging over the 50 structures. Averages ± standard deviation are shown above the bars. Note that the large standard deviations reflect the strong positive skew of the distribution, in which the majority of values are lower than the average and some are 0. The same analysis was performed for all positions outside the CDRs, which were grouped as light-chain variable (VL) other and heavy-chain variable (VH) other for the VL and VH domains, respectively.

For each residue in the structural paratope, its contribution to antigen recognition was measured with two parameters. First, we calculated the percentage of structures in which a residue at position X contributed to the interaction interface (Fig. 2, contributing residues %, black bars). To avoid noise introduced by very minor interactions, a residue was counted as contributing to antigen interaction only if its relative contribution to the structural paratope was ≥0.5%. For example, if 25 of 50 structures contained a residue at position X with ≥0.5% relative contribution to the structural paratope, the contributing residue percentage at position X was 50%. Second, we calculated the relative contribution to the structural paratope at position X averaged over the 50 structures (Fig. 2, average contribution %, white bars). For example, if a residue at position X contributed 5% to the structural paratope in Structure 1 and 3% in Structure 2, the average contribution percentage at position X was 4% (this example uses two structures, but the analysis was performed for all 50). Overall, the two measures correlated well in Figure 2: the more frequently a residue contributed to the paratope, the greater its average contribution, with correlation coefficients of R > 0.85 for both VH and VL, in both natural and synthetic Abs. Although both measures reflect the general importance of a given position in the structural paratope, they provide different information. The former describes which residues tend to be used and how frequently across structures, whereas the latter measures the contribution each residue tends to make to the overall paratope within a structure.
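Both per-position measures can be computed from a list of relative contributions. The sketch below reproduces the two worked examples above; it assumes the average is taken over the structures that have a residue at the position, which is one reading of the Figure 2 legend.

```python
from statistics import mean

def contributing_residue_pct(contributions, cutoff=0.5):
    """Percentage of structures in which the residue at a position contributes
    at least `cutoff` % of the structural paratope.

    `contributions` holds one relative contribution (%) per structure that
    has a residue at this position."""
    if not contributions:
        return 0.0
    hits = sum(1 for c in contributions if c >= cutoff)
    return 100.0 * hits / len(contributions)

def average_contribution_pct(contributions):
    """Mean relative contribution (%) of the position across those structures."""
    return mean(contributions) if contributions else 0.0

# Worked examples from the text (values are relative contributions in %).
fifty = [2.0] * 25 + [0.1] * 25               # 25 of 50 structures at >= 0.5%
print(contributing_residue_pct(fifty))        # 50.0
print(average_contribution_pct([5.0, 3.0]))   # 4.0
```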

        The analyses described above allowed us to directly compare the structural paratopes of the curated synthetic and natural Ab data sets (Fig. 2). Remarkably, we found that, at the level of individual positions, the interaction landscapes were very similar. More specifically, the degree to which each position tends to contribute to the structural paratope in the synthetic Abs closely mirrors that in natural Abs, with some exceptions. In the VL domain (Fig. 2A), CDRs L2 and L3 are especially similar between the two data sets, with positions 55, 56, and 66 in CDR-L2 and positions 107–114 in CDR-L3 dominating antigen interactions. CDR-L1 interactions are less similar: natural Abs tend to have longer CDR-L1 loops, and position 28 plays a more important role in synthetic Abs, whereas the opposite is true for position 38. This is likely a consequence of the synthetic library design, which eschewed diversification in CDR-L1 (Persson et al. 2013) in favor of the more frequently used CDR-L3. In the VH domain (Fig. 2B), positions 35–38 in CDR-H1; positions 57, 59, 62, 64, and 66 in CDR-H2; and positions 108–113 in CDR-H3 dominate antigen interactions in both data sets. However, positions 58 and 65 in CDR-H2 play a more prominent role in natural Abs, which is likely a consequence of the diversity designs used in the synthetic libraries.
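        Position-by-position differences such as those at CDR-H2 positions 58 and 65 could be screened for systematically. The sketch below is a hypothetical helper, not the procedure used in this study: it flags positions whose contributing residue percentages differ between the two data sets by at least an arbitrary cutoff (20 percentage points here, an assumption), with a negative gap meaning the position is used more often in natural Abs. The position labels and values in the example are invented for illustration.

```python
def divergent_positions(synthetic_pct, natural_pct, min_gap=20.0):
    """Flag positions whose contributing-residue percentages differ between data sets.

    Both arguments map a position label to its contributing residue percentage.
    Returns (position, synthetic - natural) pairs with |gap| >= min_gap, largest first.
    Positions missing from one data set are treated as 0% there (an assumption).
    """
    positions = set(synthetic_pct) | set(natural_pct)
    gaps = [
        (pos, synthetic_pct.get(pos, 0.0) - natural_pct.get(pos, 0.0))
        for pos in positions
    ]
    flagged = [g for g in gaps if abs(g[1]) >= min_gap]
    # Sort by magnitude of the gap, then by label so ties are deterministic.
    return sorted(flagged, key=lambda g: (-abs(g[1]), str(g[0])))


if __name__ == "__main__":
    # Toy values, not data from this study.
    syn = {"H58": 10.0, "H59": 60.0, "H65": 5.0}
    nat = {"H58": 45.0, "H59": 62.0, "H65": 30.0}
    print(divergent_positions(syn, nat))  # [('H58', -35.0), ('H65', -25.0)]
```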

        Consistent with this similarity at the level of individual positions, a broader look at the relative contributions of individual CDRs to antigen interaction (Fig. 3) reveals a similar picture for the synthetic and natural Abs. Based on the number of residues contributing to the structural paratope relative to the total number of CDR residues over the 50 structures (i.e., the contributing residue percentage), the highest proportions of residues used are found in CDRs H3, H2, and L3, in which nearly half of the CDR residues participate in the paratope, whereas only about one-quarter of the residues in CDRs H1 and L2 are used; this usage is closely mirrored between synthetic and natural Abs. In terms of the average contribution of each CDR to the overall paratope, CDR-H3 is, as expected, dominant, contributing 35%–40% of the overall paratope, followed by CDR-H2 at roughly half that. CDRs H1, L3, L2, and L1 each contribute, on average, only 5%–15% of the overall paratope, again with no substantial differences between the natural and synthetic Ab sets. Given the drastically different origins of the synthetic and natural Abs, this level of conservation in CDR usage is unexpected. Detailed examination of the small differences that were observed may help inform the design of future naive synthetic libraries and libraries for affinity maturation.
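        The per-CDR values differ from the per-position ones in how they are pooled: the contributing residue percentage is computed over all residues of a CDR across the 50 structures, whereas the average contribution sums a CDR's residues within each structure before averaging across structures. A minimal sketch follows, assuming a hypothetical layout with one dict per structure that lists every residue present (including residues with 0% contribution) keyed by a position label. It uses the population standard deviation, since the text does not say which form was reported.

```python
from statistics import mean, pstdev


def cdr_metrics(structures, cdr_positions, threshold=0.5):
    """Return {cdr: (contributing_residue_pct, mean_contribution, sd_contribution)}.

    structures: list of dicts mapping position -> relative contribution (%) for every
        residue present in that structure; absent keys mean no residue (e.g., a shorter loop).
    cdr_positions: dict mapping a CDR name to the set of positions assigned to it.
    """
    results = {}
    for cdr, positions in cdr_positions.items():
        total_residues = 0
        contributing_residues = 0
        per_structure = []  # relative contribution of the whole CDR in each structure
        for s in structures:
            values = [s[p] for p in positions if p in s]
            total_residues += len(values)
            contributing_residues += sum(1 for v in values if v >= threshold)
            per_structure.append(sum(values))
        # Pooled ratio over all structures, as described in the Figure 3 legend.
        pct = 100.0 * contributing_residues / total_residues if total_residues else 0.0
        results[cdr] = (pct, mean(per_structure), pstdev(per_structure))
    return results
```

        Because only a few structures have a large contribution from a given CDR, the per-structure sums are skewed, which is why the reported standard deviations can exceed the means.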

        To determine whether additional opportunities for novel library design strategies exist, we explored whether residues outside the conventionally defined CDRs are involved in antigen recognition. Figure 2 shows that, in both synthetic and natural data sets, few positions outside the CDR boundaries contribute to antigen recognition. These are nonetheless worth noting, as they present opportunities to mediate antigen recognition through residues outside the normally diversified CDRs. The position outside the traditional CDR definitions with the most significant paratope contribution is VL position 80, which lies between CDRs L2 and L3 and corresponds to the recently identified CDR-L4 (Kelow et al. 2020). We observed this, however, only in the synthetic Abs, which tend to have an Arg residue at this position (Figs. 1 and 2). In natural Abs, position 80 is often a Gly residue, and although it is sometimes involved in antigen recognition, it contributes to the paratope much less frequently, presumably because Gly lacks a side chain. In addition to residue 80, other residues in this loop can be found in the paratopes of both natural and synthetic Abs. In the heavy chain, residues 82 and 83 of the H4 loop contribute to some natural paratopes, whereas synthetic Abs do not use this loop (Fig. 2B). Interestingly, both heavy and light chains make use of residues at the base of the ascending D strand, distal to the DE loop, and the two N-terminal positions of both the VH and VL domains are included in paratopes in both sets of Abs, albeit rarely. Each of these observations suggests additional opportunities for diversification outside the defined CDRs in future libraries.

        Core Structural Paratope

        For a more detailed look at the most crucial positions involved in antigen recognition, we defined a core structural paratope and examined which amino acids tend to contribute to antigen recognition at these positions (Fig. 4). Core positions were defined as those that are present in the paratope of at least 60% of structures and that contribute to antigen recognition (relative contribution ≥0.5% of the structural paratope area) in at least 20% of structures. This stringent definition excludes positions that are absent from most analyzed paratopes and eliminates those that make only minor contributions in a few structures. Using the change in SASA to define a core paratope has previously been validated by demonstrating a strong correlation with the confidence level of protein–protein interactions for surface atoms (Peng et al. 2014). These criteria were used to define the core structural paratope in synthetic Abs, and the same positions were analyzed in the natural Abs for comparison.
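        The two core-paratope thresholds combine into a simple filter. This sketch assumes a hypothetical layout in which each structure is a dict containing only the positions found in its paratope, so that "present" means the key exists. A position is kept if it appears in at least 60% of structures and contributes at least 0.5% of the structural paratope in at least 20% of structures.

```python
def core_paratope(structures, min_present=0.60, min_contributing=0.20, threshold=0.5):
    """Return the sorted positions that satisfy both core-paratope criteria.

    structures: list of dicts mapping position -> relative contribution (%) for each
        position in that structure's paratope (hypothetical layout).
    min_present: minimum fraction of structures in which the position appears.
    min_contributing: minimum fraction of structures in which it contributes >= threshold %.
    """
    n = len(structures)
    all_positions = set().union(*structures)  # iterating a dict yields its keys
    core = []
    for pos in all_positions:
        present = sum(1 for s in structures if pos in s) / n
        contributing = sum(1 for s in structures if s.get(pos, 0.0) >= threshold) / n
        if present >= min_present and contributing >= min_contributing:
            core.append(pos)
    return sorted(core)
```

        Computing fractions by division, rather than multiplying the thresholds by the number of structures, keeps boundary cases such as 3 of 5 structures (exactly 60%) on the inclusive side of the cutoff.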

        Figure 4.

          Core structural paratope of the analyzed synthetic antibodies (Abs). The core structural paratope of the analyzed synthetic Abs was defined as positions that are present in ≥60% of the selected structures and contribute to antigen recognition in ≥20% of the structures. The same positions in natural Abs are shown for comparison. (A) For each position in the core structural paratope of synthetic Abs, the amino acid distributions are shown for synthetic Abs (top) and natural Abs (bottom). Amino acids that are present at ≥10% frequency are shown separately (colored bars), whereas amino acids present at <10% frequency are grouped together as “other” (gray bars). The length of each bar is proportional to the frequency, and the total length of the bars at each position is equal to 100%. (B) The core structural paratope of synthetic Abs mapped onto the structure of the trastuzumab antigen-binding fragment (Fab) (Protein Data Bank entry 1N8Z). The size of each sphere is proportional to the frequency at which residues at that position contribute to antigen recognition in synthetic Abs (left) and natural Abs (right).

          Comparison of the types of amino acids involved in antigen recognition at different positions between synthetic and natural Ab data sets revealed similarities and differences (Fig. 4A). In both sets, Ser and Tyr residues contribute to antigen interaction at most positions, which is especially true for positions outside CDR-H3. However, Trp residues, which often contribute to antigen interaction in synthetic paratopes, are much rarer in the examined natural paratopes. This is especially evident upon comparison of CDR-H3 loops and is an important consideration given prior observations that Trp, although capable of mediating high-affinity interactions (Birtalan et al. 2010), can be detrimental to specificity (Birtalan et al. 2010; Kelly et al. 2018) and may also be omitted from libraries without compromising affinity (Kelly et al. 2018).
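          The per-position amino acid distributions in Figure 4A pool every residue type below 10% frequency into a single "other" category. A sketch of that grouping is shown below, assuming (hypothetically) that the residues observed at one core position across a data set are available as a list of three-letter codes.

```python
from collections import Counter


def grouped_distribution(residues, min_freq=10.0):
    """Percent frequency of each amino acid at one position.

    Amino acids below `min_freq` percent are pooled into "other", mirroring the
    grouping described for Figure 4A; the frequencies sum to 100.
    """
    counts = Counter(residues)
    n = len(residues)
    dist = {}
    other = 0.0
    for aa, c in counts.most_common():
        pct = 100.0 * c / n
        if pct >= min_freq:
            dist[aa] = pct
        else:
            other += pct
    if other:
        dist["other"] = other
    return dist
```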

          Mapping of the core paratope onto the Fab structure revealed that the examined synthetic and natural Abs are very similar in terms of the relative contributions of individual paratope residues to antigen recognition (Fig. 4B). However, synthetic Abs relied more heavily on aromatic residues (predominantly Tyr and Trp) for antigen recognition, whereas, in natural Abs, greater positional diversity was observed, and aromatic residues played a lesser role.

          Considering that the evaluated synthetic Abs were derived from libraries built on the trastuzumab framework and were diversified at specific positions with tailored ratios of amino acid subsets, it is not surprising to see differences in the amino acids engaged in antigen recognition at some positions. At the same time, it is remarkable that other positions tend to use the same amino acids for antigen engagement in both data sets, suggesting some selection pressure for these residues. Both the differences and similarities highlighted by our analyses may help inform the design of future synthetic Ab libraries with desirable properties. For example, shifting diversification schemes away from aromatic residues toward the more polar amino acids observed in natural Abs may result in libraries with more hydrophilic paratopes and thus a higher yield of Abs with good developability properties. Furthermore, fixing certain positions to the amino acid that tends to prevail in antigen recognition in natural Abs may produce libraries with more focused diversification schemes and a better chance of generating functional Abs.

          Length and Composition of CDRs L3 and H3

          To further compare the synthetic and natural Ab data sets, we focused on CDRs L3 and H3 (Fig. 5). These CDRs are the most variable in length and amino acid composition (Persson et al. 2013), and this holds true in both the natural and synthetic Ab data sets. Overall, the CDR-3 length distributions exhibited similar patterns in the two data sets (Fig. 5A). For CDR-L3, six-residue loops dominated both data sets, accounting for 66% and 62% of synthetic and natural Abs, respectively. CDR-H3 lengths were more variable, with loops of eight to 22 residues present in both data sets. However, each data set showed biases toward certain lengths: 12-residue loops were highly prevalent in synthetic Abs, and 14- and 17-residue loops were highly prevalent in natural Abs. Notably, most natural and synthetic CDR-L1 loops contained five residues, but a substantial number of natural CDR-L1 loops contained six residues, and some contained nine, 10, or 11 residues. This difference arises from fixing both the length and sequence of CDR-L1 in the synthetic libraries. In contrast, CDR-L2 displayed highly similar length distributions in both Ab sets despite being fixed in the synthetic libraries, suggesting that length diversity in this CDR is dispensable.
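          Loop-length distributions such as those in Figure 5A reduce to counting occupied positions within a CDR range and tabulating the counts. The sketch assumes each Ab is available as a dict from integer position to one-letter residue, with "-" marking a gap. Insertion codes are ignored, so the sketch would need extending for numbering schemes that use them. The example range 107–116 for CDR-L3 is taken from the Figure 5 legend.

```python
from collections import Counter


def cdr_length(numbered, start, end):
    """Count occupied (non-gap) positions between start and end, inclusive.

    numbered: dict mapping integer position -> one-letter residue ('-' for a gap).
    Insertion codes are not handled in this sketch.
    """
    return sum(1 for pos, aa in numbered.items() if start <= pos <= end and aa != "-")


def length_distribution(lengths):
    """Percentage of loops at each length, ordered by length."""
    counts = Counter(lengths)
    n = len(lengths)
    return {length: 100.0 * c / n for length, c in sorted(counts.items())}
```

          For example, `cdr_length(seq, 107, 116)` would give the CDR-L3 length of one Ab under this numbering, and `length_distribution([6, 6, 6, 7, 8])` returns `{6: 60.0, 7: 20.0, 8: 20.0}`.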

          Figure 5.

            Analysis of complementarity-determining regions (CDRs) L3 and H3. (A) Distribution of CDR-L3 lengths (positions 107–116, left) and CDR-H3 lengths (positions 106–117, right) for the selected synthetic and natural antibodies (Abs). (B) Amino acid composition of CDR-L3 (positions 107–114, left) and CDR-H3 (positions 107–113, right) for the selected synthetic and natural Abs. The percent abundance within the CDR is shown for each amino acid, and the amino acids are ordered from the most to least abundant within the synthetic Abs.

            Consistent with the above analysis of structural paratope residues, comparison of the amino acid composition within the CDR loops of natural and synthetic Abs revealed both similarities and differences (Fig. 5B). Within CDRs L1 and L2, the amino acid frequencies are very similar at most positions despite these CDRs being fixed in the synthetic libraries, suggesting strong natural selection of these residues; the exceptions are residues 56 and 68 in CDR-L2, which display significant diversity in the paratopes of natural Abs. Within CDR-L3, Ser and Tyr are the most abundant amino acids in both the synthetic and natural data sets, and Asn is the third most prevalent in natural Abs but is rare in synthetic Abs. Interestingly, despite the prevalence of Asn residues in the CDR-L3 loops of natural Abs, none of the 50 Abs in our data set has a glycosylation site in CDR-L3 (Fig. 1). In CDR-H3 loops, Gly and Arg are abundant and occur at similar frequencies in the synthetic and natural data sets, although Gly is also prevalent in the other CDRs of natural Abs. The aromatic amino acids Tyr and Trp dominate CDR-H3 structural paratope positions in the synthetic Abs (Fig. 4). Conversely, the negatively charged amino acid Asp is fairly common in the CDR-H3 loops of natural Abs but absent from those of the synthetic Abs, and overall, Asp appears to contribute more to the paratopes of natural Abs than to those of their synthetic counterparts. Notably, Cys was the least frequent amino acid in CDRs L3 and H3 in both data sets. This suggests that, during the evolution of natural Abs, there is strong selective pressure against unpaired Cys residues that could form spurious disulfides; in synthetic Abs, Cys residues are excluded by design.
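            The composition comparison in Figure 5B pools all residues of a CDR across a data set and orders amino acids by their abundance in the synthetic Abs. A minimal sketch follows, assuming each CDR sequence is a string of one-letter codes and breaking abundance ties alphabetically (an arbitrary choice not described in the text).

```python
from collections import Counter


def composition(cdr_sequences):
    """Percent abundance of each amino acid pooled over all CDR sequences."""
    counts = Counter(aa for seq in cdr_sequences for aa in seq)
    total = sum(counts.values())
    return {aa: 100.0 * c / total for aa, c in counts.items()}


def side_by_side(synthetic_seqs, natural_seqs):
    """Return (amino acid, synthetic %, natural %) rows, most abundant in synthetic first."""
    syn, nat = composition(synthetic_seqs), composition(natural_seqs)
    # Amino acids absent from one set get 0%; ties are broken alphabetically.
    order = sorted(set(syn) | set(nat), key=lambda aa: (-syn.get(aa, 0.0), aa))
    return [(aa, syn.get(aa, 0.0), nat.get(aa, 0.0)) for aa in order]
```

            Listing amino acids that occur in only one set with 0% in the other makes absences, such as Asp in synthetic CDR-H3 loops, explicit in the table.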

            Canonical Conformations of CDR Loops

            As Ab structures began to accumulate, the structural analysis of CDRs revealed that they adopt a small number of main-chain conformations that were classified as canonical structures (Chothia and Lesk 1987). As more structures have accumulated, the original classification has been updated, expanded, and refined (Martin and Thornton 1996; Al-Lazikani et al. 1997; North et al. 2011; Kelow et al. 2022), and web tools now facilitate CDR analysis and assignment to “standard conformations” (Adolf-Bryfogle et al. 2015).

            The different approaches to generating synthetic libraries, whether on a single framework or on a multitude of frameworks, raise an interesting question regarding conformational representation. Although it has been argued intuitively that single-framework libraries “lose the structural diversity present across the different frameworks of natural Abs” (Rothe et al. 2008) and that their “structural diversity does not approach that of other naive libraries” (Knappik et al. 2000), these statements have not been supported with evidence.

            To explore this question, canonical cluster assignments were obtained for each CDR in both data sets using PyIgClassify2 (Adolf-Bryfogle et al. 2015; Kelow et al. 2022), and the diversity of conformations represented in the natural and synthetic Abs was compared. For each CDR, the assigned conformations were plotted against the number of times each was observed in each data set (Fig. 6). In CDRs L1, L2, and H1, clustering was similarly dominated by a single conformation in both data sets, suggesting that conformational diversity in these CDRs is limited despite the variety of frameworks in the natural set; in most analyzed synthetic Abs, CDRs L1 and L2 are also fixed in length and sequence. Notable differences were observed in CDR-L1, which, in addition to the dominant L1-11-1 conformation, showed a preference for L1-12-1 in natural Abs and for L1-11-* in synthetic Abs. In CDR-H2, although the H2-10-1 conformation was the most represented in both sets, the natural set also showed comparable representation of H2-9-1 and lesser but significant representation of H2-9-*, H2-10-*, and H2-10-2, suggesting some conformational limitations in CDR-H2 of synthetic Abs. Remarkably, CDR-L3 in both the synthetic and natural sets is dominated by the L3-9-cis7-1 conformation, with >40% of the Abs in each set assigned to it. The remaining diversity was dispersed across a wide variety of clusters, some shared and some unique to each set. In contrast, the most frequently observed CDR-H3 clusters in the natural and synthetic sets were H3-18-* and H3-13-*, respectively, indicating different loop length preferences. Beyond these, no cluster was dominant, and each set displayed diverse conformational clusters, both shared and distinct.

            Figure 6.

              Classification of antibody (Ab) complementarity-determining regions (CDRs) into canonical clusters. Canonical cluster assignments for all six CDRs were determined from the Protein Data Bank files for each of the Ab–antigen complexes using PyIgClassify2 (Kelow et al. 2022). For each CDR, the number of times each cluster ID (x-axis) was observed was tallied (y-axis) and plotted for both the synthetic and natural Ab sets.
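              The tallies plotted in Figure 6 amount to counting cluster IDs per CDR and comparing the resulting sets. The sketch below assumes the PyIgClassify2 assignments have already been extracted into plain lists of cluster ID strings; parsing the tool's output is not shown, and the example IDs are illustrative only. It returns the ranked counts, the fraction of structures in the dominant cluster, and the clusters shared by or unique to each set.

```python
from collections import Counter


def tally_clusters(cluster_ids):
    """Return (clusters ranked by count, fraction of structures in the top cluster)."""
    ranked = Counter(cluster_ids).most_common()
    dominant_fraction = ranked[0][1] / len(cluster_ids) if ranked else 0.0
    return ranked, dominant_fraction


def compare_cluster_sets(synthetic_ids, natural_ids):
    """Return (shared, synthetic-only, natural-only) cluster IDs as sorted lists."""
    syn, nat = set(synthetic_ids), set(natural_ids)
    return sorted(syn & nat), sorted(syn - nat), sorted(nat - syn)
```

              A dominant fraction above 0.4 for CDR-L3 in both sets would correspond to the L3-9-cis7-1 observation above.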

              Contrary to intuitive assertions (Knappik et al. 2000; Rothe et al. 2008), the results of our quantitative survey of canonical structures show that Abs derived from a single-framework library are not conformationally constrained relative to the 13 different framework pairs represented in the natural Ab set. Given that these synthetic libraries have generated Abs to thousands of structurally different antigens, these observations are consistent with their empirical success. Overall, these findings suggest that single-framework libraries possess no immunological blind spots that limit the breadth of antigens that can be targeted with high affinity and specificity.

              Limitations

              Although our survey provides the most extensive structural comparison of natural and synthetic Abs to date, inherent biases in the data sets chosen for analysis limit broader interpretation of the results. First, the exclusive use of the κ-light-chain isotype in the synthetic libraries chosen for analysis skews the data and precludes application of the findings to Abs containing the λ light chain. Several groups have noted differences in the biophysical, sequence, and structural properties of κ- and λ-light-chain Abs (DeKosky et al. 2016; Townsend et al. 2016; Raybould et al. 2019), and thus there is no expectation that the observations and conclusions made here extend beyond κ Abs. The incorporation of λ frameworks into other synthetic libraries (Knappik et al. 2000; Rothe et al. 2008; Prassler et al. 2011), however, creates a similar opportunity to structurally compare natural and synthetic λ-light-chain Abs and to determine whether the biophysical and structural differences observed in nature are maintained or altered by the synthetic diversity introduced. Second, although the set of natural κ Abs sourced for this study is small relative to the κ Ab structures available in the PDB, the broad diversity of CDR-3 loop lengths, sequences, and conformations in the analyzed set suggests that any unintentional biases introduced by the selection of structural complexes do not appreciably skew the data obtained for the natural Abs. Last, by focusing strictly on a structural approach to assessing the success of library design, this study is biased toward functional, high-affinity Abs and provides little insight into the physicochemical behavior of the Abs obtained. Early iterations of the library yielded tight, specific binding clones whose paratopes were, however, dominated by Tyr. Such clones would likely be poorly developable, and thus alternative measures of library success that are not captured by this methodology would also be needed for a more comprehensive assessment of library quality.

              CONCLUSIONS AND PERSPECTIVES

              To date, natural Abs have dominated the therapeutic and structural landscape, with most Ab therapeutics and structures published being derived from natural rather than synthetic sources. Inarguably, the lessons learned from the sequencing of natural Ab repertoires have informed the design of synthetic libraries, and most synthetic approaches tend to emulate natural Ab repertoires to varying degrees (Knappik et al. 2000; Sidhu et al. 2004; Zhai et al. 2011). Early designs focused on emulating Ab sequences from genomic sources (Johnson and Wu 2000; Sidhu et al. 2004), presumably with varying degrees of functional and structural adaptation. More recently, herculean efforts to sequence the peripheral IgM+ B-cell compartment (containing a mix of naive, memory, and plasma cells) (Glanville et al. 2009) continue to inform Ab library design and reflect both naive unadapted and functionally adapted repertoires. These approaches have undoubtedly achieved the first aim of a synthetic library, the development of synthetic repertoires that closely resemble natural repertoires, and enable the generation of specific and high-affinity Abs targeting highly diverse antigens.

              Insofar as natural Abs can also exhibit developability liabilities (Jain et al. 2017; Raybould et al. 2019), design strategies aimed at optimizing synthetic libraries for therapeutic applications must also consider which elements of the natural repertoire should be emulated and which should not. From a structural perspective, it is important to determine whether the diverse landscape of interactions in which natural Abs participate is reflected in the paratopes of synthetic Abs; however, this addresses only the functional side of the balance between function and developability that determines whether a functional Ab is of sufficient quality for therapeutic applications. As synthetic libraries are increasingly used for therapeutic development, design strategies must also ensure that the resulting Abs are amenable to development; in other words, that they can be produced in high yield in soluble, monodisperse form and that they show low levels of nonspecific binding or self-binding, which could otherwise compromise manufacturing or pharmacokinetic behavior.

              With these issues in mind, we have described here a detailed structural analysis of 50 Fab–antigen complexes containing synthetic Abs derived from the minimalist, single-framework libraries reviewed here. By conducting the same analysis on a set of 50 natural Fab–antigen complexes, which included Abs with diverse frameworks found in human Ab repertoires, we could compare natural and synthetic paratopes across a variety of interface measures. By comparing the participation of specific positions, amino acid types, and CDRs, we aimed to obtain insights that could be used to devise strategies for designing new iterations of already highly optimized synthetic libraries.

              Overall, given the highly restricted nature of these minimalist libraries, the structural interaction landscapes were remarkably similar between the synthetic and natural Ab data sets. Natural and synthetic Abs used similar positions with similar frequency trends (Fig. 2) and showed similar CDR usage and contributions to the paratope (Fig. 3). The similarities and differences in positional diversity between natural and synthetic Abs suggest opportunities for library refinement and strategies to enhance or reduce “naturalness” where appropriate (Figs. 4 and 5). An unexpected result of these analyses, however, is the largely equivalent conformational diversity of the two Ab sets, suggesting that, contrary to previous assertions, the single-framework approach to synthetic library generation does not constrain conformational diversity (Fig. 6).

              In conclusion, our structural analyses suggest that Abs derived from synthetic Ab libraries with a trastuzumab framework interact with antigens in a manner similar to natural Abs but with some key distinctions. We conclude that, in many aspects, synthetic Ab libraries successfully recapitulate much of the structural landscape of natural Ab–antigen interactions, both in the way that CDRs are used in paratopes and in the conformational diversity exhibited by CDRs. The similarities and differences observed between the two sets, particularly with regard to residue usage, provide a rational path to library optimization, but as has been emphasized previously, there is a trade-off between specificity, affinity, solubility, and stability, and thus the priorities of any design strategy must be carefully considered. Structural analysis nevertheless provides a critical reflection on the quality of synthetic Ab libraries, and the insights obtained here offer ample suggestions for further improving both the functional performance and developability of the Abs they generate.

              Footnotes

              • From the Advances in Phage Display collection, edited by Gregg J. Silverman, Christoph Rader, and Sachdev S. Sidhu.

              REFERENCES
