Main

Antibodies are the dominant class of protein therapeutics, with over 160 antibody therapeutics currently licensed globally and a market value expected to reach US$445 billion in the next 5 years4. Antibody development generally proceeds in two stages: (1) the discovery of antibodies that bind to a specific epitope; and (2) the subsequent affinity maturation and clinical optimization of those antibodies. Currently, the discovery of epitope-specific antibodies relies on animal immunization or screening of antibody libraries to identify candidate molecules that bind to a desired target, followed by epitope mapping. These methods are laborious and time-consuming, and can fail to identify antibodies that interact with the therapeutically relevant epitope1. Efforts at computational design of antibodies have generally focused on the second (optimization) stage of antibody development, such as sampling alternative native complementarity-determining region (CDR) loops to improve affinities5,6 or using Rosetta7 sequence design to improve the interacting regions. More recently, structure-based and sequence-based deep learning networks have been trained to design novel antibody sequence variants8,9,10, but these methods require an initial binding antibody from which to optimize. There have also been recent advances in antibody optimization with deep learning methods trained on data generated by powerful new experimental methods11,12. By contrast, computational methods able to perform the first stage of antibody design (generating epitope-specific binding antibodies) do not exist, and de novo design of antibodies (that is, with no homology to an existing antibody targeting that epitope) therefore remains an unsolved problem. There has been rapid progress in designing binding proteins (not antibodies) using RFdiffusion2,13,14. However, as with other methods for de novo interface design15,16,17, these binders almost exclusively rely on regular secondary structure-based (helical or strand) interactions with the target epitope, and the original (‘vanilla’) RFdiffusion network is therefore unable to design antibodies de novo (Supplementary Fig. 1; see ref. 18).

An ideal method for designing de novo antibodies would enable: (1) targeting of any specified epitope on any target of interest; (2) focusing of sampling on the CDR loops, keeping the framework sequence and structure close to a user-specified highly optimized therapeutic antibody framework; and (3) sampling of alternative rigid-body placements of the designed antibody with respect to the epitope. We hypothesized that a specialized version of RFdiffusion fine-tuned on antibody structures should be capable of designing de novo CDR-mediated interfaces, given the diversity and quality of de novo interfaces that RFdiffusion can design and given that the underlying thermodynamics of interface formation are the same, and set out to develop such a method.

Training RFdiffusion for antibody design

RFdiffusion uses the AlphaFold2 (ref. 19) and RoseTTAFold2 (RF2) frame representation of protein backbones, comprising the Cα coordinate and N-Cα-C rigid orientation for each residue. A noising schedule is defined that, over a set number of ‘timesteps’ (T), corrupts the protein frames towards random prior distributions (Cα coordinates are corrupted with three-dimensional Gaussian noise, and residue orientations with Brownian motion on SO3). At each training step, a Protein Data Bank (PDB) structure and a random timestep (t) are sampled, and t noising steps are applied to the structure. RFdiffusion predicts the de-noised (pX0) structure at each timestep, and a mean squared error loss is minimized between the true structure (X0) and the prediction (pX0). At inference time, a random residue distribution (XT) is sampled, and RFdiffusion iteratively de-noises this to generate novel protein structures.
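The Cα part of this noising and training objective can be illustrated with a minimal Python sketch. It assumes a standard variance-preserving Gaussian schedule with linearly increasing noise levels, which is not specified in the text above; the function names (alpha_bar, noise_ca, mse) and the values of T, beta_start and beta_end are hypothetical, and the Brownian motion on SO3 applied to residue orientations is omitted.

```python
import math
import random


def alpha_bar(t, T=200, beta_start=1e-4, beta_end=0.02):
    """Fraction of signal variance retained after t noising steps.

    Assumption: a linear beta schedule; the real schedule may differ.
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod


def noise_ca(x0, t, rng, T=200):
    """Corrupt C-alpha coordinates x0 (list of [x, y, z]) with t steps of Gaussian noise."""
    a = alpha_bar(t, T)
    scale, sigma = math.sqrt(a), math.sqrt(1.0 - a)
    return [[scale * c + sigma * rng.gauss(0.0, 1.0) for c in atom] for atom in x0]


def mse(x_true, x_pred):
    """Mean squared error between true (X0) and predicted (pX0) coordinates."""
    n = sum(len(atom) for atom in x_true)
    total = sum((a - b) ** 2 for p, q in zip(x_true, x_pred) for a, b in zip(p, q))
    return total / n


rng = random.Random(0)
x0 = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]  # toy three-residue trace
t = rng.randint(1, 200)    # random timestep, as sampled during training
xt = noise_ca(x0, t, rng)  # corrupted structure that would be given to the network
loss = mse(x0, xt)         # placeholder: a trained network would supply pX0 here
```

In an actual training loop the loss would compare X0 with the network output pX0 rather than with the corrupted input; the last line only shows where the mean squared error enters.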

We fine-tuned RFdiffusion predominantly on antibody complex structures (Fig. 1; see Methods in Supplementary Information). At each step of training, the antibody structure is corrupted. To permit specification of the framework structure and sequence at inference time, the framework sequence and structure are provided as conditioning input to RFdiffusion during training (Fig. 1b). Because it is desirable for the rigid-body position (dock) between antibody and target to be designed by RFdiffusion along with the CDR loop conformations, the framework structure is provided in a global-frame-invariant manner during training (Fig. 1c). We utilize the ‘template track’ of RF2/RFdiffusion to provide the framework structure as a two-dimensional matrix of pairwise distances and dihedral angles between each pair of residues (a representation from which three-dimensional structures can be accurately recapitulated)20 (Supplementary Fig. 1a). The framework and target templates do not encode their relative positions in three-dimensional space. In this work, we kept the sequence and structure of the framework region fixed, and focused on the design of the CDRs and the overall rigid-body placement of the antibody relative to the target. We trained RFdiffusion with an additional one-hot encoded ‘hotspot’ feature, which marks some fraction of the target residues with which the antibody CDRs interact, such that at inference, we can direct antibodies towards a specific site (Fig. 1d; we refer to these sites as ‘epitopes’ throughout the remainder of the text). For simplicity, we refer to this fine-tuned version of the network as RFdiffusion for the remainder of this paper.
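Why a template of pairwise features leaves the dock unspecified can be illustrated with a short Python sketch: distances within the framework are unchanged by any rigid-body motion, so they cannot encode where the framework sits relative to the target. The sketch covers distances only (not the dihedral angles that the template track also uses), and the helper names, toy coordinates and hotspot indices are hypothetical. It also shows one plausible form of a one-hot hotspot feature.

```python
import math


def pairwise_distances(coords):
    """Two-dimensional matrix of distances between every pair of residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rigid_transform(coords, angle, shift):
    """Rotate about the z axis by angle (radians), then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        [c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]]
        for x, y, z in coords
    ]


def hotspot_feature(n_target_residues, hotspot_indices):
    """One-hot vector over target residues: 1 marks a hotspot, 0 otherwise."""
    hot = set(hotspot_indices)
    return [1 if i in hot else 0 for i in range(n_target_residues)]


# Toy framework coordinates (hypothetical values, one point per residue).
framework = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0], [2.0, 5.5, 1.5]]
moved = rigid_transform(framework, angle=1.1, shift=(10.0, -4.0, 7.5))

d_before = pairwise_distances(framework)
d_after = pairwise_distances(moved)
max_change = max(
    abs(p - q) for row_a, row_b in zip(d_before, d_after) for p, q in zip(row_a, row_b)
)

hotspots = hotspot_feature(8, [2, 5])  # residues 2 and 5 flagged on an 8-residue target
```

Here max_change is zero up to floating-point error, so the same template is consistent with every placement of the framework; the choice of dock is left to the diffusion process and steered by the hotspot feature.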

Fig. 1: Overview of RFdiffusion for antibody design.

a, RFdiffusion is trained such that at time T, a sample is drawn from the prior distribution (three-dimensional Gaussian distribution for translations and uniform SO3 distribution for rotations), and de-noised between times T and 0 to generate (in this case) an scFv. b, The antibody framework is provided as a sequence and ‘template’ to RFdiffusion; the latter specifying the pairwise distances and dihedral angles between framework residues. For example, one can specify the design of a VHH (top) or scFv (bottom). c, Diversity in the antibody–target dock is achieved because the framework template does not encode the rigid body framework–target relationship. Diverse docking modes are sampled by RFdiffusion. d, The epitope is specified by provision of ‘hotspot’ residues, which direct the designed antibody (compare orange, left, versus pink, right). e, Overview of the computational design pipeline described in this article. RFdiffusion performs the backbone design step, given a target, epitope hotspots and antibody framework. ProteinMPNN designs only the sequence of the CDR residues (not the framework residues). Fine-tuned RoseTTAFold2 predicts the structure of the designed antibody, given the target (sequence, structure and, optionally, some fraction of hotspot residues) and designed antibody sequence. Self-consistency (high similarity between predicted and designed structures) and high confidence (low predicted alignment error) define in silico success. Note that AlphaFold3, not available at the time of this work, is a better predictor of success than RoseTTAFold2. f, The contribution of this work is the epitope-specific antibody design pipeline depicted in panel e. Several methods can be used to experimentally validate designs and subsequently affinity-mature or optimize them. In this work, we used yeast surface display and/or E. coli expression with SPR for experimental validation (taking approximately 6 weeks and 2 weeks post-oligonucleotide order, respectively), and OrthoRep affinity maturation.

With this training regime, RFdiffusion is able to design antibody structures that closely match the input framework structure and target the specified epitope with novel CDR loops (Supplementary Fig. 1). After the RFdiffusion step, we use ProteinMPNN to design the CDR loop sequences. The designed antibodies make diverse interactions with the target epitope and differ significantly from sequences in the training dataset (Extended Data Fig. 1). There was no correlation between training dataset similarity and binding success (Extended Data Fig. 1a, red lines).

Fine-tuning RF2 for antibody validation

Design pipelines typically produce a wide range of solutions to any given design challenge. An effective way to select the designed proteins and interfaces that are most likely to succeed experimentally is to filter on the similarity of the designed structure to the AlphaFold2-predicted structure for the designed sequence (often referred to as ‘self-consistency’), which has been shown to correlate well with experimental success21,22. In the case of antibodies, however, AlphaFold2 fails to accurately predict antibody–antigen structures23, preventing its use as a filter in an antibody design pipeline, and at the outset of this project, AlphaFold3 (ref. 24) was not available.

We sought to improve design filtering by fine-tuning RoseTTAFold2 on antibody structures. To simplify antibody structure prediction, we provided information during training about the structure of the target and the location of the target epitope to which the antibody binds; the fine-tuned RF2 must still correctly model the CDRs and find the correct orientation of the antibody to the targeted region. The rationale for providing this information is that the target structure and binding location are available during design (but are typically not available during general structure prediction). With this training regimen and additional information, RF2 is able to robustly distinguish true antibody–antigen pairs from decoy pairs and often accurately predicts antibody–antigen complex structures, but only when the bound (holo) conformation of the target structure and epitope information are provided (Extended Data Fig. 2a–d). For monomer prediction, the fine-tuned RF2 outperformed previous models available at the time, especially at CDR H3 structure prediction (Extended Data Fig. 2e,f).

When this fine-tuned RF2 network is used to re-predict the structure of RFdiffusion-designed VHHs, a significant fraction are confidently predicted to bind in an almost identical manner to their designed structure (Extended Data Fig. 3a). Furthermore, in silico cross-reactivity analyses demonstrated that RFdiffusion-designed VHHs are rarely predicted to bind to unrelated proteins (Extended Data Fig. 3b). VHHs that are confidently predicted to bind to their designed target are predicted to form high-quality interfaces, as measured by Rosetta ddG (Extended Data Fig. 3c). This indicates that RF2 filtering might enrich for experimentally successful binders.
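A self-consistency filter of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the filter used in this work: the thresholds (max_rmsd, max_pae) are hypothetical placeholders, the function names are invented for the example, and the designed and predicted coordinates are assumed to have been superimposed on the target beforehand, so that the RMSD reflects differences in both the CDR conformations and the dock.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def passes_filter(designed, predicted, mean_pae, max_rmsd=2.0, max_pae=10.0):
    """In silico success: prediction close to the design AND confidently predicted.

    max_rmsd (angstroms) and max_pae are hypothetical cut-offs chosen for illustration.
    """
    return rmsd(designed, predicted) <= max_rmsd and mean_pae <= max_pae


# Toy coordinates (hypothetical): one point per residue.
designed = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
close = [[0.3, 0.0, 0.0], [3.8, 0.4, 0.0], [7.6, 0.0, 0.0]]
far = [[5.0, 5.0, 5.0], [9.0, 5.0, 5.0], [13.0, 5.0, 5.0]]

keep = passes_filter(designed, close, mean_pae=5.0)
reject_dock = passes_filter(designed, far, mean_pae=5.0)
reject_conf = passes_filter(designed, close, mean_pae=20.0)
```

Requiring both conditions means that a design is rejected if the predicted structure deviates from the design, or if the prediction agrees with the design but is made with low confidence.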

Design and characterization of VHHs

We initially focused on the design of single-domain antibodies (VHHs) produced by camelids25. To date, two VHH-based therapies have been approved by the FDA with many clinical trials ongoing25. Despite having fewer CDR loops (three) than conventional antibodies (six), the average interaction surface area of a VHH is very similar to that of an antibody26, suggesting that a method capable of VHH design could also be suitable for antibody design. Indeed, in silico metrics for scFvs and VHHs showed similar qualities of interfaces, as assessed by Rosetta7 and fine-tuned RF2 (Extended Data Fig. 3b–f).

We chose a widely used humanized VHH framework (h-NbBcII10FGLA)27 as the basis of our VHH design campaigns, and designed VHHs to a range of disease-relevant targets: C. difficile TcdB, influenza H1 haemagglutinin, respiratory syncytial virus (RSV) sites I and III, SARS-CoV-2 receptor-binding domain (RBD) and IL-7Rα. Computationally filtered designs were screened either at high throughput by yeast surface display (9,000 designs per target; RSV sites I and III, RBD and influenza haemagglutinin) or at lower throughput with Escherichia coli expression and single-concentration surface plasmon resonance (SPR; 95 designs per target; TcdB, IL-7Rα and influenza haemagglutinin; the latter was screened using both methods).

The highest affinity binders to RSV site III, influenza haemagglutinin, RBD and TcdB are shown in Fig. 2a–d, respectively (see also Supplementary Fig. 2 for all the SPR traces of confirmed VHH binders identified in this study and Supplementary Methods Table 6 for success rates against each target, which range from 0% to 2%). The CDR loops are distinct from those of VHHs observed in nature, indicating substantial generalization beyond the training dataset (Extended Data Fig. 1). Of the haemagglutinin binders tested against the insect-cell-produced haemagglutinin monomer, the highest affinity binder had a dissociation constant (Kd) of 78 nM (Fig. 2b), with other binders having affinities of 546 nM, 698 nM and 790 nM. For TcdB, the target epitope was the Frizzled interface, a site for which the PDB contains no antibodies or VHHs. For the best designed VHHs against RBD (Kd = 5.5 μM; Fig. 2c) and TcdB (Kd = 262 nM; Fig. 2d), binding was confirmed to be to the desired epitope: binding was completely abolished upon addition of a previously designed, structurally characterized de novo binder to that epitope (AHB2 (PDB ID 7UHB28) for RBD and FZD48 (PDB ID 9CM5 (ref. 29)) for TcdB; Fig. 2c,d and Extended Data Fig. 4a–c). This TcdB VHH also neutralized TcdB toxicity in CSPG4-knockout cells (CSPG4 is an alternative TcdB receptor) with a half-maximal effective concentration (EC50) of 460 nM (Extended Data Fig. 4d,e). The interactions with TcdB were specific, with no binding observed to the highly related (70% sequence homology) Paeniclostridium sordellii lethal toxin L (TcsL; Extended Data Fig. 4b). These data demonstrate the ability of RFdiffusion to design VHHs that make specific interactions with the target epitope.

Fig. 2: Biochemical characterization of designed VHHs.

a,b, Nine thousand designed VHHs were screened against RSV site III (a; VHH_RSV_01) and influenza haemagglutinin (b; VHH_flu_01) with yeast surface display, before soluble expression of the top hits in E. coli. SPR demonstrated that the highest affinity VHHs to RSV site III and influenza haemagglutinin bound their targets with affinities of 1.4 μM and 78 nM, respectively. c, Nine thousand VHH designs were tested against the SARS-CoV-2 RBD, and after soluble expression, SPR confirmed an affinity of 5.5 μM to the target for design VHH_RBD_D4 (left). Binding was to the expected epitope, confirmed by competition with a structurally confirmed de novo binder (AHB2 (PDB ID 7UHB), right). d, Ninety-five VHH designs were tested against C. difficile TcdB. The highest affinity VHH, VHH_TcdB_H2, bound with 262 nM affinity (left), and also competed with a structurally confirmed de novo binder (FZD48, PDB ID 9CM5 (ref. 29)) to the same epitope (right). See also Extended Data Fig. 4a–c for quantification of the competition shown in panels c,d. For all panels, the measured binding response is indicated in a solid blue line, and the global fit using a 1:1 binding interaction model is indicated with a black dashed line.
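To give a sense of what the reported dissociation constants mean under a 1:1 binding model, the following Python sketch computes the equilibrium fraction of target bound at a given analyte concentration. It shows only the equilibrium relation, not the global kinetic fit used for the SPR traces; the function names and the r_max value are hypothetical, and the Kd values are those reported above for VHH_flu_01 (78 nM) and VHH_RSV_01 (1.4 μM).

```python
def fraction_bound(conc_nM, kd_nM):
    """Equilibrium occupancy for 1:1 binding: C / (C + Kd)."""
    return conc_nM / (conc_nM + kd_nM)


def equilibrium_response(conc_nM, kd_nM, r_max):
    """Steady-state signal, given a maximal response r_max (arbitrary units)."""
    return r_max * fraction_bound(conc_nM, kd_nM)


kd_flu = 78.0    # nM, VHH_flu_01 (from the text)
kd_rsv = 1400.0  # nM, VHH_RSV_01 (1.4 uM, from the text)

for conc in (10.0, 78.0, 1000.0):
    occ_flu = fraction_bound(conc, kd_flu)
    occ_rsv = fraction_bound(conc, kd_rsv)
    print(f"{conc:7.1f} nM  flu: {occ_flu:.2f}  RSV: {occ_rsv:.2f}")

# Hypothetical r_max of 100 response units, used only to illustrate the scaling.
signal = equilibrium_response(78.0, kd_flu, r_max=100.0)
```

By definition, half of the target is occupied when the analyte concentration equals Kd, so the 78 nM binder reaches half-maximal occupancy at a concentration roughly 18-fold lower than the 1.4 μM binder.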

Cryo-EM of a VHH-binding influenza haemagglutinin

We sought to evaluate design accuracy by cryo-electron microscopy (cryo-EM) structure determination of the designed anti-haemagglutinin VHHs in complex with natively glycosylated, trimeric influenza haemagglutinin glycoprotein (strain A/USA:Iowa/1943 H1N1; Supplementary Fig. 4), which retains the conserved stem epitope used during computational VHH design and upstream biochemical screening. Cryo-EM data processing revealed that one of the four VHH designs tested, denoted hereafter as VHH_flu_01, effectively bound to the fully glycosylated haemagglutinin trimer (Fig. 3 and Extended Data Fig. 5). Two-dimensional classification of all particles in the dataset (Fig. 3a) and the determined 3.0 Å structure of the complex (Fig. 3b and Supplementary Methods Table 10) identified approximately 66% of haemagglutinin particles bound to a maximum of two VHHs per trimer (Fig. 3a–h). This partial occupancy is probably attributable to the N296 glycan, which, in unbound subunits, partially occludes the target epitope but reorients when bound to VHH_flu_01 (see Fig. 3h).

Fig. 3: Cryo-EM structural characterization of de novo-designed VHH binding to influenza haemagglutinin and TcdB.

a, Labelled cryo-EM two-dimensional class averages of designed VHH_flu_01 bound to influenza haemagglutinin (HA) strain A/USA:Iowa/1943 H1N1. b, The 3.0 Å cryo-EM three-dimensional reconstruction shows VHH_flu_01 bound to H1 along the stem in two protomers. c, Cryo-EM structure of VHH_flu_01 bound to influenza haemagglutinin. d, Superposition of the designed VHH CDR3 structure with the cryo-EM structure. e, Comparison of predicted CDR3 rotamers with the built 3.0 Å cryo-EM structure. f,g, The cryo-EM structure closely matches the design. h, Examination of apo haemagglutinin protomers juxtaposed with those bound to the designed VHH shows repositioning of glycan N296 to allow for binding of the designed VHH to the stem. i, Labelled cryo-EM two-dimensional class averages of the designed VHH, VHH_TcdB_H2, bound to full-length TcdB. j, The 4.6 Å cryo-EM three-dimensional reconstruction of the complex shows VHH_TcdB_H2 bound to the target epitope as predicted. CROPs, combined repetitive oligopeptides; GTD, glucosyltransferase domain. k, Owing to the modest resolution, a fragment of TcdB was first docked into the cryo-EM density map, and the full design model (including both the TcdB fragment and the designed VHH) was then aligned to the pre-fitted TcdB fragment. The predicted design closely matches the experimentally determined complex in structure, epitope targeting and overall conformation. l, Labelled cryo-EM two-dimensional class averages of the designed VHH, VHH_TcdB_H2_ortho, bound to full-length TcdB. m, The 5.7 Å cryo-EM three-dimensional reconstruction of the complex shows that VHH_TcdB_H2_ortho bound the target epitope as predicted. n, A TcdB fragment was docked into the cryo-EM map, followed by alignment of the full model including the OrthoRep-matured VHH. The resulting structure shows no detectable change in binding orientation or docking angle compared with the original design, indicating that OrthoRep maturation preserved the predicted mode of epitope engagement. In all panels: yellow indicates haemagglutinin; grey denotes the computational design prediction; pink or navy shows VHH (cryo-EM); and teal indicates glycan.

The structure of influenza haemagglutinin bound to two copies of VHH_flu_01 (Fig. 3b,c and Extended Data Fig. 5) reveals a VHH approach angle that closely matches the predicted model (Fig. 3f) and a VHH backbone that is very close to the RFdiffusion design, with a calculated root mean square deviation (RMSD) of 1.45 Å (Fig. 3g). The CDR3 structure is also very similar between the cryo-EM structure and the computational model (RMSD = 0.8 Å; Fig. 3d), with residues V100, V101, S103 and F108 in the de novo-designed CDR3 loop interacting with the influenza haemagglutinin stem epitope in the cryo-EM structure, as designed by RFdiffusion and re-predicted with RF2 (Fig. 3e). The design is highly dissimilar from the closest antibody–VHH binding to this epitope in the PDB (Extended Data Fig. 1f,g and Supplementary Fig. 5). Together, these results demonstrate that VHHs can be designed with atomic-level precision.

Cryo-EM of VHHs to TcdB and SARS-CoV-2

To improve the binding affinity of de novo-designed VHHs, we utilized the orthogonal error-prone DNA replication system, OrthoRep, for continuous hypermutation of target genes in vivo30,31. OrthoRep has been shown to drive the rapid affinity maturation of yeast surface-displayed antibodies. We used this capability to affinity-mature VHHs targeting TcdB, influenza H1 haemagglutinin and the SARS-CoV-2 RBD. Affinity-matured VHHs acquired several mutations relative to the parent designs and improved binding affinities by approximately two orders of magnitude (Supplementary Fig. 3), making them suitable candidates for downstream cryo-EM structural characterization.

For TcdB, our design campaign targeted the Frizzled-binding epitope located on the RBD. TcdB consists of four functional domains, including a central delivery and receptor-binding domain (DRBD) where the VHHs were designed to bind. Cryo-EM characterization of the original parent design, VHH_TcdB_H2, confirmed that the VHH engages the target Frizzled DRBD epitope (Supplementary Fig. 7). Analysis via two-dimensional and three-dimensional classification revealed a mix of bound and unbound TcdB particles (Fig. 3i and Supplementary Figs. 6 and 7). Extensive three-dimensional classification and local refinement identified multiple structural states of TcdB within the dataset, including an extended bound state (Extended Data Fig. 6 and Supplementary Fig. 8). Three-dimensional refinement of the bound VHH in the extended TcdB state yielded a modest 4.6 Å map, into which the design model was confidently rigid-body docked, showing high agreement with the intended design structure (Fig. 3i–k). To evaluate whether the improved affinity achieved through OrthoRep preserved the original binding mode of the parent design, we performed additional cryo-EM analysis on the affinity-matured VHH, VHH_TcdB_H2_ortho. These experiments revealed a high proportion of TcdB particles now bound by the VHH, consistent with its enhanced affinity (Fig. 3l–n and Supplementary Fig. 3b). Using a processing pipeline similar to that used for the parent VHH–TcdB complex, we resolved the affinity-matured VHH–TcdB complex to a modest 5.7 Å resolution, enabling us to confidently dock the designed VHH into the cryo-EM density with close agreement. This confirmed that the VHH maintained targeting to the correct epitope and retained its original binding pose after OrthoRep-mediated affinity maturation (Fig. 3l–n and Extended Data Fig. 6). These results underscore the capability of RFdiffusion to design accurate de novo VHHs that are capable of targeting previously unexplored epitopes and are amenable to downstream affinity maturation.

We next used cryo-EM to characterize an affinity-matured VHH (VHH_RBD_D4_ortho19) targeting the SARS-CoV-2 spike RBD, where competition experiments indicated that the parental VHH bound the intended epitope (Fig. 2c, Extended Data Fig. 4c and Supplementary Figs. 3b and 9). The RBD transitions between ‘up’ and ‘down’ conformations, with the ‘up’ state enabling receptor binding and viral entry32. Cryo-EM two-dimensional class averages and three-dimensional classification reconstructions of the VHH-bound complex revealed a mixture of RBD conformations (1–2 ‘up’), with VHH density observed exclusively in the up state. This is consistent with its design, as the target epitope is occluded in the down conformation (Supplementary Fig. 9a,b). Global refinement with an average estimated resolution of 3.9 Å provided well-defined density for the lower portion of the spike protein (local resolution of approximately 2.5 Å), but the relative flexibility of the RBD resulted in substantial signal averaging, causing density loss at higher contour levels, which precluded assessment of VHH design accuracy (Supplementary Fig. 9c–e). Symmetry expansion and local refinement helped improve the resolution of the RBD–VHH interface, confirming the intended VHH fold and accurate epitope targeting following rigid-body docking of the design model into the density map (Supplementary Fig. 9f,g), in agreement with our biochemical competition data (Fig. 2c). However, although the VHH bound the correct RBD epitope, its binding mode deviated notably from the design model, instead adopting a predominantly framework-mediated interaction that more closely matched retrospective AlphaFold3 predictions (Supplementary Fig. 9g,h). Owing to the deviation between the designed dock and the experimentally determined dock, we classified this as a design failure.

Design of scFvs with six designed CDRs

Given the success of RFdiffusion at designing VHHs with three de novo CDRs, we next tested its ability to design both heavy and light chains in scFv format. RFdiffusion was used to generate scFvs targeting specific epitope sites, following a strategy similar to the VHH design approach. However, unlike VHHs, where only three CDRs were built de novo, scFv design involved constructing all six CDRs on both the heavy and the light chains in addition to the docking mode.

The gene synthesis problem is more formidable for scFvs than for VHHs, as scFvs are too long to be simply assembled from pairs of conventional oligonucleotides synthesized on oligonucleotide arrays, and are challenging to uniquely pair due to high sequence homology between scFvs. We developed stepwise assembly protocols that enable the construction of libraries with heavy and light chains either specifically paired as in the design models (Supplementary Figs. 10 and 11) or combinatorially mixed within subsets of designs that share similar target-binding modes (Supplementary Fig. 12). The latter approach helps to overcome the greater challenge of accurate design of six CDRs de novo, which increases the possibilities for error compared with the VHH problem, as even a single suboptimal CDR can compromise binding. We found that given sets of nearly superimposable designs targeting the same site with the same binding mode, new scFvs generated by combining pairs of heavy and light chains from different designs were confidently predicted to bind to the target site in the designed binding mode at frequencies similar to those of the original designs (Extended Data Fig. 7a). By contrast, random, structure-agnostic pairing rarely led to predicted binders (Extended Data Fig. 7a). Hence, by mixing CDRs from different designs that bind in the same orientation, we can effectively overcome failures due to single imperfectly designed CDRs, thereby offering a combinatorial solution to a combinatorially more complex problem (two-chain scFv design versus one-chain VHH design). This strategy highlights a key advantage of structure-based design: ‘intelligent’ pairing of heavy and light chains is possible with a structural model of every antibody, and allows de novo-designed antibody libraries to reach scales attainable by traditional library assembly methods, despite current limits in gene synthesis.
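The structure-aware pairing strategy can be sketched in Python as follows. This is a toy illustration, not the library assembly protocol itself: the design records, cluster labels and chain identifiers are hypothetical, and the grouping of designs into clusters of nearly superimposable binding modes is assumed to have been done beforehand.

```python
from collections import defaultdict
from itertools import product


def combinatorial_library(designs):
    """Pair every heavy chain with every light chain, but only within a cluster.

    Each design is a dict with 'cluster', 'heavy' and 'light' keys; designs in the
    same cluster are assumed to share a nearly superimposable binding mode.
    """
    by_cluster = defaultdict(list)
    for design in designs:
        by_cluster[design["cluster"]].append(design)

    library = []
    for cluster, members in by_cluster.items():
        for heavy_src, light_src in product(members, members):
            library.append((cluster, heavy_src["heavy"], light_src["light"]))
    return library


# Hypothetical designs: three in binding-mode cluster A, two in cluster B.
designs = [
    {"cluster": "A", "heavy": "H1", "light": "L1"},
    {"cluster": "A", "heavy": "H2", "light": "L2"},
    {"cluster": "A", "heavy": "H3", "light": "L3"},
    {"cluster": "B", "heavy": "H4", "light": "L4"},
    {"cluster": "B", "heavy": "H5", "light": "L5"},
]

library = combinatorial_library(designs)
print(len(library))  # 3 * 3 + 2 * 2 pairings
```

Because the number of pairings grows with the square of the cluster size, a modest number of synthesized heavy and light chains yields a much larger library, while restricting pairing to a cluster avoids the structure-agnostic combinations that were rarely predicted to bind.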

We succeeded in identifying epitope-specific scFvs from the heavy–light combinatorial libraries (of a theoretical complexity of approximately 10 million; Extended Data Figs. 7b,c and 8a–c) but not the fixed pairing libraries (Supplementary Fig. 13). Following expression and purification, SPR analysis of six distinct scFvs originating from two unique docks targeting the Frizzled epitope of TcdB revealed a range of affinities (Fig. 4d–h): the highest affinity binder, scFv6, had a Kd of 72 nM (Fig. 4g). Conversion of scFv6 to a full-length IgG1 generated an antibody that binds with comparable (68 nM) affinity, demonstrating that our design method can be used to generate full-length antibodies (Fig. 4i). There are no antibodies binding to this epitope in the PDB; hence, this success cannot be attributed to memorization. Subsequent prediction of the structure of the scFv with AlphaFold3 showed a binding mode identical to that of the two nearly superimposable parent designs that contributed the light and heavy chains (Supplementary Fig. 16c,d). Competition with a known receptor, Frizzled-7, to this epitope confirmed that binding of scFv5 was on target (Fig. 4j). By contrast, no competition was seen in the presence of CSPG4, an alternative receptor that interacts with an epitope at the toxin core. Thus, scFvs targeting user-specified epitopes can be identified from structure-aware designed combinatorial libraries.

Fig. 4: Biochemical characterization of combinatorially assembled scFvs with six designed CDRs.

a, Multiple sequence alignment of six scFvs that bind to TcdB. scFvs 1–5 originate from the same structural cluster, whereas scFv6 originates from a distinct cluster. b,c, AlphaFold3 predictions of scFv5 (b) and scFv6 (c) in complex with TcdB. scFv5 and scFv6 are predicted to bind to a similar but not identical epitope. The predicted orientation of scFv6 relative to TcdB is rotated compared with scFv5. d, Affinity of scFv5 to TcdB was 460 nM by SPR. e, Computational prediction of the scFv5–TcdB interface for VH (variable heavy-chain fragment; left) and VL (variable light-chain fragment; right). f, scFv5, when expressed as a full-length IgG1, shows a binding affinity of 380 nM to TcdB by SPR. g, Affinity of scFv6 to TcdB was 72 nM by SPR. h, Computational prediction of the scFv6–TcdB interface for VH (left) and VL (right). i, scFv6, when expressed as a full-length IgG1, shows a binding affinity of 68 nM to TcdB by SPR. j, scFv5 competes with Frizzled-7 and does not compete with CSPG4, indicating on-target binding. scFv5 was conjugated to a CM5 chip and TcdB RBD was flowed over at 50 nM either alone or mixed with 1 μM of Frizzled-7, CSPG4 or scFv5. k,l, SPR comparative analysis of B1.2.1 binding to C*07:02–PHOX2B versus C*07:02–PHOX2B(R6A). scFv was immobilized and then on-target and off-target binding was measured across an eight-step, twofold titration with an upper concentration of 5 μM. Steady-state kinetic analysis (k) and raw SPR trace (l) of on-target and off-target binding indicate specific binding to the intended target. m, AlphaFold3 predictions of HLA-C*07:02 with peptide PHOX2B (left) and PHOX2B(R6A) (right). R6 of PHOX2B is predicted to be solvent exposed. n, AlphaFold3 prediction of scFv B1.2.1 in complex with C*07:02–PHOX2B (left). Predicted polar contacts with R6 of the PHOX2B peptide (right), mediated by CDRH3, CDRL1 and CDRL2, are also shown. Figure was created using BioRender (http://biorender.com).

We next targeted a clinically relevant epitope: the QYNPIRTTF peptide derived from the PHOX2B neuroblastoma-dependency gene and master transcriptional regulator in complex with the major histocompatibility complex (MHC) allotype HLA-C*07:02 (we refer to this peptide below simply as PHOX2B). The PHOX2B peptide was originally discovered by immunopeptidomics of neuroblastoma patient-derived samples, and has been targeted with peptide-centric chimeric antigen receptors (PC-CARs) for treating high-risk neuroblastoma33. However, the PC-CARs identified previously are restricted to recognizing PHOX2B presented on HLAs of the A9 serological group, excluding the common allotype HLA-C*07:02 (ref. 34). Targeting the PHOX2B–HLA-C*07:02 complex could meaningfully increase the addressable patient population for these immunotherapies, and has been the focus of ongoing therapeutics development. Recently, computationally designed (non-antibody) binders for PHOX2B–HLA-C*07:02 have been developed, using the TRACeR-I system35, whereas high-affinity TCRs have been identified for targeting peptides on the common HLA-C*08:02/HLA-C*05:01 allotypes36. A benefit of structure-based design is the ability to target specific peptide residues to achieve binding specificity (rather than binding only to the MHC), and we therefore used RFdiffusion to target the R6 residue, which is known to be important for binding in the PC-CAR34. Given the low stability of the PHOX2B–HLA-C*07:02 complex (Tm of 44.2 °C)34, we leveraged a disulfide-stabilized approach to prepare a stabilized form of the pHLA target37. Using the combinatorial assembly approach described above, we identified modest-affinity (400 nM as measured by SPR and 1 μM as measured by isothermal titration calorimetry (ITC); Fig. 4k,l and Extended Data Fig. 8e–g) scFv binders to PHOX2B–HLA-C*07:02. Binding was specific to the peptide, with no detectable binding to the R6A point mutant PHOX2B peptide (PHOX2B(R6A)–HLA-C*07:02; Fig. 4l). 
Attempts were made to incorporate scFv binders into a 4-1BB-CAR, but T cell cytotoxicity assays demonstrated no detectable killing of a range of neuroblastoma cell lines (Supplementary Fig. 14), probably because of the modest binding affinity and/or low levels of antigen density expressed on the tumour cells. Although there is still considerable room for improvement in affinity, this demonstrates the ability of structure-based antibody design, paired with appropriate library assembly methods, to design specific binders to challenging and clinically important target epitopes.

Atomically accurate scFv design to TcdB

To evaluate the accuracy of de novo scFv design, we determined the cryo-EM structures of two combinatorially assembled scFvs, scFv5 and scFv6, both targeting the Frizzled epitope of TcdB. Cryo-EM analysis confirmed that both scFvs bound the Frizzled epitope as designed (Fig. 5 and Extended Data Fig. 9). High-resolution two-dimensional class averages of scFv6 revealed clear density for both TcdB and the bound scFv, further supported by a 3.6 Å three-dimensional reconstruction (Fig. 5a,b and Extended Data Fig. 9). The resolved structure showed that scFv6 engaged the Frizzled epitope along its DRBD domain with the predicted binding orientation (Fig. 5d and Supplementary Methods Table 10). Superposition of the cryo-EM structure with the design model demonstrated remarkable agreement, with both heavy and light chains interacting with the epitope as intended (Fig. 5c and Supplementary Fig. 16a,b). The overall fold closely matched the design, a composite model of the two chains originating from distinct but structurally similar designs (RMSD = 0.9 Å), and each of the six CDRs exhibited near-atomic precision (backbone RMSDs: CDRH1 = 0.4 Å, CDRH2 = 0.3 Å, CDRH3 = 0.7 Å, CDRL1 = 0.2 Å, CDRL2 = 1.1 Å and CDRL3 = 0.2 Å; Fig. 5e,f). This agreement extended to the rotameric conformations of CDR side chains and their interactions with the Frizzled epitope, underscoring the accuracy of RFdiffusion in designing de novo scFv–target interactions (Fig. 5g).
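The backbone RMSDs quoted above compare design-model coordinates with cryo-EM coordinates after superposition. As an illustration only (this is not the analysis code used in this study), such a comparison can be sketched in Python with the standard Kabsch algorithm. The function name `kabsch_rmsd` is hypothetical, and the sketch assumes the two arrays hold matched backbone atoms in the same order. Whether the per-CDR values reported here were computed after aligning on each loop or on a larger frame is not stated in this section, so the choice of atoms to superimpose is left to the caller.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm).

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the residual deviation.
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Toy coordinates (not from any structure in this study).
    design = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
    shifted = design + np.array([10.0, -4.0, 2.5])  # pure translation
    print(kabsch_rmsd(design, shifted))
```

In practice the coordinate arrays would be extracted from the design model and the built cryo-EM model with a structure-parsing library, restricted to the residues of interest (for example, the backbone atoms of a single CDR).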

Fig. 5: Cryo-EM structural characterization of two TcdB-binding scFvs.

a, Labelled cryo-EM two-dimensional class averages of a designed scFv, scFv6, bound to TcdB. b, A 3.6 Å cryo-EM three-dimensional reconstruction of the complex shows scFv6 bound to TcdB along the Frizzled epitope. c, The cryo-EM structure of scFv6 in complex with TcdB closely matches the design model. d, Cryo-EM structure of scFv6 bound to TcdB. e, Cryo-EM reveals the accurate design of scFv6 using RFdiffusion. f, Superposition of the predicted structure of each of the six designed scFv6 CDR loops with the built cryo-EM structure. g, Comparison of predicted CDRH3 rotamers with the built 3.6 Å cryo-EM structure. h, Labelled cryo-EM two-dimensional class averages of the designed scFv, scFv5, bound to full-length TcdB. i, A 6.1 Å cryo-EM three-dimensional reconstruction of the complex shows scFv5 bound to the target epitope as predicted. j, Owing to the modest resolution, a fragment of TcdB was first docked into the cryo-EM density map, and the full design model (including both the TcdB fragment and the designed scFv) was then aligned to the pre-fitted TcdB fragment. This approach demonstrates that the predicted design closely matches the experimentally determined complex in structure, epitope targeting and overall conformation. In models, yellow denotes TcdB; navy indicates the variable heavy-chain fragment (cryo-EM); pink shows the variable light-chain fragment (cryo-EM); and grey denotes the computational design prediction.

scFv5 was designed to bind to the same epitope but with a distinct approach angle relative to scFv6 (Fig. 4b,c). A 6.1 Å cryo-EM reconstruction confirmed scFv5 binding to the TcdB Frizzled epitope, with two-dimensional class averages showing clear density for the complex (Fig. 5h and Supplementary Fig. 15). Rigid-body docking of the design model into the cryo-EM density revealed close agreement between the predicted-binding and experimentally determined-binding modes (Fig. 5i,j).

Improved oracles increase success rate

Although our results demonstrate that the de novo design of antibodies is possible, the experimental success rates remain low. A key contributor to previous successes in de novo binder design was improved filters (primarily AlphaFold2 (ref. 19)), which enriched for experimental success in the subset of designs that are tested experimentally2,22. At the outset of this study, we sought to build such a filter by fine-tuning RoseTTAFold2 (Extended Data Figs. 2 and 3), but the filtering power of this model is limited (at least with the settings used, that is, providing 100% of interface ‘hotspots’). This probably accounts for the low experimental success rates and the inaccurate SARS-CoV-2 design, where the overall fold and epitope targeting were correct, but the binding orientation was not.

Subsequent to the design work in this study, AlphaFold3 (ref. 24) was released and has improved antibody structure prediction accuracy24,38, both with24,38 and without38 antigen present. Retrospectively, we can assess how filtering with AlphaFold3 would have improved experimental success rates. First, AlphaFold3 accurately predicts the experimentally validated structure of the inaccurately designed SARS-CoV-2 VHH (Supplementary Fig. 9). Had AlphaFold3 been used as an initial filter, this design would have been rejected due to the discrepancy between the predicted and intended structures, thereby preventing its experimental testing. Second, we predicted the structures of the SARS-CoV-2, influenza haemagglutinin, TcdB and IL-7Rα VHH designs using AlphaFold3 with a multiple sequence alignment (MSA) and templates for the target and only a template for the VHH (as CDRs are de novo, we reasoned the MSA would be of limited utility). We analysed the predictions for libraries with at least one structurally validated VHH (TcdB, influenza haemagglutinin and SARS-CoV-2). These results are dominated by anti-haemagglutinin VHHs as the majority of successful binders came from this library. We found that the AlphaFold3 interface predicted template modelling (ipTM) score, a measure of model confidence over the interface, is predictive of binding success (area under the curve = 0.86; Extended Data Fig. 10a,b). Overall, only 9% of our ordered VHH designs have an ipTM > 0.6, suggesting that success rate will be improved by incorporation of an ipTM filter. We ran a similar analysis for the combinatorially assembled scFv libraries; we predicted the structures of the parental scFv designs (before combinatorial assembly) and the experimentally confirmed scFv designs (combinatorially assembled) using AlphaFold3 with an MSA for the target sequence and templates for the target as well as the heavy and light chains, taking the maximum ipTM score over 10 seeds. 
We found that successful designs cluster to higher AlphaFold3 ipTM scores than the parental designs (Extended Data Fig. 10c). Only 4% of the initial design library has ipTM > 0.85, whereas 5 out of the 6 experimentally confirmed designs pass this threshold, again suggesting that filtering by AlphaFold3 ipTM should increase success rates (Extended Data Fig. 10d).
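The retrospective analysis above combines three simple operations: taking the maximum ipTM over prediction seeds, applying an ipTM threshold, and summarizing how well the score separates binders from non-binders as an area under the ROC curve. A minimal Python sketch of these steps follows. It is illustrative only: the function names are hypothetical, the sketch does not call AlphaFold3, and the ipTM values in the usage example are made-up placeholders, not data from this study.

```python
from typing import Dict, List, Sequence


def best_iptm(iptm_by_seed: Sequence[float]) -> float:
    """Maximum ipTM over the predictions made with different seeds."""
    return max(iptm_by_seed)


def passing_designs(designs: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Names of designs whose best ipTM strictly exceeds the threshold."""
    return [name for name, seeds in designs.items() if best_iptm(seeds) > threshold]


def roc_auc(scores: Sequence[float], is_binder: Sequence[bool]) -> float:
    """Rank-based ROC AUC (Mann-Whitney formulation).

    Equals the probability that a randomly chosen binder scores higher
    than a randomly chosen non-binder, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, is_binder) if y]
    neg = [s for s, y in zip(scores, is_binder) if not y]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder ipTM values per seed (invented for illustration).
    library = {
        "design_A": [0.41, 0.88, 0.79],
        "design_B": [0.22, 0.35, 0.30],
        "design_C": [0.86, 0.90, 0.84],
    }
    print(passing_designs(library, threshold=0.85))

    # Placeholder binding outcomes for the same three designs.
    scores = [best_iptm(v) for v in library.values()]
    print(roc_auc(scores, [True, False, True]))
```

The pairwise AUC used here is quadratic in the number of designs, which is adequate for libraries of the size tested experimentally; a sorting-based implementation would be preferable for much larger sets.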

Discussion

Our results demonstrate that de novo design of antibody domains targeting specific epitopes on a target is possible. The cryo-EM structural data for the designed VHHs to influenza haemagglutinin and TcdB reveal very close agreement with the computational design models, showing that our approach can design VHH complexes with atomic accuracy (including the highly variable H3 loop and the overall binding orientation) that are highly dissimilar from any known structures in the PDB. Moreover, cryo-EM structural data of designed scFvs bound to TcdB demonstrate the ability of RFdiffusion to design two-chain scFvs accurately. To our knowledge, these are the first structurally validated cases of de novo-designed antibodies.

Our computational method synergizes with experimental screening approaches developed for retrieving antibodies from large random libraries in several ways. First, yeast display selection methods widely used for antibody library screening enable the retrieval of the highest affinity binders among large sets of designs, which is currently necessary due to the quite low design success rate. Second, screening combinatorial libraries that mix heavy and light chains from designs with similar binding modes allows for the identification of scFvs composed of structurally compatible chains targeting specific epitopes, as demonstrated here for TcdB and PHOX2B–peptide MHC. Third, affinity maturation using OrthoRep3 improves the measured affinity of initial VHH designs down to the single-digit nanomolar or subnanomolar range, while preserving the original designed-binding mode. From a practical standpoint, the key advance of this work is not the ability to generate VHHs and scFvs against a target—something often achievable through purely experimental methods—but rather the ability to accurately target specific binding epitopes. The epitope specificity is critical for therapeutic applications such as antagonists that block receptor–ligand interactions, antibodies that avoid competing with endogenous molecules, modulators that induce conformational changes to trigger signalling, or antibodies targeting conserved or evolutionarily restricted viral epitopes.

There remains considerable room for improvement. For the backbone design step, incorporating recent architectural improvements39 and new advances in generative modelling40,41,42 may yield design models with higher designability and diversity. RoseTTAFold2 and RFdiffusion have also recently been extended to model all biomolecules (rather than just proteins)43, and incorporating this capability into the antibody design RFdiffusion variant should permit the accurate design of antibodies to epitopes containing non-protein atoms, such as glycans. ProteinMPNN was not modified in this current work, but designing sequences that more closely match human CDR sequences would be expected to reduce the potential immunogenicity of designed antibodies44,45. Indeed, designed sequences are currently somewhat less human (as assessed by an OASis score46) than therapeutic antibody CDRs (Supplementary Fig. 1d). Further improvements in antibody structure prediction methods should allow faster optimization of upstream design methods and improve experimental success rates.

Ultimately, computational de novo design of antibodies using our RFdiffusion and related approaches47 could revolutionize antibody discovery and development. As the method improves and success rates increase, it has the potential to be faster and more cost-effective than immunizing animals or screening random libraries. A structure-based approach to antibody design should also aid the optimization of key pharmaceutical properties, such as aggregation, solubility and expression levels (all major challenges in antibody development) in a structure-informed manner. Together, we expect that computational design of antibodies will increase the number of tractable clinical targets and diseases accessible to antibody therapeutics.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.